
Docker: Paperless-NGX

This is a quick & dirty snippet to run a full paperless-ngx stack using docker-compose.

cl
Mar 22, 2024



docker-compose.yml

# Docker Compose file for running paperless from the docker container registry.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Data and media are stored in bind-mounted host folders
#   ('/apps/paperless/data' and '/data/documents/paperless/media').
# - The export and scripts folders live in the same directory as this file;
#   the consume folder is '/data/documents/paperless/consume' on the host.
# - Paperless listens on port 8000.
#
# SQLite is used as the database. The SQLite file is stored in the data volume.
#
# In addition to that, this Docker Compose file adds the following optional
# configurations:
#
# - Apache Tika and Gotenberg servers are started with paperless and paperless
#   is configured to use these services. These provide support for consuming
#   Office documents (Word, Excel, PowerPoint and their LibreOffice
#   counterparts).
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
#   and '.env' into a folder.
# - Run 'docker compose pull'.
# - Run 'docker compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.

version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    container_name: paperless_broker
    restart: always
    volumes:
      - ./redisdata:/data

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    container_name: paperless_webserver
    restart: always
    depends_on:
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - /apps/paperless/data:/usr/src/paperless/data
      - /data/documents/paperless/media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - /data/documents/paperless/consume:/usr/src/paperless/consume
      - ./scripts:/usr/src/paperless/scripts
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      # Split scans at separator pages that carry a special barcode
      PAPERLESS_CONSUMER_ENABLE_BARCODES: 1
      # Also consume files from subfolders of the consume folder
      PAPERLESS_CONSUMER_RECURSIVE: 1
      # Collate double-sided documents scanned in two passes
      PAPERLESS_CONSUMER_ENABLE_COLLATE_DOUBLE_SIDED: 1
      PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_SUBDIR_NAME: double-sided
      PAPERLESS_CONSUMER_COLLATE_DOUBLE_SIDED_TIFF_SUPPORT: 1
      # Performance settings
      PAPERLESS_TASK_WORKERS: 2
      PAPERLESS_THREADS_PER_WORKER: 3

  gotenberg:
    image: docker.io/gotenberg/gotenberg:7.10
    container_name: paperless_gotenberg
    restart: always

    # The gotenberg chromium route is used to convert .eml files. We do not
    # want to allow external content like tracking pixels or even javascript.
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"

  tika:
    image: ghcr.io/paperless-ngx/tika:latest
    container_name: paperless_tika
    restart: always
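
The install steps from the header comment of the compose file can be sketched as a small shell script. This is a rough sketch, not an official installer: the function names `prepare_dirs` and `start_stack` are made up for illustration, and the optional second argument of `prepare_dirs` (a prefix for the absolute host paths) only exists so the directory layout can be tried out in a scratch folder. The script pre-creates the bind-mounted folders and, if `docker-compose.env` is missing, creates an empty placeholder that you would still need to fill in yourself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# prepare_dirs [PROJECT_DIR] [ROOT_PREFIX]
# Creates every host folder the compose file bind-mounts.
# PROJECT_DIR: folder holding docker-compose.yml (default: current directory).
# ROOT_PREFIX: hypothetical prefix for the absolute paths; leave empty for real use.
prepare_dirs() {
  local project_dir="${1:-.}"
  local root="${2:-}"

  mkdir -p \
    "$project_dir/redisdata" \
    "$project_dir/export" \
    "$project_dir/scripts" \
    "$root/apps/paperless/data" \
    "$root/data/documents/paperless/media" \
    "$root/data/documents/paperless/consume"

  # The compose file references this env file; create an empty placeholder
  # if it does not exist yet (an existing file is left untouched).
  local env_file="$project_dir/docker-compose.env"
  [ -e "$env_file" ] || : > "$env_file"
}

# start_stack [PROJECT_DIR]
# Runs the three commands listed in the compose file's header comment.
start_stack() {
  (
    cd "${1:-.}"
    docker compose pull
    docker compose run --rm webserver createsuperuser
    docker compose up -d
  )
}
```

Run from the folder that contains `docker-compose.yml`, a typical call would be `prepare_dirs . && start_stack .`; afterwards the web interface should be reachable on port 8000.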

Document Splitting

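To speed up scanning, you can feed a whole stack of letters through the scanner in one go. Because each letter should end up as its own document, put a separator page with a special barcode between the letters. With PAPERLESS_CONSUMER_ENABLE_BARCODES enabled, as in the compose file above, paperless splits the scan at every separator page automatically.

A printable separator page (Patch T code) can be found here:
http://www.alliancegroup.co.uk/downloads/patch-code-t.pdf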
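
To try the splitting, a batch scan only has to land in the consume folder. The helper below is a minimal sketch of that step; `queue_scan` is a hypothetical name and not part of paperless-ngx, and the default path is simply the host side of the consume bind mount from the compose file (override `CONSUME_DIR` if your mount differs).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Host side of the consume bind mount from the compose file.
CONSUME_DIR="${CONSUME_DIR:-/data/documents/paperless/consume}"

# queue_scan FILE [SUBDIR]
# Copies FILE into the consume folder (or into SUBDIR below it) and prints
# the destination path. Hypothetical helper for illustration only.
queue_scan() {
  local file="$1"
  local subdir="${2:-}"
  local target="$CONSUME_DIR${subdir:+/$subdir}"

  if [ ! -f "$file" ]; then
    echo "queue_scan: no such file: $file" >&2
    return 1
  fi

  mkdir -p "$target"
  cp -- "$file" "$target/"
  echo "$target/$(basename -- "$file")"
}
```

For example, `queue_scan ~/scans/batch.pdf` queues a stack of letters with separator pages in between, and `docker compose logs -f webserver` lets you watch the consumer pick it up. Passing `double-sided` as the second argument places the file in the collation subfolder named in the compose file.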