I Built an Automated Document Management System with Paperless-NGX




Managing paper and digital documents for a family of four across multiple countries and languages had become unmanageable. We had tax documents in German, medical records in English, contracts scattered across three email accounts, and a growing pile of physical papers that I could never find when I needed them.

I spent weeks building a fully automated document management system that now handles 900+ documents with zero daily effort. This post walks you through the entire setup — from the scanner on my desk to the encrypted backups on GitHub — with every configuration file and script you need to build it yourself.

The Problem

Our household generates a surprising amount of paperwork. Insurance letters, tax assessments, medical bills, employment contracts, kindergarten registrations, vehicle documents — all in different languages, arriving through different channels, belonging to different family members.

Before this setup, my "system" was:

A physical folder that was always full and never organized

Email attachments scattered across three accounts

A phone full of photos of documents "I'll file later"

No way to search for anything — I had to remember where I put it

I needed a system that could:

Ingest documents from email, scanner, and manual upload automatically

Classify documents by type, person, and topic using AI

Track which physical documents I have and where they are

Be searchable and accessible from anywhere

Back itself up to multiple locations without my intervention

My Workflow: From Paper to Searchable Archive

Here's the daily workflow that now runs on autopilot:

Physical Documents

A letter arrives in the mail

I print an ASN (archive serial number) barcode label using Avery Zweckform L4731REV-25 labels and their online designer, then stick it on the document

I place the document on my Ricoh ScanSnap iX1600 scanner

The scanner auto-uploads the PDF to a Google Drive folder, sorted into a subfolder by category (Finance, Health, Work, etc.)

Every 5 minutes, rclone moves new files from Google Drive to the server

Paperless-NGX detects the new file and starts processing

A workflow assigns the document type based on which subfolder it came from

Google Document AI performs high-quality OCR, then Gemini 2.5 Flash generates a clean title, identifies the correspondent, assigns tags, and extracts the creation date

The document is filed on disk by person, correspondent, and year

The ASN barcode is detected automatically. A cron job then tags the document as "Physical Filed" and syncs the ASN mapping to a Google Sheet

The physical document goes into a numbered binder, matching the ASN
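The 5-minute sync step in the list above might look roughly like the sketch below. This is not the script from my server: the rclone remote name `gdrive`, the Drive folder `Scans`, and the one-minute `--min-age` guard are assumptions for illustration. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch of the 5-minute Google Drive -> consume folder sync.
# Assumptions (not from the original setup): an rclone remote named "gdrive",
# a Drive folder named "Scans", and a 1-minute age guard so that files the
# scanner is still uploading are skipped until the next run.
REMOTE="${REMOTE:-gdrive:Scans}"
CONSUME_DIR="${CONSUME_DIR:-$HOME/paperless/consume}"

sync_scans() {
  # "rclone move" copies each file and then deletes it from the source, so
  # the category subfolders (Finanzen, Gesundheit, ...) carry over as-is.
  set -- rclone move "$REMOTE" "$CONSUME_DIR" --min-age 1m
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run: print the command instead of running it
  else
    "$@"
  fi
}

sync_scans
```

To run it for real every 5 minutes, a crontab entry along the lines of `*/5 * * * * DRY_RUN=0 ~/paperless/scripts/sync.sh` would do, where the script path is again hypothetical.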

Digital Documents

An invoice arrives by email

One of 24 mail rules detects it and consumes the attachment

The document is auto-tagged as "Digital Only" and goes through the same AI classification pipeline

The entire process takes about 30 seconds of my time (sticking the label and placing the paper on the scanner). Everything else is automated.
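If you would rather generate the ASN label values yourself than rely on the label designer's numbering, a short loop is enough. This is a sketch under two assumptions: the `ASN` prefix (the default barcode prefix in Paperless-NGX) and 5-digit zero padding. Adjust both to match the labels you actually print.

```shell
#!/bin/sh
# Sketch: print a run of ASN label strings for a barcode sheet.
# Assumptions: "ASN" prefix and 5-digit zero padding (e.g. ASN00042).

asn_label() {
  # $1 = plain decimal number without leading zeros
  printf 'ASN%05d\n' "$1"
}

asn_range() {
  # $1 = first number, $2 = last number (inclusive)
  n="$1"
  while [ "$n" -le "$2" ]; do
    asn_label "$n"
    n=$((n + 1))
  done
}

# Example: the first three labels
asn_range 1 3
```

Because the number on the label is the number on the binder position, keeping the sequence gap-free makes the physical lookup trivial.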

Architecture Overview

The system runs on a single VPS (4 vCPU, 8 GB RAM, 80 GB NVMe) with 9 Docker containers:

Figure: System architecture (document sources, Docker containers, and backup destinations)

| Container | Role | Memory Limit |
|---|---|---|
| Paperless-NGX | Core document management | 2 GB |
| Paperless-GPT | AI classification (Gemini 2.5 Flash + Document AI) | 256 MB |
| PostgreSQL 16 | Database | 512 MB |
| Redis | Caching and task queue | 256 MB |
| Gotenberg | Document conversion | 512 MB |
| Apache Tika | Text extraction from Office formats | 512 MB |
| Cloudflare Tunnel | Secure HTTPS access (zero open ports) | 128 MB |
| Portainer | Container management UI | 256 MB |
| Watchtower | Automatic container updates | 256 MB |

Total memory footprint: under 5 GB, leaving headroom on the 8 GB VPS.
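The total can be checked by summing the limits from the table. The sketch below assumes "2 GB" means 2048 MB and treats the VPS as 8192 MB. Keep in mind these are caps, not measured usage.

```shell
#!/bin/sh
# Sum the per-container memory limits from the table (in MB).
# Assumption: "2 GB" is counted as 2048 MB and the VPS as 8192 MB.
limits_mb="2048 256 512 256 512 512 128 256 256"

total_mb=0
for mb in $limits_mb; do
  total_mb=$((total_mb + mb))
done

headroom_mb=$((8192 - total_mb))
echo "limits: ${total_mb} MB, headroom: ${headroom_mb} MB"
```

That comes to 4736 MB of limits (about 4.6 GB), which leaves roughly 3.4 GB for the host OS and file cache even if every container hits its cap at once.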

Document Processing Flow

Figure: How a document flows from scan to searchable archive

Step-by-Step Build Guide

Step 1: Server Setup

Start with a fresh Ubuntu VPS. This script installs Docker, configures the firewall, sets up fail2ban, creates the directory structure, installs rclone for Google Drive sync, and schedules the daily backup:

#!/bin/bash
set -e

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker (log out and back in afterwards so the docker group applies)
curl -fsSL https://get.docker.com | sudo sh
sudo usermod -aG docker "$USER"

# Firewall - only SSH (all web traffic goes through Cloudflare Tunnel)
sudo ufw allow OpenSSH
sudo ufw --force enable

# Brute-force protection
sudo apt install -y fail2ban
sudo systemctl enable fail2ban && sudo systemctl start fail2ban

# Directory structure (one consume subfolder per document category)
mkdir -p ~/paperless/{data,media,export,consume,redis,db,prompts,backups,scripts}
mkdir -p ~/paperless/consume/{Arbeit,Dokumente,Fahrzeuge,Finanzen,Gesundheit,Wohnen,Sonstiges}

# Install rclone for Google Drive sync
curl https://rclone.org/install.sh | sudo bash

# Backup cron (daily at 2 AM)
(crontab -l 2>/dev/null; echo '0 2 * * * ~/paperless/backup.sh >> ~/paperless/backup.log 2>&1') | crontab -
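One thing to watch: the crontab line at the end of the script appends a new entry every time it runs, so rerunning the script leaves duplicate backup jobs. A possible variant that installs the entry exactly once is sketched below. The helper name `with_backup_line` is made up for this example.

```shell
#!/bin/sh
# Sketch: add the backup cron entry only if it is not already installed.
CRON_LINE='0 2 * * * ~/paperless/backup.sh >> ~/paperless/backup.log 2>&1'

# Reads a crontab on stdin and writes it back with CRON_LINE present exactly once.
with_backup_line() {
  # Drop any existing exact copies; grep exits 1 when nothing is left, hence "|| true".
  grep -vxF -- "$CRON_LINE" || true
  printf '%s\n' "$CRON_LINE"
}

# Usage on the server (commented out so the sketch has no side effects):
# crontab -l 2>/dev/null | with_backup_line | crontab -
```

The filter matches whole lines literally (`-x -F`), so other cron entries, including ones that merely mention backup.sh, pass through untouched.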

Step 2: Docker Compose Configuration

This is the complete...

