Automation · LLM Pipeline · Production · Germany
Invoice Automation Pipeline
Invoices from multiple suppliers arrive by email as PDFs in different formats. The manual routine was: find the email, download the PDF, classify the supplier, extract the fields, file the PDF to Drive, and enter the invoice into accounting. Now the whole routine runs on a schedule, automatically and end-to-end. Built for German business operations: multi-supplier, multi-format, idempotent.
Zero reruns: two-level deduplication prevents duplicate records from forwarded attachments.
2-layer: baseline parser first, Gemini verification second; extraction still works when the Gemini API is unavailable.
Cron-ready: GitHub Actions scheduling, deduplication, status tracking, Telegram summaries.
Multi-supplier: different PDF formats per supplier are handled by the LLM extraction layer.
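A minimal sketch of what two-level deduplication could look like. It assumes the two levels are (1) a SHA-256 hash of the raw PDF bytes, which catches a forwarded copy of the same attachment, and (2) a business key of supplier plus normalized invoice number, which catches the same invoice re-sent as a different file. `InvoiceCandidate`, `Deduplicator`, and the choice of keys are hypothetical; in production the two in-memory sets would more likely be unique constraints in Supabase, so that deduplication also holds across scheduled runs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one invoice candidate found in the inbox.
interface InvoiceCandidate {
  pdfBytes: Uint8Array;
  supplierId: string;
  invoiceNumber: string | null;
}

// Level 1 (assumed): hash of the raw PDF bytes. A forwarded email carries
// identical attachment bytes, so it produces the same hash.
function contentKey(pdf: Uint8Array): string {
  return createHash("sha256").update(pdf).digest("hex");
}

// Level 2 (assumed): supplier + normalized invoice number. Catches the same
// invoice re-sent as a byte-different PDF. Null if no number was extracted.
function businessKey(c: InvoiceCandidate): string | null {
  if (!c.invoiceNumber) return null;
  const normalized = c.invoiceNumber.replace(/\s+/g, "").toUpperCase();
  return `${c.supplierId}:${normalized}`;
}

class Deduplicator {
  private seenContent = new Set<string>();
  private seenBusiness = new Set<string>();

  // Returns true if either level matches a previously seen candidate;
  // otherwise records both keys and returns false.
  isDuplicate(c: InvoiceCandidate): boolean {
    const ck = contentKey(c.pdfBytes);
    const bk = businessKey(c);
    if (this.seenContent.has(ck) || (bk !== null && this.seenBusiness.has(bk))) {
      return true;
    }
    this.seenContent.add(ck);
    if (bk !== null) this.seenBusiness.add(bk);
    return false;
  }
}
```

Checking both levels matters because neither is sufficient alone: a re-rendered PDF defeats the byte hash, and a missing invoice number defeats the business key.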
Pipeline flow
How it works
- IMAP inbox scan → PDF attachment detection → supplier classification.
- Baseline regex parser extracts structured fields first (fast, free, reliable).
- Gemini 2.5 Flash verifies the baseline fields, fills gaps, and handles format variations across suppliers.
- Validated records are upserted to Supabase, and the PDFs are filed into a Google Drive folder structure.
- After each run, a Telegram summary reports the results and any flagged anomalies.
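The baseline step can be sketched as a handful of regular expressions over the PDF's extracted text. This sketch assumes the text has already been extracted, that labels appear as `Label: value`, and that amounts use German formatting (`1.234,56`); `extractBaseline` and its patterns are illustrative, not the pipeline's actual parser. Any field the regexes miss stays `null`, which is the gap the Gemini layer would then fill.

```typescript
// Hypothetical baseline result; null means "not found, left for the LLM layer".
interface BaselineFields {
  invoiceNumber: string | null;
  invoiceDate: string | null; // ISO yyyy-mm-dd
  vatId: string | null;
  netAmount: number | null;
  vatAmount: number | null;
}

// German number format: "1.234,56" -> 1234.56 (dots are thousands separators).
function parseGermanAmount(raw: string): number | null {
  const value = Number(raw.replace(/\./g, "").replace(",", "."));
  return Number.isFinite(value) ? value : null;
}

// "15.03.2024" -> "2024-03-15". Format-only; does not validate month/day ranges.
function parseGermanDate(raw: string): string | null {
  const m = /^(\d{2})\.(\d{2})\.(\d{4})$/.exec(raw);
  return m ? `${m[3]}-${m[2]}-${m[1]}` : null;
}

// Returns the first capture group of the first match, or null.
function firstMatch(text: string, re: RegExp): string | null {
  const m = re.exec(text);
  return m ? m[1] : null;
}

function extractBaseline(text: string): BaselineFields {
  const date = firstMatch(text, /Rechnungsdatum:?\s*(\d{2}\.\d{2}\.\d{4})/i);
  const net = firstMatch(text, /Nettobetrag:?\s*([\d.]+,\d{2})/i);
  const vat = firstMatch(text, /MwSt\.?[^:\n]*:?\s*([\d.]+,\d{2})/i);
  const vatId = firstMatch(text, /USt-?IdNr\.?:?\s*(DE\s?\d{9})/i);
  return {
    invoiceNumber: firstMatch(text, /Rechnungsn(?:ummer|r\.?):?\s*([A-Z0-9][A-Z0-9\-\/]*)/i),
    invoiceDate: date ? parseGermanDate(date) : null,
    vatId: vatId ? vatId.replace(/\s/g, "") : null,
    netAmount: net ? parseGermanAmount(net) : null,
    vatAmount: vat ? parseGermanAmount(vat) : null,
  };
}
```

For example, a line `Nettobetrag: 1.234,56 EUR` yields `netAmount: 1234.56`, and `Rechnungsdatum: 15.03.2024` yields `invoiceDate: "2024-03-15"`.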
Engineering decisions
Why it's built this way
- Reliability over extraction accuracy: the pipeline never fails silently.
- The deterministic baseline makes the LLM a verification layer, not a hard dependency.
- Idempotent design: re-running the pipeline never creates duplicates.
- Production-grade: logged, scheduled, and monitored, not a one-off script.
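One way to keep the LLM a verification layer rather than a dependency is a pure reconciliation step. The sketch below assumes that baseline values win, that Gemini only fills `null` fields, that disagreements are flagged rather than overwritten, and that `llm === null` stands for a failed or unavailable Gemini call. `reconcile` and its anomaly codes are hypothetical, and the Gemini request itself is left out.

```typescript
// Hypothetical field map; values are strings to keep comparison simple.
type Fields = Record<string, string | null>;

interface Reconciled {
  fields: Fields;
  source: "baseline+llm" | "baseline-only";
  anomalies: string[];
}

function reconcile(baseline: Fields, llm: Fields | null): Reconciled {
  // LLM unavailable: fall back to the baseline and flag whatever is missing.
  if (llm === null) {
    const missing = Object.keys(baseline).filter((k) => baseline[k] === null);
    return {
      fields: { ...baseline },
      source: "baseline-only",
      anomalies: missing.map((k) => `missing:${k}`),
    };
  }
  const fields: Fields = { ...baseline };
  const anomalies: string[] = [];
  // Union of keys from both sources, baseline order first.
  const keys = Object.keys(baseline).concat(
    Object.keys(llm).filter((k) => !(k in baseline))
  );
  for (const key of keys) {
    const b = baseline[key] ?? null;
    const l = llm[key] ?? null;
    if (b === null && l !== null) {
      fields[key] = l; // gap filled by the LLM
    } else if (b !== null && l !== null && b !== l) {
      anomalies.push(`mismatch:${key}`); // keep baseline, flag for review
    } else if (b === null && l === null) {
      anomalies.push(`missing:${key}`); // neither layer found it
    }
  }
  return { fields, source: "baseline+llm", anomalies };
}
```

Because the function never throws and always returns a record plus anomaly codes, an outage of the LLM degrades output quality (more `missing:` flags) instead of stopping the run.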
Business impact
What changed
- Weekly manual invoice processing routine completely eliminated.
- Multi-supplier formats handled automatically; no per-supplier configuration needed.
- Records are available in Supabase and Drive after the first scheduled run following the email's arrival.
- Anomaly detection catches edge cases that previously required manual review.
Relevance for German businesses
Your use case
- Any German business receiving 10+ invoices per week from multiple suppliers can automate this entirely.
- Produces DATEV-compatible output for German accounting software.
- Handles German invoice fields: Steuernummer (tax number), USt-IdNr. (VAT ID), Rechnungsdatum (invoice date), Nettobetrag (net amount), MwSt. (VAT).
- Supports GoBD archiving requirements through structured Google Drive storage.
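Plausibility checks on German fields can stay deterministic. The hypothetical `validateGermanFields` below checks the USt-IdNr. format (`DE` followed by nine digits; format only, no registry lookup) and whether the MwSt. amount matches the 19 % standard or 7 % reduced German rate applied to the net amount, within one cent. It assumes a single VAT rate per invoice; mixed rates, reverse charge, and foreign suppliers would need additional rules.

```typescript
// German VAT rates: 19 % standard, 7 % reduced.
const GERMAN_VAT_RATES = [0.19, 0.07];

interface GermanInvoiceFields {
  vatId: string | null;
  netAmount: number | null;
  vatAmount: number | null;
}

// Returns anomaly codes; an empty array means the fields look plausible.
function validateGermanFields(f: GermanInvoiceFields): string[] {
  const anomalies: string[] = [];
  // USt-IdNr. format check: "DE" + 9 digits, spaces ignored.
  if (f.vatId !== null && !/^DE\d{9}$/.test(f.vatId.replace(/\s/g, ""))) {
    anomalies.push("vatId:bad-format");
  }
  if (f.netAmount !== null && f.vatAmount !== null) {
    // Compare in integer cents to avoid floating-point drift.
    const netCents = Math.round(f.netAmount * 100);
    const vatCents = Math.round(f.vatAmount * 100);
    // Allow 1 cent of tolerance for rounding differences.
    const plausible = GERMAN_VAT_RATES.some(
      (rate) => Math.abs(Math.round(netCents * rate) - vatCents) <= 1
    );
    if (!plausible) anomalies.push("vat:unexpected-rate");
  }
  return anomalies;
}
```

Codes like `vat:unexpected-rate` are the kind of flag that could feed the Telegram run summary for manual review.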
Tech stack
Tools used
TypeScript
Node.js
Gemini 2.5 Flash
Supabase
Google Drive API
Gmail IMAP
GitHub Actions
Telegram Bot API
Process your invoices automatically?
I build invoice and document automation pipelines for German businesses. Multi-supplier, multi-format, GoBD-compatible. Let's talk about your workflow.