Automated website archival using ArchiveBox via n8n. Runs as a scheduled n8n workflow that creates full offline snapshots of any website.
- n8n (self-hosted) with Docker CLI available in the container
- anvil — the recommended base image (includes Docker CLI for running ArchiveBox containers)
This workflow uses the Docker-outside-of-Docker (DooD) pattern: n8n launches an ArchiveBox container on the host via the mounted Docker socket. The base image linked above provides this capability.
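As a rough illustration of the DooD pattern, the sketch below only assembles the `docker run` argument list that n8n could hand to the Docker CLI. The exact command inside the workflow is not shown in this README, so treat the function name `build_archivebox_command` and the flag choices as assumptions. The `/data` mount point and `add --depth=N` follow ArchiveBox's documented Docker usage. With DooD the host daemon resolves the `-v` source path, so it must be a path on the host, not one inside the n8n container.

```python
from pathlib import PurePosixPath

# Image name as given in the workflow steps.
ARCHIVEBOX_IMAGE = "archivebox/archivebox"


def build_archivebox_command(host_dir: str, url: str, depth: int = 1) -> list[str]:
    """Return a docker argv that would archive `url` into `host_dir`.

    Hypothetical helper: it builds the command but does not run it.
    """
    # The host Docker daemon interprets the bind-mount source, so a relative
    # path (relative to the n8n container) would not mean what we expect.
    if not PurePosixPath(host_dir).is_absolute():
        raise ValueError("host_dir must be an absolute path on the Docker host")
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data",   # ArchiveBox keeps its collection in /data
        ARCHIVEBOX_IMAGE,
        "add", f"--depth={depth}", url,
    ]
```

A caller could pass the resulting list to `subprocess.run(cmd, check=True)`. A fresh data directory typically needs `archivebox init` first, which this sketch leaves out.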
- Schedule — runs weekly on Friday (configurable in n8n)
- Build Config — reads the target URL from environment variables, derives the site name and output path
- Run ArchiveBox — spins up an archivebox/archivebox container that crawls the site and exports a full archive
- Cleanup — removes the ArchiveBox Docker image to save disk space
- Retention — deletes archive snapshots older than the configured retention period
- Notify — on failure, routes to a notification placeholder (connect Slack, email, Telegram, etc.)
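The Build Config step above can be pictured roughly as follows. This is a minimal sketch, not the node's actual code: the helper names `site_name` and `snapshot_dir` are hypothetical, and it assumes the site name is simply the URL's hostname and that the output path ends in an ISO date.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlparse


def site_name(url: str) -> str:
    """Derive a directory-friendly site name from a URL (assumed: its hostname)."""
    host = urlparse(url).hostname
    if not host:
        # Happens when the scheme is missing, e.g. "example.com"
        raise ValueError(f"cannot derive a site name from {url!r}")
    return host


def snapshot_dir(base_dir: str, url: str, run_date: date) -> str:
    """Build <base_dir>/<site>/<YYYY-MM-DD> for one archive run."""
    return str(PurePosixPath(base_dir) / site_name(url) / run_date.isoformat())


print(snapshot_dir("/DATA/Data/Archive", "https://example.com", date(2026, 5, 16)))
# prints /DATA/Data/Archive/example.com/2026-05-16
```

Whether the real workflow strips a leading `www.` or keeps the port is not stated here, so this sketch leaves the hostname untouched.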
Archives are organized by site and date:
/DATA/Data/Archive/
└── example.com/
    ├── 2026-05-16/
    ├── 2026-05-23/
    └── 2026-05-30/
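Because each snapshot directory is named by date, the Retention step can decide what to delete from the names alone. The sketch below is one way to do that, under the assumption that "older than" means strictly more than the configured number of days. `expired_snapshots` is a hypothetical helper, and it only selects names; it deletes nothing.

```python
from datetime import date, timedelta


def expired_snapshots(names: list[str], today: date, retention_days: int = 90) -> list[str]:
    """Return the dated directory names that fall outside the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in names:
        try:
            snapshot_date = date.fromisoformat(name)
        except ValueError:
            continue  # ignore anything that is not a YYYY-MM-DD directory
        if snapshot_date < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["2026-05-16", "2026-05-23", "2026-05-30", "tmp"]
print(expired_snapshots(names, today=date(2026, 8, 20)))
# prints ['2026-05-16']
```

A caller would list one site's directory, pass the entry names in, and remove the returned ones (for example with `shutil.rmtree`).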
Add these to your n8n instance (CasaOS app settings or container environment):
| Variable | Required | Default | Description |
|---|---|---|---|
| ARCHIVE_TARGET_URL | Yes | — | Website to archive |
| ARCHIVE_DEPTH | No | 1 | Crawl depth (1 = linked pages) |
| ARCHIVE_OUTPUT_DIR | No | /DATA/Data/Archive | Base output directory |
| ARCHIVE_RETENTION_DAYS | No | 90 | Delete archives older than this many days |
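To make the required and default values in the table concrete, here is a small sketch of how they could be read and validated. `ArchiveConfig` and `load_config` are illustrative names, not part of the workflow; only the variable names and defaults come from the table.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ArchiveConfig:
    target_url: str
    depth: int = 1
    output_dir: str = "/DATA/Data/Archive"
    retention_days: int = 90


def load_config(env: Mapping[str, str]) -> ArchiveConfig:
    """Read the ARCHIVE_* variables, applying the documented defaults."""
    url = env.get("ARCHIVE_TARGET_URL", "").strip()
    if not url:
        raise ValueError("ARCHIVE_TARGET_URL is required")
    return ArchiveConfig(
        target_url=url,
        depth=int(env.get("ARCHIVE_DEPTH", "1")),
        output_dir=env.get("ARCHIVE_OUTPUT_DIR", "/DATA/Data/Archive"),
        retention_days=int(env.get("ARCHIVE_RETENTION_DAYS", "90")),
    )
```

In a real process this would be called as `load_config(os.environ)`; taking a plain mapping keeps the function easy to test.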
Example:
ARCHIVE_TARGET_URL=https://example.com
ARCHIVE_DEPTH=1
ARCHIVE_OUTPUT_DIR=/DATA/Data/Archive
ARCHIVE_RETENTION_DAYS=90
In n8n: Workflows → Import from File → select n8n-website-archive.json
Activate the workflow. Run manually once to verify it works.
To archive multiple websites, import the workflow multiple times and set a
different ARCHIVE_TARGET_URL for each. Alternatively, duplicate the workflow
in n8n and hardcode the URL in the Build Config node.
Default: weekly on Friday. Change this by editing the Schedule Trigger node in n8n (e.g., daily, biweekly, monthly).
| Site Size | Pages | Archive Size (per snapshot) |
|---|---|---|
| Small blog | <50 | 50-200 MB |
| Medium site | 50-500 | 200 MB - 1 GB |
| Large site | 500+ | 1-5 GB |
With 90-day retention and weekly snapshots, expect ~13 snapshots stored at any time.
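The arithmetic behind that figure, and the disk usage it implies, can be sketched as follows. The function names are hypothetical. The count assumes one snapshot per interval and that a snapshot is kept until it is more than `retention_days` old.

```python
def snapshots_retained(retention_days: int = 90, interval_days: int = 7) -> int:
    """Snapshots on disk at once: ages 0, 7, 14, ... up to the retention limit."""
    return retention_days // interval_days + 1


def storage_range_mb(per_snapshot_mb: tuple[int, int],
                     retention_days: int = 90,
                     interval_days: int = 7) -> tuple[int, int]:
    """Scale a per-snapshot size range (in MB) by the number of retained snapshots."""
    count = snapshots_retained(retention_days, interval_days)
    low, high = per_snapshot_mb
    return (count * low, count * high)


print(snapshots_retained())           # prints 13
print(storage_range_mb((50, 200)))    # prints (650, 2600)
```

So a small blog at 50-200 MB per snapshot would occupy roughly 650 MB to 2.6 GB under the default schedule and retention.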
The workflow has a Notify on Failure placeholder node. Connect your preferred service: Email/SMTP, Slack, Telegram, or webhook.
.
├── README.md # This file
├── LICENSE # GPL-3.0-or-later
└── n8n-website-archive.json # n8n workflow