fossil

Automated website archival using ArchiveBox via n8n. Runs as a scheduled n8n workflow that creates full offline snapshots of any website.

Prerequisites

  • n8n (self-hosted) with Docker CLI available in the container
  • anvil — the recommended base image (includes Docker CLI for running ArchiveBox containers)

This workflow uses the Docker-outside-of-Docker (DooD) pattern: n8n launches an ArchiveBox container on the host through the mounted Docker socket. The anvil base image listed above provides this capability.
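As a rough illustration of the DooD pattern, the sketch below builds the kind of `docker run` command such a workflow might issue. It is not taken from the workflow JSON: the `archivebox_argv` helper, the use of the snapshot directory as ArchiveBox's `/data` volume, and the `init` then `add --depth=1` sequence are assumptions based on ArchiveBox's documented Docker usage.

```python
import shlex

IMAGE = "archivebox/archivebox"  # image name as given in this README


def archivebox_argv(host_dir: str, *args: str) -> list[str]:
    """Build a `docker run` argv for a throwaway ArchiveBox container.

    Hypothetical helper. With DooD the Docker daemon runs on the host,
    so host_dir must be a path on the host, not inside the n8n container.
    """
    return ["docker", "run", "--rm", "-v", f"{host_dir}:/data", IMAGE, *args]


# Assumed layout: one ArchiveBox data directory per dated snapshot.
snapshot_dir = "/DATA/Data/Archive/example.com/2026-05-16"
init_cmd = archivebox_argv(snapshot_dir, "init")
add_cmd = archivebox_argv(snapshot_dir, "add", "--depth=1", "https://example.com")

print(shlex.join(init_cmd))
print(shlex.join(add_cmd))
```

The point to note is the volume path: because the container is started by the host's Docker daemon, the left-hand side of `-v` is resolved on the host filesystem.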

How It Works

  1. Schedule — runs weekly on Friday (configurable in n8n)
  2. Build Config — reads the target URL from environment variables, derives the site name and output path
  3. Run ArchiveBox — spins up an archivebox/archivebox container that crawls the site and exports a full archive
  4. Cleanup — removes the ArchiveBox Docker image to save disk space
  5. Retention — deletes archive snapshots older than the configured retention period
  6. Notify — on failure, routes to a notification placeholder (connect Slack, email, Telegram, etc.)

Archives are organized by site and date:

/DATA/Data/Archive/
└── example.com/
    ├── 2026-05-16/
    ├── 2026-05-23/
    └── 2026-05-30/
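The date-named layout above makes retention straightforward to reason about. The following is a minimal sketch of one way to prune old snapshots, assuming the directory name (not the file modification time) determines a snapshot's age. The workflow's actual Retention step may work differently; `expired_snapshots` and `prune` are hypothetical names.

```python
import shutil
import tempfile
from datetime import date, timedelta
from pathlib import Path


def expired_snapshots(site_dir: Path, retention_days: int, today: date) -> list[Path]:
    """Return snapshot directories whose YYYY-MM-DD name is older than the cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for entry in sorted(site_dir.iterdir()):
        if not entry.is_dir():
            continue
        try:
            taken = date.fromisoformat(entry.name)
        except ValueError:
            continue  # not named like a snapshot, leave it alone
        if taken < cutoff:
            expired.append(entry)
    return expired


def prune(site_dir: Path, retention_days: int, today: date) -> list[str]:
    """Delete expired snapshots and return the names that were removed."""
    removed = []
    for path in expired_snapshots(site_dir, retention_days, today):
        shutil.rmtree(path)
        removed.append(path.name)
    return removed


# Demonstration in a temporary directory, so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp) / "example.com"
    for name in ("2026-01-01", "2026-05-16", "2026-05-23", "notes"):
        (site / name).mkdir(parents=True)
    removed = prune(site, retention_days=90, today=date(2026, 5, 30))
    remaining = sorted(p.name for p in site.iterdir())

print(removed)    # ['2026-01-01']
print(remaining)  # ['2026-05-16', '2026-05-23', 'notes']
```

Directories that do not parse as a date are skipped rather than deleted, which keeps unrelated files in the site folder safe.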

Setup

1. Set Environment Variables

Add these to your n8n instance (CasaOS app settings or container environment):

Variable                Required  Default             Description
ARCHIVE_TARGET_URL      Yes       (none)              Website to archive
ARCHIVE_DEPTH           No        1                   Crawl depth (1 = linked pages)
ARCHIVE_OUTPUT_DIR      No        /DATA/Data/Archive  Base output directory
ARCHIVE_RETENTION_DAYS  No        90                  Delete archives older than this (days)

Example:

ARCHIVE_TARGET_URL=https://example.com
ARCHIVE_DEPTH=1
ARCHIVE_OUTPUT_DIR=/DATA/Data/Archive
ARCHIVE_RETENTION_DAYS=90
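To make the role of these variables concrete, here is a sketch of how a Build Config step could apply the defaults and derive the site name and dated output path. It is written in Python for illustration only; the real node lives in the n8n workflow and may differ. `ArchiveConfig` and `build_config` are hypothetical names, and using the URL's hostname as the site name is an assumption based on the directory layout shown earlier.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class ArchiveConfig:
    target_url: str
    depth: int
    site_name: str
    output_path: str
    retention_days: int


def build_config(env: Mapping[str, str], today: date) -> ArchiveConfig:
    """Apply the documented defaults and derive <base>/<site>/<date>."""
    url = env.get("ARCHIVE_TARGET_URL", "").strip()
    if not url:
        raise ValueError("ARCHIVE_TARGET_URL is required")
    host = urlparse(url).hostname  # None when the URL has no scheme
    if host is None:
        raise ValueError(f"cannot derive a site name from {url!r}")
    base = env.get("ARCHIVE_OUTPUT_DIR", "/DATA/Data/Archive").rstrip("/")
    return ArchiveConfig(
        target_url=url,
        depth=int(env.get("ARCHIVE_DEPTH", "1")),
        site_name=host,
        output_path=f"{base}/{host}/{today.isoformat()}",
        retention_days=int(env.get("ARCHIVE_RETENTION_DAYS", "90")),
    )


cfg = build_config({"ARCHIVE_TARGET_URL": "https://example.com"}, date(2026, 5, 16))
print(cfg.output_path)  # /DATA/Data/Archive/example.com/2026-05-16
```

Passing the environment as a mapping (for example `os.environ`) rather than reading it inside the function keeps the logic easy to test with different values.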

2. Import the Workflow

In n8n: Workflows → Import from File → select n8n-website-archive.json

Activate the workflow, then run it manually once to verify that it works.

3. Archiving Multiple Sites

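To archive multiple websites, import or duplicate the workflow once per site and hardcode each site's URL in that copy's Build Config node. ARCHIVE_TARGET_URL is read from the n8n instance environment, so it holds a single value per instance; giving each copy its own ARCHIVE_TARGET_URL only works if each copy runs in a separate n8n instance.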

Schedule

Default: weekly on Friday. Change this by editing the Schedule Trigger node in n8n (e.g., daily, biweekly, monthly).

Storage Estimates

Site Size    Pages   Archive Size (per snapshot)
Small blog   <50     50-200 MB
Medium site  50-500  200 MB - 1 GB
Large site   500+    1-5 GB

With 90-day retention and weekly snapshots, expect ~13 snapshots stored at any time.
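The snapshot count follows from dividing the retention period by the schedule interval, and multiplying by the per-snapshot size gives a rough total. A small sketch of that arithmetic, assuming old snapshots are pruned on every run and taking 1 GB as 1000 MB (the function names are illustrative):

```python
import math


def snapshots_retained(retention_days: int, interval_days: int) -> int:
    """Approximate number of snapshots on disk at any one time."""
    return math.ceil(retention_days / interval_days)


def storage_range_gb(count: int, min_mb: float, max_mb: float) -> tuple[float, float]:
    """Total storage range in GB for `count` snapshots of the given size range."""
    return (count * min_mb / 1000, count * max_mb / 1000)


count = snapshots_retained(90, 7)               # weekly runs, 90-day retention
low, high = storage_range_gb(count, 200, 1000)  # "Medium site" row: 200 MB to 1 GB

print(count, low, high)  # 13 2.6 13.0
```

So a medium site under the default settings would need roughly 2.6 to 13 GB, based on the per-snapshot estimates in the table above.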

Notifications

The workflow has a Notify on Failure placeholder node. Connect your preferred service: Email/SMTP, Slack, Telegram, or webhook.

Files

.
├── README.md                    # This file
├── LICENSE                      # GPL-3.0-or-later
└── n8n-website-archive.json     # n8n workflow

License

GNU General Public License v3.0 or later
