A Telegram bot that monitors a source channel (Channel A), extracts links from posts, scrapes article content, generates AI-powered summaries, and posts them to a destination channel (Channel B). It comes with an offline eval workflow that lets you systematically hill-climb the prompt using collected user feedback.
- 📡 Automated Monitoring: Listens to every message in your designated source channel.
- 🔍 Smart Extraction: Automatically detects URLs and scrapes main content via a crawler chain: Defuddle (primary, Markdown output) → trafilatura (fallback).
- 🧠 AI Summarization: Powered by Gemini 3 Flash Preview for fast and intelligent summaries.
- 🎨 Rich Formatting: Delivers summaries in Telegram-compatible HTML with bold titles, blockquotes, and bullet points.
- 🛡️ Security First: Built-in User ID filtering to protect your API keys from unauthorized usage.
- ⚠️ Error Reporting: Notifies you in Channel B if a link fails to process.
- 🔄 Retry Button: Failed scrapes or summarizations show a Retry button directly in Channel B.
- 👍👎 Feedback Buttons: Rate summaries inline; add free-form comments via bot DM deep-link.
- 📊 Langfuse Observability: Optional integration — logs every generation (prompt, response, latency) and user feedback scores to Langfuse.
- 🧪 Eval Tooling: Offline prompt hill-climbing loop — dump traces, generate rubrics, rate candidate prompts, and browse results in an HTML viewer.
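As a rough illustration of the "Rich Formatting" feature, the sketch below renders a summary as Telegram-compatible HTML. The helper name and field layout are hypothetical, not the bot's actual code; the key point is escaping user-derived text so Telegram's HTML parser accepts the message.

```python
import html

def format_summary(title: str, source_url: str, bullets: list[str]) -> str:
    """Render a summary as Telegram-compatible HTML (illustrative helper;
    the real bot's formatting lives in its own module)."""
    # Escape scraped text so stray < or & can't break Telegram's HTML parser.
    safe_title = html.escape(title)
    lines = [f"<b>{safe_title}</b>", ""]
    # Telegram supports <blockquote>, handy for quoted context like the source link.
    lines.append(f'<blockquote><a href="{html.escape(source_url, quote=True)}">Source</a></blockquote>')
    for point in bullets:
        lines.append(f"• {html.escape(point)}")
    return "\n".join(lines)

msg = format_summary("AI & You", "https://example.com/article",
                     ["Key point one", "Key point two"])
print(msg)
```

Send the resulting string with `parse_mode="HTML"` so Telegram renders the tags instead of showing them literally.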
- Telegram Bot Token: Create your bot via @BotFather.
- Gemini API Key: Grab an API key from Google AI Studio.
- Channel IDs:
  - Add your bot as an Admin to both Channel A (source) and Channel B (destination).
  - Find your ID: Forward a message from the channel to @userinfobot. IDs look like `-100xxxxxxxxxx`.
To keep your bot safe and prevent unwanted API costs:
- Get your User ID: Message @userinfobot to get your numerical ID.
- Set the Variable: Add `AUTHORIZED_USER_ID=your_id_here` to your configuration.
- Result: The bot will only process messages sent by you. Unauthorized attempts are silently logged.
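The authorization check boils down to comparing the sender's numeric ID against the configured value. A minimal sketch (function name and exact behavior are illustrative, not the bot's real code):

```python
import os

def is_authorized(sender_id: int) -> bool:
    """Check a message sender against AUTHORIZED_USER_ID from the environment.
    Illustrative sketch; the bot's actual check may differ in detail."""
    allowed = os.environ.get("AUTHORIZED_USER_ID", "")
    # Reject everything if the variable is unset or malformed: fail closed.
    return allowed.isdigit() and sender_id == int(allowed)

os.environ["AUTHORIZED_USER_ID"] = "12345"   # would come from your .env in practice
print(is_authorized(12345))  # True
print(is_authorized(99999))  # False: the bot ignores (and logs) this sender
```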
- Clone & Enter:

  ```bash
  git clone <your-repo-url>
  cd post_summarizer_bot
  ```

- Setup Environment: We recommend uv:

  ```bash
  uv venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  uv pip install -r requirements.txt
  ```

  Note: Python 3.13 is required (pinned via `.python-version`). `uv` will download it automatically.

- Configure:

  ```bash
  cp .env.example .env
  ```

  Fill in your tokens and IDs in the `.env` file. Langfuse keys are optional.

- Run:

  ```bash
  uv run python -m post_summarizer_bot.main
  ```

- Debug Scraping:

  ```bash
  uv run python scripts/debug_scrape.py "https://example.com/article"
  ```

- Test Prompt Tuning:

  ```bash
  uv run python scripts/test_prompt.py "https://example.com/article"
  ```
The bot integrates with Langfuse to log every summarization and collect user feedback. It is fully optional — the bot runs normally without it.
To enable, add to your `.env`:

```
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com  # optional, this is the default
```
Each successful summarization creates a Langfuse trace containing the full prompt and response. User 👍/👎 ratings and text comments are attached as scores on the same trace.
The bot is a long-running process and needs to stay active 24/7 to poll Telegram.
- Python 3.13
- Persistent internet connection
- Environment variables (see `.env.example`)
- Push your code to a GitHub repo.
- In Railway, click "New Project" → "Deploy from GitHub repo".
- Go to the Variables tab and add all keys from your `.env`.
- Railway will use the `Procfile` and `runtime.txt` automatically.
- Create a Background Worker (not a web service).
- Connect your GitHub repository.
- Set the start command to `python -m post_summarizer_bot.main`.
- Add your environment variables in the Environment tab.
- Install the Fly CLI and run `fly launch`.
- Set secrets using `fly secrets set KEY=VALUE`.
- Run `fly deploy`.
```ini
# /etc/systemd/system/telegram-bot.service
[Unit]
Description=Telegram Summarizer Bot
After=network.target

[Service]
WorkingDirectory=/path/to/bot
ExecStart=/path/to/venv/bin/python -m post_summarizer_bot.main
EnvironmentFile=/path/to/bot/.env
Restart=always

[Install]
WantedBy=multi-user.target
```

- Prompt Tuning: Edit `SUMMARIZATION_PROMPT_TEMPLATE` in `post_summarizer_bot/prompts.py`. Use `scripts/test_prompt.py` to preview changes immediately.
- Model Choice: The model is `gemini-3-flash-preview` (set as `MODEL_NAME` in `post_summarizer_bot/main.py`).
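Prompt tuning usually amounts to editing a template string with placeholders that get filled per article. The template text and placeholder names below are invented stand-ins, not the real contents of `post_summarizer_bot/prompts.py`:

```python
# Illustrative stand-in for SUMMARIZATION_PROMPT_TEMPLATE; the real template
# and its placeholder names in post_summarizer_bot/prompts.py may differ.
SUMMARIZATION_PROMPT_TEMPLATE = (
    "Summarize the following article in 3-5 bullet points.\n"
    "Title: {title}\n\n{content}"
)

def build_prompt(title: str, content: str) -> str:
    """Fill the template for one article before sending it to the model."""
    return SUMMARIZATION_PROMPT_TEMPLATE.format(title=title, content=content)

print(build_prompt("Example", "Article body goes here."))
```

Keeping the template in one module means `scripts/test_prompt.py` and the bot always render the exact same prompt.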
An offline hill-climbing loop for systematically improving the prompt using collected feedback. Datasets are always versioned (e.g. eval/data/v1/).
```bash
# 1. Pull new traces from Langfuse
make eval-dump VERSION=v1

# 2. Generate global or example-specific rubrics using LLMs (review before proceeding)
make eval-rubrics VERSION=v1

# 3. Browse traces, mark examples as eval-ready, and edit example-specific rubrics
make eval-data-viewer VERSION=v1

# 4. Score the baseline prompt
make eval-rate VERSION=v1 PROMPT=eval/prompts/v1_baseline.txt

# 4b. Inspect results visually (command is printed at the end of eval-rate)
make eval-result-viewer RESULT=eval/data/v1/results/v1_baseline_<timestamp>.json

# 5. Write a new prompt variant, then compare
make eval-rate VERSION=v1 PROMPT=eval/prompts/v2.txt
```

`make eval-data-viewer VERSION=v1` starts a local server and opens the browser. You can:
- Edit example-specific rubrics: saved directly to `eval/data/v1/example_rubrics.jsonl` on disk
- Toggle "Ready for eval" on each trace: `autorater.py` filters to only marked traces
- Delete traces or export with automatic backup
After `make eval-rate` finishes it prints the exact command to open the viewer. Or run it manually:

```bash
make eval-result-viewer RESULT=eval/data/v1/results/run.json
```

This opens a read-only browser UI where you can:
- Browse examples in the sidebar, with colored dots showing pass/fail per rubric
- Click any rubric in the header or Rubrics tab to filter the sidebar to failing examples
- Inspect the LLM Calls tab to see the exact prompts sent and raw verdicts returned
| Tier | Scope | Applied to | Output |
|---|---|---|---|
| Principle-based | Global (`global_rubrics.jsonl`) | Every example | Per-rubric pass rate |
| Example-specific | Per-trace (`example_rubrics.jsonl`) | Matching trace only | Overall pass rate |
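The two-tier scoring above can be sketched as follows. Everything here is illustrative (function and field names are invented; see `autorater.py` for the real implementation): global rubrics are judged against every example and reported per rubric, while example-specific rubrics are judged only against their matching trace and rolled into one overall rate.

```python
def score_examples(examples, global_rubrics, example_rubrics, judge):
    """judge(example, rubric) -> bool stands in for the LLM autorater verdict."""
    per_rubric = {}  # global rubric -> pass rate across every example
    for rubric in global_rubrics:
        verdicts = [judge(ex, rubric) for ex in examples]
        per_rubric[rubric] = sum(verdicts) / len(verdicts)
    # Example-specific rubrics apply only to their matching trace.
    specific = [judge(ex, r) for ex in examples
                for r in example_rubrics.get(ex["trace_id"], [])]
    overall = sum(specific) / len(specific) if specific else None
    return per_rubric, overall

examples = [{"trace_id": "t1"}, {"trace_id": "t2"}]
# Toy judge: everything passes except t2 on the "concise" rubric.
judge = lambda ex, rubric: (ex["trace_id"], rubric) != ("t2", "concise")
per_rubric, overall = score_examples(
    examples,
    global_rubrics=["concise", "faithful"],
    example_rubrics={"t1": ["mentions author"]},
    judge=judge,
)
print(per_rubric)  # {'concise': 0.5, 'faithful': 1.0}
print(overall)     # 1.0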
The bot uses Defuddle as its primary scraper (good for clean long-form pages) with trafilatura as a fallback, but this chain is not robust enough for all sites. It will often fail or return poor results for:
- Social media (Twitter/X, Instagram, LinkedIn, etc.)
- Paywalled content (NYT, WSJ, The Atlantic, etc.)
- JavaScript-heavy sites that require a real browser to render
- Sites with aggressive anti-bot measures
For broader coverage, consider pre-processing links through a dedicated crawling service such as Firecrawl before they reach this bot. A common pattern is a separate Telegram automation that fetches and cleans article content, then forwards the result to Channel A — but that pipeline is out of scope for this project.
This bot only concerns itself with two channels: a source channel (Channel A) and a destination channel (Channel B). What gets posted to Channel A, and how, is up to you.
- 👥 Whitelisting: Support for multiple authorized users.
- 💬 Custom Instructions: Tailor summaries via message captions.
- 🔌 Multi-Model Support: Add OpenAI or Anthropic integration.
Built with ❤️ and Gemini / Claude 🚀