dnc1994/post_summarizer_bot

🤖 Telegram Post Summarizer Bot 📝

A Telegram bot that monitors a source channel (Channel A), extracts links from posts, scrapes the article content, generates AI-powered summaries, and posts them to a destination channel (Channel B). It comes with an offline eval workflow that lets you systematically hill-climb the prompt using collected user feedback.

✨ Features

  • 📡 Automated Monitoring: Listens to every message in your designated source channel.
  • 🔍 Smart Extraction: Automatically detects URLs and scrapes main content via a crawler chain: Defuddle (primary, Markdown output) → trafilatura (fallback).
  • 🧠 AI Summarization: Powered by Gemini 3 Flash Preview for fast and intelligent summaries.
  • 🎨 Rich Formatting: Delivers summaries in Telegram-compatible HTML with bold titles, blockquotes, and bullet points.
  • 🛡️ Security First: Built-in User ID filtering to protect your API keys from unauthorized usage.
  • ⚠️ Error Reporting: Notifies you in Channel B if a link fails to process.
  • 🔄 Retry Button: Failed scrapes or summarizations show a Retry button directly in Channel B.
  • 👍👎 Feedback Buttons: Rate summaries inline; add free-form comments via bot DM deep-link.
  • 📊 Langfuse Observability: Optional integration — logs every generation (prompt, response, latency) and user feedback scores to Langfuse.
  • 🧪 Eval Tooling: Offline prompt hill-climbing loop — dump traces, generate rubrics, rate candidate prompts, and browse results in an HTML viewer.
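
The scraping feature above is a fallback chain: each scraper is tried in order and the first non-empty result wins. A minimal sketch of that pattern (the real bot wires in Defuddle first, then trafilatura; the callables here are stand-ins, not the bot's actual API):

```python
# Try each scraper in order; keep the first non-empty result.
def scrape_with_fallback(url, scrapers):
    for name, scraper in scrapers:
        try:
            text = scraper(url)
        except Exception:
            continue  # a failed scraper just hands off to the next one
        if text and text.strip():
            return name, text
    return None, None  # every scraper failed; the bot reports this in Channel B

# Stand-in scrapers: the first raises, the second succeeds.
def defuddle_stub(url):
    raise RuntimeError("paywalled page")

def trafilatura_stub(url):
    return "# Article\n\nExtracted body text."

print(scrape_with_fallback("https://example.com/article",
                           [("defuddle", defuddle_stub),
                            ("trafilatura", trafilatura_stub)]))
# → ('trafilatura', '# Article\n\nExtracted body text.')
```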

📋 Prerequisites

  1. Telegram Bot Token: Create your bot via @BotFather.
  2. Gemini API Key: Grab an API key from Google AI Studio.
  3. Channel IDs:
    • Add your bot as an Admin to both Channel A (source) and Channel B (destination).
    • Find your ID: Forward a message from the channel to @userinfobot. IDs look like -100xxxxxxxxxx.
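
The -100 prefix is how Telegram's Bot API addresses channels and supergroups: the API chat ID is "-100" prepended to the channel's internal ID. A tiny illustration (this helper is not part of the bot):

```python
# Telegram Bot API chat IDs for channels/supergroups are "-100" followed by
# the internal channel id, hence the -100xxxxxxxxxx shape from @userinfobot.
def to_api_chat_id(internal_id: int) -> int:
    return int(f"-100{internal_id}")

print(to_api_chat_id(1234567890))  # → -1001234567890
```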

🔐 Security & User Filtering

To keep your bot safe and prevent unwanted API costs:

  1. Get your User ID: Message @userinfobot to get your numerical ID.
  2. Set the Variable: Add AUTHORIZED_USER_ID=your_id_here to your configuration.
  3. Result: The bot will only process messages sent by you. Unauthorized attempts are silently logged.
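
The guard itself is simple: compare the incoming sender's ID against the configured one and drop everything else before any scraping or LLM call happens. A minimal sketch (function name is illustrative; only AUTHORIZED_USER_ID comes from the bot's configuration):

```python
import os

# Return True only when the sender matches the configured AUTHORIZED_USER_ID.
def is_authorized(sender_id: int) -> bool:
    allowed = os.environ.get("AUTHORIZED_USER_ID")
    return allowed is not None and str(sender_id) == allowed

os.environ["AUTHORIZED_USER_ID"] = "123456789"
print(is_authorized(123456789), is_authorized(987654321))  # → True False
```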

💻 Local Development & Testing

  1. Clone & Enter:

    git clone <your-repo-url>
    cd post_summarizer_bot
  2. Setup Environment: We recommend uv:

    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -r requirements.txt

    Note: Python 3.13 is required (pinned via .python-version). uv will download it automatically.

  3. Configure:

    cp .env.example .env

    Fill in your tokens and IDs in the .env file. Langfuse keys are optional.

  4. Run:

    uv run python -m post_summarizer_bot.main
  5. Debug Scraping:

    uv run python scripts/debug_scrape.py "https://example.com/article"
  6. Test Prompt Tuning:

    uv run python scripts/test_prompt.py "https://example.com/article"
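
For step 3, a filled-in .env might look like the fragment below. Variable names other than AUTHORIZED_USER_ID and the Langfuse keys are illustrative — the canonical names are in .env.example:

```
# Names below (other than AUTHORIZED_USER_ID and LANGFUSE_*) are illustrative;
# see .env.example for the canonical list.
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
GEMINI_API_KEY=...
SOURCE_CHANNEL_ID=-100xxxxxxxxxx   # Channel A
TARGET_CHANNEL_ID=-100xxxxxxxxxx   # Channel B
AUTHORIZED_USER_ID=123456789

# Optional: Langfuse observability
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```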

📊 Langfuse Observability (Optional)

The bot integrates with Langfuse to log every summarization and collect user feedback. It is fully optional — the bot runs normally without it.

To enable, add to your .env:

LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com  # optional, this is the default

Each successful summarization creates a Langfuse trace containing the full prompt and response. User 👍/👎 ratings and text comments are attached as scores on the same trace.

🚀 Deployment

The bot is a long-running process and needs to stay active 24/7 to poll Telegram.

General Requirements

  • Python 3.13
  • Persistent internet connection
  • Environment variables (see .env.example)

Option 1: Railway (Recommended)

  1. Push your code to a GitHub repo.
  2. In Railway, click "New Project" → "Deploy from GitHub repo".
  3. Go to the Variables tab and add all keys from your .env.
  4. Railway will use the Procfile and runtime.txt automatically.

Option 2: Render

  1. Create a Background Worker (not a web service).
  2. Connect your GitHub repository.
  3. Set the start command to: python -m post_summarizer_bot.main.
  4. Add your environment variables in the Environment tab.

Option 3: Fly.io

  1. Install the Fly CLI and run fly launch.
  2. Set secrets using fly secrets set KEY=VALUE.
  3. Run fly deploy.

Option 4: Linux VPS (Systemd)

# /etc/systemd/system/telegram-bot.service
[Unit]
Description=Telegram Summarizer Bot
After=network.target

[Service]
WorkingDirectory=/path/to/bot
ExecStart=/path/to/venv/bin/python -m post_summarizer_bot.main
EnvironmentFile=/path/to/bot/.env
Restart=always

[Install]
WantedBy=multi-user.target

🛠️ Customization

  • Prompt Tuning: Edit SUMMARIZATION_PROMPT_TEMPLATE in post_summarizer_bot/prompts.py. Use scripts/test_prompt.py to preview changes immediately.
  • Model Choice: The model is gemini-3-flash-preview (set as MODEL_NAME in post_summarizer_bot/main.py).

📊 Eval / Prompt Tuning Workflow

An offline hill-climbing loop for systematically improving the prompt using collected feedback. Datasets are always versioned (e.g. eval/data/v1/).

Usage

# 1. Pull new traces from Langfuse
make eval-dump VERSION=v1

# 2. Generate global or example-specific rubrics using LLMs (review before proceeding)
make eval-rubrics VERSION=v1

# 3. Browse traces, mark examples as eval-ready, and edit example-specific rubrics
make eval-data-viewer VERSION=v1

# 4. Score the baseline prompt
make eval-rate VERSION=v1 PROMPT=eval/prompts/v1_baseline.txt

# 4b. Inspect results visually (command is printed at the end of eval-rate)
make eval-result-viewer RESULT=eval/data/v1/results/v1_baseline_<timestamp>.json

# 5. Write a new prompt variant, then compare
make eval-rate VERSION=v1 PROMPT=eval/prompts/v2.txt

HTML Eval Data Viewer

make eval-data-viewer VERSION=v1 starts a local server and opens the viewer in your browser. You can:

  • Edit example-specific rubrics — saved directly to eval/data/v1/example_rubrics.jsonl on disk
  • Toggle "Ready for eval" on each trace — autorater.py filters to only marked traces
  • Delete traces or export with automatic backup

HTML Eval Result Viewer

After make eval-rate finishes, it prints the exact command to open the viewer; you can also run it manually:

make eval-result-viewer RESULT=eval/data/v1/results/run.json

Opens a read-only browser UI where you can:

  • Browse examples in the sidebar, with colored dots showing pass/fail per rubric
  • Click any rubric in the header or Rubrics tab to filter the sidebar to failing examples
  • Inspect the LLM Calls tab to see the exact prompts sent and raw verdicts returned

Rubric Tiers

| Tier | Scope | Applied to | Output |
|------|-------|------------|--------|
| Principle-based | Global (global_rubrics.jsonl) | Every example | Per-rubric pass rate |
| Example-specific | Per-trace (example_rubrics.jsonl) | Matching trace only | Overall pass rate |
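
The two tiers differ only in how verdicts are aggregated: global rubrics report a pass rate per rubric across all examples, while example-specific rubrics fold into a single overall rate. A toy computation of the per-rubric aggregation (the flat record schema here is illustrative, not the actual file format under eval/data/):

```python
from collections import defaultdict

# Toy autorater verdicts; real records live under eval/data/<version>/.
verdicts = [
    {"example": "t1", "rubric": "concise",  "passed": True},
    {"example": "t1", "rubric": "faithful", "passed": False},
    {"example": "t2", "rubric": "concise",  "passed": True},
    {"example": "t2", "rubric": "faithful", "passed": True},
]

def per_rubric_pass_rate(verdicts):
    """Principle-based (global) rubrics report a pass rate per rubric."""
    totals, passes = defaultdict(int), defaultdict(int)
    for v in verdicts:
        totals[v["rubric"]] += 1
        passes[v["rubric"]] += v["passed"]
    return {r: passes[r] / totals[r] for r in totals}

print(per_rubric_pass_rate(verdicts))  # → {'concise': 1.0, 'faithful': 0.5}
```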

⚠️ Link Crawling Limitations

The bot uses Defuddle as its primary scraper (good for clean long-form pages) with trafilatura as a fallback, but this chain is not robust enough for all sites. It will often fail or return poor results for:

  • Social media (Twitter/X, Instagram, LinkedIn, etc.)
  • Paywalled content (NYT, WSJ, The Atlantic, etc.)
  • JavaScript-heavy sites that require a real browser to render
  • Sites with aggressive anti-bot measures

For broader coverage, consider pre-processing links through a dedicated crawling service such as Firecrawl before they reach this bot. A common pattern is a separate Telegram automation that fetches and cleans article content, then forwards the result to Channel A — but that pipeline is out of scope for this project.

This bot only concerns itself with two channels: a source channel (Channel A) and a destination channel (Channel B). What gets posted to Channel A, and how, is up to you.

🗺️ Future Work

  • 👥 Whitelisting: Support for multiple authorized users.
  • 💬 Custom Instructions: Tailor summaries via message captions.
  • 🔌 Multi-Model Support: Add OpenAI or Anthropic integration.

Built with ❤️ and Gemini / Claude 🚀
