TechDistill

TechDistill is a noise-reducing AI information workflow for developers and independent researchers.

It scrapes trending projects and posts from GitHub, Hugging Face, and Product Hunt, enriches them with detail, uses AI to generate brief commentary and a daily overview, and produces a set of Markdown reports you can archive, search, and share.

Vision

In a world of information overload, what is often scarce is not raw data, but stable, low-noise material you can keep reading and reasoning from.

TechDistill Pipeline aims to be more than a one-off trending scraper: it is a path from raw inputs to higher-signal outputs—continuous collection, measured deep dives, structured organization, summarized archives, and steady delivery of what matters. It is meant to grow into a lightweight, long-lived personal technical information pipeline.

Current capabilities

You can think of this project as a fully automated “information refinement” pipeline that runs end to end.

Collection. The tool pulls the most visible repositories on GitHub, models on Hugging Face, and posts on Product Hunt from three of the busiest corners of the tech community.

Enrichment. It does not stop at titles. It follows through to GitHub READMEs, Hugging Face Model Cards, and baseline metadata, as well as full post bodies and descriptions on Product Hunt, ensuring the source material is as complete as practical.

AI analysis. Through a pluggable API, the model reads that material, writes a short note for each item on what it actually is, and produces a bird’s-eye overview across everything captured in a run.

Delivery. The refined content is laid out as readable Markdown. What you get is a clean, high-signal briefing instead of a pile of tabs and long raw pages. You can also wire a Telegram bot to receive results in Telegram.

Architecture

By responsibility, the project is a lightweight technical information pipeline:

graph LR
    A[Sources] --> B[Spiders]
    B --> C[Pipeline]
    C --> D[AI Analysis]
    D --> E[Markdown Reports]
    E --> F[Telegram Push]

Sources
- GitHub Trending
- Hugging Face
- Product Hunt
Spiders
- List and detail scraping
Pipeline
- Deep fetch, AI analysis, overview generation, and aggregation
Delivery
- Markdown report output and Telegram push

Telegram delivery

When Telegram push is configured, each Markdown report is delivered as a file attachment in chat. Example (TechPulse bot):

Tech stack

Python 3.10+
requests
httpx
jinja2
rich
watchdog
diskcache
OpenRouter-compatible API

GitHub Actions

The repository includes .github/workflows/prism-pipeline.yml, which runs the full pipeline on GitHub-hosted runners: crawling, AI analysis, Markdown reports, and optional Telegram push (main.py).

Schedule: cron: "28 6 * * *" — once per day at 06:28 UTC (adjust the expression in the workflow file if you want a different time).
Manual runs: workflow_dispatch is enabled so you can start a run from the Actions tab.
Secrets: Configure repository secrets to match your needs (see the comments at the top of the workflow file and .env-example). Typical values include PH_API_TOKEN, GH_TOKEN (injected as GITHUB_TOKEN), HF_TOKEN, OPENROUTER_API_KEY, and optionally TG_BOT_TOKEN, TG_CHAT_ID, and OPENROUTER_CHAT_COMPLETIONS_EXTRA_JSON. How to obtain each credential is documented in Access_Token/en.md (English)。
Artifacts: The default job does not commit reports to the branch; output exists for that run on the runner unless you add upload/push steps.

Roadmap

The current release already fetches rich detail from each source, sends it through AI, produces de-noised text, can push notifications, and can run on a daily GitHub Actions schedule.

Planned work includes:

Dockerfile or other deployment options
Stronger history, state management, and deduplication
Personal context and preference weighting
Trend clustering, topic synthesis, and anti-hype filtering
More technical signal sources

Quick start

For a step-by-step setup guide, see the Quick Start Guide.

OpenRouter — model choice (read this). On GitHub Actions, do not use free / :free models. Runners are usually on Microsoft Azure; OpenRouter applies strict limits on concurrent requests from that region. For .github/workflows/openrouter-first-token-latency.yml, which runs test/test_first_token_latency.py, every run that used a free model has failed—there has never been a successful finish. Use a low-cost paid model id with the same workflow. Apply the same rule to OPENROUTER_MODEL, OVERVIEW_MODEL, and any model you set in Actions env/secrets when you rely on CI or concurrency.

1. Install dependencies

# Create a virtual environment (Recommended)
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Environment variables

Credentials tutorial: For full instructions on obtaining every token and secret used below (Product Hunt, GitHub, Hugging Face, OpenRouter, Telegram), see Access_Token/en.md (English) and Access_Token/cn.md (简体中文).

Copy .env-example to .env and fill in your values. The block below matches the repository .env-example (comments and placeholders included):

# Product Hunt API Token (v2 Developer Token)
PH_API_TOKEN=YOUR_PH_API_TOKEN_HERE

# GitHub Personal Access Token (higher API rate limits)
GITHUB_TOKEN=YOUR_GITHUB_TOKEN_HERE

# Hugging Face Access Token (required to bypass rate limits and enable deep data retrieval)
# Create at: https://huggingface.co/settings/tokens
HF_TOKEN=YOUR_HF_TOKEN_HERE

OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY_HERE
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_MODEL=deepseek/deepseek-v3.2

TG_BOT_TOKEN=YOUR_Telegram_Bot_Token_HERE

TG_CHAT_ID=YOUR_Telegram_Chat_ID_HERE

# Report directory to watch (default reports; used for Telegram auto-push)
REPORT_WATCH_DIR=reports

# Overview summary generation
OVERVIEW_ENABLED=true
OVERVIEW_AI_ENABLED=true
OVERVIEW_MODEL=minimax/minimax-m2.5:free
OVERVIEW_MAX_INPUT_ITEMS=6
OVERVIEW_MAX_OUTPUT_CHARS=1200
OVERVIEW_INCLUDE_AI_COMMENT=true

# Per-channel AI comments: max characters before writing to report (0 = no truncation)
AI_COMMENT_MAX_CHARS=2000
# Streaming uses delta.content only; if empty, fall back to concatenating reasoning (default false)
OPENROUTER_STREAM_FALLBACK_TO_REASONING=false
# Upper bound for max_tokens on channel analysis calls (aligned with short-comment prompts)
AI_COMMENT_MAX_TOKENS=768
# Optional: extra JSON merged into chat/completions body (gateway-specific)
# OPENROUTER_CHAT_COMPLETIONS_EXTRA_JSON=

Notes:

PH_API_TOKEN is required for Product Hunt scraping
GITHUB_TOKEN and HF_TOKEN are required to bypass rate limits and enable deep data retrieval
OPENROUTER_* configures an OpenRouter-compatible gateway; if OPENROUTER_API_KEY is missing, AI analysis and the overview may be skipped or degraded
Telegram push is enabled only when both TG_BOT_TOKEN and TG_CHAT_ID are set
REPORT_WATCH_DIR is the watch path; default is reports
OVERVIEW_* toggles the overview, AI vs non-AI paths, model, item count, output length, and whether channel AI blurbs are included
AI_COMMENT_MAX_CHARS, AI_COMMENT_MAX_TOKENS, and OPENROUTER_STREAM_FALLBACK_TO_REASONING control comment length, token caps, and streaming fallback; OPENROUTER_CHAT_COMPLETIONS_EXTRA_JSON is optional for extra request fields

For parsing rules and defaults, see utils/config.py.

3. Run

python main.py

CLI flags are defined in main.py:

python main.py [--deep|--no-deep] [--ai|--no-ai] [--watch|--no-watch] [--limit N]

Examples:

# Default run
python main.py

# No deep fetch or AI
python main.py --no-deep --no-ai

# Only three items per source
python main.py --limit 3

# Generate reports but do not push Telegram
python main.py --no-watch

4. Unit tests

From the repository root (stdlib unittest; pytest not required):

python -m unittest discover -s test -p "test_*.py" -v

Note: test/test_first_token_latency.py and test/openrouter_raw_probe.py need network access and API keys; they are not picked up by the pattern above—run them manually when needed.

5. Output

Each run creates a new batch directory under reports/, for example:

reports/
`-- TECH_PULSE_20260328_091045/
    |-- overview.md
    |-- github.md
    |-- hf.md
    `-- ph.md

overview.md — daily overview
github.md — GitHub Trending report
hf.md — Hugging Face report
ph.md — Product Hunt report

Troubleshooting

Issue	Resolution
`PH_API_TOKEN` missing	Product Hunt scraping requires a token. Generate one in the Product Hunt Developer Dashboard.
AI analysis skipped	Verify `OPENROUTER_API_KEY` is correctly set and has remaining credit.
Telegram notification failed	Ensure both `TG_BOT_TOKEN` and `TG_CHAT_ID` are valid.
Provider Error	In the OpenRouter model overview, check the "Providers" section. Add the specific provider to your "Allowed Providers" list in OpenRouter's privacy settings.
Privacy Policy Error (Free Models)	In OpenRouter Settings > Privacy > Data Policies, ensure the two checkboxes for "Free endpoints" are enabled.

Technical Edge Cases

Error	Cause	Resolution
`LocalProtocolError`	Triggered by Rate Limiting when calling free OpenRouter APIs concurrently (common on GHA nodes like Azure).	Recommended: Switch to a low-cost paid model for higher QPS. For local testing, you may continue using free models as concurrency is usually lower.

Support Me

If this project has saved you time or solved a problem, please consider supporting its ongoing development. Your contributions help cover infrastructure costs and allow me to dedicate more time to maintaining and improving the software for everyone.

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
Access_Token		Access_Token
QuickStart		QuickStart
core		core
spiders		spiders
test		test
utils		utils
.env-example		.env-example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README-CN.md		README-CN.md
README.md		README.md
RESULT.jpg		RESULT.jpg
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TechDistill

Vision

Current capabilities

Architecture

Telegram delivery

Tech stack

GitHub Actions

Roadmap

Quick start

1. Install dependencies

2. Environment variables

3. Run

4. Unit tests

5. Output

Troubleshooting

Technical Edge Cases

Support Me

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TechDistill

Vision

Current capabilities

Architecture

Telegram delivery

Tech stack

GitHub Actions

Roadmap

Quick start

1. Install dependencies

2. Environment variables

3. Run

4. Unit tests

5. Output

Troubleshooting

Technical Edge Cases

Support Me

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages