
Add feapder crawler Codex skill #312

Open
ShellMonster wants to merge 2 commits into Boris-code:master from ShellMonster:docs/add-feapder-crawler-skill

@ShellMonster
Contributor

Summary

Add a Codex Skill for feapder crawler development under Skills/feapder-crawler.

The skill packages feapder usage guidance for AI agents, covering:

  • spider selection: AirSpider, Spider, TaskSpider, BatchSpider
  • standard project workflow: feapder create -p <project_name> and feapder create -s <spider_name>
  • existing-project handling: locate the real project root and merge changes into the active setting.py
  • request/response patterns, item pipelines, parser integration, rendering, proxy, dedup, retry, and failure hooks
  • feaplat deployment diagnosis and commonly used feapder utilities
  • guardrails that require agents to stay on feapder paths unless the user explicitly authorizes a fallback

The guidance text is written in Chinese, while code, class names, commands, and config identifiers remain in English.
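The existing-project rule (merge into the active setting.py instead of overwriting it) can be illustrated with a small sketch. This is not code from the Skill. It assumes ITEM_PIPELINES is a top-level literal list of dotted class paths in setting.py. read_item_pipelines and merge_item_pipelines are hypothetical helper names, and the CsvPipeline dotted path is an assumption to check against the installed feapder version.

```python
import ast
from typing import List


def read_item_pipelines(setting_source: str) -> List[str]:
    """Return the ITEM_PIPELINES list assigned in setting.py source.

    Assumes a top-level literal assignment. Returns [] when the setting is
    absent, for example when it only appears inside a comment.
    """
    tree = ast.parse(setting_source)
    found: List[str] = []
    for node in tree.body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "ITEM_PIPELINES":
                    # A later top-level assignment overrides an earlier one.
                    found = list(ast.literal_eval(node.value))
    return found


def merge_item_pipelines(existing: List[str], additions: List[str]) -> List[str]:
    """Keep every existing pipeline in its original order, then append new ones."""
    merged = list(existing)
    for pipeline in additions:
        if pipeline not in merged:  # skip entries the project already uses
            merged.append(pipeline)
    return merged


if __name__ == "__main__":
    setting_py = 'ITEM_PIPELINES = ["feapder.pipelines.mysql_pipeline.MysqlPipeline"]\n'
    current = read_item_pipelines(setting_py)
    # Assumed dotted path for the CSV pipeline (hypothetical for this sketch).
    print(merge_item_pipelines(current, ["feapder.pipelines.csv_pipeline.CsvPipeline"]))
```

Appending rather than replacing keeps pipelines the project already relies on (for example MysqlPipeline) active alongside the new one.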

Testing

  • Ran Codex Skill validation:

    python /Users/daozhang/.codex/skills/.system/skill-creator/scripts/quick_validate.py /Users/daozhang/Downloads/feapder/Skills/feapder-crawler

    Result: Skill is valid!

  • Ran approximately 76 Codex sub-agent evaluation cases while developing and tuning the skill:

    • 45 trigger and boundary cases:

      • 15 explicit feapder should-trigger prompts
      • 15 contextual should-trigger prompts where the project/dependency context uses feapder
      • 15 should-not-trigger prompts for requests-only, Scrapy, Playwright-only, FastAPI, Celery, BeautifulSoup, SQLAlchemy, aiohttp, and other non-feapder tasks
      • Result: 45/45 matched expected trigger behavior
    • 10 output adherence prompts, checked across 20 expected behaviors:

      • AirSpider crawler generation
      • CsvPipeline and ITEM_PIPELINES
      • BatchSpider task source handling
      • Redis/settings troubleshooting
      • JavaScript rendering through feapder-supported paths
      • curl/debug conversion
      • Scrapy boundary handling
      • user asking to call requests directly inside a feapder project
      • failed request handling
      • plain Python crawler boundary behavior
      • Result: 20/20 followed expected feapder guidance
    • 8 authorization and fallback cases:

      • Verified that the agent explains the reason, impact, and alternatives, then asks the user for authorization before falling back to requests, Scrapy, standalone Playwright, handwritten CSV, or direct SQL
      • Result: 8/8 passed
    • 8 project creation and settings cases:

      • Existing project setting.py incremental edits
      • Preserving existing ITEM_PIPELINES
      • New project creation with feapder create -p
      • Existing standard project spider creation from <project_root>/spiders/ using feapder create -s
      • Single-file AirSpider boundary handling
      • Result: 16/16 expected checks passed
    • Focused retests after tuning:

      • feaplat read-only diagnosis
      • parser integration templates
      • TaskSpider MySQL/Redis task source templates
      • failure hooks such as exception_request and failed_request
      • project root and CLI execution directory rules
      • Result: 10/10 focused checks passed
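The project-root and CLI-directory rules exercised in these retests can be sketched as an upward directory walk. This is a hypothetical heuristic, not the Skill's actual logic. It assumes that a standard project created with feapder create -p has setting.py next to a spiders/ directory at its root. find_project_root and spider_creation_dir are illustrative names.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` to the first directory that looks like a
    standard feapder project root: a setting.py file next to a spiders/ dir.

    This marker combination is an assumption about the layout produced by
    `feapder create -p`, not a rule taken from feapder itself.
    """
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / "setting.py").is_file() and (candidate / "spiders").is_dir():
            return candidate
    return None


def spider_creation_dir(start: Path) -> Optional[Path]:
    """Directory from which `feapder create -s <spider_name>` would be run,
    i.e. <project_root>/spiders/, or None outside a standard project."""
    root = find_project_root(start)
    return root / "spiders" if root is not None else None


if __name__ == "__main__":
    # Build a throwaway project skeleton and look it up from a nested directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_project"
        (root / "spiders" / "news").mkdir(parents=True)
        (root / "setting.py").write_text("")
        print(spider_creation_dir(root / "spiders" / "news"))
```

Walking upward, rather than trusting the current working directory, handles the case where the agent starts inside a nested folder of an existing project.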

@ShellMonster
Contributor Author

Here is a concrete example of a problem this Skill solves:

feapder's Response.json is not the same as requests' Response.json(). In the feapder source, json is a @property, and the documentation also states that in feapder you should write:

data = response.json

rather than:

data = response.json()

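Without this Skill, however, an AI agent is easily pulled toward the common requests habit, even after reading the source and the documentation. When it generates crawler code that "requests a JSON API and parses some fields", it writes response.json(), which fails at runtime.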
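A minimal, dependency-free illustration of why the call form fails when json is a property. FakeResponse is a hypothetical stand-in, not feapder's Response class. It only mimics the property behavior described above.

```python
from json import loads


class FakeResponse:
    """Hypothetical stand-in for a response object. Only the `json` attribute
    is modelled, as a @property like the one described for feapder."""

    def __init__(self, text: str):
        self.text = text

    @property
    def json(self):
        # Attribute access alone parses the body and returns the decoded value.
        return loads(self.text)


if __name__ == "__main__":
    response = FakeResponse('{"data": {"id": 1}}')

    data = response.json  # property access: returns a dict
    print(data["data"]["id"])  # 1

    try:
        response.json()  # requests habit: this tries to call the returned dict
    except TypeError as exc:
        print("TypeError:", exc)  # 'dict' object is not callable
```

Because the property has already returned a dict, the trailing () tries to call that dict, which is the runtime error described above.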

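The Skill in this PR records feapder-specific API differences like this one in references/request-response.md. The tests also cover the JSON API parsing scenario directly, across three kinds of cases: prompts that explicitly mention feapder, existing feapder projects, and prompts that do not mention feapder but whose project context uses it. In all three, the generated code used response.json and did not fall back to the requests style.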
