feat(catalog): seed unauthenticated public APIs (arXiv, OpenAlex, Crossref) #592
AlyciaBHZ wants to merge 2 commits into
…ssref)

Adds DEFAULT_PUBLIC_SERVICE_SEEDS + a parallel seed loop in seed_default_services for catalog entries that don't bind to any ProviderConfig. Resulting DownstreamService rows have:
- provider_config_id: None
- auth_method: "none"
- requires_user_credential: false
- no ServiceProviderRequirement

build_catalog_entry already tolerates `provider: None` and emits `requires_credential: false`, so these surface in the AI Services dialog as one-click no-auth services. The proxy injects nothing; the benefit is centralised audit logging and a single place to manage polite-pool / rate-limit headers across agents that hit the same public source.

Three initial seeds:
- `arxiv-api` (http://export.arxiv.org/api): Atom feed search/metadata
- `api-openalex` (https://api.openalex.org): 240M+ scholarly works graph
- `api-crossref` (https://api.crossref.org): DOI metadata + citations

Each description includes the polite-pool convention so agents can discover it from `nyxid catalog show <slug>` without leaving NyxID.

Tests:
- `public_service_seeds_have_unique_slugs_and_no_collision_with_default_seeds`
- `arxiv_public_seed_is_present_and_unauthenticated`

Motivation: agents working on academic / open-problem domains (e.g. literature staleness checks against erdosproblems / RESEARCH_BOARD targets, citation graph mining) need these sources first-class. Today they have to use `service add --custom` per machine and lose the audit trail. Seeding them in catalog gives one-line `nyxid service add arxiv-api` everywhere.
edb9f9d to
d1934e0
Thanks for the careful read. Taking all three.
1. arXiv → https
Fixing. One-liner. Will land in the next push.
2. AI Services dialog filter: taking option (a)
Agreed (a) is the right call: the description's framing should be true, and Plan:
3. Audit-trail framing: fixing the description
You're right, the "loses audit trail" line is wrong as written. Will rewrite the motivation as:
Smaller / optional
Adding the comment line on
The parallel-seed-table vs. threading-
Push order
I'll batch (1) https, (2a)
Pushed the requested follow-up fixes in
What changed:
Validation:
For the polite-pool /
Summary
Adds three first-class catalog entries for unauthenticated public academic APIs:
- `arxiv-api` (https://export.arxiv.org/api) -- Atom feed search/metadata for arXiv papers.
- `api-openalex` (https://api.openalex.org) -- OpenAlex scholarly works, authors, institutions, concepts, and citations.
- `api-crossref` (https://api.crossref.org) -- Crossref DOI metadata and citation graph.

These have no `ProviderConfig` to bind to. The implementation introduces a parallel `DEFAULT_PUBLIC_SERVICE_SEEDS` table and a second seed loop in `seed_default_services` that produces `DownstreamService` rows with `provider_config_id: None`, `auth_method: "none"`, `requires_user_credential: false`, and no `ServiceProviderRequirement`. `build_catalog_entry` already tolerates `provider: None` and returns `requires_credential: false`, so these services are represented as no-auth catalog entries rather than credential setup flows.
Why route public APIs through NyxID?
The proxy injects nothing on these calls. The benefit is operational:
- One-line `nyxid service add arxiv-api` instead of repeating `--custom --base-url ...` setup on each machine.
- Polite-pool / rate-limit headers can be set once in `default_request_headers` and inherited by every agent that enables the service.
- `nyxid catalog show arxiv-api` can explain the no-auth policy and official API docs from inside NyxID.

This is not an audit-log distinction between catalog and custom services: custom services also route through NyxID and are audited. The PR is about reducing per-user boilerplate and avoiding drift for common public academic sources.
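As a rough illustration of the seed shape described in the Summary, the sketch below maps a public seed table to no-auth service rows. The struct definitions, the `public_seed_to_service` helper, and the `has_unique_slugs` check are simplified assumptions for illustration only; the real NyxID types carry more fields and may be laid out differently.

```rust
use std::collections::HashSet;

// Hypothetical, simplified seed shape (not the actual NyxID struct).
#[derive(Debug, Clone, PartialEq)]
struct PublicServiceSeed {
    slug: &'static str,
    base_url: &'static str,
    description: &'static str,
}

// Hypothetical, simplified row shape; only the fields named in the PR text.
#[derive(Debug, Clone, PartialEq)]
struct DownstreamService {
    slug: String,
    base_url: String,
    provider_config_id: Option<String>,
    auth_method: String,
    requires_user_credential: bool,
}

const DEFAULT_PUBLIC_SERVICE_SEEDS: &[PublicServiceSeed] = &[
    PublicServiceSeed {
        slug: "arxiv-api",
        base_url: "https://export.arxiv.org/api",
        description: "Atom feed search/metadata for arXiv papers",
    },
    PublicServiceSeed {
        slug: "api-openalex",
        base_url: "https://api.openalex.org",
        description: "OpenAlex scholarly works and citations",
    },
    PublicServiceSeed {
        slug: "api-crossref",
        base_url: "https://api.crossref.org",
        description: "Crossref DOI metadata and citation graph",
    },
];

// Assumed helper: a public seed never binds to a provider or credential.
fn public_seed_to_service(seed: &PublicServiceSeed) -> DownstreamService {
    DownstreamService {
        slug: seed.slug.to_string(),
        base_url: seed.base_url.to_string(),
        provider_config_id: None,
        auth_method: "none".to_string(),
        requires_user_credential: false,
    }
}

// Returns true when no slug appears twice in the table.
fn has_unique_slugs(seeds: &[PublicServiceSeed]) -> bool {
    let mut seen = HashSet::new();
    seeds.iter().all(|seed| seen.insert(seed.slug))
}

fn main() {
    assert!(has_unique_slugs(DEFAULT_PUBLIC_SERVICE_SEEDS));
    for seed in DEFAULT_PUBLIC_SERVICE_SEEDS {
        let row = public_seed_to_service(seed);
        println!("{} -> {} ({})", row.slug, row.base_url, seed.description);
    }
}
```

Keeping the conversion in a separate function mirrors the parallel-table approach: the provider-backed path is not touched, and the no-auth invariants live in one place.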
Motivation
I am using NyxID to broker external APIs for an outreach/research pipeline that scans scholarly sources while working on open mathematical problems. arXiv, OpenAlex, and Crossref are common enough in agent literature workflows that they are better represented as shared catalog entries than as repeated local custom definitions.
The same argument extends to citation mining, paper deduplication, related-work search, and research-board refresh tasks.
Implementation notes
- Public seeds live in a parallel table next to `DEFAULT_SERVICE_SEEDS` rather than threading `Option<&str>` through the existing `provider_slug` field. This isolates the no-provider path from the existing provider-backed seeds, so the SPR / token-exchange logic stays unchanged for credentialed cases.
- `list_catalog` now includes no-auth public API catalog rows by accepting the explicit `auth_method: "none"`, `service_category: "internal"`, `provider_config_id: null` case.
- Tests check that public seed slugs are unique and do not collide with `DEFAULT_SERVICE_SEEDS`.
- The arXiv seed uses `https://export.arxiv.org/api`.

Test plan
- `cargo fmt --check`
- `cargo test -p nyxid provider_service::tests --no-fail-fast`
- `cargo test -p nyxid catalog_service::tests --no-fail-fast`

Follow-up ideas
- Populate `default_request_headers` for polite-pool conventions where appropriate.
- Add `documentation_url` to `DownstreamService` and move documentation links out of descriptions.
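For the first follow-up, a minimal sketch of what a seeded polite-pool header could look like. It assumes `default_request_headers` behaves like a header-name to value map, and it uses the `mailto:` contact in `User-Agent` that Crossref and OpenAlex describe for their polite pools. The `polite_pool_headers` helper, the agent string, and the contact address are placeholders, not NyxID API.

```rust
use std::collections::BTreeMap;

// Hypothetical helper: builds the header map a public seed might carry.
// The map type is an assumption about how `default_request_headers` is stored.
fn polite_pool_headers(agent: &str, contact_email: &str) -> BTreeMap<String, String> {
    let mut headers = BTreeMap::new();
    // Identify the caller and give the API operator a contact address.
    headers.insert(
        "User-Agent".to_string(),
        format!("{agent} (mailto:{contact_email})"),
    );
    headers
}

fn main() {
    // Placeholder values; a real deployment would supply its own contact.
    let headers = polite_pool_headers("nyxid-proxy/0.1", "ops@example.org");
    for (name, value) in &headers {
        println!("{name}: {value}");
    }
}
```

Setting this once on the seed would let every agent that enables the service inherit the same contact header, which is the drift-avoidance argument made above.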