Commit 4149a18

Enhance arXiv documentation with OAI-PMH interface details
1 parent 6c95f00 commit 4149a18

1 file changed

Lines changed: 17 additions & 6 deletions

sources.md

@@ -6,21 +6,32 @@ public domain. Below are the sources and their respective information:
 
 ## arXiv
 
-**Description:** arXiv is a free distribution service and an open-access archive for scholarly articles in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. All arXiv articles are available under various open licenses or are in the public domain.
+**Description:** arXiv is a free distribution service and an open-access archive
+for scholarly articles in physics, mathematics, computer science, quantitative
+biology, quantitative finance, statistics, electrical engineering and systems
+science, and economics. All arXiv articles are available under various open
+licenses or are in the public domain.
 
 **API documentation link:**
 - [arXiv API User Manual](https://arxiv.org/help/api/user-manual)
 - [arXiv API Reference](https://arxiv.org/help/api)
-- [Base URL](http://export.arxiv.org/api/query)
+- [arXiv OAI-PMH Interface](https://arxiv.org/help/oa/index)
+- [Base URL (Standard API)](http://export.arxiv.org/api/query)
+- [Base URL (OAI-PMH)](https://oaipmh.arxiv.org/oai)
 - [arXiv Subject Classifications](https://arxiv.org/category_taxonomy)
 - [Terms of Use for arXiv APIs](https://info.arxiv.org/help/api/tou.html)
 
 **API information:**
-- No API key required
+- No API key required for either interface
 - Query limit: No official limit, but requests should be made responsibly
-- Data available through Atom XML format
-- Supports search by fields: title (ti), author (au), abstract (abs), comment (co), journal reference (jr), subject category (cat), report number (rn), id, all (searches all fields), and submittedDate (date filter)
-- Metadata includes licensing information for each paper
+- **Standard API**: Data available through Atom XML format, supports search by
+various fields
+- **OAI-PMH Interface** (used by `arxiv_fetch.py`):
+- Structured metadata harvesting with resumption tokens
+- Better license metadata extraction for CC-licensed papers
+- Recommended 3-second delays between requests
+- Supports date-based filtering for bulk harvesting
+- Metadata includes comprehensive licensing information for each paper
 
 
 ## CC Legal Tools

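The standard API in this diff takes one `search_query` built from the field prefixes the diff names (`ti`, `au`, `abs`, `co`, `jr`, `cat`, `rn`, `id`, `all`) plus an optional `submittedDate` range, and it returns an Atom XML feed. The minimal standard-library Python sketch below illustrates this. The helper names `build_query_url`, `parse_feed` and `search` are hypothetical. The `start`/`max_results` paging parameters and the `YYYYMMDDHHMM` date-bound format are assumptions taken from the arXiv API User Manual. The sketch always combines terms with `AND` and extracts only each entry's ID and title.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

API_BASE = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}
# Field prefixes as listed in the diff.
FIELDS = {"ti", "au", "abs", "co", "jr", "cat", "rn", "id", "all"}


def build_query_url(terms: dict[str, str],
                    date_range: Optional[tuple[str, str]] = None,
                    start: int = 0, max_results: int = 10) -> str:
    """AND together field:value terms, optionally bounded by submittedDate."""
    parts = []
    for field, value in terms.items():
        if field not in FIELDS:
            raise ValueError(f"unsupported field prefix: {field!r}")
        parts.append(f"{field}:{value}")
    if date_range:
        # Assumed bound format: YYYYMMDDHHMM in GMT, e.g. "202401010000".
        parts.append(f"submittedDate:[{date_range[0]} TO {date_range[1]}]")
    params = {"search_query": " AND ".join(parts),
              "start": start, "max_results": max_results}
    # safe=":" keeps "au:name" readable; spaces are encoded as "+".
    return API_BASE + "?" + urllib.parse.urlencode(params, safe=":")


def parse_feed(xml_bytes: bytes) -> list[dict]:
    """Extract the id and title of each Atom <entry>."""
    root = ET.fromstring(xml_bytes)
    entries = []
    for entry in root.iterfind("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM).strip(),
            "title": " ".join(title.split()),  # titles can span several lines
        })
    return entries


def search(terms: dict[str, str], **kwargs) -> list[dict]:
    """Fetch one page of results from the standard API (network call)."""
    with urllib.request.urlopen(build_query_url(terms, **kwargs), timeout=30) as resp:
        return parse_feed(resp.read())
```

For example, `search({"cat": "cs.CL", "ti": "transformer"}, max_results=5)` would request five entries matching both terms (a hypothetical query). Multi-word values would need to be wrapped in double quotes. Repeated calls should still be spaced out, in line with the diff's note about making requests responsibly.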

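The OAI-PMH bullets in this diff (resumption tokens, 3-second spacing, date filtering, license metadata) map onto a simple `ListRecords` paging loop. The sketch below is not taken from `arxiv_fetch.py`, and its function names are hypothetical. It assumes the `arXiv` metadata format, its `http://arxiv.org/OAI/arXiv/` namespace, and a `license` element inside each record, based on arXiv's OAI-PMH documentation. Under the OAI-PMH protocol, a follow-up request carries only the `resumptionToken`, and an empty token marks the final page. In this sketch, an error response such as `noRecordsMatch` simply yields no records and ends the loop.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

# Base URL as listed in the diff. The "arXiv" metadataPrefix and the namespace
# URIs are assumptions based on arXiv's OAI-PMH documentation.
OAI_BASE = "https://oaipmh.arxiv.org/oai"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "arxiv": "http://arxiv.org/OAI/arXiv/",
}


def list_records_url(base: str = OAI_BASE, token: Optional[str] = None,
                     from_date: Optional[str] = None, until: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build a ListRecords URL; a resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "arXiv"}
        if from_date:
            params["from"] = from_date  # YYYY-MM-DD, inclusive
        if until:
            params["until"] = until
        if set_spec:
            params["set"] = set_spec
    return base + "?" + urllib.parse.urlencode(params)


def parse_page(xml_bytes: bytes) -> tuple[list[dict], Optional[str]]:
    """Return (records, next_token); next_token is None on the final page."""
    root = ET.fromstring(xml_bytes)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        meta = rec.find("oai:metadata/arxiv:arXiv", NS)
        if meta is None:  # deleted records have a header but no metadata
            continue
        records.append({
            "id": meta.findtext("arxiv:id", default="", namespaces=NS),
            # Assumed element; None when a record carries no license URL.
            "license": meta.findtext("arxiv:license", default=None, namespaces=NS),
        })
    token_el = root.find(".//oai:resumptionToken", NS)
    token = (token_el.text or "").strip() if token_el is not None else ""
    return records, token or None


def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def harvest(fetch_fn: Callable[[str], bytes] = fetch, delay: float = 3.0,
            **query) -> Iterator[dict]:
    """Yield records across pages, sleeping `delay` seconds between requests."""
    url = list_records_url(**query)
    while True:
        records, token = parse_page(fetch_fn(url))
        yield from records
        if token is None:
            break
        time.sleep(delay)  # recommended 3-second spacing between requests
        url = list_records_url(base=query.get("base", OAI_BASE), token=token)
```

To keep only Creative Commons papers from a date window, a caller could filter on the license URL. For example: `[r for r in harvest(from_date="2024-01-01", set_spec="cs") if r["license"] and "creativecommons.org" in r["license"]]` (hypothetical arguments). Injecting `fetch_fn` keeps the paging logic testable without network access.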