arxiv
This Claude Code skill searches and retrieves academic papers from arXiv's free REST API using curl and optional Python parsing, supporting queries by keyword, author, category, or paper ID. Use it when you need to find research papers on specific topics, locate papers by particular authors, or retrieve abstracts and PDF links for further analysis with other tools like web_extract.
git clone --depth 1 https://github.com/moltis-org/moltis /tmp/arxiv && cp -r /tmp/arxiv/crates/skills/src/assets/research/arxiv ~/.claude/skills/arxivSKILL.md
# arXiv Research
Search and retrieve academic papers from arXiv via their free REST API. No API key, no dependencies — just curl.
## Quick Reference
| Action | Command |
|--------|---------|
| Search papers | `curl "https://export.arxiv.org/api/query?search_query=all:QUERY&max_results=5"` |
| Get specific paper | `curl "https://export.arxiv.org/api/query?id_list=2402.03300"` |
| Read abstract (web) | `web_extract(urls=["https://arxiv.org/abs/2402.03300"])` |
| Read full paper (PDF) | `web_extract(urls=["https://arxiv.org/pdf/2402.03300"])` |
## Searching Papers
The API returns Atom XML. Parse with `grep`/`sed` or pipe through `python3` for clean output.
### Basic search
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5"
```
### Clean output (parse XML to readable format)
```bash
curl -s "https://export.arxiv.org/api/query?search_query=all:GRPO+reinforcement+learning&max_results=5&sortBy=submittedDate&sortOrder=descending" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom'}
root = ET.parse(sys.stdin).getroot()
for i, entry in enumerate(root.findall('a:entry', ns)):
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
arxiv_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
published = entry.find('a:published', ns).text[:10]
authors = ', '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
summary = entry.find('a:summary', ns).text.strip()[:200]
cats = ', '.join(c.get('term') for c in entry.findall('a:category', ns))
print(f'{i+1}. [{arxiv_id}] {title}')
print(f' Authors: {authors}')
print(f' Published: {published} | Categories: {cats}')
print(f' Abstract: {summary}...')
print(f' PDF: https://arxiv.org/pdf/{arxiv_id}')
print()
"
```
## Search Query Syntax
| Prefix | Searches | Example |
|--------|----------|---------|
| `all:` | All fields | `all:transformer+attention` |
| `ti:` | Title | `ti:large+language+models` |
| `au:` | Author | `au:vaswani` |
| `abs:` | Abstract | `abs:reinforcement+learning` |
| `cat:` | Category | `cat:cs.AI` |
| `co:` | Comment | `co:accepted+NeurIPS` |
### Boolean operators
```
# AND (default when using +)
search_query=all:transformer+attention
# OR
search_query=all:GPT+OR+all:BERT
# AND NOT
search_query=all:language+model+ANDNOT+all:vision
# Exact phrase
search_query=ti:"chain+of+thought"
# Combined
search_query=au:hinton+AND+cat:cs.LG
```
## Sort and Pagination
| Parameter | Options |
|-----------|---------|
| `sortBy` | `relevance`, `lastUpdatedDate`, `submittedDate` |
| `sortOrder` | `ascending`, `descending` |
| `start` | Result offset (0-based) |
| `max_results` | Number of results (default 10, max 30000) |
```bash
# Latest 10 papers in cs.AI
curl -s "https://export.arxiv.org/api/query?search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results=10"
```
## Fetching Specific Papers
```bash
# By arXiv ID
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300"
# Multiple papers
curl -s "https://export.arxiv.org/api/query?id_list=2402.03300,2401.12345,2403.00001"
```
## BibTeX Generation
After fetching metadata for a paper, generate a BibTeX entry:
{% raw %}
```bash
curl -s "https://export.arxiv.org/api/query?id_list=1706.03762" | python3 -c "
import sys, xml.etree.ElementTree as ET
ns = {'a': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}
root = ET.parse(sys.stdin).getroot()
entry = root.find('a:entry', ns)
if entry is None: sys.exit('Paper not found')
title = entry.find('a:title', ns).text.strip().replace('\n', ' ')
authors = ' and '.join(a.find('a:name', ns).text for a in entry.findall('a:author', ns))
year = entry.find('a:published', ns).text[:4]
raw_id = entry.find('a:id', ns).text.strip().split('/abs/')[-1]
cat = entry.find('arxiv:primary_category', ns)
primary = cat.get('term') if cat is not None else 'cs.LG'
last_name = entry.find('a:author', ns).find('a:name', ns).text.split()[-1]
print(f'@article{{{last_name}{year}_{raw_id.replace(\".\", \"\")},')
print(f' title = {{{title}}},')
print(f' author = {{{authors}}},')
print(f' year = {{{year}}},')
print(f' eprint = {{{raw_id}}},')
print(f' archivePrefix = {{arXiv}},')
print(f' primaryClass = {{{primary}}},')
print(f' url = {{https://arxiv.org/abs/{raw_id}}}')
print('}')
"
```
{% endraw %}
## Reading Paper Content
After finding a paper, read it:
```
# Abstract page (fast, metadata + abstract)
web_extract(urls=["https://arxiv.org/abs/2402.03300"])
# Full paper (PDF → markdown via Firecrawl)
web_extract(urls=["https://arxiv.org/pdf/2402.03300"])
```
For local PDF processing, see the `ocr-and-documents` skill.
## Common Categories
| Category | Field |
|----------|-------|
| `cs.AI` | Artificial Intelligence |
| `cs.CL` | Computation and Language (NLP) |
| `cs.CV` | Computer Vision |
| `cs.LG` | Machine Learning |
| `cs.CR` | Cryptography and Security |
| `stat.ML` | Machine Learning (Statistics) |
| `math.OC` | Optimization and Control |
| `physics.comp-ph` | Computational Physics |
Full list: https://arxiv.org/category_taxonomy
## Helper Script
The `scripts/search_arxiv.py` script handles XML parsing and provides clean output:
```bash
python scripts/search_arxiv.py "GRPO reinforcement learning"
python scripts/search_arxiv.py "transformer attention" --max 10 --sort date
python scripts/search_arxiv.py --author "Yann LeCun" --max 5
python scripts/search_arxiv.py --category cs.AI --sort date
python scripts/search_arxiv.py --id 2402.03300
python scripts/search_arxiv.py --id 2402.03300,2401.12345
```
No dependencies — uses only Python stdlib.
---
## Semantic Scholar (Citations, Related Papers, Author Profiles)
arXiv doesn't provide citation data or recommendations. Use the **Semantic Scholar API** for that — free, no key needed for basic use (1 req/sec), returns JSON.
### Get paper details + cCommit all changes, push branch, create/update PR, and run local validation
Manage Apple Notes via the memo CLI on macOS (create, view, search, edit).
Manage Apple Reminders via remindctl CLI (list, add, complete, delete).
Track Apple devices and AirTags via FindMy.app on macOS using AppleScript and screen capture.
Send and receive iMessages/SMS via the imsg CLI on macOS.
Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Local speech-to-text with the Whisper CLI (no API key).
ElevenLabs text-to-speech with mac-style say UX.