Hugging Face Trending
The Hugging Face Trending skill fetches and curates the day's most notable AI models, datasets, and spaces from the Hugging Face Hub, filtering noise like test models and redundant fine-tunes to surface 5–8 genuinely useful picks with one-line explanations of why each matters. Use it daily to stay informed on new AI artifacts landing first on the Hub without wading through low-signal trending pages.
git clone --depth 1 https://github.com/aaronjmars/aeon /tmp/hugging-face-trending && cp -r /tmp/hugging-face-trending/skills/huggingface-trending ~/.claude/skills/hugging-face-trendingSKILL.md
> **${var}** — Optional. One of `models`, `datasets`, `spaces` to scope the digest to a single resource type. Empty = pull from all three and let the curator pick the best 5–8 across them.
Today is ${today}. The Hugging Face Hub is where new AI artifacts land first — models hours after a paper, datasets before they get cited, spaces as the first runnable form of a technique. The Hub's own front page lists "trending" but doesn't filter the noise (test models, gated previews, redundant fine-tunes of the same base). This skill mirrors `github-trending`'s contract for the AI ecosystem: don't dump the top 10, deliver a **curated** slate of 5–8 picks a busy AI/dev reader would actually want to click, with a one-line "why notable" each.
Read `memory/MEMORY.md` for context.
Read the last 3 days of `memory/logs/` to dedupe artifacts already featured.
Read `soul/SOUL.md` + `soul/STYLE.md` if populated to match voice.
## Steps
### 1. Fetch candidates
The Hugging Face Hub REST API is fully keyless for the list endpoints used here. Pull trending across all three resource types unless `${var}` narrows it:
```bash
# Models — sort=trendingScore returns the same ranking that backs the HF front page
curl -sf "https://huggingface.co/api/models?sort=trendingScore&direction=-1&limit=20" \
-H "accept: application/json" \
-H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
> .hf-models.json
# Datasets
curl -sf "https://huggingface.co/api/datasets?sort=trendingScore&direction=-1&limit=15" \
-H "accept: application/json" \
-H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
> .hf-datasets.json
# Spaces
curl -sf "https://huggingface.co/api/spaces?sort=trendingScore&direction=-1&limit=15" \
-H "accept: application/json" \
-H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
> .hf-spaces.json
```
If `${var}` is set to `models` / `datasets` / `spaces`, fetch only that endpoint.
If any `curl` fails (sandbox blocks outbound from bash on some runs), use **WebFetch** as a fallback for the same URL. WebFetch bypasses the sandbox and parses the JSON for you. If both fail across all three resources, log `HF_TRENDING_ERROR` with the failure detail, send a brief notify (*"Hugging Face Trending — sources unavailable today."*), and exit.
For each entry extract:
- `id` (always present, format `owner/name`) — split on `/` to get author + name
- `likes`, `downloads` (models/datasets only, spaces have no `downloads`), `trendingScore`
- `tags` (filter out `region:*`, `license:*`, and storage-format noise like `endpoints_compatible`, `safetensors`, `gguf`)
- `pipeline_tag` (models) — the canonical task label (e.g. `text-generation`, `text-to-image`)
- `library_name` (models) — `transformers`, `diffusers`, `mlx`, etc.
- `sdk` (spaces) — `gradio` / `streamlit` / `docker` / `static`
- `createdAt`, `lastModified` (when present)
- Resource type (`models` / `datasets` / `spaces`) — preserve so the renderer can pick the right footer
- Permalink: `https://huggingface.co/{id}` for models, `/datasets/{id}` for datasets, `/spaces/{id}` for spaces
### 2. Filter noise (required)
Drop entries matching these patterns — they're low-signal:
- **Test / debug artifacts**: `id` containing `-test`, `-debug`, `-tmp`, `-scratch`, `-playground`, or starting with `test-` / `debug-`
- **Gated / private preview shells**: entries flagged `gated: true` *and* with `<10` likes (HF gates lots of legit work, but a gated artifact with no community signal is usually a draft)
- **Trivial fine-tunes**: model `id` ending in `-finetune`, `-ft`, `-lora-test`, or with `<5` likes AND `<100` downloads (real momentum picks both)
- **Already featured**: anything that appeared in `memory/logs/YYYY-MM-DD.md` for the last 3 days
- **Quantization-only forks**: `id` ending in `-gguf`, `-awq`, `-gptq`, `-int4`, `-int8`, `-fp8` *unless* it has `>500` likes — quantizations of a base model are useful but rarely the most interesting story; the base usually carries the narrative
- **Spaces with `runtime.status: ERROR`** if the field is present (broken demos shouldn't be recommended)
- **Spaces called "demo"** or "example" with `<20` likes — boilerplate scaffolds
If an entry barely fails a filter but is genuinely interesting (novel architecture, first-of-kind dataset, reference implementation of a fresh paper), you may keep it — note it as a judgment call in the log.
### 3. Require a "why notable" for each survivor
For every survivor, write **one line** (≤ 18 words) explaining *why someone should care today*. No paraphrasing the model card / dataset description.
Good: *"First open-weight 70B trained end-to-end with online RL — beats Llama 3 70B on AGIEval, MIT-licensed."*
Bad: *"A new instruction-tuned LLM."* (that's just the description)
If you can't write a concrete "why notable" line for an entry, **drop it**. The filter is the feature.
When the artifact references a paper, you may pull one verifying detail via **WebFetch** on the arxiv URL or the HF model card — but cap at 1 fetch per pick, and only when it materially sharpens the line.
### 4. Tag momentum
Tag each survivor with one of:
- **DEBUT** — `createdAt` within the last 7 days (first-time trending)
- **ACCELERATING** — older than 7 days, `trendingScore > 50` AND `likes > 200`
- **RETURNING** — `createdAt` older than 90 days but trending again — usually a release, a viral post, or a paper drop reviving interest. Note the reason in "why notable" when known
- **HOLDOVER** — appeared in the last day's logs (use sparingly; prefer to drop unless there's a new development)
### 5. Cluster into categories
Buckets are heuristic — classify by what the artifact does, not by author self-description. Cap total buckets at **5** (merge if you hit 6+). Group survivors:
- **LLMs / Reasoning** — text-generation, instruction-tuned, reasoning-tuned, RAG models
- **Multimodal** — text-to-image, text-to-video, vision-language, speech, music
- **Agents / Tooling** —Mention/keyword sweep on social platforms for [REPLACE: KEYWORDS] — trends, sentiment, top posts
5 concrete real-life actions, leverage-scored against open loops with specificity and anti-fluff gates
Curated AI-agent tweets, clustered into narratives with insight summaries
Tracker of AI agent substitution signals — which roles, companies, and industries show real headcount displacement. Named roles + real deployments only.
Competitive-intelligence digest on the AI agent framework space — momentum, releases, breaking changes across a curated watchlist
Cross-domain market pulse from AIXBT's free grounding endpoint — crypto, macro, tradfi, geopolitics. Refreshes taxonomy references (clusters, chains) as a bonus.
Pre-batch API provider health check — detects credit exhaustion or auth failure for every configured provider key before the scheduled batch runs, giving the operator a window to act before skills degrade
List a wallet's live ERC-20 token approvals on Base and flag unlimited / risky spender grants. Keyless via Base RPC (eth_getLogs + eth_call) — no explorer key needed.