Skip to main content
ClaudeWave
Skill510 estrellas del repoactualizado today

Hugging Face Trending

The Hugging Face Trending skill fetches and curates the day's most notable AI models, datasets, and spaces from the Hugging Face Hub, filtering noise like test models and redundant fine-tunes to surface 5–8 genuinely useful picks with one-line explanations of why each matters. Use it daily to stay informed on new AI artifacts landing first on the Hub without wading through low-signal trending pages.

Instalar en Claude Code
Copiar
git clone --depth 1 https://github.com/aaronjmars/aeon /tmp/hugging-face-trending && cp -r /tmp/hugging-face-trending/skills/huggingface-trending ~/.claude/skills/hugging-face-trending
Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

SKILL.md

> **${var}** — Optional. One of `models`, `datasets`, `spaces` to scope the digest to a single resource type. Empty = pull from all three and let the curator pick the best 5–8 across them.

Today is ${today}. The Hugging Face Hub is where new AI artifacts land first — models hours after a paper, datasets before they get cited, spaces as the first runnable form of a technique. The Hub's own front page lists "trending" but doesn't filter the noise (test models, gated previews, redundant fine-tunes of the same base). This skill mirrors `github-trending`'s contract for the AI ecosystem: don't dump the top 10, deliver a **curated** slate of 5–8 picks a busy AI/dev reader would actually want to click, with a one-line "why notable" each.

Read `memory/MEMORY.md` for context.
Read the last 3 days of `memory/logs/` to dedupe artifacts already featured.
Read `soul/SOUL.md` + `soul/STYLE.md` if populated to match voice.

## Steps

### 1. Fetch candidates

The Hugging Face Hub REST API is fully keyless for the list endpoints used here. Pull trending across all three resource types unless `${var}` narrows it:

```bash
# Models — sort=trendingScore returns the same ranking that backs the HF front page
curl -sf "https://huggingface.co/api/models?sort=trendingScore&direction=-1&limit=20" \
  -H "accept: application/json" \
  -H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
  > .hf-models.json

# Datasets
curl -sf "https://huggingface.co/api/datasets?sort=trendingScore&direction=-1&limit=15" \
  -H "accept: application/json" \
  -H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
  > .hf-datasets.json

# Spaces
curl -sf "https://huggingface.co/api/spaces?sort=trendingScore&direction=-1&limit=15" \
  -H "accept: application/json" \
  -H "user-agent: aeon/1.0 (+https://github.com/aaronjmars/aeon)" \
  > .hf-spaces.json
```

If `${var}` is set to `models` / `datasets` / `spaces`, fetch only that endpoint.

If any `curl` fails (sandbox blocks outbound from bash on some runs), use **WebFetch** as a fallback for the same URL. WebFetch bypasses the sandbox and parses the JSON for you. If both fail across all three resources, log `HF_TRENDING_ERROR` with the failure detail, send a brief notify (*"Hugging Face Trending — sources unavailable today."*), and exit.

For each entry extract:
- `id` (always present, format `owner/name`) — split on `/` to get author + name
- `likes`, `downloads` (models/datasets only, spaces have no `downloads`), `trendingScore`
- `tags` (filter out `region:*`, `license:*`, and storage-format noise like `endpoints_compatible`, `safetensors`, `gguf`)
- `pipeline_tag` (models) — the canonical task label (e.g. `text-generation`, `text-to-image`)
- `library_name` (models) — `transformers`, `diffusers`, `mlx`, etc.
- `sdk` (spaces) — `gradio` / `streamlit` / `docker` / `static`
- `createdAt`, `lastModified` (when present)
- Resource type (`models` / `datasets` / `spaces`) — preserve so the renderer can pick the right footer
- Permalink: `https://huggingface.co/{id}` for models, `/datasets/{id}` for datasets, `/spaces/{id}` for spaces

### 2. Filter noise (required)

Drop entries matching these patterns — they're low-signal:

- **Test / debug artifacts**: `id` containing `-test`, `-debug`, `-tmp`, `-scratch`, `-playground`, or starting with `test-` / `debug-`
- **Gated / private preview shells**: entries flagged `gated: true` *and* with `<10` likes (HF gates lots of legit work, but a gated artifact with no community signal is usually a draft)
- **Trivial fine-tunes**: model `id` ending in `-finetune`, `-ft`, `-lora-test`, or with `<5` likes AND `<100` downloads (real momentum picks both)
- **Already featured**: anything that appeared in `memory/logs/YYYY-MM-DD.md` for the last 3 days
- **Quantization-only forks**: `id` ending in `-gguf`, `-awq`, `-gptq`, `-int4`, `-int8`, `-fp8` *unless* it has `>500` likes — quantizations of a base model are useful but rarely the most interesting story; the base usually carries the narrative
- **Spaces with `runtime.status: ERROR`** if the field is present (broken demos shouldn't be recommended)
- **Spaces called "demo"** or "example" with `<20` likes — boilerplate scaffolds

If an entry barely fails a filter but is genuinely interesting (novel architecture, first-of-kind dataset, reference implementation of a fresh paper), you may keep it — note it as a judgment call in the log.

### 3. Require a "why notable" for each survivor

For every survivor, write **one line** (≤ 18 words) explaining *why someone should care today*. No paraphrasing the model card / dataset description.

Good: *"First open-weight 70B trained end-to-end with online RL — beats Llama 3 70B on AGIEval, MIT-licensed."*
Bad: *"A new instruction-tuned LLM."* (that's just the description)

If you can't write a concrete "why notable" line for an entry, **drop it**. The filter is the feature.

When the artifact references a paper, you may pull one verifying detail via **WebFetch** on the arxiv URL or the HF model card — but cap at 1 fetch per pick, and only when it materially sharpens the line.

### 4. Tag momentum

Tag each survivor with one of:

- **DEBUT** — `createdAt` within the last 7 days (first-time trending)
- **ACCELERATING** — older than 7 days, `trendingScore > 50` AND `likes > 200`
- **RETURNING** — `createdAt` older than 90 days but trending again — usually a release, a viral post, or a paper drop reviving interest. Note the reason in "why notable" when known
- **HOLDOVER** — appeared in the last day's logs (use sparingly; prefer to drop unless there's a new development)

### 5. Cluster into categories

Buckets are heuristic — classify by what the artifact does, not by author self-description. Cap total buckets at **5** (merge if you hit 6+). Group survivors:

- **LLMs / Reasoning** — text-generation, instruction-tuned, reasoning-tuned, RAG models
- **Multimodal** — text-to-image, text-to-video, vision-language, speech, music
- **Agents / Tooling** —