FunASR

Name: modelscope/FunASR
Author: modelscope

Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

MCP Servers · Official registry · 19.5k stars · 2k forks · Python · MIT · Updated today

Editorial note

FunASR is a self-hosted speech recognition toolkit that transcribes audio files and live streams across 50-plus languages (coverage depends on the checkpoint). A single Python pipeline call can combine speech recognition with voice activity detection (the FSMN-VAD model), speaker diarization (the CAM++ model), punctuation restoration, and emotion classification (happy, sad, angry). The SenseVoice-Small model processes audio at 170 times realtime on GPU and 17 times realtime on CPU, which the project benchmarks as 13 times faster than Whisper-large-v3 on 192 minutes of long-form audio. The flagship Fun-ASR-Nano model pairs a SenseVoice encoder with a Qwen3-0.6B decoder and supports vLLM acceleration for batch workloads. FunASR connects to Claude through a dedicated MCP server in the examples/mcp_server directory, which lets Claude-based agents and tools like Cursor invoke transcription directly. A built-in `funasr-server` CLI launches an OpenAI-compatible REST endpoint on localhost:8000 for LangChain, Dify, and AutoGen pipelines. The primary audience is developers building meeting transcription tools, multilingual pipelines, or voice-driven AI agents who need accurate, self-hosted speech processing without per-minute fees.

ClaudeWave Trust Score

100/100

✓ Verified

Passed

✓ Open-source license (MIT)
✓ Actively maintained (<30d)
✓ Healthy fork ratio
✓ Clear description
✓ Topics declared
✓ Mature repo (>1y old)

Last scanned: 6/11/2026

Install in Claude Code / Claude Desktop

Method: pip / Python · torch

Claude Code CLI

claude mcp add funasr -- python -m torch

claude_desktop_config.json (Claude Desktop)

{
  "mcpServers": {
    "funasr": {
      "command": "python",
      "args": ["-m", "torch"]
    }
  }
}

1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).

2. Replace any <placeholder> values with your API keys or paths.

3. Restart Claude. The MCP server and its tools appear automatically.

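💡 Install first: `pip install torch torchaudio`, then `pip install funasr`. Note that `torch` is PyTorch, a FunASR dependency rather than an MCP server, so the `python -m torch` command above will not expose any transcription tools. The FunASR MCP server lives in the repository's `examples/mcp_server` directory; point the command at the entry point documented there.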
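
If `claude_desktop_config.json` already lists other servers, pasting a whole block over it would drop them. The following is a minimal sketch of merging a single entry instead, assuming the file is a plain JSON object with an `mcpServers` key as shown above. The helper name `add_mcp_server` and the `<path-to-mcp-server-script>` argument are hypothetical placeholders, not values taken from the FunASR repository.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of config with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
# "<path-to-mcp-server-script>" is a placeholder, not a real path.
updated = add_mcp_server(existing, "funasr", "python", ["<path-to-mcp-server-script>"])
print(json.dumps(updated, indent=2))
```

The function copies the dictionaries rather than mutating them, so the original configuration stays available if the new entry needs to be rolled back.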

Use cases

AI / ML · Media · Dev Tools

About the repo

MCP Servers overview

([简体中文](./README_zh.md)|English|[日本語](./README_ja.md)|[한국어](./README_ko.md))

<p align="center">
<a href="https://github.com/modelscope/FunASR"><img src="https://svg-banners.vercel.app/api?type=origin&text1=FunASR🤠&text2=💖%20A%20Fundamental%20End-to-End%20Speech%20Recognition%20Toolkit&width=800&height=210" alt="FunASR"></a>
</p>

<p align="center">
  <strong>Industrial speech recognition toolkit for offline, streaming, and edge deployment.</strong><br>
  <em>ASR · VAD · punctuation · speaker pipelines · emotion and audio-event models · OpenAI-compatible serving</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/funasr/"><img src="https://img.shields.io/pypi/v/funasr" alt="PyPI"></a>
  <a href="https://github.com/modelscope/FunASR"><img src="https://img.shields.io/github/stars/modelscope/FunASR?style=social" alt="Stars"></a>
  <a href="https://pypi.org/project/funasr/"><img src="https://img.shields.io/pypi/dm/funasr" alt="Downloads"></a>
  <a href="https://modelscope.github.io/FunASR/"><img src="https://img.shields.io/badge/docs-online-blue" alt="Docs"></a>
  <a href="https://mcptoplist.com/server/io.github.modelscope%2Ffunasr-mcp"><img src="https://mcptoplist.com/badge/io.github.modelscope%2Ffunasr-mcp.svg" alt="MCP Toplist"></a>
</p>

<p align="center">
<a href="https://trendshift.io/repositories/10479" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10479" alt="modelscope%2FFunASR | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

<p align="center">
  <a href="#quick-start">Quick Start</a> · <a href="./examples/colab/">Colab</a> · <a href="#benchmark">Benchmark</a> · <a href="./docs/model_selection.md">Model selection</a> · <a href="./docs/migration_from_whisper.md">Migration guide</a> · <a href="./docs/use_case_showcase.md">Use cases</a> · <a href="./docs/community_projects.md">Community integrations</a> · <a href="./docs/deployment_matrix.md">Deployment matrix</a> · <a href="https://www.funasr.com/">Deployment hub</a> · <a href="./docs/troubleshooting.md">Troubleshooting</a> · <a href="#model-zoo">Models</a> · <a href="https://modelscope.github.io/FunASR/agent.html">Agent Integration</a> · <a href="https://modelscope.github.io/FunASR/">Docs</a> · <a href="./CONTRIBUTING.md">Contribute</a>
</p>

---

## Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/FunASR/blob/main/examples/colab/funasr_quickstart.ipynb)

No local setup? Open the [Colab quickstart](./examples/colab/) to transcribe a public sample or upload your own audio in a browser.

```bash
# CPU-only installs can use the default PyPI wheels.
pip install torch torchaudio
pip install funasr
```

For GPU quickstarts, install the PyTorch and torchaudio wheels that match your
NVIDIA driver from [pytorch.org](https://pytorch.org/get-started/locally/)
before installing FunASR. After installation, confirm the GPU is visible:

```bash
python - <<'PY'
import torch
print(torch.cuda.is_available())
PY
```

Only use `device="cuda"` when this prints `True`; otherwise use `device="cpu"`
or reinstall PyTorch with the correct CUDA wheel.
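
The same check can be folded into a script so the device string is chosen automatically. This is a small sketch rather than part of FunASR: `pick_device` is a hypothetical helper, and the `try`/`except` only covers the case where PyTorch is not installed yet.

```python
def pick_device(cuda_available: bool) -> str:
    """Map the result of torch.cuda.is_available() to a device string."""
    return "cuda" if cuda_available else "cpu"


try:
    import torch
    cuda_ok = torch.cuda.is_available()
except ImportError:
    # PyTorch is not installed yet, so no GPU can be used.
    cuda_ok = False

device = pick_device(cuda_ok)
print(device)
```

The resulting value can then be passed as `device=device` when constructing `AutoModel`.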

**Flagship model — Fun-ASR-Nano** (LLM-ASR for Chinese, English, and Japanese, plus Chinese dialect groups and regional accents; needs a GPU):

```python
from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", device="cuda")
result = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav")
print(result[0]["text"])
# 欢迎大家来体验达摩院推出的语音识别模型。
```

For the separate 31-language checkpoint, use
[Fun-ASR-MLT-Nano-2512](https://huggingface.co/FunAudioLLM/Fun-ASR-MLT-Nano-2512).
Language coverage is checkpoint-specific, so Nano and MLT-Nano should be treated as distinct model choices.

On CPU (or for five-language ASR plus emotion and audio-event tags), use
**SenseVoiceSmall**; see the [SenseVoice paper](https://arxiv.org/abs/2407.04051),
[Hugging Face checkpoint](https://huggingface.co/FunAudioLLM/SenseVoiceSmall),
and [GGUF edge checkpoint](https://huggingface.co/FunAudioLLM/SenseVoiceSmall-GGUF).
The pipeline below composes SenseVoiceSmall with FSMN-VAD and CAM++;
diarization is provided by the separate CAM++ model, not by the
SenseVoiceSmall checkpoint:

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")  # use device="cpu" if you don't have a GPU
result = model.generate(
    input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/asr_example_zh.wav",
    batch_size_s=300,
)

# The AutoModel pipeline returns VAD segments with speaker ids and timestamps:
for seg in result[0]["sentence_info"]:
    print(f"[{seg['start']/1000:.1f}s] Speaker {seg['spk']}: {rich_transcription_postprocess(seg['sentence'])}")
```

**Output** — structured text with speaker labels, timestamps, and punctuation:
```
[0.6s] Speaker 0: 欢迎大家来体验达摩院推出的语音识别模型
```

One `AutoModel` pipeline call coordinates the configured ASR, VAD, and speaker
models and returns the combined result.
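
For meeting-style transcripts it is often convenient to merge consecutive segments from the same speaker into one turn. The sketch below assumes each entry in `sentence_info` carries the `start`, `spk`, and `sentence` keys used in the loop above. The sample data is made up for illustration, and `merge_speaker_turns` is a hypothetical helper, not a FunASR function.

```python
from itertools import groupby


def merge_speaker_turns(sentence_info):
    """Merge consecutive segments from the same speaker into one turn."""
    turns = []
    for spk, group in groupby(sentence_info, key=lambda seg: seg["spk"]):
        segs = list(group)
        turns.append({
            "spk": spk,
            "start": segs[0]["start"],  # milliseconds, as in the loop above
            "sentence": " ".join(seg["sentence"] for seg in segs),
        })
    return turns


# Hypothetical segments shaped like result[0]["sentence_info"].
sample = [
    {"start": 600, "spk": 0, "sentence": "Hello everyone."},
    {"start": 2400, "spk": 0, "sentence": "Welcome to the demo."},
    {"start": 5100, "spk": 1, "sentence": "Thanks for having me."},
]
for turn in merge_speaker_turns(sample):
    print(f"[{turn['start'] / 1000:.1f}s] Speaker {turn['spk']}: {turn['sentence']}")
```

Joining with a space suits English text; for Chinese output an empty separator is usually the better choice.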

### Scale & deploy the flagship

At scale, accelerate Fun-ASR-Nano with vLLM (batch processing):

```python
from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", tensor_parallel_size=1)
results = model.generate(["audio1.wav", "audio2.wav"], language="auto")
```

> **Deploy as API server:** `funasr-server --device cuda` → OpenAI-compatible endpoint at localhost:8000
>
> **Use with AI agents:** [MCP Server](examples/mcp_server/) for Claude/Cursor · [OpenAI API](examples/openai_api/) for LangChain/Dify/AutoGen
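
A client for the server can be sketched with the standard library alone. The sketch assumes that "OpenAI-compatible" means the server accepts a multipart POST at `/v1/audio/transcriptions` with `file` and `model` form fields, as OpenAI's own API does. The route, the field names, and the `sensevoice-small` model name are assumptions to check against `examples/openai_api/`, not confirmed details of `funasr-server`.

```python
import json
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename, model,
                                base_url="http://localhost:8000"):
    """Build a multipart POST in the shape of OpenAI's transcription API."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",  # assumed OpenAI-style route
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(path, model):
    """Send one file to a running server and return the parsed JSON reply."""
    with open(path, "rb") as f:
        req = build_transcription_request(f.read(), path, model)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Building the request needs no server; "sensevoice-small" is a placeholder name.
req = build_transcription_request(b"\x00\x01", "audio1.wav", "sensevoice-small")
print(req.get_method(), req.full_url)
```

Only `transcribe` touches the network, so the request shape can be inspected before a server is running.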

### Why FunASR?

Whisper is a single model; **FunASR is a toolkit** — you pick the right model
per job: **Fun-ASR-Nano** (Chinese, English, Japanese, and Chinese dialects;
GPU), **Fun-ASR-MLT-Nano** (31 languages), **SenseVoiceSmall** (five-language
ASR plus emotion and audio events), and **Paraformer** (low-latency streaming).
The table shows toolkit-level capabilities and names the model or pipeline that
provides each one:

| | FunASR (toolkit) | Whisper | Cloud APIs |
|---|---|---|---|
| Top speed | **340x realtime** (Fun-ASR-Nano + vLLM) | 13x realtime | ~1x realtime |
| Speaker ID | ✅ via VAD + CAM++ pipeline | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ via SenseVoice | ❌ | ❌ |
| Languages | Checkpoint-specific (for example Qwen3-ASR 52, MLT-Nano 31, Nano zh/en/ja) | 57 | Varies |
| Streaming | ✅ WebSocket (Paraformer) | ❌ | ✅ |
| CPU viable | ✅ 17x realtime (SenseVoice) | ❌ Too slow | N/A |
| Self-hosted | ✅ Yes (toolkit: MIT; model licenses vary) | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
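
To make the speed and cost rows concrete, a realtime factor of N means one minute of audio takes 1/N minutes to process. The sketch below applies the factors and the $0.006/min starting price from the table to an arbitrary ten-hour workload. The function names are illustrative, and the figures are only as reliable as the table's own numbers.

```python
def processing_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Wall-clock minutes needed to transcribe audio at N x realtime."""
    return audio_minutes / realtime_factor


def cloud_cost_usd(audio_minutes: float, price_per_minute: float = 0.006) -> float:
    """Lower-bound cloud cost at the table's $0.006/min starting price."""
    return audio_minutes * price_per_minute


audio = 600.0  # ten hours of audio, an arbitrary example workload
for label, factor in [("Fun-ASR-Nano + vLLM", 340),
                      ("SenseVoice on CPU", 17),
                      ("Whisper", 13)]:
    print(f"{label}: {processing_minutes(audio, factor):.1f} min")
print(f"Cloud API at $0.006/min: ${cloud_cost_usd(audio):.2f}")
```

For this workload the table's factors work out to roughly 1.8, 35.3, and 46.2 minutes respectively, against a cloud bill of at least $3.60.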

Trying FunASR for the first time? Use the [Colab quickstart](./examples/colab/) before setting up a local environment. Choosing a first model? Start with the [model selection guide](./docs/model_selection.md). Planning a switch from Whisper or a cloud ASR provider? Use the [migration guide](./docs/migration_from_whisper.md) and [benchmark example](./examples/migration/) to test representative audio, map features, and roll out safely.
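
As a rough starting point, the per-job choice can be written as a small lookup. The sketch uses only the language sets and model ids that appear in this README. `suggest_model` is a hypothetical helper, and the fallback to MLT-Nano is an assumption, since its 31 languages are not enumerated here.

```python
# Language sets copied from the model table in this README; the MLT-Nano and
# Qwen3-ASR language lists are not enumerated there, so they are not encoded.
NANO_LANGS = {"zh", "en", "ja"}
SENSEVOICE_LANGS = {"zh", "en", "ja", "ko", "yue"}


def suggest_model(language: str, has_gpu: bool) -> str:
    """Suggest a starting checkpoint id for one language and hardware setup."""
    if has_gpu and language in NANO_LANGS:
        return "FunAudioLLM/Fun-ASR-Nano-2512"
    if language in SENSEVOICE_LANGS:
        return "iic/SenseVoiceSmall"
    # Outside both sets: confirm on the model card that the language is covered.
    return "FunAudioLLM/Fun-ASR-MLT-Nano-2512"


print(suggest_model("zh", has_gpu=True))
print(suggest_model("ko", has_gpu=False))
```

The model selection guide linked above remains the authoritative source; this lookup only encodes what the tables on this page state.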

---

## Installation

```bash
pip install funasr
```

<details><summary>From source / Requirements</summary>

```bash
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./
```
Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first ([pytorch.org](https://pytorch.org/get-started/locally/)), then `pip install funasr`.

</details>

---

## Model Zoo

| Model | Task | Languages | Params | Links |
|-------|------|-----------|--------|-------|
| **Fun-ASR-Nano** | ASR | zh/en/ja + Chinese dialects and accents | 800M | [⭐](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) [🤗](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) [GGUF](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-GGUF) |
| **Fun-ASR-MLT-Nano** | ASR | 31 languages | 800M | [⭐](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-MLT-Nano-2512) [🤗](https://huggingface.co/FunAudioLLM/Fun-ASR-MLT-Nano-2512) |
| **SenseVoiceSmall** | ASR + emotion + events | zh/en/ja/ko/yue | 234M | [⭐](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) [GGUF](https://huggingface.co/FunAudioLLM/SenseVoiceSmall-GGUF) [paper](https://arxiv.org/abs/2407.04051) |
| **Paraformer-zh** | ASR + timestamps | zh/en | 220M | [⭐](https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | [⭐](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) |
| Qwen3-ASR | ASR, 52 languages | multilingual | 1.7B | [usage](examples/industrial_data_pretraining/qwen3_asr) |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | [usage](examples/industrial_data_pretraining/glm_asr) |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | [usage](examples/industrial_data_pretraining/whisper) |
| Whisper-large-v3-turbo | ASR + translation | multilingual | 809M | [usage](examples/industrial_data_pretraining/whisper) |
| ct-punc | Punctuation | zh/en | 290M | [⭐](https://modelscope.cn/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) |
| fsmn-vad | VAD | zh/en | 0.4M | [⭐](https://modelscope.cn/models/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) |
| cam++ | Speaker diarization | — | 7.2M | [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/ca

Topics

asr · audio · chinese · emotion-recognition · funasr · mcp-server · multilingual-asr · openai-compatible-api · paraformer · punctuation · pytorch · real-time-asr · speaker-diarization · speech-recognition · speech-to-text · streaming-asr · transcription · vllm · voice-activity-detection · whisper-alternative

Frequently asked questions

What people ask about FunASR

What is modelscope/FunASR?

modelscope/FunASR is listed under MCP servers for the Claude AI ecosystem. It is an open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving. It has 19.5k stars on GitHub and was last updated today.

How do I install FunASR?

You can install FunASR by cloning the repository (https://github.com/modelscope/FunASR) or by following the instructions in the README on GitHub. ClaudeWave also provides quick-install blocks on this page.

Is modelscope/FunASR safe to use?

Our security agent analyzed modelscope/FunASR and assigned it a Trust Score of 100/100 (tier: Verified). Review the full breakdown of passed checks and flags on this page.

Who maintains modelscope/FunASR?

modelscope/FunASR is maintained by modelscope. The latest activity recorded on GitHub is from today, with 2 open issues.

Are there alternatives to FunASR?

Yes. On ClaudeWave you can browse similar MCP servers at /categories/mcp, sorted by popularity or recent activity.

