ClaudeWave

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

MCP Servers · 17.9k stars · 1.8k forks · Python · MIT · Updated yesterday
Editor's note

FunASR is a self-hosted speech recognition toolkit that transcribes audio files and live streams across more than 50 languages. A single Python call combines voice activity detection (FSMN-VAD), speaker diarization (CAM++), punctuation restoration, and emotion classification (happy, sad, angry). The flagship SenseVoice-Small model processes audio at 170 times realtime on GPU and 17 times realtime on CPU; the project's benchmark on 192 minutes of long-form audio puts the GPU figure at 13 times faster than Whisper-large-v3. An MCP server in the examples/mcp_server directory lets Claude-based agents and tools such as Cursor invoke transcription directly. The built-in `funasr-server` CLI launches an OpenAI-compatible REST endpoint on localhost:8000 for use with LangChain, Dify, and AutoGen pipelines. The newer Fun-ASR-Nano variant pairs a SenseVoice encoder with a Qwen3-0.6B decoder and supports vLLM acceleration for batch workloads. The primary audience is developers building meeting transcription tools, multilingual pipelines, or voice-driven AI agents who need accurate, free, on-premises speech processing.

ClaudeWave Trust Score
100/100
Verified
Passed
  • Open-source license (MIT)
  • Actively maintained (<30d)
  • Healthy fork ratio
  • Clear description
  • Topics declared
  • Mature repo (>1y old)
Last scanned: 6/11/2026
Install in Claude Code / Claude Desktop
Method: pip / Python · torch
Claude Code CLI
claude mcp add funasr -- python -m torch
claude_desktop_config.json (Claude Desktop)
{
  "mcpServers": {
    "funasr": {
      "command": "python",
      "args": ["-m", "torch"]
    }
  }
}
1. Run the command above in your terminal (Claude Code), or paste the JSON config into claude_desktop_config.json (Claude Desktop).
2. Replace any <placeholder> values with your API keys or paths.
3. Restart Claude. The MCP server and its tools appear automatically.
💡 Install first: pip install torch

([简体中文](./README_zh.md)|English|[日本語](./README_ja.md)|[한국어](./README_ko.md))

<p align="center">
<a href="https://github.com/modelscope/FunASR"><img src="https://svg-banners.vercel.app/api?type=origin&text1=FunASR🤠&text2=💖%20A%20Fundamental%20End-to-End%20Speech%20Recognition%20Toolkit&width=800&height=210" alt="FunASR"></a>
</p>

<p align="center">
  <strong>Industrial speech recognition. 170x realtime. 13x faster than Whisper-large-v3. 50+ languages.</strong><br>
  <em>Speaker diarization · Emotion detection · Streaming · One API call</em>
</p>

<p align="center">
  <a href="https://pypi.org/project/funasr/"><img src="https://img.shields.io/pypi/v/funasr" alt="PyPI"></a>
  <a href="https://github.com/modelscope/FunASR"><img src="https://img.shields.io/github/stars/modelscope/FunASR?style=social" alt="Stars"></a>
  <a href="https://pypi.org/project/funasr/"><img src="https://img.shields.io/pypi/dm/funasr" alt="Downloads"></a>
  <a href="https://modelscope.github.io/FunASR/"><img src="https://img.shields.io/badge/docs-online-blue" alt="Docs"></a>
</p>

<p align="center">
<a href="https://trendshift.io/repositories/10479" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10479" alt="modelscope%2FFunASR | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

<p align="center">
  <a href="#quick-start">Quick Start</a> · <a href="./examples/colab/">Colab</a> · <a href="#benchmark">Benchmark</a> · <a href="./docs/model_selection.md">Model selection</a> · <a href="./docs/migration_from_whisper.md">Migration guide</a> · <a href="./docs/use_case_showcase.md">Use cases</a> · <a href="./docs/deployment_matrix.md">Deployment matrix</a> · <a href="#model-zoo">Models</a> · <a href="https://modelscope.github.io/FunASR/agent.html">Agent Integration</a> · <a href="https://modelscope.github.io/FunASR/">Docs</a> · <a href="./CONTRIBUTING.md">Contribute</a>
</p>

---

## Quick Start

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/modelscope/FunASR/blob/main/examples/colab/funasr_quickstart.ipynb)

No local setup? Open the [Colab quickstart](./examples/colab/) to transcribe a public sample or upload your own audio in a browser.

```bash
pip install torch torchaudio
pip install funasr
```

```python
from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall", vad_model="fsmn-vad", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav")
```

**Output** — structured text with speaker labels, timestamps, and punctuation:
```
[00:00.4 → 00:03.8] Speaker 0: Let's discuss the Q3 plan.
[00:04.2 → 00:07.1] Speaker 1: Sounds good. I have three points.
[00:07.5 → 00:12.3] Speaker 0: Go ahead. We have 30 minutes.
```

That's it. **One model, one call** — VAD segmentation, speech recognition, punctuation, speaker diarization all happen automatically.
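
To produce lines like the output above from a transcription result, a small formatter is enough. The sketch below assumes each segment is a dict with `start` and `end` in milliseconds, a speaker index `spk`, and `text`. That shape is an assumption for illustration, so check the actual structure of `result` in your FunASR version. `format_timestamp` and `format_segments` are hypothetical helpers, not part of FunASR.

```python
def format_timestamp(ms: int) -> str:
    """Render milliseconds as MM:SS.t, truncated to tenths of a second."""
    tenths = ms // 100
    minutes, rem = divmod(tenths, 600)  # 600 tenths of a second per minute
    return f"{minutes:02d}:{rem // 10:02d}.{rem % 10}"


def format_segments(segments):
    """Format assumed segment dicts as '[start → end] Speaker N: text' lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} → {end}] Speaker {seg['spk']}: {seg['text']}")
    return lines


# Hand-written sample in the assumed shape (not real model output).
sample = [
    {"start": 400, "end": 3800, "spk": 0, "text": "Let's discuss the Q3 plan."},
    {"start": 4200, "end": 7100, "spk": 1, "text": "Sounds good. I have three points."},
]
for line in format_segments(sample):
    print(line)
```

Keeping the formatting separate from the model call makes it easy to swap in another layout, such as SRT or JSON, without touching the transcription step.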

### LLM-powered ASR: Fun-ASR-Nano

For the highest accuracy across 31 languages (including Chinese dialects), use [Fun-ASR-Nano](https://github.com/FunAudioLLM/Fun-ASR), an LLM-based ASR model that combines a SenseVoice encoder with a Qwen3-0.6B decoder:

```python
from funasr import AutoModel

model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", vad_model="fsmn-vad", device="cuda")
result = model.generate(input="meeting.wav")
```

With vLLM acceleration (16x faster, batch processing):

```python
from funasr.auto.auto_model_vllm import AutoModelVLLM

model = AutoModelVLLM(model="FunAudioLLM/Fun-ASR-Nano-2512", tensor_parallel_size=1)
results = model.generate(["audio1.wav", "audio2.wav"], language="auto")
```

> **Deploy as API server:** `funasr-server --device cuda` → OpenAI-compatible endpoint at localhost:8000
>
> **Use with AI agents:** [MCP Server](examples/mcp_server/) for Claude/Cursor · [OpenAI API](examples/openai_api/) for LangChain/Dify/AutoGen
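
A client for the API server can be sketched with the standard library alone. The example below builds a multipart upload request without sending it. It assumes the server follows the OpenAI convention of a `/v1/audio/transcriptions` route with `model` and `file` form fields; the route, the field names, and the model name `SenseVoiceSmall` are assumptions here, so confirm them against the `funasr-server` documentation. `build_transcription_request` is a hypothetical helper.

```python
import urllib.request
import uuid


def build_transcription_request(audio_bytes: bytes, filename: str,
                                base_url: str = "http://localhost:8000",
                                model: str = "SenseVoiceSmall"):
    """Build (but do not send) a multipart POST for an assumed OpenAI-style route."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio_bytes + closing
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",  # assumed route
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Placeholder bytes stand in for real audio so the sketch runs without a file.
req = build_transcription_request(b"RIFF", "meeting.wav")
print(req.get_method(), req.full_url)

# With a server running, the request could be sent like this:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

If the endpoint is OpenAI-compatible as described, existing OpenAI client libraries pointed at the local base URL should be an alternative to hand-built requests.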

### Why FunASR?

| | FunASR | Whisper | Cloud APIs |
|---|---|---|---|
| Speed | **170x realtime** | 13x realtime | ~1x realtime |
| Speaker ID | ✅ Built-in | ❌ Needs pyannote | ✅ Extra cost |
| Emotion | ✅ Happy/Sad/Angry | ❌ | ❌ |
| Languages | 50+ | 57 | Varies |
| Streaming | ✅ WebSocket | ❌ | ✅ |
| vLLM Acceleration | ✅ 2-3x faster | ❌ | N/A |
| Self-hosted | ✅ MIT license | ✅ MIT license | ❌ Cloud only |
| Cost | Free | Free | $0.006/min+ |
| CPU viable | ✅ 17x realtime | ❌ Too slow | N/A |

Trying FunASR for the first time? Use the [Colab quickstart](./examples/colab/) before setting up a local environment. Choosing a first model? Start with the [model selection guide](./docs/model_selection.md). Planning a switch from Whisper or a cloud ASR provider? Use the [migration guide](./docs/migration_from_whisper.md) and [benchmark example](./examples/migration/) to test representative audio, map features, and roll out safely.
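
The Emotion row in the table above can be made concrete with a small parser. The sketch assumes the model returns emotion and language as inline tags of the form `<|NAME|>` in front of the transcript text. That tag syntax is an assumption for illustration, and `split_tags` is a hypothetical helper, so inspect real model output before relying on it.

```python
import re

# Assumed tag syntax: <|NAME|> tokens prefixed to the transcript text.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")
EMOTIONS = {"HAPPY", "SAD", "ANGRY"}  # labels named in the comparison table


def split_tags(raw: str):
    """Return (emotion or None, plain text) from a tagged transcript string."""
    tags = TAG_PATTERN.findall(raw)
    emotion = next((t for t in tags if t.upper() in EMOTIONS), None)
    text = TAG_PATTERN.sub("", raw).strip()
    return emotion, text


# Hand-written example string in the assumed format (not real model output).
emotion, text = split_tags("<|en|><|HAPPY|><|Speech|>Sounds good. I have three points.")
print(emotion, "|", text)
```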

---

<a name="benchmark"></a>

## Benchmark

> 184 long-form audio files (192 min). [Full report →](https://modelscope.github.io/FunASR/benchmark.html)

| Model | GPU Speed | CPU Speed | vs Whisper-large-v3 |
|-------|-----------|-----------|-------------------|
| **SenseVoice-Small** | **170x** realtime | **17x** realtime | 🚀 **13x faster** |
| **Paraformer-Large** | **120x** realtime | **15x** realtime | 🚀 **9x faster** |
| Whisper-large-v3-turbo | 46x realtime | ❌ | 3.4x faster |
| **Fun-ASR-Nano** | 17x realtime | 3.6x realtime | 1.3x faster |
| Whisper-large-v3 | 13x realtime | ❌ | baseline |

> **Key takeaway:** SenseVoice-Small and Paraformer-Large run faster on CPU (17x and 15x realtime) than Whisper-large-v3 runs on GPU (13x realtime).
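
The realtime factors in the benchmark table translate directly into wall-clock time. The short calculation below uses only the GPU figures and the 192-minute total from the table; because it works from the rounded table values, its ratios may differ slightly from the published "vs Whisper-large-v3" column.

```python
# Realtime factors from the benchmark table (GPU column).
GPU_SPEED = {
    "SenseVoice-Small": 170,
    "Paraformer-Large": 120,
    "Whisper-large-v3-turbo": 46,
    "Fun-ASR-Nano": 17,
    "Whisper-large-v3": 13,
}
AUDIO_MINUTES = 192  # total length of the benchmark set


def processing_seconds(realtime_factor: float, audio_minutes: float = AUDIO_MINUTES) -> float:
    """Wall-clock seconds needed to transcribe the audio at a given realtime factor."""
    return audio_minutes * 60 / realtime_factor


def speedup_vs_baseline(model: str, baseline: str = "Whisper-large-v3") -> float:
    """Ratio of realtime factors: how many times faster than the baseline."""
    return GPU_SPEED[model] / GPU_SPEED[baseline]


for name, factor in GPU_SPEED.items():
    print(f"{name}: {processing_seconds(factor):.0f} s, "
          f"{speedup_vs_baseline(name):.1f}x vs Whisper-large-v3")
```

At 170x realtime, the full 192 minutes take a little over a minute to transcribe, compared with roughly fifteen minutes at 13x.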

---

## What's new

- 2026/05/24: **vLLM Inference Engine** — 2-3x faster LLM decoding for Fun-ASR-Nano. Streaming WebSocket service with VAD + Speaker Diarization. [Guide →](docs/vllm_guide.md)
- 2026/05/24: **Dynamic VAD** — adaptive silence threshold (default on). Short sentences stay intact, long segments get auto-split. [Details →](docs/vllm_guide.md#附录dynamicstreamingvad)
- 2026/05/24: **v1.3.3** — `funasr-server` CLI, OpenAI-compatible API, MCP Server for AI agents. `pip install --upgrade funasr`
- 2026/05/20: Added Qwen3-ASR (0.6B/1.7B) — 52 languages, auto detection. [usage](examples/industrial_data_pretraining/qwen3_asr)
- 2026/05/20: Added GLM-ASR-Nano (1.5B) — 17 languages, dialect support. [usage](examples/industrial_data_pretraining/glm_asr)
- 2026/05/19: Fun-ASR-Nano and SenseVoice now support speaker diarization.
- 2025/12/15: [Fun-ASR-Nano-2512](https://github.com/FunAudioLLM/Fun-ASR) — 31 languages, tens of millions of hours training.

<details><summary>Older</summary>

- 2024/10/10: Whisper-large-v3-turbo support added.
- 2024/07/04: [SenseVoice](https://github.com/FunAudioLLM/SenseVoice) — ASR + emotion + audio events.
- 2024/01/30: FunASR 1.0 released.

</details>

---

## Installation

```bash
pip install funasr
```

<details><summary>From source / Requirements</summary>

```bash
git clone https://github.com/modelscope/FunASR.git && cd FunASR
pip install -e ./
```
Requirements: Python ≥ 3.8. Install PyTorch + torchaudio first ([pytorch.org](https://pytorch.org/get-started/locally/)), then `pip install funasr`.

</details>
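
Before loading a model, it can help to confirm that the requirements listed above are in place. The check below is a minimal sketch using only the standard library; `missing_packages` and `python_ok` are hypothetical helper names, and the script only tests whether each package can be located, not whether its version is compatible.

```python
import importlib.util
import sys

REQUIRED = ("torch", "torchaudio", "funasr")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def python_ok(minimum=(3, 8)):
    """True when the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if not python_ok():
    print("Python 3.8 or newer is required")
missing = missing_packages()
print("missing:", ", ".join(missing) if missing else "none")
```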

---

<a name="model-zoo"></a>

## Model Zoo

| Model | Task | Languages | Params | Links |
|-------|------|-----------|--------|-------|
| **Fun-ASR-Nano** | ASR + timestamps | 31 languages | 800M | [⭐](https://www.modelscope.cn/models/FunAudioLLM/Fun-ASR-Nano-2512) [🤗](https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512) |
| **SenseVoiceSmall** | ASR + emotion + events | zh/en/ja/ko/yue | 234M | [⭐](https://www.modelscope.cn/models/iic/SenseVoiceSmall) [🤗](https://huggingface.co/FunAudioLLM/SenseVoiceSmall) |
| **Paraformer-zh** | ASR + timestamps | zh/en | 220M | [⭐](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) [🤗](https://huggingface.co/funasr/paraformer-zh) |
| Paraformer-zh-streaming | Streaming ASR | zh/en | 220M | [⭐](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online/summary) [🤗](https://huggingface.co/funasr/paraformer-zh-streaming) |
| Qwen3-ASR | ASR, 52 languages | multilingual | 1.7B | [usage](examples/industrial_data_pretraining/qwen3_asr) |
| GLM-ASR-Nano | ASR, 17 languages | multilingual | 1.5B | [usage](examples/industrial_data_pretraining/glm_asr) |
| Whisper-large-v3 | ASR + translation | multilingual | 1550M | [usage](examples/industrial_data_pretraining/whisper) |
| Whisper-large-v3-turbo | ASR + translation | multilingual | 809M | [usage](examples/industrial_data_pretraining/whisper) |
| ct-punc | Punctuation | zh/en | 290M | [⭐](https://modelscope.cn/models/damo/punc_ct-transformer_cn-en-common-vocab471067-large/summary) [🤗](https://huggingface.co/funasr/ct-punc) |
| fsmn-vad | VAD | zh/en | 0.4M | [⭐](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) [🤗](https://huggingface.co/funasr/fsmn-vad) |
| cam++ | Speaker diarization | — | 7.2M | [⭐](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) [🤗](https://huggingface.co/funasr/campplus) |
| emotion2vec+large | Emotion recognition | — | 300M | [⭐](https://modelscope.cn/models/iic/emotion2vec_plus_large/summary) [🤗](https://huggingface.co/emotion2vec/emotion2vec_plus_large) |

---

## Usage

> Full examples with parameter docs: [Tutorial →](https://modelscope.github.io/FunASR/tutorial.html)

```python
from funasr import AutoModel

# Chinese production (VAD + ASR + punctuation + speaker)
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc", spk_model="cam++", device="cuda")
result = model.generate(input="meeting.wav", hotword="关键词 20")

# 31 languages with timestamps
model = AutoModel(model="FunAudioLLM/Fun-ASR-Nano-2512", hub="hf", trust_remote_code=True,
                  vad_model="fsmn-vad", vad_kwargs={"max_single_segment_time": 30000}, device="cuda")
result = model.generate(input="audio.wav", batch_size=1)

# Streaming real-time
model = AutoModel(model="paraformer-zh-streaming")
```

Topics: asr · audio · chinese · emotion-recognition · mcp-server · multilingual-asr · openai-compatible-api · paraformer · punctuation · pytorch · real-time · speaker-diarization · speech-recognition · speech-to-text · streaming-asr · transcription · vad · vllm · voice-activity-detection · whisper-alternative

What people ask about FunASR

What is modelscope/FunASR?

modelscope/FunASR is an industrial-grade speech recognition toolkit listed under MCP Servers on ClaudeWave. It offers 170x realtime transcription, 50+ languages, speaker diarization, emotion detection, streaming, and an OpenAI-compatible API. It has 17.9k GitHub stars and was last updated yesterday.

How do I install FunASR?

Install PyTorch and torchaudio first, then run `pip install funasr`. To install from source, clone the repository (https://github.com/modelscope/FunASR) and run `pip install -e ./`. ClaudeWave also provides quick install blocks on this page.

Is modelscope/FunASR safe to use?

Our security agent has analyzed modelscope/FunASR and assigned a Trust Score of 100/100 (tier: Verified). See the full breakdown of passed checks and flags on this page.

Who maintains modelscope/FunASR?

modelscope/FunASR is maintained by modelscope. The last recorded GitHub activity is from yesterday, with 16 open issues.

Are there alternatives to FunASR?

Yes. On ClaudeWave you can browse similar MCP servers at /categories/mcp, sorted by popularity or recent activity.

Maintain this repo? Add a badge to your README

Drop the badge into your GitHub README to show it's tracked on ClaudeWave. Each badge links back to this page and reflects the live Trust Score.

Featured on ClaudeWave: modelscope/FunASR
[![Featured on ClaudeWave](https://claudewave.com/api/badge/modelscope-funasr)](https://claudewave.com/repo/modelscope-funasr)
<a href="https://claudewave.com/repo/modelscope-funasr"><img src="https://claudewave.com/api/badge/modelscope-funasr" alt="Featured on ClaudeWave: modelscope/FunASR" width="320" height="64" /></a>
