speech-recognition
This Claude Code skill provides Swift implementations for transcribing speech to text on iOS using Apple's Speech framework, covering both the modern SpeechAnalyzer API (iOS 26+) and the older SFSpeechRecognizer API (iOS 10+). Use it when building live microphone transcription with AVAudioEngine, processing pre-recorded audio files, managing speech and microphone permissions, selecting between on-device and server-backed recognition, or adopting async result streams.
git clone --depth 1 https://github.com/dpearson2699/swift-ios-skills /tmp/speech-recognition && cp -r /tmp/speech-recognition/skills/speech-recognition ~/.claude/skills/speech-recognition
# Speech Recognition
Transcribe live and pre-recorded audio to text using Apple's Speech framework.
Covers `SpeechAnalyzer` / `SpeechTranscriber` (iOS 26+) and
`SFSpeechRecognizer` (iOS 10+). Targets Swift 6.3 / iOS 26+ while preserving
fallback guidance for apps that support older OS versions.
**Scope boundary:** Use this skill for speech-to-text recognition, speech
authorization, microphone capture plumbing, and result handling. Hand off text
analysis, language identification after transcription, sentiment, embeddings,
and translation to `natural-language`; hand off audio playback UI to `avkit`;
hand off summarization or generation over transcripts to `apple-on-device-ai`.
## Contents
- [SpeechAnalyzer Strategy (iOS 26+)](#speechanalyzer-strategy-ios-26)
- [SFSpeechRecognizer Setup](#sfspeechrecognizer-setup)
- [Authorization](#authorization)
- [Live Microphone Transcription](#live-microphone-transcription)
- [Pre-Recorded Audio File Recognition](#pre-recorded-audio-file-recognition)
- [On-Device vs Server Recognition](#on-device-vs-server-recognition)
- [Handling Results](#handling-results)
- [Common Mistakes](#common-mistakes)
- [Review Checklist](#review-checklist)
- [References](#references)
## SpeechAnalyzer Strategy (iOS 26+)
Use `SpeechAnalyzer` for modern iOS 26+ speech analysis, especially long-form
recordings, live transcription, time-indexed transcripts, and fully on-device
flows. Keep `SFSpeechRecognizer` for iOS 10+ deployment targets, server-backed
locale coverage, or existing callback/delegate implementations.
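The split described above can be sketched as a runtime availability check. This is only an illustration: `TranscriptionBackend` and `chooseBackend(needsServerLocaleCoverage:)` are hypothetical names, not part of the Speech framework, and the decision rule is an assumption based on the guidance in this section.

```swift
import Foundation

/// Hypothetical enum naming the two recognition paths described above.
enum TranscriptionBackend {
    case speechAnalyzer      // iOS 26+ path
    case sfSpeechRecognizer  // iOS 10+ fallback
}

/// Hypothetical helper: prefer SpeechAnalyzer on iOS 26+ unless the caller
/// needs server-backed locale coverage, which stays on SFSpeechRecognizer.
func chooseBackend(needsServerLocaleCoverage: Bool) -> TranscriptionBackend {
    if needsServerLocaleCoverage {
        return .sfSpeechRecognizer
    }
    // Note: the `*` wildcard makes this check pass on non-iOS platforms,
    // so add explicit platform versions if the code is shared.
    if #available(iOS 26, *) {
        return .speechAnalyzer
    }
    return .sfSpeechRecognizer
}
```

Keeping the choice in one function means the rest of the app can depend on a single transcription interface while the deployment target still includes older OS versions.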
Read [SpeechAnalyzer patterns](references/speechanalyzer-patterns.md) when
implementing an iOS 26+ transcription pipeline, model asset handling, volatile
results, or file/buffer examples.
### SpeechAnalyzer setup checklist
1. Choose the module:
   - `SpeechTranscriber` for the newer general-purpose on-device model.
   - `DictationTranscriber` when `SpeechTranscriber` is unavailable for the
     current device or locale and dictation-compatible support is acceptable.
   - `SpeechDetector` only in conjunction with a transcriber when voice
     activity detection is worth the accuracy/power tradeoff.
2. Check support before creating the session:
   - `SpeechTranscriber.isAvailable`
   - `SpeechTranscriber.supportedLocale(equivalentTo:)`
   - `SpeechTranscriber.installedLocales` / `supportedLocales` when showing
     language choices.
3. Pick a documented preset:
   - `.transcription` for basic accurate transcription.
   - `.progressiveTranscription` for live UI updates.
   - `.timeIndexedProgressiveTranscription` when playback highlighting needs
     `audioTimeRange`.
4. Install required assets with `AssetInventory.assetInstallationRequest`.
5. Convert live audio buffers to
   `SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:)` before yielding
   `AnalyzerInput`.
6. Consume module results from their `AsyncSequence` in a separate task.
7. Finish explicitly with `finalizeAndFinish(through:)`,
   `finalizeAndFinishThroughEndOfInput()`, or `cancelAndFinishNow()`.
Do not use an `offlineTranscription` preset; Apple does not document one.
Finishing an `AsyncStream` input sequence does not finish the analyzer session.
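For a pre-recorded file, the checklist might be sketched roughly as follows. Treat this as an outline under stated assumptions rather than a reference implementation: `transcribeFile(at:locale:)` is a hypothetical helper, the error thrown for an unsupported locale is an arbitrary choice, and the exact `async`/`throws` annotations on the Speech calls should be verified against the iOS 26 SDK. Step 5 (buffer conversion) is skipped because the analyzer reads the file directly here.

```swift
import AVFoundation
import Speech

/// Hypothetical helper: transcribe one audio file with SpeechTranscriber.
@available(iOS 26, *)
func transcribeFile(at url: URL, locale requested: Locale = .current) async throws -> String {
    // Step 2: check support and resolve a supported locale first.
    guard SpeechTranscriber.isAvailable,
          let locale = await SpeechTranscriber.supportedLocale(equivalentTo: requested) else {
        throw CocoaError(.featureUnsupported) // assumption: any suitable error works here
    }

    // Steps 1 and 3: a transcriber module using the basic documented preset.
    let transcriber = SpeechTranscriber(locale: locale, preset: .transcription)

    // Step 4: install model assets if they are not already present.
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()
    }

    let analyzer = SpeechAnalyzer(modules: [transcriber])

    // Step 6: consume results in a separate task so analysis is not blocked.
    // Appending is only safe when no volatile results are reported,
    // which is assumed for the `.transcription` preset.
    let collector = Task {
        var transcript = ""
        for try await result in transcriber.results {
            transcript += String(result.text.characters)
        }
        return transcript
    }

    // Step 7: feed the file, then finish the session explicitly.
    let file = try AVAudioFile(forReading: url)
    if let lastSample = try await analyzer.analyzeSequence(from: file) {
        try await analyzer.finalizeAndFinish(through: lastSample)
    } else {
        await analyzer.cancelAndFinishNow()
    }

    return try await collector.value
}
```

The results loop ends only after the analyzer finishes, which is why the explicit finish call in step 7 comes before awaiting the collected transcript.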
## SFSpeechRecognizer Setup
### Creating a recognizer with locale
```swift
import Speech

// Default locale (the device's current locale); nil if that locale is unsupported
let defaultRecognizer = SFSpeechRecognizer()

// Specific locale; the initializer is failable, so the result is optional
let enUSRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))

// Check if recognition is available for this locale
guard let recognizer = enUSRecognizer, recognizer.isAvailable else {
    print("Speech recognition not available")
    return
}
```
### Monitoring availability changes
```swift
import Speech

final class SpeechManager: NSObject, SFSpeechRecognizerDelegate {
    private let recognizer = SFSpeechRecognizer()!

    override init() {
        super.init()
        recognizer.delegate = self
    }

    // Called whenever recognition becomes available or unavailable
    func speechRecognizer(
        _ speechRecognizer: SFSpeechRecognizer,
        availabilityDidChange available: Bool
    ) {
        // Update UI: disable record button when unavailable
    }
}
```
## Authorization
Request **both** speech recognition and microphone permissions before starting
live transcription. Add these keys to `Info.plist`:
- `NSSpeechRecognitionUsageDescription`
- `NSMicrophoneUsageDescription`
```swift
import Speech
import AVFoundation

func requestPermissions() async -> Bool {
    let speechStatus = await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status)
        }
    }
    guard speechStatus == .authorized else { return false }

    let micStatus: Bool
    if #available(iOS 17, *) {
        micStatus = await AVAudioApplication.requestRecordPermission()
    } else {
        micStatus = await withCheckedContinuation { continuation in
            AVAudioSession.sharedInstance().requestRecordPermission { granted in
                continuation.resume(returning: granted)
            }
        }
    }
    return micStatus
}
```
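Because `requestPermissions()` collapses everything into a `Bool`, it can help to inspect the speech authorization status separately when deciding what to tell the user. The sketch below is illustrative: `guidance(for:)` and `currentGuidance()` are hypothetical helpers, and the message strings are placeholders.

```swift
import Speech

/// Hypothetical helper: user-facing guidance for each speech authorization state.
func guidance(for status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized:
        return "Ready to transcribe."
    case .notDetermined:
        return "Permission has not been requested yet."
    case .denied:
        return "Speech recognition was denied. Enable it in Settings."
    case .restricted:
        return "Speech recognition is restricted on this device."
    @unknown default:
        return "Unknown authorization state."
    }
}

/// Hypothetical helper: reads the current status without showing a prompt.
func currentGuidance() -> String {
    guidance(for: SFSpeechRecognizer.authorizationStatus())
}
```

Distinguishing `.denied` from `.restricted` matters because only the former can be changed by the user in Settings.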
## Live Microphone Transcription
The standard pattern: `AVAudioEngine` captures microphone audio → buffers are
appended to `SFSpeechAudioBufferRecognitionRequest` → results stream in.
```swift
import Speech
import AVFoundation

final class LiveTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
    private let audioEngine = AVAudioEngine()
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?

    func startTranscribing() throws {
        // Cancel any in-progress task
        recognitionTask?.cancel()
        recognitionTask = nil

        // Configure audio session
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)