
interview-transcription

The interview-transcription skill provides workflows for processing audio and video recordings into timestamped transcripts with speaker labels, extracting quotes for fact-checking, and organizing interview source data. Use it when transcribing recorded interviews, extracting attributed quotes, managing source databases, or converting recordings into publishable material. For pre-interview question design and consent procedures, activate the companion interview-prep skill instead.

Repository: claude-skills-journalism

Install in Claude Code


git clone --depth 1 https://github.com/jamditis/claude-skills-journalism /tmp/interview-transcription && cp -r /tmp/interview-transcription/journalism-core/skills/interview-transcription ~/.claude/skills/interview-transcription

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Interview transcription and management

Practical workflows for journalists managing interviews from preparation through publication.

## When to activate

- Preparing questions for an interview (question design and consent scripts live in the **interview-prep** skill)
- Processing audio/video recordings
- Creating or managing transcripts
- Organizing notes from multiple sources
- Building a source relationship database
- Generating timestamped quotes for fact-checking
- Converting recordings to publishable quotes

## Recording setup for transcription

For pre-interview research, question design, attribution agreements, and consent scripts, use the **interview-prep** skill. The notes here cover only the recording configuration that affects transcription quality.

```python
# Standard recording configuration for clean transcription
RECORDING_SETTINGS = {
    'format': 'wav',           # Lossless for transcription
    'sample_rate': 16000,      # Whisper resamples to 16k anyway; 16k saves disk
    'channels': 1,             # Mono is fine for speech; stereo only if mics are positionally distinct
    'backup': True,            # Always run a backup recorder
}

# File naming convention
# YYYY-MM-DD_source-lastname_topic.wav
# Example: 2026-05-08_smith_budget-hearing.wav
```

**Two-device rule.** Always record on two devices, with a phone as the minimum backup. If you use a wireless lav mic, the recorder built into the lav unit counts as one device and a phone running a recording app counts as the second.
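A backup only helps if it actually captured the whole interview. One quick post-interview check is to compare durations; a minimal sketch using Python's standard `wave` module, assuming both devices' files have been exported or converted to WAV (phone apps often record M4A instead). The function names and the 5-second tolerance are hypothetical choices:

```python
import wave


def wav_duration(path: str) -> float:
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def backup_covers_primary(primary: str, backup: str, tolerance_s: float = 5.0) -> bool:
    """True if the backup is at most tolerance_s seconds shorter than the primary."""
    # The devices rarely start and stop at the same instant, so allow some slack
    return wav_duration(backup) >= wav_duration(primary) - tolerance_s
```

A `False` result is a prompt to listen to the backup before relying on it, not proof that it is unusable.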

**Mono is preferred** unless each speaker has their own dedicated microphone routed to a distinct channel. Stereo with both speakers bleeding into both channels is worse for diarization than clean mono.
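If a recording came back as stereo with both voices bleeding into both channels, one option is to average the channels into mono before transcription. A minimal standard-library sketch, assuming 16-bit PCM WAV input; `downmix_to_mono` is a hypothetical name, and `ffmpeg -i in.wav -ac 1 out.wav` does the same job if ffmpeg is installed:

```python
import array
import sys
import wave


def downmix_to_mono(src: str, dst: str) -> None:
    """Average all channels of a 16-bit PCM WAV into a single mono channel."""
    with wave.open(src, "rb") as w:
        params = w.getparams()
        raw = w.readframes(params.nframes)
    if params.sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM")
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    n = params.nchannels
    # Frames are interleaved (L, R, L, R, ...); average each frame's n samples
    mono = array.array(
        "h",
        (sum(samples[i:i + n]) // n for i in range(0, len(samples), n)),
    )
    if sys.byteorder == "big":
        mono.byteswap()
    with wave.open(dst, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(params.framerate)
        out.writeframes(mono.tobytes())
```

Averaging does not remove the bleed; it only gives the diarizer one consistent signal instead of two inconsistent ones.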

## Transcription workflows

### Automated transcription pipeline

Vanilla OpenAI Whisper transcribes audio to text but does **not** assign speaker labels. To get diarized output ("Speaker 1:", "Speaker 2:", and so on) you need a tool that combines Whisper with a diarization model. The usual choice is **WhisperX** (`m-bain/whisperX`), which pairs faster-whisper transcription with wav2vec2 forced alignment for word-level timestamps and pyannote.audio diarization for speaker IDs, all from a single command.

```python
from pathlib import Path
import subprocess
import json

def transcribe_interview(
    audio_path: str,
    output_dir: str = "./transcripts",
    diarize: bool = True,
    hf_token: str | None = None,
    min_speakers: int = 2,
    max_speakers: int = 2,
) -> dict:
    """
    Transcribe an interview using WhisperX (Whisper + pyannote diarization).
    Returns a transcript with word-level timestamps and speaker labels.

    Diarization needs a Hugging Face token with access to the pyannote
    speaker-diarization-3.1 model. Accept the model EULA at
    huggingface.co/pyannote/speaker-diarization-3.1 once, then pass the token.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    cmd = [
        'whisperx', audio_path,
        '--model', 'large-v3',
        '--output_format', 'json',
        '--output_dir', output_dir,
        '--language', 'en',
        '--compute_type', 'int8',     # CPU-friendly; use 'float16' on GPU
        '--min_speakers', str(min_speakers),
        '--max_speakers', str(max_speakers),
    ]

    if diarize:
        cmd.append('--diarize')
        if hf_token:
            cmd += ['--hf_token', hf_token]

    subprocess.run(cmd, check=True, capture_output=True)

    json_path = Path(output_dir) / f"{Path(audio_path).stem}.json"
    with open(json_path) as f:
        return json.load(f)

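def format_for_editing(transcript: dict) -> str:
    """Convert to journalist-friendly format with timestamps and, if present, speaker labels."""
    lines = []
    for segment in transcript.get('segments', []):
        timestamp = format_timestamp(segment['start'])
        text = segment['text'].strip()
        # WhisperX adds a 'speaker' key (e.g. SPEAKER_00) to segments when diarization ran
        speaker = segment.get('speaker')
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{timestamp}] {prefix}{text}")
    return '\n\n'.join(lines)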

def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:MM:SS format."""
    h = int(seconds // 3600)
    m = int((seconds % 3600) // 60)
    s = int(seconds % 60)
    return f"{h:02d}:{m:02d}:{s:02d}"
```
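Diarization IDs such as `SPEAKER_00` say nothing about who is talking, and the model often splits one answer into several segments. A minimal sketch of a post-processing step that maps IDs to names and merges consecutive segments into turns, assuming the WhisperX-style `{"segments": [{"start", "end", "text", "speaker"}]}` shape used above; `label_speakers` and `turns_to_text` are hypothetical helpers:

```python
import time


def label_speakers(transcript: dict, names: dict[str, str]) -> list[dict]:
    """Map diarization IDs to real names and merge consecutive segments by one speaker."""
    turns: list[dict] = []
    for seg in transcript.get("segments", []):
        raw_id = seg.get("speaker", "UNKNOWN")  # unassigned segments may lack the key
        speaker = names.get(raw_id, raw_id)     # unmapped IDs pass through unchanged
        text = seg["text"].strip()
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = seg["end"]
        else:
            turns.append({"speaker": speaker, "start": seg["start"],
                          "end": seg["end"], "text": text})
    return turns


def turns_to_text(turns: list[dict]) -> str:
    """Render turns as [HH:MM:SS] **Name**: text, one paragraph per turn."""
    return "\n\n".join(
        f"[{time.strftime('%H:%M:%S', time.gmtime(t['start']))}] **{t['speaker']}**: {t['text']}"
        for t in turns
    )
```

Build the `names` mapping by listening to the first few segments of each ID; diarization can also swap or split speakers, so spot-check any turn you plan to quote.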

**Falling back to plain Whisper.** If diarization is overkill or you can't get a Hugging Face token, call `transcribe_interview(..., diarize=False)` so the `--diarize` flag is omitted. WhisperX still produces accurate timestamped transcription, and you label speakers manually from context. `faster-whisper` (CTranslate2 backend) is the speed-optimized reimplementation that WhisperX itself builds on; it is a Python library rather than a standalone CLI, so you call it from a script. `whisper.cpp` is the C/C++ port for resource-constrained machines (Raspberry Pi, older laptops); it has no pyannote-based diarization, but it runs the smaller models comfortably on CPU.
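Calling faster-whisper directly looks roughly like this. It is a minimal sketch, assuming the `faster_whisper` package is installed; `WhisperModel` and `model.transcribe` are that library's API, while `transcribe_plain` and `segments_to_transcript` are hypothetical helpers that reshape the result into the same `{"segments": [...]}` form that `format_for_editing` reads:

```python
def segments_to_transcript(segments) -> dict:
    """Convert faster-whisper segment objects (with .start, .end, .text) to a plain dict."""
    return {"segments": [{"start": s.start, "end": s.end, "text": s.text} for s in segments]}


def transcribe_plain(audio_path: str, model_size: str = "small") -> dict:
    """Transcribe without diarization using faster-whisper on CPU."""
    # Imported here so the helper above stays usable without faster-whisper installed
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path, language="en")
    # `segments` is a lazy generator; transcription happens as it is consumed
    return segments_to_transcript(segments)
```

Because `model.transcribe` returns a generator, nothing is decoded until `segments_to_transcript` iterates it, which is why the conversion happens inside `transcribe_plain`.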

### Manual transcription template

For sensitive interviews or when AI transcription fails:

```markdown
## Transcript: [Source] - [Date]

**Recording file**: [filename]
**Duration**: [XX:XX]
**Transcribed by**: [name]
**Verified against recording**: [ ] Yes / [ ] No

---

[00:00:15] **Q**: [Your question]

[00:00:45] **A**: [Source response - verbatim, including ums, pauses noted as (...)]

[00:01:30] **Q**: [Follow-up]

[00:01:42] **A**: [Response]

---

## Notes
- [Anything not captured in audio: gestures, documents shown, etc.]

## Potential quotes
- [00:01:42] "Quote that stands out" - context: [why it matters]
```

## Quote extraction and verification

### Pull quotes workflow

```python
from dataclasses import dataclass
from typing import Optional
import re

@dataclass
class Quote:
    text: str
    timestamp: str
    speaker: str
    context: str
    verified: bool = False
    used_in: Optional[str] = None

class QuoteBank:
    """Manage quotes from interview transcripts."""

    def __init__(self):
        self.quotes = []

    def extract_quote(self, transcript: str, start_time: str,
                      end_time: str, speaker: str, context: str) -> Quote:
        """Extract and store a quote with metadata."""
        # Pull text between timestamps
        pattern = rf'\[{re.escape(start_time)}\](.+?)(?=\[\d|$)'
        match = re.search(pattern, transcript, re.DOTALL)

        if match:
            text = match.group(1).strip()