An Offline Meeting Transcriber Built on Swamp | matgreten.dev
I take "handwritten" meeting notes in Obsidian when I can. Some of these meetings I'd want transcribed as well, but they are high trust, private, and sensitive: the exact kinds of meetings you don't invite strangers to. So neither the notes nor the recordings should leave my laptop to be processed or stored by strangers.
On a whim, I had Claude Code build offline-meeting-transcriber: audio in, Granola-style markdown out, nothing leaves the MacBook. To be clear, I didn't write any of this by hand — Claude Code did the bash, the Python, and later the swamp models, while I steered, reviewed diffs, and made the design calls. The first version was a bash wrapper around four Python files. It worked. It was also a dead end the moment I wanted to use any piece of it for something else.
I had it migrated to swamp, the same move I made with ADW, but this time the lever was composability, not observability. The pieces wanted to be reused. Bash had them welded together.
What It Does
Running bin/meeting-process samples/standup.m4a "Engineering Standup" kicks off three stages locally:
mlx-whisper transcribes the audio to segment-level JSON. --no-speech-threshold 0.6 is in there because without it Whisper hallucinates "Thanks for watching!" on every silent stretch.
pyannote/speaker-diarization-3.1 tags each Whisper segment with the speaker whose timeline overlaps it the most. Segments reach the LLM as [SPEAKER_00]: …, [SPEAKER_01]: …. Label stability inside a meeting matters more than getting names right — renaming SPEAKER_00 → Alice is a one-line sed after the fact.
qwen3.6:35b-a3b-nvfp4 on local Ollama summarizes the transcript into a Granola-style note. Chunked at ~2500 tokens with 200-token overlap. The token estimator is len(text.split()) * 1.3 — no tiktoken, no extra dep, accurate enough for chunk boundaries.
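The speaker-tagging rule in the second stage (give each Whisper segment the speaker whose timeline overlaps it most) can be sketched in a few lines. This is a minimal illustration, not the project's actual merge code: it assumes Whisper segments and diarization turns have already been reduced to plain dicts with start, end, text, and speaker keys, and the SPEAKER_UNKNOWN fallback is a hypothetical choice for segments no turn overlaps.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals, in seconds (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="SPEAKER_UNKNOWN"):
    """Tag each segment with the speaker whose turns overlap it the most.

    `segments`: dicts with "start", "end", "text" (Whisper-style segments).
    `turns`: dicts with "start", "end", "speaker" (assumed shape of diarizer output).
    `unknown`: hypothetical fallback label when no turn overlaps a segment.
    """
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            ov = overlap_seconds(seg["start"], seg["end"], turn["start"], turn["end"])
            if ov > 0:
                # A speaker may have several turns inside one segment; sum them.
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": speaker})
    return labeled


def format_for_llm(labeled):
    """Render labeled segments as '[SPEAKER_xx]: text' lines."""
    return "\n".join(f"[{s['speaker']}]: {s['text'].strip()}" for s in labeled)


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.0, "text": " Morning, everyone."},
        {"start": 4.0, "end": 6.0, "text": " Morning."},
    ]
    turns = [
        {"start": 0.0, "end": 3.5, "speaker": "SPEAKER_00"},
        {"start": 3.5, "end": 6.0, "speaker": "SPEAKER_01"},
    ]
    print(format_for_llm(assign_speakers(segments, turns)))
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the label stable when the diarizer splits one person's speech into several short turns.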
Final markdown lands in ~/Obsidian/Meetings/Unsorted/ for now.
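The chunking in the summarization stage can be sketched as follows, using the same len(text.split()) * 1.3 estimator with a ~2500-token budget and a 200-token overlap. This is an illustrative sketch under assumptions: it packs whole transcript lines greedily and carries trailing lines forward as the overlap, which may differ from how the real summarizer splits its input. The function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: whitespace-separated words times 1.3 (no tokenizer needed)."""
    return int(len(text.split()) * 1.3)


def chunk_lines(lines, max_tokens=2500, overlap_tokens=200):
    """Greedily pack lines into chunks of about `max_tokens` estimated tokens.

    Each new chunk starts with the trailing lines of the previous chunk, up to
    `overlap_tokens`, so context carries across chunk boundaries.
    """
    chunks = []
    current, current_tokens = [], 0
    for line in lines:
        t = estimate_tokens(line)
        if current and current_tokens + t > max_tokens:
            chunks.append(current)
            # Build the overlap from the end of the chunk just emitted.
            # Skipping its first line guarantees the tail is a strict suffix,
            # so a chunk is never repeated in full.
            tail, tail_tokens = [], 0
            for prev in reversed(current[1:]):
                pt = estimate_tokens(prev)
                if tail_tokens + pt > overlap_tokens:
                    break
                tail.insert(0, prev)
                tail_tokens += pt
            current, current_tokens = tail, tail_tokens
        current.append(line)
        current_tokens += t
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    transcript = [f"[SPEAKER_00]: " + "word " * 300 for _ in range(20)]
    for i, chunk in enumerate(chunk_lines(transcript)):
        print(i, len(chunk), sum(estimate_tokens(line) for line in chunk))
```

A single line longer than the budget still becomes its own oversized chunk here; splitting inside a line would need a second pass.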
Why Swamp
Three things in the bash version were obviously reusable and obviously stuck:
Transcription. mlx_whisper doesn't care that it's transcribing a meeting. It would transcribe a podcast, an interview, a voice memo I left myself in the car. The bash script only knew about meetings.
Diarization. Same model, same merge logic, same [SPEAKER_xx]: … output. Useful anywhere I have multi-speaker audio.
Summarization with a Granola-style prompt. This one is meeting-shaped, but the chunking + merge infrastructure underneath it isn't. Different prompt, different downstream consumer, same machinery.
In bash, none of those were components. They were lines in a single script, with implicit data passing through filenames in out/, and the only way to "reuse" any of them was to copy-paste the script and edit it. That's the path I've been on for years and the path I keep regretting.
So I had the pipeline broken out into three swamp models under models/@mgreten/, all published on swamp.club:
@mgreten/mlx-whisper: wraps the binary and exposes transcribe(audioPath); the output is a typed transcript artifact.
@mgreten/pyannote-diarizer: takes the audio and the transcript artifact, returns a diarized artifact. Soft-fails when the HF token is missing; the workflow continues on the undiarized transcript. A bad diarization never blocks the note.
@mgreten/meeting-summarizer: takes the transcript artifact and a model tag and returns markdown. A separate write_note method lands the file in the vault. The combine_notes method (handwritten + analysis merge) lives here too.
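The diarizer's soft-fail behavior can be illustrated with a small wrapper. This is a sketch of the pattern only, not the model's real code: the HF_TOKEN variable name, the dict-shaped transcript, and the diarized and reason fields are all assumptions made for the example.

```python
import os
from typing import Callable, Mapping, Optional


def diarize_or_passthrough(
    transcript: dict,
    diarize: Callable[[dict, str], dict],
    env: Optional[Mapping[str, str]] = None,
    token_var: str = "HF_TOKEN",  # assumed name; the real model may read a different one
) -> dict:
    """Return a diarized transcript, or the original transcript if diarization cannot run.

    The result always has the transcript's keys plus "diarized" and "reason"
    (hypothetical fields), so downstream steps never need a special case.
    """
    env = os.environ if env is None else env
    token = env.get(token_var)
    if not token:
        # Missing credentials: continue on the undiarized transcript.
        return {**transcript, "diarized": False, "reason": f"{token_var} is not set"}
    try:
        return {**diarize(transcript, token), "diarized": True, "reason": None}
    except Exception as exc:
        # A failed diarization should not block the note.
        return {**transcript, "diarized": False, "reason": str(exc)}


if __name__ == "__main__":
    def fake_diarize(transcript: dict, token: str) -> dict:
        # Stand-in for the real diarizer: labels every segment SPEAKER_00.
        return {"segments": [{**s, "speaker": "SPEAKER_00"} for s in transcript["segments"]]}

    transcript = {"segments": [{"start": 0.0, "end": 1.0, "text": "hello"}]}
    print(diarize_or_passthrough(transcript, fake_diarize, env={}))
    print(diarize_or_passthrough(transcript, fake_diarize, env={"HF_TOKEN": "example"}))
```

Passing env explicitly keeps the fallback easy to exercise without touching the real environment.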
Each one has a typed input, a typed output, and exactly one job. The output of each step is a data artifact the next step pulls by name, not a file path I have to remember to clean up:
- name: summarize
  steps:
    - name: run-summarize
      task:
        type: model_method
        modelIdOrName: meeting-summarizer
        methodName: summarize
        inputs:
          transcriptJson: ${{ data.latest("pyannote-diarizer", inputs.noteName).attributes.transcriptJson }}
          instanceName: ${{ inputs.noteName }}
  dependsOn:
    - job: diarize
      condition:
        type: succeeded
The workflow YAML is one way of wiring those three models. It's not the only way. That's the whole point.
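To make "one way of wiring" concrete, here is a plain-Python sketch of typed stages composed two different ways. It is not swamp's API: the Segment and Transcript types, the stage signatures, and the fake stages are all hypothetical stand-ins that only illustrate how the same transcription component can serve both a meeting pipeline and a simpler caller.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float
    end: float
    text: str
    speaker: str = ""


@dataclass(frozen=True)
class Transcript:
    segments: List[Segment]


# Hypothetical stage signatures: each stage has one typed input and one typed output.
Transcribe = Callable[[str], Transcript]
Diarize = Callable[[str, Transcript], Transcript]
Summarize = Callable[[Transcript], str]


def meeting_note(audio_path: str, transcribe: Transcribe, diarize: Diarize,
                 summarize: Summarize) -> str:
    """Wiring 1: the full meeting pipeline."""
    return summarize(diarize(audio_path, transcribe(audio_path)))


def voice_memo_text(audio_path: str, transcribe: Transcribe) -> str:
    """Wiring 2: reuse only the transcription stage, with no meeting-specific steps."""
    return " ".join(s.text.strip() for s in transcribe(audio_path).segments)


# Fake stages so the sketch runs without any models installed.
def fake_transcribe(audio_path: str) -> Transcript:
    return Transcript([Segment(0.0, 1.0, " hello"), Segment(1.0, 2.0, " world")])


def fake_diarize(audio_path: str, transcript: Transcript) -> Transcript:
    return Transcript([replace(s, speaker="SPEAKER_00") for s in transcript.segments])


def fake_summarize(transcript: Transcript) -> str:
    return "\n".join(f"- [{s.speaker}] {s.text.strip()}" for s in transcript.segments)


if __name__ == "__main__":
    print(meeting_note("standup.m4a", fake_transcribe, fake_diarize, fake_summarize))
    print(voice_memo_text("memo.m4a", fake_transcribe))
```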
What Composability Actually Buys Me
Hermes is the obvious next caller. When I want my agent to be able to transcribe a recording I just dropped into it, I don't expose bin/meeting-process to it as a shell command. I expose the model. Same typed inputs, same typed outputs, no shell quoting, no parsing stdout. The bash wrapper still exists for me at the terminal — it shells out to swamp now and gives me a progress counter — but it's no longer the only entry point.
The watch-folder daemon is the next one after that. v2 territory, not built. But when I build it, it's calling the workflow, not re-implementing it.
This is the part bash couldn't give me. Not "the pipeline is observable." The pipeline is separable. The day I want to reuse the diarizer in something that has nothing to do with meetings, I'm not...