Dictating anywhere with NVIDIA open models - Nemotron ASR + Tambourine

Experiments in voice AI dictation

I saw NVIDIA’s announcement on X about their new series of open Nemotron models and ended up reading a blog post from Daily.co about building voice agents with Nemotron and Pipecat.

I’ve been using voice dictation daily since building Tambourine, so I decided to test Nemotron with it. It’s fast enough for real-time use and the accuracy is solid. Unfortunately, you do need a CUDA GPU. Here’s how I set it up.

My Setup

Tambourine is the desktop app that ties everything together. It captures audio, sends it through the pipeline, and types the result wherever my cursor is.

[Microphone] -> [Nemotron ASR] -> [LLM] -> [Insert text at cursor]

There are great options for AI voice dictation like Wispr Flow or Willow, but Tambourine is open source and gives you full control over the pipeline: you can swap the STT/ASR and LLM providers, customize the formatting prompts, and import dictionaries for different domains.

As Tambourine is built on top of Pipecat, adding Nemotron as an ASR provider was straightforward.

I’m running this on a desktop with an RTX 4080 Super. Nemotron ASR requires a GPU with CUDA support, which means Windows or Linux only. macOS users can still use Tambourine with cloud STT/ASR providers.

You can use any LLM provider compatible with Pipecat. I tend to use Cerebras with gpt-oss-120b for fast cloud inference.

Setting Up Nemotron ASR

To run only the ASR model via Docker, I forked Pipecat’s example and stripped it down to an ASR-only image.

git clone https://github.com/kstonekuan/nemotron-january-2026
cd nemotron-january-2026

docker build -f Dockerfile.asr -t nemotron-asr .

docker run --gpus all -p 9765:9765 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  nemotron-asr

Once the container is running, you can check its logs in Docker.
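If you would rather not look up the container ID by hand, a minimal sketch along these lines may help. It assumes the image tag nemotron-asr from the build command above. The helper names asr_container_id and asr_logs are hypothetical and not part of the repo.

```shell
# Hypothetical helpers, not part of the nemotron-january-2026 repo.

# Print the ID of the first running container created from the given image.
# Defaults to the "nemotron-asr" tag used in the build step.
asr_container_id() {
  docker ps --filter "ancestor=${1:-nemotron-asr}" --format '{{.ID}}' | head -n 1
}

# Follow the last 50 log lines of that container, or report that none is running.
asr_logs() {
  id="$(asr_container_id "$@")"
  if [ -z "$id" ]; then
    echo "no running container found for image ${1:-nemotron-asr}" >&2
    return 1
  fi
  docker logs --tail 50 -f "$id"
}
```

After sourcing these in your shell, running asr_logs follows the output of the ASR container until you press Ctrl+C.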

Setting Up Tambourine

See the Tambourine repo for full setup instructions.

To use Nemotron, set NEMOTRON_ASR_URL in .env and select Nemotron as the STT provider in the app settings.

NEMOTRON_ASR_URL=ws://localhost:9765/

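If you are running the Tambourine server in Docker as well, you might have to use the following instead, since localhost inside a container refers to the container itself rather than the host: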

NEMOTRON_ASR_URL=ws://host.docker.internal:9765/
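Both values point at the same port, and only the host part changes depending on where the Tambourine server runs. To check which one is reachable from a given environment, a rough sketch like the following can split the URL into host and port and probe it. The function names are hypothetical, the probe assumes a netcat (nc) build that supports -z, and IPv6 literals are not handled.

```shell
# Hypothetical helpers for sanity-checking NEMOTRON_ASR_URL.

# Split a ws:// or wss:// URL into "host port" using POSIX parameter expansion.
ws_host_port() {
  url="$1"
  case "$url" in
    ws://*)  rest="${url#ws://}";  default_port=80 ;;
    wss://*) rest="${url#wss://}"; default_port=443 ;;
    *) echo "not a WebSocket URL: $url" >&2; return 1 ;;
  esac
  authority="${rest%%/*}"   # drop any path after the first slash
  host="${authority%%:*}"   # text before the colon, if any
  case "$authority" in
    *:*) port="${authority##*:}" ;;
    *)   port="$default_port" ;;
  esac
  echo "$host $port"
}

# Succeed if something accepts TCP connections at the URL's host and port.
# Assumes nc with -z support is installed; this only tests the TCP port,
# it does not perform a WebSocket handshake.
asr_reachable() {
  hp="$(ws_host_port "$1")" || return 1
  nc -z ${hp% *} ${hp#* }
}
```

For example, asr_reachable ws://localhost:9765/ run on the host, or asr_reachable ws://host.docker.internal:9765/ run inside the Tambourine container, should succeed once the ASR container is up. On plain Docker Engine for Linux, host.docker.internal is usually not defined by default; starting the Tambourine container with --add-host=host.docker.internal:host-gateway is one common way to provide it.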

Wrap Up

I’ve really enjoyed exploring this stack, and look forward to trying out new models as they come out. AI-powered voice dictation has changed how I interact with all my apps. Especially when it comes to using AI tools like Claude Code or ChatGPT, I find myself reaching for the keyboard less and just thinking out loud, trusting Tambourine to capture my intent on top of the words.

Quick Links: