I don’t know why I started doing this today, but since it worked … I’m a bit thrilled.
The repo to reference is for Whisper.NET — the .NET/C# implementation of the OpenAI Whisper voice transcription technology.
Recipe
First off, you want to use this program as the foundation for everything that follows. Let’s get going:
1/ Make a .NET console project in your directory of choice
% dotnet new console
2/ Then you want to install a few packages
% dotnet add package Whisper.net
% dotnet add package Whisper.net.Runtime.CoreML
3/ Next you want to clone the repo for Whisper.cpp to generate the special .mlmodelc
model you need for accelerated inference on your Arm Mac. Those instructions are all here.
- Make a Python environment
- Install cool ML packages
- Generate a model appropriate for M-series Macs that is GPU-enhanced
% cd <to that repo locally>
% conda create -n py310-whisper python=3.10 -y
% conda activate py310-whisper
% pip install ane_transformers
% pip install openai-whisper
% pip install coremltools
% models/generate-coreml-model.sh base.en
Awesome. You should now have a generated folder:
./models/ggml-base.en-encoder.mlmodelc
4/ You’ll need to download a model from HuggingFace:
% bash ./models/download-ggml-model.sh base.en
5/ Keep in mind that Whisper needs audio recorded as 16-bit, 16 kHz WAV files. You might bump into that issue later. And on macOS there’s no obvious NuGet package for recording audio, so I used ffmpeg,
which is a bit clumsy but worked fine for my needs.
% brew install ffmpeg
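If you already have a recording in some other format (or at the wrong sample rate), ffmpeg can also convert it to the 16-bit, 16 kHz WAV that Whisper expects. A minimal sketch, where input.m4a and input16k.wav are placeholder file names:

```shell
# Convert an existing recording to the format Whisper expects:
#   -ar 16000      resamples to 16 kHz
#   -ac 1          downmixes to mono
#   -c:a pcm_s16le forces 16-bit PCM samples
#   -y             overwrites the output if it already exists
ffmpeg -y -i input.m4a -ar 16000 -ac 1 -c:a pcm_s16le input16k.wav
```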
Note there are a few other things you can do directly from the Whisper.cpp repo, documented in its macOS Arm Core ML support section, that worked when I ran them, but I didn’t need them for my little example app in .NET/C#.
6/ Jump back to the original .NET/C# console project you were working on and be sure to copy the ./models/ggml-base.en-encoder.mlmodelc
folder to the root of your project, along with the ggml-base.en.bin
model file.
For reference, my directory looks like this:
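In rough outline, the project root ends up containing the pieces from the steps above. MyWhisperApp is a placeholder for whatever `dotnet new console` named your project:

```
MyWhisperApp/
├── MyWhisperApp.csproj              (created by `dotnet new console`)
├── Program.cs                       (the console app)
├── ggml-base.en.bin                 (downloaded model file)
└── ggml-base.en-encoder.mlmodelc/   (the generated Core ML folder)
```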

The Finish Line
Okay! You should be able to record a file and transcribe it with the following code:
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
using Whisper.net;
using Whisper.net.Ggml;

class Program
{
    public static async Task Main(string[] args)
    {
        var ggmlType = GgmlType.Base;
        var modelFileName = "ggml-base.en.bin";
        var wavFileName = "recorded_audio.wav";

        Console.WriteLine("Press Enter to start recording...");
        Console.ReadLine();
        Console.WriteLine("Recording... Press Enter to stop.");
        await RecordAudio(wavFileName);
        Console.WriteLine("Recording stopped. Transcribing...");

        if (!File.Exists(modelFileName))
        {
            await DownloadModel(modelFileName, ggmlType);
        }

        using var whisperFactory = WhisperFactory.FromPath(modelFileName);
        using var processor = whisperFactory.CreateBuilder()
            .WithLanguage("auto")
            .Build();

        using var fileStream = File.OpenRead(wavFileName);
        await foreach (var result in processor.ProcessAsync(fileStream))
        {
            Console.WriteLine($"{result.Start}->{result.End}: {result.Text}");
        }
    }

    private static async Task RecordAudio(string outputPath)
    {
        using var ffmpegProcess = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = "ffmpeg",
                // You want -y in here to overwrite the file if it exists -- otherwise ffmpeg stalls.
                // You also want the input device index correct; list the devices with
                // `ffmpeg -f avfoundation -list_devices true -i ""`
                Arguments = $"-f avfoundation -y -i \":3\" -ar 16000 {outputPath}",
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false
            }
        };

        ffmpegProcess.Start();

        // Wait for user input to stop recording
        await Task.Run(() => Console.ReadLine());

        // Ask ffmpeg to finish cleanly by sending 'q' on stdin
        ffmpegProcess.StandardInput.Write('q');
        await ffmpegProcess.WaitForExitAsync();
    }

    private static async Task DownloadModel(string fileName, GgmlType ggmlType)
    {
        Console.WriteLine($"Downloading Model {fileName}");
        using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(ggmlType);
        using var fileWriter = File.OpenWrite(fileName);
        await modelStream.CopyToAsync(fileWriter);
    }
}
And then just a simple
% dotnet run
and you’re off to the races. Good luck!
Why am I doing this?
I wanted to start polling the microphone for input to Semantic Kernel and as I go deeper into the .NET/C# world I needed something like this. Of course I could have just called the OAI endpoint but since I’ve been watching the cool kids use local models, I couldn’t help but see if this would really work.