[{"data":1,"prerenderedAt":962},["ShallowReactive",2],{"blog-post-en-016_on-device-speech-models-coreml":3,"blog-translations-016_on-device-speech-models-coreml":951},{"id":4,"title":5,"body":6,"date":935,"description":936,"draft":937,"extension":938,"image":939,"meta":940,"navigation":321,"path":941,"publishDate":935,"readingTime":261,"seo":942,"stem":943,"tags":944,"translationKey":949,"__hash__":950},"blog_en/blog/en/016_on-device-speech-models-coreml.md","Shipping ML Models on macOS: What Nobody Warns You About",{"type":7,"value":8,"toc":928},"minimark",[9,13,16,21,24,94,101,108,155,161,164,168,171,177,180,264,267,380,383,388,391,394,512,520,525,528,531,611,614,618,621,626,629,746,757,762,765,768,779,783,786,868,876,879,883,889,895,901,908,911,914,924],[10,11,12],"p",{},"When I started building Yakki, the transcription engine was Whisper. OpenAI had released the model, the community had ported it to CoreML via WhisperKit, and it worked. It was accurate. It supported 99 languages. And it took 5 seconds to load, consumed 1.8GB of RAM, and required users to download 626MB of model files before they could say a single word.",[10,14,15],{},"Then I found Parakeet, NVIDIA's streaming ASR engine, ported to Apple platforms as FluidAudio. Smaller models (170-600MB). Streaming transcription (words appear as you speak, not after). Sub-second initialization. Suddenly the app could start fast enough to disappear.",[17,18,20],"h2",{"id":19},"two-engines-one-protocol","Two Engines, One Protocol",[10,22,23],{},"Users switch between Parakeet and Whisper in settings. 
The rest of the app shouldn't care which one is active:",[25,26,31],"pre",{"className":27,"code":28,"language":29,"meta":30,"style":30},"language-swift shiki shiki-themes github-dark github-dark","protocol TranscriptionProvider {\n    func transcribe(_ audio: [Float]) async throws -> String\n}\n","swift","",[32,33,34,51,88],"code",{"__ignoreMap":30},[35,36,39,43,47],"span",{"class":37,"line":38},"line",1,[35,40,42],{"class":41},"sOPea","protocol",[35,44,46],{"class":45},"sFR8T"," TranscriptionProvider",[35,48,50],{"class":49},"suv1-"," {\n",[35,52,54,57,60,63,66,69,73,76,79,82,85],{"class":37,"line":53},2,[35,55,56],{"class":41},"    func",[35,58,59],{"class":45}," transcribe",[35,61,62],{"class":49},"(",[35,64,65],{"class":45},"_",[35,67,68],{"class":49}," audio: [",[35,70,72],{"class":71},"s8ozJ","Float",[35,74,75],{"class":49},"]) ",[35,77,78],{"class":41},"async",[35,80,81],{"class":41}," throws",[35,83,84],{"class":41}," ->",[35,86,87],{"class":71}," String\n",[35,89,91],{"class":37,"line":90},3,[35,92,93],{"class":49},"}\n",[10,95,96,97,100],{},"FluidAudio (Parakeet) and WhisperKit implement this protocol differently: FluidAudio streams partial results through an ",[32,98,99],{},"AsyncSequence",", while WhisperKit processes complete audio chunks. Either way, the consuming code doesn't need to know.",[10,102,103,104,107],{},"A ",[32,105,106],{},"ModelDescriptor"," enum represents all available models uniformly:",[25,109,111],{"className":27,"code":110,"language":29,"meta":30,"style":30},"enum TranscriptionEngine {\n    case fluidAudio(version: AsrModelVersion)\n    case whisperKit(model: String)\n}\n",[32,112,113,123,134,150],{"__ignoreMap":30},[35,114,115,118,121],{"class":37,"line":38},[35,116,117],{"class":41},"enum",[35,119,120],{"class":45}," TranscriptionEngine",[35,122,50],{"class":49},[35,124,125,128,131],{"class":37,"line":53},[35,126,127],{"class":41},"    case",[35,129,130],{"class":71}," fluidAudio",[35,132,133],{"class":49},"(version: AsrModelVersion)\n",[35,135,136,138,141,144,147],{"class":37,"line":90},[35,137,127],{"class":41},[35,139,140],{"class":71}," whisperKit",[35,142,143],{"class":49},"(model: ",[35,145,146],{"class":71},"String",[35,148,149],{"class":49},")\n",[35,151,153],{"class":37,"line":152},4,[35,154,93],{"class":49},[10,156,157,158,160],{},"This enum drives the model selection UI, download flow, initialization path, and transcription pipeline. Adding a new engine means adding a case and implementing the protocol. The rest of the app (settings, onboarding, telemetry) handles it automatically because everything operates on ",[32,159,106],{},", not engine-specific types. If I add a v4 model tomorrow, the change is in one place.",[10,162,163],{},"The real complexity is in initialization and lifecycle, not in the transcription call itself.",[17,165,167],{"id":166},"model-loading-the-part-nobody-warns-you-about","Model Loading: The Part Nobody Warns You About",[10,169,170],{},"ML model loading on macOS has three phases that each feel deceptively simple in isolation:",[10,172,173],{},[174,175,176],"strong",{},"Phase 1: Getting the model files.",[10,178,179],{},"Parakeet models ship in two versions: v2 (English-only, 170MB) and v3 (multilingual, 600MB). WhisperKit models range from tiny (74MB) to large-v3 (1.5GB). 
Users pick their model during onboarding, and I download it to Application Support.",[25,181,183],{"className":27,"code":182,"language":29,"meta":30,"style":30},"public func downloadModel(_ descriptor: ModelDescriptor) async throws -> URL {\n    guard !isDownloading else {\n        throw ModelDownloadError.downloadFailed(\"Download already in progress\")\n    }\n    // ...\n}\n",[32,184,185,212,228,247,252,259],{"__ignoreMap":30},[35,186,187,190,193,196,198,200,203,205,207,209],{"class":37,"line":38},[35,188,189],{"class":41},"public",[35,191,192],{"class":41}," func",[35,194,195],{"class":45}," downloadModel",[35,197,62],{"class":49},[35,199,65],{"class":45},[35,201,202],{"class":49}," descriptor: ModelDescriptor) ",[35,204,78],{"class":41},[35,206,81],{"class":41},[35,208,84],{"class":41},[35,210,211],{"class":49}," URL {\n",[35,213,214,217,220,223,226],{"class":37,"line":53},[35,215,216],{"class":41},"    guard",[35,218,219],{"class":41}," !",[35,221,222],{"class":49},"isDownloading ",[35,224,225],{"class":41},"else",[35,227,50],{"class":49},[35,229,230,233,236,239,241,245],{"class":37,"line":90},[35,231,232],{"class":41},"        throw",[35,234,235],{"class":49}," ModelDownloadError.",[35,237,238],{"class":71},"downloadFailed",[35,240,62],{"class":49},[35,242,244],{"class":243},"s4wv1","\"Download already in progress\"",[35,246,149],{"class":49},[35,248,249],{"class":37,"line":152},[35,250,251],{"class":49},"    }\n",[35,253,255],{"class":37,"line":254},5,[35,256,258],{"class":257},"sJ8bj","    // ...\n",[35,260,262],{"class":37,"line":261},6,[35,263,93],{"class":49},[10,265,266],{},"The obvious mistake I made early on: downloading models nobody asked for. If the user selected Parakeet, I still downloaded Whisper's large-v3 model \"just in case.\" That's 626MB of bandwidth for a model they never selected. 
The fix was checking before downloading:",[25,268,270],{"className":27,"code":269,"language":29,"meta":30,"style":30},"let modelPath = getLocalModelPath(for: modelName)\nlet modelExists = FileManager.default.fileExists(atPath: modelPath)\n\nif modelExists {\n    print(\"Model already downloaded at \\(modelPath)\")\n} else {\n    print(\"Model not found locally, will download: \\(modelName)\")\n}\n",[32,271,272,294,317,323,331,349,358,375],{"__ignoreMap":30},[35,273,274,277,280,283,286,288,291],{"class":37,"line":38},[35,275,276],{"class":41},"let",[35,278,279],{"class":49}," modelPath ",[35,281,282],{"class":41},"=",[35,284,285],{"class":71}," getLocalModelPath",[35,287,62],{"class":49},[35,289,290],{"class":71},"for",[35,292,293],{"class":49},": modelName)\n",[35,295,296,298,301,303,306,309,311,314],{"class":37,"line":53},[35,297,276],{"class":41},[35,299,300],{"class":49}," modelExists ",[35,302,282],{"class":41},[35,304,305],{"class":49}," FileManager.default.",[35,307,308],{"class":71},"fileExists",[35,310,62],{"class":49},[35,312,313],{"class":71},"atPath",[35,315,316],{"class":49},": modelPath)\n",[35,318,319],{"class":37,"line":90},[35,320,322],{"emptyLinePlaceholder":321},true,"\n",[35,324,325,328],{"class":37,"line":152},[35,326,327],{"class":41},"if",[35,329,330],{"class":49}," modelExists {\n",[35,332,333,336,338,341,344,347],{"class":37,"line":254},[35,334,335],{"class":71},"    print",[35,337,62],{"class":49},[35,339,340],{"class":243},"\"Model already downloaded at ",[35,342,343],{"class":243},"\\(modelPath)",[35,345,346],{"class":243},"\"",[35,348,149],{"class":49},[35,350,351,354,356],{"class":37,"line":261},[35,352,353],{"class":49},"} ",[35,355,225],{"class":41},[35,357,50],{"class":49},[35,359,361,363,365,368,371,373],{"class":37,"line":360},7,[35,362,335],{"class":71},[35,364,62],{"class":49},[35,366,367],{"class":243},"\"Model not found locally, will download: ",[35,369,370],{"class":243},"\\(modelName)",[35,372,346],{"class":243},[35,374,149],{"class":49},[35,376,378],{"class":37,"line":377},8,[35,379,93],{"class":49},[10,381,382],{},"Embarrassingly straightforward. But it saved 626MB per installation for Parakeet users, which was most of my users. This sounds obvious. It wasn't obvious to me for three months.",[10,384,385],{},[174,386,387],{},"Phase 2: Loading into memory and compiling CoreML models.",[10,389,390],{},"This is where the real time goes. CoreML models need to be compiled the first time they're loaded on a device. This can take 10-30 seconds depending on the model and hardware. Subsequent loads are faster because the compiled model is cached, but that first launch is brutal.",[10,392,393],{},"FluidAudio handles this with a two-path loading strategy:",[25,395,397],{"className":27,"code":396,"language":29,"meta":30,"style":30},"do {\n    // Use loadFromCache when models are pre-downloaded (from onboarding)\n    _ = try await AsrModels.loadFromCache(version: selectedVersion)\n    print(\"Loaded models from cache\")\n} catch {\n    // Fallback to downloadAndLoad if cache not available\n    print(\"Cache load failed, downloading: \\(error.localizedDescription)\")\n    _ = try await AsrModels.downloadAndLoad(version: selectedVersion)\n}\n",[32,398,399,406,411,439,450,459,464,486,507],{"__ignoreMap":30},[35,400,401,404],{"class":37,"line":38},[35,402,403],{"class":41},"do",[35,405,50],{"class":49},[35,407,408],{"class":37,"line":53},[35,409,410],{"class":257},"    // Use loadFromCache when models are pre-downloaded (from onboarding)\n",[35,412,413,416,419,422,425,428,431,433,436],{"class":37,"line":90},[35,414,415],{"class":71},"    _",[35,417,418],{"class":41}," =",[35,420,421],{"class":41}," try",[35,423,424],{"class":41}," await",[35,426,427],{"class":49}," AsrModels.",[35,429,430],{"class":71},"loadFromCache",[35,432,62],{"class":49},[35,434,435],{"class":71},"version",[35,437,438],{"class":49},": selectedVersion)\n",[35,440,441,443,445,448],{"class":37,"line":152},[35,442,335],{"class":71},[35,444,62],{"class":49},[35,446,447],{"class":243},"\"Loaded models from cache\"",[35,449,149],{"class":49},[35,451,452,454,457],{"class":37,"line":254},[35,453,353],{"class":49},[35,455,456],{"class":41},"catch",[35,458,50],{"class":49},[35,460,461],{"class":37,"line":261},[35,462,463],{"class":257},"    // Fallback to downloadAndLoad if cache not available\n",[35,465,466,468,470,473,476,479,482,484],{"class":37,"line":360},[35,467,335],{"class":71},[35,469,62],{"class":49},[35,471,472],{"class":243},"\"Cache load failed, downloading: ",[35,474,475],{"class":243},"\\(error.",[35,477,478],{"class":49},"localizedDescription",[35,480,481],{"class":243},")",[35,483,346],{"class":243},[35,485,149],{"class":49},[35,487,488,490,492,494,496,498,501,503,505],{"class":37,"line":377},[35,489,415],{"class":71},[35,491,418],{"class":41},[35,493,421],{"class":41},[35,495,424],{"class":41},[35,497,427],{"class":49},[35,499,500],{"class":71},"downloadAndLoad",[35,502,62],{"class":49},[35,504,435],{"class":71},[35,506,438],{"class":49},[35,508,510],{"class":37,"line":509},9,[35,511,93],{"class":49},[10,513,514,516,517,519],{},[32,515,430],{}," is the fast path: it assumes the model files and compiled artifacts already exist. ",[32,518,500],{}," is the slow path for first-time users or when something got corrupted. I pre-download during onboarding so that the first real dictation session hits the fast path.",[10,521,522],{},[174,523,524],{},"Phase 3: Prewarming.",[10,526,527],{},"WhisperKit has a \"prewarm\" step that runs a dummy inference pass to initialize GPU buffers, allocate memory, and trigger any lazy CoreML compilation. 
This is the difference between a 3-second first transcription and a sub-second one.",[10,529,530],{},"The optimization that halved startup time: only prewarm the model the user actually selected.",[25,532,534],{"className":27,"code":533,"language":29,"meta":30,"style":30},"// Before: Always initialize (regardless of user choice)\nwhisperKitIntegration = WhisperKitIntegration.shared\n\n// After: Check user's model selection first\nif dictationConfig.localModelID?.starts(with: \"whisper-\") ?? false {\n    whisperKitIntegration = WhisperKitIntegration.shared\n}\n",[32,535,536,541,551,555,560,598,607],{"__ignoreMap":30},[35,537,538],{"class":37,"line":38},[35,539,540],{"class":257},"// Before: Always initialize (regardless of user choice)\n",[35,542,543,546,548],{"class":37,"line":53},[35,544,545],{"class":49},"whisperKitIntegration ",[35,547,282],{"class":41},[35,549,550],{"class":49}," WhisperKitIntegration.shared\n",[35,552,553],{"class":37,"line":90},[35,554,322],{"emptyLinePlaceholder":321},[35,556,557],{"class":37,"line":152},[35,558,559],{"class":257},"// After: Check user's model selection first\n",[35,561,562,564,567,570,573,576,578,581,584,587,590,593,596],{"class":37,"line":254},[35,563,327],{"class":41},[35,565,566],{"class":49}," dictationConfig.localModelID",[35,568,569],{"class":41},"?",[35,571,572],{"class":49},".",[35,574,575],{"class":71},"starts",[35,577,62],{"class":49},[35,579,580],{"class":71},"with",[35,582,583],{"class":49},": ",[35,585,586],{"class":243},"\"whisper-\"",[35,588,589],{"class":49},") ",[35,591,592],{"class":41},"??",[35,594,595],{"class":71}," false",[35,597,50],{"class":49},[35,599,600,603,605],{"class":37,"line":261},[35,601,602],{"class":49},"    whisperKitIntegration ",[35,604,282],{"class":41},[35,606,550],{"class":49},[35,608,609],{"class":37,"line":360},[35,610,93],{"class":49},[10,612,613],{},"This one conditional brought Parakeet users from 4.2s startup to 2.1s. 
I was spending half the startup time initializing a speech engine that wasn't going to be used.",[17,615,617],{"id":616},"streaming-vs-chunk-processing","Streaming vs. Chunk Processing",[10,619,620],{},"Streaming makes the app feel alive. That's the real difference between Parakeet and Whisper. Not accuracy. Not language support. It's whether the user sees words appearing as they speak or waits for a \"processing\" indicator to finish.",[10,622,623],{},[174,624,625],{},"Parakeet (FluidAudio): Streaming.",[10,627,628],{},"Audio buffers flow continuously into the engine. Words appear on screen as you speak. The engine maintains a sliding window of context and emits two kinds of updates: \"hypothesis\" text (best guess so far, subject to change) and \"confirmed\" text (finalized, won't change).",[25,630,632],{"className":27,"code":631,"language":29,"meta":30,"style":30},"let lowLatencyConfig = StreamingAsrConfig(\n    chunkSeconds: 11.0,              // Stable transcription window\n    hypothesisChunkSeconds: 0.2,      // Faster initial feedback\n    leftContextSeconds: 2.0,          // Keep context for accuracy\n    rightContextSeconds: 1.0,         // Catch tails of words\n    minContextForConfirmation: 4.0,   // Quality confirmation threshold\n    confirmationThreshold: 0.85       // Prevent flickering confirmations\n)\n",[32,633,634,649,665,681,697,713,729,742],{"__ignoreMap":30},[35,635,636,638,641,643,646],{"class":37,"line":38},[35,637,276],{"class":41},[35,639,640],{"class":49}," lowLatencyConfig ",[35,642,282],{"class":41},[35,644,645],{"class":71}," StreamingAsrConfig",[35,647,648],{"class":49},"(\n",[35,650,651,654,656,659,662],{"class":37,"line":53},[35,652,653],{"class":71},"    chunkSeconds",[35,655,583],{"class":49},[35,657,658],{"class":71},"11.0",[35,660,661],{"class":49},",              ",[35,663,664],{"class":257},"// Stable transcription window\n",[35,666,667,670,672,675,678],{"class":37,"line":90},[35,668,669],{"class":71},"    hypothesisChunkSeconds",[35,671,583],{"class":49},[35,673,674],{"class":71},"0.2",[35,676,677],{"class":49},",      ",[35,679,680],{"class":257},"// Faster initial feedback\n",[35,682,683,686,688,691,694],{"class":37,"line":152},[35,684,685],{"class":71},"    leftContextSeconds",[35,687,583],{"class":49},[35,689,690],{"class":71},"2.0",[35,692,693],{"class":49},",          ",[35,695,696],{"class":257},"// Keep context for accuracy\n",[35,698,699,702,704,707,710],{"class":37,"line":254},[35,700,701],{"class":71},"    rightContextSeconds",[35,703,583],{"class":49},[35,705,706],{"class":71},"1.0",[35,708,709],{"class":49},",         ",[35,711,712],{"class":257},"// Catch tails of words\n",[35,714,715,718,720,723,726],{"class":37,"line":261},[35,716,717],{"class":71},"    minContextForConfirmation",[35,719,583],{"class":49},[35,721,722],{"class":71},"4.0",[35,724,725],{"class":49},",   ",[35,727,728],{"class":257},"// Quality confirmation threshold\n",[35,730,731,734,736,739],{"class":37,"line":360},[35,732,733],{"class":71},"    confirmationThreshold",[35,735,583],{"class":49},[35,737,738],{"class":71},"0.85",[35,740,741],{"class":257},"       // Prevent flickering confirmations\n",[35,743,744],{"class":37,"line":377},[35,745,149],{"class":49},[10,747,748,749,752,753,756],{},"Every parameter was tuned through real-world testing. The ",[32,750,751],{},"hypothesisChunkSeconds"," at 0.2s means the user sees their first word within 200ms of speaking, fast enough that it feels like the app is listening in real time. The ",[32,754,755],{},"confirmationThreshold"," at 0.85 prevents the unsettling experience of confirmed text changing after you thought it was done.",[10,758,759],{},[174,760,761],{},"Whisper (WhisperKit): Chunk-based.",[10,763,764],{},"You record audio, stop, and send the complete buffer for processing. The model sees all the audio at once and returns the full transcription. 
Higher accuracy on longer passages, but zero feedback while you're speaking.",[10,766,767],{},"For push-to-talk dictation, the distinction matters less than you'd think. Users hold a key, speak for 5-15 seconds, release. With Parakeet, they see text flowing in real time. With Whisper, they see a brief \"processing\" indicator then the complete text. Both work. But the psychological gap is real.",[10,769,770,771,774,775,778],{},"Yakki supports 25 languages, which further constrains the choice. Parakeet v3 covers all 25; Whisper covers everything. The ",[32,772,773],{},"LanguageManager"," selects the model version, and the ",[32,776,777],{},"FluidAudioManager"," initializes with it. One indirection layer that means language support changes never touch the transcription pipeline.",[17,780,782],{"id":781},"memory-the-invisible-budget","Memory: The Invisible Budget",[10,784,785],{},"Running ML models on a Mac is a memory game:",[787,788,789,808],"table",{},[790,791,792],"thead",{},[793,794,795,799,802,805],"tr",{},[796,797,798],"th",{},"Model",[796,800,801],{},"Download Size",[796,803,804],{},"RAM Usage",[796,806,807],{},"Init Time",[809,810,811,826,840,854],"tbody",{},[793,812,813,817,820,823],{},[814,815,816],"td",{},"Parakeet v3",[814,818,819],{},"~600MB",[814,821,822],{},"~400MB",[814,824,825],{},"\u003C1s",[793,827,828,831,834,837],{},[814,829,830],{},"Whisper tiny",[814,832,833],{},"74MB",[814,835,836],{},"~200MB",[814,838,839],{},"~1s",[793,841,842,845,848,851],{},[814,843,844],{},"Whisper small",[814,846,847],{},"472MB",[814,849,850],{},"~500MB",[814,852,853],{},"~2s",[793,855,856,859,862,865],{},[814,857,858],{},"Whisper large-v3",[814,860,861],{},"1.5GB",[814,863,864],{},"~1.8GB",[814,866,867],{},"~5s",[10,869,870,871,875],{},"The critical insight: these costs are only paid for the active model. Before the conditional initialization fix, I was loading ",[872,873,874],"em",{},"both"," engines on startup. 
A Parakeet user was paying 400MB for Parakeet plus 1.8GB for Whisper large-v3. After the fix, they pay 400MB. Period.",[10,877,878],{},"This matters for a menu bar app. Users don't think about Yakki's memory usage; it's supposed to be invisible. When Activity Monitor shows 2.2GB for a dictation utility, the user gets nervous. When it shows 400MB, they forget about it. That's the goal.",[17,880,882],{"id":881},"what-id-do-differently","What I'd Do Differently",[10,884,885,888],{},[174,886,887],{},"Bundle a small model with the app."," The current flow requires a download during onboarding. Users with slow internet or corporate firewalls sometimes fail here. If I bundled Parakeet v3 directly in the app bundle (adding ~100MB to the download since it compresses well), first-time experience would be: install, open, start dictating. No onboarding download step. I haven't done this yet because of App Store size concerns, but for direct distribution it's a clear win.",[10,890,891,894],{},[174,892,893],{},"Lazy-load everything."," I still initialize the transcription engine at app launch. For users who open the app but don't dictate for 10 minutes, that's wasted work. A better pattern: initialize on first hotkey press. The 0.5-1s delay on the very first dictation is negligible, and every subsequent session is pre-initialized.",[10,896,897,900],{},[174,898,899],{},"Measure CoreML compilation time separately."," The startup telemetry lumps together model loading and CoreML compilation. But compilation only happens once (the compiled model is cached). I should track these separately so I can tell the difference between \"slow because first launch\" and \"slow because something is wrong.\"",[10,902,903,904,907],{},"The common thread: every optimization that mattered was about choosing ",[872,905,906],{},"not"," to do work. Not downloading models nobody selected. Not initializing engines nobody chose. Not prewarming models that aren't active. 
The hardest part of shipping ML on-device isn't making it fast. It's recognizing all the work you're doing that you shouldn't be.",[10,909,910],{},"Open Activity Monitor. Find your app. If you're surprised by the number, you probably have a model loaded that nobody asked for.",[912,913],"hr",{},[10,915,916],{},[872,917,918,919,572],{},"This is part of an ongoing series about building Yakki, a macOS dictation app. For how the context-aware detection system builds on this transcription engine, see ",[920,921,923],"a",{"href":922},"15-context-aware-dictation-email-detection","Your macOS App Should Know What the User Is Doing",[925,926,927],"style",{},"html pre.shiki code .sOPea, html code.shiki .sOPea{--shiki-default:#F97583;--shiki-dark:#F97583}html pre.shiki code .sFR8T, html code.shiki .sFR8T{--shiki-default:#B392F0;--shiki-dark:#B392F0}html pre.shiki code .suv1-, html code.shiki .suv1-{--shiki-default:#E1E4E8;--shiki-dark:#E1E4E8}html pre.shiki code .s8ozJ, html code.shiki .s8ozJ{--shiki-default:#79B8FF;--shiki-dark:#79B8FF}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html pre.shiki code .s4wv1, html code.shiki 
.s4wv1{--shiki-default:#9ECBFF;--shiki-dark:#9ECBFF}html pre.shiki code .sJ8bj, html code.shiki .sJ8bj{--shiki-default:#6A737D;--shiki-dark:#6A737D}",{"title":30,"searchDepth":90,"depth":90,"links":929},[930,931,932,933,934],{"id":19,"depth":53,"text":20},{"id":166,"depth":53,"text":167},{"id":616,"depth":53,"text":617},{"id":781,"depth":53,"text":782},{"id":881,"depth":53,"text":882},"2026-04-01","Two speech engines, one app, and the optimization that halved startup time. A practical guide to managing CoreML model loading, memory budgets, and the tradeoffs between streaming and chunk-based transcription.",false,"md",null,{},"/blog/en/016_on-device-speech-models-coreml",{"title":5,"description":936},"blog/en/016_on-device-speech-models-coreml",[945,29,946,947,948],"engineering","macos","machine-learning","coreml","on-device-speech-models-coreml","uD7GKQFiLZgnrRe9ouCH80fKsoKN-K7rO9xOtcQTEaY",[952,957],{"code":953,"name":954,"title":955,"path":956},"es","Español","Integrar modelos de ML en macOS: lo que nadie te cuenta","/es/blog/016_on-device-speech-models-coreml",{"code":958,"name":959,"title":960,"path":961},"fr","Français","Intégrer des modèles de ML dans une app macOS : ce qu'on ne vous dit pas","/fr/blog/016_on-device-speech-models-coreml",1775207564257]