NVIDIA Launches Open-Source AI Tools to Bridge Language Gap Across Europe

Artificial intelligence may be transforming industries, but it still speaks only a fraction of the world’s 7,000 languages—leaving many communities out of the conversation. NVIDIA is setting out to change that, particularly for Europe’s diverse linguistic landscape.

The company has unveiled a new suite of open-source tools designed to help developers build advanced speech AI for 25 European languages, from widely spoken ones to those often overlooked in tech, such as Croatian, Estonian, and Maltese.

The aim is simple but far-reaching: make it possible to create voice-powered services—from multilingual chatbots to lightning-fast translation tools—that truly understand local languages.

At the heart of this release is Granary, a vast speech library containing roughly one million hours of curated audio. This resource is designed to train AI systems in the nuances of speech recognition and translation. Supporting Granary are two purpose-built AI models:

  • Canary-1b-v2 – optimized for high-accuracy transcription and translation, even with complex tasks.
  • Parakeet-tdt-0.6b-v3 – engineered for real-time applications where speed is critical.

Both models and the dataset are already available on Hugging Face, and the research paper detailing Granary will be presented later this month at the Interspeech conference in the Netherlands.

What sets this project apart is how the data was created. Traditionally, training AI for speech requires vast amounts of manually annotated audio—a slow and costly process. NVIDIA’s speech AI team, working with Carnegie Mellon University and Fondazione Bruno Kessler, built an automated data-processing pipeline using its NeMo toolkit. This system turns raw, unlabelled recordings into structured, high-quality data in a fraction of the time.
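
To make the idea concrete, here is a minimal sketch of one filtering step such a pipeline might include. It is not NVIDIA's actual implementation: the `Segment` type, the `build_manifest` helper, the confidence score, and every threshold are hypothetical and chosen only for illustration. The one grounded detail is the output shape, which follows the JSON-lines manifest convention (`audio_filepath`, `duration`, `text`) that NeMo uses for speech data.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    """One candidate audio segment with a machine-generated transcript (hypothetical type)."""
    audio_filepath: str
    duration: float    # length in seconds
    text: str          # pseudo-label produced by an existing speech model
    confidence: float  # 0.0-1.0 score from that model (assumed to be available)


def build_manifest(segments, min_confidence=0.9, min_duration=1.0, max_duration=40.0):
    """Keep segments that pass simple quality checks and return JSON-lines strings.

    All thresholds are illustrative assumptions, not values from the Granary paper.
    """
    lines = []
    for seg in segments:
        text = " ".join(seg.text.split())  # collapse stray whitespace
        if not text:
            continue  # drop empty transcripts
        if seg.confidence < min_confidence:
            continue  # drop low-confidence pseudo-labels
        if not (min_duration <= seg.duration <= max_duration):
            continue  # drop clips that are too short or too long
        record = {
            "audio_filepath": seg.audio_filepath,
            "duration": seg.duration,
            "text": text,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


if __name__ == "__main__":
    candidates = [
        Segment("clip_hr.wav", 5.0, "  dobar   dan ", 0.95),
        Segment("clip_et.wav", 5.0, "tere", 0.50),
    ]
    for line in build_manifest(candidates):
        print(line)
```

In this sketch, only segments with a non-empty transcript, a high enough confidence score, and a reasonable duration survive, so the resulting manifest can feed training without manual annotation.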

The gains are measurable. Granary lets developers reach a target accuracy with about half the data that other popular datasets require. Canary matches the accuracy of models three times its size while running up to ten times faster. Parakeet can process a 24-minute meeting in a single pass, detect the language automatically, and produce transcripts complete with punctuation, capitalisation, and precise timestamps.

For developers in cities from Riga to Zagreb, this could be transformative—unlocking the ability to build professional-grade AI tools in their own languages.

By making these resources open and accessible, NVIDIA is not just releasing new models; it’s laying the groundwork for a more linguistically inclusive AI future—one where technology speaks to everyone, everywhere.
