
Solondais


Gladia raises $16M for AI transcription and analytics




Gladia, an AI transcription and audio intelligence provider, has raised $16 million in funding.

The Paris, France-based company will use the funding to develop an end-to-end audio infrastructure – starting with a new real-time audio transcription and analysis engine – enabling voice-first platforms to deliver more value to their users across borders with cutting-edge AI.

That’s a challenge for competitors like Otter.ai and Fireflies.ai, as well as other AI-based services that transcribe voice conversations into text. In an interview with VentureBeat, CEO Jean-Louis Quéguiner explained to me why he created the company.

“As you can hear with a beautiful French accent, I’m not an English speaker and I was extremely frustrated with the accents,” Quéguiner said. “That’s why I founded the company.”

I had a demo of the AI transcription, and it worked in real time as Quéguiner spoke English with his strong French accent. I’m used to services like Otter having a lot of wrong words in a transcript, but in Gladia’s first results page, I didn’t see any errors. He also showed how he could speak two different languages and how the system could switch from one language to another as needed.

XAnge led the round, with participation from Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures and Soma Capital.

Gladia uses AI for audio transcription.

Founded in 2022, Gladia has now raised a total of $20.3 million, with previous seed investments led by New Wave, Sequoia Capital (under the First Sequoia Arc program), Cocoa and GFC. Gladia was recently selected to participate in the AWS Generative AI Accelerator Program.

“Gladia represents the qualities we love to champion at XAnge: a bold global technology team at the forefront of AI innovation, with a proven business model to unlock new opportunities across all industries,” said Alexis du Peloux, partner at XAnge, in a press release. “In a rapidly evolving AI environment, Jean-Louis Quéguiner and his team executed extremely well, and we are proud to support Gladia for Series A.”

Since most current speech recognition models are trained primarily on English audio data and are therefore inherently biased, Gladia has prioritized creating the first truly multilingual real-time product.

The new optimized engine offers advanced real-time transcription in over 100 languages, as well as improved accent support and the unique ability to adapt to different languages on the fly.

Gladia’s new engine is unique in its ability to extract information from a call, such as caller sentiment, key information and conversation summary, in real time. This means it takes less than a second to generate both the transcript and information for a call or meeting using Gladia.

New real-time AI transcription

Gladia founders Jonathan Soto (left) and Jean-Louis Quéguiner.

Building an accurate, low-latency, multilingual engine in-house is a complex and resource-intensive task. It requires deep expertise in language understanding and real-time data processing, along with continuous optimization and maintenance. Real-time models require more computing power and may struggle to immediately produce accurate results due to limited context.

Gladia’s new product allows businesses to circumvent these challenges. The real-time speech-to-text engine delivers peak latency of less than 300 milliseconds without compromising accuracy, regardless of language, geography, or technology stack used.

“Companies are spending valuable time and resources trying to incorporate multiple AI capabilities into their existing platforms,” Gladia CTO Jonathan Soto said in a statement. “Our unique API is compatible with all existing technology stacks and protocols, including SIP, VoIP, FreeSwitch and Asterisk. This allows us to easily integrate real-time transcription and analytics into our clients’ AI platforms, so they can focus on providing the best services to their end users.”

What awaits us

The company’s first asynchronous transcription and audio intelligence API was launched in June 2023 and was based on a proprietary version of Whisper ASR.

It quickly gained traction in the enterprise market, particularly with meeting recorders and note-taking assistants. The API is now adopted by more than 600 customers worldwide, including Attention, Circleback, Method Financial, Recall, Sana and VEED.IO, and has more than 70,000 users.

“Gladia’s technology enables businesses in vertical markets that need industry-leading real-time transcription, including sales enablement and contact center platforms, to seamlessly move from post-call manual processing to proactive, low-latency workflows,” Quéguiner said. “Whether it’s automated CRM enrichment or real-time guidance for support agents, Gladia is designed to help businesses operate smarter and more efficiently in record time, without requiring in-house AI expertise.”

Gladia will use the new capital to advance its R&D efforts and will soon commercialize a unique AI toolkit for audio and expand its product offerings with additional a la carte models, including large language models (LLMs) and retrieval-augmented generation (RAG). With several design partners in the contact center as a service (CCaaS) segment, the company is currently testing an agent assistance solution powered by Gladia’s real-time AI engine. Additionally, Gladia will continue to expand its talent base while preparing for international expansion.

“We are multilingual and we have something called ‘code-switching,’ which makes it unique,” Quéguiner said. “You can start with one language and move on to another.”

He then showed me that he could start a call in English and have the system begin transcribing. Then he spoke some words in French, and the model correctly transcribed them in French.

“Keep in mind that (others) are not real-time right now, and this one is real-time,” he said. “Usually real time is a little less precise. You can also have your own personalized vocabulary in real time, which is quite unusual for us. We have the ability to extract information in real time.”

The service has an AI summarizer and will gain new optional features in the coming months. Quéguiner said his service can also correct acronyms and detect a switch to another language.

“The model we use is very similar to LLMs (large language models). It doesn’t have an encoder-decoder architecture, which is not the case with most of the models you’ve seen with Fireflies, for example.”

The market includes “meeting recorders,” Quéguiner said. The results can be turned into real-time insights, which can help salespeople close deals with prospects faster.

The company also works with call centers, allowing them to complete tasks 30% faster while on the phone thanks to improved accuracy. The company will charge a flat fee such as an hourly rate.