Edge AI

Pioneering Voice AI with Carleton University

February 3, 2025
3 min read

We sat down with Edge Signal CTO Burak Cakmak to discuss the recent "Voice AI Project" in collaboration with Carleton University.

Q: What inspired the Voice AI Project, and what challenges were you aiming to solve?

Burak: One of our customers had a unique use case: they wanted insights into what is spoken in their stores—keywords, brand mentions, campaign names, and sentiment analysis—without transferring or recording audio files. This was crucial to maintain privacy and confidentiality.

Edge computing became a must for this project. Beyond keyword and sentiment extraction, context was vital. For instance, how often is a product mentioned, and what’s the emotional tone? Are customers happy or upset? The challenge was achieving all this using resource-intensive large language models (LLMs) on edge devices, ensuring on-the-fly processing without saving audio recordings.

Q: That sounds like a tall order. How did the Carleton University team contribute to overcoming these challenges?

Burak: The team from Carleton University was a great help! Aidan Lochbihler, Julien Lariviere-Chartier, and Phil Masson, under Dr. Bruce Wallace’s guidance, collaborated closely with our Edge Signal development team to design a system capable of:

  • Keyword extraction: Identifying specific terms like product names or campaign mentions.
  • Sentiment analysis: Understanding whether speakers are happy, angry, or neutral.
  • Context extraction: Interpreting the broader conversation and distinguishing between customers and store personnel.

Achieving this required running LLMs directly on edge devices—no cloud processing. The biggest hurdles included optimizing these heavyweight models for resource-constrained devices and supporting multiple languages seamlessly. Thanks to everyone’s hard work, we proved it’s doable.
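The on-device analysis Burak describes—spotting keywords and attaching a sentiment label without ever persisting audio—can be sketched roughly as follows. This is a minimal illustration, not Edge Signal’s actual stack: the keyword set is a made-up example, and the toy lexicon-based sentiment stands in for the on-device LLM the real system uses.

```python
# Hypothetical sketch of per-utterance analysis on an edge device.
# Names and the sentiment lexicon are illustrative placeholders.

KEYWORDS = {"spring sale", "acme cola", "loyalty card"}  # customer-supplied terms

def analyze_utterance(transcript: str) -> dict:
    """Extract keyword hits and a coarse sentiment label from one utterance."""
    text = transcript.lower()
    hits = sorted(k for k in KEYWORDS if k in text)

    # Toy lexicon-based sentiment; the real pipeline uses an on-device LLM.
    positive = {"great", "love", "happy", "thanks"}
    negative = {"broken", "upset", "refund", "angry"}
    words = text.split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"

    # Only this small structured result leaves the device; the audio never does.
    return {"keywords": hits, "sentiment": sentiment}

result = analyze_utterance("I love the Spring Sale")
# → {"keywords": ["spring sale"], "sentiment": "positive"}
```

The key property is that the raw audio and full transcript stay on the device; only the compact result (keyword hits, sentiment) is reported upstream.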

The collaboration was made possible in part through funding from the National Research Council of Canada Industrial Research Assistance Program (NRC IRAP) for Carleton University’s SAM3 innovation hub.

Q: How did you address privacy while maintaining accuracy?

Burak: Privacy was non-negotiable. We used LLMs to transcribe audio on the fly without saving recordings. To improve transcription accuracy, we introduced a two-pillar approach for context generation:

  • User-generated hints: Customers provide a list of keywords (e.g., brand names or campaign terms) to enhance transcription performance without retraining the model.
  • Auto-generated hints: Using computer vision, we analyzed images captured by in-store cameras to generate context-specific keywords automatically. These hints significantly improved the transcription quality.

By combining these approaches, we avoided retraining models for each customer in different countries and different languages—a process that’s both costly and impractical—while delivering accurate results. This was a big accomplishment!
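Merging the two hint sources into a single vocabulary prompt might look like the sketch below. It assumes a speech-to-text model that accepts a short free-text hint to bias decoding toward in-domain terms (as, for example, Whisper’s `initial_prompt` parameter does); the term lists are invented examples.

```python
def build_transcription_hint(user_keywords, vision_labels, max_terms=20):
    """Merge customer-supplied and camera-derived terms into one hint string.

    Many speech-to-text models accept a short free-text prompt that biases
    decoding toward in-domain vocabulary without any retraining.
    """
    # De-duplicate case-insensitively, preserving order: user hints first.
    seen, merged = set(), []
    for term in list(user_keywords) + list(vision_labels):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            merged.append(term)
    return ", ".join(merged[:max_terms])

hint = build_transcription_hint(
    ["Acme Cola", "Spring Sale"],              # user-generated hints
    ["soda aisle", "acme cola", "checkout"],   # auto-generated from camera frames
)
# → "Acme Cola, Spring Sale, soda aisle, checkout"
```

Because the bias is applied at decode time through a prompt rather than through model weights, the same base model serves every customer, country, and language.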

Q: What other challenges did you run into? And how did you handle multilingual support?

Burak: Splitting audio channels was critical. By isolating individual speakers, we provided cleaner, more structured input to the LLMs rather than feeding them a single raw, unstructured transcript. This greatly enhanced the system’s ability to extract sentiment and context from conversations.
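For a two-microphone setup recorded as stereo, per-speaker separation can be as simple as de-interleaving the channels before transcription. The sketch below is a minimal stdlib-only illustration for 16-bit stereo WAV input, assuming one speaker per channel; a real deployment with a shared microphone would need speaker diarization instead.

```python
import wave

def split_stereo_channels(path):
    """Split a 16-bit stereo WAV into two mono PCM byte streams.

    Stereo WAV frames interleave samples as L R L R ...; taking every
    other 2-byte sample separates the two channels (speakers).
    """
    with wave.open(path, "rb") as wav:
        assert wav.getnchannels() == 2 and wav.getsampwidth() == 2
        raw = wav.readframes(wav.getnframes())

    left, right = bytearray(), bytearray()
    for i in range(0, len(raw), 4):        # 4 bytes per stereo frame
        left += raw[i:i + 2]               # first 16-bit sample -> channel 0
        right += raw[i + 2:i + 4]          # second 16-bit sample -> channel 1
    return bytes(left), bytes(right)
```

Each mono stream can then be transcribed and analyzed independently, which also makes it straightforward to label one channel as the customer and the other as store personnel.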

As for multilingual support, it was one of our toughest challenges. Edge devices have limited resources, yet we needed to accommodate multiple languages without compromising performance. Through careful optimization and collaboration with the Carleton team, we developed a solution that’s both efficient and scalable.

Q: What’s next?

Burak: While the collaboration proved the concept, there’s still work to be done to make it production-grade. My team will own the optimization work, refining the system to handle larger-scale deployments while maintaining speed and accuracy. Another important step will be fine-tuning the model to accommodate various accents, as that will be a future requirement. We also plan to adapt our technology to support aging in place and assisted-living use cases.

This project demonstrated that edge computing and LLMs can coexist to deliver real-time insights while upholding privacy. It’s a game-changer for businesses seeking actionable data without compromising customer trust.

Q: Any final thoughts on working with Carleton University?

Burak: Partnering with Carleton University was a great experience. The team brought fresh perspectives and technical depth, helping us solve some of the most challenging aspects of the project. Together, we achieved something truly groundbreaking, and I’m excited to see how this technology evolves in the future.

This latest collaboration between Edge Signal and the team at Carleton University is driving innovation in voice AI by leveraging edge computing to create more secure, efficient, and advanced AI solutions.

Get started today by contacting us.
