Truleo’s models analyze thousands of audio interactions to extract sentiment and insights, and one of the most important components of our process is converting speech audio into text. This process, called speech recognition or transcription, is second nature to most humans but becomes complex when it comes to Body Worn Camera (BWC) audio.
Humans interpret speech every day, and it’s often assumed that humans are 100% accurate at speech recognition. However, two different people listening to the same clean, clear speech audio disagree on about 5% of what they believe was said. That rate rises to 15% for noisy audio. (cite http://arxiv.org/abs/1512.02595v1)
When it comes to Body Worn Camera audio, humans disagree even more. Expert BWC transcriptionists disagree by 15% on what was said in a BWC audio track. When multiple experts review the same transcript, they can converge on a single version that is treated as 100% “truth”. Non-expert transcriptionists have about a 30% error rate compared to this truth.
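To make these percentages concrete, here is a minimal sketch of how disagreement between two transcripts can be measured. It assumes the figures above are word-level rates and uses word error rate (WER), a common speech recognition metric: the number of word substitutions, deletions, and insertions needed to turn one transcript into the other, divided by the number of words in the reference. The function name and the example sentences are made up for illustration; this is not Truleo’s scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented example: the second listener misses one word out of ten.
truth = "step out of the vehicle and show me your hands"
heard = "step out of the vehicle and show your hands"
print(word_error_rate(truth, heard))  # 0.1
```

By this measure, a 15% disagreement means roughly one word in seven differs between two transcripts of the same recording.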
Why is BWC audio so hard for even humans to understand?
Noisy, chaotic environments
When you listen to a podcast or news report, it’s easy to understand what’s being said: the audio is clear and crisp, recorded on a directional mic. In contrast, BWC recordings often have officers and civilians moving in and out of the microphone’s range. That audio is also punctuated by heavy background noise and police radio traffic.
It’s no wonder that not only humans but also most speech recognition systems struggle with BWC audio. While many of these engines can exceed human-level accuracy on clean audio, none get close to the range of human-level accuracy (shaded region) for BWC audio.
Building the best BWC transcription model
At Truleo, we took the complexities of BWC audio into account when building our transcription model. We fine-tuned our model on hundreds of hours of BWC audio cleaned up by law enforcement transcription experts. Our transcriptionists meticulously tagged officer and civilian audio, as well as police radio, so our models could learn the diversity of audio environments that BWCs operate in every day.
The work paid off. By building a transcription model specifically for Body Worn Camera audio, Truleo achieves the best speech recognition accuracy out there, even beating many humans at understanding what’s happening in a BWC interaction. High accuracy ensures the rest of our models can precisely interpret BWC conversations and helps departments use these insights to manage risk.