How Truleo Built the Best Body-Worn Camera Transcription Model
Truleo’s models analyze thousands of audio interactions to extract sentiment and insights, and one of the most important components of our process is converting speech audio into text. This process, called speech recognition or transcription, is second nature to most humans but becomes complex when it comes to Body-Worn Camera (BWC) audio.
Humans interpret speech every day, and it’s often thought that humans are 100% accurate at speech recognition. However, two different people listening to the same clean, clear audio disagree on about 5% of what they believe was said. The error rate rises to 15% for noisy audio. (See http://arxiv.org/abs/1512.02595v1)
When it comes to Body-Worn Camera (BWC) audio, humans tend to disagree even more. Expert BWC transcriptionists disagree on 15% of what was said in a BWC audio track. When multiple experts review the same transcript, they can converge on a single version that serves as the 100% “truth”. Measured against this truth, non-expert transcriptionists have an error rate of about 30%.
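To make percentages like these concrete, here is a minimal sketch of how disagreement between two transcripts can be scored. It assumes the figures are word error rates (WER), the metric used in the paper cited above; this post does not say exactly how the numbers were computed. The function name and the sample sentences are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: transcripts are compared case-insensitively after
    splitting on whitespace. Real pipelines usually normalize
    punctuation and numbers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# out of eight reference words.
truth = "put your hands where i can see them"
heard = "put your hand where i can see"
print(word_error_rate(truth, heard))  # 0.25
```

Under this assumption, a 15% disagreement between two experts means roughly one word in seven would need to be substituted, inserted, or deleted to turn one expert's transcript into the other's.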
Why is BWC audio so hard for even humans to understand?
