Voice AI with an Accent
When I arrived at an ESL school in Boston, voice assistants stumbled over my Pakistani-English mash-up: "Sorry, I didn't catch that." Machines weren't alone; professors often needed repeats. That frustration fuels my obsession with accent-inclusive ASR.
Accent bias is quantifiable.
A 2024 ACM content analysis found systematic misconceptions in ASR research that lead models to underperform on "non-standard" English. Bias isn't a vague moral failing; it's a measurable spike in Word Error Rate (WER) that maps onto lost users.
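WER is simple to compute: the word-level edit distance between what the system heard and what was said, divided by the length of the reference. A minimal sketch (the example sentences are hypothetical, chosen to mimic an accent-driven mishearing):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# An accented request misheard: 1 substitution + 1 substitution + 1 insertion
print(wer("the red line to alewife", "the red wine to a life"))  # 0.6
```

A WER of 0.6 means more than half the words came out wrong; for the speaker, that's the difference between an answer and "Sorry, I didn't catch that."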
Bias extends beyond tech.
Cambridge research published this year found that people with working-class UK accents were more likely to be perceived as criminal. Voice prejudice has cultural teeth—and ASR systems can fossilize it at scale.
Design for code-switching.
In Boston's Haymarket, a single sentence might pivot from Urdu to English to Spanish price haggling. My LLM prototype uses a dual-encoder attention gate that treats language shifts as context windows, not anomalies. Early tests cut WER by 18% on code-switched corpora.
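The prototype's internals aren't spelled out here, so as a rough illustration only: one way to realize a "gate" over two encoders is a learned sigmoid score that blends per-token states instead of forcing a hard language decision. Everything below—the function names, the toy dimensions, the plain dot-product gate—is a hypothetical sketch, not the actual architecture:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(h_primary, h_secondary, w_gate, b_gate):
    """Blend two per-token encoder states with a scalar learned gate.

    h_primary, h_secondary: hidden-state vectors from two encoders
    w_gate: weights over the concatenated states; b_gate: bias.
    A soft gate lets a language shift mid-sentence fade one encoder
    in and the other out, rather than being treated as an error.
    """
    concat = h_primary + h_secondary
    g = sigmoid(sum(w * x for w, x in zip(w_gate, concat)) + b_gate)
    return [g * p + (1.0 - g) * s for p, s in zip(h_primary, h_secondary)]

# With zero weights the gate sits at 0.5: an even blend of both encoders.
print(gated_mix([1.0, 1.0], [0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 0.0))  # [0.5, 0.5]
```

In training, the gate weights would be learned end to end, so a mid-sentence pivot from Urdu to Spanish shows up as the gate sliding, not as an out-of-vocabulary spike.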
Evaluation must leave the lab.
We beta-tested in noisy T stations—because that's where users actually talk. Success wasn't higher BLEU; it was a tired commuter getting the right train time without yelling.
When voice AI finally greets every user in their own cadence, the "digital divide" shrinks by a few decibels.