Voice AI with an Accent


When I arrived at an ESL school in Boston, voice assistants stumbled over my Pakistani-English mash-up: "Sorry, I didn't catch that." Machines weren't alone; professors often needed repeats. That frustration fuels my obsession with accent-inclusive ASR.

Accent bias is quantifiable.

A 2024 ACM content analysis found systematic misconceptions in ASR research, leading models to underperform on "non-standard" English. Bias isn't a vague moral failing; it's a spike in Word Error Rate that maps onto lost users.
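Word Error Rate is just word-level edit distance divided by the reference length, which is why accent bias shows up as a concrete number. A minimal sketch (standard dynamic-programming Levenshtein, not any specific paper's evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = min edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# Two substitutions out of six reference words -> WER ≈ 0.33
print(wer("what time is the next train", "what time is the text rain"))
```

Comparing this number across accent groups on the same test set is the simplest way to make the bias visible.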

Bias extends beyond tech.

Recent Cambridge research showed that speakers with working-class UK accents were more likely to be perceived as criminal. Voice prejudice has cultural teeth—and ASR systems can fossilize it at scale.

Design for code-switching.

In Boston's Haymarket, a single sentence might pivot from Urdu to English to Spanish price haggling. My LLM prototype uses a dual-encoder attention gate that treats language shifts as context windows, not anomalies. Early tests cut WER by 18% on code-switched corpora.

Evaluation must leave the lab.

We beta-tested in noisy T stations—because that's where users actually talk. Success wasn't higher BLEU; it was a tired commuter getting the right train time without yelling.

When voice AI finally greets every user in their own cadence, the "digital divide" shrinks by a few decibels.