
In conversational AI, particularly in applications like drive-through services or phone-based assistants, the choice of voice for the AI interface has become a subject of considerable interest. As a recent NPR interview highlighted, the majority of voice AI systems use female voices. This trend raises the question: why is this the preference?
From our research and practical observations, men marginally favor a female voice, whereas women slightly prefer a male voice. Corporate clients, on the other hand, often opt for a single, consistent voice, usually chosen by us. My team and I, however, have experimented with alternating voices to identify the most effective option. This approach is driven primarily by customer preferences and lacks a strict methodological framework. Historically, most of these voices have been sourced from text-to-speech platforms such as Azure, Google, or OpenAI. Recently, a newer entrant, ElevenLabs, has introduced voices that closely resemble human speech in natural cadence and offer more customization. However, a notable drawback in our trials has been an additional second or two of response time, which is impractical for our purposes.
In our experience, corporate clients exhibit the most concern regarding the choice of voice, often taking the lead in selecting it. Some businesses prefer a standard voice to maintain brand recognition. Interestingly, these voices are typically not proprietary but are sourced from the aforementioned companies.
In conclusion, while the choice of voice in AI systems may not be fundamentally critical, it is ultimately governed by customer preferences. An intriguing aspect of this discussion is the varied ways customers describe these AI voices; comments such as “this voice sounds angry” or “that one lacks confidence” are quite common. Another new fad is "celebrity voices," such as Rick Ross for Wingstop and Shaq for Papa John's. I think it's a great idea; however, figuring out the licensing seems to be problematic for now.
In your opinion, does the voice of an AI system significantly influence how customers perceive their interaction with it?