Many businesses now use synthesized voices to create video voiceovers and to turn text into audio, whereas a few years ago almost one hundred percent of this market belonged to human voice actors. What has changed: users’ minds or the technology? Let’s try to figure it out.
Previously, text-to-speech conversion was used mainly for technical purposes and in specific industries. Gradually it grew in popularity across many kinds of business. Programmers, filmmakers, game developers, online educators and advertisers have all, one way or another, started using synthesized voices.
Of course, business owners can quickly calculate the benefits of new tools. Previously, they had to hire professional actors for voiceover and pay for both their work and the studio rental. Each hour of recording cost $300-500, and together with all the necessary preparations the job could easily take several days. So when high-quality voice synthesis technologies appeared, many customers began to pay attention. Why not make the process simpler, faster and much cheaper with voice converters?
How quickly users are changing their approach to voiceover can be judged by the growing popularity of the online text-to-speech converter Kukarella. The site’s user base is growing at about 50% per month, and its active users include people from a wide range of professions. If this trend continues, the market’s need for voiceover actors will gradually decrease, and modern text-to-voice converters will become the leading alternative.
How did this shift happen? And why did it become visible only now? After all, text-to-voice technology has been around for half a century and has long held a strong position in robotics and the computer industry. Why, until recently, did companies keep spending thousands of dollars to rent studios and hire voice actors?
In everyday life we deal with computer speech constantly. How do we perceive Siri or, say, a GPS navigator? As a living person? Hardly: we take only the information we need from them and overlook the unnatural, synthetic speech. We don’t even think about it, and we have no complaints. These devices have a clear task, to convey technical information to users, and they cope with it perfectly.
One early computer voice, the one associated with Stephen Hawking, still attracts more than 3,000 searches per month on Google. Until recently, this kind of robotic voice was good enough for the creators of computer games. But in recent years much has begun to change. Computer voices have become more realistic, and the older synthetic voices have begun to lose their audience dramatically.
Realistic computer voices began to be used in cinema, in video games, in transport, and for dubbing videos and movies. These industries quickly appreciated the benefits of computer voice synthesis: the speed of voice creation and its low cost. At the same time, they showed the direction in which voice synthesis technologies should develop. That is today’s reality.
It is not surprising that last year a video from a Google presentation literally blew up the Internet. In it, Google unveiled one of its artificial intelligence projects, a voice assistant. The company played a recording in which a very lively, natural voice calls a hair salon and makes an appointment. In a second recording, the voice, acting as a customer, reserves a table at a restaurant. Both dialogues sounded completely natural, with pauses and interjections. The staff at the salon and the restaurant did not even guess that they were dealing with a synthesized voice and artificial intelligence.
After this video, it became clear that replacing a live speaker with a computer voice is just a matter of time. The bank of voices is growing rapidly. Today Kukarella, which acts as an aggregator of voices, gives access to 300 voices in 60 languages. If desired, a voice can be made very realistic. The site offers many effects, such as breathing, whispering, exclamations and pauses. It is these nuances that make a voice sound natural. So in a few years it will probably be difficult to tell whether you are talking on the phone with a person or a computer program.
Of course, business owners, people in creative professions and software developers immediately saw the benefits of the new computer voiceovers and began to use them actively in their work. The new idea spread across the Internet faster than the coronavirus. After all, an easy-to-use, easily accessible technology for converting text into voice saves a lot of time and money.
Let’s say I’m a filmmaker. I’ve sketched the main dialogues and now I need a draft voiceover of the script. I type in a dialogue and choose the narrators in two or three clicks. If I don’t like the timbre or the emotion of a voice, I change the character and add intonation, pauses, and all sorts of “um” and “hm”. Done! The work on the dialogue took just 5 minutes and cost a few cents, whereas earlier I would have had to rent a studio and hire actors for the same result.
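For readers who prefer to script this step, here is a minimal sketch of how pauses can be added to a draft dialogue. It uses SSML, the W3C markup that many speech engines accept; whether a particular converter (Kukarella included) takes SSML input is an assumption here, and the function name build_ssml and the sample lines are purely illustrative.

```python
from xml.sax.saxutils import escape


def build_ssml(lines, pause_ms=500):
    """Wrap dialogue lines in SSML and insert a pause after each one.

    Hypothetical helper: it only builds the markup string and does not
    call any speech service.
    """
    parts = ["<speak>"]
    for line in lines:
        # Escape &, < and > so the dialogue text stays valid XML.
        parts.append(f"<s>{escape(line)}</s>")
        # <break> is the standard SSML element for a pause.
        parts.append(f'<break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


# Illustrative draft dialogue, not taken from any real script.
draft = ["Hm, are you sure about this?", "Yes & no."]
ssml = build_ssml(draft, pause_ms=400)
print(ssml)
```

The resulting string would then be pasted or sent to whichever engine is used; changing pause_ms is the scripted equivalent of lengthening or shortening a pause by hand.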
The leading text-to-speech platforms in the Western world, including Canada and the United States, are still Google, Amazon, IBM and Microsoft. Baidu is popular in China, and Yandex in Russia. However, these voice synthesizers have rather complicated interfaces and require a lot of configuration, so an ordinary, less experienced user will find them difficult to deal with. This is not surprising, since these platforms concentrate on developing the technology rather than on the needs of the end user.
This is why alternative applications and sites are beginning to appear that focus entirely on the quality of the conversion. They offer a wide choice of voices, accents, tones and speech effects across a wide range of languages. Companies compete on the speed of converting text to voice and guarantee strict confidentiality of all texts. In less than a year of operation, Kukarella’s website has carried out more than a million text-to-voice conversions. This is not surprising, since on the platform you can convert text to voice for less than $5, and the whole process takes seconds.
This is especially convenient if you need to convert text in several languages. Can you compare that with the cost of renting a studio or with an actor’s hourly rate? As you can see, the prices are not even comparable. Voice converters win every time.
Another example: there are two assistants at a clinic’s reception. One of them helps patients with paperwork, while the other calls patients to remind them of the date and time of their next visit, explain how to prepare, and so on. She makes these calls during business hours, so she often ends up talking to an answering machine. She greets the patient, reminds them of the appointment time, explains the rules and says goodbye. She earns the same salary as the first employee: in the United States and Canada that is about 25 dollars per hour, or four thousand dollars a month! Can you compare that with the $5 that the clinic would spend on Kukarella?
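The arithmetic behind this comparison can be checked with a short sketch. The hourly wage and the $5 figure come from the example above; the 160 working hours per month (8 hours a day, 20 days) is an assumption chosen because it reproduces the four thousand dollars quoted, and the variable names are illustrative.

```python
HOURLY_WAGE_USD = 25       # from the clinic example
HOURS_PER_MONTH = 8 * 20   # assumption: 8-hour days, 20 working days
CONVERTER_COST_USD = 5     # figure quoted for the converter


def monthly_wage(hourly_wage, hours):
    """Return the monthly cost of one employee at a given hourly wage."""
    return hourly_wage * hours


staff_cost = monthly_wage(HOURLY_WAGE_USD, HOURS_PER_MONTH)
ratio = staff_cost / CONVERTER_COST_USD

print(f"Assistant: ${staff_cost} per month")            # Assistant: $4000 per month
print(f"Converter: ${CONVERTER_COST_USD}")              # Converter: $5
print(f"The assistant costs {ratio:.0f} times more")    # The assistant costs 800 times more
```

This compares only the two figures quoted in the example. A real clinic would also need to count the time spent preparing the reminder texts and the cost of placing the calls.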
Economic benefits will always drive technology. Today more and more businesses are noticing how much they can save by using computer voice synthesis. And these technologies keep getting better and more realistic all the time. That is why we see such rapid growth in the user base of services like Kukarella. It is just a matter of time before computer voices become the absolute leaders of the voiceover market.