I've fallen down the rabbit hole of TTS models lately. I've tried all the major paid tools (ElevenLabs, InWorld, etc.) and all the newest open-source models, and I started asking myself: what happens when voice is "solved", i.e., when it becomes impossible to distinguish from a human? I'd love to hear your opinions!

I've sketched out some of my own thoughts, and I see two possible futures.

Future 1: the nuanced version

Audiobooks: I think established authors will still prefer human narrators. If you can afford a $3k–$4k fixed cost for narration, a good human voice is usually worth it. TTS may even push human narration prices down, making that choice easier.

But for new or self-published authors, especially in non-fiction, AI narration may become the default. The choice is often not "AI vs. human narrator" but "AI audiobook vs. no audiobook." There will be backlash, but I think people will partly get used to it.

The more interesting threat may be AI readers. If I can buy an ebook for $8–$10 and have it narrated in a voice and style I like for $1–$2, why pay for an AI-narrated audiobook as a separate product? This could partly unbundle audiobooks from platforms like Audible. I'm torn here: AI-narrated self-published audiobooks and AI readers may coexist, but AI readers could eventually replace most non-human audiobook editions.

Business content: training videos, museum guides, phone systems, short ads, internal explainers, etc. will be mostly AI. Anywhere "good enough is good enough" meets budget pressure, TTS wins. It already does.

Content creation: YouTube, podcasts, TikTok, etc. are different. Among top creators, I think human narration will still dominate because personality and authenticity matter. If the voice is part of the brand, TTS is counterproductive.

That said, AI narration will explode in low-effort content. As generative text and video tools create more slop, most of that slop will probably have AI narration. So the human-to-TTS ratio on social media might end up around 1:10 by volume but 10:1 by total viewership.

Dubbing/translations: heavily AI-dominated, except for high-end creative work like major films or books.

Films: human-only for now, but that could change. I can easily see generative AI advancing to the point where Hollywood-quality films are produced entirely with AI. That would require a new type of "producer", someone who can wield generative AI and mold it into something beautiful, along with a new set of tools. Essentially, there would be many, many Pixar-style studios making ultra-realistic video on relatively small budgets. Those productions would use AI narration, and eventually they could eat almost the whole industry.

Games: TTS seems especially strong here: many distinct voices, short lines, lots of minor characters, and poor economics for hiring actors for everything. I think studios will still use humans for main characters, but many NPCs and indie-game voices will become AI.

Future 2: the hardline version

Anything outside of personal-brand content would be AI-generated. If it gets cheap and good enough, and society accepts it, everything from books to films to ads would be AI-narrated.

Human narration would evolve as a profession: you would "sell" the rights to have your voice AI-generated.

A new profession of AI sound engineers would emerge: people who use AI creatively for voice design and voice orchestration to get the best results.

I also feel like voice is quite different from text or image generation, in the sense that its uncanny valley is weaker. In 95% of cases, voice is just a tool for correctly conveying creatively written text (hopefully written by a human). And with tools, the main question is whether they're good enough.

It's also possible that it isn't either/or: the first future could be the next 10 years, and the second could come a bit after that.