Convert a Book to Audiobook with AI Voices (2026)

The audiobook market hit $7.7 billion in 2025, growing 26 % year over year. AI-narrated audiobooks were 23 % of new releases last year — up from less than 5 % in 2023. Audible just opened its catalog to AI submissions. Google Play Books, Apple Books, Spotify, Kobo, and INaudio all accept AI narration with disclosure. Production cost dropped from $5,000 per single-narrator audiobook to under $50 with the right tooling.

This guide walks through the actual workflow we use to take a finished manuscript and ship a publish-ready audiobook in under 30 minutes — using FreeTTS's new Audiobook Batch Export, which runs on Azure Batch Synthesis (the same long-form pipeline Microsoft uses internally). No microphone, no studio, no human narrator, no royalty contracts. Just text in, ZIP out, upload.

Why 2026 is the year self-publishers stopped paying for narration

Three things changed at once:

Distribution opened. Audible's program for AI narration expanded in 2025; Apple Books rolled out its Digital Narration program; Google Play Books and Kobo dropped their human-narrator-only requirements. Five years ago AI audiobooks were essentially un-distributable. Today every major retailer accepts them.
Voices crossed the uncanny line. Azure's DragonHD models, ElevenLabs v3, and Google's Chirp HD voices all read with natural cadence and emotion, and 70 % of consumers say they are willing to listen to AI narration. That share is down from 77 % in 2023, so some hesitation remains, but it is well past the threshold where AI narration is "good enough" for most genres.
Batch infrastructure became free or near-free for indies. Azure Batch Synthesis caps a single async job at 10,000 inputs and lets you submit up to 2 MB per input. That's enough room for entire novels in one job. Companies that used to charge $99/month for "long-form audiobook" tiers now compete with $19–39/month flat plans.

The math for self-publishers: traditional human-narrated audiobook production runs $200–400 per finished hour. A 10-hour audiobook = $2,000–$4,000 + 3–6 months turnaround. With AI batch tools, the same 10-hour audiobook costs under $5 in synthesis cost, takes 30 minutes wall-clock, and you keep a perpetual master file you can re-render in different voices, languages, or formats whenever you want.
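The arithmetic behind that comparison, as a quick sketch. Every figure is one quoted in this guide rather than a measurement, and the function name is ours.

```python
# Figures quoted in the text above; nothing here is measured.
HUMAN_RATE_PER_FINISHED_HOUR = (200, 400)  # USD, low and high
AI_SYNTHESIS_COST = 5                      # USD, upper bound quoted for a 10-hour book


def human_cost_range(finished_hours):
    """Return the (low, high) cost of human narration for a given length."""
    low_rate, high_rate = HUMAN_RATE_PER_FINISHED_HOUR
    return finished_hours * low_rate, finished_hours * high_rate


low, high = human_cost_range(10)
print(f"Human narration: ${low:,} to ${high:,}")
print(f"Ratio vs AI: {low / AI_SYNTHESIS_COST:.0f}x to {high / AI_SYNTHESIS_COST:.0f}x")
```

Change the hour count to match your own book; the ratio only holds if your synthesis cost stays near the quoted figure.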

What you need before starting

Three things, in order of importance:

1. A clean manuscript file

Plain text. Strip out images, hyperlinks, footnotes, and PDF artifacts. The narrator can't read those, and they cause weird pauses. Quick checklist:

Convert from .docx / .epub / .pdf into plain UTF-8 .txt or paste directly into the textarea.
Remove page numbers, running headers, and footers.
Standardize quote marks (smart quotes are fine, but pick one style).
Spell out abbreviations the narrator might mispronounce — "Dr." → "Doctor", "St." → "Saint" or "Street" depending on context.
Add em-dashes or pauses where you want the narrator to breathe — neural voices respect punctuation.
Each chapter on a fresh page or with a clear blank-line break, so the paragraph splitter can pick them up.
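Several of the checklist items can be scripted. The sketch below is one possible cleanup pass, not what FreeTTS runs internally; the abbreviation table and the regular expressions are our assumptions, and they will need tuning for your manuscript.

```python
import re

# Hypothetical abbreviation map; extend it for your own manuscript.
# "St." is left out on purpose because it needs context (Saint vs Street).
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Prof.": "Professor"}

# Curly quotes mapped to straight ones, so the whole file uses one style.
QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def clean_manuscript(text):
    """Apply a few of the checklist steps to raw manuscript text."""
    text = text.replace("\r\n", "\n")
    # Drop lines that hold only a page number, e.g. "12" or "- 12 -".
    # Caution: this also removes a chapter heading that is just a number.
    text = re.sub(r"(?m)^[ \t]*-?[ \t]*\d+[ \t]*-?[ \t]*$\n?", "", text)
    # Strip bare URLs (and the spaces before them) so they are not read aloud.
    text = re.sub(r"[ \t]*https?://\S+", "", text)
    for curly, straight in QUOTES.items():
        text = text.replace(curly, straight)
    # Spell out abbreviations that are unambiguous.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Collapse runs of 3+ newlines to exactly one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Run it on one chapter and diff the result against the source before trusting it on a whole book.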

2. A voice that fits your genre

Match voice to content. Quick taxonomy:

Literary fiction, narrative nonfiction: en-US-Andrew (warm baritone), en-GB-Sonia (crisp British female), en-US-Aria (clear modern). DragonHD versions of all three on Creator tier deliver the most natural cadence currently available.
Children's books, light fiction: en-US-Jenny (warm female), en-US-Davis (newscast male) with the "cheerful" expressive style at intensity 1.4.
Technical, business, self-help: en-US-Davis (newscast cadence), en-US-AndrewMultilingual (works for multilingual technical books).
Romance, dramatic fiction: Switch styles per chapter using the Single tab — empathetic for emotional beats, cheerful for happy scenes, sad for grief, then stitch the resulting MP3s together. The Audiobook Batch tool runs one voice per job; for full multi-character work, use the Dialogue tab in Studio.
Sleep stories, meditation: en-US-Aria with the "whispering" style at intensity 0.6, narration speed 0.9.
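For the per-chapter style switching described under romance and dramatic fiction, the separately rendered MP3s have to be joined afterwards. One common way is ffmpeg's concat demuxer. The sketch below only builds the list file and the command; the file names are hypothetical, and it assumes every part was exported with the same format settings.

```python
def concat_list(parts):
    """Build the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for name in parts:
        # Inside the quoted form, a single quote is written as '\''.
        lines.append("file '" + name.replace("'", "'\\''") + "'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output):
    """Return the ffmpeg argument list that joins the listed files."""
    # "-c copy" joins the streams without re-encoding. It assumes all parts
    # share the same codec, sample rate, and channel layout.
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]


print(concat_list(["ch01-empathetic.mp3", "ch01-cheerful.mp3"]), end="")
print(" ".join(concat_command("parts.txt", "chapter-01.mp3")))
```

Write the list text to parts.txt, then run the printed command from the folder that holds the MP3s.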

You can audition every voice on the FreeTTS voices page before you commit to a long batch.

3. The right tier

For full-novel batches you need FreeTTS Creator ($39/month, or $349 once for Creator Lifetime). Free and PRO tiers are for shorter content — Creator is the only tier with the Audiobook Batch Export feature unlocked because Azure Batch Synthesis costs scale with audio length and the per-job character ceilings are higher.

The 7-day money-back guarantee covers the full first-month charge. Cancel any time before day 7, no questions asked.

The 3-minute setup, the 30-minute wait

Here's the actual click-by-click. Open the Audiobook tab in your dashboard sidebar (Creator-tier entry, book icon).

Step 1: Paste the manuscript

Paste up to 2,000,000 characters in the script box. The form shows live character count, word count, and estimated audio duration as you type. A typical 90,000-word novel is around 540,000 characters and produces about 11 hours of audio — well within the per-job limit.
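If you want the same estimate before opening the form, it is simple arithmetic. The sketch below is not the form's actual calculation: the 6 characters per word is implied by the figures above (540,000 / 90,000), and the 140 words-per-minute narration rate is our assumption, chosen because it lands near the quoted 11 hours.

```python
JOB_CHAR_LIMIT = 2_000_000  # per-job limit quoted above


def estimate(words, chars_per_word=6.0, words_per_minute=140.0):
    """Return (characters, hours) for a manuscript of the given word count."""
    characters = round(words * chars_per_word)
    hours = words / words_per_minute / 60
    return characters, hours


chars, hours = estimate(90_000)
print(f"{chars:,} characters, about {hours:.1f} hours")
print("fits in one job" if chars <= JOB_CHAR_LIMIT else "needs several jobs")
```

Speed settings and the chosen voice shift the real duration, so treat the hours figure as a rough guide.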

Optional: give it a title. The title becomes the ZIP filename when you download. "The Compass of Dust" → the-compass-of-dust.zip. If you skip the title, the filename is freetts-audiobook-{job_id}.zip.
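The title-to-filename mapping looks like a standard slug. Here is a sketch that reproduces the example above; the exact rules FreeTTS applies (for accented letters or punctuation, say) are not documented here, so treat this as an approximation.

```python
import re
from typing import Optional


def zip_filename(title: Optional[str], job_id: str) -> str:
    """Guess the download filename from the title, or fall back to the job id."""
    # Lowercase, then replace every run of non-alphanumerics with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", (title or "").lower()).strip("-")
    return f"{slug}.zip" if slug else f"freetts-audiobook-{job_id}.zip"


print(zip_filename("The Compass of Dust", "abc123"))
print(zip_filename(None, "abc123"))
```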

Step 2: Pick voice + split + format

Three quick decisions:

Voice: Click the voice picker, filter by ✨ HD, pick your DragonHD narrator. The picker shows language, gender, and a short sample for each.
Split strategy: Default Paragraph works for novels (splits on blank lines so each chapter break creates a separate audio file). Use Sentence for dense academic or business writing without paragraph breaks. Use Fixed 5K for bulk pipelines where you don't care about chapter alignment.
Output format: WAV if you're going to edit the audio in Audacity / Pro Tools / a DAW (lossless). MP3 if you're uploading directly to a distributor (smaller files, universal compatibility). OGG for low-bandwidth web/app distribution.
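To see what each split strategy does to your text before you spend a batch on it, you can approximate all three locally. This sketch shows the general idea only; the real splitter's rules are not published here, and reading "Fixed 5K" as 5,000 characters is our assumption.

```python
import re


def split_paragraph(text):
    """One part per block of text separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentence(text):
    """Naive rule: break after '.', '!' or '?' when whitespace follows."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_fixed(text, size=5000):
    """Fixed-size character windows that ignore chapter and sentence breaks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


sample = "Chapter 1\nIt was late. Nobody came.\n\nChapter 2\nMorning arrived!"
print(len(split_paragraph(sample)), "paragraph parts")
print(len(split_sentence(sample)), "sentence parts")
```

If the paragraph count is 1 for a whole novel, the manuscript has no blank lines and needs reformatting or a different strategy.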

Step 3: Submit

Hit Submit. The job goes to Azure Batch Synthesis, and the page polls every 7 seconds to show live progress. You can close the tab: the job runs server-side, and the dashboard rehydrates the job state whenever you come back.
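The polling pattern itself is easy to picture. Below is a sketch of a generic poll-until-done loop; it is not the FreeTTS client. The fetch_status callable stands in for whatever request returns the job state, and only "succeeded" is a status named in this guide (the other terminal states are assumed).

```python
import time

# Only "succeeded" appears in this guide; the others are assumed.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_job(fetch_status, interval=7.0, timeout=3600.0, sleep=time.sleep):
    """Call fetch_status() every `interval` seconds until the job finishes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


# Demo with canned statuses and a no-op sleep, so it finishes instantly.
demo = iter(["running", "running", "succeeded"])
print(wait_for_job(lambda: next(demo), sleep=lambda seconds: None))
```

Because the job runs server-side, losing the loop (or the browser tab) loses nothing; you simply start polling again later.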

For a 90,000-word novel, expect 15–30 minutes of wall-clock time. For a 250,000-word epic, 45–60 minutes. The async architecture means you don't have to babysit a synchronous loop or worry about your laptop sleeping mid-job.

Step 4: Download the ZIP

When the status flips to "succeeded," a download button appears. The ZIP contains:

Per-chapter audio files — part-01.mp3 through part-NN.mp3 (or .wav / .ogg, your pick). One file per paragraph block (or sentence, or fixed-5K block, depending on split strategy).
Ready-made SRT & WebVTT subtitles — subtitles.srt and subtitles.vtt with cumulative timestamps spanning every chapter. Drop the SRT straight onto YouTube, into Premiere, DaVinci, CapCut, or any video editor — no JSON parsing, no conversion scripts. Caveat: when batching with a DragonHD voice, Azure doesn't emit sentence boundaries — the bundled SRT comes from a word-grouping fallback (still usable, slightly choppier splits at quote marks). For the cleanest subtitles, batch with a standard neural voice and HD-narrate a separate master if you want both.
manifest.json — machine-readable per-part durations, format, voice, and total length. For pipelines that auto-stitch chapters or build chapter markers.
README.txt — short usage guide bundled in plain text so you don't have to remember any of the above.
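As an illustration of what the manifest makes possible, the sketch below turns per-part durations into chapter start times. The field names ("parts", "file", "duration_seconds") are hypothetical, since the manifest schema is not spelled out here; open your own manifest.json and adjust the keys.

```python
def chapter_starts(manifest):
    """Return (file, start_seconds) pairs from cumulative part durations."""
    starts, elapsed = [], 0.0
    for part in manifest["parts"]:          # hypothetical key names
        starts.append((part["file"], elapsed))
        elapsed += part["duration_seconds"]
    return starts


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


# Made-up manifest in the assumed shape, for demonstration only.
example = {"parts": [
    {"file": "part-01.mp3", "duration_seconds": 61.5},
    {"file": "part-02.mp3", "duration_seconds": 120.0},
]}
for name, start in chapter_starts(example):
    print(name, srt_timestamp(start))
```

The same running total is what "cumulative timestamps" means for the bundled subtitles: each part's cues are offset by the length of everything before it.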

The signed download URL stays live for 7 days. After that, the result is purged from the server to free up storage and keep costs down. You can re-download anytime within that window; the URL refreshes on every status poll, so you never hit an expired-token error.

Format specs every distributor wants

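Each audiobook distributor has slightly different technical requirements. The most permissive, and the one we recommend submitting to first, is Google Play Books. The reverse does not hold: a file that only meets Google's minimums can fail elsewhere, while a file built to the strictest row below (44.1 kHz, 192 kbps CBR MP3) satisfies every row in the table.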

Distributor	Format	Sample rate	Bitrate	Per-file
Audible / ACX	MP3 (CBR) or WAV	44.1 kHz	192 kbps min	One file per chapter, < 120 min
Google Play Books	MP3, M4B, FLAC, WAV	22.05 kHz min	96 kbps min	One file per chapter
Apple Books	MP3 or M4A	44.1 kHz	128 kbps min	One file per chapter
Kobo Writing Life	MP3	44.1 kHz	192 kbps min	One file per chapter
Spotify (audiobooks)	MP3 or M4B	44.1 kHz	128 kbps min	Single-file or chaptered
INaudio	MP3	44.1 kHz	192 kbps min	One file per chapter
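The table can be turned into a quick pre-upload check. The sketch below transcribes only the format, sample rate, and bitrate columns; it does not model the CBR requirement or per-file length, and distributor specs change, so verify against each platform's current guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    formats: frozenset
    sample_rate_hz: int
    rate_is_minimum: bool   # True where the table says "min"
    min_bitrate_kbps: int


# Values transcribed from the table above.
SPECS = {
    "Audible / ACX": Spec(frozenset({"mp3", "wav"}), 44_100, False, 192),
    "Google Play Books": Spec(frozenset({"mp3", "m4b", "flac", "wav"}), 22_050, True, 96),
    "Apple Books": Spec(frozenset({"mp3", "m4a"}), 44_100, False, 128),
    "Kobo Writing Life": Spec(frozenset({"mp3"}), 44_100, False, 192),
    "Spotify (audiobooks)": Spec(frozenset({"mp3", "m4b"}), 44_100, False, 128),
    "INaudio": Spec(frozenset({"mp3"}), 44_100, False, 192),
}


def accepted_by(fmt, sample_rate_hz, bitrate_kbps):
    """List the distributors whose table row this file satisfies."""
    matches = []
    for name, spec in SPECS.items():
        if spec.rate_is_minimum:
            rate_ok = sample_rate_hz >= spec.sample_rate_hz
        else:
            rate_ok = sample_rate_hz == spec.sample_rate_hz
        if fmt in spec.formats and rate_ok and bitrate_kbps >= spec.min_bitrate_kbps:
            matches.append(name)
    return matches


print(accepted_by("mp3", 44_100, 128))
print(accepted_by("mp3", 44_100, 192))
```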

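FreeTTS WAV output is 24 kHz / 16-bit by default. That clears Google Play Books' 22.05 kHz minimum but falls short of the 44.1 kHz listed for every other distributor in the table, so resample before uploading anywhere else. MP3 output is encoded at a variable bitrate equivalent to ~128 kbps, which meets the Apple, Google, and Spotify bitrate minimums out of the box (check the sample rate separately). For the 192 kbps minimum at Audible, Kobo, and INaudio, re-encode the WAV files in Audacity or with ffmpeg, resampling in the same pass: ffmpeg -i chapter.wav -ar 44100 -b:a 192k chapter.mp3.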
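Re-encoding dozens of chapter files by hand gets tedious, so here is a sketch that builds one ffmpeg command per WAV. It adds -ar 44100 because the table lists 44.1 kHz for Audible and Kobo; that flag and the folder names are our additions, not part of the original one-liner. Upsampling meets the spec on paper but does not add audio detail.

```python
import subprocess
from pathlib import Path


def reencode_command(wav, out_dir):
    """Return the ffmpeg argument list for one WAV to 192 kbps MP3."""
    target = Path(out_dir) / (Path(wav).stem + ".mp3")
    return ["ffmpeg", "-y", "-i", str(wav),
            "-ar", "44100",            # resample to 44.1 kHz
            "-codec:a", "libmp3lame",  # MP3 encoder
            "-b:a", "192k",            # target audio bitrate
            str(target)]


def reencode_all(src_dir, out_dir):
    """Re-encode every part-*.wav in src_dir; requires ffmpeg on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(src_dir).glob("part-*.wav")):
        subprocess.run(reencode_command(wav, out_dir), check=True)


print(" ".join(reencode_command("part-01.wav", "mp3-192")))
```

Call reencode_all("unzipped-audiobook", "mp3-192") once you have extracted the ZIP; both folder names are placeholders.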

Where to publish AI audiobooks in 2026

Distribution options stratify by reach and how strict the AI-narration policy is. Use this as a starting checklist:

Wide-distribution channels (one upload, many stores)

INaudio (formerly Findaway Voices) — single upload, distributes to libraries, retailers, and subscription services worldwide. Accepts AI narration with disclosure. Best ROI for indie authors who don't want to manage 5+ portals.
Draft2Digital Audio — wide retailer distribution; check current AI policy before submitting.

Direct-to-platform

Audible (via ACX) — Amazon's own audiobook store. AI narration program expanded in 2025 for publishers; check the latest ACX guidelines for the exact disclosure language. Highest single-store volume in the US, UK, AU, Germany, Japan, France.
Google Play Books — officially supports AI-narrated audiobooks since 2023. Easiest submission flow: literally upload your manuscript and they'll narrate it for you with their voices, OR upload your own AI narration. We use the second option because Google's voices are decent but FreeTTS DragonHD is better.
Apple Books Digital Narration — Apple has its own AI-narration program. Self-uploaded AI audiobooks are accepted with metadata flag.
Kobo Writing Life — accepts AI narration with no disclosure requirement at submission as of writing. Good for international reach (especially Europe, Canada, Australia).
Spotify (audiobooks + podcasts) — Spotify Audiobooks for the audiobook channel (US-first), or release as a podcast for global reach. Disclosure required.

Niche / experimental

Storytel, Scribd, Chirp — subscription audiobook services. AI policies vary; check before submitting.
Direct sale on your own site — Gumroad, Stripe checkout, Lemon Squeezy. You keep 95–97 % of revenue (vs ~25 % on Audible) at the cost of doing your own marketing.

Important: distributor policies on AI narration change frequently. Always re-verify each platform's current AI-narration policy before uploading. The FreeTTS commercial license bundled with PRO and Creator covers all of the above — you don't need to license anything else from us.

When AI narration wins, when human still wins

Honest take: AI narration is the right choice for most self-publishers in most situations, but not all of them. Here's the decision matrix we use:

AI wins when:

Cost matters. $5 of synthesis vs $2,000+ for human narration is a 400× difference. For first-time authors and tight-margin niches (genre fiction, technical, short-form), AI wins by default.
Speed matters. 30 minutes vs 3–6 months. Critical for time-sensitive content (news, current-events nonfiction, course material that updates yearly).
You write in multiple languages. AI can narrate the same book in 75+ languages from one master text; human narrators are language-specific and you'd need 75 separate productions.
You'll re-render often. A new edition, an updated chapter, a typo fix — re-render the affected chunks in seconds. Human narrators charge per-session for re-records.
You're a series author. Voice consistency across 10 books is automatic with AI; with humans you're locked to one narrator's availability and aging voice.

Human still wins when:

Multi-character literary fiction with heavy dialogue. A great human voice actor can do five distinct character voices in one scene. AI can switch styles per paragraph but not mid-paragraph yet.
Audiobooks that lean on the narrator's reputation. Bestseller-tier authors whose audiobooks sell partly because of a celebrity narrator (Stephen Fry on Sherlock Holmes; Tom Hanks on Ann Patchett).
Highly emotional memoir or poetry. AI handles narrative beats well; raw emotional prose still benefits from human inflection AI hasn't fully nailed.
Languages without strong AI voice support. Most major languages are well-covered, but some regional dialects and lower-resource languages still sound stilted.

Common mistakes (we've made all of these)

Picking the wrong split strategy

The default Paragraph split assumes blank-line breaks between paragraphs. If your manuscript has no blank lines (everything in one giant block), the splitter will produce one giant audio file, which defeats the chapter structure. Either reformat the source with blank lines or switch to the Sentence or Fixed 5K split. Test on a single chapter before submitting the full novel.

Wrong voice gender for the genre

Romance readers expect specific voice profiles for the protagonist's POV. Thriller readers expect tense, lower-pitched narration. Get this wrong and reviews will tell you. When in doubt, listen to the bestselling audiobook in your subgenre and pick a similar voice from the FreeTTS catalog. Browse English female narrators and male narrators.

Missing pronunciations

AI narrators get proper nouns, brand names, and made-up fantasy/sci-fi words wrong. The fix is two-fold: (1) spell the word phonetically in the manuscript wherever you can ("Hermione" → "Her-MY-oh-nee", auditioned first to confirm the voice reads it correctly), or (2) use SSML pronunciation overrides via the Studio's pronunciation panel and re-render the affected chunks. Test on chapter 1 before batching the full book.
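Option (1) is easier to keep consistent with a small substitution table applied to a copy of the manuscript. The sketch below is one way to do it; the table contents are examples only, and every respelling should be auditioned in your chosen voice before the full batch.

```python
import re

# Example respellings only; audition each one in the voice you plan to use.
RESPELLINGS = {"Hermione": "Her-MY-oh-nee"}


def respell(text, table):
    """Replace whole-word matches of each key with its phonetic spelling."""
    if not table:
        return text
    # Longest keys first, so a longer name wins over a shorter prefix.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], text)


print(respell("Hermione sighed.", RESPELLINGS))
```

Keep the original manuscript untouched and feed only the respelled copy to the batch, so the print and ebook editions stay clean.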

Stripping italics and emphasis

Plain text loses italics. Some narrators slightly emphasize italicized words; AI doesn't see them at all. Either preserve emphasis with explicit punctuation ("Don't ever say that" → "Don't... ever... say that") or accept that emphasis flatlines.

Skipping the test chunk

Always render chapter 1 alone first, listen end-to-end, fix what's wrong, THEN batch the full book. A bad voice choice on a 90,000-word novel is 30 minutes of wasted synthesis and a re-batch. A bad voice choice on chapter 1 is 2 minutes.

FAQ

What's the difference between this and PDF to Audiobook?

PDF to Audiobook is a free tool that takes a PDF, auto-detects chapters from the table of contents, and generates per-chapter MP3s through the real-time TTS pipeline. Perfect for readers, students, and accessibility use. Text to Audiobook (this guide) is a Creator-tier batch tool for production audiobook export — multi-format output, 2M-char jobs, bundled SRT + WebVTT subtitles. Use PDF flow if you have a PDF, text flow if you have a manuscript.

Can I make money selling AI-narrated audiobooks?

Yes. The commercial license bundled with PRO and Creator covers selling on every major distributor. We have indie authors making $200–$3,000 per month from AI-narrated audiobooks distributed through Audible, INaudio, and Google Play Books. Ceiling depends on the source book's quality, niche, and marketing.

Do listeners care that it's AI?

70 % of consumers say they'd be willing to listen to AI-narrated audiobooks (down from 77 % in 2023). The remaining 30 % is concentrated in literary fiction; genre fiction, business, self-help, and technical readers care much less. Disclosure is the right move regardless — readers respect transparency, and most distributors require it anyway.

Are subtitles included for video use?

Yes — every batch ZIP ships subtitles.srt and subtitles.vtt with cumulative timestamps that span every chapter, so they line up against the concatenated audio without re-alignment. Drop them straight onto YouTube, Vimeo, into Premiere, DaVinci, or CapCut — no scripts, no JSON parsing. Mux with chapter cover art (any 1400×1400 image) and you have YouTube-ready audiobook chapter videos with synced captions. We've seen authors do exactly this for marketing — Chapter 1 as a free YouTube video drives sales of the full audiobook on Audible.

What if my manuscript is over 2M characters?

Split into multiple jobs. A 1,500,000-word epic fantasy = ~9 million characters = 5 batch jobs of ~300K words each. Submit them in sequence; each job is independent. Total wall-clock time for 5 sequential 90-min jobs is ~7.5 hours — overnight is fine.
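Splitting by hand invites mid-paragraph cuts. A sketch of one way to pack whole paragraphs into jobs that stay under the limit follows; it assumes paragraphs are separated by blank lines and that the limit counts characters, including the separators.

```python
import math


def plan_jobs(paragraphs, limit=2_000_000):
    """Pack whole paragraphs, in order, into jobs of at most `limit` characters."""
    jobs, current, size = [], [], 0
    for para in paragraphs:
        added = len(para) + (2 if current else 0)  # 2 = the "\n\n" joiner
        if current and size + added > limit:
            jobs.append("\n\n".join(current))      # close the full job
            current, size = [], 0
            added = len(para)
        if added > limit:
            raise ValueError("a single paragraph exceeds the job limit")
        current.append(para)
        size += added
    if current:
        jobs.append("\n\n".join(current))
    return jobs


# The example from the answer above: ~9 million characters at 2M per job.
print(math.ceil(9_000_000 / 2_000_000), "jobs")
```

Splitting at chapter boundaries instead of paragraph boundaries works the same way: pass a list of chapters rather than paragraphs.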

How do I cancel a running batch?

Use the Cancel button on the running job card. Cancellation is best-effort against Azure: sometimes the job has already completed by the time you click, and an Azure 404 is treated as success because your intent is satisfied either way. A cancelled job does not count against your monthly character budget.
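The "404 counts as success" rule is a common way to make cancellation idempotent. The sketch below shows the idea only. The send_cancel callable is hypothetical and stands in for whatever HTTP request the backend makes; it is expected to return the response status code.

```python
def cancel_job(send_cancel):
    """Return True when the job is no longer running from the caller's view."""
    status = send_cancel()
    if status == 404:
        # The job is already finished or gone, so the intent is satisfied.
        return True
    # Any 2xx response means the cancel request was accepted.
    return 200 <= status < 300


print(cancel_job(lambda: 204))  # cancel accepted
print(cancel_job(lambda: 404))  # job already gone
print(cancel_job(lambda: 500))  # server error, worth retrying
```

Treating 404 this way means a double-click or a retry never surfaces a spurious error to the user.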

What's the lifetime deal — should I get it?

PRO Lifetime is $199 once, pays back in month 11. Creator Lifetime is $349 once, pays back in month 9 vs the $39/month subscription. First 100 founder seats only — currently 19 left. Both include every future feature added to PRO/Creator. If you're publishing more than one audiobook over the next year, Creator Lifetime is the obvious math.

The whole stack, in one paragraph

Open the Audiobook tab → paste your manuscript → pick a DragonHD voice → pick output format → submit. Wait 15–30 minutes. Download the ZIP. Upload to Audible via ACX, Google Play Books, Apple Books, Kobo Writing Life, INaudio, and Spotify. Disclose AI narration where required. Profit. Repeat for the next book in your series in the same voice. The whole pipeline that used to take 3–6 months and $2,000–4,000 now takes one afternoon and the cost of a Netflix subscription.

Try it free at freetts.org/text-to-audiobook.