The audiobook market hit $7.7 billion in 2025, growing 26 % year over year. AI-narrated audiobooks were 23 % of new releases last year — up from less than 5 % in 2023. Audible just opened its catalog to AI submissions. Google Play Books, Apple Books, Spotify, Kobo, and INaudio all accept AI narration with disclosure. Production cost dropped from $5,000 per single-narrator audiobook to under $50 with the right tooling.
This guide walks through the actual workflow we use to take a finished manuscript and ship a publish-ready audiobook in under 30 minutes — using FreeTTS's new Audiobook Batch Export, which runs on Azure Batch Synthesis (the same long-form pipeline Microsoft uses internally). No microphone, no studio, no human narrator, no royalty contracts. Just text in, ZIP out, upload.
Why 2026 is the year self-publishers stopped paying for narration
Three things changed at once:
- Distribution opened. Audible's program for AI narration expanded in 2025; Apple Books rolled out its Digital Narration program; Google Play Books and Kobo dropped their human-narrator-only requirements. Five years ago AI audiobooks were essentially un-distributable. Today every major retailer accepts them.
- Voices crossed the uncanny line. Azure's DragonHD models, ElevenLabs v3, and Google's Chirp HD voices all read with natural cadence and emotion, and 70 % of consumers now say they are willing to listen to AI narration. That figure is down from 77 % in 2023, so some hesitation remains, but it's well past the threshold where AI narration is "good enough" for most genres.
- Batch infrastructure became free or near-free for indies. Azure Batch Synthesis caps a single async job at 10,000 inputs and lets you submit up to 2 MB per input. That's enough room for entire novels in one job. Companies that used to charge $99/month for "long-form audiobook" tiers now compete with $19–39/month flat plans.
The math for self-publishers: traditional human-narrated audiobook production runs $200–400 per finished hour. A 10-hour audiobook = $2,000–$4,000 + 3–6 months turnaround. With AI batch tools, the same 10-hour audiobook costs under $5 in synthesis cost, takes 30 minutes wall-clock, and you keep a perpetual master file you can re-render in different voices, languages, or formats whenever you want.
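The arithmetic above is easy to re-run for a book of a different length. This is a minimal sketch using only the rates quoted in this section; the function name is illustrative, and the $5 figure is the article's stated upper bound, not a metered price.

```python
def human_cost_range(finished_hours, low_rate=200, high_rate=400):
    """Return (low, high) USD cost for human narration at per-finished-hour rates."""
    return finished_hours * low_rate, finished_hours * high_rate

low, high = human_cost_range(10)
ai_cost = 5  # article's upper bound for AI synthesis cost on the same book, USD

print(f"Human narration: ${low:,} to ${high:,}")
print(f"AI synthesis: under ${ai_cost}")
print(f"Ratio at the low end: {low // ai_cost}x")
```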
What you need before starting
Three things, in order of importance:
1. A clean manuscript file
Plain text. Strip out images, hyperlinks, footnotes, and PDF artifacts. The narrator can't read those, and they cause weird pauses. Quick checklist:
- Convert from .docx / .epub / .pdf into plain UTF-8 .txt or paste directly into the textarea.
- Remove page numbers, running headers, and footers.
- Standardize quote marks (smart quotes are fine, but pick one style).
- Spell out abbreviations the narrator might mispronounce — "Dr." → "Doctor", "St." → "Saint" or "Street" depending on context.
- Add em-dashes or pauses where you want the narrator to breathe — neural voices respect punctuation.
- Start each chapter on a fresh page or after a clear blank-line break, so the paragraph splitter can pick up the boundary.
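Part of the checklist above can be automated before pasting. The sketch below is one possible cleanup pass, not what FreeTTS runs internally: the abbreviation map is hypothetical and deliberately omits "St." because it needs context, and the page-number rule will also delete any legitimate line that consists only of a number.

```python
import re

# Hypothetical abbreviation map; extend it per manuscript.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "Mrs.": "Missus"}

# Matches lines that are only a page number: "12", "- 12 -", "Page 12".
PAGE_NUMBER = re.compile(r"\s*(?:page\s+)?-?\s*\d+\s*-?\s*", re.IGNORECASE)

# Pick one quote style: here, smart quotes become straight quotes.
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})

def clean_manuscript(text: str) -> str:
    # Normalise Windows and old Mac line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop page-number lines.
    lines = [ln for ln in text.split("\n") if not PAGE_NUMBER.fullmatch(ln)]
    text = "\n".join(lines).translate(QUOTES)
    # Spell out abbreviations the narrator might mispronounce.
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # Collapse runs of blank lines into exactly one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

Run it on one chapter first and diff the result against the source, since regex cleanup can remove lines you meant to keep.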
2. A voice that fits your genre
Match voice to content. Quick taxonomy:
- Literary fiction, narrative nonfiction: en-US-Andrew (warm baritone), en-GB-Sonia (crisp British female), en-US-Aria (clear modern). DragonHD versions of all three on Creator tier deliver the most natural cadence currently available.
- Children's books, light fiction: en-US-Jenny (warm female), en-US-Davis (newscast male) with the "cheerful" expressive style at intensity 1.4.
- Technical, business, self-help: en-US-Davis (newscast cadence), en-US-AndrewMultilingual (works for multilingual technical books).
- Romance, dramatic fiction: Switch styles per chapter using the Single tab — empathetic for emotional beats, cheerful for happy scenes, sad for grief, then stitch the resulting MP3s together. The Audiobook Batch tool runs one voice per job; for full multi-character work, use the Dialogue tab in Studio.
- Sleep stories, meditation: en-US-Aria with the "whispering" style at intensity 0.6, narration speed 0.9.
You can audition every voice on the FreeTTS voices page before you commit to a long batch.
3. The right tier
For full-novel batches you need FreeTTS Creator ($39/month, or $349 once for Creator Lifetime). Free and PRO tiers are for shorter content — Creator is the only tier with the Audiobook Batch Export feature unlocked because Azure Batch Synthesis costs scale with audio length and the per-job character ceilings are higher.
The 7-day money-back guarantee covers the full first month's charge: cancel before day 7 and you get a complete refund, no questions asked.
The 3-minute setup, the 30-minute wait
Here's the actual click-by-click. Open Advanced Studio in your dashboard, then click the Audiobook tab.
Step 1: Paste the manuscript
Paste up to 2,000,000 characters in the script box. The form shows live character count, word count, and estimated audio duration as you type. A typical 90,000-word novel is around 540,000 characters and produces about 11 hours of audio — well within the per-job limit.
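The live counters in the form can be approximated offline before you open the dashboard. This is a rough sketch: the 136 words-per-minute pace is back-derived from the figures above (90,000 words in about 11 hours) and is an assumption, since real pace varies with voice and speed settings.

```python
def estimate_job(text: str, words_per_minute: float = 136.0,
                 char_limit: int = 2_000_000) -> dict:
    """Estimate size and audio length for one batch job."""
    chars = len(text)
    words = len(text.split())
    hours = words / words_per_minute / 60
    return {
        "chars": chars,
        "words": words,
        "hours": round(hours, 1),
        "fits_one_job": chars <= char_limit,  # 2,000,000-character paste limit
    }
```

For a 90,000-word input this returns an estimate of 11.0 hours, matching the figure quoted above.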
Optional: give it a title. The title becomes the ZIP filename when you download. "The Compass of Dust" → the-compass-of-dust.zip. If you skip the title, the filename is freetts-audiobook-{job_id}.zip.
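If you script your downloads, it helps to predict the filename. The sketch below reproduces the two examples above; the exact slug rules FreeTTS applies (for instance to accented characters) are not documented here, so treat the regex as an assumption.

```python
import re

def zip_filename(title: str, job_id: str) -> str:
    """Guess the ZIP name from a title, falling back to the job ID."""
    # Lowercase, then turn every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", (title or "").lower()).strip("-")
    return f"{slug}.zip" if slug else f"freetts-audiobook-{job_id}.zip"

print(zip_filename("The Compass of Dust", "abc123"))
```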
Step 2: Pick voice + split + format
Three quick decisions:
- Voice: Click the voice picker, filter by ✨ HD, pick your DragonHD narrator. The picker shows language, gender, and a short sample for each.
- Split strategy: Default Paragraph works for novels (splits on blank lines so each chapter break creates a separate audio file). Use Sentence for dense academic or business writing without paragraph breaks. Use Fixed 5K for bulk pipelines where you don't care about chapter alignment.
- Output format: WAV if you're going to edit the audio in Audacity / Pro Tools / a DAW (lossless). MP3 if you're uploading directly to a distributor (smaller files, universal compatibility). OGG for low-bandwidth web/app distribution.
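To preview how many files each split strategy would produce, the three options can be approximated locally. This is a simplified sketch, not the service's splitter: the sentence rule is naive (it will break on abbreviations such as "e.g."), and "Fixed 5K" is assumed to mean 5,000 characters.

```python
import re

def split_paragraph(text: str) -> list[str]:
    """Split on blank lines (including lines containing only whitespace)."""
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

def split_sentence(text: str) -> list[str]:
    """Naive split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_fixed(text: str, size: int = 5000) -> list[str]:
    """Cut into fixed-size blocks, ignoring word and chapter boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Calling `len(split_paragraph(manuscript))` before submitting gives a quick estimate of the number of audio files to expect in the ZIP.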
Step 3: Submit
Hit Submit. The job goes to Azure Batch Synthesis, and the page polls every 7 seconds to show live progress. You can close the tab: the job runs server-side, and when you come back the dashboard rehydrates the job state.
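The polling behaviour described above follows a common pattern that is easy to sketch. Everything here is illustrative: `fetch_status` stands in for whatever call returns the job status (no public FreeTTS API is described in this guide), and the status names other than "succeeded" are assumptions.

```python
import time
from typing import Callable

# Assumed terminal states; only "succeeded" is named in this guide.
TERMINAL = {"succeeded", "failed", "cancelled"}

def wait_for_job(fetch_status: Callable[[], str], interval: float = 7.0,
                 timeout: float = 2 * 60 * 60, sleep=time.sleep) -> str:
    """Poll until the job reaches a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)
```

Because the job state lives on the server, a loop like this can be stopped and restarted at any time without affecting the job itself.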
For a 90,000-word novel, expect 15–30 minutes of wall-clock time. For a 250,000-word epic, 45–60 minutes. The async architecture means you don't have to babysit a synchronous loop or worry about your laptop sleeping mid-job.
Step 4: Download the ZIP
When the status flips to "succeeded," a download button appears. The ZIP contains:
- Per-chapter audio files in your chosen format. One file per paragraph block (or sentence, or fixed-5K block, depending on split strategy).
- Per-chapter timing JSON — millisecond-precise word-level and sentence-level offsets straight from Azure's WordBoundary and SentenceBoundary events. Convert to SRT or VTT in two lines, or pipe directly into a karaoke player, chapter editor, or per-paragraph dashboard.
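As an illustration of the SRT conversion mentioned above, here is a minimal sketch. The timing JSON schema is not documented in this guide, so the field names `text`, `offset_ms`, and `duration_ms` are hypothetical; adjust them to match the files in your ZIP.

```python
def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def sentences_to_srt(sentences: list[dict]) -> str:
    """Build SRT text from sentence-level entries (hypothetical field names)."""
    cues = []
    for index, s in enumerate(sentences, start=1):
        start = s["offset_ms"]
        end = start + s["duration_ms"]
        cues.append(f"{index}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n{s['text']}\n")
    return "\n".join(cues)
```

VTT uses the same structure with a `WEBVTT` header line and a period instead of a comma before the milliseconds.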
The signed download URL stays live for 7 days. After that, the result is purged from the server to free up storage and keep costs down. You can re-download anytime within that window; the URL refreshes on every status poll, so you never hit an expired-token error.
Format specs every distributor wants
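Each audiobook distributor has slightly different technical requirements. The most permissive, and the one we recommend submitting to first, is Google Play Books. The other stores are stricter, so a file encoded to the tightest spec in the table (44.1 kHz MP3 at 192 kbps constant bitrate) should pass every row.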
| Distributor | Format | Sample rate | Bitrate | Per-file |
|---|---|---|---|---|
| Audible / ACX | MP3 (CBR) or WAV | 44.1 kHz | 192 kbps min | One file per chapter, < 120 min |
| Google Play Books | MP3, M4B, FLAC, WAV | 22.05 kHz min | 96 kbps min | One file per chapter |
| Apple Books | MP3 or M4A | 44.1 kHz | 128 kbps min | One file per chapter |
| Kobo Writing Life | MP3 | 44.1 kHz | 192 kbps min | One file per chapter |
| Spotify (audiobooks) | MP3 or M4B | 44.1 kHz | 128 kbps min | Single-file or chaptered |
| INaudio | MP3 | 44.1 kHz | 192 kbps min | One file per chapter |
FreeTTS WAV output is 24 kHz / 16-bit by default, which clears Google Play Books' 22.05 kHz minimum but sits below the 44.1 kHz the other distributors list, so resample before uploading to them. MP3 output is encoded at a variable bitrate equivalent to ~128 kbps, which meets the Apple, Google, and Spotify bitrate thresholds out of the box. For the 192 kbps minimum at Audible, Kobo, and INaudio, re-encode the WAV files in Audacity or with ffmpeg, resampling in the same pass: ffmpeg -i chapter.wav -ar 44100 -b:a 192k chapter.mp3.
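A quick way to see which stores a given encode satisfies is to put the table above into code. This sketch takes the table at face value: Google's sample rate is treated as a minimum and the 44.1 kHz rows as exact requirements, which is an interpretation. It does not check container format, constant versus variable bitrate, or per-file duration.

```python
# (required sample rate in Hz, True if that rate is a minimum, minimum bitrate in kbps)
SPECS = {
    "Audible / ACX": (44_100, False, 192),
    "Google Play Books": (22_050, True, 96),
    "Apple Books": (44_100, False, 128),
    "Kobo Writing Life": (44_100, False, 192),
    "Spotify (audiobooks)": (44_100, False, 128),
    "INaudio": (44_100, False, 192),
}

def meets_spec(distributor: str, sample_rate_hz: int, bitrate_kbps: int) -> bool:
    required_rate, rate_is_minimum, min_kbps = SPECS[distributor]
    if rate_is_minimum:
        rate_ok = sample_rate_hz >= required_rate
    else:
        rate_ok = sample_rate_hz == required_rate
    return rate_ok and bitrate_kbps >= min_kbps

def accepted_by(sample_rate_hz: int, bitrate_kbps: int) -> list[str]:
    """List the distributors whose sample-rate and bitrate rows are met."""
    return [name for name in SPECS if meets_spec(name, sample_rate_hz, bitrate_kbps)]

print(accepted_by(44_100, 192))
```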
Where to publish AI audiobooks in 2026
Distribution options stratify by reach and how strict the AI-narration policy is. Use this as a starting checklist:
Wide-distribution channels (one upload, many stores)
- INaudio (formerly Findaway Voices) — single upload, distributes to libraries, retailers, and subscription services worldwide. Accepts AI narration with disclosure. Best ROI for indie authors who don't want to manage 5+ portals.
- Draft2Digital Audio — wide retailer distribution; check current AI policy before submitting.
Direct-to-platform
- Audible (via ACX) — Amazon's own audiobook store. AI narration program expanded in 2025 for publishers; check the latest ACX guidelines for the exact disclosure language. Highest single-store volume in the US, UK, AU, Germany, Japan, France.
- Google Play Books — has officially supported AI-narrated audiobooks since 2023. It has the easiest submission flow: upload your manuscript and Google narrates it with its own voices, or upload your own AI narration. We use the second option because Google's voices are decent but FreeTTS DragonHD is better.
- Apple Books Digital Narration — Apple has its own AI-narration program. Self-uploaded AI audiobooks are accepted with metadata flag.
- Kobo Writing Life — accepts AI narration with no disclosure requirement at submission as of writing. Good for international reach (especially Europe, Canada, Australia).
- Spotify (audiobooks + podcasts) — Spotify Audiobooks for the audiobook channel (US-first), or release as a podcast for global reach. Disclosure required.
Niche / experimental
- Storytel, Scribd, Chirp — subscription audiobook services. AI policies vary; check before submitting.
- Direct sale on your own site — Gumroad, Stripe checkout, Lemon Squeezy. You keep 95–97 % of revenue (vs ~25 % on Audible) at the cost of doing your own marketing.
Important: distributor policies on AI narration change frequently. Always re-verify each platform's current AI-narration policy before uploading. The FreeTTS commercial license bundled with PRO and Creator covers all of the above — you don't need to license anything else from us.
When AI narration wins, when human still wins
Honest take: AI narration is the right choice for most self-publishers in most situations, but not all of them. Here's the decision matrix we use:
AI wins when:
- Cost matters. $5 of synthesis vs $2,000+ for human narration is a 400× difference. For first-time authors and tight-margin niches (genre fiction, technical, short-form), AI wins by default.
- Speed matters. 30 minutes vs 3–6 months. Critical for time-sensitive content (news, current-events nonfiction, course material that updates yearly).
- You write in multiple languages. AI can narrate the same book in 75+ languages from one master text; human narrators are language-specific and you'd need 75 separate productions.
- You'll re-render often. A new edition, an updated chapter, a typo fix — re-render the affected chunks in seconds. Human narrators charge per-session for re-records.
- You're a series author. Voice consistency across 10 books is automatic with AI; with humans you're locked to one narrator's availability and aging voice.
Human still wins when:
- Multi-character literary fiction with heavy dialogue. A great human voice actor can do five distinct character voices in one scene. AI can switch styles per paragraph but not mid-paragraph yet.
- Audiobooks that lean on the narrator's reputation. Bestseller-tier authors whose audiobooks sell partly because of a celebrity narrator (Stephen Fry on Sherlock Holmes; Tom Hanks on Ann Patchett).
- Highly emotional memoir or poetry. AI handles narrative beats well; raw emotional prose still benefits from human inflection AI hasn't fully nailed.
- Languages without strong AI voice support. Most major languages are well-covered, but some regional dialects and lower-resource languages still sound stilted.
Common mistakes (we've made all of these)
Picking the wrong split strategy
Default Paragraph split assumes blank-line breaks between paragraphs. If your manuscript has no blank lines (everything in one giant block), the splitter will produce one giant audio file — defeating the chapter structure. Either reformat the source with blank lines, OR switch to Sentence split, OR Fixed-5K split. Test on a single chapter first before submitting the full novel.
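A pre-flight check can catch the one-giant-block problem before you spend a batch on it. This is a heuristic sketch only: the 20,000-character threshold for an oversized block is an arbitrary assumption, and the returned labels are just hints for which option to pick in the form.

```python
import re

def suggest_split(text: str, max_block_chars: int = 20_000) -> str:
    """Suggest 'paragraph' or 'sentence' based on blank-line structure."""
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    longest = max((len(b) for b in blocks), default=0)
    # No blank lines at all, or one block far too long to be a real paragraph.
    if len(blocks) <= 1 or longest > max_block_chars:
        return "sentence"
    return "paragraph"
```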
Wrong voice gender for the genre
Romance readers expect specific voice profiles for the protagonist's POV. Thriller readers expect tense, lower-pitched narration. Get this wrong and reviews will tell you. When in doubt, listen to the bestselling audiobook in your subgenre and pick a similar voice from the FreeTTS catalog. Browse English female narrators and male narrators.
Missing pronunciations
Proper nouns, brand names, and made-up words in fantasy/sci-fi are where narrators go wrong. There are two fixes: (1) spell the word phonetically in the manuscript wherever you can (for example "Hermione" → "Her-MY-oh-nee", testing the respelling on a short clip first), or (2) use SSML pronunciation overrides via the Studio's pronunciation panel and re-render the affected chunks. Test on chapter 1 before batching the full book.
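For readers who want to see what an SSML override looks like, the sketch below wraps chosen words in the standard SSML `<sub alias="...">` element. Whether the Studio's pronunciation panel emits this element or a `<phoneme>` tag is not stated in this guide, so treat the output as an illustration of the idea. The alias map is hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

def apply_sub_aliases(text: str, aliases: dict[str, str]) -> str:
    """Wrap each listed word in an SSML <sub> element with a spoken alias."""
    # Escape first so &, < and > in the manuscript stay valid XML.
    out = escape(text)
    for word, spoken in aliases.items():
        pattern = r"\b" + re.escape(escape(word)) + r"\b"
        replacement = f"<sub alias={quoteattr(spoken)}>{escape(word)}</sub>"
        # A function replacement avoids backslash handling in the alias text.
        out = re.sub(pattern, lambda match, r=replacement: r, out)
    return out

print(apply_sub_aliases("Hermione & Ron", {"Hermione": "Her-MY-oh-nee"}))
```

Keep aliases free of other listed words, because a later substitution could otherwise match inside an earlier tag.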
Stripping italics and emphasis
Plain text loses italics. Some narrators slightly emphasize italicized words; AI doesn't see them at all. Either preserve emphasis with explicit punctuation ("Don't ever say that" → "Don't... ever... say that") or accept that emphasis flatlines.
Skipping the test chunk
Always render chapter 1 alone first, listen end-to-end, fix what's wrong, THEN batch the full book. A bad voice choice on a 90,000-word novel is 30 minutes of wasted synthesis and a re-batch. A bad voice choice on chapter 1 is 2 minutes.
FAQ
What's the difference between this and PDF to Audiobook?
PDF to Audiobook is a free tool that takes a PDF, auto-detects chapters from the table of contents, and generates per-chapter MP3s through the real-time TTS pipeline. Perfect for readers, students, and accessibility use. Text to Audiobook (this guide) is a Creator-tier batch tool for production audiobook export — multi-format output, 2M-char jobs, timing JSON. Use PDF flow if you have a PDF, text flow if you have a manuscript.
Can I make money selling AI-narrated audiobooks?
Yes. The commercial license bundled with PRO and Creator covers selling on every major distributor. We have indie authors making $200–$3,000 per month from AI-narrated audiobooks distributed through Audible, INaudio, and Google Play Books. Ceiling depends on the source book's quality, niche, and marketing.
Do listeners care that it's AI?
70 % of consumers say they'd be willing to listen to AI-narrated audiobooks (down from 77 % in 2023). The remaining 30 % is concentrated in literary fiction; genre fiction, business, self-help, and technical readers care much less. Disclosure is the right move regardless — readers respect transparency, and most distributors require it anyway.
Can I use the timing JSON for video subtitles too?
Yes. The same word-level + sentence-level boundaries work for video. Generate per-chapter SRT or VTT from the JSON, mux with chapter cover art (any 1400×1400 image), and you have YouTube-ready audiobook chapter videos with synced karaoke captions. We've seen authors do exactly this for marketing — Chapter 1 as a free YouTube video drives sales of the full audiobook on Audible.
What if my manuscript is over 2M characters?
Split into multiple jobs. A 1,500,000-word epic fantasy = ~9 million characters = 5 batch jobs of ~300K words each. Submit them in sequence; each job is independent. Total wall-clock time for 5 sequential 90-min jobs is ~7.5 hours — overnight is fine.
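One way to do the split is to pack whole paragraphs into jobs greedily so that no job exceeds the limit and no paragraph is cut in half. This is a sketch under the 2,000,000-character limit stated above; in practice you would probably pack whole chapters instead so each job ends on a chapter boundary.

```python
import math

def pack_into_jobs(paragraphs: list[str], limit: int = 2_000_000,
                   sep: str = "\n\n") -> list[str]:
    """Greedily group paragraphs into job-sized texts of at most `limit` characters."""
    jobs, current, size = [], [], 0
    for paragraph in paragraphs:
        if len(paragraph) > limit:
            raise ValueError("a single paragraph exceeds the per-job limit")
        added = len(paragraph) + (len(sep) if current else 0)
        if current and size + added > limit:
            jobs.append(sep.join(current))      # close the full job
            current, size = [paragraph], len(paragraph)
        else:
            current.append(paragraph)
            size += added
    if current:
        jobs.append(sep.join(current))
    return jobs

# The example above: about 9 million characters against a 2 million limit.
print(math.ceil(9_000_000 / 2_000_000))  # minimum number of jobs
```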
How do I cancel a running batch?
Use the Cancel button on the running job card. Cancellation is best-effort against Azure: sometimes the job has already completed by the time you click, and an Azure 404 is treated as success because your intent is satisfied either way. Cancellation stops the job from counting against your monthly character budget.
What's the lifetime deal — should I get it?
PRO Lifetime is $199 once, pays back in month 11. Creator Lifetime is $349 once, pays back in month 9 vs the $39/month subscription. First 100 founder seats only — currently 19 left. Both include every future feature added to PRO/Creator. If you're publishing more than one audiobook over the next year, Creator Lifetime is the obvious math.
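The payback months follow from dividing the one-time price by the monthly price and rounding up. The Creator figures come from this guide; the PRO monthly price is not stated here, so the $19 used below is an assumption chosen because it reproduces the quoted month 11.

```python
import math

def payback_month(lifetime_price: float, monthly_price: float) -> int:
    """First month in which cumulative subscription cost reaches the lifetime price."""
    return math.ceil(lifetime_price / monthly_price)

print(payback_month(349, 39))  # Creator Lifetime vs $39/month
print(payback_month(199, 19))  # PRO Lifetime vs an assumed $19/month
```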
The whole stack, in one paragraph
Open Advanced Studio → Audiobook tab → paste your manuscript → pick a DragonHD voice → pick output format → submit. Wait 15–30 minutes. Download the ZIP. Upload to Audible via ACX, Google Play Books, Apple Books, Kobo Writing Life, INaudio, and Spotify. Disclose AI narration where required. Profit. Repeat for the next book in your series in the same voice. The whole pipeline that used to take 3–6 months and $2,000–4,000 now takes one afternoon and the cost of a Netflix subscription.
Try it free at freetts.org/text-to-audiobook.
