REST API · No Key Required · Free Forever

Free Text to Speech API
for Developers

A real REST endpoint backed by Microsoft Neural voices. POST your text, get an MP3 back. No API key, no OAuth dance, no $0.006/character meter running in the background.

4 Endpoints
400+ Neural Voices
20/min Rate Limit
$0 Cost

Get audio in under 60 seconds

No setup, no account, no waiting for an API key email that never arrives. Pick your language, copy the code, run it. That's it.

bash
# Step 1: Generate audio (--fail makes curl exit non-zero on 4xx/5xx responses)
curl -sS --fail -X POST https://freetts.org/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello from FreeTTS API","voice":"en-US-JennyNeural","rate":"+0%","pitch":"+0Hz"}' \
  -o response.json

# Step 2: Extract file_id and download MP3
FILE_ID=$(python3 -c "import sys,json; print(json.load(sys.stdin)['file_id'])" < response.json)
curl -sS --fail "https://freetts.org/api/audio/$FILE_ID" -o speech.mp3
Python
import requests

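# Generate speech
response = requests.post("https://freetts.org/api/tts", json={
    "text": "Hello from FreeTTS API",
    "voice": "en-US-JennyNeural",
    "rate": "+0%",
    "pitch": "+0Hz"
}, timeout=30)
response.raise_for_status()  # surface 400 / 429 / 500 errors instead of a KeyError below
file_id = response.json()["file_id"]

# Download the MP3 (available for 1 hour after generation)
audio = requests.get(f"https://freetts.org/api/audio/{file_id}", timeout=30)
audio.raise_for_status()  # 404 means the file expired or was never created
with open("speech.mp3", "wb") as f:
    f.write(audio.content)

print("Done. Saved as speech.mp3")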
JavaScript
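// Generate speech (top-level await needs an ES module or an async function)
const res = await fetch('https://freetts.org/api/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Hello from FreeTTS API',
    voice: 'en-US-JennyNeural',
    rate: '+0%',
    pitch: '+0Hz'
  })
});
if (!res.ok) throw new Error(`TTS request failed: HTTP ${res.status}`);
const { file_id } = await res.json();

// Download MP3
const audio = await fetch(`https://freetts.org/api/audio/${file_id}`);
if (!audio.ok) throw new Error(`Audio download failed: HTTP ${audio.status}`);
const blob = await audio.blob();
const url = URL.createObjectURL(blob);

// Play it (browsers may block playback until the user has interacted with the page)
new Audio(url).play();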
Node.js
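const https = require('https');
const fs = require('fs');

// POST /tts and resolve with the file_id
function tts(text, voice = 'en-US-JennyNeural') {
  return new Promise((resolve, reject) => {
    const body = JSON.stringify({ text, voice, rate: '+0%', pitch: '+0Hz' });
    const req = https.request({
      hostname: 'freetts.org',
      path: '/api/tts',
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'Content-Length': Buffer.byteLength(body) }
    }, res => {
      let data = '';
      res.on('data', chunk => data += chunk);
      res.on('end', () => {
        if (res.statusCode !== 200) {
          return reject(new Error(`TTS failed: HTTP ${res.statusCode} ${data}`));
        }
        try {
          resolve(JSON.parse(data).file_id);
        } catch (err) {
          reject(err); // malformed JSON would otherwise throw inside the event handler
        }
      });
    });
    req.on('error', reject);
    req.write(body);
    req.end();
  });
}

// GET /audio/{file_id} and stream the MP3 to disk
tts('Hello from Node.js')
  .then(id => {
    https.get(`https://freetts.org/api/audio/${id}`, res => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        return console.error(`Download failed: HTTP ${res.statusCode}`);
      }
      res.pipe(fs.createWriteStream('speech.mp3'));
    }).on('error', console.error);
  })
  .catch(console.error);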

Endpoints

Four endpoints, all straightforward. Generate audio, download it, grab the subtitles, or fetch the full voice list. No authentication headers, no tokens.

Base URL: https://freetts.org/api
POST /tts

Generate text-to-speech audio. Send JSON with your text and voice preferences and get back a file_id you can use to download the MP3 and SRT.

Parameter · Type · Required · Default · Description
text · string · required · (none) · The text to synthesize. Max 5000 characters.
voice · string · optional · en-US-JennyNeural · Voice name from GET /voices. Format: locale-NameNeural.
rate · string · optional · +0% · Speaking speed as a percentage offset. Range: -50% to +100%.
pitch · string · optional · +0Hz · Pitch offset in Hz from the baseline. Range: -20Hz to +20Hz.
Response — 200 OK
JSON
{ "file_id": "a3f7c012-58b4-4e2a-9d1c-0f83abc12345" }

Errors: 400 invalid input · 429 rate limit exceeded · 500 synthesis failed. Rate limit: 20 req/min per IP.
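The same request can be built with Python's standard library alone. This is a minimal sketch, not an official client: `build_tts_request` and `synthesize` are hypothetical helper names, the empty-text check is a client-side assumption, and the 5000-character check counts Python characters, which may not match exactly how the server counts. `urlopen` raises `urllib.error.HTTPError` for the 400, 429 and 500 responses.

```python
import json
import urllib.request

API_BASE = "https://freetts.org/api"
MAX_CHARS = 5000  # documented limit for the text field


def build_tts_request(text, voice="en-US-JennyNeural", rate="+0%", pitch="+0Hz"):
    """Build (but do not send) a POST /tts request with a JSON body."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")  # assumption: checked client-side
    if len(text) > MAX_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_CHARS}")
    body = json.dumps(
        {"text": text, "voice": voice, "rate": rate, "pitch": pitch}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, **options):
    """Send the request and return the file_id from the JSON response."""
    req = build_tts_request(text, **options)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["file_id"]
```

Separating "build" from "send" keeps the validation testable without touching the network.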

GET /audio/{file_id}

Download the generated MP3 file. Use the file_id returned by POST /tts. Files are available for 1 hour, then auto-deleted.

Parameter · Type · Location · Description
file_id · string (UUID) · URL path · UUID returned from POST /tts.
Response — 200 OK

Content-Type: audio/mpeg · Content-Disposition: attachment; filename="freetts-audio.mp3"

Errors: 400 invalid UUID format · 404 file expired or not found.
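Because a malformed ID returns 400 and files disappear after an hour, it can help to validate the `file_id` locally and treat 404 as "expired". A sketch using only the standard library; `audio_url` and `download_audio` are illustrative names, and turning a 404 into `FileNotFoundError` is a design choice of this sketch, not API behavior.

```python
import uuid
import urllib.error
import urllib.request

API_BASE = "https://freetts.org/api"


def audio_url(file_id):
    """Build the GET /audio URL after checking that file_id parses as a UUID."""
    try:
        uuid.UUID(file_id)
    except (ValueError, TypeError, AttributeError):
        raise ValueError(f"not a valid UUID: {file_id!r}") from None
    return f"{API_BASE}/audio/{file_id}"


def download_audio(file_id, path="speech.mp3"):
    """Save the MP3 to path; a 404 means the file expired or never existed."""
    try:
        with urllib.request.urlopen(audio_url(file_id), timeout=30) as resp, open(path, "wb") as f:
            f.write(resp.read())
    except urllib.error.HTTPError as err:
        if err.code == 404:
            raise FileNotFoundError(f"{file_id} expired or not found; regenerate it") from err
        raise
    return path
```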

GET /srt/{file_id}

Download the SRT subtitle file that matches the generated audio. The timestamps are word-level accurate, derived directly from the voice synthesis metadata — not estimated after the fact.

Parameter · Type · Location · Description
file_id · string (UUID) · URL path · Same UUID as the audio. Both expire at the same time.
Response — 200 OK

Content-Type: text/plain · filename: freetts-subtitles.srt

Errors: 400 invalid UUID · 404 expired or not found. Same 1-hour expiry as audio.
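If you want the word timings in code rather than as a file, SRT is simple to parse. This sketch assumes the endpoint returns standard SRT (numbered cues, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines, blank lines between cues); `parse_srt` is a hypothetical helper.

```python
import re

# Matches the standard SRT timing line "HH:MM:SS,mmm --> HH:MM:SS,mmm".
TIMING = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})")


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Parse SRT text into a list of (start_seconds, end_seconds, caption) tuples."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                caption = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), caption))
                break
    return cues
```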

GET /voices

Returns the full list of available voices as a JSON array. No parameters, no rate limit. Cache this response — the voice list doesn't change often and it's a big payload.

Response — 200 OK (excerpt)
JSON
[
  {
    "Name": "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)",
    "ShortName": "en-US-JennyNeural",
    "Gender": "Female",
    "Locale": "en-US",
    "SuggestedCodec": "audio-24khz-48kbitrate-mono-mp3",
    "FriendlyName": "Microsoft Jenny Online (Natural) - English (United States)"
  },
  ...
]

Use the ShortName field as the voice parameter in POST /tts.
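Since the list is large and changes rarely, a small on-disk cache plus a filter over the `Locale`, `Gender`, and `ShortName` fields shown in the excerpt covers most needs. This is a sketch: the cache file name and the 24-hour refresh interval are arbitrary assumptions, and `load_voices` and `pick_voices` are hypothetical helpers.

```python
import json
import time
import urllib.request
from pathlib import Path

VOICES_URL = "https://freetts.org/api/voices"
CACHE = Path("voices_cache.json")  # hypothetical local cache file
MAX_AGE = 24 * 3600  # assumption: refreshing once a day is often enough


def load_voices():
    """Return the voice list, reusing the local cache if it is younger than MAX_AGE."""
    if CACHE.exists() and time.time() - CACHE.stat().st_mtime < MAX_AGE:
        return json.loads(CACHE.read_text(encoding="utf-8"))
    with urllib.request.urlopen(VOICES_URL, timeout=30) as resp:
        data = resp.read().decode("utf-8")
    CACHE.write_text(data, encoding="utf-8")
    return json.loads(data)


def pick_voices(voices, locale_prefix, gender=None):
    """ShortNames whose Locale starts with locale_prefix, optionally filtered by Gender."""
    return [
        v["ShortName"]
        for v in voices
        if v["Locale"].startswith(locale_prefix) and (gender is None or v["Gender"] == gender)
    ]
```

For example, `pick_voices(load_voices(), "es", gender="Male")` lists the male Spanish voices across all Spanish locales.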

Request parameters, explained properly

The quick start gets you going, but here's what the parameters actually do — including the less obvious bits.

voice

The voice name determines everything: language, accent, gender, and speaking style. The format is [locale]-[VoiceName]Neural or [locale]-[VoiceName]MultilingualNeural for multilingual voices. Multilingual voices can switch languages mid-sentence, which is useful if your text mixes languages.

The full list comes from GET /api/voices. It returns 400+ voices. Some popular starting points:

en-US-JennyNeural · en-US-AndrewMultilingualNeural · es-ES-AlvaroNeural · fr-FR-DeniseNeural · ar-SA-ZariyahNeural · ja-JP-KeitaNeural · zh-CN-XiaoxiaoNeural · de-DE-ConradNeural · hi-IN-MadhurNeural · pt-BR-FranciscaNeural
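The naming pattern can be split apart programmatically, for example to group voices by locale or flag the multilingual ones. This sketch relies only on the `[locale]-[VoiceName]Neural` and `[locale]-[VoiceName]MultilingualNeural` patterns described above and treats everything before the last hyphen as the locale; `parse_short_name` is a hypothetical helper.

```python
def parse_short_name(short_name):
    """Split a ShortName such as 'en-US-AndrewMultilingualNeural' into its parts.

    Returns (locale, speaker, is_multilingual). Taking everything before the
    last hyphen as the locale also copes with longer locale tags.
    """
    locale, _, name = short_name.rpartition("-")
    if not locale or not name.endswith("Neural"):
        raise ValueError(f"unexpected voice name format: {short_name!r}")
    name = name[: -len("Neural")]
    multilingual = name.endswith("Multilingual")
    if multilingual:
        name = name[: -len("Multilingual")]
    return locale, name, multilingual
```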

rate

Controls how fast the voice speaks, as a percentage relative to the voice's natural default speed. +0% is the default. +50% means 50% faster than normal — good for short instructional content. -30% slows it down, useful for language learners or accessibility tools.

The useful range is roughly -50% to +100%. Go beyond that and it starts to sound unnatural. The voice synthesis engine doesn't always enforce hard limits, but the quality drops off noticeably past those boundaries.

-50% → very slow · -20% → relaxed · +0% → default · +25% → brisk · +50% → fast · +100% → very fast

pitch

Adjusts pitch in Hz, relative to the voice's natural baseline. +10Hz gives a slightly higher, brighter tone. -10Hz adds depth — good for narration or authoritative reads. It's subtle at low values and sounds increasingly artificial past ±15Hz. The practical range is -20Hz to +20Hz.

-20Hz → deep · -10Hz → lower · +0Hz → natural · +10Hz → higher · +20Hz → bright
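If your UI exposes a speed slider or pitch control, it helps to format values into the exact strings the API expects (`+25%`, `-10Hz`). This sketch clamps to the practical ranges given for rate (-50% to +100%) and pitch (-20Hz to +20Hz) and rounds to whole numbers; both choices are assumptions, since how the server handles fractional or out-of-range values isn't documented here. The function names are hypothetical.

```python
def format_rate(percent):
    """Format a speed offset such as 25 as '+25%', clamped to -50..+100."""
    percent = max(-50, min(100, round(percent)))
    return f"{percent:+d}%"


def format_pitch(hz):
    """Format a pitch offset such as -10 as '-10Hz', clamped to -20..+20."""
    hz = max(-20, min(20, round(hz)))
    return f"{hz:+d}Hz"


def rate_from_multiplier(multiplier):
    """Convert a playback multiplier (1.25 = 25% faster) into a rate string."""
    return format_rate((multiplier - 1) * 100)
```

Clamping on the client keeps requests inside the range where, per the notes above, the voices still sound natural.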

Drop a TTS widget into any webpage — one script tag

This is for teams who want to add a TTS widget to their own site without building a UI from scratch. Copy the snippet (a container div plus a small loader script) and paste it anywhere. The widget handles input, voice selection, playback, and MP3 download.

HTML
<!-- FreeTTS Embed Widget -->
<div id="freetts-widget"></div>
<script>
  (function() {
    var s = document.createElement('script');
    s.src = 'https://freetts.org/embed.js';
    s.dataset.voice = 'en-US-JennyNeural';
    s.dataset.theme = 'auto'; // 'light', 'dark', or 'auto'
    s.dataset.lang = 'en';
    s.dataset.placeholder = 'Type your text here...';
    document.head.appendChild(s);
  })();
</script>
Coming soon: embed.js is in development. Drop your email to be notified when it's ready. The API endpoints above are live and working right now.

Widget parameters

Attribute · Values · Default · Description
data-voice · any voice ShortName · en-US-JennyNeural · Pre-selected default voice in the widget
data-theme · light / dark / auto · auto · Widget color scheme; auto follows the system preference
data-lang · language code (e.g. en, fr, ar) · en · Default language filter in the voice dropdown
data-placeholder · any string · Type your text here... · Placeholder text in the input area
data-max-chars · number · 5000 · Maximum character limit enforced in the widget input

Rate limits — what they are and why

Short version: 20 requests per minute, no daily cap, no monthly cap. Here's the longer version.

20 requests per minute per IP
1 hr file storage before auto-delete
No daily or monthly request cap

This is a shared public API built on Microsoft's Edge TTS infrastructure. The 20 req/min window exists to keep it usable for everyone — not as a paywall. If you stay under 20 requests per minute, there's no daily cap and no monthly quota. Build whatever you want.

Generated files are deleted 1 hour after they're created. This keeps storage costs down and means no user audio sits around on the server indefinitely. If you need a file again, regenerate it: synthesis takes 1 to 3 seconds. The GET /voices endpoint has no rate limit, but its response is large and rarely changes, so cache the list locally instead of fetching it on every request.

Rate limit error response (HTTP 429):

JSON
{
  "detail": "Too many requests. Please wait a minute."
}

When you hit 429, wait 60 seconds for the window to clear. In code, catch the 429, sleep 60 seconds, then retry; if the retry also returns 429, wait longer before the next attempt (exponential backoff). Don't retry immediately: it will just keep returning 429 until the window clears.
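A minimal retry sketch in Python: the first wait is one full 60-second window, and each further 429 doubles it. `with_retry` is a hypothetical helper; `send` stands for whatever function performs your request and returns `(status_code, body)`, and `sleep` is injectable so the logic can be tested without actually waiting.

```python
import time


def with_retry(send, max_attempts=4, first_wait=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status, backing off between attempts.

    The first wait covers one full 60-second rate-limit window; each further
    429 doubles the wait. Raises RuntimeError once max_attempts are used up.
    """
    wait = first_wait
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        sleep(wait)
        wait *= 2
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```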

Need more than 20 requests per minute for a production application? Get in touch — we can discuss options for higher-volume use cases.

Popular voices to start with

10 well-tested voices across the most common languages. These all work well out of the box. The full list of 400+ voices is at freetts.org/voices or GET /api/voices.

Voice name (ShortName) · Language · Gender · Style
en-US-JennyNeural · English (US) · Female · Conversational
en-US-AndrewMultilingualNeural · English (US) · Male · Multilingual
en-US-AriaNeural · English (US) · Female · Natural
es-ES-AlvaroNeural · Spanish (Spain) · Male · Professional
fr-FR-DeniseNeural · French (France) · Female · Elegant
ar-SA-ZariyahNeural · Arabic (Saudi Arabia) · Female · Clear
ja-JP-KeitaNeural · Japanese · Male · Natural
zh-CN-XiaoxiaoNeural · Chinese (Mandarin) · Female · Warm
de-DE-ConradNeural · German · Male · Professional
hi-IN-MadhurNeural · Hindi · Male · Expressive

Full list of 400+ voices: freetts.org/voices or GET /api/voices

What people build with this

A free TTS API without an account requirement opens up a lot of projects that would've been impractical with a paid service. Here are six that make a lot of sense.

Accessibility Tools

Build screen readers, reading assistants, and dyslexia support tools without licensing $0.006/character voice APIs. The math adds up fast at scale — FreeTTS keeps it workable.

🌍

Language Learning Apps

Generate pronunciation audio for vocabulary drills. 75+ languages, multiple accents per language. A Spanish learner can hear both es-ES-AlvaroNeural and es-MX-JorgeNeural for the same word.

🎙️

Podcast Automation

Turn article URLs into podcast episodes. Parse the text, call the API, upload the MP3 to your podcast host. It's a three-step pipeline that takes maybe 50 lines of Python.
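Articles often exceed the 5000-character limit per request, so a podcast pipeline needs to split the text first. This sketch (with a hypothetical `chunk_text` helper) breaks at sentence-ending punctuation and hard-splits any single sentence longer than the limit. Each chunk then becomes its own POST /tts call with its own file_id and MP3; joining the resulting audio is outside this sketch, and long articles need pacing to stay under 20 requests per minute.

```python
import re

MAX_CHARS = 5000  # documented per-request limit for POST /tts


def chunk_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a single over-long sentence so no chunk exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```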

🧩

Browser Extensions

Chrome extensions that read selected text aloud. No API key to ship with the extension, no costs to track per-user. The rate limit is per-IP, so individual users are their own buckets.

🎓

Education Platforms

Auto-generate lecture audio from slide text. Works for Moodle, Canvas, or any custom LMS. Pair it with the SRT endpoint and you've got fully captioned audio without manual transcription.

🎬

Video Production Pipelines

Feed scripts through the API, get MP3 and SRT back, pipe both into your video editor programmatically. The SRT word-timing syncs directly with the audio — no manual alignment needed.
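Browser video players (the HTML5 `<track>` element) read WebVTT rather than SRT. The two formats are close: WebVTT adds a `WEBVTT` header and uses a period instead of a comma before the milliseconds. A minimal converter sketch, assuming standard SRT input; `srt_to_vtt` is a hypothetical helper.

```python
import re

# SRT separates milliseconds with a comma; WebVTT uses a period.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT subtitles to WebVTT; cue numbers are kept as cue identifiers."""
    body = srt_text.replace("\r\n", "\n").strip()
    converted = []
    for line in body.split("\n"):
        if "-->" in line:  # only timing lines are rewritten, never caption text
            line = _SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"
```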

What's actually running under the hood

No mystery boxes. Here's exactly how FreeTTS works.

FreeTTS is built on edge-tts, an open-source Python library that interfaces with Microsoft's Edge browser read-aloud service. The voices are Microsoft Neural TTS voices — the same ones used in Edge browser, Azure Cognitive Services, and Microsoft Office. If you've used Edge's read-aloud feature and thought it sounded surprisingly good, that's the same engine.

The backend is FastAPI (Python), running on a Hetzner VPS behind Cloudflare. Audio files are stored temporarily for 1 hour and then deleted automatically. No audio content is logged, retained, or used for any purpose after delivery, and the server keeps no record of the text you sent or the voice you used.

The voice list — 400+ voices across 75+ languages — comes directly from Microsoft's Edge TTS service. When you call GET /api/voices, you're getting the current list as it exists at that moment. Voice availability depends on Microsoft maintaining their service.

Honest note: This is not a proprietary AI voice model. It's Microsoft's neural voice infrastructure, made freely accessible through the edge-tts library. If Microsoft changes their Edge TTS service, this API could be affected. For production applications that need guaranteed uptime SLAs, consider Azure Cognitive Services directly — it's paid, but it comes with proper enterprise guarantees. FreeTTS is the right choice for personal projects, prototypes, tools, and anything where occasional brief downtime is acceptable.

Common questions

Do I need an API key?
No. FreeTTS API is completely open: no authentication, no API key, no account. Just send the POST request and you'll get audio back. The only constraint is 20 requests per minute per IP address. There's no registration form, no email verification, no waiting. It works the first time you try it.
Can I use it in commercial projects?
Yes. Build apps, extensions, automations, whatever you need. The API is free to use in commercial projects. Attribution is appreciated but not required. There's no license agreement to accept and no terms that restrict commercial use. Just don't abuse the rate limit and it'll keep being free.
How long are generated files kept?
One hour after generation. After that, the files are automatically deleted from the server, so download your MP3 and SRT within that window. If you need a file again, just regenerate it: synthesis takes 1 to 3 seconds depending on text length, so there's no real reason to hold onto the URL. Your application should download the file immediately after getting the file_id back.
Can I get subtitles as well as audio?
Yes. Every /api/tts call generates both an MP3 and an SRT file. Use the same file_id with GET /api/srt/{file_id} to download the subtitle file. The SRT timestamps are word-level accurate because they come directly from the voice synthesis metadata, which makes them more accurate than timings produced by transcribing the audio afterwards.
Which voices can I use?
Any voice name returned by GET /api/voices. That endpoint returns a JSON array of 400+ voices, each with a ShortName field; that's the value you pass as the voice parameter in your POST request. Examples: en-US-JennyNeural, ja-JP-KeitaNeural, ar-SA-ZariyahNeural. You can also browse them visually at freetts.org/voices.
What happens if I exceed the rate limit?
You'll get HTTP 429 with the response body {"detail": "Too many requests. Please wait a minute."}. The rate window is a rolling 60-second window per IP address; wait 60 seconds and it clears. In your code, implement simple retry logic: catch the 429 response, wait 60 seconds, then retry the request. Don't retry immediately in a tight loop, because the window won't clear until the full 60 seconds have passed.