You already shot the thing. Now it needs a voice. Upload your clip, paste what you want said, pick from 400+ voices in 75+ languages, and we drop a clean narration straight onto the footage and fit it to the exact length. Replace the audio, or talk over it. No editor. No timeline. You get a finished MP4 back, not an audio file you have to go wrestle into some other app.
Script → MP4 writes a vertical/square/wide video from text alone with the voice and captions baked in. Upload + add VO takes a video you already have and lays the voice and captions on top. No signup for either, free taste.
Free taste, no signup. Full-length scripts, larger uploads, brand-kit presets, and your own caption customization come with Creator. The TTS-only tool lives on the main page.
Drop a clip, paste your script, pick a voice. We voice it and fit it to your video. Want to build a video from a script instead, with b-roll and captions? Use our AI faceless video generator. Our text to speech tool and PDF to MP3 are free too.
Let me be specific, because a lot of tools blur this. This is not a tool that makes a video for you. You already have the video. Maybe you filmed it, maybe you screen recorded it, maybe a designer handed you a motion graphics clip. The picture is done. What is missing, or what is wrong, is the voice. That is the one thing this fixes.
So you upload the clip. You paste in the script, the actual words you want spoken over it. You pick who says them from 400+ voices. And we generate that narration and lay it straight onto your footage, lined up to the exact length of the clip, then hand you back a single finished MP4. The frames you shot never change. Only the audio riding on top does.
Here is the part people miss until they have felt the pain. Making an AI voice is the easy half. Loads of tools do that and spit out an MP3. The hard half, the half that quietly burns an hour, is getting that audio onto the right video, starting at the right moment, ending without running off the cliff, exported as one file. We do that half for you. That is the difference between an audio generator and this.
And the voices are not the flat robots from a decade ago. They breathe. They pause where a person would. They put weight on the words that matter, mostly. Not perfect, they still fumble a strange acronym here and there, but good enough that most viewers never notice, and a hundred times better than dead air or a block of text sitting on the screen.
Here is how this normally goes, and you have probably lived it. You write the script in one place. You paste it into a voice tool and download an MP3. You open a video editor, drag your clip onto the timeline, drag the MP3 onto an audio track underneath. Then you scrub back and forth, nudging the audio so the first word does not land while the screen is still black. The narration runs a few seconds long, so you either speed it up by hand or trim a sentence. You lower the original audio if you want to keep the music. Then you export. Then you realize the level is off and you do it again.
That is three tools and a lot of patience for what should be one step. And the worst part is the sync. Your AI voice almost always comes out three to five seconds shorter or longer than the clip, and there is no button anywhere that says "just make this fit the video." So you do it by hand, every single time, for every single video.
We took that whole shuffle and turned it into one screen. Upload, paste, pick a voice, choose replace or talk over, hit go. The fitting happens for you. No editor to learn, no tracks to align, no export settings to second guess. You came here with a video and a script. You leave with a voiced video. That is the entire job.
Same upload, two very different results. Pick based on whether the original sound is worth keeping.
Replace. The original audio goes away completely and the new voice takes over. This is what you want when the original sound is the problem. A rough mic, wind, an echoey room, a take where you mumbled, or silent footage that needed narration in the first place. Clean slate, new voice, done.
Talk over. The original audio stays, ducked low underneath, while the new voice sits on top. This is the one for footage that already has a vibe. Music you picked, crowd noise, gameplay sound, the hum of a real room. The clip keeps feeling alive and the narration explains what is happening. Almost no other tool ships this as a one-click choice. Most only do replace.
Quick rule of thumb. If you would be sad to lose the original sound, use talk over. If the original sound is the reason you are here, use replace. You can run the same clip both ways and keep whichever one feels right. It costs you nothing but a second render.
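For the curious, here is a rough sketch of what the two modes amount to if you did them by hand with ffmpeg. It is an illustration only, not a description of how this tool works inside. The function name `build_mix_command`, the file names, and the 0.2 ducking level are all assumptions made up for the example, and the ducking shown is a plain fixed volume drop, nothing adaptive.

```python
from typing import List


def build_mix_command(video: str, voice: str, out: str,
                      mode: str = "replace", duck_level: float = 0.2) -> List[str]:
    """Build an ffmpeg argument list that puts `voice` onto `video`.

    mode="replace":   drop the original audio and keep only the new voice.
    mode="talk_over": keep the original audio at `duck_level` volume and
                      mix the new voice on top of it.
    The video stream is copied untouched in both modes.
    """
    cmd = ["ffmpeg", "-y", "-i", video, "-i", voice]
    if mode == "replace":
        # Picture from input 0, audio from input 1 only.
        cmd += ["-map", "0:v:0", "-map", "1:a:0"]
    elif mode == "talk_over":
        # Lower the original track, then mix it with the voice.
        # duration=first keeps the mix as long as the original audio.
        graph = (f"[0:a]volume={duck_level}[bg];"
                 f"[bg][1:a]amix=inputs=2:duration=first[mix]")
        cmd += ["-filter_complex", graph, "-map", "0:v:0", "-map", "[mix]"]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Copy the frames as they are; only the audio is re-encoded.
    cmd += ["-c:v", "copy", "-c:a", "aac", out]
    return cmd
```

Passing the returned list to `subprocess.run` would run it, assuming ffmpeg is installed. The `amix` filter rescales its inputs by default, so the levels may need a touch-up afterward.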
This is the bit that quietly matters most, so let me spell it out. Before we generate a single word, we read how long your clip actually is. Then we shape the narration to land inside that window. If the read comes out a hair long, we tighten the pace, but only inside a range that still sounds like a person talking, never the chipmunk speed-up you get from cranking playback rate in an editor. If it comes out short, we pad the end with a beat of quiet so the audio settles instead of slamming to a stop.
And if your script is just way too long or way too short for the clip, we do not silently ship something awkward. We tell you, right there, so you can trim a sentence or add one. No surprises after the fact, no re-uploading three times to figure out why it feels off.
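The fitting logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tool's actual code: the name `plan_fit`, the 1.15 ceiling on speed-up, and the "more than half the clip is silence" threshold are invented for the example, since the real ranges are not published here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FitPlan:
    tempo: float            # rate multiplier for the voice (1.0 = unchanged)
    pad_seconds: float      # silence appended after the voice
    warning: Optional[str]  # set when the script does not fit naturally


def plan_fit(voice_seconds: float, clip_seconds: float,
             max_tempo: float = 1.15, max_pad_ratio: float = 0.5) -> FitPlan:
    """Decide how to land a narration inside a clip of known length.

    max_tempo and max_pad_ratio are assumed limits for this sketch.
    """
    if voice_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")

    if voice_seconds > clip_seconds:
        needed = voice_seconds / clip_seconds
        if needed > max_tempo:
            # Speeding up this much would stop sounding like a person.
            return FitPlan(1.0, 0.0, "script is too long for the clip; trim a sentence")
        return FitPlan(needed, 0.0, None)

    # Voice is shorter (or equal): leave the pace alone and pad the end.
    pad = clip_seconds - voice_seconds
    warning = None
    if pad > clip_seconds * max_pad_ratio:
        warning = "script is much shorter than the clip; consider adding a line"
    return FitPlan(1.0, pad, warning)
```

With these assumed limits, a 33 second read on a 30 second clip gets a gentle speed-up, a 28 second read gets two seconds of quiet at the end, and an 80 second read on a 15 second clip comes back with a warning instead of a render.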
That is the feature people search for without knowing the words for it. "Make the voiceover match my video length." Nobody owns it because almost nobody built it. We did, because it is the exact thing that turns a ten-minute editor chore into a non-event.
1. Drop an MP4, MOV, MKV, or WebM. We read its exact length right away so the voice can be fit to it at the end.
2. Type or paste the narration, choose from 400+ voices across 75+ languages, and pick replace or talk over.
3. Hit go. We narrate, auto-fit it to your clip, merge it in, and hand back the finished MP4 plus the standalone audio.
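If you ever want to check a clip's exact length yourself before writing to it, one common way is ffprobe, which ships with ffmpeg. The sketch below is just that, a sketch: the helper names are made up for the example, and it says nothing about how the length is read on our side.

```python
import subprocess
from typing import List


def ffprobe_duration_command(path: str) -> List[str]:
    """Arguments that make ffprobe print only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(ffprobe_output: str) -> float:
    """Turn ffprobe's one-line output (e.g. '12.345000') into a float."""
    return float(ffprobe_output.strip())


def clip_duration(path: str) -> float:
    """Run ffprobe on `path` and return its length in seconds.

    Requires ffprobe on the PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(ffprobe_duration_command(path),
                            capture_output=True, text=True, check=True)
    return parse_duration(result.stdout)
```

Calling `clip_duration("clip.mp4")` (a placeholder file name) gives you the number of seconds to write your script against.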
Not "anyone with a video." Specific jobs people do here over and over.
You have one good video and an audience in five countries. Paste a translated script, pick a voice that speaks the language, and you have a Spanish cut, a Hindi cut, a German cut, without filming again. The AI dubbing market is growing fast for exactly this reason. It is the cheapest reach you can buy.
Great footage, rough sound. Wind, a cheap mic, an echoey room, traffic. Instead of reshooting, write out what was said, pick a clean voice, choose replace. The picture stays, the rough track is gone. This alone saves a reshoot.
Record your screen with Loom or OBS, silent or with a scratch track, then narrate it cleanly afterward. The UI changed? Re-voice just the new bit. No more redoing a ten-minute walkthrough because you flubbed one line near the end.
Product demos, real estate tours, e-commerce clips. Narrate the walkthrough, then swap the script when the price drops or a feature changes, and re-voice in minutes. Make a second-language version for overseas buyers off the same footage.
Audio description is part of WCAG (success criterion 1.2.5 asks for it on prerecorded video at Level AA), and the rules built around WCAG keep tightening. Adding a clear spoken track to a silent or visual-heavy video is a fast way to make it usable for people who cannot see the screen, and a step toward a compliance requirement that is getting harder to ignore.
Need the same ad in three tones, or a calmer read for a different platform? Keep the cut, change the voice or the words, render again. No studio booking to change one line.
This is the use case that pays for itself fastest, so it gets its own spot. You made one good video in English. Your audience is not all in English. Normally that means hiring a dub studio, or filming the whole thing again with a different presenter, or just giving up on those viewers. None of that is fun, and two of those are expensive.
Here it is three steps. Upload the video you already have. Paste the script in the language you want; you can run it through a translator first, then tidy up the names and slang so they land right. Pick a voice that actually speaks that language, not an English voice forcing its way through foreign words. Then hit go. Now you have a Spanish version, or a Hindi version, or a Japanese version, off the exact same footage, in the time it takes to make a coffee.
We have 75+ languages, so the same clip can become five or ten cuts for different markets without you touching a camera. It is not lip synced. The mouths still move in the original language, so a tight closeup of someone speaking reads as a dub. For everything else, screen recordings, tours, demos, b-roll, voiceover-led marketing, nobody is staring at lips anyway, and the new voice just makes the video make sense to a whole new audience. The dubbing market is growing fast for exactly this reason. It is the cheapest reach you will ever buy.
Plenty of tools can put a voice on a video. The honest question is how much work it leaves for you.
| Tool | Upload your own video | Editor or timeline | Auto-fit to length | Talk over (duck) | Approx. price |
|---|---|---|---|---|---|
| FreeTTS | Yes | No, none | Yes, automatic | Yes, one click | Free taste, full on Creator $39/mo |
| CapCut | Yes | Yes, full editor | No, you trim by hand | Manual | Free, paid upgrades |
| Descript | Yes | Yes, transcript editor | No | Manual | From about $16/mo |
| Clipchamp | Yes | Yes, timeline | No, manual align | Manual | Free, M365 upsell |
| ElevenLabs | Studio timeline | Yes, audio-first | No | No | From about $5/mo |
| Canva | Yes | Yes, design editor | No | Manual | Free dubbing capped near 1 min |
Let me be fair here, because the comparison only counts if it is honest. Most of these tools can put an AI voice on a video now. CapCut, Descript, Clipchamp, Canva, they all can. They are good editors. So the pitch is not "only we do this." The pitch is the work. Every one of them is a timeline you drive yourself. You generate the voice, you place it, you trim it, you duck the music, you export. That is real control, and if you want that control, use them.
What we do is take the editor out. No timeline. The voice gets fit to your video automatically. Talk over is a single click instead of a manual ducking job. And the free taste needs no signup, though it is capped at about thirty seconds per clip. If your day is "I have a video and a script and I want the file back," this is the shortest line between those two points. If your day is "I want to hand-craft every frame," it is not, and that is fine.
One more honest note. We do not block legal narration. True crime, horror, dark fiction, edgy comedy, the read your video actually calls for. Within the law, of course. If your script is the kind another tool quietly refuses to voice, this is the one that reads it.
Formats are the usual ones. MP4, MOV, MKV, and WebM. That covers basically anything off a phone, a screen recorder, or an editor export. The length you can run depends on your plan, and I am going to be straight about it because hidden caps are the worst part of "free" tools.
The free taste handles a short clip, up to about thirty seconds, so you can hear the quality and see the flow before you spend a cent. PRO opens it up to five minutes a video. Creator goes all the way to thirty minutes and a one gigabyte file, which is more than enough for any Short, Reel, demo, tour, lesson, or listing video you are likely to make. If you are trying to voice a feature-length film, that is not what this is for, and our PDF to audiobook tool is honestly a better road for book-length work.
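Put as a lookup table, the limits above look roughly like this. It is a sketch for checking a file before you upload it, not our actual validation code: the names `check_upload`, `MAX_SECONDS`, and `MAX_MEGABYTES` are invented for the example, the free limit is the "about thirty seconds" figure rounded to 30, and file-size caps are left blank where the text does not state one.

```python
import os
from typing import Optional

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}

# Max clip length in seconds per plan, taken from the text (free is approximate).
MAX_SECONDS = {"free": 30, "pro": 5 * 60, "creator": 30 * 60}

# Only the Creator file-size cap is stated; None means "not stated here".
MAX_MEGABYTES = {"free": None, "pro": None, "creator": 1024}


def check_upload(filename: str, seconds: float, megabytes: float,
                 plan: str) -> Optional[str]:
    """Return a short description of the problem, or None if nothing looks wrong."""
    plan = plan.lower()
    if plan not in MAX_SECONDS:
        return f"unknown plan: {plan}"
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported format: {ext or 'no extension'}"
    if seconds > MAX_SECONDS[plan]:
        return f"clip is {seconds:.0f}s, {plan} allows up to {MAX_SECONDS[plan]}s"
    cap = MAX_MEGABYTES[plan]
    if cap is not None and megabytes > cap:
        return f"file is {megabytes:.0f} MB, {plan} allows up to {cap} MB"
    return None
```

For example, a two minute MOV passes on PRO, while a 45 second MP4 on the free taste comes back with a length message.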

The tool does the heavy lifting. These small moves are the difference between fine and good.
Read your script out loud before you paste it. If you trip over a sentence, the voice will too. Short lines beat long ones. Contractions sound human. A comma is a breath, so put one where you would actually pause.
Warm and calm for a tutorial. Bright and quick for an ad. A deep movie-trailer voice over a cooking clip just feels wrong, and viewers feel it before they can say why. Spend the extra minute auditioning two or three voices.
Got music you picked, or street noise, or gameplay sound that makes the shot feel alive? Do not kill it. Use talk over so it sits low underneath. Replace is for when the original sound is the problem, not when it is part of the charm.
The voice reads your punctuation. A full stop is a real pause. Three dots make it trail off. A question mark lifts the end of the line. If a name or acronym gets mangled, spell it the way it sounds and the read cleans right up.
Aim for somewhere around -14 LUFS so it sits right on YouTube and does not blast someone on earbuds. If the voice feels buried under your talk-over music, pull the script tighter so the voice has room rather than cranking everything.
If the narration runs long for the clip, cut filler words first. "Really," "just," "basically," "in order to." They add seconds and say nothing. A tight script fits the video cleanly and the auto-fit barely has to nudge the pace.
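Two of the tips above, writing to the clip length and cutting filler first, are easy to rough out before you paste anything. The sketch below assumes a speaking rate of 150 words per minute, which is a ballpark and not a measured figure for any particular voice, and the helper names are made up for the example. The filler list is the one from the tip.

```python
import re

# Assumed speaking rate; real voices vary, so treat results as a rough guide.
WORDS_PER_MINUTE = 150

# Single-word fillers named in the tip, plus any whitespace that follows them.
FILLER_PATTERN = re.compile(r"\b(?:really|just|basically)\b\s*", re.IGNORECASE)


def estimate_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate how long a script takes to read at `wpm` words per minute."""
    return len(script.split()) * 60.0 / wpm


def trim_fillers(script: str) -> str:
    """Remove common filler words and shorten 'in order to' to 'to'."""
    text = re.sub(r"\bin order to\b", "to", script, flags=re.IGNORECASE)
    text = FILLER_PATTERN.sub("", text)
    # Collapse any doubled spaces left behind.
    return re.sub(r"\s{2,}", " ", text).strip()
```

At the assumed rate, a 200-word script comes to about 80 seconds, which is why it feels like a race on a fifteen second clip. Reread whatever `trim_fillers` returns before using it, since a blunt word filter can change meaning or leave odd punctuation behind.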
The biggest one, by a mile, is a script that has nothing to do with the clip length. People paste two hundred words of narration onto a fifteen second video and wonder why it feels like a race. The fix is boring but it works. Watch your clip once, time it in your head, and write to that. If you are way over, we will tell you before we render, so listen to that warning instead of forcing it.
Second one. Picking the first voice in the list and never trying another. The default is fine. It is not always right. A bored corporate voice on a hype reel kills the energy, and a hyper voice on a serious explainer feels fake. Audition a couple. It takes thirty seconds and it is the single thing most likely to make people stick around.
Third. Forgetting to choose replace or talk over and then being surprised by the result. If your music vanished, you were on replace. If the new voice is fighting the old audio, you wanted replace and got talk over. It is one toggle. Glance at it before you hit go.
And the last one is expecting the mouths to move with the new words. They will not. This is a voiceover, not a face reshaper. For screen recordings, demos, tours, slideshows, and faceless content you never see a mouth, so it does not matter at all. For a talking-head closeup where the lips are dead center, a different voice on top reads as a dub, which is fine for some things and odd for others. Know which one you are making before you upload.
Upload a clip, paste your script, pick a voice, and grab the finished MP4. Free to try, no signup. Full videos and a commercial license live on Creator.