A pragmatic playbook for clinicians who want to cut documentation time, build accessible client handouts, and listen to research without setting another evening on fire. Three workflows, each mapped to the right voice and the right HIPAA boundary. No IT approval, no enterprise sales call. Free to try right now. The PHI pre-flight check runs locally in your browser, so nothing you paste into it leaves your screen.
PRO: $19/mo. Creator: $39/mo (commercial license). Lifetime: $199/$349 once.
Hear it for yourself: same body-scan line, three reads
The sound of a worksheet matters. Below: the same opening grounding line, narrated three different ways. Press play on each to hear the difference.
Standard read: Aria standard voice, default cadence
Empathetic style: Aria with the empathetic expressive style
Real whispering: Aria with the whispering expressive style
Same Aria voice on all three. The free tier delivers her standard cadence. PRO ($19/mo) unlocks 95 expressive emotional styles on the same voices, including whispering, empathetic, gentle, calm, cheerful, and newscast.
Burnout is the default. This is one fix.
Therapists use text-to-speech for three primary tasks: proofreading de-identified clinical notes, listening to journal articles during commutes, and producing accessible audio handouts for clients.
Roughly 61% of psychologists report at least one symptom of burnout, with administrative tasks among the leading contributors (American Psychological Association, 2022 Practitioner Pulse). The Mayo Clinic Proceedings burnout work shows similar numbers across the broader clinical workforce. Most of that admin burden is not direct clinical care. It is documentation, client material prep, and trying to keep up with a literature that produces more than a million new biomedical citations a year through PubMed alone.
Audio is the part of the workday clinicians chronically underuse. Commute time. Treadmill time. Walking the dog. Folding laundry. None of those windows work for reading, but all of them work for listening. If you can shift even 30 minutes of weekly journal reading and 30 minutes of note proofreading into audio that runs while you do something else, that is roughly an hour a week back. Over a year, that is about 52 hours, more than a full working week.
And then there is the client side. Adults with low literacy benefit measurably from audio health information per the HHS Healthy People 2030 framework. Therapy worksheets in audio format see meaningfully higher between-session homework completion, especially among clients with ADHD, dyslexia, or simple auditory-learning preference. The honest version: clients do not always read the PDF you handed them on Tuesday. They almost always listen to a 4-minute audio clip on the bus.
The line everyone needs to know
Text-to-speech is HIPAA-neutral; HIPAA risk depends entirely on whether the input text contains protected health information.
Here is the key point. FreeTTS, like every consumer TTS tool, is not a HIPAA covered entity, and because we do not sign Business Associate Agreements at the consumer tier, it is not your business associate either. That is fine for most therapy uses, because most useful conversions contain no PHI. The honest framing: the tool is HIPAA-neutral, and the risk lives entirely in what you choose to paste in.
Generally safe to convert: psychoeducation handouts, generic CBT and DBT worksheets, mindfulness and grounding scripts, exposure-hierarchy templates, journal articles and book chapters you have a license to consume, your own conference notes, regulatory updates, ethics CEU material, your own dictated narration of a workshop you are building.
Not safe to convert (or only after stripping identifiers): session notes that name a client, intake forms with DOB or address, treatment plans that reference specific people, anything with MRNs or insurance information, voicemails or messages from clients, anything from your EHR that you have not personally cleaned of identifiers first.
The pattern works because most of what therapists actually want to convert is generic content: psychoeducation, scripts, articles, materials you would happily hand a stranger in a workshop. Use the checker below to spot the obvious patterns. The final eyeball pass on names and locations is still on you. Tools help. They do not replace clinical judgment.
HIPAA-safe pre-flight check
Paste a worksheet, handout, or note. Runs locally in your browser, never sent anywhere. Catches common PHI patterns so you can strip them before pasting into any TTS tool.
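For readers curious what "catching common PHI patterns" can look like, here is a minimal Python sketch of a local pattern scan. It is not the code behind the checker on this page; the pattern set, the labels, and the `find_phi` name are assumptions chosen for illustration, and the date and phone formats are U.S.-style only. Like any regex pass, it cannot recognize names or locations, so the manual read-through still applies.

```python
import re

# Hypothetical pattern set for illustration; a real checker would cover more formats.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "mrn": re.compile(r"\bMRN[:#\s]*\d{4,}\b", re.IGNORECASE),
}


def find_phi(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


if __name__ == "__main__":
    sample = "Client DOB 03/14/1988, MRN: 0045821, call 555-867-5309."
    for label, value in find_phi(sample):
        print(f"{label}: {value}")
```

An empty result means only that none of these specific patterns matched, not that the text is free of identifiers.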
Three workflows that actually save time
In our conversations with clinicians, the same three patterns come up over and over. Different specialties, different settings, similar workflows.
Note proofreading: Strip names and dates. Paste into FreeTTS. Listen at 1.2x while you eat. Catches awkward phrasing and missing sections that your eyes skim past on the third reread.
Journal listening: PubMed, your specialty's flagship journal, the chapter your supervisor sent. Convert to MP3, listen during commute or run. Saves the evening reading slot you never had energy for anyway.
Audio handouts: Convert a generic CBT thought record, a body-scan script, or an exposure-hierarchy walkthrough into client-facing audio. Drop into the secure portal. Higher homework completion, especially with neurodiverse clients.
Match voice to content type
For guided relaxation scripts, therapists choose warm female neural voices at 0.85x speed paired with SSML pause tags.
The voice you pick is half the experience. For relaxation and grounding work, you want warm and slow. For psychoeducation and journal listening, you want clear and neutral. For dictation playback, you want fast and accurate. The settings table below is a starting point. Most clinicians end up with two or three saved presets they use across all content.
| Content type | Voice | Speed | Notes |
|---|---|---|---|
| Guided relaxation, body scan | Ava or Jenny | 0.85 to 0.9 | SSML break tags between cues for silence-driven regulation |
| Psychoeducation handout | Andrew or Jenny | 0.95 to 1.0 | Calm, neutral, slightly serious cadence works best |
| CBT thought record narration | Ava or Andrew | 1.0 | Clients pause and reflect between prompts naturally |
| Journal article listening | Andrew | 1.2 to 1.5 | Faster speeds work for content you already have context on |
| Note proofread | Any clear voice | 1.1 to 1.3 | Voice quality matters less, speed matters more |
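If you keep your own notes or scripts for this, the table above maps naturally onto a small set of saved presets. The Python sketch below is one way to write them down; the `VoicePreset` class, the preset keys, and `clamp_speed` are hypothetical names for illustration and are not part of any FreeTTS API. The voices and speed ranges are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePreset:
    voices: tuple          # suggested voices; empty means "any clear voice"
    speed_min: float
    speed_max: float

    def default_speed(self):
        # Start at the slow end of the range and speed up once it feels comfortable.
        return self.speed_min


# Values taken from the settings table; keys are illustrative.
PRESETS = {
    "relaxation": VoicePreset(("Ava", "Jenny"), 0.85, 0.9),
    "psychoeducation": VoicePreset(("Andrew", "Jenny"), 0.95, 1.0),
    "thought_record": VoicePreset(("Ava", "Andrew"), 1.0, 1.0),
    "journal_article": VoicePreset(("Andrew",), 1.2, 1.5),
    "note_proofread": VoicePreset((), 1.1, 1.3),
}


def clamp_speed(content_type, requested):
    """Pull a requested speed back into the suggested range for that content type."""
    preset = PRESETS[content_type]
    return max(preset.speed_min, min(preset.speed_max, requested))
```

For example, `clamp_speed("relaxation", 1.0)` returns `0.9`, the top of the suggested range for body-scan content.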
Step by step
The actual workflow. Tested with real clinicians. The first time, it takes maybe five minutes because you are getting comfortable with the steps. After that it is genuinely a two-to-three-minute task.
Open the worksheet or note. Replace any client name with 'the client.' Delete the date. Remove DOB, MRN, phone, address, and anything else that ties content to a specific person. Run the HIPAA-safe checker above if you want a second pass.
Open freetts.org/text-to-speech in another tab. Paste the de-identified text. Pick a voice (Ava or Jenny work well for warm narration). Drop the speed to 0.85x for relaxation content or keep it at 1.0x for psychoeducation.
For grounding scripts and guided imagery, insert pause tags between cues so silence does the regulation work. Example: '... breathe in. <break time="800ms"/> breathe out.' SSML break and phoneme tags pass through to the engine on PRO and Creator tiers.
Click Generate. Wait a few seconds. Click Download. The audio file is yours, no watermark on PRO, ready to host on your client portal or send via secure share. The source text is deleted server-side after generation.
Drop the MP3 into your client portal (SimplePractice, TheraNest, Hushmail) or password-protected practice site. Do not email audio directly to a client unless you are using a HIPAA-compliant email service. Generic therapeutic audio is low risk when delivered properly.
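The pause-tag step above is easy to do by hand for a short script, but tedious for a long body scan. Here is a minimal Python sketch that joins a list of cues with SSML `<break>` tags in the same form as the example in that step. The `add_pauses` name and the 800 ms default are assumptions for illustration. The sketch emits bare text with break tags, as in the example, and does not wrap the result in a `<speak>` root element; whether a given engine requires one is something to check before relying on it.

```python
from xml.sax.saxutils import escape


def add_pauses(cues, pause_ms=800):
    """Join spoken cues with an SSML break tag so silence separates each one."""
    if pause_ms <= 0:
        raise ValueError("pause_ms must be positive")
    tag = f' <break time="{pause_ms}ms"/> '
    # Escape &, <, and > so stray characters in a cue cannot break the markup.
    cleaned = [escape(cue.strip()) for cue in cues if cue.strip()]
    return tag.join(cleaned)


if __name__ == "__main__":
    script = ["Breathe in.", "Breathe out.", "Notice your feet on the floor."]
    print(add_pauses(script))
```

Longer pauses suit slower relaxation scripts: `add_pauses(script, pause_ms=1500)` produces the same cues separated by 1.5-second breaks.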
Meet clients where they are
Audio handouts increase between-session homework completion among clients with ADHD, dyslexia, and auditory-learning preferences.
Roughly 15 to 20 percent of the U.S. population has a language-based learning disability such as dyslexia (National Center for Learning Disabilities, 2020). ADHD prevalence in adults sits around 4 to 5 percent. Add anxiety disorders that make sustained reading cognitively expensive, and the share of your caseload who would do better with audio over text is substantial. Often higher than people guess.
Three things change when you offer audio versions of worksheets:
Completion goes up. Research on therapy homework consistently finds substantial non-completion rates. The Cognitive and Behavioral Practice work suggests roughly 30 to 50 percent of assignments go undone, with methodology varying study to study. Clinicians offering audio versions report meaningful improvements, especially among clients with auditory-learning preferences.
The therapeutic alliance gets reinforced. The alliance is a known mediator of treatment outcomes (Norcross and Lambert, 2018), and a familiar voice between sessions may help sustain it. Clients hearing a consistent voice between sessions can stay more anchored. If you are using voice cloning on the Creator tier with your own voice, the effect may be more pronounced.
Cultural-competency surface area widens. Offering content in multiple formats demonstrates that you are meeting clients where they are. For clients whose first language is not English, switching the voice to a multilingual neural voice reading translated text is a small adjustment that lands big.
Different tools, different jobs
These four tools are often treated as interchangeable. They are not. Each does a different job. Picking the right one for your week saves real money.
| Tool | What it does | Therapist use case | Price (verify before buying) |
|---|---|---|---|
| FreeTTS | Text into audio. 400+ voices, SSML support on PRO. | Audio handouts, journal listening, note proofreading. | Free, $19/mo PRO, $39/mo Creator |
| Otter.ai | Transcription. Audio into text. | Recording sessions to transcribe. HIPAA on Enterprise only. | $10 to $20/user/mo, Enterprise custom |
| Speechify | Text into audio. Consumer-focused. | Personal article listening. Heavier UI for casual use. | ~$139/yr (verify, prices shift) |
| Descript | Audio editing with voice cloning. | Recording and editing podcast-style content. Overkill for handouts. | $15 to $30/mo |
The short version: if you want to dictate notes by speaking, you need transcription (Otter, or your EHR's built-in dictation if it has a HIPAA-compliant one). If you want audio from text, you need TTS (FreeTTS, Speechify, NaturalReader). If you are producing recorded audio content with editing, you need a podcast tool (Descript). Many clinicians use a combination: Otter for sessions, FreeTTS for handouts and journals.
Pick the tier that matches your practice
The pricing decision is genuinely simple. Solo clinician using audio for your own caseload: PRO. Group practice or selling content as a course or workshop: Creator (commercial license matters). On the fence: try free, decide later.
For perspective on the math: PRO at $19 a month costs less than a single session typically bills, and it covers about 14 hours of audio output per month. If you produce one audio worksheet a week and listen to two journal articles, you are nowhere near the cap. Lifetime ($199 for PRO, $349 for Creator) exists too. Some clinicians prefer the one-time payment so it never appears as a recurring expense in their practice accounting.
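To sanity-check the cap against your own habits, the arithmetic can be sketched in a few lines of Python. The only figure taken from this page is the roughly 14 hours of monthly output on PRO. The clip lengths are assumptions: a 4-minute worksheet (matching the clip length mentioned earlier), 30 minutes per journal article, and two articles per week. Swap in your own numbers.

```python
PRO_CAP_HOURS = 14          # approximate monthly audio output on PRO, per this page
WEEKS_PER_MONTH = 52 / 12   # average number of weeks in a month


def monthly_audio_hours(worksheets_per_week=1, worksheet_min=4,
                        articles_per_week=2, article_min=30):
    """Estimate hours of generated audio per month (all durations are assumptions)."""
    weekly_minutes = (worksheets_per_week * worksheet_min
                      + articles_per_week * article_min)
    return weekly_minutes * WEEKS_PER_MONTH / 60


if __name__ == "__main__":
    hours = monthly_audio_hours()
    print(f"Estimated usage: {hours:.1f} h of {PRO_CAP_HOURS} h "
          f"({hours / PRO_CAP_HOURS:.0%} of the cap)")
```

Under these assumptions the estimate comes to about 4.6 hours a month, roughly a third of the cap. Longer articles move the number quickly, so it is worth rerunning with your typical reading list.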
Quick answers
Open the studio. Paste a generic CBT thought record. Hear it. The whole demo is 90 seconds. You will know if it fits your practice immediately.
Last reviewed April 2026. Sources cited: APA 2022 Practitioner Pulse, NIMH 2021, NASW workforce data 2021, NCLD/IDA 2020, Mayer Cognitive Theory of Multimedia Learning (2014), Norcross and Lambert (2018), HHS Healthy People 2030. Related guides: Therapy worksheet narration, CEU coursework, Voice cloning for clinicians.