Studio-quality text-to-speech in 30 languages, with voice design and controllable cloning. Use our cloud — or self-host the engine so your audio never leaves your network.
One engine, three ways to make a voice, thirty languages.
Describe a voice in plain words — “warm, middle-aged, calm” — and get a brand-new voice. No recording needed.
Clone any voice from a short clip, with consent built in. Steer emotion, pace and style while keeping the timbre.
From English and Mandarin to Hindi, Arabic and Swahili — plus dialects. Just type; no language tag required.
Crisp, broadcast-grade output with built-in super-resolution. Ready for video, IVR and audiobooks.
Audio starts before generation finishes. Fast enough for live agents and conversational apps.
Swap one base URL and migrate off your current provider. SDKs for Python and JavaScript.
Most voice AI is cloud-only — every recording and transcript leaves your perimeter. Vocala lets you run the whole engine in your own VPC.
The controls regulated teams need — built in, not bolted on. This is what sets Vocala apart from a model with an API.
Every generation returns an Ed25519-signed manifest that discloses the audio is AI-generated, binds to the audio bytes, and can be verified by anyone with the public key. No trust in us required.
An AudioSeal watermark is embedded at synthesis and detectable after the fact: in testing, the detector scored 0.0 on clean audio and 1.0 on watermarked audio.
Cloning requires a recorded consent acknowledgement, stored in a per-voice consent ledger. Responsible by design.
Redact PII (email, phone, card) and block disallowed content by policy — before a word is ever spoken.
Per-team lexicons teach the engine your brand names, drug names and tickers, so they're said right every time.
Run the whole stack in your VPC with an offline license; only usage counts — never audio — leave for billing.
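To make the text-side controls above more concrete, here is a minimal sketch of PII redaction and lexicon substitution applied before synthesis. It is not Vocala's implementation: the regex patterns, the placeholder tokens, the function names `redact_pii` and `apply_lexicon`, and the example term `Acmezol` are all hypothetical, chosen only to illustrate the idea.

```python
import re

# Illustrative patterns only (assumptions); real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")          # 13 to 16 digits
PHONE = re.compile(r"(?<!\w)\+?\d(?:[ -]?\d){7,11}\b")  # 8 to 12 digits


def redact_pii(text: str) -> str:
    """Replace emails, card numbers and phone numbers with placeholders."""
    # Order matters: emails first (they may contain digits), then cards,
    # then phones, so a long card number is not mistaken for a phone number.
    text = EMAIL.sub("[email]", text)
    text = CARD.sub("[card]", text)
    text = PHONE.sub("[phone]", text)
    return text


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Swap whole-word lexicon terms for a spoken respelling."""
    for term, spoken in lexicon.items():
        # A function replacement avoids backslash interpretation in `spoken`.
        text = re.sub(rf"\b{re.escape(term)}\b", lambda _m, s=spoken: s, text)
    return text


def prepare_text(text: str, lexicon: dict[str, str]) -> str:
    """Redact first, then apply pronunciations, before any audio is generated."""
    return apply_lexicon(redact_pii(text), lexicon)


if __name__ == "__main__":
    team_lexicon = {"Acmezol": "AK-meh-zol"}  # hypothetical drug name
    print(prepare_text("Ask jo@example.com about Acmezol.", team_lexicon))
```

Redacting before lexicon substitution keeps the sketch simple: the placeholders contain no lexicon terms, so the two passes do not interfere with each other.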
Any of 30 languages. Long-form or a single line.
Use a preset, describe a new voice, or clone one with consent.
The engine runs where you choose. We never see self-hosted audio.
Play in the Studio, export 48 kHz WAV, or hit the REST API.
Create a workspace and generate your first clip in under a minute. No credit card.
Open the Studio →
Prices in AUD per seat / month. Cancel anytime. Enterprise unlocks self-hosting, SSO and unlimited volume.
Usage-based API billing also available (~A$0.18 / 1k characters).
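As a rough guide to what usage-based billing might add up to, the sketch below multiplies a character count by the approximate rate quoted above. It assumes billing is prorated per character with no minimums, tiers or rounding rules, which the pricing text does not confirm; the function name `estimate_cost_aud` is hypothetical.

```python
RATE_AUD_PER_1K_CHARS = 0.18  # approximate rate quoted above (assumption: flat, prorated)


def estimate_cost_aud(num_chars: int, rate: float = RATE_AUD_PER_1K_CHARS) -> float:
    """Estimate the API cost in AUD for synthesizing `num_chars` characters."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * rate


if __name__ == "__main__":
    # A 50,000-character script at roughly A$0.18 per 1k characters.
    print(f"A${estimate_cost_aud(50_000):.2f}")  # prints "A$9.00"
```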
The voice engine runs on your own infrastructure — your cloud VPC or on-prem. Your text and audio never touch our servers. You manage it from the same Vocala control plane; only metadata (usage counts, plan) syncs.
Quality is comparable to the leading hosted TTS providers, but those are cloud-only and proprietary. Vocala is built on the open Apache-2.0 VoxCPM engine, so you can run it yourself, avoid lock-in, and keep regulated data in-house, usually at lower cost at volume.
Cloning requires an explicit, recorded consent acknowledgement that we store in a per-voice consent ledger. We strongly discourage impersonation and provide audit trails for every cloned voice.
30, including English, Chinese (and several dialects), Spanish, Hindi, Arabic, Japanese, French, German, Portuguese, Russian and more — no language tag needed.
Yes. The Free plan needs no credit card — create a workspace and generate right away. Upgrade in-app when you’re ready.
Start free in the cloud today, move to self-host when compliance calls for it.