Vocala — Voice AI you can run inside your own walls

Capabilities

Everything the cloud players do — without sending your data away

One engine, three ways to make a voice, thirty languages.

🎨

Voice Design

Describe a voice in plain words — “warm, middle-aged, calm” — and get a brand-new voice. No recording needed.

🎛️

Controllable Cloning

Clone a voice from a short clip, with consent built in. Steer emotion, pace and style while keeping the timbre.

🌍

30 Languages

From English and Mandarin to Hindi, Arabic and Swahili — plus dialects. Just type; no language tag required.

🔊

48kHz Studio Audio

Crisp, broadcast-grade output with built-in super-resolution. Ready for video, IVR and audiobooks.

⚡

Real-Time Streaming

Audio starts before generation finishes. Fast enough for live agents and conversational apps.

🔌

Drop-in API

Swap one base URL and migrate off your current provider. SDKs for Python and JavaScript.

The difference

Your audio is sensitive. Keep it that way.

Most voice AI is cloud-only — every recording and transcript leaves your perimeter. Vocala lets you run the whole engine in your own VPC.

Cloud-only voice AI

Your audio + transcripts sent to a third party
Off-limits for many healthcare / finance / gov teams
Proprietary lock-in, opaque pricing
Data residency is whatever they decide

Vocala self-host edition

Engine runs inside your network — audio never leaves
Helps you meet data-residency & compliance requirements
Open Apache-2.0 engine — no lock-in, commercial-ready
Production targets blocked by a built-in safety guard
SSO, audit logs, per-team usage metering

Talk to us about self-hosting →

Trust & governance

Every clip is disclosed, traceable, and tamper-evident

The controls regulated teams need — built in, not bolted on. This is what sets Vocala apart from a model with an API.

🔏

Signed provenance

Every generation returns an Ed25519-signed manifest that discloses the clip is AI-generated, is bound to the exact audio bytes, and can be verified by anyone with the public key. No trust in us required.
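Checking a clip has two parts: verifying the Ed25519 signature over the manifest with the public key, and confirming that the manifest describes the audio you actually hold. The minimal Python sketch below covers only the second part, using the standard library. The field names `ai_generated` and `audio_sha256`, and SHA-256 as the binding hash, are assumptions for illustration, not Vocala's documented manifest format.

```python
import hashlib
import json


def manifest_matches_audio(manifest_json: str, audio: bytes) -> bool:
    """Return True if the manifest discloses AI generation and names these exact bytes.

    Hypothetical field names: "ai_generated" and "audio_sha256".
    This does NOT check the Ed25519 signature; verify that first, with an
    Ed25519 library and the published public key, before trusting any field.
    """
    manifest = json.loads(manifest_json)
    if manifest.get("ai_generated") is not True:
        return False  # no AI disclosure, so treat the manifest as invalid
    # Re-hash the audio you hold and compare it with the digest in the manifest.
    return manifest.get("audio_sha256") == hashlib.sha256(audio).hexdigest()
```

Because the digest sits inside the signed manifest, changing even one byte of the audio breaks the match, which is what makes the clip tamper-evident.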

💧

Inaudible watermark

An AudioSeal watermark is embedded at synthesis and detectable after the fact, with detection scores of 0.0 on clean audio and 1.0 on watermarked audio.

✅

Consent-first cloning

Cloning requires a recorded consent acknowledgement, stored in a per-voice consent ledger. Responsible by design.
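One way to make a consent ledger tamper-evident is a hash chain: each entry stores the SHA-256 hash of the entry before it, so editing any past record breaks every link after it. The Python sketch below illustrates that technique; the `ConsentLedger` class and its fields are hypothetical, not Vocala's actual schema.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class ConsentLedger:
    """Hypothetical per-voice, append-only consent log with a SHA-256 hash chain."""
    voice_id: str
    entries: list[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # sort_keys gives a stable serialization, so the same body always hashes the same
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, speaker: str, statement: str, recording_sha256: str) -> dict:
        """Append a consent acknowledgement linked to the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {
            "voice_id": self.voice_id,
            "speaker": speaker,
            "statement": statement,
            "recording_sha256": recording_sha256,  # hash of the recorded consent clip
            "prev": prev,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; any edited entry makes this return False."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or entry["hash"] != self._digest(body):
                return False
            prev = entry["hash"]
        return True
```

A hash chain shows that an entry was edited, but not that the newest entries were deleted; catching that also means anchoring the latest hash outside the ledger, for example by signing it.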

🛡️

Content guardrails

Redact PII (email addresses, phone and card numbers) and block disallowed content by policy, before a word is ever spoken.
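To show the order of operations (check policy first, then redact, then synthesize), here is a minimal Python sketch. The regexes, the `PolicyViolation` exception and the `apply_guardrails` function are hypothetical illustrations; real PII detection needs far more than a few patterns, and this is not Vocala's implementation.

```python
import re

# Illustrative patterns only; real PII detection needs far more than a few regexes.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    # Cards run before phones so a 16-digit card is not mislabelled as a phone.
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,18}\d\b")),
    ("PHONE", re.compile(r"\+?\d[\d ()-]{7,}\d")),
]


class PolicyViolation(ValueError):
    """Raised when text contains a term the team's policy blocks."""


def apply_guardrails(text: str, blocked_terms: set[str]) -> str:
    """Reject blocked content, then redact PII, before the text reaches the engine."""
    lowered = text.lower()
    for term in blocked_terms:
        if term.lower() in lowered:  # crude substring match, for illustration
            raise PolicyViolation(f"blocked term: {term!r}")
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(apply_guardrails("Email jo@example.com or call +61 2 9876 5432, card 4111 1111 1111 1111.", set()))
# prints: Email [EMAIL] or call [PHONE], card [CARD].
```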

🗣️

Pronunciation control

Per-team lexicons teach the engine your brand names, drug names and tickers, so they're said right every time.
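A simple way to picture a lexicon is a table of terms and spoken respellings applied to the text before synthesis. The Python sketch below assumes respelling substitution; the page doesn't say whether Vocala's lexicons use respellings or phonemes, and the terms `ACME`, `ACME Pro` and `Zyntrex` are made up for the example.

```python
import re


def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Swap whole-word, case-insensitive lexicon terms for respellings before synthesis."""
    if not lexicon:
        return text
    # Try longer terms first so "ACME Pro" wins over "ACME" at the same position.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    lookup = {t.lower(): spoken for t, spoken in lexicon.items()}
    # \b assumes terms start and end with word characters (letters, digits, _).
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


team_lexicon = {"ACME": "ack-mee", "ACME Pro": "ack-mee pro", "Zyntrex": "ZIN-trex"}
print(apply_lexicon("Ask your doctor about Zyntrex, sold by ACME Pro.", team_lexicon))
# prints: Ask your doctor about ZIN-trex, sold by ack-mee pro.
```

Matching whole words only keeps a short term like "ACME" from rewriting part of a longer word such as "ACMEX".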

🏢

Self-host licensing

Run the whole stack in your VPC with an offline license; only usage counts, never audio, leave your network for billing.

How it works

From text to voice in four steps

Type or paste your text

Any of 30 languages. Long-form or a single line.

Pick, design or clone a voice

Use a preset, describe a new voice, or clone one with consent.

Generate — in cloud or your VPC

The engine runs where you choose. We never see self-hosted audio.

Stream, download or call the API

Play in the Studio, export 48kHz WAV, or hit the REST API.

🎙️

Try it free in the Studio

Create a workspace and generate your first clip in under a minute. No credit card.

Open the Studio →

Pricing

Simple plans. Self-host when you need it.

Prices in AUD per seat / month. Cancel anytime. Enterprise unlocks self-hosting, SSO and unlimited volume.

Free

A$0

Try the Studio, watermarked, non-commercial.

1 seat
10k characters / mo
Preset voices
Cloud only

Start free

Creator

A$29/seat/mo

For solo creators & small teams.

3 seats
500k characters / mo
Voice Design + Cloning
Commercial use, 48kHz

Choose Creator

Pro

A$149/seat/mo

For product teams shipping voice.

10 seats
2M characters / mo
Everything in Creator
REST API + keys
Batch generation

Choose Pro

Enterprise

Custom

For regulated & high-volume teams.

Unlimited seats & volume
Self-host in your VPC
SSO / SAML + audit logs
SLA & priority support
Custom branded voices

Book a demo

Usage-based API billing also available (~A$0.18 / 1k characters).
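As a rough guide to the usage-based rate, cost scales linearly with characters: characters ÷ 1,000 × A$0.18. The Python sketch below does that arithmetic with `Decimal` to avoid floating-point rounding; the rate is the approximate figure above, and the page doesn't say how usage billing combines with plan allowances, so this only shows the calculation.

```python
from decimal import Decimal, ROUND_HALF_UP

# Approximate published rate (A$ per 1,000 characters); check your invoice for the real figure.
RATE_AUD_PER_1K_CHARS = Decimal("0.18")


def usage_cost_aud(characters: int, rate: Decimal = RATE_AUD_PER_1K_CHARS) -> Decimal:
    """Estimate the usage-based API cost in AUD, rounded to the cent."""
    cost = Decimal(characters) / 1000 * rate
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(usage_cost_aud(500_000))    # prints 90.00 (A$ for the Creator plan's monthly volume)
print(usage_cost_aud(2_000_000))  # prints 360.00 (A$ for the Pro plan's monthly volume)
```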

FAQ

Questions, answered

What does “self-hosted” actually mean?

The voice engine runs on your own infrastructure — your cloud VPC or on-prem. Your text and audio never touch our servers. You manage it from the same Vocala control plane; only metadata (usage counts, plan) syncs.

How is this different from ElevenLabs or OpenAI TTS?

Quality is comparable, but those services are cloud-only and proprietary. Vocala is built on the open Apache-2.0 VoxCPM engine, so you can run it yourself, avoid lock-in, and keep regulated data in-house, usually for less at high volume.

Is voice cloning safe and legal?

Cloning requires an explicit, recorded consent acknowledgement that we store in a per-voice consent ledger. We strongly discourage impersonation and provide audit trails for every cloned voice.

Which languages are supported?

30, including English, Chinese (and several dialects), Spanish, Hindi, Arabic, Japanese, French, German, Portuguese, Russian and more — no language tag needed.

Can I try before paying?

Yes. The Free plan needs no credit card — create a workspace and generate right away. Upgrade in-app when you’re ready.