Entity Identity gdr-649837d3 · minted 2026-05-12T18:34:47Z

◈ Source https://inworld.ai/ ↗

AI · Entity Record

inworld.ai

JSON-LD ✓ · Root-LD ✗ · schema.org ✓

STATUS: LIVE · SSL: VALID · SECURITY: MINIMAL · FRESHNESS: CURRENT · TLD EDGE: .ai

◈ Topology Position

Latin dominant · narrow vocabulary range · short-form declarative register · moderate clause complexity · narrow topic focus · moderate uncommon edge signal

◈ Entity Topology Map

gdr-649837d3 · v1.0.0 · Law III+V+VI

Federation ID

gdr-649837d3

Slug

inworld-ai

TLD

.ai

Status Code

200

Response Time

3209ms

Interior Pages

23

Interior Words

16,576

Minted At

2026-05-12T18:34:47Z

Law I — Provenance · Law II — Temporal Attestation

SEO Record extracted from https://inworld.ai/

Title

Inworld AI – The #1 Ranked Realtime Voice AI

Meta Description

#1 ranked TTS with under 200ms latency, voice cloning, and 75% lower cost. Realtime agents built for scale.

Canonical URL

https://inworld.ai

Language Attribute

Word Count

1260

Open Graph Tags

Twitter / X Tags

H2 (1)

Keep every user engaged with natural, realtime text-to-speech

H2 (2)

Controllable speech-to-speech that understands, reasons, and interacts

H2 (3)

Reason in realtime. Route to the best model and tools for every user and context

H2 (4)

Speech-to-text that truly understands your users in realtime

Full Extracted Text Corpus 122,510 chars · 17,836 words · 23 pages · Law I

Everything inworld.ai said about itself — extracted verbatim from 23 pages, 17,836 words total. No editorial layer. No inference. Law III — the text is the measurement. Meaning is the reader's. Minted: 2026-05-12T18:34:47Z

◈ Homepage — https://inworld.ai/

Realtime TTS-2 is live. Built for realtime conversation that feels human.

The #1 ranked realtime voice AI

Realtime AI that feels as human as it sounds. Top-ranked text-to-speech, speech-to-speech and LLM routing built for realtime conversation.

Text-to-Speech · Speech-to-Speech · Speech-to-Text · LLM Routing

[speak conversationally] Good afternoon, this is Dr. Roger's office. <break time="500ms" /> Oh, an eye exam? Hmm, uh, let me see. Well, it, it looks like we have openings this Friday at 10am and 2pm. Would either of those work?

Sarah (Support) · Jason (Assistant) · Hana (Companion) · Blake (Narrator) · Mark (Commentator) · Hades (Gaming) · Reed (Training) · Levi (Audiobook) · Luna (Voiceover) · Victor (Coach)

Companions · Agentic Workforce · Learning & Education · Health & Wellness · Interactive Media

Status reached 1M DAUs in 19 days. OtherHalf powers voice-first companions at scale. Ongoing, personal, emotionally engaging AI interaction. Build for relationship-building, emotional connection, and entertainment at scale.

REALTIME TTS

Keep every user engaged with natural, realtime text-to-speech

#1 ranked TTS by real users on the Artificial Analysis Speech Arena. Sub-130ms first-chunk latency from $15 per million characters, up to 80% cheaper than comparable providers. Clone, design, steer, and stream natural responses.
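
The demo line above mixes a bracketed delivery instruction with an inline pause tag. As a minimal sketch, the helpers below assemble text in that shape. The names `pause` and `directed` are hypothetical and introduced here only for illustration; they are not part of any Inworld SDK, and the only syntax assumed is what the sample itself shows (`[instruction]` and `<break time="...ms" />`).

```python
def pause(ms: int) -> str:
    """Return an inline pause tag in the form used by the demo line."""
    if ms < 0:
        raise ValueError("pause length must be non-negative")
    return f'<break time="{ms}ms" />'


def directed(instruction: str, *parts: str) -> str:
    """Prefix text with a bracketed delivery instruction and join the parts."""
    return f"[{instruction}] " + " ".join(parts)


# Hypothetical usage: rebuild the opening of the homepage sample.
line = directed(
    "speak conversationally",
    "Good afternoon, this is Dr. Roger's office.",
    pause(500),
    "Oh, an eye exam?",
)
print(line)
```

Keeping the markup in small helpers makes it harder to mistype a tag, but which instructions a given model actually honors is something only the provider's documentation can confirm.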
[Chart: Quality (Arena ELO score, 1050 to 1260) vs. Price (per 1M characters, $0 to $250) for REALTIME TTS-1.5-MAX, ELEVENLABS V3, REALTIME TTS-1.5-MINI, MINIMAX SPEECH 2.8 HD, OPENAI REALTIME TTS 1, and CARTESIA SONIC 3.]

ABOUT THIS BENCHMARK

ELO score vs. cost per 1M characters. Higher and further left is better. Source: Artificial Analysis Speech Arena Leaderboard, March 2026.

METHODOLOGY

Relative ELO scores from blind user listening tests on the Artificial Analysis Speech Arena. Pricing is each provider's published API rate per 1M characters at default settings.

#1 Ranked TTS Quality
3 of the top 5 models on Artificial Analysis are Inworld. Blind tests by thousands of real users, not internal evals.

Advanced Voice Direction
Add bracketed instructions anywhere in your text and Realtime TTS-2 adjusts tone, speed, volume, vocal style, and pauses.

Voice cloning
Create a custom voice from 15 seconds of audio, then localize it to speak 15 supported languages as a native speaker with the same identity and no accent carryover.

Over 100 languages
English, Spanish, French, Korean, Chinese, Hindi, Japanese, German, and more. Cross-lingual cloning. Deploy globally without separate pipelines.

Text-based voice design
Skip recording entirely. Describe accent, age, tone, and energy in natural language, and Inworld renders a production-ready voice on the fly.

Realtime Latency
<250ms P90 first chunk latency for Max and Realtime TTS-2, <130ms for Mini. Voice agents respond before users notice a delay.

REALTIME API

Controllable speech-to-speech that understands, reasons, and interacts

End-to-end speech-to-speech with custom voices and tool calling. Fully customizable. Optimize for cost, latency, or what your users care about most.

Live conversation

Full duplex, low-latency streaming
Full-duplex audio streaming over a single WebSocket or WebRTC connection.

Intelligent turn taking
Context-aware turn detection with adjustable eagerness.
Function calling
Register tools mid-session. The assistant calls your functions without breaking audio.

Provider agnostic
Route to the model that fits your latency, cost, or quality requirements, and swap it out at any time.

Dynamic context management
Create, retrieve, delete, or truncate conversation items mid-session to control context length and token cost.

Conversational intelligence
Use acoustic and metadata signals to condition what is said, when it is said, and how it is expressed.

REALTIME ROUTER

Reason in realtime. Route to the best model and tools for every user and context

One API that intelligently routes requests across OpenAI, Anthropic, Google, and 200+ models. Built-in analytics to ensure the metrics you care about improve. No latency added. Built-in failover, A/B testing, and intelligent model selection with no code changes required.

User-Aware · Context-Aware · Intelligence · Uptime · Cost · A/B Test · Tiering

curl 'https://api.inworld.ai/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "model": "inworld/user-aware",
    "messages": [{"role": "user", "content": "Hello"}],
    "extra_body": { "metadata": { "language": "es", "country": "MX", "plan": "free" } }
  }'

Top Models
Most popular models by % of Router traffic

#1 Gemini 3 Flash 16%
#2 Qwen3 Max 14%
#3 Claude Sonnet 4.6 12%
#4 Claude Opus 4.6 12%
#5 Grok 4.1 Fast 10%
#6 Gemini 2.5 Flash 5%
#7 GPT 5.2 5%
#8 Kimi K2.5 3%
#9 MiniMax M2.5 3%
#10 Gemini 2.5 Flash Lite 2%

REALTIME STT

Speech-to-text that truly understands your users in realtime

Understand your users and their context in realtime with built-in voice profiling, along with state-of-the-art latency and accuracy.

Live transcription

Realtime streaming
Realtime, bidirectional streaming over WebSocket for live audio, or synchronous transcription for complete audio files.
Voice profiling: emotion, age, accent, pitch & style Extract five real-time signals per audio chunk to understand who is speaking and how they feel. Semantic & acoustic VAD Automatically detect when speech starts and stops. Enable natural speech patterns. Unified multi-provider API A single integration point for industry-leading, high-accuracy transcription providers, with consistent authentication, request formatting, and response handling. High accuracy & custom vocabulary Transcribe audio with industry-leading accuracy. Add domain-specific terms, product names, and specialized vocabulary to boost recognition further. Word-level timestamps & diarization Per-word timing for subtitles and search. Label speakers in multi-party conversations. SECURITY Build with confidence on secure AI infrastructure Enterprise-grade security and compliance built into our AI platform. Built on a zero-trust framework with continuous monitoring, providing a secure foundation for teams building the next generation of AI. 
SOC2 TYPE II Certified · HIPAA Compliant · GDPR Compliant

Voice AI, side-by-side comparison

CAPABILITY | INWORLD | GOOGLE | ELEVENLABS | CARTESIA | OPENAI | HUME
Voice quality (Artificial Analysis Speech Arena) | #1 | #2 | #3 | Not stated | #5 | Not stated
Natural conversational delivery | Yes | Yes | Not stated | Not stated | Yes | Not stated
Realtime latency | Yes | Not stated | Not stated | Yes | Not stated | Not stated
Multi-turn aware speech synthesis | Yes | Not stated | Not stated | Not stated | Yes | Not stated
Simple voice direction (inline tags) | Yes | Yes | Yes | Yes | Yes | Yes
Advanced voice direction (free-form descriptions) | Yes | Not stated | Not stated | Not stated | Yes | Not stated
Voice cloning | Yes | Not stated | Yes | Yes | Not stated | Yes
Simple voice design (text-based) | Yes | Not stated | Yes | Not stated | Not stated | Not stated
Advanced voice design | Yes | Not stated | Not stated | Not stated | Not stated | Not stated
Crosslingual (single voice, over 100 languages) | Yes | Not stated | Yes | Not stated | Not stated | Not stated
Voice profiling (understand user context) | Yes | Not stated | Not stated | Not stated | Not stated | Not stated
Single customizable speech-to-speech API | Yes | Not stated | Not stated | Not stated | Not stated | Not stated
User-aware LLM routing | Yes | Not stated | Not stated | Not stated | Not stated | Not stated
Optimized alphanumeric support | Yes | Not stated | Yes | Not stated | Not stated | Not stated

Verified May 2026 from public docs and the Artificial Analysis Speech Arena leaderboard. Based on the latest models from each provider.

Start building

Join millions of developers building the next wave of AI applications.

PRODUCTS Realtime TTS · Realtime Router · Realtime STT · Realtime API · Agent Runtime
DEVELOPERS Documentation · API Reference · Models · Playground
SOCIALS X · LinkedIn · GitHub
COMPANY Careers · Blog · Security · Resources
Copyright © 2021-2026 Inworld AI · Privacy · Terms

◈ Interior Pages — 23 pages crawled

Realtime Router

One endpoint.
The right model for every user and context.

A single endpoint that routes every request to the right model for that user. Optimize for cost, latency, engagement, revenue, or any metric you care about. 100+ models, no markup on provider rates.

Ready to build? Get your API key in under a minute.

0% Markup · Hundreds of models · 1 Line to integrate

Every routing use case, one endpoint

Route to different models based on user attributes like language, location, or subscription tier. Each user gets the model that fits them best.

User-Aware

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "User-Aware",
    "routes": [
      { "condition": { "cel_expression": "language == \"es\"" },
        "route": { "variants": [ { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 } ] } },
      { "condition": { "cel_expression": "plan == \"free\"" },
        "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 } ] } }
    ],
    "defaultRoute": { "variants": [ { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 } ] }
  }'

Context-Aware

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "displayName": "Context-Aware",
    "routes": [
      { "condition": { "cel_expression": "emotion == \"frustrated\"" },
        "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-sonnet-4-6" }, "weight": 100 } ] } },
      { "condition": { "cel_expression": "messages.last().content.size() < 50" },
        "route": { "variants": [ { "variant": { "modelId": "anthropic/claude-haiku-4-5" }, "weight": 100 } ] } }
    ],
    "defaultRoute": { "variants": [ { "variant": { "modelId": "openai/gpt-5.2" }, "weight": 100 } ] }
  }'

A/B Testing

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "ab-test",
    "displayName": "A/B Test",
    "defaultRoute": {
      "routeId": "default",
      "variants": [
        { "variant": { "variantId": "anthropic", "modelId": "anthropic/claude-opus-4-6" }, "weight": 50 },
        { "variant": { "variantId": "openai", "modelId": "openai/gpt-5.2" }, "weight": 50 }
      ]
    }
  }'

Cost Optimizer

curl 'https://api.inworld.ai/router/v1/routers' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "name": "cost-optimizer",
    "routes": [
      { "route": { "routeId": "simple",
                   "variants": [ { "variant": { "variantId": "gpt-5-nano", "modelId": "openai/gpt-5-nano" }, "weight": 100 } ] },
        "condition": { "cel_expression": "messages.last().content.size() < 50" } }
    ],
    "defaultRoute": {
      "routeId": "complex",
      "variants": [ { "variant": { "variantId": "gpt-5.2", "modelId": "openai/gpt-5.2" }, "weight": 100 } ]
    }
  }'

Pay provider rates. Nothing else.

Most routers take 5% on every call. Realtime Router charges zero markup during Research Previ

Pricing that scales with you

Choose a plan and get credits each month to use across TTS, STT, and LLMs. Higher tiers unlock volume discounts, features, higher limits and more.
On-Demand
Evaluation and prototyping
Start free
TTS-2 & 1.5 Max: $35/1M chars
TTS 1.5 Mini: $25/1M chars
STT 1: $0.35/hr
LLMs: At cost
Up to 40 min TTS included · 5 custom voices · Voice cloning & voice design · Realtime API access · 220+ LLM models via Router · Commercial license · Community support

Creator
Content creation and small projects
$25/mo
TTS-2 & 1.5 Max: $35/1M chars
TTS 1.5 Mini: $25/1M chars
STT 1: $0.35/hr
LLMs: At cost
$25 in credits per month · 100 custom voices · Audio downloads · 40K chars per TTS Playground request · Workspace creation & sharing · + Everything in On-Demand

Developer (Popular)
Production applications
$300/mo · Up to 20% off rates
TTS-2 & 1.5 Max: $30/1M chars (list price $35)
TTS 1.5 Mini: $20/1M chars (list price $25)
STT 1: $0.28/hr (list price $0.35)
LLMs: At cost
$300 in credits per month · Up to 20% off rates · 1,000 custom voices · Increased concurrency limits · Workspace creation and sharing · Priority email support · + Everything in Creator

Growth
Large deployments & compliance
$1,500/mo · Up to 40% off rates
TTS-2 & 1.5 Max: $25/1M chars (list price $35)
TTS 1.5 Mini: $15/1M chars (list price $25)
STT 1: $0.21/hr (list price $0.35)
LLMs: At cost
$1,500 in credits per month · Up to 40% off rates · 3,000 custom voices · Higher API concurrency & limits · Professional voice cloning (add-on) · ZDR, HIPAA & BAA (add-ons) · + Everything in Developer

Enterprise
Custom pricing, limits & terms
Custom
TTS-2 & 1.5 Max: Custom
TTS 1.5 Mini: Custom
STT 1: Custom
LLMs: Custom
As low as $10/1M for Realtime TTS-2 & 1.5 Max and $5/1M for 1.5 Mini · Custom limits · SLA & DPA · On-prem deployment · EU & India data residency · Dedicated AM & Slack channel · + Everything in Growth

Estimate your monthly cost

Select your needed products and features and expected usage to find the best plan.

1. Select product or feature
2. Minutes of audio per month

Estimate based on Realtime TTS 1.5 Mini rates. Actual cost may be higher with Realtime TTS 1.5 Max or Realtime TTS-2. LLMs billed separately at provider cost.

Compare plans

Filter by product to see pricing, limits, and features at each tier.
Text-to-Speech · Speech-to-Text · Realtime API · LLM Router

Plan | On-Demand | Creator | Developer | Growth | Enterprise
Price | Start free | $25/mo | $300/mo (up to 20% off rates) | $1,500/mo (up to 40% off rates) | Custom

Pricing per 1M characters / per minute
Assumes one minute of audio is ~1,000 characters

Realtime TTS-2 | $35 | $35 | $30 | $25 | Custom
Realtime TTS 1.5 Max | $35 | $35 | $30 | $25 | Custom
Realtime TTS 1.5 Mini | $25 | $25 | $20 | $15 | Custom

Overage usage is charged at the same rate as your plan.

Features
API access
Audio download —
Character limits in TTS Playground | 2,000 | 40,000 | 40,000 | 40,000 | Custom
Custom voices | 5 | 100 | 1,000 | 3,000 | Custom
Instant voice cloning
Professional voice cloning | — | — | — | Add-on | Custom
Voice design
Steering TTS-2 only
Speaking rate control
Temperature control
Multilingual support TTS 1.5 · 15 languages TTS-2 · over 100 languages
Custom pronunciation
Timestamps
New workspace creation — — Custom
Workspace sharing — — Custom

Platform & Compliance
API access
GDPR & SOC 2 Type II
Commercial license
Credit rollover
HIPAA & BAA — — — Add-on
Zero data retention — — — Add-on
SLA & DPA — — — —
On-prem deployment — — — —
Data residency | — | — | — | — | EU, India
Payment | — | Credit card | Credit card | Credit card | Invoicing / PO
Support | Community | Community | Priority email | Priority email | AM + Slack

Developer $300/mo
Pricing per 1M characters / per minute
Assumes one minute of audio is ~1,000 characters
Realtime TTS-2 $30
Realtime TTS 1.5 Max $30
Realtime TTS 1.5 Mini $20
Overage usage is charged at the same rate as your plan.
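
As a rough worked example of the rates above, the sketch below estimates a monthly TTS bill from minutes of audio, using the page's assumption of ~1,000 characters per minute and the per-plan rates in the comparison table. Treating the bill as the larger of the plan fee and the metered usage is an assumption made here (the plan fee is granted as credits, and overage is charged at the same rate). It ignores credit rollover, add-ons, and the free minutes included in On-Demand. The names `PLANS`, `usage_cost`, `monthly_total`, and `cheapest_plan` are illustrative only.

```python
CHARS_PER_MINUTE = 1_000  # the pricing page's stated assumption

# plan -> (monthly fee in USD, USD per 1M characters by model tier)
# "max" covers Realtime TTS-2 and TTS 1.5 Max; "mini" is TTS 1.5 Mini.
PLANS = {
    "On-Demand": (0, {"max": 35, "mini": 25}),
    "Creator": (25, {"max": 35, "mini": 25}),
    "Developer": (300, {"max": 30, "mini": 20}),
    "Growth": (1500, {"max": 25, "mini": 15}),
}


def usage_cost(plan: str, model: str, minutes: float) -> float:
    """Metered TTS cost in USD for the given minutes of audio."""
    _, rates = PLANS[plan]
    chars = minutes * CHARS_PER_MINUTE
    return chars / 1_000_000 * rates[model]


def monthly_total(plan: str, model: str, minutes: float) -> float:
    """Assumed bill: the plan fee covers usage up to its credit value."""
    fee, _ = PLANS[plan]
    return max(fee, usage_cost(plan, model, minutes))


def cheapest_plan(model: str, minutes: float) -> str:
    """Plan with the lowest assumed monthly total for this usage."""
    return min(PLANS, key=lambda plan: monthly_total(plan, model, minutes))


# Hypothetical workload: 20,000 minutes of TTS 1.5 Mini audio per month.
for name in PLANS:
    print(name, monthly_total(name, "mini", 20_000))
print("cheapest:", cheapest_plan("mini", 20_000))
```

Under these assumptions the break-even points between tiers come from the interplay of the fixed fee and the discounted rate, which is why a mid-sized workload can land on a middle tier rather than the cheapest-fee or cheapest-rate plan.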
Features
API access
Audio download
Character limits in TTS Playground: 40,000
Custom voices: 1,000
Instant voice cloning
Professional voice cloning: —
Voice design
Steering: TTS-2 only
Speaking rate control
Temperature control
Multilingual support: TTS 1.5 · 15 languages; TTS-2 · over 100 languages
Custom pronunciation
Timestamps
New workspace creation
Workspace sharing

Platform & Compliance
API access
GDPR & SOC 2 Type II
Commercial license
Credit rollover
HIPAA & BAA: —
Zero data retention: —
SLA & DPA: —
On-prem deployment: —
Data residency: —
Payment: Credit card
Support: Priority email

FAQ

What are credits and how do they work?
Credits are a dollar-denominated balance used to pay for usage across all Inworld products. When you subscribe to a paid plan, you receive credits equal to your plan price at the start of each billing cycle. For example, the Developer plan ($300/mo) grants $300 in monthly credits. As you use Inworld products, usage is metered and deducted from your credit balance at your tier's rates. Higher tiers unlock lower per-unit rates and higher rate limits across various features/products.

Do unused credits roll over?
Unused subscription credits roll over automatically for up to 3 months, as long as you have an active paid subscription and a plan of equal or higher value. Each month's credit grant has its own 3-month window before it expires. For example, if you have $50 unused from January, that $50 expires at the end of March.

What happens if I go over my credits?
You can purchase additional credits or enable auto-reload to automatically add credits when your balance drops below a threshold you set. These credits are billed at your current tier's rates and expire after 1 year from purchase. They are not subject to monthly expiration or rollover limits.

What happens when I upgrade, downgrade, or cancel my subscription?
Your upgrade takes effect immediately. You will be charged the full price of your new plan, plus any unbilled usage.
Unused credits carry over to your new balance. Downgrades and cancellations take effect at the end of your current billing period, at which point any remaining credits are forfeite

Build with confidence on secure AI infrastructure

Enterprise-grade security and compliance built into our AI platform. Built on a zero-trust framework with continuous monitoring, providing a secure foundation for teams building the next generation of AI.

Zero Data Retention Policy

Security that scales with your AI applications

Every layer of our AI platform is built with enterprise security in mind. From data protection to access controls, we handle the security so you can focus on building amazing AI experiences.

Zero-Trust Architecture: Provide-agnostic security
Real-time encryption and isolation across different layers of your AI infrastructure
End-to-end encryption with AES for data in transit and at rest
Microsegmentation with automatic policy enforcement
Multi-cloud and on-premises deployment flexibility

Identity & Access: Enterprise-grade authentication
Enterprise SSO implementation supporting both social login and SAML/OIDC protocols
Enterprise single sign-on with SAML and OIDC integration
Social login support for seamless user experience
Role-based access controls with automated provisioning

Continuous Monitoring: A multi-layered approach ensures constant vigilance and rapid response to potential threats.
Defense systems implementation ensuring constant uptime and operational readiness.
Continuous threat detection with ongoing security analysis High-priority alerting for rapid incident response Globally redundant infrastructure for high availability Compliance & Governance: Regulatory compliance at scale Built-in compliance controls for the most stringent regulatory requirements SOC 2 Type II certified security controls and operations HIPAA & GDPR compliance with automated data protection workflows Continuous compliance monitoring and comprehensive audit trails Zero-Data Retention: Mitigating risk for sensitive data We fully comply with enterprises with ZDR requirements. Ask us to enable ZDR for your account. Strict ZDR protocols ensure your data and your users' data remain secure. Current Certifications Our security posture is validated through rigorous third-party audits and ongoing compliance monitoring. We maintain the highest standards for data protection and operational security. Certified SOC 2 Type II Independent validation of security controls and data protection practices Compliant GDPR & ZDR Full compliance with European data-protection regulations and offering zero-data retention enterprise options Compliant HIPAA Committed to US regulations for safeguarding protected health information Frequently asked questions Common questions about how we protect your data, ensure compliance, and maintain security across our AI platform. How does your AI platform protect my data? We maintain and implement various administrative, technical, physical, and organizational security measures to protect the data you share with us. All data is encrypted at rest using AES encryption and in transit using TLS. Access to user data is restricted and only granted when required for job functions, with all access logged and periodically reviewed. What compliance certifications do you maintain? We maintain SOC 2 Type II certification with annual examinations that validate our security controls across Security, Availability, and Confidentiality. 
We are also fully HIPAA & GDPR compliant with automated data protection workflows. Our platform is designed to help enterprises meet their regulatory obligations while maintaining operational efficiency. How do you handle my personal information and privacy? We collect only the information necessary to provide and improve our services, including account information you provide and technical information from your use of our platform. We use this data to provide, maintain, and improve our services, process transactions, and provide customer support. We do not sell your personal information as defined by Inworld AI privacy policy . You have rights to access, delete, correct, and transfer your personal data. We retain data only as long as needed to provide our services or as required by our legal obligations, and we may anonymize data for research purposes. How does your platform comply with the EU AI Act? We are actively monitoring the EU AI Act requirements and preparing for compliance as the regulations come into effect. Our AI platform is designed with transparency, accountability, and risk management principles that align with the Act's requirements. We implement robust testing procedures, maintain detailed documentation of our AI systems, and ensure human oversight in our AI development processes. We will continue to adapt our practices as the regulatory framework is fully implemented. What measures are in place to protect against data breaches? Our security framework includes multiple layers of protection: network security with firewalls and intrusion detection, application security with regular penetration testing, endpoint protection through enterprise-grade solutions, and employee security training. We also maintain cyber insurance and have a vulnerability disclosure program. Questions about our security? Our team is here to help. 
Whether you need details about our compliance, want to discuss enterprise requirements, or have questions about deploying our AI platform securely.

Intro to Realtime API (Speech-to-Speech) - Inworld AI Documentation

Documentation Index
Fetch the complete documentation index at: https://docs.inworld.ai/llms.txt
Use this file to discover all available pages before exploring further.

Inworld’s Realtime API (Speech-to-Speech) enables low-latency, speech-to-speech interactions with voice agents. The API follows the OpenAI Realtime protocol, extended to enable additional customization.

WebSocket Quickstart: Build a voice agent with WebSocket, mic input, and audio playback.
WebRTC Quickstart: Build a voice agent with browser-native WebRTC — no manual audio encoding.
API reference: See the full event schemas for the Realtime API.
JS examples: JavaScript examples for the Realtime API.
Python examples: Python examples for the Realtime API.
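
Because the page says the API follows the OpenAI Realtime protocol, a client session can be pictured as a stream of JSON events. The sketch below only builds three such events as strings and opens no connection. The event types (`session.update`, `input_audio_buffer.append`, `response.create`) are taken from OpenAI's Realtime protocol. That Inworld accepts them unchanged, and which extra fields its extensions add, is an assumption not confirmed by this text. The endpoint URL and authentication are left out because they are not given here, and the helper names are hypothetical.

```python
import base64
import json


def session_update(instructions: str) -> str:
    """Configure the session (OpenAI Realtime-style client event)."""
    return json.dumps(
        {"type": "session.update", "session": {"instructions": instructions}}
    )


def audio_append(pcm_chunk: bytes) -> str:
    """Send a chunk of input audio, base64-encoded as the protocol expects."""
    encoded = base64.b64encode(pcm_chunk).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": encoded})


def response_create() -> str:
    """Ask the server to generate a response."""
    return json.dumps({"type": "response.create"})


# Hypothetical sequence a client might send over an open WebSocket.
events = [
    session_update("You are a friendly receptionist."),
    audio_append(b"\x00\x01" * 4),  # placeholder bytes, not real audio
    response_create(),
]
for event in events:
    print(event)
```

The quickstarts and API reference listed above are the place to check the actual event schemas before relying on any of these fields.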
Inworld’s Realtime API is currently in research preview. Please share any feedback with us in Discord.

Key Features

WebSocket and WebRTC transports: Connect over WebSocket or WebRTC with a standard event schema.
Automatic interruption-handling and turn-taking: Your agent will manage conversations naturally and be resilient to user barge-in.
Conversational awareness: With Realtime TTS-2, the model conditions on the audio of prior conversational turns. A line delivered after a joke lands differently than the same line delivered after bad news. The model hears the difference and adjusts how it speaks based on how it was spoken to.
Router support: Utilize Realtime Router to enable a single agent to dynamically handle different user cohorts, or to facilitate A/B tests.
OpenAI compatibility: Drop-in replacement for the OpenAI Realtime API with a simple migration path.

Guides

Using realtime models: Configure sessions, send input, and orchestrate responses.
Managing conversations: Session lifecycle and conversation events.
OpenAI migration: Step-by-step guide to switch from OpenAI to Inworld.

See the API reference for full event schemas.

Intro to Realtime STT - Inworld AI Documentation
Navigation Get Started Intro to Realtime STT Home TTS STT LLM Router Realtime API Agent Runtime API Reference Get Started Introduction Quickstart Voice Profiles Resources Authentication Billing Usage Zero Data Retention Vibe coding Support On this page Supported Providers Inworld (first-party) — Experimental Groq AssemblyAI Soniox Model comparison Supported Audio Formats Endpoints Supported Languages Inworld first-party model (inworld/inworld-stt-1) Error Handling Best Practices Troubleshooting Get Started Intro to Realtime STT Copy page Transcribe audio to text using leading STT providers through a single API. Copy page Documentation Index Fetch the complete documentation index at: https://docs.inworld.ai/llms.txt Use this file to discover all available pages before exploring further. The Realtime Speech-to-Text (STT) API provides a unified integration point for industry-leading transcription providers. You get consistent authentication, request formatting, and response handling across providers — without managing multiple SDKs or credentials. The API supports both synchronous transcription for complete audio files and real-time bidirectional streaming over WebSocket for live audio. Developer Quickstart Make your first STT API call and get a transcript. API Reference View the complete API specification. Code Examples Browse ready-to-use GitHub samples for sync and real-time STT. Supported Providers Inworld (first-party) — Experimental Model ID Endpoints Best for inworld/inworld-stt-1 Sync API + WebSocket Voice agents and character-driven apps that benefit from transcription plus Voice Profile (age, pitch, emotion, vocal style, accent) and configurable turn-taking The Inworld first-party model is currently Experimental. Features and pricing are subject to change. Supports English plus 29 additional languages in experimental mode. See Supported Languages for the full list. 
Groq
Model ID | Endpoints | Best for
groq/whisper-large-v3 | Sync API only | General-purpose transcription for recorded audio
AssemblyAI
Model ID | Endpoints | Best for
assemblyai/universal-streaming-multilingual | WebSocket only | Multilingual streaming (English, Spanish, French, German, Italian, Portuguese)
assemblyai/universal-streaming-english | WebSocket only | English-optimized streaming
assemblyai/u3-rt-pro | WebSocket only | High-accuracy, sub-300ms latency, multilingual streaming (English, Spanish, French, German, Italian, Portuguese)
assemblyai/whisper-rt | WebSocket only | Real-time Whisper transcription
AssemblyAI models currently support the WebSocket streaming endpoint only. Sync HTTP support is coming soon.
Soniox
Model ID | Endpoints | Best for
soniox/stt-rt-v4 | WebSocket only | High-accuracy real-time streaming with semantic end-of-turn detection and multilingual support
Soniox models currently support the WebSocket streaming endpoint only.
For pricing details, see Billing or inworld.ai/pricing.
Model comparison
Feature | inworld/inworld-stt-1 | groq/whisper-large-v3 | assemblyai/universal-streaming-multilingual | assemblyai/universal-streaming-english | assemblyai/u3-rt-pro | assemblyai/whisper-rt | soniox/stt-rt-v4
Pricing | See pricing | See pricing | See pricing | See pricing | See pricing | See pricing | See pricing
Endpoint | Sync API + WebSocket | Sync API only | WebSocket only | WebSocket only | WebSocket only | WebSocket only | WebSocket only
Real-time streaming
Best for | Voice agents with Voice Profile and configurable turn-taking | General-purpose transcription for recorded audio | Multilingual streaming (English, Spanish, French, German, Italian, Portuguese) | English-optimized streaming | High-accuracy, sub-300ms multilingual streaming (English, Spanish, French, German, Italian, Portuguese) | Real-time Whisper transcription | High-accuracy real-time streaming with semantic end-of-turn detection and multilingual support
Languages | English; 29 Experimental (see below) | 100+ (Whisper) | 6 languages | English | 6 languages | 100+ (Whisper) | Multilingual
Supported Audio Formats
Format | Sync API | WebSocket Streaming
LINEAR16 (PCM)
MP3
OGG_OPUS
FLAC
AUTO_DETECT
Recommended defaults: 16,000 Hz sample rate, 16-bit depth, mono. For container formats (MP3, FLAC, OGG_OPUS, WAV), sampleRateHertz is optional; the API auto-detects it from the file header.
STT performs best with 16 kHz audio. Lower sample rates (such as 8 kHz telephony audio) contain fewer data points for the model to interpret, which reduces transcription accuracy. Upsampling low-sample-rate audio does not improve quality; it only interpolates between existing samples without adding new information.
Endpoints
Endpoint | Method | Description
/stt/v1/transcribe | POST | Send complete audio, receive full transcript
/stt/v1/transcribe:streamBidirectional | WebSocket | Stream audio in real time, receive transcription chunks as they become available
Supported Languages
Language support depends on the STT provider. See Model comparison above for more details.
Inworld first-party model (inworld/inworld-stt-1)
Available: English (en)
Experimental: Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Dutch (nl), Russian (ru), Chinese (zh), Japanese (ja), Korean (ko), Arabic (ar), Hindi (hi), Turkish (tr), Polish (pl), Swedish (sv), Cantonese (yue), Indonesian (id), Thai (th), Vietnamese (vi), Malay (ms), Danish (da), Finnish (fi), Czech (cs), Filipino (fil), Persian (fa), Greek (el), Hungarian (hu), Macedonian (mk), Romanian (ro)
Use language when you want to force recognition for a known language. Omit language to allow auto-detection when supported.
Error Handling
Errors follow the standard gRPC status format.
Authentication error
{ "code": 16, "message": "Unauthenticated: invalid or missing API key.", "details": [] }
Invalid request
{ "code": 3, "message": "Unsupported audio encoding.", "details": [] }
Common gRPC status codes
Code | Name | Description
3 | INVALID_ARGUMENT | Invalid or missing request field (encoding, model ID, audio data)
8 | RESOURCE_EXHAUSTED | Too many concurrent requests (rate limit)
16 | UNAUTHENTICATED | Invalid or missing API key
Best Practices
Model choice: Use inworld/inworld-stt-1 when you want Voice Profile or Inworld-optimized turn-taking; use Groq/AssemblyAI/Soniox for specific latency/accuracy needs.
Audio: Use MP3/OGG_OPUS for file uploads to reduce size; use LINEAR16 for streaming (required) and when you need highest quality.
Streaming: For Inworld model with manual turn-taking, send endTurn at each turn boundary and closeStream when done.
Speech events: Listen for speechStarted and speechStopped events in the streaming response to detect when a speaker begins and stops talking. Use these to build custom turn-taking logic or visualize voice activity.
Voice Profile: Set voiceProfileConfig.enableVoiceProfile to true and optionally adjust topN (default: 10) to control how many labels per category are returned.
Test with sample audio and your target language before production.
Troubleshooting
Issue | What to check
No transcript | API key, audio encoding matches request, valid audio file
UNAUTHENTICATED | INWORLD_API_KEY set correctly and not expired in Portal
INVALID_ARGUMENT | audioEncoding matches the actual format (LINEAR16 for raw PCM, MP3 for MP3, etc.)
Poor quality | Try a higher-accuracy model; use 16 kHz sample rate (8 kHz telephony audio has fewer data points and will produce lower-quality results); ensure clear speech
Large file failures | Split or compress (e.g. MP3/OGG_OPUS); respect upload size limits
No Voice Profile | Ensure voiceProfileConfig.enableVoiceProfile is set to true in your request; response may also omit it if the selected model does not support it
For more help, see the Inworld Discord community.
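The error bodies and status-code table in the STT Error Handling section can be turned into a small client-side classifier. The following is a minimal sketch, not part of any Inworld SDK: the names `SttError`, `parse_stt_error`, and `RETRYABLE_CODES` are hypothetical, and treating only RESOURCE_EXHAUSTED as retryable is an assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass

# Status codes copied from the "Common gRPC status codes" table.
GRPC_STATUS_NAMES = {
    3: "INVALID_ARGUMENT",
    8: "RESOURCE_EXHAUSTED",
    16: "UNAUTHENTICATED",
}

# Assumption: only rate limiting is worth retrying; the other two
# codes need a corrected request or API key first.
RETRYABLE_CODES = {8}


@dataclass(frozen=True)
class SttError:
    """Hypothetical container for a parsed STT error body."""
    code: int
    name: str
    message: str
    retryable: bool


def parse_stt_error(body: str) -> SttError:
    """Parse a JSON error body of the form {"code", "message", "details"}."""
    payload = json.loads(body)
    code = int(payload["code"])
    return SttError(
        code=code,
        # Codes outside the documented table are labelled UNLISTED.
        name=GRPC_STATUS_NAMES.get(code, "UNLISTED"),
        message=payload.get("message", ""),
        retryable=code in RETRYABLE_CODES,
    )


if __name__ == "__main__":
    sample = '{"code": 16, "message": "Unauthenticated: invalid or missing API key.", "details": []}'
    print(parse_stt_error(sample))
```

A caller could check `retryable` before backing off and resending, and surface `name` and `message` directly for the two codes that require a fix on the client side.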
Intro to Realtime TTS - Inworld AI Documentation
Generate natural, expressive speech in real time.
Inworld’s Realtime TTS models offer ultra-realistic, context-aware speech synthesis, zero data retention, and precise voice cloning capabilities, enabling developers to build natural and engaging experiences with human-like speech quality at an accessible price point. Our models can be accessed via API (streaming and non-streaming) or the TTS Playground.
Developer quickstart: Learn how to make your first API call with a guided tutorial.
TTS Playground: Try different TTS models and voice cloning in TTS Playground.
Code Examples: Browse ready-to-use GitHub samples for common use cases.
Models
Realtime TTS-2: Our most powerful and expressive model, available in Research Preview
Natural language steering for more contextually aware speech
Support for 100+ languages
Optimized for real-time use
High quality instant voice cloning
Realtime TTS 1.5 Max: Our #1 ranked model, delivering the best balance of quality and speed
Rich, expressive, contextually aware speech
Support for 15 languages
Optimized for real-time use (<200ms median latency)
High quality instant voice cloning
Realtime TTS 1.5 Mini: Our ultra-fast, most cost-efficient model. For when latency is the top priority.
Ultra-low latency (~120ms median latency)
Support for 15 languages
Radically affordable pricing
High quality instant voice cloning
Features
Feature | Realtime TTS-2 | Realtime TTS 1.5 Max | Realtime TTS 1.5 Mini
Radically accessible pricing | See pricing | See pricing | See pricing
Quality | Maximum stability and steerability | #1 ranked, maximum stability | #1 ranked
P50 Latency | 200 ms | 200 ms | 120 ms
Instant voice cloning
Professional voice cloning
Custom pronunciation
Multilingual | 100+ languages | 15 languages | 15 languages
Steering
Pause controls
Timestamp alignment
On-premises deployments
Zero data retention
Resources
AI Gateway Comparison: Vercel vs. Inworld vs. OpenRouter (2026) Best LiteLLM Alternatives for Production LLM Routing (2026) Best Self-Hosted TTS: Open Source vs.
On-Premise Voice AI (2026) Best TTS API for AI Chatbots with a Realistic Voice (2026) Best TTS for Long-Form Conversations (2026) Voice AI for AI Phone Agents: TTS APIs Ranked for Telephony (2026) Best Voice-to-Text API for Developers (2026) How to Build an STT-LLM-TTS Voice Pipeline (Python, 100 lines) Migrate from PlayHT to Inworld Realtime TTS After Shutdown Drop-in OpenAI TTS Replacement: Inworld via OpenAI SDK Voice Agent Platforms with Built-In TTS: 2026 Architecture Guide Voice AI for HIPAA-Aligned Patient Intake What Is an AI Router? LLM Model Routing Explained (2026) What Is Conversational AI? The Developer's Guide (2026) What Is Semantic VAD? (And Why It Matters for Voice Agents) Best AI Voice Agent Platforms (2026): Top Tools Compared for Realtime Voice Agents Best GPU Cloud for AI Inference (2026 Comparison) Inworld vs Cartesia: TTS Quality and Latency Compared Inworld vs Deepgram: Voice AI Comparison for Developers NVIDIA B200 GPU: Specs, Pricing, and Cloud Availability (2026) Best Speech-to-Text APIs for Developers Building Real-Time Voice AI in 2026 Best Speech-to-Speech Model (2026) Best Voice AI Infrastructure Platform for Developers (2026) Realtime TTS API Quickstart: Get Audio in 3 Lines Build a Voice Agent in 30 Minutes with Inworld AI Migrate from ElevenLabs to Realtime TTS: Complete Developer Guide Voice AI for AI Companions: How to Build Expressive, Low-Latency Voice Into Consumer Apps at Scale Best Speech-to-Speech AI for Realtime Conversational Applications (2026) ElevenLabs v3 Is Now GA. Here's What Developers Should Know. 
OpenAI Realtime API Alternatives: Best APIs for Speech In and Speech Out Best LLM Router and AI Gateway (2026) Best Realtime AI API for Developers (2026) Best Realtime APIs for Voice AI Best AI Infrastructure for Developer Assistants: Voice AI for Coding Tools in 2026 Best Voice Cloning API for Developers (2026) 7 Best LLM Gateways for Engineers in 2026 ElevenLabs Alternatives: The Best Options for Developers Building Realtime AI (2026) Best Voice AI for Interactive Entertainment: TTS APIs Ranked for voice agents, Immersive Experiences, and Realtime Media (2026) Best Speech-to-Speech APIs in 2026: Architecture, Latency, and Code Best Voice AI for AI Companions: TTS APIs Ranked for Engagement, Cost, and Emotional Depth (2026) Best Voice AI for Enterprise Voice Agents: TTS APIs Ranked for Contact Centers, Sales Automation, and Agentic Workflows (2026) Best Voice AI for Language Learning Apps: TTS APIs Ranked for Multilingual Quality, Conversational Latency, and Scale (2026) What is Consumer AI Infrastructure? 
The Technology Stack Behind Interactive AI at Scale How to Build an AI Voice Agent: 2-Minute Example Using Inworld AI Inworld AI: Realtime Voice AI Research Lab Best AI Voice Generators for Realistic, Low-Latency TTS (2026 Comparison + Benchmarks) The Best Text-to-Speech APIs in 2026 (Quality vs Cost vs Latency Breakdown) Realtime TTS 1.5 Max vs ElevenLabs: Higher Quality, Lower Latency How to evaluate TTS models for realtime conversational AI Best voice AI / TTS APIs for real-time voice agents (2026 benchmarks) How to Add Text-to-Speech to a JavaScript App with Inworld AI How to Add Text-to-Speech to a Python App with Inworld AI Speech-to-Text API with Voice Profiling: Emotion, Accent, Age, and Pitch Detection
Inworld AI Resources
Talk to our team
Whether you want to scale up to another million users, integrate our state-of-the-art TTS, get early access to research previews of our products, or anything else, please reach out. We have a significant number of requests, but we do care about you (and your users). We will be in touch as soon as we can and look forward to chatting with you then!
Scale up, integrate TTS, or for enterprise needs: reach out and we'll be in touch.
Talk to Our Team: Scale to Millions with Inworld
Research preview · May 5, 2026
Realtime TTS-2
A new frontier voice model that feels as human as it sounds.
Realtime TTS-2 from Inworld AI is a new generation of voice model built for realtime conversation. It hears the full audio of the exchange, picks up the user's tone, pacing and emotional state, then takes voice direction in plain English the way developers prompt an LLM. It holds one voice identity across over 100 languages. Available today via the Inworld API and the Inworld Realtime API as a research preview.
What launch partners and customers are saying
Voice AI that actually feels human.
“Inworld's TTS-2 marks a real step forward in emotionally expressive voice synthesis. When combined with the conversational intelligence of LiveKit agents, it enables interactions that feel genuinely human: responsive, nuanced, and alive in ways that feel natural.” David Zhao · Co-Founder & CTO, LiveKit
“I've never seen steering work like this before TTS-2. The output is extremely natural and faithful to the steering prompt, even when it's hyper-specific. The biggest battle you fight with TTS is feeling bland, stale, and robotic; this level of steering unlocks a whole new axis to keep the experience fresh.” Creston Brooks · Co-founder & CTO, Luvu
“We've always believed language learning should have no borders. TTS 2.0 just made that a lot more real.” Dimitri Dekanozishvili · Co-founder, Talkpal
“We've been chasing the uncanny valley of voice AI for years; Inworld is finally closing the gap between 'impressive' and 'actually believable' with TTS 2.0. When your character speaks and you forget it's AI, that's when the story becomes real.” Louis Muk · CEO, Isekai Zero
“We've had early access to Inworld TTS-2 for a few days and we're all blown away. The expressiveness, language steering and multi-lingual support are genuinely impressive. The subtle details like natural pausing make it hard to differentiate between AI and human.” Nash Ramdial · Developer Relations, Stream
“Inworld just made voice AI feel genuinely human across 100+ languages. Partnering with them means we can help bring that experience to kids around the world, safely and compliantly.” Kieran Donovan · CEO, k-ID
“AI Native games need characters you can deeply connect with. Voice models that offer full control and emotional complexity to make characters feel real is one of the biggest pieces missing. TTS 2 is a significant advance in helping make that future a reality.” Nick Walton · CEO, Latitude
“Inworld was already at the top of the Artificial Analysis TTS Arena and Realtime TTS-2 pushes further on a dimension VoiceRun customers care about: directability. Style, pacing, emphasis, emotion, and delivery can be shaped in ways that matter for real enterprise deployments.” Nick Leonard · CEO & Co-Founder, VoiceRun
Voice AI was built for audiobooks. We rebuilt it for conversation.
Realtime TTS 1.5 already ranks #1 on the Artificial Analysis Speech Arena, ahead of Google and ElevenLabs. Quality is solved. So we asked the next question: what does voice AI sound like when it is built for the way humans actually talk to each other? In realtime, mutual, alive to the moment? Voice AI was shaped by the static stuff: audiobooks, narration, voiceover. A sentence in, audio out, the model never hearing the person on the other end.
Realtime TTS-2 is built from the ground up for realtime conversation. It listens to the prior turns of the exchange, so your tone and pacing carry forward. It takes voice direction in plain English, so you steer the read the way a director would. It holds one voice identity across over 100 languages, so the speaker stays the same person mid-switch. And Advanced Voice Design lets you build a saved voice from prose. Four capabilities that work together, in one model, on the same realtime connection.
Hear it now · 4 scenes
Tired user · 11pm: A quieter, slower delivery for someone winding down at the end of the day.
Frustrated caller: Softer pace, careful phrasing. The model hears the upset and lowers the energy.
Crosslingual · EN → ES → JA: Three languages inside one generation. Same speaker, same person on the other end.
Voice direction · whisper: One prose direction reshapes the read. [whispering]
Capability 01 Voice Direction
Available via REST + Realtime API
What it is. A natural-language description of how a line should be delivered, passed inline at the start of your text. Not a fixed list of preset emotions. Not a slider. Write the prompt the way you'd write a stage direction.
What it means for you. You can steer the voice the way a director would steer a voice actor. Same voice, same words, different read. Best practice: long, descriptive prompts beat short labels. [speak sadly, as if something bad just happened] directs the model far better than [sad].
How it works. Drop a bracket tag at the start of your text. The model picks up the delivery cue and shapes the read accordingly. Inline non-verbals like [sigh], [breathe], [laugh] go anywhere in the text.
POST /tts/v1/voice
curl -X POST https://api.inworld.ai/tts/v1/voice \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "[speak sadly, as if something bad just happened] I missed you. How was today?",
    "voice_id": "Sarah",
    "model_id": "inworld-tts-2",
    "audio_config": { "audio_encoding": "LINEAR16", "sample_rate_hertz": 48000 }
  }'
Try a delivery tag: [speak tired but warm, like she just got home from a long day] I missed you. How was today?
Presets: Tired & warm, Excited, Calm, Whisper, Playful. Tired & warm: End-of-day affection. Lower energy, gentle smile.
Capability 02 Conversational Awareness
What it is. The model takes the actual audio of the prior turns of the exchange as input, not just a transcript. It hears how the user actually sounded.
What it means for you. The same line lands differently after a joke than after bad news. The model knows the difference because it heard the prior turn. Tone, pacing, and emotional state carry forward automatically.
How it works. Audio context flows automatically across turns inside a Realtime session. Each user turn becomes part of the model's input. No explicit prior_audio field, no extra plumbing.
Realtime API · session
const ws = new WebSocket("wss://api.inworld.ai/api/v1/realtime/session");
// Wait for the socket to open before sending; send() throws while it is still connecting.
ws.addEventListener("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      type: "realtime",
      model: "anthropic/claude-sonnet-4-6",
      audio: {
        input: { transcription: { model: "inworld/inworld-stt-1" } },
        output: { model: "inworld-tts-2", voice: "Sarah" }
      }
    }
  }));
});
// Each user turn flows into the model automatically;
// the next response is conditioned on the prior audio.
Prior turn. Positive. Context: a joke just landed. "Okay, so what do you want to do next?" Light smile carries through. Brighter pitch.
Prior turn. Negative. Context: bad news, hesitation. "Okay, so what do you want to do next?" Softer pace. Lower pitch. Careful.
Same exact text. Two different rooms. The model heard the difference.
Capability 03 Crosslingual
What it is. One voice identity preserved across over 100 languages, including mid-utterance language switches inside a single generation.
What it means for you.
Your user's teacher, support agent, or companion is the same person whether they speak in English, Spanish, Japanese, or switch between them mid-sentence. No per-language voice library to manage.
How it works. No language
Inworld Portal
Hello Inworld - Inworld AI Documentation
Inworld AI developer docs: TTS (Text-to-Speech), STT (Speech-to-Text), Realtime (Speech-to-Speech), and LLM Router API. Quickstarts, guides, and API reference.
Build with Inworld
Realtime Text-to-Speech: State-of-the-art voice AI at a radically accessible price point
Realtime Speech-to-Text: Transcribe audio to text through a single, multi-provider API
Realtime API: Low-latency, natural speech-to-speech conversations
Realtime Router: Powerful LLM routing to optimize for every user and context
Using AI to code? Paste https://docs.inworld.ai/llms.txt into your assistant so it knows every page on this site. Want live search? Add the MCP server.
Realtime API
Controllable speech-to-speech that understands, reasons, and interacts
The only Realtime API where STT voice profile, LLM steering, and Realtime TTS-2 expressive output run as one WebSocket call.
Sub-second latency, hundreds of LLMs, native [steering] tags and non-verbal cues rendered inline.
Ready to build? Get your API key in under a minute.
<1s Latency · Hundreds of Models · #1 Ranked Quality
Every configuration, one session
Pick any LLM for the conversation engine. Swap providers without changing your integration.
LLM Choice · TTS Voice · Tool Calling · Turn Detection
// Configure your realtime session
ws.send(JSON.stringify({
  "type": "session.update",
  "session": {
    "type": "realtime",
    "modelId": "anthropic/claude-sonnet-4-6",
    "instructions": "You are a helpful voice agent.",
    "output_modalities": ["audio", "text"],
    "audio": { "output": { "model": "inworld-tts-2", "voice": "Sarah" } }
  }
}));
// Configure TTS output
ws.send(JSON.stringify({
  "type": "session.update",
  "session": {
    "type": "realtime",
    "modelId": "openai/gpt-5.2",
    "audio": { "output": { "model": "inworld-tts-2", "voice": "Liam" } }
  }
}));
// Register tools for agentic use cases
ws.send(JSON.stringify({
  "type": "session.update",
  "session": {
    "type": "realtime",
    "modelId": "openai/gpt-5.2",
    "tools": [{
      "type": "function",
      "name": "get_booking",
      "description": "Look up a reservation",
      "parameters": {
        "type": "object",
        "properties": { "confirmation_id": { "type": "string" } }
      }
    }]
  }
}));
// Fine-tune turn detection
ws.send(JSON.stringify({
  "type": "session.update",
  "session": {
    "type": "realtime",
    "modelId": "anthropic/claude-sonnet-4-6",
    "audio": {
      "input": {
        "turn_detection": {
          "type": "semantic_vad",
          "eagerness": "high",
          "create_response": true,
          "interrupt_response": true
        }
      }
    }
  }
}));
Sub-second response time
Optimized data flow delivers end-to-end speech-to-speech latency under one second. Voice agents respond with human-level cadence.
Optimized STT, LLM, and TTS pipeline for the best latency and quality.
Full-duplex audio streaming over WebSocket or WebRTC
Realtime API <1s speech-to-speech latency: STT 200ms, LLM 400ms, TTS 180ms
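The sub-second claim can be checked against the per-stage figures the page gives (STT 200ms, LLM 400ms, TTS 180ms). The sketch below is only arithmetic on those published numbers; it assumes the three stages run back to back and that network and audio-capture overhead are not included, neither of which the source states. The names `STAGE_LATENCY_MS` and `headroom_ms` are hypothetical.

```python
# Per-stage figures quoted on the Realtime API page, in milliseconds.
STAGE_LATENCY_MS = {"STT": 200, "LLM": 400, "TTS": 180}

# The advertised end-to-end target: under one second.
BUDGET_MS = 1000


def total_latency_ms(stages: dict[str, int]) -> int:
    """Sum the stage latencies, assuming they run strictly in sequence."""
    return sum(stages.values())


def headroom_ms(stages: dict[str, int], budget_ms: int = BUDGET_MS) -> int:
    """Time left in the budget after all stages; negative means over budget."""
    return budget_ms - total_latency_ms(stages)


if __name__ == "__main__":
    print(total_latency_ms(STAGE_LATENCY_MS))  # 780
    print(headroom_ms(STAGE_LATENCY_MS))       # 220
```

Under these assumptions the quoted stages add up to 780 ms, leaving roughly 220 ms for transport and anything else in the path before the one-second mark.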
Intelligent turn taking
Context-aware semantic VAD with adjustable eagerness. The agent knows when to listen, when to speak, and when a user is interrupting.
Semantic VAD detects intent boundaries, not just silence
Adjustable eagerness from cautious to aggressive
Graceful barge-in handling, with no awkward overlaps or cut-offs
Live call
User: Hi I’d like to order 12 iced teas…
User: … I mean two taro bobas
Agent: Two taro bubble teas coming up!
Conversational intelligence
The inworld/inworld-stt-1 model emits a voice profile (emotion, vocal style, accent, age, pitch) alongside every transcript chunk. Those signals land in the LLM as structured context, the LLM emits Realtime TTS-2 [steering] tags inline, and Realtime TTS-2 renders the response with matching prosody and non-verbal cues.
inworld/inworld-stt-1 emits 5 paralinguistic signals per audio chunk with confidence scores
Voice profile flows into LLM context; LLM emits inline [Speak softly] / [sigh] tags
Realtime TTS-2 consumes the tags and renders expressive audio, with no prompt engineering required
Per audio chunk: Emotion Frustrated 92%, Age 25–34 87%, Accent British 94%, Rate Fast 89% → Injected into LLM and TTS context
Realtime TTS
Engage every user with the #1 ranked, most natural Voice AI
#1 ranked TTS with human-like expression and realtime sub-200ms latency that feels like a real conversation. Custom voices with instant cloning or text-based voice design. Fully multilingual, built for streaming, and a fraction of the cost of other providers.
#1 Ranked Quality · Realtime Latency · 100+ Languages
One API. Streaming, cloning, voice design.
Stream audio chunks back as the model generates them. Sub-200ms first-chunk latency keeps the conversation feeling natural.
Streaming · Content Creation · Cloning · Voice Design
curl -X POST https://api.inworld.ai/tts/v1/voice:stream \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hi! What can I help you with today?",
    "voice_id": "Clive",
    "model_id": "inworld-tts-2",
    "audio_config": { "audio_encoding": "OGG_OPUS", "sample_rate_hertz": 16000 }
  }'
curl -X POST https://api.inworld.ai/tts/v1/voice \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Happy families are all alike",
    "voice_id": "Sarah",
    "model_id": "inworld-tts-2",
    "audio_config": { "audio_encoding": "WAV", "sample_rate_hertz": 48000 }
  }'
# Clone a voice from an audio sample
curl -X POST https://api.inworld.ai/voices/v1/voices:clone \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -F "[email protected]" \
  -F "voice_name=my-custom-voice"
# Then use it
curl -X POST https://api.inworld.ai/tts/v1/voice \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is my cloned voice.",
    "voice_id": "my-custom-voice",
    "model_id": "inworld-tts-2"
  }'
# Design a voice from a text description
curl -X POST https://api.inworld.ai/voices/v1/voices:design \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_description": "A warm, friendly female voice with a slight British accent",
    "voice_name": "designed-voice"
  }'
The top ranked TTS in the world. Proven by real users.
3 of the top 5 models on Artificial Analysis are Inworld. Blind tests by thousands of real users, not internal evals. Realtime TTS 1.5 Max delivers over 30% more expressiveness than previous models, with optimized stability to eliminate hallucinations and artifacts.
Quality: #1 Ranked on Artificial Analysis
Realtime TTS 1.5 Max delivers over 30% more expressiveness than previous models, with optimized stability to eliminate hallucinations and artifacts. Test out Quality #1 Ranked on Artificial Analysis Clone any voice. Localize to any language. Create a custom voice from 15 seconds of audio, then localize it to speak over 100 languages as a native speaker — same identity, no accent carryover. Production-ready voices you can use in the Playground or via API. Instant cloning: 15 seconds of audio → ready in seconds Localize: one voice, native delivery in over 100 languages Test out Cloning Native Localized Original Sample uploaded Clone Native Localized Original Sample uploaded Clone Clone any voice. Localize to any language. Create a custom voice from 15 seconds of audio, then localize it to speak over 100 languages as a native speaker — same identity, no accent carryover. Production-ready voices you can use in the Playground or via API. Instant cloning: 15 seconds of audio → ready in seconds Localize: one voice, native delivery in over 100 languages Test out Cloning Describe any voice. Generate it instantly. Skip recording entirely. Describe accent, age, tone, and energy in natural language, and Inworld renders a production-ready voice on the fly. Pick a preset on the card to hear how a single sentence becomes a finished voice. No audio sample required — pure natural-language description Per-preset playback so you can compare styles before generating Same voice IDs work across the TTS API, Playground, and Realtime Test out Voice Design Voice Description Play sample A confident and inviting Indian female voice, ideal for customer support and professional training materials. Support Social Health Gaming Education Instructional Describe any voice. Generate it instantly. Skip recording entirely. Describe accent, age, tone, and energy in natural language, and Inworld renders a production-ready voice on the fly. 
Pick a preset on the card to hear how a single sentence becomes a finished voice. No audio sample required — pure natural-language description Per-preset playback so you can compare styles before generating Same voice IDs work across the TTS API, Playground, and Realtime Test out Voice Design Voice Description Play sample A confident and inviting Indian female voice, ideal for customer support and professional training materials. Support Social Health Gaming Education Instructional Realtime latency. Instant responses. Built for realtime from the ground up — audio generates the instant it's synthesized via WebSocket. No buffering delay. Comparable latency to competitors at a fraction of the cost. First-chunk audio in a fraction of a humanlike response time Streaming-native via WebSoc Inworld Portal Get started Menu Products Developers Company Pricing Contact Us Log In Integrate with AI Get started Realtime STT Speech-to-text that truly understands your users in realtime Realtime streaming recognition with voice profiling — emotion, vocal style, accent, age, and pitch extracted from raw audio. Feed signals straight into your LLM and TTS for adaptive, expressive responses. Test in Playground Read the docs <100ms Latency 5 Voice Profile Signals 100+ Languages Choose your endpoint. Stream audio in real time, transcribe complete files, or extract voice profile signals — all through one unified API. 
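The voice profile signals described on this page are reported with confidence scores, and the STT examples in this record expose a threshold and a top-N setting for them. As a rough illustration only, the Python sketch below shows how a client might drop low-confidence signals and turn the rest into a line of LLM context. The dictionary shape, the function names `filter_profile` and `to_llm_context`, and the client-side use of a 0.5 threshold and top-3 cutoff are all assumptions made for this example; they are not Inworld's documented response schema.

```python
from typing import Dict, List

# Hypothetical shape (an assumption, not the documented schema):
# signal name -> candidate labels, each with a confidence in [0, 1].
VoiceProfile = Dict[str, List[dict]]


def filter_profile(profile: VoiceProfile, threshold: float = 0.5, top_n: int = 3) -> VoiceProfile:
    """Keep, per signal, at most top_n candidates whose confidence meets the threshold."""
    kept: VoiceProfile = {}
    for signal, candidates in profile.items():
        strong = [c for c in candidates if c["confidence"] >= threshold]
        strong.sort(key=lambda c: c["confidence"], reverse=True)
        if strong:  # signals with no confident candidate are dropped entirely
            kept[signal] = strong[:top_n]
    return kept


def to_llm_context(profile: VoiceProfile) -> str:
    """Render the best candidate of each signal as one line of structured context."""
    parts = [
        f"{signal}: {cands[0]['label']} ({cands[0]['confidence']:.0%})"
        for signal, cands in profile.items()
    ]
    return "Voice profile: " + "; ".join(parts)


example = {
    "emotion": [
        {"label": "frustrated", "confidence": 0.92},
        {"label": "neutral", "confidence": 0.05},
    ],
    "accent": [{"label": "british", "confidence": 0.94}],
    "pitch": [{"label": "high", "confidence": 0.31}],  # below threshold, dropped
}
print(to_llm_context(filter_profile(example)))
# Voice profile: emotion: frustrated (92%); accent: british (94%)
```

Filtering before the text reaches the LLM keeps weak guesses, such as the 31% pitch reading here, from steering the response.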
Realtime bidirectional streaming over WebSocket
Synchronous transcription for complete audio files
Voice Profile signals on every streaming chunk
Multi-provider support via a single model ID

Streaming
wscat -c 'wss://api.inworld.ai/stt/v1/transcribe:streamBidirectional' \
  -H "Authorization: Basic $INWORLD_API_KEY"
# Send config as first message:
{ "transcribeConfig": { "modelId": "inworld/inworld-stt-1", "audioEncoding": "LINEAR16", "sampleRateHertz": 16000, "language": "en-US", "voiceProfileConfig": { "enableVoiceProfile": true } } }

Sync
curl 'https://api.inworld.ai/stt/v1/transcribe' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{ "transcribeConfig": { "modelId": "groq/whisper-large-v3", "audioEncoding": "MP3", "sampleRateHertz": 16000, "language": "en-US" }, "audioData": { "content": "'"$AUDIO_BASE64"'" } }'

Voice Profile
wscat -c 'wss://api.inworld.ai/stt/v1/transcribe:streamBidirectional' \
  -H "Authorization: Basic $INWORLD_API_KEY"
# Send config as first message:
{ "transcribeConfig": { "modelId": "inworld/inworld-stt-1", "audioEncoding": "LINEAR16", "voiceProfileConfig": { "enableVoiceProfile": true, "topN": 3 }, "inworldConfig": { "voiceProfileThreshold": 0.5 } } }

Multi-Provider
curl 'https://api.inworld.ai/stt/v1/transcribe' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{ "transcribeConfig": { "modelId": "groq/whisper-large-v3", "audioEncoding": "LINEAR16", "sampleRateHertz": 16000, "language": "en-US" }, "audioData": { "content": "'"$AUDIO_BASE64"'" } }'

Voice profiling hears who's speaking, not just their words.
Every audio chunk produces a realtime profile of the speaker: emotion, vocal style, accent, age, and pitch, extracted from raw audio with confidence scores. It is the signal that turns a transcript into context your LLM and TTS can act on.
5 paralinguistic signals per audio chunk, with confidence scores
Configurable threshold to filter low-confidence results
Feeds into LLM context and Realtime TTS-2 steering downstream
Available on the inworld/inworld-stt-1 model

Voice profile signals: Emotion Frustrated 84% · Age Adult 84% · Accent British 84% · Pitch High 84% · Vocal Style Shouting 84%
More signals coming soon

Voice profile steers Realtime TTS-2 in realtime.
Voice profile signals flow into the LLM as context. The LLM emits Realtime TTS-2 steering tags and non-verbals inline, and Realtime TTS-2 renders an expressive response: natural pacing, soft delivery, and a real sigh, all driven by the user's voice profile.
Voice profile drops into LLM context as structured metadata
LLM emits inline steering tags like [Speak softly] and non-verbals like [sigh] [breathe]
Realtime TTS-2 renders the markup as natural, expressive audio
Wired end-to-end through the Realtime API

1. User audio → STT voice profile: emotion: sad · style: soft · pitch: low
2. LLM response: [Speak softly] I'm so sorry to hear that. [sigh] Let's figure this out together.
3. Realtime TTS-2 expressive output: voice: Sarah · model: inworld-tts-2

Products: Realtime TTS · Realtime Router · Realtime STT · Realtime API · Agent Runtime
Developers: Documentation · API Reference · Models · Playground
Socials: X · LinkedIn · GitHub
Company: Careers · Blog · Security · Resources
Copyright © 2021-2026 Inworld AI · Privacy · Terms

Solve the way to evolve.
We are a passionate engineering-minded team united by a singular mission: to transform static software into living AI systems that autonomously evolve to better serve their users. Technical depth is a requirement for deep empathy with our customers and effective communication internally. Therefore, no matter the role, we only hire technical people. Our employees are empowered with the autonomy to work across research and engineering, drive product strategy, and leave their mark on an entire industry. If you want to build the next foundational layer of software, you belong here.
Open roles

GTM
GTM Lead - USA (Mountain View, California, USA)

ML Engineering
Staff / Principal Machine Learning Engineer, Serving - USA (Mountain View, California, USA)
Senior / Lead Machine Learning Engineer, Serving - Germany (Germany)
Staff / Principal Machine Learning Engineer, Serving - UK (UK)
Staff / Principal Machine Learning Engineer, Serving - Switzerland (Switzerland)
Senior / Lead Machine Learning Engineer, Serving - Serbia (Serbia)

Platform
Staff / Principal Software Engineer - USA (Mountain View, California, USA)
Staff / Principal Software Engineer - Canada (Vancouver, British Columbia, Canada)
Staff / Principal Platform Engineer - USA (Mountain View, California, USA)

Science
Staff / Principal Research Scientist - USA (Mountain View, California, USA)
Staff / Principal Research Scientist - Switzerland (Switzerland)
Senior / Lead Research Scientist - Germany (Germany)
Senior / Lead Research Scientist - Serbia (Serbia)
Staff / Principal Research Scientist - UK (UK)

Inworld AI Blog: Build Better Conversational AI Experiences
Realtime TTS-2: A new frontier voice model that feels as human as it sounds
Beyond Quality: Emotionality and Expressiveness
[Product Updates] Inworld TTS-1.5: Upgrading the #1 Ranked TTS Model with Production-Grade Latency, Expression and Stability
The Next Wave of AI Applications
[Case Studies] Talkpal AI scales to 5 million language learners with Realtime TTS
Build faster, smarter realtime agents - instant streaming, lower latency, and smart interruption handling
Introducing timestamp alignment, WebSockets and more for Realtime TTS
The 3 Engineering Challenges of Realtime Conversational AI
The complete guide to measuring and optimizing TTS latency
Unreal AI Runtime: The first unified interactive AI toolkit for game developers
Introducing Inworld CLI
The new AI infrastructure for scaling games, media, and characters
Inworld + LiveKit: Unlocking studio-quality voice AI for real-time experiences at scale
[Case Studies] Inworld meets Pipecat: Raising the bar for realtime voice AI
Your AI is boring: Boost engagement and drive immediate performance improvements with Inworld Runtime and custom Mistral AI models
[Case Studies] Wishroll / Status: Cutting AI costs by >95%, scaling to 500K+ DAUs, and driving time spent per user to over 1.5 hours per day
Inworld + Vapi: Powering the next generation of expressive, real-time voice agents
Introducing Realtime TTS
[Partners & Experiences] Inworld + NLX: Enabling multi-modal consumer experiences with SOTA voice AI
[Case Studies] How Inworld helped an AI game with 20 million players reach profitability
[Partners & Experiences] How we made state-of-the-art speech synthesis scalable with Modular
A Return to the User

Intro to Realtime Router - Inworld AI Documentation
Documentation Index: Fetch the complete documentation index at https://docs.inworld.ai/llms.txt. Use this file to discover all available pages before exploring further.

Realtime Router is an intelligent routing layer that helps you select the right model and configuration for your use case, so you can maximize the performance and user metrics you care about, from cost and latency to user retention and revenue. In addition to providing a unified API to access hundreds of LLMs through a single endpoint while automatically handling fallbacks, Realtime Router enables you to easily run A/B experiments, route different user segments to different models, and measure the impact on your KPIs. This means you can actually optimize for your specific application and your specific users.

Developer quickstart: Learn how to make your first API call in minutes with a guided tutorial.
Core concepts: Understand the core concepts behind Realtime Router.

Realtime Router is currently in research preview. Please share any feedback with us in Discord.
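Two of the behaviours described above, automatic fallback between providers and routing user segments to different models for A/B experiments, can be pictured with a short client-side sketch in Python. This is not the Realtime Router API: the function names `assign_variant` and `call_with_fallback`, the hash-based bucketing, and the provider callables are all hypothetical, chosen only to illustrate the two ideas.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple


def assign_variant(user_id: str, variants: Sequence[str]) -> str:
    """Deterministically map a user to one experiment variant (same user, same variant)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, reply) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Illustrative stand-ins for real provider clients (hypothetical).
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def healthy_provider(prompt: str) -> str:
    return "echo: " + prompt


variant = assign_variant("user-42", ["model-a", "model-b"])
used, reply = call_with_fallback("hello", [("primary", flaky_provider), ("backup", healthy_provider)])
print(variant, used, reply)
```

Hashing the user ID rather than picking at random keeps each user in the same experiment arm across sessions, which is what makes per-segment KPI comparisons meaningful.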
Key Benefits
Unified API: Access models from OpenAI, Anthropic, Google, and more through a single API
High reliability: Automatically fall back to other providers if one fails
Dynamic selection: Optimize the model or provider in real-time based on price, speed, or intelligence
Cost optimization: Automatically choose the most cost-effective provider or model for each request to help you stay within budget
Live experimentation: Easily run experiments on different models and prompts to see what works best for your users
Insightful analytics: Seamlessly integrate with your metrics to understand how different models impact your KPIs

Inworld AI Privacy Notice
Last Updated: January 18, 2026

This Privacy Notice describes how Theai, Inc. (“Inworld AI,” “we,” or “us”) collect, use, disclose and otherwise process information about you. This Privacy Notice applies to information we collect when you access or use our websites and any related services (collectively, the “Services”), or when you otherwise interact with us, such as through our customer support channels. This Privacy Policy does not apply to information we process on behalf of our enterprise customers in the course of providing our services.

This Privacy Notice is effective as of the “Last Updated” date above. We may change this Privacy Notice from time to time. If we make changes, we will notify you by revising the “Last Updated” date. Where required or permitted by law, we will notify you of changes through the Services or by other means.
COLLECTION OF INFORMATION

Information You Provide to Us
We collect information directly from you when you create an account, request customer support, or otherwise communicate with us. The categories of information we collect include:
Contact Information: we collect your name, company name, email address, and phone number.
Content and Biometric Information: we collect and maintain information from the content you provide to our Services, such as chats and voice recordings, and, where we have your consent, use those recordings to derive a digital model of speech characteristics that may be considered Biometric Information (defined below) in some places.
Financial Information: we rely on third-party payment processors to collect the financial information used to pay for the Services.
Communication Information: we collect information included in your communications with us, including automated monitoring and storing of chats submitted through our Services.
We may also collect any other information you choose to provide.

Information We Collect Automatically
We automatically collect the following categories of information:
Transactional Information: we keep a history of your transactions with us, including the dates and amounts paid for the Services.
Internet Activity Information: we collect information about how you access our Services, including data about the device and network you use, such as your hardware model, operating system version, mobile network, IP address, unique device identifiers, and browser type. We also collect information about your activity on our Services and interaction with our communications, such as access times, browsing behavior (such as pages viewed and links clicked), the page you visited before navigating to our Services, and information about your activity on specific pages (such as mouse movements, keystrokes, and items placed in your cart or added to your lists).
Audio Information: If you call our support or sales teams, we may monitor and retain those conversations.
Information Collected by Cookies and Similar Tracking Technologies: We use tracking technologies, such as cookies and pixels, to collect information about your interactions with our Services and communications. These technologies help us improve our Services and communications, see which areas and features are popular, count visits, and track clicks. You may be able to adjust your browser settings to remove or reject browser cookies. You can also adjust certain cookie settings by selecting “Do Not Sell or Share My Personal Information” within the cookie icon at the bottom-left corner of our website. Please note that removing or rejecting cookies could affect the availability and functionality of our Services.

Information We Collect from Other Sources
We may collect contact information and device identifiers from identity verification services, advertising networks, and data analytics providers. Additionally, if you create or log into your Inworld AI account through a third-party platform (such as Microsoft or Google), we may have access to certain information from that platform, such as your name and email address, depending on your platform settings. We may also collect information you share when you interact with us on social media platforms.

Derived Information
We may derive information or draw inferences about you based on the information we collect. For example, we may make inferences about your approximate location based on your IP address.

USE OF INFORMATION
We use the categories of information we collect for the following business and commercial purposes:
Service Delivery: we use information to provide and maintain our Services, including to process payments and authenticate your account.
Communication: we use information to communicate with you about Inworld AI and our Services, including to respond to your questions, inform you of price or Services changes, and send you other transactional or relationship messages.
Marketing and Advertising: we use information to send direct marketing messages (including via email) and target advertisements to you on third-party platforms and websites as described in the “Targeted Advertising and Analytics” section below. You can opt out of direct marketing messages we send by following the instructions in those communications (such as by clicking “unsubscribe” in the emails) or by reaching out via the “Contact Us” section below.
Research and Development: we use information to monitor and analyze Services trends, usage, and activities, improve our Services, and generate de-identified data. We also use information to develop new products and services, including to train our artificial intelligence models.
Protection and Compliance: we use information to detect, investigate, and help prevent security incidents and other malicious, deceptive, fraudulent, or illegal activity, help protect the rights and property of Inworld AI and others, and comply with our legal and financial obligations.
Notice/Consent: we may also use information in other circumstances after giving you notice and/or getting your consent.

TARGETED ADVERTISING AND ANALYTICS
We engage others to provide analytics services, serve advertisements, and perform related services across the web and in mobile applications. These entities may use cookies, web beacons, device identifiers, and other technologies to collect information about your use of our Services, including your IP address, web browser and mobile network information, pages viewed, time spent on pages, and links clicked.
This information is used to deliver advertising targeted to your interests on other companies’ sites or mobile apps and to analyze and track data, determine the popularity of certain content, and better understand your activity.

Some of the activities described in this section may constitute “targeted advertising,” “sharing,” or “selling” under certain laws. To opt out of these practices, click “Do Not Sell or Share My Personal Information” within the cookie icon at the bottom-left corner of our website.

In addition to ad targeting activities that rely on cookies and similar technologies, we work with advertising partners to translate other identifiers, such as your email address or phone number, into a unique identifier (called a hashed value) that such partners can then use to show ads on our Services that are more relevant to you across the web and in mobile apps. Depending on where you reside, you may opt out of these disclosures by emailing [email protected]. You can also learn more about interest-based ads, or opt out of having your web browsing information used for behavioral advertising purposes by companies that participate in the Digital Advertising Alliance, by visiting www.aboutads.info/choices.

DISCLOSURE OF INFORMATION
We disclose information as follows:
Vendors: we disclose information to vendors, service providers, contractors and consultants that need this inform

Models
Use hundreds of models from leading AI providers through Inworld Router or the Inworld Realtime API.
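The model listing that follows quotes prices in dollars per 1M input tokens and per 1M output tokens. As a worked illustration of how those two columns combine into a per-request cost, here is a small Python sketch. The function name `request_cost_usd` and the token counts are made up for the example; the $0.40 and $1.60 figures are the GPT 4.1 Mini row from the listing.

```python
from decimal import Decimal


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: str,
    output_price_per_m: str,
) -> Decimal:
    """Cost of one request given per-1M-token prices (passed as strings to avoid float error)."""
    million = Decimal(1_000_000)
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m)
    return (input_cost + output_cost) / million


# GPT 4.1 Mini row: $0.40 input, $1.60 output per 1M tokens.
# Hypothetical request: 2,000 input tokens and 500 output tokens.
cost = request_cost_usd(2000, 500, "0.40", "1.60")
print(cost)  # 0.0016
```

Because output tokens on this row cost four times as much as input tokens, the 500 output tokens contribute as much to the total as the 2,000 input tokens.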
All Google OpenAI Anthropic xAI Mistral DeepSeek Alibaba MiniMax Meta Kimi NVIDIA Nous Research ByteDance Z AI DeepInfra Gryphe Xiaomi Microsoft Sao10K Filters All Creators All Providers Filters Columns ( 8 ) 253 model s Model Context Input ($/1M tokens) Output ($/1M tokens) Capabilities Input modalities Output modalities Inference providers Gemini 2.0 Flash Going away: Jun 1, 2026 1M $0.10 $0.40 + 2 Gemini 2.0 Flash 001 Going away: Jun 1, 2026 1M $0.15 $0.60 + 2 Gemini 2.0 Flash Lite Going away: Jun 1, 2026 1M $0.075 $0.30 + 2 Gemini 2.0 Flash Lite 001 Going away: Jun 1, 2026 1M $0.075 $0.30 + 2 Gemini 2.5 Flash 1M $0.30 $2.50 + 4 Gemini 2.5 Flash Image 32.8K $0.30 $2.50 + 1 Gemini 2.5 Flash Lite 1M $0.10 $0.40 + 4 Gemini 2.5 Flash Lite Preview 09 2025 1M $0.10 $0.40 + 4 Gemini 2.5 PRO 1M $1.25 $10.00 + 4 Gemini 3 Flash Preview 1M $0.50 $3.00 + 4 Gemini 3 PRO Image Preview 65.5K $2.00 $12.00 + 1 Gemini 3.1 Flash Image Preview 65.5K $0.50 $3.00 + 1 Gemini 3.1 Flash Lite — $0.25 $1.50 — Gemini 3.1 Flash Lite Preview 1M $0.25 $1.50 + 4 Gemini 3.1 PRO Preview 1M $2.00 $12.00 + 4 Gemini 3.1 PRO Preview Customtools 1M $2.00 $12.00 + 4 Gemini Flash Latest 1M $0.30 $2.50 + 4 Gemini Flash Lite Latest 1M $0.10 $0.40 + 4 Gemini PRO Latest 1M $1.25 $10.00 + 4 Gemma 3 12b 131.1K $0.04 $0.13 Gemma 3 27b 131.1K $0.08 $0.16 Gemma 3 4b 131.1K $0.04 $0.08 Gemma 4 26b A4b — $0.07 $0.34 — Gemma 4 31b — $0.13 $0.38 — GPT 3.5 Turbo 16.4K $0.50 $1.50 GPT 3.5 Turbo 0125 16.4K $0.50 $1.50 GPT 3.5 Turbo 1106 Going away: Sep 28, 2026 16.4K $1.00 $2.00 GPT 3.5 Turbo 16k 16.4K $3.00 $4.00 GPT 4 Turbo 128K $10.00 $30.00 GPT 4 Turbo 2024 04 09 128K $10.00 $30.00 GPT 4.1 1M $2.00 $8.00 + 2 GPT 4.1 2025 04 14 1M $2.00 $8.00 + 2 GPT 4.1 Mini 1M $0.40 $1.60 + 2 GPT 4.1 Mini 2025 04 14 1M $0.40 $1.60 + 2 GPT 4.1 Nano 1M $0.10 $0.40 + 1 GPT 4.1 Nano 2025 04 14 1M $0.10 $0.40 + 1 GPT 4o 128K $2.50 $10.00 + 1 GPT 4o 2024 05 13 128K $5.00 $15.00 GPT 4o 2024 08 06 128K $2.50 $10.00 + 1 GPT 4o 2024 11 20 
128K $2.50 $10.00 + 1 GPT 4o Audio Preview 128K $2.50 $10.00 GPT 4o Audio Preview 2024 12 17 128K $2.50 $10.00 GPT 4o Audio Preview 2025 06 03 128K $2.50 $10.00 GPT 4o Mini 128K $0.15 $0.60 + 1 GPT 4o Mini 2024 07 18 128K $0.15 $0.60 + 1 GPT 4o Mini Audio Preview 128K $0.15 $0.60 GPT 4o Mini Audio Preview 2024 12 17 128K $0.15 $0.60 GPT 4o Mini Search Preview 128K $0.15 $0.60 + 2 GPT 4o Mini Search Preview 2025 03 11 128K $0.15 $0.60 + 1 GPT 4o Search Preview 128K $2.50 $10.00 + 2 GPT 4o Search Preview 2025 03 11 128K $2.50 $10.00 + 1 GPT 5 272K $1.25 $10.00 + 4 GPT 5 2025 08 07 272K $1.25 $10.00 + 4 GPT 5 Chat Latest 128K $1.25 $10.00 + 2 GPT 5 Mini 272K $0.25 $2.00 + 4 GPT 5 Mini 2025 08 07 272K $0.25 $2.00 + 4 GPT 5 Nano 272K $0.05 $0.40 + 4 GPT 5 Nano 2025 08 07 272K $0.05 $0.40 + 4 GPT 5 Search Api 272K $1.25 $10.00 + 2 GPT 5 Search Api 2025 10 14 272K $1.25 $10.00 + 2 GPT 5.1 272K $1.25 $10.00 + 4 GPT 5.1 2025 11 13 272K $1.25 $10.00 + 4 GPT 5.1 Chat Latest 128K $1.25 $10.00 + 3 GPT 5.2 272K $1.75 $14.00 + 4 GPT 5.2 2025 12 11 272K $1.75 $14.00 + 4 GPT 5.2 Chat Latest 128K $1.75 $14.00 + 4 GPT 5.3 Chat Latest 128K $1.75 $14.00 + 4 GPT 5.4 1.1M $2.50 $15.00 + 3 GPT 5.4 2026 03 05 1.1M $2.50 $15.00 + 3 GPT 5.4 Mini 272K $0.75 $4.50 + 4 GPT 5.4 Mini 2026 03 17 272K $0.75 $4.50 + 4 GPT 5.4 Nano 272K $0.20 $1.25 + 4 GPT 5.4 Nano 2026 03 17 272K $0.20 $1.25 + 4 GPT 5.5 1.1M $5.00 $30.00 + 4 GPT 5.5 2026 04 23 1.1M $5.00 $30.00 + 4 GPT Audio 128K $2.50 $10.00 GPT Audio 1.5 128K $2.50 $10.00 GPT Audio 2025 08 28 128K $2.50 $10.00 GPT Audio Mini 128K $0.60 $2.40 GPT Audio Mini 2025 10 06 128K $0.60 $2.40 GPT Audio Mini 2025 12 15 128K $0.60 $2.40 GPT Image 2 — $5.00 $10.00 GPT Image 2 2026 04 21 — $5.00 $10.00 GPT Oss 120b 131.1K $0.039 $0.19 + 2 GPT Oss 120b Turbo — $0.15 $0.60 — GPT Oss 20b 131.1K $0.03 $0.14 + 2 O3 200K $2.00 $8.00 + 4 O3 2025 04 16 200K $2.00 $8.00 + 4 O3 Mini 200K $1.10 $4.40 + 2 O3 Mini 2025 01 31 200K $1.10 $4.40 + 2 O4 Mini 200K $1.10 $4.40 + 
4 O4 Mini 2025 04 16 200K $1.10 $4.40 + 4 Openai/gpt Oss Safeguard 20b 131.1K $0.075 $0.30 + 2 Claude Haiku 4.5 20251001 200K $1.00 $5.00 + 3 Claude Opus 4 20250514 Going away: May 14, 2026 200K $15.00 $75.00 + 3 Claude Opus 4.1 20250805 Going away: Aug 5, 2026 200K $15.00 $75.00 + 3 Claude Opus 4.5 20251101 200K $5.00 $25.00 + 3 Claude Opus 4.6 1M $5.00 $25.00 + 3 Claude Opus 4.7 1M $5.00 $25.00 + 3 Claude Sonnet 4 20250514 Going away: May 14, 2026 1M $3.00 $15.00 + 3 Claude Sonnet 4.5 20250929 200K $3.00 $15.00 + 4 Claude Sonnet 4.6 1M $3.00 $15.00 + 3 Grok 3 131.1K $3.00 $15.00 Grok 3 Beta 131.1K $3.00 $15.00 Grok 3 Fast — $3.00 $15.00 — Grok 3 Fast Beta 131.1K $3.00 $15.00 Grok 3 Fast Latest 131.1K $3.00 $15.00 Grok 3 Latest 131.1K $3.00 $15.00 Grok 3 Mini Fast 131.1K $0.30 $0.50 + 2 Grok 3 Mini Fast Beta 131.1K $0.30 $0.50 + 2 Grok 3 Mini Fast Latest 131.1K $0.30 $0.50 + 2 Grok 3 Mini Latest 131.1K $0.30 $0.50 + 2 Grok 4 256K $3.00 $15.00 Grok 4 0709 256K $3.00 $15.00 Grok 4 Fast — $0.20 $0.50 — Grok 4 Fast Non Reasoning 2M $0.20 $0.50 Grok 4 Fast Non Reasoning Latest — $0.20 $0.50 — Grok 4 Fast Reasoning 2M $0.20 $0.50 Grok 4 Fast Reasoning Latest — $0.20 $0.50 — Grok 4 Latest 256K $3.00 $15.00 Grok 4.1 Fast 2M $0.20 $0.50 + 4 Grok 4.1 Fast Non Reasoning 2M $0.20 $0.50 + 2 Grok 4.1 Fast Non Reasoning Latest 2M $0.20 $0.50 + 2 Grok 4.1 Fast Reasoning 2M $0.20 $0.50 + 4 Grok 4.1 Fast Reasoning Latest 2M $0.20 $0.50 + 4 Grok 4.20 — $1.25 $2.50 — Grok 4.20 0309 — $1.25 $2.50 — Grok 4.20 0309 Non Reasoning — $1.25 $2.50 — Grok 4.20 0309 Reasoning 2M $1.25 $2.50 + 2 Grok 4.20 Beta — $1.25 $2.50 — Grok 4.20 Beta 0309 — $1.25 $2.50 — Grok 4.20 Beta 0309 Non Reasoning 2M $1.25 $2.50 + 1 Grok 4.20 Beta 0309 Reasoning 2M $1.25 $2.50 + 3 Grok 4.20 Beta Latest — $1.25 $2.50 — Grok 4.20 Beta Latest Non Reasoning — $1.25 $2.50 — Grok 4.20 Beta Latest Reasoning — $1.25 $2.50 — Grok 4.20 Beta Non Reasoning — $1.25 $2.50 — Grok 4.20 Beta Reasoning — $1.25 $2.50 — Grok 4.20 
Experimental Beta 0304 — $1.25 $2.50 — Grok 4.20 Experimental Beta 0304 Non Reasoning — $1.25 $2.50 — Grok 4.20 Experimental Beta 0304 Reasoning — $1.25 $2.50 — Grok 4.20 Experimental Beta Latest — $1.25 $2.50 — Grok 4.20 Experimental Beta Non Reasoning Latest — $1.25 $2.50 — Grok 4.20 Experimental Beta Reasoning Latest — $1.25 $2.50 — Grok 4.20 Non Reasoning — $1.25 $2.50 — Grok 4.20 Non Reasoning Gv2 — $1.25 $2.50 — Grok 4.20 Non Reasoning Latest — $1.25 $2.50 — Grok 4.20 Reasoning — $1.25 $2.50 — Grok 4.20 Reasoning Gv2 — $1.25 $2.50 — Grok 4.20 Reasoning Latest — $1.25 $2.50 — Grok 4.3 1M $1.25 $2.50 + 4 Grok 4.3 Latest 1M $1.25 $2.50 + 4 Grok Code Fast 256K $0.20 $1.50 + 1 Grok Code Fast 1 256K $0.20 $1.50 + 1 Grok Code Fast 1 0825 256K $0.20 $1.50 + 1 Grok Latest — $1.25 $2.50 — Codestral 2508 256K $0.30 $0.90 Codestral Latest 32K $1.00 $3.00 Devstral 2 256K $0.40 $2.00 Devstral Medium 2507 128K $0.40 $2.00 Devstral Medium Latest 256K $0.40 $2.00 Devstral Small 128K $0.10 $0.30 Magistral Medium 2509 40K $2.00 $5.00 + 1 Magistral Medium Latest 40K $2.00 $5.00 + 1 Magistral Small Latest 40K $0.50 $1.50 + 1 Ministral 3 14b — $0.20 $0.20 — Ministral 3 3b — $0.10 $0.10 — Ministral 3 8b — $0.15 $0.15 — Mistral Large 2 128K $2.00 $6.00 Mistral Large 3 262.1K $0.50 $1.50 Mistral Medium 32K $2.70 $8.10 Mistral Medium 2505 131.1K $0.40 $2.00 Mistral Medium 2508 — $0.40 $2.00 — Mistral Medium 3 — $0.40 $2.00 — Mistral Medium 3.5 — $1.50 $7.50 — Mistral Medium Latest 131.1K $0.40 $2.00 Mistral Small 2603 — $0.15 $0.60 — Mistral Small 3.2 131.1K $0.06 $0.18 Mistralai/Mistral Nemo Instruct 2407 131.1K $0.02 $0.04 Mistralai/Mistral Small 24B Instruct 2501 32.8K $0.05 $0.08 Mistralai/Mistral Small 3.2 24B Instruct 2506 128K $0.075 $0.20 Open Mistral Nemo 128K $0.30 $0.30 Open Get started Menu Products Developers Company Pricing Contact Us Log In Integrate with AI Get started Inworld AI Terms of Service Effective Date: June 11, 2025 These Terms of Service ("Terms") form a 
legally binding agreement between you ("Customer," "you," or "your") and Theai, Inc. dba Inworld AI ("Inworld AI," "we," "our," or "us"). These Terms govern your access to and use of the AI services, software, models, APIs, and related tools (the "Services") provided by Inworld AI. By clicking to accept or by accessing or using the Services, you agree to these Terms. If you are accepting on behalf of an organization, you represent that you have authority to bind that organization. Eligibility and Acceptance You may use the Services only if you can form a binding contract with Inworld AI and are not barred from using the Services under applicable law. You must be at least 18 years old. These Terms are enforceable as a clickwrap agreement upon acceptance. Registration and Access You must provide accurate and complete information to register for the Services and keep it up to date. You are responsible for maintaining the confidentiality of your credentials and for all activities under your account. Services and License License Grant . Subject to your compliance with these Terms and payment of applicable fees, Inworld AI grants you a limited, non-exclusive, non-transferable, non-sublicensable license to access and use the Services and Models solely for your internal business purposes and as permitted by the documentation. Acceptable Use Policy and Service Specific Terms . You agree to abide by our Acceptable Use Policy and Service Specific Terms . Any violation of these may, in Inworld AI's sole and exclusive judgement, result in suspension or termination of your license and/or account. Customer-Hosted Deployment . Where permitted, you may deploy Models on your own infrastructure. You are solely responsible for the security, compliance, and lawful use of such deployments. Your deployment must comply with the Customer-Hosted Deployment terms, and you must maintain security, audit access, and provide appropriate disclaimers to End-Users. 
Inputs, Outputs, Actions, and Materials Generally . You may be allowed to interact with our Services in a variety of formats (we call these "Inputs"). Our Services may generate responses (we call these "Outputs"), or enable the Services to take actions on your behalf, such as software manipulation, data processing, and system interactions (we call these "Actions"), based on your Inputs. Inputs and Outputs collectively are "Materials." Your responsibilities . You are responsible for all Inputs you submit to our Services and all Actions. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms and to provide the Services to you, including for example, to integrate with third-party services, to share Materials with others at your direction, and to take Actions. You also represent and warrant that your submitting Inputs to us or directing the Services to take Actions will not violate our Terms, Service Specific Terms or any laws or regulations applicable to those Inputs or Actions. Disclaimers . Artificial intelligence and large language models are new technologies that are constantly evolving. When you use our Services, you acknowledge and agree: Outputs may not always be accurate and may contain material inaccuracies even if they appear accurate because of their level of detail or specificity. Actions may not be error free or operate as you intended. You should not rely on any Outputs or Actions without independently confirming their accuracy. The Services and any Outputs may not reflect correct, current, or complete information. Outputs may contain inconsistent content. Due to the nature of the Services, there is a possibility that another Inworld AI customer might use the same or similar Input as Customer, which could result in the generation of the same or similar Output by the Services. Ownership of Input and Output . 
As between you and Inworld AI, and to the extent permitted by applicable law, you retain any right, title, and interest that you have in the Inputs you submit. Subject to your compliance with our Terms, we assign to you all of our right, title, and interest in Outputs. License to Materials . You grant Inworld AI a non-exclusive, worldwide license to use, reproduce, distribute, modify, and create derivative works from Materials to operate, maintain, and improve the Services, comply with law, and enforce these Terms. No Training . We will not train our generally available models on any Materials that are not publicly available, except in two circumstances: Feedback: If you provide Feedback to us (through the Services or otherwise) regarding any Materials, you grant Inworld AI an unrestricted, perpetual, irrevocable license to use feedback or suggestions you provide without any obligation or compensation. If your Materials are flagged for trust and safety review, we may use or analyze those Materials to improve our ability to detect and enforce any violations. Intellectual Property Except as expressly set forth herein, Inworld AI and its licensors retain all rights, title, and interest in and to the Services, including models, software, and associated intellectual property. You receive no rights except as expressly granted in these Terms. Payment and Billing Payments. If you purchase any Services, you will provide complete and accurate billing information, including a valid payment method. For paid subscriptions, we will automatically charge your payment method on each agreed-upon periodic renewal until you cancel. You're responsible for all applicable taxes, and we'll charge tax when required. If your payment cannot be completed, we may downgrade your account or suspend your access to our Services until payment is received. Late payments may accrue interest at 1.5% per month or the highest rate allowed by law. Cancellation. 
You can cancel your paid subscription at any time. Payments are non-refundable, except where required by law. These Terms do not override any mandatory local laws regarding your cancellation rights.

Changes. We may change our prices from time to time. If we increase our subscription prices, we will give you at least 30 days' notice and any price increase will take effect on your next renewal so that you can cancel if you do not agree to the price increase.

Usage Limits and Fair Use

Use of the Services is subject to the usage tiers, quotas, and fair use restrictions presented at sign-up or in documentation. Inworld AI may monitor usage and throttle, suspend, or terminate access if limits are exceeded or abuse is detected.

Confidentiality

"Confidential Information" means any business, technical or financial information, materials, or other subject matter disclosed by Inworld AI to you that is: (i) identified as confidential at the time of disclosure; or (ii) should be reasonably understood by a recipient to be confidential under the circumstances.

Use and Nondisclosure. You agree that you will: (i) only use Inworld AI's Confidential Information to exercise its rights and fulfill its obligations under this Agreement; (ii) take reasonable measures to protect Inworld AI's Confidential Information; and (iii) not disclose Inworld AI's Confidential Information to any third party except as expressly permitted in these Terms.

Exceptions. The obligations in Section 8(a) do not apply to information that: (i) is or becomes generally available to the public through no fault of yours; (ii) was in your possession or known by it prior to receipt from Inworld AI; (iii) was rightfully disclosed to you without restricti

Inworld Agent Runtime

Build realtime voice and chat agents for demanding applications. Integrated metrics and experiments to optimize for user outcomes.
Deploy to hosted API endpoints or integrate via SDKs.

Realtime agents built for scale

Build with production-grade orchestration and rapid inference.

Model-Agnostic Orchestration: Lightning-fast C++-based orchestration that provides unified access to all the best models. LLM, TTS, STT, tools, and more.

Integrated Observability for Measurable Gains: Easily monitor performance, costs, and user patterns on every interaction.

Improve User Engagement with Experiments: Instantly deploy new models and prompts and measure impact on user metrics.

Why Inworld Agent Runtime

Your users deserve the best quality, availability and speed.

Exceptional Quality: Serve personalized models and prompts to delight every user.

High Availability: Automatic failovers prevent downtime from outages and rate limits.

Ultra Low Latency: Lightning-fast execution that scales seamlessly from 10 to 10M users with minimum code changes.

Proven Results

Built for every consumer AI application - from social apps and games to learning and wellness.

Wishroll Status: Went from prototype to 1M users in 19 days with 20x cost reduction.

Little Umbrella: From a 1.2 Billion token bill to profitability with 20 million players.

Streamlabs: Built a realtime multimodal streaming assistant with sub 500ms latency.

Bible Chat: Increased voice AI feature engagement and reached millions.

Built for every consumer AI application

Agent Runtime scales consumer AI, driving experiences from social apps and games to learning and wellness.
Social & Community Apps: AI-powered social discovery, content moderation, and personalized feeds that understand context and scale to millions of users.

Education & Learning: Voice-enabled language tutors, adaptive study companions, and intelligent content generation that meets each learner where they are.

Health & Wellness: 24/7 mental health companions, personalized fitness coaching, and health assistants that understand individual needs and privacy.

Gaming & Interactive Media: Dynamic NPCs, interactive storytelling, and immersive experiences powered by AI that scales from indie games to AAA studios.

Get started

Inworld easily integrates with any existing stack or provider (Anthropic, Google, Mistral, OpenAI, etc.) via one API key. Available to everyone now.

FAQ

How is Agent Runtime different from other AI backends?

The Inworld Agent Runtime uniquely combines a lightning-fast C++ core for realtime multimodal conversational interactions, built-in telemetry for deep user insights (traces & logs), and live A/B testing to accelerate improvements to the end user experience.

Is Agent Runtime free?

Yes, Agent Runtime is free. Consumption of models is the only thing you pay for. Agent Runtime itself incurs no cost or license fee. Learn more about model pricing here.

How do I get started?

Follow our quick start guide to deploy a realtime conversational AI endpoint in 3 minutes, then integrate it into your app.

Who is Inworld Agent Runtime designed for?

Inworld Agent Runtime is specifically designed for developers building realtime conversational AI and voice agents that scale to millions of concurrent users. Use cases include language tutors, social media, AI companions, game characters, fitness coaches, shopping agents, and more.

Can Inworld Agent Runtime work with my existing framework and stack?
Yes, you can use the Inworld CLI to deploy a hosted endpoint that can be easily called by any part of your existing stack.

What production-ready building blocks does Inworld Agent Runtime provide?

Developers get a full suite of pre-optimized nodes to construct any real-time AI pipeline that can scale to millions of users, including nodes for model I/O (STT, LLM, TTS), data engineering (prompt building, chunking), flow logic (keyword matching, safety), external tool calls (MCP integrations), and more.

What LLM, TTS, STT, and embedding models do you support?

Use Inworld’s state-of-the-art TTS models alongside your preferred LLM. We support all major providers (OpenAI, Anthropic, Google, Mistral) and low-latency platforms like Fireworks, Groq, and Tenstorrent. Read supported models here.

Copyright © 2021-2026 Inworld AI

Build Realtime Conversational AI | Inworld Runtime

Introduction - Inworld AI Documentation
API Reference

Overview: Introduction · Rate Limits · Models

Text-to-Speech: POST Synthesize speech · POST Synthesize speech (stream) · WSS Synthesize speech (WebSocket)

Voices: POST Clone a voice · POST Design a voice · POST Publish a voice · GET List voices in a workspace · GET Get a specific voice · GET Get voice preview · PATCH Update a voice · DEL Delete a voice

Speech-to-Text: POST Transcribe audio · WSS Transcribe audio (WebSocket)

Realtime API: WSS Realtime API (WebSocket) · Realtime API (WebRTC)

LLM: POST Create chat completion

Router: POST Create router · GET List routers · GET Get router · PATCH Update router · DEL Delete router

Moderation: POST Create moderation · POST Create chat moderation

Models: GET List models

Embeddings: POST Create embeddings

Documentation Index

Fetch the complete documentation index at: https://docs.inworld.ai/llms.txt Use this file to discover all available pages before exploring further.

Authentication

All requests to Inworld’s APIs must include an API key in an Authorization HTTP header. All APIs support both Basic and JWT authentication.

Getting an API key

To get an API key, follow these steps:

1. Log in to Inworld Portal.
2. Click API Keys on the bottom left sidebar.
3. Click Generate new key to generate a new API key.
4. Copy the Basic (Base64) authorization signature.

You can also specify for each API key whether it has write permissions to:

Voice API, which enables the API key to be used for POST, PATCH, and DELETE endpoints (clone voice, update voice, delete voice). GET endpoints only require read permissions.

Router API, which enables the API key to be used for POST, PATCH, and DELETE endpoints (create router, update router, delete router). GET endpoints only require read permissions.

These permissions do not impact other APIs (such as Text-to-Speech and LLM).
Basic authentication

Do not expose your Base64 API credentials in client-side code (browsers, apps, game builds), as they may be compromised. Please consider JWT authentication for client-side builds.

Basic authentication uses the Base64 encoded credentials to authenticate the request. Below is an example of the header for Basic authentication:

    Authorization: Basic $INWORLD_API_KEY

Make sure to keep your Base64 credentials safe, as anyone with your credentials can make requests on your behalf. It is recommended that credentials are stored as environment variables and read at run time.

JWT authentication

JWT (JSON Web Token) authentication allows you to issue a signed token from your server that clients can use to securely authenticate with Inworld APIs. This method is strongly recommended when calling APIs from client-side code, to avoid exposing your credentials.

How it works:

1. Your backend securely stores the Inworld API Key and Secret.
2. When the client needs to authenticate, it requests a token from your backend.
3. Your backend uses the API key and secret to generate a signed JWT and returns it to the client.
4. The client uses this JWT with each API request to Inworld:

    Authorization: Bearer $JWT

We recommend taking a look at this sample Node.js application for an example of how to generate JWT tokens for authentication with the Inworld API.

This documentation is built and hosted on Mintlify, a developer documentation platform.
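The two header formats quoted in the authentication text (Basic with the Base64 credential, Bearer with a JWT) can be sketched as a small helper. This is a minimal illustration, not Inworld's SDK: the function name `build_auth_header` is hypothetical, and only the header shapes and the `INWORLD_API_KEY` environment variable name come from the extracted documentation. How the JWT itself is signed is not described in the source and is left out.

```python
import os
from typing import Dict

ALLOWED_SCHEMES = ("Basic", "Bearer")


def build_auth_header(credential: str, scheme: str = "Basic") -> Dict[str, str]:
    """Return an Authorization header dict (hypothetical helper).

    "Basic" expects the Base64 credential copied from the portal;
    "Bearer" expects a JWT issued by your own backend.
    """
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    if not credential:
        raise ValueError("credential must not be empty")
    return {"Authorization": f"{scheme} {credential}"}


# Server-side use: read the credential from the environment at run time,
# as the documentation recommends, instead of hard-coding it.
api_key = os.environ.get("INWORLD_API_KEY", "")
if api_key:
    headers = build_auth_header(api_key)            # Authorization: Basic ...
```

A client-side build would call `build_auth_header(jwt, "Bearer")` with a token fetched from its own backend, so the Base64 credential never ships in the client.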

◈ Crawled Pages — Provenance Chain

Law I — Provenance · Law III — Reverse Ontology · source: https://inworld.ai/ Visit Source ↗

Root-LD — Traveling Context Pod v1.0 · gdr-649837d3 · three layers

Graph Edges

Tokens Measured: 17,836

Type-Token Ratio: 0.2294

Schema Blocks

Schema Coverage: 36%

Root-LD is the traveling context pod for this entity — permanent, provenance-grounded. The head <script> block is machine-readable. This section shows the same data to humans. We show the work in both spaces.
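A type-token ratio is conventionally the number of distinct word forms (types) divided by the total number of words (tokens). The sketch below shows that calculation. The record does not say how the registry tokenizes text, so the lowercase regex tokenizer here is an assumption and will not necessarily reproduce the 0.2294 figure. Under the conventional definition, 17,836 tokens at a ratio of 0.2294 would correspond to roughly 4,090 distinct types.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct tokens divided by total tokens (0.0 for empty input)."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "realtime voice AI for realtime voice agents"
ratio = type_token_ratio(sample)  # 5 distinct types over 7 tokens
```

A low ratio on a large corpus mostly reflects repetition, which is consistent with the "narrow vocabulary range" label used elsewhere in this record.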

Layer 1 — Anchor · Immutable after mint. UUID, federation_id, content hash, timestamps. A new crawl appends to recursive — the anchor is never touched. Law I — Provenance.

rld:anchor — gdr-649837d3

{
  "uuid": "649837d3-d023-44d4-bf0d-99388cc7566d",
  "federation_id": "gdr-649837d3",
  "sequence": 0,
  "content_hash": "3fc7d18757105613721db6854493fe048ca1ef97425d5eeee2d33d65cfbcf42d",
  "primary_source": "https://inworld.ai/",
  "source_verified": true,
  "generation_method": "crawl_extract_v1",
  "spec_version": "1.0",
  "queued_at": "2026-05-12T18:34:47.568873+00:00",
  "minted_at": "2026-05-12T18:34:47.568873+00:00"
}
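The anchor fields can be sanity-checked mechanically. The sketch below assumes two rules inferred from this single record rather than from any published specification: that `federation_id` is `gdr-` followed by the first eight hex characters of `uuid`, and that `content_hash` is a 64-character lowercase hex digest (the length of a SHA-256 digest, though the algorithm is not stated). The function name `check_anchor` is hypothetical.

```python
import string
import uuid
from typing import Dict, List

anchor = {
    "uuid": "649837d3-d023-44d4-bf0d-99388cc7566d",
    "federation_id": "gdr-649837d3",
    "content_hash": "3fc7d18757105613721db6854493fe048ca1ef97425d5eeee2d33d65cfbcf42d",
}


def check_anchor(record: Dict[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    try:
        uuid.UUID(record["uuid"])  # rejects malformed UUID strings
    except ValueError:
        problems.append("uuid is not a valid UUID")
    # Assumed rule: federation_id reuses the first 8 hex characters of the UUID.
    if record["federation_id"] != "gdr-" + record["uuid"][:8]:
        problems.append("federation_id does not match uuid prefix")
    digest = record["content_hash"]
    hex_lower = set(string.digits + "abcdef")
    if len(digest) != 64 or not set(digest) <= hex_lower:
        problems.append("content_hash is not a 64-character hex digest")
    return problems


problems = check_anchor(anchor)
```

Returning a list of problems rather than raising on the first failure makes it easier to report every inconsistency in a record at once.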

Layer 2 — Body · Complete measurement snapshot frozen at mint. Identity, SEO, schema graph, six-layer topology fingerprint, ratio signals, navigation. Law II — Temporal Attestation.

rld:body — inworld.ai

{
  "domain": "inworld.ai",
  "canonical_url": "https://inworld.ai/",
  "tld": "ai",
  "slug": "inworld-ai",
  "status_code": 200,
  "redirect_chain": [],
  "response_time_ms": 3209,
  "ssl_valid": true,
  "server_header": "istio-envoy",
  "title": "Inworld AI – The #1 Ranked Realtime Voice AI",
  "h1": "The #1 ranked",
  "meta_description": "#1 ranked TTS with under 200ms latency, voice cloning, and 75% lower cost. Realtime agents built for scale.",
  "lang_declared": "en",
  "schema_types": [
    "Organization",
    "ImageObject",
    "ContactPoint",
    "PostalAddress",
    "WebSite",
    "CollectionPage",
    "Article",
    "WebPage",
    "JobPosting",
    "Place",
    "Country",
    "Blog",
    "ItemList",
    "ListItem"
  ],
  "schema_score": 0.3566,
  "schema_prop_count": 39,
  "schema_gap_list": [
    "funding",
    "knowsAbout",
    "employee",
    "diversityStaffingReport",
    "keywords",
    "slogan",
    "funder",
    "aggregateRating",
    "leiCode",
    "knowsLanguage",
    "brand",
    "skills",
    "naics",
    "founder",
    "hasMerchantReturnPolicy",
    "parentOrganization",
    "areaServed",
    "globalLocationNumber",
    "review",
    "event"
  ],
  "top_semantic_words": [
    "quot",
    "realtime",
    "voice",
    "inworld",
    "tts",
    "api",
    "audio",
    "speech",
    "gpt",
    "model",
    "stt",
    "information",
    "started",
    "models",
    "llm",
    "grok",
    "type",
    "services",
    "user",
    "router",
    "stated",
    "latency",
    "custom",
    "best",
    "data",
    "text",
    "streaming",
    "documentation",
    "mini",
    "pricing",
    "support",
    "agent",
    "developers",
    "websocket",
    "yes",
    "modelid",
    "real",
    "language",
    "basic",
    "built"
  ],
  "ratio_signals": {
    "schema_density": 0.975,
    "nav_ratio": 0.0162,
    "content_to_structure_ratio": 0.036343,
    "external_tld_diversity": 2,
    "self_declaration_coherence": 0.1683,
    "schema_to_navigation_alignment": 0.0,
    "javascript_surface_ratio": 0.0,
    "url_depth_distribution": {
      "depth_0": 5,
      "depth_1": 32,
      "depth_2": 179,
      "depth_3plus": 92
    }
  },
  "semantic_html_ratio": 0.0,
  "javascript_surface_ratio": 0.0,
  "img_alt_coverage": 0.0,
  "robots_complexity_score": 0,
  "ariadne_blocked": false,
  "security_label": "MINIMAL",
  "https_enforced": true,
  "freshness_label": "CURRENT",
  "tld_starjet_url": "https://globaldataregistry.com/registry/tld/ledger/ai",
  "schema_starjet_urls": [
    "https://globaldataregistry.com/registry/schema/ledger/organization",
    "https://globaldataregistry.com/registry/schema/ledger/imageobject",
    "https://globaldataregistry.com/registry/schema/ledger/contactpoint",
    "https://globaldataregistry.com/registry/schema/ledger/postaladdress",
    "https://globaldataregistry.com/registry/schema/ledger/website",
    "https://globaldataregistry.com/registry/schema/ledger/collectionpage",
    "https://globaldataregistry.com/registry/schema/ledger/article",
    "https://globaldataregistry.com/registry/schema/ledger/webpage",
    "https://globaldataregistry.com/registry/schema/ledger/jobposting",
    "https://globaldataregistry.com/registry/schema/ledger/place",
    "https://globaldataregistry.com/registry/schema/ledger/country",
    "https://globaldataregistry.com/registry/schema/ledger/blog",
    "https://globaldataregistry.com/registry/schema/ledger/itemlist",
    "https://globaldataregistry.com/registry/schema/ledger/listitem"
  ],
  "native_text_sample": "We value your privacy\n\nThis website or its third-party tools process personal data. You can opt out of the sale of your personal information by clicking on the “Do Not Sell or Share My Personal Information” link.\n\nDo Not Sell or Share My Personal Information\nPowered by\n\nRealtime TTS-2 is live. Built for realtime conversation that feels human. Try the live demo Read the announcement\n\nProducts\nDevelopers\nCompany\nPricing\nContact Us\nLog In\nINTEGRATE WITH AI\nGET STARTED\nThe #1 ranked\nrealtime voice A",
  "topology_fingerprint_version": "1.0.0"
}
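The `url_depth_distribution` counts in the body are easier to compare as shares of the total. The sketch below does that arithmetic on the values shown above. It assumes the four buckets are mutually exclusive counts of discovered URLs by path depth, which the record does not state explicitly, and `depth_shares` is a hypothetical helper name.

```python
from typing import Dict

url_depth_distribution = {
    "depth_0": 5,
    "depth_1": 32,
    "depth_2": 179,
    "depth_3plus": 92,
}


def depth_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert bucket counts into fractions of the total, rounded to 4 places."""
    total = sum(counts.values())
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: round(n / total, 4) for bucket, n in counts.items()}


total_urls = sum(url_depth_distribution.values())  # 5 + 32 + 179 + 92 = 308
shares = depth_shares(url_depth_distribution)
```

On these numbers, depth-2 URLs account for a little over half of the 308 counted URLs.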

Layer 3 — Recursive · Empty at mint. Grows forever through accumulated corpus passes. Common edges (Law V), uncommon edges (Law VI), topology cluster scores. The graph builds itself. Law VII — Torus.

rld:recursive — edge_count=0

{
  "edges": [],
  "appended_at": [],
  "edge_count": 0
}
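The recursive layer is described as append-only: new crawl passes add edges while the anchor stays untouched. A minimal sketch of that behaviour follows. The edge fields (`kind`, `target`) and the function name `append_edge` are hypothetical, since the record shows only the empty structure. The function returns a new layer instead of mutating its input, so the earlier state stays available for comparison.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def append_edge(
    recursive: Dict[str, Any],
    edge: Dict[str, Any],
    when: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Return a copy of the recursive layer with one edge appended."""
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    edges = [*recursive["edges"], edge]
    return {
        "edges": edges,
        "appended_at": [*recursive["appended_at"], stamp],
        "edge_count": len(edges),  # kept in step with the edges list
    }


empty_layer = {"edges": [], "appended_at": [], "edge_count": 0}
# Hypothetical edge shape, for illustration only.
next_layer = append_edge(
    empty_layer,
    {"kind": "common", "target": "example"},
    when=datetime(2026, 5, 12, tzinfo=timezone.utc),
)
```

Deriving `edge_count` from the list length on every append avoids the counter drifting away from the stored edges.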

Root-LD v1.0 · root-ld.org · Law I+II+VII root-ld.org ↗

Schema.org Intelligence scored · graph traversal · Law VI negative space

36% coverage · 14 types · 39 props · 69 gaps

36%

Schema Utilization Score

PARTIAL COVERAGE — GAPS IDENTIFIED

schema.org v2.0.0 · 39 props extracted · 69 gaps · https://inworld.ai/

Thing → Organization · ImageObject · ContactPoint · PostalAddress · WebSite

◈ Schema Graph — Three-Direction Traversal

Declared: Organization · ImageObject · ContactPoint · PostalAddress · WebSite · CollectionPage · Article · WebPage · JobPosting · Place · Country · Blog · ItemList · ListItem

✓ Implemented

✓ name · own · Inworld AI

✓ legalName · own · Theai, Inc. dba Inworld AI

✓ url · own · https://inworld.ai

✓ logo · own · [ImageObject]

✓ description · own · #1 ranked TTS with under 200ms latency, voice cloning, and 75% lower cost. Realtime agents built for scale.

✓ foundingDate · own · 2021

✓ sameAs · own · https://x.com/inworld_ai (+2 more)

✓ contactPoint · own · [ContactPoint]

✓ address · own · [PostalAddress]

✓ width · own · 180

✓ height · own · 180

✓ contactType · own · Customer Support

✓ email · own · [email protected]

✓ availableLanguage · own · English

✓ addressCountry · own · US

✓ addressLocality · own · Mountain View

✓ addressRegion · own · CA

✓ postalCode · own · 94040

✓ streetAddress · own · 1975 W El Camino Real, Suite 300

✓ publisher · own · Inworld AI

✓ creator · own · Mintlify

✓ inLanguage · own · en-US

✓ headline · own · Realtime TTS-2: A new frontier voice model that feels as human as it sounds

✓ image · own · https://inworld.ai/favicon.ico

✓ datePublished · own · 2026-05-05T00:00:00.000Z

✓ author · own · Inworld AI

✓ mainEntityOfPage · own · https://inworld.ai/blog/realtime-tts-2

✓ title · own · Staff / Principal Research Scientist - USA

✓ datePosted · own · 2023-08-16T19:48:08.587+00:00

✓ validThrough · own · 2026-06-11T18:34:54.828Z

✓ employmentType · own · FULL_TIME

✓ hiringOrganization · own · Inworld AI

✓ jobLocation · own · [Place]

✓ applicantLocationRequirements · own · US

✓ directApply · own · TRUE

✓ jobLocationType · own · TELECOMMUTE

✓ numberOfItems · own · 272

✓ itemListElement · own · Gemini 2.0 Flash (+99 more)

✓ position · own · 1

✗ Not Implemented / Gap

✗ numberOfEmployees · gap · —

✗ openingHours · gap · —

✗ slogan · gap · —

✗ keywords · gap · —

✗ aggregateRating · gap · —

✗ identifier · gap · —

✗ geo · gap · —

✗ areaServed · gap · —

✗ hasOfferCatalog · gap · —

✗ priceRange · gap · —

✗ knowsAbout · gap · —

✗ alternateName · gap · —

✗ telephone · gap · —

✗ funding · gap · —

✗ employee · gap · —

✗ diversityStaffingReport · gap · —

✗ funder · gap · —

✗ leiCode · gap · —

✗ knowsLanguage · gap · —

✗ brand · gap · —

✗ skills · gap · —

✗ naics · gap · —

✗ founder · gap · —

✗ hasMerchantReturnPolicy · gap · —

✗ parentOrganization · gap · —

✗ globalLocationNumber · gap · —

✗ review · gap · —

✗ event · gap · —

Thing · ancestor +1 · schema.org/Thing ↗ · 6/13 (46%)

The most generic type of item.

sameAs · name · mainEntityOfPage · description · url · image

additionalType · identifier · owner · subjectOf · potentialAction · alternateName · disambiguatingDescription

CreativeWork · sibling via Thing · schema.org/CreativeWork ↗ · 102 exclusive

The most generic kind of creative work, including books, movies, photographs, software programs, etc.

provider · genre · wordCount · accessModeSufficient · acquireLicensePage · temporalCoverage · publisher · thumbnail

Product · sibling via Thing · schema.org/Product ↗ · 46 exclusive

Any offered product or service. For example: a pair of shoes; a concert ticket; the rental of a car; a haircut; or an episode of a TV show streamed online.

hasMeasurement · weight · height · displayLocation · gtin · gtin12 · isRelatedTo · productID

Event · sibling via Thing · schema.org/Event ↗ · 33 exclusive

An event happening at a certain time and location, such as a concert, lecture, or festival. Ticketing information may be added via the [[offers]] property. Repe

duration · endDate · eventAttendanceMode · actor · maximumVirtualAttendeeCapacity · superEvent · offers · recordedIn

Person · sibling via Thing · schema.org/Person ↗ · 33 exclusive

A person (alive, dead, undead, or fictional).

honorificPrefix · weight · height · gender · pronouns · deathDate · affiliation · children

Place · sibling via Thing · schema.org/Place ↗ · 28 exclusive

Entities that have a somewhat fixed, physical extension.

geoCovers · specialOpeningHoursSpecification · publicAccess · smokingAllowed · geoOverlaps · latitude · openingHoursSpecification · geoContains

Action · sibling via Thing · schema.org/Action ↗ · 12 exclusive

An action performed by a direct agent and indirect participants upon a direct object. Optionally happens at a location with the help of an inanimate instrument.

provider · result · actionProcess · startTime · object · actionStatus · agent · instrument

MedicalEntity · sibling via Thing · schema.org/MedicalEntity ↗ · 7 exclusive

The most generic type of entity related to health and the practice of medicine.

relevantSpecialty · study · code · legalStatus · recognizingAuthority · guideline · medicineSystem

Intangible · sibling via Thing · schema.org/Intangible ↗ · 0 exclusive

A utility class that serves as the umbrella for a number of 'intangible' things such as quantities, structured values, etc.

LocalBusinesschild / upgradeschema.org/LocalBusiness ↗+32 props

A particular physical business or branch of an organization. Examples of LocalBusiness include a restaurant, a particular branch of a restaurant chain, a branch

additionalPropertyamenityFeaturebranchCodecontainedInPlacecontainsPlacecurrenciesAcceptedgeogeoContains

EducationalOrganizationchild / upgradeschema.org/EducationalOrganization ↗+29 props

An educational organization.

additionalPropertyamenityFeaturebranchCodecontainedInPlacecontainsPlacegeogeoContainsgeoCoveredBy

NewsMediaOrganizationchild / upgradeschema.org/NewsMediaOrganization ↗+4 props

A News/Media organization such as a newspaper or TV station.

mastheadmissionCoveragePrioritiesPolicynoBylinesPolicyverificationFactCheckingPolicy

MedicalOrganizationchild / upgradeschema.org/MedicalOrganization ↗+3 props

A medical organization (physical or not), such as hospital, institution or clinic.

healthPlanNetworkIdisAcceptingNewPatientsmedicalSpecialty

Airlinechild / upgradeschema.org/Airline ↗+2 props

An organization that provides flights for passengers.

boardingPolicyiataCode

Corporationchild / upgradeschema.org/Corporation ↗+1 props

Organization: A business corporation.

tickerSymbol

SportsOrganizationchild / upgradeschema.org/SportsOrganization ↗+1 props

Represents the collection of all sports organizations, including sports teams, governing bodies, and sports associations.

sport

NGO · child / upgrade · schema.org/NGO ↗ · +0 props

Organization: Non-governmental Organization.

GovernmentOrganization · child / upgrade · schema.org/GovernmentOrganization ↗ · +0 props

A governmental organization or agency.

FundingScheme · child / upgrade · schema.org/FundingScheme ↗ · +0 props

A FundingScheme combines organizational, project and policy aspects of grant-based funding that sets guidelines, principles and mechanisms to support other

PoliticalParty · child / upgrade · schema.org/PoliticalParty ↗ · +0 props

Organization: Political Party.

OnlineBusiness · child / upgrade · schema.org/OnlineBusiness ↗ · +0 props

A particular online business, either standalone or the online part of a broader organization. Examples include an eCommerce site, an online travel booking site,

◈ Structural Negative Type Space — Constitutional Law VI

◈ Action Branch

No structural connection to the Action branch. Graph position measurement. schema.org/Action ↗ · Law III — meaning is yours.

◈ BioChemEntity Branch

No structural connection to the BioChemEntity branch. Graph position measurement. schema.org/BioChemEntity ↗ · Law III — meaning is yours.

◈ CreativeWork Branch

No structural connection to the CreativeWork branch. Graph position measurement. schema.org/CreativeWork ↗ · Law III — meaning is yours.

◈ Event Branch

No structural connection to the Event branch. Graph position measurement. schema.org/Event ↗ · Law III — meaning is yours.

◈ Intangible Branch

No structural connection to the Intangible branch. Graph position measurement. schema.org/Intangible ↗ · Law III — meaning is yours.

◈ MedicalEntity Branch

No structural connection to the MedicalEntity branch. Graph position measurement. schema.org/MedicalEntity ↗ · Law III — meaning is yours.

◈ Person Branch

No structural connection to the Person branch. Graph position measurement. schema.org/Person ↗ · Law III — meaning is yours.

◈ Place Branch

No structural connection to the Place branch. Graph position measurement. schema.org/Place ↗ · Law III — meaning is yours.

◈ Product Branch

No structural connection to the Product branch. Graph position measurement. schema.org/Product ↗ · Law III — meaning is yours.

◈ Taxon Branch

No structural connection to the Taxon branch. Graph position measurement. schema.org/Taxon ↗ · Law III — meaning is yours.

◈ Gap List (69 properties unmapped)

funding · knowsAbout · employee · diversityStaffingReport · keywords · slogan · funder · aggregateRating · leiCode · knowsLanguage · brand · skills · naics · founder · hasMerchantReturnPolicy · parentOrganization · areaServed · globalLocationNumber · review · event · interactionStatistic · memberOf · actionableFeedbackPolicy · ethicsPolicy · hasCertification · numberOfEmployees · iso6523Code · correctionsPolicy · acceptedPaymentMethod · isicV4

+39 more gaps not shown
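
A gap list of this kind can be read as a set difference: properties available on the type minus properties the extracted markup actually declares. The sketch below illustrates that reading only; this record does not state how it computes its gaps. `CANDIDATE_PROPERTIES` is a hypothetical, hand-picked subset (three properties the Organization block in this record declares, plus five names copied from the gap list), and `find_gaps` is an illustrative helper, not part of any library.

```python
# Hypothetical subset of Organization properties; the real vocabulary is far larger.
CANDIDATE_PROPERTIES = [
    "name", "url", "foundingDate",       # declared in the Organization block
    "funding", "knowsAbout", "slogan",   # listed as gaps in this record
    "founder", "numberOfEmployees",
]


def find_gaps(block: dict, candidates: list[str]) -> list[str]:
    """Return the candidate properties that the JSON-LD block does not declare."""
    # Keys starting with "@" (@context, @type) are JSON-LD keywords, not properties.
    declared = {key for key in block if not key.startswith("@")}
    return [prop for prop in candidates if prop not in declared]


# Abbreviated copy of the Organization block (a few top-level keys only).
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Inworld AI",
    "url": "https://inworld.ai",
    "foundingDate": "2021",
}

gaps = find_gaps(organization, CANDIDATE_PROPERTIES)
print(gaps)
```

Order is preserved from the candidate list, so the output can be compared directly against a published gap list.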

◈ Source Schema.org — Raw Extraction (17 blocks)

Block 1 · @type: Organization

{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Inworld AI",
  "legalName": "Theai, Inc. dba Inworld AI",
  "url": "https://inworld.ai",
  "logo": {
    "@type": "ImageObject",
    "url": "https://inworld.ai/favicon.ico",
    "width": "180",
    "height": "180"
  },
  "description": "#1 ranked TTS with under 200ms latency, voice cloning, and 75% lower cost. Realtime agents built for scale.",
  "foundingDate": "2021",
  "sameAs": [
    "https://x.com/inworld_ai",
    "https://www.linkedin.com/company/inworld-ai",
    "https://www.youtube.com/@inworldai"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "contactType": "Customer Support",
    "email": "[email protected]",
    "availableLanguage": [
      "English"
    ]
  },
  "address": {
    "@type": "PostalAddress",
    "addressCountry": "US",
    "addressLocality": "Mountain View",
    "addressRegion": "CA",
    "postalCode": "94040",
    "streetAddress": "1975 W El Camino Real, Suite 300"
  }
}

◈ Source: https://inworld.ai/ · Law I — Provenance
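
Blocks like the one above are normally published in a page's HTML inside `<script type="application/ld+json">` elements. The following is a minimal sketch of how such blocks could be collected with Python's standard library; it is an assumption about the general technique, not a description of how this record was produced. `JsonLdExtractor` and `SAMPLE_HTML` are hypothetical names, and the sample markup is an abbreviated stand-in for the real page.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed JSON-LD payloads from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of aborting the page


# Hypothetical, abbreviated page markup used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Inworld AI", "url": "https://inworld.ai"}
</script>
<script>console.log("not JSON-LD");</script>
</head></html>
"""

extractor = JsonLdExtractor()
extractor.feed(SAMPLE_HTML)
types = [block["@type"] for block in extractor.blocks]
print(types)
```

Ordinary scripts are ignored because only the `application/ld+json` type switches the collector on.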

Block 2 · @type: WebSite

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Inworld AI",
  "url": "https://inworld.ai",
  "description": "Top-ranked TTS on HuggingFace Arena with sub-second latency and voice cloning. Deploy real-time conversational AI pipelines with live experiments and metrics.",
  "publisher": {
    "@type": "Organization",
    "name": "Inworld AI",
    "logo": {
      "@type": "ImageObject",
      "url": "https://inworld.ai/favicon.ico",
      "width": "180",
      "height": "180"
    }
  }
}

◈ Source: https://inworld.ai/ · Law I — Provenance

Block 3 · @type: WebSite

{
  "@context": "https://schema.org",
  "@type": "WebSite",
  "name": "Inworld AI Documentation",
  "creator": {
    "@type": "Organization",
    "name": "Mintlify",
    "url": "https://mintlify.com"
  }
}

◈ Source: https://docs.inworld.ai/docs/realtime/overview · Fetched: 2026-05-12T18:34:52Z · Law I — Provenance

Block 4 · @type: CollectionPage

{
  "@context": "https://schema.org",
  "@type": "CollectionPage",
  "name": "Inworld AI Resources",
  "url": "https://inworld.ai/resources",
  "description": "Technical guides, benchmarks, tutorials, and best practices for building real-time AI and voice applications.",
  "inLanguage": "en-US",
  "publisher": {
    "@type": "Organization",
    "name": "Inworld AI",
    "logo": {
      "@type": "ImageObject",
      "url": "https://inworld.ai/favicon.ico",
      "width": "180",
      "height": "180"
    }
  }
}

◈ Source: https://inworld.ai/resources · Fetched: 2026-05-12T18:34:52Z · Law I — Provenance

Block 5 · @type: Article

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Realtime TTS-2: A new frontier voice model that feels as human as it sounds",
  "description": "",
  "image": "https://inworld.ai/favicon.ico",
  "datePublished": "2026-05-05T00:00:00.000Z",
  "author": {
    "@type": "Organization",
    "name": "Inworld AI",
    "url": "https://inworld.ai"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Inworld AI",
    "logo": {
      "@type": "ImageObject",
      "url": "https://inworld.ai/favicon.ico",
      "width": "180",
      "height": "180"
    }
  },
  "url": "https://inworld.ai/blog/realtime-tts-2",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://inworld.ai/blog/realtime-tts-2"
  },
  "inLanguage": "en-US"
}

◈ Source: https://inworld.ai/blog/realtime-tts-2 · Fetched: 2026-05-12T18:34:52Z · Law I — Provenance

Block 6 · @type: JobPosting

{
  "@context": "https://schema.org",
  "@type": "JobPosting",
  "title": "Staff / Principal Research Scientist - USA",
  "description": "About Inworld\n\nInworld is a product-oriented research lab of top AI researchers and engineers, developing best-in-class realtime multimodal models and the only realtime orchestration platform optimized for thousands of queries per second.\n\nWe’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn's Top 10 Startups in the USA.\n\nWho We're Looking For\n\nA year ago, reliably working agentic systems barely existed. Nobody has a decade of experience here. So we're not screening for a resume template — we're looking for strong people from varied backgrounds who learn fast, thrive in ambiguity, and can show us what they've built, broken, and understood.\n\nExperience We Find Useful\n\nYou don't need all of this. But you need enough to make a case.\n\n - Foundation models: training, new architectures, RL, reward modeling, scaling\n\n - Evaluation: benchmarks, eval loops, quality measurement, LLM-as-judge, failure analysis\n\n - Frontier topics: multimodal models, agents, tool use, test-time compute, world models\n\n - Published research at ICML, ICLR, NeurIPS, EMNLP, ACL, or AAAI\n\n - PhD in ML/NLP — or equivalent practical experience you can point to\n\n - Public work: non-trivial AI side projects, interdisciplinary experiments, open-source contributions\n\n - Full-stack research ownership: you frame the question, run the experiments, write the paper, ship the result\n\nIf you learned through building, competitions, or collaborations outside academia, that counts. 
We care about evidence, not credentials.\n\nWho Thrives Here\n\n - You don’t need a roadmap to start walking; you’re comfortable picking a direction and building the map as you go\n\n - You believe research isn't finished until it’s shipped. You have a bias for impact over purely academic output\n\n - You don't just ship code; you obsess over the why. You’re the first to question an approach if you think there’s a better way to solve the core problem\n\n - You aren't satisfied with \"the PM said so.\" You thrive on deep context and want to understand the fundamental logic behind every decision we make\n\nWhat Working Here Is Like\n\nWe hand you unclear problems and expect you to make them clear. We value researchers who say \"I don't know yet\" and then design the experiment that finds out. We treat evaluation as a first-class research product, not a box to check before launch. Impact comes before publications though we support sharing work that moves the field forward. Your work should be visible. Flat structure, fast iterations, minimal process theater.\n\nWe believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.\n\nThe base salary range for this full-time position is $270,000 - $500,000+ bonus + equity + benefits.",
  "datePosted": "2023-08-16T19:48:08.587+00:00",
  "validThrough": "2026-06-11T18:34:54.828Z",
  "employmentType": "FULL_TIME",
  "hiringOrganization": {
    "@type": "Organization",
    "name": "Inworld AI",
    "sameAs": "https://inworld.ai",
    "logo": {
      "@type": "ImageObject",
      "url": "https://inworld.ai/favicon.ico",
      "width": "180",
      "height": "180"
    }
  },
  "jobLocation": {
    "@type": "Place",
    "address": {
      "@type": "PostalAddress",
      "addressLocality": "Mountain View",
      "addressRegion": "CA",
      "addressCountry": "US"
    }
  },
  "applicantLocationRequirements": {
    "@type": "Country",
    "name": "US"
  },
  "directApply": true,
  "url": "https://jobs.ashbyhq.com/inworld-ai/abb10a5c-cc22-4b18-892d-aea354c6bfdc/application"
}