Nari AI Voice

Explore our collection of Nari AI Voice technology samples and experience the future of voice interaction.

Nari AI Voice represents the cutting edge of voice synthesis, combining advanced neural networks with deep linguistic understanding to create truly human-like speech patterns.

Experience the Power of Nari AI Voice

Our samples showcase the exceptional capabilities of Nari AI Voice technology. Hear how Nari AI Voice creates natural, emotionally rich conversations that transform human-machine interaction.

Nari AI Voice Technology Advantages

Natural Speech Synthesis

Nari AI Voice delivers human-like voice quality with natural intonation and fluid rhythm.

Emotional Expression

Nari AI Voice can express rich emotions, making AI conversations more engaging.

Multilingual Support

Nari AI Voice supports multiple languages while maintaining consistently high voice quality.

Real-time Response

Nari AI Voice provides low-latency voice generation suitable for real-time interactive scenarios.

Nari AI Voice Sample Collection

Standard Usage 1

Note that the ElevenLabs and Sesame models cannot render laughter tags as speech, so for those models we replace (laughs) with haha. Also, Dia is not fine-tuned on a specific voice. It will generate random voices unless you add an audio prompt or fix the seed.
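The tag substitution described in the note can be sketched as a small text-preprocessing step. This is only an illustration: the helper name `replace_nonverbal_tags` and the `TAG_FALLBACKS` mapping are hypothetical, and the only substitution the note actually states is (laughs) to haha.

```python
# Hypothetical fallback table. The source only states that "(laughs)"
# is replaced with "haha"; any further entries would be assumptions.
TAG_FALLBACKS = {"(laughs)": "haha"}


def replace_nonverbal_tags(script, fallbacks=None):
    """Return the script with known non-verbal tags swapped for spoken text.

    Intended for models that cannot render tags such as "(laughs)".
    Tags that are not in the fallback table are left untouched.
    """
    if fallbacks is None:
        fallbacks = TAG_FALLBACKS
    for tag, spoken in fallbacks.items():
        script = script.replace(tag, spoken)
    return script


if __name__ == "__main__":
    print(replace_nonverbal_tags("Wow. Amazing. (laughs)"))
```

The script sent to Dia would keep the original tag, while the scripts sent to the other models would go through a step like this first.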

Speaker 1: Dia is an open weights text to dialogue model.

Speaker 2: You get full control over scripts and voices.

Speaker 1: Wow. Amazing. (laughs)

Speaker 2: Try it now on Git hub or Hugging Face.

Dia-1.6B

ElevenLabs Studio

Sesame CSM-1B

Standard Usage 2

Note that the output of Sesame's 1B model is significantly worse than the example audio provided on their website. That example was likely created using the 8B version of the CSM model.

Speaker 1: Hey. how are you doing?

Speaker 2: Pretty good. Pretty good. What about you?

Speaker 1: I'm great. So happy to be speaking to you.

Speaker 2: Me too. This is some cool stuff. Huh?

Speaker 1: Yeah. I have been reading more about speech generation.

Speaker 2: Yeah.

Speaker 1: And it really seems like context is important.

Speaker 2: Definitely.

Dia-1.6B

ElevenLabs Studio

Sesame Website Example

Sesame CSM-1B

Fun Example 1

This one is extra fun. I recommend listening to all three of them.

Speaker 1: Oh fire! Oh my goodness! What's the procedure? What to we do people? The smoke could be coming through an air duct!

Speaker 2: Oh my god! Okay.. it's happening. Everybody stay calm!

Speaker 1: What's the procedure...

Speaker 2: Everybody stay fucking calm!!!... Everybody fucking calm down!!!!!

Speaker 1: No! No! If you touch the handle, if its hot there might be a fire down the hallway!

Dia-1.6B

ElevenLabs Studio

Sesame CSM-1B

Fun Example 2

Other models are unable to reproduce these non-verbal tags.

Speaker 1: Hey there (coughs).

Speaker 2: Why did you just cough? (sniffs)

Speaker 1: Why did you just sniff? (clears throat)

Speaker 2: Why did you just clear your throat? (laughs)

Speaker 1: Why did you just laugh?

Speaker 2: Nicely done.

Dia-1.6B

Fun Example 3

Speaker 1: His palms are sweaty, knees weak, arms are heavy.

Speaker 2: There's vomit on his sweater already, mom's spaghetti.

Speaker 1: He's nervous, but on the surface, he looks calm and ready.

Speaker 2: To drop bombs, but he keeps on forgetting.

Speaker 1: What he wrote down, the whole crowd goes so loud.

Speaker 2: He opens his mouth, but the words won't come out.

Speaker 1: He's chokin', how. Everybody's jokin' now.

Dia-1.6B

ElevenLabs Studio

Sesame CSM-1B

Audio Prompts

Note that to get high-quality output you need to prepend the transcript of the audio prompt to the input script. We are considering adding a speech-to-text model to automate the transcription process for easier usage.
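As a rough sketch of the prepending step, the text passed to the model could be assembled as below. The helper name `build_prompted_script` is hypothetical, and how the combined text and the prompt audio are actually handed to the model is not shown on this page, so that part is left out.

```python
def build_prompted_script(prompt_transcript, new_script):
    """Prepend the transcript of the audio prompt to the script to generate.

    prompt_transcript: the exact words spoken in the audio prompt.
    new_script: the dialogue the model should generate after the prompt.
    """
    prompt_transcript = prompt_transcript.strip()
    new_script = new_script.strip()
    if not prompt_transcript:
        # Assumption: an audio prompt without its transcript is treated
        # as an error here, since the note says the transcript is needed.
        raise ValueError("the audio prompt needs a matching transcript")
    return prompt_transcript + "\n" + new_script


if __name__ == "__main__":
    transcript = "Speaker 1: Open weights text to dialogue model."
    script = "Speaker 1: Thanks for listening to this demo."
    print(build_prompted_script(transcript, script))
```

Keeping the transcript and the new script as separate inputs makes it easy to reuse one audio prompt, with its transcript, across many scripts.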

Speaker 1: Open weights text to dialogue model.

Speaker 2: You get full control over scripts and voices.

Speaker 1: I'm biased, but I think we clearly won.

Speaker 2: Hard to disagree. (laughs)

Speaker 1: Thanks for listening to this demo.

Speaker 2: Try it now on Git hub and Hugging Face.

Speaker 1: If you liked our model, please give us a star and share to your friends.

Speaker 2: This was Nari Labs.

Audio Prompts

Dia-1.6B

About Nari AI Voice Technology

Nari AI Voice represents a breakthrough in voice synthesis technology. Using neural networks and deep learning techniques, Nari AI Voice creates natural speech with emotional inflection, timing, and non-verbal sounds such as laughter, coughs, and sniffs, which the other models compared in the samples above could not reproduce.

• Nari AI Voice generates natural-sounding dialogue with appropriate pauses and rhythm
• Nari AI Voice can incorporate laughter, sighs, and other non-verbal elements
• Nari AI Voice maintains consistent voice characteristics throughout conversations
• Nari AI Voice adapts to different contexts and emotional states seamlessly

Ready to experience Nari AI Voice?

Nari AI Voice is transforming the future of human-machine interaction. Discover how Nari AI Voice can enhance your applications and create more engaging user experiences.