Speech Synthesizers

Prerequisites

Overview

Speech synthesizers in Mosaico convert text into spoken audio for video narration. Mosaico provides more than one synthesizer implementation (the examples on this page use OpenAI and ElevenLabs), and each is configured through its own options, such as model, voice, and speed.
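
To make the idea of interchangeable implementations concrete, here is a minimal offline sketch. It is not Mosaico's actual class hierarchy: `FakeAudioAsset`, `SpeechSynthesizer`, `FakeSynthesizer`, and `narrate` are hypothetical names, and the stub estimates durations from word count instead of calling a TTS service.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class FakeAudioAsset:
    """Hypothetical stand-in for an audio asset: the text plus a duration in seconds."""
    text: str
    duration: float


class SpeechSynthesizer(Protocol):
    """Hypothetical interface: return one audio asset per input text."""

    def synthesize(self, texts: Sequence[str]) -> list[FakeAudioAsset]: ...


class FakeSynthesizer:
    """Offline stub that estimates duration from word count (no API call)."""

    def __init__(self, words_per_second: float = 2.5, speed: float = 1.0) -> None:
        self.words_per_second = words_per_second
        self.speed = speed

    def synthesize(self, texts: Sequence[str]) -> list[FakeAudioAsset]:
        rate = self.words_per_second * self.speed
        return [FakeAudioAsset(text, len(text.split()) / rate) for text in texts]


def narrate(tts: SpeechSynthesizer, texts: Sequence[str]) -> list[FakeAudioAsset]:
    # Any object with a matching synthesize() method can be passed in.
    return tts.synthesize(texts)


assets = narrate(
    FakeSynthesizer(words_per_second=2.0),
    ["Welcome to our video", "This is a demo"],
)
for asset in assets:
    print(f"{asset.text!r}: about {asset.duration:.1f}s")
```

Code written against the small interface does not need to know which backend produced the audio, which is the property the real synthesizers are described as sharing.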

Working with Synthesizers

from mosaico.assets.audio import AudioAssetParams
from mosaico.speech_synthesizers import OpenAISpeechSynthesizer

# Create synthesizer with configuration
tts = OpenAISpeechSynthesizer(
    model="tts-1",              # TTS model to use
    voice="alloy",              # Voice selection
    speed=1.0,                  # Speech speed
    api_key="your_api_key"      # Optional API key
)

# Generate speech
audio_assets = tts.synthesize(
    texts=["Welcome to our video", "This is a demo"],
    audio_params=AudioAssetParams(volume=0.8)
)

Integration with Video Projects

Basic Integration

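from mosaico.video.project import VideoProject

# Create project with speech
# `generator` and `media_files` are assumed to be defined earlier
project = (
    VideoProject.from_script_generator(
        script_generator=generator,
        media=media_files,
    )
    .add_narration(tts)  # the synthesizer created above
)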

Manual Speech Addition

from mosaico.assets.reference import AssetReference

# Generate speech for specific text
speech_asset = tts.synthesize(["Welcome message"])[0]

# Add to project
project = (
    project
    .add_assets(speech_asset)
    .add_timeline_events(
        AssetReference.from_asset(speech_asset)
            .with_start_time(0)
            .with_end_time(speech_asset.duration)
    )
)
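
When several speech clips are added by hand, each one needs its own start and end time. The sketch below shows only the arithmetic for placing clips back to back. `TimedClip` and `lay_out_sequentially` are hypothetical helpers, not part of Mosaico, and the durations are made-up values standing in for `speech_asset.duration`.

```python
from dataclasses import dataclass


@dataclass
class TimedClip:
    """Hypothetical record pairing a clip label with its timeline window (seconds)."""
    label: str
    start_time: float
    end_time: float


def lay_out_sequentially(durations, start=0.0, gap=0.0):
    """Place clips one after another, separated by `gap` seconds of silence."""
    clips = []
    cursor = start
    for index, duration in enumerate(durations):
        clips.append(TimedClip(f"speech_{index}", cursor, cursor + duration))
        cursor += duration + gap
    return clips


# Example durations in seconds (illustrative values only)
clips = lay_out_sequentially([2.0, 3.5, 1.5], gap=0.5)
for clip in clips:
    print(f"{clip.label}: {clip.start_time:.1f}s -> {clip.end_time:.1f}s")
```

The computed windows could then be passed to `with_start_time` and `with_end_time` for each asset reference.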

Custom Speech Parameters

Audio Configuration

# Configure audio parameters
params = AudioAssetParams(
    volume=0.8,         # Set volume level
    crop=(0, 30)       # Use specific segment
)

# Generate with parameters
assets = tts.synthesize(
    texts=["Narration text"],
    audio_params=params
)
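
As a rough mental model of what these two parameters mean, the sketch below applies them to a plain list of samples. It assumes that `crop` is a `(start, end)` window in seconds and that `volume` is a linear gain; both are assumptions made for illustration, and `apply_audio_params` is a hypothetical function, not how Mosaico processes audio internally.

```python
def apply_audio_params(samples, sample_rate, volume=1.0, crop=None):
    """Illustrative only: keep a (start, end) window in seconds, then scale amplitude."""
    if crop is not None:
        start_s, end_s = crop
        if start_s < 0 or end_s <= start_s:
            raise ValueError("crop must satisfy 0 <= start < end")
        # Convert seconds to sample indices before slicing
        samples = samples[int(start_s * sample_rate):int(end_s * sample_rate)]
    return [sample * volume for sample in samples]


# Five seconds of constant signal at a toy sample rate of 2 samples per second
signal = [1.0] * 10
processed = apply_audio_params(signal, sample_rate=2, volume=0.5, crop=(1, 3))
print(len(processed), processed)
```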

Voice Customization

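# Import path assumed to mirror the OpenAI import used earlier
from mosaico.speech_synthesizers import ElevenLabsSpeechSynthesizer

# OpenAI customization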
openai_tts = OpenAISpeechSynthesizer(
    model="tts-1-hd",    # High-definition model
    voice="nova",        # Different voice
    speed=1.2           # Faster speech
)

# ElevenLabs customization
elevenlabs_tts = ElevenLabsSpeechSynthesizer(
    voice_id="custom_voice",
    voice_stability=0.7,
    voice_similarity_boost=0.8,
    voice_speaker_boost=True
)

Common Use Cases

Video Narration

# Generate news narration
news_tts = OpenAISpeechSynthesizer(
    voice="nova",     # Clear, professional voice
    speed=1.1        # Slightly faster for news
)

narration = news_tts.synthesize(
    [shot.subtitle for shot in news_script.shots]
)

Tutorial Voice-Over

# Tutorial narration with pauses
tutorial_tts = ElevenLabsSpeechSynthesizer(
    voice_id="tutorial_voice",
    voice_stability=0.8,    # More consistent
    voice_style=0.3        # Less emotional
)

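# Append an ellipsis to each step to encourage a pause between steps
# (`tutorial_steps` is assumed to be a list of strings defined earlier)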
tutorial_texts = [f"{text}..." for text in tutorial_steps]
tutorial_audio = tutorial_tts.synthesize(tutorial_texts)
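
Appending `...` unconditionally can stack punctuation (for example `Open the editor....`) or produce a bare `...` for an empty step. A small normalizing helper avoids both. This is a sketch: `with_pause` is a hypothetical helper, and whether an ellipsis actually produces an audible pause depends on the TTS voice.

```python
def with_pause(text, marker="..."):
    """Append a pause marker without stacking it on existing trailing periods."""
    base = text.strip().rstrip(".")
    return f"{base}{marker}" if base else ""


# Illustrative steps, including messy input
tutorial_steps = ["Open the editor.", "Click Save  ", "", "Wait..."]

# Drop steps that are empty after normalization
tutorial_texts = [text for text in map(with_pause, tutorial_steps) if text]
print(tutorial_texts)
```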

Multi-Language Support

# Create synthesizers for different languages
tts_en = OpenAISpeechSynthesizer(language_code="en")
tts_es = OpenAISpeechSynthesizer(language_code="es")
tts_fr = OpenAISpeechSynthesizer(language_code="fr")

# Generate multi-language audio
audio_en = tts_en.synthesize(texts_en)
audio_es = tts_es.synthesize(texts_es)
audio_fr = tts_fr.synthesize(texts_fr)
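
With more than a few languages, one variable per language becomes repetitive. A dictionary keyed by language code keeps the synthesizers and scripts aligned. The sketch below uses a hypothetical `EchoSynthesizer` stub so it runs offline; with Mosaico you would store real synthesizer instances in the dictionary instead.

```python
class EchoSynthesizer:
    """Offline stub; a real synthesizer would return audio assets, not strings."""

    def __init__(self, language_code):
        self.language_code = language_code

    def synthesize(self, texts):
        return [f"[{self.language_code}] {text}" for text in texts]


def synthesize_by_language(synthesizers, scripts):
    """Run each language's texts through the synthesizer registered for it."""
    missing = sorted(set(scripts) - set(synthesizers))
    if missing:
        raise KeyError(f"No synthesizer configured for: {', '.join(missing)}")
    return {lang: synthesizers[lang].synthesize(texts) for lang, texts in scripts.items()}


synthesizers = {code: EchoSynthesizer(code) for code in ("en", "es", "fr")}
scripts = {"en": ["Welcome"], "es": ["Bienvenidos"], "fr": ["Bienvenue"]}

audio = synthesize_by_language(synthesizers, scripts)
for lang, clips in audio.items():
    print(lang, clips)
```

Failing early on a missing language code surfaces configuration gaps before any synthesis work is done.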

Conclusion

Speech synthesizers let a Mosaico project generate narration automatically with add_narration or place individual speech assets on the timeline by hand. Choosing the synthesizer, voice, speed, and audio parameters lets you adapt the voice output to the content, whether that is news narration, a tutorial voice-over, or multi-language audio.