Index

SpeechSynthesizer

Bases: Protocol

Protocol defining the interface for text-to-speech synthesis services.

The SpeechSynthesizer protocol standardizes how text is converted to speech across different TTS providers (e.g., Google Cloud TTS, Azure Speech). Implementations of this protocol handle the specifics of communicating with TTS services and managing the generated audio assets.

Implementations should handle:

Authentication with the TTS service
Rate limiting and quotas
Error handling and retries
Audio format conversion if necessary
Temporary file management
Resource cleanup

Example:

class MySpeechSynthesizer(SpeechSynthesizer):
    provider = "my-provider"

    def synthesize(self, texts, audio_params=None):
        # Implementation details
        pass

synthesizer = MySpeechSynthesizer()
audio_assets = synthesizer.synthesize(
    texts=["Hello world", "Welcome to the demo"],
    audio_params=AudioAssetParams(volume=0.8)
)

Attributes:

Name	Type	Description
`provider`	`str`	Identifier for the TTS service provider (e.g., "openai", "assemblyai", "azure"). This should be a unique string that identifies the implementation.

provider `class-attribute`

provider: str

The provider of the speech synthesizer.

synthesize

synthesize(
    texts: Sequence[str],
    *,
    audio_params: AudioAssetParams | None = None,
    **kwargs: Any
) -> list[AudioAsset]

Convert a list of texts into synthesized speech audio assets.

This method handles the conversion of text to speech, managing both the synthesis process and the creation of audio assets for use in video projects.

Note

The method should handle cleanup of any temporary files
Audio assets should be properly configured with metadata
Implementation should handle text normalization if needed
Large texts may need to be chunked according to service limits
Audio format should match project requirements

Parameters:

Name	Type	Description	Default
`texts`	`Sequence[str]`	List of text strings to be converted to speech. Each string should be properly formatted text ready for synthesis.	required
`audio_params`	`AudioAssetParams \| None`	Optional parameters for configuring the output audio assets. If None, default parameters will be used. These parameters affect properties like sample rate, channels, etc.	`None`
`kwargs`	`Any`	Additional provider-specific parameters.	`{}`

Returns:

Type	Description
`list[AudioAsset]`	List of audio assets containing the synthesized speech. The returned list will have the same length as the input texts list, with corresponding indices.

Source code in src/mosaico/speech_synthesizers/protocol.py

def synthesize(
    self, texts: Sequence[str], *, audio_params: AudioAssetParams | None = None, **kwargs: Any
) -> list[AudioAsset]:
    """
    Convert a list of texts into synthesized speech audio assets.

    This method handles the conversion of text to speech, managing both the synthesis
    process and the creation of audio assets for use in video projects.


    !!! note
        * The method should handle cleanup of any temporary files
        * Audio assets should be properly configured with metadata
        * Implementation should handle text normalization if needed
        * Large texts may need to be chunked according to service limits
        * Audio format should match project requirements

    :param texts: List of text strings to be converted to speech. Each string should
        be properly formatted text ready for synthesis.
    :param audio_params: Optional parameters for configuring the output audio assets.
        If None, default parameters will be used. These parameters affect properties
        like sample rate, channels, etc.
    :param kwargs: Additional provider-specific parameters.
    :return: List of audio assets containing the synthesized speech. The returned list
        will have the same length as the input texts list, with corresponding indices.

    """
    ...

Index

SpeechSynthesizer

provider `class-attribute`

synthesize

`texts`

`audio_params`

`kwargs`

Index

SpeechSynthesizer

provider class-attribute

synthesize

texts

audio_params

kwargs

provider `class-attribute`

`texts`

`audio_params`

`kwargs`