ElevenLabs

ElevenLabsSpeechSynthesizer

Bases: BaseModel

Speech synthesizer for ElevenLabs.

provider `class-attribute`

provider: str = 'elevenlabs'

Provider name for ElevenLabs.

api_key `class-attribute` `instance-attribute`

api_key: str | None = None

API key for ElevenLabs.

voice_id `instance-attribute`

voice_id: str

Voice ID for ElevenLabs.

voice_stability `class-attribute` `instance-attribute`

voice_stability: Annotated[float, Field(ge=0, le=1)] = 0.5

Voice stability for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_similarity_boost `class-attribute` `instance-attribute`

voice_similarity_boost: Annotated[
    float, Field(ge=0, le=1)
] = 0.5

Voice similarity boost for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_style `class-attribute` `instance-attribute`

voice_style: Annotated[float, Field(ge=0, le=1)] = 0.5

Voice style for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_speaker_boost `class-attribute` `instance-attribute`

voice_speaker_boost: bool = True

Voice speaker boost for the synthesized speech. Default is True.

voice_speed `class-attribute` `instance-attribute`

voice_speed: Annotated[float, Field(ge=0.7, le=1.2)] = 1

Speed of the generated speech. It ranges from 0.7 to 1.2. Default is 1.
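Because the class derives from `BaseModel`, the `Field(ge=..., le=...)` constraints on the voice settings above are checked when the synthesizer is constructed. The following is a minimal stdlib sketch of the same range checks, not the library's implementation: `BOUNDS` and `check_voice_settings` are hypothetical names, and the sketch raises a plain `ValueError` where the real class would report a Pydantic validation error.

```python
# Bounds copied from the Field(...) constraints documented above.
# BOUNDS and check_voice_settings are hypothetical, for illustration only.
BOUNDS = {
    "voice_stability": (0.0, 1.0),
    "voice_similarity_boost": (0.0, 1.0),
    "voice_style": (0.0, 1.0),
    "voice_speed": (0.7, 1.2),
}


def check_voice_settings(**settings: float) -> dict:
    """Return the settings unchanged, or raise ValueError on the first out-of-range value."""
    for name, value in settings.items():
        low, high = BOUNDS[name]  # KeyError for a setting this sketch does not know
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return settings


# Values on the boundary are accepted (ge/le are inclusive).
accepted = check_voice_settings(voice_stability=0.5, voice_speed=1.2)

# A speed above the upper bound is rejected.
try:
    check_voice_settings(voice_speed=1.5)
    rejected = None
except ValueError as exc:
    rejected = str(exc)
```

Both bounds are inclusive, so `voice_speed=0.7` and `voice_speed=1.2` are valid, while anything outside that interval fails before any request is sent.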

language_code `class-attribute` `instance-attribute`

language_code: LanguageAlpha2 = Field(
    default_factory=lambda: LanguageAlpha2("en")
)

Language code of the text to synthesize. If not provided, it defaults to "en".

See the ElevenLabs documentation for the list of languages supported by each model: https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them

model `class-attribute` `instance-attribute`

model: Literal[
    "eleven_turbo_v2_5",
    "eleven_turbo_v2",
    "eleven_multilingual_v2",
    "eleven_monolingual_v1",
    "eleven_multilingual_v1",
] = "eleven_multilingual_v2"

Model ID for ElevenLabs.

timeout `class-attribute` `instance-attribute`

timeout: int = 120

Timeout for the HTTP request in seconds.

synthesize

synthesize(
    texts: Sequence[str],
    *,
    audio_params: AudioAssetParams | None = None,
    **kwargs: Any
) -> list[AudioAsset]

Synthesizes the given texts into audio assets using the ElevenLabs API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `texts` | `Sequence[str]` | List of texts to synthesize. | required |
| `audio_params` | `AudioAssetParams \| None` | Audio parameters for the synthesized audio assets. | `None` |
| `kwargs` | `Any` | Additional keyword arguments. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `list[AudioAsset]` | List of synthesized audio assets. |

Source code in src/mosaico/speech_synthesizers/elevenlabs.py

def synthesize(
    self, texts: Sequence[str], *, audio_params: AudioAssetParams | None = None, **kwargs: Any
) -> list[AudioAsset]:
    """
    Synthesizes the given texts into audio assets using the ElevenLabs API.

    :param texts: List of texts to synthesize.
    :param audio_params: Audio parameters for the synthesized audio assets.
    :param kwargs: Additional keyword arguments.
    :return: List of synthesized audio assets.
    """
    assets = []
    previous_request_ids = []

    for i, text in enumerate(texts):
        is_first = i == 0
        is_last = i == len(texts) - 1
        response = self._fetch_speech_synthesis(
            text=text,
            previous_request_ids=previous_request_ids[-3:],
            previous_text=None if is_first else " ".join(texts[:i]),
            next_text=None if is_last else " ".join(texts[i + 1 :]),
        )
        previous_request_ids.append(response.headers["request-id"])
        duration = AudioSegment.from_file(io.BytesIO(response.content), format="mp3").duration_seconds
        asset = AudioAsset.from_data(
            response.content,
            params=audio_params if audio_params is not None else {},
            mime_type="audio/mpeg",
            info=AudioInfo(
                duration=duration,
                sample_rate=44100,
                sample_width=128,
                channels=1,
            ),
        )
        assets.append(asset)

    return assets
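The loop above sends each text together with its surrounding context: all earlier texts joined as `previous_text`, all later texts joined as `next_text`, and at most the three most recent request IDs. The sketch below reproduces only that bookkeeping, without calling the API. `StitchContext` and `plan_requests` are hypothetical names, and the `request_ids` argument stands in for the `request-id` response headers that the real method reads.

```python
from collections.abc import Sequence
from typing import NamedTuple, Optional


class StitchContext(NamedTuple):
    """Context sent alongside one text (hypothetical helper type)."""

    text: str
    previous_text: Optional[str]
    next_text: Optional[str]
    previous_request_ids: list


def plan_requests(texts: Sequence[str], request_ids: Sequence[str]) -> list:
    """Mirror the context logic of synthesize() without any network call.

    request_ids[i] is assumed to be the ID returned for texts[i].
    """
    plans = []
    seen_ids = []
    for i, text in enumerate(texts):
        is_first = i == 0
        is_last = i == len(texts) - 1
        plans.append(
            StitchContext(
                text=text,
                previous_text=None if is_first else " ".join(texts[:i]),
                next_text=None if is_last else " ".join(texts[i + 1 :]),
                # Only the three most recent request IDs are passed along.
                previous_request_ids=list(seen_ids[-3:]),
            )
        )
        seen_ids.append(request_ids[i])
    return plans


plans = plan_requests(
    ["One.", "Two.", "Three.", "Four.", "Five."],
    ["id-1", "id-2", "id-3", "id-4", "id-5"],
)
for plan in plans:
    print(plan)
```

The first text has no `previous_text` and the last has no `next_text`. Texts are synthesized one at a time because each request needs the IDs returned by the requests before it.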
