ElevenLabs

ElevenLabsSpeechSynthesizer

Bases: BaseModel

Speech synthesizer for ElevenLabs.

provider class-attribute

provider: str = 'elevenlabs'

Provider name for ElevenLabs.

api_key class-attribute instance-attribute

api_key: str | None = None

API key for ElevenLabs.

voice_id instance-attribute

voice_id: str

Voice ID for ElevenLabs.

voice_stability class-attribute instance-attribute

voice_stability: Annotated[float, Field(ge=0, le=1)] = 0.5

Voice stability for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_similarity_boost class-attribute instance-attribute

voice_similarity_boost: Annotated[
    float, Field(ge=0, le=1)
] = 0.5

Voice similarity boost for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_style class-attribute instance-attribute

voice_style: Annotated[float, Field(ge=0, le=1)] = 0.5

Voice style for the synthesized speech. It ranges from 0 to 1. Default is 0.5.

voice_speaker_boost class-attribute instance-attribute

voice_speaker_boost: bool = True

Voice speaker boost for the synthesized speech. Default is True.

language_code class-attribute instance-attribute

language_code: LanguageAlpha2 = Field(
    default_factory=lambda: LanguageAlpha2("en")
)

Language code of the text to synthesize. If not provided, it defaults to "en".

See the ElevenLabs documentation for the languages each model supports: https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them

model class-attribute instance-attribute

model: Literal[
    "eleven_turbo_v2_5",
    "eleven_turbo_v2",
    "eleven_multilingual_v2",
    "eleven_monolingual_v1",
    "eleven_multilingual_v1",
] = "eleven_multilingual_v2"

Model ID for ElevenLabs.

timeout class-attribute instance-attribute

timeout: int = 120

Timeout for the HTTP request in seconds.
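
The private `_fetch_speech_synthesis` helper is not shown on this page, so how these attributes reach the API is not documented here. As a rough sketch only, the following shows how they might map onto the JSON body of an ElevenLabs text-to-speech request. The function name `build_payload` is hypothetical, and the mapping is an assumption based on ElevenLabs' public request fields (`model_id`, `language_code`, `voice_settings`), not on Mosaico's actual code. The range check mirrors the `Field(ge=0, le=1)` constraints above.

```python
from __future__ import annotations

from typing import Any


def build_payload(
    text: str,
    *,
    model: str = "eleven_multilingual_v2",
    language_code: str = "en",
    voice_stability: float = 0.5,
    voice_similarity_boost: float = 0.5,
    voice_style: float = 0.5,
    voice_speaker_boost: bool = True,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a request body from the synthesizer fields."""
    ranged = {
        "voice_stability": voice_stability,
        "voice_similarity_boost": voice_similarity_boost,
        "voice_style": voice_style,
    }
    # Mirror the ge=0 / le=1 constraints declared on the model fields.
    for name, value in ranged.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")

    # Assumed mapping onto ElevenLabs' request fields; not taken from Mosaico.
    return {
        "text": text,
        "model_id": model,
        "language_code": language_code,
        "voice_settings": {
            "stability": voice_stability,
            "similarity_boost": voice_similarity_boost,
            "style": voice_style,
            "use_speaker_boost": voice_speaker_boost,
        },
    }


payload = build_payload("Hello there.", voice_stability=0.7)
print(payload["voice_settings"])
```

With Pydantic, the same range violation would surface as a validation error when the synthesizer is constructed rather than when the request is built.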

synthesize

synthesize(
    texts: Sequence[str],
    *,
    audio_params: AudioAssetParams | None = None,
    **kwargs: Any
) -> list[AudioAsset]

Synthesizes the given texts into audio assets using the ElevenLabs API.

Parameters:

Name          Type                     Description                                         Default
texts         Sequence[str]            List of texts to synthesize.                        required
audio_params  AudioAssetParams | None  Audio parameters for the synthesized audio assets.  None
kwargs        Any                      Additional keyword arguments.                       {}

Returns:

Type              Description
list[AudioAsset]  List of synthesized audio assets.
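
A minimal usage sketch, assuming the class is importable from `mosaico.speech_synthesizers.elevenlabs` (the module path given for the source file) and that a valid ElevenLabs API key and voice ID are available. The wrapper name `narrate` and the chosen settings are illustrative only. The import sits inside the function so the snippet can be loaded without Mosaico installed. Calling it performs real API requests.

```python
from __future__ import annotations

from collections.abc import Sequence


def narrate(lines: Sequence[str], voice_id: str, api_key: str | None = None):
    """Illustrative wrapper: synthesize one audio asset per line of text."""
    # Imported lazily; requires the mosaico package to be installed.
    from mosaico.speech_synthesizers.elevenlabs import ElevenLabsSpeechSynthesizer

    synthesizer = ElevenLabsSpeechSynthesizer(
        voice_id=voice_id,
        api_key=api_key,
        model="eleven_turbo_v2_5",  # one of the documented model IDs
        voice_stability=0.7,        # must lie between 0 and 1
        timeout=60,                 # seconds
    )
    # Returns a list of AudioAsset objects, in the same order as `lines`.
    return synthesizer.synthesize(list(lines))
```

Because `synthesize` sends neighbouring texts as context, passing the lines of one script in a single call should give more consistent delivery than calling it once per line.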

Source code in src/mosaico/speech_synthesizers/elevenlabs.py
def synthesize(
    self, texts: Sequence[str], *, audio_params: AudioAssetParams | None = None, **kwargs: Any
) -> list[AudioAsset]:
    """
    Synthesizes the given texts into audio assets using the ElevenLabs API.

    :param texts: List of texts to synthesize.
    :param audio_params: Audio parameters for the synthesized audio assets.
    :param kwargs: Additional keyword arguments.
    :return: List of synthesized audio assets.
    """
    assets = []
    # Request IDs of earlier segments, sent with later requests for continuity.
    previous_request_ids = []

    for i, text in enumerate(texts):
        is_first = i == 0
        is_last = i == len(texts) - 1
        # Each request carries the surrounding texts as context and at most
        # the three most recent request IDs.
        response = self._fetch_speech_synthesis(
            text=text,
            previous_request_ids=previous_request_ids[-3:],
            previous_text=None if is_first else " ".join(texts[:i]),
            next_text=None if is_last else " ".join(texts[i + 1 :]),
        )
        previous_request_ids.append(response.headers["request-id"])
        # Decode the MP3 response to measure its duration in seconds.
        duration = AudioSegment.from_file(io.BytesIO(response.content), format="mp3").duration_seconds
        asset = AudioAsset.from_data(
            response.content,
            params=audio_params if audio_params is not None else {},
            mime_type="audio/mpeg",
            duration=duration,
            sample_rate=44100,
            sample_width=128,
            channels=1,
        )
        assets.append(asset)

    return assets
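
The loop above gives every request the text that precedes and follows it, plus up to three earlier request IDs. The standalone sketch below reproduces only that windowing logic so the values can be inspected without calling the API. The function name `context_windows` and the `req-N` identifiers are placeholders invented for illustration. In the real method the IDs come from the `request-id` response header.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def context_windows(texts: Sequence[str]) -> list[dict[str, Any]]:
    """Compute the context that would accompany each text in the loop."""
    windows = []
    request_ids: list[str] = []
    for i, text in enumerate(texts):
        is_first = i == 0
        is_last = i == len(texts) - 1
        windows.append(
            {
                "text": text,
                # Only the three most recent IDs are forwarded.
                "previous_request_ids": request_ids[-3:],
                "previous_text": None if is_first else " ".join(texts[:i]),
                "next_text": None if is_last else " ".join(texts[i + 1 :]),
            }
        )
        # Placeholder standing in for response.headers["request-id"].
        request_ids.append(f"req-{i}")
    return windows


for window in context_windows(["Hello.", "How are you?", "Goodbye."]):
    print(window)
```

Note that `previous_text` and `next_text` grow with the number of texts, since each one joins everything before or after the current item rather than just its immediate neighbour.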