OpenAI

OpenAISpeechSynthesizer

Bases: BaseModel

Speech synthesizer using OpenAI's API.

provider class-attribute

provider: str = 'openai'

Provider name for OpenAI.

api_key class-attribute instance-attribute

api_key: str | None = None

API key for OpenAI's API.

base_url class-attribute instance-attribute

base_url: str | None = None

Base URL for OpenAI's API.

model class-attribute instance-attribute

model: Literal['tts-1', 'tts-1-hd'] = 'tts-1'

Model to use for speech synthesis.

voice class-attribute instance-attribute

voice: Literal[
    "alloy", "echo", "fable", "onyx", "nova", "shimmer"
] = "alloy"

Voice to use for speech synthesis.

speed class-attribute instance-attribute

speed: Annotated[float, Field(ge=0.25, le=4)] = 1.0

Speed multiplier for speech synthesis, between 0.25 and 4.0 (1.0 is normal speed).

timeout class-attribute instance-attribute

timeout: PositiveInt = 120

Timeout for speech synthesis in seconds.
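The field constraints above can be sketched with a plain dataclass (a minimal sketch, not the mosaico implementation; `SynthConfig` is a hypothetical stand-in that replicates the documented `Literal`, `ge`/`le`, and `PositiveInt` bounds with explicit checks):

```python
from dataclasses import dataclass

MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}


@dataclass
class SynthConfig:
    """Stand-in mirroring the documented field constraints."""

    model: str = "tts-1"
    voice: str = "alloy"
    speed: float = 1.0
    timeout: int = 120

    def __post_init__(self) -> None:
        # Literal['tts-1', 'tts-1-hd']
        if self.model not in MODELS:
            raise ValueError(f"unknown model: {self.model!r}")
        # Literal['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        # Annotated[float, Field(ge=0.25, le=4)]
        if not 0.25 <= self.speed <= 4:
            raise ValueError("speed must be between 0.25 and 4")
        # PositiveInt
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")


config = SynthConfig(voice="nova", speed=1.2)
print(config.voice)  # nova
```

In the real class these checks are enforced by Pydantic at construction time, so an out-of-range `speed` or an unknown `voice` raises a validation error before any API call is made.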

synthesize

synthesize(
    texts: Sequence[str],
    *,
    audio_params: AudioAssetParams | None = None,
    **kwargs: Any
) -> list[AudioAsset]

Synthesize speech from texts using OpenAI's API.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `texts` | `Sequence[str]` | Texts to synthesize. | *required* |
| `audio_params` | `AudioAssetParams \| None` | Parameters for the audio asset. | `None` |
| `kwargs` | `Any` | Additional parameters for the OpenAI API. | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `list[AudioAsset]` | List of audio assets. |

Source code in src/mosaico/speech_synthesizers/openai.py
def synthesize(
    self, texts: Sequence[str], *, audio_params: AudioAssetParams | None = None, **kwargs: Any
) -> list[AudioAsset]:
    """
    Synthesize speech from texts using OpenAI's API.

    :param texts: Texts to synthesize.
    :param audio_params: Parameters for the audio asset.
    :param kwargs: Additional parameters for the OpenAI API.
    :return: List of audio assets.
    """
    assets = []

    for text in texts:
        response = self._client.audio.speech.create(
            input=text, model=self.model, voice=self.voice, response_format="mp3", speed=self.speed, **kwargs
        )
        segment = AudioSegment.from_file(io.BytesIO(response.content), format="mp3")
        assets.append(
            AudioAsset.from_data(
                response.content,
                params=audio_params if audio_params is not None else {},
                duration=segment.duration_seconds,
                sample_rate=segment.frame_rate,
                sample_width=segment.sample_width,
                channels=segment.channels,
                mime_type="audio/mpeg",
            )
        )

    return assets
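The method above issues one API request per input text, in order. That per-text loop can be sketched with a stub client (a sketch only; `_StubClient` and `synthesize_all` are hypothetical names, and the stub returns fake bytes instead of calling OpenAI):

```python
from typing import Any, Sequence


class _StubClient:
    """Stand-in for the OpenAI speech client: records calls, returns fake MP3 bytes."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def create(self, **params: Any) -> bytes:
        self.calls.append(params)
        return b"ID3fake-mp3-bytes"


def synthesize_all(client: _StubClient, texts: Sequence[str], **params: Any) -> list[bytes]:
    # One request per input text, mirroring the loop in synthesize().
    return [client.create(input=text, **params) for text in texts]


client = _StubClient()
clips = synthesize_all(client, ["Hello", "world"], model="tts-1", voice="alloy", speed=1.0)
print(len(clips))  # 2
print(client.calls[0]["input"])  # Hello
```

Because requests are sequential, synthesizing many texts takes roughly `len(texts)` round trips; the `timeout` field bounds each individual request, not the whole batch.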