OpenAI

OpenAIWhisperTranscriber

Bases: BaseModel

Transcriber using OpenAI's Whisper API.

api_key class-attribute instance-attribute

api_key: str | None = None

API key for OpenAI's Whisper API.

base_url class-attribute instance-attribute

base_url: str | None = None

Base URL for OpenAI's Whisper API.

timeout class-attribute instance-attribute

timeout: PositiveInt = 120

Timeout for transcription in seconds.

model class-attribute instance-attribute

model: Literal['whisper-1'] = 'whisper-1'

Model to use for transcription.

temperature class-attribute instance-attribute

temperature: ModelTemperature = 0

The sampling temperature for the model.

language class-attribute instance-attribute

language: LanguageAlpha2 | None = None

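Language of the audio, as an ISO 639-1 two-letter code (for example, en). When None, transcribe passes an empty string to the API.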

transcribe

transcribe(audio_asset: AudioAsset) -> Transcription

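Transcribe audio using OpenAI's Whisper API. The request asks for word-level timestamps, and a ValueError is raised if the response contains no words.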

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| audio_asset | AudioAsset | The audio asset to transcribe. | required |

Returns:

| Type | Description |
| --- | --- |
| Transcription | The transcription, as a list of words with their start and end times. |
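
A minimal sketch of how the returned words might be consumed, for example to rebuild the plain text and measure the spoken span. The `Word` dataclass and both helper functions are hypothetical stand-ins, not part of the library; `Word` only mirrors the `start_time`, `end_time`, and `text` fields that the source listing passes to `TranscriptionWord`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical stand-in for TranscriptionWord (same three fields)."""

    start_time: float
    end_time: float
    text: str


def to_text(words: list[Word]) -> str:
    """Join the word texts with single spaces."""
    return " ".join(w.text.strip() for w in words)


def spoken_span(words: list[Word]) -> float:
    """Seconds between the first word's start and the last word's end."""
    if not words:
        return 0.0
    return words[-1].end_time - words[0].start_time


# Illustrative data only; real values would come from transcribe().
words = [Word(0.0, 0.4, "Hello"), Word(0.5, 0.9, "world")]
print(to_text(words))
print(spoken_span(words))
```

With the real class, the same helpers would presumably be applied to the `words` attribute of the returned `Transcription`.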

Source code in src/mosaico/audio_transcribers/openai.py
def transcribe(self, audio_asset: AudioAsset) -> Transcription:
    """
    Transcribe audio using OpenAI's Whisper API.

    :param audio_asset: The audio asset to transcribe.
    :return: The transcription words.
    """
    with audio_asset.to_bytes_io() as audio_file:
        audio_file.name = f"{audio_asset.id}.mp3"  # type: ignore
        response = self._client.audio.transcriptions.create(
            file=audio_file,
            model=self.model,
            temperature=self.temperature,
            language=str(self.language) if self.language is not None else "",
            response_format="verbose_json",
            timestamp_granularities=["word"],
        )

    if not response.words:
        raise ValueError("No words found in transcription response.")

    words = [TranscriptionWord(start_time=word.start, end_time=word.end, text=word.word) for word in response.words]

    return Transcription(words=words)
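
The response handling above can be sketched in isolation, without calling the API. This is an illustration under stated assumptions: `WordStamp` and `words_from_response` are hypothetical names, and `SimpleNamespace` objects stand in for the API response, exposing only the `words`, `start`, `end`, and `word` attributes that the listing reads.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class WordStamp:
    """Hypothetical stand-in for TranscriptionWord."""

    start_time: float
    end_time: float
    text: str


def words_from_response(response) -> list[WordStamp]:
    """Convert response words to WordStamp objects, rejecting empty responses."""
    # Covers both a missing (None) and an empty word list, as in the listing.
    if not response.words:
        raise ValueError("No words found in transcription response.")
    return [
        WordStamp(start_time=w.start, end_time=w.end, text=w.word)
        for w in response.words
    ]


# A fake response carrying one word with its timestamps.
fake_response = SimpleNamespace(
    words=[SimpleNamespace(start=0.0, end=0.4, word="Hello")]
)
result = words_from_response(fake_response)

# A response without words triggers the ValueError path.
error_message = None
try:
    words_from_response(SimpleNamespace(words=None))
except ValueError as exc:
    error_message = str(exc)
```

Callers that may receive silent audio would need to catch this ValueError, since an empty word list is treated as an error and not as an empty transcription.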