Audio Transcriptors

Overview

Mosaico provides audio transcriptor components that convert speech into timed text, which can then be used for subtitle generation and content synchronization. Transcription services plug in through a shared protocol, so different backends can be used behind the same interface.

Audio Transcriptor Protocol

The transcriptor system is built around the AudioTranscriber protocol:

from mosaico.audio_transcribers.protocol import AudioTranscriber
from mosaico.assets.audio import AudioAsset
from mosaico.audio_transcribers.transcription import Transcription

class MyTranscriber(AudioTranscriber):
    def transcribe(self, audio_asset: AudioAsset) -> Transcription:
        # Implement transcription logic
        ...
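
To illustrate what "protocol-based" means here, the following is a minimal, self-contained sketch that does not import Mosaico. `Word`, `Transcriber`, and `FixedTextTranscriber` are hypothetical stand-ins, and the `transcribe` signature is simplified to take a duration instead of an audio asset. It only shows that any object with a matching method can be used where the protocol is expected.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Word:
    """Stand-in for a timed word (not Mosaico's TranscriptionWord)."""

    start_time: float
    end_time: float
    text: str


class Transcriber(Protocol):
    """Stand-in protocol: anything with a matching transcribe() qualifies."""

    def transcribe(self, audio_duration: float) -> list[Word]: ...


class FixedTextTranscriber:
    """Hypothetical backend that spreads a known script evenly over the audio."""

    def __init__(self, script: str) -> None:
        self.script = script

    def transcribe(self, audio_duration: float) -> list[Word]:
        tokens = self.script.split()
        if not tokens:
            return []
        step = audio_duration / len(tokens)
        return [
            Word(start_time=i * step, end_time=(i + 1) * step, text=token)
            for i, token in enumerate(tokens)
        ]


def run(transcriber: Transcriber, audio_duration: float) -> list[Word]:
    # Accepts any object that satisfies the protocol; no inheritance required.
    return transcriber.transcribe(audio_duration)


words = run(FixedTextTranscriber("Hello world"), audio_duration=1.0)
```

Because the check is structural, a real backend (for example, a wrapper around a speech-to-text service) would only need to provide the same method.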

Transcription Structure

Transcriptions are represented using the Transcription class:

from mosaico.audio_transcribers.transcription import Transcription, TranscriptionWord

words = [
    TranscriptionWord(
        start_time=0.0,
        end_time=0.5,
        text="Hello"
    ),
    TranscriptionWord(
        start_time=0.6,
        end_time=1.0,
        text="world"
    )
]

transcription = Transcription(words=words)
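
Since a transcription is essentially a list of timed words, values such as the full text or the total duration can be derived from it. The sketch below uses a stand-in `Word` dataclass and two hypothetical helpers, `full_text` and `duration`; it does not claim that Mosaico's `Transcription` exposes these names.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Stand-in with the same three fields as the example above."""

    start_time: float
    end_time: float
    text: str


def full_text(words: list[Word]) -> str:
    """Join the word texts into a single string."""
    return " ".join(word.text for word in words)


def duration(words: list[Word]) -> float:
    """Time from the earliest start to the latest end, in the words' own units."""
    if not words:
        return 0.0
    return max(w.end_time for w in words) - min(w.start_time for w in words)


words = [Word(0.0, 0.5, "Hello"), Word(0.6, 1.0, "world")]
text = full_text(words)
length = duration(words)
```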

Using Transcriptors in Projects

Transcriptors can be used to generate subtitles for video projects:

from mosaico.assets.text import TextAssetParams

# Create transcriptor
transcriber = MyTranscriber()

# Transcribe audio asset
transcription = transcriber.transcribe(audio_asset)

# Add subtitles from the transcription.
# add_captions takes a Transcription; add_captions_from_transcriber
# takes the transcriber itself (see "Video Subtitles").
project = project.add_captions(
    transcription,
    max_duration=5,  # Maximum subtitle duration
    params=TextAssetParams(
        font_size=36,
        font_color="white"
    )
)
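
The `max_duration` argument caps how long a single subtitle may last. One way such a cap could be applied is to group consecutive words greedily until the next word would push the group past the limit. This is a sketch under that assumption, with a stand-in `Word` class and a hypothetical `group_words` helper; it is not taken from Mosaico's implementation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start_time: float
    end_time: float
    text: str


def group_words(words: list[Word], max_duration: float) -> list[list[Word]]:
    """Greedily pack words into groups spanning at most max_duration."""
    groups: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        # Close the current group if this word would stretch it past the cap.
        if current and word.end_time - current[0].start_time > max_duration:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    return groups


words = [Word(0, 1, "a"), Word(1, 2, "b"), Word(2, 3, "c"), Word(3, 4, "d")]
groups = group_words(words, max_duration=2)
captions = [" ".join(w.text for w in group) for group in groups]
```

A single word longer than the cap still forms its own group, so no text is dropped.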

Transcription Formats

VTT Format

# Convert to WebVTT
vtt_content = transcription.as_vtt()

# Create from VTT
transcription = Transcription.from_vtt(vtt_content)
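
For reference, a WebVTT file starts with a `WEBVTT` line followed by cues, each with a `start --> end` timing line that uses a period before the milliseconds. The sketch below builds such a file from `(start, end, text)` tuples. `format_timestamp` and `cues_to_vtt` are hypothetical helpers written for illustration, and the output of Mosaico's own conversion may differ in detail.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, the WebVTT timestamp layout."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def cues_to_vtt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues into a WebVTT document."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)


vtt = cues_to_vtt([(0.0, 0.5, "Hello"), (0.6, 1.0, "world")])
```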

SRT Format

# Create from SRT
transcription = Transcription.from_srt(srt_content)
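
SRT differs from WebVTT mainly in that each cue is preceded by a numeric index and timestamps use a comma before the milliseconds. As a rough sketch of what reading SRT involves, the hypothetical `parse_srt` helper below turns SRT text into `(start, end, text)` tuples. It handles only the basic layout and is not Mosaico's parser.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def parse_time(stamp: str) -> float:
    """Convert an SRT timestamp (HH:MM:SS,mmm) to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"invalid SRT timestamp: {stamp!r}")
    hours, minutes, secs, ms = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + secs + ms / 1000


def parse_srt(content: str) -> list[tuple[float, float, str]]:
    """Parse SRT text into (start, end, text) cues."""
    cues = []
    # Cue blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip blocks without index, timing, and text lines
        start, end = lines[1].split("-->")
        cues.append((parse_time(start), parse_time(end), " ".join(lines[2:])))
    return cues


srt_content = (
    "1\n00:00:00,000 --> 00:00:00,500\nHello\n\n"
    "2\n00:00:00,600 --> 00:00:01,000\nworld\n"
)
cues = parse_srt(srt_content)
```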

Best Practices

Handling Long Content

  • Break long transcriptions into manageable chunks
  • Consider memory usage for large files
  • Use appropriate subtitle durations
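
If a long recording is transcribed in chunks, each chunk's word timings are relative to the chunk's own start and need to be shifted back onto the full timeline. The following sketch shows that step with a stand-in `Word` class and a hypothetical `merge_chunks` helper; how the audio is split and transcribed is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    start_time: float
    end_time: float
    text: str


def merge_chunks(chunks: list[tuple[float, list[Word]]]) -> list[Word]:
    """Merge (chunk_offset, words) pairs into one timeline.

    Word times inside each chunk are assumed to be relative to that chunk.
    """
    merged = []
    for offset, words in chunks:
        for word in words:
            merged.append(
                replace(
                    word,
                    start_time=word.start_time + offset,
                    end_time=word.end_time + offset,
                )
            )
    # Keep the result ordered even if chunks arrive out of order.
    merged.sort(key=lambda word: word.start_time)
    return merged


chunks = [
    (30.0, [Word(0.25, 0.75, "world")]),
    (0.0, [Word(0.0, 0.5, "Hello")]),
]
timeline = merge_chunks(chunks)
```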

Timing Synchronization

  • Verify audio/subtitle sync
  • Handle overlapping speech
  • Account for pauses and breaks
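
The timing checks above can be partly automated. As a minimal sketch, the hypothetical helpers below trim each word so it ends no later than the next word starts, and report silent gaps longer than a threshold. `Word` is a stand-in class, and trimming is only one of several reasonable ways to handle overlapping speech.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    start_time: float
    end_time: float
    text: str


def resolve_overlaps(words: list[Word]) -> list[Word]:
    """Trim each word's end so it does not run past the next word's start."""
    ordered = sorted(words, key=lambda word: word.start_time)
    fixed = []
    for index, word in enumerate(ordered):
        end = word.end_time
        if index + 1 < len(ordered):
            end = min(end, ordered[index + 1].start_time)
        fixed.append(replace(word, end_time=end))
    return fixed


def find_pauses(words: list[Word], min_gap: float) -> list[tuple[float, float]]:
    """Return (gap_start, gap_end) for silences of at least min_gap."""
    ordered = sorted(words, key=lambda word: word.start_time)
    return [
        (current.end_time, following.start_time)
        for current, following in zip(ordered, ordered[1:])
        if following.start_time - current.end_time >= min_gap
    ]


overlapping = [Word(0.0, 1.0, "a"), Word(0.75, 1.5, "b")]
fixed = resolve_overlaps(overlapping)
pauses = find_pauses([Word(0.0, 0.5, "a"), Word(2.0, 2.5, "b")], min_gap=1.0)
```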

Text Processing

  • Clean up transcription text
  • Handle punctuation properly
  • Format numbers and special characters
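
A small cleanup pass can cover the most common text issues. The sketch below, using a hypothetical `clean_caption_text` helper, collapses runs of whitespace and removes stray spaces before punctuation. Number formatting is locale-dependent and is deliberately left out.

```python
import re


def clean_caption_text(text: str) -> str:
    """Normalize whitespace and spacing before punctuation."""
    # Collapse newlines, tabs, and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove spaces that precede punctuation marks.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    return text


cleaned = clean_caption_text("  Hello ,  world !\n")
```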

Common Use Cases

Video Subtitles

from mosaico.assets.text import TextAssetParams
from mosaico.video.project import VideoProject

# Create news video with transcribed subtitles
project = (
    VideoProject.from_script_generator(news_generator, media_files)
    .add_captions_from_transcriber(
        transcriber,
        max_duration=5,
        params=TextAssetParams(
            font_size=24,
            font_color="yellow"
        )
    )
)

Interview Captioning

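from mosaico.assets.text import TextAssetParams
from mosaico.positioning import RegionPosition

# Process interview audio
transcription = transcriber.transcribe(interview_audio)

# Add captions to video, centered at the bottom of the frame
project = project.add_captions(
    transcription,
    params=TextAssetParams(
        position=RegionPosition(x="center", y="bottom")
    )
)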

Multi-language Support

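# Create subtitles in different languages.
# `languages`, `translate_transcription`, and `subtitle_params` are assumed
# to be defined by the caller; they are not shown in this guide.
for language in languages:
    translated_transcription = translate_transcription(
        transcription,
        target_language=language
    )
    project = project.add_captions(
        translated_transcription,
        params=subtitle_params[language]
    )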
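
Translated text rarely has the same number of words as the source, so word-level timings cannot be carried over one to one. One simple option is to spread the translated words evenly over the time span of the original caption. The sketch below does this with a stand-in `Word` class and a hypothetical `retime_translation` helper; the translation itself is assumed to come from elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start_time: float
    end_time: float
    text: str


def retime_translation(words: list[Word], translated_text: str) -> list[Word]:
    """Spread translated tokens evenly across the original words' time span."""
    tokens = translated_text.split()
    if not words or not tokens:
        return []
    start = words[0].start_time
    end = words[-1].end_time
    step = (end - start) / len(tokens)
    return [
        Word(start + index * step, start + (index + 1) * step, token)
        for index, token in enumerate(tokens)
    ]


original = [Word(1.0, 2.0, "Good"), Word(2.0, 3.0, "morning")]
translated = retime_translation(original, "Bom dia a todos")
```

Applying this per caption, not to the whole transcription at once, keeps the translated text close to the moment it was spoken.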

Conclusion

The transcriptor system in Mosaico provides a flexible foundation for adding subtitles and captions to your videos: transcription backends plug in through a common protocol, and transcriptions can be exchanged in the WebVTT and SRT subtitle formats.