Audio

AudioInfo

Bases: BaseModel

Represents audio-specific metadata.

duration instance-attribute

duration: PositiveFloat

The duration of the audio asset, in seconds.

sample_rate instance-attribute

sample_rate: PositiveFloat

The sample rate of the audio asset, in samples per second (Hz).

sample_width instance-attribute

sample_width: NonNegativeInt

The sample width of the audio asset. The value is passed unchanged to pydub as sample_width, which pydub expresses in bytes per sample.

channels instance-attribute

channels: NonNegativeInt

The number of channels in the audio asset.
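
As an illustration of how the four fields relate, the following minimal sketch estimates the size of the uncompressed PCM payload they describe. AudioInfoSketch and pcm_byte_size are hypothetical names that are not part of mosaico, and the sketch assumes sample_width is measured in bytes per sample, as pydub does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioInfoSketch:
    """Hypothetical stand-in mirroring the four AudioInfo fields."""

    duration: float     # seconds
    sample_rate: float  # samples per second, per channel
    sample_width: int   # assumed: bytes per sample
    channels: int


def pcm_byte_size(info: AudioInfoSketch) -> int:
    """Estimate the size of the uncompressed PCM payload in bytes."""
    frames = round(info.duration * info.sample_rate)
    return frames * info.sample_width * info.channels


# Two seconds of 16-bit stereo audio at 44.1 kHz.
info = AudioInfoSketch(duration=2.0, sample_rate=44100.0, sample_width=2, channels=2)
print(pcm_byte_size(info))
```

Compressed formats such as MP3 occupy far less space than this estimate; the figure only describes the decoded audio.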

AudioAssetParams

Bases: BaseModel

Represents the parameters for an audio asset.

volume class-attribute instance-attribute

volume: float = Field(default=1.0)

The volume of the audio asset.

crop class-attribute instance-attribute

crop: tuple[int, int] | None = None

The crop range for the audio asset.

AudioAsset

Bases: BaseAsset[AudioAssetParams, AudioInfo]

Represents an Audio asset with various properties.

type class-attribute instance-attribute

type: Literal['audio'] = 'audio'

The type of the asset. Defaults to "audio".

params class-attribute instance-attribute

params: AudioAssetParams = Field(
    default_factory=AudioAssetParams
)

The parameters for the asset.

duration property

duration: float

The duration of the audio asset.

Wrapper of AudioAsset.info.duration for convenience and type-hint compatibility.

sample_rate property

sample_rate: float

The sample rate of the audio asset.

Wrapper of AudioAsset.info.sample_rate for convenience and type-hint compatibility.

sample_width property

sample_width: int

The sample width of the audio asset.

Wrapper of AudioAsset.info.sample_width for convenience and type-hint compatibility.

channels property

channels: int

The number of channels in the audio asset.

Wrapper of AudioAsset.info.channels for convenience and type-hint compatibility.
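
The wrapper pattern used by these four properties can be sketched as follows. This is not mosaico's implementation: InfoSketch and AssetSketch are hypothetical stand-ins, and the sketch assumes that info may be missing, which is one plausible reason a wrapper with a concrete return type helps type checkers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfoSketch:
    """Hypothetical stand-in for AudioInfo."""

    duration: float
    sample_rate: float
    sample_width: int
    channels: int


@dataclass
class AssetSketch:
    """Hypothetical stand-in for an asset whose info may be missing (an assumption)."""

    info: Optional[InfoSketch] = None

    def _require_info(self) -> InfoSketch:
        # Narrow Optional[InfoSketch] to InfoSketch in one place.
        if self.info is None:
            raise ValueError("audio info is not available")
        return self.info

    @property
    def duration(self) -> float:
        # Callers get a plain float instead of handling Optional themselves.
        return self._require_info().duration

    @property
    def channels(self) -> int:
        return self._require_info().channels


asset = AssetSketch(
    info=InfoSketch(duration=1.5, sample_rate=48000.0, sample_width=2, channels=1)
)
print(asset.duration, asset.channels)
```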

to_audio_segment

to_audio_segment(**kwargs) -> AudioSegment

Casts the audio asset to a pydub.AudioSegment object.

Source code in src/mosaico/assets/audio.py
def to_audio_segment(self, **kwargs) -> AudioSegment:
    """
    Casts the audio asset to a pydub.AudioSegment object.
    """
    with self.to_bytes_io(**kwargs) as audio_buf:
        return AudioSegment.from_file(
            file=audio_buf,
            sample_width=self.sample_width,
            frame_rate=self.sample_rate,
            channels=self.channels,
        )

slice

slice(
    start_time: float, end_time: float, **kwargs: Any
) -> AudioAsset

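Slices the audio asset, returning a new AudioAsset that contains only the portion between start_time and end_time. The sliced audio is re-encoded as MP3.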

Parameters:

Name        Type   Description                                        Default
start_time  float  The start time in seconds.                         required
end_time    float  The end time in seconds.                           required
kwargs      Any    Additional parameters passed to the audio loader.  {}

Returns:

Type        Description
AudioAsset  The sliced audio asset.

Source code in src/mosaico/assets/audio.py
def slice(self, start_time: float, end_time: float, **kwargs: Any) -> AudioAsset:
    """
    Slices the audio asset.

    :param start_time: The start time in seconds.
    :param end_time: The end time in seconds.
    :param kwargs: Additional parameters passed to the audio loader.
    :return: The sliced audio asset.
    """
    audio = self.to_audio_segment(**kwargs)

    sliced_buf = io.BytesIO()
    sliced_audio = cast(AudioSegment, audio[round(start_time * 1000) : round(end_time * 1000)])
    sliced_audio.export(sliced_buf, format="mp3")
    sliced_buf.seek(0)

    return AudioAsset.from_data(
        sliced_buf.read(),
        info=AudioInfo(
            duration=len(sliced_audio) / 1000,
            sample_rate=self.sample_rate,
            sample_width=self.sample_width,
            channels=self.channels,
        ),
    )
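
The time arithmetic in slice can be sketched without pydub. In the minimal example below, a Python list with one item per millisecond stands in for the AudioSegment, and slice_bounds_ms is a hypothetical helper that is not part of mosaico. It shows how the second-based arguments become millisecond indices and how the new duration is derived from the sliced length.

```python
def slice_bounds_ms(start_time: float, end_time: float) -> tuple[int, int]:
    """Convert start and end times in seconds to rounded millisecond indices."""
    return round(start_time * 1000), round(end_time * 1000)


# Stand-in for a 5-second clip: one list item per millisecond.
audio_ms = list(range(5000))

start, end = slice_bounds_ms(1.25, 3.5)
sliced = audio_ms[start:end]
duration = len(sliced) / 1000  # back to seconds, as in AudioInfo.duration
print(start, end, duration)

# An end time past the clip is clamped by list slicing.
_, late_end = slice_bounds_ms(4.0, 10.0)
tail = audio_ms[4000:late_end]
print(len(tail) / 1000)
```

Because the duration is computed from the sliced length rather than from end_time - start_time, it reflects what was actually kept when the requested range extends past the end of the audio.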

strip_silence

strip_silence(
    silence_threshold: float = -50,
    chunk_size: int = 10,
    **kwargs: Any
) -> AudioAsset

Removes leading and trailing silence from the audio asset.

Parameters:

Name               Type   Description                                             Default
silence_threshold  float  Silence threshold in dBFS (default: -50.0).             -50
chunk_size         int    Size of the audio iterator chunk, in ms (default: 10).  10
kwargs             Any    Additional parameters passed to the audio loader.       {}

Returns:

Type        Description
AudioAsset  A new AudioAsset with leading and trailing silence removed.

Source code in src/mosaico/assets/audio.py
def strip_silence(self, silence_threshold: float = -50, chunk_size: int = 10, **kwargs: Any) -> AudioAsset:
    """
    Removes leading and trailing silence from the audio asset.

    :param silence_threshold: Silence threshold in dBFS (default: -50.0).
    :param chunk_size: Size of the audio iterator chunk, in ms (default: 10).
    :param kwargs: Additional parameters passed to the audio loader.
    :return: A new AudioAsset with leading and trailing silence removed.
    """
    audio = self.to_audio_segment(**kwargs)
    start_trim = detect_leading_silence(audio, silence_threshold, chunk_size)
    end_trim = detect_leading_silence(audio.reverse(), silence_threshold, chunk_size)
    return self.slice(start_trim / 1000, (len(audio) - end_trim) / 1000)
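
The trimming logic above scans forward for leading silence, then scans the reversed audio to find trailing silence. A minimal standard-library sketch of that idea follows. It is not pydub's code: dbfs, leading_silence and strip_bounds are hypothetical helpers, and the input is assumed to be a list of samples normalised to [-1.0, 1.0] with one sample per millisecond.

```python
import math


def dbfs(chunk: list[float]) -> float:
    """RMS level of a chunk in dBFS, for samples normalised to [-1.0, 1.0]."""
    if not chunk:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    return 20 * math.log10(rms) if rms > 0 else -math.inf


def leading_silence(samples: list[float], threshold: float = -50.0, chunk_size: int = 10) -> int:
    """Count how many leading items are silent, advancing chunk_size at a time."""
    trim = 0
    while trim < len(samples) and dbfs(samples[trim : trim + chunk_size]) < threshold:
        trim += chunk_size
    return min(trim, len(samples))


def strip_bounds(samples: list[float], threshold: float = -50.0, chunk_size: int = 10) -> tuple[int, int]:
    """Return (start, end) indices that exclude leading and trailing silence."""
    start = leading_silence(samples, threshold, chunk_size)
    # Trailing silence is leading silence of the reversed signal.
    end_trim = leading_silence(samples[::-1], threshold, chunk_size)
    return start, len(samples) - end_trim


# 30 silent items, 100 loud items, 20 silent items.
samples = [0.0] * 30 + [0.5] * 100 + [0.0] * 20
start, end = strip_bounds(samples)
print(start, end, len(samples[start:end]))
```

In this sketch, an input that is silent throughout yields a start index greater than the end index, so the resulting slice is empty.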