

Mosaico follows a modular architecture organized around several key concepts:


Assets

The asset system is the foundation of the library. Assets represent media elements that can be composed into scenes. The BaseAsset class provides the core functionality, and specialized subclasses handle individual media types such as images, audio, text and subtitles.
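A minimal sketch of the base-class-plus-specializations idea, using plain dataclasses. BaseAsset, ImageAsset and AudioAsset are names from this page, but the fields (`id`, `path`, `width`, `height`, `duration`), the `kind` tag and `describe()` are hypothetical illustrations, not Mosaico's actual API.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class BaseAsset:
    """Core fields and behaviour shared by every asset (fields are hypothetical)."""

    id: str
    path: str  # location of the underlying media file

    # Each specialized subclass overrides this media-type tag.
    kind: ClassVar[str] = "base"

    def describe(self) -> str:
        return f"{self.kind}:{self.id}"


@dataclass
class ImageAsset(BaseAsset):
    kind: ClassVar[str] = "image"
    width: int = 0   # pixels
    height: int = 0  # pixels


@dataclass
class AudioAsset(BaseAsset):
    kind: ClassVar[str] = "audio"
    duration: float = 0.0  # seconds


# Code that only needs the shared behaviour can treat every asset uniformly.
assets: list[BaseAsset] = [
    ImageAsset("background", "bg.png", width=1920, height=1080),
    AudioAsset("narration", "voice.mp3", duration=12.5),
]
print([a.describe() for a in assets])  # ['image:background', 'audio:narration']
```

Keeping shared behaviour on the base class means scene and rendering code can work with any asset, while type-specific data lives only in the subclass that needs it.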


Positioning

The positioning system provides multiple ways to place elements in a frame through the Position protocol, with implementations for absolute, relative and region-based positioning.
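The three positioning styles can be sketched as classes that all satisfy one structural protocol. This assumes a hypothetical `resolve(frame, element)` method returning the element's top-left pixel coordinate; the class names, the fraction-based relative coordinates and the `left/center/right` region labels are illustrative assumptions, not Mosaico's real signatures.

```python
from dataclasses import dataclass
from typing import Literal, Protocol

Size = tuple[int, int]   # (width, height) in pixels
Point = tuple[int, int]  # (x, y) of the element's top-left corner


class Position(Protocol):
    def resolve(self, frame: Size, element: Size) -> Point: ...


@dataclass
class AbsolutePosition:
    x: int
    y: int

    def resolve(self, frame: Size, element: Size) -> Point:
        # Pixel coordinates are used as-is.
        return (self.x, self.y)


@dataclass
class RelativePosition:
    x: float  # fraction of frame width, 0.0 to 1.0
    y: float  # fraction of frame height, 0.0 to 1.0

    def resolve(self, frame: Size, element: Size) -> Point:
        return (round(self.x * frame[0]), round(self.y * frame[1]))


@dataclass
class RegionPosition:
    x: Literal["left", "center", "right"] = "center"
    y: Literal["top", "center", "bottom"] = "center"

    def resolve(self, frame: Size, element: Size) -> Point:
        # Align the element inside the space left over in the frame.
        free_w = frame[0] - element[0]
        free_h = frame[1] - element[1]
        xs = {"left": 0, "center": free_w // 2, "right": free_w}
        ys = {"top": 0, "center": free_h // 2, "bottom": free_h}
        return (xs[self.x], ys[self.y])


def place(position: Position, frame: Size, element: Size) -> Point:
    """Any object with a matching resolve() satisfies Position structurally."""
    return position.resolve(frame, element)


frame, logo = (1920, 1080), (400, 200)
print(place(AbsolutePosition(100, 50), frame, logo))         # (100, 50)
print(place(RelativePosition(0.5, 0.25), frame, logo))       # (960, 270)
print(place(RegionPosition("right", "bottom"), frame, logo)) # (1520, 880)
```

Because `Position` is a protocol rather than a base class, a new positioning strategy only needs a compatible `resolve` method; it does not have to inherit from anything.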


Effects

Effects are implemented through the Effect protocol, so new animations and visual effects can be added by implementing it. Built-in effects include pan and zoom.
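A minimal sketch of protocol-based effects, assuming each effect maps a transform and a clip progress value (0.0 at the start, 1.0 at the end) to a new transform. `Transform`, `ZoomEffect`, `PanEffect`, `apply()` and `render_frame()` are hypothetical names chosen for illustration; Mosaico's own pan and zoom effects may be parameterized differently.

```python
from dataclasses import dataclass, replace
from typing import Protocol


@dataclass(frozen=True)
class Transform:
    x: float = 0.0      # horizontal offset in pixels
    y: float = 0.0      # vertical offset in pixels
    scale: float = 1.0  # 1.0 means original size


class Effect(Protocol):
    def apply(self, t: Transform, progress: float) -> Transform: ...


def lerp(a: float, b: float, p: float) -> float:
    """Linear interpolation from a to b at progress p."""
    return a + (b - a) * p


@dataclass
class ZoomEffect:
    start: float = 1.0
    end: float = 1.2

    def apply(self, t: Transform, progress: float) -> Transform:
        # Sets the scale for this point in the clip.
        return replace(t, scale=lerp(self.start, self.end, progress))


@dataclass
class PanEffect:
    dx: float          # total horizontal travel over the clip, in pixels
    dy: float = 0.0    # total vertical travel over the clip, in pixels

    def apply(self, t: Transform, progress: float) -> Transform:
        return replace(t, x=t.x + self.dx * progress, y=t.y + self.dy * progress)


def render_frame(effects: list[Effect], progress: float) -> Transform:
    """Fold every effect over an identity transform at the given progress."""
    t = Transform()
    for effect in effects:
        t = effect.apply(t, progress)
    return t


# Halfway through a clip that zooms 1.0 -> 1.5 while panning 200 px right.
print(render_frame([ZoomEffect(1.0, 1.5), PanEffect(dx=200)], 0.5))
# Transform(x=100.0, y=0.0, scale=1.25)
```

Any user-defined class with a compatible `apply` method can be dropped into the same list, which is what makes the effect set extensible without changing the renderer.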


Scenes

Scenes group related assets and manage their timing and organization. The Scene class holds references to its assets and coordinates when each one starts and ends.
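One way to picture "asset references plus timing" is a scene that stores lightweight references (an asset id plus start and end times) rather than the assets themselves. `AssetReference`, its fields, `Scene.add()`, `duration` and `active_at()` are hypothetical names for this sketch, not the library's confirmed API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AssetReference:
    """Points at an asset by id and says when it plays (hypothetical)."""

    asset_id: str
    start_time: float  # seconds from the start of the scene
    end_time: float

    def __post_init__(self) -> None:
        if self.end_time <= self.start_time:
            raise ValueError(f"{self.asset_id}: end_time must be after start_time")


@dataclass
class Scene:
    title: str
    references: list[AssetReference] = field(default_factory=list)

    def add(self, ref: AssetReference) -> "Scene":
        self.references.append(ref)
        return self  # allows chaining

    @property
    def duration(self) -> float:
        # A scene lasts until its last reference ends.
        return max((r.end_time for r in self.references), default=0.0)

    def active_at(self, t: float) -> list[str]:
        """Ids of the assets visible or audible at time t."""
        return [r.asset_id for r in self.references if r.start_time <= t < r.end_time]


scene = (
    Scene("intro")
    .add(AssetReference("background", 0.0, 10.0))
    .add(AssetReference("narration", 1.0, 9.0))
    .add(AssetReference("title", 0.0, 3.0))
)
print(scene.duration)        # 10.0
print(scene.active_at(5.0))  # ['background', 'narration']
```

Referencing assets by id keeps a scene cheap to build and lets the same asset appear in several scenes without being copied.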

Script Generation

Script generation is handled through the ScriptGenerator protocol, with implementations for specific use cases like news video generation.
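A minimal sketch of a script-generator protocol, assuming a generator turns source text into a sequence of timed shots. `Shot`, `generate()`, `build_script()` and the toy `ParagraphScriptGenerator` (one shot per paragraph, timed by word count) are hypothetical; a real news generator would more likely call a language model, and this stand-in only exists so the protocol can be exercised offline.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class Shot:
    """One planned segment of the video (hypothetical structure)."""

    number: int
    narration: str
    start_time: float  # seconds
    end_time: float


class ScriptGenerator(Protocol):
    def generate(self, text: str) -> Sequence[Shot]: ...


class ParagraphScriptGenerator:
    """Toy generator: one shot per paragraph, timed by word count."""

    def __init__(self, words_per_second: float = 2.5) -> None:
        self.words_per_second = words_per_second

    def generate(self, text: str) -> list[Shot]:
        shots: list[Shot] = []
        clock = 0.0
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs, start=1):
            length = len(para.split()) / self.words_per_second
            shots.append(Shot(i, para, clock, clock + length))
            clock += length
        return shots


def build_script(generator: ScriptGenerator, text: str) -> Sequence[Shot]:
    # Callers depend only on the protocol, so generators are swappable.
    return generator.generate(text)


article = "Markets rose today.\n\nTech stocks led the gains this afternoon."
for shot in build_script(ParagraphScriptGenerator(words_per_second=2.0), article):
    print(shot.number, shot.start_time, shot.end_time)
# 1 0.0 1.5
# 2 1.5 5.0
```

Swapping the toy class for a model-backed generator would not change `build_script` or anything downstream, which is the point of defining the protocol.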

Speech Synthesis

Speech synthesis is abstracted through the SpeechSynthesizer protocol, with implementations for different TTS providers.
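Provider independence can be sketched with a protocol whose implementations each wrap one TTS backend. Here the only implementation is a hypothetical offline `SilentSynthesizer` that writes silent 16-bit WAV audio sized to the text, using the standard-library `wave` module; `SynthesizedAudio`, `synthesize()` and the words-per-second estimate are assumptions for illustration, and a real provider class would call its service's API instead.

```python
import io
import wave
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class SynthesizedAudio:
    """Stand-in for an audio asset produced by a TTS call (hypothetical)."""

    text: str
    wav_bytes: bytes
    duration: float  # seconds


class SpeechSynthesizer(Protocol):
    def synthesize(self, texts: Sequence[str]) -> list[SynthesizedAudio]: ...


class SilentSynthesizer:
    """Offline placeholder provider: emits silence sized to the text length."""

    def __init__(self, words_per_second: float = 2.5, sample_rate: int = 16_000) -> None:
        self.words_per_second = words_per_second
        self.sample_rate = sample_rate

    def synthesize(self, texts: Sequence[str]) -> list[SynthesizedAudio]:
        results: list[SynthesizedAudio] = []
        for text in texts:
            duration = len(text.split()) / self.words_per_second
            n_frames = int(duration * self.sample_rate)
            buf = io.BytesIO()
            with wave.open(buf, "wb") as wav:
                wav.setnchannels(1)   # mono
                wav.setsampwidth(2)   # 16-bit PCM
                wav.setframerate(self.sample_rate)
                wav.writeframes(b"\x00\x00" * n_frames)  # silence
            results.append(SynthesizedAudio(text, buf.getvalue(), duration))
        return results


synth: SpeechSynthesizer = SilentSynthesizer(words_per_second=2.0, sample_rate=8000)
clips = synth.synthesize(["Hello there world", "Goodbye"])
print([c.duration for c in clips])  # [1.5, 0.5]
```

A placeholder like this can be handy in tests, since it exercises the same code path as a real provider without network access or API keys.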

Simplified Diagram

graph TD
    subgraph Core
        Media[Media] --> Asset[Asset]
        Asset --> |references| Scene
        Position --> Asset
        Effect --> Scene
    end

    subgraph Assets
        Asset --> ImageAsset
        Asset --> AudioAsset
        Asset --> TextAsset
        Asset --> SubtitleAsset
    end

    subgraph Generators
        ScriptGenerator --> Scene
        SpeechSynthesizer --> AudioAsset
    end

    subgraph Integrations
        Adapter --> Media
        Adapter --> ScriptGenerator
    end

    classDef protocol fill:#f9f,stroke:#333,stroke-width:2px
    classDef base fill:#bbf,stroke:#333,stroke-width:2px
    classDef concrete fill:#dfd,stroke:#333,stroke-width:2px

    class Position,Effect,ScriptGenerator,SpeechSynthesizer,Adapter protocol
    class Media,Asset base
    class ImageAsset,AudioAsset,TextAsset,SubtitleAsset concrete