A library and framework for synthesizing images of handwritten music notation, intended for creating training data for optical music recognition (OMR) models.
Try out the demo on 🤗 Hugging Face Spaces right now!
Example output with MUSCIMA writer no. 28 style:
Install from PyPI with:

```bash
pip install smashcima
```
Smashcima is a Python package primarily intended for use in OMR workflows, especially with domain adaptation in mind. The target user is a machine learning, document processing, library science, or computational musicology researcher with minimal skills in Python programming.
Smashcima is the only tool that simultaneously:
- synthesizes handwritten music notation,
- produces not only raster images but also segmentation masks, classification labels, bounding boxes, and more,
- synthesizes entire pages as well as individual symbols,
- synthesizes background paper textures,
- also synthesizes polyphonic and pianoform music images,
- accepts just MusicXML as input,
- is written in Python, which simplifies its adoption and extensibility.
Smashcima therefore brings a unique new capability to OMR: synthesizing a near-realistic image of handwritten sheet music from nothing but a MusicXML file. Unlike notation editors, which work with a fixed set of fonts and layout rules, it can adapt handwriting styles from existing OMR datasets to arbitrary music (beyond the music encoded in those datasets) and randomize the layout to simulate the imprecision of handwriting, while guaranteeing the semantic correctness of the rendered output. Crucially, the rendered image comes with the positions of all visual elements of the music notation, so that both object-detection-based and sequence-to-sequence OMR pipelines can use Smashcima as a synthesizer of training data.
(In combination with the LMX canonical linearization of MusicXML, Smashcima can even be run on the output of a MusicXML generator.)
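For a first impression, here is a minimal sketch of what a synthesis call can look like. The `BaseHandwrittenModel` orchestration class and its call signature are assumptions of this sketch, not a guaranteed API, so consult the tutorials below for the actual entry points:

```python
import smashcima as sc

# A minimal sketch, not the definitive API: the orchestration class name
# and its call signature are assumptions; see the tutorials for details.
# The intent: read a MusicXML file, synthesize a handwritten page image,
# and save the rendered bitmap.
model = sc.orchestration.BaseHandwrittenModel()
model("my-piece.musicxml", "my-piece.png")
```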
To quickly learn how to use Smashcima in your project, start with the tutorials:
Smashcima is primarily a framework, a set of carefully crafted interfaces for building custom synthesizers of visual data.
- Introduction
- Models and service orchestration
- Scene
- Scene objects
- Affine spaces and rendering
- Semantic music scene objects
- Visual music scene objects
- Synthesis
- Synthesizer interfaces
- Glyphs
- Style control
- Asset bundles
- ...
If you feel like improving the library, take a look at the TODO List.
Create a virtual environment and install dependencies:
```bash
python3 -m venv .venv
.venv/bin/pip3 install -e .

# to run jupyter notebooks:
.venv/bin/pip3 install -e .[jupyter]

# to run the gradio demo:
.venv/bin/pip3 install -e .[gradio]
```
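To quickly check that the editable install works, import the package from the virtual environment (this only verifies that `smashcima` is importable):

```bash
.venv/bin/python3 -c "import smashcima; print(smashcima.__name__)"
```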
This work has been carried out by the OmniOMR project within the 2023-2030 NAKI III programme, supported by the Ministry of Culture of the Czech Republic (DH23P03OVV008).
A publication about Smashcima is in preparation. Until then, you can cite the original Mashcima paper:
Jiří Mayer and Pavel Pecina. Synthesizing Training Data for Handwritten Music Recognition. In: 16th International Conference on Document Analysis and Recognition (ICDAR 2021), Lausanne, September 8-10, 2021, pp. 626-641.
Developed and maintained by Jiří Mayer ([email protected]) as part of the Prague Music Computing Group, led by Jan Hajič jr. ([email protected]).