A Python package for analyzing, visualizing, and generating synthetic birdsongs from recorded audio.
- Birdsongs
Design, development, and evaluation of a computational-physical model for generating synthetic birdsongs from recorded samples.
Study and Python implementation of the motor gestures for birdsongs model created by Prof. G. Mindlin. This model explains the physics of birdsong by simulating the organs involved in sound production in birds, including the syrinx, trachea, glottis, oro-esophageal cavity (OEC), and beak, using ordinary differential equations (ODEs).
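To give a feel for what simulating the syrinx with ODEs involves, here is a minimal sketch that integrates a two-variable labial oscillator with a hand-written Runge-Kutta step. The equations are the normal form commonly quoted in the motor gestures literature, recalled here as an assumption; the package's own implementation may differ, and the trachea, OEC, and beak are left out. The function names and the values of alpha, beta, and gamma are illustrative only.

```python
def labia_rhs(x, y, alpha, beta, gamma):
    """Assumed normal form: x is the labial position, y its velocity."""
    dx = y
    dy = (-alpha * gamma**2 - beta * gamma**2 * x - gamma**2 * x**3
          - gamma * x**2 * y + gamma**2 * x**2 - gamma * x * y)
    return dx, dy


def rk4_step(x, y, dt, alpha, beta, gamma):
    """Advance (x, y) by one classical fourth-order Runge-Kutta step."""
    k1x, k1y = labia_rhs(x, y, alpha, beta, gamma)
    k2x, k2y = labia_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, alpha, beta, gamma)
    k3x, k3y = labia_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, alpha, beta, gamma)
    k4x, k4y = labia_rhs(x + dt * k3x, y + dt * k3y, alpha, beta, gamma)
    x_new = x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
    y_new = y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return x_new, y_new


def simulate(alpha, beta, gamma=24000.0, duration=0.005, dt=1e-6, x0=0.01, y0=0.0):
    """Return the labial position sampled every dt seconds."""
    x, y = x0, y0
    positions = []
    for _ in range(int(round(duration / dt))):
        x, y = rk4_step(x, y, dt, alpha, beta, gamma)
        positions.append(x)
    return positions


# Illustrative control values (air sac pressure alpha, labial tension beta),
# not fitted to any bird.
labia = simulate(alpha=0.08, beta=0.15)
```

In this form, alpha and beta play the role of the pressure and tension controls, and gamma is the time scale constant; whether the labia oscillate or settle to rest depends on where (alpha, beta) lies in parameter space.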
This work presents an automated model for generating synthetic birdsongs that are comparable to real birdsongs in both spectrographic and temporal aspects. The model takes as input the motor gestures for birdsongs model and an audio recording of a real birdsong. Automation is achieved by formulating a minimization problem with three control parameters: the air sac pressure in the bird's bronchi, the labial tension of the syrinx walls, and a time scale constant. This optimization problem is solved using numerical methods, signal processing tools, and numerical optimization techniques. The objective function is based on the Fundamental Frequency (also called pitch, denoted as FF or F0) and the Spectral Content Index (SCI) of both synthetic and real syllables.
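As a rough illustration of an objective built from FF and SCI, the sketch below combines the mean relative errors of both features into one scalar. The equal weights, the mean absolute relative error, and the function names are assumptions made for this example; the scoring used by the package may be defined differently.

```python
def mean_relative_error(real, synth):
    """Mean of |real - synth| / |real| over paired samples."""
    if len(real) != len(synth) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - s) / abs(r) for r, s in zip(real, synth)) / len(real)


def objective(ff_real, ff_synth, sci_real, sci_synth, w_ff=1.0, w_sci=1.0):
    """Weighted sum of the FF and SCI relative errors (weights are assumptions)."""
    return (w_ff * mean_relative_error(ff_real, ff_synth)
            + w_sci * mean_relative_error(sci_real, sci_synth))


# Toy feature tracks: FF in Hz, SCI dimensionless.
score = objective([1000.0, 2000.0], [1100.0, 1800.0], [2.0, 4.0], [2.0, 3.0])
```

A smaller score means the synthetic syllable tracks the real one more closely in both pitch and spectral content.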
The model is tested and evaluated on three different Colombian bird species: Zonotrichia capensis, Ocellated Tapaculo, and Mimus gilvus, using recorded samples downloaded from Xeno-Canto and eBird audio libraries. The results show relative errors in FF and SCI of less than 10%, with comparable spectral harmonics in terms of number and frequency, as detailed in the Results section.
This repository contains the documentation, scripts, and results developed to achieve the proposed objective. The files and information are divided into branches as follows:
- main: Python package with model code implementation, tutorial examples, example data, and results.
- dissertation: LaTeX document of the bachelor's dissertation: Design, development, and evaluation of a computational physical model to generate synthetic birdsongs from recorded samples.
- gh-pages: Archives for the BirdSongs website, a more in-depth description of the package.
- results: Some results obtained from the tutorial examples: images, audios, and motor gesture parameters (.csv).
The physical model used, Motor Gestures for birdsongs [1], was developed by Professor G. Mindlin at the Dynamical Systems Laboratory (LSD, from its Spanish name) of the University of Buenos Aires, Argentina.
Schematic description of the physical model motor gestures for birdsongs with the organs involved in the sound production (syrinx, trachea, glottis, OEC, and beak) and their corresponding ODEs.
Figure 1. Motor gestures model diagram.
By leveraging the Object-Oriented Programming (OOP) paradigm, the need for lengthy code is minimized. Additionally, the execution and implementation of the model are efficient and straightforward, allowing for the creation and comparison of several syllables with a single line of code. To solve the optimization problem and to analyze and compare real and synthetic birdsong, five objects are created:
- BirdSong: Read an audio file using its file name and a paths object, and compute its spectral and temporal features. It can also split the audio into syllables (in progress).
- Syllable: Create a birdsong syllable from a birdsong object using a time interval that can be selected in the plot or defined as a list. The spectral and temporal features of the syllable are automatically computed.
- Optimizer: Create an optimizer that solves the minimization problem with the chosen method within a feasible region that can be modified. The default method is brute force, but it can be changed to leastsq, bfgs, newton, etc. Methods other than brute force require additional parameters; see the lmfit documentation for further information.
- Plotter: Visualize the birdsong and syllable objects and their spectral and temporal features. It also includes functionality to select points from the spectrum.
- Paths: Manage the package paths, audio files and results directories.
For each object an icon is defined as follows:
Figure 2. Objects implemented.
This approach simplifies the interpretation of the methodology diagram. Each icon represents an object that handles different tasks. The major advantage of this implementation is the ability to easily compare features between syllable or chunk (a small part of a syllable) objects.
Using the objects defined above, the optimization problem is solved by following the steps below:
Figure 3. Methodology diagram.
Each step includes the icon of the object involved. The final output is a parameters object (a data frame similar to the lmfit library parameters objects) containing the optimal control parameter coefficients for the motor gestures that best reproduce the real birdsong.
birdsongs is implemented in Python 3.8 and has also been tested with Python 3.10 and 3.11. The required packages are listed in the file requirements.txt.
To use birdsongs, clone the main branch of the repository and go to its root folder.
git clone -b main --single-branch https://github.com/saguileran/birdsongs.git
cd birdsongs
You can clone the whole repository using the code
git clone https://github.com/saguileran/birdsongs.git
but since the repository is very large, cloning only the main branch is enough. To switch branches, use the command git checkout followed by the name of the branch of interest.
The next step is to install the required packages. Either of the following commands will work:
pip install -r ./requirements.txt
python -m pip install -r ./requirements.txt
If you are using a Python version higher than 3.10, you must run the following command to listen to the audios:
pip install playsound@git+https://github.com/taconi/playsound
Now, install the birdsongs package.
python .\setup.py install
or using pip, any of the following command lines should work:
pip install -e .
pip install .
That's all. Now let's create a synthetic birdsong!
Take a look at the tutorial notebooks for basic uses: the physical model implementation, motor-gestures.ipynb; defining and generating a syllable from a recorded birdsong, syllable.ipynb; or generating a whole birdsong with several syllables, birdsong.ipynb.
Import the birdsongs package as bs
import birdsongs as bs
from birdsongs.utils import *
First, define the plotter and paths objects. Optionally, you can specify the audio folder or enable the plotter to save figures.
root = r"root\path\files" # raw strings keep the backslashes literal; default ..\examples\
audios = r"path\to\audios" # default ..\examples\audios\
results = r"path\to\results" # default ..\examples\results\
paths = bs.Paths(root, audios, results)
plotter = bs.Ploter(save=True) # images are saved at ./examples/results/Images/
Display the audio file names found with the paths.AudiosFiles(True) function. If the folder contains a spreadsheet.csv file, this function displays all the information about the files inside the folder; otherwise, it displays the audio file names found.
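The listing behaviour described above can be pictured with the standalone sketch below. It is not the package's implementation: the function name, the returned row format, and the set of audio extensions are assumptions made for illustration.

```python
import csv
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3"}  # assumed set of supported formats


def list_audio_files(folder, show=True):
    """Return spreadsheet.csv rows if the file exists, else the audio file names."""
    folder = Path(folder)
    sheet = folder / "spreadsheet.csv"
    if sheet.exists():
        # One dictionary per spreadsheet row, keyed by the header line.
        with sheet.open(newline="") as handle:
            rows = [dict(row) for row in csv.DictReader(handle)]
    else:
        rows = [{"file": path.name} for path in sorted(folder.iterdir())
                if path.suffix.lower() in AUDIO_EXTENSIONS]
    if show:
        for row in rows:
            print(row)
    return rows
```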
Define and plot the sound wave and spectrogram of the sample XC11293. You can use both MP3 and WAV files, but on Windows you may get errors from librosa.load.
birdsong = bs.BirdSong(paths, file_id="XC11293", NN=512, umbral_FF=1., Nt=500,
tlim=(0,60), flim=(100,20e3) # other features
)
plotter.Plot(birdsong, FF_on=False) # plot the wave sound and spectrogram without FF
birdsong.Play() # listen to the birdsong
Note
The parameter Nt is related to the envelope of the waveform: use a large Nt for long audios and a small Nt for short audios.
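One way to see why the window size should scale with the audio length is the block-wise peak envelope sketched below. Treating Nt as a block length in samples is an assumption made for this example; the package may compute its envelope differently.

```python
def peak_envelope(signal, Nt):
    """Maximum of |signal| over consecutive blocks of Nt samples."""
    if Nt < 1:
        raise ValueError("Nt must be a positive integer")
    return [max(abs(sample) for sample in signal[start:start + Nt])
            for start in range(0, len(signal), Nt)]


toy_wave = [0.1, -0.5, 0.3, -0.2, 0.9, 0.0, -0.4]
coarse = peak_envelope(toy_wave, Nt=3)  # few blocks, smooth outline
fine = peak_envelope(toy_wave, Nt=1)    # one value per sample, follows every wiggle
```

With this reading, a larger block gives a smoother outline for a long recording, while a short syllable needs a smaller block to keep enough points.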
Define the syllable using a time interval of interest and the previous birdsong object. The syllable inherits the birdsong attributes (NN, flim, paths, etc.). To select the two time interval points (start and end of the syllable) from the plot, change the SelectTime_on argument of the plotter.Plot() function to True.
# select time intervals from the birdsong plot; you can select a single pair
plotter.Plot(birdsong, FF_on=False, SelectTime_on=True) # select
Then, define a birdsong syllable with the time interval selected
time_intervals = Positions(plotter.klicker) # save
print(time_intervals) # display
# define the syllable object
syllable = bs.Syllable(birdsong, tlim=time_intervals[0], Nt=30, ide="syllable")
plotter.Plot(syllable, FF_on=True)
Important
The algorithm used to calculate the fundamental frequency does not perform well at the extremes of the syllable. To avoid issues, do not select the exact extremes; instead, choose a slightly shorter segment of the syllable.
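A small helper can apply this advice automatically by trimming a fraction of the interval at both ends. The helper name and the default 5% margin are hypothetical choices for illustration, not part of the package.

```python
def shrink_interval(t_start, t_end, margin=0.05):
    """Trim `margin` (a fraction of the duration) from both ends of an interval."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    if not 0.0 <= margin < 0.5:
        raise ValueError("margin must be in [0, 0.5)")
    pad = margin * (t_end - t_start)
    return (t_start + pad, t_end - pad)


# A syllable selected from 1.0 s to 2.0 s, trimmed by 10% on each side.
safe_tlim = shrink_interval(1.0, 2.0, margin=0.1)
```

The returned tuple can then be used as the time interval of the syllable instead of the raw selection.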
Now let's define the optimizer object to generate the synthetic syllable by solving the optimization problem. First, create the optimizer object by specifying the optimization method, its parameters, and the syllable of interest.
brute_kwargs = {'method':'brute', 'Ns':11} # optimization method, Ns is the number of grid points
optimizer = bs.Optimizer(syllable, brute_kwargs) # optimizer object
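To make the role of Ns concrete, the sketch below runs a plain grid search over two parameters using only the standard library. It illustrates the brute-force idea, not the lmfit implementation that the package relies on; the function names, bounds, and toy objective are assumptions.

```python
from itertools import product


def grid(lo, hi, ns):
    """Return ns evenly spaced points from lo to hi, both included."""
    return [lo + (hi - lo) * i / (ns - 1) for i in range(ns)]


def brute_force(objective, bounds, ns=11):
    """Evaluate the objective on every grid node and return the best one."""
    axes = [grid(lo, hi, ns) for lo, hi in bounds]
    best = min(product(*axes), key=lambda point: objective(*point))
    return best, objective(*best)


def toy_objective(a, b):
    """Toy bowl with its minimum at (0.2, -0.4), which lies on the grid."""
    return (a - 0.2) ** 2 + (b + 0.4) ** 2


best_point, best_value = brute_force(toy_objective, [(-1.0, 1.0), (-1.0, 1.0)], ns=11)
```

The number of evaluations grows as Ns raised to the number of parameters (121 here), which is why the grid resolution is a trade-off between accuracy and run time.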
Then, execute the solver to find the optimal time scale constant and the optimal motor gesture parameters (labial tension and air sac pressure variables)
optimal_gm = optimizer.OptimalGamma(syllable) # find optimal gamma (time scale constant)
optimizer.OptimalParams(syllable, Ns=11) # find optimal parameters coefficients
#syllable, synth_syllable = optimizer.SongByTimes(time_intervals) # find optimal parameters over several time intervals
Next, define the synthetic syllable object with the optimal values found above.
synth_syllable = syllable.Solve(syllable.p)
Finally, visualize and write the optimal synthetic audio.
plotter.Plot(synth_syllable); # sound wave and spectrogram of the synthetic syllable
plotter.PlotVs(synth_syllable); # physical model variables over the time
plotter.PlotAlphaBeta(synth_syllable); # motor gesture curve in the parametric space
plotter.Syllables(syllable, synth_syllable); # synthetic and real syllables spectrograms and waveforms
plotter.Result(syllable, synth_syllable); # scoring variables and other spectral features
birdsong.WriteAudio(); synth_syllable.WriteAudio(); # write both audios at ./examples/results/Audios
Note
To generate a single synthetic syllable (or chunk), you must have defined a birdsong (or syllable). The process is as follows:
- Define a path object.
- Define a birdsong object using the above path object; it requires the audio file id. You can also set the length of the FFT window and the threshold (umbral) for computing the FF, among others.
- Select or define the time intervals of interest.
- Define an optimization object with a dictionary of the method name and its parameters.
- Find the optimal gammas for all the time intervals, or for a single one, and average them.
- Find and export the optimal labia parameters for each syllable, the motor gesture curve.
- Generate synthetic birdsong from the previous control parameters found.
- Visualize and save all the syrinx, scoring, and result variables.
- Save both synthetic and real syllable audios.
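The steps above can be gathered into one helper, sketched below for a single syllable. Every birdsongs call and argument value is copied from the tutorial code in this README; only the wrapper function and its name are hypothetical. It has not been checked against a specific release, and the plotting steps are left out.

```python
def synthesize_syllable(root, audios, results, file_id, tlim, Ns=11):
    """Run the single-syllable workflow and return the real and synthetic syllables."""
    # Imported here so the helper can be defined without the package installed.
    import birdsongs as bs

    paths = bs.Paths(root, audios, results)
    birdsong = bs.BirdSong(paths, file_id=file_id, NN=512, umbral_FF=1., Nt=500,
                           tlim=(0, 60), flim=(100, 20e3))
    syllable = bs.Syllable(birdsong, tlim=tlim, Nt=30, ide="syllable")

    optimizer = bs.Optimizer(syllable, {'method': 'brute', 'Ns': Ns})
    optimizer.OptimalGamma(syllable)          # optimal time scale constant
    optimizer.OptimalParams(syllable, Ns=Ns)  # optimal motor gesture coefficients

    synth_syllable = syllable.Solve(syllable.p)
    birdsong.WriteAudio()
    synth_syllable.WriteAudio()
    return syllable, synth_syllable
```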
The repository includes some audio examples in the ./examples/audios folder. You can download and store your own audios in the same folder, or pass the path of your audio folder to the Paths object.
The audios can be in WAV or MP3 format. If you prefer the WAV format, we suggest using Audacity to convert the audios.
The model is tested and evaluated with different syllables of the birdsong of the Rufous Collared Sparrow. Results are located at examples/examples, images and audios. For more information visit the project website birdsongs or access the results page.
Simple syllable of a birdsong of the Rufous Collared Sparrow
Simple syllable of a birdsong of the Ocellated Tapaculo - Acropternis
The PDF document of the bachelor's thesis, Design, development, and evaluation of a computational physical model to generate synthetic birdsongs from recorded samples, is stored in the dissertation branch of this repository.
Some of the applications of this model are:
- Data Augmentation: use the model to create numerous synthetic syllables. This can be done by creating a synthetic birdsong and then varying its motor gesture parameters to obtain similar birdsongs.
- Birdsongs Descriptions: characterize and compare birdsongs using the motor gestures parameters.
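For the data augmentation use case, the parameter variation could look like the sketch below, which applies a small random relative perturbation to a set of fitted coefficients. The parameter names, their values, and the 5% spread are hypothetical; in practice, each variant would be passed back to the model to synthesize a new syllable.

```python
import random


def jitter_parameters(params, n_variants=5, rel_sigma=0.05, seed=0):
    """Return n_variants copies of params, each value scaled by 1 + N(0, rel_sigma)."""
    rng = random.Random(seed)  # seeded so the variants are reproducible
    return [
        {name: value * (1.0 + rng.gauss(0.0, rel_sigma)) for name, value in params.items()}
        for _ in range(n_variants)
    ]


# Hypothetical fitted coefficients, for illustration only.
fitted = {"a0": 0.11, "b0": -0.1, "gamma": 40000.0}
variants = jitter_parameters(fitted, n_variants=5)
```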