An enhanced automatic stylizer for pitch (contour) of speech corpora based on SLAM.
Luigi (Yu-Cheng) Liu
- [email protected]

Anne Lacheret-Dujour
- [email protected]

Emmett Strickland
- [email protected]

Nicolas Obin
- [email protected]

Julie Belião
SLAM+: SLAM+ is a family of stylization software derived from the first iteration of SLAM [4]. It is a data-driven, language-independent tool for annotating the pitch contours of speech corpora. It integrates an algorithm for the automatic stylization and labelling of melodic contours, developed to process intonation. The method generates contours bottom-up, and the underlying algorithm has the following features:
- Model-agnostic Approach:
  - Stylized melodic contours are derived directly from a manually cleaned (denoised) pitch signal.
- Time-Frequency Representation:
  - Melodic contours, simple or complex, are described through a simple time-frequency representation.
  - The melodic contours are automatically represented with a vocabulary of tonal labels: `L`, `l`, `m`, `h`, `H`.
- User-defined Linguistic Units:
  - Melodic contours are used to describe various linguistic units as specified by users.
  - The linguistic variation concerns:
    - the nature (pragmatic, syntactic, phonological) of the unit
    - the size of the unit (from the syllable to larger prosodic and syntactic units)
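As a rough illustration of the tonal-label vocabulary, the sketch below maps a pitch value to one of the five labels by its distance, in semitones, from a register reference. The zone boundaries (±2 and ±6 semitones), the function names, and the reduction of a contour to its start and end labels are assumptions made for this example; they are not taken from the SLAM+ source.

```python
import math

# Illustrative zone boundaries in semitones relative to the register
# reference (assumed values, not read from the SLAM+ source).
BOUNDS = [(-6.0, "L"), (-2.0, "l"), (2.0, "m"), (6.0, "h")]


def hz_to_semitones(f0_hz, ref_hz):
    """Distance of f0_hz from ref_hz, in semitones."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def tonal_label(f0_hz, ref_hz):
    """Map one pitch value to one of the five labels L, l, m, h, H."""
    st = hz_to_semitones(f0_hz, ref_hz)
    for upper, label in BOUNDS:
        if st < upper:
            return label
    return "H"  # above the highest boundary


def stylize(f0_points, ref_hz):
    """Label the first and last pitch values of a contour (simplified)."""
    return tonal_label(f0_points[0], ref_hz) + tonal_label(f0_points[-1], ref_hz)
```

With a reference of 200 Hz, a contour rising from 200 Hz to 260 Hz would come out as a mid-to-high movement under these assumed boundaries.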
SLAM+, the second major iteration, adds two enhancements:
- Supported source data:
  - a pair of
    - Praat `PitchTier` (binary or short text) file (`.PitchTier`)
    - associated Praat `TextGrid` file (`.TextGrid`)
  - Praat `Collection` file in binary format (`.Collection`)
  - Analor `JavaObj` file (`.or`)
- Generate a double stylization based on:
  - Global register (calculated on the classic account of intonational register)
  - Parametrizable local register (computed on a short-term account of intonational register)
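The difference between the two registers can be sketched as follows. This is a minimal illustration only: it assumes the register is a plain mean of F0 values in Hz, and the `window` parameter and both function names are hypothetical, not part of SLAM+.

```python
from statistics import mean


def global_register(points, support):
    """Mean F0 (Hz) over the whole support interval (start, end).

    `points` is a list of (time_in_seconds, f0_in_hz) pairs.
    """
    start, end = support
    return mean(f0 for t, f0 in points if start <= t <= end)


def local_register(points, target, window=1.0):
    """Mean F0 (Hz) in a window of `window` seconds centred on the target.

    The window length stands in for the "parametrizable" part; it is an
    assumed parameter for this sketch.
    """
    centre = (target[0] + target[1]) / 2.0
    lo, hi = centre - window / 2.0, centre + window / 2.0
    return mean(f0 for t, f0 in points if lo <= t <= hi)
```

Stylizing the same target against both references yields two label sequences, one relative to the long-term register and one relative to the short-term register.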
The third iteration, SLAM3, adds several additional functionalities.
- An automatic correction of minor overlaps between F0 contours and alignments
- Integration of the glissando threshold for an improved perceptual modelling of short-duration units
- An automatic correction of F0 microvariations
- Exportable tabular data files
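To illustrate what a glissando threshold does for short units, the sketch below uses the form G = c / T², where T is the duration in seconds and G is the minimum rate of pitch change in semitones per second for a movement to be heard as a glide. The coefficient 0.16 is a value commonly cited in the perception literature; whether SLAM3 uses this coefficient, and both function names, are assumptions.

```python
def glissando_threshold(duration_s, coefficient=0.16):
    """Minimum rate of pitch change (semitones/second) for a glide of
    the given duration to be perceived as a movement: G = c / T**2."""
    return coefficient / duration_s ** 2


def is_perceived_glide(delta_st, duration_s, coefficient=0.16):
    """True if a pitch change of delta_st semitones over duration_s
    seconds exceeds the glissando threshold (assumed coefficient)."""
    rate = abs(delta_st) / duration_s
    return rate > glissando_threshold(duration_s, coefficient)
```

Under these assumptions, a 1-semitone change over 0.1 s falls below the threshold and would be treated as level, whereas the same change over 0.2 s exceeds it.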
The figure below shows pitch contours and their analysis by SLAM+. The contours realize the utterance 'euh on est partis au Portugal complètement' ('Uh, we went to Portugal entirely.') (Rhap-D1003). The analysis uses a `support` and `target` configuration defined as follows: `support` is the temporal interval used to compute the global register of the targets, and `target` is the temporal interval over which a melodic contour is computed. As the target labels indicate, 'euh' and 'on est partis au Portugal complètement' are marked respectively as `N[Assos_N_U]` (discourse marker) and `N` (the nucleus) of a speech act.
Fig 1. Example of an analysis carried out by SLAM+ on a sample of the Rhapsodie Spoken French corpus [3].
Installation for MacOS:
- Install Python3 under MacOS. For more information, users are referred to this installation guide, which we find very helpful.
- Download or clone SLAMplus.
- Install the following libraries required by SLAM+ via pip3:
    sudo pip3 install numpy scipy matplotlib pandas sympy nose chardet
Installation for Linux:
- Download or clone SLAMplus.
- Install the following libraries required by SLAM+:
    sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose python-tgt python-scikit-learn
Installation for Windows:
- Download SLAMplus.
- Choose a full version of WinPython and download it.
- Put the decompressed content of `SLAMplus` in the WinPython sub-directory that contains `python.exe`.
Usage:
- Drop your `PitchTier` and `TextGrid` files in the `data` sub-directory of the corresponding `SLAMplus` directory. Each `PitchTier` file must be paired with a `TextGrid` file of the same name. As an example:
myfile1.PitchTier
myfile1.TextGrid
myfile2.PitchTier
myfile2.TextGrid
- Open a terminal and go to the `SLAMplus` directory.
- Execute:
  - for Linux: `python SLAMplus.py`
  - for Windows: `python.exe SLAMplus.py`
- Follow the instructions.
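Before running the tool, the pairing requirement on the `data` directory can be checked with a small helper such as the one below. This is a convenience sketch, not part of SLAMplus; the function name `unpaired_files` is hypothetical, and it only covers the `.PitchTier` / `.TextGrid` pair format.

```python
from pathlib import Path


def unpaired_files(data_dir):
    """Return the file names still needed to complete each
    PitchTier/TextGrid pair in data_dir (empty list if all are paired)."""
    data_dir = Path(data_dir)
    pitch = {p.stem for p in data_dir.glob("*.PitchTier")}
    grids = {p.stem for p in data_dir.glob("*.TextGrid")}
    # A PitchTier without a TextGrid, or the reverse, breaks the pairing.
    missing = [f"{stem}.TextGrid" for stem in sorted(pitch - grids)]
    missing += [f"{stem}.PitchTier" for stem in sorted(grids - pitch)]
    return missing
```

Calling `unpaired_files("data")` from the `SLAMplus` directory lists any partner files that would need to be added.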
Configuration of SLAM+ to suit your work:
- Open `SLAMplus.py` in the `SLAMplus` working folder with a text editor (Notepad++ is recommended).
- Edit the values of `SpeakerTier`, `TargetTier` and `TagTier`.
Note: These values name tiers in the corresponding `TextGrid` files. `SupportTier` (as valued in this work) is the name of the tier that delimits the largest units used for register estimation. `TargetTier` is the name of the tier that bounds the units of stylization. `TagTier` provides additional descriptive information about the contents; it is used to compare and check the details of `SpeakerTier` and `TargetTier`.
For the following examples (NaijaSynCor project: JOS_01_V___MDT), we use the same `TextGrid` file, which provides 4 annotation tiers:
- Syllables (`Syl`)
- Prosodic Word (`PrWd`)
- Prosodic Phrase (`PP`)
- Large Prosodic Unit (`LPU`)
Note that only `TargetTier` varies in these examples, while `SupportTier` and `TagTier` are fixed as `LPU` and `PrWd`, respectively.
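The two example configurations can be summarized as in the sketch below. The dictionary layout and the helper `make_config` are hypothetical and serve only to make the fixed and varying values explicit; the actual variable names and layout in `SLAMplus.py` may differ.

```python
# Hypothetical summary of the example configurations; the variable
# names follow the text above and may differ from those in SLAMplus.py.
FIXED = {"SupportTier": "LPU", "TagTier": "PrWd"}

# Tier names available in the example TextGrid file.
TIERS = {"Syl", "PrWd", "PP", "LPU"}


def make_config(target_tier):
    """Combine the fixed tiers with the chosen target tier."""
    if target_tier not in TIERS:
        raise ValueError(f"unknown tier: {target_tier}")
    return {**FIXED, "TargetTier": target_tier}


syllable_config = make_config("Syl")  # syllables as target
phrase_config = make_config("PP")     # prosodic phrases as target
```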
Fig 2. Input TextGrid file used in examples
Fig 3. Configuration for Syllables (Syl) as target
Fig 4. A sample analysis result
Fig 5. Configuration for Prosodic Phrase (PP) as target
Fig 6. A sample analysis result
L. Liu, A. Lacheret-Dujour, N. Obin (2019). Automatic Modelling and Labelling of Speech Prosody: What's New with SLAM+? In ICPhS (to appear).
[1] Camacho, A. (2007). SWIPE: A sawtooth waveform inspired pitch estimator for speech and music. Gainesville: University of Florida.
[2] Cleveland, W. S. (1981). LOWESS: A program for smoothing scatterplots by robust locally weighted regression. American Statistician, 35(1), 54.
[3] Lacheret, A., Kahane, S., Beliao, J., Dister, A., Gerdes, K., Goldman, J. P., ... & Tchobanov, A. (2014, May). Rhapsodie: a prosodic-syntactic treebank for spoken French. In Language Resources and Evaluation Conference.
[4] N. Obin, J. Beliao, C. Veaux, A. Lacheret (2014). SLAM: Automatic Stylization and Labelling of Speech Melody. Speech Prosody, 246-250.
[5] Deulofeu, J., Duffort, L., Gerdes, K., Kahane, S., & Pietrandrea, P. (2010, July). Depends on what the French say: spoken corpus annotation with and beyond syntactic functions. In Proceedings of the Fourth Linguistic Annotation Workshop (pp. 274-281). Association for Computational Linguistics.
[6] Oyelere S. Abiola, Candide Simard and Anne Lacheret (2018). Prominence in the Identification of Focussed Elements in Naija. In Workshop on the Processing of Prosody across Languages and Varieties (Proslang).