Spotify Subset for accent classification in Brazilian Portuguese

aryamtos/spotify-subset

Spotify Subset

The 'Spotify Subset' lists file names from the Spotify Dataset (Tanaka et al., 2022) selected for classifying language variation in Brazilian Portuguese. The file names were chosen by filtering the original dataset's metadata for idiomatic expressions and for names or acronyms of locations.
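The metadata filter described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the keyword lists, the field names (`show_description`, `episode_description`, `episode_filename`), and the helper functions are hypothetical, since this README does not give the actual expressions, location names, or metadata schema used to build the subset.

```python
import re
import unicodedata

# Hypothetical keyword lists (assumption): the real idiomatic expressions and
# location names/acronyms used for the subset are not listed in this README.
ACCENT_KEYWORDS = {
    "Recife": ["recife", "oxente"],
    "São Paulo": ["são paulo", "sp"],
}


def normalize(text):
    """Lowercase the text and strip diacritics so 'São' and 'sao' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_patterns(keywords_by_accent):
    """Compile one whole-word regex per accent from its keyword list."""
    patterns = {}
    for accent, keywords in keywords_by_accent.items():
        alternatives = "|".join(re.escape(normalize(k)) for k in keywords)
        patterns[accent] = re.compile(r"\b(?:" + alternatives + r")\b")
    return patterns


def select_file_names(rows, patterns,
                      text_fields=("show_description", "episode_description")):
    """Return, per accent, the file names whose metadata text matches a keyword.

    `rows` is an iterable of dicts; the field names are assumed, not taken
    from the original dataset documentation.
    """
    selected = {accent: [] for accent in patterns}
    for row in rows:
        text = normalize(" ".join(row.get(field, "") for field in text_fields))
        for accent, pattern in patterns.items():
            if pattern.search(text):
                selected[accent].append(row["episode_filename"])
    return selected
```

Matching on whole words keeps short acronyms such as "sp" from firing inside unrelated words. A keyword filter of this kind only proposes candidates, so the selected files would still need manual checking of the speakers' accents.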

Spotify A subset

General Table

| Speakers | Duration | Episodes | Female | Male |
| --- | --- | --- | --- | --- |
| 92 | ~15 hrs 24 min | 52 | 43 | 38 |

Subset A Information

Accent Speaker Duration Female Male
Rio de Janeiro 5 49 min 2 3
Bahia 4 1hr 27 min 4
Mato Grosso do Sul 4 18 min 3 1
Maranhão 7 1hr 18 min 2 3
Minas Gerais ~35 5hrs 23 min ~13 ~22
Recife 10 3hrs 45 min
São Paulo ~25 1hr 18 min ~19 ~7
Rio Grande do Sul 2 ~53 min 2

Spotify B subset

General Table

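| Accent | Train_speakers | Dev_speakers | Test_speakers | Podcasts | Episodes | Duration | Segments |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RE | 69 | 23 | 11 | 15 | 57 | ~48.23 | 14,008 |
| SP | 52 | 18 | 15 | 11 | 78 | ~30.88 | 11,906 |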
