Kidaw'ida
| dc.contributor.author | Mbogho, A., Awuor, Q., Kipkebut, A., Wanzare, L., & Oloo, V. | |
| dc.date.accessioned | 2026-03-24T17:03:26Z | |
| dc.date.issued | 2025-01-19 | |
| dc.description | A collection of read speech recordings in Kidaw'ida (dav). This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to determine the identity of speakers in the dataset. | |
| dc.description.abstract | This datasheet is for cv-corpus-25.0-2026-03-09 of the Mozilla Common Voice Scripted Speech dataset for Kidaw'ida [dav - dav]. The dataset contains 49,630 clips representing 55.95 hours of recorded speech (9.31 hours validated) from 24 speakers, recorded from a text corpus of 31,892 sentences. | |
| dc.identifier.citation | Mbogho, A., Awuor, Q., Kipkebut, A., Wanzare, L., & Oloo, V. (2025). Building low-resource African language corpora: A case study of Kidaw'ida, Kalenjin and Dholuo. arXiv. https://doi.org/10.48550/arXiv.2501.11003 | |
| dc.identifier.uri | https://datacollective.mozillafoundation.org/datasets/cmn2e92b801limm07whtrn9te | |
| dc.identifier.uri | https://dspace.fiti.info/handle/00254/49 | |
| dc.publisher | Mozilla Foundation | |
| dc.subject | Kidaw'ida | |
| dc.subject | Natural language processing | |
| dc.subject | Low-resource languages | |
| dc.subject | African languages | |
| dc.subject | Corpus building | |
| dc.subject | Crowdsourcing | |
| dc.title | Kidaw'ida | |
| dc.title.alternative | Common Voice Scripted Speech 25.0 - Kidaw'ida | |
| dc.type | Dataset | |
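
The figures in the abstract imply a few derived statistics that may help when sizing experiments: a mean clip length of roughly four seconds, and a validated share of about one sixth of the recorded hours. The sketch below recomputes them from the numbers quoted in `dc.description.abstract`. The function name `summarize` and the dictionary keys are illustrative choices, not part of the dataset or of any Common Voice tooling, and the per-speaker mean assumes nothing about how evenly clips are actually spread across speakers.

```python
# Figures copied from the dc.description.abstract field of this record.
CLIPS = 49630            # total number of clips
TOTAL_HOURS = 55.95      # total recorded speech, in hours
VALIDATED_HOURS = 9.31   # validated speech, in hours
SPEAKERS = 24            # number of speakers


def summarize(clips, total_hours, validated_hours, speakers):
    """Return simple derived statistics (hypothetical helper, for illustration)."""
    return {
        # Average clip duration in seconds.
        "mean_clip_seconds": total_hours * 3600 / clips,
        # Fraction of recorded hours that have been validated.
        "validated_share": validated_hours / total_hours,
        # Arithmetic mean only; the real distribution may be highly skewed.
        "mean_clips_per_speaker": clips / speakers,
    }


stats = summarize(CLIPS, TOTAL_HOURS, VALIDATED_HOURS, SPEAKERS)

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

These values are only as precise as the rounded hour totals in the abstract; per-clip durations from the dataset's own metadata files would be the better source when exact numbers matter.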