Common Voice Scripted Speech 25.0 - Swahili

dc.contributor.author: Mozilla Foundation
dc.date.accessioned: 2026-03-24T15:49:16Z
dc.date.issued: 2026-03-23
dc.description: Clips and speakers per Swahili variant (percentages are shares of all 730,187 clips and all 1,518 speakers):
  Code         Variant                          Clips (% of total)   Speakers (% of total)
  sw-baratz    Kiswahili cha Bara ya Tanzania   22,486 (3.1%)        24 (1.6%)
  sw-sanifu    Kiswahili Sanifu (EA)            21,833 (3.0%)        69 (4.5%)
  sw-barake    Kiswahili cha Bara ya Kenya      6,487 (0.9%)         59 (3.9%)
  sw-kingwana  Kingwana (DRC)                   1,756 (0.2%)         34 (2.2%)
  sw-kimvita   Kimvita (KE), Central dialect    15 (0.0%)            2 (0.1%)
  sw-katanga   Katanga (DRC)                    5 (0.0%)             1 (0.1%)
dc.description.abstract: A collection of read speech recordings in Swahili (Kiswahili). This datasheet is for cv-corpus-25.0-2026-03-09 of the Mozilla Common Voice Scripted Speech dataset for Swahili [Kiswahili - sw]. The dataset contains 730,187 clips representing 1,064.02 hours of recorded speech (392.11 hours validated) from 1,518 speakers, recorded from a text corpus of 140,486 sentences. This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data, you agree not to determine the identity of speakers in the dataset.
dc.identifier.uri: https://datacollective.mozillafoundation.org/datasets/cmn3ailbd008nmb07mjyu3xro
dc.identifier.uri: https://dspace.fiti.info/handle/00254/43
dc.publisher: Mozilla Foundation
dc.subject: Kiswahili cha Bara ya Tanzania
dc.subject: Kiswahili Sanifu (EA)
dc.subject: Kiswahili cha Bara ya Kenya
dc.subject: Kingwana (DRC)
dc.subject: Kimvita (KE)
dc.subject: Katanga (DRC)
dc.title: Common Voice Scripted Speech 25.0 - Swahili
dc.type: Dataset
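The percentages in the dc.description variant table can be reproduced from the raw counts. The Python sketch that follows assumes the denominators are the totals stated in the abstract (730,187 clips and 1,518 speakers). The record does not say this explicitly, but the assumption matches every row to one decimal place. The names VARIANTS, share and variant_shares are hypothetical helpers for illustration only.

```python
# Per-variant (clips, speakers) counts copied from the dc.description table.
VARIANTS: dict[str, tuple[int, int]] = {
    "sw-baratz": (22_486, 24),
    "sw-sanifu": (21_833, 69),
    "sw-barake": (6_487, 59),
    "sw-kingwana": (1_756, 34),
    "sw-kimvita": (15, 2),
    "sw-katanga": (5, 1),
}

# Totals from dc.description.abstract; ASSUMED (not stated in the record)
# to be the denominators used for the table's percentages.
TOTAL_CLIPS = 730_187
TOTAL_SPEAKERS = 1_518


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def variant_shares() -> dict[str, tuple[float, float]]:
    """Map each variant code to its (clip %, speaker %) pair."""
    return {
        code: (share(clips, TOTAL_CLIPS), share(speakers, TOTAL_SPEAKERS))
        for code, (clips, speakers) in VARIANTS.items()
    }


if __name__ == "__main__":
    for code, (clip_pct, spk_pct) in variant_shares().items():
        print(f"{code:12} clips {clip_pct:4.1f}%  speakers {spk_pct:4.1f}%")
    # Clip rows are summed only; speaker sets may overlap across variants.
    tagged_clips = sum(clips for clips, _ in VARIANTS.values())
    print(f"all six variants: {share(tagged_clips, TOTAL_CLIPS)}% of clips")
```

Summing the clip column shows the six listed variants together cover about 7.2% of all clips. The speaker column is deliberately not summed, because the table does not say whether a speaker can appear under more than one variant.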

Files