Common Voice Scripted Speech 25.0 - Swahili
| dc.contributor.author | Mozilla Foundation | |
| dc.date.accessioned | 2026-03-24T15:49:16Z | |
| dc.date.issued | 2026-03-23 | |
| dc.description | Clips and speakers by variant (code, variant, clips, speakers): sw-baratz, Kiswahili cha Bara ya Tanzania, 22,486 clips (3.1%), 24 speakers (1.6%); sw-sanifu, Kiswahili Sanifu (EA), 21,833 clips (3.0%), 69 speakers (4.5%); sw-barake, Kiswahili cha Bara ya Kenya, 6,487 clips (0.9%), 59 speakers (3.9%); sw-kingwana, Kingwana (DRC), 1,756 clips (0.2%), 34 speakers (2.2%); sw-kimvita, Kimvita (KE), Central dialect, 15 clips (0.0%), 2 speakers (0.1%); sw-katanga, Katanga (DRC), 5 clips (0.0%), 1 speaker (0.1%). | |
| dc.description.abstract | A collection of read speech recordings in Swahili (Kiswahili). This datasheet is for cv-corpus-25.0-2026-03-09 of the Mozilla Common Voice Scripted Speech dataset for Swahili [Kiswahili - sw]. The dataset contains 730,187 clips representing 1,064.02 hours of recorded speech (392.11 hours validated) from 1,518 speakers, recorded from a text corpus of 140,486 sentences. This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to determine the identity of speakers in the dataset. | |
| dc.identifier.uri | https://datacollective.mozillafoundation.org/datasets/cmn3ailbd008nmb07mjyu3xro | |
| dc.identifier.uri | https://dspace.fiti.info/handle/00254/43 | |
| dc.publisher | Mozilla Foundation | |
| dc.subject | Kiswahili cha Bara ya Tanzania | |
| dc.subject | Kiswahili Sanifu (EA) | |
| dc.subject | Kiswahili cha Bara ya Kenya | |
| dc.subject | Kingwana (DRC) | |
| dc.subject | Kimvita (KE) | |
| dc.subject | Katanga (DRC) | |
| dc.title | Common Voice Scripted Speech 25.0 - Swahili | |
| dc.type | Dataset |
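
The percentages in the variant breakdown appear to be shares of the dataset totals quoted in the abstract (730187 clips, 1518 speakers). The record does not state this explicitly, so treat it as an assumption; the sketch below simply recomputes the shares from the published counts as a consistency check. The names `TOTAL_CLIPS`, `TOTAL_SPEAKERS`, `VARIANTS`, and `share` are illustrative and not part of the dataset or any Common Voice tooling.

```python
# Totals quoted in the abstract of this record.
TOTAL_CLIPS = 730_187
TOTAL_SPEAKERS = 1_518

# (clips, speakers) per variant code, copied from the description field.
VARIANTS = {
    "sw-baratz": (22_486, 24),
    "sw-sanifu": (21_833, 69),
    "sw-barake": (6_487, 59),
    "sw-kingwana": (1_756, 34),
    "sw-kimvita": (15, 2),
    "sw-katanga": (5, 1),
}


def share(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


# Assumption: each percentage is relative to the dataset-wide total.
shares = {
    code: (share(clips, TOTAL_CLIPS), share(speakers, TOTAL_SPEAKERS))
    for code, (clips, speakers) in VARIANTS.items()
}

# Number of clips that carry any variant tag at all.
tagged_clips = sum(clips for clips, _ in VARIANTS.values())

for code, (clip_pct, speaker_pct) in shares.items():
    print(f"{code}: {clip_pct}% of clips, {speaker_pct}% of speakers")
print(f"variant-tagged clips: {tagged_clips} of {TOTAL_CLIPS}")
```

Under that assumption the recomputed shares agree with the published figures, and only a small fraction of clips carries a variant tag, so the six variants should not be read as a partition of the whole dataset.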