Common Voice Scripted Speech 25.0 - Swahili
Publisher
Mozilla Foundation
Abstract
A collection of read speech recordings in Swahili (Kiswahili). This datasheet is for cv-corpus-25.0-2026-03-09 of the Mozilla Common Voice Scripted Speech dataset for Swahili [Kiswahili - sw]. The dataset contains 730,187 clips representing 1,064.02 hours of recorded speech (392.11 hours validated) from 1,518 speakers, recorded from a text corpus of 140,486 sentences.
This dataset is released under the Creative Commons Zero (CC-0) licence. By downloading this data you agree not to determine the identity of speakers in the dataset.
Description
<table border="1" cellpadding="8" cellspacing="0" style="border-collapse: collapse; width: 100%; font-family: Arial, sans-serif; font-size: 0.9em;">
<thead>
<tr style="background-color: #f2f2f2; text-align: left;">
<th>Code</th>
<th>Variant</th>
<th>Clips</th>
<th>Speakers</th>
</tr>
</thead>
<tbody>
<tr>
<td>sw-baratz</td>
<td>Kiswahili cha Bara ya Tanzania</td>
<td>22,486 (3.1%)</td>
<td>24 (1.6%)</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td>sw-sanifu</td>
<td>Kiswahili Sanifu (EA)</td>
<td>21,833 (3.0%)</td>
<td>69 (4.5%)</td>
</tr>
<tr>
<td>sw-barake</td>
<td>Kiswahili cha Bara ya Kenya</td>
<td>6,487 (0.9%)</td>
<td>59 (3.9%)</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td>sw-kingwana</td>
<td>Kingwana (DRC)</td>
<td>1,756 (0.2%)</td>
<td>34 (2.2%)</td>
</tr>
<tr>
<td>sw-kimvita</td>
<td>Kimvita (KE) — Central dialect</td>
<td>15 (0.0%)</td>
<td>2 (0.1%)</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td>sw-katanga</td>
<td>Katanga (DRC)</td>
<td>5 (0.0%)</td>
<td>1 (0.1%)</td>
</tr>
</tbody>
</table>
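The percentages in the table appear to be shares of the dataset totals given in the abstract (730,187 clips and 1,518 speakers). The following is a minimal Python sketch that recomputes them under that assumption; the variable and function names are illustrative and not part of any Common Voice tooling.

```python
# Totals taken from the abstract of this datasheet.
TOTAL_CLIPS = 730_187
TOTAL_SPEAKERS = 1_518

# (clips, speakers) per variant code, copied from the table above.
variants = {
    "sw-baratz": (22_486, 24),
    "sw-sanifu": (21_833, 69),
    "sw-barake": (6_487, 59),
    "sw-kingwana": (1_756, 34),
    "sw-kimvita": (15, 2),
    "sw-katanga": (5, 1),
}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Assumption: each percentage is relative to the whole dataset,
# not to the subset of clips that carry a variant label.
shares = {
    code: (share(clips, TOTAL_CLIPS), share(speakers, TOTAL_SPEAKERS))
    for code, (clips, speakers) in variants.items()
}

# Clips that carry one of the listed variant labels.
labelled_clips = sum(clips for clips, _ in variants.values())
labelled_share = share(labelled_clips, TOTAL_CLIPS)

for code, (clip_pct, speaker_pct) in shares.items():
    print(f"{code}: {clip_pct}% of clips, {speaker_pct}% of speakers")
print(f"Clips with a listed variant: {labelled_clips} ({labelled_share}%)")
```

Under this reading, the six listed variants together account for 52,582 clips, so most clips in the release carry none of these variant labels.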