Gikuyu

dc.contributor.author: Deutscher Wortschatz
dc.date.accessioned: 2026-03-25T03:06:08Z
dc.date.issued: 2017-06-01
dc.description: This corpus was built through the CURL initiative, which invites community members to submit URLs of web pages in under-resourced languages for automated crawling and processing by the Leipzig pipeline. The resulting resource follows the standard Leipzig Corpora Collection format, enabling direct comparison with corpora in over 250 other languages built with identical methodology. The corpus captures contemporary written Gikuyu as it appears in online contexts, including news sites, community platforms, and digital publications, and therefore reflects modern orthographic conventions and registers rather than literary or archival language.
dc.description.abstract: This item record provides access to the Gikuyu (Kikuyu) corpus contributed to the Leipzig Corpora Collection (LCC) via the CURL (Crawling Under-Resourced Languages) initiative at the University of Leipzig. The corpus consists of randomly selected sentences in Gikuyu harvested from web-crawled sources and processed through the standard LCC pipeline: sentence segmentation, removal of non-sentences and foreign-language material, and random reordering to destroy original document structure in compliance with copyright requirements. Word co-occurrence statistics at sentence level have been precomputed and are included with the data. Limitations: As a web-crawled corpus, the resource may contain domain imbalance skewed toward topics with higher digital representation in Gikuyu (religious texts, political commentary, diaspora media). Quality filtering removes obvious non-Gikuyu content but does not guarantee the absence of code-switching with Kiswahili or English. ISO 639-3: kik. Glottolog: kiku1240. Source: web texts submitted via the CURL community URL-contribution platform. Licence: Creative Commons Attribution (CC BY).
dc.identifier.citation: T. Eckart and U. Quasthoff: Statistical Corpus and Language Comparison on Comparable Corpora. In: Building and Using Comparable Corpora, Springer-Verlag Berlin Heidelberg, 2013, ISBN: 978-3-642-20128-8
dc.identifier.uri: https://curl.corpora.uni-leipzig.de/languages/kik
dc.identifier.uri: https://dspace.fiti.info/handle/00254/50
dc.publisher: Deutscher Wortschatz
dc.subject: web-crawled corpus
dc.subject: Gikuyu
dc.subject: Gĩkũyũ
dc.title: Gikuyu
dc.type: Dataset
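
The abstract mentions random reordering of sentences and precomputed sentence-level word co-occurrence statistics. As a rough illustration of what those two steps involve, the following Python sketch may help. It is not the LCC pipeline itself. It assumes, hypothetically, that sentences arrive as tab-separated `id<TAB>sentence` lines, it tokenizes naively on whitespace, and the function names `parse_sentences`, `shuffle_sentences`, and `sentence_cooccurrences` are invented for this example. The sample tokens are placeholders, not Gikuyu text.

```python
import random
from collections import Counter
from itertools import combinations


def parse_sentences(lines):
    """Parse 'id<TAB>sentence' lines (an assumed layout) into a list of sentences."""
    sentences = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Keep everything after the first tab; skip lines without sentence text.
        _, _, text = line.partition("\t")
        if text:
            sentences.append(text)
    return sentences


def shuffle_sentences(sentences, seed=0):
    """Return a randomly reordered copy, so the original document order is lost."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def sentence_cooccurrences(sentences):
    """Count how many sentences each unordered word pair appears in together."""
    counts = Counter()
    for sentence in sentences:
        # Naive whitespace tokenization (an assumption); each pair counts once per sentence.
        tokens = sorted(set(sentence.lower().split()))
        counts.update(combinations(tokens, 2))
    return counts


# Placeholder input standing in for lines read from a sentences file.
sample_lines = ["1\tw1 w2 w3\n", "2\tw2 w3\n", "3\tw1 w3\n"]
sample_sentences = shuffle_sentences(parse_sentences(sample_lines), seed=42)
for pair, n in sentence_cooccurrences(sample_sentences).most_common():
    print(pair, n)
```

Pairs are stored as sorted tuples so that `("w1", "w3")` and `("w3", "w1")` are counted as the same co-occurrence. A real pipeline would likely use a language-aware tokenizer and significance measures rather than raw counts.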

Files

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission