Gikuyu

Publisher

Deutscher Wortschatz

Abstract

This item record provides access to the Gikuyu (Kikuyu) corpus contributed to the Leipzig Corpora Collection (LCC) via the CURL (Crawling Under-Resourced Languages) initiative at the University of Leipzig. The corpus consists of randomly selected Gikuyu sentences harvested from web-crawled sources and processed through the standard LCC pipeline: sentence segmentation, removal of non-sentences and foreign-language material, and random reordering to destroy the original document structure in compliance with copyright requirements. Sentence-level word co-occurrence statistics have been precomputed and are included with the data.

Limitations: As a web-crawled corpus, the resource may be skewed toward domains with higher digital representation in Gikuyu (religious texts, political commentary, diaspora media). Quality filtering removes obvious non-Gikuyu content but does not guarantee the absence of code-switching with Kiswahili or English.

ISO 639-3: kik
Glottolog: kiku1240
Source: web texts submitted via the CURL community URL-contribution platform
Licence: Creative Commons Attribution (CC BY)
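Two of the pipeline steps named in the abstract, random reordering and sentence-level co-occurrence counting, can be illustrated with a minimal Python sketch. This is not the LCC implementation: the function names, the whitespace tokenization, and the placeholder tokens are assumptions made for illustration, and the sketch produces raw pair counts only, with no significance measure.

```python
import random
from collections import Counter
from itertools import combinations


def shuffle_sentences(sentences, seed=0):
    """Return the sentences in random order, discarding document order."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def sentence_cooccurrences(sentences):
    """Count unordered word pairs that occur together within one sentence."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: naive lowercase whitespace tokenization.
        # Each word type is counted once per sentence.
        types = sorted(set(sentence.lower().split()))
        counts.update(combinations(types, 2))
    return counts


# Placeholder tokens, not real Gikuyu data.
sample = ["w1 w2 w3", "w2 w3", "w1 w3 w4"]
shuffled = shuffle_sentences(sample, seed=42)
pairs = sentence_cooccurrences(shuffled)
```

Because the counts are taken per sentence, shuffling does not change them, which is why sentence-level statistics survive the reordering step.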

Description

This corpus was built through the CURL initiative, which invites community members to submit URLs of web pages in under-resourced languages for automated crawling and processing by the Leipzig pipeline. The resulting resource follows the standard Leipzig Corpora Collection format, enabling direct comparison with corpora in over 250 other languages built with the same methodology. The corpus captures contemporary written Gikuyu as it appears online, including news sites, community platforms, and digital publications. It therefore reflects modern orthographic conventions and registers rather than literary or archival language.
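As a rough sketch of how a sentence file in such a shared format might be read, the Python snippet below assumes that each line holds a numeric sentence ID, a tab, and the sentence text. That layout is an assumption and is not stated in this record, so it should be checked against the downloaded files. The function name and the sample lines are hypothetical.

```python
import io


def read_sentences(handle):
    """Yield (sentence_id, sentence) pairs from tab-separated lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Assumed layout: "<numeric id>\t<sentence text>"
        sent_id, _, text = line.partition("\t")
        yield int(sent_id), text


# In-memory stand-in for a downloaded file; placeholder text, not real data.
sample_file = io.StringIO(
    "1\tplaceholder sentence one\n"
    "2\tplaceholder sentence two\n"
)
records = list(read_sentences(sample_file))
```

With a real download, the `io.StringIO` object would be replaced by a file opened with `encoding="utf-8"`.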

Citation

T. Eckart and U. Quasthoff: Statistical Corpus and Language Comparison on Comparable Corpora. In: Building and Using Comparable Corpora, Springer-Verlag Berlin Heidelberg, 2013, ISBN: 978-3-642-20128-8
