Corpus-based Linguistics and Language Education | Collection of Spontaneous Conversational Data of Swahili


1. Project name: Collection of Spontaneous Conversational Data of Swahili

2. Outline of the Project:
To date, a substantial body of Swahili corpus data has been compiled at the University of Helsinki (Helsinki Corpus of Swahili, http://www.aakkl.helsinki.fi/cameel/corpus/intro.htm). However, that corpus consists mainly of written language, and spoken Swahili is represented only partially and unsystematically. Spoken data, especially transcripts of conversations, are still lacking, which hinders proper analysis of spoken Swahili.
In this project, we will collect and transcribe natural conversations to construct a corpus for analyzing various grammatical phenomena in spoken Swahili. This corpus will also complement the Helsinki Corpus of Swahili.

3. Core Member in Charge: Hieda, Osamu

4. Collaborating Researchers:
Abe, Yuko (TUFS Post-doctoral Researcher, CbLLE); Abe, Maya (Adjunct Lecturer, Osaka University); Idone, Ayako (Doctoral Student, Osaka University); Miyazaki, Kumiko (Doctoral Student, Osaka University).

5. Progress:
During this academic year, we assembled the project team and sent one member, Yuko Abe, to Tanzania. Abe requested the cooperation of the Institute of Kiswahili Research at the University of Dar es Salaam, worked with the local coordinator to adjust the plan starting in April 2008, and signed a contract with the institute. In addition, as a source of colloquial data, Abe purchased a set of cartoons published in a local newspaper (Sani, published twice a week) and conducted a survey of language attitudes toward the Swahili suffix -ag in Dar es Salaam and in northern Tanzania. In the coming years, we plan to collect, transcribe, and tag conversational data with the help of the project participants.

6. Accomplishments:
In academic year 2008, with the cooperation of the Institute of Kiswahili Research at the University of Dar es Salaam, we transcribed 20 hours of conversation and converted the transcripts into a format convenient for analysis. In addition, Yuko Abe gave a presentation on the Swahili suffix -ag at the Global COE Symposium on May 8 and 9, 2008.
The "spoken" Swahili corpus constructed in 2011 has been released. The morphological analysis tool for Swahili is still under development and has not yet been released. Osamu Hieda's paper "Pragmatic Functions of Swahili Object Suffixes – an analysis of 'spoken' Swahili corpus" discusses how analysis of the "spoken" Swahili corpus, together with the morphological analysis tool currently under development, could contribute to linguistic research.
