1.Project name: Corpus Compilation of Data from Medium/Minor Language Groups

2.Outline of the Project:
For languages that lack corpus data, electronic dictionaries, or both, we will convert the primary data into electronic form, compile a corpus from it, and then develop electronic and machine-readable dictionaries. Our tentative goal is to compile corpora for Hmong (China), Santali, Lhaovo, and other languages.

3.Core Members in Charge: Minegishi, Makoto (ILCAA); Sawada, Hideo (ILCAA)

4.Collaborating Researchers:
Taguchi, Yoshihisa (Chiba University: Hmong language); Takashima, Jun (ILCAA: Santali language); Ganesh Murmu (Ranchi University: Santali language); Peri Bhaskararao (formerly ILCAA: Toda language).

5.Progress:
Hmong: Databasing of the lexicon survey results has been completed, and the results have been published.
Santali: The Santali dictionary by P. O. Bodding, which contains approximately 40,000 headwords, had been digitized earlier. In 2007, as an initial stage of our linguistic study based on the dictionary, the headwords were extracted and classified to examine the phonetic and phonemic conditions of each vowel. In 2008, the digitized data was re-examined and modified in order to build a bilingual usage corpus of Santali and English. The data is available via our website.
Lhaovo: Databasing of the lexicon (the enlarged and revised version of Hideo SAWADA (2004)) has been completed and the database is available on the CbLLE website.
Palaung: Input of the Japanese entries of Toru ONO's Japanese-Burmese Dictionary (1995) and their Palaung translations, transcribed in IPA, has been completed.
Toda: Spoken narratives were recorded, and some of them are being transcribed in narrow phonetic transcription. A sample analysis of a Toda text has been made public as an example of how we propose to display the processed corpora for public access.

6.Accomplishments:
Hmong
Book:
Taguchi, Yoshihisa (2008) “A Vocabulary of Luobohe Miao”, Tokyo: Tokyo University of Foreign Studies.

Santali
URL with released corpus data:
http://www.aa.tufs.ac.jp/~mmine/india/Bodding2k/index.html
Paper:
Minegishi, Makoto, Jun TAKASHIMA and Ganesh MURMU “On the narrow and open “e” contrast in Santali” (in print, in Corpus Analysis and Diachronic Linguistics, John Benjamins Publishing Co.)
Academic Presentations:
Minegishi, Makoto, Jun TAKASHIMA and Ganesh MURMU “Corpus-based Analysis based on Bodding’s Santal Dictionary”, the third International Conference of Austroasiatic Linguistics, Deccan College, Pune, India, November 28, 2007.
Minegishi, Makoto, Jun TAKASHIMA and Ganesh MURMU “On the narrow and open “o” contrast in Santali”, 32nd All India Conference of Linguists, Lucknow University, India, December 21-23, 2010.

Lhaovo
URL with released vocabulary data:
http://cblle.tufs.ac.jp/med_min_lang/lhaovo/
Papers:
Sawada, Hideo (2010) "Ronwō-go no Meishiku no Sosei (Composition of noun phrase in Lhaovo)" [in Japanese]. Makoto Minegishi et al. (ed.) Working Papers in Corpus-based Linguistics and Language Education 7, Tokyo University of Foreign Studies. pp.259-283.
Sawada, Hideo (2010) "'Upward-Curling' Realization of Tone L in Lhaovo (Maru) Language". Zhaoming Dai (ed.) Forty Years of Sino-Tibetan Language Studies: Proceedings of ICSTLL-40. Heilongjiang University Press. pp.168-175.
Sawada, Hideo (2009) "Ronwō-go no Kaku-hyōji-keishiki no Taikei (Case Marking System of Lhaovo)" [in Japanese]. Sawada Hideo (ed.) Grammatical Phenomena of Tibeto-Burman Languages 1: Case-marking and Related Matters. ILCAA, Tokyo University of Foreign Studies. pp.175-222.
Sawada, Hideo (2008) "20-seiki Shotō no Ronwō-go Shiryō (A Lhaovo (Maru) Material of Early 20 Century)" [in Japanese]. S.Fujishiro, M.Shogaito (ed.) Dynamics in Eurasian Languages, (Contribution to the Studies of Eurasian Languages series vol.14), Kobe City College of Nursing. pp.177-245.
Sawada, Hideo (2008) "Ronwō-go Tekisuto (II) (Lhaovo Texts (II))" [in Japanese]. Peri BHASKARARAO (ed.) Research on Minority Languages of South and South-East Asia, Report of Research Project, Grant-in-Aid for Scientific Research. ILCAA, Tokyo Univ. of Foreign Studies. pp.45-86.
Academic Presentations:
Sawada, Hideo, "Case-marking of P and A in Lhaovo". Workshop on Optional Case Marking, 16th Himalayan Languages Symposium, SOAS, London, 2010.9.3.
Sawada, Hideo, "ʔă-prefixation on Verbs and Auxiliaries in Lhaovo (Maru) Language: Non-derivational Use". 39th International Conference of Sino-Tibetan Languages and Linguistics, University of Washington, Seattle, 2006.9.14-17.

Toda
URL with released sample of analysis:
https://sites.google.com/site/bhaperi/
Academic Presentation:
Bhaskararao, Peri "Correlating linguistic abstractions with speech signal: Issues from continuous speech of Indian languages", 2010 - Workshop on Image and Speech Processing, International Institute of Information Technology, Hyderabad, India. December 16, 2010