1. Project name: Learner Corpus

2. Outline of the Project:
In this project, we aim to construct a learner corpus of Japanese by gathering data at educational institutions overseas. We have carried out two major activities: 1) collection of composition data using an e-learning system and 2) collection of composition data and construction of an online dictionary of Japanese errors. Data for 1) were collected at Tamkang University (Taiwan). For comparison with these data, additional data were collected from learners at SOAS (UK) and from Japanese native speakers at TUFS (Japan). Data for 2) were collected at the University of Leeds (UK), Kiev State Linguistic University (Ukraine), and MingChuan University (Taiwan).

3. Core Member in Charge: Umino, Tae (GSACS)


4. Collaborating Researchers:
1) CONSTRUCTION OF LEARNER CORPUS OF JAPANESE USING THE E-LEARNING SYSTEM
Cooperators within the university: Lin, ChunChen (GSACS); Okada, Akito (GSACS)
Graduate students: Suzuki, Ayano (Doctoral Student); Yang, ChiaChen (Doctoral Student); Inokawa, Mutsumi (Doctoral Student); Torii, Aya (Master’s Student)
Data Providers: Peng, ChunYang (Tamkang University, Taiwan); Horikoshi, Kazuo (Tamkang University, Taiwan); Pizziconi, Barbara (SOAS, UK)
2) CONSTRUCTION OF CORPUS OF JAPANESE LEARNER ERRORS AND ONLINE DICTIONARY OF JAPANESE ERRORS
Cooperators within the university: Mochizuki, Keiko (GSACS)
Graduate students: Seah, Terence (Doctoral Student); Cai, SongYi (Doctoral Student); Fukuda, Sho (Doctoral Student); Oyanagi, Noboru (Doctoral Student); Kobernick, Nadya (Doctoral Student); Zhang, ZhiLing (Doctoral Student); Takasugi, Hiroko (Master’s Student); Shida, Takahiro (Master’s Student); Sumiya, Kazuki (Master’s Student); Arakawa, Kazuhito (Research Student); Ichikawa, Junta (Asia-Africa Linguistic Institute)
Data Providers: Morimoto, Kazuki (University of Leeds, UK); Oheda, Yuka (University of Leeds, UK); Gornovska, Olga (Kiev State Linguistic University, Ukraine); Yakovchuk, Svitlana (Kiev State Linguistic University, Ukraine); Yang, YuWen (MingChuan University, Taiwan); Xu, MengLing (MingChuan University, Taiwan)


5. Progress and accomplishment:
[1] Data Acquisition in Taiwan (by Umino and Lin): These data are available in plain text form. Morphological analysis of the data will be completed by the end of March, 2012.
First Phase (February-June, 2008): 8 functional tasks and 8 diary-writing tasks by 22 learners (52,800 characters in total)
Second Phase (September-December, 2008): 8 functional tasks and 8 diary-writing tasks by 10 learners (12,000 characters in total)
Third Phase (February-June, 2009): 8 functional tasks and 8 diary-writing tasks by 24 learners (72,000 characters in total)
Fourth Phase (September-December, 2009): 8 diary-writing tasks by 8 learners (13,000 characters in total)
Fifth Phase (February-June, 2010): 8 functional tasks and 8 diary-writing tasks by 26 learners (74,000 characters in total)
Sixth Phase (September-December, 2010): 8 diary-writing tasks by 14 learners (22,000 characters in total)
Seventh Phase (February-June, 2011): 8 functional tasks and 8 diary-writing tasks by 24 learners (87,000 characters in total)
Other related data: 1. To compare learners’ ability before, during, and after studying in Japan, we collected data on a trial basis from Japanese language learners studying in Japan, using similar tasks, in cooperation with SOAS, University of London (UK). 2. For comparison with native speakers, we collected data from 59 undergraduate students who are native speakers of Japanese (8 functional tasks and 1 diary-writing task). We intend to expand these data in future projects.

[2] Data Acquisition (by Mochizuki): U.K. (145 essays by 113 learners, 122,980 characters), Ukraine (169 essays by 59 learners, 35,585 characters), and Taiwan (81 essays by 29 learners, 35,178 characters).

The data from [1] and [2] are expected to be released in the following two corpora by the end of March, 2012.
・Learner’s Language Corpus of Japanese http://cblle.tufs.ac.jp/llc/ja/ (string search)
All data from [1] and [2] will be available here.
・The CbLLE POS Research Engine (written Japanese by learners of Japanese) http://cblle.tufs.ac.jp/tag/ja/index.php?menulang=ja (part-of-speech search)
All data from [1] will be available here.

[3] Database of Japanese compositions by advanced learners (by Umino): These data, acquired from 121 learners (587 compositions, 469,600 characters), are available in plain text form and will be made available for public access.

[4] Online Dictionary of Japanese Errors based on the learner corpus of Japanese (by Mochizuki): This dictionary of Japanese errors will contain 380 essays from the data in [2] and will register 10,498 errors, analyzed by grammatical and lexical category. Explanations of the causes of the errors and corrected versions are provided, mainly for teachers and learners of Japanese. This online dictionary is available for public access on our website.