

■Total tokens
813579

What's New
2014/07/16 595213 tokens Aix-Marseille I, February-March 2010
2009/04/03 59603 tokens Paris XIII, February-October 2006
2008/06/11 158763 tokens Aix-Marseille I, July 2005


Outline & Purpose
CbLLE POS Research Engine (Spoken French) searches the part-of-speech tags of the Multilingual Spoken Language Corpora (French) developed by the 21st Century COE Program “Usage-Based Linguistic Informatics”. Part-of-speech (POS) tagging is based on the TreeTagger system, cf. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/. The purpose of this research engine is chiefly to publish the research results of the Global COE Program “Corpus-based Linguistics and Language Education” (CbLLE) and to analyze various grammatical phenomena in spoken French.
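
As a rough illustration of what searching part-of-speech tags involves, the sketch below filters TreeTagger-style output by tag. It assumes the tab-separated layout that TreeTagger prints (token, POS tag, lemma); the sample lines and the helper names `parse_tagged` and `search_by_tag` are hypothetical, and nothing here describes how this engine is actually implemented.

```python
from collections import Counter

# Hypothetical sample in TreeTagger's tab-separated layout:
# token <TAB> POS tag <TAB> lemma.
# The lines are invented for illustration, not taken from the corpus.
SAMPLE = (
    "je\tPRO:PER\tje\n"
    "suis\tVER:pres\têtre\n"
    "à\tPRP\tà\n"
    "la\tDET:ART\tle\n"
    "fac\tNOM\tfac\n"
)


def parse_tagged(text):
    """Return (token, tag, lemma) tuples, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 3:
            rows.append(tuple(parts))
    return rows


def search_by_tag(rows, prefix):
    """Return the tokens whose POS tag starts with the given prefix."""
    return [token for token, tag, _lemma in rows if tag.startswith(prefix)]


rows = parse_tagged(SAMPLE)
verbs = search_by_tag(rows, "VER")                  # all verb forms
tag_counts = Counter(tag for _tok, tag, _lem in rows)  # frequency per tag
```

Matching on a tag prefix such as `VER` groups the tense-specific verb tags together, which is one simple way to query a grammatical category as a whole.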

Corpus Description
The current corpora of Spoken French are composed of three data sets.
1. Twenty-one dialogs recorded during July 2005 with the cooperation of the University of Aix-Marseille I. The total length is around seven hours and the total number of word tokens exceeds 150,000. They cover various subjects such as office life, work at the university, recent news about one’s family and improvised sketches.
2. Seven dialogs recorded during February-October 2006 with the cooperation of the University of Paris XIII. The total length is around eight hours and the total number of word tokens exceeds 50,000. They cover subjects such as music and culture.
3. Thirty-three dialogs recorded during February-March 2010 with the cooperation of the University of Aix-Marseille. The total length is around thirty-four hours and the total number of word tokens exceeds 590,000. They are various conversations among students.

References
http://www.coelang.tufs.ac.jp/multilingual_corpus/fr/index.html?contents_xml=gaisetsu&menulang=en
http://www.coelang.tufs.ac.jp/multilingual_corpus/fr2/index.html?contents_xml=gaisetsu&menulang=en

Research Team
Supervisors:
- Yuji Kawaguchi (Tokyo University of Foreign Studies)
- Hisae Akihiro (Tokyo University of Foreign Studies)
- Atsushi Sano (Fukushima University)

Organizers:
- Kaori Sugiyama (Seinan Gakuin University)
- Sunsuke Nakata (Tokyo University of Foreign Studies)
- Mito Matsuzawa (Tokyo University of Foreign Studies)
- Nori Kondo (Tokyo University of Foreign Studies)
- Kentaro Koga (Tokyo University of Foreign Studies)
- Misato Kikuchi (Tokyo University of Foreign Studies)
- Françoise Lorant (University of Paris XIII)
- Takahiro Ogawa

Academic Advisers:
- †Claire Blanche-Benveniste (University of Aix-Marseille)
- José Deulofeu (University of Aix-Marseille)
- Frédéric Sabio (University of Aix-Marseille)
- André Valli (University of Aix-Marseille)
- Jeanne-Marie Debaisieux (University of Paris III)
- Christophe Benzitoun (University of Lorraine)
- Takaaki Shochi (University of Bordeaux III)

Programming:
- Tsuyoshi Umeno (Tokyo University of Foreign Studies)
- Kaori Omura (Tokyo University of Foreign Studies)

Informants:
Students and teaching staff of the Universities of Aix-Marseille, Paris III, Paris XIII, etc.

Correctors of part-of-speech tags:
Students of the French Section of Tokyo University of Foreign Studies

UBLE

 