Notes from Tuesday 2007-03-20 Natural Language Processing Meeting in Japan

I attended the (Japanese) Natural Language Processing meeting at Ryukoku University from the 20th to the 23rd, and took some notes on the sessions that I attended.

Session B1: Meaning Analysis

意味分析
Session chair is Utsuro Takehito 宇津呂武仁 from Tsukuba Daigaku (University of Tsukuba).

B1-1 構文解析を補助的に用いる意味解析
○船越孝太郎, 中野幹生, 長谷川雄二, 辻野広司 (HRI-JP)
B1-2 結合価パターン辞書からの情緒を明示する用言の知識ベース化
○黒住亜紀子, 徳久雅人, 村上仁一, 池原悟 (鳥取大)
B1-3 SYNGRAPHデータ構造における述語項構造の柔軟マッチング
○小谷通隆 (京大), 中澤敏明, 柴田知秀 (東大), 黒橋禎夫 (京大)
C1-5 科学技術文献を対象とする日中機械翻訳システム開発プロジェクト
○井佐原均 (NICT), 黒橋禎夫 (京大), 辻井潤一 (東大), 内元清貴 (NICT), 中川裕志 (東大), 梶博行 (静岡大), 中村徹 (JST)
C1-6 ハイブリッド翻訳のためのフレーズアラインメント
○潮田明 (富士通研)
C1-7 部分目標の達成度に基づく機械翻訳自動評価 – 部分目標の自動生成 –
○内元清貴, 小谷克則, 張玉潔, 井佐原均 (NICT)
C1-8 Translation quality prediction using multiple automatic evaluation metrics
○Paul Michael, Andrew Finch, 隅田英一郎 (NICT/ATR)

B1-1 構文解析を補助的に用いる意味解析

“Semantic Analysis with Auxiliary Use of Syntactic Parsing”, ○船越孝太郎, 中野幹生, 長谷川雄二, 辻野広司 (HRI-JP)

It looks like the problem they want to tackle is translating natural language into a kind of frame structure that can then be converted into a predicate structure for understanding. They want to avoid manually created patterns for picking up meaning, and also bottom-up parsing. They have a semantic ontology with a lexicon-to-meaning mapping, and a set of meaning frames containing slots that need to be filled. They do some sort of bottom-up parsing with their lexicon, and match paths over that parse to see if they can fill the slots in their meaning frames.

B1-2 結合価パターン辞書からの情緒を明示する用言の知識ベース化

“Building a Knowledge Base of Emotion-Expressing Predicates from a Valency Pattern Dictionary”, ○黒住亜紀子, 徳久雅人, 村上仁一, 池原悟 (鳥取大)

They have 14,000 patterns for Japanese combination (valency) patterns. About 1000 of those were created by them and attach some emotive element to the sentence. Based on random sampling, over 80% of their patterns are correctly interpreted (with respect to?). They performed an experiment to evaluate the performance of their dictionary. They have a corpus with 1642 (sentences?) that are tagged with emotion information. They ran their program to determine which things like other things (?) and then compared the output to the annotations. It looks like they are looking at categories like “happy”, “like”, “angry”, “dislike”, “unhappy”, “fear” and a few others. They had 7 people tag each sentence (?), and where more than 4 people agreed (?) they took that as the correct tag.
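
The agreement rule from the notes above could be sketched roughly as follows. This is only a minimal sketch, assuming seven annotators per sentence and a tag being accepted when more than four of them agree; the function name `gold_tag`, the `min_votes` parameter, and the example labels are hypothetical and not taken from the paper.

```python
from collections import Counter

def gold_tag(annotations, min_votes=5):
    """Return the tag chosen by at least `min_votes` annotators, else None.

    min_votes=5 encodes "more than 4 people agreed", which is an assumption
    based on the notes above rather than on the paper itself.
    """
    if not annotations:
        return None
    tag, votes = Counter(annotations).most_common(1)[0]
    return tag if votes >= min_votes else None

# Hypothetical annotations from 7 taggers for two sentences.
print(gold_tag(["happy"] * 5 + ["like"] * 2))  # happy
print(gold_tag(["happy"] * 4 + ["like"] * 3))  # None
```

With this rule, sentences without a clear majority simply get no gold tag, which may or may not be how the authors handled them.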

B1-3 SYNGRAPHデータ構造における述語項構造の柔軟マッチング

“Flexible Predicate-Argument Structure Matching in the SYNGRAPH Data Structure”
○小谷通隆 (京大), 中澤敏明, 柴田知秀 (東大), 黒橋禎夫 (京大)

They gave an example where meaning analysis explodes combinatorially based on the different meanings of the words in the sentence. SYNGRAPH looks like some kind of database that contains similar concepts. By collapsing many possibilities, the search space is reduced. Analysis is done with JUMAN and KNP. They did some extraction of patterns from a dictionary using what looks like a pattern base (e.g., Sasha’s definition pattern work). They do some matching with a similarity metric between terms (1.0 for the same term; the other scores are based on their ontology?).
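
To make the combinatorial explosion concrete: if every word sense can combine freely with every other, the number of sentence-level readings is the product of the per-word sense counts. The snippet below is a small arithmetic sketch only; the sense counts are made up for illustration and nothing here reflects how SYNGRAPH is actually implemented.

```python
from math import prod

def interpretation_count(senses_per_word):
    """Number of sentence-level readings if every word sense combines freely."""
    return prod(senses_per_word)

# Hypothetical sense counts for a four-word sentence.
raw_senses = [3, 4, 2, 5]
# Hypothetical counts after near-synonymous senses are merged into shared nodes.
merged_senses = [2, 2, 1, 2]

print(interpretation_count(raw_senses))     # 120
print(interpretation_count(merged_senses))  # 8
```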

C1-5 科学技術文献を対象とする日中機械翻訳システム開発プロジェクト

“A Development Project for a Japanese-Chinese Machine Translation System for Scientific and Technical Literature”
○井佐原均 (NICT), 黒橋禎夫 (京大), 辻井潤一 (東大), 内元清貴 (NICT), 中川裕志 (東大), 梶博行 (静岡大), 中村徹 (JST)

A five-year project starting from 2006, with three university affiliates plus NICT and JST. They would like to be able to share research work in Asia, and in particular to make information about cutting-edge research available to many countries. They presented a breakdown of who is doing what work and research, and a schedule for it.

C1-6 ハイブリッド翻訳のためのフレーズアラインメント

“Phrase Alignment for Hybrid Machine Translation”, ○潮田明 (富士通研)

There are a few different types of hybrid systems, such as voting systems, syntax-guided SMT, EBMT using parsed examples, and fusion-type systems. The “phrases” in phrase-based SMT are not necessarily linguistic phrases. Parses in syntax-based SMT aren’t as good as traditional parses, errors propagate and can’t be fixed, and there isn’t feedback between the parsers and the bilingual training data. They explained their phrase alignment technique, which lost me for a while. They did some evaluation with hand-entered heuristics using the NTCIR-3 Patent task. From the examples shown, it looks like it creates some very good phrases and translations.

C1-7 部分目標の達成度に基づく機械翻訳自動評価 – 部分目標の自動生成 –

“Automatic Evaluation of Machine Translation Based on Rate of Accomplishment of Sub-goals: Automatic Generation of Sub-goals”, ○内元清貴, 小谷克則, 張玉潔, 井佐原均 (NICT)

They would like to be able to rank many different systems against one another using automatic evaluation methods. Looking at scores like BLEU and NIST along with fluency and adequacy, they examine multiple system translations and which areas overlap. So there is some concept of global evaluation (BLEU, NIST) and local evaluation (over just smaller parts). They evaluated these local things with Yes/No questions about specific types of translation rules (maybe using human evaluation?). It looks like they built simple patterns to answer these questions based on previous work with humans. Maybe. They have an equation for mixing many global evaluation methods with their local evaluation method. They use some sort of skip trigram method for doing matching. Tested over the JEIDA (769 sentences) English-Japanese set. They compared their automatic yes/no question matching to human question matching.
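
I did not catch how the skip trigram matching was defined, so the following is only a generic sketch of the idea: collect ordered token triples that may have gaps between them, then measure how many of the reference's triples also appear in the candidate. The function names, the `max_span` window, and the recall-style score are all assumptions for illustration, not the authors' method.

```python
from itertools import combinations

def skip_trigrams(tokens, max_span=5):
    """Ordered token triples whose first and last tokens are fewer than
    `max_span` positions apart (gaps between the tokens are allowed)."""
    grams = set()
    for i, j, k in combinations(range(len(tokens)), 3):
        if k - i < max_span:
            grams.add((tokens[i], tokens[j], tokens[k]))
    return grams

def skip_trigram_recall(candidate, reference, max_span=5):
    """Fraction of the reference's skip trigrams also found in the candidate."""
    ref = skip_trigrams(reference, max_span)
    if not ref:
        return 0.0
    cand = skip_trigrams(candidate, max_span)
    return len(ref & cand) / len(ref)

reference = "the cat sat on the mat".split()
candidate = "the cat quietly sat on the mat".split()
print(skip_trigram_recall(candidate, reference))
```

Allowing gaps means an inserted word like "quietly" does not destroy every match around it, which is presumably why a skip-based method is attractive for comparing translations.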

C1-8 Translation quality prediction using multiple automatic evaluation metrics

○Paul Michael, Andrew Finch, 隅田英一郎 (NICT/ATR)

I thought I would be getting a talk in English, but it was in Japanese. The slides were in English though. They are trying to predict MT system translation quality based on automatic evaluation metrics. They use a travel domain corpus and do both binary and multi-class learning using decision trees with eight features (BLEU, NIST, WER, PER, etc.). They take classes based on the human-graded fluency and accuracy as their target. Their approach outperforms the majority baseline in all cases.
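
For reference, a majority baseline always predicts the most frequent class seen in training, so a classifier has to beat that accuracy to be useful. A minimal sketch, with a hypothetical function name and made-up labels (none of these numbers come from the talk):

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test item and
    return that label together with the resulting test accuracy."""
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority)
    return majority, correct / len(test_labels)

# Hypothetical binary quality labels.
train = ["good", "good", "bad"]
test = ["good", "bad", "good", "good"]
print(majority_baseline_accuracy(train, test))  # ('good', 0.75)
```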

Poster Session

I didn’t walk around with my laptop open, so no notes for this.

Session B2: Vocabulary and Dictionaries (語彙・辞書)

Session chair is Sassano Manabu 颯々野学 from Yahoo (ヤフー).

B2-1 漢輔：外国人のための漢字検索システム
○田中久美子, Julian Godon (東大)
B2-2 自動未知語獲得による仮名漢字変換システムの精度向上
○森信介 (日本IBM), 小田裕樹
B2-3 最小記述長原理に基づいた日本語話し言葉の単語分割
○松原勇介 (東大), 秋葉友良 (豊橋技科大), 辻井潤一 (東大/Univ. of Manchester/NaCTeM)
B2-4 辞書見出し語の5文字漢字熟語を対象とした語基構成の解析
○郭恩東, 森本貴之, 後藤智範 (神奈川大)

B2-1 漢輔：外国人のための漢字検索システム

“Kansuke: A Kanji Search System for Foreigners”, ○田中久美子, Julian Godon (東大)

How should foreigners, particularly beginners, look up complicated kanji? Lookup by reading is hard; lookup by radical takes time and you have to know the strokes; lookup by stroke count also takes time. They did a study on how foreigners draw their characters (stroke number and order wrong!). They count up how many horizontal strokes, vertical strokes, and other strokes a character has, then search based on that. That works for simple characters, but for complicated ones (鬱) it is harder. So, you look up the little radicals for each character as before, click on the part, then refine as before. They use EDICT and a Chinese dictionary for lookup on those characters. Compared to lookup by stroke number or SKIP code, they get far fewer candidates on average, with lower variance.
Kansuke Kanji Lookup Web Interface.
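
The stroke-type lookup described above can be sketched as an index keyed on (horizontal, vertical, other) stroke counts. This is a toy sketch, not the Kansuke implementation: the `STROKE_COUNTS` table, the function names, and the way each stroke is classified are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (horizontal, vertical, other) stroke counts; the real system's
# classification of individual strokes may differ.
STROKE_COUNTS = {
    "二": (2, 0, 0),
    "三": (3, 0, 0),
    "十": (1, 1, 0),
    "土": (2, 1, 0),
    "工": (2, 1, 0),
    "干": (2, 1, 0),
    "王": (3, 1, 0),
    "人": (0, 0, 2),
    "大": (1, 0, 2),
    "木": (1, 1, 2),
}

def build_index(stroke_counts):
    """Group characters by their (horizontal, vertical, other) count triple."""
    index = defaultdict(list)
    for char, key in stroke_counts.items():
        index[key].append(char)
    return index

def lookup(index, horizontal, vertical, other):
    """Return candidate characters matching the three stroke-type counts."""
    return index.get((horizontal, vertical, other), [])

index = build_index(STROKE_COUNTS)
print(lookup(index, 2, 1, 0))  # ['土', '工', '干']
```

Even in this tiny table several characters share a key, which is why a refinement step (picking component parts) is needed for anything complicated.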

B2-2 自動未知語獲得による仮名漢字変換システムの精度向上

“Improving the Accuracy of a Kana-Kanji Conversion System through Automatic Acquisition of Unknown Words”, ○森信介 (日本IBM), 小田裕樹

This talk was about how to choose the proper kanji for kana input, I believe. They extend this model with some context information.

B2-3 最小記述長原理に基づいた日本語話し言葉の単語分割

“Word Segmentation of Spoken Japanese Based on the Minimum Description Length Principle”, ○松原勇介 (東大), 秋葉友良 (豊橋技科大), 辻井潤一 (東大/Univ. of Manchester/NaCTeM)

B2-4 辞書見出し語の5文字漢字熟語を対象とした語基構成の解析

“Analysis of Word-Base Structure in Five-Character Kanji Compounds among Dictionary Headwords”, ○郭恩東, 森本貴之, 後藤智範 (神奈川大)

FuguTabetai Blog