Notes from Monday 2006-07-17’s talks at ACL/COLING 2006

Notes from the second machine translation session at COLING/ACL in Sydney, Australia. If you aren’t a computational linguist, this will probably not interest you. Even if you are, I am not making any promises…

Machine Translation Session I:

Combining Arabic Preprocessing Schemes for Statistical Machine Translation
Fatiha Sadat, Nizar Habash

The presentation was not very polished, and slides had lots of text on
them. The content seemed very nice though.

Nizar is not here, but I did run into Owen Rambow later.

—

Going Beyond AER: An Extensive Analysis of Word Alignments and Their Impact on MT
Necip Fazil Ayan, Bonnie J. Dorr

I thought this was a good presentation, but Bonnie went very quickly
on some of the slides. The overall message: whether you use BLEU or
Alignment Error Rate (AER) for evaluation, you have to look at the
actual application to be able to make a good evaluation. AER doesn’t
seem to be a very good metric for evaluating word alignments that are
used for phrase translation.
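
For reference, here is a minimal sketch of how AER is usually
computed, assuming the standard definition with "sure" (S) and
"possible" (P) gold links: AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|).
The function name and the toy links are mine, not from the talk.

```python
def alignment_error_rate(hypothesis, sure, possible):
    """Return AER for a set of hypothesised alignment links.

    Links are (source_index, target_index) pairs. Sure links are
    treated as a subset of the possible links.
    """
    a = set(hypothesis)
    s = set(sure)
    p = set(possible) | s  # every sure link is also a possible link
    if not a and not s:
        return 0.0  # nothing to align and nothing proposed
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Made-up example: two sure links, one extra possible link.
sure = {(0, 0), (1, 1)}
possible = {(0, 0), (1, 1), (2, 1)}
hypothesis = {(0, 0), (2, 1), (2, 2)}
aer = alignment_error_rate(hypothesis, sure, possible)
```

A lower AER means the hypothesis agrees more closely with the gold
links, which is exactly the number the talk says does not track
translation quality well.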

—

Discriminative Word Alignment with Conditional Random Fields
Phil Blunsom, Trevor Cohn (The University of Melbourne)

Because they train forward and backward models, they can produce
many-to-many alignments (each direction alone can only do one-to-many).
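
A small sketch of why two directional models give many-to-many links.
My notes don't say how the authors actually combine the two
directions, so the intersection/union below is just the common
symmetrization trick, and the function name and toy links are
assumptions.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Combine two directional alignments.

    src_to_tgt holds (source, target) links, tgt_to_src holds
    (target, source) links. Returns (intersection, union), both as
    sets of (source, target) links.
    """
    forward = set(src_to_tgt)
    backward = {(s, t) for (t, s) in tgt_to_src}  # flip to (source, target)
    return forward & backward, forward | backward


# Forward model: source word 0 links to two target words.
forward_links = {(0, 0), (0, 1), (1, 2)}
# Backward model: target word 2 links to two source words.
backward_links = {(0, 0), (2, 1), (2, 2)}
intersection, union = symmetrize(forward_links, backward_links)
```

In the union, source word 0 has two targets and target word 2 has two
sources, which neither direction could express on its own.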

They add features to the alignment process that you can’t use in
GIZA: orthographic features (stems, prefix and suffix match, word
length difference), a Markov feature (lets the model learn that
one-to-many alignments are not good?), relative sentence position, POS
tags, a bilingual dictionary, and some features on null.

They conclude that Conditional Random Fields are a very good model for
word alignment, even with a small number of aligned words per
sentence. They are tractable to train and use. Using multiple features
is useful, especially when combined with the output of other models
(GIZA / IBM Model 4). Markov sequence features improve alignments.

—

Named Entity Transliteration with Comparable Corpora
Richard Sproat, Tao Tao, ChengXiang Zhai (University of Illinois at
Urbana-Champaign)

They use English and Chinese data from Xinhua, so both sides come from
the same news agency. Start with an English name, and try to identify
Chinese character n-grams. Score the candidates via phonetic scoring,
and via a frequency profile that looks at how the English and Chinese
candidates show up together over time. They also look at scores based
on co-occurrences in document pairs across the corpora.

The phonetic model is a probabilistic noisy source-channel model that
maps from Chinese characters into sequences of English phonemes. It
uses Good-Turing smoothing for unseen mappings.
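
As a reminder of what Good-Turing smoothing does, here is a minimal
sketch of the basic estimate: an item seen c times gets the adjusted
count (c+1) * N_{c+1} / N_c, and the unseen events share probability
mass N_1 / N. The function name and the toy counts are mine, and real
implementations also smooth the N_c values, which I skip here.

```python
from collections import Counter


def good_turing(counts):
    """Return (unseen_mass, adjusted_counts) for a dict of raw counts."""
    total = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_c: number of items seen c times
    unseen_mass = freq_of_freq[1] / total    # mass reserved for unseen events
    adjusted = {}
    for item, c in counts.items():
        n_c = freq_of_freq[c]
        n_next = freq_of_freq[c + 1]
        # Fall back to the raw count when no item was seen c + 1 times.
        adjusted[item] = (c + 1) * n_next / n_c if n_next else float(c)
    return unseen_mass, adjusted


# Made-up counts of character-to-phoneme mappings.
mapping_counts = {"m1": 1, "m2": 1, "m3": 1, "m4": 2, "m5": 3}
unseen_mass, adjusted = good_turing(mapping_counts)
```

With these toy counts, three of the eight observations are
singletons, so 3/8 of the probability mass is set aside for mappings
that were never seen.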

They identify texts with potential names by searching for sequences of
the characters (about 500) that are commonly used for
transliteration.

If a document contains other potential transliteration pairs, you
increase the weight for the candidate. They introduce a new parameter
to account for scores being reinforced by co-occurring terms that are
also likely transliteration pairs. It seems to be similar to PageRank
(they say, and it makes sense).
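
I did not write down the actual update rule, so the following is only
a guess at the general shape of such a PageRank-like propagation: each
candidate keeps a share alpha of its own score and takes the rest from
the candidates it co-occurs with. The function, the alpha parameter,
and the numbers are all hypothetical.

```python
def propagate_scores(base_scores, cooccurrence, alpha=0.8, iterations=20):
    """Iteratively mix each candidate's score with its neighbours' scores.

    base_scores:  {candidate: initial score}
    cooccurrence: {candidate: {neighbour: weight}} for candidates that
                  appear in the same documents
    alpha:        hypothetical weight on the candidate's own base score
    """
    current = dict(base_scores)
    for _ in range(iterations):
        updated = {}
        for cand, base in base_scores.items():
            neighbours = cooccurrence.get(cand, {})
            weight_sum = sum(neighbours.values())
            if weight_sum:
                support = sum(w * current[n] for n, w in neighbours.items()) / weight_sum
            else:
                support = base  # isolated candidates keep their own score
            updated[cand] = alpha * base + (1 - alpha) * support
        current = updated
    return current


# Made-up candidates: A co-occurs with the strong candidate B, C is isolated.
base = {"A": 0.5, "B": 0.9, "C": 0.5}
links = {"A": {"B": 1.0}, "B": {"A": 1.0}}
final = propagate_scores(base, links)
```

In this toy run A ends up above C even though they start with the
same score, because A shares documents with a strong candidate.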

Both the phonetic and time-based models are known, I think, but the
score propagation is novel, and it is shown to help.

At the talk, Pascale Fung was sitting next to me, but I didn’t notice
until mid-way through the talk. She suggested using the time frequency
measure as a pre-selection step for generating candidates.

There was also a question on whether they have compared their approach
to using a large dictionary.

—

Extracting Parallel Fragments from Comparable Corpora
Dragos Munteanu, Daniel Marcu

Parallel data is almost entirely from the political domain
(parliamentary data) and news. It looks like they are doing comparable
corpora extraction from the web. They are focusing on finding parallel
bits within the document pairs, and are not talking about finding
candidate documents.

They use a bilingual lexicon in their system that has three features:
high precision, probability of translation, probability of not being
translations. They learn this dictionary using log-likelihood ratio
scores, and put a positive or negative association on the entries
based on occurrence patterns.
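
For the curious, a minimal sketch of a log-likelihood ratio score on a
2x2 co-occurrence table (Dunning's G² statistic), plus a sign check
for positive versus negative association. How the authors build their
counts is not in my notes; the function names and the counts are made
up.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 contingency table.

    k11: sentence pairs containing both words
    k12: pairs with the source word only
    k21: pairs with the target word only
    k22: pairs with neither word
    """
    def h(*counts):
        total = sum(counts)
        return sum(k * math.log(k / total) for k in counts if k > 0)

    cells = h(k11, k12, k21, k22)
    rows = h(k11 + k12, k21 + k22)
    cols = h(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols)


def is_positive_association(k11, k12, k21, k22):
    """True when the words co-occur more often than chance predicts."""
    total = k11 + k12 + k21 + k22
    return k11 * total > (k11 + k12) * (k11 + k21)


score = log_likelihood_ratio(30, 5, 5, 60)
positive = is_positive_association(30, 5, 5, 60)
```

The statistic itself is never negative, which is why a separate check
is needed to tell "these words go together" from "these words avoid
each other".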

Using this learned resource they find sub-fragments of a sentence that
are translations of each other. They look at a smoothed signal (the
average of the translation probabilities of a word and five
surrounding words) to see which parts of the sentence are
translations.
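
The smoothing step is basically a moving average. Here is a rough
sketch, assuming each word carries a signed score (positive when it
looks translated, negative otherwise) and that positions with a
positive smoothed value are kept. The window radius, the threshold,
and the scores are my assumptions, not the paper's settings.

```python
def smooth(signal, radius=2):
    """Moving average: each value becomes the mean of its window."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def translated_positions(signal, radius=2):
    """Indices whose smoothed score is positive."""
    return [i for i, value in enumerate(smooth(signal, radius)) if value > 0]


# Made-up per-word scores: the first part of the sentence looks translated.
word_scores = [0.9, 0.8, -0.2, 0.7, -0.9, -0.8, -0.7]
kept = translated_positions(word_scores, radius=1)
```

Note how the word at index 2 is kept despite its negative raw score,
because its neighbours pull the average up. That is the point of
smoothing before picking fragments.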

They evaluate by comparing MT systems that are trained on the initial
data, on data + extracted parallel fragments, and on data + extracted
sentences.

(In general, the font was so small on the slides that I could not read
any of the examples at all.)

Question from Pascale about using Mutual Information with T-score to
solve the problem with fewer steps instead of log-likelihood ratio.

—

Applications I session

Automated Japanese Essay Scoring System based on Articles written by
Experts
Tsunenori Ishioka, Masayuki Kameda

Their system rates on rhetoric, organization, and contents.

Rhetoric: They have some metrics for rhetorical judgments: shorter
sentence length, clause length, number of clauses, kanji / kana ratio,
number of embeddings, vocabulary diversity, word length after
converting kanji to their kana reading (longer is more complicated),
and passive sentences. They compute statistical distributions of these
variables from editorials and text from Mainichi news, and compare the
input text against those distributions.

Organization: They have a list of 125 connective conjunctive phrases
that are used to segment the document; these are classified into 8
categories. They look at the number of connective conjunctions, and at
the order in which they appear, using trigram and unigram models.

Latent Semantic Indexing is used to evaluate contents. Their system
works on essays from 400-800 characters in Japanese.

—

A Feedback-Augmented Method for Detecting Errors in the Writing of
Learners of English
Ryo Nagata, Atsuo Kawai, Koichiro Morihiro, Naoki Isu
(Hyogo University of Teacher Education and Mie University)

They are focusing on errors of articles and countability.

They generate KWIC (keyword-in-context) lists for a word to get
examples for rules, then use a rule-based system to classify the
instances as countable or uncountable. They then learn rules to
predict countability, and learn a log-likelihood ratio to determine
when to apply the rules. The rules are predicated on the context of
the word.
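
A rough sketch of what a decision list ranked by log-likelihood ratio
looks like, in the general Yarowsky style. The context features, the
smoothing constant, and the training examples are made up, and the
authors' actual rule templates are not in my notes.

```python
import math
from collections import defaultdict


def build_decision_list(examples, smoothing=0.5):
    """examples: iterable of (set_of_context_features, label) pairs,
    where label is "countable" or "uncountable"."""
    counts = defaultdict(lambda: {"countable": 0, "uncountable": 0})
    for features, label in examples:
        for feature in features:
            counts[feature][label] += 1
    rules = []
    for feature, c in counts.items():
        ratio = math.log((c["countable"] + smoothing) /
                         (c["uncountable"] + smoothing))
        label = "countable" if ratio > 0 else "uncountable"
        rules.append((abs(ratio), feature, label))
    rules.sort(reverse=True)  # strongest evidence first
    return rules


def classify(rules, features, default="countable"):
    """Apply the first (strongest) rule whose feature is present."""
    for _, feature, label in rules:
        if feature in features:
            return label
    return default


training = [
    ({"prev=a"}, "countable"),
    ({"prev=a"}, "countable"),
    ({"prev=much"}, "uncountable"),
    ({"prev=the"}, "countable"),
    ({"prev=the"}, "uncountable"),
]
rules = build_decision_list(training)
label = classify(rules, {"prev=much", "prev=the"})
```

Only the single strongest matching rule fires, so an ambiguous context
like "the" ends up near the bottom of the list and rarely decides
anything.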

They augment the decision lists with feedback. They use a corpus of
teachers’ corrections to learn rules, and a probabilistic model to mix
them with the rules from the general corpus. Since the feedback
corpus is small, they looked at many types of models for mixing the
results.

I think it would have been nice to see an example of the feedback in
the feedback corpus.

There was a question about the statistical significance of the
results: they were not statistically significant across the different
variants of the system. Another question was about corpus choice:
perhaps they should download texts on the specific topic to improve
lexical coverage. There was also a comment that he should look at
Bond’s work on ALT-J/E and some of his papers on translation and
countability.

—

Correcting ESL Errors Using Phrasal SMT Techniques
Chris Brockett, William B. Dolan, Michael Gamon
(Microsoft Research)

The idea is to treat ESL error correction as statistical machine
translation. They model the learner errors with a noisy channel model
and try to find the best corrected sentence given the erroneous input
sentence. This paper is a small pilot study that uses an off-the-shelf
SMT system to correct errors in mass nouns. They used a “tree to
string” system (Menezes and Quirk 2005) that uses unlabeled dependency
parses.
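
The noisy channel idea in miniature: pick the correction c that
maximizes P(c) * P(observed | c). This toy sketch uses lookup tables
with invented probabilities in place of a real language model and
translation model, so it only illustrates the formulation, not the
system they actually used.

```python
import math


def best_correction(observed, candidates, language_model, channel_model):
    """Return the candidate maximizing log P(c) + log P(observed | c).

    language_model: {candidate: P(candidate)}
    channel_model:  {(observed, candidate): P(observed | candidate)}
    """
    def score(candidate):
        return (math.log(language_model[candidate]) +
                math.log(channel_model[(observed, candidate)]))

    return max(candidates, key=score)


# Invented probabilities for a mass noun error.
observed = "many informations"
candidates = ["much information", "many informations"]
language_model = {"much information": 0.009, "many informations": 0.0001}
channel_model = {
    (observed, "much information"): 0.05,
    (observed, "many informations"): 0.9,
}
fixed = best_correction(observed, candidates, language_model, channel_model)
```

Leaving the input unchanged is the most likely channel outcome here,
but the language model prefers the well-formed phrase strongly enough
to win.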

They created artificial sentences containing errors. With 45K words of
training data, they were able to completely correct about 55% of the
test data.

Overall I really liked this presentation.

—

Dialogue II Session

Learning to Generate Naturalistic Utterances Using Reviews in Spoken
Dialogue Systems
Ryuichiro Higashinaka, Rashmi Prasad, Marilyn Walker

I enjoyed this paper. The basic approach was to mine review websites
(food in this case) for reviews and their associated scores (1-5) on a
variety of features (food quality, ambiance, value, etc.). They
determined which sentences are related to the scores using a
hand-crafted lexicon of words that connect to the various scored
features, and built templates that realize the scores. One template
can have multiple values associated with it, and will only be used for
generation when all the values are available. He demonstrated that the
system could be plugged into a dialogue system.

There were some questions that were not quite on point, I think. One
asked whether you can use multiple websites. Of course you can, if you
write a front-end parser to scrape them and have a good lexicon for
the feature mapping. Another was an interesting question, but not
really on point either: a comment that the learned templates have
words (adjectives, descriptions) that are used for different scores.
For example, 1 and 2 for food both had “bad food.” Of course, that is
a question of human interpretation, and since humans used these words
to describe the food, of course it is valid.

—

Poster Sessions

I saw a good poster from Satoshi Sekine from NYU.
