{"id":125,"date":"2006-07-22T00:15:10","date_gmt":"2006-07-21T15:15:10","guid":{"rendered":"https:\/\/fugutabetai.com\/blog\/2006\/07\/22\/notes-from-friday-2006-07-21-coling-acl-conference\/"},"modified":"2006-07-22T00:15:10","modified_gmt":"2006-07-21T15:15:10","slug":"notes-from-friday-2006-07-21-coling-acl-conference","status":"publish","type":"post","link":"https:\/\/fugutabetai.com\/blog\/2006\/07\/22\/notes-from-friday-2006-07-21-coling-acl-conference\/","title":{"rendered":"Notes from Friday 2006-07-21 COLING\/ACL conference"},"content":{"rendered":"<p>Notes from Friday&#8217;s sessions at the last day of COLING\/ACL.  <\/p>\n<p><!-- readmore --><\/p>\n<p>Friday 2007-07-21 Notes from COLING\/ACL<br \/>\nLanguage, gender and sexuality: Do bodies always matter?<br \/>\nSally McConnel-Ginet Cornell University (Invited talk)<\/p>\n<p>Co-author of book &#8220;Language and Gender&#8221; &#8211; sounds interesting.<\/p>\n<p>Gave an example of John Summers (sp?) the Harvard guy and his talk<br \/>\nabout women and science.  There is a difference between the overt and<br \/>\ncovert messages, and we have to be aware of both of those, as well as<br \/>\nthe ways that our messages can be interpreted.  The majority of the<br \/>\ntalk was about gender as a performance vs. gender as what we do, and<br \/>\nvarious ways that linguistics can be a resource for that performance.<br \/>\nInteresting talk about identity and how we realize our identities, and<br \/>\nthe frameworks around us for socializing them.  <\/p>\n<p>&#8212;<br \/>\nQuestion and Answering Session<\/p>\n<p>Answer Extraction, Semantic Clustering, and Extraction Summarization<br \/>\nfor Clinical Question Answering<br \/>\nDina Demner-Fushman, Jimmy Lin (University of Maryland)<\/p>\n<p>They present a system that answers the question &#8220;What is the best<br \/>\ntreatment type for X?&#8221; and provide bullets that list treatment, along<br \/>\nwith context to show support, and links to the source articles where<br \/>\nthe answers came from.  Extract answers, do semantic clustering to put<br \/>\nanswers into a hierarchy, then extract answers for each semantic<br \/>\ncategory.  <\/p>\n<p>They have a template to search medline for drug therapy type<br \/>\nquestions.  They then tag the returned data with MetaMap (a NE \/<br \/>\nsemantic tagger for medical domain) to find interventions (drug<br \/>\ntreatments) and cluster the interventions and extract a sentence to<br \/>\nshow support.  Their extractive summarizer uses supervised machine<br \/>\nlearning to detect outcome statements using a bunch of classifiers<br \/>\n(rule based, naive Bayes, n-gram, position, semantics, length) that<br \/>\nare combined linearly to detect best &#8220;outcome statement&#8221;.  <\/p>\n<p>Seems like a reasonable approach, I would be interested in seeing how<br \/>\nthis work compares to Min-yen Kan&#8217;s centrifuser system, or the work<br \/>\nthat Noemie has done.  I&#8217;m not familiar enough with work in this area<br \/>\nto be able to make a good evaluation of this work.  It doesn&#8217;t look<br \/>\nlike anything amazing to me though.  I think evaluation in this<br \/>\nscenario is tough, because what are they going to compare to?  They<br \/>\ncompare to a graded scale from 1-10 of how effective each treatment<br \/>\nwas (but I&#8217;m not sure if this comes from medline or elsewhere for<br \/>\nsimilar questions.)  
They are using ROUGE to evaluate their summaries, but what are they using as the reference summary?

So far their results have been color-coded instead of hard numbers with significance testing. They do say that the answers that come from their clustering system outperform the results from the IR system alone at a statistically significant level, but I don't know what their model summaries are.

Kathy asks how they deal with contradictions across articles. They do not deal with contradictions. Question from someone else about what version of ROUGE was used and what the reference summaries were (me too!) They used ROUGE 1.5 with ROUGE-1 precision; the reference summaries are the ones provided as the abstracts of the papers used to generate their extractive summary (this is a very strange and poor evaluation approach, I think.)

Another question about comparatives and negation ("this is not the best treatment"): they use SemRep, which deals with negations. It isn't quite the same as Kathy's question, but it is somehow related. In the data there do not seem to be many comparative summaries, so it hasn't been much of a problem.

Question to explain more about the features used in the classifier for sentence extraction. It looks to me like they have some pretty normal features. The rule-based classifier was created by a registered nurse who looked at about 1000 good articles and tried to identify what the bottom-line sentence would be, and indicators to find it. What is the naive Bayes classifier trained on? N-gram & semantic as well... They have a paper on that (JAMIA 2005) it looks like, so we can go there for more information. Another question on the extractor: have you compared the output of the meta-classifier against expert opinions on extracts? Yes, it is fairly accurate (85-92%), but they didn't say how they evaluated that. There is other work that looks at the relations from the abstract, and they didn't want to duplicate that work.

---

Exploring Correlation of Dependency Relation Paths for Answer Extraction
Dan Shen, Dietrich Klakow (Saarland University, presented by Dan Shen)

They are looking at syntactic matching between the question parse tree and candidate answer sub-trees. They want to avoid strict parse tree matching because the answer can be realized in different forms. They use the Minipar dependency parser. They introduce a path correlation measure to check the correlation between the question parse and the candidate answer parse, and use a Dynamic Time Warping algorithm to align the two paths. They need to know what the correlation is between types of relations, so they compute a correlation measure over the TREC 2002 and 2003 data using a mutual information measure.

They also have a phrase similarity metric that works over noun phrases. It takes some morphological variation (via stemming) into account, as well as some format variations. Semantic similarity is done as per Moldovan and Novischi 2002.

TREC 1999-2003 for training data; test data is TREC 2004.
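A minimal sketch of the path alignment idea as I understood it: dynamic-programming (DTW-style) alignment over two sequences of dependency relation labels, with a pairwise relation-correlation score standing in for the mutual-information-derived values they estimate from TREC data. The rel_corr table below is made up for illustration:

    # Hypothetical sketch: DTW-style alignment of two dependency relation
    # paths, scored by a pairwise relation correlation function.

    def path_correlation(q_path, a_path, rel_corr):
        """Align two relation-label sequences and return the accumulated
        correlation of the best alignment."""
        n, m = len(q_path), len(a_path)
        # dp[i][j] = best score covering q_path[:i] and a_path[:j]
        dp = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
        dp[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                match = rel_corr.get((q_path[i - 1], a_path[j - 1]), 0.0)
                dp[i][j] = match + max(dp[i - 1][j - 1],  # advance both paths
                                       dp[i - 1][j],      # stretch answer relation
                                       dp[i][j - 1])      # stretch question relation
        return dp[n][m]

    # Made-up correlation values; the paper estimates these over the
    # TREC 2002/2003 data with a mutual information measure.
    rel_corr = {("subj", "subj"): 1.0, ("subj", "obj"): 0.2,
                ("obj", "obj"): 1.0, ("mod", "mod"): 0.8}

    print(path_correlation(["subj", "mod"], ["subj", "obj", "mod"], rel_corr))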
Built the training set by grabbing relevant documents for each question and matching one keyword to find the answer. They used this for training? Isn't there some better gold standard? I am not clear on their training data and what they are trying to train at all. They use kernel trees to do the syntactic matching, but say that they don't learn the similarity features? They present a bunch of different methods, but spent too much time early on explaining their system, and I'm not clear what they are comparing against.

This talk is going a bit long, but there were some mike problems earlier on (picking up interference from another mike somewhere?) that delayed it. I don't think there will be time for questions, though. She might not finish. For a 20 minute talk, she has 32 slides... They found that overall their correlation-based metric outperformed competing methods, and performed well on "difficult questions" where NE recognizers might not help.

There is time for at least one question. The correlation was used in both answer extraction and answer ranking. How good a job does this correlation do in capturing additional candidates and wrong candidates? (Where are the false positives and negatives?) Using exact matching degraded performance by 6%, and they lose 8% if they remove the Maximum Entropy model.

Seems like an interesting paper, but I think the presentation should have been more polished. There were real timing issues, and I'm not sure if the real impact was presented. I think the main contribution here is the correlation measure and not using strict matching only. Again, I don't have a large background in this area, though. I am interested in the similarity measures, since I think variation makes taking different forms into account important.

---

Question Answering with Lexical Chains Propagating Verb Arguments
Adrian Novischi, Dan Moldovan (Language Computer Corp.)

Dan Moldovan (session chair) is giving the next paper, so he was introduced by someone else.

The main point is that you should propagate syntactic structures along your lexical chains. They needed to extend WordNet by matching to VerbNet patterns. Not all VerbNet classes have syntactic patterns, so they had to learn them. They didn't present the results of the learning, but there is probably another paper about that somewhere.

Presented a method to propagate the structure along through verbs. See Moldovan and Novischi 2002. Different relations modify the syntactic structure in some way, and they have a table showing the different classes and what changes. They have some weights for how to order the relations, which were set by trial and error. I'm not really clear on what the ordering is used for. Examples for different relation types. There is also an example of how this can be used to map an answer to a question.

They have some numbers on how many arguments can be propagated across 106 concept chains. By adding this syntactic structure propagation they improved the performance of their Q&A system.
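As a toy illustration of the lexical chain idea (my own sketch, not the paper's algorithm), one can search WordNet for a short chain of relations connecting two verbs using NLTK. The real system additionally propagates and transforms the syntactic argument structure at each link, which this sketch does not attempt:

    # Toy sketch: breadth-first search for a short lexical chain between
    # two verbs over WordNet relations (requires nltk plus its wordnet data).
    from collections import deque
    from nltk.corpus import wordnet as wn

    def lexical_chain(src_verb, dst_verb, max_depth=3):
        """Return a chain of (synset, relation) links from a sense of
        src_verb to a sense of dst_verb, or None if no chain is found."""
        targets = set(wn.synsets(dst_verb, pos=wn.VERB))
        queue = deque((s, [(s, "start")]) for s in wn.synsets(src_verb, pos=wn.VERB))
        seen = set()
        while queue:
            synset, path = queue.popleft()
            if synset in targets:
                return path
            if synset in seen or len(path) > max_depth:
                continue
            seen.add(synset)
            for rel in ("hypernyms", "hyponyms", "entailments", "verb_groups"):
                for nxt in getattr(synset, rel)():
                    queue.append((nxt, path + [(nxt, rel)]))
        return None

    chain = lexical_chain("buy", "pay")
    if chain:
        for synset, rel in chain:
            print(rel, synset.name())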
eXtended WordNet is available from the University of Texas at Dallas. It has the word senses in the glosses disambiguated, and the glosses have been transformed into logic forms.

In general, I am not sure what this talk is really about. It is about lexical chains (using WordNet concepts instead of words) and how you can transform them by following the relations in WordNet/VerbNet (along with some learned relations) to help with Q&A by transforming answer parses to match question parses. You can do some sophisticated reasoning with this approach. It seemed like it was really a bit all over the place, though. I would have preferred that the presentation focused on the algorithm and how it can be used with Q&A. I didn't see enough on the learning of the 20,000 syntactic patterns added to VerbNet.

Questions: How do you handle modifications like "considered hitting the ball"? That would be up to the semantic parser to figure out. In cases of entailment the same thematic roles are retained - what about in cases like "buy and sell", do the thematic roles stay the same? Possibly they flip in that example.

---

Methods of Using Textual Entailment in Open-Domain Question Answering
Sanda Harabagiu, Andrew Hickl (LCC)

Focused on using textual entailment in a Q&A system. They have a sophisticated system for parsing questions and paragraph answers. They extract features, then a classifier does a yes/no classification on whether the question entails the answer, with some confidence. Sanda has really been going through the slides quickly. They do detect negative polarity. Their system is built upon a large QUAB (Question Answer Bank - a big database of questions and answers.) Generally they are looking for an entailment relationship between a new question and a question they have already seen in their database. They use a Maximum Entropy classifier with a variety of features to predict if two questions can be related by entailment.

Trained the classifier over 10,000 alignment chunk pairs labeled as positive or negative. MaxEnt did better than a hill climber on the same training set. They extracted 200,000 sentence pairs from the AQUAINT corpus as positive examples, using the headline to entail the first sentence of the story, filtered on whether there was at least one entity in common between the two. For negative examples they took sentences containing "even though", "in contrast", "but", etc., and used the text on either side of the cue as a non-entailing pair. Performance improved significantly over their 10,000-pair development corpus.

They also use paraphrase alignment in their system, using web data to get clusters of potential paraphrases following Barzilay and Lee 2003.

She's really been whipping through the slides; there is a lot of content here, but it is getting hard to follow. She explained the features used in the entailment classifier.
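The corpus-mining recipe above is concrete enough to sketch. A rough illustration (mine; the helper heuristics are stand-ins, and the real system presumably uses a proper named entity recognizer rather than a capitalization check):

    # Rough sketch of mining entailment training pairs from news text,
    # following the recipe described in the talk.
    import re

    CONTRAST_CUES = ("even though", "in contrast", "but")

    def shares_entity(a, b):
        """Stand-in entity filter: do the strings share a capitalized token?"""
        caps = lambda s: set(re.findall(r"\b[A-Z][a-z]+\b", s))
        return bool(caps(a) & caps(b))

    def positive_pair(headline, first_sentence):
        """Headline entails the story's first sentence, kept only if the
        two share at least one entity."""
        if shares_entity(headline, first_sentence):
            return (headline, first_sentence, True)
        return None

    def negative_pair(sentence):
        """Split a sentence on a contrast cue; the two halves are taken
        as a non-entailing pair."""
        low = sentence.lower()
        for cue in CONTRAST_CUES:
            idx = low.find(" " + cue + " ")
            if idx > 0:
                return (sentence[:idx].strip(),
                        sentence[idx + len(cue) + 2:].strip(), False)
        return None

    print(positive_pair("Summers Resigns at Harvard",
                        "Harvard president Lawrence Summers resigned on Tuesday."))
    print(negative_pair("The drug was approved even though trials were inconclusive."))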
Their experiments use the textual entailment in three ways: to filter the output answers, to rank candidate passages, and to select automatically generated question-answer pairs from the QUAB database. Evaluated over 600 randomly selected questions from TREC. Also using the PASCAL Recognizing Textual Entailment 2006 data for evaluation? Their textual entailment system is about 75% accurate based on that PASCAL data. They can improve Q&A accuracy from 0.30 to 0.56 using entailment for known answer types. It also improves with unknown answer types, but not as much.

Lots of interesting things in here, but there was too much in the presentation, I think. I'll have to read the paper. In general though, it looks like textual entailment was a big help... The talk went long, maybe 10 minutes? Since it is just before lunch, it isn't a problem.

Questions: It wasn't clear how the actual entailment was done? The question was the hypothesis.

---
Machine Translation Session (Dekai Wu chairing)

Scalable Inference and Training of Context-Rich Syntactic Translation Models
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, Ignacio Thayer
(Columbia University, USC/ISI, Language Weaver, Google)

I thought Michel's talk was pretty good. I'm sure he'll get some tough questions, though. I liked the rule analysis, and that he didn't spend too much time talking about BLEU scores.

Dan (Melamed) asked a question about non-contiguous phrases and what was meant by that, and another about the power of the formalism: is it the same as ITG or not? Question by Dekai Wu about the time complexity.

---

Modelling Lexical Redundancy for Machine Translation
David Talbot, Miles Osborne (School of Informatics, University of Edinburgh)

There is redundancy in the lexicon used for translation that is unimportant for translation purposes. The model becomes more complex (e.g., seaweed -> nori, konbu, wakame, etc. in Japanese), which makes the learning problem more difficult. Remove the distinction in the training corpus to improve distributions for learning, reducing data sparsity, etc.

I had trouble following the math used to cluster together terms that should be collapsed. The model prior is a Markov random field. This is used somehow to indicate how likely words are to be assigned to the same cluster (it has features over the words, for example two words that differ only in that they start with a c or a g). They use an EM algorithm to tune the parameters, using bilingual co-occurrence information to guide the E step. That looks pretty interesting to me. David gave an overview of other related work, and there is some other related work in morphology that might do more of that.

Their experiments use Czech, Welsh, and French. Phrase-based decoder with GIZA++ for alignment. Showed some improvement in BLEU scores (2-4 points) and examples where their system helped. Also looked at vocabulary sizes, which are reduced by maybe 25%.
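Just to pin down the intuition (not their actual model): collapsing redundant surface distinctions shrinks the vocabulary the translation model has to estimate distributions over. A crude sketch with suffix stripping standing in for the learned MRF/EM clustering:

    # Crude sketch of the intuition only: collapse redundant surface
    # forms to a shared class and measure the vocabulary reduction.
    # The paper learns the clustering with an MRF prior and EM guided
    # by bilingual co-occurrences; suffix stripping is a stand-in.

    def collapse(word, suffixes=("ing", "ed", "s")):
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf) + 2:
                return word[:-len(suf)]
        return word

    corpus = "the traders traded and are trading in the markets".split()
    before = set(corpus)
    after = {collapse(w) for w in corpus}
    print(len(before), "->", len(after))  # vocabulary size before vs. after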
Questions: BLEU should be thrown away; would like to see 10 sentences before and after. Did you look at distributional features in the source language? No, not yet, but he has considered it. He thought of using monolingual features in the MRF prior, but didn't implement it. He initially talked about how more refined word senses could be tackled here, but the examples were all morphological. The features in the MRF don't capture thesaurus information, but he had an example of secret, secretive, and private translated into Welsh where some collapsing was done.

I thought this was an interesting presentation and will have to read the paper.

---

Moved over to the generation session.

Learning to Predict Case Markers in Japanese
Hisami Suzuki, Kristina Toutanova (Microsoft Research)

In one of the examples she gives as incorrect output from her MT system, I would probably have made the same mistake... First presented results on predicting case markers in a monolingual task. The system makes errors with ha/ga, similar to how humans make the mistake...

The second part did prediction in a bilingual reference setting: given a source English sentence and a Japanese translation of the sentence missing case markers. They have a dependency parse of the English from the NLPWIN parser (Quirk et al. 2005, from MS Research), align with GIZA++, and derive syntactic information on the Japanese side from the English parse projected via alignment links, plus POS tags on the Japanese.

They have a variety of features for this task: a 2-word left and right context window, POS tags and dependency tree info, the English words that are aligned, and source syntactic features. This improves over the monolingual Japanese-only model, and in each case using syntactic features helped.

Questions: Japanese case markers often do not have English equivalents, so what do they align to in English using GIZA++? Usually they align to null, or ha sometimes goes to the copula (is).

---

Improving QA Accuracy by Question Inversion
John Prager, Pablo Duboue, Jennifer Chu-Carroll (IBM T.J. Watson)

What other questions should you ask a QA system when you don't know if an answer is correct? By asking other questions they can find information that can be used to invalidate bad answers (e.g., if you know the birth and death year of a person, all things relating to their achievements must be bounded by those years.)

They take a question Q1 and generate more questions (inverted questions) Q2 to find answers A2 that can be used to invalidate answers A1 in response to Q1. Looks like I got this description a bit wrong, but it should become clearer as the presentation goes on.

Interesting: they use the generated questions to get answers that they use to validate the answers they already had.
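The birth/death-year example suggests a simple consistency check; a toy sketch of that style of invalidation (entirely my own illustration, not the paper's system):

    # Toy sketch: invalidate a candidate answer using answers to
    # inverted questions, per the lifespan example from the talk.

    def invalidated(candidate_year, birth_year, death_year):
        """An achievement dated outside the person's lifespan (recovered
        by asking 'When was X born?' / 'When did X die?') is rejected."""
        return not (birth_year <= candidate_year <= death_year)

    # Candidate A1 dates a painting to 1920; inverted answers A2 say the
    # painter lived 1830-1903, so the candidate must be wrong.
    print(invalidated(1920, 1830, 1903))  # True -> answer is invalidated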
---

Learning to Say It Well: Reranking Realizations by Predicted Synthesis Quality
Crystal Nakatsu, Michael White (Department of Linguistics, Ohio State University)

This is synthesis in the text-to-speech sense. The job is to choose text realizations for the synthesizer that are predicted to sound good. They rated a variety of different sentence realizations, and used an SVM to rank them. Their system improved output quality to a statistically significant degree, raising average ratings from "ok" to "good".

Questions: There was a question about where effort should go to improve synthesis? The answer is that you should work on both synthesis and generation. Did the raters hit a fatigue effect where they got used to the synthesizer? They spread the effort out over days, and started from different ends of the lists. How was inter-annotator agreement, and did the annotators really judge only on synthesis and not sentence structure? Agreement was about .66. They were more interested in the extremes of what was really good and really bad, so they were not too worried about that.
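As a footnote on the SVM ranking step, here is a minimal sketch of the standard pairwise-ranking reduction one could use for this kind of reranking (my illustration with scikit-learn; the features and ratings are made up, and the paper's exact setup may differ):

    # Minimal pairwise ranking sketch (RankSVM-style): learn to prefer
    # the realization with the higher human rating by classifying
    # feature-vector differences.
    import numpy as np
    from sklearn.svm import LinearSVC

    # Each row: features of one realization of the same content plan.
    feats = np.array([[0.9, 0.1, 3.0],   # realization A
                      [0.4, 0.7, 5.0],   # realization B
                      [0.2, 0.9, 2.0]])  # realization C
    ratings = np.array([4.0, 2.0, 3.0])  # assumed human quality ratings

    # Pairwise difference examples: label +1 if the first item is better.
    X, y = [], []
    for i in range(len(feats)):
        for j in range(len(feats)):
            if ratings[i] != ratings[j]:
                X.append(feats[i] - feats[j])
                y.append(1 if ratings[i] > ratings[j] else -1)

    model = LinearSVC().fit(np.array(X), np.array(y))
    # Score realizations with the learned weight vector; higher is better.
    scores = feats @ model.coef_.ravel()
    print(scores.argsort()[::-1])  # indices from best to worst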