{"id":124,"date":"2006-07-20T07:58:26","date_gmt":"2006-07-19T22:58:26","guid":{"rendered":"https:\/\/fugutabetai.com\/blog\/2006\/07\/20\/notes-from-thursday-2006-07-20-coling-acl-conference\/"},"modified":"2006-07-20T07:58:26","modified_gmt":"2006-07-19T22:58:26","slug":"notes-from-thursday-2006-07-20-coling-acl-conference","status":"publish","type":"post","link":"https:\/\/fugutabetai.com\/blog\/2006\/07\/20\/notes-from-thursday-2006-07-20-coling-acl-conference\/","title":{"rendered":"Notes from Thursday 2006-07-20 COLING\/ACL conference"},"content":{"rendered":"<p>Rough notes from Thursday&#8217;s presentations at COLING\/ACL.  <\/p>\n<p><!-- readmore --><\/p>\n<p>Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution<br \/>\nRyu Iida, Kentaro Inui, Yuji Matsumoto<\/p>\n<p>A nominee for the best asian paper award.  A fairly interesting<br \/>\napproach to anaphoric identification for &#8220;missing&#8221; zero-anaphora.  I<br \/>\ncan&#8217;t really say that I know much about anaphoric resolution research,<br \/>\nbut they have a fairly sophisticated approach that uses dependency<br \/>\nparse trees (retaining only relevant sub-trees and reducing them to<br \/>\npart of speech information to avoid data sparsity.)  <\/p>\n<p>&#8212;<\/p>\n<p>Challenges in NLP: Some New Perspectives from the East<\/p>\n<p>A very interesting panel I think.  Three points:<\/p>\n<p>* Are Asian language special?<\/p>\n<p>* What are some salient linguistic differences in Asian languages, and<br \/>\n  what are the implications for NLP?<\/p>\n<p>* Can the availability of detailed linguistic information (e.g.,<br \/>\n  morphology) help ameliorate problems with the scarcity of large<br \/>\n  annotated corpora?<\/p>\n<p>(1 Tsuji Junichi)<\/p>\n<p>For Asian languages, there is some separation between sentence grammar<br \/>\nand discourse grammar.  Asian languages are discourse oriented<br \/>\nlanguages.  Topic markers (wa) and particles that are very sensitive<br \/>\nto discourse (sure, dake, mo, koso&#8230;) so they have a real impact on<br \/>\ndiscourse understanding.  Similarly for Chinese, there does not seem<br \/>\nto be a packing device in English that corresponds to Chinese topic<br \/>\nmarking.  The concept of zero-anaphora, pronouns that are just dropped<br \/>\nand have to be recovered, and in some way are quite related to<br \/>\ndiscourse.  Context is very important to understand underspecified<br \/>\nmeaning (\u50d5\u306f\u3046\u306a\u304e\u3060.)<br \/>\n(2 Kenjamin K. Tsou City University of Hong Kong)<\/p>\n<p>Writing system in asian languages is very broad.  Alphabetic and<br \/>\nideographic and mixtures of them.  What are some differences?  Entropy<br \/>\nof European writing systems is about 4, but with Asian languages can<br \/>\ngo up to 9.  <\/p>\n<p>Structurally when you have two nouns together, what is their<br \/>\nrelationship for (for MT etc.)  The love of God vs. the love of<br \/>\nMoney.  (subjective &#038; objective vs objective only.)  There is some<br \/>\nculture basis about how to interpret these things as well.  Intrinsic<br \/>\ndifferences in the writing systems are seen in the differences in the<br \/>\nentropy of the writing systems.  Different relationships between<br \/>\nconstituents, sometimes marked, sometimes you need culture to<br \/>\ndisambiguate.  <\/p>\n<p>(3 Pushpak Bhattacharyya)<\/p>\n<p>Annotated corpora is a scarce resource.  Only about 10 languages have<br \/>\nlarge annotated corpora out of about 7000 languages.  
English has<br \/>\nrelatively low morphology compared to other languages, so can we<br \/>\nunderstand rich morphology?  Exploiting morphological richness can<br \/>\nreally help.  We should build high quality morphological analyzers.  <\/p>\n<p>&#8212;<\/p>\n<p>After the panel on Asian NLP I decided to go shopping for new shoes.<br \/>\nDespite my best efforts, my shoes stank from Wednesday&#8217;s incident<br \/>\n(thanks for owning up to that Kris!) and I didn&#8217;t think I could take<br \/>\nthe smell much longer myself.  So I missed out on the morning sessions<br \/>\nin exchange for a bout of shopping.  <\/p>\n<p>&#8212;<\/p>\n<p>Discourse session (Daniel Marcu chair)<\/p>\n<p>Proximity in Context: an empirically grounded computational model of<br \/>\nproximity for processing topological spatial expressions<br \/>\nJohn Kelleher, Geert-Jan Kruijff, Fintan Costello<\/p>\n<p>This paper so far wins my &#8220;Best author list for creating a D&amp;D<br \/>\nparty from&#8221;.  <\/p>\n<p>They want to be able to talk to robots about the space they are in and<br \/>\ntheir surroundings.  This means they need to understand spatial<br \/>\nreferences.  They have looked at previous (human) research on<br \/>\nproximity and determine that proximity is a smooth function that is<br \/>\ninversely related to distance, depends on the absence or presence of<br \/>\n&#8220;distractor&#8221; objects and the size of the reference object.  They use<br \/>\nfunctions to approximate proximity and merge the fields of the<br \/>\ndistractors to determine overall proximity.  They used human<br \/>\nexperiments to verify their model.  Presence of a distractor has a<br \/>\nstatistically significant influence.  Their relative proximity<br \/>\nequation better predicted results than just absolute proximity<br \/>\n(without the normalization with the distractor.)  They had a funny<br \/>\nvideo with a robot that answered questions about where things are<br \/>\n(cows, pigs, and donkeys.)  <\/p>\n<p>They have another paper on how the descriptions are chosen to describe<br \/>\nthe scene &#8211; that would have been more interesting for me I think.  The<br \/>\ntalk was very understandable though, and well-presented I thought. <\/p>\n<p>&#8212;<\/p>\n<p>Machine Learning of Temporal Relations<br \/>\nInderjeet Mani, Marc Verhagen, Ben Wellner, Chong Min Lee, James<br \/>\nPustejovsky<\/p>\n<p>Inderjeet Mani presenting, joint work with MITRE, Georgetown<br \/>\nUniversity, and Brandeis University.  <\/p>\n<p>There is an annotation scheme, TIMEX, to annotate time-based<br \/>\nexpressions and some relations between them.  TLINKS in TIMEX3 are<br \/>\nused to express time links using Allen&#8217;s Interval Relations.  MITRE<br \/>\nhas some tools for annotating text (Tango? probably available for<br \/>\nresearch use.)  TimeBank corpus and they have an Opinion corpus with<br \/>\nabout 100,000 documents!?  I need to look into this.  Inter-annotator<br \/>\nagreement is pretty low on TLINKs (.55F links and labels) .  So they<br \/>\nare looking at, given a human-determined link, can they learn the<br \/>\ncorrect link type?<\/p>\n<p>They have a tool for enriching link chains with all links types<br \/>\nbetween all entities (called closure).  
---

Machine Learning of Temporal Relations
Inderjeet Mani, Marc Verhagen, Ben Wellner, Chong Min Lee, James Pustejovsky

Inderjeet Mani presenting; joint work between MITRE, Georgetown University, and Brandeis University.

There is an annotation scheme (TimeML, with TIMEX3 tags) for annotating time expressions and relations between them. TLINKs are used to express temporal links using Allen's interval relations. MITRE has some tools for annotating text (Tango? probably available for research use). There is the TimeBank corpus, and they have an Opinion corpus with about 100,000 documents!? I need to look into this. Inter-annotator agreement is pretty low on TLINKs (0.55 F on links and labels), so they are looking at whether, given a human-determined link, they can learn the correct link type.

They have a tool for enriching link chains with all link types between all entities (temporal closure). So a small amount of markup is done, then a full graph is created that allows for temporal reasoning. Learning over the closed graphs improves results greatly.

They had baselines of human-created rules and of rules mined from the web using ISI's VerbOcean. Their approach of learning a maximum entropy classifier over the closure-expanded data is the best.

---

You Can't Beat Frequency (Unless You Use Linguistic Knowledge) -- A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
Joachim Wermter, Udo Hahn

Looking at collocation and technical term identification. He wants to see how to determine whether one statistical method can be said to be better than another. They only looked at candidates with frequency higher than 10, to avoid low-frequency data. In English they looked at trigram noun phrases; in German, at PP-verb combinations.

They compared against the t-test for both collocations and terms. In each case their linguistically motivated measure looks at how often the candidates can be modified, or how much variation they show. For the comparison they computed their terms, split the n-best lists in half, and looked at how much the other statistical tests would change the ranking with respect to true negatives and true positives.

His result tables and graphs were a little hard to follow because he went past them very quickly. I think the scatter plots were easier to understand than the tables, and for the tables I'm not sure how significant the differences (in percentages) were; they seemed pretty close.

The overall conclusion is that frequency counting does about as well as the t-test for identifying these entities. The linguistically motivated measures presented here do perform better than frequency counting (and the t-test), though.

Question from Pascale: the linguistically motivated measure also seemed to be statistical. She doesn't agree with the conclusion (mentioning Frank Smadja's collocation work), saying that nobody proposed using just the t-score.

Another question: does he do stop-listing? Yes, they do take out stop words for terms.

Another question: what exactly does this study show? What wouldn't I learn from just looking at a precision-recall graph or table from X's study? (Some other study was cited, but I didn't catch which.)

---

Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora
Alexandre Klementiev, Dan Roth (UIUC Cognitive Computation Group)

They use an English NE tagger to tag text, then, in a comparable corpus, use the English NE tags to identify counterparts in another language (Russian) and move the tags across.

They use temporal alignment as a supervision signal to learn a transliteration model: identify an NE in the source language, ask the transliteration model M for the best candidates in the target language, select the candidate that has a good temporal match to the source NE, and add the pair to the training data. Re-train M and repeat until it stops changing, or some stability has been reached (a rough sketch of the loop follows).
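Writing the loop down for myself; the model, features, and threshold here are toy stand-ins for the paper's perceptron and substring-pair features, not the authors' code.

```python
from itertools import product

SIM_THRESHOLD = 0.5  # hypothetical cutoff, not a number from the paper

def substring_pairs(src, tgt, n=2):
    """Toy feature extractor: all pairs of character n-grams from the two
    strings (a crude stand-in for the paper's substring-pair features)."""
    src_grams = {src[i:i + n] for i in range(len(src) - n + 1)}
    tgt_grams = {tgt[i:i + n] for i in range(len(tgt) - n + 1)}
    return set(product(src_grams, tgt_grams))

class ToyTransliterationModel:
    """Scores a pair by summing weights of its substring-pair features
    (the real model is a discriminatively trained perceptron)."""
    def __init__(self):
        self.weights = {}

    def score(self, src, tgt):
        return sum(self.weights.get(f, 0.0) for f in substring_pairs(src, tgt))

    def train(self, pairs):
        # Crude update: bump weights of features seen in accepted pairs.
        for src, tgt in pairs:
            for f in substring_pairs(src, tgt):
                self.weights[f] = self.weights.get(f, 0.0) + 1.0

def bootstrap(source_nes, target_words, temporal_sim, seed_pairs, k=5, iters=10):
    """Grow transliteration training data, using temporal match as the
    supervision signal, until the accepted pairs stop changing."""
    model, training = ToyTransliterationModel(), set(seed_pairs)
    for _ in range(iters):
        model.train(training)
        new_pairs = set()
        for ne in source_nes:
            # Top-k target candidates according to the current model.
            cands = sorted(target_words, key=lambda w: model.score(ne, w),
                           reverse=True)[:k]
            if not cands:
                continue
            # Keep the candidate whose mention time series matches best.
            best = max(cands, key=lambda w: temporal_sim(ne, w))
            if temporal_sim(ne, best) > SIM_THRESHOLD:
                new_pairs.add((ne, best))
        if new_pairs <= training:  # stability reached
            break
        training |= new_pairs
    return model
```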
<\/p>\n<p>Linear discriminative approach for transliteration model, is a<br \/>\nperceptron M(Es, Et) gives a score for transliteration pair. Features<br \/>\nare substring pairs of letters in source and target.  Feature-set<br \/>\n(sets of pairs) grows as they see more examples.  Model is initialized<br \/>\nwith 20 NE pairs.  They also look at how many \/ few pairs you can use<br \/>\nto seed model with.  5 candidates worked as well as 20, but required a<br \/>\nlot longer to converge.  <\/p>\n<p>For their temporal similarity model they group word variants into<br \/>\nequivalence class for Russian.  English isn&#8217;t as diverse so just uses<br \/>\nexact strings.  Use DFT then do euclidean distance in Fourier space.<br \/>\nThey use about 5 years of short BBC articles with loose Russian<br \/>\ntranslations.  <\/p>\n<p>Using temporal feature along with learned transliteration model they<br \/>\nget accuracy that is pretty high, up to 70% for top-5 list for a NE<br \/>\nterm.  About 66% accuracy on multi-word NEs.  <\/p>\n<p>I really liked this presentation and think it is a very novel look at<br \/>\ntransliteration.  Would it work with Chinese?  It is similar to<br \/>\nPascale&#8217;s early work on time warping with comparable corpora to learn<br \/>\na bilingual dictionary, but the feature set is novel to me &#8211; on the<br \/>\nfly learning of character transliteration pairs.  <\/p>\n<p>I asked a question about application to Chinese and Japanese, and got<br \/>\nabout the answer that I expected.  <\/p>\n<p>&#8212;<\/p>\n<p>A Composite Kernel to Extract Relations between Entities with both<br \/>\nFlat and Structured Features<br \/>\nMin Zhang, Jie Zhang, Jian Su, Guodong Zhou<\/p>\n<p>They elect to use a kernel method for a learning system (SVM?) because<br \/>\nit can easily take hierarchical features as input.  Their contribution<br \/>\nis that they designed a composite kernel that combines flat features<br \/>\nand syntactic features (hierarchical) for classification.  <\/p>\n<p>So they did a lot of work choosing which sub-trees from the parse<br \/>\ninformation to throw at their machine learning system, and ways to<br \/>\nprune the data to make it more tractable.  Trained on ACE data (&#8217;03<br \/>\nand &#8217;04.)  They report that a specific type of tree kernel (mixed both<br \/>\nfeatures and syntax trees with a polynomial) has the best performance,<br \/>\nbeating state-of-the-art systems.  <\/p>\n<p>I guess they have a nice result, but I didn&#8217;t like the talk much.  The<br \/>\nspeaker focused a lot on details that I don&#8217;t think are important.<br \/>\nWhat I took away from this is that it is possible to use a tree kernel<br \/>\nto mix in numeric feature values with syntax type hierarchical<br \/>\nfeatures and achieve good performance.  This seems like it should have<br \/>\nbeen a poster paper to me.  (Of course, I don&#8217;t work in this area so<br \/>\nmy opinion probably isn&#8217;t worth much&#8230;)  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Rough notes from Thursday&#8217;s presentations at COLING\/ACL. Exploiting Syntactic Patterns as Clues in Zero-Anaphora Resolution Ryu Iida, Kentaro Inui, Yuji Matsumoto A nominee for the best asian paper award. A fairly interesting approach to anaphoric identification for &#8220;missing&#8221; zero-anaphora. 
I can&#8217;t really say that I know much about anaphoric resolution research, but they have a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/124"}],"collection":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/comments?post=124"}],"version-history":[{"count":0,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/124\/revisions"}],"wp:attachment":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/media?parent=124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/categories?post=124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/tags?post=124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}