{"id":138,"date":"2006-09-13T03:11:00","date_gmt":"2006-09-12T18:11:00","guid":{"rendered":"https:\/\/fugutabetai.com\/blog\/2006\/09\/13\/information-processing-society-of-japan-natural-language-meeting\/"},"modified":"2006-09-13T03:11:00","modified_gmt":"2006-09-12T18:11:00","slug":"information-processing-society-of-japan-natural-language-meeting","status":"publish","type":"post","link":"https:\/\/fugutabetai.com\/blog\/2006\/09\/13\/information-processing-society-of-japan-natural-language-meeting\/","title":{"rendered":"Information Processing Society of Japan Natural Language Meeting"},"content":{"rendered":"<p>Notes from the 2006-09-12 to 13 Information Processing Society of Japan meeting.  <a href=\"http:\/\/nl-ipsj.r.dl.itc.u-tokyo.ac.jp\/\">The Information Processing Society of Japan Special Interest Group on Natural Language Processing<\/a> holds bi-monthly meetings all around Japan.  Two months ago, the meeting was in Hakodate.  This time, the meeting was in Shinjuku, very close to where I live, so I decided it would be a good chance to attend and see what research is going on in the field in Japan.  <\/p>\n<p>It was really interesting.  All but two of the presentations were in Japanese, which was a very nice chance to get up to speed on technical Japanese, and to see how presentations here go.  It was pretty tiring, though.  I also had a chance to talk with some of the members of the \u60c5\u5831\u7206\u767a\u4e16\u754c\u30cb\u30e5\u30fc\u30b9 group that I&#8217;m involved with.  <\/p>\n<p>If you are interested in reading some very superficial comments about the papers that I saw in Tuesday&#8217;s session, click to read more&#8230;<\/p>\n<p><!-- readmore --><\/p>\n<p>The first presentation was about concepts that cannot be found in<br \/>\ndictionaries or wikis.  How do these concepts operate like technical<br \/>\nterms, and how are they related?  It looks like the input is books:<br \/>\nthey use scanners and OCR over books.  
The presenter builds a matrix of technical<br \/>\nterms over the publications.  Given a term, you can then pick out the<br \/>\ncontexts that it can appear in, and find related terms. <\/p>\n<p>&#8212;<\/p>\n<p>The second presentation was from New York University &#8211; I met him at SIGIR &#8211;<br \/>\nNobuyuki Shimizu &#8211; &#8220;Knowledge Frame Extraction From Navigational Route<br \/>\nInstruction&#8221; <\/p>\n<p>When giving directions to a robot or a similar agent, how do you plan the path?<br \/>\nThere are three actions: turn left\/right, go straight, and enter a door.<br \/>\nThe system takes those directions and fills a frame with them.  <\/p>\n<p>&#8212;<\/p>\n<p>Shimomura-san from Okayama University.  <\/p>\n<p>He is trying to determine the agent, theme, and goal of sentences.  So<br \/>\nthe problem is to assign roles to the actors in the sentence.<br \/>\nIt turns out they&#8217;ve been studying many of these roles, and today&#8217;s<br \/>\npresentation is on &#8220;Reason&#8221; and &#8220;Opponent&#8221;.  They use EDR to get<br \/>\ninformation on nouns, and have some interesting rules that use EDR<br \/>\ncategories and particles for information on how to classify nouns<br \/>\n(actor, agent, theme, goal, etc.)  <\/p>\n<p>&#8212;<\/p>\n<p>University of Tokyo &#8211; Hiroko Ishida-san &#8211; &#8220;Study of sensory classification<br \/>\nas co-occurring terms with imitative words&#8221; <\/p>\n<p>This is about retrieving information from medical databases using<br \/>\nnormal Japanese, and the giongo \/ gitaigo (onomatopoeia and mimetic words) associated with describing<br \/>\nsounds or feelings in organs.  <\/p>\n<p>They used goo \u30d8\u30eb\u30b9\u30b1\u30a2 (goo Healthcare) as a resource, as well as a giongo\/gitaigo<br \/>\ndictionary, and a medical dictionary.  They hand-classified about 2300<br \/>\nterms from goo into about 8 categories by the part of the body they<br \/>\nrelate to.  
<\/p>\n<p>&#8212;<\/p>\n<p>I had lunch with Yoshioka-sensei, Mori-sensei, and others.<\/p>\n<p>&#8212;<\/p>\n<p>Yoshioka&#8217;s presentation: Classification of Anchor Text for Web<br \/>\nInformation Applications<\/p>\n<p>He looked at the anchor text in 100 gigabytes of web pages (from the<br \/>\nNTCIR web test collection) and how inter- and intra-site link anchor<br \/>\ntext differs.  That comes to about 100,000,000 links.  There is some interesting<br \/>\nanalysis of the link content.  He wants to classify link text into 8<br \/>\ncategories.  <\/p>\n<p>&#8212;<\/p>\n<p>Hirao Kazuki &#8211; &#8220;Web Search Result Clustering Based on Structure of<br \/>\nCompound Nouns&#8221; &#8211; Okayama University<\/p>\n<p>The idea here is to use the compound nouns in Japanese documents to<br \/>\ncluster them.  Also, since compound noun composition is fairly easy to<br \/>\nunderstand, we can use that to build a hierarchy of clusters.<br \/>\n\u8907\u5408\u540d\u8a5e is Japanese for compound noun.  You can break them up into a<br \/>\nsupplemental concept and a main concept (\u88dc\u8db3\u90e8 and \u4e3b\u8981\u90e8), which can be used<br \/>\nto build a concept hierarchy.  They build their clusters, and then have<br \/>\na nice way of labeling the clusters.  In particular, if the cluster<br \/>\nwas generated by a keyword that is a sub-concept that they all share,<br \/>\nthen they make a label like &#8220;XX Recipes&#8221;.  <\/p>\n<p>&#8212;<\/p>\n<p>Hiroyuki Sakai &#8211; &#8220;Estimation of Impact Contained in Articles about<br \/>\neach Company in Financial Areas&#8221; &#8211; Toyohashi University<\/p>\n<p>This also looks like a very interesting and relevant (to my work)<br \/>\npaper.  He is talking about identifying documents and sentences that<br \/>\nwill have an &#8220;impact&#8221; on business.  
I&#8217;m having a really hard time<br \/>\nunderstanding this presentation (the presenter speaks quickly) but it<br \/>\nlooks like they compare the stock price on the day before and the day<br \/>\nafter the date of the news article, and use that to calculate the<br \/>\nimpact.  In general, I think this is a very good approach (and for<br \/>\nlonger periods of time, like over a week, look at the week-long behavior<br \/>\nof the stock price and the centroid of articles over that week.)  <\/p>\n<p>They extract some terms that have a big impact (via entropy) on the<br \/>\ncompany.  <\/p>\n<p>&#8212;<\/p>\n<p>Daisuke Matsuzaki, Toshinori Watanabe, Hisashi Koga, Nuo Zhang &#8211;<br \/>\n&#8220;Document Relation Analysis by Data Compression&#8221; &#8211; University of<br \/>\nElectro-Communications <\/p>\n<p>I wasn&#8217;t particularly interested in this paper, and I&#8217;ve seen work<br \/>\nbefore on using compression to do document categorization.<\/p>\n<p>&#8212;<\/p>\n<p>Tetsuya Sakai &#8211; &#8220;Controlling the Penalty on Late Arrival of Relevant<br \/>\nDocuments in Information Retrieval Evaluation with Graded Relevance&#8221; &#8211;<br \/>\nKnowledge Media Laboratory, Toshiba R&#038;D Center.<\/p>\n<p>I&#8217;ve read one of his papers on Q-measure before, and this work seems quite<br \/>\nsimilar.  <\/p>\n<p>&#8212;<\/p>\n<p>Hiroka Kimura, Toshinori Watanabe, Hisashi Koga, Nuo Zhang &#8211; &#8220;A New<br \/>\nDocument Retrieval Method Using LZ78 Compression Function&#8221; <\/p>\n<p>They did some summarization experiments using artificial documents.<br \/>\nThey had a few strategies: centroid summarization, most similar<br \/>\nsummarization, and most unique summarization.  I didn&#8217;t really follow<br \/>\nwhat the documents were like and what they evaluated against.  They<br \/>\nalso used Japanese news documents to do real summarization with human<br \/>\nmodels to compare against.  
I don&#8217;t understand how the summaries were<br \/>\nscored either actually&#8230;  People liked centroid and most novel types<br \/>\nof summarization most.  It looked like it was more of a human scoring<br \/>\nevaluation than anything else.  <\/p>\n<p>This particular paper was the focus of a lot of questions.  The main<br \/>\nthrust of the questioning was &#8220;how is this different from a<br \/>\nvector-space model for summarization?&#8221;  The<br \/>\ninteresting part of the answer is that this approach really works on<br \/>\nbinary data, so applying it to text is a first step, but conceivably<br \/>\nthese approaches can be applied to pictures or video as well.<br \/>\nCertainly I agree that is true, but I&#8217;m not sure if there is a strong<br \/>\nargument to be made for how to interpret the results over those kinds<br \/>\nof media.  <\/p>\n<p>&#8212;<\/p>\n<p>Junichi Fukumoto, Tsuneaki Kato, Fumito Masui, Tatsunori Mori, Noriko<br \/>\nKando &#8211; &#8220;An Automatic Evaluation of Question Answering using Basic<br \/>\nElements&#8221; &#8211; Ritsumeikan University, University of Tokyo, Mie<br \/>\nUniversity, Yokohama National University, National Institute of<br \/>\nInformatics<\/p>\n<p>An overview of the BE (Basic Elements) evaluation method for Question Answering.<br \/>\nIt will be used in the QAC at NTCIR this year.  They have also created<br \/>\na Japanese version of a BE creator that runs over CaboCha output.  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Notes from the 2006-09-12 to 13 Information Processing Society of Japan meeting. The Information Processing Society of Japan Special Interest Group on Natural Language Processing holds bi-monthly meetings all around Japan. Two months ago, the meeting was in Hakodate. 
This time, the meeting was in Shinjuku, very close to where I live, so I decided [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/138"}],"collection":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/comments?post=138"}],"version-history":[{"count":0,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/138\/revisions"}],"wp:attachment":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/media?parent=138"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/categories?post=138"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/tags?post=138"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}