{"id":158,"date":"2006-11-29T13:24:00","date_gmt":"2006-11-29T04:24:00","guid":{"rendered":"https:\/\/fugutabetai.com\/blog\/2006\/11\/29\/international-conference-on-asian-digital-libraries-icadl2006-in-kyoto-day-3\/"},"modified":"2006-11-29T13:24:00","modified_gmt":"2006-11-29T04:24:00","slug":"international-conference-on-asian-digital-libraries-icadl2006-in-kyoto-day-3","status":"publish","type":"post","link":"https:\/\/fugutabetai.com\/blog\/2006\/11\/29\/international-conference-on-asian-digital-libraries-icadl2006-in-kyoto-day-3\/","title":{"rendered":"International Conference on Asian Digital Libraries (ICADL2006) in Kyoto day 3"},"content":{"rendered":"<ul>\n<li><a href=\"#day3kikori\">Kikori-KS: An Effective and Efficient<br \/>\n  Keyword Search System for Digital Libraries in XML<\/a><\/li>\n<li><a href=\"#day3grouping\">Supporting Efficient Grouping and<br \/>\n  Summary Information for Semistructured Digital Libraries<\/a><\/li>\n<li><a href=\"#day3composition\">Functional Composition of Web<br \/>\n  Databases<\/a><\/li>\n<li><a href=\"#day3wikipedia\">Integration of Wikipedia and a<br \/>\n  Geography Digital Library<\/a><\/li>\n<li><a href=\"#day3hierarchicalsummarization\">Impact of Document<br \/>\n  Structure to Hierarchical Summarization<\/a><\/li>\n<li><a href=\"#day3books\">Indexing All the World&#8217;s Books: Future<br \/>\n  Directions and Challenges for Google Book Search<\/a><\/li>\n<li><a href=\"#day3onelaptop\">One Billion Children and Digital<br \/>\n  Libraries: With Your Help, What the $100 Laptop and Its Sunlight<br \/>\n  Readable Display Might Enable<\/a><\/li>\n<li><a href=\"#day3searchpanel\">Next-Generation Search panel<\/a><\/li>\n<\/ul>\n<p><!-- readmore --><\/p>\n<h3>Session 8: Semistructured Data XML<\/h3>\n<p><a name=\"day3kikori\"><\/p>\n<h4>\n&#8220;Kikori-KS: An Effective and Efficient Keyword Search System for<br \/>\nDigital Libraries in XML&#8221;, <i><br \/>\nToshiyuki Shimizu (Kyoto University Japan), Norimasa 
Terada (Nagoya<br \/>\nUniversity), Masatoshi Yoshikawa (Kyoto University)<br \/>\n<\/i><br \/>\n<\/h4>\n<p><\/a><br \/>\nMany DL systems are starting to use XML documents, and we would like<br \/>\nto be able to search taking advantage of XML&#8217;s structure.  They want<br \/>\nto target users who are not familiar with XML, so they use keyword<br \/>\nsearches.  Their main contribution is a user-friendly &#8220;FetchHighlight&#8221;<br \/>\nuser interface, and a storage system that is well suited to XML.  They<br \/>\nstore the XML documents in a relational database.  They describe four<br \/>\nways of returning XML elements, depending on whether you rank documents or<br \/>\nelements, etc.  Their FetchBrowse method aggregates relevant elements by<br \/>\ndocument, and ranks them in document order.  <\/p>\n<p><P\/><\/p>\n<p>With XML you have to handle a huge number of document fragments:<br \/>\n16,000,000 fragments for 17,000 documents.  They store these in a database<br \/>\nwith the XRel schema (designed specifically for XML documents.)  They<br \/>\ntranslate keyword queries into SQL queries.  Their IR system uses a vector<br \/>\nspace model with term weights in a tf*idf-style method.  <\/p>\n<p><P\/> <\/p>\n<p>They used a 700MB dataset from INEX with 40 queries.  <\/p>\n<p><a name=\"day3grouping\"><\/p>\n<h4>\n&#8220;Supporting Efficient Grouping and Summary Information for<br \/>\nSemistructured Digital Libraries&#8221;, <i>Minsoo Lee, Sookyung Song, Yunmi<br \/>\nKim (Ewha Woman&#8217;s University, Korea), Hyoseop Shin (Konkuk University,<br \/>\nKorea)<\/i>.<br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>When documents are in XML, you need to have grouping capabilities to<br \/>\ngive users an overview or summary of the documents in the collection.<br \/>\nXQuery is a language that allows you to query XML documents.  XQuery<br \/>\ndoes not specifically support a group-by syntax, so it is quite<br \/>\ndifficult to create queries that properly do grouping.  
Their work<br \/>\nfocuses on extending XQuery to provide a group-by clause.  They give<br \/>\nmany examples of how the group-by clause makes it easier to understand<br \/>\nthe query compared to writing it without.  <\/p>\n<p><P\/>  They did not really talk about summarization at all.  <\/p>\n<hr>\n<h3>Session 9a: Information Organization<\/h3>\n<p><a name=\"day3composition\"><\/p>\n<h4>\n&#8220;Functional Composition of Web Databases&#8221;, <i>Masao Mori, Tetsuya<br \/>\nNakatoh, Sachio Hirokawa (Kyuushyuu University, Japan)<\/i>.<br \/>\n<\/h4>\n<p>When searching for content in multiple web databases, you need to have<br \/>\nmultiple browsers open to search in different systems.  So they<br \/>\npropose a system that merges results from multiple data sources.  They<br \/>\nautomatically generate CGI programs on a server based on user settings<br \/>\nto connect to the various web databases, which then send responses back<br \/>\nto the user.  The user specifies settings (which databases to use,<br \/>\nconnections to make between them, etc.) using a fairly<br \/>\ncomplicated-looking scripting language via a web-form interface.  I can&#8217;t help but<br \/>\nthink that this is a pretty complicated way to do federated search, but<br \/>\nit does look quite extensible and general.  <\/p>\n<p><a name=\"day3wikipedia\"><\/p>\n<h4>\n&#8220;Integration of Wikipedia and a Geography Digital Library&#8221;, <i><br \/>\nEe-Peng Lim, Zhe Wang, Darwin Sadeli, Yuanyuan Li, Chew-Hung Chang,<br \/>\nKalyani Chatterjea, Dion Hoe-Lian Goh, Yin-Leng Theng, Jun Zhang,<br \/>\nAixin Sun (Nanyang Technological University, Singapore)<\/i><br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>G-Portal is a geography digital library with metadata for geographical<br \/>\nweb resources.  The reason to integrate the two is that Wikipedia does<br \/>\nnot have digital library features, and from G-Portal&#8217;s point of view,<br \/>\nWikipedia is a good resource for geography education.  
They created an<br \/>\nautomatic login to G-Portal from Wikipedia so users can navigate from<br \/>\nWikipedia to G-Portal content, and they modify G-Portal to show<br \/>\nselected information based on displayed Wikipedia information (their<br \/>\n&#8220;metadata centric display&#8221;.)  They also add G-Portal content<br \/>\nlinks into Wikipedia by using a reverse proxy so they do not have to<br \/>\nactually change anything in the wiki.  Based on preliminary studies<br \/>\nwith students, they have had a good response.  They are now looking at<br \/>\nhow they can update their metadata content when Wikipedia changes.  <\/p>\n<p><a name=\"day3hierarchicalsummarization\"><\/p>\n<h4>\n&#8220;Impact of Document Structure to Hierarchical Summarization&#8221;,<br \/>\n<i>Fu Lee Wang (City University of Hong Kong), Christopher C. Yang<br \/>\n(Chinese University of Hong Kong)<\/i>.<br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>This is single-document summarization.  Their method computes sentence scores<br \/>\nat each node in the hierarchical structure based on the scores of the<br \/>\nsentences under the node.  <\/p>\n<p><P\/> <\/p>\n<p>For multiple-document summarization, they looked at two sets of news<br \/>\nstory documents (from CNN.com) and plotted the distribution of news<br \/>\nstories on a timeline.  They look at the set of multiple documents as<br \/>\na large document that should be fit into a hierarchical structure and<br \/>\nthen summarized.  They have three ways to build the trees:<\/p>\n<p><P\/>  A balanced tree built over the documents as arranged on a<br \/>\ntimeline.  A second approach organizes by having child nodes with<br \/>\nequal and non-overlapping intervals.  (The tree is unbalanced.)  Their<br \/>\nthird approach is to structure by event topics.  For the summarization<br \/>\nthey generate a summary for each range block.  If the summary is too<br \/>\nlarge, they partition into children.  
<\/p>\n<p><P\/>  They look at the intersection of sentences between the three<br \/>\nsummary types, but I don&#8217;t know what that really shows or means.<br \/>\nThey also conducted intrinsic and extrinsic evaluation of the<br \/>\nsummaries.  They had human summaries created at a 5% compression<br \/>\nratio.  They report the &#8220;precision&#8221; of summaries at the 5% compression ratio, but I don&#8217;t<br \/>\nknow how they are computing that.  They show that the &#8220;summarization<br \/>\nby event&#8221; performs better than the other two.  They also performed an<br \/>\nextrinsic evaluation using a Q&#038;A task.  <\/p>\n<p><P\/><\/p>\n<p>According to that evaluation, the degree of tree spread-out does<br \/>\nnot affect the results of summarization.  Also, summarization by event<br \/>\nagain outperformed the other two methods (but not at a compression ratio<br \/>\nof 20%, where it is about the same.)  So they conclude that<br \/>\nhierarchical summarization is useful for high compression ratios.  <\/p>\n<p><P\/> <\/p>\n<p>I asked for a clarification and indeed the topics are assigned<br \/>\nmanually.  I asked him later if he thought about looking at the DUC<br \/>\ndata, but he said that due to the manual event categorization<br \/>\nnecessary, they did not look at that data set.  We talked a little bit<br \/>\n(briefly) about current automatic topic clustering, but he didn&#8217;t seem<br \/>\ntoo interested.  <\/p>\n<hr>\n<h3>Session 10: Keynote and Invited Talks<\/h3>\n<p><a name=\"day3books\"><\/p>\n<h4>\n&#8220;Indexing All the World&#8217;s Books: Future Directions and Challenges for<br \/>\nGoogle Book Search&#8221;, <i><br \/>\nDaniel Clancy (Google, USA).<br \/>\n<\/i><br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>People are lazy, and if information is readily available they will go<br \/>\nthere to get information instead of somewhere else that is slightly<br \/>\nharder to access.  
He has a poll question: How many of you have been<br \/>\nin a library in the past year?  (In this audience, almost everyone; in<br \/>\nnormal audiences, it can easily fall to 30%-50%!)  The internet<br \/>\nmakes it very easy to find some information, but there is still a vast<br \/>\namount of information that is not accessible on the internet.  <\/p>\n<p><P\/><\/p>\n<p>They have two initiatives: the partner program (15%, books that are<br \/>\nin print) and the library program (85%, and most of these books are out of<br \/>\nprint and\/or out of copyright.)  Of the library content,<br \/>\nabout 15% is in print, 65% has unclear copyright status, and 20% is public domain<br \/>\n(books from before 1923).<br \/>\n92% of the world&#8217;s books are neither generating revenue for the<br \/>\ncopyright holder nor easily accessible to potential readers.  <\/p>\n<p><P\/><\/p>\n<p>Gives a demo of Google Books.  His first search, &#8220;Kyoto History&#8221;,<br \/>\nprompted him to talk about ranking problems.  On the web, because of<br \/>\nthe inter-connectivity, Google is able to use that linking information<br \/>\nto improve result ranking.  That link structure doesn&#8217;t exist within<br \/>\nbooks, so they have to look at other ways to improve rankings.  <\/p>\n<p><P\/><\/p>\n<p>They&#8217;ve designed the interface as a real browsing experience.  There<br \/>\nis a reference page that tries to give you information about the<br \/>\nbook, which has references (books that refer to this book) and related<br \/>\nbooks, and references from scholarly works.  This is for a publisher<br \/>\nbook.  <\/p>\n<p><P\/><\/p>\n<p>For books whose rights status they don&#8217;t know, they show a short<br \/>\n&#8220;snippet&#8221; view.  <\/p>\n<p><P\/><\/p>\n<p>For full-view books you get the whole thing.  For example, the Diary<br \/>\nof George Washington is the whole public domain book.  <\/p>\n<p><P\/><\/p>\n<p>About finding books: the long tail.  
There is a really large amount of<br \/>\ndiversity.  They are moving towards blending book results into the<br \/>\nnormal web search content.  The result links you right into the page<br \/>\nwhere the information is.  People are comfortable with parsing a book,<br \/>\nand some people have a hard time getting oriented to web pages since<br \/>\ntheir layout is all different.  <\/p>\n<p><P\/><\/p>\n<p>On their process for scanning: there are logistics for moving books,<br \/>\ndoing the scanning, and storage.  When Google looked at this market,<br \/>\nnobody had developed a scanner that fits their needs for the really<br \/>\nlarge-scale scanning they want to do.  For 30 million books, they needed<br \/>\nto make their own technology that is cost-effective.  He can&#8217;t release<br \/>\nthe technology that they are using though.  It sounds like they are<br \/>\ndoing some sort of grid-type thing with lower-quality components but<br \/>\nuse software to detect and fix problems, lots of redundancy, etc.  As<br \/>\nthe software improves over time you can always go back and fix that<br \/>\nstuff.  There are many problems with poor-quality books, tight bindings, etc.  They<br \/>\nalso needed to make sure that they would not damage the books.  They<br \/>\nhave extensive processing to correct for yellowed pages, etc.  There are problems<br \/>\nwith getting the right page numbers, OCR on math, spelling correction<br \/>\nfor old, old books with olde English, intentional spelling errors,<br \/>\nincorrect metadata for books, mixed languages, layout order for pages,<br \/>\nand they are working on support for CJK languages.  <\/p>\n<p><P\/><\/p>\n<p>How do you create a rich link structure that relates all these books<br \/>\nand information outside of books?  Books are not always individual<br \/>\nunits, and indeed have relationships to other books, references,<br \/>\ncriticism and reviews, and so on.  
They would like to have a similar<br \/>\nkind of link structure, so perhaps we will allow people to build up<br \/>\nlinks in books.  <\/p>\n<p><P\/> <\/p>\n<p>Discussion notes: <\/p>\n<ul>\n<li>Role of the Library in the Future<\/li>\n<li>Access for everyone everywhere<\/li>\n<li>Problems with current institutional subscription models<\/li>\n<li>Publishing and User Generated Content<\/li>\n<li>How can Google help support research?<\/li>\n<li>Role of private companies<\/li>\n<\/ul>\n<p>An interesting question relating the Great Library of Alexandria to<br \/>\nGoogle: what happens if Google burns down?  For the library partners,<br \/>\nGoogle actually gives them a copy of the data as they create it, so<br \/>\nthey will not be the only people with the data.<\/p>\n<p><a name=\"day3onelaptop\"><\/p>\n<h4>\n&#8220;One Billion Children and Digital Libraries: With Your Help, What the<br \/>\n$100 Laptop and Its Sunlight Readable Display Might Enable&#8221;,<br \/>\n<i><br \/>\nMary Lou Jepsen (One Laptop Per Child, USA)<br \/>\n<\/i><br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>Has a demo of the cute little laptop.  It is small and has two cute<br \/>\n&#8220;ears&#8221; for wireless I guess?  The single laptop is cheaper than a<br \/>\ntextbook in the 3rd world.  They want to provide entire libraries to<br \/>\nkids.  <\/p>\n<p><P\/><\/p>\n<p>The display is the most costly and power-intensive part of the<br \/>\nlaptop.  Mary Lou Jepsen&#8217;s job was to reduce the cost of the display.<br \/>\nThe laptop is being produced by Quanta, the largest laptop maker in<br \/>\nthe world (about 20 million a year.)  Has a book mode (the screen flips<br \/>\naround) and in general looks pretty cool.  Average power consumption<br \/>\nis 2 watts for OLPC, compared to 45 watts for normal laptops (wow!)<br \/>\nSo they go to great lengths to turn off components that are not in<br \/>\nuse.  
They have a nice trick where they put a memory buffer on the<br \/>\ndisplay, and if nothing is changing on it, they let the buffer do the<br \/>\nupdates and turn off the CPU.  It has 512MB of flash, no hard drive<br \/>\nthough.  <\/p>\n<p><P\/><\/p>\n<p>Interesting comment: this isn&#8217;t a product, it is a global humanitarian<br \/>\ncause.  Ownership is key: the laptop does not break when the child<br \/>\nowns it.  The mesh network works: you have to get backhaul<br \/>\nconnectivity to the school, but after that, distributed<br \/>\nlaptops spread connectivity across the area.  They have a 640&#215;480 video<br \/>\ncamera for $1.50!  It uses an AMD Geode GX2-500.  3 USB ports hidden<br \/>\nunder rabbit ears, SD card slot, mic and stereo, video camera.  Mode<br \/>\n1: 1200&#215;900 greyscale sunlight readable, mode 2: 1024&#215;768 color.  1<br \/>\nWatt with backlight off (vs about 7 Watts normal), backlight off ~ 0.2<br \/>\nwatts.  Innovative changes in the LCD: they changed the pixel layout to use<br \/>\nfewer of the expensive color gels.  200dpi at 6 bits per pixel (same as<br \/>\na book) in b&#038;w mode.  Color when the backlight is on; room light increases the effective<br \/>\nresolution.<\/p>\n<p><P\/><\/p>\n<p>They are also making a $100 disk farm server that stores lots of data,<br \/>\na $10 DVD player (parts are $8, but licensing fees are $10! They are<br \/>\ntrying to get the licensing fee waived for that project),<br \/>\nand a $100 projector.  The projector uses the same screen as the laptop!<br \/>\nThis reduces lamp cost because the lamp doesn&#8217;t need to be so expensive: a $1 lamp<br \/>\nand 30 Watt power consumption.  The mechanical design has no moving parts.<br \/>\nIt is droppable, with a shock-mounted LCD and a replaceable bumper.<br \/>\nDirt\/moisture resistant, has a handle, can put a strap on it.  They<br \/>\nget about .5 &#8211; 1 kilometer range with rabbit ears (2-3x more than<br \/>\nwithout!)  
Input devices: game controller, sealed keyboard, dual-mode<br \/>\ntouchpad (middle is finger pad, or you can write with a stylus across<br \/>\nthe whole thing.)  They are designing different types of chargers: car<br \/>\nbattery, human power, etc.  They had a crank, but ergonomically that<br \/>\ndidn&#8217;t work well, so they&#8217;ve moved to a foot pedal and string puller<br \/>\nthing.<\/p>\n<p><P\/><\/p>\n<p>All of the software they ship on there is GPL.  <\/p>\n<hr>\n<h3>Session 11: ICADL2006 \/ DBWeb2006 Joint Panel<\/h3>\n<p><a name=\"day3searchpanel\"><\/p>\n<h4>&#8220;Next-Generation Search&#8221;, <i>Daniel Clancy (Google), Masaru<br \/>\nKitsuregawa (University of Tokyo), Zhou Lizhu (Tsinghua University of<br \/>\nChina), Wei-Ying Ma (Microsoft Research Asia, China), Hai Zhuge<br \/>\n(Chinese Academy of Sciences)<\/i><br \/>\n<\/h4>\n<p><\/a><\/p>\n<p>Kitsuregawa: Trends.  Nobody knows the future, and on the web things<br \/>\nmove very fast (the Friendster to MySpace migration, the acute increase in<br \/>\nMySpace value, YouTube.)  Transition from Search to Service.<br \/>\nCurrently the value of search results is not high enough; people will<br \/>\nnot pay for it, so it is currently advertising-supported.  Businesses<br \/>\nmust respond within 1 second, but universities do not have to be<br \/>\nrestricted to that limit.  He is looking at this in the information<br \/>\nexplosion project.  He gives a demo of temporal evolution search,<br \/>\nshowing how information changes over time in sites, and how linking<br \/>\npatterns between them change.  <\/p>\n<p><P\/> <\/p>\n<p>He also started to speak about the Information Grand Voyage Project.<br \/>\nThe main theme is <strong>From Search to Service<\/strong>.  <\/p>\n<p><P\/><\/p>\n<p>Dr. Wei-Ying Ma, Microsoft Research Asia.  Improving Search: can we<br \/>\nunderstand content (web pages) or people&#8217;s queries better?  
There are<br \/>\nbottom-up approaches (IR, DB, ML, IE, DM, Web 2.0) and top-down<br \/>\napproaches (semantic web.)  One research direction his group has been<br \/>\nworking on is moving from web pages to web objects (entities).  For<br \/>\nexample, Microsoft Libra.  The idea is that for the research community<br \/>\nthere are many important objects (conferences, authors, papers,<br \/>\njournals, interest groups, etc.)  You can search for different types of<br \/>\nobjects.  Input a search topic (e.g., association rules) and get a ranked<br \/>\nlist of papers, or a ranked list of important people in the field, or<br \/>\nimportant conferences \/ journals \/ interest groups.  This is an<br \/>\nexample of moving search engines from the page level to the object level.  <\/p>\n<p><P\/><\/p>\n<p>They have another project (Guanxi) for doing general search on the<br \/>\ninternet over objects.  It looks like they try to model important<br \/>\ntypes of objects (at least people for sure) and extract information<br \/>\nabout them.  They are moving from doing just relevance ranking to providing<br \/>\nintelligence in the search results.  They are also moving to personalized<br \/>\nsearch.  Also, can we integrate more information from the deep\/hidden<br \/>\nweb (databases)?  Searching is moving from individual to social as<br \/>\nwell (MySpace, networking sites, and so on.)  <\/p>\n<p><P\/><\/p>\n<p>System and Infrastructure: Process-centric to Data-centric.  Tightly<br \/>\ncoupled to loosely coupled.  They are trying to re-architect their<br \/>\nsearch engine so that it is built in modularized layers.  They are<br \/>\nfocusing on building an infrastructure along with IDE tools to make<br \/>\ndevelopment of these systems easier.  WebStudio is another layer in<br \/>\nthis search-framework architecture.  <\/p>\n<p><P\/><\/p>\n<p>Global advertising was $500 billion vs $200 billion in packaged<br \/>\nsoftware in 2005.  So search is important.  
They are looking to focus<br \/>\non global internet economies.  <\/p>\n<p><P\/><\/p>\n<p><strong>Hai Zhuge<\/strong>.  &#8220;Completeness of Query Operations on<br \/>\nResource Space&#8221;.  People have not paid enough attention to semantics<br \/>\nbecause of a lack of computational power.  How do you normalize the<br \/>\norganization of resources in a large-scale decentralized network?  He<br \/>\nintroduces a resource space model that can be used to locate objects<br \/>\nin the space, and normal forms over them.  He talks about different<br \/>\noperations over the space.  <\/p>\n<p><P\/>  Content is the key, but it is not just text, images, and keywords, and it is not<br \/>\nstatic.  It needs to be dynamically clustered.  Gave a demo of the<br \/>\ndigital cave project, lots of video and scrolling.  A combination of<br \/>\ntext, images, and sounds.  <\/p>\n<p><P\/>  I really have a very hard time seeing how the theory he has<br \/>\nmostly been talking about connects to the search aspects in practical<br \/>\nterms; the resource space model is a bit vague and unclear to me right<br \/>\nnow.  <\/p>\n<p><P\/><\/p>\n<p><strong>Zhou Lizhu<\/strong>.  &#8220;The Next Generation Search Engines &#8211;<br \/>\nPersonal Search Engines.&#8221;  Search will become a daily activity, but<br \/>\ninformation needs vary drastically from person to person, so we need<br \/>\npersonal search engines.  Search results should be easily navigated by<br \/>\nclustering, as a graph, and indexed by semantic terms.  You should be<br \/>\nable to include technology from other fields (AI, NLP, DB, Multimedia)<br \/>\nto build the search engines easily.  We need to develop a<br \/>\nsystems-building framework for personal search engines.  They have an<br \/>\nexample of the C-Book search engine for searching for books.<br \/>\nThey have a platform for search engine construction: SESQ for<br \/>\nconstructing vertical search engines.  
Users supply a schema, some<br \/>\nseed URLs, and some external rules, and the system does the rest.  <\/p>\n<p><P\/><\/p>\n<p><strong>Dr. Daniel Clancy<\/strong>.  The reason we care about<br \/>\ndifferent directions in the future of search is so that we can do<br \/>\nresearch that has an impact.  The areas we publish in should grow and<br \/>\nhave an impact.  Looking at the history of how things evolved:  <b>users<br \/>\nlike it simple<\/b>.  After many usability studies and assessments,<br \/>\npeople like things to be simple.  The other thing is <b>one revolution at<br \/>\na time please<\/b>.  The web revolution happened and users built this<br \/>\ngreat link structure, and then Google came along and took advantage of<br \/>\nthat.  They did not say &#8220;make good hyperlinks and tags then I can make<br \/>\na great search&#8221; &#8211; they used what was there.  Search and ads were not<br \/>\ncoupled together; they first got search right and then got ads right.<br \/>\nAs a researcher I love the Semantic web, but as a user I am cautious<br \/>\nabout the possibilities.  <\/p>\n<p><P\/><\/p>\n<p>How does it work today?  Users give you a query and it works well for<br \/>\nsimple things.  When you are looking for complex information, it has<br \/>\nmore trouble.  (e.g., I will be in Kyoto for three hours and will be<br \/>\non the western side of town; which temple should I go to?)  The future<br \/>\nwill be transitioning search to <strong>search as a<br \/>\ndialogue<\/strong>.  Personalization is important, but in reality<br \/>\nknowing the personal context provides very little<br \/>\ninformation compared to the query itself.  Also, people do not like<br \/>\nremembering lots of different places to go.  People like to go to one<br \/>\nplace and then let the system figure out what vertical application<br \/>\nthey are actually interested in.  Another area of big interest is<br \/>\nuser-generated content.  
How can we take advantage of user-generated<br \/>\ncontent and search within that content space, given that it is a very<br \/>\ndifferent type of content?  <\/p>\n<hr>\n<p><strong>Comments and discussion<\/strong>.<br \/>\nSemantic high-level search is very interesting, but there is a problem<br \/>\nin that users like simplicity.  Should queries be more natural<br \/>\nlanguage in the future?  <\/p>\n<p><P\/><\/p>\n<p>Daniel Clancy comments that natural language doesn&#8217;t seem to be up to<br \/>\nthe level needed for this challenge yet.  Users type in short queries<br \/>\npartly because that is what we have trained them to do, but maybe they<br \/>\ncan be re-trained, and it will happen at some point in the future.  It<br \/>\nis a good area for research.  <\/p>\n<p><P\/>  Dr. Wei-Ying Ma: they initially thought that Ask Jeeves was<br \/>\ntheir main competitor since they thought that natural language search<br \/>\nwould be the main area.  But still NLP does not seem to be up to the task.  <\/p>\n<p><P\/> Professor Kitsuregawa: When you look at the long tail, there are<br \/>\npeople that make natural language queries.  The desire is there, so we<br \/>\nshould try to pursue that in the future.  One way to reduce the<br \/>\nambiguity in NLP is by using interaction, so going to dialogue first<br \/>\nmight be a better idea.  Right now they are only trying to apply NLP<br \/>\nto sentiment analysis.  <\/p>\n<p><P\/>  Dr. Lizhu agrees with this.  There is still no formal theory to<br \/>\nrepresent the semantics of natural language, which is a big problem.  <\/p>\n<p><P\/> Kitsuregawa: the main problem is disambiguation, so maybe through<br \/>\ndialogue we can perform the disambiguation better.  <\/p>\n<p><P\/> Makiko Miwa:  Perhaps it is possible to identify the users&#8217; goal,<br \/>\nor at least the genre and date of update of the content.  
For example, for<br \/>\npeople that monitor the same information every day, or other people<br \/>\nthat don&#8217;t want to read personal homepages but just want to look at<br \/>\nofficial sites.  So what about users&#8217; context information processing?  <\/p>\n<p><P\/> Professor Lizhu:  some context can be gained from log<br \/>\nanalysis.  It still looks difficult to use this information to<br \/>\ndetermine the users&#8217; goals.  There is research on this in Europe that<br \/>\nincludes more user context information and interaction.  <\/p>\n<p><P\/> Daniel Clancy: when people go to scholar.google.com or<br \/>\nMicrosoft&#8217;s Libra then they are giving you some context.  It is<br \/>\ndifficult to understand a user because what they want really depends<br \/>\nand changes with the circumstances.  <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Kikori-KS: An Effective and Efficient Keyword Search System for Digital Libraries in XML Supporting Efficient Grouping and Summary Information for Semistructured Digital Libraries Functional Composition of Web Databases Integration of Wikipedia and a Geography Digital Library Impact of Document Structure to Hierarchical Summarization Indexing All the World&#8217;s Books: Future Directions and Challenges for Google Book 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/158"}],"collection":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/comments?post=158"}],"version-history":[{"count":0,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/posts\/158\/revisions"}],"wp:attachment":[{"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/media?parent=158"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/categories?post=158"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fugutabetai.com\/blog\/wp-json\/wp\/v2\/tags?post=158"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}