November 27, 2006

International Conference on Asian Digital Libraries (ICADL2006) in Kyoto day 1

Some general information: about 190 registered participants on the first day. There were 220+ abstract submissions and 170 paper submissions, from which 46 full papers, 13 short papers, and 6 Asian posters were accepted.

Opening Keynote by Dr. Makoto Nagao

President of NICT (National Institute of Information and Communications Technology): "The Age of Content and Knowledge Processing".

He worked on a digital library system from 1998-2004 here at Kyoto University and integrated it into the university's library system. The internet is a great source of knowledge, including a lot of multimedia information, but a search returns so much that it is very hard to understand and visualize all of what comes back. There are three large projects in Japan looking at this problem:

  • Information Grand Voyage Project (Ministry of Economy, Trade, and Industry)
    • Technology for navigating multimedia information
    • Personalization for a user
    • Focused on industry development, 5 billion yen first year budget
  • Information Explosion Project (Ministry of Education, Culture, Sports, Science and Technology)
    • Information extraction, particularly from multi modal information that is massively expanding
    • Technology for scanning information
    • Focus on University researchers, 600 million yen for the first year
  • The Information Analysis Project (Ministry of Internal Affairs and Communications)
    • Filter out useful information from unreliable and/or incorrect information
    • Add information on reliability, trustworthiness, and authenticity
    • Based at NICT for five years, with a 300 million yen budget for the first year

Information analysis:

  • Surface analysis of information reliability
  • Reliability of web sites based on site meta-data

Deep analysis of reliability:

  • Cluster large amounts of information
  • Cluster summarization
  • Intention and emotion extraction from clustered information
  • Check and present contradictory information against dominant information, e.g., long-tail information against dominant information
  • Check consistency of the retrieved information and judge its accuracy against known academic knowledge (knowledge base, inference mechanism)
  • Check consistency against the temporal flow of information
The system incorporates information from the web into a large-scale knowledge database, with machine translation into Japanese before analysis.
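
As a rough illustration of the "cluster large amounts of information" and "cluster summarization" steps listed above (my own sketch, not the project's pipeline), documents can be grouped with TF-IDF and k-means, and each cluster "summarized" by the top-weighted terms of its centroid. The example documents and cluster count below are invented.

```python
# Minimal sketch of "cluster large amounts of information" and
# "cluster summarization" using TF-IDF + k-means.
# Illustrative only; the documents and cluster count are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "The drug was reported safe in a small clinical trial.",
    "Several blogs claim the drug causes severe side effects.",
    "Regulators approved the drug after reviewing the trial data.",
    "An anonymous forum post says the trial data were faked.",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# "Summarize" each cluster by the top-weighted terms of its centroid,
# and list the documents that fell into it.
terms = vectorizer.get_feature_names_out()
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    top_terms = [terms[i] for i in centroid.argsort()[::-1][:5]]
    print(f"Cluster {cluster_id}: {', '.join(top_terms)}")
    for doc, label in zip(documents, kmeans.labels_):
        if label == cluster_id:
            print("  -", doc)
```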

III. Digital Archives

NICT is working on the Kyoto Digital Archives. They aim to digitize cultural and historical assets, promote new local industry (while helping to save traditional crafts and performing arts), and conduct research on property rights. They have the "Nijo Castle Digital Archives Project", with 327 Kano Tan'yu paintings digitized.

He gave some details on their digital library preservation effort. They have also done studies on perceptual color recognition, and developed a system that uses 8 color components to record color.

More work is needed on image retrieval technology (image search by color) and on video: motion search, segmentation, and extraction of interesting images.

"Today's machine translation is nearly of usable quality to translate newspaper articles, introductory articles of various kinds, commentary papers, science and engineering papers, etc."

They have the Kyoto Project of World Cultural Exchange, and he showed a few slides on how that works.

Session 2a: Advanced Digital Archives

"Annotating the Web Archives - An Exploration of Web Archives Cataloging and Semantic Web", Paul H.J. Wu, Adrian K.H. Heok, Ichsan P. Tamsir (Nanyang Technological University, Singapore.)

They launched a web archive in October of this year, and have also developed an annotation tool to help give context to the archive categorization. This talk mostly focuses on cataloging and annotation. For example, the MOM (Singaporean Ministry of Manpower) dealt with an accident in 2004, and its website at the time had lots of information linking to the inquiry, speeches about it, and so on. Two years later, the site still has the minister's speeches and press releases, but no longer the inquiry status report or the FAQ about the incident; there is, however, new information about the law that resulted from that accident.

So the idea is that web pages are focused on the present, while web archives capture what was in the past. (Of course, this depends on the type of website; blog-driven or content-management sites almost always have accessible archives as well.)

The key features of their annotation tool are "context-aware annotation" and "ontology-aware annotation", which relate semantic content to the web content and look at agreement and differences in the text. Their annotation tool lets you highlight text on a website and link it into an ontology (created by someone); "context-aware" means that you incorporate the evidence from the text when linking things into the ontology.

They also allow you to annotate relational metadata between instances in the ontology, e.g., a particular speech is an instance of the general class of speeches.
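
To make the idea concrete, here is a hedged sketch of what such an annotation might look like as RDF triples, written with rdflib; it is my own illustration rather than the authors' code, and every namespace, URI, and property name in it is invented.

```python
# Hypothetical sketch of a "context-aware" annotation expressed as RDF
# triples with rdflib. The namespaces, URIs, and property names are all
# invented; this is not the authors' actual schema or tool.
from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

EX = Namespace("http://example.org/archive-ontology#")   # made-up ontology
ANN = Namespace("http://example.org/annotations#")       # made-up annotation space

g = Graph()
g.bind("ex", EX)
g.bind("ann", ANN)

archived_page = URIRef("http://example.org/webarchive/mom/2004/speech-042")
annotation = ANN["a1"]
speech = EX["MinisterSpeech_2004_042"]

# The highlighted excerpt is the textual evidence grounding the annotation.
g.add((annotation, EX.annotates, archived_page))
g.add((annotation, EX.evidenceText,
       Literal("The Minister addressed the committee of inquiry ...")))
g.add((annotation, EX.refersTo, speech))

# Relational metadata: this particular speech is an instance of the
# general "MinisterialSpeech" concept in the ontology.
g.add((speech, RDF.type, EX.MinisterialSpeech))
g.add((EX.MinisterialSpeech, RDFS.subClassOf, EX.Speech))

print(g.serialize(format="turtle"))
```

The point of the sketch is that the annotation carries both the textual evidence (the "context") and the link into the ontology (the "ontology-aware" part).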

After all of this tagging, it isn't clear to me what can be done automatically with the tools. When the content changes (which this method will let you detect), you still need to modify or adapt the ontology.


"Owlery: A Flexible Content Management System for 'Growing Metadata' of Cultural Heritage Objects and its Education Use in the CEAX Project", Kenro Aihara, Taizo Yamada, Noriko Kando (NII, Graduate University for Advanced Systems), Satoko Fujisawa (Graduate University for Advanced Systems), Yusuke Uehara, Takayuki Baba, Shigemi Nagata (Fujitsu Laboratories Ltd. Japan), Takashi Tojo, Tetsuhiko Awaji (Fujitsu Ltd. Japan), Jun Adachi (NII).

Metadata is difficult for cultural heritage objects because descriptions can vary greatly depending on the viewpoint. Names and titles can also change, since they were assigned recently relative to the age of these objects. There are also often multiple versions of the same object. They also want to be able to provide readable and understandable descriptions for different user groups (experts, children, etc.). It is inherently difficult to create an ontology in this field, since it is hard to achieve a general consensus, which means that the RDF approach is not suitable.

In their approach the metadata is separate from the factual data: descriptions are written over the data and kept separately. There can be multiple descriptions, some of which might target different audiences, and authorized users are allowed to add their own descriptions (or blog or wiki entries). Their framework also creates semantic links between works that share the same factual data, helping users to browse concepts.
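
Here is a minimal sketch of how factual data and layered descriptions might be kept separate, under my own reading of the talk; the class and field names are invented, and this is not the Owlery implementation.

```python
# Sketch of "descriptions layered over separate factual data"; my own
# reading of the idea, with invented field names, not the Owlery code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class FactualRecord:
    """Stable factual data about a cultural heritage object."""
    record_id: str
    object_type: str
    period: str

@dataclass
class Description:
    """One viewpoint on the object; several can coexist per record."""
    author: str
    audience: str  # e.g. "expert", "children"
    text: str

@dataclass
class Work:
    factual: FactualRecord
    descriptions: List[Description] = field(default_factory=list)

# Works that reference the same factual record are implicitly linked,
# which is what supports the concept browsing mentioned in the talk.
fact = FactualRecord("obj-001", "bronze mirror", "Nara period")
work = Work(fact, [
    Description("curator-a", "expert", "Cast bronze mirror with floral motif ..."),
    Description("teacher-b", "children", "A very old mirror made of metal ..."),
])

for d in work.descriptions:
    print(d.audience, "->", d.text)
```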

Implemented as Web Services on Apache Axis2 (Java) with PostgreSQL. There are three clients: the Owlery Web Browser, the (Java) Owlery client, and the CEAX client. When tested in a class, the students did read the simplified characters, and the teachers were surprised that the students tried to read actively, when they often give up after encountering unknown kanji.


"A Digital Library for Preservation of Folklore Crafts, Skills, and Rituals and its role in Folklore Education", Yung-Fu Chen, Po-Chou Chan, Kuo-Hsien Huang, Hsuan-Hung Lin.

First, an introduction to folklore. They have created a classification of folklore types, broadly Craft, Skill, Ritual, Artifact, etc. They have some relationships between them, e.g., "How to make the Taoist Bell?", "What is the Taoist Bell?", "How to use the Taoist Bell?". In their preliminary study, they built a system for demonstrating crafts, skills, and rituals, and use it for education.

Their metadata records use title, description, associated digital media, and other related information. They also have "relation" metadata that is compatible with Dublin Core; the main relations they use are "has part" and "is part of".
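
For concreteness, here is a small sketch of how the "has part" / "is part of" relations could be expressed with the standard Dublin Core terms (dcterms:hasPart and dcterms:isPartOf); the namespace and resource URIs are made up, and this is not the authors' actual encoding.

```python
# Sketch of "has part" / "is part of" relations expressed with Dublin
# Core terms via rdflib. The resource URIs are made up for illustration.
from rdflib import Graph, Namespace
from rdflib.namespace import DCTERMS

FOLK = Namespace("http://example.org/folklore/")  # hypothetical namespace

g = Graph()
g.bind("dcterms", DCTERMS)

ritual = FOLK["taoist-bell-ritual"]
craft = FOLK["making-the-taoist-bell"]

# A ritual "has part" the craft of making its instrument, and the craft
# "is part of" the ritual, mirroring the two main relations in the talk.
g.add((ritual, DCTERMS.hasPart, craft))
g.add((craft, DCTERMS.isPartOf, ritual))

print(g.serialize(format="turtle"))
```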

Their data consists of text and video created by folklore specialists. They have a web interface in Chinese and English for navigating through the data.


"A Digital Video Archive System of DNAP Taiwan", Hsiang-An Wang, Guey-Ching Chen, Chih-Yi Chiu, Jan-Ming Ho (Academica Sinica, Taiwan)

An introduction to their video archiving system. They transcode for different displays (the archive copy is high-bitrate MPEG-2, others are RM or WMV). They tried Dublin Core for metadata, but found it insufficient, and instead used ECHO with some minor modifications. They have a website for managing the video metadata. They do some subtitle recognition from the video data, and speech recognition as well (Mandarin Chinese). They also do shot detection, and use that to create a summary video with key frames from each shot. They have the ability to add watermarks and subtitles to videos to identify ownership. Adding the watermarks was very time-consuming and a non-reversible process, so they now use FLV video, which supports layering.
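
Their shot detection and key-frame summarization could be approximated with a simple histogram-difference approach; the sketch below is my own illustration (not the DNAP system): it flags a shot boundary whenever consecutive frames' color histograms differ by more than a threshold and keeps the first frame of each new shot as a key frame. The video path and threshold are placeholders.

```python
# Simple histogram-difference shot detection with key-frame capture.
# An illustrative approximation, not the DNAP system; the file name and
# threshold are placeholders.
import cv2

VIDEO_PATH = "archive_clip.mpg"  # hypothetical input file
THRESHOLD = 0.4                  # Bhattacharyya distance; tune per collection

cap = cv2.VideoCapture(VIDEO_PATH)
prev_hist = None
shot_starts = []
frame_index = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Hue/saturation histogram of the frame, normalized for comparison.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
    cv2.normalize(hist, hist)

    if prev_hist is None or \
       cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > THRESHOLD:
        # Large change in the color distribution -> treat as a new shot
        # and keep this frame as its key frame.
        shot_starts.append(frame_index)
        cv2.imwrite(f"keyframe_{frame_index:06d}.jpg", frame)

    prev_hist = hist
    frame_index += 1

cap.release()
print("Shot boundaries at frames:", shot_starts)
```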

