International Conference on Asian Digital Libraries (ICADL2006) in Kyoto day 1

Some general information.
About 190 registered participants on the first day.
220+ abstract submissions; from 170 paper submissions, 46 full
papers, 13 short papers, and 6 Asian posters were accepted.

Opening Keynote by Dr. Makoto Nagao

President of NICT (National Institute of Information and Communications Technology)
“The Age of Content and Knowledge Processing”.

He worked on a digital library system from 1998-2004 at Kyoto
University and integrated it into the university library system. The
internet is a great source of knowledge, with a lot of multimedia
information as well. A search returns lots of information, so it is
very hard to understand and visualize all of what is returned. There
are three large studies in Japan looking at this problem:

  • Information Grand Voyage Project (Ministry of Economy, Trade, and
    Industry)

    • Technology for navigating multimedia information
    • Personalization for a user
    • Focused on industry development, 5 billion yen first year budget
  • Information Explosion Project (Ministry of Education, Culture,
    Sports, Science and Technology)

    • Information extraction, particularly from multimodal information
      that is massively expanding

    • Technology for scanning information
    • Focus on University researchers, 600 million yen for the first
      year
  • The Information Analysis Project (Ministry of Internal Affairs
    and Communications)

    • Filter out useful information from unreliable and/or incorrect
      information

    • Add information on reliability, trustworthiness, and authenticity
    • Based at NICT for five years, with a 300 million yen budget
      for the first year
Information analysis:

  • surface analysis of information reliability
  • reliability of web sites based on site meta-data

Deep analysis of reliability:

  • cluster large amounts of information
  • Cluster summarization
  • Intention and emotion extraction from clustered information
  • Check and present information that contradicts the dominant
    information, e.g., long-tail information versus the dominant
    view.

  • Check consistency among the retrieved information and judge its
    accuracy against known academic knowledge (knowledge base,
    inference mechanism).

  • Check consistency against temporal flow of information

The system incorporates information from the web into a large-scale
knowledge database, with machine translation into Japanese before
analysis.
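
The keynote did not go into implementation detail, but the basic
"cluster the retrieved information, then contrast the long tail
against the dominant cluster" step might look roughly like the sketch
below. This is a minimal illustration assuming scikit-learn; the
contradiction check itself is left as a placeholder.

```python
# Rough sketch of the "cluster, then contrast minority vs. dominant" idea
# from the keynote; not NICT's implementation. Assumes scikit-learn.
from collections import Counter

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def contrast_clusters(documents, n_clusters=5):
    """Cluster retrieved documents and split them into the dominant
    cluster and the smaller ('long tail') clusters, which are the
    candidates for contradiction checking."""
    vectors = TfidfVectorizer().fit_transform(documents)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
    dominant = Counter(labels).most_common(1)[0][0]
    # Checking the long-tail clusters for statements that contradict the
    # dominant cluster (e.g. with a knowledge base or entailment model)
    # would happen here; it is left as a placeholder.
    return {
        "dominant": [d for d, l in zip(documents, labels) if l == dominant],
        "long_tail": [d for d, l in zip(documents, labels) if l != dominant],
    }
```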

III. Digital Archives

NICT is working on the Kyoto Digital Archives. They aim to digitize
cultural and historical assets, promote new local industry (while
helping to save traditional crafts and performing arts), and research
property rights issues. They have the “Nijo Castle Digital Archives
Project”, with 327 Kano Tan’yu paintings digitized.

He gave some details on their digital library preservation effort. They
also have done studies on perceptual color recognition, and developed
a system that uses 8 color components to record color.

More work is needed on image retrieval technology (image search with
color) and on motion search: segmentation, extraction, and picking
out interesting frames from video.
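
As a point of reference for "image search with color", a common
baseline (not NICT's 8-component system) is to compare color
histograms. The sketch below, assuming OpenCV and NumPy, ranks
candidate images by histogram intersection with a query image.

```python
# Generic color-based retrieval baseline: 8-bin-per-channel HSV histograms
# compared by histogram intersection. Illustrative only.
import cv2
import numpy as np

def color_histogram(path, bins=8):
    """Normalized HSV color histogram of an image file."""
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                        [0, 180, 0, 256, 0, 256])
    return hist.flatten() / hist.sum()

def rank_by_color(query_path, candidate_paths):
    """Rank candidate images by histogram intersection with the query."""
    q = color_histogram(query_path)
    scores = [(p, float(np.minimum(q, color_histogram(p)).sum()))
              for p in candidate_paths]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```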

“Today’s machine translation is nearly of usable quality to translate
newspaper articles, introductory articles of various kinds, commentary
papers, science and engineering papers, etc.”

They have the Kyoto Project of World Cultural Exchange, and he showed
a few slides on how that works.

Session 2a: Advanced Digital Archives

“Annotating the Web Archives – An Exploration of Web Archives
Cataloging and Semantic Web”, Paul H.J. Wu, Adrian K.H. Heok, Ichsan
P. Tamsir (Nanyang Technological University, Singapore).

They launched a web archive in October of this year. They have also
developed an annotation tool to help give context to the archive
categorization. This talk mostly focuses on cataloging and
annotation. As an example, the MOM (Singapore’s Ministry of Manpower)
handled an accident in 2004, and its website at the time had lots of
information linking to the inquiry, speeches about it, and so on. Two
years later, there are still the minister’s speeches and press
releases, but the inquiry status report and the FAQ about the
incident are gone, while there is new information about the law that
resulted from that accident.

So the idea is that web pages are focused on the present, while web
archives capture what was in the past. (Of course, this depends on
the type of website as well. Blog-driven or content-management sites
almost all have archives that are accessible as well.)

The key features of their annotation tool are “context-aware
annotation” and “ontology-aware annotation”, which relate semantic
content to the web content and look at agreements and differences in
the text. The annotation tool lets you highlight text on a website
and link it into an ontology (created by someone) – “context-aware”
means that you incorporate the evidence from the text when linking
things into the ontology.

They also allow you to annotate relational metadata between instances
in the ontology, e.g., marking that a particular speech is a specific
instance of the general class of speeches.
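
The talk described the tool rather than a concrete data format, but
the kind of statements such an annotation produces might look like
the RDF sketch below (using rdflib). The namespace and property names
(onArchivedPage, highlightedText, denotes, isSpecificInstanceOf) are
invented for illustration and are not the authors' schema.

```python
# Hypothetical RDF statements for one context-aware annotation: a highlighted
# passage on an archived page is linked, with its textual evidence, to an
# ontology instance, plus a relational assertion between instance and class.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/archive-annotation#")  # made-up namespace
g = Graph()

annotation = EX["anno-001"]
speech = EX["minister-speech-2004"]

g.add((annotation, EX.onArchivedPage,
       URIRef("http://example.org/archive/mom/2004/index.html")))
g.add((annotation, EX.highlightedText,
       Literal("The Minister addressed the inquiry findings ...")))
g.add((annotation, EX.denotes, speech))              # evidence -> instance
g.add((speech, EX.isSpecificInstanceOf, EX.Speech))  # instance -> class

print(g.serialize(format="turtle"))
```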

After all of this tagging, it isn’t clear to me what can be done
automatically with the tools. When the content changes (which this
method will let you detect), you still need to modify or adapt the
ontology.


“Owlery: A Flexible Content Management System for ‘Growing
Metadata’ of Cultural Heritage Objects and its Education Use in the
CEAX Project”, Kenro Aihara, Taizo Yamada, Noriko Kando (NII, The
Graduate University for Advanced Studies), Satoko Fujisawa (The
Graduate University for Advanced Studies), Yusuke Uehara,
Takayuki Baba, Shigemi Nagata (Fujitsu Laboratories Ltd. Japan), Takashi Tojo,
Tetsuhiko Awaji (Fujitsu Ltd. Japan), Jun Adachi (NII).

Metadata is difficult for cultural heritage objects because
descriptions can vary greatly depending on the viewpoint. Names and
titles can also change, since they are often assigned recently
relative to the age of the objects themselves. There are also often
multiple versions
of the same object. They also want to be able to provide readable and
understandable descriptions for different user groups (experts,
children, etc.). It is inherently difficult to create an ontology in
this field, since it is difficult to achieve a general consensus.
This means that the RDF approach is not suitable.

In their approach the metadata is separate from the factual
data, and descriptions are written over the data and kept separately.
They have multiple descriptions, some of which might target different
audiences. Authorized users are allowed to add their own descriptions
(or blog or wiki). Their framework also provides semantic links between
works that share the same factual data, helping users to browse
concepts.
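
To make the "descriptions layered over separate factual data" idea
concrete, here is a minimal sketch with invented field names (the
actual Owlery data model was not shown in detail): factual records
and audience-specific descriptions are stored separately, and
semantic links between works fall out of shared factual values.

```python
# Illustrative data model only; field names are assumptions, not Owlery's.
from dataclasses import dataclass, field

@dataclass
class FactualRecord:
    object_id: str
    facts: dict        # e.g. {"creator": "Kano Tan'yu", "period": "Edo"}

@dataclass
class Description:
    object_id: str     # points at the factual record it describes
    audience: str      # "expert", "children", ...
    author: str        # authorized user who contributed it
    text: str

@dataclass
class Repository:
    facts: dict = field(default_factory=dict)        # object_id -> FactualRecord
    descriptions: list = field(default_factory=list)

    def related_objects(self, object_id, key):
        """Semantic link: other objects sharing the same factual value."""
        value = self.facts[object_id].facts.get(key)
        return [oid for oid, rec in self.facts.items()
                if oid != object_id and rec.facts.get(key) == value]
```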

It is implemented as Web Services on Apache Axis2 (Java) with
PostgreSQL. There are three clients: the Owlery Web Browser, a (Java)
Owlery client, and the CEAX client. When tested in a class, students
did read the simplified
characters, and the teachers were surprised that the students tried to
actively read when often they give up when they encounter unknown
kanji.


“A Digital Library for Preservation of Folklore Crafts, Skills, and
Rituals and its Role in Folklore Education”, Yung-Fu Chen, Po-Chou
Chan, Kuo-Hsien Huang, Hsuan-Hung Lin.

First, an introduction to folklore. They have created a
classification for folklore types, broadly at Craft, Skill, Ritual,
Artifact, etc. They have some relationships between them, i.e., “How
to make the Taoist Bell?”, “What is the Taoist Bell?”, “How to use the
Taoist Bell?”. In their preliminary study, they built a system for
demonstrating crafts, skills, and rituals, and use it for education.

They have a metadata method for recording using title, description,
associated digital media, and other related information. They also
have a “relation” metadata that is compatible with Dublin Core. The
main relation that they use is “has part” and “is part of”.
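
A record carrying that relation metadata might look something like
the sketch below; only the "has part" / "is part of" qualifiers come
from the talk, and the element values are invented for illustration.

```python
# Hypothetical Dublin Core style record for a folklore object; the values
# and related targets are made up, only the qualifiers follow the talk.
taoist_bell = {
    "dc:title": "Taoist Bell",
    "dc:type": "Artifact",
    "dc:description": "Hand bell used in Taoist rituals.",
    "dc:relation": [
        {"qualifier": "isPartOf", "target": "Taoist ritual implements"},
        {"qualifier": "hasPart", "target": "Bell handle carving"},
    ],
}
```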

Their data consists of text and video created by folklore
specialists. They
have a web interface in Chinese and English for navigating through the
data.


“A Digital Video Archive System of DNAP Taiwan”, Hsiang-An Wang,
Guey-Ching Chen, Chih-Yi Chiu, Jan-Ming Ho (Academia Sinica, Taiwan).

An introduction to their video archiving system. They transcode for
different displays (archive is high bitrate MPEG2, others are RM or
WMV.) They tried Dublin Core for metadata, but found it to be
insufficient, and instead used ECHO with some minor modifications.
They have a website for managing the video metadata. They have some
subtitle recognition from video data, and do speech recognition as
well (Mandarin Chinese.) They also do shot detection, and use that to
create a summary video with key frames from each shot. They have the
ability to add watermarks and subtitles to videos to identify
ownership. Adding the watermarks was very time-consuming and a
non-reversible process, so they now use FLV video, which supports
layering.
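
The talk did not detail the shot-detection algorithm; a generic
"detect shot cuts by frame differencing, keep one key frame per shot"
baseline, assuming OpenCV and NumPy, looks roughly like this:

```python
# Illustrative shot-boundary detection and key-frame extraction;
# not DNAP's actual algorithm.
import cv2
import numpy as np

def key_frames(video_path, threshold=40.0):
    """Return one key frame (the first frame) per detected shot."""
    cap = cv2.VideoCapture(video_path)
    keys, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # A large mean difference between consecutive frames marks a cut.
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > threshold:
            keys.append(frame)
        prev_gray = gray
    cap.release()
    return keys
```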

