International Conference on Asian Digital Libraries (ICADL2006) in Kyoto day 2

“A Digital Resource Harvesting Approach for Distributed
Heterogeneous Repositories”, Yang Zhao, Airon Jiang (Tsinghua
University Library, China)

They present a digital resource harvesting system that uses a few
different standards (which I don’t really know about) to get around
problems in using plain HTTP. Their messaging system is able to more
easily transfer large data payloads. They use METS (the Metadata
Encoding and Transmission Standard), which is supported by the Digital
Library Federation. They have tested their system in CALIS-ETD, a
large Chinese consortium for distributing electronic theses and
dissertations.

“Parallelising Harvesting”, Hussein Suleman (University of Cape
Town, South Africa)

The two open source repository systems EPrints and DSpace both become
quite slow with over 100,000 documents. In the pure sciences (physics,
chemistry, etc.) they have lots of data and use high performance
computing systems to handle the large data sets. He explains some
types of large computer systems (clusters, multi-core, grid, etc.). He
used the open source OpenMosix system to create a cluster.

For harvesting there is the standard OAI Protocol for Metadata
Harvesting (OAI-PMH). Since that is a serial protocol, a token is
passed around the grid and only the token-owning node can request
data; the rest process data. He did some experiments on how different
architectures affect data transfer and data indexing. In the end, if
you have more computation to do on the data than data transfer, the
cluster is a win.

Also, the OAI protocol needs to be extended to allow for parallel
access.
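The token scheme lends itself to a small sketch. This is only my
reading of the idea, not code from the talk (the names and the
simulated fetch are mine): a lock plays the role of the token, so
requests to the repository stay serial while each node does its record
processing outside the critical section.

    import threading

    # The "token": only the node holding it may talk to the repository.
    request_token = threading.Lock()
    cursor_box = [0]   # shared position in the (simulated) result set

    def fetch_page(cursor):
        """Stand-in for one OAI-PMH ListRecords request (simulated data)."""
        records = [f"record-{cursor}-{i}" for i in range(10)]
        next_cursor = cursor + 1 if cursor < 4 else None  # pretend 5 pages exist
        return records, next_cursor

    def process_records(records):
        """Stand-in for the expensive local work (parsing, indexing)."""
        return [r.upper() for r in records]

    def node(node_id):
        while True:
            with request_token:              # serial part: one request at a time
                cursor = cursor_box[0]
                if cursor is None:
                    return
                records, cursor_box[0] = fetch_page(cursor)
            process_records(records)         # local part: done outside the token

    workers = [threading.Thread(target=node, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()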

Session 5a: Personalization in Digital Libraries

“Personalized Information Delivering Service in Blog-like Digital
Libraries”, Jason J. Jung (Inha University, Korea; INRIA
Rhone-Alpes, France.)

In the blogosphere there is a problem of information overload: it is
difficult to find relevant information. Another problem is the network
isolation phenomenon: the network space is generally a personal set of
links. The speaker’s BlogGrid is a method to approach these problems.
They define and collect user activities, extract user preferences, and
then make recommendations to the users. The system follows users’
posts, their linking patterns, their navigation patterns (random, or
access to neighbors), their responses (trackbacks, comments), and blog
categories (using ODP or TopicMap taxonomies.)

When users exhibit similar response patterns, we can assume that they
share the same interests. The speaker wants to find “virtual hubs”, so
they use the distance between users and a hub weight that looks
similar to a hubs-and-authorities (HITS-style) algorithm (since the
equation has authorities in there too.)
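The hub weight equation went by quickly, so for my own reference here
is the standard hubs-and-authorities (HITS) update it reminded me of;
this is Kleinberg’s algorithm, not necessarily the exact formula in
the paper.

    import numpy as np

    def hits(adjacency, iterations=50):
        """adjacency[i][j] = 1 if user/blog i links to j."""
        A = np.asarray(adjacency, dtype=float)
        n = A.shape[0]
        hubs, auths = np.ones(n), np.ones(n)
        for _ in range(iterations):
            auths = A.T @ hubs               # good authorities are linked from good hubs
            hubs = A @ auths                 # good hubs link to good authorities
            auths /= np.linalg.norm(auths)   # normalise to keep values bounded
            hubs /= np.linalg.norm(hubs)
        return hubs, auths

    # Toy example: node 0 links to 1 and 2, node 3 links to 2.
    hubs, auths = hits([[0, 1, 1, 0],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [0, 0, 1, 0]])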

They ran a user evaluation where they asked students to track new
information, some using BlogGrid and some not. Most of them liked the
recommendations, but some people did not like the system because it
had to be installed on their machines and because of privacy issues.

“A Personal Ontology Model for Library Recommendation System”, I-En
Liao (National Chung Hsing University, Taiwan), Shu-Chuan Liao (Asia
University, Taiwan), Kuo-Fong Kao,
Ine-Fei Harn (National Chung Hsing University)

There are two approaches: social filtering (other people who liked X
also liked Y), and content-based filtering. This might not be as
applicable to libraries because library users are not as interested in
popularity; they have a specific information need. Their paper looks
at content-based recommendation.

Their objectives are to automatically mine user interests from loan
records and to re-rank keywords based on a user interest score from a
personal ontology. Their system is a web-based application: when the
user logs in, their loan record is analyzed, then books are
recommended to them. The reference ontology could be the Library of
Congress Classification (LCC) or a Chinese ontology (CCL). They build
a personal ontology by using the loan record to find favorite
categories from the reference ontology, copying over the nodes whose
value is greater than some threshold. Then they try to find
interesting keywords for the user: they compute a kind of TF*IDF over
the keywords and then fetch books based on the value of their highest
keyword (I’m surprised they didn’t do some sort of weighting function,
they just took the value of the highest rated keyword!)
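A rough sketch of the recommendation step as I understood it; the data
layout (keyword lists per borrowed book) is invented for illustration,
and the scoring is a generic TF*IDF, not necessarily their exact
formula.

    import math
    from collections import Counter

    def keyword_scores(user_loans, all_loans):
        """user_loans: one keyword list per book the user borrowed.
        all_loans: the same, for every loan record in the library."""
        tf = Counter(kw for book in user_loans for kw in book)
        df = Counter(kw for book in all_loans for kw in set(book))
        n_docs = len(all_loans)
        return {kw: tf[kw] * math.log(n_docs / (1 + df[kw])) for kw in tf}

    def recommend(user_loans, all_loans, catalogue):
        """catalogue: mapping from book title to its keyword set."""
        scores = keyword_scores(user_loans, all_loans)
        top_keyword = max(scores, key=scores.get)   # only the single top keyword is used
        return [title for title, keywords in catalogue.items()
                if top_keyword in keywords]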

“Research and Implementation of a Personalized Recommendation System”,
Li Dong (Tsinghua University, China), Yu Nie (SINA Corporation,
China), Chunxiao Xing, Kehong Wang (Tsinghua University, China)

They have a clustering algorithm with a static part that runs once and
a dynamic part for interacting with the user. Their “cluster mining”
approach allows data elements to overlap between clusters. (There are
standard approaches in clustering that allow for that as well.) They
did a clustering evaluation by looking at how many of the items in
each cluster were recommended. It looks like they also did some user
evaluation that looks at how often the users took the recommendations,
but I’m really not too sure about that. They used data from a website
where their recommendation system is running.
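This is not their “cluster mining” algorithm (I would have to read the
paper for that); it is just a minimal illustration of what overlapping
cluster membership means: an item joins every cluster whose centroid
it is similar enough to, so it can belong to several clusters at once.

    import numpy as np

    def overlapping_assign(items, centroids, threshold=0.8):
        """items, centroids: arrays of unit-normalised feature vectors."""
        clusters = [[] for _ in centroids]
        for item_idx, x in enumerate(items):
            for cluster_idx, c in enumerate(centroids):
                if float(np.dot(x, c)) >= threshold:   # cosine similarity on unit vectors
                    clusters[cluster_idx].append(item_idx)
        return clusters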

Overall, for the talks this morning, I feel like I should read the
papers because in general the talks were not very good. Sometimes the
presentation language (English) was a bit troublesome; other times I
just didn’t get a feel that the talk was well organized. Hussein
Suleman’s talk was nice, but he was aiming more for the librarian end
of things, and it was a bit simplified.


Session 7c: Multimedia Resource Retrieval and Organization

“Text Image Spotting Using Local Crowdedness and Hausdorff Distance”,
Hwa-Jeong Son, Sang-Cheol Park, Soo-Hyung Kim, Ji-Soo Kim, Guue Sang
Lee, Deok Jai Choi (Chonnam National University, Korea)

They are looking at spotting text in an image given a query. They are
taking the query as an image, instead of as text. They are trying to
match the query image to a sub-region of the document image. They use
the Hausdorff distance to match the images, and tried two other
approaches: binary correlation and a modified Hausdorff distance.

Binary correlation just looks at an average of pixels in an
overlapping region of the two images. You can move the query image
around to find the minimum distance over the entire map. (That seems
quite computationally expensive!) They also modify the Hausdorff
approach to make it less computationally expensive by looking at
“local crowdedness”, which looks to me like doing a blur operation and
then taking points that are heavily dark from the bleed-over from
adjacent pixels.
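For reference, the plain (unmodified) Hausdorff distance between two
sets of foreground-pixel coordinates looks roughly like this; the
paper’s contribution is a cheaper variant based on local crowdedness,
which this sketch does not attempt.

    import numpy as np

    def directed_hausdorff(points_a, points_b):
        """Max over points in A of the distance to the nearest point in B."""
        A = np.asarray(points_a, dtype=float)
        B = np.asarray(points_b, dtype=float)
        pairwise = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        return pairwise.min(axis=1).max()

    def hausdorff(points_a, points_b):
        """Symmetric Hausdorff distance between two point sets."""
        return max(directed_hausdorff(points_a, points_b),
                   directed_hausdorff(points_b, points_a))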

They build probability distributions for each class given the
features, and then classify based on those distributions.

They test over 380 documents with 100%, 70%, 50% (?), and 30% (?) of
the query present in the document, but half of those were also used
for training. So the documents are quite large relative to the query
size, I think. Their proposed method performed best.

For future research they are looking at scaling and font
variations, and would like to look at newer faster features.

“Effective Image Retrieval for the M-learning System”,
EunJung Han, AnJin Park, DongWuk Kyoung (Soongsil University,
Korea), HwangKyu Yang (Dongseo University, Korea), KeeChul Jung
(Soongsil University, Korea).

They are looking at blended learning environments where physical
learning environments are augmented with virtual information on a PDA
(a camera to capture marks, a screen to display information.) So they
want to recognize real images instead of “pattern markers”, and also
to adapt their algorithms to the low computational resources of PDAs.

They assume that the background is white and the object to recognize
is located in a central region. They propose some rotation and scale
invariant features. They do boundary extraction, and then take the
starting point as the closest pixel to the centroid of the object;
that takes care of rotation. There is a problem when the closest
boundary pixel lies on a circle though, so they use the closest 3
pixels. They also use Dynamic Time Warping to compare images, which
was much faster (by a factor of 10) than an HMM approach.
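Here is the generic dynamic time warping recurrence applied to two
boundary descriptor sequences (for instance, distances from the
centroid sampled along the boundary); I am not claiming this is the
exact variant they run on the PDA.

    def dtw(seq_a, seq_b):
        """DTW distance between two sequences of numbers."""
        n, m = len(seq_a), len(seq_b)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(seq_a[i - 1] - seq_b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                     cost[i][j - 1],      # deletion
                                     cost[i - 1][j - 1])  # match
        return cost[n][m]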

Their approach can have problems when the shapes have similar
boundaries, or when the boundaries are not extracted well. They had an
87% recognition rate over 30 objects in their DB.

“Language Translation and Media Transformation in Cross-Language Image
Retrieval”,
Hsin-Hsi Chen, Yih-Chen Chang (National Taiwan University, Taiwan)

For cross-media information retrieval, queries and documents are in
different media. In this paper they are dealing with cross-language
image retrieval, where the images can be annotated in multiple
languages. They build a transmedia dictionary by looking at tagged
images, breaking them up into pieces, and assigning visual features to
the image tags.

They make a kind of comparable corpus by first doing content-based
retrieval, then taking the retrieved images and using their tags to
build a text-based query. They present several methods for doing the
translation at different times with different information (relevance
feedback or not, including query translation or not, merging results
in different ways, etc.) Adding the content information improves
performance over just doing content-based IR.
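My reading of the media transformation step, as a sketch: retrieve
images by content with the query image, then pool the tags of the
top-ranked images into a text query. The toy index format and the
feature-overlap ranking are mine, not theirs.

    from collections import Counter

    def content_based_retrieval(query_features, image_index):
        """Toy stand-in for CBIR: rank images by feature overlap with the query."""
        return sorted(image_index,
                      key=lambda img: len(set(img["features"]) & set(query_features)),
                      reverse=True)

    def media_transform_query(query_features, image_index, top_k=10):
        """Pool the tags of the top-ranked images into a text query."""
        ranked = content_based_retrieval(query_features, image_index)
        tag_counts = Counter(tag for img in ranked[:top_k] for tag in img["tags"])
        return [tag for tag, _ in tag_counts.most_common(20)]

    # Example with an invented index: each image has visual features and text tags.
    image_index = [
        {"features": ["red", "round"], "tags": ["apple", "fruit"]},
        {"features": ["red", "long"],  "tags": ["pepper", "vegetable"]},
    ]
    print(media_transform_query(["red", "round"], image_index))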

“A Surface Errors Locator System for Ancient Chinese Culture
Preservation”,
Yimin Yu, Duanqing Xu, Chun Chen, Yijun Yu, Lei Zhao (Zhejiang
University, China).

I think this talk was cancelled because the speaker was not here.


The evening banquet is at The Garden Oriental Kyoto.

