The keynote talk is by Keith van Rijsbergen (recipient of the Salton Award, the highest SIGIR honor).
The talk is entitled “Quantum Haystacks”, and is more on the fun side of things, according to him. His early work was on clustering, and he went over the other areas he has worked in as well.
—
Session I: Speech and Music
Spoken Document Retrieval from Call Center Conversations
Jonathan Mamou, David Carmel, Ron Hoory
IBM Haifa Research Labs
The idea here is to provide support tools for call center operators. They have an Automatic Speech Recognition (ASR) system that listens in on the phone call and transcribes the speech to text. They then use that to query a database of previous problems and solutions.
Using ASR with real call center data is tougher, though, and results are not as good as with broadcast news. He gives a brief introduction to ASR, word lattices, etc.
They index the calls using the posterior probabilities of the words from the ASR, taking more than just the 1-best path through the word lattice and ranking hypotheses by their ASR recognition probability. They modified TF*IDF to use the term probabilities.
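My rough sketch of what posterior-weighted indexing might look like; the function name, the soft term-frequency idea, and the exact weighting are my guesses, not their formulation:

```python
import math
from collections import defaultdict

def lattice_tf_idf(lattice_words, num_docs, doc_freq):
    """Score terms in one call using ASR posterior probabilities.

    lattice_words: list of (word, posterior) pairs for every hypothesis
                   kept from the word lattice (not just the 1-best path).
    doc_freq: term -> number of calls containing the term.
    """
    # "Soft" term frequency: sum posteriors instead of counting 1 per occurrence.
    tf = defaultdict(float)
    for word, posterior in lattice_words:
        tf[word] += posterior

    scores = {}
    for word, soft_tf in tf.items():
        idf = math.log(num_docs / (1 + doc_freq.get(word, 0)))
        scores[word] = soft_tf * idf
    return scores
```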
They use a dataset of about 2200 calls from 2001 to IBM, and have indexes over both automatic and manual transcripts. They produce different word error rates for the system by giving it different portions of the data for training (the best is 27% WER using the full 2200 calls for training; I don't know whether they use n-fold cross-validation or not).
There is a large precision drop when using all paths through the word lattice, but recall improves. They experiment with using confidence levels or regular TF*IDF, one best path or all paths, and whether or not to use their custom boosting based on term rank. Using the boosting with all hypotheses gives the best results.
Questions: What is actually being retrieved? The entire conversation, but with a timestamp into the part of the conversation that was matched.
What is the training and test set? Why do you train on a small portion and test on a large portion (that is the reverse of what is usually done!) The answer is that the point is to create a bad word error rate, so this is what they did.
Question by Doug Oard: Using the weights for boosting works in cross-language retrieval as well, and there is a talk on that this afternoon. The question is to what extent word error rate is the correct thing to compare against; maybe we should look at the distribution of the word error rate.
Question from Yang Yu from University of Texas (I met and spoke with her on the way to the banquet): How do you create the queries for this task? There is a manually created set of fixed queries.
—
The second talk had no speakers present, so it was skipped in order to keep the parallel sessions in sync.
—
Music Structure Analysis and a Vector Space Modeling Approach for content indexing and retrieval
Namunu Maddage, Haizhou Li (Institute for Infocomm Research, Singapore)
Mohan Kankanhalli, National University of Singapore
They want to extract music structure information:
– timing information
– harmony
– music region information
They extract timing and harmony, and whether a section is instrumental, or instrumental plus vocal.
So I didn’t really understand some of the features that they are using to do note extraction, but I have not really studied signals much, so maybe that isn’t too surprising. I should study music theory a bit more though, since I am trying to learn to play bass guitar.
They do searching with a vector space bigram model, using short song clips as queries.
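I'm not sure of their exact features, but a bigram vector-space match over sequences of extracted symbols (chords, notes, region labels) would look roughly like this; the cosine similarity and the bigram counting are my assumptions:

```python
import math
from collections import Counter

def bigrams(symbols):
    """Turn a sequence of extracted symbols into bigram counts."""
    return Counter(zip(symbols, symbols[1:]))

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_songs(clip_symbols, song_index):
    """Rank indexed songs (name -> bigram vector) against a short query clip."""
    query_vec = bigrams(clip_symbols)
    return sorted(song_index.items(),
                  key=lambda item: cosine(query_vec, item[1]),
                  reverse=True)
```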
Question: Will you submit to MIREX? (Some sort of music retrieval evaluation.) Some information about this conference.
—
Session II: Fusion and Spam
Online Spam Filtering Fusion
Thomas Lynam, Gordon Cormack
University of Waterloo
The filter must classify all email in real time and can use any source available. The filter is given user feedback as corrections; the quantity and quality of that feedback vary greatly by user. The filter must be trained over time.
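As I understand the setup, the evaluation loop looks roughly like this (a sketch of the classify-then-train protocol, not the actual TREC harness; the classify/train method names are mine):

```python
def run_online_evaluation(spam_filter, messages):
    """Feed messages one at a time: classify first, then reveal the gold label.

    spam_filter must expose classify(msg) -> spam score and train(msg, label).
    """
    results = []
    for msg, gold_label in messages:
        score = spam_filter.classify(msg)       # must decide before seeing the label
        results.append((score, gold_label))
        spam_filter.train(msg, gold_label)      # user feedback arrives after the fact
    return results
```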
He talked about the TREC spam evaluation framework. They took 8 open source spam filters, ran tests over them, and did some fusion runs to merge the filters. They decided to submit this pilot system to TREC 2005; 53 filters were submitted to TREC in total.
Here are the methods they used for fusion:
– Best System (baseline)
– Voting
– SumScore
– Log-odds Averaging
– SVM
– Logistic Regression
Logistic regression and SVM did the best. They then looked at how they could get by with fewer than 53 filters (2, 3, 4, 8, 16, …).
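Of those methods, log-odds averaging is easy to sketch. Assuming each filter outputs a spamminess probability, the fused score could be computed like this (my reconstruction, not their code):

```python
import math

def log_odds_average(probabilities, eps=1e-6):
    """Fuse per-filter spam probabilities by averaging their log-odds."""
    logits = []
    for p in probabilities:
        p = min(max(p, eps), 1 - eps)   # clamp away from 0/1 to keep the log finite
        logits.append(math.log(p / (1 - p)))
    avg = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-avg))     # back to a probability
```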
—
Building Bridges for Web Query Classification
Dou Shen, Qiang Yang (Hong Kong University of Science and Technology)
Jian-Tao Sun (Microsoft)
Zheng Chen (Microsoft Research Asia)
They are using DMOZ to classify stuff. I had a really hard time following along because the room is slightly warm. I’m really, really tired and this heat makes me feel a little ill.
—
ProbFuse: A Probabilistic Approach to Data Fusion
David Lillis, Fergus Toolan, Rem Collier, J. Dunnion
University College Dublin
“Fuse” in this context means fusion again, where they combine the results of multiple systems into a hybrid system that has better results. The terminology is a bit different from what I am used to.
This paper started out pretty well, but in the end I had a tough time following it again. I didn’t understand what they were breaking up into segments (search results of some sort, I’m sure), but I didn’t understand why they would want to do that. Hopefully I’ll have more luck at the next session.
—
Session III: Cross Language
A Study of Statistical Models for Query Translation: Finding a Good Unit of Translation
Jianfeng Gao, Microsoft
Jian-Yun Nie, University of Montreal
What size of translation units should be used to translate the query?
He looks at the noun phrase (NP) as the translation unit. Do search queries really have noun phrases, though? He says 80+% of English NPs translate into Chinese.
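A toy illustration of why the unit matters: a greedy longest-match over a phrase dictionary translates multiword units before falling back to word-by-word lookup. The dictionaries and the matching strategy here are made up for illustration, not from the paper:

```python
def translate_query(tokens, phrase_dict, word_dict):
    """Greedy longest-match translation: prefer phrase entries over single words."""
    out, i = [], 0
    while i < len(tokens):
        matched = False
        # try the longest span starting at position i first
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in phrase_dict:
                out.append(phrase_dict[phrase])
                i = j
                matched = True
                break
        if not matched:
            out.append(word_dict.get(tokens[i], tokens[i]))
            i += 1
    return out
```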
—
Combining Bidirectional Translation and Synonymy for Cross-Language Information Retrieval
Jianqiang Wang, Douglas Oard
University of Maryland