May 17, 2008

The Future of News

I do research on automatic opinion identification, and one thing the community is really interested in right now is analyzing blog data. Most of the available tagged resources are built over newspaper text, movie reviews, or other kinds of collectable text that comes with user star ratings (restaurant reviews, product reviews, etc.).

The community is very interested in moving to blog data, where ostensibly there would be more, and more varied, opinions available to analyze, but there isn't much data available for that yet. (But see the TREC Blog track, which has an opinionated blog search task.)

All this interest in blogs and user-generated media seems to have had an impact on "traditional" print media. Recently, there was a workshop on the Future of News at Princeton University, near my old stomping grounds. It would have been nice to go, but thanks to the (news-media-destroying!) blogs, I've at least been able to get a brief impression of what was discussed.

Matthew Hurst's great Data Mining blog has two posts with pointers to some summaries from the workshop. Fun stuff.

I'm firmly of the opinion that traditional news media will be around for a long time. Blogs do have some role to play in modern news dissemination, but not a large enough role to displace focused organizations that can fund people to do research and have a vested interest in vetting information. It isn't clear to me that the newswires do as much of this as they should, but the traditional media certainly will play a role in choosing what news to elevate to the national level.

In the end, most blogs are really locally focused, and I don't see how any of the small, personally run sites (like, say, my blog) could ever hope to break interesting news more than once in a lifetime. Also, I like writing about what I ate for dinner. That isn't news. :)

February 25, 2008

People that are not me

While reading the program for the 2008 International Conference on Weblogs and Social Media (it looks like an interesting conference; I wish I were going!), I was surprised to see that I am going, and apparently giving a presentation!

Oh wait, no, that is not me. That is another David Evans. A quick search turned up this page at Psychster LLC with information about David C. Evans, who has a Ph.D. (like me!) in psychology (not like me!) and has worked at Microsoft. Whew, that was close!

Since it looks like he is doing work on sentiment analysis on blogs, we might actually run into each other some time.

There is another famous person who shares my name: David A. Evans, the CEO and Chief Scientist at Clairvoyance. He is very well known, and a really nice guy. I met him at the New Directions in Multilingual Information Access workshop a while back, and his invited talk was very interesting.

I think it is very strange that I feel compelled to make a list of people I am not, but I was really surprised to come across a David Evans at a conference I'm interested in. For a few seconds there, I actually thought, "Wait, did I write a paper, get it accepted, and then completely forget about it?"

June 25, 2007

Installing the Perl Technorati API implementation WebService::Technorati on OSX via CPAN

This will be yet another entertaining dive into installing software on OSX. For today's task, I want to install WebService::Technorati, a Perl interface to the API of the Technorati blog search and aggregation site. Usually, I run something like $ perl -MCPAN -e shell to get a CPAN shell, then install WebService::Technorati and hit "yes" when asked about following prerequisites. This time, things failed because one of the requirements, XML::Parser, needs the Expat XML parser installed. I do have Expat installed (twice, even: once from Apple's X11 extras and once via fink, the OSX packaging project), but CPAN couldn't pick up either copy since neither is in the most obvious of places.

So it looks like I'll need to install XML::Parser myself. Since CPAN already went to the trouble of downloading the files I need, I spawn a root shell (I'm installing into the system directories), cd into the build directory with cd ~/.cpan/build/XML-Parser-2.34-uwBcpV, and then create a Makefile that actually points to fink's Expat install: perl Makefile.PL EXPATLIBPATH=/sw/lib EXPATINCPATH=/sw/include. Then comes the magic incantation: make && make install. Since all that looked like it went well, I drop back into user mode, run sudo perl -MCPAN -e shell, and retry install WebService::Technorati.

That installed some XPath tools, and then failed spectacularly with a missing LWP/UserAgent.pm, which is something I should probably have installed anyway. Installing LWP::UserAgent failed with a missing HTML::Tagset, which installed easily. (Isn't CPAN supposed to chase down these dependencies for me? Usually it does, but today CPAN is really having trouble. It must be the rain.) The subsequent install of LWP::UserAgent went well, and a final install WebService::Technorati completed fine as well.

So, that's a quick post on what I had to do to get it installed. Mainly, I needed to run the XML::Parser install process by hand so I could create a Makefile that pointed to the existing Expat install I had put in via fink. Then I had to chase down some other CPAN modules that were necessary. Not too bad, all told.
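The steps above can be collected into one script. This is a minimal sketch, not a transcript of the original session, and it rests on assumptions: Expat came from fink under /sw, CPAN has already unpacked XML::Parser 2.34, and you run it from a root shell. The variables EXPAT_PREFIX, BUILD_DIR, and DRY_RUN are hypothetical knobs introduced here; in particular, the real build directory carries a machine-specific suffix (like -uwBcpV above), so BUILD_DIR has to be pointed at whatever CPAN actually created.

```shell
#!/bin/sh
# Sketch of the manual install above. Assumes: fink's Expat under /sw,
# CPAN has already unpacked XML::Parser into BUILD_DIR, and this runs
# from a root shell. EXPAT_PREFIX, BUILD_DIR and DRY_RUN are hypothetical.
set -e

EXPAT_PREFIX="${EXPAT_PREFIX:-/sw}"
# The real directory name has a machine-specific suffix, so override
# BUILD_DIR with whatever CPAN actually created.
BUILD_DIR="${BUILD_DIR:-$HOME/.cpan/build/XML-Parser-2.34}"

# Print each command by default; only execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

install_technorati() {
  # 1. Build XML::Parser by hand so its Makefile points at fink's Expat.
  run cd "$BUILD_DIR"
  run perl Makefile.PL "EXPATLIBPATH=$EXPAT_PREFIX/lib" \
    "EXPATINCPATH=$EXPAT_PREFIX/include"
  run make
  run make install

  # 2. Install the modules CPAN failed to chase down, then
  #    WebService::Technorati itself, in the order that worked above.
  for module in HTML::Tagset LWP::UserAgent WebService::Technorati; do
    run perl -MCPAN -e "install '$module'"
  done
}

install_technorati
```

Defaulting to a dry run is a deliberate choice, since the real commands write into system directories: read the printed commands first, then rerun with DRY_RUN=0 and BUILD_DIR set to the directory CPAN unpacked on your machine.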

Just to be cautious, I tried a few things to test the install. Things were working just great. Of course, after about an hour of hacking away at some code, it looks like there are some problems with the WebService::Technorati Perl API: the SearchApiQuery does a cosmos query instead of a blog search query, but since I've got the .pm files, we can fix that easily enough...

March 22, 2007

Notes from Thursday 2007-03-22 Natural Language Processing Meeting in Japan

テーマセッション1 (2): 教育を支援する言語学・言語処理

Theme Session 1 (2): Linguistics and Language Processing in Support of Education
  • S2-1 英語例文オーサリングのための可算性決定プロセスの可視化 ("Visualizing the countability decision process for authoring English example sentences")
    ○永田亮 (兵庫教育大), 河合敦夫 (三重大), 森広浩一郎 (兵庫教育大), 井須尚紀 (三重大)
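  • S2-2 統計的自動翻訳に基づく日本人学習者の英文訳質分析 ("Analysis of the quality of English translations by Japanese learners based on statistical machine translation")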
    ○鍔木元 (早大), 安田圭志, 匂坂芳典 (NICT/ATR)
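  • S2-3 日本語読解支援のための語義毎の用例抽出機能について ("On a function for extracting usage examples per word sense to support Japanese reading comprehension")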
    ○小林朋幸, 大山浩美, 坂田浩亮, 谷口雄作, 太田ふみ, Noah Evans, 浅原正幸, 松本裕治 (NAIST)
  • S2-4 外国人が作成した日本語文書に対する自動校正技術 ("Automatic proofreading techniques for Japanese documents written by non-native speakers")
    ○祖国威, 加納敏行 (東芝ソリューション)
  • S2-5 コーパスを用いた言語習得度の推定 ("Estimating language proficiency using a corpus")
    ○坂田浩亮, 新保仁, 松本裕治 (NAIST)
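  • S2-6 日本語学習者作文支援のための機械学習による日本語格助詞の正誤判定 ("Machine-learning-based judgment of Japanese case particle correctness to support compositions by learners of Japanese")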
    ○大山浩美 (NAIST)
  • S2-7 Dynamic situation based sentence generation used in creating questions for students of Japanese
    ○Christopher Waple, Yasushi Tsubota, Masatake Dantsuji, 河原達也 (京大)
  • S2-8 漢字の読み誤りの自動生成における候補生成能力の評価 ("Evaluating candidate generation ability in the automatic generation of kanji misreadings")
    ○Bora Savas, 林良彦 (阪大)

March 21, 2007

Notes from Wednesday 2007-03-21 Natural Language Processing Meeting in Japan

Information Extraction, Text Mining

The Information Extraction / Text Mining room is almost completely full.
  • D3-3 小説テキストを対象とした人物情報の抽出と体系化 ("Extraction and organization of character information from novels")
    ○馬場こづえ, 藤井敦 (筑波大)
  • D3-4 統計的手法を利用した伝染病検索システムの構築に向けて ("Towards construction of a statistical search system for infectious diseases")
    ○竹内孔一, 岡田和也 (岡山大), 川添愛, コリアー・ナイジェル (NII)
  • D3-5 米国特許データベースからの引用文献情報の抽出 ("Extracting cited reference information from the US patent database")
    ○小栗佑実子, 難波英嗣 (広島市立大)
  • D3-6 開発プロジェクトリスク管理のための議事録発言の分析 ("Analysis of utterances in meeting minutes for development project risk management")
    ○齋藤悠, 立石健二, 久寿居大 (NEC)
  • D3-7 コールセンターにおける会話マイニング ("Call Center Conversation Mining")
    ○那須川哲哉, 宅間大介, 竹内広宜, 荻野紫穂 (日本IBM)
  • D3-8 意見性判定手法の評価と精度向上 ("Evaluation and accuracy improvement of methods for identifying opinionated text")
    ○高橋大和, 廣嶋伸章, 古瀬蔵, 片岡良治 (NTT)
  • D3-9 言語情報と映像情報の統合による作業教示映像の自動要約 ("Automatic summarization of instructional videos by integrating linguistic and visual information")
    ○柴田知秀 (東大), 黒橋禎夫 (京大)

March 20, 2007

Notes from Tuesday 2007-03-20 Natural Language Processing Meeting in Japan

I attended the (Japanese) Natural Language Processing meeting at Ryukoku University from the 20th to the 23rd, and took some notes on the sessions I attended.

Session B1: Meaning Analysis

Session chair is Utsuro Takehito (宇津呂武仁) from Tsukuba Daigaku.
  • B1-1 構文解析を補助的に用いる意味解析 ("Semantic analysis using syntactic parsing as an auxiliary resource")
    ○船越孝太郎, 中野幹生, 長谷川雄二, 辻野広司 (HRI-JP)
  • B1-2 結合価パターン辞書からの情緒を明示する用言の知識ベース化 ("Building a knowledge base of emotion-expressing predicates from a valency pattern dictionary")
    ○黒住亜紀子, 徳久雅人, 村上仁一, 池原悟 (鳥取大)
  • B1-3 SYNGRAPHデータ構造における述語項構造の柔軟マッチング ("Flexible matching of predicate-argument structures in the SYNGRAPH data structure")
    ○小谷通隆 (京大), 中澤敏明, 柴田知秀 (東大), 黒橋禎夫 (京大)
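  • C1-5 科学技術文献を対象とする日中機械翻訳システム開発プロジェクト ("A project to develop a Japanese-Chinese machine translation system for scientific and technical literature")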
    ○井佐原均 (NICT), 黒橋禎夫 (京大), 辻井潤一 (東大), 内元清貴 (NICT), 中川裕志 (東大), 梶博行 (静岡大), 中村徹 (JST)
  • C1-6 ハイブリッド翻訳のためのフレーズアラインメント ("Phrase alignment for hybrid translation")
    ○潮田明 (富士通研)
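  • C1-7 部分目標の達成度に基づく機械翻訳自動評価 - 部分目標の自動生成 - ("Automatic machine translation evaluation based on the degree of subgoal achievement: automatic generation of subgoals")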
    ○内元清貴, 小谷克則, 張玉潔, 井佐原均 (NICT)
  • C1-8 Translation quality prediction using multiple automatic evaluation metrics
    ○Paul Michael, Andrew Finch, 隅田英一郎 (NICT/ATR)

January 22, 2007

ChaSen on OSX 10.4

I found myself needing to do some Japanese morphological analysis today, which usually means either ChaSen or CaboCha. CaboCha is supposed to be the new hotness and runs fast, but a quick search didn't turn up any precompiled packages for it on OSX. ChaSen, on the other hand, is available in DarwinPorts, but since I went with fink and just want to get something running, not enter into some sort of strange package-management land war, I skipped that. It also turns out that Apple is hosting a package for ChaSen. It installs with a nice installer into /usr/local/bin/chasen.

It seems to run fine and includes the necessary dictionaries, but I ran into an odd problem. When I tried to process a Shift-JIS file using the -i s flag, I got this error: chasen: /usr/local/lib/chasen/dic/ipadic/cforms.cha:9-21: no basic form

That wasn't really what I wanted: I wanted parsed output. Since things seem to work just fine in EUC-JP, you can always use iconv to convert from Shift-JIS to EUC-JP and pipe the result to chasen: iconv -f SHIFT-JIS -t EUC-JP file.txt | chasen

That works nicely.
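If you need this often, the iconv workaround can be wrapped in a pair of shell functions so it runs over several files at once. This is a sketch under assumptions: sjis_to_eucjp and chasen_sjis are hypothetical names, and it relies, as the post does, on chasen reading EUC-JP on standard input without any -i flag and on iconv accepting the SHIFT-JIS and EUC-JP encoding names.

```shell
#!/bin/sh
# Work around ChaSen's "no basic form" error on Shift-JIS input by
# converting to EUC-JP first. Function names are hypothetical.

# Re-encode Shift-JIS bytes from stdin as EUC-JP on stdout.
sjis_to_eucjp() {
  iconv -f SHIFT-JIS -t EUC-JP
}

# Run ChaSen over each Shift-JIS file given as an argument.
# Assumes chasen is on PATH and reads EUC-JP by default.
chasen_sjis() {
  for f in "$@"; do
    sjis_to_eucjp < "$f" | chasen
  done
}
```

After sourcing it, chasen_sjis a.txt b.txt analyzes both files. The analysis output is presumably EUC-JP as well; if you need it back in Shift-JIS, one more iconv -f EUC-JP -t SHIFT-JIS at the end of the pipe should do it.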

November 30, 2006

November 29, 2006

November 28, 2006

November 27, 2006

September 14, 2006

IPSJ in Shinjuku Day two

Wednesday was the final day of the IPSJ meeting. I've got more comments on the papers that I saw that day below.


September 13, 2006

Information Processing Society of Japan Natural Language Meeting

Notes from the 2006-09-12 to 2006-09-13 Information Processing Society of Japan meeting. The Information Processing Society of Japan Special Interest Group on Natural Language Processing holds a meeting every two months at locations around Japan. Two months ago, the meeting was in Hakodate. This time, it was in Shinjuku, very close to where I live, so I decided it would be a good chance to attend and see what research is going on in the field in Japan.

It was really interesting. All but two of the presentations were in Japanese, which was a very nice chance to get up to speed on technical Japanese and to see how presentations go here. It was pretty tiring, though. I also had a chance to talk with some of the members of the 情報爆発世界ニュース (Information Explosion World News) group that I'm involved with.

If you are interested in reading some very surface-level comments about the papers I saw in Tuesday's session, click to read more...


August 11, 2006

August 9, 2006

2006-08-09 SIGIR Notes

Wednesday's keynote:

Information Access in the Extended Boeing Enterprise
Radha Radhakrishnan

Overview of Boeing's information technology and information distribution structure.


August 8, 2006

2006-08-08 SIGIR notes

Keynote:
Social Networks, Incentives, and Search
Jon Kleinberg, Cornell University

An introduction to social networks, and some parallels to information retrieval.


August 7, 2006

2006-08-07 SIGIR in Seattle Notes

The keynote talk is by Keith van Rijsbergen, recipient of the Salton Award, SIGIR's highest honor.

The talk is titled "Quantum Haystacks" and, according to him, is more on the fun side of things. His early work was on clustering, and he went over other areas he has worked in as well.


July 22, 2006

Notes from Friday 2006-07-21 COLING/ACL conference

Notes from Friday's sessions on the last day of COLING/ACL.



July 20, 2006

Notes from Thursday 2006-07-20 COLING/ACL conference

Rough notes from Thursday's presentations at COLING/ACL.


July 19, 2006

Notes from Tuesday 2006-07-18 COLING/ACL 2006 session

2006-07-18 Invited Keynote Tuesday morning
Argmax Search in Natural Language Processing
Daniel Marcu

