January 22, 2007

Chasen on OSX 10.4

I found myself needing to do some Japanese morphological analysis today, which usually means either ChaSen or CaboCha. CaboCha is supposed to be the new hotness and runs fast, but a quick search didn't turn up any precompiled packages for it on OS X. ChaSen, on the other hand, is available in DarwinPorts. But since I went with Fink, and just want to get something running rather than enter into some sort of strange package-management land war, I skipped that. It also turns out that Apple is hosting a package for ChaSen. It installs with a nice installer into /usr/local/bin/chasen.

It seems to run fine and includes the necessary dictionaries, but I ran into a strange problem. When I tried to process a Shift_JIS-encoded file using the -i s flag, I got this error:

chasen: /usr/local/lib/chasen/dic/ipadic/cforms.cha:9-21: no basic form

That wasn't what I wanted: I wanted parsed output. Since things seem to work fine with EUC-JP input, you can instead use iconv to convert the file from Shift_JIS to EUC-JP and pipe the result to chasen:

iconv -f SHIFT-JIS -t EUC-JP file.txt | chasen

That works nicely.
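If you do this often, the same pipeline can be wrapped in a small shell function. This is only a sketch: the name sjis_chasen is made up for illustration, and it assumes iconv and chasen are both on your PATH and that chasen is happy with EUC-JP input, as it was for me. It converts each file you give it and feeds the result to chasen.

```shell
# sjis_chasen: hypothetical helper name, not part of ChaSen itself.
# Assumes iconv and chasen are on PATH. Converts each Shift_JIS file
# to EUC-JP and pipes the converted text to chasen, avoiding -i s.
sjis_chasen() {
  if [ "$#" -eq 0 ]; then
    echo "usage: sjis_chasen file..." >&2
    return 1
  fi
  for f in "$@"; do
    # The pipeline's exit status is chasen's; stop on the first failure.
    iconv -f SHIFT-JIS -t EUC-JP "$f" | chasen || return 1
  done
}
```

Usage would be something like `sjis_chasen file.txt other.txt`. The output stays in EUC-JP; if you need it back in Shift_JIS, appending `| iconv -f EUC-JP -t SHIFT-JIS` should work, assuming chasen emits EUC-JP here.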


Comments


Re: Chasen on OSX 10.4
I got an email about CaboCha. You can check out the project here. It is a Japanese Dependency Structure Analyzer.
Posted by Fugu
