Chasen on OSX 10.4

I found myself needing to do some Japanese morphological analysis today, which usually means either ChaSen or Kabocha. Kabocha is supposed to be the new hotness and runs fast, but a quick search didn’t turn up any precompiled packages for it on OSX. ChaSen, on the other hand, is available in DarwinPorts, but since I went with Fink and just want to get something running rather than enter into some sort of strange package-management land war, I skipped that. It also turns out that Apple hosts a package for ChaSen. It comes with a nice installer that puts the binary at /usr/local/bin/chasen.

It seems to run fine and includes the necessary dictionaries, but I had a strange problem. When I tried to process a file in Shift-JIS encoding using the -i s flag, I got this error:
chasen: /usr/local/lib/chasen/dic/ipadic/cforms.cha:9-21: no basic form

That wasn’t really what I wanted: I wanted parsed output. Since things seem to work just fine in EUC-JP encoding, you can always use iconv to convert from Shift-JIS to EUC-JP and pipe the result to chasen:
iconv -f SHIFT-JIS -t EUC-JP file.txt | chasen

That works nicely.
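
If you have more than one Shift-JIS file, the same pipeline can be wrapped in a small shell function. This is only a rough sketch: sjis_to_eucjp is a name I made up (it is not part of ChaSen or iconv), and it assumes every input file really is Shift-JIS and that chasen is on your PATH.

```shell
# Hypothetical helper: convert each Shift-JIS file named on the command
# line to EUC-JP and write the result to stdout.
sjis_to_eucjp() {
    for f in "$@"; do
        # Stop at the first file iconv cannot convert.
        iconv -f SHIFT-JIS -t EUC-JP "$f" || return 1
    done
}

# Usage (assumes chasen is installed and on the PATH):
#   sjis_to_eucjp first.txt second.txt | chasen
```

The function only does the conversion, so you can check its output on its own before handing it to chasen.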

