January 22, 2007
Chasen on OSX 10.4
I found myself needing to do some Japanese morphological analysis today, which usually means either ChaSen or Kabocha. Kabocha is supposed to be the new hotness and runs fast, but a quick search didn't turn up any precompiled packages for it on OSX. ChaSen, on the other hand, is available in DarwinPorts, but since I went with Fink and just wanted to get something running, not enter into some sort of strange package-management land war, I skipped that. It also turns out that Apple is hosting a package for ChaSen. It installs with a nice installer into
/usr/local/bin/chasen. It seems to run fine and includes the necessary dictionaries, but I hit a strange problem. When I tried to process a file in Shift-JIS encoding using the
-i s flag, I got this error:
chasen: /usr/local/lib/chasen/dic/ipadic/cforms.cha:9-21: no basic form
That wasn't really what I wanted: I wanted parsed output. Since things seem to work just fine in EUC-JP encoding, you can always use iconv to convert from Shift-JIS to EUC-JP and pipe the resulting output to chasen:
iconv -f SHIFT-JIS -t EUC-JP file.txt | chasen
That works nicely.
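If you have more than one Shift-JIS file to get through, the same pipeline can be wrapped in a small shell function. This is only a rough sketch: the function name parse_sjis_files and the ANALYZER variable are made up for illustration, and it assumes chasen is on your PATH and reads EUC-JP on standard input, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: run each Shift-JIS file through iconv, then the analyzer.
# ANALYZER is an assumed override hook; it falls back to plain "chasen".
parse_sjis_files() {
    for f in "$@"; do
        # Convert Shift-JIS to EUC-JP, the encoding that worked above.
        iconv -f SHIFT-JIS -t EUC-JP "$f" | ${ANALYZER:-chasen}
    done
}
```

Calling parse_sjis_files file1.txt file2.txt prints the analyzer output for each file in turn. That output is still EUC-JP, so pipe it through iconv -f EUC-JP -t UTF-8 if your terminal expects UTF-8.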