satomacoto: Webstemmer usage excerpts

November 8, 2009


Fetching pages for training

textcrawler.py -o nikkei http://www.nikkei.co.jp/

Training

analyze.py nikkei.2009xxxxxxxx.zip > nikkei.pat
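
The archive name written by textcrawler.py appears to carry a timestamp (shown as 2009xxxxxxxx above), so it differs from crawl to crawl. Below is a minimal Python sketch for picking the most recent archive. It assumes the timestamp is a fixed-width digit string, so that sorting the names alphabetically also sorts them by time. `latest_archive` is a hypothetical helper written for this note, not part of Webstemmer.

```python
import glob
import os


def latest_archive(prefix, directory="."):
    """Return the newest '<prefix>.<timestamp>.zip' in directory, or None.

    Assumption: timestamps are fixed-width digits, so the alphabetically
    last file name is also the most recent crawl.
    """
    pattern = os.path.join(directory, prefix + ".*.zip")
    candidates = sorted(glob.glob(pattern))
    return candidates[-1] if candidates else None
```

The returned path can then be passed to analyze.py in place of a hand-typed archive name.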

Fetching pages for extraction

textcrawler.py -o nikkei http://www.nikkei.co.jp/

Extraction

extract.py -Ceuc-jp nikkei.pat nikkei.2009yyyyyyyy.zip > nikkei.txt
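
The steps above (crawl, analyze, crawl again, extract) can be chained from a script. The following is a rough sketch, not a tested workflow. It assumes the three Webstemmer scripts are executable and on `PATH`, exactly as the commands above are written. The helper names (`crawl_cmd`, `analyze_cmd`, `extract_cmd`, `run_to_file`) are hypothetical, and the argument lists simply mirror the command lines shown in this post.

```python
import subprocess


def crawl_cmd(prefix, url):
    # Mirrors: textcrawler.py -o nikkei http://www.nikkei.co.jp/
    return ["textcrawler.py", "-o", prefix, url]


def analyze_cmd(archive):
    # Mirrors: analyze.py nikkei.2009xxxxxxxx.zip  (stdout is the pattern file)
    return ["analyze.py", archive]


def extract_cmd(pattern_file, archive, charset="euc-jp"):
    # Mirrors: extract.py -Ceuc-jp nikkei.pat nikkei.2009yyyyyyyy.zip
    return ["extract.py", "-C" + charset, pattern_file, archive]


def run_to_file(cmd, out_path):
    """Run cmd and write its standard output to out_path (like `> file`)."""
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
```

For example, `run_to_file(analyze_cmd("nikkei.2009xxxxxxxx.zip"), "nikkei.pat")` corresponds to the training command, with the placeholder replaced by the real archive name.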

Webstemmer http://www.unixuser.org/~euske/python/webstemmer/index-j.html
