tagger
2020-07-15
The Xerox Part-of-Speech Tagger Version 1.2
This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.
Until this project is added to the Quicklisp repository, it must be installed manually in several steps (assuming Quicklisp is already installed):
- Download project sources (either by cloning the repository or downloading it in an archive).
- Unpack them into a directory and note its location.
- cd to the ~/quicklisp/local-projects directory.
- Create symbolic links to the .asd files in the directory from step 2.
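As an alternative to symbolic links, Quicklisp can also be pointed at the unpacked directory from within Lisp. The following is a minimal sketch rather than part of the original instructions: the path /path/to/tagger/ is a placeholder for the directory from step 2, and the setting lasts only for the current session unless the form is placed in your Lisp init file.

```lisp
;; Sketch only: /path/to/tagger/ is a hypothetical placeholder for the
;; directory the sources were unpacked into (note the trailing slash).
(push #p"/path/to/tagger/" ql:*local-project-directories*)

;; Rescan the local project directories so the new .asd files are found.
(ql:register-local-projects)
```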
The system can now be loaded, either in parts or entirely:
(ql:quickload "tagger")
When the loading is complete, you can run some simple queries:
(tag-analysis:tag-string "I saw the man on the hill with the telescope.")
I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2
(The number following the tag is the arity of the ambiguity class assigned by the lexicon. Words without a number are unambiguous.)
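If the printed output needs to be post-processed, the tag/arity notation can be split apart with a few lines of standard Common Lisp. The helper below is a hypothetical illustration and not part of the tagger; it assumes that a tag printed without a number belongs to an ambiguity class of arity 1.

```lisp
;; Hypothetical helper, not part of the tagger: split a printed tag such
;; as "in/2" into the tag name and the arity of its ambiguity class.
(defun parse-printed-tag (printed)
  "Return the tag name and the ambiguity-class arity of PRINTED as two values.
A tag printed without a /N suffix is treated as unambiguous (arity 1)."
  (let ((slash (position #\/ printed :from-end t)))
    (if slash
        (values (subseq printed 0 slash)
                (parse-integer printed :start (1+ slash)))
        (values printed 1))))
```

For instance, (parse-printed-tag "in/2") returns the two values "in" and 2, while (parse-printed-tag "nn") returns "nn" and 1.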
Programmatic Tagging
To use the tagger in a program, create a tagging-ts and use the values of calls to the generic function next-token. Note that reinitialize-instance redirects tagging to a new text with minimal initialization overhead.
For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each file in the argument files:
(use-package :tdb)
(use-package :tag-analysis)

(defun my-tag-files (files)
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
        (reinitialize-instance token-stream :char-stream char-stream)
        (loop
          (multiple-value-bind (token tag) (next-token token-stream)
            (unless token (return))
            (my-process-token-and-tag token tag)))))))
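The function my-process-token-and-tag is left for the caller to define. As one possible sketch, the definition below counts how often each tag occurs. The names *tag-counts* and report-tag-counts are invented for this illustration, and the code makes no assumption about how the tagger represents tags beyond their being comparable with equal.

```lisp
;; Illustrative only: *tag-counts* and report-tag-counts are hypothetical
;; names introduced here, not part of the tagger.
(defvar *tag-counts* (make-hash-table :test #'equal)
  "Maps each tag seen so far to the number of tokens that received it.")

(defun my-process-token-and-tag (token tag)
  "Record one occurrence of TAG; TOKEN itself is ignored in this sketch."
  (declare (ignore token))
  (incf (gethash tag *tag-counts* 0)))

(defun report-tag-counts (&optional (stream *standard-output*))
  "Print one line per tag to STREAM, most frequent tag first."
  (let ((pairs '()))
    (maphash (lambda (tag count) (push (cons tag count) pairs))
             *tag-counts*)
    (dolist (pair (sort pairs #'> :key #'cdr))
      (format stream "~a~10t~d~%" (car pair) (cdr pair)))))
```

After a call to my-tag-files, (report-tag-counts) would print the accumulated frequencies; call (clrhash *tag-counts*) first to start from an empty table.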