tagger

2020-07-15

The Xerox Part-of-Speech Tagger Version 1.2

Upstream URL

github.com/g000001/tagger

Author

Doug Cutting and Jan Pedersen of the Xerox Palo Alto Research Center

License

Use, reproduction, and distribution of this software is permitted, but only for non-commercial research or educational purposes. See the tagger/COPYRIGHT file for more information.

README

This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.


Until this project is added to the Quicklisp repository, it must be installed manually in a few steps (assuming Quicklisp is already installed):

  1. Download the project sources, either by cloning the repository or by downloading an archive.
  2. Unpack them to a directory and note its path.
  3. cd to the ~/quicklisp/local-projects directory.
  4. Create symbolic links there to the .asd files in the directory from step 2.
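
As an alternative to steps 3 and 4 (swapped in here, not taken from the original instructions), Quicklisp can be pointed at the unpacked directory from the REPL. This is a minimal sketch; the path ~/src/tagger/ is a hypothetical placeholder for the directory from step 2.

```lisp
;; Assumption: the sources were unpacked to ~/src/tagger/ (hypothetical path;
;; substitute the directory from step 2). The trailing slash is required so
;; that the pathname names a directory.
(push (merge-pathnames "src/tagger/" (user-homedir-pathname))
      ql:*local-project-directories*)

;; Rescan the local project directories so the new .asd files are found.
(ql:register-local-projects)
```

The push only lasts for the current Lisp session; the symbolic links described above persist across sessions.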

Now the system can be loaded, either in parts or as a whole:

(ql:quickload "tagger")

Once loading is complete, you can try a simple query:

(tag-analysis:tag-string "I saw the man on the hill with the telescope.")

I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2

(The number following a tag is the arity of the ambiguity class assigned by the lexicon, that is, the number of candidate tags the lexicon lists for that word. Words without a number are unambiguous.)
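
To make the tag/arity notation concrete, here is a minimal sketch that parses a line of tags in that notation into (TAG . ARITY) pairs. The helper names are hypothetical and not part of the tagger. The sketch assumes the tag line is already available as a string and treats a tag without a number as having arity 1.

```lisp
;; Hypothetical helpers for reading the tag/arity notation; not tagger API.

(defun split-on-spaces (line)
  "Return the space-separated, non-empty fields of LINE as a list of strings."
  (loop with start = 0
        for end = (position #\Space line :start start)
        for field = (subseq line start end)
        unless (string= field "") collect field
        while end
        do (setf start (1+ end))))

(defun parse-tagged-field (field)
  "Split a field such as \"vbd/3\" into two values, the tag and its arity.
Assumption: a field without a number is unambiguous, reported as arity 1."
  (let ((slash (position #\/ field :from-end t)))
    (if slash
        (values (subseq field 0 slash)
                (parse-integer field :start (1+ slash)))
        (values field 1))))

(defun parse-tagged-line (line)
  "Return a list of (TAG . ARITY) pairs for one line of tags."
  (mapcar (lambda (field)
            (multiple-value-bind (tag arity) (parse-tagged-field field)
              (cons tag arity)))
          (split-on-spaces line)))
```

For example, (parse-tagged-line "ppss/2 vbd/3 at nn") returns (("ppss" . 2) ("vbd" . 3) ("at" . 1) ("nn" . 1)).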

Programmatic Tagging

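To use the tagger in a program, create a tagging-ts instance and call the generic function next-token on it repeatedly. Each call returns a token and its tag as two values; in the example below, a null token signals the end of the input. Calling reinitialize-instance redirects tagging to a new text with minimal initialization overhead, so a single tagging-ts can be reused across inputs.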

For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each file in the argument files:

(use-package :tdb)
(use-package :tag-analysis)
(defun my-tag-files (files)
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
        (reinitialize-instance token-stream :char-stream char-stream)
        (loop (multiple-value-bind (token tag)
                  (next-token token-stream)
                (unless token (return))
                (my-process-token-and-tag token tag)))))))
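
The function my-process-token-and-tag is left for the caller to supply. A minimal sketch of one possible definition follows; it simply counts how often each tag occurs. The names *tag-counts* and report-tag-counts are hypothetical, and the sketch assumes that tags can be compared with equal (true for symbols and strings alike).

```lisp
;; Hypothetical consumer for my-tag-files; not part of the tagger itself.

(defvar *tag-counts* (make-hash-table :test #'equal)
  "Maps each tag seen so far to the number of tokens that received it.")

(defun my-process-token-and-tag (token tag)
  "Count one occurrence of TAG. TOKEN is ignored in this sketch."
  (declare (ignore token))
  (incf (gethash tag *tag-counts* 0)))

(defun report-tag-counts (&optional (stream *standard-output*))
  "Print the tags recorded in *TAG-COUNTS*, most frequent first."
  (let ((pairs '()))
    (maphash (lambda (tag count) (push (cons tag count) pairs))
             *tag-counts*)
    (dolist (pair (sort pairs #'> :key #'cdr))
      (format stream "~A~10T~D~%" (car pair) (cdr pair)))))
```

After my-tag-files has run over a list of files, calling (report-tag-counts) prints one line per tag with its count.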

Dependencies (1)

  • closer-mop
