The Xerox Part-of-Speech Tagger Version 1.2

Doug Cutting and Jan Pedersen of the Xerox Palo Alto Research Center


Use, reproduction, and distribution of this software are permitted, but only for non-commercial research or educational purposes. See the tagger/COPYRIGHT file for more information.

This directory contains release 1.2 of the Xerox Part-of-Speech tagger. For more information, print the file doc/tagger/tagger.ps.

Until this project is added to the Quicklisp repository, it must be installed manually in several steps (assuming Quicklisp is already installed):

  1. Download project sources (either by cloning the repository or downloading it in an archive).
  2. Unpack them into a directory and note its path.
  3. cd to the ~/quicklisp/local-projects directory.
  4. Create symbolic links to the .asd files in the directory from step 2.
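
Once the links from step 4 are in place, a quick check from the REPL can confirm that the system is visible before loading it. This is only a sketch: it assumes Quicklisp is loaded in the running image and that the system is named "tagger", as in the quickload call in this document.

```lisp
;; Run in a Lisp image where Quicklisp is already loaded.
(ql:register-local-projects)            ; rescan local-projects for .asd files
(if (asdf:find-system "tagger" nil)     ; NIL: return NIL instead of signalling an error
    (format t "tagger is visible to ASDF~%")
    (format t "tagger not found; check the symbolic links~%"))
```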

The system can now be loaded, either in parts or as a whole:

(ql:quickload "tagger")

When the loading is complete, you can run some simple queries:

(tag-analysis:tag-string "I saw the man on the hill with the telescope.")

I saw the man on the hill with the telescope.
ppss/2 vbd/3 at nn in at nn in/2 at nn/2

(The number following the tag is the arity of the ambiguity class assigned by the lexicon. Words without a number are unambiguous.)
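
To make the output format concrete, the following sketch splits a printed tag line into (tag . arity) pairs. The helper names are hypothetical and not part of the tagger, and recording an unambiguous tag with arity 1 is a convention chosen here, not something the tagger prints.

```lisp
;; Illustrative helpers, not part of the tagger.
(defun split-on-spaces (line)
  "Return the non-empty space-separated fields of LINE."
  (loop with start = 0
        for end = (position #\Space line :start start)
        for field = (subseq line start end)
        unless (string= field "") collect field
        while end
        do (setf start (1+ end))))

(defun parse-tag-field (field)
  "Turn \"vbd/3\" into (\"vbd\" . 3) and \"nn\" into (\"nn\" . 1)."
  (let ((slash (position #\/ field :from-end t)))
    (if slash
        (cons (subseq field 0 slash)
              (parse-integer field :start (1+ slash)))
        ;; Assumption: no \"/N\" suffix means an ambiguity class of size 1.
        (cons field 1))))

(defun parse-tag-line (line)
  "Parse a whole line of tagger output into a list of (tag . arity) pairs."
  (mapcar #'parse-tag-field (split-on-spaces line)))
```

For instance, (parse-tag-line "ppss/2 vbd/3 at nn") yields (("ppss" . 2) ("vbd" . 3) ("at" . 1) ("nn" . 1)).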

Programmatic Tagging

To use the tagger in a program, create a tagging-ts instance and call the generic function next-token on it repeatedly; each call returns a token and its tag as two values. Note that reinitialize-instance redirects tagging to a new text with minimal initialization overhead.

For example, the following function, my-tag-files, calls my-process-token-and-tag on each token/tag pair generated by tagging each file in the argument files:

(use-package :tdb)
(use-package :tag-analysis)
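(defun my-tag-files (files)
  ;; One tagging token stream is created and reused for every file.
  (let ((token-stream (make-instance 'tagging-ts)))
    (dolist (file files)
      (with-open-file (char-stream file)
        ;; Redirect the existing stream to the newly opened file.
        (reinitialize-instance token-stream :char-stream char-stream)
        ;; next-token returns a token and its tag; a null token ends the file.
        (loop (multiple-value-bind (token tag)
                  (next-token token-stream)
                (unless token (return))
                (my-process-token-and-tag token tag)))))))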
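
The function my-process-token-and-tag is left to the caller. One possible definition, given purely as a sketch, counts how often each tag occurs and echoes every pair. The names *tag-counts* and report-tag-counts are hypothetical, and nothing is assumed about how the tagger represents tokens and tags beyond their being printable.

```lisp
;; Hypothetical consumer for token/tag pairs; the tagger does not define it.
(defvar *tag-counts* (make-hash-table :test #'equal)
  "Maps each tag seen so far to the number of tokens that received it.")

(defun my-process-token-and-tag (token tag)
  "Count TAG and print the pair as token/tag on its own line."
  (incf (gethash tag *tag-counts* 0))
  (format t "~A/~A~%" token tag))

(defun report-tag-counts ()
  "Return an alist of (tag . count), most frequent tag first."
  (let ((pairs '()))
    (maphash (lambda (tag count) (push (cons tag count) pairs))
             *tag-counts*)
    (sort pairs #'> :key #'cdr)))
```

After a call to my-tag-files, (report-tag-counts) gives a frequency summary of the tags assigned; calling (clrhash *tag-counts*) beforehand resets the counts between runs.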

Dependencies (1)

  • closer-mop
