cl-conllu

API Reference

Common Lisp utilities for working with CoNLL-U corpora

CL-CONLLU

  • Class TOKEN
    ID   Accessor: TOKEN-ID
    FORM   Accessor: TOKEN-FORM
    LEMMA   Accessor: TOKEN-LEMMA
    UPOSTAG   Accessor: TOKEN-UPOSTAG
    XPOSTAG   Accessor: TOKEN-XPOSTAG
    FEATS   Accessor: TOKEN-FEATS
    HEAD   Accessor: TOKEN-HEAD
    DEPREL   Accessor: TOKEN-DEPREL
    DEPS   Accessor: TOKEN-DEPS
    MISC   Accessor: TOKEN-MISC
  • Class MTOKEN
    START   Accessor: MTOKEN-START
    END   Accessor: MTOKEN-END
    FORM   Accessor: MTOKEN-FORM
    MISC   Accessor: MTOKEN-MISC
  • Class SENTENCE
    START   Accessor: SENTENCE-START
    META   Accessor: SENTENCE-META
    TOKENS   Accessor: SENTENCE-TOKENS
    MTOKENS   Accessor: SENTENCE-MTOKENS
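    Example (an illustrative sketch, not part of the library; *sentence* is assumed to hold a
    sentence object, e.g. the first element of the list returned by READ-FILE):
      ;; print id, form, upostag and head of every token, one token per line
      (dolist (tk (sentence-tokens *sentence*))
        (format t "~a ~a ~a ~a~%"
                (token-id tk) (token-form tk) (token-upostag tk) (token-head tk)))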
  • Function SENTENCE-BINARY-TREE (sentence)
    Based on the idea from [1], produces a binarized tree view of the sentence (the priorities of
    children still need to be improved). See the code at https://github.com/sivareddyg/UDepLambda,
    file src/deplambda/parser/TreeTransformer.java, method 'binarizeTree'.
    [1] Siva Reddy, O. Täckström, M. Collins, T. Kwiatkowski, D. Das, M. Steedman, and M. Lapata,
    "Transforming Dependency Structures to Logical Forms for Semantic Parsing", Transactions of the
    Association for Computational Linguistics, pp. 127–140, Apr. 2016.
  • Function SENTENCE-HASH-TABLE (sentence)
  • Function SENTENCE-META-VALUE (sentence meta-field)
  • Function SENTENCE-ID (sentence)
  • Function SENTENCE-TEXT (sentence)
  • Function SENTENCE->TEXT (sentence &key (ignore-mtokens nil) (special-format-test #'null special-format-test-supplied-p) (special-format-function #'identity special-format-function-supplied-p))
    Receives SENTENCE, a sentence object, and returns a string reconstructed from its tokens and
    mtokens. If IGNORE-MTOKENS is non-nil, the tokens' forms are used. Otherwise, tokens whose ids
    are contained in an mtoken are skipped and the mtoken's form is used instead.
    It is also possible to apply special formatting to some tokens. To do so, pass both
    SPECIAL-FORMAT-TEST and SPECIAL-FORMAT-FUNCTION. For each object (token or mtoken) for which
    SPECIAL-FORMAT-TEST returns a non-nil result, its form is transformed by
    SPECIAL-FORMAT-FUNCTION in the final string.
    Example:
      (sentence-tokens *sentence*)
      => (#<TOKEN The/DET #1-det-3> #<TOKEN US/PROPN #2-compound-3> #<TOKEN troops/NOUN #3-nsubj-4>
          #<TOKEN fired/VERB #4-root-0> #<TOKEN into/ADP #5-case-8> #<TOKEN the/DET #6-det-8>
          #<TOKEN hostile/ADJ #7-amod-8> #<TOKEN crowd/NOUN #8-obl-4> #<TOKEN ,/PUNCT #9-punct-4>
          #<TOKEN killing/VERB #10-advcl-4> #<TOKEN 4/NUM #11-obj-10> #<TOKEN ./PUNCT #12-punct-4>)
      (sentence->text *sentence*)
      => "The US troops fired into the hostile crowd, killing 4."
      (sentence->text *sentence*
                      :special-format-test (lambda (token)
                                             (string= (token-upostag token) "VERB"))
                      :special-format-function (lambda (string)
                                                 (format nil "*~a*" (string-upcase string))))
      => "The US troops *FIRED* into the hostile crowd, *KILLING* 4."
  • Function SENTENCE-VALID? (sentence)
  • Function ADJUST-SENTENCE (sentence)
    Receives a sentence and renumbers the ID and HEAD values of each token so that their order (as
    returned by SENTENCE-TOKENS) is respected.
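    Usage sketch (hedged: it assumes the sentence is modified in place and that SENTENCE-TOKENS
    is setf-able, as its listing as an accessor above suggests):
      ;; drop the final token, then renumber ids and heads to match the new order
      (setf (sentence-tokens sent) (butlast (sentence-tokens sent)))
      (adjust-sentence sent)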
  • Function SENTENCE-EQUAL (sent-1 sent-2)
    Tests whether, for each slot, SENT-1 has the same values as SENT-2. For tokens and multiword
    tokens, it uses TOKEN-EQUAL and MTOKEN-EQUAL, respectively.
  • Function MAKE-SENTENCE (lineno lines fn-meta)
  • Function READ-CONLLU (input &key (fn-meta #'collect-meta))
  • Function READ-DIRECTORY (path &key (fn-meta #'collect-meta))
  • Function READ-FILE (path &key (fn-meta #'collect-meta))
  • Function READ-STREAM (stream &key (fn-meta #'collect-meta))
  • Function WRITE-CONLLU-TO-STREAM (sentences &optional (out *standard-output*))
  • Function WRITE-CONLLU (sentences filename &key (if-exists :supersede))
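    A typical read/modify/write round trip with the functions above (a sketch; the file names are
    placeholders):
      (let ((sents (read-file #p"corpus.conllu")))
        ;; ... inspect or edit the sentences here ...
        (write-conllu sents #p"corpus-copy.conllu" :if-exists :supersede))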
  • Function QUERY (query sentences)
  • Function QUERY-AS-JSON (a-query sentences)
  • Function LEVENSHTEIN (s1 s2 &key test)
  • Function DIFF (sentences-a sentences-b &key test key)
  • Function NON-PROJECTIVE? (sentence)
    Verifies whether a sentence tree is projective. Intuitively, this means that, keeping the word
    order, no two dependency arcs cross. More formally, let i -> j mean that j's head is node i,
    and let '->*' be the transitive closure of '->'. A tree is projective when, for all nodes i
    and j: if i -> j, then for each node k between i and j (i < k < j or j < k < i), i ->* k.
    References:
    - Nivre, Joakim; Inductive Dependency Parsing, 2006
    - https://en.wikipedia.org/wiki/Discontinuity_(linguistics)
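    The equivalent crossing-arcs formulation can be sketched with the accessors listed above (an
    illustrative sketch of the definition, not the library's implementation; it assumes TOKEN-ID
    and TOKEN-HEAD return integers):
      (defun crossing-arcs-p (sentence)
        "Non-nil when two dependency arcs of SENTENCE cross, i.e. the tree is non-projective."
        (let ((arcs (mapcar (lambda (tk)
                              (let ((i (token-head tk)) (j (token-id tk)))
                                (cons (min i j) (max i j))))
                            (sentence-tokens sentence))))
          (loop for (a1 . b1) in arcs
                thereis (loop for (a2 . b2) in arcs
                              thereis (< a1 a2 b1 b2)))))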
  • Function CONVERT-RDF (corpusname stream conlls text-fn id-fn)
    Converts the collection of sentences (as generated by READ-CONLLU) in CONLLS, using the
    function TEXT-FN to extract the text of each sentence and ID-FN to extract the id of each
    sentence (this is needed because there is no standardized way of obtaining them). Note that the
    generated Turtle file contains a lot of duplication, so after importing it into your triple
    store, make sure to remove all duplicate triples.
  • Function CONVERT-RDF-FILE (file-in file-out)
  • Function CONVERT-TO-RDF (sentences &key (text-fn #'sentence-text) (id-fn #'sentence-id) (corpusname "my-corpus") (namespace-string "http://www.example.org/") (stream *standard-output*) (rdf-format :ntriples) (conll-namespace "http://br.ibm.com/conll/"))
    Converts a list of sentences (e.g. as generated by READ-CONLLU) in SENTENCES, using the
    function TEXT-FN to extract the text of each sentence and ID-FN to extract the id of each
    sentence (this is needed because there is no standardized way of obtaining them). Currently
    :ntriples is the only supported RDF-FORMAT.
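    Usage sketch (the file names are placeholders; the keyword arguments are the ones documented
    in the signature above):
      (with-open-file (out #p"corpus.nt" :direction :output :if-exists :supersede)
        (convert-to-rdf (read-file #p"corpus.conllu")
                        :corpusname "my-corpus"
                        :namespace-string "http://www.example.org/"
                        :rdf-format :ntriples
                        :stream out))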
  • Function APPLY-RULES-FROM-FILES (conllu-file rules-file new-conllu-file log-file &key recursive)
  • Function APPLY-RULES (sentences rules recursive)

CONLLU.PROLOG

  • Function CONVERT-FILENAME (context filename-in filename-out)

CONLLU.RDF

No exported symbols.

Also exports

  • CL-CONLLU:CONVERT-TO-RDF

CONLLU.CONVERTERS.NICELINE

No exported symbols.

CONLLU.CONVERTERS.TAGS

  • Function WRITE-SENTENCE-TAG-SUFFIX-TO-STREAM (sentence &key (stream *standard-output*) (tag 'upostag) (separator "_"))
    Writes SENTENCE to STREAM with each token rendered as FORM.SEPARATOR.TAGVALUE (without the
    dots) and followed by a whitespace character. If TAG is NIL, only the FORMs are written, each
    followed by a whitespace character.
    Example:
      ;; supposing sentence already defined
      (write-sentence-tag-suffix-to-stream sentence :tag 'xpostag :separator "_")
      Pierre_NNP Vinken_NNP ,_, 61_CD years_NNS old_JJ ,_, will_MD join_VB the_DT board_NN as_IN a_DT nonexecutive_JJ director_NN Nov._NNP 29_CD ._.
      => NIL
  • Function WRITE-SENTENCES-TAG-SUFFIX-TO-STREAM (sentences &key (stream *standard-output*) (tag 'upostag) (separator "_"))
    See documentation for WRITE-SENTENCE-TAG-SUFFIX-TO-STREAM.
  • Function WRITE-SENTENCES-TAG-SUFFIX (sentences filename &key (tag 'upostag) (separator "_") (if-exists :supersede))
    See documentation for WRITE-SENTENCE-TAG-SUFFIX-TO-STREAM.
  • Function READ-SENTENCE-TAG-SUFFIX (stream field separator)
    Reads sentences from STREAM, where each token is written as FORM.SEPARATOR.TAGVALUE (without
    the dots) and followed by a whitespace character, and returns the corresponding sentence
    objects, with FIELD naming the token slot that receives each tag value.
    Example:
      ;; Consider the file example.txt, with contents:
      ;; Pudim_NOUN é_VERB bom_ADJ ._PUNCT
      ;; E_CONJ torta_NOUN também_ADV ._PUNCT
      (with-open-file (s "./example.txt")
        (write-conllu-to-stream (read-sentence-tag-suffix s 'upostag "_")))
      1  Pudim    _  NOUN   _  _  _  _  _  _
      2  é        _  VERB   _  _  _  _  _  _
      3  bom      _  ADJ    _  _  _  _  _  _
      4  .        _  PUNCT  _  _  _  _  _  _

      1  E        _  CONJ   _  _  _  _  _  _
      2  torta    _  NOUN   _  _  _  _  _  _
      3  também   _  ADV    _  _  _  _  _  _
      4  .        _  PUNCT  _  _  _  _  _  _
  • Function READ-FILE-TAG-SUFFIX (filename &key (tag 'upostag) (separator "_"))
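    Usage sketch for READ-FILE-TAG-SUFFIX (hedged; "example.txt" is the placeholder file from the
    READ-SENTENCE-TAG-SUFFIX example above, and the result is assumed to be a list of sentence
    objects):
      (read-file-tag-suffix "example.txt" :tag 'upostag :separator "_")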

CONLLU.DRAW

  • Function TREE-SENTENCE (sentence &key (stream *standard-output*) show-meta)
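    Usage sketch (hedged; the file name is a placeholder and the output is assumed to be a textual
    rendering of the dependency tree written to STREAM):
      (tree-sentence (first (read-file #p"corpus.conllu")) :show-meta t)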