Word Dependency Annotation Standard

(ver.0.1.1 by Shinsuke Mori; 2012/05/24)

Outline

  1. Word: A word is the short unit defined by the National Institute for Japanese Language and Linguistics, except that conjugative suffixes are split from their stems and treated as separate words.
    Reference (Japanese): 『現代日本語書き言葉均衡コーパス』形態論情報規程集改定版

  2. Dependency: Modification relationships between words in both directions are called dependencies. The modifier is the dependent, and the modified word is the head. This relationship is expressed with "->". For example, the dependency tree for the sentence 「リンゴを食べる。」 can be expressed as follows.

    リンゴ -> を -> 食べ -> る -> 。

    Dependencies between words that are not adjacent are expressed using full-width (zenkaku) parentheses. As shown in the example below, the "->" before the opening parenthesis indicates a dependency on the last element inside the parentheses.

    リンゴ -> を -> ( 今日 -> 食べ ) -> る -> 。

  3. It is possible for dependencies to cross (as in the example below). In this case, it is not possible to draw them using "->". However, the tool used for annotation can handle this kind of dependency, so we may safely annotate sentences with dependencies that cross.

    ウナギ を 浜松 に 食べ に 行 く
     o ウナギ を -> 食べ
     o 浜松 に -> 行
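
The notation described in the outline can be read mechanically. The following is a minimal sketch, not part of the standard or of the annotation tool: it assumes that words never contain "->" or parentheses, that both half-width and full-width parentheses are accepted, and that a word with no head is marked -1. The names parse_arrows and crossing_pairs are hypothetical. The second function reports pairs of dependencies that cross, using word positions counted from 0.

```python
from itertools import combinations

OPEN, CLOSE = {"(", "（"}, {")", "）"}


def parse_arrows(text):
    """Return (words, heads); heads[i] is the index of word i's head, or -1."""
    # Assumption: "->" and parentheses never occur inside a word.
    for sym in ("->", "(", ")", "（", "）"):
        text = text.replace(sym, f" {sym} ")
    tokens = text.split()
    words, heads = [], []
    pos = 0

    def parse_seq():
        """Parse items up to a closing parenthesis; return the last item's head word."""
        nonlocal pos
        prev, linked = None, False
        while pos < len(tokens) and tokens[pos] not in CLOSE:
            tok = tokens[pos]
            pos += 1
            if tok == "->":
                linked = True
                continue
            if tok in OPEN:
                cur = parse_seq()  # a group stands for its last element
                pos += 1           # skip the closing parenthesis
            else:
                words.append(tok)
                heads.append(-1)
                cur = len(words) - 1
            if cur is None:        # empty group: nothing to attach
                continue
            if prev is not None and linked:
                heads[prev] = cur  # the previous item depends on this one
            prev, linked = cur, False
        return prev

    parse_seq()
    return words, heads


def crossing_pairs(arcs):
    """Return pairs of spans that cross; arcs are (dependent, head) index pairs."""
    spans = sorted({tuple(sorted(arc)) for arc in arcs})
    return [(a, b) for a, b in combinations(spans, 2)
            if a[0] < b[0] < a[1] < b[1]]


words, heads = parse_arrows("リンゴ -> を -> ( 今日 -> 食べ ) -> る -> 。")
print(heads)  # [1, 3, 3, 4, 5, -1]

# ウナギ を 浜松 に 食べ に 行 く: を(1) -> 食べ(4) and に(3) -> 行(6)
print(crossing_pairs([(1, 4), (3, 6)]))  # [((1, 4), (3, 6))]
```

Items that are not joined by "->" are left unattached, so a sentence written without arrows (as in the crossing example) parses to a list of words with no dependencies.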

Dependency Standard

[General] (G)

  1. Nouns and Case Particles
    nouns depend on case particles

    リンゴ -> を
    学校 -> に -> も

  2. Conjugated Words and Conjugations
    conjugated words depend on their conjugations
    inflected word -> conjugation

    食べ -> る
    食べ -> れ -> ば

  3. Predicate Arguments (Noun -> Case Particle) and Predicates (Verbs, Adjectives, Adjectival Verbs)
    case particles depend on predicates

    私 -> は -> ( リンゴ -> を -> 食べ ) -> る

  4. Copula (Assertive Auxiliary Verbs "だ" and "です", English "A is B.")
    each predicate argument depends on the word before the assertive auxiliary verb

    相手 -> は -> だれ -> だ
    これ -> が -> りんご -> で -> す

  5. Words that Modify an Entire Sentence (Conjunctions, Adverbs)
    words, like conjunctions and adverbs, that modify entire sentences depend on the root of the sentence's predicate (or a noun in the case of the copula)
    nouns that express time also depend on the predicate

    もちろん -> ( リンゴ -> も -> 食べ ) -> る
    しかし -> ( これ -> は -> ダメ ) -> だ
    今日 -> ( 学校 -> へ -> 行 ) -> く

    Treat conjunctions and adverbs separately.

    For conjunctions, annotate the largest possible dependency range.

  6. Compound Words (Compound Nouns, Compound Verbs)
    annotate dependencies to reflect the structure
    structure is defined in (Japanese) 『現代日本語書き言葉均衡コーパス』形態論情報規程集改定版

    京都 -> 大学 -> ( 工学 -> 研究 -> 科 )
    引っ越 -> し -> て -> 来 -> た

    elements that depend on a compound noun depend on its last word if the element modifies the entire compound
    however, note that elements may also depend on single words (other than the last) within a compound noun

    季節 -> の -> ( 挨拶 -> 状 )
    青山 -> 通り -> へ -> の -> 接道 -> 幅

    for patent text, treat the number of a structural element as part of a compound word

    掛け渡 -> さ -> れ -> た -> ( スプリング -> 70 )

[Parallel Structures] (P)

  1. Noun Phrases
    the element directly preceding the parallel structure depends on the parallel structure marker (such as "と"), and the marker depends on the element directly following it

    本 -> と -> 鉛筆 -> を

    when an element in a parallel structure is a sequence of words, the marker depends on the head of the sequence directly following the marker

    4 -> 時 -> と -> ( 5 -> 時 ) -> の

    however, when the element directly following the marker also has a marker, annotate as indicated below

    本 -> と -> ( 鉛筆 -> と ) -> を

    for parallel structures using commas, annotate as follows

    本 -> 、 -> 鉛筆 -> を

    in parallel structures with 3 or more elements, each element depends on the next

    本 -> 、 -> ( 鉛筆 -> 、 ) -> ボールペン -> を

    when words modify part of the parallel structure, group them together and set them to depend on the last element

    細い -> ( 鉛筆 -> と -> ボールペン ) -> と -> 本 -> を

  2. Modification of Parallel Noun Phrases
    words that modify all elements of a parallel noun phrase depend on the last element of the entire phrase

    現地 -> メディア -> へ -> の -> ( 投稿 -> ・ -> 出演 ) -> を
    現地 -> メディア -> へ -> の -> ( 投稿 -> ・ -> 出演 -> 等 ) -> を

    words that modify a single element of a parallel noun phrase depend on the last word of that element

  3. Parallel Predicates
    (the last word of) the predicate portion of an element depends on the stem of the next predicate
    ※ when the copula or an inflected word has more than one predicate, the dependency goes to the closest one

    首相 -> は -> ( 候補 -> 地 -> を -> 詰め ) -> 、 -> ( 九六 -> 年 -> に -> 決定 ) -> する
    これ -> が -> 本 -> で -> 、 -> ( あれ -> が -> ノート ) -> だ

  4. Modification of Predicates
    when a case element could depend on two or more predicates, set it to depend on the stem word of the first predicate

    隣 -> に -> 引っ越 -> し -> て -> 来 -> た
    私 -> は -> ( 本 -> を -> 買 ) -> っ -> て -> 読ん -> だ

    case elements of compound adjectives depend on the first element of the compound adjective

    彼 -> は -> 飽き -> っぽ -> い
    彼女 -> は -> ( 健康 -> 的 ) -> だ

    ※ inconsistency with the rule for handling noun phrases
    differences become apparent when an element of a parallel structure has an additional dependency on another word

    私 -> が -> 買 -> っ -> て -> 読ん -> だ -> 本

  5. Parallel Sentences
    when two sentences appear on the same line, the punctuation mark in the first one depends on the punctuation mark of the last one

    私 -> は -> バカ -> だ -> 。 -> ( 彼 -> も -> だ -> 。 )

    (*) Nested parallel structures

    コネクタ -> 1 -> 、 -> ( 保持 -> 部 -> ( 51 -> C -> , -> ( 51 -> d ) )

    (*) Parallel structure markers

    と, 、, ...
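
As an illustration of rule P1, the sketch below builds the dependencies for a parallel noun phrase whose elements are single words. It is only one reading of the examples in P1, under these assumptions: every element except the last is followed by exactly one marker, each such element depends on its own marker, and each marker depends on the next marker, or on the last element when no marker follows. The name parallel_np_heads is hypothetical, heads are word positions counted from 0, and -1 means that the head lies outside the phrase (for example a following case particle).

```python
def parallel_np_heads(elements, markers):
    """Return (words, heads) for single-word elements joined by markers."""
    if len(markers) != len(elements) - 1:
        raise ValueError("expected one marker between each pair of elements")
    words, heads, marker_pos = [], [], []
    for i, word in enumerate(elements):
        words.append(word)
        if i < len(markers):
            heads.append(len(words))   # element -> its own marker (next position)
            words.append(markers[i])
            heads.append(-1)           # placeholder, filled in below
            marker_pos.append(len(words) - 1)
        else:
            heads.append(-1)           # last element: head is outside the phrase
    last = len(words) - 1
    for j, p in enumerate(marker_pos):
        # marker -> next marker if there is one, otherwise -> last element
        heads[p] = marker_pos[j + 1] if j + 1 < len(marker_pos) else last
    return words, heads


# 本 -> 、 -> ( 鉛筆 -> 、 ) -> ボールペン
print(parallel_np_heads(["本", "鉛筆", "ボールペン"], ["、", "、"]))
# (['本', '、', '鉛筆', '、', 'ボールペン'], [1, 3, 3, 4, -1])
```

Elements made of several words, and modifiers that attach to only part of the structure, are not covered by this sketch.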

[Special Cases for Individual Words] (W)

  1. Parentheses (Brackets)
    opening (left) parentheses depend on the corresponding closing (right) parentheses
    the last word within parentheses depends on the closing parenthesis
    the word occurring immediately before an opening parenthesis depends on the closing parenthesis
    when two or more pairs of parentheses supplement the same element, closing parentheses depend on the final closing parenthesis

  2. Polite "お"
    group it together with the word being made polite

    ここ -> で -> ( お -> 待ち ) -> くださ -> い

  3. Commas
    the head of a comma is the word that would be the head of the preceding word if the comma were not present, and the preceding word's head is the comma itself

    私 -> は -> バカ -> だ -> 。
    私 -> は -> 、 -> バカ -> だ -> 。
    本 -> と -> ( 鉛筆 -> と ) -> ボールペン -> を
    本 -> と -> 、 -> ( 鉛筆 -> と -> 、 ) -> ボールペン -> を
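
The comma rule (W3) can be viewed as a small transformation of an existing analysis. The sketch below is an illustration only: it inserts a single comma after word k, assumes heads are stored as word positions counted from 0 with -1 for a word without a head, and does not model how several commas interact inside a parallel structure. The name insert_comma is hypothetical.

```python
def insert_comma(words, heads, k, comma="、"):
    """Insert a comma after word k; return the new (words, heads)."""

    def shift(i):
        # Positions after k move one step right; -1 (no head) is unchanged.
        return i + 1 if i > k else i

    new_words = words[:k + 1] + [comma] + words[k + 1:]
    new_heads = []
    for i, h in enumerate(heads):
        if i == k:
            new_heads.append(k + 1)     # the preceding word now depends on the comma
            new_heads.append(shift(h))  # the comma takes over that word's old head
        else:
            new_heads.append(shift(h))
    return new_words, new_heads


# 私 -> は -> バカ -> だ -> 。  with a comma inserted after は
words = ["私", "は", "バカ", "だ", "。"]
heads = [1, 2, 3, 4, -1]
print(insert_comma(words, heads, 1))
# (['私', 'は', '、', 'バカ', 'だ', '。'], [1, 2, 3, 4, 5, -1])
```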

[Others] (E)