(ver.0.1.1 by Shinsuke Mori; 2012/05/24)
Word: Words are the Short Unit Words defined by the National Institute for Japanese Language and Linguistics (NINJAL), except that conjugational suffixes are split from their stems and treated as separate words.
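Reference (Japanese): 『現代日本語書き言葉均衡コーパス』形態論情報規程集改定版 (Balanced Corpus of Contemporary Written Japanese (BCCWJ): Rules for Morphological Information, revised edition)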
Dependency: A modification relationship between two words is called a dependency. The modifying word is the dependent, and the modified word is the head. This relationship is written with "->": "A -> B" means that A depends on B. For example, the dependency tree for the sentence 「リンゴを食べる。」 ("(I) eat an apple.") can be written as follows.
リンゴ -> を -> 食べ -> る -> 。
Dependencies between words that are not adjacent are expressed using full-width (zenkaku) parentheses. As shown in the example below, a "->" before an opening parenthesis indicates a dependency on the last element inside the parentheses, and a "->" after a closing parenthesis indicates that this last element depends on the word that follows.
リンゴ -> を -> ( 今日 -> 食べ ) -> る -> 。
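The arrow notation can be read mechanically into a head list, which is convenient for checking annotations. The following is a minimal sketch, not part of the original tooling: `tokenize` and `parse_arrows` are hypothetical names, and it assumes that words, arrows, and parentheses are separated by spaces (missing spaces around "->" or parentheses are tolerated) and that ASCII and full-width parentheses mean the same thing. Each parenthesized group is represented by its last word, which is exactly the rule stated above.

```python
import re
from typing import List, Tuple


def tokenize(line: str) -> List[str]:
    """Split a line of arrow notation into words, '->' and parentheses."""
    # Treat full-width parentheses the same as ASCII ones (an assumption).
    line = line.replace("（", "(").replace("）", ")")
    # Pad arrows and parentheses so str.split() separates them even when
    # the source omits a space, e.g. "は->".
    line = re.sub(r"(->|[()])", r" \1 ", line)
    return line.split()


def parse_arrows(line: str) -> Tuple[List[str], List[int]]:
    """Return (words, heads); heads[i] is the index of word i's head, -1 for the root."""
    tokens = tokenize(line)
    words: List[str] = []
    heads: List[int] = []
    pos = 0

    def parse_item() -> int:
        # An item is a single word or a parenthesized chain. Return the index
        # of the word that represents it, i.e. its last word.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        if tok == "(":
            pos += 1
            last = parse_chain()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return last
        if tok in ("->", ")"):
            raise ValueError(f"unexpected token {tok!r} at position {pos}")
        pos += 1
        words.append(tok)
        heads.append(-1)
        return len(words) - 1

    def parse_chain() -> int:
        # item -> item -> ...: the representative word of each item depends
        # on the representative word of the next item.
        nonlocal pos
        items = [parse_item()]
        while pos < len(tokens) and tokens[pos] == "->":
            pos += 1
            items.append(parse_item())
        for dep, head in zip(items, items[1:]):
            heads[dep] = head
        return items[-1]

    parse_chain()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return words, heads


words, heads = parse_arrows("リンゴ -> を -> ( 今日 -> 食べ ) -> る -> 。")
print(list(zip(words, heads)))
```

For the example above, を is attached to 食べ (index 3) rather than to 今日, because the arrow before the opening parenthesis points at the last word inside the group.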
Dependencies may also cross, as in the example below. Crossing dependencies cannot be written with "->". However, the annotation tool can handle this kind of dependency, so sentences with crossing dependencies may safely be annotated.
ウナギ を 浜松 に 食べ に 行 く
- ウナギ -> を -> 食べ
- 浜松 -> に -> 行
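Whether a tree contains crossing dependencies can be checked directly from its head list. The sketch below is illustrative only; `crossing_pairs` is a hypothetical helper. The example above fixes only を -> 食べ and に -> 行; the remaining heads used here (ウナギ -> を, 浜松 -> に, 食べ -> に -> 行 -> く, with く as the root) are an assumption made by applying the other rules in this document.

```python
from itertools import combinations
from typing import List, Tuple


def arc(i: int, heads: List[int]) -> Tuple[int, int]:
    """Return the (left, right) word positions spanned by word i's dependency."""
    h = heads[i]
    return (min(i, h), max(i, h))


def crossing_pairs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return pairs (i, j) of dependents whose arcs cross when drawn above the sentence."""
    deps = [i for i, h in enumerate(heads) if h >= 0]  # the root (-1) has no arc
    result = []
    for i, j in combinations(deps, 2):
        a, b = arc(i, heads)
        c, d = arc(j, heads)
        # Two arcs cross when one endpoint of one arc lies strictly inside the
        # other arc and its second endpoint lies strictly outside.
        if a < c < b < d or c < a < d < b:
            result.append((i, j))
    return result


# ウナギ を 浜松 に 食べ に 行 く  (heads partly assumed, see above)
words = ["ウナギ", "を", "浜松", "に", "食べ", "に", "行", "く"]
heads = [1, 4, 3, 6, 5, 6, 7, -1]
for i, j in crossing_pairs(heads):
    print(words[i], "->", words[heads[i]], "crosses", words[j], "->", words[heads[j]])
```

Arcs that merely share an endpoint (for example を -> 食べ and 食べ -> に) are not counted as crossing.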
Nouns and Case Particles
nouns depend on case particles
リンゴ -> を
学校 -> に -> も
Conjugated Words and Conjugational Suffixes
conjugated words (stems) depend on their conjugational suffixes
conjugated word -> conjugational suffix
食べ -> る
食べ -> れ -> ば
Predicate Arguments (Noun -> Case Particle) and Predicates (Verbs, Adjectives, Adjectival Verbs)
case particles depend on the stem of the predicate
私 -> は -> ( リンゴ -> を -> 食べ ) -> る
Copula (Assertive Auxiliary Verbs "だ" and "です"; cf. English "A is B")
each predicate argument depends on the word before the assertive auxiliary verb
相手 -> は -> だれ -> だ
これ -> が -> りんご -> で -> す
Words that Modify an Entire Sentence (Conjunctions, Adverbs)
words such as conjunctions and adverbs that modify an entire sentence depend on the stem of the sentence's predicate (or, in a copula sentence, on the noun preceding the copula)
nouns that express time also depend on the predicate
もちろん -> ( リンゴ -> も -> 食べ ) -> る
しかし -> ( これ -> は -> ダメ ) -> だ
今日 -> ( 学校 -> へ -> 行 ) -> く
Distinguish conjunctions from adverbs.
For a conjunction, annotate the widest possible dependency range.
Compound Words (Compound Nouns, Compound Verbs)
annotate dependencies that reflect the internal structure of the compound
the structure is defined in (Japanese) 『現代日本語書き言葉均衡コーパス』形態論情報規程集改定版
京都 -> 大学 -> ( 工学 -> 研究 -> 科 )
引っ越 -> し -> て -> 来 -> た
an element that modifies an entire compound noun depends on the last word of the compound
however, an element may also depend on a single word (other than the last) within a compound noun
季節 -> の -> ( 挨拶 -> 状 )
青山 -> 通り -> へ -> の -> 接道 -> 幅
in patent documents, treat the reference numeral attached to a structural element as part of a compound word
掛け渡 -> さ -> れ -> た -> ( スプリング -> 70 )
Parallel Noun Phrases
the element directly preceding a parallel structure marker (such as "と") depends on the marker, and the marker depends on the element directly following it
本 -> と -> 鉛筆 -> を
when an element of a parallel structure is a sequence of words, the marker depends on the last word (the head) of the sequence directly following it
4 -> 時 -> と -> ( 5 -> 時 ) -> の
however, when the element directly following a marker is itself followed by a marker, the first marker depends on the second one, as below
本 -> と -> ( 鉛筆 -> と ) -> を
for parallel structures joined by commas (、), annotate as follows
本 -> 、 -> 鉛筆 -> を
in parallel structures with three or more elements, each element depends on the next (each marker depends on the marker of the following element)
本 -> 、 -> ( 鉛筆 -> 、 ) -> ボールペン -> を
when a word modifies only part of a parallel structure, group the modified elements together and make the word depend on the last of them
細い -> ( 鉛筆 -> と -> ボールペン ) -> と -> 本 -> を
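The marker rules above can be turned into a small builder for the simplest case. This is a sketch under explicit assumptions: `coordinate` is a hypothetical helper, every element is a single word (the multi-word case and partial modifiers are not handled), and the word following the structure is left as the root. Each element depends on its own marker; a marker depends on the next element's marker when there is one, otherwise on the next element, and a trailing marker depends on the following word.

```python
from typing import List, Tuple


def coordinate(elements: List[str], markers: List[str],
               following: str) -> Tuple[List[str], List[int]]:
    """Build heads for 'e1 m1 e2 m2 ... en [mn] following' (single-word elements)."""
    if not (len(elements) - 1 <= len(markers) <= len(elements)):
        raise ValueError("need a marker between each pair of elements "
                         "(plus optionally one after the last)")
    words: List[str] = []
    element_pos: List[int] = []
    marker_pos: List[int] = []
    for k, element in enumerate(elements):
        element_pos.append(len(words))
        words.append(element)
        if k < len(markers):
            marker_pos.append(len(words))
            words.append(markers[k])
    words.append(following)
    last = len(words) - 1
    heads = [-1] * len(words)  # the following word stays the root in this sketch

    for k, e in enumerate(element_pos):
        # An element depends on its own marker; the last element, if it has
        # no marker, depends on the word after the structure.
        heads[e] = marker_pos[k] if k < len(marker_pos) else last
    for k, m in enumerate(marker_pos):
        if k + 1 < len(marker_pos):
            heads[m] = marker_pos[k + 1]   # marker -> next element's marker
        elif k + 1 < len(element_pos):
            heads[m] = element_pos[k + 1]  # marker -> next element
        else:
            heads[m] = last                # trailing marker -> following word
    return words, heads


# 本 、 鉛筆 、 ボールペン を
print(coordinate(["本", "鉛筆", "ボールペン"], ["、", "、"], "を"))
```

The outputs agree with the examples 本 -> と -> 鉛筆 -> を, 本 -> と -> ( 鉛筆 -> と ) -> を, and 本 -> 、 -> ( 鉛筆 -> 、 ) -> ボールペン -> を.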
Modification of Parallel Noun Phrases
words that modify all elements of a parallel noun phrase depend on the last element of the entire phrase
現地 -> メディア -> へ -> の -> ( 投稿 -> ・ -> 出演 ) -> を
現地 -> メディア -> へ -> の -> ( 投稿 -> ・ -> 出演 -> 等 ) -> を
words that modify a single element of a parallel noun phrase depend on the last word of that element
Parallel Predicates
(the last word of) the predicate portion of an element depends on the stem of the next predicate
※ when the copula or an inflected word has more than one predicate, the dependency goes to the closest one
首相 -> は -> ( 候補 -> 地 -> を -> 詰め ) -> 、 -> ( 九六 -> 年 -> に -> 決定 ) -> する
これ -> が -> 本 -> で -> 、 -> ( あれ -> が -> ノート ) -> だ
Modification of Predicates
when a case element could depend on two or more predicates, set it to depend on the stem word of the first predicate
隣 -> に -> 引っ越 -> し -> て -> 来 -> た
私 -> は -> ( 本 -> を -> 買 ) -> っ -> て -> 読ん -> だ
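The rule for case elements that could attach to several predicates amounts to choosing the nearest predicate stem to the right of the dependent. In the minimal sketch below, `first_predicate_head` is a hypothetical helper, and the positions of the predicate stems are assumed to be identified beforehand (here by hand).

```python
from typing import List, Sequence


def first_predicate_head(dependent: int, predicate_stems: Sequence[int]) -> int:
    """Head for a case element that could attach to several predicates:
    the stem of the first predicate to its right."""
    candidates = [i for i in predicate_stems if i > dependent]
    if not candidates:
        raise ValueError("no predicate stem to the right of the dependent")
    return min(candidates)


# 私 は 本 を 買 っ て 読ん だ: the stems 買 (index 4) and 読ん (index 7)
# are marked by hand for this example.
words: List[str] = ["私", "は", "本", "を", "買", "っ", "て", "読ん", "だ"]
stems = [4, 7]
print(words[first_predicate_head(1, stems)])  # 買
```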
case elements of compound adjectives depend on the first element of the compound adjective
彼 -> は -> 飽き -> っぽ -> い
彼女 -> は -> ( 健康 -> 的 ) -> だ
※ this is inconsistent with the rule for noun phrases, where modifiers depend on the last element
the difference becomes apparent when an element of a parallel structure has an additional dependency on another word
私 -> が -> 買 -> っ -> て -> 読ん -> だ -> 本
Parallel Sentences
when two sentences appear on the same line, the punctuation mark in the first one depends on the punctuation mark of the last one
私 -> は -> バカ -> だ -> 。 -> ( 彼 -> も -> だ -> 。 )
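Joining two separately annotated sentences under this rule only requires shifting the second sentence's indices and re-attaching the first root. This sketch assumes, as in the example above, that each sentence's root is its sentence-final punctuation mark; `merge_parallel_sentences` is a hypothetical helper that handles exactly two sentences.

```python
from typing import List, Tuple

Sentence = Tuple[List[str], List[int]]  # (words, heads); -1 marks the root


def merge_parallel_sentences(first: Sentence, second: Sentence) -> Sentence:
    words1, heads1 = first
    words2, heads2 = second
    offset = len(words1)
    # Shift the second sentence's head indices past the first sentence.
    shifted = [h + offset if h >= 0 else -1 for h in heads2]
    merged_heads = list(heads1) + shifted
    # The first sentence's root (its final punctuation) now depends on the
    # second sentence's root, which remains the root of the merged line.
    merged_heads[heads1.index(-1)] = offset + heads2.index(-1)
    return list(words1) + list(words2), merged_heads


first = (["私", "は", "バカ", "だ", "。"], [1, 2, 3, 4, -1])
second = (["彼", "も", "だ", "。"], [1, 2, 3, -1])
print(merge_parallel_sentences(first, second))
```

The merged heads match the arrow form 私 -> は -> バカ -> だ -> 。 -> ( 彼 -> も -> だ -> 。 ).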
(*) Nested parallel structures
コネクタ -> 1 -> 、 -> ( 保持 -> 部 -> ( 51 -> C -> , -> ( 51 -> d ) ) )
(*) Parallel structure markers
と, 、, ...
Parentheses (Brackets)
opening (left) parentheses depend on the corresponding closing (right) parentheses
the last word within parentheses depends on the closing parenthesis
the word occurring immediately before an opening parenthesis depends on the closing parenthesis
when two or more pairs of parentheses supplement the same element, closing parentheses depend on the final closing parenthesis
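The parenthesis rules fix only some heads in a sentence, so a checker can assign those and leave the rest open. The sketch below is illustrative: `bracket_heads` is a hypothetical helper, the example words are made up, nested brackets are rejected, and it assumes the text uses full-width （ ）. It also assumes that "supplementing the same element" can be detected as adjacent bracket pairs of the form （…）（…）.

```python
from typing import List, Optional, Tuple

OPEN, CLOSE = "（", "）"


def bracket_heads(tokens: List[str]) -> List[Optional[int]]:
    """Assign the heads fixed by the bracket rules; all other heads stay None."""
    heads: List[Optional[int]] = [None] * len(tokens)
    pairs: List[Tuple[int, int]] = []
    open_at: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok == OPEN:
            if open_at is not None:
                raise ValueError("nested brackets are not handled in this sketch")
            open_at = i
        elif tok == CLOSE:
            if open_at is None:
                raise ValueError(f"unmatched closing bracket at {i}")
            pairs.append((open_at, i))
            open_at = None
    if open_at is not None:
        raise ValueError(f"unmatched opening bracket at {open_at}")

    for o, c in pairs:
        heads[o] = c              # opening bracket -> its closing bracket
        if c - 1 > o:
            heads[c - 1] = c      # last word inside -> closing bracket
        if o > 0 and tokens[o - 1] != CLOSE:
            heads[o - 1] = c      # word before the opening -> closing bracket

    # In a run of adjacent pairs, every non-final closing bracket depends on
    # the final closing bracket of the run.
    run: List[int] = []
    for o, c in pairs:
        if run and run[-1] == o - 1:
            run.append(c)
        else:
            run = [c]
        for earlier in run[:-1]:
            heads[earlier] = run[-1]
    return heads


print(bracket_heads("本 （ 上巻 ） （ 下巻 ） を".split()))
```

Words whose heads are not determined by these rules (here 本's partner を and the final closing bracket) are left as None for the other rules to fill in.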
Honorific Prefix "お"
group it together with the word it makes polite
ここ -> で -> (お -> 待ち) -> くださ -> い
Commas
the head of a comma is the word that would be the head of the preceding word if the comma were absent, and the preceding word's head becomes the comma itself
私 -> は -> バカ -> だ -> 。
私 -> は -> 、 -> バカ -> だ -> 。
本 -> と -> ( 鉛筆 -> と ) -> ボールペン -> を
本 -> と -> 、 -> ( 鉛筆 -> と -> 、 ) -> ボールペン -> を
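The comma rule is a local rewrite of the head list: the comma inherits the preceding word's head, and the preceding word is re-attached to the comma. The minimal sketch below applies it to one comma at a time; `insert_comma` is a hypothetical helper, and it refuses to place a comma after the root, which the rule does not cover.

```python
from typing import List, Tuple


def insert_comma(words: List[str], heads: List[int], after: int,
                 comma: str = "、") -> Tuple[List[str], List[int]]:
    """Insert a comma after word `after` and rewire heads per the comma rule."""
    if heads[after] < 0:
        raise ValueError("a comma after the root is not covered by this sketch")
    comma_pos = after + 1
    # Words to the right of the insertion point move one position right;
    # -1 (the root marker) is never greater than `after`, so it is kept.
    new_heads = [h + 1 if h > after else h for h in heads]
    comma_head = new_heads[after]  # the comma takes over the preceding word's head
    new_heads[after] = comma_pos   # the preceding word now depends on the comma
    new_heads.insert(comma_pos, comma_head)
    new_words = words[:comma_pos] + [comma] + words[comma_pos:]
    return new_words, new_heads


# 私 -> は -> バカ -> だ -> 。  becomes  私 -> は -> 、 -> バカ -> だ -> 。
print(insert_comma(["私", "は", "バカ", "だ", "。"], [1, 2, 3, 4, -1], after=1))
```

In the second pair of examples the first comma depends on the second comma, presumably because the commas also act as parallel-structure markers; this sketch treats each comma in isolation and does not model that interaction.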
Other Dependency Examples
い -> る -> か -> (どう -> か)
今日 -> (5 -> 時 -> に -> あ) -> う
今日 -> の -> (5 -> 時) -> に -> あ -> う
午後 -> (5 -> 時) -> 半