Move Notation |
---|
Shogi has a well-defined notation to record games. The notation of a move is decomposed into the following components. These categories are basically finite, but we include misspelled expressions as well. |
Tu: Expressions indicating the turn. This category contains only ''先手'' (black), ''後手'' (white), ''▲'' (black), and ''△'' (white). |
Po: Positions denoted by two numerals (one Arabic numeral for file and one Chinese numeral for rank). |
Pi: Piece names including promoted ones (14 types). |
Mc: Move complement. There are only two expressions: ''成る'' (promoted) and ''不成'' (non-promoted). |
Move Descriptions |
For some moves, a commentator explains their meaning using the following expressions: |
Mn: Move name such as ''王手'' (check). |
Me: Move evaluation such as ''好手'' (good move). |
Opening Expressions |
Opening sequences have set names, which appear frequently. |
St: Strategy names. As with chess, shogi has many attacking formations with various names. This class is almost closed, but sometimes new openings are invented. An example is ''ゴキゲン中飛車'' (cheerful central rook). |
Ca: Castle names. Defensive formations also have names. This class is also almost closed, with some exceptions such as ''ミレニアム'' (Millennium formation), which arose in 2000. |
Position Evaluation |
The most important commentaries are those concerning evaluation of the current board state, for example ''black is winning.'' The class for this type of commentary includes adjectival expressions and simple sentences consisting of a subject and a predicate with arguments. |
Ev: Evaluation expressions about the entire board. This category does not include evaluations from a specific viewpoint, which are covered by the following category. |
Ee: Other evaluation expressions focusing on a certain aspect. Examples are ''駒得'' (gaining pieces) and ''配置が良い'' (pieces are well positioned). |
Expressions for Description of Board Positions |
Commentators use the following expressions to describe board states. |
Re: Region on the board, such as ''中央'' (center), ''4筋'' (4th file), and ''3段目'' (3rd rank). |
Ph: Phase of the match, such as ''序盤'' (opening), ''中盤'' (middlegame), and ''終盤'' (endgame), including vague ones such as ''終盤の入り口'' (start of endgame). |
Pa: Piece attributes. Every piece has its own movement, and commentators use special expressions for it. For example, ''道'' (path) denotes a bishop's diagonal lines and a rook's orthogonal lines. There are also special expressions for positions relative to a piece, such as ''腹'' (belly), meaning the squares to the side of a piece. |
Pq: Piece quantity. Usually it is a pair of a number and a counter word. This also includes expressions such as ''切れ'' (lack of) and ''豊富'' (abundant). |
Describing Events Outside the Board |
Commentators sometimes refer to issues outside of the board but related to the match. They can be classified into the following types: |
Hu: Names of players, commentators, etc., including their titles, such as ''名人'' (champion). This category also contains expressions for groups of players and for places such as ''検討室'' (discussion room) that behave like a human. Names that appear inside expressions belonging to other types, such as the name in ''Ishida style,'' are excluded. |
Ti: Expressions for the total time spent, the time spent on the current move and the time remaining. In addition to concrete expressions, like ''10 minutes,'' this includes abstract ones such as ''長時間'' (long time). |
Actions |
Unlike the general NE definitions, we decided to incorporate verbal expressions including copula verbs followed by an adjective. These include passive forms and causative forms. |
Ac: Verbs whose subject is a player. The action must be related to the board, such as ''捨てる'' (sacrifice). Thus this does not include other player actions like ''close eyes.'' |
Ap: Verbs whose subject is a piece. For example ''下がった'' (retreated). |
Ao: Other verbs. For example ''始まる'' (start), with the subject ''戦い'' (battle). |
Others |
Ot: Other important notions for shogi. Typical ones are noun phrases denoting the above categories themselves, like ''戦型'' (strategy). Note that this is not included in St. |
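
The Move Notation components defined above (Tu, Po, Pi, Mc) can be illustrated with a small parser. This is only a sketch: the regular expression, the piece inventory, and the function name `parse_move` are assumptions made for illustration and are not part of the corpus or its annotation tools. The Mc alternatives follow the two expressions listed above.

```python
import re

# Hypothetical pattern: the character inventories are illustrative assumptions.
MOVE_PATTERN = re.compile(
    r"(?P<Tu>先手|後手|▲|△)"                  # Tu: turn marker
    r"(?P<Po>[1-9１-９][一二三四五六七八九])"    # Po: Arabic file + Chinese rank
    r"(?P<Pi>成香|成桂|成銀|歩|香|桂|銀|金|角|飛|玉|王|と|馬|龍|竜)"  # Pi: piece name
    r"(?P<Mc>成る|不成)?"                      # Mc: optional move complement
)


def parse_move(text):
    """Split a move string into its Tu/Po/Pi/Mc parts, or return None."""
    match = MOVE_PATTERN.fullmatch(text)
    return match.groupdict() if match else None


# A move without a complement leaves Mc as None.
print(parse_move("▲７六歩"))
# A move with an explicit non-promotion complement.
print(parse_move("△8八角不成"))
```

Because the pattern uses `fullmatch`, strings that are not complete moves, such as an evaluation like ''好手'', yield `None` instead of a partial match.
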
We first segmented sentences automatically with the tool KyTea, trained on a general-domain corpus (BCCWJ) and a dictionary (UniDic) containing 212,900 words. We then supplied the results to the tool. Finally, an annotator corrected word boundaries and added a BIO tag to each word (manu.). We also trained a BIO2-based NE recognizer, PWNER, and conducted NE recognition (auto).
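
To make the tagging scheme concrete, the following sketch converts labeled word spans into BIO2 tags, where the first word of an expression is tagged B, subsequent words I, and all other words O. The function name `spans_to_bio2`, the `B-St` tag spelling, and the word segmentation of the example are assumptions for illustration; the actual tag format used by the annotators and by PWNER may differ.

```python
def spans_to_bio2(words, spans):
    """Return one BIO2 tag per word.

    spans: list of (start, end, label) with end exclusive, indexing into words.
    """
    tags = ["O"] * len(words)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first word of the expression
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the expression
    return tags


# Hypothetical segmentation of a short comment: a strategy name (St)
# spanning three words, followed by a move evaluation (Me).
words = ["ゴキゲン", "中", "飛車", "は", "好手"]
spans = [(0, 3, "St"), (4, 5, "Me")]
for word, tag in zip(words, spans_to_bio2(words, spans)):
    print(word, tag)
```
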
Statistics of our corpus are shown below:
Training | Precision | Recall | F-measure |
---|---|---|---|
BCCWJ | 0.872 | 0.907 | 0.889 |
BCCWJ + shogi | 0.983 | 0.983 | 0.983 |
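
The F-measure column is consistent with the balanced harmonic mean of precision and recall, $F = 2PR / (P + R)$. The source does not state the formula, so this is an assumption; the sketch below simply recomputes the column from the reported precision and recall values.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values copied from the table above.
rows = {
    "BCCWJ": (0.872, 0.907),
    "BCCWJ + shogi": (0.983, 0.983),
}
for name, (p, r) in rows.items():
    print(name, round(f_measure(p, r), 3))
```
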
- A Japanese Chess Commentary Corpus,
- Shinsuke Mori, John Richardson, Atsushi Ushiku, Tetsuro Sasada, Hirotaka Kameko, and Yoshimasa Tsuruoka.
- LREC, 2016.