
PointWise-based Named Entity Recognizer: PWNER

The named entity tags defined for newspaper text are ORG (organizations), PER (person names), LOC (locations), and MISC (miscellaneous names) [MUC, CoNLL2002, CoNLL2003].

As natural language processing is applied to increasingly diverse texts, demand is growing for named entity recognizers (NERs) that handle new named entity (NE) definitions in different domains. For these specialized NE definitions, only a small annotated corpus is available at the outset, so in practice an NER must be developed rapidly and at low cost.

To meet this need, we propose the use of partially annotated data: a set of sentences in which only a limited number of words are annotated with NE tags.
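
To make the idea concrete, the following minimal Python sketch shows one way a partially annotated sentence could be held in memory: each word is paired with an NE tag, or with `None` when the word has not been annotated. The representation, the names `PartialSentence` and `annotated_positions`, and the example sentence are all assumptions made for illustration; they are not PWNER's actual file format or API.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (word, NE tag), with None for unannotated words.
PartialSentence = List[Tuple[str, Optional[str]]]


def annotated_positions(sentence: PartialSentence) -> List[int]:
    """Return the indices of the words that carry an NE tag."""
    return [i for i, (_, tag) in enumerate(sentence) if tag is not None]


# Illustrative sentence in which only two words have been annotated.
example: PartialSentence = [
    ("He", None),
    ("works", None),
    ("at", None),
    ("Kyoto", "B-ORG"),
    ("University", "I-ORG"),
    (".", None),
]

print(annotated_positions(example))  # [3, 4]
```

In this sketch the untagged words are simply unknown rather than labeled as outside any entity, which is what distinguishes partial annotation from full annotation.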

Downloads

Usage / Requirements

Input/Output Format

Input: Raw text segmented into words by whitespace. Output: A sequence of word and IOB2-tag pairs.

IOB2 format
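
In IOB2, the first word of every entity is tagged `B-TYPE`, each following word of the same entity is tagged `I-TYPE`, and words outside any entity are tagged `O`. The Python sketch below shows how a sequence of word and IOB2-tag pairs could be grouped into entity spans. The function name `iob2_to_spans` and the example sentence are hypothetical, and the sketch assumes the pairs have already been read into a list; it does not reproduce PWNER's actual output syntax.

```python
from typing import List, Optional, Tuple


def iob2_to_spans(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (word, IOB2-tag) pairs into (entity text, entity type) spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    etype: Optional[str] = None

    def flush() -> None:
        # Close the currently open entity, if there is one.
        if words and etype is not None:
            spans.append((" ".join(words), etype))

    for word, tag in pairs:
        if tag.startswith("B-"):
            # B- always starts a new entity, closing any open one.
            flush()
            words, etype = [word], tag[2:]
        elif tag.startswith("I-") and etype == tag[2:]:
            # I- continues the open entity of the same type.
            words.append(word)
        else:
            # "O", or an I- tag with no matching open entity
            # (invalid in IOB2; this sketch skips such a word).
            flush()
            words, etype = [], None
    flush()
    return spans


# Illustrative tagged sentence.
tagged = [
    ("Shinsuke", "B-PER"),
    ("Mori", "I-PER"),
    ("works", "O"),
    ("in", "O"),
    ("Kyoto", "B-LOC"),
]
print(iob2_to_spans(tagged))  # [('Shinsuke Mori', 'PER'), ('Kyoto', 'LOC')]
```

Because every entity begins with `B-`, two adjacent entities of the same type remain distinguishable, which is the main reason IOB2 is preferred over plain inside/outside tagging.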

Training NER models

Under construction.

References

Named Entity Recognizer Trainable from Partially Annotated Data
Tetsuro Sasada, Shinsuke Mori, Tatsuya Kawahara and Yoko Yamakata,
PACLING, 2015.
Overview of MUC-7/MET-2
Nancy A. Chinchor,
Message Understanding Conference, 1998.
IREX: IR and IE Evaluation Project in Japanese
Satoshi Sekine and Hitoshi Isahara,
2000.

Development Information

Development Team

Tetsuro Sasada
Shinsuke Mori (Advisor, Power User)

Version history