Deidentification of clinical records has
drawn a great deal of attention in the medical field.
Since texts in clinical records are mostly ungrammatical
and fragmented, previous approaches have
relied only on local information, namely contextual
words surrounding a current target word. The
present paper proposes a new approach employing
two types of non-local features, which do not
come from surrounding words: (1) sentence features,
corresponding to information from the previous/next
sentence, and (2) label consistency, preferring
the same label for the same word sequence. The experimental
results showed high performance (precision
98.29%; recall 96.66%; F-measure 97.47%),
demonstrating the feasibility of the proposed approach.
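
The label-consistency idea can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple majority vote over single tokens as a post-processing step, whereas the abstract speaks of word sequences and does not say how the preference is enforced. The function name `enforce_label_consistency` and the label names `DOCTOR` and `O` are hypothetical.

```python
from collections import Counter, defaultdict


def enforce_label_consistency(tokens, labels):
    """Relabel every token with the majority label of its surface form.

    Simplifying assumption: consistency is applied per token (case-insensitive)
    by majority vote, not per word sequence as described in the abstract.
    """
    votes = defaultdict(Counter)
    for token, label in zip(tokens, labels):
        votes[token.lower()][label] += 1
    # most_common(1) returns the label seen most often for this surface form.
    return [votes[token.lower()].most_common(1)[0][0] for token in tokens]


# Hypothetical output of a local tagger: the second "Smith" was missed.
tokens = ["Dr.", "Smith", "saw", "her", ".", "Smith", "ordered", "labs", ".", "Smith", "left"]
labels = ["O", "DOCTOR", "O", "O", "O", "O", "O", "O", "O", "DOCTOR", "O"]

fixed = enforce_label_consistency(tokens, labels)
```

In this toy input, the word "Smith" is tagged `DOCTOR` twice and `O` once, so the vote relabels the missed occurrence as `DOCTOR` and leaves all other labels unchanged.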
Eiji Aramaki, Takeshi Imai, Kengo Miyo, Kazuhiko Ohe: Automatic Deidentification by using Sentence Features and Label Consistency, Workshop on Challenges in Natural Language Processing for Clinical Data, 2006.
[PDF]
[Review paper]