Abstract
Chinese word segmentation is a difficult, important, and widely studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
Type
Article
Date
2004-01-01
Files
Fuchun_Peng_2.pdf
Adobe PDF, 136.49 KB