Publication Date
2004
Abstract
Chinese word segmentation is a difficult, important, and widely studied sequence modeling problem. This paper demonstrates the ability of linear-chain conditional random fields (CRFs) to perform robust and accurate Chinese word segmentation by providing a principled framework that easily supports the integration of domain knowledge in the form of multiple lexicons of characters and words. We also present a probabilistic new word detection method, which further improves performance. Our system is evaluated on four datasets used in a recent comprehensive Chinese word segmentation competition. State-of-the-art performance is obtained.
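As a rough illustration of what "sequence modeling" means here, segmentation can be framed as labeling each character by whether it begins a word. The sketch below shows only that encoding and its inverse; it does not implement a CRF, lexicon features, or new word detection. The function names and the two-symbol tag set ("B" for a word-initial character, "I" for any other) are assumptions made for illustration and are not taken from the paper.

```python
# Minimal sketch: word segmentation viewed as per-character tagging.
# Assumed (hypothetical) tag set: "B" = character begins a word,
# "I" = character continues the current word.

def words_to_tags(words):
    """Convert a list of words into one tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings so every tag maps to a character
        tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from a character sequence and its tags."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)   # start a new word
        else:
            words[-1] += ch    # extend the current word
    return words


if __name__ == "__main__":
    segmented = ["我们", "喜欢", "自然", "语言", "处理"]
    chars = "".join(segmented)
    tags = words_to_tags(segmented)
    print(tags)
    print(tags_to_words(chars, tags))
```

Under this framing, a sequence model such as a linear-chain CRF would predict the tag sequence from the raw characters, and a decoder like `tags_to_words` would turn the predicted tags back into words.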
Recommended Citation
Peng, Fuchun, "Chinese Segmentation and New Word Detection using Conditional Random Fields" (2004). Computer Science Department Faculty Publication Series. 92.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/92
Comments
This paper was harvested from CiteSeer