Volume 2, Issue 3, September 2017, Page: 105-109
Chinese Word Segmentation Based on Conditional Random Field
Junxia Deng, International Economics and Trade, Gengdan Institute of Beijing University of Technology, Beijing, China
Hong Zhang, School of Information, Beijing Wuzi University, Beijing, China; Chinese Academy of Sciences, Bioinformatics Research Center, Beijing, China; Chinese Academy of Sciences, Power System Research Center, Beijing, China; Chinese Academy of Sciences, Partial Differential Equation and Its Application Center, Beijing, China; Chinese Academy of Sciences, Statistical Science Research Center, Beijing, China; Chinese Academy of Sciences, Center for Optimization and Applied Research, Beijing, China; Chinese
Shanzai Li, School of Information, Beijing Wuzi University, Beijing, China; Chinese Academy of Sciences, Bioinformatics Research Center, Beijing, China; Chinese Academy of Sciences, Power System Research Center, Beijing, China; Chinese Academy of Sciences, Partial Differential Equation and Its Application Center, Beijing, China; Chinese Academy of Sciences, Statistical Science Research Center, Beijing, China; Chinese Academy of Sciences, Center for Optimization and Applied Research, Beijing, China; Chinese
Received: Feb. 6, 2017; Accepted: Feb. 27, 2017; Published: Apr. 17, 2017
DOI: 10.11648/j.mlr.20170203.14
Abstract
This paper systematically describes the definition, model structure, parameter estimation, and corpus selection for the conditional random field (CRF) model, and applies the CRF to Chinese word segmentation. A large number of experiments are carried out with CRFs on a corpus drawn from several years of the Changjiang Daily. The experiments analyze how the choice of CRF model parameters and the choice of Chinese character annotation (tag) set influence the results. Furthermore, taking advantage of the CRF model's ability to incorporate arbitrary features, some new features are added to the model; in particular, the paper explores a word-position probability feature. Experiments on the corpus show that introducing the word-position probability feature improves accuracy, recall, and the F1 value.
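To make the character-annotation idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a four-tag set (B = begin, M = middle, E = end, S = single-character word), which is one common choice; the abstract does not say which tag sets the paper compares. It also assumes that a position probability can be estimated as the relative frequency of each tag for a given character. The function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def tag_word(word):
    """Return one position tag per character, using an assumed B/M/E/S tag set."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def tag_sentence(words):
    """Convert a segmented sentence (a list of words) into (character, tag) pairs."""
    pairs = []
    for word in words:
        pairs.extend(zip(word, tag_word(word)))
    return pairs


def position_probabilities(corpus):
    """Estimate P(tag | character) by relative frequency over a segmented corpus.

    This definition is an assumption made for illustration; the paper's exact
    feature definition is not given in the abstract.
    """
    counts = defaultdict(Counter)
    for words in corpus:
        for char, tag in tag_sentence(words):
            counts[char][tag] += 1
    return {
        char: {tag: n / sum(tags.values()) for tag, n in tags.items()}
        for char, tags in counts.items()
    }


# Toy segmented corpus (hypothetical data, for illustration only).
corpus = [
    ["我", "爱", "北京"],
    ["北京", "大学"],
    ["大", "北京"],
]

probs = position_probabilities(corpus)
print(tag_sentence(corpus[0]))  # [('我', 'S'), ('爱', 'S'), ('北', 'B'), ('京', 'E')]
print(probs["大"])              # {'B': 0.5, 'S': 0.5}
```

In a CRF-based segmenter, per-character probabilities of this kind could be discretized and supplied as extra features alongside the usual character n-gram templates. Segmentation then reduces to predicting one tag per character and cutting the sentence after each E or S tag.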
Keywords
Natural Language Processing, Chinese Word Segmentation, Hidden Markov Model, Maximum Entropy Model, Conditional Random Field, Automatic Proofreading
To cite this article
Junxia Deng, Hong Zhang, Shanzai Li, Chinese Word Segmentation Based on Conditional Random Field, Machine Learning Research. Vol. 2, No. 3, 2017, pp. 105-109. doi: 10.11648/j.mlr.20170203.14
Copyright
Copyright © 2017 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.