Computer Science Department Faculty Publication Series

A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models

Andrew McCallum, University of Massachusetts - Amherst
David Jensen, University of Massachusetts - Amherst

Publication Date

2003

Abstract

Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and significant mining of complex text sources is beyond reach. This position paper proposes the use of unified, relational, undirected graphical models for information extraction and data mining, in which extraction decisions and data-mining decisions are made in the same probabilistic “currency,” with a common inference procedure, so that each component can make up for the weaknesses of the other and thereby improve the performance of both. For example, data mining run on a partially filled database can find patterns that provide “top-down” accuracy-improving constraints to information extraction. Information extraction can provide a much richer set of “bottom-up” hypotheses to data mining if the mining is set up to handle additional uncertainty information from extraction. We outline an approach and describe several models, but provide no experimental results.
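The contrast between serial juxtaposition and joint inference can be illustrated with a toy undirected model. The sketch below is not taken from the paper: the record names, candidate values, local potentials, and the agreement weight are all hypothetical, chosen only to show how a database-level regularity (here, "linked records tend to share a field value") can overturn an extraction decision that looks best in isolation.

```python
from itertools import product

# Hypothetical local extraction potentials: how strongly the text alone
# supports each candidate value for one field of two linked records.
local = {
    "record_a": {"UMass": 0.9, "MIT": 0.1},
    "record_b": {"UMass": 0.45, "MIT": 0.55},
}

# Hypothetical relational potential: a mined regularity that rewards
# linked records for agreeing on the field value.
AGREE_WEIGHT = 3.0


def pipeline_decode(local):
    """Serial approach: pick each field's best value independently."""
    return {rec: max(scores, key=scores.get) for rec, scores in local.items()}


def joint_score(assignment, local):
    """Unnormalized score: product of local and relational potentials."""
    score = 1.0
    for rec, value in assignment.items():
        score *= local[rec][value]
    values = list(assignment.values())
    if values[0] == values[1]:
        score *= AGREE_WEIGHT
    return score


def joint_decode(local):
    """Joint approach: enumerate all assignments and normalize.

    Exhaustive enumeration is only feasible for a toy model this small.
    """
    records = list(local)
    candidates = [
        dict(zip(records, values))
        for values in product(*(local[rec] for rec in records))
    ]
    scores = [joint_score(c, local) for c in candidates]
    z = sum(scores)  # partition function over all assignments
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best] / z


pipeline_result = pipeline_decode(local)
joint_result, joint_prob = joint_decode(local)
print("pipeline:", pipeline_result)
print("joint:   ", joint_result, round(joint_prob, 3))
```

With these made-up numbers, the pipeline commits record_b to "MIT" on weak textual evidence, while the joint model prefers "UMass" for both records because the relational potential outweighs the small local difference. Real models of the kind the abstract describes would need approximate inference rather than enumeration.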

Comments

This paper was harvested from CiteSeer

Recommended Citation

McCallum, Andrew and Jensen, David, "A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models" (2003). Computer Science Department Faculty Publication Series. 42.
Retrieved from https://scholarworks.umass.edu/cs_faculty_pubs/42

Included in

Computer Sciences Commons
