Publication:
A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models

dc.contributor.author: McCallum, Andrew
dc.contributor.author: Jensen, David
dc.contributor.department: University of Massachusetts - Amherst
dc.contributor.department: University of Massachusetts - Amherst
dc.date: 2023-09-22T21:08:46.000
dc.date.accessioned: 2024-04-26T09:36:10Z
dc.date.available: 2024-04-26T09:36:10Z
dc.date.issued: 2003-01-01
dc.description: This paper was harvested from CiteSeer
dc.description.abstract: Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and significant mining of complex text sources is beyond reach. This position paper proposes the use of unified, relational, undirected graphical models for information extraction and data mining, in which extraction decisions and data-mining decisions are made in the same probabilistic “currency,” with a common inference procedure, each component thus being able to make up for the weaknesses of the other and therefore improving the performance of both. For example, data mining run on a partially filled database can find patterns that provide “top-down” accuracy-improving constraints to information extraction. Information extraction can provide a much richer set of “bottom-up” hypotheses to data mining if the mining is set up to handle additional uncertainty information from extraction. We outline an approach and describe several models, but provide no experimental results.
dc.identifier.uri: https://hdl.handle.net/20.500.14394/10072
dc.relation.url: https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1026&context=cs_faculty_pubs&unstamped=1
dc.source.status: published
dc.subject: Computer Sciences
dc.title: A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models
dc.type: article
digcom.contributor.author: McCallum, Andrew
digcom.contributor.author: Jensen, David
digcom.identifier: cs_faculty_pubs/42
digcom.identifier.contextkey: 1300107
digcom.identifier.submissionpath: cs_faculty_pubs/42
dspace.entity.type: Publication
Files
Original bundle
Name: Andrew_McCallum_5.pdf
Size: 102.42 KB
Format: Adobe Portable Document Format