Abstract
This paper presents a method for the automatic acquisition of linguistic patterns that can be used for knowledge-based information extraction from texts. In the knowledge-based approach to information extraction, linguistic patterns play a central role in the recognition and classification of input texts. Although the knowledge-based approach has proved effective for information extraction in limited domains, constructing a large number of domain-specific linguistic patterns is difficult. Manual creation of patterns is time-consuming and error-prone, even for a small application domain. To solve the scalability and portability problems, patterns must be acquired automatically. In this paper, we present the PALKA (Parallel Automatic Linguistic Knowledge Acquisition) system, which acquires linguistic patterns from a set of domain-specific training texts and their desired outputs. A specialized representation of patterns, called FP-structures, has been defined. Patterns are constructed in the form of FP-structures from training texts, and the acquired patterns are tuned further through the generalization of semantic constraints. An inductive learning mechanism is applied in the generalization step. The PALKA system has been used to generate patterns for our information extraction system developed for the fourth Message Understanding Conference (MUC-4), an ARPA-sponsored competitive evaluation of text analysis systems. Experimental results with a set of news articles from MUC-4 are discussed.
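The generalization step mentioned in the abstract can be pictured with a small sketch. The abstract does not give the details of FP-structures or of the learning mechanism, so everything below is an assumption made for illustration: the toy is-a hierarchy `PARENT`, the helper `ancestors`, and the function `generalize` are hypothetical names, and the strategy shown (relaxing a semantic constraint to the most specific concept that covers all positive examples) is one common form of inductive generalization, not necessarily the one PALKA uses.

```python
# Hypothetical toy is-a hierarchy (child -> parent). It is NOT taken from the
# paper; it only stands in for whatever semantic hierarchy a system would use.
PARENT = {
    "terrorist": "person",
    "soldier": "person",
    "person": "animate",
    "dog": "animate",
    "animate": "thing",
    "bomb": "artifact",
    "artifact": "thing",
}


def ancestors(concept):
    """Return the chain from `concept` up to the root, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(examples):
    """Return the most specific concept that covers every positive example.

    This relaxes a semantic constraint just enough to admit all the fillers
    observed in training, and no further.
    """
    if not examples:
        raise ValueError("need at least one example")
    common = ancestors(examples[0])
    for concept in examples[1:]:
        seen = set(ancestors(concept))
        # Keep only ancestors shared with this example, preserving order.
        common = [c for c in common if c in seen]
    if not common:
        raise ValueError("examples share no common ancestor")
    return common[0]


if __name__ == "__main__":
    # A slot first seen with "terrorist", then with "soldier", is relaxed
    # to accept any "person".
    print(generalize(["terrorist", "soldier"]))
```

In this sketch a constraint learned from a single example stays as specific as that example, and each further positive example can only widen it. A fuller treatment would also need negative examples to stop the constraint from becoming too general; that part is omitted here.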
Original language | English |
---|---|
Pages (from-to) | 713-724 |
Number of pages | 12 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 7 |
Issue number | 5 |
State | Published - Oct 1995 |
Keywords
- inductive learning
- information extraction
- Knowledge-based natural language processing
- linguistic knowledge acquisition