Extracting Financial Information from Text Documents
S. Lukose, F. Mathew, P. Lawhead, and S. Conlon
Proceedings of the 10th Americas Conference on Information Systems (AMCIS)
The majority of
electronic data today is in textual form. Financial information, such as
articles in the Wall Street Journal, is written as free text. These
electronic documents contain a wealth of information but require
human interpretation. For financial analysis, rapid access to up-to-date
information is critical. Most software tools, however, currently require
data that are more structured than text (such as data in
relational databases). Thus, our research goal is to build a system, "FIRST"
(Flexible Information extRaction SysTem), that will extract data
from financial articles and store the output in an explicit
format. FIRST uses natural language processing techniques and
resources such as the lexical database WordNet and collocation
information to extract information. We aim to extract
data such as an organization's name, its profit/loss status, and
its sales status from financial articles for input into a database.
The data will come from international corporate reports that
appear in the Wall Street Journal.
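
To make the intended output concrete, the following is a minimal sketch of turning one sentence into a database-ready record with the three fields named above. It is not the FIRST implementation: the cue-word sets are hand-written stand-ins for what a lexical resource such as WordNet might supply, the "sales followed by a direction word" window is a crude stand-in for collocation information, and every name in it (FinancialRecord, extract_record, ORG_PATTERN, the cue sets) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cue words; a real system might expand these with WordNet synonyms.
PROFIT_CUES = {"profit", "earnings", "gain"}
LOSS_CUES = {"loss", "deficit"}
UP_CUES = {"rose", "increased", "climbed", "grew"}
DOWN_CUES = {"fell", "declined", "dropped", "decreased"}

# Assumed heuristic: capitalized words followed by a corporate suffix.
ORG_PATTERN = re.compile(
    r"((?:[A-Z][A-Za-z&]*\s)+(?:Corp|Inc|Ltd|Co|PLC|AG)\b\.?)"
)


@dataclass
class FinancialRecord:
    """One explicit, database-ready row extracted from free text."""
    organization: Optional[str]
    profit_loss: Optional[str]   # "profit", "loss", or None
    sales: Optional[str]         # "up", "down", or None


def _profit_loss_status(words: List[str]) -> Optional[str]:
    # Return the status of the first profit or loss cue encountered.
    for word in words:
        if word in PROFIT_CUES:
            return "profit"
        if word in LOSS_CUES:
            return "loss"
    return None


def _sales_status(words: List[str], window: int = 3) -> Optional[str]:
    # Look for a direction word within a few tokens after "sales".
    for i, word in enumerate(words):
        if word != "sales":
            continue
        for nearby in words[i + 1:i + 1 + window]:
            if nearby in UP_CUES:
                return "up"
            if nearby in DOWN_CUES:
                return "down"
    return None


def extract_record(text: str) -> FinancialRecord:
    """Extract organization name, profit/loss status, and sales status."""
    match = ORG_PATTERN.search(text)
    organization = match.group(1) if match else None
    words = re.findall(r"[a-z]+", text.lower())
    return FinancialRecord(
        organization=organization,
        profit_loss=_profit_loss_status(words),
        sales=_sales_status(words),
    )


if __name__ == "__main__":
    sentence = ("Acme Widgets Corp. reported a net loss for the quarter, "
                "while sales rose 12%.")
    print(extract_record(sentence))
```

Keeping the record as a small typed structure mirrors the goal of storing output in an explicit format: each field maps directly onto a column, and a missing value stays None rather than being guessed.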