Data Mining Article - Database
What Is Data Mining?
Data mining software allows users to analyze large databases to solve business decision problems. Data mining is, in some ways, an extension of statistics, with a few artificial intelligence and machine learning twists thrown in. Like statistics, data mining is not a business solution; it is just a technology. For example, consider a catalog retailer who needs to decide who should receive information about a new product. The data mining process operates on a historical database that records previous interactions with customers, along with features associated with those customers, such as age, zip code, and their past responses. The data mining software uses this historical information to build a model of customer behavior that predicts which customers are likely to respond to the new product. Using these predictions, a marketing manager can select only the customers who are most likely to respond. The operational business software can then feed the results of the decision to the appropriate touch point systems (call centers, direct mail, web servers, email systems, etc.) so that the right customers receive the right offers.
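As a rough illustration of the catalog retailer example, the sketch below estimates a response rate for each age band from past interactions and keeps only the prospects whose segment clears a threshold. The records, the customer names, the function names, and the 0.5 threshold are all assumptions made for illustration; a real model would use many more features (including zip code, which is stored here but not used) and a proper learning algorithm.

```python
from collections import defaultdict

# Hypothetical historical records: (age_band, zip_code, responded_to_past_offer)
HISTORY = [
    ("18-34", "10001", True),
    ("18-34", "10001", False),
    ("18-34", "10001", True),
    ("35-54", "10001", False),
    ("35-54", "10001", False),
    ("35-54", "94105", True),
    ("55+", "94105", False),
    ("55+", "94105", False),
]


def fit_response_rates(history):
    """Estimate the past response rate for each age band."""
    counts = defaultdict(lambda: [0, 0])  # age_band -> [responses, total]
    for age_band, _zip_code, responded in history:
        counts[age_band][0] += int(responded)
        counts[age_band][1] += 1
    return {band: hits / total for band, (hits, total) in counts.items()}


def select_customers(customers, rates, threshold=0.5):
    """Keep customers whose segment's estimated rate meets the threshold."""
    return [
        name
        for name, age_band in customers
        if rates.get(age_band, 0.0) >= threshold
    ]


rates = fit_response_rates(HISTORY)
prospects = [("Ana", "18-34"), ("Ben", "35-54"), ("Cy", "55+")]  # illustrative
selected = select_customers(prospects, rates)
print(selected)
```

The list in `selected` is what would be handed to the touch point systems; everything upstream of it is the "model of customer behavior" in miniature.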
How Does Data Mining Work?
Data mining can be distinguished from other retrieval technologies in that it makes choices and calculations for the searcher and then categorizes information based on those choices. It accomplishes this by identifying data relevant to users’ information needs and then organizing documents by topic, source, relationship with other documents, and a number of other criteria.
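The organizing step can be pictured as grouping results under a chosen criterion. The sketch below is only a minimal illustration: the result records, their field names, and the `group_by` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical search results; field names are illustrative only.
RESULTS = [
    {"title": "Market basket analysis", "topic": "retail", "source": "journal"},
    {"title": "Fraud scoring basics", "topic": "finance", "source": "web"},
    {"title": "Catalog response models", "topic": "retail", "source": "web"},
]


def group_by(results, criterion):
    """Group result titles under each distinct value of the criterion."""
    groups = defaultdict(list)
    for item in results:
        groups[item[criterion]].append(item["title"])
    return dict(groups)


by_topic = group_by(RESULTS, "topic")
by_source = group_by(RESULTS, "source")
print(by_topic)
print(by_source)
```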
The first task of any data mining tool is to identify which documents should be searched. In some cases, a known body of documents such as a magazine or image database may be searched. In other cases (such as on the World Wide Web), unfamiliar documents and services will be searched. The determination of which documents to search depends on knowledge of what the users intend to do with the information they find.
For example, computers can be programmed to recognize personal and place names as well as parts of speech. When a user is seeking information about a person, it is reasonable for data mining software to search for images of the person. Likewise, if the object of the search is a place, it is logical for data mining software to search for a map, though it would make little sense to search through images of people. While it is often not possible to make assumptions about users’ goals, users often convey information about themselves and their needs in their queries; this information can be used by data mining tools.
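The person-versus-place reasoning above might be sketched as a small routing step. Everything named here is an assumption for illustration: the fixed name lists stand in for a real named-entity recognizer, and the source collections are invented labels.

```python
# Hypothetical lookup tables; a real system would use a trained
# named-entity recognizer rather than fixed lists.
KNOWN_PEOPLE = {"ada lovelace", "alan turing"}
KNOWN_PLACES = {"paris", "lake tahoe"}

# Hypothetical collections to search for each kind of query.
SOURCES_BY_TYPE = {
    "person": ["biographies", "portrait images"],
    "place": ["maps", "travel guides"],
    "unknown": ["general text index"],
}


def classify_query(query):
    """Label a query as a person, a place, or unknown."""
    normalized = " ".join(query.lower().split())
    if normalized in KNOWN_PEOPLE:
        return "person"
    if normalized in KNOWN_PLACES:
        return "place"
    return "unknown"


def sources_for(query):
    """Pick the collections worth searching for this query."""
    return SOURCES_BY_TYPE[classify_query(query)]


print(sources_for("Alan Turing"))
print(sources_for("Paris"))
```

A query that matches neither list falls back to the general index, which reflects the point that assumptions about the user's goal are not always possible.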
Once the data mining software has determined which documents it should search, it must then extract and normalize data that are relevant to the query. For text documents, stemming algorithms, grammar parsers, idiom detectors, thesauri, or other methods might be applied to the search terms as well as to the documents searched, to ensure results that are more relevant and comprehensive than could be achieved by string or regular expression matching. It is at this step that data are categorized for use by the data mining algorithm. This step is roughly analogous to automatic authority control in a library setting.
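To show why normalizing both the query and the documents helps, the sketch below applies a deliberately crude suffix-stripping stemmer to each side before comparing terms. This is not the Porter stemmer or any published algorithm; the suffix list, the minimum stem length, and the function names are assumptions chosen to keep the example short.

```python
import re

SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def naive_stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, tokenize on letters, and stem each token."""
    return {naive_stem(token) for token in re.findall(r"[a-z]+", text.lower())}


def matches(query, document):
    """True when every normalized query term appears in the document."""
    return normalize(query) <= normalize(document)


document = "The tool mined large databases"
print("mining tools" in document.lower())  # plain substring match
print(matches("mining tools", document))   # match after normalization
```

Plain string matching misses this document because "mining" and "mined" differ on the surface, while the normalized comparison reduces both to the same stem.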
After the data are prepared, the algorithms that search and arrange the data must be determined. The choice of data mining algorithm depends at least partly on the purpose of the search. For example, if a user types in a personal name, the data mining algorithm might separate the output into categories such as biographical information, graphical files (i.e., pictures of the person), and documents authored by the person. Data mining algorithms vary, but in a library setting, these algorithms are likely to follow one or more of the following patterns.
An Architecture for Data Mining
To be applied to best effect, these advanced techniques must be fully integrated with a data warehouse as well as with flexible, interactive business analysis tools. Many data mining tools currently operate outside of the warehouse, requiring extra steps for extracting, importing, and analyzing the data. Furthermore, when new insights require operational implementation, integration with the warehouse simplifies the application of results from data mining. The resulting analytic data warehouse can be applied to improve business processes throughout the organization, in areas such as promotional campaign management, fraud detection, new product rollout, and so on.
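The benefit of running analysis inside the warehouse can be sketched with an in-memory SQLite database standing in for the warehouse. The table names, column names, and sample rows are hypothetical, and a simple per-segment average stands in for a real mining model.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interactions ("
    "customer_id INTEGER, segment TEXT, responded INTEGER)"
)
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 1), (3, "A", 0), (4, "B", 0), (5, "B", 0), (6, "B", 1)],
)

# The aggregation runs inside the database, so no raw rows are exported
# to a separate tool and re-imported afterwards.
conn.execute(
    """
    CREATE TABLE segment_scores AS
    SELECT segment, AVG(responded) AS response_rate, COUNT(*) AS n
    FROM interactions
    GROUP BY segment
    """
)

# Operational systems can read the scores from the same warehouse.
scores = dict(
    conn.execute(
        "SELECT segment, response_rate FROM segment_scores ORDER BY segment"
    )
)
conn.close()
print(scores)
```

Because the scores land in a table next to the source data, a campaign management or fraud detection process can query them directly, with no separate extract and import step.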
Applications
A wide range of companies have deployed successful applications of data mining. While early adopters of this technology have tended to be in information-intensive industries such as financial services and direct mail marketing, the technology is applicable to any company looking to leverage a large data warehouse to better manage its customer relationships. Two critical factors for success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).