Fundamentals of Data Mining

Data mining is a relatively new term for the process of finding new, non-obvious patterns in large volumes of raw data in order to discover useful information and extract knowledge from it. For this reason, data mining is also known as data discovery or knowledge discovery. Relationships among the raw data form patterns that can be useful to a business in a number of ways, from cutting costs to increasing revenue.

Data mining is not an entirely new technology, but it can now be applied to new data sources. With companies accumulating massive amounts of data from diverse sources, data mining can be extended beyond structured data in a relational database. This means that new patterns, new information and different knowledge can be put to use in ways that were not possible before.

Basic Principles

A common misconception is that data mining is about finding and capturing data, but this is only one step in the data mining process. Data mining brings a new dimension to the analysis of stored data. Five basic elements apply to data mining irrespective of the purpose or nature of the data:

  • Extracting, transforming and loading data (ETL).
  • Storage and management of data.
  • Accessing data.
  • Analysis of data.
  • Visualization of data.
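
As a rough illustration, the five elements above can be sketched end to end with Python's standard library. The sample data, the `orders` table and all function names are hypothetical and chosen only for this example; a real pipeline would read from actual source systems and use a proper charting tool.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or another system.
RAW_CSV = """order_id,region,amount
1,north, 120.50
2,South,80
3,north,not available
4,south,45.25
"""

def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalise region names and drop rows with unusable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["region"].strip().lower(), amount))
    return clean

def load(conn, records):
    """Load and store: write the cleaned records into a relational table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def totals_by_region(conn):
    """Access and analyse: total sales per region."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    return dict(conn.execute(query).fetchall())

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = totals_by_region(conn)

# Visualise: a crude text bar chart, one '#' per 10 units.
for region, total in totals.items():
    print(f"{region:<6} {'#' * int(total // 10)} {total:.2f}")
```

The row with an unparseable amount is dropped during the transform step, which shows why cleaning has to happen before any analysis is trusted.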

Different techniques have to be employed in order to find correlations within data that can be developed into useful patterns.

Data Mining Techniques

Some of the techniques that are utilized in data mining include:

  • Associations: Identifying the association between different attributes within data sources.
  • Clusters: Grouping data according to logical relationships found in the data itself.
  • Classes: Assigning data to predetermined groups.
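
To make the first of these techniques concrete, the sketch below counts how often items occur together and derives two common association measures, support and confidence. The transactions and helper names are invented for illustration, and this is only one simple way to measure association, not a full rule-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(a, b):
    """Fraction of all transactions that contain both a and b."""
    return pair_counts[tuple(sorted((a, b)))] / len(transactions)

def confidence(a, b):
    """Of the transactions containing a, the fraction that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("bread", "milk"))
print(confidence("butter", "bread"))
```

A high confidence for a pair suggests that the two attributes tend to appear together, which is the kind of association an analyst would then investigate further.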

Data mining techniques are constantly evolving to cater for new sources of data and identify patterns that were previously not considered. There are different levels of analysis that utilize one or more of these techniques in order to yield the information and derive knowledge from a large volume of data.

There are two basic types of data mining model: predictive and descriptive. Both draw on a range of algorithms and methods, including decision trees, neural networks, rule induction and data visualization, to bring a deeper level of analysis to data.

Predictive data mining analyzes data with known results to develop a model that can be used to predict outcomes for new data. Descriptive data mining identifies and highlights patterns in existing data by correlating relationships in large volumes of stored data.
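
The difference between the two model types can be sketched with a toy example. The customer history, the single-threshold rule and the function names below are assumptions made for illustration; the predictive part is a very simple one-rule model, far smaller than the decision trees or neural networks mentioned above.

```python
from statistics import mean

# Hypothetical history: (monthly_spend, churned) pairs with known outcomes.
history = [(10, True), (15, True), (22, True), (40, False), (55, False), (70, False)]

def fit_threshold(rows):
    """Predictive: pick the spend cut-off that misclassifies the fewest known cases.

    The learned rule is 'spend below the cut-off means the customer churns'.
    """
    values = sorted(spend for spend, _ in rows)
    # Candidate cut-offs are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cut):
        return sum((spend < cut) != churned for spend, churned in rows)

    return min(candidates, key=errors)

def predict(cut, spend):
    """Apply the learned rule to a new, unseen value."""
    return spend < cut

def describe(rows):
    """Descriptive: summarise existing data, here the average spend per outcome."""
    groups = {}
    for spend, churned in rows:
        groups.setdefault(churned, []).append(spend)
    return {label: mean(vals) for label, vals in groups.items()}

cutoff = fit_threshold(history)
print(cutoff, predict(cutoff, 18), predict(cutoff, 60))
print(describe(history))
```

The predictive part learns from known outcomes and is then applied to values it has not seen, while the descriptive part only summarises what is already in the stored data.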

Uses of Data Mining

There are a number of uses of data mining, but it is best understood in the business context. Faced with large amounts of data from diverse sources within an organization, business analysts are unable to correlate relationships and find patterns manually. This is not a shortfall in human ability; rather, the sheer volume of data constantly amassing from a variety of sources makes manual analysis a resource-intensive undertaking. Attempting it by hand would ultimately compromise the level of data processing and could introduce some degree of human bias.

Data mining overcomes these obstacles. It assists analysts in sifting through masses of data to find useful information. How this knowledge is used then depends on the business analyst, who can employ it to streamline business operations, identify new areas of growth, assess the outcome of implementing changes in the business and maximize opportunities that were previously unseen.

Ultimately, it can provide new insights into cutting costs and increasing revenue, which translate into an improvement in the bottom line.