The most important decisions in almost any area of business and social life are based on the analysis of large and complex databases. This analysis draws on approaches and techniques from machine learning, statistics, and database systems. Data Mining and Data Warehouse are two closely related concepts: both make it possible to find the necessary information in large amounts of data quickly, and both are designed to support the decision-making process.
Data Mining and Data Warehouse Concepts
Data Mining is a generic term for the set of methods used to detect previously unknown, nontrivial, and practically useful knowledge in data, knowledge that is necessary for decision-making in various spheres of human activity. The purpose of Data Mining is to detect implicit patterns in data sets (Frawley & Piatetsky-Shapiro, 1991).
As an interdisciplinary subfield of computer science, Data Mining has developed actively in recent years owing to the widespread use of automated information processing and the accumulation of large amounts of data in computer systems. Although existing technologies can quickly find the necessary information in a database, in many cases this is no longer enough. A need arose to find relationships between selected events within large amounts of data, which required the methods of mathematical statistics, database theory, artificial intelligence, and several other areas (Bhaduri & Fogarty, 2016).
Given the variety of data presentation forms, algorithms, and areas of application, Data Mining can be carried out with software products of different classes, such as specialized Data Mining software, mathematical computation packages, spreadsheets, and other tools. Data Mining studies a set of objects, or options. In most cases, the existing data can be presented as a table in which each row corresponds to one option and the columns contain the values of the parameters that characterize it. A dependent variable is a parameter whose value is considered to depend on other parameters (the independent variables). It is this dependence that Data Mining methods are used to determine (Sumathi & Sivanandam, 2006).
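The tabular view described above can be illustrated with a minimal Python sketch. It assumes a hypothetical table with one independent variable (`ad_spend`) and one dependent variable (`sales`), and it uses ordinary least squares as just one of many possible methods for estimating the dependence. The column names, the data, and the `fit_line` helper are invented for illustration and do not come from the cited sources.

```python
# Hypothetical table: each row is one option (observation),
# each key is a parameter (column). The values are made up.
rows = [
    {"ad_spend": 1.0, "sales": 3.0},
    {"ad_spend": 2.0, "sales": 5.0},
    {"ad_spend": 3.0, "sales": 7.0},
    {"ad_spend": 4.0, "sales": 9.0},
]


def fit_line(rows, independent, dependent):
    """Estimate dependent = slope * independent + intercept by least squares."""
    xs = [row[independent] for row in rows]
    ys = [row[dependent] for row in rows]
    n = len(rows)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(rows, "ad_spend", "sales")
print(f"sales is approximately {slope:.1f} * ad_spend + {intercept:.1f}")
```

Real Data Mining tasks usually involve many independent variables and far noisier data, so the straight-line fit here only shows what "determining the dependence" means in the simplest case.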
Data warehousing is the process of collecting, filtering, and pre-processing data in order to present the resulting information to users for statistical analysis and analytical reports. Ralph Kimball, the author of The Data Warehouse Toolkit, described the Data Warehouse as the place where people can access their data (1996). He also formulated the basic requirements for a data warehouse, including high-speed data transfer, internal consistency of the data, the ability to obtain and compare data, convenient utilities for viewing the warehouse, completeness and accuracy of the stored data, and high-quality database updates (Kimball, 2011).
It is often difficult to satisfy all of these requirements at once, so a data warehouse is implemented from several components: the data itself, the means of retrieving and viewing that data, and the tools for populating the warehouse. A Data Warehouse differs from a conventional relational database in several ways. A conventional database is designed to help its users perform daily activities, whereas the Data Warehouse is designed to support decision-making. A conventional database changes continuously as its users work, while the Data Warehouse is relatively stable. A conventional database is often the source of the data that enters the warehouse; in addition, the warehouse can be supplemented from external sources.
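The contrast between the two kinds of systems can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `orders` and `sales_by_region` tables, their columns, and the figures are hypothetical, and a real warehouse would run on a separate system with its own loading tools rather than in the same in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational database: many small inserts and updates during daily work.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 200.0)],
)
conn.execute("UPDATE orders SET amount = 90.0 WHERE id = 2")  # a routine correction

# Warehouse-style table: loaded periodically from the operational source,
# then left stable and queried for analysis.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders "
    "FROM orders GROUP BY region"
)

summary = dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())
print(summary)
```

Later changes to `orders` do not affect `sales_by_region` until the next load, which mirrors the relative stability of the warehouse described above.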
A Data Warehouse both requires and provides full support for data cleaning. It loads and continuously refreshes huge amounts of data from different sources, so the probability that unwanted data will enter the warehouse is very high. Moreover, because the Data Warehouse is used in decision-making, such data must be corrected so that it does not lead to false conclusions. For example, duplicate or missing information may produce incorrect or misleading statistics. Given the wide range of possible inconsistencies and the sheer volume of data, cleaning is one of the biggest problems in the field of Data Warehouse technology (Wrembel & Koncilia, 2007).
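The effect of duplicate and missing records on a statistic can be shown with a small Python sketch. The records, the field names, and the `clean` helper are hypothetical, and the cleaning rule used here (drop records with a missing amount, keep the first record per `order_id`) is only one possible policy among many.

```python
from statistics import mean

# Hypothetical records loaded from several sources.
raw = [
    {"order_id": 1, "amount": 100.0},
    {"order_id": 2, "amount": 300.0},
    {"order_id": 2, "amount": 300.0},  # the same order loaded twice
    {"order_id": 3, "amount": None},   # amount missing in the source
]


def clean(records):
    """Drop records with a missing amount and keep one record per order_id."""
    seen = set()
    cleaned = []
    for record in records:
        if record["amount"] is None:
            continue
        if record["order_id"] in seen:
            continue
        seen.add(record["order_id"])
        cleaned.append(record)
    return cleaned


# Naive statistic: the duplicate is counted twice, the missing value as zero.
naive_mean = mean(record["amount"] or 0.0 for record in raw)
# Statistic computed after cleaning.
clean_mean = mean(record["amount"] for record in clean(raw))

print(naive_mean, clean_mean)
```

On this data the naive average order amount is 175.0, while the cleaned average is 200.0, so a decision based on the uncleaned figure would rest on a distorted number.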
Use of Data Mining and Data Warehouse within an Organization
Modern business and manufacturing processes generate huge amounts of data. These data collections hold great potential for extracting new analytical information that helps a company build its strategy, identify market development trends, and find new solutions for successful growth. Data Warehouse and Data Mining help to extract the maximum of useful knowledge from multidimensional, diverse, incomplete, inaccurate, contradictory, and indirect data.
Data Mining is used in research, education, law enforcement, manufacturing, healthcare, and many other fields. Data Mining and the Data Warehouse are also applicable in the following areas: forecasting in economic systems; automation of marketing research and analysis of client environments for production, trade, telecommunications, and Internet companies; automation of credit decisions and credit risk assessment; monitoring of financial markets; and automated trading systems.
For example, complex modern medical research is inconceivable without computers. Such research includes CT scans, MRI, ultrasound, and the use of radioactive isotopes. The amount of information produced is so large that no human can perceive and process it without the help of computers. New information technologies significantly improve management efficiency and help to solve complex healthcare problems through access to specialized databases (Milovic, 2012).
Advantages and Disadvantages of Business Intelligence (BI)
Many BI applications have helped companies recover their invested capital. Business intelligence systems are used to explore ways of reducing costs, identifying new opportunities for business development, responding rapidly to changing demand, and optimizing prices. BI can also give companies an advantage in negotiations and simplify relationships with suppliers and customers. In addition, an enterprise has many opportunities to save money by optimizing its business and decision-making processes in general.
A major obstacle to the success of BI systems is data of unsatisfactory quality and non-standardized data. Without data standardization, there is a serious risk of obtaining incorrect results. Another important obstacle is that companies often lack an understanding of their own business processes and therefore do not see how those processes can be improved. If a process has no direct effect on profit, or the company is not going to standardize its processes across all its units, the implementation of a BI system may be ineffective (Information Resources Management Association USA, 2015).
To summarize, many business owners and managers today are showing increased interest in Business Intelligence. This concept denotes an interconnected complex of modern business management methods that are based on modern information technologies and aimed at maximum business efficiency. Data Mining and Data Warehouse protect people from information overload by converting operational data into useful information, so that the necessary actions can be taken at the right time.