Data Warehouse
A datawarehouse is subject-oriented which means that it provides simple and concise views for particular subjects by for example excluding non-relevant data. It tries to integrate multiple heterogeneous data sources into one consistent “database” by doing Extract Transform Load, Data Cleaning and Data Integration over a long time horizon of data (usually 5-10 years). It is nonvolatile and thus data needs to be copied to the datawarehouse which only has the three operations of the initital loading, over night refreshes and access of data.
Definition
Subject-oriented, integrated, time-variant and nonvolatile collection of data in support of decision-making process.
based on Multidimensionale Datenmodellierung
Usage
- Information Processing
- Querying
- statistical analysis
- reporting
- Analytical Processing
- Multidimensional analyis
- OLAP operations
- Data Mining
- KDD
Why
Regular DBMS are tuned for OLTP which doesn’t have the neccesary operations and functions which are being used for data mining.
- DBMS, tuned for OLTP, access methoStds, indexing, concurrency, recovery
- Data Warehouse, tuned for OLAP, complex queries, mdm view, consolidation
- historical data, consolidation
Models
Modeling
A data warehouse can be modeled with different schemas in mind: