Data analysis is a process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today’s business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

Data Warehouse

In computing, a data warehouse (DW or DWH), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.
The data stored in the warehouse is uploaded from the operational systems (such as marketing or sales). The data may pass through an operational data store and may require data cleansing for additional operations to ensure data quality before it is used in the DW for reporting.
Extract, transform, load (ETL) and extract, load, transform (ELT) are the two main approaches used to build a data warehouse system.

Data warehouse characteristics

There are basic features that define the data in the data warehouse that include subject orientation, data integration, time-variant, nonvolatile data, and data granularity.

  • Subject-oriented – Unlike the operational systems, the data in the data warehouse revolves around the subjects of the enterprise. Subject orientation is not database normalization. Subject orientation can be really useful for decision making. Gathering the required objects is called subject-oriented.
  • Integrated – The data found within the data warehouse is integrated. Since it comes from several operational systems, all inconsistencies must be removed. Consistencies include naming conventions, measurement of variables, encoding structures, physical attributes of data, and so forth.
  • Time-variant – While operational systems reflect current values as they support day-to-day operations, data warehouse data represents a long time horizon which means it stores mostly historical data. It is mainly meant for data mining and forecasting.
  • Nonvolatile – The data in the data warehouse is read-only, which means it cannot be updated, created, or deleted.

Related systems

Data Mart

A data mart is a simple form of a data warehouse that is focused on a single subject, hence they draw data from a limited number of sources. Data marts are often built and controlled by a single department within an organization. Given that data marts generally cover only a subset of the data contained in a data warehouse, they are often easier and faster to implement.

Online Analytical Processing (OLAP)

Online analytical processing (OLAP) is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is an effective measure. OLAP applications are widely used by Data Mining techniques. OLAP databases store aggregated, historical data in multi-dimensional schemas. OLAP systems typically have a data latency of a few hours, as opposed to data marts, where latency is expected to be closer to one day. The OLAP approach is used to analyze multidimensional data from multiple sources and perspectives. The three basic operations in OLAP are Roll-up (Consolidation), Drill-down, and Slicing & Dicing.

Online Transaction Processing (OLTP)

Online transaction processing (OLTP) is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). OLTP systems emphasize very fast query processing and maintaining data integrity in multi-access environments. For OLTP systems, effectiveness is measured by the number of transactions per second. OLTP databases contain detailed and current data. The schema used to store transactional databases is the entity model.
Predictive analytics is about finding and quantifying hidden patterns in the data using complex mathematical models that can be used to predict future outcomes. Predictive analysis is different from OLAP in that OLAP focuses on historical data analysis and is reactive, while predictive analysis focuses on the future. These systems are also used for customer relationship management (CRM).

References

Online transaction processing – Wikipedia

Online analytical processing – Wikipedia

Data warehouse – Wikipedia

Data mart – Wikipedia

Data analysis – Wikipedia

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.