Basic DataWrehousing Concepts

What is ETL: – The ETL (Extraction, Transformation and Loading) process extracts data from different source systems and transforms it into a more consistent, standard formatteddata. This ETL process is done with the help of an ETL tool.Most widely used tools are Informatica, Data Stage, Ab Inito, Oracle Warehouse Builder etc.

What is BI: – Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers andother end users make more informed business decisions.
The potential benefits of business intelligence programs include accelerating and improvingdecision making; optimizing internal business processes; increasing operational efficiency;driving new revenues; and gaining competitive advantages over business rivals.BI systems can also help companies identify market trends and spot business problems that need to be addressed.
Most widely used tools are OBIEE, Tableau , Cognos , QlikView etc.
Datawarehouse/BI answers typical business queries like:
• What is the sales distribution region wise?
• What are the slow movers in my product line?
• Which of my Sales Agents are doing better?
• Who are my profitable customers?
• What is Defaulter‘s Profile?
• How did my revenue improve in the past 5 years?
• Which channel costs me more and pays less?
• Strategic Planning / Budgeting
• Currency Risk, Interest Rate Risk, Liquidity Risk
Why OLAP? Why Not OLTP? :-
• DSS queries can adversely impact On-Line Transaction Processing (OLTP) system
• Constantly changing state of OLTP systems makes replication of result-set difficult
• Data in OLTP systems are rarely quality assured for DSS analysis
• OLTP systems may not store data over 90 days making temporal comparisons difficult
Terms and Definitions:
Metadata – Metadata is simply defined as data about data. The data that are used to represent other data is known as metadata.

Data Dictionary – A collection of Meta Data. Many kinds of products in the datawarehouse arena use a data dictionary, including database management systems, modeling tools,middleware, and query tools.
Data cube – Data cube help us to represent the data in multiple dimensions. The data cube is
defined by dimensions and facts.
Data Mart – A subset of a data warehouse that focuses on one or more specific subject areas.The data usually is extracted from the data warehouse and further demoralized and indexed to support intense usage by targeted customers.
Data Mining – Techniques for finding patterns and trends in large data sets.
Data Cleansing – The process of cleaning or removing errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse. It is part of the quality assurance process.
Data Transformation-The modification of transaction data extracted from one or more data sources before it is loaded into the data mart or warehouse. The modifications may include data cleansing, translation of data into a common format so that is can be aggregated and compared, summarizing the data, etc.
Normalization – The process of eliminating duplicate information in a database by creating a separate table that stores the redundant information.
Normalization rule divided into following normal forms:
• First Normal Form (1 NF)
• Second Normal Form (2 NF)
• Third Normal Form (3 NF)
• Boyce Codd Normal Form(BCNF)
• Fourth Normal Form(4 NF)
Staging Area – The Data Warehouse Staging Area is temporary location where data from source systems is copied. Staging is database comprises of multiple tables which holds the Incremental Data from different source systems.

Operational Data Store (ODS) – An operational data store is a database designed to integratedata from multiple sources for additional operations on the data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data.

DW Components

vkni (2)

Extraction Process (CDC):-
• Extract the incremental data from feed system
• Store the extracted data into a temporary area
• Transmit the extracted data from Feed system to Staging area
• Periodicity of transmission ( daily / weekly ) depends upon the feed system
• Incremental data is the Operational data inserted or updated in the OLTP source system on day to Day Basis.
Transformation Process:-
• Transform the cleaned Operational Data into ODS
• ODS contains the current DSS data at the lowest level of granularity
Summarization Process:-
• Summarize and aggregate ODS data and Populate to the Warehouse
• Periodicity of Summarization Process depends upon the level of summarization at Warehouse ( weekly, monthly, daily )