Data Warehousing Interview Questions and Answers
Hi everybody,
First, I would like to share some data warehousing related questions. This topic covers the subject from scratch to an advanced level.
Need for a Data Warehouse
To analyse data and maintain history.
Companies require strategic information to face competition in the market, and operational systems are not designed to provide strategic information.
A data warehouse maintains the history of data for the whole organization and provides a single place where the entire data is stored.
What is data warehousing? Explain the approaches.
Many companies follow either the characteristics defined by W. H. Inmon or those defined by Sean Kelly.
Inmon definition
Subject Oriented, Integrated, Non Volatile, Time Variant.
Sean Kelly definition
Separate, Available, Integrated, Time Stamped, Subject Oriented, Non Volatile, Accessible.
DWH Approaches
There are two approaches:
1. Top Down, by Inmon
2. Bottom Up, by Ralph Kimball
Inmon approach ---> The enterprise data warehouse is structured first and the data marts are created next (Top Down).
Ralph Kimball approach ---> The data marts are designed first; later the data marts are combined into the data warehouse (Bottom Up).
What are the responsibilities of a data warehouse consultant/professional?
The basic responsibility of a data warehouse consultant is to ‘publish the right data’.
Some of the other responsibilities of a data warehouse consultant are:
1. Understand the end users by their business area, job responsibilities, and computer
tolerance.
2. Find out the decisions the end users want to make with the help of the data warehouse.
3. Identify the ‘best’ users who will make effective decisions using the data warehouse.
4. Find the potential new users and make them aware of the data warehouse.
5. Determine the grain of the data.
6. Make the end user screens and applications much simpler and more template driven.
What are fundamental stages of Data Warehousing?
Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server, where the processing load of reporting does not impact the operational system's performance.
Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
What is a Data Mart? Explain the types.
A data mart covers a specific subject area, functionality or task. It is designed to facilitate end user analysis.
Wrong answer: "It is a subset of the warehouse." Please don't use this answer.
Types of Data Marts
Dependent, Independent, Logical.
Dependent ---> The warehouse is created first and the data mart is created next.
Independent --> The data mart is created directly from the source systems without depending on the warehouse.
Logical ---> It is a backup or replica of another data mart.
How do you create a Data Warehouse and a Data Mart?
DWH -----> By applying a data warehouse approach (Inmon or Kimball) to a database.
DM ------> It is created either by using views or by using complex tables.
What is Dimensional Modeling?
Dimensional modeling defines the relationship between dimensions and facts with the help of a particular model (star, snowflake, etc.).
What do you mean by a Dimension table? Explain the dimension types.
A dimension table is a collection of attributes that defines a functionality or task.
Features:
1. It contains textual or descriptive information.
2. It does not contain any measurable information.
3. It answers the what, where, when and why questions.
4. These tables are master tables and also maintain history.
Types of Dimension
a. Conformed
b. Degenerate
c. Junk
d. Role Playing
e. SCD (Slowly Changing Dimension)
f. Dirty
What is a Fact table? Explain the types of measures.
A fact table is the main table in a dimensional model. It contains two sections:
a. Foreign keys to the dimensions
b. Measures or facts.
Features
1. A fact table contains measurable or numerical information.
2. It answers the "how many" and "how much" questions.
3. These tables are child or transactional tables and also contain history.
Types of Measures
Additive, Semi-Additive and Non-Additive measures.
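As a hedged illustration (the table and column names here are invented for this post, and the DDL syntax varies slightly by database), a simple star-schema fact table carries foreign keys to its dimensions plus the measures:

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,      -- surrogate key
    product_name VARCHAR(100),
    brand        VARCHAR(50),
    manufacturer VARCHAR(50)
);

CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL,         -- foreign key to the date dimension
    product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
    store_key    INTEGER NOT NULL,         -- foreign key to a store dimension
    unit_sales   INTEGER,                  -- additive measure
    dollar_sales DECIMAL(12,2)             -- additive measure
    -- a non-additive measure such as unit price is usually derived, not stored;
    -- semi-additive measures (e.g. balances) belong in periodic snapshot fact tables
);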
What is a Factless Fact Table?
A fact table that does not contain any meaningful or additive measures; it holds only the foreign keys to the dimensions and is typically used to record events or coverage.
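A hedged sketch (table names are invented): a factless fact table holds only foreign keys, and a query simply counts rows to answer the "how many" questions:

CREATE TABLE fact_event_attendance (
    date_key    INTEGER NOT NULL,
    student_key INTEGER NOT NULL,
    event_key   INTEGER NOT NULL
    -- no measure columns; each row records that an attendance occurred
);

SELECT event_key, COUNT(*) AS attendance_count
FROM fact_event_attendance
GROUP BY event_key;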
What is a Surrogate key? How do we generate it?
It is a key that contains unique values, like a primary key.
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the table.
We may generate this key in two ways:
1. System generated
2. Manual sequence
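For example (a hedged sketch; the exact DDL differs by database and the names are illustrative), a system-generated key can be an identity column, while a manual sequence is maintained explicitly by the load job:

-- System generated: an identity column assigns the surrogate key automatically
CREATE TABLE dim_customer (
    customer_key  INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- surrogate key
    customer_id   VARCHAR(20),    -- natural key from the source system
    customer_name VARCHAR(100)
);

-- Manual sequence: the ETL job fetches the next value itself
-- (standard SQL shown; e.g. PostgreSQL uses nextval('seq_product_key'))
CREATE SEQUENCE seq_product_key;
INSERT INTO dim_product (product_key, product_name, brand, manufacturer)
VALUES (NEXT VALUE FOR seq_product_key, 'Widget', 'Acme', 'Acme Corp');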
What is the necessity of having surrogate keys?
1. Production may reuse keys that it has purged but that you are still maintaining.
2. Production might legitimately overwrite some part of a product description or a customer description with new values, but not change the product key or the customer key to a new value. We might then be wondering what to do about the revised attribute values (the slowly changing dimension crisis).
3. Production may generalize its key format to handle some new situation in the transaction system, e.g. changing the production keys from integers to alphanumeric, or the 12-byte keys you are used to may become 20-byte keys.
4. Acquisition of companies can introduce entirely different key formats.
What are the advantages of using Surrogate Keys?
1. We can save substantial storage space with integer-valued surrogate keys.
2. They eliminate administrative surprises coming from production.
3. They let us potentially adapt to big surprises like a merger or an acquisition.
4. They give us a flexible mechanism for handling slowly changing dimensions.
What is an SCD? Explain the SCD types.
SCD ---> Slowly Changing Dimension
A dimension maintains the history of its data. Because changes arrive in these dimensions in relatively low volume, we call them slowly changing dimensions, and the process we follow to handle the changes is called the SCD process.
SCD Types
Type 1 ---> No history.
The new record replaces the original record. Only one record exists in the database, holding the current data.
Type 2 ---> History maintained, in either of two ways: 1. the current/expired flag method, or 2. the effective date range method.
A new record is added to the dimension table (for example, the customer dimension).
Two records then exist in the database: the current data and the previous history data.
Type 3 ---> Limited history maintained.
The original record is modified to include the new data. Only one record exists in the database; the new information is stored alongside the old information in the same row.
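As a hedged sketch (assuming a customer dimension with city, effective_date, expiry_date and current_flag columns; real ETL tools generate equivalent logic), Type 1 simply overwrites, while Type 2 expires the current row and inserts a new one with an effective date range:

-- Type 1: overwrite in place, no history kept
UPDATE dim_customer
SET    city = 'Chennai'
WHERE  customer_id = 'C1001';

-- Type 2: expire the current row ...
UPDATE dim_customer
SET    expiry_date  = CURRENT_DATE,
       current_flag = 'N'
WHERE  customer_id = 'C1001'
AND    current_flag = 'Y';

-- ... and insert the new version of the row
INSERT INTO dim_customer
       (customer_id, city, effective_date, expiry_date, current_flag)
VALUES ('C1001', 'Chennai', CURRENT_DATE, DATE '9999-12-31', 'Y');
-- the surrogate key (customer_key) is assigned automatically by the identity column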
What are the techniques for handling SCDs?
Overwriting (Type 1)
Creating another dimension record (Type 2)
Creating a current-value field (Type 3)
What are the different methods of loading dimension tables?
There are two different ways to load data into dimension tables.
Conventional (Slow):
All the constraints and keys are validated against the data before it is loaded; this way data integrity is maintained.
Direct (Fast):
All the constraints and keys are disabled before the data is loaded.
Once the data is loaded, it is validated against all the constraints and keys.
If data is found invalid or dirty, it is not included in the index and all further processes are skipped on this data.
What is OLTP?
OLTP is the abbreviation of On-Line Transaction Processing. Such a system is an application that modifies data the instant it receives it and has a large number of concurrent users.
What is OLAP?
OLAP is the abbreviation of On-Line Analytical Processing. Such a system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes.
What is the difference between OLTP and OLAP?
Data Source
OLTP: Operational data, from the original source of the data.
OLAP: Consolidated data, from various sources.
Process Goal
OLTP: A snapshot of the business processes that perform the fundamental business tasks.
OLAP: Multi-dimensional views of business activities for planning and decision making.
Queries and Process Scripts
OLTP: Simple, quick-running queries run by users.
OLAP: Complex, long-running queries run by the system to update the aggregated data.
Database Design
OLTP: Normalized, small database. Speed is not an issue due to the smaller database, and normalization will not degrade performance. It adopts an entity-relationship (ER) model and an application-oriented database design.
OLAP: De-normalized, large database. Speed is an issue due to the larger database, and de-normalization improves performance because there are fewer tables to scan while performing tasks. It adopts a star, snowflake or fact constellation model and a subject-oriented database design.
Backup and System Administration
OLTP: Regular database backup and system administration can do the job.
OLAP: Reloading the OLTP data is considered a good backup option.
Describe the foreign key columns in the fact table and dimension table.
The foreign keys of dimension tables are the primary keys of the entity (source) tables.
The foreign keys of fact tables are the primary keys of the dimension tables.
What is Data Mining?
Data Mining is the process of analyzing data from different perspectives and summarizing
it into useful information.
What is the difference between view and materialized view?
A view takes the output of a query and makes it appear like a virtual
table and it can be used in place of tables.
A materialized view provides indirect access to table data by storing
the results of a query in a separate schema object.
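For example (hedged; CREATE MATERIALIZED VIEW follows Oracle/PostgreSQL style and is not available in every database), using the illustrative sales schema from earlier:

-- A view: no data is stored, the query runs every time the view is used
CREATE VIEW v_brand_sales AS
SELECT p.brand, SUM(f.dollar_sales) AS dollar_sales
FROM   fact_sales f
JOIN   dim_product p ON f.product_key = p.product_key
GROUP BY p.brand;

-- A materialized view: the query result is stored as a schema object and refreshed periodically
CREATE MATERIALIZED VIEW mv_brand_sales AS
SELECT p.brand, SUM(f.dollar_sales) AS dollar_sales
FROM   fact_sales f
JOIN   dim_product p ON f.product_key = p.product_key
GROUP BY p.brand;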
What is an ODS?
ODS is the abbreviation of Operational Data Store: a database structure that is a repository for near real-time operational data rather than long-term trend data.
The ODS may further become the enterprise's shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational database.
What is VLDB?
VLDB is abbreviation of Very Large DataBase. A one terabyte database would normally be considered to be a VLDB. Typically, these are decision support systems or transaction processing applications serving large numbers of users.
Is an OLTP database design optimal for a data warehouse?
No. OLTP database tables are normalized, and this adds additional time for queries to return results. Additionally, an OLTP database is smaller and does not contain data for a long period (many years), which needs to be analyzed.
An OLTP system is basically an ER model and not a dimensional model.
If a complex query is executed on an OLTP system, it may cause heavy overhead on the OLTP server and affect the normal business processes.
If de-normalization improves data warehouse performance, why is the fact table in normal form?
The foreign keys of the fact table are the primary keys of the dimension tables. Because the fact table consists only of keys to other tables plus measures, it is already, by itself, in normal form.
What are lookup tables?
A lookup table is a table used against the target table, based on the primary key of the target; it updates the target by allowing in only modified (new or updated) records, based on the lookup condition.
What are Aggregate tables?
An aggregate table contains a summary of the existing warehouse data, grouped to certain levels of the dimensions. It is always easier to retrieve data from an aggregate table than to visit the original table, which may have millions of records.
Aggregate tables reduce the load on the database server, increase query performance and return results more quickly.
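A hedged example of building such an aggregate table from the detailed fact table (CREATE TABLE ... AS SELECT syntax varies by database; the names are illustrative):

-- roll daily, product-level sales up to brand level
CREATE TABLE agg_sales_by_brand AS
SELECT p.brand,
       SUM(f.unit_sales)   AS unit_sales,
       SUM(f.dollar_sales) AS dollar_sales
FROM   fact_sales f
JOIN   dim_product p ON f.product_key = p.product_key
GROUP BY p.brand;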
What is real time data-warehousing?
Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes
available instantly.
What are conformed dimensions?
Conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. They are common across the cubes.
What is a conformed fact?
Conformed facts are measures that are defined in exactly the same way in every fact table or data mart in which they appear, so that they can be compared and combined across those fact tables.
How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
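As a hedged sketch (PostgreSQL-style recursive CTE and date functions; the table name dim_date and its columns are taken from the time dimension described later in this post), such a loading program can also be written as a single SQL statement:

WITH RECURSIVE dates (d) AS (
    SELECT DATE '2000-01-01'
    UNION ALL
    SELECT d + 1 FROM dates WHERE d < DATE '2024-12-31'   -- one row per day
)
INSERT INTO dim_date (time_key, day_of_week, day_number_in_month, month, quarter)
SELECT CAST(TO_CHAR(d, 'YYYYMMDD') AS INTEGER),   -- e.g. 20000101
       TO_CHAR(d, 'Day'),
       EXTRACT(DAY FROM d),
       TO_CHAR(d, 'Month'),
       EXTRACT(QUARTER FROM d)
FROM   dates;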
What is a level of Granularity of a fact table?
The level of granularity is the level of detail that you put into the fact table in a data warehouse; in other words, how much detail you are willing to record for each transactional fact.
What are non-additive facts?
Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table. However, they are not considered useless; if there are changes in the dimensions, the same facts can still be useful.
What are Additive Facts? Or what is meant by an Additive Fact?
Fact tables are mostly very large, and we almost never fetch a single record into our answer set.
We fetch a very large number of records on which we then perform adding, counting, averaging, or taking the min or max. The most common operation is adding. Applications are simpler if they store facts in an additive format as often as possible.
Thus, in the grocery example, we don't need to store the unit price.
We compute the unit price by dividing the dollar sales by the unit sales whenever necessary.
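For instance (a hedged query against the illustrative sales schema above), the unit price is computed from the two additive facts at report time:

SELECT p.brand,
       SUM(f.dollar_sales) AS dollar_sales,
       SUM(f.unit_sales)   AS unit_sales,
       SUM(f.dollar_sales) / NULLIF(SUM(f.unit_sales), 0) AS avg_unit_price  -- derived, not stored
FROM   fact_sales f
JOIN   dim_product p ON f.product_key = p.product_key
GROUP BY p.brand;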
What are the 3 important fundamental themes in a data warehouse?
The 3 most important fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time
What is meant by Drilling Down?
Drilling down means nothing more than “give me more detail”.
Drilling Down in a relational database means “adding a row header” to an existing SELECT
statement.
For instance, if you are analyzing the sales of products at a manufacturer level, the
select list of the query reads:
SELECT MANUFACTURER, SUM(SALES).
If you wish to drill down on the list of manufacturers to show the brand sold, you add the BRAND row header:
SELECT MANUFACTURER, BRAND, SUM(SALES).
Now each manufacturer row expands into multiple rows listing all the brands sold. This is the
essence of drilling down.
We often call a row header a “grouping column” because everything in the list that’s not
aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause.
So the GROUP BY clause in the second query reads, GROUP BY MANUFACTURER, BRAND.
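Putting the pieces together (hedged; the fact and dimension table names are illustrative), the drilled-down query is:

SELECT p.manufacturer, p.brand, SUM(f.dollar_sales) AS sales
FROM   fact_sales f
JOIN   dim_product p ON f.product_key = p.product_key
GROUP BY p.manufacturer, p.brand;   -- the added row header BRAND appears in both lists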
What is meant by Drilling Across?
Drilling Across adds more data to an existing row. If drilling down is requesting ever finer and more granular data from the same fact table, then drilling across is the process of linking two or more fact tables at the same granularity, or, in other words, tables with the same set of grouping columns and dimensional constraints.
A drill across report can be created by using grouping columns that apply to all the fact tables
used in the report.
The new fact table called for in the drill-across operation must share certain dimensions with the
fact table in the original query. All fact tables in a drill-across query must use conformed
dimensions.
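A hedged sketch of a drill-across query (fact_shipments and its measure are invented for illustration): each fact table is aggregated separately to the same grain and the results are joined on the conformed product dimension:

SELECT s.brand, s.dollar_sales, sh.units_shipped
FROM  (SELECT p.brand, SUM(f.dollar_sales) AS dollar_sales
       FROM   fact_sales f
       JOIN   dim_product p ON f.product_key = p.product_key
       GROUP BY p.brand) s
JOIN  (SELECT p.brand, SUM(f.units_shipped) AS units_shipped
       FROM   fact_shipments f
       JOIN   dim_product p ON f.product_key = p.product_key
       GROUP BY p.brand) sh
ON    s.brand = sh.brand;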
What is the significance of handling time?
For example, when a customer moves from a property, we might want to know:
1. who the new customer is
2. when the old customer moved out
3. when the new customer moved in
4. how long the property was empty, etc.
What are the important fields in a recommended Time dimension table?
Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
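A hedged DDL sketch of such a table (the table name dim_date and the data types are assumptions):

CREATE TABLE dim_date (
    time_key               INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20240101
    day_of_week            VARCHAR(10),
    day_number_in_month    SMALLINT,
    day_number_overall     INTEGER,
    month                  VARCHAR(10),
    month_number_overall   INTEGER,
    quarter                SMALLINT,
    fiscal_period          VARCHAR(10),
    season                 VARCHAR(10),
    holiday_flag           CHAR(1),
    weekday_flag           CHAR(1),
    last_day_in_month_flag CHAR(1)
);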
What is the main difference between Data Warehousing and Business Intelligence?
The differences are:
DW - is a way of storing data and creating information through leveraging data marts.
DM's are segments or categories of information and/or data that are grouped together to provide 'information' into that segment or category.
DW does not require BI to work. Reporting tools can generate reports from the DW.
BI - is the leveraging of DW to help make business decisions and recommendations.
Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.
What is a Physical data model?
During the physical design process, you convert the data gathered during the logical design
phase into a description of the physical database, including tables and constraints.
What is a Logical data model?
A logical design is a conceptual and abstract design. We do not deal with the physical
implementation details yet;
we deal only with defining the types of information that we need.
The process of logical design involves arranging data into a series of logical relationships called
entities and attributes.
What are an Entity, Attribute and Relationship?
An entity represents a chunk of information. In relational databases, an entity often maps to a
table.
An attribute is a component of an entity and helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.
The entities are linked together using relationships.
What is a junk dimension?
A number of very small dimensions might be lumped together to form a single dimension, a junk dimension; the attributes are not closely related.
Grouping random flags and text attributes in a dimension and moving them to a separate dimension of their own is known as a junk dimension.
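A hedged illustration (names are invented): the distinct combinations of a few unrelated flags are stored once in a single junk dimension, and the fact table carries just one foreign key to it:

CREATE TABLE dim_order_junk (
    order_junk_key  INTEGER PRIMARY KEY,
    payment_type    VARCHAR(10),   -- e.g. 'CASH', 'CARD'
    gift_wrap_flag  CHAR(1),       -- 'Y' / 'N'
    rush_order_flag CHAR(1)        -- 'Y' / 'N'
);
-- the order fact table then references one combination of flags:
-- fact_orders (..., order_junk_key, ..., measures)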