Thursday, July 28, 2011

Datawarehousing Concepts

Datawarehousing Interview Questions And Answers




Hi everybody,
First I would like to share data warehousing related questions. This topic covers the subject from scratch to the end at a high level.

Need for a Data Warehouse
To analyze data and maintain history.
Companies require strategic information to face the competition in the market, and operational systems are not designed to provide strategic information.
A data warehouse maintains the history of data for the whole organization and provides a single place where the entire data is stored.


What is data warehousing, and explain the approaches?

Many companies follow the characteristics defined by either W. H. Inmon or Sean Kelly.
Inmon definition
Subject Oriented, Integrated, Non Volatile, Time Variant.

Sean Kelly definition
Separate, Available, Integrated, Time Stamped, Subject Oriented, Non Volatile, Accessible.

DWH Approaches
There are two approaches:
1. Top Down, by Inmon

2. Bottom Up, by Ralph Kimball

Inmon approach --> The enterprise data warehouse is structured first and the data marts are created from it (Top Down).
Ralph Kimball approach --> The data marts are designed first and are later combined into the data warehouse (Bottom Up).

What are the responsibilities of a data warehouse consultant/professional?

The basic responsibility of a data warehouse consultant is to ‘publish the right data’.
Some of the other responsibilities of a data warehouse consultant are:

1. Understand the end users by their business area, job responsibilities, and computer
tolerance.

2. Find out the decisions the end users want to make with the help of the data warehouse.

3. Identify the ‘best’ users who will make effective decisions using the data warehouse

4. Find the potential new users and make them aware of the data warehouse.

5. Determine the grain of the data.

6. Make the end user screens and applications much simpler and more template driven.

What are the fundamental stages of Data Warehousing?

Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact the operational system's performance.

Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.


Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)

Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.


What is a Data Mart? Explain the types.

A data mart covers a specific subject area, functionality, or task. It is designed to facilitate end-user analysis.

Wrong answer -- "It is a subset of the warehouse" -- please don't use this wrong answer.
Types of Data Marts
Dependent, Independent, Logical.
Dependent ---> The warehouse is created first and the data mart is created from it.
Independent --> The data mart is created directly from the source systems without depending on the warehouse.
Logical ---> It is a backup or replica of another data mart.


How do you create a Data Warehouse and a Data Mart?

DWH -----> By applying a data warehouse approach (Inmon or Kimball) to a database.

DM -------> It is created by using either views or complex tables.

What is Dimensional Modeling?

It defines the relationship between dimension and fact tables with the help of a particular model (Star, Snowflake, etc.).
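
As a rough illustration, assuming hypothetical table and column names, a minimal star schema might look like this: dimension tables carrying descriptive attributes, and a fact table carrying the measures plus foreign keys to the dimensions.

-- Product dimension: descriptive attributes only (hypothetical names).
CREATE TABLE dim_product
( product_key   INTEGER NOT NULL PRIMARY KEY
, product_name  VARCHAR(50)
, brand         VARCHAR(30)
, category      VARCHAR(30)
);

-- Date dimension.
CREATE TABLE dim_date
( date_key      INTEGER NOT NULL PRIMARY KEY
, calendar_date DATE
, month_number  INTEGER
, year_number   INTEGER
);

-- Fact table: foreign keys to the dimensions plus additive measures.
CREATE TABLE fact_sales
( date_key      INTEGER NOT NULL REFERENCES dim_date    (date_key)
, product_key   INTEGER NOT NULL REFERENCES dim_product (product_key)
, sales_amount  DECIMAL(12,2)   -- additive measure
, units_sold    INTEGER         -- additive measure
);

In a snowflake model the dimension tables themselves would be further normalized (for example, category split out into its own table).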

What do you mean by a Dimension table, and explain the dimension types?

A dimension table is a collection of attributes which describes a functionality or task.

Features:
1. It contains textual or descriptive information.
2. It does not contain any measurable information.
3. It answers what, where, when, and why questions.
4. These tables are master tables and also maintain history.

Types of Dimensions
a. Conformed
b. Degenerate
c. Junk
d. Role Playing
e. SCD (Slowly Changing Dimension)
f. Dirty

What is a Fact table, and explain the types of measures?

The fact table is the central table in a dimensional model. It contains two sections:
a. Foreign keys to the dimensions
b. Measures or facts.

Features
1. The fact table contains measurable or numerical information.
2. It answers "how many" and "how much" type questions.
3. These tables are child or transactional tables and also contain history.

Types of Measures

Additive measures (can be summed across all dimensions), semi-additive measures (can be summed across some dimensions but not others, e.g. account balances over time), and non-additive measures (cannot be summed at all, e.g. ratios and percentages).

What is a Factless Fact Table?

A fact table which does not contain any measures; it holds only the foreign keys to the dimensions and records that an event occurred.
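
For example, a factless fact table might record that a student attended a class on a given date; a rough sketch with hypothetical names:

-- Factless fact table: only dimension keys, no measures.
CREATE TABLE fact_attendance
( date_key    INTEGER NOT NULL
, student_key INTEGER NOT NULL
, class_key   INTEGER NOT NULL
);

-- Analysis is done by counting rows, e.g. attendance per class:
SELECT class_key, COUNT(*) AS attendance_count
FROM   fact_attendance
GROUP  BY class_key;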


What is a Surrogate key? How do we generate it?

It is a key that contains unique values, like a primary key.
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
It is just a unique identifier or number for each row that can be used as the primary key of the table.

We may generate this key in two ways (a sketch of both follows below):

System generated
Manual sequence
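
A rough sketch of both options, using Teradata syntax and hypothetical dim_customer / stg_customer tables:

-- 1) System generated: an identity column assigns the key automatically
--    (GENERATED BY DEFAULT also allows explicit values to be supplied).
CREATE MULTISET TABLE dim_customer
( customer_key  INTEGER GENERATED BY DEFAULT AS IDENTITY
                (START WITH 1 INCREMENT BY 1)
, customer_id   VARCHAR(20)    -- natural key from the source system
, customer_name VARCHAR(50)
)
UNIQUE PRIMARY INDEX (customer_key);

-- 2) Manual sequence: offset ROW_NUMBER() by the current maximum key.
INSERT INTO dim_customer (customer_key, customer_id, customer_name)
SELECT mx.max_key + ROW_NUMBER() OVER (ORDER BY src.customer_id)
     , src.customer_id
     , src.customer_name
FROM   stg_customer src
CROSS JOIN (SELECT COALESCE(MAX(customer_key), 0) AS max_key
            FROM   dim_customer) mx;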

What is the necessity of having surrogate keys?

1.Production may reuse keys that it has purged but that you are still maintaining.

2.Production might legitimately overwrite some part of a product description or a
customer description with new values but not change the product key or the customer
key to a new value. We might be wondering what to do about the revised attribute
values (slowly changing dimension crisis)

3. Production may generalize its key format to handle some new situation in the transaction system,
e.g. changing the production keys from integers to alphanumeric,
or the 12-byte keys you are used to may become 20-byte keys.

4.Acquisition of companies

What are the advantages of using Surrogate Keys?

1. We can save substantial storage space with integer-valued surrogate keys.

2. Eliminate administrative surprises coming from production.

3. Potentially adapt to big surprises like a merger or an acquisition.

4. Have a flexible mechanism for handling slowly changing dimensions.

What is SCD? Explain the SCD types.

SCD ---> Slowly Changing Dimension
A dimension maintains the history of its data. Because changes arrive in the dimension slowly and in small volumes, we call it a Slowly Changing Dimension, and the process we follow to handle those changes is called the SCD process.

SCD Types
Type 1 ---> No history.
The new record replaces the original record. Only one record exists in the database - the current data.

Type 2 ---> History maintained, using either 1. the current/expired flag method or 2. the effective date range method.
A new record is added to the dimension table.
Two (or more) records exist in the database - the current data and the previous history data.

Type 3 ---> History maintained.
The original row is modified to include the new data. Only one record exists in the database - the new information is stored alongside the old information in the same row (e.g. in a "previous value" column).

What are the techniques for handling SCDs?

Overwriting (Type 1)
Creating another dimension record (Type 2)
Creating a current value field (Type 3)
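
A minimal sketch of the Type 2 (effective date range) technique, assuming hypothetical dim_customer / stg_customer tables with eff_start_dt, eff_end_dt and current_flag columns (the surrogate key is omitted for brevity):

-- 1) Expire the current row when a tracked attribute has changed.
UPDATE dim_customer
SET    eff_end_dt   = CURRENT_DATE - 1
     , current_flag = 'N'
WHERE  current_flag = 'Y'
AND    EXISTS
       (SELECT 1
        FROM   stg_customer s
        WHERE  s.customer_id   = dim_customer.customer_id
        AND    s.customer_addr <> dim_customer.customer_addr);

-- 2) Insert a new current row for changed and brand-new customers.
INSERT INTO dim_customer
      (customer_id, customer_addr, eff_start_dt, eff_end_dt, current_flag)
SELECT s.customer_id, s.customer_addr, CURRENT_DATE, DATE '9999-12-31', 'Y'
FROM   stg_customer s
WHERE  NOT EXISTS
       (SELECT 1
        FROM   dim_customer d
        WHERE  d.customer_id   = s.customer_id
        AND    d.current_flag  = 'Y'
        AND    d.customer_addr = s.customer_addr);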

What are the different methods of loading dimension tables?

There are two different ways to load data into dimension tables.

Conventional (Slow):
All the constraints and keys are validated against the data before it is loaded; this way data integrity is maintained.

Direct (Fast):
All the constraints and keys are disabled before the data is loaded.
Once the data is loaded, it is validated against all the constraints and keys.
If the data is found to be invalid or dirty, it is not included in the index and all future processes skip this data.

What is OLTP?

OLTP is the abbreviation of On-Line Transaction Processing. This system is an application that modifies data the instant it receives it and has a large number of concurrent users.

What is OLAP?

OLAP is abbreviation of Online Analytical Processing. This system is an
application that collects, manages, processes and presents
multidimensional data for analysis and management purposes.

What is the difference between OLTP and OLAP?

Data Source
OLTP: Operational data; the OLTP systems are the original source of the data.

OLAP: Consolidated data; OLAP data comes from the various OLTP source systems.

Process Goal
OLTP: A snapshot of the ongoing business processes that perform the fundamental business tasks.


OLAP: Multi-dimensional views of business activities of planning and decision making.

Queries and Process Scripts
OLTP: Simple, quick-running queries run by users.

OLAP: Complex, long-running queries run by the system to update the aggregated data.

Database Design
OLTP: Normalized, smaller database. Speed is not an issue, because the database is smaller and normalization does not degrade performance at that size.
It adopts an entity-relationship (ER) model and an application-oriented database design.

OLAP: De-normalized, large database. Speed is an issue because of the larger database; de-normalizing improves performance, as there are fewer tables to join and scan while answering queries.
It adopts a star, snowflake, or fact constellation model and a subject-oriented database design.

Back up and System Administration

OLTP: Regular database backup and system administration can do the job.

OLAP: Reloading the data from the OLTP systems is considered a good backup option.


Describe the foreign key columns in the fact table and dimension tables.

Foreign keys of dimension tables are the primary keys of the entity (source) tables.
Foreign keys of fact tables are the primary keys of the dimension tables.

What is Data Mining?

Data Mining is the process of analyzing data from different perspectives and summarizing
it into useful information.

What is the difference between view and materialized view?

A view takes the output of a query and makes it appear like a virtual
table and it can be used in place of tables.

A materialized view provides indirect access to table data by storing
the results of a query in a separate schema object.
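
A rough sketch of the difference, using the emp1 table defined in the hash index example later in this blog (the materialized view syntax shown is Oracle-style; Teradata provides similar stored query results through join indexes):

-- Ordinary view: no data is stored; the defining query runs each time
-- the view is referenced.
CREATE VIEW v_dept_salary AS
SELECT department_number, SUM(salary_amount) AS total_salary
FROM   emp1
GROUP  BY department_number;

-- Materialized view (Oracle-style syntax): the query result is stored
-- in its own schema object and refreshed on demand.
CREATE MATERIALIZED VIEW mv_dept_salary
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT department_number, SUM(salary_amount) AS total_salary
FROM   emp1
GROUP  BY department_number;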


What is ODS?

ODS is the abbreviation of Operational Data Store: a database structure that is a repository for near real-time operational data rather than long-term trend data.
The ODS may further become the enterprise's shared operational database, allowing operational systems that are being re-engineered to use the ODS as their operational database.

What is VLDB?

VLDB is abbreviation of Very Large DataBase. A one terabyte database would normally be considered to be a VLDB. Typically, these are decision support systems or transaction processing applications serving large numbers of users.

Is the OLTP database design optimal for a Data Warehouse?

No. OLTP database tables are normalized, which adds extra time for queries to return results. Additionally, an OLTP database is smaller and does not contain data covering a long period (many years), which is what needs to be analyzed.

An OLTP system basically follows the ER model, not the dimensional model.
If a complex query is executed on an OLTP system, it may place a heavy overhead on the OLTP server and affect the normal business processes.

If de-normalization improves data warehouse processing, why is the fact table in normal form?

The foreign keys of the fact table are the primary keys of the dimension tables. The fact table consists only of those dimension keys plus the measures, with no repeating groups or descriptive attributes, and that by itself makes it a normalized table.


What are lookup tables?

A lookup table is a table used against the target table, based upon the primary key of the target; it is used to update the target by allowing through only modified (new or updated) records, based on the lookup condition.

What are Aggregate tables?

An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. It is always easier to retrieve data from an aggregate table than from the original table, which may have millions of records.
Aggregate tables reduce the load on the database server, increase query performance, and return results quickly.
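
As a rough illustration with hypothetical table and column names, an aggregate table can be built directly from the detailed fact table, for example with Teradata's CREATE TABLE ... AS:

-- Monthly product-level summary of a detailed sales fact table.
CREATE TABLE agg_sales_monthly AS
(
  SELECT product_key
       , EXTRACT(YEAR  FROM sales_date) AS sales_year
       , EXTRACT(MONTH FROM sales_date) AS sales_month
       , SUM(sales_amount)              AS total_sales
       , SUM(units_sold)                AS total_units
  FROM   fact_sales
  GROUP  BY 1, 2, 3
) WITH DATA;

Reports that only need monthly totals then read agg_sales_monthly instead of scanning millions of detailed fact rows.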



What is real time data-warehousing?

Data warehousing captures business activity data. Real-time data warehousing captures business activity data as it occurs. As soon as the business activity is complete and there is data about it, the completed activity data flows into the data warehouse and becomes
available instantly.

What are conformed dimensions?

Conformed dimensions mean the exact same thing with every possible fact table to which they are joined. They are common across the cubes and data marts.


What is a conformed fact?

Conformed facts are measures that are defined identically (same name, definition, and units) across multiple fact tables, so they can be compared and combined across data marts.

How do you load the time dimension?

Time dimensions are usually loaded by a program that loops through all possible dates that may appear in the data. 100 years may be represented in a time dimension, with one row per day.
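
In Teradata, one way to do this without a procedural loop is to drive the insert from the built-in sys_calendar.calendar view; a rough sketch, assuming a hypothetical dim_date target table:

-- Populate a simple date/time dimension from the system calendar.
INSERT INTO dim_date
      (time_key, calendar_date, day_of_week, month_number, year_number)
SELECT (year_of_calendar * 10000)
       + (month_of_year * 100)
       + day_of_month                 AS time_key    -- e.g. 20100131
     , calendar_date
     , day_of_week
     , month_of_year
     , year_of_calendar
FROM   sys_calendar.calendar
WHERE  calendar_date BETWEEN DATE '2000-01-01' AND DATE '2049-12-31';

Flags such as holiday_flag or fiscal_period usually have to be added from a separate business calendar, since they are specific to each organization.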

What is a level of Granularity of a fact table?

The level of granularity is the level of detail that you put into the fact table in a data warehouse; in other words, how much detail you are willing to store for each transactional fact.

What are non-additive facts?

Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table. However, they are not considered useless; if the dimensional context changes, the same facts can still be useful.

What are Additive Facts? Or what is meant by Additive Fact?

Fact tables are mostly very large, and we almost never fetch a single record into our answer set.
We fetch a very large number of records on which we then do adding, counting, averaging, or taking the min or max. The most common operation is adding. Applications are simpler if they store facts in an additive format as often as possible.
Thus, in the grocery example, we don't need to store the unit price.
We compute the unit price by dividing the dollar sales by the unit sales whenever necessary.


What are the 3 important fundamental themes in a data warehouse?

The 3 most important fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time

What is meant by Drilling Down?

Drilling down means nothing more than “give me more detail”.
Drilling Down in a relational database means “adding a row header” to an existing SELECT
statement.

For instance, if you are analyzing the sales of products at a manufacturer level, the
select list of the query reads:

SELECT MANUFACTURER, SUM(SALES).

If you wish to drill down on the list of manufacturers to show the brand sold, you add the BRAND row header:

SELECT MANUFACTURER, BRAND, SUM(SALES).

Now each manufacturer row expands into multiple rows listing all the brands sold. This is the
essence of drilling down.

We often call a row header a “grouping column” because everything in the list that’s not
aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause.
So the GROUP BY clause in the second query reads, GROUP BY MANUFACTURER, BRAND.
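
Written out in full against hypothetical fact_sales / dim_product tables, the two queries look like this:

-- Before drilling down: sales by manufacturer.
SELECT p.manufacturer
     , SUM(f.sales_amount) AS sales
FROM   fact_sales  f
JOIN   dim_product p ON p.product_key = f.product_key
GROUP  BY p.manufacturer;

-- After drilling down: BRAND added as an extra row header / grouping column.
SELECT p.manufacturer
     , p.brand
     , SUM(f.sales_amount) AS sales
FROM   fact_sales  f
JOIN   dim_product p ON p.product_key = f.product_key
GROUP  BY p.manufacturer, p.brand;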


What is meant by Drilling Across?

Drilling Across adds more data to an existing row. If drilling down is requesting ever finer and
granular data from the same fact table, then drilling across is the process of linking two or more
fact tables at the same granularity, or, in other words, tables with the same set of grouping
columns and dimensional constraints.

A drill across report can be created by using grouping columns that apply to all the fact tables
used in the report.

The new fact table called for in the drill-across operation must share certain dimensions with the
fact table in the original query. All fact tables in a drill-across query must use conformed
dimensions.
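
A rough sketch of a drill-across query with hypothetical fact tables: each fact table is aggregated separately to the shared (conformed) product grain, and the two answer sets are then joined on that grain.

SELECT COALESCE(s.product_key, h.product_key) AS product_key
     , s.total_sales
     , h.total_shipped
FROM  (SELECT product_key, SUM(sales_amount) AS total_sales
       FROM   fact_sales
       GROUP  BY product_key) s
FULL OUTER JOIN
      (SELECT product_key, SUM(shipped_qty)  AS total_shipped
       FROM   fact_shipments
       GROUP  BY product_key) h
ON     s.product_key = h.product_key;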

What is the significance of handling time?

For example, when a customer moves from a property, we might want to know:

1. who the new customer is
2. when did the old customer move out
3. when did the new customer move in
4. how long was the property empty etc


What are the important fields in a recommended Time dimension table?

Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag

What is the main difference between Data Warehousing and Business Intelligence?


The differentials are:

DW - is a way of storing data and creating information through leveraging data marts.
DM's are segments or categories of information and/or data that are grouped together to provide 'information' into that segment or category.
DW does not require BI to work. Reporting tools can generate reports from the DW.


BI - is the leveraging of DW to help make business decisions and recommendations.
Information and data rules engines are leveraged here to help make these decisions along with statistical analysis tools and data mining tools.

What is a Physical data model?

During the physical design process, you convert the data gathered during the logical design
phase into a description of the physical database, including tables and constraints.


What is a Logical data model?

A logical design is a conceptual and abstract design. We do not deal with the physical
implementation details yet;
we deal only with defining the types of information that we need.
The process of logical design involves arranging data into a series of logical relationships called
entities and attributes.


What are an Entity, Attribute and Relationship?

An entity represents a chunk of information. In relational databases, an entity often maps to a
table.
An attribute is a component of an entity and helps define the uniqueness of the entity. In relational databases, an attribute maps to a column.
The entities are linked together using relationships.


What is junk dimension?

A number of very small dimensions might be lumped together to form a single dimension, a junk dimension; the attributes are not closely related.
Grouping random flags and text attributes and moving them out of the fact table into a separate small dimension is known as a junk dimension.
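
A small hypothetical example: instead of adding several low-cardinality flag columns to the fact table, the distinct flag combinations are stored once in a junk dimension and the fact row keeps a single key.

-- Junk dimension holding low-cardinality flags and indicators.
CREATE TABLE dim_order_junk
( junk_key        INTEGER NOT NULL PRIMARY KEY
, payment_type    VARCHAR(10)   -- e.g. 'CASH' or 'CARD'
, gift_wrap_flag  CHAR(1)       -- 'Y' / 'N'
, rush_order_flag CHAR(1)       -- 'Y' / 'N'
);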


Thursday, March 17, 2011

Hash Index Examples


Example 1

Consider the following Hash Index definition:

CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);

This index is built for the table 'emp1' which is defined as follows:
CREATE SET TABLE emp1
(employee_number          INTEGER
, manager_employee_number INTEGER
, department_number       INTEGER
, job_code                INTEGER
, last_name     CHAR(20) NOT NULL
, first_name VARCHAR(30) NOT NULL
, hire_date         DATE NOT NULL
, birthdate         DATE NOT NULL
, salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );

Points to consider about this hash index definition:

  • Each hash index row contains the employee number and the department number.
  • Specifying the employee number is unnecessary, since it is the primary index of the base table and will therefore be automatically included.
  • The BY clause indicates that the rows of this index will be distributed by the employee_number hash value.
  • The ORDER BY clause indicates that the index rows will be ordered on each AMP in sequence by the employee_number hash value.
Example 2
The same hash index definition could have been abbreviated as follows:
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1;

This is essentially the same definition because of the defaults for hash indexes.


  • The BY clause defaults to the primary index of the base table.
  • The ORDER BY clause defaults to the order of the base table rows.

Hash Index Definition Rules

There are two key rules which govern the use of the BY and ORDER BY clauses:
  • The column(s) specified in the BY clause must be a subset of the columns which make up the hash index.
  • When the BY clause is specified, the ORDER BY clause must also be specified.

Covered Query

The following is an example of a simple query which is covered by this index:
SELECT employee_number, department_number FROM emp1;
Normally, this query would result in a full table scan of the employee table. With the existence of the hash index, the optimizer can pick a less costly approach, namely retrieve the necessary information directly from the index rather than accessing the lengthier (and costlier) base rows.
Consider the explain of this query:
EXPLAIN
SELECT employee_number, department_number FROM emp1;

1) First, we lock a distinct TD000."pseudo table" for read on a RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows. The estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is   0.15 seconds.

Example 3
The following is an alternate definition of the hash index 'hash_1'.
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY VALUES(employee_number);

Points to consider about this hash index definition:
  • This definition produces the same hash index, however the index rows are ordered based on employee_number value rather than the hash value.
  • This might be more useful for certain 'range processing' queries.
  • This definition would be equally helpful in covering the query indicated previously. The order of index rows would be of no significance.

Tuesday, March 8, 2011

How to generate row numbers in teradata?

We have come across the ROWNUM pseudo-column in Oracle, which gives a number to every row/record. In Teradata there is no direct equivalent, but there are several approaches that can be used.

Approach 1:
Here the SUM function is used over all preceding rows of SOURCE_TABLE:

select
sum(1) over( rows unbounded preceding ),
columnA,
columnB
from
SOURCE_TABLE;

Approach 2:
Here the ROW_NUMBER function is used to generate the row number ordered by columnA:
select
ROW_NUMBER() over( ORDER BY columnA ),
columnA,
columnB
from
SOURCE_TABLE;


If you have to use the row number concept in the target table as well, then the following approach using an identity column (available from V2R6 onwards) can be used:

CREATE MULTISET TABLE TARGET_TABLE
  (
   columnA INTEGER GENERATED BY DEFAULT AS IDENTITY
       (START WITH 1
        INCREMENT BY 20) ,
   columnB VARCHAR(20) NOT NULL
  )
UNIQUE PRIMARY INDEX pidx (columnA);

P.S: Identity columns differ from the sequence concept in Oracle. The numbers assigned in these columns are not guaranteed to be sequential; the identity column in Teradata is used to guarantee row uniqueness.

The following works without using the identity approach:
create TABLE TARGET_TABLE as
(
   select
   ROW_NUMBER() over( ORDER BY columnA ) NUMBER_SEQ ,
   columnA,
   columnB,
   columnC
   from a join b on a.id=b.id
) with data ;

How to select the Nth row from a table?

How do you select a particular row from a table?
Ans:

By using the ROW_NUMBER() function, we can number all of the selected rows.
Then use the QUALIFY clause to pick out the exact row number.


select
columnA,columnB
from
source_table
qualify row_number() over (order by columnA) = N;
Here N is the particular row number required.

P.S: The same approach can be used to select the top N records:
select
columnA,columnB
from 
source_table 
qualify row_number() over (order by columnA ) <= N;

ROW NUMBER

Rank vs Row_Number in Teradata

Anyone working on Teradata would be aware of the two most commonly used OLAP functions in Teradata, i.e.
  1. Rank()
  2. Row_Number()

In one of my posts I have mentioned how to generate a sequence number (e.g. a surrogate key) using Row_Number. Today let us find out the difference in usage of the rank and row number functions by taking the following scenario on the table sales_daily.


SELECT item_id,
       sales_date,
       Actual_sales,
       RANK()       OVER (ORDER BY Actual_sales DESC),   -- RANK function
       ROW_NUMBER() OVER (ORDER BY Actual_sales DESC)    -- ROW_NUMBER function
FROM   Sales_daily
WHERE  sales_date BETWEEN DATE '2010-01-01' AND DATE '2010-03-01';


The result set is as follows:
Item_id  Sales_date  Actual_sales  Rank  Row_Number
120      4/1/2010    550           1     1
120      4/2/2017    550           1     2
120      4/2/2020    450           3     3
120      4/2/2006    350           4     4
120      4/2/2027    350           4     5
120      4/1/2005    350           4     6
120      4/1/2003    250           7     7
120      4/2/2003    250           7     8
120      4/1/2025    200           9     9
120      4/1/2002    200           9     10
120      4/1/2021    150           11    11
120      4/2/2001    150           11    12
120      4/1/2001    150           11    13
120      4/1/1931    100           14    14


In the result set, we can observe the following:
1) Rows 1 and 2 have the same rank assigned since they have the same actual_sales value.
2) Rows 1 and 2 have different row numbers assigned even though they have the same actual_sales value.
3) Row 3 is assigned rank "3" because ranking positions are skipped after ties.

Hence,
rows are always assigned an incrementing sequence number with ROW_NUMBER, while
rows with equal values are assigned the same number with the RANK function (and the following rank is skipped).

LOCKS

Difference between a read lock and an access lock?

For an access lock to be placed, we have to specify it explicitly as LOCKING ... FOR ACCESS.
With an access lock the table is effectively not blocked: other sessions can insert/update/delete on the table while the access lock is reading it. Dirty reads may therefore happen, which means the latest changes made to the table may not be reflected in your answer set.

A READ lock will lock the table so that other sessions cannot insert/update or make structural changes to the table while it is held.
It is placed by a simple SELECT statement, or by explicitly specifying LOCKING ... FOR READ.

Syntax:
access lock:
LOCKING TABLE <table_name> FOR ACCESS

read lock:
LOCKING TABLE <table_name> FOR READ

The main difference between a read lock and an access lock is data integrity. On placing a read lock the user can expect data integrity, whereas with an access lock the user cannot.

Consider following scenarios
1. User A places a READ lock and User B places a WRITE lock.
User B will have to wait for User A to complete its read in order to start inserts/updates/deletes on the data.

2. User A places an ACCESS lock and User B places a WRITE lock.
Users A and B access the data simultaneously, hence User A cannot expect to get consistent results.
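
For example, the lock can be requested explicitly with the LOCKING modifier in front of the query (the emp1 table from the hash index post above is reused here):

-- ACCESS lock: does not block concurrent writers; dirty reads are possible.
LOCKING TABLE emp1 FOR ACCESS
SELECT employee_number, department_number
FROM   emp1;

-- READ lock: taken by default for a plain SELECT, or requested explicitly;
-- concurrent WRITE requests must wait until it is released.
LOCKING TABLE emp1 FOR READ
SELECT employee_number, department_number
FROM   emp1;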

Tuesday, February 1, 2011

eBay's Teradata implementation headed to 20 petabytes

EBay, one of Teradata’s largest customers, will more than double the petabytes in its data warehouse over the next year.
Curt Monash of Monash Research recaps a few key points from a conversation with eBay this week. Teradata often cites eBay as a reference customer. Among the main items:
  • EBay’s Teradata implementation is now topping 10 petabytes. That will grow to 20 petabytes next year.
  • The Teradata database is called Singularity and ingests data so it’s easier to index and query.
  • The auction giant has Teradata technology for transactions as well as enterprise data.
  • Hadoop is on eBay’s radar and will be used for analyzing images. However, Hadoop isn’t eBay’s favored way of crunching data.
That final point speaks to the hybrid approach emerging in data warehousing and analytics. Corporations are likely to mix and match Hadoop with Teradata and similar systems. That’s why Teradata has been in such a rush to build connectors to Hadoop via deals with Cloudera, which also has a similar partnership with EMC’s Greenplum.
Teradata has been cited as a potential acquisition for Hewlett Packard, which needs to bolster its analytics portfolio. Meanwhile, new CEO Leo Apotheker has to expand HP’s software portfolio.

Teradata delivers strong quarter, ups 2010 outlook

Teradata delivered a strong quarter across the board and raised its outlook for the year.
The company reported third quarter earnings of $75 million, or 44 cents a share. Non-GAAP earnings were 46 cents a share. Teradata’s third quarter revenue tally of $489 million was up 15 percent from a year ago.
Wall Street was expecting earnings of 39 cents a share on revenue of $464.2 million.
Meanwhile, Teradata increased its outlook for 2010. The company projected revenue growth of 12 percent up from its previous outlook for growth of 8 percent to 10 percent. Teradata also projected earnings of $1.71 a share to $1.74 a share. Non-GAAP earnings were projected to be $1.80 to $1.83. Wall Street was expecting 2010 earnings of $1.70 a share.
Teradata’s results are notable given that it plays in the hot data warehousing and analytics market. The company also happens to be a fine acquisition target for larger players trying to beef up their analytics capabilities.
By the numbers:
  • Gross margin was 57.1 percent in the third quarter, up from 53.4 percent a year ago.
  • Revenue in the Americas was $292 million, up 18 percent from a year ago. Europe Middle East and Africa revenue was $109 million, and Asia Pacific/Japan sales were $88 million.
  • The company ended the quarter with $741 million.

Teradata rolls out latest database, pushes time aware analysis

Teradata on Monday said its Database 13.10 is available and stumped for better time aware analysis in data warehousing applications.
So-called temporal analysis refers to the ability to analyze data based on changes in history instead of just the most current data. Teradata argues that seeing how data changed will cut down on the use of customer reports as well as implementation times. In a nutshell, Teradata Database 13.10 time-aware capabilities are supposed to highlight how markets have changed rather than how they look at one point in time.
Teradata’s latest database will automatically update and check changes by date so there’s a complete change history. Typically, these time-data updates required manual scripts. For instance, automated time analysis could highlight how a sales region may perform if a star sales team moves to another division through time. Product sales can be tracked year over year even if the goods change categories.

The database update, along with hardware tweaks, comes as Teradata is one of the few independent data warehousing players left. Oracle is gunning for Teradata with its Exadata product line. Meanwhile, IBM bought Netezza in a move to bolster its analytics appliance line-up.
Among other Teradata odds and ends Monday:
  • Database 13.10 also features new compression technology to save storage space and better analytics precision.
  • Teradata’s platform will feature the latest Intel Xeon processors.
  • Those new processors along with other design enhancements provide a performance boost for the Teradata Active Enterprise Data Warehouse 5650.
  • Teradata is also shipping the Teradata Extreme Performance Appliance 4600, which runs completely on solid state drive (SSD) technology to cut lag time.
  • The Teradata Data Warehouse Appliance 2650 was updated along with the 1650 and the Data Mart Appliance 560, which is an entry-level data warehousing appliance.

  • Teradata will support seven new SAP Business Objects applications including new services and on-demand options.
  • Karmasphere will provide software to build and run MapReduce jobs. The upshot is that customers can access Hadoop and Teradata files with simple interfaces.
  • Teradata is also updating its relationship manager tools with new dashboards and reporting. These reports will also be available on Android and iPad devices.

Teradata Partners with RainStor for Big Data Retention and Retrieval


RainStor’s innovative technology significantly improves on-line data retention and retrieval for compliance needs

SAN FRANCISCO – Teradata Corporation (NYSE:TDC), today announced a partnership with RainStor, Inc. to provide specialized data retention capabilities enabling customers to keep compliance data for longer periods of time at a low cost. With RainStor’s innovative technology, data is de-duplicated and compressed enabling a 97 percent storage footprint reduction. Additionally, data is directly accessible online and can be queried using standard SQL language and a number of other business intelligence tools.
This new partnership addresses the challenge of spiraling costs associated with retaining large and growing volumes of historical data. With built-in data reduction capabilities, the cost of retaining multi-terabytes, growing to petabytes goes down significantly. Organizations can now meet industry compliance regulations, and also improve enterprise intelligence by ongoing access to larger volumes of historical data.
Many Teradata customers, in a mix of industries, have begun to delve into massive volumes of historical detailed data, as these customer examples demonstrate.
Telecommunications – U. S. and European Union regulations require that telecommunication providers retain call detail records (CDR) and also Internet protocol detail records (IPDR), which are created by smart phone users. The data must be retained for a minimum 12 months and be readily available to law enforcement agencies for criminal investigations. The challenge for telecommunications providers is that they now create millions of CDRs and IPDRs per day.
Retailers – With Teradata, retailers are able to meet sales tax compliance reporting and auditing requirements by analyzing millions of transactions, which occurred over several years and involved thousands of products, across thousands of stores. By executing this detailed analysis in hours or a few days, not weeks and months, retailers can ensure that they are properly paying their taxes, which saves time, reduces labor costs, and avoids penalties and fines.
Finance – A financial services customer stores 14 years of historical consumer banking data on everyone who has opened and closed an account to enable easy access for archival retrieval in support of governmental compliance.
“Our partnership with RainStor’s specialized data compression and reduction software helps broaden Teradata’s reach to the retention market. This complements the Teradata Purpose-Built Platform and back-up, archive and recovery solutions,” commented Scott Gnau, chief development officer, Teradata Corporation. “This solution can eliminate the need to store historical data on tape and thereby avoid any business risk of not providing immediate response in compliance situations.”
“Most industry sectors are now required to maintain online access to historical data, which not only complies with regulations but drives smart business decisions for a competitive advantage,” said John Bantleman, chief executive officer, RainStor, Inc. “We are very excited about the combined data retention capabilities Teradata and RainStor are bringing to market, which we believe to be a unique industry offering that meets today’s big data problem.”
About RainStor
RainStor is a technology pioneer in online information preservation. The company’s specialized data repository significantly reduces the total cost of retaining data through extreme data reduction, simplified data management and near-perfect scalability. RainStor solutions are deployed by technology partners across multiple industries to reduce the cost and complexity of preserving information in the enterprise or the cloud. RainStor is a privately-held company with offices in San Francisco, U.S.A. and Gloucester, UK. For more information, visit http://www.rainstor.com.
About Teradata 
Teradata Corporation (NYSE: TDC) is the world’s largest company solely focused on raising intelligence and achieving enterprise agility through its database software, enterprise data warehousing, data warehouse appliances, consulting, and enterprise analytics. Visit Teradata on the web at www.teradata.com.
Teradata is a trademark or registered trademark of Teradata Corporation in the United States and other countries.