TERADATA BLOG: Mar 17, 2011

Example 1

Consider the following Hash Index definition:

CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);
This index is built for the table 'emp1' which is defined as follows:
CREATE SET TABLE emp1
(employee_number          INTEGER
, manager_employee_number INTEGER
, department_number       INTEGER
, job_code                INTEGER
, last_name     CHAR(20) NOT NULL
, first_name VARCHAR(30) NOT NULL
, hire_date         DATE NOT NULL
, birthdate         DATE NOT NULL
, salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );
Points to consider about this hash index definition:

Each hash index row contains the employee number, the department number.

Specifying the employee number is unnecessary, since it is the primary index of the base table and will therefore be automatically included.

The BY clause indicates that the rows of this index will be distributed by the employee_number hash value.

The ORDER BY clause indicates that the index rows will be ordered on each AMP in sequence by the employee_number hash value.

Example 2

The same hash index definition could have been abbreviated as follows:
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1;
This is essentially the same definition because of the defaults for hash indexes.

The BY clause defaults to the primary index of the base table.
The ORDER BY clause defaults to the order of the base table rows.

Hash Index Definition Rules

There are two key rules which govern the use of the BY and ORDER BY clauses:

The column(s) specified in the BY clause must be a subset of the columns which make up the hash index.
When the BY clause is specified, the ORDER BY clause must also be specified.

Covered Query

The following is an examply of a simple query which is covered by this index:
SELECT employee_number, department_number FROM emp1;
Normally, this query would result in a full table scan of the employee table. With the existence of the hash index, the optimizer can pick a less costly approach, namely retrieve the necessary information directly from the index rather than accessing the lengthier (and costlier) base rows.
Consider the explain of this query:
EXPLAIN
SELECT employee_number, department_number FROM emp1;
1) First, we lock a distinct TD000."pseudo table" for read on a RowHash to prevent global deadlock for TD000.hash_1.
2) Next, we lock TD000.hash_1 for read.
3) We do an all-AMPs RETRIEVE step from TD000.hash_1 by way of an all-rows scan with no residual conditions into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with low confidence to be 8 rows. The estimated time for this step is 0.15 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. -> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.15 seconds.

Example 3

The following is an alternate definition of the hash index 'hash_1'.
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY VALUES(employee_number);
Points to consider about this hash index definition:

This definition produces the same hash index, however the index rows are ordered based on employee_number value rather than the hash value.
This might be more useful for certain 'range processing' queries.
This definition would be equally helpful in covering the query indicated previously. The order of index rows would be of no significance.

TERADATA BLOG

Pages

Thursday, March 17, 2011

Hash Index Examples

Hash Index Definition Rules

Covered Query