Indexing is one of the most important features of the Teradata RDBMS. In the Teradata RDBMS, an index is used to define row uniqueness and retrieve data rows, it also can be used to enforce the primary key and unique constraints for a table. The Teradata RDBMS support five types of indexes:
- Unique Primary Index (UPI)
- Unique Secondary Index (USI)
- Non-Unique Primary Index (NUPI)
- Non-Unique Secondary Index (NUPI)
- Join Index
The typical index contains two fields: a value and a pointer to instances of that value in a data table. Because the Teradata RDBMS uses hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash, which is used as the pointer. The row hash is not the value, but a mathematically transformed address. The Teradata RDBMS uses this transformed address as a retrieval index.
The following rules apply to the indexes used in the Teradata Relation database:
- An index is a scheme used to distribute and retrieve rows of a data table. It can be based on the values in one or more columns of the table.
- A table can have a number of indexes, including one primary index, and up to 32 secondary indexes.
- An index for a relational table may be primary or secondary, and may be unique or non-unique. Each kind of index affects system performance, and can be important to data integrity.
- An index is usually defined on a table column whose values are frequently used in specifying WHERE constraints or join conditions.
- An index is used to enforce PRIMARY KEY and UNIQUE constraints.
CREATE TABLE statement allows UNIQUE and PRIMARY Keys as defined constraints on a table, and each index may be given a name, which will allow the Teradata SQL statements refer to it.
Primary Index
Primary index determines the distribution of table rows on the disks controlled by AMPs. In Teradata RDBMS, a primary index is required for row distribution and storage. When a new row is inserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of the primary code (as show in the following figure). Rows having the same primary index value are stored on the same AMP.
Rules for Defining primary indexes
The primary indexes for a table should represent the data values most used by the SQL to access the data for the table. Careful selection of the primary index is one of the most important steps in creating a table. Defining primary indexes should follow the following rules:
- A primary index should be defined to provide a nearly uniform distribution of rows among the AMPs, the more unique the index, the more even the distribution of rows and the better space utilization.
- The index should be defined on as few columns as possible.
- Primary index can be either Unique or non-unique. A unique index must have a unique value in the corresponding fields of every row; a non-unique index permits the insertion of duplicate field values. The unique primary index is more efficient.
- Once created, the primary index cannot be dropped or modified, the index must be changed by recreating the table.
- PRIMARY key
- First UNIQUE constraint
- First column
Creating primary index
Unique primary index for a table is created using the (UNIQUE) PRIMARY INDEX clause of the CREATE TABLE statement. Non-unique primary indexes are creating in the same way, but omit the keyword UNIQUE. If an index is defined on more than one column, all index columns must be specified in the WHERE clause of a request in order for a row or rows to be directly accessed. Once created, the primary index cannot be dropped or modified, the index must be changed by recreating the table.
Examples of Creating Primary Index in Teradata RDBMS
Access data using primary index
The primary index should be based only on an equality search. When a query contains WHERE constraint, which has the unique primary index value(s), the request is processed by hashing the value to locate the AMP where the row is stored, and then retrieve the row that contains a matching value in the hash code portion of its rowID. For example, since employee_number is the unique primary index for the Customer_Service.employee table, assume that an employee_number value is used as an equality constraint in a request as follows:
- SELECT employee_number FROM customer_service.employee
- WHERE employee_number = 1024;
- Locate the AMP where the row is stored.
- Retrieve the row that contains a matching value in the hash code portion of its rowID.
Primary index versus primary key
The column(s) chosen to be the primary index for a table are frequently the same as the primary key during the data modeling process, but there are conceptual differences between these two terms:
| | |
Definition | A relational concept used to determine relationships among entities and to define referential constraints | Used to store rows on disk |
Requirement | Not required, unless referential integrity checks are to be performed | Required |
Defining | Define by CREATE TABLE statement | Defined by CREATE TABLE statement |
Uniqueness | Unique | Unique or non-unique |
Function | Identifies a row uniquely | Distributes rows |
Values can be changed? | No | Yes |
Can be null? | No | Yes |
Related to access path? | No | Yes |
Secondary Index
In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined for a table. Comparing to primary indexes, Secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored in all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following;
- RowIDs of the subtable rows
- Base table index column values
- RowIDs of the base table rows (points)
Defining and creating secondary index
Secondary index are optional. Unlike the primary index, a secondary index can be added or dropped without recreating the table. There can be one or more secondary indexes in the CREATE TABLE statement, or add them to an existing table using the CREATE INDEX statement or ALTER TABLE statement. DROP INDEX can be used to dropping a named or unnamed secondary index. Since secondary indexes require subtables, these subtables require additional disk space and, therefore, may require additional I/Os for INSERTs, DELETEs, and UPDATEs.
Generally, secondary index are defined on column values frequently used in WHERE constraints.
Examples of creating and updating Secondary Index in Teradata RDBMS
Access data using secondary index
If a Teradata SQL request uses secondary index values in a WHERE constraint, the optimizer may use the rowID in a secondary index subtable to access the qualifying rows in the data table. If a secondary index is used only periodically by certain applications and is not routinely used by most applications, disk space can be saved by creating the index when it is needed and dropping it immediately after use.
A unique secondary index is very efficient, it typically allows access of only two AMPs, requires no spool file, and has one row per value, therefore, when a unique secondary index is used to access a row, two AMPs are involved. Unique secondary indexes can thus improve performance by avoiding the overhead of scanning all AMPs. For example, if a unique secondary index is defined on the department_name column of the Customer_service.department table (assuming that no two departments have the same name), then the following query is processed using two AMPs:
- SELECT department_number FROM customer_service.department
- WHERE department_name = 'Education';
- SELECT * FROM customer_service.contact
- WHERE contact_name = 'Mike';
Non-unique secondary indexed accessed is used only for request processing when it is less costly than a complete table search.
Join Index
A join index is an indexing structure containing columns from multiple tables, specifically the resulting columns form one or more tables. Rather than having to join individual tables each time the join operation is needed, the query can be resolved via a join index and, in most cases, dramatically improve performance.
Effects of Join index
Depending on the complexity of the joins, the Join Index helps improve the performance of certain types of work. The following need to be considered when manipulating join indexes:
- Load Utilities The join indexes are not supported by MultiLoad and FastLoad utilities, they must be dropped and recreated after the table has been loaded.
- Archive and Restore Archive and Restore cannot be used on join index itself. During a restore of a base table or database, the join index is marked as invalid. The join index must be dropped and recreated before it can be used again in the execution of queries.
- Fallback Protection Join index subtables cannot be Fallback-protected.
- Permanent Journal Recovery The join index is not automatically rebuilt during the recovery process. Instead, the join index is marked as invalid and the join index must be dropped and recreated before it can be used again in the execution of queries.
- Triggers A join index cannot be defined on a table with triggers.
- Collecting Statistics In general, there is no benefit in collecting statistics on a join index for joining columns specified in the join index definition itself. Statistics related to these columns should be collected on the underlying base table rather than on the join index.
Defining and creating secondary index
Join indexes can be created and dropped by using CREATE JOIN INDEX and DROP JOIN INDEX statements. Join indexes are automatically maintained by the system when updates (UPDATE, DELETE, and INSERT) are performed on the underlying base tables. Additional steps are included in the execution plan to regenerate the affected portion of the stored join result.
Examples of creating and updating Join Index in Teradata RDBMS
Access data using Join index
Join index is useful for queries where the index structure contains all of the columns referenced by one or more joins in a query. Join index was developed so that frequently executed join queries could be processed more efficiently.
like the other indexes, a join index store rowID pointers to the associated base table rows.
Another use of the join index is to define it on a single table. this will improve the performance of single table scans that can be resolved without accessing the base table.
Using Index to Process SQL Statement or Access Data
Each type of index has a specific effect on system performance. Row selection is more efficient using a unique index. When a SELECT statement uses a unique index in a WHERE clause, no spool file needs be created for intermediate storage of the result because only on row is expected. An index that is not unique allows more than one row to have the same index value. Therefore, row selection using a non-unique index may require a spool file to hold intermediate rows for final processing.
The Teradata Relational database systems do not permit explicit use of indexes in SQL queries. When a request is entered, the optimizer examines the following available information about the table to determine whether an index is used during processing:
- Number of rows in the table
- Statistics collected for the table
- Number and types of indexes defined for the table
- UPDATES, DELETES, and PRIMARY KEY and UNIQUE constraints
- How the statement is structured.
- Whether current statistics exist for the table.
- Whether PRIMARY KEY or UNIQUE constraints need to be validated.