
Teradata - How do you do performance tuning of a query? Please provide a practical example.

asked by mar on September 20, 2014 06:37 AM

How do you do performance tuning of a query? Please provide a practical example.


1 Answer

answered by vishnoiprem
Please see the performance improvement steps below.

1. Create Table:

When you are designing or creating a table, check the steps below (a combined DDL sketch follows the list).

    Always create MULTISET tables. If you have a requirement to create a SET table, make sure the table has a unique primary index (UPI) or unique secondary index (USI).
    If there is no UPI or USI on a SET table, every new row is compared byte-by-byte against all existing rows with the same row hash value to make sure a duplicate is not inserted into the SET table.
    Choose the primary index that has even distribution and high join access.
    If a column has unique values and is not the primary index, create a unique secondary index on it. Note, however, that FastLoad and MultiLoad will not work on a table that has a unique secondary index.
    Do not create any UNICODE columns unless absolutely required. UNICODE requires double the storage space, and stats on such columns often become unusable for Teradata.
    If the table is not updated, you can change its default Free Space Percent parameter to 0 so that each data block holds more rows and more rows can be retrieved in a single I/O.
    Consider a vertical partition if the table has too many columns and some of them are used very rarely. Reducing the row size lets each data block hold more rows, so I/O becomes more efficient.
    If 90% of user queries access only the current year's data, a horizontal partition gives the best performance. You can split the table into two tables, history and current; another approach is partitioning.
    A Teradata partition avoids the overhead of scanning the complete table, which improves performance. Choose a partitioned table when it is accessed by date ranges or character columns, and keep the number of partitions small.
    Compression reduces storage use and the number of I/Os required for an operation. Wherever possible, apply compression.
    If a join column in a table has many null values, replace the nulls with random numbers (using the RANDOM function) multiplied by -1, so those records can easily be distinguished from other records. It is better to use a bounded range like 1 to 10000 so distribution is not impacted if there is an upgrade in the near future. Business users do not want to see negative or unknown members in reports, so replace the negative values with null in the application layer.
    Try to define all join columns as NOT NULL at the table level even if the source provides null values. The ETL logic should populate them with random values when they are null, so there are no data-skew issues or spool-out errors in applications.
    Wherever possible, avoid outer joins. You can avoid them by inserting unknown (dummy) records into the dimension (master) table; the same range values are inserted into the FACT (transaction) table wherever the corresponding column is null.
    Note that there is a difference between CREATE TABLE (copy) and CREATE TABLE (SELECT): CREATE TABLE (copy) retains all definitions and indexes, whereas CREATE TABLE (SELECT) does not.
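Putting several of the points above together, here is a minimal DDL sketch (table and column names are invented for illustration) showing a MULTISET table, multi-value compression, and a monthly range partition:

Create Multiset Table Sales_Txn
    ( Txn_Id    Integer Not Null
    , Store_Id  Integer Not Null
    , Txn_Date  Date Not Null
    , Region    Char(10) Compress ('NORTH', 'SOUTH')   -- compress the most common values
    )
Primary Index (Txn_Id)            -- non-unique: a UPI on a partitioned table must include the partitioning column
Partition By Range_N (Txn_Date Between Date '2014-01-01'
                      And Date '2016-12-31'
                      Each Interval '1' Month);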

2. SQL Tuning:

1. Use the Top N option and explicit column names to see sample data in a table. If the query is just "SELECT *", the optimizer has to replace * with all the columns from that table.
   Lock Row For Access
   Select Top 5 Empno, Ename From Employee;
2. Use "Create AS Table" if there is a need to create a table from existing  table. It operates on an efficient block-by-block basis that bypasses journaling.
You can create only structure or structure & data or structure,data and stats.
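For illustration, assuming an existing Employee table, the variants look like this (the target table names are made up, and the AND STATISTICS option requires a release that supports it):

Create Table Emp_Struct As Employee With No Data;               -- structure only, keeps indexes
Create Table Emp_Data   As Employee With Data;                  -- structure and data
Create Table Emp_Stats  As Employee With Data And Statistics;   -- structure, data and stats
Create Table Emp_Sel    As (Select Empno, Ename From Employee) With Data;  -- (select) form: indexes not retained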


3. Use INSERT/SELECT if there is a need to copy data from one table to another empty table. INSERT/SELECT operates on an efficient block-by-block basis that bypasses journaling, and it handles conversions as well.

INSERT INTO Summary_Table
SELECT store, region, SUM(sales), COUNT(sale_item)
FROM Region_1 GROUP BY 1,2;

4. Use the Nowait option if you don't want your request to wait in the lock queue.
Locking Emp For Read Nowait
Select * From Emp;

5. Nested views can add substantial time to parsing. Try to reduce the nesting of views.

6. If the values are nearly unique, DISTINCT may outperform GROUP BY; when there are many duplicate values, GROUP BY performs better than DISTINCT.
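For example, both of the following return the same result set (using the Emp table from elsewhere in this answer). DISTINCT redistributes all rows before removing duplicates, while GROUP BY aggregates locally on each AMP first, which is why GROUP BY wins when there are many duplicates:

Select Distinct Deptno From Emp;
Select Deptno From Emp Group By 1;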

7. Make sure join columns always have the same data type. Otherwise one of the tables' rows must undergo translation, causing a full table scan even if there are stats on the join column.
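For example, if Emp.Deptno were INTEGER but Dept.Deptno were VARCHAR (a hypothetical mismatch), Dept's values would be translated at join time. An explicit cast at least makes the conversion visible, but the real fix is to declare both columns with the same type:

Select A.Empno, B.Dname
From Emp A Inner Join Dept B
On A.Deptno = Cast(B.Deptno As Integer);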

8. If you are deleting a lot of rows from a table, consider using MultiLoad instead of BTEQ / SQL Assistant for the deletes: MultiLoad completely avoids use of the transient journal (TJ) and is restartable. Another approach is to INSERT/SELECT the rows to keep into an empty table, drop the original table, and rename the new table to the original name, as sketched below.
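A sketch of the second approach. The staging table name and the Del_Flag column are invented for illustration, and Emp_New is assumed to have been created empty with the same structure (e.g. Create Table Emp_New As Emp With No Data;):

Insert Into Emp_New Select * From Emp Where Del_Flag <> 'Y';   -- keep only the wanted rows
Drop Table Emp;
Rename Table Emp_New To Emp;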

9. Check the data distribution of the primary index.
Select HashAmp (HashBucket (HashRow (Last_Name, First_Name)))
     , Count (*)
From Emp
Group By 1
Order By 2 Desc;

10. If possible, avoid using any functions on join columns or WHERE columns. For example, if SUBSTR, COALESCE, etc. are applied to a WHERE column, the optimizer will not use stats even if stats exist on that column, as shown below.
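For example, assuming stats on Txn_Date (from the DDL sketch in section 1), the first form hides the column behind a function while the second lets the optimizer use stats and partition elimination:

Select * From Sales_Txn
Where Extract(Year From Txn_Date) = 2014;        -- function on the column: stats are not used

Select * From Sales_Txn
Where Txn_Date Between Date '2014-01-01'
                   And Date '2014-12-31';        -- equivalent rewrite: stats (and partitions) can be used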

11. In SQL with an outer join, make sure the inner table's filter condition is in the ON clause, so no extra data is loaded into the spool file from the inner table for further processing. The filter condition for the outer table should be in the WHERE clause.
Select A.Empno
     , A.Ename
     , B.Dname
From Emp A Left Outer Join Dept B
On (A.Deptno = B.Deptno
    And B.Deptno = 10);
Here the Emp table is the outer table and the Dept table is the inner table.

12. You can rewrite correlated or MINUS queries as left outer joins. A correlated subquery has to be executed for every row returned by the outer query, so correlated queries are expensive for the optimizer.
Example:  Get all departments that have no employees.
Select A.Deptno
     , A.Dname
     , A.Location
From Dept A
Where Not Exists (Select '1'
                  From Emp B
                  Where B.DeptNo = A.Deptno);
The above query can be rewritten as below, which gives better performance:
Select A.Deptno
     , A.Dname
     , A.Location
From Dept A Left Outer Join Emp B
On (A.DeptNo = B.Deptno)
Where B.Deptno Is Null;
13. Avoid large lists of values in IN/NOT IN clauses. Store them in a temporary table and use that table in the query.
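A sketch using a volatile table (names are illustrative):

Create Volatile Table Id_List
    ( Empno Integer Not Null )
Primary Index (Empno)
On Commit Preserve Rows;

Insert Into Id_List Values (101);
Insert Into Id_List Values (102);

Select A.*
From Emp A Inner Join Id_List B
On A.Empno = B.Empno;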
14. If intermediate tables are used for processing, make sure they have the same PI as the source and target tables. Collect stats on the intermediate table's NUPI after loading.
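For example, for a hypothetical intermediate table Tmp_Emp whose NUPI is Deptno:

Collect Statistics On Tmp_Emp Column (Deptno);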
15. If LIKE is used in a WHERE clause, try to use one or more leading characters in the pattern if at all possible; otherwise the optimizer will do a full table scan even if there is an index on the column.
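For example, assuming an index or stats on Ename:

Select * From Emp Where Ename Like 'Smi%';   -- leading characters: index/stats usable
Select * From Emp Where Ename Like '%son';   -- leading wildcard: full table scan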
16. Add IS NOT NULL conditions on join columns that are declared nullable in the table definition, so skew on the null value can be avoided in joins.
17. Always check the plan, and run the command below to see whether the optimizer recommends any stats for better execution. Collect stats only if the optimizer recommends them with high confidence. Avoid stats on low-cardinality columns and multi-column stats (more than three columns).
Diagnostic HelpStats On For Session;
18. Create a join index when the table is consistently joined to other tables on columns other than the primary index and the retrieval benefit outweighs the setup and maintenance cost.
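A minimal sketch using the Emp/Dept join from earlier (the join index name is made up):

Create Join Index Emp_Dept_JI As
Select A.Empno, A.Ename, B.Deptno, B.Dname
From Emp A Inner Join Dept B
On A.Deptno = B.Deptno
Primary Index (Deptno);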