
DataStage - Importance of Surrogate Key in Data warehousing?

asked Rams August 31, 2014 11:46 AM  

Importance of Surrogate Key in Data warehousing?


1 Answer

answered by vishnoiprem
In a data warehouse, each dimension table should have a primary key that uniquely identifies each dimension record. Natural keys (business keys) can serve as the primary key of a dimension table, but this is not recommended for the following reasons.

1. Business keys are generally intelligent alphanumeric keys such as ABC123, which consume a lot of index space when used as a primary key. This makes the index larger and index traversal slower.
2. Business keys are often re-used over time; for example, after 5 years the product key prd123 might be assigned to a different product. Because a data warehouse keeps historical as well as current data, a re-used business key used as the primary key can no longer distinguish the old record from the new one.
To solve these issues, surrogate keys are used as the primary key in dimension tables.

Surrogate key

Surrogate keys are simply integers that carry no business meaning and are used as the primary key of a dimension table. For this reason, surrogate keys are often called meaningless keys.
Data warehouse best practice is to join dimension tables to fact tables using only the surrogate key, not the business key.
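
As a rough illustration of the join described above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_product, fact_sales, product_sk, product_code) and the sample rows are made up for this example; a real warehouse would use its own schema and key generator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_sk   INTEGER PRIMARY KEY,  -- surrogate key, no business meaning
    product_code TEXT NOT NULL,        -- business key from the source system
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_sk INTEGER NOT NULL REFERENCES dim_product(product_sk),
    quantity   INTEGER NOT NULL
);
""")

insert_dim = "INSERT INTO dim_product (product_code, product_name) VALUES (?, ?)"

# The business key prd123 is re-used for a different product (hypothetical data).
# SQLite assigns the INTEGER PRIMARY KEY automatically; lastrowid returns it.
old_sk = conn.execute(insert_dim, ("prd123", "Old widget")).lastrowid
new_sk = conn.execute(insert_dim, ("prd123", "New gadget")).lastrowid

# Fact rows reference the dimension only through the surrogate key.
conn.executemany(
    "INSERT INTO fact_sales (product_sk, quantity) VALUES (?, ?)",
    [(old_sk, 10), (new_sk, 4), (new_sk, 6)],
)

# Join on the surrogate key: the two products stay separate
# even though they share the same business key.
rows = conn.execute("""
    SELECT d.product_name, SUM(f.quantity)
    FROM fact_sales f
    JOIN dim_product d ON d.product_sk = f.product_sk
    GROUP BY d.product_sk, d.product_name
    ORDER BY d.product_sk
""").fetchall()
print(rows)
```

Had the join used product_code instead, the sales of both products would have been merged under prd123.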

Benefits of using surrogate keys

1. As discussed earlier, business keys can be re-used over time. Surrogate keys let the dimension table handle this easily: because they are meaningless numbers, a new surrogate key still identifies the dimension record uniquely even when the business key is re-used.
2. Surrogate keys are generally small integers, which keeps the index small when they are used as the index column. A smaller index gives better performance.
3. Surrogate keys can be used when the source data does not have consistent keys to uniquely identify a record. This often comes in handy when new data sources are introduced into a data warehouse environment.
4. Surrogate keys are needed to implement slowly changing dimensions, for example Type 2, where every version of a dimension record is stored as a separate row and needs its own key.
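
To make the last point more concrete, below is a small sketch of a Type 2 slowly changing dimension in plain Python. Everything in it (CustomerRow, apply_change, the valid_from/valid_to columns, the sample customer) is an assumption chosen for illustration, not something taken from DataStage or any particular tool.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count
from typing import Optional


@dataclass
class CustomerRow:
    customer_sk: int                 # surrogate key, one per row version
    customer_id: str                 # business key, repeated across versions
    city: str                        # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


_next_sk = count(1)  # simple in-memory surrogate key generator
dim_customer = []    # stands in for the dimension table


def apply_change(customer_id, city, as_of):
    """Insert a new row version when the tracked attribute changes."""
    current = next(
        (r for r in dim_customer
         if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.city == city:
            return current        # nothing changed, keep the current row
        current.valid_to = as_of  # close the old version
    row = CustomerRow(next(_next_sk), customer_id, city, as_of)
    dim_customer.append(row)
    return row


first = apply_change("C001", "Pune", date(2012, 1, 1))
second = apply_change("C001", "Mumbai", date(2014, 8, 31))
for r in dim_customer:
    print(r)
```

Both rows carry the same business key C001, so only the surrogate key can serve as the primary key here, and fact rows loaded before the change keep pointing at the old version.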