GENERAL SUGGESTIONS FOR OPTIMIZING
Reduce the number of transformations. There is always overhead involved in moving data between transformations.
Consider allocating more session shared memory for mappings with a large number of transformations; a setting between 12 MB and 40 MB should suffice.
Calculate once, use many times.
Avoid calculating or testing the same value over and over.
Calculate it once in an expression, and set a True/False flag.
Within an expression, use variable ports to calculate a value that can be used multiple times within that transformation.
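As a minimal sketch of this pattern (the port names IN_AMOUNT, IN_DISCOUNT, and the variable port v_IS_VALID_AMT are hypothetical), a variable port can compute a test once per row and several output ports can reuse the result:

```
-- Variable port: evaluated once per row
v_IS_VALID_AMT := IIF(NOT ISNULL(IN_AMOUNT) AND IN_AMOUNT > 0, TRUE, FALSE)

-- Output ports reuse the flag instead of repeating the test
OUT_NET_AMT := IIF(v_IS_VALID_AMT, IN_AMOUNT - IN_DISCOUNT, 0)
OUT_STATUS  := IIF(v_IS_VALID_AMT, 'OK', 'REJECT')
```

The validity test runs once instead of once per output port that needs it.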
Only connect what is used.
Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier.
This also helps maintenance: if a transformation needs to be reconnected, it is easier when only the necessary ports are set as input and output.
In Lookup transformations, delete unused ports. This keeps the transformation cleaner and keeps the generated SQL override as small as possible, which reduces the amount of cache required and thereby improves performance.
Watch the data types.
The engine automatically converts compatible data types whenever connected ports differ, and this conversion can become excessive. Minimize data type changes between transformations by planning the data flow before developing the mapping.
Plan for reusable transformations upfront.
Use variables. Use both mapping variables and variable ports. Variable ports are especially beneficial when they can be used to calculate a complex expression or perform an unconnected lookup call once instead of multiple times.
Use mapplets to encapsulate multiple reusable transformations.
Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.
Only manipulate data that needs to be moved and transformed.
Reduce the number of non-essential records that are passed through the entire mapping.
Use active transformations that reduce the number of records as early in the mapping as possible (e.g., place filters and aggregators as close to the source as possible).
Utilize single-pass reads.
Redesign mappings to utilize one Source Qualifier to populate multiple targets. This way the server reads this source only once. If you have different Source Qualifiers for the same source (e.g., one for delete and one for update/insert), the server reads the source for each Source Qualifier.
Remove or reduce field-level stored procedures.
If you use field-level stored procedures, the PowerCenter server has to make a call to that stored procedure for every row, slowing performance.
When a source or target definition is used in multiple folders, use shortcuts instead of making copies. This ensures uniform metadata across folders. Shortcuts can also be created for reusable transformations, mapplets, mappings, and business components.
Document the mapping and all objects within the mapping.
This is extremely important for metadata research, reporting purposes, and understanding what each mapping delivers.
Each object has a Description box that can be used to enter a description of the object, and each port has a Description box as well.
The use of a document link such as a URL to an HTML document is discouraged because document movement can destroy the validity of the link.
Eliminate Transformation Errors
In large numbers, a transformation error slows the performance of the Informatica Server. With each transformation error, the Informatica Server pauses to determine the cause of the error and to remove the row causing the error from the data flow. Then the Informatica Server typically writes the row into the session log file. Check the session log to see where the transformation errors occur. If the errors center around particular transformations, evaluate those transformation constraints.
If a mapping joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
If multiple sources from the same database are joined in a mapping, it is advisable to perform the join in the Source Qualifier's SQL query rather than in a Joiner transformation, because Joiner transformations add processing overhead due to caching.
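As a sketch (the table and column names are hypothetical), the Source Qualifier's SQL override can push the join down to the database so no Joiner cache is built:

```sql
-- Hypothetical Source Qualifier SQL override:
-- the database performs the join, avoiding a cached Joiner transformation
SELECT c.CUST_ID, c.CUST_NAME, o.ORDER_ID, o.ORDER_AMT
FROM   CUSTOMERS c
JOIN   ORDERS    o ON o.CUST_ID = c.CUST_ID
```

The joined result arrives as a single source pipeline, with the database's optimizer and indexes doing the work.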
Flat files located on the server machine load faster than data read from a database, even a database on the same server machine.
Fixed-width files are faster to load than delimited files because delimited files require extra parsing.
When processing intricate transformations, consider loading the source flat file into a relational database first; this allows the PowerCenter mappings to access the data in an optimized fashion, using filters and custom SQL SELECTs where appropriate.
Numeric operations are faster than string operations.
Optimize char-varchar comparisons (i.e., trim spaces before comparing).
Operators are faster than functions (i.e., || vs. CONCAT).
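For example (port names hypothetical), the same concatenation can be written with the function or the operator; the operator form is faster:

```
-- Slower: nested function calls
CONCAT(CONCAT(FIRST_NAME, ' '), LAST_NAME)

-- Faster: the || operator
FIRST_NAME || ' ' || LAST_NAME
```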
Optimize IIF expressions.
Avoid date comparisons in lookups; convert the dates to strings and compare the strings instead.
To find slow expressions, test expression timing by temporarily replacing the expression with a constant.
If possible, when an expression is not going to be reused, prefer an output-port expression over a variable port.
Remove all DEFAULT value expressions where possible. Having a default value, even the ERROR(xxx) command, slows down the session because it forces an unnecessary evaluation for every data element in the map. A more efficient approach is to use a variable with an IIF (xxxx, DEFAULT VALUE, xxxx) condition within an expression; this is always faster than a default value assigned to an output port.
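As a minimal sketch of that substitution (the port names IN_QTY and v_SAFE_QTY are hypothetical), the fallback is evaluated explicitly in a variable port instead of relying on the output port's DEFAULT value:

```
-- Instead of setting a DEFAULT value (or ERROR(...)) on the output port,
-- evaluate the fallback explicitly in a variable port:
v_SAFE_QTY := IIF(ISNULL(IN_QTY), 0, IN_QTY)
OUT_QTY    := v_SAFE_QTY
```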
Text-testing expressions (e.g., IS_SPACES, IS_NUMBER) also slow mappings because they must parse an entire string. Since a port-to-port direct conversion that does not test for an invalid numeric value will cause a transformation error, try this instead: IIF(<port> * 1 >= -999999999, <port>, NULL). An alpha value in this expression returns NULL to the computation. In this case, the IIF condition is slightly faster than IS_NUMBER because it doesn't parse the entire string.
Use the DECODE function in place of a combination of IIF and ISNULL; this improves performance.
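As an illustration (the port STATUS_CD and the result labels are hypothetical), a nested IIF/ISNULL chain can be rewritten as a single DECODE:

```
-- Nested IIF with ISNULL
IIF(ISNULL(STATUS_CD), 'UNKNOWN',
    IIF(STATUS_CD = 'A', 'Active',
        IIF(STATUS_CD = 'I', 'Inactive', 'Other')))

-- Equivalent DECODE: one call, first matching condition wins
DECODE(TRUE,
       ISNULL(STATUS_CD), 'UNKNOWN',
       STATUS_CD = 'A',   'Active',
       STATUS_CD = 'I',   'Inactive',
       'Other')
```

The DECODE form is also easier to extend when new codes are added.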
Optimize IIF expressions. IIF expressions can return a value as well as perform an action, which allows for more compact expressions.
Factor out common expressions/transformations and perform them before data pipelines split.
Optimize Char-Char and Char-Varchar Comparisons by using the Treat CHAR as CHAR On Read option in the Informatica Server setup so that the Informatica Server does not trim trailing spaces from the end of Char source fields.
In an Expression transformation that uses all three kinds of ports (input, output, and variable), order the ports with all input ports first, then variable ports, and output ports last.
Replace common sub-expressions with local variables. If you use the same sub-expression several times in one transformation, make that sub-expression a local variable. A local variable can be used only within the transformation, but because it is calculated only once, it speeds performance.
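For example (the port names and date format are hypothetical), an expensive date parse can be done once in a variable port and reused:

```
-- Parse the date string once in a variable port
v_TXN_DATE := TO_DATE(IN_TXN_DT_STR, 'YYYYMMDD')

-- Reuse the parsed value instead of calling TO_DATE in every output port
OUT_YEAR   := GET_DATE_PART(v_TXN_DATE, 'YYYY')
OUT_MONTH  := GET_DATE_PART(v_TXN_DATE, 'MM')
OUT_IS_OLD := IIF(v_TXN_DATE < TO_DATE('20000101', 'YYYYMMDD'), 1, 0)
```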
When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This typically improves performance by 10 to 20 percent.
Caching lookups. When you enable caching, the Informatica Server caches the lookup table and queries the lookup cache during the session. When this option is not enabled, the Informatica Server queries the lookup table on a row-by-row basis. You can increase performance using a shared or persistent cache:
The rule of thumb is not to cache any table over 500,000 rows. This is only true if the standard row byte count is 1,024 or less. If the row byte count is more than 1,024, then the 500k rows will have to be adjusted down as the number of bytes increase (i.e., a 2,048 byte row can drop the cache row count to between 250K and 300K, so the lookup table should not be cached in this case). This is just a general rule though. Try running the session with a large lookup cached and not cached. Caching is often still faster on very large lookup tables.
Cache lookup tables only if the number of lookup calls is more than 10 to 20 percent of the lookup table rows. With a smaller number of lookup calls, do not cache when the lookup table is large. For small lookup tables (i.e., fewer than 5,000 rows), cache when there are more than 5 to 10 lookup calls.
If caching lookups and performance is poor, consider replacing with an unconnected, uncached lookup.
For very large lookup tables, use dynamic caching along with a persistent cache. Cache the entire table to a persistent file on the first run and enable the update-else-insert option on the dynamic cache; the engine will then never have to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.
Reduce the number of cached rows using a SQL override with a restriction. Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement. This allows you to reduce the number of rows included in the cache.
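As a sketch of such an override (the table, columns, and ACTIVE_FLAG filter are hypothetical), the WHERE clause restricts which rows are cached:

```sql
-- Hypothetical Lookup SQL Override: cache only current, active rows
SELECT PROD_ID, PROD_NAME, PRICE
FROM   PRODUCT_DIM
WHERE  ACTIVE_FLAG = 'Y'
```

Only rows matching the restriction are loaded into the lookup cache, cutting both cache build time and memory use.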