Improving DB2 Universal Database Performance
DB2 Universal Database (UDB), Version 8.1, can be used in environments ranging from simple stand-alone systems to complex combinations of servers and clients running on a wide variety of platforms. Regardless of which environment is used, one of the most important aspects from the users’ point of view is how well (or how poorly) the database applications they work with on a day-to-day basis perform. But just what is “performance”? How do you know when it is bad? And if it is bad, what can be done to improve it?
In its simplest terms, performance is the way a computer system behaves while it is executing a given task. Performance is typically measured in terms of system response time, throughput and availability, and each of these metrics can be affected by several factors, including the type of hardware being used, the system and database configuration used, the type of and number of users or applications working with the system or database at a given point in time and the workload performed by each user or application.
If you are a database administrator looking to improve performance, where do you begin? From a database perspective, performance problems can usually be traced back to one or more of the following:
- Poor or incorrect system (environment) configuration.
- Poor instance configuration.
- Poor database configuration.
- Poor database design.
- Poor application design.
Therefore, you should focus your initial tuning efforts on each of these areas, gradually working through each one until the desired performance is obtained. Keep in mind that the greatest performance-tuning gains usually come from your initial efforts. Subsequent tuning usually results in progressively smaller gains and requires progressively greater amounts of effort.
Indexes and Performance
Although system configuration, instance configuration and database configuration play an important part in how well a database system performs, the one factor that can have the greatest impact on performance is the database design itself. And one of the most important aspects of a good database design is the existence of indexes that have been planned around queries that are executed against the database on a regular basis.
As with other relational database systems, the primary purpose of an index in a DB2 UDB database is to help the database engine (known as the Database Manager) quickly locate records stored in a table. If a table is referenced in a query and no corresponding index exists, the entire table must be scanned sequentially to locate the desired data. If the table contains a large amount of data, such a scan can take a significant amount of time. (Although a table scan might be more efficient for complex queries that return most of the rows stored in a table, it can have a significantly negative impact on performance when used to resolve simple queries that return a small number of rows.)
In most cases, if an index is created for frequently used columns in a table, performance often can be greatly improved for data-access and data-update operations. That’s because index files are generally smaller and require less time to read than their corresponding tables, particularly as tables grow in size. Furthermore, the entire index may not need to be scanned. Predicates can be applied to an index to reduce the number of rows that must actually be read. (Each index entry contains a search-key value and a pointer to the row containing that value. If you specify the ALLOW REVERSE SCANS option when an index is created, values can be searched in both ascending and descending order. It is therefore possible to bracket the search, given the right predicate.) An index can also be used to obtain rows in an ordered sequence, eliminating the need for the database manager to sort records after they have been read from a table.
In addition to a search-key value and a row pointer, a DB2 UDB index can contain additional columns (known as “include” columns) that are essentially non-indexed columns in the indexed row. Such columns make it possible for a query to retrieve required information directly from the index, without having to access the table the index is associated with. Therefore, indexes are important because they:
- Provide a fast, efficient method for locating specific rows of data in very large tables.
- Provide a logical ordering of the rows of a table. (Data is stored in a table in no particular order; when indexes are used, the values of one or more columns can be sorted in ascending or descending order. This is very beneficial when processing queries that contain ORDER BY and GROUP BY clauses.)
- Improve overall query performance. (If no index exists on a table, a table scan must be performed for each table referenced in a query. The larger the table, the longer a table scan takes because a table scan requires each table row to be accessed sequentially.)
- Can be used to enforce the uniqueness of records stored in a table.
- Can require a table to use clustering storage, which causes the rows of a table to be physically arranged according to the ordering of their index column values. (Although all indexes provide a logical ordering of data, only a clustering index provides a physical ordering of data.)
- Can provide greater concurrency in multi-user environments. (Because records can be located faster, acquired locks do not have to be held as long.)
However, there is a price to pay for these benefits:
- Each index created requires additional storage or disk space. The exact amount of space needed is dependent upon the size of the associated table, along with the size and number of columns contained in the index.
- Every insert and update operation performed on a table requires additional updating of the indexes associated with that table. This is also true when data is bulk-loaded into a table using DB2’s LOAD utility.
- Each index potentially adds an alternative access path that the DB2 optimizer must consider when generating the optimum access plan to use to resolve a query. This in turn increases compilation time when static queries are embedded in an application program.
Thus, you should choose carefully when creating indexes. Tables that are used for data mining, business intelligence, business warehousing and other applications that execute many (and often complex) queries, while rarely modifying data, are prime targets for one or more indexes. On the other hand, tables that are used in OLTP (On-Line Transactional Processing) environments or other environments where data throughput is high should use indexes sparingly.
Using the Design Advisor
So how do you decide when having an index would be beneficial, and how do you determine what indexes should exist? If you have a lot of experience with database and database application design, these decisions may be easy to make. On the other hand, if you have relatively little experience in this area or if you want to validate the decisions you have already made, you can turn to DB2 UDB’s Design Advisor.
The Design Advisor is a special tool that is designed to capture specific information about typical workloads (queries or sets of SQL operations) performed against your database and recommend changes such as the creation of new indexes or the deletion of unused indexes, based upon the information provided. Additionally, if the Design Advisor recommends one or more indexes that would improve query/workload performance, it can also generate the data definition language (DDL) statements used to create them and store these statements in a file or a user-created table named ADVISE_INDEX.
The Design Advisor can be invoked by executing the db2advis command from the DB2 Command Line Processor (CLP). The basic syntax for this command is: