Understand Oracle Optimizer Statistics

From Ittichai Chammavanijakul's Wiki
Jump to navigation Jump to search

[References: http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CDgQFjAA&url=http%3A%2F%2Fwww.oracle.com%2Focom%2Fgroups%2Fpublic%2F%40otn%2Fdocuments%2Fwebcontent%2F1354477.pdf&ei=1PDNUOqGJJG6yAHu6oHIBg&usg=AFQjCNGwaegmfmC_ILOTMh2cE6Fbr-c9Lw&sig2=AT11Pb9zmsck-g5B8lTEtw&bvm=bv.1355325884,d.aWc&cad=rja]

Personal note Understand Optimizer

  • The Cost Based Optimizer (CBO) examines all possible plans for a SQL statement and chooses the one with the lowest cost.
  • The cost represents the estimated resource usage for a given execution plan. The resource includes CPU, I/O, etc.
  • The lower the cost is, the more efficient plan execution plan will be - and in theory, the more quickly the SQL query will run.
  • In order for the Optimizer to accurately determine the cost, it must have information (or Optimizer statistics) about all of the objects accessed in the SQL statement as well as about the system on which the statement will run on.
  • Optimizer Statistics are stored in the data dictionary, and can be accessed using the data dictionary view such as USER_TAB_STATISTICS.
  • Performance Statistics (via v$ views) relates to the state of the system and the SQL workload executing on it.
  • Optimizer Statistics
    • Table Statistics
      • Table Statistics, viewed via USER_TAB_STATISTICS, include:
        • Number of rows in the table
        • Number of data blocks used for the table
        • Average row length in the table
      • The optimizer uses this statistic with other information to compute the cost of various operations in the execution plan and to estimate the number of rows the operation will produce. For example, the cost of a table access is calculated using the number of data blocks combined with the value of the parameter DB_FILE_MULTIBLOCK_READ_COUNT.
    • Column Statistics
      • Column Statistics, viewed via USER_TAB_COL_STATISTICS, include:
        • Number of the distinct values in a column (NDV)
        • The minimum and maximum value found in the column
      • The optimizer uses this statistic with other information (e.g. the number of rows in the table statistics) to compute the cost of various operations in the execution plan and to estimate the number of rows the operation will produce. For example, a table has 100 records, and the table access evaluates and equality predicate on a column that has 10 distinct values, then the Optimizer, assuming *uniform data distribution*, estimates the cardinality to be the number of tows in the table divided by the number of the distinct values for the column or 100/10 = 10.
    • Additional Column Statistics
      • Histograms
        • Histograms tell the Optimizer about the distribution of data within a column.
        • By default, without Histograms, the Optimizer assumes a uniform distribution of rows across the distinct value.
        • If the data distribution in that column is not uniform (i.e., a data skew) then the cardinality estimate will be incorrect.
        • Oracle automatically determines the columns that need histograms based on the column usage information (SYS.COL_USAGE$), and the presence of a data skew. For example, Oracle will not automatically create a histogram on a unique column if it is only seen in equality predicates.