Creating data distributions
You can generate statistics for a table and build data distributions for each table that your query accesses.

Updating statistics for columns with user-defined data types
Programmers can write functions that gather statistics for columns with user-defined data types. You can store the data distributions for user-defined data types in an sbspace.

Gathering statistics through sampling can increase the speed of the UPDATE STATISTICS operation.
Display data distributions
You can use the dbschema utility to display data distributions.

If you do not specify any clause that begins with the FOR keyword, statistics are updated for every table and SPL routine in the current database, including the system catalog tables.
Similarly, if you use a clause that begins with the FOR keyword but do not specify a table or SPL routine name, the database server updates the statistics for all tables, including temporary tables, or for all SPL routines in the current database.
If you use the FOR TABLE clause without a specific table name to build distributions on all of the tables in the database, distributions are also built on all of the temporary tables in your session.

Although a change to the database might make the corresponding statistics in the systables, syscolumns, sysindexes, and sysdistrib system catalog tables obsolete, the database server does not update them automatically.
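The FOR-clause scope rules can be summarized as a small lookup. This is a minimal Python sketch, not an Informix API: the function name `describe_update_statistics`, the `target` encoding, the table name `customer`, and the scope labels are all hypothetical. Only the statement text it produces (`UPDATE STATISTICS`, `FOR TABLE`, `FOR PROCEDURE`) follows the SQL forms discussed here.

```python
from typing import Optional, Tuple


def describe_update_statistics(
    target: Optional[str] = None, name: Optional[str] = None
) -> Tuple[str, str]:
    """Return (statement text, scope) for the FOR-clause forms described above.

    target: None for no FOR clause, else "TABLE" or "PROCEDURE" (hypothetical encoding).
    name:   a specific table or SPL routine name, or None for "all of them".
    """
    if target is None:
        if name is not None:
            raise ValueError("a table or routine name requires a FOR clause")
        # No FOR clause: every table and SPL routine, system catalog tables included.
        return ("UPDATE STATISTICS;",
                "all tables, including system catalog tables, and all SPL routines")
    if target not in ("TABLE", "PROCEDURE"):
        raise ValueError(f"unsupported FOR clause: {target!r}")
    stmt = f"UPDATE STATISTICS FOR {target}" + (f" {name}" if name else "") + ";"
    kind = "table" if target == "TABLE" else "SPL routine"
    if name:
        # A named object limits the update to that one table or routine.
        return (stmt, f"the {kind} {name}")
    if target == "TABLE":
        # FOR TABLE without a name also covers the temporary tables in this session.
        return (stmt, "all tables, including temporary tables in this session")
    return (stmt, "all SPL routines")


if __name__ == "__main__":
    for args in [(), ("TABLE",), ("TABLE", "customer"), ("PROCEDURE",)]:
        statement, scope = describe_update_statistics(*args)
        print(f"{statement:40} -> {scope}")
```

The point of the sketch is the asymmetry: omitting the FOR clause reaches SPL routines as well as tables, while a bare FOR TABLE reaches only tables but pulls in the session's temporary tables.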
One situation that calls for UPDATE STATISTICS is upgrading a database for use with a newer database server; the statement can then convert the indexes to the form that the newer server uses. You can choose to convert the indexes table by table or for the entire database at one time. Follow the conversion guidelines in the Informix Migration Guide.

You should also update statistics after many modifications to a table. The term "many modifications" is relative to the resolution of the distributions.

In Enterprise Decision Server, the statement does not update the syscolumns and sysindexes tables, so any information in the following pages about indexes and these two tables does not apply to Enterprise Decision Server.
Use the ONLY keyword to collect data for one table in a hierarchy of typed tables. If you do not specify the ONLY keyword and the table that you specify has subtables, the database server creates distributions for that table and every table under it in the hierarchy.
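The effect of ONLY on a typed-table hierarchy amounts to "this node" versus "this node and its whole subtree". The Python sketch below models that rule with a plain dictionary; the hierarchy layout, the function name `tables_to_distribute`, and the table names are hypothetical, and the traversal order is just a preorder walk chosen for readability, not a claim about the order in which the server processes subtables.

```python
from typing import Dict, List


def tables_to_distribute(hierarchy: Dict[str, List[str]], table: str, only: bool) -> List[str]:
    """Tables that receive distributions when UPDATE STATISTICS targets `table`.

    hierarchy maps each typed table to its direct subtables (hypothetical layout).
    """
    if only:
        # ONLY: just the named table, even if it has subtables.
        return [table]
    result: List[str] = []
    stack = [table]
    while stack:
        current = stack.pop()
        result.append(current)
        # Push subtables in reverse so they are visited in their listed order.
        stack.extend(reversed(hierarchy.get(current, [])))
    return result


if __name__ == "__main__":
    h = {"person": ["employee", "customer"], "employee": ["sales_rep"]}
    print(tables_to_distribute(h, "person", only=True))
    print(tables_to_distribute(h, "person", only=False))
```

Without ONLY, a statement on the root of the hierarchy therefore touches every table beneath it, which can be far more work than intended on a deep hierarchy.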
If a statement does not specify the level at which to update the statistical data, the database server uses LOW mode by default.

If pages are found with the delete flag set to 1, the corresponding keys are removed from the B-tree cleaner list.
This operation is particularly useful if a system crash causes the B-tree cleaner list, which exists in shared memory, to be lost. For information on the B-tree cleaner list, see your Administrator's Guide.
Use the LOW mode option to generate and update the table, row, and page count statistics in the systables system catalog table. In Dynamic Server, the LOW mode option also updates index and column statistics for specified columns.
The database server generates and updates this statistical data in the syscolumns and sysindexes tables. LOW mode generates the least amount of information about a column.

Use the DROP DISTRIBUTIONS option with LOW mode to remove the existing distribution data for the columns that you specify from the sysdistrib system catalog table. If you do not specify any columns, the database server removes all the distribution data for that table. You must have the DBA privilege or be the owner of the table to use this option.
With the DROP DISTRIBUTIONS option, you drop the distribution data at the same time that you update the statistical data that the LOW mode option generates.

Use the MEDIUM mode option to update the same statistics that LOW mode updates and also to generate statistics about the distribution of data values for each specified column. The database server places distribution information in the sysdistrib system catalog table. The constructed distribution is statistically significant.
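To make the LOW and MEDIUM forms concrete, the following Python sketch assembles the statement text for a single table. It is illustrative only: `update_statistics_sql` and the table and column names are hypothetical, it covers only the options mentioned here (mode, column list, DROP DISTRIBUTIONS), and it assumes the clause order shown (table and column scope first, then DROP DISTRIBUTIONS). Check the exact syntax against the Informix SQL syntax reference before relying on it.

```python
from typing import Optional, Sequence


def update_statistics_sql(
    table: str,
    mode: Optional[str] = None,
    columns: Sequence[str] = (),
    drop_distributions: bool = False,
) -> str:
    """Build an UPDATE STATISTICS statement for one table (illustrative only).

    mode=None leaves the level out, in which case the server defaults to LOW.
    """
    if mode is not None and mode not in ("LOW", "MEDIUM", "HIGH"):
        raise ValueError(f"unknown mode: {mode!r}")
    if drop_distributions and mode != "LOW":
        # This sketch only emits DROP DISTRIBUTIONS together with an explicit LOW.
        raise ValueError("DROP DISTRIBUTIONS is used with LOW mode")
    parts = ["UPDATE STATISTICS"]
    if mode:
        parts.append(mode)
    parts.append(f"FOR TABLE {table}")
    if columns:
        # Column list: limits column statistics/distributions to these columns.
        parts.append("(" + ", ".join(columns) + ")")
    if drop_distributions:
        parts.append("DROP DISTRIBUTIONS")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    print(update_statistics_sql("customer", "LOW", drop_distributions=True))
    print(update_statistics_sql("orders", "MEDIUM", ["order_date", "ship_date"]))
```

The first call drops a table's distributions while refreshing its LOW-mode statistics in one statement; the second builds sampled distributions for two columns only.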
When you use the MEDIUM mode option, the data for the distributions is obtained by sampling a percentage of the data rows. Because the sample is usually much smaller than the full set of rows, this mode executes more quickly than HIGH mode.
Because the data is obtained by sampling, the results might vary; that is, different sample rows might produce different distribution results.
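Why sampled distributions vary can be shown with a small simulation. This Python sketch is not the server's algorithm: it assumes simple random sampling without replacement and a simplified equal-height layout in which each bin holds about `resolution_pct` percent of the rows. The function names and the synthetic column of 100,000 integers are hypothetical. Building bins from every row (HIGH-like) always gives the same boundaries, while two 1% samples (MEDIUM-like) give close but different ones.

```python
import random
from typing import List, Sequence


def equal_height_boundaries(values: Sequence[int], resolution_pct: float) -> List[int]:
    """Upper boundary of each bin when every bin holds ~resolution_pct % of the values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to distribute")
    n_bins = int(round(100 / resolution_pct))
    step = len(ordered) / n_bins
    # The boundary of bin k is the last value inside that bin's share of rows.
    return [ordered[min(len(ordered) - 1, max(0, int(step * (k + 1)) - 1))]
            for k in range(n_bins)]


def sampled_boundaries(values: Sequence[int], sample_pct: float,
                       resolution_pct: float, seed: int) -> List[int]:
    """Build the bins from a random sample instead of every row (MEDIUM-like)."""
    rng = random.Random(seed)
    size = max(1, int(len(values) * sample_pct / 100))
    sample = rng.sample(list(values), size)
    return equal_height_boundaries(sample, resolution_pct)


if __name__ == "__main__":
    rows = list(range(100_000))  # hypothetical column values, one per row
    print(equal_height_boundaries(rows, 10))        # every row (HIGH-like)
    print(sampled_boundaries(rows, 1, 10, seed=1))  # one 1% sample
    print(sampled_boundaries(rows, 1, 10, seed=2))  # a different 1% sample
```

The sampled runs read only 1,000 of 100,000 rows, which is where the speed advantage comes from; the price is boundaries that shift from one run to the next.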