SarCheck(TM): Automated Analysis of Solaris sar and ps data

(English text version s3.06)


This is an analysis of the data contained in the file /tmp/rpt. The data was collected on 06/03/1999, from 08:00:01 to 17:00:00, from system 'drew'. There were 27 sar data records used to produce this analysis. The operating system is Solaris 2.7. One processor and 64 megabytes of memory are present.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. No significant memory bottleneck was seen. No significant I/O bottleneck was seen. A change has been recommended to at least one tunable parameter. Limits to future growth have been noted in the Capacity Planning section.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.

A CPU upgrade is not recommended because the current CPU had significant unused capacity.

No disk recommendations have been made because no bottleneck was seen.

Change the value of ncsize from 4236 to 6354. This change is recommended because the DNLC hit ratio, as calculated from kernel statistics, is 82.77 percent. Kernel statistics have been used instead of sar -a data because the values seen in the sar data were too low to calculate the DNLC hit ratio accurately. The low values seen in the sar -a data mean that the degree of improvement realized by implementing this recommendation is likely to be small. This parameter can be changed by adding the following line to the /etc/system file: 'set ncsize = 6354'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one. Once this change has been implemented, changes to the maxusers parameter will no longer affect ncsize.
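The recommended value is exactly 1.5 times the current one (4236 x 1.5 = 6354). The following Python sketch reproduces that arithmetic under two assumptions that the report does not state outright: that the new value is the current value scaled by 1.5, and that a hit ratio below a 90 percent threshold triggers the change (the rule of thumb mentioned in the Resource Analysis section). The function name `recommend_ncsize` is hypothetical.

```python
from typing import Optional


def recommend_ncsize(current: int, hit_ratio_pct: float,
                     threshold_pct: float = 90.0) -> Optional[int]:
    """Hypothetical sketch: suggest a larger ncsize when the DNLC hit ratio is low.

    Assumes (not confirmed by SarCheck) a 1.5x scaling factor and a 90% threshold.
    """
    if hit_ratio_pct >= threshold_pct:
        return None  # Hit ratio is acceptable; no change suggested.
    # 1.5x using integer arithmetic, rounding down.
    return current * 3 // 2


new_value = recommend_ncsize(4236, 82.77)
print(new_value)  # 6354
```

The same value is then proposed for ufs_ninode, as the next recommendation explains.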

Change the value of ufs_ninode from 4236 to 6354. This recommendation will increase the value of ufs_ninode to match the value of ncsize. Whenever a recommendation is made to increase ncsize, this program will also recommend increasing ufs_ninode to the same value. For more information, see page 309 of the second edition of Sun Performance and Tuning by Adrian Cockcroft. This parameter can be changed by adding the following line to the /etc/system file: 'set ufs_ninode = 6354'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one. Once this change has been implemented, changes to the maxusers parameter will no longer affect ufs_ninode.
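The NOTE above (check /etc/system for an existing set command and modify it rather than adding a duplicate) can be expressed as a small text transformation. This Python sketch works on the file's contents as a string instead of editing /etc/system directly, so it can be tried on a copy first; the function name `set_tunable` is hypothetical. It skips lines beginning with `*`, which /etc/system treats as comments, and it does not handle module-qualified forms such as `set module:variable = value`.

```python
import re


def set_tunable(system_text: str, name: str, value: int) -> str:
    """Return /etc/system-style text containing 'set <name> = <value>'.

    If an uncommented 'set <name> = ...' line already exists, the first one is
    rewritten in place; otherwise a new line is appended at the end.
    """
    # Match only lines that start (after optional blanks) with 'set <name> ='.
    # Comment lines start with '*', so they never match.
    pattern = re.compile(rf"^([ \t]*)set[ \t]+{re.escape(name)}[ \t]*=.*$",
                         re.MULTILINE)
    new_line = f"set {name} = {value}"
    if pattern.search(system_text):
        # Modify the existing command instead of adding another one.
        return pattern.sub(lambda m: m.group(1) + new_line, system_text, count=1)
    # No existing command: append it, keeping the file newline-terminated.
    if system_text and not system_text.endswith("\n"):
        system_text += "\n"
    return system_text + new_line + "\n"


text = "* local tuning\nset ncsize = 4236\n"
text = set_tunable(text, "ncsize", 6354)
text = set_tunable(text, "ufs_ninode", 6354)
print(text)
```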

More information on how to change tunable parameters is available in the System Administration Guide. We recommend making a copy of /etc/system before making changes, and understanding how to use boot -a in case your changes to /etc/system create an unbootable system.

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 0.5 percent. This indicates that spare capacity exists within the CPU. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power.

The run queue had an average depth of 1.2. This indicates that a performance problem caused by processes waiting for the CPU was unlikely.

The CPU was idle (neither busy nor waiting for I/O) and apparently had nothing to do an average of 98.5 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O, which cannot be directly measured with sar and cannot be considered by SarCheck.

The CPU was waiting for I/O an average of 1.0 percent of the time. This confirms the lack of a regularly occurring I/O bottleneck. The time that the system was waiting for I/O peaked at 24 percent from 10:20:00 to 10:40:00.

The average cache hit ratio of logical reads was 90.0 percent, and the average cache hit ratio of logical writes was 67.5 percent. These statistics, and the lack of any significant memory bottleneck, indicate that there is little to gain by changing the value of bufhwm.

In the event of a system crash, an average of 32 seconds' worth of data will be lost because it will not yet have been written to disk. This is controlled by the autoup and tune_t_fsflushr parameters. This statistic has been calculated using the formula: autoup + (tune_t_fsflushr / 2).
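The formula can be checked with a short Python sketch. The inputs autoup = 30 and tune_t_fsflushr = 5 are the usual Solaris defaults and are an assumption here, since the report does not print them; they give 32.5 seconds, consistent with the 32 seconds reported if the fractional part is dropped. The interpretation in the comment (autoup bounds how long a dirty page can wait, and fsflush wakes every tune_t_fsflushr seconds, adding about half an interval on average) is a simplification, not SarCheck's documentation.

```python
def expected_data_loss_seconds(autoup: int, tune_t_fsflushr: int) -> float:
    """Average seconds of unwritten data lost in a crash.

    Formula from the report: autoup + (tune_t_fsflushr / 2). Interpreted here as
    the autoup age limit plus, on average, half of fsflush's wake-up interval.
    """
    return autoup + tune_t_fsflushr / 2


# Assumed Solaris defaults; check the values actually set on the system.
print(expected_data_loss_seconds(autoup=30, tune_t_fsflushr=5))  # 32.5
```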

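The ratio of exec to fork system calls was 0.94. A ratio close to 1 means that most forks were followed by a single exec, which indicates that PATH variables are efficient; an inefficient PATH typically causes shells to make several failed exec attempts for each command.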

The DNLC hit ratio, as calculated from the sar -a statistics, was 43.09 percent. Because of the low numbers in the iget/s or namei/s fields of the sar -a report, this hit ratio is likely to be inaccurate. The current DNLC hit ratio has therefore been calculated from real-time kernel statistics, and is 82.77 percent. As a rule, if the DNLC hit ratio is much less than 90 percent, an increase in the value of ncsize will be recommended. The disadvantage of using kernel statistics is that they are real-time data, not data collected at the same time as the sar and ps -elf statistics.
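The two hit ratios described above can be sketched in Python. The kernel-based ratio is simply hits divided by total lookups, using the kernel's DNLC hit and miss counters. For the sar -a ratio, the sketch assumes a common approximation in which a name lookup (namei) that does not require an inode fetch (iget) counts as a hit; this is an assumption and may not be SarCheck's exact formula. The counter values in the example are hypothetical.

```python
def dnlc_hit_ratio_kernel(hits: int, misses: int) -> float:
    """Hit ratio (%) from cumulative kernel DNLC hit and miss counters."""
    lookups = hits + misses
    if lookups == 0:
        raise ValueError("no DNLC lookups recorded")
    return 100.0 * hits / lookups


def dnlc_hit_ratio_sar(namei_per_s: float, iget_per_s: float) -> float:
    """Approximate hit ratio (%) from sar -a rates.

    Assumption: each namei that does not lead to an iget is a DNLC hit.
    """
    if namei_per_s <= 0:
        raise ValueError("namei/s must be positive")
    return 100.0 * (namei_per_s - iget_per_s) / namei_per_s


# Hypothetical counter values, chosen only to illustrate the calculation.
print(round(dnlc_hit_ratio_kernel(8277, 1723), 2))  # 82.77
print(dnlc_hit_ratio_sar(2.0, 1.0))                 # 50.0
```

When namei/s and iget/s are small, the rounding in sar's per-second rates can move the sar-based ratio substantially, which is why the kernel counters are preferred in that case.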

The value of maxuprc is 917 and the size of the process table as reported by sar was 922. There is no reason to change the value of maxuprc or max_nprocs based on this data.
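The reported values are consistent with the default formulas Sun documents for deriving these tunables from maxusers, which is also why setting ncsize and ufs_ninode explicitly in /etc/system stops maxusers from affecting them. The Python sketch below assumes those formulas (max_nprocs = 10 + 16 x maxusers; maxuprc = max_nprocs - reserved_procs, with reserved_procs = 5; ncsize = ufs_ninode = 4 x (max_nprocs + maxusers) + 320) and a maxusers value of 57, which the report does not print but which reproduces 922, 917, and 4236 exactly. Verify the formulas against the tunable-parameters documentation for your release.

```python
def derived_defaults(maxusers: int, reserved_procs: int = 5) -> dict[str, int]:
    """Default tunables derived from maxusers (formulas assumed, see lead-in)."""
    max_nprocs = 10 + 16 * maxusers              # process table size
    maxuprc = max_nprocs - reserved_procs        # per-user process limit
    ncsize = 4 * (max_nprocs + maxusers) + 320   # DNLC size
    ufs_ninode = ncsize                          # UFS inode cache defaults to ncsize
    return {"max_nprocs": max_nprocs, "maxuprc": maxuprc,
            "ncsize": ncsize, "ufs_ninode": ufs_ninode}


print(derived_defaults(57))
# {'max_nprocs': 922, 'maxuprc': 917, 'ncsize': 4236, 'ufs_ninode': 4236}
```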

The number of active inodes exceeded the size of the inode cache. This can be seen by examining the sar -v statistics and noting that both values in the inod-sz column (used/max) are the same. Peak used/max statistics for the inode cache during the monitoring period were 4960/4960. The percentage of igets with page flushes (%ufs_ipf) peaked at 43.26 percent from 11:00:00 to 11:20:00. This non-zero peak indicates that the inode cache should be larger. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of high page flushing activity, an increase in the size of the inode cache may result in a noticeable performance improvement.
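The two signs described above, a used/max inod-sz value whose halves are equal and a non-zero %ufs_ipf peak, can be combined into a simple check. This Python sketch is a simplified heuristic based on the paragraph, not SarCheck's actual logic; the function names are hypothetical.

```python
def parse_used_max(field: str) -> tuple[int, int]:
    """Split a sar -v 'used/max' field such as '4960/4960' into two integers."""
    used, maximum = field.split("/")
    return int(used), int(maximum)


def inode_cache_undersized(inod_sz: str, peak_ufs_ipf_pct: float) -> bool:
    """True if the cache was full or any igets caused page flushes."""
    used, maximum = parse_used_max(inod_sz)
    return used >= maximum or peak_ufs_ipf_pct > 0.0


print(inode_cache_undersized("4960/4960", 43.26))  # True
```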

At least 1.5 percent of the system's memory, or 0.9 megabytes, was always unused during sar sampling. This indicates that, although the system is not short of memory, it does not have an unusually large amount of physical memory sitting unused. Note that this value is not a true high-water mark of memory usage; it reflects only what was happening when sar sampled system activity.
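A minimal Python sketch of this calculation, assuming it is based on the lowest freemem value reported by sar -r (in pages) and an 8 KB page size, which is typical of UltraSPARC systems but should be confirmed with the pagesize command. Because only sampled values are examined, the result is a floor seen at sampling times, not a true high-water mark. The function name and the sample values are hypothetical.

```python
PAGE_SIZE_BYTES = 8192  # Assumption: 8 KB pages; confirm with the pagesize command.


def min_unused_memory(freemem_pages: list[int], phys_mem_mb: float) -> tuple[float, float]:
    """Return (megabytes, percent of physical memory) for the lowest freemem sample."""
    lowest_mb = min(freemem_pages) * PAGE_SIZE_BYTES / (1024 * 1024)
    return lowest_mb, 100.0 * lowest_mb / phys_mem_mb


# Hypothetical sar -r freemem samples (pages) on a 64 MB system.
mb, pct = min_unused_memory([400, 123, 250], phys_mem_mb=64)
print(f"{mb:.2f} MB, {pct:.1f}% of physical memory")
```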

The average page scanning rate was 0.68 per second. Page scanning peaked at 4.00 per second from 10:20:00 to 10:40:00. The page daemon scanning rate does not show evidence of a memory bottleneck. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of high scanning activity, the bottleneck may have been caused by a lack of available memory. The threshold at which page scanning is considered to be a problem has been calculated at 182.0 per second. This calculation is based on the value of autoup and the likely value of handspreadpages, and is optimized for sar sampling intervals of 10 to 60 minutes.

The -dtbl switch has been used to format disk statistics into the following table.

Disk Device Statistics
Results sorted by Average Percent Busy

Disk      Average    Peak       Queue Depth     Average Service
Device    Pct Busy   Pct Busy   When Occupied   Time (ms)
dad0      10.9       25.0       0.1             13.5
nfs1       0.0        0.1       0.0              4.7
sd2        0.0        0.0       0.0              0.0

More information on performance analysis and tuning can be found in the System Administration Guide, Volumes 1 and 2, and in Adrian Cockcroft's Sun Performance and Tuning.

CAPACITY PLANNING SECTION

This section is designed to provide a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. Apply the model to data from the days when the load is heaviest to determine approximately how much spare capacity remains at peak times.
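Under the linear assumption above, the headroom for a resource is how far its peak utilization can grow before reaching a limit. This Python sketch applies that idea to the peak values in this report (CPU peak 5 percent, busiest disk dad0 peak 25.0 percent). The 100 percent limit and the choice to report large headroom as "at least 100 percent" are assumptions chosen to mirror the wording of this section, not SarCheck's documented thresholds; a more conservative planner would use a lower limit.

```python
def linear_headroom_pct(peak_util_pct: float, limit_pct: float = 100.0) -> float:
    """Workload increase (%) before peak utilization reaches limit_pct, assuming linear scaling."""
    if peak_util_pct <= 0:
        return float("inf")  # No measured load: the linear model gives no bound.
    return (limit_pct / peak_util_pct - 1.0) * 100.0


def describe(headroom_pct: float, cap_pct: float = 100.0) -> str:
    """Report large headroom as 'at least cap_pct percent'."""
    if headroom_pct >= cap_pct:
        return f"at least {cap_pct:.0f} percent"
    return f"about {headroom_pct:.0f} percent"


print(describe(linear_headroom_pct(5.0)))   # CPU: at least 100 percent
print(describe(linear_headroom_pct(25.0)))  # dad0: at least 100 percent
```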

Based on the data available in this single sar report, the system should be able to support a moderate increase in workload at peak times, and memory is likely to be the first resource bottleneck. See the following paragraphs for additional information.

The CPU can support an increase in workload of at least 100 percent at peak times. Because some swap space was used and significant page scanning or swapping statistics were not seen, the amount of memory present can probably handle a moderate increase in workload. The busiest disk can support a workload increase of at least 100 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.

The process table, measured by sar -v, can hold at least twice as many entries as were seen.

Please note: In no event can Aurora Software Inc. be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. Evaluation copy for: Your Company. This software expires on 08/31/1999 (mm/dd/yyyy). Code version: 3.06 for Solaris SPARC 64-bit. Serial number: 00028784.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form and follow the instructions.

(c) copyright 1994-1999 by Aurora Software Inc., Plaistow NH 03865, USA, All Rights Reserved. http://www.sarcheck.com/

Statistics for system: drew
Statistics collected on: 06/03/99
Average CPU utilization: 0.5%
Peak CPU utilization: 5%
Average user CPU utilization: 0.3%
Average sys CPU utilization: 0.1%
Average waiting for I/O: 1.0%
Average run queue depth: 1.2
Peak run queue depth: 2.0
Actual DNLC hit percentage: 82.77%
Pct of phys memory unused: 1.5%
Average page scanning rate: 0.7/sec
Peak page scanning rate: 4.0/sec
Page scanning threshold: 182.0/sec
Average cache read hit ratio: 90.0%
Average cache write hit ratio: 67.5%
Disk device w/highest peak: dad0
Avg pct busy for that disk: 10.9%
Peak pct busy for that disk: 25.0%