SarCheck(TM): Automated Analysis of Solaris sar and ps data

(English text version s5.00.01)


This is an analysis of the data contained in the file /tmp/rpt. The data was collected on 05/03/2002, from 10:20:01 to 16:40:00, from system 'drew'. There were 19 sar data records used to produce this analysis. The operating system is Solaris 2.7. One processor is present. 64 megabytes of memory are present.

Data collected by the ps -elf command on 05/03/2002 from 10:40:00 to 16:40:00, and stored in the file /opt/sarcheck/ps/20020503, will also be analyzed.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. No significant memory bottleneck was seen. No significant I/O bottleneck was seen. A change has been recommended to at least one tunable parameter. Limits to future growth have been noted in the Capacity Planning section.

At least one suspiciously large process has been detected. See the Resource Analysis Section for details.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days, implement only regularly occurring recommendations, and implement them one at a time.

Change the value of maxpgio from 60 to 65536. The reason for this significant change can be found in the Resource Analysis Section. This parameter can be changed by adding the following line to the /etc/system file: 'set maxpgio = 65536'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one.

Change the value of priority paging from 0 to 1. This parameter can be changed by adding the following line to the /etc/system file: 'set priority_paging = 1'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one.

Change the value of ncsize from 4236 to 5295. This change is recommended because the DNLC hit ratio, as calculated from kernel statistics, is 88.03 percent. Kernel statistics have been used instead of sar -a data because the values seen in the sar data were too low to calculate the DNLC hit ratio accurately. The low values seen in the sar -a data mean that the degree of improvement realized by implementing this recommendation is likely to be small. This parameter can be changed by adding the following line to the /etc/system file: 'set ncsize = 5295'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one. Once this change has been implemented, changes to the maxusers parameter will no longer affect ncsize.
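For reference, the recommended value of 5295 works out to a 25 percent increase over the current setting of 4236 (4236 x 1.25 = 5295).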

Change the value of ufs_ninode from 4236 to 5295. This recommendation will increase the value of ufs_ninode to match the value of ncsize. Whenever a recommendation is made to increase ncsize, this program will also recommend increasing ufs_ninode to the same value. For more information, see page 309 of the second edition of Sun Performance and Tuning by Adrian Cockcroft. This parameter can be changed by adding the following line to the /etc/system file: 'set ufs_ninode = 5295'. NOTE: Don't forget to check /etc/system first to see if there's already a set command modifying this tunable parameter. If there is, modify that command instead of adding another one. Once this change has been implemented, changes to the maxusers parameter will no longer affect ufs_ninode.

We do not recommend implementing all of these changes at once. More information on how to change tunable parameters is available in the System Administration Guide. We recommend making a copy of /etc/system before making changes, and understanding how to use boot -a in case your changes to /etc/system create an unbootable system.
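As an illustration of the procedure described above, the following sketch shows one safe way to apply a single change (the backup filename is arbitrary, only the maxpgio line is shown, and the new value takes effect at the next reboot):

    # Keep a copy of the current /etc/system before editing it
    cp /etc/system /etc/system.before-tuning

    # Check whether a 'set' command for the parameter already exists
    grep maxpgio /etc/system

    # If nothing was found, append the recommended line; otherwise edit
    # the existing line instead of adding a second one
    echo 'set maxpgio = 65536' >> /etc/system

    # If the system will not boot after the change, use 'boot -a' at the
    # OpenBoot prompt and supply /etc/system.before-tuning (or /dev/null)
    # when asked for the name of the system file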

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 1.0 percent. This indicates that spare capacity exists within the CPU. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 9 percent from 10:20:01 to 10:40:00. A CPU upgrade is not recommended because the current CPU had significant unused capacity.

The CPU was waiting for I/O an average of 0.1 percent of the time. This confirms the lack of a regularly occurring I/O bottleneck.

Graph of CPU utilization

The CPU was idle (neither busy nor waiting for I/O) and apparently had nothing to do an average of 98.9 percent of the time. If overall performance was good, this means that on average, the CPU was lightly loaded. If performance was generally unacceptable, the bottleneck may have been caused by remote file I/O which cannot be directly measured with sar and cannot be considered by SarCheck.

The run queue had an average depth of 1.4. This indicates that there was not likely to be a performance problem caused by processes waiting for the CPU. Average run queue depth (when occupied) peaked at 3.0 from 14:40:00 to 15:00:01. During that interval, the queue was occupied 0 percent of the time (sar reports run queue occupancy as an integer percentage, so very low occupancy is shown as 0).

The peak run queue occupancy seen was 1 percent from 10:20:01 to 10:40:00. The following graph shows both the run queue length and occupancy. The occupancy is shown as %runocc/10, so a run queue occupied 100 percent of the time would be shown as a vertical line reaching a height of 10.0.
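For example, a run queue that was occupied 45 percent of the time would appear at a height of 4.5 on this scale.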

Graph of run queue length

The average cache hit ratio of logical reads was 99.2 percent, and the average cache hit ratio of logical writes was 85.6 percent. Despite the room for improvement seen in the hit ratios, no recommendation has been made because disk activity was light.

The fsflush daemon used only 0.01 percent of the CPU. This indicates that it is probably not using enough of the CPU to cause a problem.

In the event of a system crash, an average of 32 seconds' worth of data would be lost because it would not yet have been written to disk. This is controlled by the autoup and tune_t_fsflushr parameters. This statistic has been calculated using the formula: autoup + (tune_t_fsflushr / 2).
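For illustration, using the common Solaris defaults of autoup = 30 and tune_t_fsflushr = 5 (the values actually in effect on this system are not shown in the sar data):

    autoup + (tune_t_fsflushr / 2) = 30 + (5 / 2) = 32.5 seconds

which is roughly the 32 seconds reported above.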

The ratio of exec to fork system calls was 0.97. This indicates that PATH variables are efficient; a ratio much greater than three would suggest that commands are being found only after searching through many directories in the PATH.

The DNLC hit ratio, as calculated from the sar -a statistics, was 100.00 percent. Due to the presence of low numbers in the iget/s or namei/s fields of the sar -a report, this hit ratio is likely to be inaccurate. As a result, the current DNLC hit ratio has been calculated using real time kernel statistics, and the current hit ratio is 88.03 percent. As a rule, if the DNLC ratio is much less than 90 percent, an increase in the value of ncsize will be recommended. The disadvantage of using kernel statistics is that they represent real time data, not data collected at the same time as the sar and ps statistics.
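If you want to spot-check the current DNLC hit ratio yourself (this is not necessarily the method SarCheck uses to read the kernel statistics), the cumulative figure can be seen with:

    # Prints total name lookups along with the cache hit percentage
    vmstat -s | grep 'name lookups'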

The value of maxuprc is 917 and the size of the process table as reported by sar was 922. There is no reason to change the value of maxuprc or max_nprocs based on this data.
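These values are consistent with the usual Solaris default formulas, shown here for illustration (they apply only when none of these parameters has been set explicitly in /etc/system):

    max_nprocs = 10 + (16 x maxusers) = 10 + (16 x 57)   = 922
    maxuprc    = max_nprocs - 5       = 922 - 5          = 917
    ncsize     = 4 x (max_nprocs + maxusers) + 320
               = 4 x (922 + 57) + 320                    = 4236

which also matches the current ncsize of 4236 discussed elsewhere in this report and implies a maxusers value of about 57.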

The number of active inodes fit in the inode cache. This indicates that the inode cache was large enough to efficiently meet the needs of the system. Peak used/max statistics for the inode cache during the monitoring period were 2295/4236. The value of %ufs_ipf was always zero, but an increase in the size of the DNLC cache was recommended. Whenever a recommendation is made to increase ncsize, this program will also recommend increasing ufs_ninode to the same value. For more information, see page 309 of the second edition of Sun Performance and Tuning by Adrian Cockcroft.

At least 14.1 percent of the system's memory, or 9.04 megabytes, was always unused during sar sampling. The value of lotsfree was 114 pages, or 0.89 megabytes of memory. This indicates that while the system is not in need of memory, there isn't an unusually large quantity of physical memory that remains unused. Please note that this value is not a true high-water mark of memory usage and only reflects what was happening when sar sampled system activity.
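These figures are consistent with the 8-kilobyte page size reported later in this section:

    lotsfree      : 114 pages x 8 KB/page = 912 KB, or about 0.89 megabytes
    unused memory : 9.04 MB / 64 MB = 0.141, or about 14.1 percent of physical memory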

Graph of megabytes of free memory remaining

The average page scanning rate was 0.27 per second. Page scanning peaked at 3.24 per second from 10:20:01 to 10:40:00. Peak resource utilization statistics can be used to help understand performance problems. If performance was worst during the period of high scanning activity, then a performance bottleneck may be caused by heavy I/O or the lack of memory available.

Graph of page scanning rate

The values of slowscan and fastscan were 100 and 3649. No changes are recommended because no problems were seen.

We have recommended changing priority_paging from 0 to 1. Priority paging prevents random or non-8-kilobyte filesystem I/O from slowing the system: with priority paging enabled, the pager frees application pages only when there is a real memory shortage.

The value of maxpgio, the parameter that limits the number of page-outs per second, is set to 60. Recent testing indicates that the value of maxpgio should be set so high that it is effectively eliminated. The recommendation to increase maxpgio to 65536 will prevent the page scanner from limiting the number of writes per second.

The average system-wide local I/O rate as measured by the r+w/s column in the sar -d data was 0.32 per second. This I/O rate peaked at 3 per second from 10:20:01 to 10:40:00.

Graph of Total Disk I/O rate

The following graph shows the average percent busy and service time for 2 disks, not sorted with the -dbusy or -dserv switches.

Graph of up to 5 disks, not sorted by percent busy or service time

The -dtoo switch has been used to format disk statistics into the following table.

Disk Device Statistics

  Disk Device      Average        Peak           Queue Depth       Average Service
                   Percent Busy   Percent Busy   (when occupied)   Time (ms)
  dad0 (c0t0d0)    0.4            3.0            0.2               14.0
  sd2  (c0t2d0)    0.0            0.0            0.0               0.0

The device dad0 (c0t0d0) was busy an average of 0.4 percent of the time and had an average queue depth of 0.2 (when occupied). The average service time reported for this device and its accompanying disk subsystem was 14.0 milliseconds. This is relatively fast considering that queuing time is included in this statistic. Service time is the delay between the time a request was sent to a device and the time that the device signaled completion of the request.

During multiple time intervals, ps -elf data indicated that 57 processes were present. This was the largest number of processes seen with ps -elf, but it is not likely to be the absolute peak because the operating system does not store the true "high-water mark" for this statistic.

Graph of the number of processes present

The -ptoo switch has been used to format ps data into the following table.

Interesting ps -elf data

  Command                 User   User ID   Percent CPU   Memory Growth   Memory Use
  /usr/openwin/bin/Xsun   drw    246       0.04          11.2 pg/hr      2874 pages
                                                         0.088 mb/hr     22.453 mb

An unusually large process size was seen for /usr/openwin/bin/Xsun, owned by drw, pid 246. The size of this process was 2874 pages, or 22.453 megabytes of memory. On this system the page size is 8 kilobytes.
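The reported figures are consistent with each other: 2874 pages x 8 KB/page = 22,992 KB, or about 22.453 megabytes.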

More information on performance analysis and tuning can be found in The System Administration Guide Volumes 1 & 2, and in Adrian Cockcroft's Sun Performance and Tuning.

CAPACITY PLANNING SECTION

This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.

Based on the data available in this single sar report, the system should be able to support a moderate increase in workload at peak times, and memory is likely to be the first resource bottleneck. See the following paragraphs for additional information.

Graph of remaining room for growth

The CPU can support an increase in workload of at least 100 percent at peak times. Because some swap space was used and significant page scanning or swapping statistics were not seen, the amount of memory present can probably handle a moderate increase in workload. The busiest disk can support a workload increase of at least 100 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
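As a rough linear check, doubling the workload (a 100 percent increase) would be expected to roughly double the peak utilization figures reported above:

    peak CPU utilization:    9% x 2 = 18%   (well below saturation)
    busiest disk, peak busy: 3% x 2 =  6%   (well below saturation)

Both remain far from their limits, which is why CPU and disk headroom are reported as at least 100 percent.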

The process table, measured by sar -v, can hold at least twice as many entries as were seen.

Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. Evaluation copy for: test. This software expires on 06/04/2002 (mm/dd/yyyy). Code version: 5.00.01 for Solaris SPARC 64-bit. Serial number: 20202020.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form and follow the instructions.

**SPECIAL PROMOTION** Order a SarCheck license by 06/04/2002 and we'll send you one of our famous propeller hats! See the order form (analyze -o) for details.

(c) copyright 1994-2002 by Aptitune Corporation, Plaistow NH 03865, USA, All Rights Reserved. http://www.sarcheck.com/

Statistics for system: drew

  Statistic                             Value            Start of peak   End of peak   Date of peak
                                                         interval        interval      interval
  Statistics collected on:              05/03/2002
  Average CPU utilization:              1.0%
  Peak CPU utilization:                 9%               10:20:01        10:40:00      05/03/2002
  Average user CPU utilization:         0.8%
  Average sys CPU utilization:          0.2%
  Average waiting for I/O:              0.1%
  Peak waiting for I/O:                 1.0%             Multiple peaks  Multiple peaks
  Average run queue depth:              1.4
  Peak run queue depth:                 3.0              14:40:00        15:00:01      05/03/2002
  Current DNLC hit ratio:               88.03%
  Pct of phys memory unused:            14.1%
  Average page scanning rate:           0.3/sec
  Peak page scanning rate:              3.2/sec          10:20:01        10:40:00      05/03/2002
  Page scanning threshold:              81.0/sec
  Average cache read hit ratio:         99.2%
  Average cache write hit ratio:        85.6%
  Average systemwide I/O rate:          0.32/sec
  Peak systemwide I/O rate:             3.00/sec         10:20:01        10:40:00      05/03/2002
  Disk device w/highest peak:           dad0 (c0t0d0)
  Avg pct busy for that disk:           0.4%
  Peak pct busy for that disk:          3.0%             10:20:01        10:40:00      05/03/2002
  Max number of processes seen by ps:   57               Multiple peaks  Multiple peaks
  Approx CPU capacity remaining:        100%+
  Approx I/O bandwidth remaining:       100%+
  Remaining process tbl capacity:       100%+
  Can memory support add'l load:        Moderate