SarCheck(TM): Automated Analysis of Linux data

(English text version 5.02.01)


This is an analysis of the data contained in the file test1012. The data was collected on 10/12/2004, from 00:00:00 to 23:50:01, on system 'abcdef'. The operating system version was 2.4.9-e.40summi. 4 processors are present. 8047 megabytes of memory are present.

SUMMARY

When the data was collected, no CPU bottleneck could be detected. A moderate memory bottleneck was seen.

RECOMMENDATIONS SECTION

All recommendations contained in this report are based solely on the conditions which were present when the performance data was collected. It is possible that conditions which were not present at that time may cause some of these recommendations to result in worse performance. To minimize this risk, analyze data from several different days and implement only regularly occurring recommendations.

Add memory. Additional memory may improve performance. If possible, borrow some memory for test purposes, and monitor system performance and resource utilization before and after its installation.

Change the bdflush parameter 'nfract' from 30 to 40. This is the percentage of dirty buffers allowed in the buffer cache before the kernel flushes some of them.

Change the bdflush parameter 'ndirty' from 64 to 128. This is the number of dirty blocks written to disk at one time when the bdflush daemon wakes up.

Change the bdflush parameter 'nrefill' from 64 to 512. This will allow the operating system to obtain more clean buffers when refill_freelist() is called.

Change the bdflush parameter 'nref_dirt' from 256 to 2048. This is recommended in order to keep its value 4 times higher than nrefill.

To change the value of the bdflush parameters immediately as described in the above recommendations, use the following command:

    echo "40 128 512 2048 500 3000 60 0 0" > /proc/sys/vm/bdflush

With some kernels, this will not work because the file /proc/sys/vm/bdflush is read-only and you may not be able to change its permissions. If you are able to make this change and it improves performance, you can make the change permanent by adding the command to the /etc/rc.d/rc.local file.
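
Because the write can fail on kernels where the file is read-only, it may help to attempt the change through a small wrapper that reports whether it took effect. The following is a minimal sketch and not part of SarCheck itself: the function name apply_tunable is hypothetical, and it assumes a POSIX shell run with sufficient privileges.

```shell
# apply_tunable FILE "VALUES"
# Hypothetical helper: shows the current contents of FILE, attempts the
# write, and returns non-zero if the kernel (or filesystem) rejects it.
apply_tunable() {
    file=$1
    values=$2
    echo "before: $(cat "$file" 2>/dev/null)"
    if echo "$values" > "$file"; then
        echo "after:  $(cat "$file")"
    else
        echo "could not write $file; settings unchanged" >&2
        return 1
    fi
}

# Example, left commented out so that nothing is changed by accident:
# apply_tunable /proc/sys/vm/bdflush "40 128 512 2048 500 3000 60 0 0"
```

The same wrapper could be pointed at /proc/sys/vm/freepages with the three freepages values.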

Change the freepages parameter 'freepages.high' from 1914 to 2871. This will increase the threshold used to decide that the system is developing a shortage of free memory pages.

Change the freepages parameter 'freepages.low' from 1276 to 1914. This will increase the threshold used to decide that the system has a serious shortage of free memory pages.

Change the freepages parameter 'freepages.min' from 638 to 957. This will increase the threshold used to decide that the system has a critical shortage of free memory pages.

To change the value of the freepages parameters immediately as described in the above recommendations, use the following command:

    echo "957 1914 2871" > /proc/sys/vm/freepages

With some kernels, this will not work because the file /proc/sys/vm/freepages is read-only and you may not be able to change its permissions. If you are able to make this change and it improves performance, you can make the change permanent by adding the command to the /etc/rc.d/rc.local file.

RESOURCE ANALYSIS SECTION

Average CPU utilization was only 6.2 percent. This indicates that spare capacity exists within the CPU. If any performance problems were seen during the monitoring period, they were not caused by a lack of CPU power. CPU utilization peaked at 26.68 percent from 20:20:00 to 20:30:01. A CPU upgrade is not recommended because the current CPU had significant unused capacity.

Graph of CPU utilization

CPU number 0 was busy for an average of 6.41 percent of the time. During the peak interval from 20:20:00 to 20:30:01, this CPU was 35.83 percent busy. The CPU was busy with user work 2.89 percent of the time and was busy with system work 3.24 percent of the time. The sys/usr ratio on this CPU was 1.12:1. This is below the threshold of 2.80:1.

CPU number 1 was busy for an average of 5.75 percent of the time. During the peak interval from 19:20:00 to 19:30:00, this CPU was 32.81 percent busy. The CPU was busy with user work 2.57 percent of the time and was busy with system work 2.98 percent of the time. The sys/usr ratio on this CPU was 1.16:1.

CPU number 2 was busy for an average of 6.37 percent of the time. During the peak interval from 04:20:00 to 04:30:00, this CPU was 64.80 percent busy. The CPU was busy with user work 2.55 percent of the time and was busy with system work 2.93 percent of the time. The sys/usr ratio on this CPU was 1.15:1.

CPU number 3 was busy for an average of 6.35 percent of the time. During the peak interval from 04:00:00 to 04:10:00, this CPU was 75.52 percent busy. The CPU was busy with user work 2.46 percent of the time and was busy with system work 3.07 percent of the time. The sys/usr ratio on this CPU was 1.25:1.

Individual CPU Statistics
CPU#  Average %Busy  Average %User  Average %Sys  Average %Nice  Peak %Busy
0     6.41           2.89           3.24          0.28           35.83
1     5.75           2.57           2.98          0.20           32.81
2     6.37           2.55           2.93          0.89           64.80
3     6.35           2.46           3.07          0.82           75.52
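
The sys/usr ratios quoted for each CPU can be reproduced by dividing Average %Sys by Average %User from the table above. The following is a minimal sketch, assuming awk is available; the function name sys_usr_ratio is hypothetical, and the default threshold of 2.80 is the SYSUSR value stated in this report.

```shell
# sys_usr_ratio SYS_PCT USR_PCT [THRESHOLD]
# Prints the ratio to two decimals, followed by "above" or "below"
# relative to the threshold (default 2.80, as stated in this report).
sys_usr_ratio() {
    awk -v sys="$1" -v usr="$2" -v limit="${3:-2.80}" 'BEGIN {
        ratio = sys / usr
        printf "%.2f %s\n", ratio, (ratio > limit ? "above" : "below")
    }'
}

sys_usr_ratio 3.24 2.89   # CPU 0
sys_usr_ratio 3.07 2.46   # CPU 3
```

With the figures for CPU 0 and CPU 3 this yields 1.12 and 1.25, matching the ratios reported above.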

The average amount of free memory was 7192.8 pages or 28.1 megabytes. The minimum amount of free memory was 1063 pages or 4.15 megabytes at 09:10:00. A roughly horizontal line was detected in the free memory statistics at 5.0 megabytes. This may indicate the approximate point at which the operating system gets more aggressive about freeing up pages of memory. This value matches the freepages.low parameter, indicating that the system probably cannot reclaim memory until the number of free pages falls below the value of freepages.low. The average swap out rate during those intervals when free memory appeared to be at a memory threshold was 0.61 per second and the average CPU utilization was 9.85 percent.

Graph of megabytes of free memory remaining

The above graph has been zoomed in to show the relationship between the size of the free list and the values of freepages parameters.

The freepages.min value was 638 pages or 2.5 megabytes. The freepages.low value was 1276 pages or 5.0 megabytes. The freepages.high value was 1914 pages or 7.5 megabytes. If the system's free list drops below freepages.high the kernel will start gently swapping. A moderate memory bottleneck was seen. The number of pages of free memory occasionally dipped below the value of freepages.low but was never less than freepages.min. Changes to the freepages parameters have been recommended to help address this.
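
The page counts and megabyte figures in this section are consistent with a 4 KB page size. That page size is an assumption inferred from the numbers (for example, 1063 pages reported as 4.15 megabytes); the report does not state it. The following is a minimal sketch of the conversion, with the hypothetical function name pages_to_mb.

```shell
# pages_to_mb PAGES [PAGE_SIZE_KB]
# Converts a page count to megabytes. The 4 KB default page size is an
# assumption inferred from the report's own figures.
pages_to_mb() {
    awk -v pages="$1" -v page_kb="${2:-4}" 'BEGIN {
        printf "%.2f\n", pages * page_kb / 1024
    }'
}

pages_to_mb 638    # current freepages.min
pages_to_mb 1276   # current freepages.low
pages_to_mb 1914   # current freepages.high
```

To two decimals the current thresholds come out as 2.49, 4.98, and 7.48 megabytes, which the report rounds to 2.5, 5.0, and 7.5.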

The value of nfract, the dirty buffer threshold used to wake up bdflush, was set to 30 percent. The goal of tuning nfract is to keep it low enough that the number of dirty buffers in the cache is not enough to degrade performance, but high enough to allow as many dirty buffers in the cache as possible. In this case a recommendation was made to increase the value to 40 percent. Although Red Hat's documentation indicates that the seventh parameter of bdflush is age_super, its value compared to nfract and to 100 suggests that it may really represent nfract_sync. If this parameter really does represent nfract_sync, set its value about two thirds of the way between nfract and 100.
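
The "two thirds of the way between nfract and 100" guideline can be worked out as nfract + (100 - nfract) * 2 / 3. The following is a minimal sketch of that arithmetic only; the function name nfract_sync_estimate is hypothetical, and rounding to a whole percent is an assumption.

```shell
# nfract_sync_estimate NFRACT
# Prints the value two thirds of the way from NFRACT to 100, rounded to
# a whole percent (the rounding is an assumption, not from the report).
nfract_sync_estimate() {
    awk -v nfract="$1" 'BEGIN {
        printf "%.0f\n", nfract + (100 - nfract) * 2 / 3
    }'
}

nfract_sync_estimate 30   # current nfract
nfract_sync_estimate 40   # recommended nfract
```

For the recommended nfract of 40 this gives 80. The bdflush command in the recommendations section leaves the seventh value at 60, so any change to it should be treated as a separate experiment.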

The value of ndirty was set to allow bdflush to write 64 buffers to the disk at one time. The recommended increase to 128 will attempt to improve performance by allowing more buffers to be written to disk at once. The large amount of memory seen indicates that this is a system which should be able to handle a significant amount of activity at once.

The nrefill parameter was set to 64 buffers. This parameter controls the number of buffers to be added to the free list whenever bdflush calls refill_freelist(). A recommendation to increase this value to 512 will result in fewer calls to refill_freelist(). The nref_dirt parameter was set to allow refill_freelist() to wake up bdflush whenever it found more than 256 dirty buffers. A recommendation to increase this value to 2048 will keep it properly aligned with nrefill.

The value of the page_cluster parameter was 4. This means that 16 pages are read at once. Values of 4 or 5 are better for large systems that perform non-interactive jobs using sequential I/O. There may be an advantage in lowering this value if response times need to be improved during heavy I/O, but I/O-bound jobs may suffer as a result.
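
The page_cluster value is an exponent: the number of pages read at once is 2 raised to that value, which is how a setting of 4 yields 16 pages. A minimal sketch of the relationship, using the hypothetical function name pages_per_read:

```shell
# pages_per_read PAGE_CLUSTER
# Prints 2 raised to PAGE_CLUSTER, the number of pages read at once.
pages_per_read() {
    echo $((1 << $1))
}

pages_per_read 4   # the value seen in this report
pages_per_read 5
```

Raising the value from 4 to 5 would therefore double the read size from 16 to 32 pages.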

Page ins peaked at 31274.06 per second from 22:10:00 to 22:20:00. An unusually high page in rate was detected. This may be normal for your environment, but it is still worth noting. The average page out rate was 2055.132 per second. Page outs peaked at 20826.47 per second from 21:10:01 to 21:20:01. An unusually high page out rate was detected. This may be normal for your environment, but it is still worth noting.

Graph of swap out rate

The average swap in rate was 0.50 per second. Swap ins peaked at 17.08 per second from 06:00:00 to 06:10:00. The average swap out rate was 0.55 per second. Swap outs peaked at 16.83 per second from 21:00:00 to 21:10:01.

The kswapd parameter tries_base was set to 512. This controls the number of pages that kswapd will try to free each time it runs. The kswapd parameter tries_min was set to 32. This controls the number of times that kswapd tries to free a page of memory when it's called. The kswapd parameter swap_cluster was set to 8. This controls the number of pages that kswapd will try to write when it is called.

Graph of swap space used

The amount of swap space in use peaked at 282.25 megabytes at 10:00:00. The average amount of swap space in use was 145.17 megabytes. The size of swap space was 8189.25 megabytes. The peak amount of swap space in use was 3.45 percent of the total. Note that some of the swap space was more desirable as defined by its priority.
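
The 3.45 percent figure follows directly from the peak usage and the total size given above. A minimal sketch of the calculation, assuming awk is available; the function name pct_of is hypothetical.

```shell
# pct_of PART TOTAL
# Prints PART as a percentage of TOTAL, to two decimals.
pct_of() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        printf "%.2f\n", part / total * 100
    }'
}

pct_of 282.25 8189.25   # peak swap in use vs. total swap space
```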

There were 4 swap partitions seen in /proc/swaps. The priority of swap partitions ranged from a high of -1 to a low of -4. This makes sense if some partitions are likely to be faster than others but will not cause the I/O load to be divided evenly among all the swap partitions. The rate of swap operations peaked at 24.84 per second from 21:00:00 to 21:10:01. Some performance degradation is possible if this activity was not balanced throughout the monitoring interval.
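
Whether the swap areas share a priority can be checked from /proc/swaps, where the priority is the last column. The following is a minimal sketch; the function name count_swap_priorities is hypothetical, and it assumes the usual layout of a header line followed by one line per swap area.

```shell
# count_swap_priorities [FILE]
# Counts the distinct values in the last column (Priority) of a file in
# /proc/swaps format, skipping the header line. Defaults to /proc/swaps.
count_swap_priorities() {
    awk 'NR > 1 && NF > 0 { seen[$NF] = 1 }
         END { n = 0; for (p in seen) n++; print n }' "${1:-/proc/swaps}"
}

# Example, left commented out because /proc/swaps may not be present:
# count_swap_priorities
```

A result of 1 means all swap areas share one priority. A larger result, such as the 4 implied by the priorities -1 through -4 seen here, means the areas are used in priority order rather than evenly.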

According to data collected from /proc/partitions, the system-wide disk I/O rate averaged 95.32 per second and peaked at 1336.36 per second from 18:10:01 to 18:20:00 on 10/12/2004. The read rate averaged 63.58 per second and peaked at 1233.26 per second from 18:10:01 to 18:20:00 on 10/12/2004. The write rate averaged 31.74 per second and peaked at 200.07 per second from 21:00:00 to 21:10:01 on 10/12/2004.

Graph of system-wide disk I/O rate

The DTOO entry in the sarcheck_parms file has been used to format disk statistics into the following table.

Disk Device Statistics
Disk Device  Average %busy  Peak %busy  Average IO/sec  Peak IO/sec  Average read/sec  Average write/sec
sda          1.25           7.49        2.31            22.08        0.70              1.61
sdb          1.26           12.26       8.95            18.26        0.19              8.77
sdc          0.90           19.84       3.51            33.91        0.24              3.28
sdd          9.16           83.66       49.08           1234.78      44.11             4.97
sde          20.98          99.88       31.47           214.14       18.35             13.12

The I/O rate on disk device sda averaged 2.31 per second and peaked at 22.08 per second from 23:40:00 to 23:50:01 on 10/12/2004. The read rate averaged 0.70 per second. The write rate averaged 1.61 per second. This disk was busy for an average of 1.25 percent of the time and was 7.49 percent busy at peak times.

The I/O rate on disk device sdb averaged 8.95 per second and peaked at 18.26 per second from 23:40:00 to 23:50:01 on 10/12/2004. The read rate averaged 0.19 per second. The write rate averaged 8.77 per second. This disk was busy for an average of 1.26 percent of the time and was 12.26 percent busy at peak times.

The I/O rate on disk device sdc averaged 3.51 per second and peaked at 33.91 per second from 21:00:00 to 21:10:01 on 10/12/2004. The read rate averaged 0.24 per second. The write rate averaged 3.28 per second. This disk was busy for an average of 0.90 percent of the time and was 19.84 percent busy at peak times.

The I/O rate on disk device sdd averaged 49.08 per second and peaked at 1234.78 per second from 18:10:01 to 18:20:00 on 10/12/2004. The read rate averaged 44.11 per second. The write rate averaged 4.97 per second. This disk was busy for an average of 9.16 percent of the time and was 83.66 percent busy at peak times.

The I/O rate on disk device sde averaged 31.47 per second and peaked at 214.14 per second from 22:50:01 to 23:00:00 on 10/12/2004. The read rate averaged 18.35 per second. The write rate averaged 13.12 per second. This disk was busy for an average of 20.98 percent of the time and was 99.88 percent busy at peak times.

The value of the ctrl-alt-del parameter was 0. The value of 0 is better in almost all cases because it prevents an immediate reboot if the ctrl, alt, and delete keys are pressed simultaneously.

There was an average of 623.61 interrupts per second and the peak interrupt rate seen was 4294.91 per second from 10:00:00 to 10:10:00. The following graph shows the total interrupt rate during the monitoring period.

Graph of Interrupt rate

CAPACITY PLANNING SECTION

This section is designed to provide the user with a rudimentary linear capacity planning model and should be used for rough approximations only. These estimates assume that an increase in workload will affect the usage of all resources equally. These estimates should be used on days when the load is heaviest to determine approximately how much spare capacity remains at peak times.

Based on the limited data available in this single day of data, the system cannot support an increase in workload at peak times without some loss of performance or reliability, and the bottleneck is likely to be disk I/O. Implementation of some of the suggestions in the recommendations section may help to increase the system's capacity.

Graph of remaining room for growth

The CPU can support an increase in workload of at least 100 percent at peak times. The system was actively trying to meet the need for free memory. A significant increase in workload may have to be accompanied by an increase in memory. The busiest disk can support a workload increase of approximately 0 percent at peak times. For more information on peak CPU and disk utilization, refer to the Resource Analysis section of this report.
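
One way to read these figures under a linear model is that a resource which is P percent busy at peak can absorb a workload increase of (100 / P - 1) * 100 percent before it saturates. This formula is an assumption used for illustration; the report does not say how SarCheck derives its estimates. The function name headroom_pct is hypothetical.

```shell
# headroom_pct PEAK_BUSY_PCT
# Prints the percentage workload increase that would bring a resource
# from PEAK_BUSY_PCT to 100 percent busy, assuming linear scaling.
# The formula is an illustrative assumption, not SarCheck's own method.
headroom_pct() {
    awk -v peak="$1" 'BEGIN {
        printf "%.1f\n", (100 / peak - 1) * 100
    }'
}

headroom_pct 26.68   # peak combined CPU utilization
headroom_pct 99.88   # peak busy percentage of disk sde
```

Under this assumption the CPU result is well above 100 percent and the result for disk sde is close to zero, which is consistent with the "at least 100 percent" and "approximately 0 percent" statements above.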

NOTE: Sorry, capacity planning calculations for filesystem statistics are under development. We don't know when this will be resolved but we're working on it. Licensees with software subscriptions have access to new features as soon as they're ready.

CUSTOM SETTINGS SECTION

The default SYSUSR threshold was changed in the sarcheck_parms file from 2.5 to 2.8.

Please note: In no event can Aptitune Corporation be held responsible for any damages, including incidental or consequential damages, in connection with or arising out of the use or inability to use this software. All trademarks belong to their respective owners. This software is provided for the exclusive use of: test. This software expires on 01/09/2005 (mm/dd/yyyy). Code version: SarCheck for Linux 5.02.01. Serial number: 66666876.

Thank you for trying this evaluation copy of SarCheck. To order a licensed version of this software, just type 'analyze -o' at the prompt to produce the order form and follow the instructions.

(c) Copyright 2003-2004 by Aptitune Corporation, Plaistow NH 03865, USA, All Rights Reserved. http://www.sarcheck.com/

Statistics for system: abcdef
Statistic                           Value                         Peak start  Peak end    Peak date
Statistics collected on:            10/12/2004
MAC Address:                        00:02:B3:3A:41:58
Average combined CPU utilization:   6.22%
Average user CPU utilization:       2.62%
Average sys CPU utilization:        3.05%
Average 'nice' CPU utilization:     0.55%
Peak combined CPU utilization:      26.68%                        20:20:00    20:30:01    10/12/2004
Average page out rate:              2055.13/sec
Peak page out rate:                 20826.47/sec                  21:10:01    21:20:01    10/12/2004
Average swap out rate:              0.55/sec
Peak swap out rate:                 16.83/sec                     21:00:00    21:10:01    10/12/2004
Average swap space in use:          145.17 megabytes
Peak swap space in use:             282.25 megabytes              10:00:00                10/12/2004
Average amount of free memory:      7193 pages or 28.1 megabytes
Minimum amount of free memory:      1063 pages or 4.15 megabytes  09:10:00                10/12/2004
Average system-wide I/O rate:       95.32/sec
Peak system-wide I/O rate:          1336.36/sec                   18:10:01    18:20:00    10/12/2004
Average read rate:                  63.58/sec
Peak read rate:                     1233.26/sec                   18:10:01    18:20:00    10/12/2004
Average write rate:                 31.74/sec
Peak write rate:                    200.07/sec                    21:00:00    21:10:01    10/12/2004
Disk device w/highest peak:         sde
Avg pct busy for that disk:         20.98%
Peak pct busy for that disk:        99.88%                        21:10:01    21:20:01    10/12/2004
Average Interrupt rate:             623.61/sec
Peak Interrupt rate:                4294.91/sec                   10:00:00    10:10:00    10/12/2004
Approx CPU capacity remaining:      100%+
Approx I/O bandwidth remaining:     0.0%
Can memory support add'l load:      Moderate