System monitoring with the Sysstat package

A common task for system administrators is to monitor and care for a server.
That's fairly easy to do at a moment's notice, but how do you keep a record of this information over time?

One way to monitor your server is to use the Sysstat package.
http://perso.orange.fr/sebastien.godard/

Sysstat is actually a collection of utilities designed to collect information about the performance of a Linux installation and record it over time.

It's fairly easy to install too, since it is included as a package on many distributions.
To install it on CentOS 4.3, just type the following:

yum install sysstat

On Debian-based distributions, you can use apt-get to install the sysstat package instead.

We now have the sysstat scripts installed on the system. Let's try the sar command.

sar
Linux 2.6.16-xen (xen30) 08/17/2006

11:00:02 AM CPU %user %nice %system %iowait %idle
11:10:01 AM all 0.00 0.00 0.00 0.00 99.99
Average: all 0.00 0.00 0.00 0.00 99.99

Several bits of information, such as the Linux kernel version, hostname, and date, are reported.
More importantly, the various ways CPU time is being spent on the system are shown.
%user, %nice, %system, %iowait, and %idle describe ways that the CPU may be utilized.
%user and %nice refer to your software programs, such as MySQL or Apache (%nice counts the ones running at an adjusted, or "niced", priority).
%system refers to the kernel's internal workings.
%iowait is time spent waiting for input/output, such as a disk read or write. Finally, since the kernel accounts for 100% of the runnable time it can schedule, any unused time goes into %idle.
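Since the columns add up to 100%, a quick way to see how busy the CPU was is to subtract %idle from 100. The sketch below does that with awk. The function name cpu_busy is made up for this example, and it assumes the column layout shown in the output above, with %idle as the last field.

```shell
# Two sample lines copied from the sar output in this article.
sample='11:10:01 AM all 0.00 0.00 0.00 0.00 99.99
11:30:02 AM all 0.01 0.26 0.19 1.85 97.68'

# cpu_busy (hypothetical helper): read sar CPU lines on stdin and print
# "<time> <busy%>", where busy% = 100 - %idle.
# Assumes %idle is the last column and the CPU column reads "all".
cpu_busy() {
    LC_ALL=C awk '
        $1 != "Average:" && ($2 == "all" || $3 == "all") {
            printf "%s %.2f\n", $1, 100 - $NF
        }'
}

printf '%s\n' "$sample" | cpu_busy
# prints:
# 11:10:01 0.01
# 11:30:02 2.32
```

On a live system the same function could be fed straight from the command, for example sar | cpu_busy.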

The output above covers only a single sample. How can we keep track of that information over time?
If our system were consistently running heavy in %iowait, we might surmise that a disk was getting overloaded, or going bad.
At the least, we would know to investigate.

So how do we track the information over time? We can schedule sar to run at regular intervals, say, every 10 minutes.
We then direct it to send the output to sysstat’s special log files for later reports.
The way to do this is with the Cron daemon.

By creating a file called 'sysstat' in /etc/cron.d, we can tell cron to run the sysstat collection scripts on a schedule.
Fortunately, the Sysstat package that yum installed already did this step for us.

more /etc/cron.d/sysstat
# run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A

The sa1 script logs system activity data in sysstat's binary log file format, and sa2 reports it back in human-readable format.
The report is written to a file in /var/log/sa.

ls /var/log/sa
sa17 sar17

sa17 is the binary sysstat log, and sar17 is the report. (Today's date is the 17th.)
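Because the file name ends in the day of the month, you can build the path for any day and hand it to sar with its -f option to read that day's data. Here is a small sketch. The helper name sa_file is invented for this example, and the /var/log/sa location is taken from the listing above, so it may differ on other systems.

```shell
# sa_file (hypothetical helper): print the path of the binary sysstat log
# for a given day of the month, defaulting to today.
# Assumes the logs live in /var/log/sa as in the listing above.
sa_file() {
    day=${1:-$(date +%d)}
    day=${day#0}    # strip a leading zero so printf does not read "08" as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}

sa_file 17
# prints: /var/log/sa/sa17

# Example use on a system with sysstat installed:
#   sar -f "$(sa_file 17)"
```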

There is quite a lot of information contained in the sar report, but there are a few values that can tell us how busy the server is.
Values to watch are swap usage, disk I/O wait, and the run queue.
Each of these can be displayed by running sar manually with the appropriate option.

sar
Linux 2.6.16-xen (xen30) 08/17/2006

11:00:02 AM CPU %user %nice %system %iowait %idle
11:10:01 AM all 0.00 0.00 0.00 0.00 99.99
11:20:01 AM all 0.00 0.00 0.00 0.00 100.00
11:30:02 AM all 0.01 0.26 0.19 1.85 97.68
11:39:20 AM all 0.00 2.41 2.77 0.53 94.28
11:40:01 AM all 1.42 0.00 0.18 3.24 95.15
Average: all 0.03 0.62 0.69 0.64 98.02

There were a few moments where disk activity was high in the %iowait column, but it didn't stay that way for long. An average of 0.64 is pretty good.
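On a longer report it is tedious to scan the %iowait column by eye. The following sketch prints only the samples above a threshold. The function name high_iowait and the default limit of 1.0 are choices made for this example, and it assumes %iowait is the second-to-last column, as in the output above.

```shell
# The sar output from this article, used as sample input.
sample='11:00:02 AM CPU %user %nice %system %iowait %idle
11:10:01 AM all 0.00 0.00 0.00 0.00 99.99
11:20:01 AM all 0.00 0.00 0.00 0.00 100.00
11:30:02 AM all 0.01 0.26 0.19 1.85 97.68
11:39:20 AM all 0.00 2.41 2.77 0.53 94.28
11:40:01 AM all 1.42 0.00 0.18 3.24 95.15
Average: all 0.03 0.62 0.69 0.64 98.02'

# high_iowait (hypothetical helper): read sar CPU output on stdin and print
# the time and %iowait of each sample whose %iowait exceeds the limit
# given as the first argument (default 1.0). Header, blank, and Average
# lines are skipped.
high_iowait() {
    LC_ALL=C awk -v limit="${1:-1.0}" '
        NF >= 2 && $1 != "Average:" && $(NF-1) ~ /^[0-9.]+$/ \
            && $(NF-1) + 0 > limit + 0 {
            print $1, $(NF-1)
        }'
}

printf '%s\n' "$sample" | high_iowait
# prints:
# 11:30:02 1.85
# 11:40:01 3.24
```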

How about my swap usage? Am I running out of RAM? Some swapping is normal for the Linux kernel, which will swap from time to time. Constant swapping is bad, and generally means you need more RAM.

sar -W
Linux 2.6.16-xen (xen30) 08/17/2006

11:00:02 AM pswpin/s pswpout/s
11:10:01 AM 0.00 0.00
11:20:01 AM 0.00 0.00
11:30:02 AM 0.00 0.00
11:39:20 AM 0.00 0.00
11:40:01 AM 0.00 0.00
11:50:01 AM 0.00 0.00
Average: 0.00 0.00

Nope, we are looking good. No persistent swapping has taken place.
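The same check can be scripted. This sketch lists only the samples where pages were actually swapped in or out, so empty output means no swapping was recorded. The function name swapping_samples is made up for this example, and it assumes pswpin/s and pswpout/s are the last two columns, as in the output above.

```shell
# swapping_samples (hypothetical helper): read `sar -W` output on stdin and
# print "<time> <pswpin/s> <pswpout/s>" for each sample with swap activity.
# Header, blank, and Average lines are skipped.
swapping_samples() {
    LC_ALL=C awk '
        NF >= 3 && $1 != "Average:" \
            && $(NF-1) ~ /^[0-9.]+$/ && $NF ~ /^[0-9.]+$/ \
            && ($(NF-1) + $NF) > 0 {
            print $1, $(NF-1), $NF
        }'
}

# The first line is from this article; the second is an invented sample
# with swap-out activity, added only to show a match.
printf '%s\n' '11:10:01 AM 0.00 0.00' '11:20:01 AM 0.00 3.50' | swapping_samples
# prints: 11:20:01 0.00 3.50
```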

How about system load? Are my processes waiting too long to run on the CPU?

sar -q
Linux 2.6.16-xen (xen30) 08/17/2006

11:00:02 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
11:10:01 AM 0 47 0.00 0.00 0.00
11:20:01 AM 0 47 0.00 0.00 0.00
11:30:02 AM 0 47 0.28 0.21 0.08
11:39:20 AM 0 45 0.01 0.24 0.17
11:40:01 AM 0 46 0.07 0.22 0.17
11:50:01 AM 0 46 0.00 0.02 0.07
Average: 0 46 0.06 0.12 0.08

No, an average load of 0.06 is really good.
Notice the 1, 5, and 15 minute load averages on the right.
Having the three time intervals gives you a feel for how the load on the system is trending.
A 3 or 4 in the 1 minute average is OK, but the same number in the 15 minute column may indicate
that work is not clearing out, and that a closer look is warranted.
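That rule of thumb can be turned into a filter on the 15 minute column. The sketch below is only an illustration: the function name sustained_load and the default threshold of 3 are choices made for this example, and it assumes ldavg-15 is the last column, as in the output above. Other sysstat versions may print additional columns, in which case the field would need adjusting.

```shell
# sustained_load (hypothetical helper): read `sar -q` output on stdin and
# print "<time> <ldavg-15>" for samples whose 15 minute load average is at
# or above the limit given as the first argument (default 3).
# Assumes ldavg-15 is the last column.
sustained_load() {
    LC_ALL=C awk -v limit="${1:-3}" '
        NF >= 5 && $1 != "Average:" && $NF ~ /^[0-9.]+$/ \
            && $NF + 0 >= limit + 0 {
            print $1, $NF
        }'
}

# The first line is from this article; the second is an invented busy
# sample, added only to show a match.
printf '%s\n' '11:30:02 AM 0 47 0.28 0.21 0.08' \
              '12:00:01 PM 4 50 3.90 3.70 3.20' | sustained_load
# prints: 12:00:01 3.20
```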

This was a short look at the Sysstat package. We only looked at the output of three of sar's options, but there are others.
Now, armed with sar in your toolbox, your system administration job just became a little easier.


FAQ
If you get the output below when you issue the sar command, make sure cron is set up as discussed above and has run at least once, so that it can create the binary data file.

[root@rat02 sa]# sar
Cannot open /var/log/sa/sa18: No such file or directory
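A quick way to confirm this is to check whether today's data file exists before running sar. The sketch below does that. The function name check_sa_file is invented for this example, and the paths are the ones used earlier in this article, so adjust them if your distribution puts them elsewhere.

```shell
# check_sa_file (hypothetical helper): report whether today's binary data
# file exists in the given directory (default /var/log/sa).
# Returns 0 if the file is there, 1 otherwise.
check_sa_file() {
    dir=${1:-/var/log/sa}
    file="$dir/sa$(date +%d)"
    if [ -f "$file" ]; then
        echo "found $file"
    else
        # sa1 is the collection script from the cron entry shown earlier.
        echo "missing $file: wait for cron, or run /usr/lib/sa/sa1 1 1 as root"
        return 1
    fi
}

# Example use:
#   check_sa_file && sar
```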
