Yesterday, for the first time, I took some time to understand what the “load average” shown by the top command actually means. It turns out to be a very informative number, if you read it right.
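Linux averages the number of processes that are running or ready to run, plus those that cannot run because they are blocked waiting for I/O (technically, in uninterruptible sleep). The average is computed over three windows: the last 1, 5 and 15 minutes. Strictly speaking, these are exponentially damped moving averages rather than plain means.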
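The same three numbers that top displays can be read directly on Linux from /proc/loadavg. Below is a minimal Python sketch, assuming a Linux system; the function names are illustrative, and os.getloadavg() from the standard library is used as a fallback where /proc is not available.

```python
import os

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    # Example content: "0.52 0.58 0.59 2/412 12345".
    # Only the first three fields are load averages; the rest are
    # running/total scheduling entities and the most recent PID.
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen

def read_loadavg(path="/proc/loadavg"):
    """Read the load averages, falling back to os.getloadavg() if the file is missing."""
    try:
        with open(path) as f:
            return parse_loadavg(f.read())
    except FileNotFoundError:
        return os.getloadavg()

if __name__ == "__main__":
    print("load average (1, 5, 15 min):", read_loadavg())
```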
If no processes are blocked on I/O, this number should ideally stay below the number of available CPU cores. For instance, a load average of 2.6 means that on average 2.6 processes were running or wanted to run. If you have only two cores, then on average 0.6 processes could not find a free CPU to run on, which suggests that you need to upgrade your machine (or server).
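The arithmetic above can be written as a tiny helper that compares the load with the core count. This is a sketch that assumes, as the paragraph does, that no processes are blocked on I/O; excess_demand is a hypothetical name, and os.cpu_count() reports logical CPUs, which may include hyperthreads.

```python
import os

def excess_demand(load, cores):
    """Average number of runnable processes with no free core.

    Assumes none of the load comes from processes blocked on I/O.
    """
    return max(0.0, load - cores)

if __name__ == "__main__":
    cores = os.cpu_count() or 1           # logical CPUs (may include hyperthreads)
    one_minute = os.getloadavg()[0]       # 1-minute load average (Unix only)
    print(f"load {one_minute:.2f} on {cores} cores, "
          f"~{excess_demand(one_minute, cores):.2f} processes waiting for a CPU")
```

With the numbers from the text, excess_demand(2.6, 2) gives roughly 0.6 (floating point makes it 0.6000000000000001).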
Since this number also grows when processes are blocked on I/O, it is worth checking whether any of them are blocked before spending your last penny on new hardware. The tool for this is atop. If the disks are stressed and cannot keep up with the requests (so processes end up waiting for the disk), you will see the DSK line (or lines, if you have more than one disk) in red.
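If atop is not installed, a rough alternative (not the atop approach, just a sketch assuming a Linux /proc filesystem) is to list processes whose state is D, uninterruptible sleep, which is usually a wait on disk I/O and is what inflates the load average beyond CPU demand. The function names are hypothetical, and the result is a single snapshot, so it is worth running it a few times.

```python
import os

def stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' and take the next field.
    return stat_line.rsplit(")", 1)[1].split()[0]

def stat_comm(stat_line):
    """Return the command name (the text inside the outermost parentheses)."""
    return stat_line[stat_line.index("(") + 1 : stat_line.rindex(")")]

def blocked_processes(proc="/proc"):
    """List (pid, name) of processes currently in uninterruptible sleep ('D')."""
    blocked = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if stat_state(line) == "D":
            blocked.append((int(entry), stat_comm(line)))
    return blocked
```

An empty list from blocked_processes() on repeated calls suggests that the load really is CPU demand.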
At that point, you should find the culprit process and do something to make it run more efficiently.
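One way to look for the culprit, sketched here as an assumption rather than what atop does internally, is to read the per-process counters in /proc/&lt;pid&gt;/io on Linux. Their read_bytes and write_bytes fields are cumulative since the process started, so two snapshots are taken and ranked by the difference. Reading other users' processes usually requires root; unreadable ones are skipped. All function names are hypothetical.

```python
import os

def parse_io(text):
    """Parse /proc/<pid>/io ("key: value" lines) into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return fields

def disk_bytes(pid, proc="/proc"):
    """Bytes the process caused to be read from or written to storage, or None."""
    try:
        with open(os.path.join(proc, str(pid), "io")) as f:
            io = parse_io(f.read())
    except OSError:
        return None  # process exited, or permission denied
    return io.get("read_bytes", 0) + io.get("write_bytes", 0)

def snapshot(proc="/proc"):
    """Map each readable PID to its cumulative disk bytes."""
    result = {}
    for entry in os.listdir(proc):
        if entry.isdigit():
            total = disk_bytes(entry, proc)
            if total is not None:
                result[int(entry)] = total
    return result

def top_disk_users(before, after, n=5):
    """Rank PIDs by bytes transferred between two snapshots, largest first."""
    deltas = {pid: after[pid] - before[pid] for pid in after if pid in before}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]
```

Typical use: take first = snapshot(), wait a few seconds with time.sleep(), then pass first and a second snapshot() to top_disk_users() and look up the top PIDs in top or atop.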