When you ask yourself how much load is put on a certain object, for example an employee in a company or a processor in a computer, you always observe a certain time interval and check how long the object was working during that period and how many breaks it took. Here we assume (unlike with real employees) that the observed object always performs its operations at exactly the same speed.
The monitored time intervals, in which operations (the code of running programs) are distributed onto the individual processors of a computer, are very short. On most systems, including macOS, this period is 10 ms (0.01 seconds). A processor core running at 2.5 GHz can execute 25 million elementary sub-operations (clock cycles) in this time interval.
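To make the arithmetic concrete, here is a minimal Python sketch; the clock rate and interval length are the figures from the text:

```python
# Clock cycles available to a 2.5 GHz core within one 10 ms scheduling period.
clock_hz = 2_500_000_000   # 2.5 GHz
interval_ms = 10           # scheduling period in milliseconds

cycles_per_interval = clock_hz * interval_ms // 1000
print(cycles_per_interval)  # 25000000
```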
Let’s assume a processor finishes a given job within 2 ms of the current scheduling period. It had 10 ms available for the task, but the job was done after 2 ms, so until the next task arrives, it will be idle for 8 ms. In this example, the load is 20%.
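This load figure is just the busy time divided by the interval length. A minimal Python sketch (the function name is our own, not an OS API):

```python
def load_percent(busy_ms: float, interval_ms: float = 10.0) -> float:
    """Load of a core over one scheduling interval: busy time / interval length."""
    return 100.0 * busy_ms / interval_ms

print(load_percent(2.0))  # 20.0 – busy for 2 ms out of the 10 ms period
```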
If a core is under full load, at 100%, it had no chance to take a break: in the 10 ms given, it was busy the whole time. The job could not be finished, so the core must continue working in the next time interval.
When a core is not under full load in practice, this typically does not mean that the given tasks are “too simple”. In most cases, the processor cannot continue its work because it has to wait for incoming data from other components, e.g. another processor, or a device such as main memory, disk storage, or a network interface. In an interactive application (a process which communicates directly with a user), the processor typically spends most of its time waiting for the user sitting in front of the screen, who works several million times slower than the processor itself.
During idle times (in our initial example, when the processor had worked for 0.002 seconds but then had to wait for 0.008 seconds), modern processor cores usually enter a standby mode. Most parts of the core are shut down for those 0.008 seconds and their power is cut off. Energy consumption thus decreases roughly in proportion to the load, and the operating temperature drops considerably as well.
So far, we have considered the load of a single processor only. Modern computers, however, contain multiple processors. How do you calculate the total load put on the entire computer, not just on a single processor? To keep things simple, we look at a computer with only 2 processors, and imagine a job that can be distributed in any proportion across the two.
If only one processor does the job and is under full load while the other is doing nothing, it is easy to see that the total load of the computer should be defined as 50% in this case: only one of the two processors is working, so only half of the available capacity is used. The system could complete twice the work in the same time if the second processor were also put under full load. So if one processor has 100% load and the other 0%, the total load is 50%.
The distribution of load is unbalanced and not “fair” in this case. Both processors could share the load evenly, which would result in a load of 50% for each of them. It would equally be possible for one to complete 75% of the job while the remaining 25% goes to the other. We do not need a mathematical derivation here: it is easy to see that the appropriate metric for the total load of a computer is the average (arithmetic mean) of the loads of all individual processors.
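As a sketch of that definition (a hypothetical helper function, not an actual OS interface), the total load is simply the mean of the per-core loads:

```python
def total_load(core_loads):
    """Total system load as the arithmetic mean of the individual core loads."""
    return sum(core_loads) / len(core_loads)

print(total_load([100, 0]))   # 50.0 – one core fully busy, one idle
print(total_load([75, 25]))   # 50.0 – same total work, spread unevenly
print(total_load([50, 50]))   # 50.0 – perfectly balanced
```

All three distributions represent the same amount of completed work, which is why they yield the same total load.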