In a standard PC each of those components is available at least once.
In a PC with full dual performance, all those components should be available at least twice. That means two Arithmetic Units but also two Discs. In such a PC the performance of a program is the same whether you execute it once or twice in parallel, assuming that each copy runs on a different processor. This is true both when the application is highly CPU bound, meaning that it requires many arithmetic calculations and uses the Arithmetic Unit heavily, and when it is IO bound, meaning that the program uses the Disc heavily.
This is not necessarily the case if the Disc is not installed redundantly, that is, if there is only one Disc. In that case the performance of a CPU bound program will not change whether you execute it once or twice. On the other hand, the performance of two IO bound programs will decrease. The reason is that during execution both programs want to use the same resource (the Disc) simultaneously, which is not possible, resulting in a loss of performance for both.
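The effect of a single shared Disc can be illustrated with a toy timing model. The sketch below is not a measurement: the function name `elapsed_time` and all the numbers are assumptions made for the illustration, and the model simply assumes that requests to the same Disc are handled one after another.

```python
def elapsed_time(cpu_seconds, disc_seconds, programs, discs):
    """Toy model: elapsed time to run `programs` identical programs in parallel.

    Assumes one processor per program, so CPU work never has to wait,
    while Disc work is spread evenly over the available Discs and
    requests to the same Disc are handled one after another.
    """
    # Number of programs that share the busiest Disc (ceiling division).
    sharing = -(-programs // discs)
    return cpu_seconds + disc_seconds * sharing


# Hypothetical workloads: (name, seconds of calculation, seconds of Disc access).
for name, cpu, disc in [("CPU bound", 10, 0), ("IO bound", 1, 10)]:
    once = elapsed_time(cpu, disc, programs=1, discs=1)
    twice_two_discs = elapsed_time(cpu, disc, programs=2, discs=2)
    twice_one_disc = elapsed_time(cpu, disc, programs=2, discs=1)
    print(name, once, twice_two_discs, twice_one_disc)
```

In this model only the IO bound program becomes slower when two copies have to share one Disc.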
Detail
Now let us study each of the components in more detail.
- The disc is the main storage unit, the first level of storage, which contains programs and data. Its performance is slow.
- Slow memory is the second level of storage for programs and data. Its performance is medium. Programs that the user wants to execute are copied from the disc to slow memory, and when finished, (modified) programs are copied from slow memory back to the disc.
- Fast memory is the third level of storage for programs and data. Its performance is fast. Programs in fast memory are ready to be executed. The operating system controls the transfer between fast and slow memory.
- Tasks are the individual programs of a PC (in slow or fast memory). Each task is identified by what is called a task status buffer or TSB. The task status buffer contains all the data necessary to control and execute a program properly. In order to do that, the TSB contains the start address, a continuation address and the time. Each task status buffer is linked to the next task status buffer.
- The start address contains the location where the program begins. This is necessary in order to start the program. The continuation address is needed to know where to continue when program execution is temporarily interrupted. The time shows the next time when the program will be started.
The whole sequence of task status buffers forms what is called a chain. Another name for a task is a thread.
- The Task Manager has three tasks:
- To give control to the next task or (small) thread in the chain.
- To update the information in the task status buffer when a program is delayed or when finished.
- To add or remove the links between the task status buffers when tasks (small threads) are added or removed.
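The chain of task status buffers and the three duties of the Task Manager can be sketched as a linked list. This is only an illustration under assumptions: the class and method names are hypothetical, and returning to the first task after the last one (round robin) is my assumption, not something stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskStatusBuffer:
    name: str
    start_address: int          # location where the program begins
    continuation_address: int   # where to continue after an interruption
    time: float                 # next time the program will be started
    next: Optional["TaskStatusBuffer"] = None  # link to the next TSB


class TaskManager:
    def __init__(self):
        self.head = None     # first TSB in the chain
        self.current = None  # TSB of the task that last had control

    def add(self, tsb):
        """Link a new TSB at the end of the chain."""
        if self.head is None:
            self.head = tsb
            return
        node = self.head
        while node.next is not None:
            node = node.next
        node.next = tsb

    def remove(self, name):
        """Unlink the TSB with the given name from the chain."""
        prev, node = None, self.head
        while node is not None and node.name != name:
            prev, node = node, node.next
        if node is None:
            return
        if prev is None:
            self.head = node.next
        else:
            prev.next = node.next
        if self.current is node:
            self.current = prev  # continue with the task after the removed one

    def next_task(self):
        """Give control to the next task in the chain (assumed round robin)."""
        if self.head is None:
            return None
        if self.current is None or self.current.next is None:
            self.current = self.head
        else:
            self.current = self.current.next
        return self.current

    def delay(self, tsb, continuation_address, time):
        """Update the TSB when a program is delayed."""
        tsb.continuation_address = continuation_address
        tsb.time = time


manager = TaskManager()
for i, name in enumerate(["A", "B", "C"]):
    manager.add(TaskStatusBuffer(name, 100 * i, 100 * i, 0.0))
order = [manager.next_task().name for _ in range(4)]
print(order)
```

With three tasks in the chain, four calls to `next_task` visit A, B, C and then A again.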
There are four types of registers, counting the input and output Data Registers as two types:
- Program Register or Instruction Counter. This register contains the memory address of the current instruction of the Program. After each executed instruction this address is increased by one (assuming all instructions have the same length), or the Program Register gets the address decided by a jump (goto or if) instruction.
The Task Manager initially copies the start address from the TSB into the Program Register. The reverse happens when the program is temporarily delayed: the address in the Program Register is copied back into the TSB as the continuation address.
- Instruction Register. This register contains the instruction copied from fast memory at the address held in the Program Register. It is possible that a single processor CPU has more than one instruction register to improve performance.
- Data Registers. Normally there are many. There are two types: input data registers and output data registers.
- The Arithmetic Unit is the most important part. This is the center of the CPU where the mathematical operations like Add, Subtract, Multiply and Divide are performed. Input comes from the input registers and output is stored in the output registers.
This more or less describes all the units of a CPU.
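How these units work together can be shown with a very small simulation. The sketch below is an illustration only: the instruction names (LOAD, ADD, COPY, JUMP_IF_LESS, HALT and so on) are invented for this example and do not belong to any real CPU.

```python
def run(program, start_address=0):
    """Execute `program`, a list of instruction tuples held in 'fast memory'."""
    program_register = start_address   # address of the next instruction
    inputs = [0, 0]                    # input data registers I1 and I2
    output = 0                         # output data register O1

    while program_register < len(program):
        # Fetch: copy the instruction at the Program Register address.
        instruction_register = program[program_register]
        op, *args = instruction_register
        program_register += 1          # all instructions have length one

        if op == "LOAD":               # LOAD register_index, value
            inputs[args[0]] = args[1]
        elif op == "ADD":
            output = inputs[0] + inputs[1]
        elif op == "SUB":
            output = inputs[0] - inputs[1]
        elif op == "MUL":
            output = inputs[0] * inputs[1]
        elif op == "DIV":
            output = inputs[0] / inputs[1]
        elif op == "COPY":             # COPY register_index: O1 -> input register
            inputs[args[0]] = output
        elif op == "JUMP_IF_LESS":     # JUMP_IF_LESS limit, address
            if output < args[0]:
                program_register = args[1]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output


# Keep adding 3 until the result is no longer below 10.
program = [
    ("LOAD", 0, 0),           # address 0: I1 = 0
    ("LOAD", 1, 3),           # address 1: I2 = 3
    ("ADD",),                 # address 2: O1 = I1 + I2
    ("COPY", 0),              # address 3: I1 = O1
    ("JUMP_IF_LESS", 10, 2),  # address 4: if O1 < 10 go to address 2
    ("HALT",),                # address 5
]
print(run(program))  # 12
```

The Program Register is increased by one after every fetch, and only the jump instruction overwrites it with a different address.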
Multi Processors and Hyper-Threading
In a single core CPU you have at least: fast memory, one set of Registers and one Arithmetic Unit. This is called one pipeline. The maximum performance is 100%.
In a dual core CPU you have two completely separate pipelines. The maximum performance of each pipeline is 50% of the full performance. The most important requirement is two separate Program Registers.
When you have two separate Program Registers you can control two programs simultaneously in order to solve two different problems. That means multi-processing. You can also use multi-programming in order to solve one problem. Multi-programming gives the possibility to divide one program into two tasks or threads and to execute both tasks or threads in parallel. Multi-programming requires synchronisation between the tasks or threads.
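Dividing one problem into two threads, with synchronisation at the end, can be sketched as follows. The function names are hypothetical. Note that in the standard CPython build the two threads do not actually calculate at the same moment because of the global interpreter lock, so the sketch only shows the division of the work and the synchronisation, not a speed-up.

```python
import threading


def partial_sum(numbers, results, index):
    # Each thread writes only to its own slot of `results`.
    results[index] = sum(n * n for n in numbers)


def sum_of_squares_two_threads(numbers):
    half = len(numbers) // 2
    results = [0, 0]
    threads = [
        threading.Thread(target=partial_sum, args=(numbers[:half], results, 0)),
        threading.Thread(target=partial_sum, args=(numbers[half:], results, 1)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()   # synchronisation: wait until both halves are finished
    return results[0] + results[1]


print(sum_of_squares_two_threads(list(range(1, 101))))  # 338350
```

The `join` calls are the synchronisation: the final addition may only happen when both threads have stored their partial result.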
In a quad core CPU you have four completely separate pipelines. The maximum performance of each pipeline is 25%. That means you can have 4 independent programs, or 4 dependent programs solving 1 problem using synchronisation.
In a "1 core"/"2 thread" CPU the "2 thread" stands for Hyper-Threading. Hyper-Threading means that only the Registers are implemented twice but not the Arithmetic Unit. This means that from the user (operating system) point of view this is a dual processor system, and the first impression is that such a configuration is twice as powerful. But this is not necessarily true because there is only one Arithmetic Unit. Internally this is a single core processor.
- You get the most benefit from such a configuration when the program in one hyper-thread is highly IO bound and the program in the other hyper-thread is highly CPU bound. When the first program needs the Disc, the second program can immediately take over and use the Arithmetic Unit until the Disc operation of the first program is finished. That means such usage depends heavily on switching. In this case a CPU with a lot of fast memory and at least all the registers implemented twice will outperform a CPU which does not have those capabilities.
- You get the least benefit from such a configuration when the programs in both hyper-threads are highly CPU bound. The reason is simple: both programs perform mathematical calculations and require the Arithmetic Unit. That part is the bottleneck, and because it is not implemented twice the overall performance will not improve.
The performance could even decrease because:
- First, if you want to solve one physical application and use 100% of the CPU, you have to use parallel programming. That means you have to divide your application into two independent parts. You also need extra synchronisation between both parts in order to find the solution. This synchronisation requires extra CPU power, which decreases overall performance. Those independent parts become separate hyper-threads during program execution.
- Secondly, the extra CPU hardware needed to support hyper-threads could result in a slower clock speed. Extra hardware to support extra cores can have the same effect.
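The two cases above (most benefit and least benefit) can be compared with an idealised estimate. This is a sketch under strong assumptions: switching between the hyper-threads is free, the single Arithmetic Unit is the only shared resource, the Disc requests of the two programs never collide, and the numbers and function names are made up for the illustration.

```python
def one_after_another(programs):
    """Elapsed time when the programs run one after another on one core.

    Each program is a tuple (arithmetic_seconds, disc_seconds).
    """
    return sum(arithmetic + disc for arithmetic, disc in programs)


def best_case_hyper_threaded(programs):
    """Best-case elapsed time for two hyper-threads sharing one Arithmetic Unit.

    The run cannot be shorter than the total arithmetic work (one unit),
    nor shorter than the longest single program.
    """
    arithmetic_total = sum(arithmetic for arithmetic, _ in programs)
    longest_program = max(arithmetic + disc for arithmetic, disc in programs)
    return max(arithmetic_total, longest_program)


mixed = [(1, 9), (10, 0)]      # one IO bound and one CPU bound program
both_cpu = [(10, 0), (10, 0)]  # two CPU bound programs

print(one_after_another(mixed), best_case_hyper_threaded(mixed))        # 20 11
print(one_after_another(both_cpu), best_case_hyper_threaded(both_cpu))  # 20 20
```

In this estimate the mixed pair finishes in almost half the time, while two CPU bound programs gain nothing, and that is before any synchronisation cost or lower clock speed is taken into account.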
In the above text the word Hyper-Threading is used to describe an architecture where, from the user point of view, the CPU appears to have double the number of processors (i.e. two Arithmetic Units), while physically this is not the case. A much better name is Virtual Processors.
On an i5 CPU, which is a 2 Core / 4 Thread CPU, when I load the program PlanetS, which is written in Visual Basic 2010 and does not use the BackgroundWorker, the number of threads shown by the task manager is 5. This is in accordance with the number of event handlers. This seems to be the case for all Visual Basic 2010 programs loaded, independent of the number of BackgroundWorkers used. The number of processes that each program represents is 1.
On that same CPU, when I abruptly end the program PlanetPP (also Visual Basic 2010) with 5 BackgroundWorkers active, the total number of threads decreases by 10.
This tells me that a thread (in the context of the task manager) is much more a task (program), and that the job of the task manager is to assign a task to (or remove it from) one of the Virtual Processors, of which there are 4 in the case of an i5. That means an i5 has 2 Processors and 4 Virtual Processors.
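The difference between the number of threads of a program and the number of Virtual Processors can also be observed from inside a program. The sketch below uses Python instead of Visual Basic and is only an analogy to the observations above: `os.cpu_count()` reports the number of logical (virtual) processors, and `threading.active_count()` reports the threads of the current process. The function name is hypothetical.

```python
import os
import threading


def count_threads_with_workers(workers):
    """Return the thread count of this process before and while
    `workers` extra threads are alive."""
    stop = threading.Event()
    threads = [threading.Thread(target=stop.wait) for _ in range(workers)]
    before = threading.active_count()
    for thread in threads:
        thread.start()
    during = threading.active_count()
    stop.set()                 # let all workers finish
    for thread in threads:
        thread.join()
    return before, during


before, during = count_threads_with_workers(5)
print("logical processors:", os.cpu_count())   # may be None if unknown
print("threads before:", before, "with 5 workers:", during)
```

The number of threads grows with the number of workers regardless of how many Virtual Processors the CPU has; assigning those threads to the Virtual Processors is left to the operating system.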
              Core 1                                    Core 2
   Virt Proc 1        Virt Proc 2            Virt Proc 3        Virt Proc 4
   -----------        -----------            -----------        -----------
  Fast Pr Mem 1      Fast Pr Mem 2          Fast Pr Mem 3      Fast Pr Mem 4
        |                  |                      |                  |
     P Reg 1            P Reg 2                P Reg 3            P Reg 4
        |                  |                      |                  |
    Ins Reg 1          Ins Reg 2              Ins Reg 3          Ins Reg 4
        |                  |                      |                  |
        --------------------                      --------------------
                 |                                         |
       I1 I2 I3  |  O1 O2 O3                     I1 I2 I3  |  O1 O2 O3
                 |                                         |
         Arithmetic Unit 1                         Arithmetic Unit 2
    Add 1  Sub 1  Mul 1  Div 1                Add 2  Sub 2  Mul 2  Div 2
              Move 1                                    Move 2
                 |                                         |
                 |         ***********************         |
                 --------> *    Common Memory    * <--------
                           *                     *
                           *    M1  M2  M3  M4   *
                           ***********************

Figure 1: 2 Core / 4 Thread CPU
Figure 1 shows a 2 Core / 4 Thread CPU. That means:
- There are 4 Virtual Processors. Each Virtual Processor contains one each of:
- Fast Program Memory, Program Register and Instruction Register.
The Program Register is a pointer into the Fast Program Memory. The Instruction Register contains the instruction at that location.
- There are 2 Arithmetic Units. Each Arithmetic Unit contains one each of:
- Logical Input Registers, Logical Output Registers, Adder, Subtracter, Multiplier, Divider and Move.
In the case of a 2 Core / 2 Thread CPU there are no Virtual Processors 2 and 4.
There is one Common Memory. Both Arithmetic Units can directly address this Common Memory by means of a move operation (or something equivalent). With the move operation they can read data from Common Memory into their input registers and write data from their output registers to Common Memory.
In the case of parallel programming you reserve for each program a small area of this Common Memory to store data, identified as M1, M2, M3 and M4.
One use of this data is as a Command. In the programs PlanetPP and FibonacciPP this is implemented as the array STATE(i).
For parallel programming to work properly with Visual Basic:
- You have to subdivide your application into independent branches. Each branch is implemented as a BackgroundWorker. Each BackgroundWorker operates as a different thread in a different Virtual Processor.
- One BackgroundWorker serves as the master. The other BackgroundWorkers are slaves.
- The master sets all Commands to a specific value in order to inform the slaves what to do. Each slave inspects its own Command, stored in M2, M3 or M4, in order to know what to do. For example, BackgroundWorker 2 uses STATE(2).
- When finished, each slave clears its own Command.
- Finally, for the master, the cleared Commands are an indication that one iteration is finished and that the next one can start.
Following this strategy avoids race conditions on the Commands, because each Command is set only by the master while its slave is waiting and is cleared only by that slave.
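The Command protocol described above can be sketched with ordinary threads. The sketch is written in Python, not Visual Basic, and it is not the code of PlanetPP or FibonacciPP: the names `IDLE`, `WORK`, `STOP`, `slave` and `master` are hypothetical, the master is simply the calling thread, and the slaves are numbered from 0. The list `state` plays the role of the array STATE(i).

```python
import threading
import time

IDLE, WORK, STOP = 0, 1, 2     # hypothetical Command values


def slave(i, state, data, results):
    while True:
        command = state[i]     # each slave inspects only its own Command
        if command == IDLE:
            time.sleep(0.001)  # nothing to do yet, look again shortly
        elif command == STOP:
            state[i] = IDLE
            return
        else:                  # WORK
            results[i] = sum(data[i])
            state[i] = IDLE    # clear own Command: this slave is finished


def master(iterations, slaves=3):
    state = [IDLE] * slaves
    data = [None] * slaves
    results = [0] * slaves
    threads = [threading.Thread(target=slave, args=(i, state, data, results))
               for i in range(slaves)]
    for thread in threads:
        thread.start()

    totals = []
    for chunks in iterations:
        for i in range(slaves):
            data[i] = chunks[i]   # first put the data in place ...
            state[i] = WORK       # ... then set the Command
        # All Commands cleared means this iteration is finished.
        while any(command != IDLE for command in state):
            time.sleep(0.001)
        totals.append(sum(results))

    for i in range(slaves):
        state[i] = STOP
    for thread in threads:
        thread.join()
    return totals


print(master([[[1, 2], [3, 4], [5, 6]], [[10], [20], [30]]]))  # [21, 60]
```

Each slot of `state` is written by the master only while the slave is idle, and by the slave only while it holds a Command, so no slot ever has two writers at the same time. The sketch relies on polling, as the description above does; a real program might prefer events or queues.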