Multi Processors, Parallel Programming, Threads and Threading
Advantages and Disadvatages

This document is mainly about: Multi Processors, Parallel Programming, Threads and Threading.
Also the following Wikipedia articles are discussed:
  1. Parallel computing
  2. Hyper-threading
  3. Simultaneous multithreading
  4. Multithreading (computer architecture)
  5. Multi-core processor


Introduction

In order to understand Multi Processing and Threads and Hyper-Threading the following hardware and software components are important:
Disc, Slow Memory, Fast Memory, Tasks, Task Manager, Registers and Arithmetic Unit.
In a standard PC all those components are at least once available.
In a PC with full dual performance, all those components should be available at least twice. That means two Arithmetic Units but also two Discs. In such a PC the performance is the same if you execute the same program once or twice simultaneous in parallel, assuming that each program runs in a different processor. This is true both when the application is highly CPU bound, meaning that it requires many arithmatic calculations and uses the Arithmetic Unit heavily, versus IO bound, meaning that the program uses the Disc heavily.
This is not necessarily the case if the Disc is not installed redundant. In that case the performance of two CPU bound programs will not change if you execute the program once or twice. On the other hand the performance of two IO bound programs will decrease. The reason is during execution both programs want to use the same resource (the disc) simultaneous, which is not possible, resulting in a loss in performance of both.


Detail

Now let us study each of the components in more detail. This more or less describes all the units of a CPU.


Multi Processors and Hyper-Threading

In a single core CPU you have at least: fast memory, one set of Registers and one Aritmatic Unit. This is called one Pipe line. The maximum performance is 100%.
In a dual core CPU you have two complete separate Pipe Lines. The maximum performance of each pipeline is 50% of the full performance. The most important requirement is two separate Program Registers. When you have two separate Program Registers you can control two Programs simultaneous in order to solve two different problems. That means multi-processing. You can also use multi-programming in order to solve one problem. Multi-programming gives the possibility to divide one program into two tasks or threads and to execute both tasks or threads in parallel. Multiprogramming requires synchronisation between the tasks or threads.
In a Quad core processor CPU you have four complete separate Pipe Lines. The maximum performance of each pipe line is 25%. That means you can have 4 independent programs or 4 dependent programs solving 1 program using synchronisation.

In a "1 core"/"2 thread" CPU the "2 thread" stands for Hyper-Threading. Hyper-Threading means that only the Registers are implemented twice but not the Arithmatic Unit. This means that from the user (operating system) point of view this is a dual processor system and the first impression is that such a configuration is twice as powerfull. But this is not necessary true because there is only one Arithmatic Unit. Internally this is a single core processor.

In the above text the word Hyper-Threading is used, in order to describe an architecture were from the user point of view the CPU looks to have double the number of processors i.e. two arithmatic units but physically this is not the case. A much better name is Virtual Processors.

On a CPU i5 which is a 2 Core/4 Thread when I load the program PlanetS which is written in Visual Basic 2010 and which does not use the BackGroundworker the number of threads used by the task manager is 5. This is inaccordance with the number of event-handlers. This seams the case for all Visual Basic 2010 programs loaded independent of the number BackGroundworkers used. The number of processes that each program represent is 1.
On that same CPU when I end program PlanetPP (Also Visual Basic 2010) instantaneous which 5 BackGroundworkers active the total number of threads decreases with 10.
This tells me that a thread (in the context of the task manager) is much more a task (programme) and that the task of the task manager is to assign (or to remove) a task to one of the (in the case of an i5) 4 Virtual Processors. That means an i5 has 2 Processors and 4 Virtual Processors.

            Core 1             

 Virt Proc 1       Virt Proc 2 
 ----------        ----------- 
Fast Pr Mem 1     Fast Pr Mem 2
     |                  |      
   P Reg 1           P Reg 2   
     |                  |      
  Ins Reg 1         Ins reg 2  
     |                  |      
      ------------------       
              |                 
    I1 I2 I3  |  O1 O2 O3       
              |                 
       Artihmatic Unit 1        
  Add 1  Sub 1  Mul 1  Div 1    
           Move 1               
            Core 2              

 Virt Proc 3       Virt Proc 4  
 -----------       -----------  
Fast Pr Mem 3     Fast Pr Mem  4
     |                  |       
  P Reg 3            P Reg 4    
     |                  |       
 Ins reg 3          Ins Reg  4  
     |                  |       
      ------------------        
              |                 
    I1 I2 I3  |  O1 O2 O3       
              |                 
       Artihmatic Unit 2        
  Add 2  Sub 2  Mul 2  Div 2    
            Move 2              



   ***********************
   *   Common   Memory   *
   *                     *  
   *   M1  M2   M3  M4   *
   ***********************
              ^           
              |           
              |           
              |           
<-------------            




Figure 1:   2 Core / 4 Thread  CPU
Figure 1 Shows a 2 Core / 4 Thread. That means: In the case of a 2 Core / 2 Thread there are no Virtual Processors 2 and 4.

There is one Common Memory. Both Arithmatic Units can directly adress this Common Memory by means of a move operation (or something equivalent) With the move operation they can read and write data from Common Memory to and from their input and output registers.

In the case of parallel programming you reserve for each program a small area of this Common Memory to store data, identified as M1, M2, M3 and M4. One usage of this data is as a Command. In the programs PlanetPP and FibonacciPP this is implemented as the array STATE(i). For parallel programming to work properly with Visual Basic:

  1. You have to subdivide your application into independent branches. Each branch is implemented as a Backgroundworker. Each Backgroundworker operates as a different thread in a different Virtual Processor.
  2. One Backgroundworker services as the master. The other Backgroundworkers are slaves.
  3. The master sets all Commands to a specific value inorder to inform the slaves what to do. Each slave investigate its own Command inorder to know what to do using either M2, M3 or M4. For example Backgroundworker 2 uses STATE(2)
  4. When finished each slave clears its own Command.
  5. Finally for the master this is an indication that one iteration is finished and that the next one can start.
By following this strategy there is no chance of race conditions.


Wikipedia

In Wikipedia the subjects: Multi Processors, Parallel Programming, Threads and Threading are discussed in different documents.

Parallel programming requires:

  1. First that each branch uses it own set of data.
  2. Secondly you need some form of synchronization between the branches.
Please read the following documnent Visual Basic 2010 Parallel Programming to study how parallel programming is implemented using Visual Basic 2010

1. Wikipedia Parallel computing

In the paragragph: Race conditions, mutual exclusion, synchronization, and parallel slowdown
is written:
Threads will often need to update some variable that is shared between them.
Why do you want to do that? One example could be that both are updating the same counter. That means you have two branches and in each branch you want to perform the statement: COUNT = COUNT + 1. The problem is here that you do want to perform both programs simultaneous except this statement. Next is written:
This is known as a race condition.
The race condition arises because the macro statement COUNT = COUNT + 1 in reality are three micro statements. (One get, One add and one put)
Next is written
The programmer must use a lock to provide mutual exclusion. A lock is a programming language construct that allows one thread to take control of a variable and prevent other threads from reading or writing it, until that variable is unlocked.
That is true but the text does not explain how this is done in detail. How do you communicate between processors ?
In the next line is explained that program A performs the statement "1A Lock variable V" and program B the statement: "1B Lock variable V"
How do you know that both programs don't try to execute the same statements at the same time ? You don't know or it is not obvious that this does not happen. Next is written:
One thread will successfully lock variable V (1), while the other thread will be locked out—unable to proceed until V is unlocked again. This guarantees correct execution of the program.(2)
(1) Of course that is what you hope, but this is not that simple. Any sucessfull strategy requires that the two programs are not identical. One strategy is different priorities. A different strategy is a master slave concept which is used in the VB 2010 document.
(2) is correct if (1) is correct, but how do you guarantee (1) ?
IMO the whole concept of locking variables is not very practical and should not be used.
Next is written:
Many parallel programs require that their subtasks act in synchrony. This requires the use of a barrier. Barriers are typically implemented using a software lock. One class of algorithms, known as lock-free and wait-free algorithms, altogether avoids the use of locks and barriers. However, this approach is generally difficult to implement and requires correctly designed data structures.
Synchronisation is a very important issue if you want to use parellel programming in order to solve one application. When you follow the approach outlined in the VB 2010 document you will see that this is not difficult to implement using Visual Basic 2010 and the BackGroundworker

2. Wikipedia Hyper-threading

In the first paragragh is written:
For each processor core that is physically present, the operating system addresses two virtual processors, and shares the workload between them when possible.
That is correct. In a 1 Core/2 Thread CPU the "2 Threads" mean two virtual processors.
Next is written:
Intel recommends disabling HTT when using operating systems that have not been optimized for this chip feature. [1]
It is interesting that Intel mentions this. I want that. But if I ask to Microsoft how to do that there is no response. See: What is the reason that a pentium 4 3.0 behaves as a 1Core/2Thread

In the paragragh called: 3 Performance
is written:

The advantages of hyper-threading are listed as: (1) improved support for multi-threaded code, (2) allowing multiple threads to run simultaneously, (3) improved reaction and response time.
(1) This is only an advantage if the use of multi-threaded code has an advantage. What exactly is multi-threaded code ?
(2) This is only an advantage if "multiple threads to run simultaneous" have an advantage. That this is allowed is not directly an advantage.
(3) Reaction time is only an issue if your application requires a lot of man machine interaction. Response time has to do with duration and that is most often a function of the clock speed.
Next is written.
Intel claims up to a 30% performance improvement compared with an otherwise identical, non-simultaneous multithreading Pentium 4.
My test do not support this claim: CPU performance P4 2.8 versus P4 3.0 Next:
Tomshardware.com states "In Some Cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz without HT turned on".
Same comment. This is maybe only true if the duration of the test is less than 1 minute. Next:
The performance improvement seen is very application-dependent, however when running two programs that require full attention of the processor it can actually seem like one or both of the programs slows down slightly when Hyper Threading Technology is turned on.
I do not call a 50% decrease in performance slightly.

3. Wikipedia Simultaneous multithreading

In the first paragraph is written:
SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
That is maybe true, but this is no guarantee for better overall performance, compared with a single core / one thread CPU with the same clock speed.

In the paragraph: 2.5 Disadvantage is written:

Simultaneous multithreading cannot improve performance if any of the shared resources are limiting bottlenecks for the performance. In fact, some applications run slower when simultaneous multithreading is enabled.
This is typical true in a 1C/2T versus a 1C/1T, but the text does not explain that. See also reflection.

4. Wikipedia Multithreading (computer architecture)

This document contains a paragraph: 1.1 Advantages
The question is if those advantages are applicable related to Visual Basic 2010. I doubt this. It should be remembered that you should compare a 1C/2T with a 1C/1T or a 2C/4T with a 2C/2T and explain the advantages strictly based on an increase in Virtual Processors (threads) not on Physical Processors (cores).

In the paragraph: 1.2 Disadvantages is written:

Hardware support for multithreading is more visible to software, thus requiring more changes to both application programs and operating systems than Multiprocessing.
This gives the wrong impression particular because of the comparison with Multiprocessing. Both require in order to solve one physical application parallel programming and synchronization. The claim is only true in the case of a single core.
In the next sentence we read:
Intel claims etc while a program performing a loop of non-optimized dependent floating-point operations actually gains a 100 percent speed improvement when run in parallel.
I doubt that. The question is also here do they compare a 1C/1T with a 1C/2T.
I have done a similar test with the program VStest1. See CPU Performance and CPU Performance Pentium 4. What those tests show on a i5, which is an 2C/4T, that the total performance increase with a factor 2 strictly comes because this is a 2 Core and not because this is a 4 Thread.

5. Wikipedia Multi-core processor

In the paragragh called: 2 Development
is written:
Many applications are better suited to thread level parallelism (TLP) methods,
I doubt if that is true, specific in the sense of Virtual Processors. What is true that certain applications could benefit from multi core processors specific if they do not require parallel programming, that means if they can be divided into two parts which are independent.

In the paragraph: 2.3 Advantage is written:

The largest boost in performance will likely be noticed in improved response-time while running CPU-intensive processes, like antivirus scans, ripping/burning media (requiring file conversion), or file searching.
This is not true. Antivirus scans and file searching are both applications which are highly IO bound. If you excute them in parallel each will heavily influence the performance of the other because they have to wait. The most benefit could come from applications which use the Arithmatic Unit for 100%.

In the paragraph: 2.4 Disadvantage is written:

Also, the ability of multi-core processors to increase application performance depends on the use of multiple threads within applications.
In stead of threads they should have written: independent branches

Reflection

IMO the general message about the advantage of threading i.e. virtual processors is much too optimistic.
All documents do not make a clear distinction between parallel computing on a CPU which the same number of cores and threads versus an architecture where the number of threads is twice the number of cores. Simpler is to say: a CPU which does not versus which supports the virtual processor concept. For mathematical applications the virtual processor concept is clearly a disadvantage.
Microsoft and Intel together should supply the tools to disable virtual processors. If it is possible to produce the same CPU's without virtual processors but with higher clock speeds than that is even much better.


If you want to give a comment you can use the following form Comment form
Created: 22 Februari 2011

Back to my home page Contents of This Document