CPU Performance Evaluation - Visual Basic 2010 Benchmark

Question 1: Which Single Core CPU has the best performance?
Question 2: Which Dual Core CPU has the best performance?
Question 3: Which Quad Core CPU has the best performance?
Question 4: Is it worthwhile to use parallel programming in simulations?
Question 5: What should a benchmark test?
Question 6: What is the best CPU architecture?
Question 7: Is Amdahl's Law true?


Purpose

If you study this homepage you will see that one central theme is computer simulations of physical systems, specifically the planets around the Sun. The programs are written in Quick Basic and Visual Basic. Some programs are written using Excel, which in practice also means Visual Basic.
Recently I bought a new PC. To test it I ran a planet simulation, Planet3Dsimple, and I discovered that the performance was lower than on my previous PC. Planet3Dsimple is written in Visual Basic 5.0.
To understand this better I tested the same program on 5 different CPUs. The program tested has a processor load of 100%. The results are as follows:

  1. Intel Core i5 M 460 @ 2.53 GHz. This one has a CPU Mark of 2539. Evaluation 7 planets: 5.2
  2. Intel Pentium 4 2.80 GHz. This one has a CPU Mark of 415. Evaluation 7 planets: 8.38
  3. Intel Pentium 4 3.00 GHz. This one has a CPU Mark of 491. Evaluation 7 planets: Start value 9.5 and drops.
  4. GenuineIntel X86 Family 6 Model 8 Stepping 3. Evaluation 7 planets: 2.54
  5. GenuineIntel Pentium(r) II processor intel MMX(TM) technology. Evaluation 7 planets: 1.44
CPU 1 is a Dual core and supports 4 Threads. This means that the same program can be executed 4 times in order to reach 100% load. If you load the program 1, 2, 3 or 4 times the average performance figures are: 5.2, 4.8, 4.1 and 3.6.
CPU 2 is a Single core and supports 1 Thread.
CPU 3 is a Single core and supports 2 Threads.
The results of the test show that CPU 2 has the best performance for solving a single problem. The problem with CPU 3 is that the performance is not constant. For more detail see: Benchmark Pentium 4. For CPU 1 the performance is not as expected and decreases if more load is added.
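
The load figures for CPU 1 can be turned into a combined figure by multiplying each per-copy average by the number of copies. The sketch below is only an illustration: it assumes that the evaluation figure is a rate where higher is better (the text treats 8.38 as better than 5.2) and that all copies are identical. The names per_copy and combined_throughput are made up for this example and do not come from Planet3Dsimple.

```python
# Average evaluation figure per running copy of Planet3Dsimple on CPU 1,
# indexed by the number of copies loaded (values from the text above).
per_copy = {1: 5.2, 2: 4.8, 3: 4.1, 4: 3.6}


def combined_throughput(per_copy_figures):
    """Return {copies: copies * average figure}, assuming the figure is a rate."""
    return {n: round(n * avg, 2) for n, avg in per_copy_figures.items()}


totals = combined_throughput(per_copy)
for n, total in totals.items():
    print(f"{n} copies: per copy {per_copy[n]}, combined {total}")
```

Under that assumption the combined figure rises from 5.2 with one copy to 14.4 with four copies, although each individual copy slows down.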

For a detailed evaluation of Visual Basic 2010 go here: Visual Basic 2010 Evaluation


Parallel Programming

For an introduction read this: Visual Basic 2010 Parallel Programming techniques
The next step is to write a Visual Basic program and use parallel programming.
Visual Basic 2010 (Visual Studio) supports parallel programming and parallel threading.
To get more information about threading in general read this Microsoft MSDN document:
Threads and Threading, and on Wikipedia: OpenMP - Open Multi-Processing
More detailed information about Multi Programming and Threading including advantages and disadvantages: Multi processors and Threads

In order to support parallel programming Visual Basic 2010 uses what is called a BackgroundWorker. Each BackgroundWorker is equivalent to a thread and can run on a different processor. To get more information about what a BackgroundWorker is, read this Microsoft MSDN document: BackgroundWorker.
The document describes two examples.

The BackgroundWorker program example is operated via what is called a Form. The Form contains two buttons: a Start button and a Cancel button.
The BackgroundWorker program example consists of 4 Event Handlers (EH) and one BackgroundWorker. The example shows that in order to control one thread you need 4 event handlers: one to start, one to stop, one to monitor and one to show the results. The example creates one thread which generates the numbers 10, 20 .. 100 and is then finished. The example effectively uses only one processor. This is not an example of parallel programming.
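
The structure of such a worker (start, cancel, progress and completion) can be sketched in Python. This is an analogy using Python's threading module, not the .NET BackgroundWorker API and not the MSDN example itself. The names do_work and run_worker are hypothetical, and the short sleep is an assumed stand-in for real work.

```python
import queue
import threading
import time


def do_work(progress, cancel):
    """Worker body: report 10, 20 .. 100 unless cancellation is requested."""
    for percent in range(10, 101, 10):
        if cancel.is_set():          # role of the Cancel button handler
            return "cancelled"
        time.sleep(0.01)             # assumed stand-in for real work
        progress.put(percent)        # role of the progress handler
    return "done"


def run_worker():
    """Start one worker thread, wait for it and collect what it reported."""
    progress = queue.Queue()
    cancel = threading.Event()
    result = {}

    def target():
        result["status"] = do_work(progress, cancel)

    worker = threading.Thread(target=target)   # role of the Start button handler
    worker.start()
    worker.join()                               # role of the completed handler
    reported = []
    while not progress.empty():
        reported.append(progress.get())
    return result["status"], reported


status, reported = run_worker()
print(status, reported)
```

As in the original example there is only one worker thread, so this sketch is not parallel programming either.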


Test 1: VStest1 CPU 1

In order to test CPU 1 I started with the following two programs: VStest1 and VStest2.
The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other. In fact you are solving 4 independent problems. In VStest2 this is not the case: there the calculations are linked to make it one problem. See below in this document.

To get a copy select VB2010 exe.zip.

When you try to execute the "VB 2010" programs, the following error message may be displayed:
"To run this application, you first must install one of the following versions of the .NET Framework: v4.0.30319".
You can get this download from: Microsoft .NET Framework 4 (Web Installer) . In my case this happened with CPU 3. For a comparison between CPU 2 and CPU 3 go to: Benchmark Pentium 4 3.0. The fact that CPU 3 is slow in some tests is not caused by this download.

In order to write VStest1 the following tasks were performed:

  1. The Backgroundworker Dowork, the ProgressChanged EH and the RunWorkCompleted EH were each copied 4 times to manage 4 threads. This allows you to test 4 Processors simultaneously.
  2. The StartAsyncButton EH was modified to start a new thread each time the Start Button is pressed.
  3. The same was done for the CancelAsyncButton EH in order to terminate a thread each time the Cancel Button is pressed.
  4. A parameter np (# of processors) was created to monitor the number of active processors.
  5. The central calculation of the BackGroundworker Dowork was modified to perform a 10 by 10 matrix operation using two parameters j (inner loop) and i (outer loop). The whole matrix operation can be executed a certain number of times, which is called the load factor. This whole calculation is called one cycle. The total number of cycles per second is calculated and that gives a performance number.
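
The measurement described in step 5 can be sketched as follows. This is a minimal illustration in Python, not the VStest1 listing: the actual matrix operation is not given in the text, so the arithmetic inside the loops is an assumption, and the names one_cycle and cycles_per_second are hypothetical.

```python
import time

N = 10  # matrix size used in the text


def one_cycle(load_factor):
    """Run the 10 by 10 matrix operation load_factor times; return a checksum."""
    checksum = 0.0
    for _ in range(load_factor):
        for i in range(N):          # outer loop
            for j in range(N):      # inner loop
                checksum += (i + 1) * (j + 1) * 0.5   # assumed operation
    return checksum


def cycles_per_second(load_factor, duration=0.2):
    """Count how many full cycles complete in roughly `duration` seconds."""
    cycles = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        one_cycle(load_factor)
        cycles += 1
    elapsed = time.perf_counter() - start
    return cycles / elapsed


print("load factor 1:   ", round(cycles_per_second(1)))
print("load factor 1000:", round(cycles_per_second(1000)))
```

A higher load factor makes each cycle longer, so the number of cycles per second drops, which is the same pattern as in the results with load factors 1 and 1000.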

Operation VStest1

When you start the program you get a control form which consists of 2 parts.


Test results - VStest1

  1. For a load factor of 1 and by selecting the Start Button 4 times the individual performance factors for processor 1 are: 76453, 70368, 50582 and 38816.
    What those numbers mean is that the more processors you use (each with a different problem), the more the performance of processor 1 diminishes.
    This is maybe not what you are expecting, because the programs in each thread (processor) are independent of each other.
  2. For a load factor of 1 and by selecting the Start Button 4 times the total performance factors are: 76453, 141270, 155483 and 160997.
    What those numbers mean is that when you use more processors (each with a different problem), the total CPU performance does not increase linearly.
    Each total is roughly the number of threads times the performance of processor 1: 2 * 70368 = 140736, 3 * 50582 = 151746 and 4 * 38816 = 155264.
  3. For a load factor of 1000 and by selecting the Start button 4 times the individual performance factors for processor 1 are: 101, 90, 68 and 44
  4. For a load factor of 1000 and by selecting the Start Button 4 times the total performance factors are: 101, 181, 200 and 204.
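
How far the totals are from linear scaling can be checked by dividing each total by the single-thread total. The sketch below uses only the numbers from the list above; the function name scaling is made up for this illustration.

```python
# Total performance factors for VStest1 on CPU 1 (values from the list above),
# for 1, 2, 3 and 4 active threads, keyed by load factor.
totals = {
    1: [76453, 141270, 155483, 160997],
    1000: [101, 181, 200, 204],
}


def scaling(values):
    """Return each total divided by the single-thread total."""
    base = values[0]
    return [round(v / base, 2) for v in values]


for load_factor, values in totals.items():
    print(load_factor, scaling(values))
```

With linear scaling the last value would be 4.0. For both load factors it stays close to 2, which is the number of cores of CPU 1.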

fotoVStest1.1

VStest1 - Load factor 1

fotoVStest1.1000

VStest1 - Load factor 1000

VStest1 requires the following:

Listings VStest1: VStest1Form.vb and VStest1Form.Designer.vb


Test 2: VStest2 CPU 1

The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other. In fact you are solving 4 independent problems.
However, that is not what we want. What we want is to divide the problem into parts (equal to the number of processors available) and to solve the whole problem, subdivided into parts, in parallel, with one part in each processor. This requires extra communication and synchronisation compared to VStest1.
                                      np--------------------      
                                      |                     V     
 StartAsyncPB EH------                |          ------> Doworkpp2
                      |               |         |           .     
                      V               |         V           V     
CancelAsyncPB EH---> npreq --> Doworkpp1 <--> state() <--> Doworkpp3
                      ^                         ^           .     
                      |                         |           V     
   EndAsyncPB EH------                           ------> Doworkpp4
 
                 Figure 1  - Parallel Communication               
In order to do that the following modifications were made: To learn more about parallel programming go to this document:
Visual Basic 2010 Parallel Programming
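
The idea of dividing one problem into parts and synchronising the parts after every cycle can be sketched in Python. This is a simplified illustration and not the VStest2 code: it uses a barrier instead of the state() array of Figure 1, the work (a sum of squares) is an assumption, and the name parallel_sum_of_squares is hypothetical. In standard CPython the threads do not actually execute this calculation on several cores at the same time, so the sketch shows the synchronisation structure only, not a speed-up.

```python
import threading


def parallel_sum_of_squares(data, num_workers, cycles=3):
    """Split `data` into num_workers parts; all workers sync after every cycle."""
    partial = [0] * num_workers                 # one result slot per worker
    barrier = threading.Barrier(num_workers)    # the synchronisation point
    chunk = (len(data) + num_workers - 1) // num_workers

    def work(index):
        part = data[index * chunk:(index + 1) * chunk]
        for _ in range(cycles):
            partial[index] = sum(x * x for x in part)
            barrier.wait()                      # wait until every part is done

    threads = [threading.Thread(target=work, args=(k,)) for k in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


print(parallel_sum_of_squares(list(range(100)), 4))
```

Every call to barrier.wait() is time in which a fast worker waits for the slowest one. That waiting is the extra synchronisation cost which VStest1, with its independent problems, does not have.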

Operation VStest2

When you start the program you get a control form which consists of 2 parts.


Test results - VStest2

  1. For a load factor of 1 and by selecting the Start Button 4 times the performance factors are respectively: 39341, 49295, 36251 and 33179.
    What those numbers mean is that when you use more processors (to solve one problem), the optimum performance is reached with two processors.
  2. For a load factor of 1000 and by selecting the Start button 4 times the performance factors are respectively: 46, 74, 70 and 79.
  3. For a load factor of 5000 and by selecting the Start button 4 times the performance factors are respectively: 9, 15, 16 and 16.
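
The thread count with the highest figure, and the gain compared to one thread, can be read from the three result rows with a few lines of Python. The sketch uses only the numbers listed above; the function name best_thread_count is made up, and the rule that ties go to the lower thread count is an assumption of this example.

```python
# VStest2 on CPU 1: performance factors for 1, 2, 3 and 4 threads,
# keyed by load factor (values from the list above).
vstest2_cpu1 = {
    1: [39341, 49295, 36251, 33179],
    1000: [46, 74, 70, 79],
    5000: [9, 15, 16, 16],
}


def best_thread_count(values):
    """Return (threads, gain over one thread); ties go to fewer threads."""
    best_index = max(range(len(values)), key=lambda k: (values[k], -k))
    return best_index + 1, round(values[best_index] / values[0], 2)


for load_factor, values in vstest2_cpu1.items():
    print(load_factor, best_thread_count(values))
```

For load factor 1 the best result is with 2 threads. For the higher load factors a third or fourth thread gives the highest figure, but the gain beyond 2 threads is small.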

fotoVStest2.1

VStest2 - Load factor 1

fotoVStest2.1000

VStest2 - Load factor 1000

VStest2 requires the following:

Listings VStest2: VStest2Form.vb and VStest2Form.Designer.vb


VStest1 and VStest2 with CPU 2

Following are the results with VStest1 and CPU 2:
  1. For a load factor of 1 and by selecting the Start Button 4 times the individual performance factors are: 110591, 55937, 38425 and 28944
    What the number 55937 means is that if you start a second thread the performance of the first decreases by roughly 50%.
    This is as expected because 50% of the load goes to thread 1 and 50% to thread 2.
  2. For a load factor of 1 and by selecting the Start Button 4 times the total performance factors are: 110591, 110633, 112120 and 113443
    What those numbers mean is that if you start more threads the total performance stays constant.
    This is as expected because CPU 2 has only one processor.
  3. For a load factor of 1000 and by selecting the Start Button 4 times the individual performance factors are respectively: 174, 88, 59 and 47
  4. For a load factor of 1000 and by selecting the Start Button 4 times the total performance factors are respectively: 174, 178, 178 and 188
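
The expectation that one processor is shared equally between the threads can be compared with the measured individual figures for VStest1. The sketch below is an illustration using the load factor 1 numbers from the list above; the equal-sharing model and the function names are assumptions of this example.

```python
# VStest1 on CPU 2, load factor 1: individual figures for 1, 2, 3 and 4 threads.
measured = [110591, 55937, 38425, 28944]


def expected_share(single_thread_figure, threads):
    """Ideal per-thread figure when one processor is shared equally."""
    return single_thread_figure / threads


def measured_over_expected(values):
    """Ratio of each measured figure to the equal-sharing expectation."""
    return [
        round(value / expected_share(values[0], n), 2)
        for n, value in enumerate(values, start=1)
    ]


print(measured_over_expected(measured))
```

All ratios are within about 5% of 1, which fits the observation that on a single processor the total performance stays constant.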
Following are the results with VStest2 and CPU 2:
  1. For a load factor of 1 and by selecting the Start Button once the performance factor is approximately: 55384
  2. For a load factor of 1000 and by selecting the Start Button once the performance factor is: 67
  3. For a load factor of 5000 and by selecting the Start Button once the performance factor is: 14

fotoVStest1.CPU2.1

VStest1 - Load factor 1

fotoVStest1.CPU2.1000

VStest1 - Load factor 1000

fotoVStest1.CPU2

VStest2 - Load factor 1 and 1000


Answer Question 4 - Parallel programming simulations

The results of the tests show that with a Dual Core CPU with four threads and with parallel programming, the performance is best when 2 virtual processors are used for simulations of physical systems. With 3 or more virtual processors the performance does not improve.
The main reason is that the extra performance of a third virtual processor is cancelled by a decrease in performance of the first two processors.
On the other hand the overall performance is less than that of a single core CPU of 2.8 GHz. See also
Reflection part 2


Answer Question 5 - Benchmark

The Benchmark number of 2539 for CPU 1 (Dual core, 4 threads) is much too high as an indication of performance compared to the 415 of CPU 2.
What you need is at least a Benchmark number that represents the performance of the CPU in a benchmark where the program solves one problem and uses all the threads simultaneously.
Based on the results of my evaluation the Benchmark number for CPU 1 (2C/4T) should be roughly 460, compared to the Benchmark number of 415 for CPU 2.


Answer Question 6 - CPU architecture

The best CPU architecture is the CPU:
  1. with the highest clock speed (3.2 GHz ?)
  2. and with the number of threads equal to the number of cores.

Answer Question 7 - Amdahl's Law

Amdahl's Law states that the speed-up gained from parallel programming is limited by the part of the program that cannot run in parallel, so the gain per processor decreases as the number of processors increases. See: Amdahl's law and Parallel computing

This claim is not correct for the CPUs tested. The performance decreases for two reasons:

  1. first as a function of the number of programs (processors and threads) that use parallel programming, that is, that employ some form of synchronisation. These are the programs VSTEST2, PlanetPP and FibonacciPP.
  2. secondly as a function of the total number of programs which are executed simultaneously.
Amdahl's Law only describes the first reason, not the second. The second is demonstrated by the programs VSTEST1 and Planets3Dsimple. Both programs demonstrate that the more programs are loaded, the more the performance of the programs already running degrades. The first program loaded is affected the worst.
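
Amdahl's Law in its usual form, S(N) = 1 / ((1 - p) + p / N), can be compared with the VStest2 results for CPU 1. The sketch below is an illustration under assumptions: it estimates the parallel fraction p from the 2-thread result for load factor 5000 (9 with one thread, 15 with two) and then predicts the 4-thread result. The function names are made up, and treating the 4 threads of a Dual core as 4 processors is itself an assumption.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def parallel_fraction_from(speedup, processors):
    """Invert Amdahl's law for p, given a measured speed-up on N processors."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# VStest2 on CPU 1, load factor 5000: 9 (1 thread), 15 (2 threads), 16 (4 threads).
p = parallel_fraction_from(15 / 9, 2)
predicted_4 = amdahl_speedup(p, 4)
measured_4 = 16 / 9

print(f"estimated parallel fraction: {p:.2f}")
print(f"predicted speed-up with 4 threads: {predicted_4:.2f}")
print(f"measured speed-up with 4 threads:  {measured_4:.2f}")
```

Under these assumptions the law alone predicts a speed-up of about 2.5 with 4 threads, while the measured value is about 1.78. That gap is in line with the second reason above, which the law does not describe.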


Reflection - part 1

  1. The 4 results for single processor applications for CPU 1 are: 76453, 101, 39341 and 46. (The first two are from VStest1 and the last two from VStest2.)
    The 4 corresponding results for single processor applications for CPU 2 are: 110591, 174, 55384 and 67.
    When you compare those results CPU 2 (available since at least 2003) is much better.
  2. The 4 results for CPU 1 performing VStest2 with a load of 5000 are: 9, 15, 16 and 16. The comparable number for CPU 2 is 14.
    What that means is that CPU 1 is only slightly more powerful than CPU 2 for very high load situations when parallel processing is used.
  3. Those same tests also show that parallel programming only makes sense for a maximum of two processors. With 3 and 4 processors the performance does not improve and for lower load factors even decreases. However, and that is important, the overall performance is almost equal to that of the (single core) CPU 2.
  4. The only case in which CPU 1 is really more powerful than CPU 2 is when you use each processor for a different application, that is, when the 4 applications are independent of each other and do not require any form of synchronisation.
    For VStest1 these are the total numbers of 160997 and 137 on CPU 1 versus 84398 and 122 on CPU 2.
  5. See also: Are we reaching the maximum of CPU performance. This is a Usenet discussion in sci.physics.research.


Reflection - part 2

3 single Core CPUs were tested and 1 Dual Core CPU with 4 threads (2C/4T).
That does not mean that there are no CPUs that perform the three tests (VStest1, VStest2 and Planet) better.
For the single Core, most probably the AMD Sempron 145, which has a CPU mark of 916, performs better.

When you compare the test results for VStest1, the performance is 76453 with 1 thread and 160997 with 4 threads. That is roughly a factor 2 better, which is the same as the number of cores.
This leads to the surprising conclusion that for scientific applications maybe a 2 Core and 2 Thread Intel CPU is much more practical than a 2C/4T. The same may apply for a 4 Core/4 Thread versus a 4C/8T and a 6 Core/6 Thread versus a 6C/12T.
In science what you want is to solve a single problem using your PC for 100%.

For possible examples of CPUs with the same number of Cores and Threads see here:
  1. Intel® Core™2 Extreme Desktop Processor Family : X6800 Benchmark 1831 (2C/2T), Q6800 Benchmark 3642 and X9775 Benchmark 3810 (4C/4T)
  2. Intel® Core™2 Quad Processor Q8000 Series : Q8200 Benchmark 3261, Q8400 Benchmark 3664 and Q9650 Benchmark 4604 All: 4C/4T
  3. Intel® Xeon® Processor Family : X7542 Benchmark ? , X7460 Benchmark 18304 and L7455 All: 6C/6T

It would be very interesting to learn about the results of tests with CPUs that have the same number of Cores and Threads, using multiprocessing, i.e. VStest2 or Planet1.


Feedback

None


E-mail:nicvroom@pandora.be.

Created: 1 December 2010
Updated 1 January 2011
Updated 21 February 2011
