CPU Performance Evaluation - Visual Basic 2010 Benchmark
Question 1: Which single-core CPU has the best performance?
Question 2: Which dual-core CPU has the best performance?
Question 3: Which quad-core CPU has the best performance?
Question 4: Is it worthwhile to use parallel programming in simulations?
Question 5: What should a benchmark test?
Question 6: What is the best CPU architecture?
Question 7: Is Amdahl's Law true?
Purpose
If you study this homepage you will see that one central theme is computer simulation of physical systems, specifically the planets around the Sun. The programs are written in Quick Basic and Visual Basic. Some programs are written using Excel, which in practice also means Visual Basic.
Recently I bought a new PC. To test it, I ran a planet simulation, Planet3Dsimple, and discovered that the performance was lower than on my previous PC. Planet3Dsimple is written in Visual Basic 5.0.
To understand this better, I tested the same program on 5 different CPUs. The program puts a 100% load on the processor. These are the results:
- CPU 1: Intel Core i5 M 460 @ 2.53 GHz. This one has a CPU Mark of 2539. Evaluation 7 planets: 5.2
- CPU 2: Intel Pentium 4 2.80 GHz. This one has a CPU Mark of 415. Evaluation 7 planets: 8.38
- CPU 3: Intel Pentium 4 3.00 GHz. This one has a CPU Mark of 491. Evaluation 7 planets: starts at 9.5 and then drops.
- CPU 4: GenuineIntel X86 Family 6 Model 8 Stepping 3. Evaluation 7 planets: 2.54
- CPU 5: GenuineIntel Pentium(r) II processor Intel MMX(TM) technology. Evaluation 7 planets: 1.44
CPU 1 is a dual core and supports 4 threads. This means that the same program must be run 4 times to reach 100% load. If you load the program 1, 2, 3 or 4 times, the average performance figures are 5.2, 4.8, 4.1 and 3.6 respectively.
CPU 2 is a Single core and supports 1 Thread.
CPU 3 is a Single core and supports 2 Threads.
The results show that CPU 2 has the best performance for solving a single problem. The problem with CPU 3 is that its performance is not constant; for more detail see: Benchmark Pentium 4. For CPU 1 the performance is lower than expected and decreases as more load is added.
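As a rough illustration of the CPU 1 figures quoted above (5.2, 4.8, 4.1 and 3.6), the sketch below multiplies each average score by the number of running copies to estimate the total work rate. It assumes these figures are per-copy averages and that the total is simply their sum; both are interpretations, not something the measurements themselves state.

```python
# Average per-copy scores for Planet3Dsimple on CPU 1 with 1 to 4 copies running
# (values from the text; treating them as per-copy averages is an assumption).
avg_per_instance = {1: 5.2, 2: 4.8, 3: 4.1, 4: 3.6}


def total_throughput(scores):
    """Estimated total work rate: number of copies x average score per copy."""
    return {n: n * score for n, score in scores.items()}


totals = total_throughput(avg_per_instance)
for n, total in totals.items():
    print(f"{n} instance(s): per instance {avg_per_instance[n]:.1f}, total {total:.1f}")
```

Under these assumptions the total rises to about 14.4 with four copies while each copy slows from 5.2 to 3.6, which is why a single simulation sees the lower figure.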
For a detailed evaluation of Visual Basic 2010 go here: Visual Basic 2010 Evaluation
Parallel Programming
For an introduction read this: Visual Basic 2010 Parallel Programming techniques
The next step is to write a Visual Basic program and use parallel programming.
Visual Basic 2010 (Visual Studio) supports parallel programming and parallel threading.
To get more information about threading in general, read this Microsoft MSDN document:
- Threads and Threading, and on Wikipedia: OpenMP - Open Multi-Processing
More detailed information about Multi Programming and Threading including advantages and disadvantages: Multi processors and Threads
To support parallel programming, the Visual Basic 2010 programs described here use what is called a Backgroundworker. Each Backgroundworker runs its work on a separate thread; which processor executes that thread is decided by the operating system. For more information about what a Backgroundworker is, read this Microsoft MSDN document: BackGroundWorker.
The document describes two examples.
- The first example (Called Backgroundworker) is used as a starting point or frame for the applications VStest1 and VStest2 described in the next paragraph.
- The second example (Called Fibonacci) is used as a starting point or frame for PlanetPP described in: CPU performance - part 2
The Backgroundworker program example is operated via what is called a Form. The Form contains two buttons: A Start button and a Cancel Button.
The Backgroundworker program example consists of 4 Event Handlers and one BackGroundworker.
- StartAsyncButton EH. This Event Handler is linked to the Start Button and Starts the execution of BackGroundworker.
- CancelAsyncButton EH. This Event Handler is linked to the Cancel Button and Ends the execution of the BackGroundworker.
- Dowork. This is the work routine of the BackGroundworker, where the actual calculations (of what you want to calculate or simulate) are done.
- ProgressChanged EH. This event handler updates (when activated) the Form with progress information of the program.
- RunWorkerCompleted EH. This event handler shows the final results of the BackGroundworker at the end.
The BackGroundworker example shows that to control one thread you need 4 event handlers: one to start, one to stop, one to monitor and one to show the results.
The Backgroundworker program example creates one thread, which generates the numbers 10, 20, ..., 100 and then finishes. The example effectively uses only one processor, so it is not an example of parallel programming.
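The start/cancel/progress/completed structure of the Backgroundworker example can be sketched outside .NET. The Python class below is a hypothetical analogue, not the .NET BackgroundWorker API: start, cancel, on_progress and on_completed are illustrative names standing in for the StartAsyncButton EH, the CancelAsyncButton EH, the ProgressChanged EH and the RunWorkerCompleted EH. Like the MSDN example, it runs one worker thread that produces 10, 20, ..., 100.

```python
import threading


class Worker:
    """Hypothetical analogue of the Backgroundworker example (not the .NET API)."""

    def __init__(self, on_progress, on_completed):
        self.on_progress = on_progress    # role of the ProgressChanged EH
        self.on_completed = on_completed  # role of the RunWorkerCompleted EH
        self._cancel = threading.Event()  # set by cancel()
        self._thread = None

    def start(self):
        """Role of the StartAsyncButton EH: run the work on a new thread."""
        self._thread = threading.Thread(target=self._do_work)
        self._thread.start()

    def cancel(self):
        """Role of the CancelAsyncButton EH: ask the worker to stop."""
        self._cancel.set()

    def join(self):
        self._thread.join()

    def _do_work(self):
        """Role of Dowork: generate 10, 20, ..., 100, checking for cancellation."""
        produced = []
        for value in range(10, 101, 10):
            if self._cancel.is_set():
                self.on_completed(produced, cancelled=True)
                return
            produced.append(value)
            self.on_progress(value)
        self.on_completed(produced, cancelled=False)


progress = []
outcome = {}
worker = Worker(progress.append,
                lambda produced, cancelled: outcome.update(produced=produced, cancelled=cancelled))
worker.start()
worker.join()
print(progress[-1], outcome["cancelled"])  # 100 False
```

With a single worker thread only one processor does the work, which matches the observation that the example is not parallel programming. Note that in a Windows Forms application the real ProgressChanged and RunWorkerCompleted events are raised on the UI thread, whereas these callbacks run on the worker thread.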
Test 1: VStest1 CPU 1
In order to test CPU 1 I started with the following two programs: VStest1 and VStest2.
The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other: in fact you are solving 4 independent problems. In VStest2 this is not the case; there the calculations are linked to make one problem. See below in this document.
To get a copy select VB2010 exe.zip.
- The zip file contains the files VStest1.EXE and VStest2.EXE, which are both explained in this document.
- It also contains Planets3DSimple.EXE and MSVBVM50.dll. Planets3DSimple is a Visual Basic 5.0 program that simulates the 7 planets; it requires the MSVBVM50.dll file.
- The zip file also contains the programs: planet.exe, planetPP.exe and planets.exe. The purpose of these programs is also to test parallel programming. To get more information see: VB Benchmark part 2
When you try to execute the VB 2010 programs, the following error message may be displayed:
- "To run this application, you first must install one of the following versions of the .NET Framework: v4.0.30319".
You can get this download from: Microsoft .NET Framework 4 (Web Installer) .
In my case this happened with CPU 3. For a comparison between CPU 2 and CPU 3 go to: Benchmark Pentium 4 3.0. The fact that CPU 3 is slow in some tests is not caused by this download.
In order to write VStest1 the following tasks were performed:
- The Backgroundworker Dowork, the ProgressChanged EH and the RunWorkerCompleted EH were each copied 4 times to manage 4 threads. This allows you to test 4 processors simultaneously.
- The StartAsyncButton EH was modified to start a new thread each time the Start button is pressed.
- The same was done for the CancelAsyncButton EH, to terminate a thread each time the Cancel button is pressed.
- A parameter np (# of processors) was created to monitor the number of active processors.
- The central calculation of the Backgroundworker Dowork was modified to perform a 10 by 10 matrix operation using two parameters: j (inner loop) and i (outer loop). The whole matrix operation can be executed a certain number of times, which is called the load factor. This whole calculation is called one cycle. The total number of cycles per second is calculated, and that gives a performance number.
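A minimal sketch of the cycle measurement described above follows. The actual matrix operation in Dowork is not given here, so the multiply-add in one_cycle is a hypothetical stand-in; what the sketch shows is the structure: a 10 by 10 loop nest (i outer, j inner) repeated load times per cycle, and the performance number as cycles per second.

```python
import time

N = 10  # 10 by 10 matrix: i = outer loop, j = inner loop


def one_cycle(load):
    """One cycle: the 10 x 10 loop nest repeated `load` times.
    The multiply-add body is a hypothetical stand-in for the real matrix operation."""
    acc = 0.0
    for _ in range(load):
        for i in range(N):
            for j in range(N):
                acc += i * j * 0.5
    return acc


def cycles_per_second(load, seconds=1.0):
    """Repeat cycles for about `seconds` and return the average cycles per second."""
    count = 0
    start = time.perf_counter()
    while True:
        one_cycle(load)
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            return count / elapsed


print(f"load 1: {cycles_per_second(1, 0.2):.0f} cycles/s")
print(f"load 1000: {cycles_per_second(1000, 0.2):.0f} cycles/s")
```

The absolute numbers from a Python sketch are not comparable with the VB results; only the way the performance number is defined is meant to match.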
Operation VStest1
When you start the program you get a control form which consists of 2 parts.
- The top three lines contain general information about the performance of the program.
- Line 1 contains the active processor numbers (maximum = 4). The fifth box contains the text: Total
- Line 2 contains the performance factor for each thread: the average number of cycles per second. The fifth value is the total of those 4 performance factors.
- Line 3 contains three parameters: # Proc, Load and Count.
- # Proc This parameter shows the number of active threads or processors.
- Load This parameter shows the load factor.
- Count This parameter shows the time in seconds. It is cleared when the number of active threads is changed via the Start or Cancel button.
Only one parameter can be modified by the user: Load. If it is modified, the new value is only used after all threads have been cancelled.
- The middle line contains 3 buttons:
- Start. The purpose of the Start button is to start the program (i.e. the first thread) or to start the next thread.
- Cancel. The purpose of the Cancel button is to terminate one thread.
- End. The purpose of the End button is to terminate all threads. If there are no active threads, the program terminates.
Test results - VStest1
- For a load factor of 1 and by selecting the Start button 4 times, the individual performance factors for processor 1 are: 76453, 70368, 50582 and 38816.
- What those numbers mean is that the more processors you use (each with a different problem), the lower the performance of processor 1.
This is perhaps not what you expect, because the programs in each thread (processor) are independent of each other.
- For a load factor of 1 and by selecting the Start button 4 times, the total performance factors are: 76453, 141270, 155483 and 160997.
- What those numbers mean is that the more processors you use (each with a different problem), the total CPU performance does not increase linearly: each total is roughly the number of threads times the performance of processor 1:
141270 ≈ 2 * 70368, 155483 ≈ 3 * 50582 and 160997 ≈ 4 * 38816.
- For a load factor of 1000 and by selecting the Start button 4 times, the individual performance factors for processor 1 are: 101, 90, 68 and 44.
- For a load factor of 1000 and by selecting the Start button 4 times, the total performance factors are: 101, 181, 200 and 204.
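To put a number on "does not increase linearly", the sketch below computes the speedup S(n) = T(n)/T(1) and the efficiency E(n) = S(n)/n from the VStest1 totals for CPU 1 reported above. These are the standard definitions applied to the published numbers, not an additional measurement.

```python
# Total performance factors from VStest1 on CPU 1 for 1, 2, 3 and 4 active threads,
# at load factor 1 and load factor 1000 (values from the text).
vstest1_totals = {
    1: [76453, 141270, 155483, 160997],
    1000: [101, 181, 200, 204],
}


def scaling(total_by_threads):
    """Return (n, speedup, efficiency) with S(n) = T(n) / T(1) and E(n) = S(n) / n."""
    base = total_by_threads[0]
    rows = []
    for n, total in enumerate(total_by_threads, start=1):
        speedup = total / base
        rows.append((n, speedup, speedup / n))
    return rows


for load, values in vstest1_totals.items():
    for n, s, e in scaling(values):
        print(f"load {load}: {n} thread(s) speedup {s:.2f} efficiency {e:.0%}")
```

For both load factors the speedup with 4 threads ends up at about 2, roughly the number of physical cores of CPU 1, while the efficiency falls to about 50%.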
[Screenshot: VStest1 - Load factor 1]
[Screenshot: VStest1 - Load factor 1000]
VStest1 requires the following:
- 4 BackgroundWorkers: Backgroundworkerpp1, pp2, pp3 and pp4
- 3 Buttons: StartAsyncButton, CancelAsyncButton and EndButton
- 8 Labels: Resultlabelpp1, pp2, pp3, pp4, Label1 (Proc #), Label2 (Performance), Label3 (# proc) and Label4 (Load)
- 6 TextBoxes: ProcTpp1, pp2, pp3, pp4, NprocT and LoadT
Listings VStest1: VStest1Form.vb and VStest1Form.Designer.vb
Test 2: VStest2 CPU 1
The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other: in fact you are solving 4 independent problems.
However, that is not what we want. What we want is to divide the problem into parts (equal to the number of processors available) and to solve the whole problem, subdivided into parts, in parallel, one part per processor. This requires extra communication and synchronisation compared with VStest1.
StartAsyncPB EH  ---+
CancelAsyncPB EH ---+--> npreq --> Doworkpp1 <--> state() <--> Doworkpp2, Doworkpp3, Doworkpp4
EndAsyncPB EH    ---+                  |                           ^
                                       +----------- np ------------+
Figure 1 - Parallel Communication
In order to do that the following modifications were made:
- The major changes are in event handler 1: Doworkpp1. This event handler becomes the master. The other Dowork event handlers become slaves.
- The variable npreq is the requested number of processors. This number is used in the communication between the PB handlers and Dowork event handler 1.
- StartAsyncPB EH increases npreq by 1
- CancelAsyncPB EH decreases npreq by 1
- EndAsyncPB EH sets npreq to -1
- The variable np is the actual number of processors. This number is used in the communication between Dowork event handler 1 and Dowork event handlers 2, 3 and 4. Doworkpp1 calculates this number; the other Dowork event handlers are only allowed to read it.
- Communication between the different Dowork Event Handlers is done by means of the array state(5).
In general, the interaction between processor 1 and processor 2 (and 3) is as follows:
- Processor 1 sets state(2) to 1 (start state), performs its own matrix computations and waits until state(2) is 0 (stop state).
If required, processor 1 does the same with state(3), etc.
- Processor 2 monitors state(2). When state(2) is 1, it performs its own matrix computations and, when finished, sets state(2) to 0 (stop state).
- Processor 3 monitors state(3). When state(3) is 1, it performs its own matrix computations and, when finished, sets state(3) to 0 (stop state).
- The variable pstr (processor start) is 0 for processor 1, 1 for processor 2, 2 for processor 3 and 3 for processor 4.
- Outer loop control, the variable i:
- In program VStest1 the outer loop is calculated by the following statement: For i = 0 to 10
- In program VStest2 the outer loop is calculated with the statement: For i = pstr to 10 step np.
This means that in the case of 4 processors:
- In processor 1, i has the values 0, 4 and 8
- In processor 2, i has the values 1, 5 and 9
- In processor 3, i has the values 2, 6 and 10
- and in processor 4, i has the values 3 and 7
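The strided split of the outer loop can be checked with a few lines. The sketch below reproduces VB's inclusive `For i = pstr To 10 Step np` in Python; it assumes nothing beyond the loop bounds given above.

```python
def outer_loop_values(pstr, np_, last=10):
    """Values of i for VB's `For i = pstr To last Step np`.
    VB's upper bound is inclusive, so range() needs last + 1.
    The trailing underscore in np_ only avoids confusion with the common numpy alias."""
    return list(range(pstr, last + 1, np_))


for pstr in range(4):
    print(f"processor {pstr + 1}: i = {outer_loop_values(pstr, 4)}")
# processor 1: i = [0, 4, 8]
# processor 2: i = [1, 5, 9]
# processor 3: i = [2, 6, 10]
# processor 4: i = [3, 7]
```

Together the four shares cover i = 0 to 10 exactly once, and their sizes (3, 3, 3 and 2 rows) differ by at most one, which keeps the processors roughly equally busy.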
To learn more about parallel programming go to this document:Visual Basic 2010 Parallel Programming
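The start/stop handshake through state() described earlier can be sketched as follows. This is a Python illustration under stated assumptions, not the VStest2 source: do_rows is a hypothetical stand-in for each processor's share of the matrix work, and a lock guards the shared list. In the standard CPython build the GIL prevents the pure-Python arithmetic from actually running in parallel, so the sketch demonstrates the protocol (the master sets state(k) to 1, does its own rows, then waits until every state(k) is 0), not a speedup.

```python
import threading
import time


def do_rows(pstr, np_, load, out):
    """Hypothetical stand-in for one processor's share: rows i = pstr, pstr + np_, ... up to 10."""
    for i in range(pstr, 11, np_):
        acc = 0
        for _ in range(load):
            for j in range(11):
                acc += i * j
        out[i] = acc


def run(np_, cycles, load=1):
    """Processor 1 is the master; processors 2..np_ are slaves polling state[k]."""
    state = [0] * (np_ + 1)   # state[k]: 1 = start state, 0 = stop state
    finished = [False]        # tells the slaves to exit
    lock = threading.Lock()   # guards state and finished
    out = [0] * 11
    slave_ids = range(2, np_ + 1)

    def slave(k):
        while True:
            with lock:
                if finished[0]:
                    return
                go = state[k] == 1
            if go:
                do_rows(k - 1, np_, load, out)  # pstr = k - 1
                with lock:
                    state[k] = 0                # report: stop state
            else:
                time.sleep(0)                   # yield while polling

    threads = [threading.Thread(target=slave, args=(k,)) for k in slave_ids]
    for t in threads:
        t.start()
    for _ in range(cycles):
        with lock:
            for k in slave_ids:
                state[k] = 1                    # start every slave
        do_rows(0, np_, load, out)              # master's own rows, pstr = 0
        while True:                             # wait until all slaves are back to 0
            with lock:
                if all(state[k] == 0 for k in slave_ids):
                    break
            time.sleep(0)
    with lock:
        finished[0] = True
    for t in threads:
        t.join()
    return out


print(run(4, cycles=3))  # [0, 55, 110, 165, 220, 275, 330, 385, 440, 495, 550]
```

Because every slave only writes the rows of its own share, the result is the same for any number of processors; only the synchronisation overhead changes.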
Operation VStest2
When you start the program you get a control form which consists of 2 parts.
- The top four lines contain general information about the performance of the program.
- Line 1 contains communication values for each thread. To see those values you have to click the Monitor button.
- Line 2 contains the active processor numbers (maximum = 4). The fifth box contains the text: Total
- Line 3 contains the performance factor for each thread: the average number of cycles per second. The fifth value is the total of those 4 performance factors.
- Line 4 contains three parameters: # Proc, Load and Count.
- # Proc This parameter shows the number of active threads or processors.
- Load This parameter shows the load factor.
- Count This parameter shows the time in seconds. It is cleared when the number of active threads is changed via the Start or Cancel button.
Only one parameter can be modified by the user: Load. If it is modified, the new value is only used after all threads have been cancelled.
- The middle line contains 4 buttons:
- Start. The purpose of the Start button EH is to start the BackGroundworker (i.e. the first thread) or to start the next thread.
- Cancel. The purpose of the Cancel button EH is to terminate one BackGroundworker or thread.
- End. The purpose of the End button EH is to terminate all BackGroundworkers (threads). If there are no active threads, the application ends.
- Monitor. The purpose of the Monitor button EH is to monitor the communication between thread 1 (the master) and the other threads (the slaves).
Test results - VStest2
- For a load factor of 1 and by selecting the Start button 4 times, the performance factors are respectively: 39341, 49295, 36251 and 33179.
- What those numbers mean is that when you use more processors to solve one problem, the optimum performance is reached with two processors.
- For a load factor of 1000 and by selecting the Start button 4 times, the performance factors are respectively: 46, 74, 70 and 79.
- For a load factor of 5000 and by selecting the Start button 4 times, the performance factors are respectively: 9, 15, 16 and 16.
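The same kind of arithmetic can be applied to the VStest2 figures. The sketch computes the speedup relative to one processor for each load factor and the processor count with the highest performance factor; it only re-expresses the numbers above.

```python
# VStest2 performance factors on CPU 1 for 1, 2, 3 and 4 processors (values from the text)
vstest2 = {
    1: [39341, 49295, 36251, 33179],
    1000: [46, 74, 70, 79],
    5000: [9, 15, 16, 16],
}


def speedups(values):
    """Speedup relative to one processor: S(n) = P(n) / P(1)."""
    return [v / values[0] for v in values]


def best_processor_count(values):
    """Smallest processor count that reaches the highest performance factor."""
    return values.index(max(values)) + 1


for load, values in vstest2.items():
    s = ", ".join(f"{x:.2f}" for x in speedups(values))
    print(f"load {load}: speedup {s}; best with {best_processor_count(values)} processor(s)")
```

At load factor 1 the maximum is clearly at two processors. At load factors 1000 and 5000 the highest value is at four (79) and three (16) processors respectively, but the gain over two processors is only about 7% (74 to 79, 15 to 16).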
[Screenshot: VStest2 - Load factor 1]
[Screenshot: VStest2 - Load factor 1000]
VStest2 requires the following:
- 5 BackgroundWorkers: Backgroundworkerpp1, pp2, pp3, pp4 and pp5
- 4 Buttons: StartAsyncButton, CancelAsyncButton, EndButton and MonitorButton
- 10 Labels: StateL (State), Resultlabelpp1, pp2, pp3, pp4, pp5, Label1 (Proc #), Label2 (Performance), Label3 (# proc) and Label4 (Load)
- 12 TextBoxes: ProcTpp1, pp2, pp3, pp4, pp5, StatTpp1, pp2, pp3, pp4, pp5, NprocT and LoadT
Listings VStest2: VStest2Form.vb and VStest2Form.Designer.vb
VStest1 and VStest2 with CPU 2
Following are the results with VStest1 and CPU 2:
- For a load factor of 1 and by selecting the Start button 4 times, the individual performance factors are: 110591, 55937, 38425 and 28944.
- The number 55937 means that if you start a second thread, the performance of the first decreases by roughly 50%.
This is as expected, because 50% of the load goes to thread 1 and 50% to thread 2.
- For a load factor of 1 and by selecting the Start button 4 times, the total performance factors are: 110591, 110633, 112120 and 113443.
- Those numbers mean that if you start more threads, the total performance stays constant.
This is as expected, because CPU 2 has only one processor.
- For a load factor of 1000 and by selecting the Start button 4 times, the individual performance factors are respectively: 174, 88, 59 and 47.
- For a load factor of 1000 and by selecting the Start button 4 times, the total performance factors are respectively: 174, 178, 178 and 188.
Following are the results with VStest2 and CPU 2:
- For a load factor of 1 and by selecting the Start button once, the performance factor is approximately 55384.
- For a load factor of 1000 and by selecting the Start button once, the performance factor is 67.
- For a load factor of 5000 and by selecting the Start button once, the performance factor is 14.
[Screenshot: VStest1 - Load factor 1]
[Screenshot: VStest1 - Load factor 1000]
[Screenshot: VStest2 - Load factor 1 and 1000]
Answer Question 4 - Parallel programming simulations
The tests show that with a dual-core, four-thread CPU and parallel programming, the performance for simulations of physical systems is best when 2 virtual processors are used. With 3 or more virtual processors the performance does not improve.
The main reason is that the extra performance of a third virtual processor is cancelled by a decrease in performance of the first two processors.
On the other hand, the overall performance is lower than that of a single-core CPU at 2.8 GHz. See also Reflection part 2.
Answer Question 5 - Benchmark
The benchmark number (CPU Mark) of 2539 for CPU 1 (dual core, 4 threads) is much too high as an indication of performance compared with the 415 of CPU 2.
What is needed is at least a benchmark number that represents the performance of the CPU in a test where the program solves one problem and uses all threads simultaneously.
Based on the results of my evaluation, the benchmark number for CPU 1 (2C/4T) should be roughly 460, compared with the benchmark number of 415 for CPU 2.
Answer Question 6 - CPU architecture
The best CPU architecture is the CPU:
- with the highest clock speed (3.2 GHz?)
- and with the number of threads equal to the number of cores.
Answer Question 7 - Amdahl's Law
Amdahl's Law states that the speedup obtained by running a program on more processors is limited by the fraction of the work that cannot be done in parallel: each extra processor adds less, and the speedup can never exceed 1/(serial fraction). See: Amdahl's law and Parallel computing
For the CPUs tested this does not give the whole picture. The performance decreases for two reasons:
- first, as a function of the number of programs (processors and threads) that use parallel programming, i.e. that employ some form of synchronisation. These are the programs VSTEST2, PlanetPP and FibonacciPP.
- secondly, as a function of the total number of programs that are executed simultaneously.
Amdahl's Law only describes the first reason, not the second. The second is demonstrated by the programs VSTEST1 and Planets3Dsimple: both show that the more programs are loaded, the more the performance of the programs already running degrades, and the first program loaded suffers most.
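Amdahl's Law itself is a one-line formula: with a parallel fraction p and n processors, S(n) = 1/((1 - p) + p/n). The sketch below evaluates it and, purely as an illustration, fits p from the VStest2 two-processor result at load factor 5000 (an assumption: one data point is not a real fit). Amdahl's S(n) never decreases as n grows, so the gap between the prediction and the measured values at 3 and 4 processors is not explained by the law alone; one plausible contributor is that CPU 1 has only two physical cores, so threads 3 and 4 share them.

```python
def amdahl_speedup(p, n):
    """Amdahl's Law: speedup on n processors when a fraction p of the work runs in parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction_from_two(s2):
    """Solve s2 = 1 / ((1 - p) + p / 2) for p, i.e. p = 2 * (1 - 1 / s2)."""
    return 2.0 * (1.0 - 1.0 / s2)


# VStest2 on CPU 1, load factor 5000, for 1..4 processors (values from the text)
measured = [9, 15, 16, 16]
p = parallel_fraction_from_two(measured[1] / measured[0])  # about 0.8

for n in range(1, 5):
    predicted = amdahl_speedup(p, n)
    actual = measured[n - 1] / measured[0]
    print(f"{n} processor(s): Amdahl {predicted:.2f}, measured {actual:.2f}")
```

With p of about 0.8, Amdahl's Law would predict a speedup of 2.5 on four processors, while the measured value stays at about 1.8.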
Reflection - part 1
- The 4 results for single-processor applications on CPU 1 are: 76453, 101, 39341 and 46 (the first two from VStest1, the last two from VStest2).
The 4 corresponding results for single-processor applications on CPU 2 are: 110591, 174, 55384 and 67.
Comparing those results, CPU 2 (available since at least 2003) is much better.
- The 4 results for CPU 1 performing VStest2 with a load of 5000 are: 9, 15, 16 and 16. The comparable number for CPU 2 is 14.
What that means is that CPU 1 is only slightly more powerful than CPU 2 in very high load situations when parallel processing is used.
- Those same tests also show that parallel programming only makes sense for a maximum of two processors. With 3 and 4 processors the performance does not improve, and for lower load factors it even decreases. However, and this is important, the overall performance is almost equal to that of the (single-core) CPU 2.
- The only case in which CPU 1 is really more powerful than CPU 2 is when you use each processor for a different application, i.e. when the 4 applications are independent of each other and do not require any form of synchronisation.
In VStest1 these are the totals of 160997 and 137 for CPU 1 versus 84398 and 122 for CPU 2.
- See also: Are we reaching the maximum of CPU performance This is a Usenet discussion in sci.physics.research
Reflection - part 2
Three single-core CPUs and one dual-core CPU with 4 threads (2C/4T) were tested.
That does not mean that there are no CPUs that perform the three tests (VStest1, VStest2 and Planet) better.
For single-core CPUs, the AMD Sempron 145, which has a CPU Mark of 916, most probably performs better.
When you compare the test results for VStest1 the results are that when you use 1 thread the performance is 76453 and with 4 threads 160997, that means roughly a factor 2 better, which is the same as the number of cores.
This leads to the surprising conclusion that for scientific applications maybe a 2 Core and 2 Thread Intel CPU is much more practical than a 2C/4T.
The same may apply for a 4 Core/4 Thread versus a 4C/8T and a 6 Core/6 Thread versus a 6C/12T
In science what you want is to solve a single problem using your PC for 100%.
- If you have two cores (and 2 threads), one core becomes the master (where you perform disk I/O and display I/O) and the other the slave (where 50% of the actual calculations are done in parallel with the master). If such a system gives 100% load, you are satisfied.
- Adding 2 more threads is not necessary, especially when the consequence is that the performance of the first two goes down.
For possible examples of CPUs with the same number of cores and threads, see here:
- Intel® Core™2 Extreme Desktop Processor Family: X6800 Benchmark 1831 (2C/2T), Q6800 Benchmark 3642 and X9775 Benchmark 3810 (4C/4T)
- Intel® Core™2 Quad Processor Q8000 Series: Q8200 Benchmark 3261, Q8400 Benchmark 3664 and Q9650 Benchmark 4604 (all 4C/4T)
- Intel® Xeon® Processor Family: X7542 Benchmark ?, X7460 Benchmark 18304 and L7455 (all 6C/6T)
It would be very interesting to learn the results of tests on CPUs with the same number of cores and threads, using multiprocessing programs such as VStest2 or Planet1.
Feedback
None
E-mail: nicvroom@pandora.be
Created: 1 December 2010
Updated: 1 January 2011
Updated: 21 February 2011