CPU Performance Evaluation - Visual Basic 2010 Benchmark
Question 1
Which single-core CPU has the best performance?
Question 2
Which dual-core CPU has the best performance?
Question 3
Which quad-core CPU has the best performance?
Question 4
Is it worthwhile to use parallel programming for simulations?
Question 5
What should a benchmark test?
Question 6
What is the best CPU architecture?
Question 7
Does Amdahl's law apply?
Purpose
If you study this homepage, you will notice that a central theme is the computer simulation of physical systems, specifically the planets orbiting the sun.
The programs are written in Quick Basic and Visual Basic.
Some programs were made with Excel, which in practice means they are written in Visual Basic (VBA).
Recently I bought a new PC.
To test it, I ran the planet simulation program Planet3Dsimple and discovered that the performance was lower than on my previous PC. Planet3Dsimple is written in Visual Basic 5.0.
To understand this better, I tested the same program on 5 different CPUs.
The program Planet3Dsimple keeps the processor (thread) it runs on 100% busy.
The results are as follows:
- CPU 1: Intel Core i5 M 460 @ 2.53 GHz. This CPU has a CPU Mark of 2539. Evaluation with 7 planets: 5.2
- CPU 2: Intel Pentium 4 2.80 GHz. This CPU has a CPU Mark of 415. Evaluation with 7 planets: 8.38
- CPU 3: Intel Pentium 4 3.00 GHz. This CPU has a CPU Mark of 491. Evaluation with 7 planets: starts at 9.5 and slowly drops
- CPU 4: GenuineIntel x86 Family 6 Model 8 Stepping 3. Evaluation with 7 planets: 2.54
- CPU 5: GenuineIntel Pentium(R) II processor with Intel MMX(TM) technology. Evaluation with 7 planets: 1.44
CPU 1 has two cores and supports 4 threads. This means that the same program must be run 4 times to reach 100% load.
If you load the program 1, 2, 3 or 4 times, the average performance figures are 5.2, 4.8, 4.1 and 3.6.
CPU 2 has one core and supports 1 thread.
CPU 3 has one core and supports 2 threads.
The test results show that CPU 2 delivers the best performance for solving a single problem.
The problem with CPU 3 is that its performance does not stay constant. For more details see: Benchmark Pentium 4. CPU 1 does not deliver the expected performance, and its performance drops as the load increases.
For a detailed evaluation of Visual Basic 2010 see: Visual Basic 2010 Evaluation
Parallel Programming
The next step is to write a Visual Basic program that uses parallel programming.
Visual Basic 2010 (Visual Studio) supports parallel programming and threading.
For more information about threading in general, read:
- Threads and threading
- OpenMP (Open Multi-Processing)
More detailed information about multiprocessing and threads, including the advantages and disadvantages: Multiprocessors and threads
To implement parallel programming, the programs described here use a Visual Basic 2010 (.NET) component called "BackgroundWorker".
Each BackgroundWorker runs on its own thread (virtual processor), and there are as many BackgroundWorkers as there are threads.
For more information about what a BackgroundWorker is, see: BackGroundWorker.
That document describes two examples.
- The first example (named "Backgroundworker") is used as the starting point or template for the applications VStest1 and VStest2, described in the next section.
- The second example (named Fibonacci) is used as the starting point or template for the program Planet, described in: CPU performance - part 2
The Backgroundworker program is operated through a so-called "form".
The form contains two buttons: a Start button and a Cancel button.
The Backgroundworker program also consists of 4 event handlers (EHs) and a BackGroundworker.
- StartAsyncButton EH. This event handler is linked to the Start button and starts the execution of the BackGroundworker.
- CancelAsyncButton EH. This event handler is linked to the Cancel button and ends the execution of the BackGroundworker.
- Dowork. This is the BackGroundworker itself, i.e. the code that runs on the worker thread.
This is where the actual calculations (of whatever you want to calculate or simulate) are done.
- ProgressChanged EH. This event handler (if enabled) updates the form with information about the progress of the program.
- RunWorkCompleted EH. At the end, this event handler displays the final results of the BackGroundWorker.
The BackGroundworker example shows that to control a program thread you need 4 event handlers: one to start it, one to stop it, one to show the current situation and one to show the results.
The Backgroundworker example runs one program thread that generates the numbers 10, 20, ... up to 100 and then finishes. The example effectively uses only one processor; it is not an example of parallel programming.
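The Start / Cancel / Dowork / ProgressChanged / RunWorkCompleted structure described above can be sketched outside Visual Basic as well. The following is a minimal Python analogue, not the .NET component. A plain `threading.Thread` plus two callbacks stands in for the BackgroundWorker, and the class and method names (`Worker`, `start`, `cancel`, `join`) are hypothetical. Like the original example, it runs a single thread that produces 10, 20, ... 100.

```python
import threading
import time
from typing import Callable, List, Optional


class Worker:
    """Hypothetical Python analogue of one BackgroundWorker: a thread plus callbacks."""

    def __init__(self,
                 on_progress: Callable[[int], None],
                 on_completed: Callable[[List[int], bool], None]) -> None:
        self._on_progress = on_progress      # role of the ProgressChanged EH
        self._on_completed = on_completed    # role of the RunWorkCompleted EH
        self._cancel = threading.Event()     # set by cancel(), role of the CancelAsyncButton EH
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        """Role of the StartAsyncButton EH: run _do_work on a new thread."""
        self._cancel.clear()
        self._thread = threading.Thread(target=self._do_work)
        self._thread.start()

    def cancel(self) -> None:
        """Ask the worker to stop before its next step."""
        self._cancel.set()

    def join(self) -> None:
        """Wait until the worker thread has finished."""
        if self._thread is not None:
            self._thread.join()

    def _do_work(self) -> None:
        """Role of Dowork: produce 10, 20, ..., 100, reporting progress after each step."""
        results: List[int] = []
        cancelled = False
        for value in range(10, 101, 10):
            if self._cancel.is_set():
                cancelled = True
                break
            time.sleep(0.01)                 # stand-in for the real calculation
            results.append(value)
            self._on_progress(value)         # value doubles as "percent complete"
        self._on_completed(results, cancelled)


if __name__ == "__main__":
    worker = Worker(on_progress=lambda pct: print(f"progress {pct}%"),
                    on_completed=lambda res, c: print("completed", res, "cancelled:", c))
    worker.start()
    worker.join()
```

One design difference is worth noting. In .NET the progress and completion events are delivered to the thread that owns the form, whereas in this sketch the callbacks run on the worker thread itself.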
Test 1: VStest1 CPU 1
In order to test CPU 1 I started with the following two programs: VStest1 and VStest2.
The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other: in fact you are solving 4 independent problems. In VStest2 this is not the case; there the calculations are linked to form one problem. See below in this document.
To get a copy select VStest.zip.
- The zip file contains the files VStest1.EXE and VStest2.EXE, which are both explained in this document.
- It also contains Planets3DSimple.EXE and MSVBVM50.dll. Planets3DSimple is a Visual Basic 5.0 program that simulates the 7 planets; it requires the file MSVBVM50.dll.
- The zip file also contains the programs: planet.exe, planetPP.exe and planet1.xls. The purpose of these programs is also to test parallel programming. To get more information see: VB Benchmark part 2
When you try to run the "VB 2010" programs, the following error message may be displayed:
- "To run this application, you first must install one of the following versions of the .NET Framework: v4.0.30319".
You can get this download from: Microsoft .NET Framework 4 (Web Installer).
In my case this happened with CPU 3. For a comparison between CPU 2 and CPU 3 go to: Benchmark Pentium 4 3.0. The fact that CPU 3 is slow in some tests is not caused by this download.
In order to write VStest1 the following tasks were performed:
- The BackGroundworker Dowork, the ProgressChanged EH and the RunWorkCompleted EH were each copied 4 times to manage 4 threads. This allows you to test 4 processors simultaneously.
- The StartAsyncButton EH was modified to start a new thread each time the Start button is pressed.
- The same was done for the CancelAsyncButton EH, so that one thread is terminated each time the Cancel button is pressed.
- A parameter np (# of processors) was created to keep track of the number of active processors.
- The central calculation of the BackGroundworker Dowork was modified to perform a 10 by 10 matrix operation using two loop variables, j (inner loop) and i (outer loop). The whole matrix operation can be executed a certain number of times, called the load factor. This whole calculation is called one cycle. The number of cycles per second is calculated, and that gives a performance number.
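As a rough illustration of how a cycle and its performance number relate, here is a Python sketch of one thread's measurement loop. The page does not list the actual matrix arithmetic, so the operation inside `one_cycle` is a hypothetical placeholder. The loop bounds (i and j running from 0 to 10) follow the `For i = 0 to 10` statement quoted further on, and the function names are illustrative.

```python
import time
from typing import List


def one_cycle(matrix: List[List[float]], load: int) -> None:
    """One cycle: repeat a placeholder matrix operation `load` times (the load factor)."""
    n = len(matrix)
    for _ in range(load):
        for i in range(n):                   # outer loop, variable i
            row = matrix[i]
            for j in range(n):               # inner loop, variable j
                row[j] = row[j] * 0.5 + 1.0  # hypothetical arithmetic, not the original


def cycles_per_second(load: int, seconds: float = 1.0, size: int = 11) -> float:
    """Run cycles for about `seconds` and return the average number of cycles per second."""
    matrix = [[0.0] * size for _ in range(size)]
    cycles = 0
    start = time.perf_counter()
    while True:
        one_cycle(matrix, load)
        cycles += 1
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            return cycles / elapsed


if __name__ == "__main__":
    for load in (1, 1000):
        print(f"load {load}: {cycles_per_second(load, seconds=0.5):.0f} cycles/s")
```

One cycle contains `load` repetitions of the matrix operation, so the cycles-per-second figure falls roughly in proportion to the load factor. This is why the load-1000 figures in the results are several hundred times smaller than the load-1 figures.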
Operation VStest1
When you start the program you get a control form that consists of 2 parts.
- The top three lines contain general information about the performance of the program.
- Line 1 contains the active processor numbers (maximum = 4). The fifth box contains the text: Total
- Line 2 contains the performance factor for each thread, i.e. the average number of cycles per second. The fifth value is the total of those 4 performance factors.
- Line 3 contains three parameters: # Proc, Load and Count.
- # Proc This parameter shows the number of active threads or processors.
- Load This parameter shows the load factor.
- Count This parameter shows the elapsed time in seconds. It is cleared when the number of active threads is changed via the Start or Cancel push button.
Only one parameter can be modified by the user: Load. If it is modified, the new value is only used after all threads have been cancelled.
- The middle line contains 3 buttons:
- Start. The Start button starts the program (i.e. the first thread) or starts the next thread.
- Cancel. The Cancel button terminates one thread.
- End. The End button terminates all threads. If there are no active threads, the program terminates.
Test results - VStest1
- For a load factor of 1, pressing the Start button 4 times gives these individual performance factors for processor 1: 76453, 70368, 50582 and 38816.
- This means that the more processors you use (each with a different problem), the lower the performance of processor 1.
This is perhaps not what you expect, because the programs in each thread (processor) are independent of each other.
- For a load factor of 1, pressing the Start button 4 times gives these total performance factors: 76453, 141270, 155483 and 160997.
- This means that the more processors you use (each with a different problem), the less the total CPU performance increases; it does not grow linearly:
141270 ≈ 2 × 70368, 155483 ≈ 3 × 50582 and 160997 ≈ 4 × 38816, i.e. every thread slows down about as much as processor 1.
- For a load factor of 1000, pressing the Start button 4 times gives these individual performance factors for processor 1: 101, 90, 68 and 44.
- For a load factor of 1000, pressing the Start button 4 times gives these total performance factors: 101, 181, 200 and 204.
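To quantify "does not grow linearly", a small Python sketch computes the speedup relative to one thread and the parallel efficiency (speedup divided by the number of threads) from the CPU 1 totals above. The function name is illustrative.

```python
from typing import List, Tuple


def speedup_and_efficiency(totals: List[float]) -> List[Tuple[int, float, float]]:
    """For n = 1, 2, ... threads: (n, speedup relative to 1 thread, speedup / n)."""
    base = totals[0]
    return [(n, t / base, t / base / n) for n, t in enumerate(totals, start=1)]


# Total performance factors of CPU 1 in VStest1, taken from the list above.
measured = {1: [76453, 141270, 155483, 160997], 1000: [101, 181, 200, 204]}

for load, totals in measured.items():
    for n, s, e in speedup_and_efficiency(totals):
        print(f"load {load:>4}, {n} threads: speedup {s:.2f}, efficiency {e:.0%}")
```

For load factor 1 this gives speedups of about 1.85, 2.03 and 2.11 for 2, 3 and 4 threads (efficiency 92%, 68% and 53%). For load factor 1000 the speedups are about 1.79, 1.98 and 2.02. In both cases the speedup levels off at about 2, the number of physical cores of CPU 1.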
VStest1 - Load factor 1
VStest1 - Load factor 1000
VStest1 requires the following:
- 4 BackgroundWorkers: Backgroundworkerpp1, pp2, pp3 and pp4
- 3 Buttons: StartAsyncButton, CancelAsyncButton and EndButton
- 8 Labels: Resultlabelpp1, pp2, pp3, pp4, Label1 (Proc #), Label2 (Performance), Label3 (# proc) and Label4 (Load)
- 6 text boxes: ProcTpp1, pp2, pp3, pp4, NprocT and LoadT
Listings VStest1: VStest1ControlForm.vb and VStest1ControlForm.Designer.vb
Test 2: VStest2 CPU 1
The most important characteristic of VStest1 is that the calculations executed in each thread are independent of each other: in fact you are solving 4 independent problems.
However, that is not what we want. What we want is to divide the problem into parts (as many as there are processors available) and to solve the whole problem by working on those parts in parallel, one part per processor. This requires extra communication and synchronisation compared to VStest1.
Figure 1 - Parallel Communication: the push-button event handlers StartAsyncPB EH, CancelAsyncPB EH and EndAsyncPB EH write npreq; Doworkpp1 reads npreq and sets np; Doworkpp1 exchanges start/stop signals with Doworkpp2, Doworkpp3 and Doworkpp4 through the array state().
To do that, the following modifications were made:
- The major changes are in event handler 1: Doworkpp1. This event handler becomes the master; the other Dowork event handlers become slaves.
- The variable npreq is the requested number of processors. This number is used in the communication between the push-button (PB) handlers and Dowork event handler 1.
- StartAsyncPB EH increases npreq by 1
- CancelAsyncPB EH decreases npreq by 1
- EndAsyncPB EH sets npreq to -1
- The variable np is the actual number of processors. This number is used in the communication between Dowork event handler 1 and Dowork event handlers 2, 3 and 4. Doworkpp1 calculates this number; the other Dowork event handlers are only allowed to read it.
- Communication between the different Dowork event handlers is done by means of the array state(5).
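The npreq/np hand-off can be sketched in Python as follows. The class and method names are hypothetical. The lock is an addition, since the description only says which side writes and which side reads. Limiting npreq to the range 0 to 4 is also an assumption; only the +1, -1 and "-1 for End" rules come from the list above.

```python
import threading

MAX_PROC = 4  # CPU 1 offers 4 threads


class Control:
    """Hypothetical sketch of the npreq / np hand-off between the buttons and Doworkpp1."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.npreq = 0   # requested processors: written only by the push-button handlers
        self.np = 0      # actual processors: written only by the master, read by the slaves

    def start_pressed(self) -> None:          # role of StartAsyncPB EH
        with self._lock:
            if 0 <= self.npreq < MAX_PROC:    # assumption: never request more than 4
                self.npreq += 1

    def cancel_pressed(self) -> None:         # role of CancelAsyncPB EH
        with self._lock:
            if self.npreq > 0:                # assumption: never go below 0
                self.npreq -= 1

    def end_pressed(self) -> None:            # role of EndAsyncPB EH
        with self._lock:
            self.npreq = -1                   # -1 means: stop all threads

    def master_update(self) -> int:
        """Called by the master once per cycle: copy npreq into np (-1 means stop)."""
        with self._lock:
            self.np = self.npreq
            return self.np
```

Only the master copies npreq into np. This way the slaves always see one consistent value of np for the whole cycle, even if a button is pressed in the middle of it.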
In general, the interaction between processor 1 and processor 2 (and 3, etc.) is as follows:
- Processor 1 sets state(2) to 1 (start state), performs its own matrix computations and waits until state(2) is 0 (stop state).
If required, processor 1 does the same with state(3), etc.
- Processor 2 monitors state(2). When state(2) is 1, it performs its own matrix computations and, when finished, sets state(2) to 0 (stop state).
- Processor 3 monitors state(3). When state(3) is 1, it performs its own matrix computations and, when finished, sets state(3) to 0 (stop state).
- The variable pstr (processor start) is 0 for processor 1, 1 for processor 2, 2 for processor 3 and 3 for processor 4.
- Outer loop control, the variable i:
- In program VStest1 the outer loop is the statement: For i = 0 to 10
- In program VStest2 the outer loop is the statement: For i = pstr to 10 step np.
This means that in the case of 4 processors:
- In processor 1, i has the values 0, 4 and 8
- In processor 2, i has the values 1, 5 and 9
- In processor 3, i has the values 2, 6 and 10
- and in processor 4, i has the values 3 and 7
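Below is a Python sketch of one VStest2 cycle that combines the state() handshake with the strided outer loop. It is an illustration under several assumptions:
- the master starts all slaves before doing its own share;
- the slaves are created for each cycle instead of running continuously;
- a `threading.Condition` replaces the polling of state();
- recording which processor handled each i stands in for the real matrix computations.

The function names are hypothetical. In CPython the global interpreter lock means this shows the coordination, not a real speedup.

```python
import threading
from typing import Dict, List

LAST = 10  # the outer loop runs up to i = 10


def rows_for(pstr: int, np: int) -> List[int]:
    """Values of i for one processor, like VB's: For i = pstr To 10 Step np."""
    return list(range(pstr, LAST + 1, np))


def run_cycle(np: int) -> Dict[int, int]:
    """One cycle with a master and np - 1 slaves; returns {i: processor that handled i}."""
    state = [0] * (np + 1)          # state[k] == 1: slave k must work; 0: slave k is done
    cond = threading.Condition()    # replaces the busy polling of state() in the original
    done_by: Dict[int, int] = {}

    def compute(pstr: int) -> None:
        for i in rows_for(pstr, np):
            done_by[i] = pstr + 1   # stand-in for the matrix computations of row i

    def slave(k: int) -> None:
        with cond:
            cond.wait_for(lambda: state[k] == 1)   # wait for the start state
        compute(k - 1)                              # processor k has pstr = k - 1
        with cond:
            state[k] = 0                            # stop state: report completion
            cond.notify_all()

    slaves = [threading.Thread(target=slave, args=(k,)) for k in range(2, np + 1)]
    for t in slaves:
        t.start()
    with cond:
        for k in range(2, np + 1):
            state[k] = 1            # master sets the start state for every slave
        cond.notify_all()
    compute(0)                      # master (processor 1, pstr = 0) does its own share
    with cond:
        cond.wait_for(lambda: all(state[k] == 0 for k in range(2, np + 1)))
    for t in slaves:
        t.join()
    return done_by


if __name__ == "__main__":
    for p in range(1, 5):
        print(p, [rows_for(pstr, p) for pstr in range(p)])
```

For np = 4 the work splits exactly as listed above. Processor 1 gets 0, 4 and 8; processor 2 gets 1, 5 and 9; processor 3 gets 2, 6 and 10; processor 4 gets 3 and 7.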
To learn more about parallel programming, see this document: Visual Basic 2010 Parallel Programming
Operation VStest2
When you start the program you get a control form that consists of 2 parts.
- The top four lines contain general information about the performance of the program.
- Line 1 contains the communication values for each thread. To see those values, press the Monitor push button.
- Line 2 contains the active processor numbers (maximum = 4). The fifth box contains the text: Total
- Line 3 contains the performance factor for each thread, i.e. the average number of cycles per second. The fifth value is the total of those 4 performance factors.
- Line 4 contains three parameters: # Proc, Load and Count.
- # Proc This parameter shows the number of active threads or processors.
- Load This parameter shows the load factor.
- Count This parameter shows the elapsed time in seconds. It is cleared when the number of active threads is changed via the Start or Cancel push button.
Only one parameter can be modified by the user: Load. If it is modified, the new value is only used after all threads have been cancelled.
- The middle line contains 4 buttons:
- Start. The Start button EH starts the BackGroundworker (i.e. the first thread) or starts the next thread.
- Cancel. The Cancel button EH terminates one BackGroundworker (thread).
- End. The End button EH terminates all BackGroundworkers (threads). If there are no active threads, the application ends.
- Monitor. The Monitor button EH monitors the communication between thread 1 (the master) and the other threads (the slaves).
Test results - VStest2
- For a load factor of 1, pressing the Start button 4 times gives performance factors of 39341, 49295, 36251 and 33179 respectively.
- This means that when you use more processors to solve one problem, the optimum performance is reached with two processors.
- For a load factor of 1000, pressing the Start button 4 times gives performance factors of 46, 74, 70 and 79 respectively.
- For a load factor of 5000, pressing the Start button 4 times gives performance factors of 9, 15, 16 and 16 respectively.
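To make the comparison across load factors explicit, a short Python sketch computes the speedup over one processor and the best processor count for each VStest2 result above. The function names are illustrative.

```python
from typing import Dict, List

# VStest2 on CPU 1: performance with 1, 2, 3 and 4 processors, from the list above.
results: Dict[int, List[int]] = {
    1: [39341, 49295, 36251, 33179],
    1000: [46, 74, 70, 79],
    5000: [9, 15, 16, 16],
}


def speedups(perf: List[int]) -> List[float]:
    """Performance with 1..n processors relative to 1 processor."""
    return [p / perf[0] for p in perf]


def best_processor_count(perf: List[int]) -> int:
    """Processor count with the highest performance (ties go to the smaller count)."""
    return perf.index(max(perf)) + 1


for load, perf in results.items():
    s = ", ".join(f"{x:.2f}" for x in speedups(perf))
    print(f"load {load:>4}: speedups {s}; best with {best_processor_count(perf)} processors")
```

For load factor 1 the speedups are 1.00, 1.25, 0.92 and 0.84, so two processors are best. For load factors 1000 and 5000 the big step is from one to two processors (1.61 and 1.67). The third and fourth processors add at most about 0.1 more.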
VStest2 - Load factor 1
VStest2 - Load factor 1000
VStest2 requires the following:
- 5 BackgroundWorkers: Backgroundworkerpp1, pp2, pp3, pp4 and pp5
- 4 Buttons: StartAsyncButton, CancelAsyncButton, EndButton and MonitorButton
- 10 Labels: StateL (State), Resultlabelpp1, pp2, pp3, pp4, pp5, Label1 (Proc #), Label2 (Performance), Label3 (# proc) and Label4 (Load)
- 12 text boxes: ProcTpp1, pp2, pp3, pp4, pp5, StatTpp1, pp2, pp3, pp4, pp5, NprocT and LoadT
Listings VStest2: VStest2ControlForm.vb and VStest2ControlForm.Designer.vb
Results of VStest1 and VStest2 with CPU 2
Following are the results with VStest1 and CPU 2:
- For a load factor of 1, pressing the Start button 4 times gives these individual performance factors: 110591, 55937, 38425 and 28944.
- The number 55937 means that if you start a second thread, the performance of the first decreases by roughly 50%.
This is as expected, because 50% of the load goes to thread 1 and 50% to thread 2.
- For a load factor of 1, pressing the Start button 4 times gives these total performance factors: 110591, 110633, 112120 and 113443.
- These numbers mean that if you start more threads, the total performance stays constant.
This is as expected, because CPU 2 has only one processor.
- For a load factor of 1000, pressing the Start button 4 times gives individual performance factors of 174, 88, 59 and 47 respectively.
- For a load factor of 1000, pressing the Start button 4 times gives total performance factors of 174, 178, 178 and 188 respectively.
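The "roughly 50%" reasoning can be checked with a few lines of Python. If n threads share a single core equally, each should get about 1/n of the single-thread figure. This equal-share model is an assumption about the operating system's scheduling, and the helper name is illustrative.

```python
def fair_share(single_thread: float, n: int) -> float:
    """Expected performance per thread when n threads time-share one core equally."""
    return single_thread / n


# VStest1 on CPU 2, load factor 1: individual performance of thread 1 for 1..4 threads.
measured = [110591, 55937, 38425, 28944]

for n, value in enumerate(measured, start=1):
    expected = fair_share(measured[0], n)
    print(f"{n} threads: measured {value}, expected {expected:.0f}, ratio {value / expected:.2f}")
```

Each measured value lies within about 5% of the equal-share estimate (ratios 1.00, 1.01, 1.04 and 1.05). This is what you would expect from a CPU with one core and one thread.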
Following are the results with VStest2 and CPU 2:
- For a load factor of 1, pressing the Start button once gives a performance factor of approximately 55384.
- For a load factor of 1000, pressing the Start button once gives a performance factor of 67.
- For a load factor of 5000, pressing the Start button once gives a performance factor of 14.
VStest1 - Load factor 1
VStest1 - Load factor 1000
VStest2 - Load factor 1 and 1000
Answer to Question 4 - Usefulness of parallel programming for simulations
The test results show the following for a CPU with two cores and four threads. When parallel programming is used for simulations of physical systems, the best performance is achieved with 2 threads (virtual processors). With 3 or more virtual processors the performance does not improve.
The main reason is that the extra performance of the third virtual processor is cancelled out by a drop in the performance of the first two virtual processors.
Moreover, the overall performance is equal to or lower than that of the 2.8 GHz CPU (with one core and one thread). See also Afterthoughts - part 2.
Answer to Question 6 - Best CPU architecture
The best CPU architecture is the CPU:
- with the highest clock speed (3.2 GHz?)
- and with the number of threads equal to the number of cores.
Answer to Question 7 - Amdahl's law
Amdahl's law states that the speedup obtained with parallel programming grows less and less as processors are added, because the part of the program that must run serially limits the overall performance.
See: Amdahl's law and Parallel computing
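Amdahl's law is commonly written as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the work that can run in parallel and N is the number of processors. The Python sketch below evaluates it. `parallel_fraction_from_two` is a hypothetical helper that solves this formula for p from a measured two-processor speedup. That only makes sense under the assumption that Amdahl's model describes the program.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup on n processors when a fraction p of the work is parallelisable."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction_from_two(speedup2: float) -> float:
    """Solve S(2) = 1 / ((1 - p) + p / 2) for p, i.e. p = 2 * (1 - 1 / S(2))."""
    return 2.0 * (1.0 - 1.0 / speedup2)


# Example: VStest2 on CPU 1, load factor 1000 (46 with 1 processor, 74 with 2).
p = parallel_fraction_from_two(74 / 46)
print(f"implied p = {p:.2f}; Amdahl predicts S(4) = {amdahl_speedup(p, 4):.2f}")
```

The VStest2 figures for load factor 1000 give p ≈ 0.76 and a predicted four-processor speedup of about 2.31. This can be compared with the measured 79 / 46 ≈ 1.72.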
This law does not fully apply to the tested CPUs. Their performance decreases for two reasons:
- First, as a function of the number of programs (processors and threads) that use parallel programming, i.e. programs that involve some form of synchronisation.
These are the programs VSTEST2, PlanetPP and FibonacciPP.
- Second, as a function of the total number of programs running at the same time.
Amdahl's law relates to reason one, not to reason two.
The programs VSTEST1 and Planets3Dsimple illustrate reason two.
Both programs show that the more programs you load, the more the performance of the programs already running goes down.
The program that was loaded first suffers most.
Afterthoughts - part 1
- The 4 results for single-processor applications on CPU 1 are: 76453, 101, 39341 and 46 (the first two from VStest1, the last two from VStest2).
The 4 corresponding results for single-processor applications on CPU 2 are: 110591, 174, 55384 and 67.
When you compare those results, CPU 2 is much better (a CPU that was already available in 2003).
- The 4 results for CPU 1 performing VStest2 with a load of 5000 are: 9, 15, 16 and 16. The comparable number for CPU 2 is 14.
This means that CPU 1 is only slightly more powerful than CPU 2 in very high load situations when parallel processing is used.
- The same tests also show that parallel programming only makes sense for a maximum of two processors. With 3 and 4 processors the performance does not improve, and for lower load factors it even decreases. However, and this is important, the overall performance is almost equal to that of the (single-core) CPU 2.
- The only case in which CPU 1 is really more powerful than CPU 2 is when you use each processor for a different application, i.e. when the 4 applications are independent of each other and do not require any form of synchronisation.
In VStest1 these are the total figures of 160997 and 137 for CPU 1 versus 84398 and 122 for CPU 2.
- See also: Are we reaching the maximum of CPU performance. This is a Usenet discussion in sci.physics.research.
Afterthoughts - part 2
3 single-core CPUs were tested and 1 dual-core CPU with 4 threads (2C/4T).
That does not mean that there are no CPUs that perform the three tests (VStest1, VStest2 and Planet) better.
For a single core, most probably the AMD Sempron 145, which has a CPU Mark of 916, performs better.
When you compare the test results for VStest1, the performance is 76453 with 1 thread and 160997 with 4 threads. That is roughly a factor of 2 better, which equals the number of cores.
This leads to the surprising conclusion that for scientific applications a 2-core/2-thread Intel CPU may be much more practical than a 2C/4T.
The same may apply to a 4-core/4-thread versus a 4C/8T, and to a 6-core/6-thread versus a 6C/12T.
In science, what you want is to solve a single problem using your PC at 100%.
- If you have two cores (and 2 threads), one core becomes the master (where you perform disk I/O and display I/O) and the other the slave (where 50% of the actual calculations are done in parallel with the master). If such a system runs at 100% load, you are satisfied.
- Adding 2 more threads is not necessary, especially when the consequence is that the performance of the first two goes down.
For possible examples of CPUs with the same number of cores and threads, see here:
- Intel® Core™2 Extreme Desktop Processor Family :
X6800 Benchmark 1831 (2C/2T),
Q6800 Benchmark 3642 and
X9775 Benchmark 3810 (4C/4T)
- Intel® Core™2 Quad Processor Q8000 Series :
Q8200 Benchmark 3261,
Q8400 Benchmark 3664 and
Q9650 Benchmark 4604
All: 4C/4T
- Intel® Xeon® Processor Family:
X7542 Benchmark ?,
X7460 Benchmark 18304
and L7455. All: 6C/6T
It would be very interesting to see test results for CPUs with the same number of cores and threads using multiprocessing, i.e. VStest2 or Planet1.
Comments
None
E-mail: nicvroom@pandora.be
Created: 1 December 2010
Updated: 1 January 2011
Updated: 21 February 2011