Performance data for Paravirtops+VMI enabled kernels

Krishna Raj Raja krraja at vmware.com
Fri Feb 9 02:50:08 GMT 2007


Hi,
 
I work for the performance group at VMware. I have been collecting some
performance numbers on the 2.6.20-rc6 kernel that I would like to share with
you.
These are numbers comparing the paravirtops+VMI-enabled kernel with the
non-paravirtops kernel running on native hardware.
 
There is a negligible performance difference between the
paravirtops+VMI-enabled kernel and the non-paravirtops kernel on a variety of
benchmarks, as shown below.
In many cases, the paravirtops+VMI-enabled kernel performs marginally better.
This is either experimental noise or favorable cache disturbances at the
micro-architectural level.
 
I had trouble installing Feisty Herd-2 on my machines, so one set of
benchmark results was obtained on the Ubuntu Edgy i386 server distro. The
3D benchmark results below, however, were obtained on Feisty Herd-3 (the
installer issue that I was having seems to have been fixed in this
release).
 
Please feel free to mail me if you have any questions.
 
Thanks
-Krishna

Ubuntu Feisty Performance


Test Setup

Hardware: AMD Dual Core Opteron 270 (4 cores), 2.0GHz, L2 1M, 64K L1, 5G RAM
(only 4G used by the non-PAE kernel)

Distro: Ubuntu 6.10 Edgy Server i386 

Kernel: 

*	PVOPS+VMI: 2.6.20-rc6-ubuntu (testing at 432106): CONFIG_PARAVIRT=y,
CONFIG_VMI=y 
*	non-paravirtops: 2.6.20-5.7-generic (clean at 432106):
config-2.6.20-5.7-generic 
*	ubuntu wireless and flash memory drivers failed to compile and were
disabled in the config 


Native Performance: Pvops+VMI vs non-paravirtops Kernel



Macro Benchmarks

3 iterations, score is avg of last 2 iterations, 1st run discarded. Scores
are in secs, lower scores are better. Scores within brackets are throughput
numbers, higher is better. 

Workload                   non-paravirtops       Pvops + VMI            non-paravirtops/VMI
SpecJBB                    730.62 (score:9606)   730.98 (score:10424)   0.99 (1.08)
reAIM (fserver)+           55.52 (score:2427)    55.80 (score:2415)     0.995 (0.995)
Kernel Compile (1P, -j1)   380.45                391.425                0.972
Kernel Compile (1P, -j2)   391.60                377.83                 1.036
Kernel Compile (2P, -j2)   204.075               204.555                0.997
Kernel Compile (2P, -j8)   201.425               201.63                 0.999
Ogg Encoding*              212.68                213.12                 0.998

SpecJBB is a throughput-based benchmark (fixed run time), so the numbers
inside the brackets are the ones that should be considered.

* Wave file of track length 48m 24s at 256kbps bitrate, 5 iterations, 1st
iteration discarded.

+ The fserver workload has been customized to run on tmpfs instead of a
regular fs.
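As a minimal sketch of how the scores and ratios in the table above can be derived (this is not the script used for these runs, and the function names and the sample iteration times are made up for illustration): the first iteration is discarded, the remaining ones are averaged, and the ratio is arranged so that a value above 1 favours the Pvops+VMI kernel. Note that the bracketed throughput ratios in the table match Pvops+VMI divided by non-paravirtops, which is the inverse of the column header.

```python
from statistics import mean


def score(run_times_secs):
    """Average every iteration except the first, which is discarded."""
    if len(run_times_secs) < 2:
        raise ValueError("need at least two iterations")
    return mean(run_times_secs[1:])


def time_ratio(non_pvops_secs, pvops_secs):
    """non-paravirtops / Pvops+VMI for run times (lower time is better)."""
    return non_pvops_secs / pvops_secs


def throughput_ratio(non_pvops_score, pvops_score):
    """Pvops+VMI / non-paravirtops for throughput (higher score is better)."""
    return pvops_score / non_pvops_score


# Hypothetical iteration times in secs; only the last two count.
print(score([400.0, 380.0, 381.0]))

# Values taken from the table: Kernel Compile (1P, -j1) and SpecJBB.
print(round(time_ratio(380.45, 391.425), 3))
print(round(throughput_ratio(9606, 10424), 3))
```

With both ratios oriented this way, every entry in the ratio column reads the same: below 1 means the Pvops+VMI kernel was slower, above 1 means it was faster.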


CPU Microbenchmarks

maxcpus=1 

SPEC CPU 2006, gcc 4.1.2. Scores are run times in secs, lower scores are
better. Scores within brackets are estimated base ratios, higher ratios are
better.

Workload      non-paravirtops   Pvops + VMI     non-paravirtops/VMI
perlbench     948 (10.3)        949 (10.3)      0.998
bzip2         1550 (6.22)       1560 (6.2)      0.993
gcc           913 (8.82)        913 (8.82)      1.000
mcf           1020 (8.91)       1030 (8.90)     0.990
gobmk         991 (10.6)        992 (10.6)      0.999
hmmer         1670 (5.58)       1670 (5.58)     1.000
sjeng         1320 (9.17)       *1320 (9.19)    1.000
libquantum    2200 (9.41)       2200 (9.41)     1.000
h264ref       1980 (11.2)       1980 (11.2)     1.000
omnetpp       859 (7.28)        860 (7.27)      0.998
astar         1140 (6.18)       1140 (6.18)     1.000
xalancbmk     1100 (6.26)       1100 (6.26)     1.000


I/O Benchmarks

Client machine: Dell 1600C P4 xeon 2 x 2.4Ghz, 1G RAM, Win2K professional SP4


Setup: Client machine is connected to server using crossover cable at Gigabit
Link speed. 

maxcpus=1 


Netperf

default: netperf -H $1 -l 60 -t TCP_STREAM -- -m 8192 -M 8192 -s 4096 -S 8192


tuned: netperf -H $1 -l 60 -t TCP_STREAM -- -m 8192 -M 8192 -s 32768 -S 65536


Scores are throughput in Mbps, average of 4 runs, higher scores are better 

Workload        non-paravirtops   Pvops + VMI   VMI/non-paravirtops
send, default   418.10            416.16        0.995
send, tuned     932.42            926.33        0.993
recv, default   227.40            226.555       0.996
recv, tuned     945.11            944.78        0.999
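The default and tuned netperf invocations listed above differ only in the socket buffer sizes (-s for the local side, -S for the remote side). A small sketch of that difference, assuming a hypothetical helper netperf_command() and a placeholder host address; it only builds and prints the command lines and does not run netperf:

```python
import shlex


def netperf_command(host, local_sock_bytes, remote_sock_bytes,
                    duration_secs=60, msg_bytes=8192):
    """Build the TCP_STREAM command line used in this post as an argv list."""
    return [
        "netperf", "-H", host, "-l", str(duration_secs), "-t", "TCP_STREAM",
        "--",                                             # test-specific options follow
        "-m", str(msg_bytes), "-M", str(msg_bytes),       # send/receive message sizes
        "-s", str(local_sock_bytes), "-S", str(remote_sock_bytes),  # socket buffers
    ]


# Socket buffer sizes taken from the two command lines above.
DEFAULT = {"local_sock_bytes": 4096, "remote_sock_bytes": 8192}
TUNED = {"local_sock_bytes": 32768, "remote_sock_bytes": 65536}


def average_throughput(samples_mbps):
    """Scores are reported as the average of the individual runs (4 here)."""
    return sum(samples_mbps) / len(samples_mbps)


host = "192.0.2.10"  # placeholder address, stands in for $1
for name, cfg in (("default", DEFAULT), ("tuned", TUNED)):
    print(name + ":", shlex.join(netperf_command(host, **cfg)))
```

The returned list can be passed to subprocess.run() unchanged if the runs are to be automated.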


IOMeter

IOMeter 2006_07_27-RC3, 1 worker thread 

Scores are throughput numbers in IOPS, higher scores are better. Numbers
inside the brackets are throughput in MB/s (IOPS x block size).

Workload               non-paravirtops    Pvops + VMI        VMI/non-paravirtops
4K sequential read     11823.98 (46.18)   12345.57 (48.22)   1.044
16K sequential read    4571.101 (71.43)   4576.95 (71.51)    1.001
32K sequential read    2331.338 (72.85)   2333.80 (72.93)    1.001
4K sequential write    11480.75 (44.84)   11842.80 (46.26)   1.031
16K sequential write   4008.865 (62.64)   4011.06 (62.67)    1.000
32K sequential write   1942.484 (60.70)   1946.50 (60.82)    1.002
4K random read         252.96 (0.98)      252.076 (0.98)     0.996
16K random read        242.577 (3.79)     242.098 (3.78)     0.998
32K random read        229.32 (7.16)      228.90 (7.15)      0.998
4K random write        841.58 (3.28)      842.54 (3.291)     1.001
16K random write       552.13 (8.62)      545.06 (8.51)      0.987
32K random write       626.61 (19.58)     626.41 (19.57)     0.999
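The bracketed throughput figures in the IOMeter results are consistent with IOPS multiplied by the block size, expressed in megabytes per second with 1024 KB to the MB. A short sketch of that conversion, assuming this interpretation (the function name is made up for illustration):

```python
KB_PER_MB = 1024


def iops_to_mb_per_sec(iops, block_kb):
    """Convert an IOPS figure at a fixed block size (in KB) to MB/s."""
    return iops * block_kb / KB_PER_MB


# (workload, block size in KB, IOPS) taken from the non-paravirtops column.
rows = [
    ("4K sequential read", 4, 11823.98),
    ("16K sequential read", 16, 4571.101),
    ("32K random read", 32, 229.32),
]

for workload, block_kb, iops in rows:
    print(f"{workload}: {iops_to_mb_per_sec(iops, block_kb):.2f} MB/s")
```

The computed values agree with the bracketed numbers to within rounding in the last digit, which makes the conversion a quick sanity check on individual entries.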


3d Benchmarks

System: Dell Precision 390, Intel Core 2 Duo 6400 @ 2.13GHz, 2G RAM, Nvidia
Quadro NVS 285

Distro: Ubuntu Feisty Herd-3 Desktop i386, Nvidia Driver build 1.0-9746

Kernel: 

*	non-paravirtops: 2.6.20-rc6-ubuntu1 (clean at 432106): CONFIG_VMI=n,
CONFIG_PARAVIRT=n 
*	Ubuntu-generic: 2.6.20-6-generic: stock Feisty-herd-3 kernel 
*	VMI: 2.6.20-rc6-ubuntu (testing at 432106): CONFIG_PARAVIRT=y,
CONFIG_VMI=y, 

Test: SPECviewperf-9.0.3 

Scores are in Frames Per Sec (FPS), Higher scores are better 

Workload     non-paravirtops   Ubuntu-generic   VMI     VMI/Ubuntu-generic   VMI/non-paravirtops
3dsmax-04    6.882             6.876            6.930   1.007                1.006
catia-02     8.860             8.779            8.843   1.007                0.998
ensight-03   4.648             4.637            4.638   1.000                0.997
light-08     9.047             8.849            9.052   1.022                1.000
maya-02      15.48             15.70            15.47   0.985                0.999
proe-04      7.139             7.286            7.122   0.977                0.997
sw-01        8.948             8.916            8.940   1.002                0.999
ugnx-01      1.085             1.083            1.083   1.000                0.998
tcvis-01     1.600             1.600            1.605   1.003                1.003

The Nvidia Quadro NVS is a business-desktop-class graphics card, so the
scores are low.

2.6.20-6-generic stock herd-3 kernel has PVOPS/VMI enabled by default 

With Nvidia Geforce 7600GT: 

Scores are in Frames Per Sec (FPS), Higher scores are better 

Workload     non-paravirtops   PVOPS+VMI   VMI/non-paravirtops
3dsmax-04    10.77             10.67       0.990
catia-02     11.49             11.53       1.003
ensight-03   9.060             9.040       0.997
light-08     10.01             10.09       1.007
maya-02      28.64             28.64       1.000
proe-04      9.593             9.694       1.010
sw-01        17.29             17.34       1.002
ugnx-01      3.066             3.065       0.999
tcvis-01     3.918             3.918       1.000
