Benchmarks don't lie (TM), part 2

Christian Bauer Christian.Bauer at Uni-Mainz.DE
Sun Sep 28 19:30:15 CEST 2003


Hi!

On Wed, Sep 24, 2003 at 12:07:40AM +0200, Richard B. Kreckel wrote:
> AMD released the Opteron processor family today leaving people with the
> budget to buy new hardware wondering what exactly to purchase next.

Well, for those among us who don't have the budget to always buy the latest
kick-ass machines (with their "SDRAM memory" and "hardware accelerated 3D"
and other crazy stuff), the GiNaC Retro Hardware Testing Labs are proud to
present what you've all been waiting for:

  The ultimate CAS shootout at 2x200 MHz
   - No rules, no mercy. Two CPUs enter, one CPU leaves.
     (then, after a while, the other CPU leaves, as soon as I manage to get
     the heat sink off the f*cking thing...)

The contestants:

System 1 - ppc:
  Umax Pulsar, Dual PowerPC 604e ("Extreme"?) at 200 MHz
    L1 cache: 32KB I, 32KB D per CPU
  Apple Tsunami board (also used in PowerMac 9500)
    L2 cache: 512KB for both CPUs, at 50 MHz
    50 MHz system bus
    144MB EDO RAM, 60ns
  Yellow Dog Linux 2.3 (based on Red Hat 7.2)
    Kernel 2.4.19-4asmp
    GCC 2.95.4

System 2 - x86:
  Dual Pentium Pro 512K at 200 MHz
    L1 cache: 8KB I, 8KB D per CPU
    L2 cache: 512KB per CPU, at 200 MHz
  Intel Providence (PR440FX) board
    66 MHz system bus
    256MB registered EDO RAM, 60ns
  Red Hat Linux 7.3
    Kernel 2.4.20-20.7smp
    GCC 2.96

Both machines were equipped with Matrox Millennium graphics cards and
SCSI hard disks (ppc: 4GB IBM Fast Narrow; x86: 2GB Conner Fast Wide).

The Umax Pulsar features a fan that appears to be optimized for maximum
noise output. Jet pilots should feel right at home with this computer.
The Intel machine, on the other hand, sports a hard disk that I could still
hear while standing under the shower. Ear protection should be worn at
all times when running both systems in the same room.

But on to the benchmarks...

The tests consisted of compiling GiNaC 1.0.15 (GiNaC >=1.1 would have
required GCC 3), and running its standard benchmark suite. The compiler
options used were

  ppc: -g -O2 -mcpu=604e
  x86: -g -O2 -march=pentiumpro

and GiNaC was configured with the --disable-static option (the shared
library will be the one used most by applications, anyway).

For the compilation test, only the time required for compiling the library
and tools (ginsh/viewgar) was measured, not the time for compiling the
benchmark suite. The library was built with "make -j 2" ("make -j 3" was
slower by about 30s on both machines).

                                                        ppc      x86
----------------------------------------------------------------------
compile GiNaC 1.0.15                                  25m 34s  16m 42s

The Pentium Pro really shines here, which may be due to its faster and
larger (combined) L2 cache. But this comparison isn't really fair, as the
compilers (GCC 2.95.4 vs. GCC 2.96) are of course using different backends
on the two systems and producing different output.
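
To put a number on that gap, here is a minimal Python sketch. It is an
after-the-fact calculation, not something from the original test run, and
the helper name to_seconds is made up for illustration. It converts the two
wall-clock times to seconds and takes their ratio:

```python
def to_seconds(text):
    """Convert a duration written like '25m 34s' into seconds."""
    minutes, seconds = text.split()
    return int(minutes.rstrip("m")) * 60 + int(seconds.rstrip("s"))

# Compile times copied from the table above.
ppc = to_seconds("25m 34s")
x86 = to_seconds("16m 42s")
compile_ratio = ppc / x86

print(f"ppc {ppc}s, x86 {x86}s, ppc/x86 = {compile_ratio:.2f}")
```

That works out to 1534 s against 1002 s, so the ppc build takes roughly
1.5 times as long.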

So, without further ado, on to the real tests:

                                                        ppc      x86
----------------------------------------------------------------------
commutative expansion and substitution, size 100       1.43s    1.62s
commutative expansion and substitution, size 200       7.32s    7.14s
                                               ratio  [5.12]   [4.41]
Laurent series expansion of Gamma function, order 20   9.91s    7.429s
Laurent series expansion of Gamma function, order 25  38.74s   28.339s
                                               ratio  [3.91]   [3.81]
determinant of symbolic 10x10 Vandermonde matrix       6.55s    6.86s
determinant of symbolic 12x12 Vandermonde matrix      56.57s   63.28s
                                               ratio  [8.64]   [9.22]
determinant of symbolic 8x8 Toeplitz matrix            4.82s    5.65s
determinant of symbolic 9x9 Toeplitz matrix           18.98s   21.12s
                                               ratio  [3.94]   [3.74]
Lewis-Wester test A (divide factorials)                0.38s    0.56s
Lewis-Wester test B (sum of rational numbers)          0.04s    0.059s
Lewis-Wester test C (gcd of big integers)              0.4s     0.619s
Lewis-Wester test D (normalized sum of rational fcns)  1.5s     1.689s
Lewis-Wester test E (normalized sum of rational fcns)  1.28s    1.489s
Lewis-Wester test F (gcd of 2-var polys)               0.17s    0.19s
Lewis-Wester test G (gcd of 3-var polys)               3.91s    4.459s
Lewis-Wester test H (det of 80x80 Hilbert)            23.12s   27.66s
Lewis-Wester test I (invert rank 40 Hilbert)           7.37s    8.6s
Lewis-Wester test K (invert rank 70 Hilbert)          47.17s   54.45s
                                               ratio  [6.40]   [6.33]
Lewis-Wester test J (check rank 40 Hilbert)            3.95s    5.05s
Lewis-Wester test L (check rank 70 Hilbert)           22.25s   28.36s
                                               ratio  [5.63]   [5.62]
Lewis-Wester test M1 (26x26 sparse, det)               0.88s    1.189s
Lewis-Wester test O1 (three 15x15 dets) (average)    109.783s  90.246s
Lewis-Wester test P (det of sparse rank 101)           2.86s    4.19s
Lewis-Wester test P' (det of less sparse rank 101)    14.66s   17.51s
computation of antipodes in Yukawa theory (total)    192.64s  172.27s
timing Fateman's polynomial expand benchmark         362.21s  293.579s

Now, this comes as a bit of a surprise. After reading the MuPAD benchmarks
published at http://www.heise.de/ct/english/96/11/270/, which were run on
machines very similar to mine, I really expected the Pentium Pro to wipe the
floor with the PowerPC here, but it's actually the other way round. The 604e
wins almost all categories, with some notable exceptions: the Gamma series
expansion, O1, the Yukawa thing, and the expand benchmark (plus, by a hair,
the size 200 expansion and substitution).
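
As a quick cross-check of the table, a small Python sketch can list the rows
in which the x86 time is the lower one. It is not part of the original test
setup, and the short row labels are abbreviations introduced here:

```python
# Timings in seconds, copied from the table: (ppc, x86).
timings = {
    "expansion 100": (1.43, 1.62),
    "expansion 200": (7.32, 7.14),
    "Gamma 20": (9.91, 7.429),
    "Gamma 25": (38.74, 28.339),
    "Vandermonde 10": (6.55, 6.86),
    "Vandermonde 12": (56.57, 63.28),
    "Toeplitz 8": (4.82, 5.65),
    "Toeplitz 9": (18.98, 21.12),
    "A": (0.38, 0.56),
    "B": (0.04, 0.059),
    "C": (0.4, 0.619),
    "D": (1.5, 1.689),
    "E": (1.28, 1.489),
    "F": (0.17, 0.19),
    "G": (3.91, 4.459),
    "H": (23.12, 27.66),
    "I": (7.37, 8.6),
    "K": (47.17, 54.45),
    "J": (3.95, 5.05),
    "L": (22.25, 28.36),
    "M1": (0.88, 1.189),
    "O1": (109.783, 90.246),
    "P": (2.86, 4.19),
    "P'": (14.66, 17.51),
    "Yukawa": (192.64, 172.27),
    "Fateman": (362.21, 293.579),
}

# A lower time wins the row.
x86_wins = [name for name, (ppc, x86) in timings.items() if x86 < ppc]
ppc_wins = [name for name, (ppc, x86) in timings.items() if ppc < x86]

print("x86 faster:", x86_wins)
print("ppc faster in", len(ppc_wins), "of", len(timings), "rows")
```

Run on the numbers above, it reports six rows for the x86 box and twenty
for the ppc.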

On the other hand, judging from the "ratio" lines above, the performance of
the Pentium Pro appears to scale better with larger data sets (again with
one exception: the Vandermonde determinants). This is most likely due to the
faster cache and generally better memory interface of the Intel machine.
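
The bracketed ratio lines are simply the time for the larger problem divided
by the time for the smaller one on the same machine. A minimal Python sketch
(the dictionary layout and labels are assumptions made here for
illustration) recomputes them from the table and flags the pairs where the
x86 ratio is the larger one:

```python
# (small, large) timings in seconds, copied from the table: ppc first, then x86.
pairs = {
    "expansion 100 -> 200":    ((1.43, 7.32),  (1.62, 7.14)),
    "Gamma order 20 -> 25":    ((9.91, 38.74), (7.429, 28.339)),
    "Vandermonde 10 -> 12":    ((6.55, 56.57), (6.86, 63.28)),
    "Toeplitz 8 -> 9":         ((4.82, 18.98), (5.65, 21.12)),
    "Hilbert invert 40 -> 70": ((7.37, 47.17), (8.6, 54.45)),
    "Hilbert check 40 -> 70":  ((3.95, 22.25), (5.05, 28.36)),
}

def growth(small, large):
    """How many times longer the larger problem takes."""
    return large / small

ratios = {}
for name, (ppc, x86) in pairs.items():
    ratios[name] = (round(growth(*ppc), 2), round(growth(*x86), 2))

# A larger ratio means the machine loses more ground as the problem grows.
x86_scales_worse = [name for name, (r_ppc, r_x86) in ratios.items()
                    if r_x86 > r_ppc]

for name, (r_ppc, r_x86) in ratios.items():
    print(f"{name:26s} ppc [{r_ppc:.2f}]  x86 [{r_x86:.2f}]")
print("x86 ratio larger for:", x86_scales_worse)
```

The recomputed values match the bracketed figures in the table, and the
Vandermonde pair is the only one where the x86 ratio comes out larger.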

But still, my next personal machine won't be a Pentium Pro, and it won't be
a "G2" PowerMac, either. The VCS 2600 is going cheap on eBay, though...

Bye,
Christian

-- 
  / Physics is an algorithm
\/ http://www.uni-mainz.de/~bauec002/


