===================== July 25, 1998 =============================

I had the opportunity to run your benchmark on two SGI machines:
an Octane with two R10000s @ 250 MHz and an Origin200QC (2 MB of
secondary cache).

The benchmarks were run by an SGI employee.

Octane 2xR10000 @ 250 MHz

# processors	| time (hours)
------------------------------
	1	|	.93
	2	|	.48

Origin200QC 2xR10000 @ 180 MHz

# processors	| time (hours)
------------------------------
	1	|	1.2
	2	|	 .62
	4	|	 .32

Tru
-- 
CEA Centre d'etudes de Saclay 91191 Gif sur Yvette CEDEX FRANCE
DSM/DRECAM/SCM             | DSV/DBCM/SBPM
Bat 137 piece 107          | Bat 528 piece 215
thuynh@cea.fr

===================================================================

Timing in seconds for MbCO + 3830 water molecules (14026 atoms), 1000 steps, 12-14 A shift, on an SGI Power Challenge.

Nodes               1        2        4        8       16
----------------------------------------------------------
E ext          9423.8   4697.3   2438.0   1170.7    587.8
E int            85.6     47.6     29.5     17.0     11.4
Wait              0.0     38.8     62.6     22.1     24.8
Comm              0.1     17.7     41.5     39.9     68.9
List            853.3    428.7    223.2    112.0     63.6
Integ            21.7     10.6      7.7      5.6      9.7
Total         10384.5   5240.7   2802.4   1367.2    766.1
Total(hours)     2.88    1.456    0.778    0.380    0.213
Eff            100.0%    99.1%    93.0%    95.0%    84.7%
Speedup           1.0     1.98     3.71      7.6    13.55

E ext  : external energy terms (electrostatics + Lennard-Jones)
E int  : internal energy terms (bond, angle, dihedral)
Wait   : load imbalance
Comm   : communication time (Vector Distr. Global {Sum,Brdcst})
List   : nonbond list generation time
Integ  : time needed to integrate the equations of motion
Total  : total elapsed time
Eff    : efficiency = speedup divided by the number of nodes
Speedup: time for one node divided by time for N nodes.
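
As a sanity check, the Speedup and Eff rows follow directly from the
Total row by the definitions above. A minimal sketch in C (the timing
values are copied from the table; the program itself is not part of
the original benchmark):

    #include <stdio.h>

    int main(void)
    {
        /* Total elapsed times copied from the table above. */
        const int    nodes[] = { 1, 2, 4, 8, 16 };
        const double total[] = { 10384.5, 5240.7, 2802.4, 1367.2, 766.1 };
        const int    n = (int)(sizeof nodes / sizeof nodes[0]);

        for (int i = 0; i < n; i++) {
            double speedup = total[0] / total[i];        /* T(1) / T(N)            */
            double eff     = 100.0 * speedup / nodes[i]; /* speedup per node, in % */
            printf("%2d nodes: speedup %5.2f  eff %5.1f%%\n",
                   nodes[i], speedup, eff);
        }
        return 0;
    }

Up to rounding, this reproduces the Speedup and Eff rows (e.g. for 16
nodes, 10384.5 / 766.1 = 13.55 and 13.55 / 16 = 84.7%).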

Origin 2000

Origin @ dontask.ii.uib.no  (http://www.parallab.uib.no)
 (64 processors, load was 10)

Since the f77 compiler 7.2 gives wrong results when compiling CHARMM
with -O3, only the following makefiles have the -O3 option in them
(applied to all files within the module): energy.mk, images.mk,
nbonds.mk. This gives correct results, while the speed for the
benchmarks below is the same as if the whole of CHARMM were compiled
with -O3. (EXPAND performs the same as NOEXPAND.)
c26a1 has to be patched in order to support the CMPI and MPI keywords
together. This combination means that CHARMM uses MPI only for send
and receive and uses CHARMM's own global combine routines, while
specifying just PARALLEL and PARAFULL means the global combine
routines provided by MPI are used.
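
For readers unfamiliar with the distinction, a global combine simply
reduces a per-node quantity (e.g. a partial energy sum) across all
nodes. The sketch below, in plain MPI C, shows the MPI-provided
collective that a PARALLEL/PARAFULL build relies on. It is an
illustration only, not CHARMM source; a CMPI build would do the same
job with CHARMM's own combine routine layered on MPI_Send/MPI_Recv.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        double e_local, e_global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Stand-in for the partial energy this node has computed. */
        e_local = 1.0 + rank;

        /* The MPI-provided global combine: after the call, every
           node holds the sum over all nodes. */
        MPI_Allreduce(&e_local, &e_global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        printf("node %d: global sum = %g\n", rank, e_global);

        MPI_Finalize();
        return 0;
    }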

The following CHARMM executables were tried: 

name                   keywords in pref.dat
------------------------------------------------
charmm-mpi             PARALLEL PARAFULL
charmm-pvm             PARALLEL PARAFULL CMPI PVMC SGIMP SYNCHRON
charmm-pvm-gencomm     PARALLEL PARAFULL CMPI PVMC SGIMP SYNCHRON GENCOMM
charmm-cmpi-sync       PARALLEL PARAFULL CMPI MPI SYNCHRON
charmm-cmpi-async      PARALLEL PARAFULL CMPI MPI
charmm-cmpi-gencomm    PARALLEL PARAFULL CMPI MPI GENCOMM

bench: MbCO+3830w 1000 steps of dynamics [time in seconds]:

============================================================
executable                  Number of nodes
------------------------------------------------------------
                          8       16       32
------------------------------------------------------------
charmm-mpi              634.8   351.4    226.2
charmm-pvm              626.4   344.7    would not start on 32 nodes
charmm-pvm-gencomm      634.3   351.7    would not start on 32 nodes
charmm-cmpi-sync        624.0   358.6    200.4
charmm-cmpi-async       642.2   346.1    201.6
charmm-cmpi-gencomm     625.4   343.1    211.7

bench1: MbCO+4985w, 55.5 A cubic box, PME simulation (100 steps) [time in seconds]:

============================================================
executable                  Number of nodes
------------------------------------------------------------
                          8       16       32
------------------------------------------------------------
charmm-mpi              255.7   344.0    323.2
charmm-pvm              181.5   138.9    would not start on 32 nodes
charmm-pvm-gencomm      175.7   139.3    would not start on 32 nodes
charmm-cmpi-sync        194.3   214.6    291.4
charmm-cmpi-async       194.6   187.1    286.0
charmm-cmpi-gencomm     175.1   207.6    115.1

Other SGI workstations:

                 1 node 
-----------------------
Power Indigo 2   2.93 h
Indy             8.93 h

Comments

See the SGI WWW pages for more details.