19 MAR 2001 ...........


CPU  Performance and Monitoring.
-------------------------------------------
-------------------------------------------

a) Performance
   -----------



LINPACK
-------

The majority of HEP analysis is based on floating point operations;
consequently LINPACK has been used as the main performance test.
Three versions were found on the web, written in FORTRAN, C
and JAVA respectively. The JAVA version was originally an applet
and has been converted to an application
(see http://www.netlib.org/benchmark/linpackjava/ and references therein).

All versions of the code give close to 40 Mflop/s for double precision on
ppepc42.
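
For orientation, the sketch below shows how a LINPACK-style Mflop/s figure is conventionally derived: the nominal operation count for solving an n x n dense system, 2/3 n^3 + 2 n^2, is divided by the measured solve time. It is not taken from the benchmark sources linked here; the class name, the problem size n = 100 and the example time are assumptions chosen only for illustration.

```java
// Sketch only: converts a measured solve time into a LINPACK-style rate.
// The class name, n and the example time are illustrative assumptions.
final class LinpackRate {

    /** Nominal floating point operation count for an n x n solve: 2/3 n^3 + 2 n^2. */
    static double nominalOps(int n) {
        double dn = n;
        return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
    }

    /** Rate in Mflop/s for a solve of order n that took the given number of seconds. */
    static double mflops(int n, double seconds) {
        if (seconds <= 0.0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return nominalOps(n) / (seconds * 1.0e6);
    }

    public static void main(String[] args) {
        int n = 100;            // assumed problem size, not stated in this report
        double seconds = 0.017; // hypothetical measured time
        System.out.printf("n=%d  time=%.3f s  rate=%.1f Mflop/s%n",
                n, seconds, mflops(n, seconds));
    }
}
```

Because the operation count is fixed by n, two runs of the same problem size can be compared simply through their elapsed times.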

Java Linpack source code. Run instructions and output.

Java Linpack source code (long run). Run instructions and output.

C Linpack source code. Run instructions and output.

FORTRAN Linpack source code. Run instructions and output.







HeapTest
--------

This JAVA program performs a set of memory allocation and computation tasks.
The tasks are distributed across a number of threads; for details see 
the reference below.

A run on ppepc42 gives:

Max # threads             = 5
Total (heap + CPU) cycles = 2

# Threads   2 Heap, 0 CPU   1 Heap, 1 CPU   0 Heap, 2 CPU
---------   -------------   -------------   -------------
    1           10610            7305            2901
    2           10527            6796            3051
    3           10354            7444            3216
    4            9584            7478            3368
    5           10366            6997            3448



Ref:   http://developer.java.sun.com/developer/technicalArticles/Programming/JVMPerf/
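
The kind of measurement tabulated above can be sketched as follows. This is not the HeapTest source from the reference; it is a minimal illustration that assumes each cycle is either a heap-bound task (many small allocations) or a CPU-bound task (floating point arithmetic), with the work of each cycle shared equally among the threads. The class name, the task bodies and the work size are all hypothetical.

```java
// Sketch only: times mixes of heap-bound and CPU-bound cycles on 1..5 threads.
// Task bodies and WORK_PER_CYCLE are assumptions, not the HeapTest originals.
final class HeapVsCpuSketch {

    // Hypothetical amount of work per cycle, shared out over the threads.
    static final int WORK_PER_CYCLE = 2_000_000;

    /** Heap-bound task: allocates many short-lived blocks; returns bytes allocated. */
    static long heapTask(int blocks) {
        long bytes = 0;
        for (int i = 0; i < blocks; i++) {
            byte[] block = new byte[64];
            block[0] = (byte) i;
            bytes += block.length;
        }
        return bytes;
    }

    /** CPU-bound task: floating point work with no allocation. */
    static double cpuTask(int iterations) {
        double sum = 0.0;
        for (int i = 1; i <= iterations; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    /** Runs the given mix of cycles on the given number of threads; returns elapsed ms. */
    static long timeMix(int threads, int heapCycles, int cpuCycles) {
        final int share = WORK_PER_CYCLE / threads;
        final double[] sink = new double[threads]; // one result slot per thread
        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int c = 0; c < heapCycles; c++) sink[id] += heapTask(share);
                for (int c = 0; c < cpuCycles; c++) sink[id] += cpuTask(share);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        System.out.println("# Threads\t2 Heap, 0 CPU\t1 Heap, 1 CPU\t0 Heap, 2 CPU");
        for (int threads = 1; threads <= 5; threads++) {
            System.out.println(threads + "\t" + timeMix(threads, 2, 0)
                    + "\t" + timeMix(threads, 1, 1)
                    + "\t" + timeMix(threads, 0, 2));
        }
    }
}
```

The absolute numbers from such a sketch are not comparable with the table above; only the layout (threads down, heap/CPU mix across) follows it.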




SciMark
-------

The results of the SciMark benchmark are shown here (the table must be enlarged x8 to be readable).
They were obtained using:   appletviewer http://math.nist.gov/scimark2/run.html
which runs the benchmark in cache.
For details see: http://www.epcc.ed.ac.uk/javagrande/javag.html
                 http://math.nist.gov/scimark2/about.html



Runs using the downloaded code are as follows:


[ppepc41] /ppe_data/sv001/data/skilli/scimark20 > java jnt.scimark2.commandline

SciMark 2.0a

Composite Score: 81.31599474565635     <- approx. Mflop/s, using cache
FFT (1024): 37.61733606909959
SOR (100x100):   214.29660677410084
Monte Carlo : 19.35088320302679
Sparse matmult (N=1000, nz=5000): 73.96839617654105
LU (100x100): 61.34675150551359

java.vendor: Sun Microsystems Inc.
java.version: 1.3.0
os.arch: i386
os.name: Linux
os.version: 2.2.16-3
[ppepc41] /ppe_data/sv001/data/skilli/scimark20 > java jnt.scimark2.commandline -large

SciMark 2.0a

Composite Score: 33.14671516551844    <- approx. Mflop/s, using out-of-cache memory
FFT (1048576): 9.668659808552293
SOR (1000x1000):   78.85440170208969
Monte Carlo : 19.919520580545072
Sparse matmult (N=100000, nz=1000000): 31.527094040392804
LU (1000x1000): 25.763899696012317

java.vendor: Sun Microsystems Inc.
java.version: 1.3.0
os.arch: i386
os.name: Linux
os.version: 2.2.16-3
[ppepc41] /ppe_data/sv001/data/skilli/scimark20 > 
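
The composite scores in the two runs above are the arithmetic mean of the five kernel scores (FFT, SOR, Monte Carlo, sparse matmult, LU). The short sketch below reproduces both composites from the kernel values printed in the transcript; the class and method names are illustrative only.

```java
// Sketch only: recomputes the SciMark composite as the mean of the kernel scores.
// Kernel values are copied from the transcript; names are illustrative.
final class SciMarkComposite {

    /** Arithmetic mean of the kernel scores (all in approximate Mflop/s). */
    static double composite(double... kernelScores) {
        if (kernelScores.length == 0) {
            throw new IllegalArgumentException("at least one kernel score is required");
        }
        double sum = 0.0;
        for (double score : kernelScores) {
            sum += score;
        }
        return sum / kernelScores.length;
    }

    public static void main(String[] args) {
        // Order: FFT, SOR, Monte Carlo, sparse matmult, LU.
        double inCache = composite(37.61733606909959, 214.29660677410084,
                19.35088320302679, 73.96839617654105, 61.34675150551359);
        double outOfCache = composite(9.668659808552293, 78.85440170208969,
                19.919520580545072, 31.527094040392804, 25.763899696012317);
        System.out.println("in cache:     " + inCache);
        System.out.println("out of cache: " + outOfCache);
        System.out.println("ratio:        " + inCache / outOfCache);
    }
}
```

The in-cache composite is roughly 2.5 times the out-of-cache one; Monte Carlo is the only kernel whose score is essentially unchanged between the two runs.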




-----------------------------------------------------------------------------------------------------------------


The CPU used in the preceding tests has the following characteristics (from /proc/cpuinfo ):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 3
cpu MHz         : 801.837
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 3
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 pn mmx fxsr xmm
bogomips        : 1599.08


-----------------------------------------------------------------------------------------------------------------





b) CPU Monitoring
   --------------

A series of JAVA programs has been written to investigate how CPU performance can
be monitored.
A multi-client server has been written to transfer the CPU information in
/proc/stat to one or more monitoring clients. To run the monitoring system,
a server is set up on each CPU to be monitored and a client (or several clients,
if different forms of monitoring are required) runs the monitoring
program.
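
A minimal sketch of such a server is given below. It is not the program described here; it only illustrates the idea of one thread per connected client, each periodically sending the aggregate "cpu" line of /proc/stat. The class name, the port number (5000) and the 2-second period are assumptions.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Sketch only: a multi-client server that streams the "cpu" line of /proc/stat.
// Port, period and class name are illustrative assumptions.
final class ProcStatServer {

    /** Returns the aggregate "cpu ..." line from /proc/stat-style lines, or null if absent. */
    static String cpuLine(List<String> statLines) {
        for (String line : statLines) {
            if (line.startsWith("cpu ")) {
                return line;
            }
        }
        return null;
    }

    /** Sends the current cpu line to one client every periodMillis until the client goes away. */
    static void serve(Socket client, Path statFile, long periodMillis) {
        try (Socket s = client;
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.US_ASCII), true)) {
            while (!out.checkError()) {
                String line = cpuLine(Files.readAllLines(statFile, StandardCharsets.US_ASCII));
                if (line != null) {
                    out.println(line);
                }
                Thread.sleep(periodMillis);
            }
        } catch (IOException e) {
            // Client disconnected or the stat file could not be read: drop this client.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 5000; // hypothetical port
        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket client = server.accept();
                // One thread per client, so several monitors can connect at once.
                new Thread(() -> serve(client, Paths.get("/proc/stat"), 2000)).start();
            }
        }
    }
}
```

Sending the raw counters and leaving the percentage calculation to the client keeps the server cheap, which fits the aim of loading the monitored CPU as little as possible.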


1) Output from one version of the monitoring program is shown here
   (the plot should be enlarged x4 to show the detail).
   The main panel shows 128 buttons corresponding to (potential) CPUs; four of them correspond to
   CPUs ppepc40-43. A button is blue if the CPU efficiency (% of CPU time used for user programs) is
   greater than 10%, red if the efficiency is less than 10%, and yellow if the CPU is unconnected.
   Clicking a button gives a set of three JAS-based histograms that update approximately every
   2 seconds to give the time dependence of the % of CPU time
   used for user programs, the % of time the CPU is idle and the % of CPU time
   used by the system. On the time axis, 0 is now and -200 is approximately 200 seconds
   in the past.
   The monitoring program is run on ppepc42; this accounts for the significant
   system usage on this machine.
   Note that the servers (for example on ppepc41) use very little CPU time.


2) The monitor shown here gives the cumulative efficiency in the
   small plots (efficiency is on the horizontal axis).
   The larger central plot gives the instantaneous efficiency (vertical axis) versus CPU number (horizontal axis).

3) This version of the monitor makes use of the JAVA Progress Bar and Slider Bar widgets to
   display the CPU efficiency. The time over which the efficiency is averaged can be set
   with the slider.
   A typical display is shown here.


4) The monitor shown here displays three histograms of CPU utilisation:
   the fraction of CPU time in the user, idle and system states.
   As shown, 128 CPUs can be displayed; four are plotted here. If a CPU is unavailable, a negative
   value is plotted.
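
The quantities displayed by the monitors above can be derived from two successive samples of the "cpu" line in /proc/stat, whose first four counters are user, nice, system and idle time. The sketch below is an illustration, not the monitoring code itself: it assumes that nice time is counted as user time, that an efficiency of exactly 10% is shown red, and that an unconnected CPU is flagged explicitly. All class and method names are hypothetical.

```java
// Sketch only: user/system/idle percentages between two /proc/stat samples,
// plus the button colour rule. Names and edge-case choices are assumptions.
final class CpuEfficiency {

    enum Colour { BLUE, RED, YELLOW }

    /** Parses the user, nice, system and idle counters from a "cpu ..." line. */
    static long[] parse(String cpuLine) {
        String[] f = cpuLine.trim().split("\\s+"); // f[0] is the "cpu" label
        return new long[] {
            Long.parseLong(f[1]), Long.parseLong(f[2]),
            Long.parseLong(f[3]), Long.parseLong(f[4])
        };
    }

    /** Returns {user %, system %, idle %} for the interval between two samples. */
    static double[] percentages(long[] before, long[] after) {
        long user = (after[0] - before[0]) + (after[1] - before[1]); // assumption: nice counts as user
        long system = after[2] - before[2];
        long idle = after[3] - before[3];
        long total = user + system + idle;
        if (total <= 0) {
            return new double[] { 0.0, 0.0, 0.0 }; // no time elapsed between samples
        }
        return new double[] {
            100.0 * user / total, 100.0 * system / total, 100.0 * idle / total
        };
    }

    /** Button colour: yellow if unconnected, blue above 10% efficiency, red otherwise. */
    static Colour colour(boolean connected, double userPercent) {
        if (!connected) {
            return Colour.YELLOW;
        }
        return userPercent > 10.0 ? Colour.BLUE : Colour.RED;
    }

    public static void main(String[] args) {
        // Two made-up samples for illustration.
        long[] before = parse("cpu  100 0 50 850");
        long[] after = parse("cpu  150 10 60 980");
        double[] p = percentages(before, after);
        System.out.println("user=" + p[0] + "% system=" + p[1] + "% idle=" + p[2] + "%");
        System.out.println("button colour: " + colour(true, p[0]));
    }
}
```

Averaging over a longer period, as with the slider in version 3, amounts to taking the two samples further apart in time.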


c) Network Performance Tools
   -------------------------
  In progress .......