19 MAR 2001 ...........
CPU Performance and Monitoring.
-------------------------------------------
-------------------------------------------
a) Performance
-----------
LINPACK
-------
The majority of HEP analysis is based on floating-point operations;
consequently LINPACK has been used as the main performance test.
Three versions were found on the web, written in FORTRAN, C
and JAVA respectively. The JAVA version was originally an applet
and has been converted to an application
(see http://www.netlib.org/benchmark/linpackjava/ and references therein).
All versions of the code give close to 40 Mflop/s for double precision on
ppepc42.
Java Linpack: source code; run instructions and output
Java Linpack (long run): source code; run instructions and output
C Linpack: source code; run instructions and output
FORTRAN Linpack: source code; run instructions and output
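As a reminder of how a LINPACK rate is derived, the sketch below converts a problem size and an elapsed time into Mflop/s using the conventional LINPACK operation count, 2n^3/3 + 2n^2. The class name, the problem size and the timing in main are hypothetical; the problem size used in the runs above is not stated here.

```java
final class LinpackRate {
    /** Nominal floating-point operation count for factoring and solving an n x n system. */
    static double nominalOps(int n) {
        double d = n;
        return 2.0 * d * d * d / 3.0 + 2.0 * d * d;
    }

    /** Rate in Mflop/s for a solve of order n that took the given wall-clock time. */
    static double mflops(int n, double seconds) {
        return nominalOps(n) / (seconds * 1.0e6);
    }

    public static void main(String[] args) {
        int n = 100;            // hypothetical problem size
        double seconds = 0.017; // hypothetical elapsed time
        System.out.printf("n=%d  time=%.3f s  rate=%.1f Mflop/s%n",
                n, seconds, mflops(n, seconds));
    }
}
```

Because the operation count is nominal, the rate is comparable across the FORTRAN, C and JAVA versions only if each is run with the same problem size.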
HeapTest
--------
This JAVA program performs a set of memory allocation and computation tasks.
The tasks are distributed across a number of threads; for details see
the reference below.
A run on ppepc42 gives:
Max # threads = 5
Total (heap + CPU) cycles = 2
# Threads   2 Heap, 0 CPU   1 Heap, 1 CPU   0 Heap, 2 CPU
1           10610           7305            2901
2           10527           6796            3051
3           10354           7444            3216
4           9584            7478            3368
5           10366           6997            3448
Ref: http://developer.java.sun.com/developer/technicalArticles/Programming/JVMPerf/
SciMark
-------
The results of the SciMark benchmark are shown here (magnify x8 to read the table).
They were obtained using: appletviewer http://math.nist.gov/scimark2/run.html
which runs the benchmark with problem sizes that fit in the cache.
For details see: http://www.epcc.ed.ac.uk/javagrande/javag.html
and http://math.nist.gov/scimark2/about.html
Runs using downloaded code are as follows:
[ppepc41] /ppe_data/sv001/data/skilli/scimark20 > java jnt.scimark2.commandline
SciMark 2.0a
Composite Score: 81.31599474565635 <- approx. Mflop/s using cache
FFT (1024): 37.61733606909959
SOR (100x100): 214.29660677410084
Monte Carlo : 19.35088320302679
Sparse matmult (N=1000, nz=5000): 73.96839617654105
LU (100x100): 61.34675150551359
java.vendor: Sun Microsystems Inc.
java.version: 1.3.0
os.arch: i386
os.name: Linux
os.version: 2.2.16-3
[ppepc41] /ppe_data/sv001/data/skilli/scimark20 > java jnt.scimark2.commandline -large
SciMark 2.0a
Composite Score: 33.14671516551844 <- approx. Mflop/s using out-of-cache memory
FFT (1048576): 9.668659808552293
SOR (1000x1000): 78.85440170208969
Monte Carlo : 19.919520580545072
Sparse matmult (N=100000, nz=1000000): 31.527094040392804
LU (1000x1000): 25.763899696012317
java.vendor: Sun Microsystems Inc.
java.version: 1.3.0
os.arch: i386
os.name: Linux
os.version: 2.2.16-3
[ppepc41] /ppe_data/sv001/data/skilli/scimark20 >
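The composite scores in the two runs above are consistent with the arithmetic mean of the five kernel scores. The short check below reproduces them from the numbers printed in the transcripts; the class and method names are illustrative only and are not part of SciMark.

```java
final class SciMarkComposite {
    /** Arithmetic mean of the kernel scores (Mflop/s). */
    static double composite(double... kernelScores) {
        double sum = 0.0;
        for (double s : kernelScores) {
            sum += s;
        }
        return sum / kernelScores.length;
    }

    public static void main(String[] args) {
        // Kernel scores copied from the default (in-cache) run: FFT, SOR, Monte Carlo, sparse, LU.
        double small = composite(37.61733606909959, 214.29660677410084,
                19.35088320302679, 73.96839617654105, 61.34675150551359);
        // Kernel scores copied from the -large (out-of-cache) run, same order.
        double large = composite(9.668659808552293, 78.85440170208969,
                19.919520580545072, 31.527094040392804, 25.763899696012317);
        System.out.printf("in-cache composite:     %.4f%n", small);
        System.out.printf("out-of-cache composite: %.4f%n", large);
    }
}
```

Only the Monte Carlo kernel is essentially unchanged between the two runs; the other four drop by roughly a factor of 2.3 to 3.9 with the large problem sizes, which accounts for the lower composite.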
-----------------------------------------------------------------------------------------------------------------
The CPU used in the preceding tests has the following characteristics (from /proc/cpuinfo):
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 3
cpu MHz : 801.837
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
sep_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 3
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 pn mmx fxsr xmm
bogomips : 1599.08
-----------------------------------------------------------------------------------------------------------------
b) CPU Monitoring
--------------
A series of JAVA programs has been written to examine how CPU performance may
be studied.
A multi-client server has been written to transfer the CPU information in
/proc/stat to one or more monitoring clients. To run the monitoring system,
a server is started on each CPU to be monitored, and a client (or several clients,
if different forms of monitoring are required) runs the monitoring
program.
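The sketch below illustrates how a client might turn two successive /proc/stat readings into percentages of CPU time. On Linux the aggregate "cpu" line holds cumulative user, nice, system and idle counters in clock ticks, so a percentage for an interval comes from the differences between two samples. The class name, the sample lines and the choice to count nice time as user time are assumptions for illustration; this is not the code of the monitoring programs described here.

```java
final class CpuSample {
    final long user, nice, system, idle;

    CpuSample(long user, long nice, long system, long idle) {
        this.user = user;
        this.nice = nice;
        this.system = system;
        this.idle = idle;
    }

    /** Parses the aggregate "cpu" line of /proc/stat; any fields after idle are ignored. */
    static CpuSample parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 5 || !f[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + line);
        }
        return new CpuSample(Long.parseLong(f[1]), Long.parseLong(f[2]),
                Long.parseLong(f[3]), Long.parseLong(f[4]));
    }

    /** Returns {user %, system %, idle %} for the interval between two samples. */
    static double[] percentages(CpuSample earlier, CpuSample later) {
        // Assumption: niced user time is counted as user time.
        long du = (later.user - earlier.user) + (later.nice - earlier.nice);
        long ds = later.system - earlier.system;
        long di = later.idle - earlier.idle;
        long total = du + ds + di;
        if (total <= 0) {
            return new double[] {0.0, 0.0, 0.0}; // no ticks elapsed between samples
        }
        return new double[] {100.0 * du / total, 100.0 * ds / total, 100.0 * di / total};
    }

    public static void main(String[] args) {
        // Hypothetical readings taken a short time apart.
        CpuSample a = parse("cpu  100 0 50 850");
        CpuSample b = parse("cpu  160 0 70 970");
        double[] p = percentages(a, b);
        System.out.printf("user %.1f%%  system %.1f%%  idle %.1f%%%n", p[0], p[1], p[2]);
    }
}
```

Working from differences rather than the raw counters matters because the counters accumulate from boot; a single reading only gives the average since start-up.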
1) Output from one version of the monitoring program is shown here
(the plot should be enlarged x4 to show the detail).
The main panel shows 128 buttons corresponding to (potential) CPUs. Four buttons correspond to
CPUs ppepc40-43. Buttons are blue if the CPU efficiency (% of CPU time used for user programs) is
greater than 10%, red if the efficiency is less than 10%, and yellow if unconnected.
Clicking a button gives a set of three JAS-based histograms that update every
2 seconds (approximately) to give the time-dependence of the % of CPU time
used for user programs, the % of time the CPU is idle and the % of CPU time
used by the system. For the time axis, 0 is now and -200 is approximately
200 seconds in the past.
The monitoring program is run on ppepc42; this accounts for the significant
system usage on this machine.
Note that the servers (for example on ppepc41) use very little CPU time.
2) The monitor shown here gives the cumulative efficiency in the
small plots (efficiency is the horizontal axis).
The larger central plot gives the instantaneous efficiency (vertical) versus CPU number (horizontal).
3) This version of the monitor makes use of the JAVA Progress Bar and Slider Bar widgets to
display the CPU efficiency. The time over which the efficiency is averaged can be set
by the slider.
A typical display is shown here.
4) The monitor shown here displays three histograms for CPU utilisation:
the fraction of CPU time in the user, idle or system state.
As shown, 128 CPUs can be displayed; four are plotted here. If a CPU is unavailable, a negative
value is plotted.
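The display logic described in items 1) and 3) can be sketched as follows: efficiency samples are averaged over a window whose length could be set by a slider, and the result is mapped to a button colour. All names are hypothetical, the window is counted in samples rather than seconds, and treating exactly 10% as red is an assumption, since the text only specifies the behaviour above and below 10%.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class EfficiencyWindow {
    enum Status { BLUE, RED, YELLOW }

    private final Deque<Double> samples = new ArrayDeque<>();
    private final int capacity; // number of samples averaged; a slider could set this

    EfficiencyWindow(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        this.capacity = capacity;
    }

    /** Adds one efficiency sample (% of CPU time in user programs), dropping the oldest if full. */
    void add(double percent) {
        if (samples.size() == capacity) {
            samples.removeFirst();
        }
        samples.addLast(percent);
    }

    /** Mean of the retained samples, or NaN if none have arrived yet. */
    double average() {
        if (samples.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.size();
    }

    /** Colour rule: yellow if unconnected, blue above 10%, otherwise red (assumption at exactly 10%). */
    static Status status(boolean connected, double percent) {
        if (!connected) {
            return Status.YELLOW;
        }
        return percent > 10.0 ? Status.BLUE : Status.RED;
    }
}
```

With samples arriving roughly every 2 seconds, a window of 30 samples would correspond to about one minute of averaging.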
c) Network Performance Tools
-------------------------
In progress .......