HPC Performance Monitoring Tool
Features
1. Hardware Data Collection
- Detects and collects essential hardware characteristics of compute nodes.
- Gathers details such as server type, CPU, GPU, RAM, interconnect, PCI devices, drives, partitions, and RAID configuration.
2. Resource Benchmarking
- Executes a series of predefined benchmarks to evaluate currently available resources.
- Benchmarks include CPU and GPU performance, memory utilization, interconnect bandwidth and disk subsystem bandwidth.
- Benchmarks are run at user-defined intervals to ensure up-to-date monitoring.
3. Data Transmission
- The collected data is sent to a MySQL-based web server.
- The server hosts a performance monitoring GUI for comprehensive oversight of node performance and resource utilization.
Requirements
- Linux-based HPC environment: PHP7, iperf3, inxi
- Hosting environment: Apache2, PHP7, MySQL8
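Before installing, it can help to confirm that the HPC-side tools listed above are on the PATH. The following is a minimal sketch, not part of the tool itself: the function name missing_commands is hypothetical, and the command name php for the PHP7 interpreter is an assumption that may differ on your distribution.

```shell
# Print the name of every given command that is not found on PATH,
# one per line. Returns non-zero if at least one command is missing.
# (missing_commands is a hypothetical helper, not shipped with HPC_Monitor.)
missing_commands() {
    local status=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example for the HPC-side requirements (assumes the PHP binary is named "php"):
# missing_commands php iperf3 inxi || echo "install the tools printed above"
```

Running the check on one compute node as well as on the login node is worthwhile, since the two may not have the same packages installed.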
Limitations
- This is the root-user version of the tool, so root access to the HPC environment is currently required. Regular user accounts can be used on the hosting server.
Installation
Cloning
Clone this repository into a user environment on the login node. It is preferable to clone into a regular user account rather than root so that the directory is accessible from all nodes, since the root account is usually restricted to the head node.
git clone https://github.com/serdar-acir/HPC_Monitor.git
Setting up the hosting server
Set up a separate web server backed by MySQL or MariaDB on a hosting platform. Opening a hosting account is beyond the scope of this document, so the details of that step are not covered here.
Upload the hosting_src directory to the hosting server and configure the HPC.config file for your HPC cluster. You can manage multiple HPC clusters with the GUI. Here is a sample HPC.config for two HPC clusters called HPC1 and HPC2.
<?php
//General configuration
date_default_timezone_set('Europe/Istanbul');
$clusters = ["HPC1", "HPC2"];
$descs = [
"16 compute nodes, 240 CPU Cores, 4.3 TB Memory, 64 TB Storage, 12 GPUs, 29952 GPU Cores" //description of HPC2
];
//Database configuration
$database="database_name";
$host="localhost";
$user="username";
$password="password";
?>
On the hosting server, set up your MariaDB database and open the hosting_src/setup.php page in a browser. Enter the required data and complete the installation.
Setting up the login node
On the login node, go to the login_node_src folder as root and configure the HPC1.config file for your specific HPC environment, as described in its README file. You can rename HPC1.config after your cluster, for example my_cluster.config; “my_cluster” will then be displayed in the monitoring tool as the cluster name.
Here is a sample HPC1.config.
<?php
$node_array = array ("login","cn01","cn02","cn03","cn04","cn05","cn06","cn07","cn08","cn09","cn10","cn11","cn12","cn13","cn14","cn15","cn16");
$home_ip = "1.2.3.4"; //ip address of the storage unit (preferably the fast interconnect ip address such as infiniband, RoCE etc.)
// if there are multiple storage units choose one.
$recording_host = "http://xxx.xxx.xxx/"; //the url of the web interface
//the web interface both collects the data via http/s port and serves as the performance GUI
?>
Modify node_array for your own HPC setup. home_ip is the IP address of the storage unit, usually reached through a high-speed interconnect such as InfiniBand or RoCE. Make sure you do not enter the IP address of the management Ethernet network here.
recording_host is the URL (IP address or subdomain) of the hosting server where you will access the GUI web interface.
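One way to double-check that home_ip is reached over the fast interconnect rather than the management Ethernet is to look at which network interface the kernel would use to reach it. The sketch below is an illustration only: route_device is a hypothetical helper, 1.2.3.4 is the placeholder address from the sample config, and the interface names you see (for example ib0 for InfiniBand) depend on your system.

```shell
# Read one line of `ip route get` output on stdin and print the
# interface name, i.e. the word that follows "dev".
# (route_device is a hypothetical helper, not shipped with HPC_Monitor.)
route_device() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a compute node, with the placeholder home_ip from the sample:
# ip route get 1.2.3.4 | route_device
```

If the printed interface is the management Ethernet device, home_ip is probably pointing at the wrong network.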
Collecting HPC infrastructure data
Run the data collection script from the repository directory on the login node:
./login_node_src/collect_data/data_collect.sh
Crontab
You need to collect performance data from your HPC cluster and send it to the hosting server periodically. On the login node, add a crontab entry such as:
*/5 * * * * cd ~/HPC_Monitor/root_version && /usr/bin/php ~/HPC_Monitor/root_version/sap_cron2.php
This runs the benchmarks at 5-minute intervals.
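To use a different interval, only the first crontab field needs to change. As a rough sketch of how the `*/N` step syntax maps to an interval, the hypothetical helper below (cron_entry is not part of the tool) builds an entry for any whole number of minutes from 1 to 59. Intervals that do not divide 60 evenly will produce a shorter gap at the top of each hour.

```shell
# Print a crontab line that runs a command every N minutes.
# N must be a whole number from 1 to 59 for the */N step syntax used here.
# (cron_entry is a hypothetical helper, not shipped with HPC_Monitor.)
cron_entry() {
    local minutes=$1 cmd=$2
    case $minutes in
        ''|*[!0-9]*)
            echo "interval must be a whole number of minutes" >&2
            return 1
            ;;
    esac
    if [ "$minutes" -lt 1 ] || [ "$minutes" -gt 59 ]; then
        echo "interval must be between 1 and 59" >&2
        return 1
    fi
    printf '*/%s * * * * %s\n' "$minutes" "$cmd"
}

# Example: a 10-minute interval for the same command as above.
# cron_entry 10 'cd ~/HPC_Monitor/root_version && /usr/bin/php ~/HPC_Monitor/root_version/sap_cron2.php'
```

The printed line can be pasted into the editor opened by `crontab -e`.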
Now you can access the performance monitoring GUI through your web browser to view the collected data.
Article provided by: https://www.serdaracir.net/hpc/hpc-monitoring-tool-installation/