
HPC Monitoring Tool

tekzsbin September 30 2024 11:23

This guide explains how to install the HPC Performance Monitoring tool, which provides real-time performance monitoring of HPC clusters with automatic node detection and basic benchmarking.

HPC Performance Monitoring Tool

Features

 

1. Hardware Data Collection

  • Detects and collects essential hardware characteristics of compute nodes.
  • Gathers details such as server type, CPU, GPU, RAM, interconnect, PCI devices, drives, partitions and RAID configuration.

2. Resource Benchmarking

  • Executes a series of predefined benchmarks to evaluate the currently available resources.
  • Benchmarks cover CPU and GPU performance, memory utilization, interconnect bandwidth and disk subsystem bandwidth.
  • Benchmarks run at user-defined intervals to keep the monitoring data up to date.

3. Data Transmission

  • The collected data is sent to a MySQL-based web server.
  • The server hosts a performance monitoring GUI for comprehensive oversight of node performance and resource utilization.
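The tool itself relies on inxi for hardware detection. As a rough illustration of the kind of per-node data described in item 1, the minimal sketch below gathers a few characteristics with standard Linux utilities instead (uname, nproc, /proc/meminfo, lscpu) and prints them as key=value lines. The function name hw_summary and its output format are hypothetical and are not the tool's actual data format.

```shell
#!/bin/sh
# Minimal sketch: print a few hardware characteristics of the current node as
# key=value lines. The function name and output format are hypothetical; the
# actual tool uses inxi and its own data format.
hw_summary() {
  # Node name as reported by the kernel.
  echo "host=$(uname -n)"
  # Number of processing units available to this process (GNU coreutils nproc).
  echo "cpus=$(nproc)"
  # Total physical memory in kB, from the kernel's /proc/meminfo.
  echo "mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # CPU model name, only if lscpu is installed.
  if command -v lscpu >/dev/null 2>&1; then
    echo "cpu_model=$(lscpu | awk -F: '/^Model name/ {sub(/^[ \t]+/, "", $2); print $2; exit}')"
  fi
}

hw_summary
```

Running the same function on every node (for example over SSH) would give a crude inventory; inxi collects far more detail, which is presumably why the tool depends on it.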

Requirements

  • Linux-based HPC environment: PHP7, iperf3, inxi
  • Hosting environment: Apache2, PHP7, MySQL 8
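Before installing, it can help to confirm that the HPC-side requirements are present on each node. The sketch below is a hypothetical helper (missing_tools is not part of the tool) that reports which of the listed commands cannot be found in PATH. It only checks that the commands exist; it does not check versions, so run `php -v` separately to confirm PHP 7.

```shell
#!/bin/sh
# Sketch: report which commands are missing from PATH on the current node.
# Prints one missing command name per line; prints nothing if all are found.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the shell can locate the command
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Tools the guide lists for the Linux-based HPC environment.
missing=$(missing_tools php iperf3 inxi)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing on $(uname -n):" $missing
else
  echo "All required tools found on $(uname -n)."
fi
```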

Limitations

  • This is the root-user version of the tool: root access to the HPC environment is currently required. Regular user accounts can, however, be used on the hosting server.

Installation

Cloning

Clone this repository into a user environment on the login node. It is preferable to install the directory under a regular user account rather than root, so that it is accessible from all nodes; the root account is usually restricted to the head node.

git clone https://github.com/serdar-acir/HPC_Monitor.git
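Because the point of cloning into a user account is that the directory is reachable from every node, you may want to verify this after cloning. The minimal sketch below assumes passwordless SSH from the login node to the compute nodes (common on HPC clusters, but an assumption here); the function name check_shared_dir and the node names in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a directory is visible from each listed node, i.e. that it
# lives on a filesystem shared across the cluster. Assumes passwordless SSH;
# REMOTE_CMD can override the remote runner (defaults to non-interactive ssh).
# An unreachable node is also reported as "missing".
check_shared_dir() {
  dir=$1
  shift
  rc=0
  for node in "$@"; do
    # Run "test -d DIR" on the node; success means the directory exists there.
    if ${REMOTE_CMD:-ssh -n -o BatchMode=yes} "$node" test -d "$dir"; then
      echo "$node: ok"
    else
      echo "$node: missing"
      rc=1
    fi
  done
  return "$rc"
}

# Example with hypothetical node names, run from the login node:
# check_shared_dir "$HOME/HPC_Monitor" cn01 cn02 cn03
```

The function returns a non-zero status if any node fails, so it can also gate a setup script.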

Setting up the hosting server

Set up a separate MariaDB-based web server on a hosting platform. I am not going to go into the details of this step, as opening a hosting account is beyond the scope of this document. MySQL- or MariaDB-based hosting is required.

Upload the hosting_src directory to the hosting server and configure its HPC.config file for your HPC clusters. The GUI can manage multiple HPC clusters. Here is a sample HPC.config for two clusters called HPC1 and HPC2:


<?php
//General configuration
date_default_timezone_set('Europe/Istanbul');
$clusters = ["HPC1", "HPC2"];
$descs = [
"Description of HPC1", //description of HPC1 (placeholder)
"16 compute nodes, 240 CPU Cores, 4.3 TB Memory, 64 TB Storage, 12 GPUs, 29952 GPU Cores" //description of HPC2
];
//Database configuration
$database="database_name";
$host="localhost";
$user="username";
$password="password";
?>

On the hosting server, set up your MariaDB database, then open the hosting_src/setup.php page in a browser. Enter the required data and complete the installation.
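Many hosting providers create databases through a control panel. If yours gives you shell access and a privileged MySQL/MariaDB account instead, the sketch below prints SQL that creates a database and user matching the $database, $user, $password and $host="localhost" values in HPC.config. The helper make_db_sql is hypothetical, and the names in the example are the placeholders from the sample config. `CREATE USER IF NOT EXISTS` is supported by MySQL 8 and current MariaDB releases.

```shell
#!/bin/sh
# Sketch: print SQL that creates the database and user referenced in HPC.config.
# Arguments: database name, user name, password (all placeholders here).
# Passwords containing single quotes would need escaping; this sketch does not do it.
make_db_sql() {
  db=$1 user=$2 pass=$3
  cat <<EOF
CREATE DATABASE IF NOT EXISTS \`$db\`;
CREATE USER IF NOT EXISTS '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON \`$db\`.* TO '$user'@'localhost';
EOF
}

# Example, on the hosting server (prompts for the MySQL/MariaDB root password):
# make_db_sql database_name username password | mysql -u root -p
```

Generating the SQL as text first lets you review it before piping it into the mysql client.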

Setting up the login node

On the login node, go to the login_node_src folder as root and configure the HPC1.config file for your specific HPC environment, as described in its README file. You can rename HPC1.config after your cluster, for example my_cluster.config; “my_cluster” will then be displayed as the cluster name in the monitoring tool.

Here is a sample HPC1.config.


<?php
$node_array = array ("login","cn01","cn02","cn03","cn04","cn05","cn06","cn07","cn08","cn09","cn10","cn11","cn12","cn13","cn14","cn15","cn16");
$home_ip = "1.2.3.4"; //ip address of the storage unit (preferably the fast interconnect ip address such as infiniband, RoCE etc.)
// if there are multiple storage units choose one.
$recording_host = "http://xxx.xxx.xxx/"; //the url of the web interface
//the web interface both collects the data via http/s port and serves as the performance GUI
?>

Modify node_array to list the nodes of your own HPC setup. home_ip is the IP address of the storage unit, usually reached through a high-speed interconnect such as InfiniBand or RoCE. Make sure you do not enter the IP address of the management Ethernet network here.

recording_host is the URL (IP address or subdomain) of the hosting server where you will access the GUI web interface.
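To double-check that home_ip is reached over the fast interconnect rather than the management Ethernet, you can ask the kernel which interface it would use with `ip route get`. The minimal sketch below (the helper name route_dev is hypothetical) extracts the interface name from that output. Compare the result with the interface you know carries the fast interconnect; interface names vary between systems, and RoCE interfaces use Ethernet-style names, so treat this as a hint rather than proof.

```shell
#!/bin/sh
# Sketch: print the network interface the kernel would use to reach an address.
# Reads the output of `ip route get ADDRESS` on stdin and prints the word after "dev".
route_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example on the login node, using the sample home_ip from HPC1.config:
# ip route get 1.2.3.4 | route_dev
```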

Collecting HPC infrastructure data

As root, run the hardware data collection script from the directory that contains login_node_src:

./login_node_src/collect_data/data_collect.sh

Crontab

Performance data must be collected from your HPC cluster and sent to the hosting server periodically. On the login node, add a crontab entry such as:

*/5 * * * * cd ~/HPC_Monitor/root_version && /usr/bin/php ~/HPC_Monitor/root_version/sap_cron2.php

This runs the benchmarks every 5 minutes.
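If you prefer to install the entry from a script instead of editing it with `crontab -e`, the sketch below adds it while removing any older line that mentions sap_cron2.php, so re-running it does not create duplicates. The helper merge_cron_entry is hypothetical; it only transforms text, and the actual installation is done by piping its output into `crontab -`.

```shell
#!/bin/sh
# Sketch: add or replace the monitoring cron entry without creating duplicates.
# Reads the current crontab on stdin and writes the updated crontab to stdout.
# Arguments: the full cron line, and a marker string identifying old copies of it.
merge_cron_entry() {
  entry=$1
  marker=$2
  # Keep every line that does not contain the marker. grep exits with status 1
  # when no lines are selected (e.g. an empty crontab), which is not an error here.
  grep -vF -- "$marker" || true
  printf '%s\n' "$entry"
}

# Example, run as the user whose crontab should hold the entry:
# ENTRY='*/5 * * * * cd ~/HPC_Monitor/root_version && /usr/bin/php ~/HPC_Monitor/root_version/sap_cron2.php'
# crontab -l 2>/dev/null | merge_cron_entry "$ENTRY" sap_cron2.php | crontab -
```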

Now you can access the performance monitoring GUI through your web browser to view the collected data.

 

1 comment


  • tekzsbin
    Is there a fully automated version of this?
    - 30 September 2024 16:30
