Center for Quantitative Life Sciences, Oregon State University
2025-10-31
IO500 combines several underlying benchmark tools (ior, mdtest, pfind). Our workflow covers a complete cycle from setup to analysis.
io500_wrapper.sh is the main entry point for running a benchmark.
#!/usr/bin/env bash
# IO500 Benchmark Wrapper Script
# Usage:
# ./io500_wrapper.sh <cluster> <node> <storage> <device> <test_dir> [volume]
# Example:
# ./io500_wrapper.sh cluster1 node01 nfs hdd /path/to/nfs/dir

The wrapper:

- Selects a storage-specific config file (config-nfs.ini, config-ssd.ini)
- Uses envsubst to inject variables ($TEST_DIR, $OUTPUT_DIR) into the config
- Runs io500 (with mpirun if multiple processors are detected)
- Generates results.json from the raw result_summary.txt

The wrapper also has a --reprocess mode.
# io500_wrapper.sh (continued)
# OR for reprocessing existing results:
# Usage:
# ./io500_wrapper.sh --reprocess <results_directory>
# Example:
# ./io500_wrapper.sh --reprocess ./results/cluster1/node01/local/ssd/2025.10.23...

This regenerates results.json without re-running the hours-long benchmark. The parsing is done by the parse_results function, which embeds a Python script.

Our wrapper automates config file selection for different storage types.
We use different .ini files to tune the workload for the target storage:
- config-ssd.ini: n = 50000000
- config-nfs.ini: n = 1000000
- config.ini (default): n = 10000000

The n parameter in the [mdtest-*] sections defines the number of files to create.
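An illustrative excerpt of the settings involved (section and key names follow the io500 .ini layout; the exact contents of our config files may differ):

```ini
[global]
# write/create phases must sustain at least this long or be marked [INVALID]
stonewall-time = 300

[mdtest-easy]
# files to create per process; tuned per storage target
n = 1000000
```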
Faster storage needs a larger n to be properly stressed; slower storage needs a smaller n. The io500_wrapper.sh script automatically chooses:
- config-ssd.ini for local ssd
- config-nfs.ini for nfs hdd
- config.ini for all other combinations

IO500 has a mechanism to ensure it’s measuring sustained performance.
The stonewall-time = 300 setting in the config file mandates that key write/create phases must run for at least 300 seconds. If a phase (e.g., ior-easy-write) completes in less than 300 seconds, the benchmark correctly marks the result as [INVALID].

Performance numbers are useless without context.
gather_node_metadata.sh captures the hardware and software configuration of the node under test.
It saves this information to a node_metadata.json file in a structured path:
./io500_results/<cluster>/<node>/node_metadata.json
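A minimal sketch of this kind of collection, in Python rather than the script's actual bash (field names follow node_metadata.json; these values come from the standard library, whereas the real script also queries CPU, memory, and network details):

```python
import json
import os
import platform

# Gather a few of the node_metadata.json fields from the standard library.
meta = {
    "hostname": platform.node(),
    "cpu_count": os.cpu_count(),
    "kernel": platform.release(),
    "os": platform.platform(),
}
print(json.dumps(meta, indent=2))
```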
Individual JSON files are good, but a single table is better for analysis.
aggregate_results.py combines, for each test:

- the run parameters (run_metadata.json)
- the node hardware details (node_metadata.json)
- the benchmark metrics (results.json)

into a single aggregated_results.csv. The final CSV contains one row per benchmark run, with columns for hardware specs, run parameters, and every metric from the IO500 test.
A practical guide to running a new benchmark.
Note on Environment
I use pixi to help manage dependencies and tasks.
Project code and results are in gitlab.
If you need a CQLS account: https://access.cqls.oregonstate.edu/
I will add your account to the repo.
To start, a user just needs to run the wrapper via pixi (first downloading a pixi release, if necessary).
The io500_wrapper.sh script handles metadata collection automatically.
Automatic Metadata Gathering
You do not need to run gather_node_metadata.sh manually before every test.
When io500_wrapper.sh starts, it checks if node_metadata.json exists for the target node. If the file is missing, the wrapper will automatically call gather_node_metadata.sh to create it.
This ensures we never have a benchmark run without the corresponding hardware details.

An example node_metadata.json:
{
"cluster_name": "wildwood",
"node_name": "chrom1",
"hostname": "chrom1.hpc.oregonstate.edu",
"cpu_model": "AMD Opteron(tm) Processor 6376",
"cpu_count": 64,
"cpu_cores": 8,
"memory_gb": 995,
"network_interface": "enp9s0f0",
"network_speed": "10Gbps",
"kernel": "5.14.0-427.18.1.el9_4.x86_64",
"os": "Rocky Linux 9.4 (Blue Onyx)",
"metadata_timestamp": "2025-10-28T20:58:30Z"
}

Execute the wrapper script with the correct parameters for the test. The script is run via pixi run within our controlled environment.
Example Commands
The arguments specify the cluster, node, storage backend, and test path. An optional final argument can be used to name the specific storage volume.
Local HDD on chrom1:
NFS-backed HDD on chrom1:
Local SSD on olympus:
These commands are typically submitted as a parallel batch job, as shown next.
The io500_wrapper.sh script automatically detects the number of allocated CPUs from the Slurm environment variable $SLURM_CPUS_ON_NODE. If more than one CPU is detected, MPI (mpirun -np $NUM_PROCS) will be used for the benchmark. The count can be overridden with NUM_PROCS=XX pixi run ...

We use a wrapper script hqsub to submit our benchmark jobs to Slurm, specifying the number of processors with the -p flag. We typically use 16 CPUs for these tests.
# Example of submitting a 16-core job to test
# a local SSD on the 'olympus' node.
hqsub 'pixi run ./io500_wrapper.sh wildwood olympus local ssd /scratch/davised/disk_test md126' \
-p 16 \
-r job.olympus_ssd_io500 \
-w olympus \
-q sharpton

The script then uses this count to launch with mpirun.
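The selection logic amounts to something like the following (a Python sketch of the bash wrapper's behavior; the config filename is a placeholder):

```python
import os

def build_launch_command(env):
    """Mimic the wrapper's choice: NUM_PROCS overrides Slurm's CPU count."""
    num_procs = int(env.get("NUM_PROCS", env.get("SLURM_CPUS_ON_NODE", "1")))
    cmd = ["./io500", "config-used.ini"]  # placeholder config name
    if num_procs > 1:
        # multiple CPUs detected: launch under MPI
        cmd = ["mpirun", "-np", str(num_procs)] + cmd
    return cmd

print(" ".join(build_launch_command({"SLURM_CPUS_ON_NODE": "16"})))
# mpirun -np 16 ./io500 config-used.ini
```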
Each run produces a run_metadata.json and a results.json that we later aggregate into a CSV file.

An example results.json:

{
"ior-easy-write": {
"value": 0.186573,
"unit": "GiB/s",
"time": 357.073,
"valid": true
},
"mdtest-easy-write": {
"value": 21.306493,
"unit": "kIOPS",
"time": 301.924,
"valid": true
},
"ior-hard-write": {
"value": 0.177066,
"unit": "GiB/s",
"time": 356.216,
"valid": true
},
"mdtest-hard-write": {
"value": 14.479136,
"unit": "kIOPS",
"time": 313.218,
"valid": true
},
"find": {
"value": 603.6442,
"unit": "kIOPS",
"time": 18.092,
"valid": true
},
"ior-easy-read": {
"value": 0.18754,
"unit": "GiB/s",
"time": 354.587,
"valid": true
},
"mdtest-easy-stat": {
"value": 65.507102,
"unit": "kIOPS",
"time": 98.751,
"valid": true
},
"ior-hard-read": {
"value": 0.18154,
"unit": "GiB/s",
"time": 347.031,
"valid": true
},
"mdtest-hard-stat": {
"value": 141.592495,
"unit": "kIOPS",
"time": 32.922,
"valid": true
},
"mdtest-easy-delete": {
"value": 17.76809,
"unit": "kIOPS",
"time": 366.711,
"valid": true
},
"mdtest-hard-read": {
"value": 15.494298,
"unit": "kIOPS",
"time": 292.032,
"valid": true
},
"mdtest-hard-delete": {
"value": 23.672366,
"unit": "kIOPS",
"time": 195.132,
"valid": true
},
"score": {
"bandwidth": 0.183131,
"iops": 42.79838,
"total": 2.799595,
"valid": true
}
}

After one or more benchmark runs are complete, update the master CSV file.
# This can be run from anywhere with access to the results directory
./aggregate_results.py ./io500_results aggregated_results.csv
# or
pixi run aggregate
➜ pixi run aggregate
✨ Pixi task (aggregate in report): aggregate_results.py io500_results
============================================================
IO500 Results Aggregation
============================================================
Scanning base directory: io500_results
Output file: aggregated_results.csv
============================================================
Recursively scanning io500_results for result directories...
Found 55 result directories
✓ Loaded: wildwood/ayaya01/nfs/nfs7/hdd
✓ Loaded: wildwood/ayaya01/local/ssd
✓ Loaded: wildwood/aspen12/local/ssd
✗ Skipping wildwood/aspen12/nfs/hdd/2025.10.30-14.51.24: missing results.json
✓ Loaded: wildwood/build1/local/hdd
✓ Loaded: wildwood/build1/nfs/fs0/ssd
✓ Loaded: wildwood/build1/nfs/nfs6/hdd
✓ Loaded: wildwood/build1/nfs/nfs4/hdd
✓ Loaded: wildwood/microbiome/nfs/nfs6/hdd
✓ Loaded: wildwood/microbiome/nfs/nfs4/hdd
✓ Loaded: wildwood/microbiome/nfs/fs0/ssd
✓ Loaded: wildwood/microbiome/local/hdd
✓ Loaded: wildwood/ayaya02/nfs/nfs6/hdd
✓ Loaded: wildwood/ayaya02/nfs/nfs7/hdd
✓ Loaded: wildwood/ayaya02/nfs/nfs4/hdd
✓ Loaded: wildwood/ayaya02/local/ssd
✓ Loaded: wildwood/jackson/local/ssd
✓ Loaded: wildwood/jackson/nfs/fs0/ssd
✓ Loaded: wildwood/jackson/nfs/fs0/ssd
✓ Loaded: wildwood/jackson/nfs/nfs4/hdd
✓ Loaded: wildwood/jackson/nfs/nfs7/hdd
✓ Loaded: wildwood/jackson/nfs/nfs6/hdd
✓ Loaded: wildwood/darwin/local/hdd
✓ Loaded: wildwood/darwin/nfs/nfs4/hdd
✓ Loaded: wildwood/darwin/nfs/nfs6/hdd
✓ Loaded: wildwood/cascade/local/ssd
✗ Skipping wildwood/cascade/nfs/hdd/2025.10.30-12.40.14: missing results.json
✓ Loaded: wildwood/chrom1/local/hdd
✓ Loaded: wildwood/chrom1/nfs/fs0/ssd
✓ Loaded: wildwood/chrom1/nfs/nfs7/hdd
✓ Loaded: wildwood/chrom1/nfs/nfs4/hdd
✓ Loaded: wildwood/chrom1/nfs/nfs6/hdd
✓ Loaded: wildwood/cqls-gpu4/nfs/nfs6/hdd
✓ Loaded: wildwood/cqls-gpu4/nfs/nfs4/hdd
✓ Loaded: wildwood/cqls-gpu4/local/ssd
✓ Loaded: wildwood/bact0/local/hdd
✓ Loaded: wildwood/bact0/nfs/fs0/ssd
✓ Loaded: wildwood/bact0/nfs/nfs6/hdd
✓ Loaded: wildwood/bact0/nfs/nfs7/hdd
✓ Loaded: wildwood/bact0/nfs/nfs4/hdd
✓ Loaded: wildwood/olympus/local/nvme0n1/ssd
✓ Loaded: wildwood/olympus/local/md126/ssd
✓ Loaded: wildwood/olympus/nfs/nfs7/hdd
✓ Loaded: wildwood/olympus/nfs/nfs4/hdd
✓ Loaded: wildwood/olympus/nfs/nfs6/hdd
✓ Loaded: wildwood/olympus/nfs/fs0/ssd
✓ Loaded: wildwood/samwise/local/hdd
✓ Loaded: wildwood/samwise/nfs/fs0/ssd
✓ Loaded: wildwood/samwise/nfs/nfs4/hdd
✓ Loaded: wildwood/samwise/nfs/nfs6/hdd
✓ Loaded: wildwood/neo/local/hdd
✓ Loaded: wildwood/neo/nfs/fs0/ssd
✓ Loaded: wildwood/neo/nfs/nfs7/hdd
✓ Loaded: wildwood/neo/nfs/nfs6/hdd
✓ Loaded: wildwood/neo/nfs/nfs4/hdd
⚠ Warning: 2 directories are missing results.json
You can regenerate these by running:
./io500_wrapper.sh --reprocess <directory>
============================================================
Successfully loaded 53 benchmark results
============================================================
Found 12 unique test types
============================================================
✓ Aggregated results written to: aggregated_results.csv
✓ Total benchmark runs: 53
✓ Total test types: 12
============================================================

The script will scan for any new results and add them to the CSV.
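The flattening step can be pictured like this (a sketch on a toy dict shaped like results.json; the actual column-naming scheme in aggregate_results.py is an assumption here):

```python
import csv
import io

# A toy results dict shaped like results.json (two of the metrics above).
results = {
    "ior-easy-write": {"value": 0.186573, "unit": "GiB/s", "time": 357.073, "valid": True},
    "score": {"bandwidth": 0.183131, "iops": 42.79838, "total": 2.799595, "valid": True},
}

# Flatten nested metrics into one row with columns like "ior-easy-write_value".
row = {f"{test}_{key}": val
       for test, metrics in results.items()
       for key, val in metrics.items()}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(row))
writer.writeheader()
writer.writerow(row)
print(buf.getvalue())
```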
➜ pixi run render
✨ Pixi task (render): quarto render io500_analysis.qmd
Starting python3 kernel...Done
Executing 'io500_analysis.quarto_ipynb'
Cell 1/24: 'setup'........................Done
Cell 2/24: 'summary-stats'................Done
Cell 3/24: 'score-table'..................Done
Cell 4/24: 'score-plot'...................Done
Cell 5/24: 'bandwidth-prep'...............Done
Cell 6/24: 'bandwidth-by-storage'.........Done
Cell 7/24: 'bandwidth-comparison-table'...Done
Cell 8/24: 'bandwidth-by-difficulty'......Done
Cell 9/24: 'iops-prep'....................Done
Cell 10/24: 'iops-by-operation'............Done
Cell 11/24: 'iops-summary-table'...........Done
Cell 12/24: 'network-speed-impact'.........Done
Cell 13/24: 'nfs-volume-comparison'........Done
Cell 14/24: 'storage-comparison-plot'......Done
Cell 15/24: 'cluster-heatmap'..............Done
Cell 16/24: 'cluster-table'................Done
Cell 17/24: 'device-comparison'............Done
Cell 18/24: 'top-performers'...............Done
Cell 19/24: 'time-analysis-prep'...........Done
Cell 20/24: 'time-by-test'.................Done
Cell 21/24: 'recommendations'..............Done
Cell 22/24: 'key-statistics'...............Done
Cell 23/24: 'raw-data'.....................Done
Cell 24/24: 'system-info'..................Done
pandoc
to: html
output-file: io500_analysis.html
standalone: true
embed-resources: true
section-divs: true
html-math-method: mathjax
wrap: none
default-image-extension: png
toc: true
toc-depth: 3
variables: {}
metadata
document-css: false
link-citations: true
date-format: long
lang: en
title: IO500 Benchmark Analysis
subtitle: HPC Cluster Storage Performance Comparison
author: Ed Davis
date: today
theme: cosmo
Output created: io500_analysis.html
pixi run render  40.00s user 3.96s system 31% cpu 2:20.92 total

We have helper scripts to manage the data.
reprocess_all_results.sh is useful when the parsing logic in io500_wrapper.sh is improved. It finds every result_summary.txt in the entire results directory and re-runs the parsing step to regenerate all results.json files.

Our analysis scripts separate local from networked storage. For example, to create a dataset containing only networked filesystems, we filter by storage_type:
# Select all storage types that are not 'local'
# This includes 'nfs', 'powerscale', 'isilon', etc.
networked = results[results['storage_type'] != 'local'].copy()
# Ensure we have the necessary data for plotting
networked = networked.dropna(
    subset=['score_bandwidth', 'score_iops', 'network_speed']
)

This allows us to perform specific analyses, such as correlating network speed with I/O bandwidth, which would be meaningless for local drives.
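The correlation itself is then one line once the frame is filtered (a sketch on toy data; `network_speed_gbps` stands in for a numeric version of the network_speed column):

```python
import pandas as pd

# Toy stand-in for the aggregated CSV; the real data has many more columns.
results = pd.DataFrame({
    "storage_type": ["local", "nfs", "nfs", "nfs"],
    "network_speed_gbps": [None, 1.0, 10.0, 10.0],
    "score_bandwidth": [0.50, 0.05, 0.18, 0.20],
})

# Drop local drives and rows missing the fields we need.
networked = results[results["storage_type"] != "local"].dropna(
    subset=["score_bandwidth", "network_speed_gbps"]
)
print(networked["network_speed_gbps"].corr(networked["score_bandwidth"]))
```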
We use a Quarto document (.qmd) as the foundation for the report, which allows us to combine code, text, and visualizations in one place. We use pandas for data wrangling and plotly for interactive visualizations. The report is structured to answer critical performance questions by breaking down the data.
The final output is a self-contained, interactive HTML file that allows for deep exploration of the results.
This report moves beyond static images and provides a dynamic way to understand our storage infrastructure. It allows us to answer specific questions like:
How do different systems handle metadata-heavy operations (e.g., stat)?

Live Report
The live, interactive HTML report generated from the Quarto document is available for review. Here you can hover over data points, filter results, and explore the detailed tables.
Sometimes jobs fail:
➜ cat job.aspen12_nfs7_io500/job.aspen12_nfs7_io500.e1857286
##hpcman.jobs={'runid':'1857286','runname':'job.aspen12_nfs7_io500','host':'aspen12','wd':'/nfs4/core/home/davised/projects/hpc-disk-bench','taskid':''}
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
Proc: [[13476,1],0]
Errorcode: -1
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node aspen12 calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
Command exited with non-zero status 255
Memory (kb): 131292
# SWAP (freq): 0
# Waits (freq): 9914621
CPU (percent): 19%
Time (seconds): 2769.47
Time (hh:mm:ss.ms): 46:09.47
System CPU Time (seconds): 508.76
User CPU Time (seconds): 22.38

Notes:

- Set NUM_PROCS manually if you aren’t using Slurm.
- To distinguish volumes of the same type (nfs4 vs nfs6 are both zfs), provide a name after the test directory.
RPO HPC Unification - October 2025