[Review]: Proposal of parameters for dynamic and detailed cooling optimization #15

@YaSuenag

Comment Submission

The WDPC specification (data bus / protobuf-centric) is well structured for data integration across workload ⇄ power ⇄ cooling. However, several of the spatial, implementation, and operational perspectives required for optimal cooling are not explicitly defined as schemas or requirements.
In particular, if key information such as rack location and equipment mounting position is missing, the accuracy of operational optimization will degrade.
The viewpoints below should be covered explicitly; they are listed in order of importance.

1) Physical layout / installation & implementation information

(“Where it is placed, how it is installed, and how air flows”)

  • Rack 3D coordinates and orientation:
    • Planar position (x, y), hot/cold aisle identification, front-facing orientation, ceiling height, and obstacles such as beams or cable trays.
      • Directly affects the accuracy of cold-air reachability estimation and bypass/recirculation modeling.
  • In-rack mounting position (U position) and airflow direction:
    • Per-device front intake / rear exhaust, blanking panel coverage ratio, relocation history.
      • Essential for predicting vertical thermal stratification and hotspot formation within racks.
  • Underfloor / above-floor pathway conditions:
    • Perforated tile open-area ratio, underfloor static pressure, presence of cable cutout sealing (grommets), bypass ratio.
  • Containment configuration:
    • Presence of cold- or hot-aisle containment, door and ceiling panel closure status, opening ratio, leakage rate.
  • Liquid cooling elements (where applicable):
    • CDU installation locations, manifold piping topology, quick-connect locations, valve settings, pressure loss, and leak detection zones.
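
To make the in-rack mounting item concrete, the sketch below shows how U positions could be checked for conflicts and how a blanking panel coverage ratio could be derived. It is only an illustration: the names `Mount`, `find_overlaps`, and `blanking_coverage_pct` are hypothetical, U numbering is assumed to be 1-based from the bottom, and coverage is assumed to mean the share of unoccupied U that is blanked. WDPC may define these differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """Hypothetical in-rack mounting record (not part of WDPC)."""
    device_id: str
    u_start: int   # lowest occupied U, assumed 1-based from the bottom
    u_height: int  # number of U occupied


def occupied_units(mounts):
    """Map each occupied U position to the device IDs that claim it."""
    claimed = {}
    for m in mounts:
        for u in range(m.u_start, m.u_start + m.u_height):
            claimed.setdefault(u, []).append(m.device_id)
    return claimed


def find_overlaps(mounts):
    """Return only the U positions claimed by more than one device."""
    return {u: ids for u, ids in occupied_units(mounts).items() if len(ids) > 1}


def blanking_coverage_pct(mounts, rack_units, blanked_units):
    """Percentage of unoccupied U covered by blanking panels (assumed definition)."""
    open_units = set(range(1, rack_units + 1)) - set(occupied_units(mounts))
    if not open_units:
        return 100.0  # nothing left open, so nothing needs blanking
    return 100.0 * len(open_units & set(blanked_units)) / len(open_units)


if __name__ == "__main__":
    mounts = [Mount("srv-a", 1, 2), Mount("srv-b", 2, 1), Mount("srv-c", 10, 4)]
    print(find_overlaps(mounts))
    print(blanking_coverage_pct(mounts, rack_units=16, blanked_units=[3, 4, 5, 14, 15]))
```

A check like this would let a consumer reject inconsistent mounting data before it feeds a stratification or hotspot model.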

2) Sensing granularity

(“What is measured, and at what resolution”)

  • Multi-point rack inlet/outlet sensing:
    • Front and rear temperatures (3–6 points: top/middle/bottom), ΔT (outlet minus inlet), ΔP (front–rear differential pressure), rack CFM (estimated or measured).
  • Underfloor and zone pressure sensing:
    • Underfloor static pressure, zone differential pressure maps, CFM per perforated tile.
  • CRAC / CRAH / Fan wall metrics:
    • Supply air temperature, airflow rate (or VFD speed as a proxy), coil ΔT, variable damper position.
  • Server-internal metrics:
    • CPU/GPU junction temperature, DIMM/NIC temperature, fan tachometer values, power consumption, and throttle events (via Redfish/API).
  • Liquid cooling metrics:
    • Loop flow rate, supply/return temperatures, inlet pressure, conductivity and pH, corrosion/particle indicators, leak detection.
  • Power quality:
    • Total harmonic distortion (THD) at branch circuits / rack PDUs, power factor, voltage fluctuation (to understand contributions to heat generation and equipment behavior).
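
As an illustration of why airflow and ΔT are worth measuring together, the sketch below estimates the heat carried away by a rack's airflow and by a liquid loop using the sensible-heat relation Q = ρ · c_p · V̇ · ΔT. The function names are hypothetical, and the fluid properties are assumptions: air at roughly 1.2 kg/m³ and 1005 J/(kg·K), and plain water at roughly 997 kg/m³ and 4186 J/(kg·K). A glycol mixture or a different site altitude would need other values.

```python
# Assumed fluid properties; adjust for site altitude or the actual coolant mix.
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0
WATER_DENSITY_KG_M3 = 997.0
WATER_CP_J_PER_KG_K = 4186.0

CFM_TO_M3_PER_S = 0.000471947      # 1 cubic foot per minute in m^3/s
LPM_TO_M3_PER_S = 1.0 / 60000.0    # 1 litre per minute in m^3/s


def rack_heat_removed_w(airflow_cfm, inlet_c, outlet_c):
    """Heat carried away by rack airflow, in watts (air-side sensible heat)."""
    volume_flow = airflow_cfm * CFM_TO_M3_PER_S
    return AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * volume_flow * (outlet_c - inlet_c)


def loop_heat_removed_w(flow_rate_lpm, supply_c, return_c):
    """Heat carried away by a liquid loop, in watts (plain water assumed)."""
    volume_flow = flow_rate_lpm * LPM_TO_M3_PER_S
    return WATER_DENSITY_KG_M3 * WATER_CP_J_PER_KG_K * volume_flow * (return_c - supply_c)


def inlet_stratification_c(inlet_top_c, inlet_bot_c):
    """Top-minus-bottom inlet temperature difference, a simple stratification indicator."""
    return inlet_top_c - inlet_bot_c


if __name__ == "__main__":
    print(round(rack_heat_removed_w(1000.0, 22.0, 34.0)))   # about 6.8 kW
    print(round(loop_heat_removed_w(10.0, 30.0, 40.0)))     # about 7.0 kW
    print(inlet_stratification_c(27.5, 22.0))
```

Comparing an estimate like this with measured rack power is one way to spot bypass or recirculation, which is why multi-point temperatures and CFM are listed together above.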

3) Actuation

(“What can be controlled”)

  • Cooling side:
    • CRAC/CRAH airflow, supply temperature, coil water flow, VFDs, dampers, smart tiles (variable open area).
  • Liquid cooling side:
    • CDU ΔT targets, pump speed, valve control, leak fail-safe behavior.
  • IT side:
    • Workload placement and migration (K8s / Slurm), power capping / DVFS, temporary job throttling.

Minimal schema extensions proposed for WDPC

While respecting the existing Facility / Rack / Node / Component hierarchy, the following represent the minimum required additions for spatial, implementation, and actuation aspects.

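syntax = "proto3";

import "google/protobuf/timestamp.proto";  // Provides google.protobuf.Timestamp used below

// Physical layout (rack)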
message RackPhysicalLayout {
 string rack_id = 1;                  // Reference existing ID
 string room_id = 2;
 double pos_x_m = 3;                  // Planar coordinates
 double pos_y_m = 4;
 string aisle_id = 5;                 // Aisle identifier
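 // Enum values share the enclosing message scope, so each is prefixed to avoid a duplicate NONE.
 enum AisleType { AISLE_COLD = 0; AISLE_HOT = 1; AISLE_NONE = 2; }
 AisleType aisle_type = 6;
 enum Containment { CONTAINMENT_NONE = 0; CONTAINMENT_COLD_ENCLOSED = 1; CONTAINMENT_HOT_ENCLOSED = 2; }
 Containment containment = 7;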
 double orientation_deg = 8;          // Front-facing orientation
 double perforated_tile_open_area_pct = 9;
 double underfloor_static_pressure_pa = 10;
}
 
// In-rack mounting and airflow
message MountingPosition {
 string device_id = 1;
 string rack_id = 2;
 int32 u_start = 3;                   // Lowest occupied U position
 int32 u_height = 4;                  // Number of U occupied
 bool front_intake = 5;               // true = front intake, rear exhaust
 double blanking_panel_coverage_pct = 6;
}
 
message RackAirflowMetrics {
 string rack_id = 1;
 google.protobuf.Timestamp ts = 2;
 double inlet_top_c = 3;
 double inlet_mid_c = 4;
 double inlet_bot_c = 5;
 double outlet_mid_c = 6;
 double delta_p_pa = 7;               // Front–rear differential pressure
 double airflow_cfm = 8;              // Measured, or estimated where no measurement exists
 double recirculation_index = 9;      // Estimated metric
}
 
// Liquid cooling loop
message LiquidCoolingLoop {
 string loop_id = 1;
 string cdu_id = 2;
 repeated string rack_ids = 3;
 double supply_temp_c = 4;
 double return_temp_c = 5;
 double flow_rate_lpm = 6;
 double inlet_pressure_kpa = 7;
 double outlet_pressure_kpa = 8;
 bool leak_detected = 9;
}
 
// Declaration of actuation capabilities
message CoolingActuationCapability {
 string actuator_id = 1;              // CRAC / CRAH / Tile / CDU / Valve / FanWall
 enum ActuatorType { CRAC = 0; CRAH = 1; TILE = 2; CDU = 3; DAMPER = 4; FAN = 5; }
 ActuatorType type = 2;
 repeated string control_modes = 3;   // ["supply_temp","airflow","valve_pos","pump_rpm",...]
 map<string, string> setpoint_limits = 4; // {"supply_temp_min":"18","supply_temp_max":"27",...}
}
 
// IT-side thermal control interface (declaration)
message ITThermalControl {
 string node_id = 1;
 bool supports_power_cap = 2;
 bool supports_job_migration = 3;
 bool supports_dvfs = 4;
 map<string, string> api_endpoints = 5; // e.g., Redfish / K8s / Slurm plugin
}
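
Because `setpoint_limits` carries its bounds as strings, a consumer has to parse and apply them before sending a command. The sketch below shows one way this could look. The helper names are hypothetical, and the `<mode>_min` / `<mode>_max` key convention is an assumption inferred from the example comment in `CoolingActuationCapability`; it is not confirmed by the specification.

```python
from typing import Optional, Tuple


def parse_limits(setpoint_limits, mode) -> Tuple[Optional[float], Optional[float]]:
    """Read the (min, max) bounds for a control mode from a string-to-string map.

    Assumes keys named "<mode>_min" and "<mode>_max"; a missing key means
    that side is unbounded.
    """
    low = setpoint_limits.get(f"{mode}_min")
    high = setpoint_limits.get(f"{mode}_max")
    return (
        float(low) if low is not None else None,
        float(high) if high is not None else None,
    )


def clamp_setpoint(requested, setpoint_limits, mode):
    """Clamp a requested setpoint into the declared limits for the given mode."""
    low, high = parse_limits(setpoint_limits, mode)
    if low is not None and high is not None and low > high:
        raise ValueError(f"inconsistent limits for {mode}: min {low} > max {high}")
    if low is not None:
        requested = max(requested, low)
    if high is not None:
        requested = min(requested, high)
    return requested


if __name__ == "__main__":
    limits = {"supply_temp_min": "18", "supply_temp_max": "27"}
    print(clamp_setpoint(16.5, limits, "supply_temp"))  # below the minimum, raised to it
    print(clamp_setpoint(22.0, limits, "supply_temp"))  # already within range
    print(clamp_setpoint(30.0, limits, "supply_temp"))  # above the maximum, lowered to it
```

If the limits are always numeric, typed fields (for example a message with `double min` and `double max` per mode) might avoid this string parsing; that is a design option for discussion rather than part of the proposal above.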

Company Submission

NTT DATA

Labels

question: Further information is requested
