Commit d623959

[integ-tests-3.14.0 branch only] Create a script to enable DCV GL offline installation
## Problem

ParallelCluster clusters should be able to be created in a network without Internet access. However, cluster creation fails when all of the following are true:

1. The OS is RHEL or Rocky
2. The head node and/or login nodes use x86 GPU instances
3. DCV is enabled

The failure can be seen in the chef-client log:

```
================================================================================
Error executing action `install` on resource 'dnf_package[/opt/parallelcluster/sources/nice-dcv-2024.0-19030-el9-x86_64/nice-dcv-gl-2024.0.1096-1.el9.x86_64.rpm]'
================================================================================
RuntimeError
------------
dnf-helper.py had stderr/stdout output:
Errors during downloading metadata for repository 'epel':
  - Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/mirrorlist?repo=epel-9&arch=x86_64 [Failed to connect to mirrors.fedoraproject.org port 443: Connection timed out]
Error: Failed to download metadata for repo 'epel': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/mirrorlist?repo=epel-9&arch=x86_64 [Failed to connect to mirrors.fedoraproject.org port 443: Connection timed out]
Errors during downloading metadata for repository 'rhel-9-appstream-rhui-rpms':
  - Curl error (28): Timeout was reached for https://rhui.us-east-1.aws.ce.redhat.com/pulp/mirror/content/dist/rhel9/rhui/9/x86_64/appstream/os [Failed to connect to rhui.us-east-1.aws.ce.redhat.com port 443: Connection timed out]
Error: Failed to download metadata for repo 'rhel-9-appstream-rhui-rpms': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://rhui.us-east-1.aws.ce.redhat.com/pulp/mirror/content/dist/rhel9/rhui/9/x86_64/appstream/os [Failed to connect to rhui.us-east-1.aws.ce.redhat.com port 443: Connection timed out]
```

## Workaround

This commit creates a script that downloads any missing transitive dependencies of DCV GL.
The script also patches the cookbook on the instance so that it installs those dependencies and passes `--disablerepo=*`, which prevents yum/dnf from contacting the Internet for repository metadata.

### How to use the script:

1. Launch an instance with the official ParallelCluster RHEL/Rocky AMI
2. On the instance, run the script as root (e.g. `./fix_dcv_gl_offline_installation.gl`)
3. Create an image from the instance
4. Use the created image as the [CustomAmi](https://docs.aws.amazon.com/parallelcluster/latest/ug/Image-v3.html#yaml-Image-CustomAmi) when creating clusters

## Testing

The following test passes when the AMI produced by steps 1-3 is used as the CustomAmi:

```
test-suites:
  networking:
    test_cluster_networking.py::test_cluster_in_no_internet_subnet:
      dimensions:
        - regions: ["us-east-1"]
          instances: ["g5.xlarge"]
          oss: ["rhel9"]
          schedulers: ["slurm"]
```

## Note

This commit should only be merged into integ-tests-3.14.0. A long-term fix for other branches will follow in the future.
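Step 3 of the usage instructions (creating an image from the patched instance) could be scripted roughly as follows. This is only a sketch: the function name `create_patched_ami`, the `DRY_RUN` switch, and the example instance ID and image name are hypothetical and not part of this commit. It assumes the AWS CLI is installed and configured; `aws ec2 create-image` is the standard CLI call for creating an AMI from an instance.

```shell
#!/bin/bash
# Sketch only: create_patched_ami and DRY_RUN are hypothetical names, not part
# of this commit. Assumes the AWS CLI is installed and has valid credentials.

# Create an AMI from the instance on which the patch script was run.
# With DRY_RUN=1 (the default here) the command is printed, not executed.
create_patched_ami() {
  instance_id="$1"   # ID of the instance that ran the patch script
  image_name="$2"    # name to give the resulting AMI
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws ec2 create-image --instance-id ${instance_id} --name ${image_name} --query ImageId --output text"
  else
    aws ec2 create-image --instance-id "${instance_id}" \
      --name "${image_name}" --query ImageId --output text
  fi
}
```

For example, `create_patched_ami i-0123456789abcdef0 pcluster-rhel9-dcv-gl-offline` prints the command for review, and the same call with `DRY_RUN=0` runs it and prints the new image ID, which can then be used as the `CustomAmi`.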
1 parent 314a193 commit d623959

1 file changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
#!/bin/bash
# Script to patch ParallelCluster AMI for DCV GL offline installation

set -ex

SOURCES_DIR="/opt/parallelcluster/sources"
DCV_GL_DEPS_DIR="${SOURCES_DIR}/dcv-gl-deps"
COOKBOOK_DIR="/etc/chef/cookbooks"
RHEL_COMMON="${COOKBOOK_DIR}/aws-parallelcluster-platform/resources/dcv/partial/_rhel_common.rb"

# Find the nice-dcv-gl RPM
DCV_GL_RPM=$(find "${SOURCES_DIR}" -name "nice-dcv-gl-*.rpm" 2>/dev/null | head -n 1)
if [[ -z "${DCV_GL_RPM}" ]]; then
    echo "ERROR: nice-dcv-gl RPM not found in ${SOURCES_DIR}"
    exit 1
fi
echo "Found DCV GL package: ${DCV_GL_RPM}"

echo "=== Step 1: Download missing dependencies for nice-dcv-gl ==="
mkdir -p "${DCV_GL_DEPS_DIR}"

# Use dnf to download only packages that would be installed (excludes already installed)
# Note: errors are suppressed here, so a failed download is not fatal
dnf download --destdir="${DCV_GL_DEPS_DIR}" --resolve "${DCV_GL_RPM}" 2>/dev/null || true

# Remove the dcv-gl package itself if downloaded (we only want dependencies)
rm -f "${DCV_GL_DEPS_DIR}"/nice-dcv-gl-*.rpm 2>/dev/null || true

if [[ -n "$(ls -A "${DCV_GL_DEPS_DIR}" 2>/dev/null)" ]]; then
    echo "Downloaded dependencies:"
    ls -la "${DCV_GL_DEPS_DIR}"
else
    echo "All dependencies already installed"
fi

echo "=== Step 2: Patch cookbook ==="
cp "${RHEL_COMMON}" "${RHEL_COMMON}.bak"

/opt/cinc/embedded/bin/ruby << 'RUBY_SCRIPT'
file_path = '/etc/chef/cookbooks/aws-parallelcluster-platform/resources/dcv/partial/_rhel_common.rb'
patch_content = <<~'PATCH'
  def install_dcv_gl
    dcv_gl_deps_dir = "#{node['cluster']['sources_dir']}/dcv-gl-deps"
    execute 'install dcv-gl dependencies offline' do
      command "rpm -ivh #{dcv_gl_deps_dir}/*.rpm"
      only_if { ::Dir.exist?(dcv_gl_deps_dir) && !::Dir.empty?(dcv_gl_deps_dir) }
    end

    package = "#{node['cluster']['sources_dir']}/#{dcv_package}/#{dcv_gl}"
    package package do
      action :install
      source package
      options '--disablerepo=*'
    end
  end
PATCH
content = File.read(file_path)
# The whitespace after each ^ must match the indentation of install_dcv_gl in the cookbook file
pattern = /^ def install_dcv_gl\n.*?^ end\n/m
if content.match?(pattern)
  File.write(file_path, content.gsub(pattern, patch_content))
  puts "Patched successfully"
else
  abort "ERROR: Could not find install_dcv_gl function"
end
RUBY_SCRIPT

echo "=== Done! Create AMI from this instance ==="
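
Before creating the AMI, it may be worth confirming that the script left the expected artifacts behind. The check below is a minimal sketch, not part of this commit: `verify_dcv_gl_patch` is a hypothetical helper that takes the cookbook file and the dependency directory as arguments. It looks for the `--disablerepo=*` option and the `.bak` backup that the script writes, then reports how many dependency RPMs were staged.

```shell
#!/bin/bash
# Sketch only: verify_dcv_gl_patch is a hypothetical helper, not part of this commit.
# Arguments: path to the patched cookbook file, path to the dependency directory.
verify_dcv_gl_patch() {
  cookbook_file="$1"
  deps_dir="$2"
  # The patched install_dcv_gl passes --disablerepo=* to the package resource.
  if ! grep -qF -- "--disablerepo=*" "${cookbook_file}"; then
    echo "cookbook not patched: ${cookbook_file}"
    return 1
  fi
  # The patch script copies the original file to <file>.bak before editing it.
  if [ ! -f "${cookbook_file}.bak" ]; then
    echo "backup missing: ${cookbook_file}.bak"
    return 1
  fi
  # Zero RPMs is acceptable: it means every dependency was already installed.
  rpm_count=$(find "${deps_dir}" -maxdepth 1 -name '*.rpm' 2>/dev/null | wc -l)
  echo "patched; dependency RPMs staged: $((rpm_count))"
}
```

On a patched instance this could be called with the two paths used by the script, i.e. the `_rhel_common.rb` file under `/etc/chef/cookbooks` and `/opt/parallelcluster/sources/dcv-gl-deps`.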
