[integ-tests-3.14.0 branch only] Create a script to enable DCV GL offline installation #7161
What about users who want to use their own custom AMI instead of the official pcluster AMI? Why not follow this approach?
## Problem
ParallelCluster clusters should be able to be created in a network without Internet access. However, when the following items are all true, cluster creation fails:
1. RHEL/Rocky
2. x86 GPU instances for head node and/or login nodes
3. DCV enabled
The failure can be seen in the chef-client log:
```
================================================================================
Error executing action `install` on resource 'dnf_package[/opt/parallelcluster/sources/nice-dcv-2024.0-19030-el9-x86_64/nice-dcv-gl-2024.0.1096-1.el9.x86_64.rpm]'
================================================================================
RuntimeError
------------
dnf-helper.py had stderr/stdout output:
Errors during downloading metadata for repository 'epel':
- Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/mirrorlist?repo=epel-9&arch=x86_64 [Failed to connect to mirrors.fedoraproject.org port 443: Connection timed out]
Error: Failed to download metadata for repo 'epel': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://mirrors.fedoraproject.org/mirrorlist?repo=epel-9&arch=x86_64 [Failed to connect to mirrors.fedoraproject.org port 443: Connection timed out]
Errors during downloading metadata for repository 'rhel-9-appstream-rhui-rpms':
- Curl error (28): Timeout was reached for https://rhui.us-east-1.aws.ce.redhat.com/pulp/mirror/content/dist/rhel9/rhui/9/x86_64/appstream/os [Failed to connect to rhui.us-east-1.aws.ce.redhat.com port 443: Connection timed out]
Error: Failed to download metadata for repo 'rhel-9-appstream-rhui-rpms': Cannot prepare internal mirrorlist: Curl error (28): Timeout was reached for https://rhui.us-east-1.aws.ce.redhat.com/pulp/mirror/content/dist/rhel9/rhui/9/x86_64/appstream/os [Failed to connect to rhui.us-east-1.aws.ce.redhat.com port 443: Connection timed out]
```
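To check whether a failed cluster hit this failure mode, the chef-client log can be scanned for repositories whose metadata download failed. This is a minimal Python sketch: the function name `failed_repos` and the regular expression are illustrative, and they only match the message format shown in the excerpt above.

```python
import re

# Matches the dnf error line shown in the log excerpt above.
_REPO_ERROR = re.compile(r"Failed to download metadata for repo '([^']+)'")


def failed_repos(log_text: str) -> list[str]:
    """Return the repo IDs dnf could not reach, in first-seen order."""
    seen: list[str] = []
    for repo in _REPO_ERROR.findall(log_text):
        if repo not in seen:
            seen.append(repo)
    return seen


# Two lines abbreviated from the log excerpt above.
sample = (
    "Error: Failed to download metadata for repo 'epel': Cannot prepare internal mirrorlist\n"
    "Error: Failed to download metadata for repo 'rhel-9-appstream-rhui-rpms': Cannot prepare internal mirrorlist\n"
)
print(failed_repos(sample))  # ['epel', 'rhel-9-appstream-rhui-rpms']
```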
## Workaround
This commit creates a script that downloads any missing transitive dependencies of DCV GL. It also modifies the cookbook to install those dependencies and to pass `--disablerepo=*`, so that yum/dnf does not contact the Internet for repository metadata.
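The two halves of the workaround can be illustrated as command lines built in Python. This is a sketch and not the script or cookbook change from this commit: `dnf download --resolve --destdir` (from dnf-plugins-core) is an assumed way to fetch the dependencies while the instance still has Internet access, and the package and directory names are hypothetical placeholders.

```python
import shlex


def download_cmd(package: str, dest_dir: str) -> list[str]:
    """Command to fetch a package and its unmet dependencies as RPM files.

    Assumption: the `download` plugin from dnf-plugins-core is installed.
    """
    return ["dnf", "download", "--resolve", "--destdir", dest_dir, package]


def offline_install_cmd(rpm_paths: list[str]) -> list[str]:
    """Command to install local RPMs without reading any repo metadata."""
    return ["dnf", "install", "-y", "--disablerepo=*", *rpm_paths]


# Hypothetical names, for illustration only.
print(shlex.join(download_cmd("some-dependency", "/opt/deps")))
print(shlex.join(offline_install_cmd(["/opt/deps/some-dependency.rpm"])))
```

Building the commands as argument lists keeps the `*` in `--disablerepo=*` from being expanded by a shell when they are passed to `subprocess.run`.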
### How to use the script:
1. Launch an instance with an official ParallelCluster RHEL/Rocky AMI
2. On the instance, run the script as root (e.g. `./fix_dcv_gl_offline_installation.gl`)
3. Create an image from the instance
4. Use the created image as the [CustomAmi](https://docs.aws.amazon.com/parallelcluster/latest/ug/Image-v3.html#yaml-Image-CustomAmi) when creating clusters
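Step 3 can be done from the console or the CLI. As one possible sketch, the boto3 EC2 `create_image` call can be wrapped as below. The helper name `create_ami_from_instance` is hypothetical and is not part of this commit.

```python
def create_ami_from_instance(ec2_client, instance_id: str, name: str) -> str:
    """Start AMI creation for the instance and return the new image ID.

    `ec2_client` is expected to be a boto3 EC2 client,
    e.g. boto3.client("ec2", region_name="us-east-1").
    """
    response = ec2_client.create_image(InstanceId=instance_id, Name=name)
    return response["ImageId"]
```

The returned image ID is the value to set as `CustomAmi` in step 4. The image is not usable immediately, so wait until it reaches the `available` state before creating a cluster with it.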
## Testing
The following test succeeds when the AMI produced by steps 1-3 is used as the CustomAmi:
```
test-suites:
  networking:
    test_cluster_networking.py::test_cluster_in_no_internet_subnet:
      dimensions:
        - regions: ["us-east-1"]
          instances: ["g5.xlarge"]
          oss: ["rhel9"]
          schedulers: ["slurm"]
```
## Note
This commit should only be merged into integ-tests-3.14.0. A long-term fix for other branches will be made in the future.
I agree the current approach is not comprehensive, but it is enough to unblock customers using the official AMI. We will make long-term improvements in the next release.
Merged commit d623959 into aws:integ-tests-3.14.0
Checklist
develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
Please review the guidelines for contributing and Pull Request Instructions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.