Commit 2f346ab

added nvlink only backup restore
1 parent 553df2e commit 2f346ab

4 files changed: +50 -52 lines changed

content/cumulus-netq-51/Installation-Management/Backup-and-Restore-NetQ.md

Lines changed: 42 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -9,17 +9,16 @@ The following sections describe how to back up and restore your NetQ data and VM
99

1010
{{%notice note%}}
1111
- You must run backup and restore scripts with sudo privileges.
12-
- NetQ does not retain custom-signed certificates during the backup and restore process. If your deployment uses a custom-signed certificate, you must {{<link title="Install a Custom Signed Certificate" text="reconfigure the certificate">}} after you restore it on a new NetQ VM.
13-
- The backup and restore process does not retain several configurations necessary for the Grafana integration, including switch TLS certificates, authentication tokens (vm-tokens), OpenTelemetry configurations, and external time-series database configurations. After reinstalling NetQ, you must {{<link title="Integrate NetQ with Grafana" text="reconfigure these components">}}. Grafana will not display data from previous NetQ versions.
12+
- NetQ does not retain custom-signed certificates during the backup and restore process. If your deployment uses a custom-signed certificate, you must {{<link title="Install a Custom Signed Certificate" text="reconfigure the certificate">}} after you restore it on a new NetQ VM. *This caveat does not apply to NVLink clusters.*
13+
- The backup and restore process does not retain several configurations necessary for the Grafana integration, including switch TLS certificates, authentication tokens (vm-tokens), OpenTelemetry configurations, and external time-series database configurations. After reinstalling NetQ, you must {{<link title="Integrate NetQ with Grafana" text="reconfigure these components">}}. Grafana will not display data from previous NetQ versions. *This caveat does not apply to NVLink clusters.*
1414
{{%/notice%}}
1515

1616
## Back Up Your NetQ Data
1717

18-
Follow the process below for your deployment type to back up your NetQ data:
18+
Follow the process below for your deployment type to back up your NetQ data.
1919

2020
{{<tabs "TabID19" >}}
21-
22-
{{<tab "On-premises Deployments" >}}
21+
{{<tab "Other Deployments" >}}
2322

2423
1. Retrieve the `vm-backuprestore.sh` script:
2524

@@ -103,9 +102,46 @@ nvidia@netq-server:~$ sudo scp /opt/backuprestore/combined_backup_20250117054718
103102
```
104103

105104
{{</tab>}}
105+
{{<tab "NVLink Clusters" >}}
106+
107+
These steps apply exclusively to {{<link title="Install NetQ for NVLink" text="NetQ NVLink">}} three-node cluster deployments.
108+
109+
1. Run the {{<link title="nvl/#netq-nvl-cluster-backup" text="netq nvl cluster backup">}} command on each node in your cluster:
110+
111+
```
112+
nvidia@<hostname>:~$ netq nvl cluster backup
113+
2025-06-17 06:30:52,717 - INFO - Parsed arguments: Namespace(action='backup', backup_path='nvlink_cluster_backup', drop_mongo_collections=False, cm_op_ns='infra', cm_target_ns=['infra', 'kafka', 'nmx'], mongo_db_name=None, mongo_collections=None, mongo_k8s_ns='infra', mongo_statefulset='mongodb', mongo_container='mongodb', mongo_replicaset='rs0')
114+
2025-06-17 06:30:52,717 - INFO - Action: Full Backup selected.
115+
2025-06-17 06:30:52,717 - INFO - --- Starting NVLINK Cluster Full Backup to: nvlink_cluster_backup_20250617063052 ---
116+
...
117+
2025-06-17 06:30:55,159 - INFO - Full backup completed to: nvlink_cluster_backup_20250617063052
118+
```
119+
120+
2. Copy the newly created backup directory to the `/tmp/data-infra/` directory:
121+
122+
```
123+
cp -r /home/nvidia/nvlink_cluster_backup_20250617063052 /tmp/data-infra
124+
```
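The copy step above hard-codes a timestamped directory name. As a minimal sketch, assuming the backup directories follow the `nvlink_cluster_backup_<YYYYMMDDHHMMSS>` naming shown in the sample output, the hypothetical helper `latest_backup` (not part of NetQ) picks the most recent one so that it can be passed to `cp -r`:

```shell
# Hypothetical helper (not a NetQ command): print the newest
# nvlink_cluster_backup_* directory found under the given directory.
# Assumes the timestamp suffix sorts chronologically, as in the sample output.
latest_backup() {
  search_dir=${1:-/home/nvidia}
  latest=""
  # Glob results are sorted, so the last matching directory has the newest timestamp.
  for d in "$search_dir"/nvlink_cluster_backup_*; do
    [ -d "$d" ] || continue
    latest=$d
  done
  # Fail (non-zero status) when no backup directory exists.
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example usage (run manually on a node):
#   cp -r "$(latest_backup /home/nvidia)" /tmp/data-infra
```

The function only reads directory names; it does not verify that the backup completed successfully, so check the `Full backup completed` log line first.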
106125

126+
{{</tab >}}
107127
{{</tabs>}}
108128

109129
## Restore Your NetQ Data
110130

111-
To restore your NetQ data, perform a {{<link title="Install the NetQ System" text="new NetQ VM installation">}} and follow the steps to restore your NetQ data when you run the `netq install` command. You will use the `restore` option, referencing the path where the backup file resides.
131+
{{<tabs "TabID129" >}}
132+
{{<tab "Other Deployments" >}}
133+
134+
To restore your NetQ data, perform a {{<link title="Install the NetQ System" text="new NetQ VM installation">}} and follow the steps to restore your NetQ data when you run the `netq install` command. You will use the `restore` option, referencing the path where the backup file resides.
135+
136+
{{</tab>}}
137+
{{<tab "NVLink Clusters" >}}
138+
139+
1. Restore your data by running the {{<link title="nvl/#netq-nvl-cluster-restore" text="netq nvl cluster restore">}} command with the `drop-mongo-collections` option. This option prevents NetQ from re-installing duplicate data.
140+
141+
```
142+
nvidia@<hostname>:~$ netq nvl cluster restore /tmp/data-infra/nvlink_cluster_backup_20250617063052/ drop-mongo-collections
143+
```
144+
If this step fails, run `netq nvl bootstrap reset` and then try again.
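
The "reset, then try again" pattern in the step above can be sketched as a small wrapper. This is an illustration only: `retry_after_reset` is a hypothetical helper, not part of NetQ, and both commands are passed in as strings so that nothing about the NetQ CLI is assumed beyond what the step itself states.

```shell
# Hypothetical helper (not a NetQ command): run a command and, if it fails,
# run a recovery command once before retrying the original command.
retry_after_reset() {
  main_cmd=$1
  reset_cmd=$2
  # First attempt.
  if sh -c "$main_cmd"; then
    return 0
  fi
  echo "First attempt failed; running recovery step: $reset_cmd" >&2
  # Give up if the recovery step itself fails.
  sh -c "$reset_cmd" || return 1
  # Second and final attempt; its exit status is the function's result.
  sh -c "$main_cmd"
}

# Example usage (run manually; substitute the commands from the step above):
#   retry_after_reset '<restore command>' '<reset command>'
```

The wrapper retries only once on purpose: if the restore still fails after a reset, the logs are more useful than another attempt.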
145+
146+
{{</tab >}}
147+
{{</tabs>}}

content/cumulus-netq-51/Installation-Management/Install-NetQ/Install-NetQ-System.md

Lines changed: 7 additions & 1 deletion
@@ -7,6 +7,8 @@ toc: 3
77

88
## NetQ for Ethernet
99

10+
The following deployment models use NetQ to monitor Ethernet-only networks.
11+
1012
{{<tabs "TabID11" >}}
1113

1214
{{<tab "On-premises" >}}
@@ -23,6 +25,8 @@ toc: 3
2325

2426
## NetQ for NVLink
2527

28+
The following deployment model uses NetQ to monitor NVLink-only networks.
29+
2630
{{<tabs "TabID35" >}}
2731

2832
{{<tab "On-premises" >}}
@@ -38,14 +42,16 @@ toc: 3
3842

3943
## NetQ for Ethernet and NVLink
4044

45+
The following deployment models use NetQ to monitor networks that use both Ethernet and NVLink. NetQ 5.1 introduces a new option that lets you deploy NetQ on as many nodes as you wish to include in your server cluster. This deployment option is in beta, and moving to a subsequent NetQ release will require a fresh installation.
46+
4147
{{<tabs "TabID56" >}}
4248

4349
{{<tab "On-premises" >}}
4450

4551
| Server Arrangement | Hypervisor | Requirements & Installation |
4652
| :--- | --- | :---: |
4753
| High-availability scale cluster: three nodes | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink" text="Start install">}} |
48-
| High-availability scale cluster: unlimited nodes (beta) | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink (Beta)" text="Start install">}} |
54+
| High-availability scale cluster: user-defined nodes | KVM or VMware | {{<link title="Install NetQ for Ethernet and NVLink (Beta)" text="Start install">}} |
4955

5056
{{</tab>}}
5157

content/cumulus-netq-51/Installation-Management/Install-NetQ/Setup-NVLink-Cluster.md

Lines changed: 0 additions & 44 deletions
@@ -242,55 +242,11 @@ nvidia@netq-server:~$ vim /tmp/nvl-cluster-config.json
242242

243243
12. Run the installation command on your master node using the JSON configuration file that you created in the previous step. Specify the passwords for the read-write user and the read-only user in the `rw-password` and `ro-password` fields, respectively. The passwords must each include a minimum of eight characters.
244244

245-
{{< tabs "TabID268">}}
246-
{{< tab "New Install">}}
247-
248245
```
249246
nvidia@<hostname>:~$ netq install nvl bundle /mnt/installables/NetQ-5.1.0.tgz kong-rw-password <rw-password> kong-ro-password <ro-password> /tmp/nvl-cluster-config.json
250247
```
251248
<div class="notices tip"><p>If this step fails for any reason, run <code>netq bootstrap reset</code> and then try again.</p></div>
252249

253-
{{< /tab >}}
254-
{{< tab "Restore Data and New Install">}}
255-
<!--need to check this with QA-->
256-
1. Add the `config-key` parameter to the JSON template from step 11 using the key created during the {{<link title="Back Up and Restore NetQ" text="backup process">}}. Edit the file with values for each attribute.
257-
258-
```
259-
nvidia@netq-server:~$ vim /tmp/nvl-cluster-config.json
260-
{
261-
"version": "v2.0",
262-
"interface": "<INPUT>",
263-
"config-key": "<INPUT>",
264-
"cluster-vip": "<INPUT>",
265-
"servers": [
266-
{
267-
"ip": "<INPUT>"
268-
"description": "<SERVER1>"
269-
},
270-
{
271-
"ip": "<INPUT>"
272-
"description": "<SERVER2>"
273-
},
274-
{
275-
"ip": "<INPUT>"
276-
"description": "<SERVER3>"
277-
},
278-
],
279-
"storage-path": "/var/lib/longhorn",
280-
"alertmanager_webhook_url": "<INPUT>"
281-
}
282-
```
283-
284-
2. Run the following command on your master node, using the JSON configuration file from the previous step. Include the restore option referencing the path where the backup file resides:
285-
286-
```
287-
nvidia@<hostname>:~$ netq install nvl bundle /mnt/installables/NetQ-5.1.0.tgz /tmp/nvl-cluster-config.json restore /home/nvidia/combined_backup_20241211111316.tar
288-
```
289-
290-
<div class="notices tip"><p><ul><li>If this step fails for any reason, run <code>netq bootstrap reset</code> and then try again.</li><li>If you restore NetQ data to a server with an IP address that is different from the one used to back up the data, you must <a href="https://docs.nvidia.com/networking-ethernet-software/cumulus-netq/Installation-Management/Install-NetQ/Install-NetQ-Agents/#configure-netq-agents">reconfigure the agents</a> on each switch as a final step.</li></ul></p></div>
291-
{{< /tab >}}
292-
{{< /tabs >}}
293-
294250
## Verify Installation Status
295251

296252
To view the status of the installation, use the `netq show status [verbose]` command. The following example shows a successful 3-node installation:

content/cumulus-netq-51/Installation-Management/Install-NetQ/Setup-NVLink-Ethernet-Combined-Cluster-Beta.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ bookhidden: true
77
---
88
Follow these steps to set up and configure your VMs in a cluster of servers. First configure the VM on the master node, and then configure the VM on each additional node. NVIDIA recommends installing the virtual machines on different servers to increase redundancy in the event of a hardware failure.
99
{{<notice info>}}
10-
This deployment type is currently in beta, and installations with more than five nodes will not support upgrades to future NetQ versions.
10+
This deployment type is currently in beta, and moving to a subsequent NetQ release will require a fresh installation.
1111
{{</notice>}}
1212
## System Requirements
1313
