vif-plug-representor: fix support for plugging in PF ports#14
vif-plug-representor: fix support for plugging in PF ports#14iulhaq wants to merge 1 commit intoovn-org:mainfrom
Conversation
Allowing for plugging in PF ports also means user needs to set some more parameters from the Southbound. Previously only lookup against VF ports was done.
dshcherb
left a comment
There was a problem hiding this comment.
Thanks for the PR! I added a suggestion that I think would require less configuration and fit better with the isolation model in the original design.
| return false; | ||
| } | ||
|
|
||
| const char *opt_bus_name = smap_get(&ctx_in->lport_options, |
There was a problem hiding this comment.
My suggestion would be to reuse the PF MAC for this: when a vf-num is not specified, we know that this is a request to plug a PF.
The issue with using the PCI info for plugging is that the main (hypervisor) host and the DPU host see two different PCIe topologies and PCI addresses don't match. Whereas the burned-in MAC can be seen from both the hypervisor host and the DPU host via devlink.
Requiring DPU-side information to be passed in vif-plug:representor fields makes an implicit requirement that the requester has privileged access to look that information up. However, the original design decision when writing this was to infer the DPU-side information based on the host-side information (e.g. both hosts can see the PF MAC, board serial, VF numbers). This allows you to have privilege separation between:
- the hypervisor service that picks a free PF/VF and is the source for the information to do the lookup of a representor
- the CMS that sets values for
vif-plug:representor:<field> - the DPU-side service (ovn-controller).
The hypervisor-facing MAC is possible to get via devlink from the DPU:
# Run from the DPU host to get a MAC of a PF via its representor: hw_addr is the MAC address of a PF facing the main host.
ubuntu@bf3:~$ devlink -j port show pci/0000:03:00.0/65536 | jq
{
"port": {
"pci/0000:03:00.0/65536": {
"type": "eth",
"netdev": "pf0hpf",
"flavour": "pcipf",
"controller": 1,
"pfnum": 0,
"external": true,
"splittable": false,
"function": {
"hw_addr": "a0:88:c2:42:42:42"
}
}
}
}
# Run from the main (hypervisor) host to get a PF MAC which matches the above hw_addr
ubuntu@hypervisor:~$ ip -j link show enp1s0np0 | jq .[].address
"a0:88:c2:42:42:42"
# If the main host-side MAC for a netdev has been changed programmatically you can still retrieve it via the permaddr (IFLA_PERM_ADDRESS). But this only matters if you collect this info at the main host.
# https://github.com/iproute2/iproute2/commit/2b8e699
In summary: I would add and initialize a separate index to the port table to do a lookup of a PF based on a PF MAC address passed in and in the flavor == DEVLINK_PORT_FLAVOUR_PCI_PF branch that you added use a function to look up a PF by a PF MAC from vif-plug:representor:pf-mac.
ovn-vif/lib/vif-plug-providers/representor/vif-plug-representor.c
Lines 65 to 97 in 731e099
There was a problem hiding this comment.
Thanks for taking the time to review @dshcherb much appreciated!
@iulhaq I tend to agree with the point @dshcherb puts forward about the two sides of the PCIe complex being enumerated independently. I have seen with my own eyes systems that expose a behavior where the PCI address for the device do not match.
You addition of a pf-num option is useful though as we do actually currently have a bug for systems that expose two PFs to the host and where the host put them into a bond. The reason we did not catch that is that we used the vendor recommended approach which was to bond at the DPU side and expose only one PF to the host (Example from NVIDIA BlueField documentation).
What would you think of plugging the PF representor on the basis of lack of vf-num option provided, avoiding the addition of the PCI and device name information?
| return false; | ||
| } | ||
|
|
||
| const char *opt_bus_name = smap_get(&ctx_in->lport_options, |
There was a problem hiding this comment.
Thanks for taking the time to review @dshcherb much appreciated!
@iulhaq I tend to agree with the point @dshcherb puts forward about the two sides of the PCIe complex being enumerated independently. I have seen with my own eyes systems that expose a behavior where the PCI address for the device do not match.
You addition of a pf-num option is useful though as we do actually currently have a bug for systems that expose two PFs to the host and where the host put them into a bond. The reason we did not catch that is that we used the vendor recommended approach which was to bond at the DPU side and expose only one PF to the host (Example from NVIDIA BlueField documentation).
What would you think of plugging the PF representor on the basis of lack of vf-num option provided, avoiding the addition of the PCI and device name information?
Allowing for plugging in PF ports also means user needs to set some more parameters from the Southbound.
Previously only lookup against VF ports was done.