Skip to content

Commit e0ee785

Browse files
committed
tests: system-traffic: Fix flaky floating IP test.
The last ACK from the 3-WAY handshake in this test goes to userspace (miss upcall) after going through conntrack because +est traffic is handled differently from +new in this set of OpenFlow rules. The sender though proceeds with sending the data that may end up re-ordered with the aforementioned ACK. Since connection is very short, it is possible that this one ACK will be delivered even after the connection is already closed, i.e., after all the data is sent and the connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery in this case triggers an RST reply. And RST transitions TIME_WAIT into CLOSE (CLOSING in OVS terms), causing the test failure. It's not really possible to fully avoid the packet re-ordering as it is part of the upcall mechanism. But there is a couple ways to avoid it in this particular case, e.g., if we predict how the +est packet will look like and install the datapath flow for +est while processing the original +new, or by storing the upcall PID in the kernel for the whole time of action execution and not only for one packet we're executing actions for (TCP handshake replies are happening in the context of the initiator, which is OVS handler thread in our case). But these are large changes that will not help with test failures on older branches / older kernels. For now, fixing the test instead. The intention in the test was to check that the state is terminal at the end of the connection, i.e., that our actions do not leave established or incomplete conntrack entries. So it should be fine to check for both TIME_WAIT and CLOSING, as both of these are terminal states. Fixes: 8d48d5f ("system-traffic: Add conntrack floating IP test") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
1 parent 6e83daf commit e0ee785

File tree

1 file changed

+31
-12
lines changed

1 file changed

+31
-12
lines changed

tests/system-traffic.at

Lines changed: 31 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -9112,20 +9112,39 @@ table=21 ip,nw_dst=10.1.1.2 action=set_field:f0:00:00:01:01:02->eth_
91129112

91139113
AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
91149114

9115-
dnl non-FIP case
9115+
dnl non-FIP case.
9116+
dnl
9117+
dnl The last ACK from the 3-WAY handshake will go via upcall after going
9118+
dnl through conntrack because +est traffic is handled differently from +new.
9119+
dnl The sender though will proceed sending data. Since connection is very
9120+
dnl short, it is possible that this one ACK will be delivered after the
9121+
dnl connection is already closed, i.e. after all the data is sent and the
9122+
dnl connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery
9123+
dnl in this case will trigger RST reply. And RST will transition TIME_WAIT
9124+
dnl into CLOSE, hence the need to look for both states below.
91169125
NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.1.1.1 1234])
9117-
OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' |
9118-
grep "tcp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)"
9119-
]])
9120-
9121-
dnl Check that the full session ends as expected (i.e. TIME_WAIT). Otherwise it
9122-
dnl means the datapath didn't process the ct_clear action. Ending in SYN_RECV
9123-
dnl (OVS maps to ESTABLISHED) means the initial frame was committed, but not a
9124-
dnl second time after the FIP translation (because ct_clear didn't occur).
9126+
OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \
9127+
| grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.1.1.2)],
9128+
[tcp,dnl
9129+
orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl
9130+
reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),dnl
9131+
protoinfo=(state=<cleared>)])
9132+
9133+
dnl Check that the full session ends as expected (i.e. TIME_WAIT/CLOSING).
9134+
dnl Otherwise it means the datapath didn't process the ct_clear action.
9135+
dnl Ending in SYN_RECV (OVS maps to ESTABLISHED) means the initial frame
9136+
dnl was committed, but not a second time after the FIP translation (because
9137+
dnl ct_clear didn't occur).
9138+
dnl
9139+
dnl Same considerations about packet reordering apply, hence looking for
9140+
dnl both states.
91259141
NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.254.254.1 1234])
9126-
OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' |
9127-
grep "tcp,orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)"
9128-
]])
9142+
OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \
9143+
| grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.254.254.2)],
9144+
[tcp,dnl
9145+
orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl
9146+
reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),dnl
9147+
protoinfo=(state=<cleared>)])
91299148

91309149
OVS_TRAFFIC_VSWITCHD_STOP
91319150
AT_CLEANUP

0 commit comments

Comments
 (0)