Skip to content

Commit 8fb2da5

Browse files
committed
tests: system-traffic: Fix flaky floating IP test.
The last ACK from the 3-WAY handshake in this test goes to userspace (miss upcall) after going through conntrack because +est traffic is handled differently from +new in this set of OpenFlow rules. The sender though proceeds with sending the data that may end up re-ordered with the aforementioned ACK. Since connection is very short, it is possible that this one ACK will be delivered even after the connection is already closed, i.e., after all the data is sent and the connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery in this case triggers an RST reply. And RST transitions TIME_WAIT into CLOSE (CLOSING in OVS terms), causing the test failure. It's not really possible to fully avoid the packet re-ordering as it is part of the upcall mechanism. But there is a couple ways to avoid it in this particular case, e.g., if we predict how the +est packet will look like and install the datapath flow for +est while processing the original +new, or by storing the upcall PID in the kernel for the whole time of action execution and not only for one packet we're executing actions for (TCP handshake replies are happening in the context of the initiator, which is OVS handler thread in our case). But these are large changes that will not help with test failures on older branches / older kernels. For now, fixing the test instead. The intention in the test was to check that the state is terminal at the end of the connection, i.e., that our actions do not leave established or incomplete conntrack entries. So it should be fine to check for both TIME_WAIT and CLOSING, as both of these are terminal states. Fixes: 8d48d5f ("system-traffic: Add conntrack floating IP test") Acked-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
1 parent 3567eb4 commit 8fb2da5

File tree

1 file changed

+31
-12
lines changed

1 file changed

+31
-12
lines changed

tests/system-traffic.at

Lines changed: 31 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -8844,20 +8844,39 @@ table=21 ip,nw_dst=10.1.1.2 action=set_field:f0:00:00:01:01:02->eth_
88448844

88458845
AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
88468846

8847-
dnl non-FIP case
8847+
dnl non-FIP case.
8848+
dnl
8849+
dnl The last ACK from the 3-WAY handshake will go via upcall after going
8850+
dnl through conntrack because +est traffic is handled differently from +new.
8851+
dnl The sender though will proceed sending data. Since connection is very
8852+
dnl short, it is possible that this one ACK will be delivered after the
8853+
dnl connection is already closed, i.e. after all the data is sent and the
8854+
dnl connection termination sequence (FIN-ACK-FIN-ACK) is done. Delivery
8855+
dnl in this case will trigger RST reply. And RST will transition TIME_WAIT
8856+
dnl into CLOSE, hence the need to look for both states below.
88488857
NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.1.1.1 1234])
8849-
OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' |
8850-
grep "tcp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)"
8851-
]])
8852-
8853-
dnl Check that the full session ends as expected (i.e. TIME_WAIT). Otherwise it
8854-
dnl means the datapath didn't process the ct_clear action. Ending in SYN_RECV
8855-
dnl (OVS maps to ESTABLISHED) means the initial frame was committed, but not a
8856-
dnl second time after the FIP translation (because ct_clear didn't occur).
8858+
OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \
8859+
| grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.1.1.2)],
8860+
[tcp,dnl
8861+
orig=(src=10.1.1.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl
8862+
reply=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),dnl
8863+
protoinfo=(state=<cleared>)])
8864+
8865+
dnl Check that the full session ends as expected (i.e. TIME_WAIT/CLOSING).
8866+
dnl Otherwise it means the datapath didn't process the ct_clear action.
8867+
dnl Ending in SYN_RECV (OVS maps to ESTABLISHED) means the initial frame
8868+
dnl was committed, but not a second time after the FIP translation (because
8869+
dnl ct_clear didn't occur).
8870+
dnl
8871+
dnl Same considerations about packet reordering apply, hence looking for
8872+
dnl both states.
88578873
NS_CHECK_EXEC([at_ns1], [echo "foobar" |nc $NC_EOF_OPT 10.254.254.1 1234])
8858-
OVS_WAIT_UNTIL([[ovs-appctl dpctl/dump-conntrack | sed -e 's/port=[0-9]*/port=<cleared>/g' -e 's/id=[0-9]*/id=<cleared>/g' |
8859-
grep "tcp,orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),protoinfo=(state=TIME_WAIT)"
8860-
]])
8874+
OVS_WAIT_UNTIL_EQUAL([ovs-appctl dpctl/dump-conntrack \
8875+
| grep -E '(TIME_WAIT|CLOSING)' | FORMAT_CT(10.254.254.2)],
8876+
[tcp,dnl
8877+
orig=(src=10.254.254.2,dst=10.1.1.1,sport=<cleared>,dport=<cleared>),dnl
8878+
reply=(src=10.1.1.1,dst=10.254.254.2,sport=<cleared>,dport=<cleared>),dnl
8879+
protoinfo=(state=<cleared>)])
88618880

88628881
OVS_TRAFFIC_VSWITCHD_STOP
88638882
AT_CLEANUP

0 commit comments

Comments
 (0)