Skip to content

Delete TURN allocation on socket close#101

Merged
sgfn merged 13 commits into
masterfrom
fix/437-alloc-mismatch
May 15, 2026
Merged

Delete TURN allocation on socket close#101
sgfn merged 13 commits into
masterfrom
fix/437-alloc-mismatch

Conversation

@sgfn
Copy link
Copy Markdown
Member

@sgfn sgfn commented May 5, 2026

Use ExTURN.Client.close/1 to send Refresh(lifetime=0) and delete the present allocation.

Resolves #100

ref: elixir-webrtc/ex_turn#10

joaothallis and others added 9 commits March 19, 2026 00:20
Before closing a socket, walk every relay candidate and gathering
transaction bound to it, invoke ExTURN.Client.close/1, and ship the
returned Refresh(lifetime=0) datagram on the original socket. Without
this, a TURN server (notably Cloudflare Realtime TURN) keeps the
5-tuple allocated until TTL expires; a future Allocate from the same
source port is then rejected with 437 Allocation Mismatch (RFC 5766
§6.2, RFC 8656 §6.2), gathering completes with no typ relay candidate,
and ICE fails.

Make the teardown run on abrupt parent death too. PeerConnection in
ex_webrtc 0.16 does not trap exits; if its DTLSTransport child crashes
(e.g. unifex_parse_arg when DTLS never negotiated), the linked cascade
kills ICE before PeerConnection can call ice_transport.close. Trap
exits in ICEAgent's init and propagate non-:normal EXITs as {:stop,
reason, state} so terminate/2 always runs the close path. :normal
EXITs (from short-lived children like gatherer worker processes) stay
noreply.

Transport.Mock in test support keeps closed sockets in the ETS table
with state: :closed so tests can assert what the agent sent on the
close path; setup_socket / open_ephemeral transparently reuse the slot
on re-open.

Depends on the matching ExTURN.Client.close/1 addition; pinned to that
commit via git dep until an ex_turn release ships.

Verified end-to-end against Cloudflare Realtime TURN via
ex_turn_cloudflare_repro: 20/20 iterations emit typ relay with zero
437s on narrow port-range cycling (was 0/20 without the fix, 437 on
iteration 1).
The handle_info({:EXIT, _, reason}, state) clause was uncovered: gen_server
intercepts EXITs from the parent and runs terminate/2 directly, so the
existing parent-death test never reached the clause. Add a test that links
a non-parent process to the agent and lets it exit abnormally, which is
the only path that actually drives the {:stop, reason, state} return.

Drop the case fallback in terminate/2; init/1 always returns a state map
with :ice_agent, so the _ -> :ok branch was unreachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ExTURN.Client.close/1 emits no logs and the transport's send/4 returns
{:error, _} silently, so a failed Refresh leaves no breadcrumb — exactly
the failure mode that triggers 437 Allocation Mismatch on the next port
reuse. Surface the error at warning level instead of swallowing it.

Add a Transport.Mock.fail_send/2 hook so the test can drive a real
allocation to :allocated, force the close-path send to return :enotconn,
and assert on the captured log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sgfn sgfn requested a review from Karolk99 May 5, 2026 15:49
@sgfn
Copy link
Copy Markdown
Member Author

sgfn commented May 5, 2026

@joaothallis

Rationale behind my changes to your branch:

  1. Don't trap exits: We don't expect ExWebRTC to crash, preparing for it seems a bit overkill. A fix for the crash on ExWebRTC.DTLSTransport.close/1 is coming
  2. Don't call ExTURN.Client.close/1 on clients in gathering_transactions: these clients are not :allocated yet, they "move" to local_cands on :allocation_created. The additional cleanup step would only deal with the corner case where we received the :allocation_created message, but have yet to process it during teardown -- not worth covering IMHO
  3. Logger.warning -> debug: There's virtually nothing the user can do when that final send fails, and we'd rather keep the verbosity level low

Comments are welcome:)

Comment on lines +2489 to +2496
defp release_turn_allocation(ice_agent, socket, client) do
with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
case ice_agent.transport_module.send(socket, turn_addr, data) do
:ok -> :ok
{:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
end
end
end
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NITPICK
It looks nice (if ExTURN.close/1 returns {:ok, state} it automatically returns), but on the other hand it would be better to always return the same value.

Suggested change
defp release_turn_allocation(ice_agent, socket, client) do
with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
case ice_agent.transport_module.send(socket, turn_addr, data) do
:ok -> :ok
{:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
end
end
end
defp release_turn_allocation(ice_agent, socket, client) do
with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
:ok <- ice_agent.transport_module.send(socket, turn_addr, data) do
:ok
else
{:ok, _state} -> :ok
{:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
end
end
end

Comment thread lib/ex_ice/priv/ice_agent.ex Outdated
with {:send, turn_addr, data, _client} <- ExTURN.Client.close(client) do
case ice_agent.transport_module.send(socket, turn_addr, data) do
:ok -> :ok
{:error, reason} -> Logger.debug("Couldn't send deallocate request, reason: #{reason}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that can cause problems for the user, maybe info would be better.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything the user can do in this case, as this most likely suggests a network issue. I'd leave it at debug

Copy link
Copy Markdown
Contributor

@Karolk99 Karolk99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look more closely at whether we need to close TURN allocations in cases when handle_terminate won't be called (when PeerConnection is closed with a different reason than :normal)

@sgfn
Copy link
Copy Markdown
Member Author

sgfn commented May 15, 2026

We should look more closely at whether we need to close TURN allocations in cases when handle_terminate won't be called (when PeerConnection is closed with a different reason than :normal)

Reverted back to trapping exits, now allocations are released during shutdowns and parent crashes, too

@codecov
Copy link
Copy Markdown

codecov Bot commented May 15, 2026

Codecov Report

❌ Patch coverage is 85.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.08%. Comparing base (f428324) to head (0491b26).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
lib/ex_ice/ice_agent.ex 75.00% 1 Missing ⚠️
lib/ex_ice/priv/ice_agent.ex 83.33% 1 Missing ⚠️
test/support/transport/mock.ex 90.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #101      +/-   ##
==========================================
+ Coverage   86.86%   87.08%   +0.22%     
==========================================
  Files          27       27              
  Lines        2078     2091      +13     
==========================================
+ Hits         1805     1821      +16     
+ Misses        273      270       -3     
Files with missing lines Coverage Δ
lib/ex_ice/ice_agent.ex 69.51% <75.00%> (+0.28%) ⬆️
lib/ex_ice/priv/ice_agent.ex 88.81% <83.33%> (+0.46%) ⬆️
test/support/transport/mock.ex 82.85% <90.00%> (+2.21%) ⬆️

... and 1 file with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f428324...0491b26. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sgfn sgfn merged commit 3c3c947 into master May 15, 2026
4 checks passed
@sgfn sgfn deleted the fix/437-alloc-mismatch branch May 15, 2026 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ICEAgent.close/1 does not release TURN allocations: Cloudflare TURN returns 437 on socket reuse

3 participants