Skip to content

Add WASI platform#2290

Open
pguyot wants to merge 3 commits intoatomvm:release-0.7from
pguyot:w17/wasi-platform
Open

Add WASI platform#2290
pguyot wants to merge 3 commits intoatomvm:release-0.7from
pguyot:w17/wasi-platform

Conversation

@pguyot
Copy link
Copy Markdown
Collaborator

@pguyot pguyot commented May 3, 2026

Also make significant fixes to socket to accomodate for wasi-libc.

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@github-advanced-security
Copy link
Copy Markdown

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

  • The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
  • Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
  • You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

@petermm
Copy link
Copy Markdown
Contributor

petermm commented May 3, 2026

The usual https://ampcode.com/threads/T-019dec8c-2eb3-73ab-8edf-c77623152593

Pick and choose - it does seem like we should define CLOSED_FD as -1 - but your call.

PR Review: WASI Platform + Socket Hardening

Reviewing the last two commits:

  • 9e1f22fcFix and change socket for a platform such as WASI
  • c4a02e53Add WASI platform

The combined diff is large; this review focuses on real correctness issues and small, actionable suggestions, rather than style.

Summary

# Category Severity File
1 Pre-existing CLOSED_FD == 0 sentinel surfaced/expanded by this PR Bug (latent) src/libAtomVM/otp_socket.c
2 Non-SMP WASI + mbedTLS build references mutex fields that are #ifdef-out Bug src/platforms/wasi/src/lib/sys.c
3 socket:accept/2 and socket:recv/3 leave the underlying select armed on several terminal paths Bug libs/estdlib/src/socket.erl
4 gen_tcp_socket / gen_udp_socket do not stop the select on timeout / unknown ref Suggestion libs/estdlib/src/gen_tcp_socket.erl, libs/estdlib/src/gen_udp_socket.erl
5 Direct (non-atomic) reads of int ATOMIC listeners_poll_count / select_events_poll_count Nit (only matters if SMP+wasip2 ever combined) src/platforms/wasi/src/lib/sys.c

The remaining WASI-specific socket hardening (errno_is_peer_close, ENOPROTOOPT/SO_LINGER no-ops, nif_select_write/2, send/sendto retry-on-eagain, spurious wakeup retry loops) generally looks sound.


1. CLOSED_FD is 0 — fd 0 is a legal socket fd

Refs: otp_socket.c#L221, otp_socket.c#L599-L603, otp_socket.c#L281-L284, otp_socket.c#L1976

#define CLOSED_FD 0

This sentinel is pre-existing (introduced in 2023), so it is not strictly a regression in this PR. However, the PR explicitly converts every if (rsrc_obj->fd == 0) site to rsrc_obj->fd == CLOSED_FD, doubling down on the assumption — and adds a new WASI Preview 2 target where the assumption is materially weaker. The underlying issue should be fixed.

Why fd 0 is a legal socket() return value

POSIX is explicit: socket(2) returns the lowest-numbered free fd on success, and only -1 on error.

"The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process."
— Linux socket(2), man7.org

So if fd 0 is free (e.g. stdin was closed, or the process was started without stdin), socket() may legitimately return 0.

What breaks today

otp_socket.c#L599-L603:

rsrc_obj->fd = socket(domain, type, protocol);
if (UNLIKELY(rsrc_obj->fd == -1 || rsrc_obj->fd == CLOSED_FD)) {
    AVM_LOGE(TAG, "Failed to initialize socket.");
    rsrc_obj->fd = CLOSED_FD;
    enif_release_resource(rsrc_obj);
    return make_errno_tuple(ctx);
}

A successful socket() returning 0 is reported as failure. Then otp_socket.c#L281-L284:

if (rsrc_obj->fd != CLOSED_FD) {
    close(rsrc_obj->fd);
    rsrc_obj->fd = CLOSED_FD;
}

…would later skip close(0).

Why this is more relevant for the new WASI target

src/platforms/wasi/README.md makes networking exclusive to wasm32-wasip2. Wasmtime — the reference WASI Preview 2 runtime — does not guarantee that stdin/stdout/stderr are inherited; they are null by default in embeddings:

"By default WASI programs have no stdin / no stdout / no stderr."
— Wasmtime C API, wasi.h docs

"StdinStream and StdoutStream are used to provide custom stdin/stdout streams if they're not inherited (or null, which is the default)."
Wasmtime WASIp2 docs

In Preview 2, wasi-libc still maps sockets through the libc fd namespace (wasi-libc#447), so a free fd 0 can plausibly be allocated to a socket() call. There is no documented guarantee that wasi-libc skips 0/1/2 when allocating socket descriptors.

Convention check

Erlang/OTP's own inet_drv.c uses #define INVALID_SOCKET -1. AtomVM itself already uses -1 as the closed sentinel in posix_nifs.c#L207:

#define CLOSED_FD (-1)

otp_socket.c is the outlier.

Suggested fix

--- a/src/libAtomVM/otp_socket.c
+++ b/src/libAtomVM/otp_socket.c
@@
-#define CLOSED_FD 0
+#define CLOSED_FD (-1)
@@
-    rsrc_obj->fd = socket(domain, type, protocol);
-    if (UNLIKELY(rsrc_obj->fd == -1 || rsrc_obj->fd == CLOSED_FD)) {
+    rsrc_obj->fd = socket(domain, type, protocol);
+    if (UNLIKELY(rsrc_obj->fd == CLOSED_FD)) {
         AVM_LOGE(TAG, "Failed to initialize socket.");
@@
-    if (UNLIKELY(fd == -1 || fd == CLOSED_FD)) {
+    if (UNLIKELY(fd == CLOSED_FD)) {

Anywhere a resource's fd is currently zero-initialized must also be initialized to CLOSED_FD (grep confirms rsrc_obj->fd = CLOSED_FD is already used in the OOM path; the resource's normal initializer should match).

Verdict: confirmed real latent bug. Pre-existing in otp_socket.c, but exposure widens with the wasm32-wasip2 target this PR introduces, since the runtime is not guaranteed to keep fd 0 reserved for stdin.

2. Non-SMP WASI + mbedTLS build dereferences fields that don't exist

Refs: sys.c#L77-L97, sys.c#L651-L707

struct WASIPlatformData
{
#ifndef AVM_NO_SMP
    pthread_mutex_t poll_mutex;
    pthread_cond_t poll_cond;
    bool should_poll;
    Mutex *entropy_mutex;
    Mutex *random_mutex;
#endif
    ...
};

entropy_mutex and random_mutex are conditional on SMP being enabled, but the mbedTLS helpers reference them unconditionally:

SMP_MUTEX_LOCK(platform->entropy_mutex);

Because SMP_MUTEX_LOCK is a function-like macro with an empty body in the non-SMP build, the argument is not token-pasted into the output and the build still compiles today — but only by accident. The intent (separate mutexes per resource) is broken in non-SMP builds, and any future change to SMP_MUTEX_LOCK (e.g. an assert(mutex != NULL)) will immediately break wasm32-wasip2, which is the supported non-SMP networking target according to the WASI README.

Either move the mutex fields out of the SMP guard, or guard the dereferences explicitly:

--- a/src/platforms/wasi/src/lib/sys.c
+++ b/src/platforms/wasi/src/lib/sys.c
@@ struct WASIPlatformData
-#ifndef AVM_NO_SMP
-    pthread_mutex_t poll_mutex;
-    pthread_cond_t poll_cond;
-    bool should_poll;
-    Mutex *entropy_mutex;
-    Mutex *random_mutex;
-#endif
+#ifndef AVM_NO_SMP
+    pthread_mutex_t poll_mutex;
+    pthread_cond_t poll_cond;
+    bool should_poll;
+#endif
+#if ATOMVM_HAS_MBEDTLS
+    Mutex *entropy_mutex; /* may be NULL in non-SMP; SMP_MUTEX_* are no-ops */
+    Mutex *random_mutex;
+#endif

…and only call smp_mutex_create when SMP is enabled. The current path of "field exists only in SMP, but is referenced outside SMP" is fragile.

3. socket:accept/2 / socket:recv/3 leave the select armed on several exit paths

Refs: socket.erl#L265-L287, socket.erl#L386-L417

accept/2 and recv/3 register a nif_select_read, then exit on any of:

  • successful read / accept,
  • {'$socket', Socket, abort, {Ref, closed}},
  • after Timeout timeout.

Only the explicit error branch calls nif_select_stop/1. The other exit paths leave the read select registered against the resource. The matching send_wait/2 / sendto_wait/3 paths added in this PR already do the right thing (call nif_select_stop/1 after select fires); the read-side helpers should mirror that.

Concrete consequence: a process can return from socket:recv/3 and still receive a delayed '$socket' message into its mailbox, which then leaks into whatever the caller does next. On wasm32-wasip2 this risk is amplified because a poll() re-arm reuses the entry until it is explicitly torn down.

--- a/libs/estdlib/src/socket.erl
+++ b/libs/estdlib/src/socket.erl
@@ accept0 receive
         {'$socket', Socket, select, Ref} ->
+            ?MODULE:nif_select_stop(Socket),
             case ?MODULE:nif_accept(Socket) of
                 {error, _} = E ->
-                    ?MODULE:nif_select_stop(Socket),
                     E;
                 {ok, _Socket} = Reply ->
                     Reply
             end;
         {'$socket', Socket, abort, {Ref, closed}} ->
+            ?MODULE:nif_select_stop(Socket),
             {error, closed}
     after Timeout ->
+        ?MODULE:nif_select_stop(Socket),
         {error, timeout}
     end;

@@ recv0_loop receive
         {'$socket', Socket, select, Ref} ->
+            ?MODULE:nif_select_stop(Socket),
             case ?MODULE:nif_recv(Socket, Length) of
                 {error, Reason} when Reason =:= timeout orelse Reason =:= eagain ->
-                    ?MODULE:nif_select_stop(Socket),
                     case recv0_remaining(Deadline) of
                         Done when Done =< 0, Deadline =/= infinity ->
                             {error, timeout};
                         _ ->
                             recv0_loop(Socket, Length, Deadline)
                     end;
                 {error, _} = E ->
-                    ?MODULE:nif_select_stop(Socket),
                     E;
                 {ok, Data} ->
                     {ok, Data}
             end;
         {'$socket', Socket, abort, {Ref, closed}} ->
+            ?MODULE:nif_select_stop(Socket),
             {error, closed}
     after Remaining ->
+        ?MODULE:nif_select_stop(Socket),
         {error, timeout}
     end;

The same pattern should be applied to recv0_r/5 and recvfrom0_loop/3.

4. gen_tcp_socket / gen_udp_socket should self-heal stale select entries

Refs: gen_tcp_socket.erl#L353-L407, gen_udp_socket.erl#L245-L264

The pending_selects map is now consistently cleared in this PR (good). Two paths still leave the underlying NIF select armed though:

  • the undefined branch of handle_info({'$socket', _, select, Ref}, ...) — the ref is unknown, but the NIF still has a selecting reader registered;
  • the timeout branch in handle_info({timeout, Ref, From}, ...) — the user-side timer fires, the gen_server replies {error, timeout}, but the underlying read select is still armed.

A subsequent select message can then fire into the gen_server, log a warning, and waste another iteration. Adding socket:nif_select_stop/1 on these branches makes the state machine self-healing.

--- a/libs/estdlib/src/gen_tcp_socket.erl
+++ b/libs/estdlib/src/gen_tcp_socket.erl
@@
-handle_info({'$socket', _Socket, select, Ref}, State) ->
+handle_info({'$socket', Socket, select, Ref}, State) ->
@@
         undefined ->
-            ?LOG_WARNING("Unable to find select ref ~p in pending selects", [Ref]),
+            ?LOG_WARNING("Unable to find select ref ~p in pending selects", [Ref]),
+            socket:nif_select_stop(Socket),
             {noreply, State};
@@ handle_info({timeout, Ref, From}, State) ->
         _ ->
+            socket:nif_select_stop(State#state.socket),
             gen_server:reply(From, {error, timeout}),
             {noreply, State#state{pending_selects = maps:remove(Ref, State#state.pending_selects)}}

(Apply analogously to gen_udp_socket.erl.)

5. Non-atomic reads of int ATOMIC fields

Refs: sys.c#L84-L90, sys.c#L201-L205

int ATOMIC listeners_poll_count;
int ATOMIC select_events_poll_count;
...
int listeners_poll_count = platform->listeners_poll_count;
int select_events_poll_count = platform->select_events_poll_count;

The fields are declared ATOMIC but read/written with plain assignment. In the supported matrix this is benign — wasm32-wasip2 is forced non-SMP, and the threaded WASI build does not expose networking — but it telegraphs an SMP intent that the implementation does not actually deliver. Either drop the ATOMIC qualifier here or use the project’s atomic load/store helpers consistently.

This is a nit unless someone later combines wasi-threads with wasip2 sockets.


Things explicitly checked, no finding

  • errno_is_peer_close() (otp_socket.c#L70-L84) — mapping EIO/EPIPE to peer-close on __wasip2__ matches the wasi-libc behavior described in the commit message. ECONNRESET retained for non-WASI BSDs.
  • nif_select_stop no-longer-erroring on ERL_NIF_SELECT_INVALID_EVENT (otp_socket.c#L1249-L1252) — needed for the new "tear-down then re-arm" pattern, looks right.
  • len == 0 short-circuit in nif_socket_send_internal (otp_socket.c#L2674-L2677) — defensible: a BEAM-level socket:send(S, <<>>) is a no-op rather than a peer-close probe. Slightly different from send(fd, "", 0) semantics on some libcs but more portable, and the new test_send_empty covers it.
  • enif_select_write in resources.c rejects non-NULL msg_env and creates a normal message — consistent with enif_select_read.
  • send_wait/sendto_wait matching {'$socket', Socket, abort, {_AnyRef, closed}} is permissive but correct: the resource is the same Socket term, so any abort still implies "closed".
  • The CMake-side WASI plumbing (wasi-sdk.cmake, src/CMakeLists.txt, src/lib/CMakeLists.txt) looks consistent; mbedTLS-via-FetchContent is gated on AVM_BUILD_MBEDTLS=ON.

Bottom line

Highest-impact items to address before merge:

  1. Stop using fd 0 as a closed sentinel in otp_socket.c (or document why it is safe on every supported platform).
  2. Decouple WASI mbedTLS support from AVM_NO_SMP so the documented wasm32-wasip2 non-SMP target stays buildable.
  3. Always pair a nif_select_read with a matching nif_select_stop in the blocking socket.erl helpers and the gen_*_socket timeout paths.

The rest of the PR — the WASI platform skeleton, nif_select_write/2, the spurious-wakeup retry loops, and the {error, closed} normalization — looks good.

@pguyot pguyot force-pushed the w17/wasi-platform branch 5 times, most recently from e572044 to 924812d Compare May 7, 2026 17:29
pguyot added 3 commits May 10, 2026 10:28
- Handle spurious select wakeups
- Allow linger and reuseaddr as no-ops if unsupported
- Add nif_select_write/2 to strengthen send/sendto
- Update tests to run when there is no inet driver
- Fix socket operations to return {error, closed} when the fd is closed

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
Adopt OTP semantics for calls such as gen_tcp:close(Pid) when the
gen_server is already closed, thus fixing a race condition in tests

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
Introduces the WASI (WebAssembly System Interface) platform allowing
AtomVM to run as a pure WebAssembly module under runtimes such as
wasmtime, WasmEdge, and wasmer.

Three target triples are supported:
- wasm32-wasip1: no SMP, no networking
- wasm32-wasip1-threads: SMP via wasi-threads, no networking
- wasm32-wasip2: networking via wasi:sockets@0.2.x, no SMP

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
@pguyot pguyot force-pushed the w17/wasi-platform branch from 924812d to 8f94426 Compare May 10, 2026 08:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants