
Android: add command mechanism so that SDLActivity #14962

Draft
1bsyl wants to merge 8 commits into libsdl-org:main from
1bsyl:br_android_RPC_v2


@1bsyl
Contributor

@1bsyl 1bsyl commented Feb 3, 2026

Add RPC commands so that SDLActivity defers most internal SDL code execution to the SDL main thread.
See #14925.

Most SDLActivity code is deferred to the main SDL C thread through RPC commands,
except:

  • lifecycle events (which remain in the main SDL C thread).
  • the code that counts activity recreations, because it is SDLActivity-only.
  • initialization code (nativeSetupJNI / nativeQuit), because it runs before the C thread exists.
  • code that doesn't return void (nativeGetHint()), because the value is needed immediately.
  • nativeInitMainThread() / nativeInitMainThread(), which already run in the C thread.

The (SDL_Window *)Android_Window global variable is removed.
The Android ActivityMutex is removed.
Android_LifecycleMutex is used to lock the SDLActivity state (Android_WaitActiveAndLockActivity()):
since it prevents SDLActivity from adding new lifecycle messages, it prevents it from changing state.
EGLSurface/native_window handling is factored out (used in surfaceChanged/Created/Destroyed, and in SDL Create/DestroyWindow()).
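The deferral described above can be pictured as a FIFO of typed command structs: the Java/SDLActivity thread only enqueues, and the SDL main thread drains the queue (the stack trace further down names that step Android_PumpRPC). The sketch below is a hypothetical illustration using POSIX threads, not the PR's code; RpcCommand, RpcQueue, rpc_push() and rpc_pop() are invented names, and the real command set is larger.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command kinds; the PR defines its own set. */
typedef enum { RPC_SURFACE_CHANGED, RPC_RESUME, RPC_PAUSE } RpcKind;

/* One named payload struct per command, so handlers read typed fields. */
typedef struct { int width, height; } RpcSurfaceChanged;

typedef struct {
    RpcKind kind;
    union {
        RpcSurfaceChanged surface_changed;
    } u;
} RpcCommand;

#define RPC_CAPACITY 64

typedef struct {
    pthread_mutex_t lock;
    RpcCommand items[RPC_CAPACITY];
    size_t head, count;   /* ring buffer: oldest item at head */
} RpcQueue;

static void rpc_init(RpcQueue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->count = 0;
}

/* Java (SDLActivity) thread: copy the command in and return immediately. */
static bool rpc_push(RpcQueue *q, const RpcCommand *cmd) {
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count < RPC_CAPACITY) {
        q->items[(q->head + q->count) % RPC_CAPACITY] = *cmd;
        q->count++;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* SDL main thread: take the oldest command, FIFO order. The lock is held
   only for the copy, so handlers run without blocking the Java thread. */
static bool rpc_pop(RpcQueue *q, RpcCommand *out) {
    bool ok = false;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % RPC_CAPACITY;
        q->count--;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```

On the main thread, a pump would loop `while (rpc_pop(&queue, &cmd)) { switch (cmd.kind) { ... } }`. Because each command kind has its own named payload struct, a handler reading `cmd.u.surface_changed.width` cannot silently mix up argument order the way a flat list of ints could.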

It seems to work quite nicely on my side. I tested various things, but not exhaustively.
(onNativeFileDialog() was the hardest to update, and I really have to test it.)
The other functions are all simple. I think the way it's written, with named structs, prevents a lot of type mistakes.

Still a draft; please give it a try.
I'll also try it in production in a while.

Maybe 2 TODOs:

  • I didn't add the HID native functions; I am not familiar with them. But they don't rely on SDL and use no locking,
    so maybe it makes no difference (they could be added afterwards anyway).

  • We probably lose some event timestamp precision because events are sent from the main thread, not the SDLActivity thread. Maybe it's not a big deal,
    but if we need it for some command, we could add it (just enable the timestamp, and send it with the event).
    Maybe we wait and see if this is really useful.
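If timestamp precision turns out to matter, one option is to stamp the command on the SDLActivity thread when it is created and carry that value through the queue, rather than stamping the event when the main thread dispatches it. This is a hypothetical sketch: RpcKeyCommand, now_ns() and queue_latency_ns() are invented names, and it uses CLOCK_MONOTONIC, which would need converting to SDL's own tick base before being copied into an SDL event's timestamp.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical command carrying the time the Java thread received the input. */
typedef struct {
    int key;
    uint64_t timestamp_ns;   /* captured on the SDLActivity thread */
} RpcKeyCommand;

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Java side: stamp at creation, before the command is queued. */
static RpcKeyCommand make_key_command(int key) {
    RpcKeyCommand cmd = { key, now_ns() };
    return cmd;
}

/* Main-thread side: how long the command sat in the queue before dispatch.
   This delay is the precision lost if the event is stamped at dispatch time. */
static uint64_t queue_latency_ns(const RpcKeyCommand *cmd) {
    return now_ns() - cmd->timestamp_ns;
}
```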

@1bsyl
Contributor Author

1bsyl commented Feb 3, 2026

Still need to add the comment: "We still need to pump events for lifecycle activity even if there's no window or video hasn't been initialized."

  • done

@slouken slouken added this to the 3.6.0 milestone Feb 3, 2026
@slouken
Collaborator

slouken commented Feb 3, 2026

I'm marking this as draft until you have a chance to test it more thoroughly in production.

@AntTheAlchemist, any chance you can try this patch in a production release and provide feedback?

@slouken slouken marked this pull request as draft February 3, 2026 15:19
@slouken
Collaborator

slouken commented Feb 3, 2026

FWIW, right now there is only one window, but conceptually the pump call would get all life cycle events and events for all windows.

@1bsyl 1bsyl force-pushed the br_android_RPC_v2 branch from 769a432 to de582b3 February 4, 2026 11:13
@1bsyl
Contributor Author

1bsyl commented Feb 5, 2026

I remember doing a patch a while ago to have multiple SDL_Windows on Android; there were several SurfaceViews and several Activities involved. I don't remember how it worked, though, just that it didn't provide all of SDL's functionality. Maybe I could dust it off and give it a try.

Btw, I am trying the patch in production. I need one or two weeks to get more information.

@1bsyl
Contributor Author

1bsyl commented Feb 12, 2026

OK, so far so good. It's been 1 week, and 138k users.
No user complaints (I've collapsed the comment tab).

Production console says:
User-perceived crash rate 0.01%
User-perceived ANR rate 0.08%

Still a few crashes in EGL libs and ANRs in nativePollOnce, but maybe not SDL-related.

Screenshot From 2026-02-12 21-26-33

@MoNTE48

MoNTE48 commented Feb 15, 2026

MultiCraft Open Source is on the line 😀
https://play.google.com/store/apps/details?id=com.multicraft.game
We decided to try this PR on the beta version of our game and will let you know if anything goes wrong.
We use Sentry to track crashes, by the way.

@1bsyl
Contributor Author

1bsyl commented Feb 16, 2026

@MoNTE48 nice, thanks for sharing. Let us know!

BTW: the "compare vs ALL Release" metric is not really relevant, because my SDL lib was a modified version.
Now, the SDL in production is more or less vanilla, with:
forced opengles2, OpenSL ES audio, no HID, and the SDL_ANDROID_GAMEPAD_AS_RPC flag.

@MoNTE48

MoNTE48 commented Feb 17, 2026

After 30K sessions, I can see that the ANR issue has been resolved. I can clearly see new ANRs with AppMetrica or the advertising library, but libSDL is no longer the cause.

I didn't enable the GAMEPAD_AS_RPC build flag, but I'm not sure how many of my players use gamepads.

Just one new, rare crash:

SIGSEGV: Segfault
  libandroid          0xaffaa894   ANativeWindow_setBuffersGeometry
  libMultiCraft       0x86b7726e   SDL_EGL_CreateSurface (SDL_egl.c:1287)
  libMultiCraft       0x86b7726e   Android_nativeSurfaceChanged (SDL_androidwindow.c:210)
  libMultiCraft       0x86b4ba52   Android_PumpRPC (SDL_android.c:3414)
  libMultiCraft       0x86b75f28   Android_WaitActiveAndLockActivity (SDL_androidevents.c:274)
  libMultiCraft       0x86b76f92   Android_CreateWindow (SDL_androidwindow.c:58)
  libMultiCraft       0x86b34844   SDL_CreateWindowWithProperties (SDL_video.c:2572)
  libMultiCraft       0x86b35614   SDL_CreateWindow (SDL_video.c:2615)
  libMultiCraft       0x8694fe98   irr::CIrrDeviceSDL::createWindowWithContext (CIrrDeviceSDL.cpp:502)
  libMultiCraft       0x869514be   irr::CIrrDeviceSDL::createWindow (CIrrDeviceSDL.cpp:280)
  libMultiCraft       0x869514be   irr::CIrrDeviceSDL::CIrrDeviceSDL (CIrrDeviceSDL.cpp:171)
  libMultiCraft       0x86260804   createDeviceEx (Irrlicht.cpp:103)
  libMultiCraft       0x86260804   RenderingEngine::RenderingEngine (renderingengine.cpp:168)
  libMultiCraft       0x86260804   ClientLauncher::init_engine (clientlauncher.cpp:432)
  libMultiCraft       0x86260804   ClientLauncher::run (clientlauncher.cpp:124)
  libMultiCraft       0x86463b4e   real_main (main.cpp:246)
  libMultiCraft       0x86489c6a   SDL_main (porting_android.cpp:66)
  libMultiCraft       0x86b4905a   SDL_CallMainFunction (SDL_main_callbacks.c:164)
  libMultiCraft       0x86b4905a   SDL_RunApp (SDL_runapp.c:37)
  libMultiCraft       0x86b4905a   Java_org_libsdl_app_SDLActivity_nativeRunMain (SDL_android.c:1062)
  base.odex           0x939a4b95   oatexec

@1bsyl
Contributor Author

1bsyl commented Feb 17, 2026

OK, thanks for the info! Do you have information on what your crash rate or ANR rate is now?

I also have the ANativeWindow_setBuffersGeometry crash in SDL_CreateWindow,
but this is something we've already seen before; it doesn't seem to be new.

I also have some rare crashes in:
SDL_EGL_SwapBuffers
Android_nativeSurfaceDestroyed
GLES2_RunCommandQueue (glClear)
but they were also there before.

I believe it's possible that some ANRs become crashes, or the reverse, or show up under a different name.
So the overall ANR + crash rate has to be considered.

(Of course, if this is reproducible, even by adding SDL_Delay() in the code, it has to be debugged...)

@1bsyl
Contributor Author

1bsyl commented Feb 18, 2026

Looking at the AOSP code, ANativeWindow_setBuffersGeometry calls native_window_set_buffers_format, which doesn't check the handle's validity.
Maybe a check would be good anyway.

int32_t ANativeWindow_setBuffersGeometry(ANativeWindow* window,
        int32_t width, int32_t height, int32_t format) {
    int32_t err = native_window_set_buffers_format(window, format);
    ....
}

static inline int native_window_set_buffers_format(
        struct ANativeWindow* window,
        int format)
{
    return window->perform(window, NATIVE_WINDOW_SET_BUFFERS_FORMAT, format);
}
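The AOSP excerpt above dereferences `window` unconditionally, so passing a NULL handle crashes inside libandroid, which matches the SIGSEGV in ANativeWindow_setBuffersGeometry reported earlier. Below is a minimal sketch of the guard, assuming a mock struct in place of the NDK's opaque ANativeWindow so it compiles off-device; MockNativeWindow and the helper names are hypothetical. Note that a NULL check cannot detect a window that was already released (a dangling pointer); that case still depends on the lifecycle synchronization this PR adds.

```c
#include <stddef.h>

/* Stand-in for the NDK's opaque ANativeWindow, for illustration only.
   On device the real call is ANativeWindow_setBuffersGeometry() from
   <android/native_window.h>. */
typedef struct MockNativeWindow {
    int (*perform)(struct MockNativeWindow *w, int op, int arg);
    int format;
} MockNativeWindow;

static int mock_perform(MockNativeWindow *w, int op, int arg) {
    (void)op;        /* a single hypothetical operation: set format */
    w->format = arg;
    return 0;
}

/* Same shape as the AOSP helper: no validity check, crashes on NULL. */
static int set_buffers_format_unchecked(MockNativeWindow *window, int format) {
    return window->perform(window, 0, format);
}

/* Guarded wrapper: refuse to call through a NULL handle so the caller
   can skip EGL surface creation instead of crashing. */
static int set_buffers_format_checked(MockNativeWindow *window, int format) {
    if (window == NULL) {
        return -1;
    }
    return set_buffers_format_unchecked(window, format);
}
```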

@AntTheAlchemist
Contributor

I'm late to the party, sorry guys. I'm now taking a proper look at this and will report back with results from a production release.

Currently, my gamepad heavy app stands at 0.02% crash rate and 0.18% ANR rate.

Current common crashes (without this pull):
android.view.View.onResolvePointerIcon (probably AdMob)
android.database.sqlite.SQLiteConnection.nativeOpen (I don't even use SQL!)
GLES2_RunCommandQueue
SDL_EGL_SwapBuffers
Android_SetWindowFullscreen (I think this is fixable. Happens during a quit and the app is in the background; will create a new issue)

Current common ANRs (without this pull):
SDL_DispatchEventWatchList
org.libsdl.app.HIDDeviceUSB.close
android.os.MessageQueue.nativePollOnce
org.libsdl.app.HIDDeviceUSB.open
chromium-TrichromeWebViewGoogle.aab-stable-755910930 - org.chromium.components.policy.CombinedPolicyProvider.b (AdMob for sure)
Java_org_libsdl_app_SDLActivity_onNativeSurfaceDestroyed

Watch this space.

@1bsyl
Contributor Author

1bsyl commented Feb 19, 2026

nice!
I guess the Android_SetWindowFullscreen crash and the SDL_DispatchEventWatchList ANR should disappear.
The others are also there for me, and are probably unrelated to SDL.

@MoNTE48

MoNTE48 commented Feb 24, 2026

@1bsyl what about Android_GLES_SwapWindow crash, introduced with this PR?

backtrace:
  #00  pc 0x0000000000012564  /vendor/lib/egl/libGLES_mali.so (__egl_platform_color_conversion_needed+4)
  #01  pc 0x0000000000012178  /vendor/lib/egl/libGLES_mali.so (__egl_platform_surface_post_processing_needed_android+12)
  #02  pc 0x000000000006afc8  /vendor/lib/egl/libGLES_mali.so (__egl_mali_post_color_buffer+268)
  #03  pc 0x000000000006ad50  /vendor/lib/egl/libGLES_mali.so (__egl_mali_post_to_window_surface+104)
  #04  pc 0x000000000006a054  /vendor/lib/egl/libGLES_mali.so (_egl_swap_buffers+392)
  #05  pc 0x0000000000068108  /vendor/lib/egl/libGLES_mali.so (eglSwapBuffers+72)
  #06  pc 0x000000000000ca45  /system/lib/libEGL.so (eglSwapBuffersWithDamageKHR+236)
  #07  pc 0x0000000000c8ea75  /data/app/com.multicraft.game-_b6TlGVZWfBmsZ1umVPPJA==/lib/arm/libMultiCraft.so (Android_GLES_SwapWindow+1242) (BuildId: 022c00bc78f244a538bbdc2b32f09fe9083f3ebb)
  #08  pc 0x0000000000b746e7  /data/app/com.multicraft.game-_b6TlGVZWfBmsZ1umVPPJA==/lib/arm/libMultiCraft.so (irr::video::COGLES1Driver::endScene()+290) (BuildId: 022c00bc78f244a538bbdc2b32f09fe9083f3ebb)
  #

@1bsyl 1bsyl force-pushed the br_android_RPC_v2 branch from 36daee4 to 4359ae4 February 26, 2026 20:12
@1bsyl
Contributor Author

1bsyl commented Feb 26, 2026

@MoNTE48
not sure this is really introduced by the PR :/ I've seen it before.

I'm adding this anyway:
4359ae4
so that the onNativeSurface Created/Changed/Destroyed methods of SDLActivity don't return too quickly:
onNativeSurfaceCreated will wait at most 100 ms for the C thread to create the EGL surface, and the same for Changed/Destroyed.
I am curious to see if that helps.
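The "wait at most 100 ms" handshake can be sketched with a POSIX semaphore: the C thread posts once it has created (or changed, or destroyed) the EGL surface, and the Java-facing native method waits with a deadline. This is an assumption-laden sketch, not the commit's code; wait_for_c_thread() is a hypothetical name, and inside SDL the equivalent would more likely be SDL3's SDL_WaitSemaphoreTimeout(sem, 100).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Wait up to timeout_ms for the C thread to post `done`.
   Returns true if it was posted in time, false on timeout. */
static bool wait_for_c_thread(sem_t *done, long timeout_ms) {
    struct timespec deadline;
    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    for (;;) {
        if (sem_timedwait(done, &deadline) == 0) {
            return true;           /* C thread finished the surface work */
        }
        if (errno != EINTR) {
            return false;          /* ETIMEDOUT (or another error): give up */
        }
        /* interrupted by a signal: retry with the same deadline */
    }
}
```

A timeout only bounds how long SDLActivity blocks. If the C thread is not running yet, or is blocked on pause, every wait simply expires.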

@1bsyl
Contributor Author

1bsyl commented Feb 26, 2026

Screenshot From 2026-02-26 21-17-14

So this is 3 weeks.
95 crashes / 383 ANRs.
It seems to say: same number of crashes, fewer ANRs...
No bad comments.

This doesn't take into account:

Android: check native_window is valid before calling ANativeWindow_setBuffersGeometry

Android: add semaphores so that nativeSurface methods can wait up to

Going to try another test with the two previous commits,
and I will also rebase onto the latest head so that it includes another Android fix (a35bcad).

@1bsyl 1bsyl force-pushed the br_android_RPC_v2 branch from 4359ae4 to 224c331 February 26, 2026 20:26
@1bsyl
Contributor Author

1bsyl commented Mar 6, 2026

So it's 1 week after starting the 2nd test.

In addition, it includes:

  • Android: check native_window is valid before calling ANativeWindow_setBuffersGeometry 0da4d25
  • Android: add semaphores so that nativeSurface methods 224c331
  • (another Android fix picked up by the rebase onto main:
    Android: prevent SDLActivity and Main Thread to access mJoystick
    a35bcad)

The install base is the same as in the 1st test: ~130k users.

So far, the number of crashes is reduced: 15.
More ANRs maybe? 86 vs 63.
I see no crash in SDL_EGL_CreateSurface (ANativeWindow_setBuffersGeometry)
nor in Android_GLES_SwapWindow().

Screenshot From 2026-03-06 15-52-46

edit: unlike the 1st test, I rolled out the 2nd test directly to 100% (whereas for the 1st one I waited 1 day before doing that).

@AntTheAlchemist
Contributor

This patch doesn't work for my apps. It seems to block all input events, which only come through when the app enters background. The 2nd time I leave the app and return, I get a black screen. Am I missing something?

@AntTheAlchemist
Contributor

SDL_AppEvent isn't being called for input events. Is anyone else using the SDL_App* stuff?

@1bsyl
Contributor Author

1bsyl commented Mar 7, 2026

@AntTheAlchemist I hadn't tried SDL_AppEvent, but it's OK now with the last commit. It also takes block-on-pause into account (the very last commit, kept separate).

But there are more changes inside: all lifecycle events are moved into the RPC command flow instead of being a separate channel.

This is going to be a new test in a few weeks, once v2 is finished.
But the reasons I wanted to try this are:

  • pause/resume are acknowledged by the C thread: SDLActivity doesn't change state without knowing that the C thread has processed the event.
  • pause/resume are inserted correctly within the flow, especially relative to the NativeSurface commands that create/destroy the EGL surface.
  • before: handling background very quickly should be fine, but if resume is handled too quickly, I guess it can happen before the EGL surface is created and cause issues.
  • the block_on_pause code is clearer: it waits for the Java resume semaphore in a separate function, used only in PollEvent() and SDL_AppEvent().
  • Fore/Background events must still be handled as WatchEvent.
  • also, the overall code is clearer.
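The first bullet (SDLActivity does not change state until the C thread has processed the event) amounts to an acknowledgement handshake. Here is a hypothetical sketch with a mutex and condition variable: the Java thread records a lifecycle command, and before changing Activity state it waits until the C thread's "acknowledged" counter reaches that sequence number. LifecycleAck and the ack_* functions are invented names, and in the PR the acknowledgement is presumably tied to the RPC queue itself rather than a separate struct.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    unsigned long posted;        /* lifecycle commands sent by the Java thread */
    unsigned long acknowledged;  /* lifecycle commands processed by the C thread */
    bool paused;                 /* state as applied by the C thread */
} LifecycleAck;

static void ack_init(LifecycleAck *a) {
    pthread_mutex_init(&a->lock, NULL);
    pthread_cond_init(&a->cond, NULL);
    a->posted = a->acknowledged = 0;
    a->paused = false;
}

/* Java thread: register a new lifecycle command, get its sequence number. */
static unsigned long ack_post(LifecycleAck *a) {
    pthread_mutex_lock(&a->lock);
    unsigned long seq = ++a->posted;
    pthread_mutex_unlock(&a->lock);
    return seq;
}

/* C thread: apply the pause/resume, then wake any waiting Java thread. */
static void ack_process(LifecycleAck *a, bool pause) {
    pthread_mutex_lock(&a->lock);
    a->paused = pause;
    a->acknowledged++;
    pthread_cond_broadcast(&a->cond);
    pthread_mutex_unlock(&a->lock);
}

/* Java thread: block until command `seq` has been processed. */
static void ack_wait(LifecycleAck *a, unsigned long seq) {
    pthread_mutex_lock(&a->lock);
    while (a->acknowledged < seq) {
        pthread_cond_wait(&a->cond, &a->lock);
    }
    pthread_mutex_unlock(&a->lock);
}

/* Stand-in for the SDL main thread handling one queued PAUSE command. */
static void *demo_c_thread_pause(void *arg) {
    ack_process((LifecycleAck *)arg, true);
    return NULL;
}
```

The trade-off is the one raised around the surface semaphores: an unbounded wait like ack_wait() only works if the C thread is guaranteed to keep pumping. Otherwise the Java thread stalls, which is exactly what surfaces as an ANR.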

NB:
SDL_WakeUp() is not clear to me; it's currently a no-op.

@AntTheAlchemist
Copy link
Contributor

AntTheAlchemist commented Mar 8, 2026

Thanks for that. It seems to work, but I've found a problem. After an SDL_WINDOW_RESTORED, the first frame drawn produces a black screen. This used to be worked around with a pre-emptive duplicated call to SDL_RenderPresent(), but that no longer works. My app doesn't redraw on every SDL_AppIterate. It sets a redraw and refresh flag when an SDL_WINDOW_RESTORED event is received, and then, in the following SDL_AppIterate call, none of the draw calls take effect. No other events are sent, so the app is just a black screen. There's no way to fix this unless my app redraws on every SDL_AppIterate.

@1bsyl
Contributor Author

1bsyl commented Mar 9, 2026

@AntTheAlchemist
.... just testing with testsprite (modified, see below !). and that seems to work when drawing only once, as you describe.

to be clear:

  • SDL_EVENT_WINDOW_RESTORED occurs when it comes from background (not at start).
  • I draw only once. and I've got a frame:
diff --git a/test/testsprite.c b/test/testsprite.c
index 575be1163..010a8b49c 100644
--- a/test/testsprite.c
+++ b/test/testsprite.c
@@ -79,6 +79,7 @@ static bool LoadSprite(const char *file)
     return true;
 }
 
+static int g_draw = 0; // or can start at 1 also.
 static void MoveSprites(SDL_Renderer *renderer, SDL_Texture *sprite)
 {
     int i;
@@ -86,6 +87,10 @@ static void MoveSprites(SDL_Renderer *renderer, SDL_Texture *sprite)
     SDL_FRect temp;
     SDL_FRect *position, *velocity;
 
+    if (g_draw) {
+        SDL_Log("will draw...");
+        g_draw = 0;
+    } else { return; }
     /* Query the sizes */
     SDL_SetRenderViewport(renderer, NULL);
     SDL_GetRenderSafeArea(renderer, &viewport);
@@ -565,6 +570,10 @@ SDL_AppResult SDL_AppEvent(void *appstate, SDL_Event *event)
     if (event->type == SDL_EVENT_RENDER_DEVICE_RESET) {
         LoadSprite(icon);
     }
+    if (event->type == SDL_EVENT_WINDOW_RESTORED) {
+        SDL_Log("SDL_EVENT_WINDOW_RESTORED !!!");
+        g_draw = 1;
+    }
     return SDLTest_CommonEventMainCallbacks(state, event);
 }

Here's a simplified log when going to the foreground:

11:26:04.785 25616 25616 V [                           SDL] onNativeSurfaceCreated
11:26:04.785 25616 25616 I [                   SurfaceView] 170029209 surfaceChanged -- format=4 w=1080 h=2121
11:26:04.785 25616 25616 V [                           SDL] surfaceChanged()
11:26:04.785 25616 25616 V [                           SDL] Window size: 1080x2121
11:26:04.785 25616 25616 V [                           SDL] Device size: 1080x2340
11:26:04.785 25616 25616 V [                           SDL] onNativeSurfaceChanged
11:26:04.785 25616 25616 V [                           SDL] nativeResume()
11:26:04.785 25616 25700 I [                       SDL/APP] BlockOnPause...resume
11:26:04.786 25616 25700 I [                       SDL/APP] pixel format wanted SDL_PIXELFORMAT_RGBA8888 (1), got SDL_PIXELFORMAT_RGBA8888 (1)

11:26:04.786 25616 25700 I [                       SDL/APP] SDL_EVENT_WINDOW_RESTORED !!!
11:26:04.786 25616 25700 I [                       SDL/APP] will draw...

11:26:04.786 25616 25616 V [                           SDL] nativeResume() done

11:26:04.788 25616 25700 I [                       SDL/APP] 825634.23 frames per second
11:26:04.789 25616 25616 I [                   SurfaceView] 170029209 surfaceRedrawNeeded
11:26:04.789 25616 25616 I [                   SurfaceView] 170029209 finishedDrawing
11:26:04.789 25616 25616 V [                   SurfaceView] Layout: x=0 y=84 w=1080 h=2121, frame=Rect(0, 0 - 1080, 2121)
11:26:04.790 25616 25657 D [                   SurfaceView] 170029209 updateSurfacePosition RenderWorker, frameNr = 1, position = [0, 84, 1080, 2205]
                                                            surfaceSize = 1080x2121
11:26:04.810 25616 25616 V [                           SDL] onWindowFocusChanged(): true
11:26:04.810 25616 25700 V [                           SDL] nativeFocusChanged()
------------------- more than 2 seconds ------------------------

@AntTheAlchemist
Contributor

Your testsprite mod should fail, even without this pull. There's a known bug: #11324

@1bsyl
Contributor Author

1bsyl commented Mar 9, 2026

@AntTheAlchemist, I am answering in the other ticket then ..

@AntTheAlchemist
Contributor

Were you testing using opengles2? Vulkan wouldn't have this issue.

@AntTheAlchemist
Contributor

@1bsyl is there a way I can merge this PR with the latest SDL so I can download the entire source as a ZIP? This is currently 50 commits behind.

@1bsyl
Contributor Author

1bsyl commented Mar 11, 2026

@AntTheAlchemist I am going to do a rebase.

1bsyl added 8 commits March 11, 2026 18:44
executes no SDL code, but defers it to SDL main thread
SDLActivity hasFocus
otherwise, this error may appear:
DequeueBuffer: dequeueBuffer failed
eglSwapBuffersWithDamageKHRImpl:1411 error 300d (EGL_BAD_SURFACE)
and Android_nativeSurface* to Android_NativeSurface*
100 ms for the C Thread to perform the corresponding action
…nt() broken after resize

Call ANativeWindow_setBuffersGeometry when size changes.

in addition: remove the semaphores for NativeSurfaceChanged/Created/Destroyed,
they were not active anyway:
- at start, the timeout expires because the app hasn't started yet.
- in background, it would also expire because the app is blocked on pause.
@1bsyl 1bsyl force-pushed the br_android_RPC_v2 branch from dbdf818 to ca98027 March 11, 2026 17:45
@1bsyl
Contributor Author

1bsyl commented Mar 11, 2026

@AntTheAlchemist it should be sync'ed with latest

@MoNTE48

MoNTE48 commented Mar 11, 2026

Perhaps some of the changes could be in a separate PR, which would be reviewed and merged faster than so many changes?

In any case, thank you for your work!

@1bsyl
Contributor Author

1bsyl commented Mar 11, 2026

@MoNTE48 thanks!
I think the PR is quite inseparable...

except for the 11324 fix, which could be separate (see SDL_androidwindow.c if you need it), but that's not tested with HEAD.

(I am still running my tests for a few more days.)

@1bsyl
Contributor Author

1bsyl commented Mar 19, 2026

This is 3 weeks, and the 2nd test has finished.

124 crashes and 501 ANRs (vs 95 and 383 for the 1st test).

This seems higher in absolute numbers, but the overall rates are similar:
still 0.02% for crashes, and 0.12% for ANRs, which is lower/better (vs 0.02% and 0.13% for the 1st test).

Maybe the app gets more use?
Also, there has always been some ongoing variability (users, time spent, maybe the kinds of ads...).

If I look at the crash details, the specific SIGSEGVs seen in the 1st test are gone
(there are still some SIGSEGVs, but only 5 out of 100).
There are lots of java.lang.OutOfMemoryError, "Input dispatching timed out", and so on.

No bad user comments.

Screenshot From 2026-03-19 10-32-13

I am trying a 3rd test (03/19) that includes the latest commits:

  • Move CycleEvent within the RPC commands flow
  • Android: use BLOCK_ON_PAUSE with SDL_AppEvent
  • Fixed bug 11324 - Android opengles2: First SDL_RenderPresent() broken after resize


Labels

None yet

Projects

None yet


4 participants