Bug summary
In PeerConnectionWrapper, the offer/offeree role is decided by a lexicographic
string comparison between the remote session ID (received over the signaling
channel) and the local session ID. In HPB (external signaling) mode these two
IDs are not in the same namespace, which makes the comparison meaningless and
can cause both peers to compute "I am NOT the offerer" simultaneously. No offer
is ever sent, no answer is ever produced, and the call hangs silently.
Steps to reproduce
- Configure a Nextcloud instance with the standalone signaling server (HPB).
- Start a P2P call (no MCU) between two Talk Android clients.
- Repeat until both peers happen to compute the same direction for
compareTo
(depends on the random session IDs generated by NC and HPB on each join).
The deadlock is intermittent because session IDs are random — but the
underlying logic guarantees a non-zero failure rate per call.
Expected behaviour
For every successful peer pairing, exactly one side becomes the offerer and
sends a WebRTC offer; the other side answers; media flows.
Actual behaviour
In HPB mode, both peers can simultaneously decide they are the offeree. No
offer is sent. The call stays in "connecting" state until the user gives up.
Versions
- Talk Android app:
master — bug present in this code path since commit
b98bb93e4
(2017-11-29). All released versions are affected.
- Nextcloud server: 33.0.3 (also affects all earlier versions where HPB is
supported).
- Talk (spreed) server app: 24.0.0-dev.3.
- Standalone signaling server (HPB): v2.1.1.
- Custom Signaling server configured: Yes.
- Custom TURN server configured: Yes.
- Custom STUN server configured: Yes.
Code references
PeerConnectionWrapper.java decides offerer/offeree at construction:
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/PeerConnectionWrapper.java#L119
boolean hasInitiated = sessionId.compareTo(localSession) < 0;
The localSession argument comes from CallActivity.createPeerConnectionWrapper:
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/activities/CallActivity.kt#L2390
return PeerConnectionWrapper(
...
sessionId,
callSession, // ← Nextcloud session ID, even in HPB mode
...
)
callSession is set from ApplicationWideCurrentRoomHolder.session
(NC session ID, ~16 char hex token):
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/activities/CallActivity.kt#L1534
The remote sessionId parameter, however, is the HPB session ID in HPB
mode. It is set from the WebSocket envelope senderWebSocketMessage.sessionId:
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/WebSocketInstance.kt#L184
ncSignalingMessage.from = callWebSocketMessage.senderWebSocketMessage!!.sessionId
The HPB session ID is a 58–70 char base64url string set in the
hello response:
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/WebSocketInstance.kt#L324
Why the comparison can deadlock
In standard internal signaling mode, both sessionId and localSession are
NC session IDs (same namespace, same length, same character set), so
compareTo defines a total order — exactly one side computes < 0.
In HPB mode, sessionId is the HPB session ID (base64url, 58–70 chars) and
localSession is still the NC session ID (~16 char hex). The two strings live
in different namespaces with different alphabets and lengths. Lexicographic
ordering between them carries no semantic meaning. There exist pairs of IDs
where:
- Peer A computes
remote_HPB.compareTo(local_NC_A) >= 0 ("A is not initiator")
- Peer B computes
remote_HPB.compareTo(local_NC_B) >= 0 ("B is not initiator")
Both peers wait for an offer that never arrives. The fix processParticipantList
in #6087 prevents the call from being torn down on the partial-update path,
but it does not unblock the offer/answer deadlock — that is a separate symptom
of the same underlying contract violation.
Suggested fix
Pass the HPB session ID (i.e. webSocketClient.sessionId) instead of
callSession to PeerConnectionWrapper when external signaling is enabled.
Both sides then compare in the same namespace.
A more defensive approach would be to rename the parameter to
localSignalingSessionId and document that it MUST be in the same namespace
as the remote sessionId.
Related
Bug summary
In
PeerConnectionWrapper, the offer/offeree role is decided by a lexicographicstring comparison between the remote session ID (received over the signaling
channel) and the local session ID. In HPB (external signaling) mode these two
IDs are not in the same namespace, which makes the comparison meaningless and
can cause both peers to compute "I am NOT the offerer" simultaneously. No offer
is ever sent, no answer is ever produced, and the call hangs silently.
Steps to reproduce
compareTo(depends on the random session IDs generated by NC and HPB on each join).
The deadlock is intermittent because session IDs are random — but the
underlying logic guarantees a non-zero failure rate per call.
Expected behaviour
For every successful peer pairing, exactly one side becomes the offerer and
sends a WebRTC offer; the other side answers; media flows.
Actual behaviour
In HPB mode, both peers can simultaneously decide they are the offeree. No
offer is sent. The call stays in "connecting" state until the user gives up.
Versions
master— bug present in this code path since commitb98bb93e4(2017-11-29). All released versions are affected.
supported).
Code references
PeerConnectionWrapper.javadecides offerer/offeree at construction:https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/PeerConnectionWrapper.java#L119
The
localSessionargument comes fromCallActivity.createPeerConnectionWrapper:https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/activities/CallActivity.kt#L2390
callSessionis set fromApplicationWideCurrentRoomHolder.session(NC session ID, ~16 char hex token):
https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/activities/CallActivity.kt#L1534
The remote
sessionIdparameter, however, is the HPB session ID in HPBmode. It is set from the WebSocket envelope
senderWebSocketMessage.sessionId:https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/WebSocketInstance.kt#L184
The HPB session ID is a 58–70 char base64url string set in the
helloresponse:https://github.com/nextcloud/talk-android/blob/master/app/src/main/java/com/nextcloud/talk/webrtc/WebSocketInstance.kt#L324
Why the comparison can deadlock
In standard internal signaling mode, both
sessionIdandlocalSessionareNC session IDs (same namespace, same length, same character set), so
compareTodefines a total order — exactly one side computes< 0.In HPB mode,
sessionIdis the HPB session ID (base64url, 58–70 chars) andlocalSessionis still the NC session ID (~16 char hex). The two strings livein different namespaces with different alphabets and lengths. Lexicographic
ordering between them carries no semantic meaning. There exist pairs of IDs
where:
remote_HPB.compareTo(local_NC_A) >= 0("A is not initiator")remote_HPB.compareTo(local_NC_B) >= 0("B is not initiator")Both peers wait for an offer that never arrives. The fix
processParticipantListin #6087 prevents the call from being torn down on the partial-update path,
but it does not unblock the offer/answer deadlock — that is a separate symptom
of the same underlying contract violation.
Suggested fix
Pass the HPB session ID (i.e.
webSocketClient.sessionId) instead ofcallSessiontoPeerConnectionWrapperwhen external signaling is enabled.Both sides then compare in the same namespace.
A more defensive approach would be to rename the parameter to
localSignalingSessionIdand document that it MUST be in the same namespaceas the remote
sessionId.Related
cause (peer not publishing media on the receiving side).