ConcurrencyException SIGNAL_ATTEMPTED_BEFORE_WAIT might lead to application deadlock issue #2609

@iaukhim

Bug Description

The application freezes completely when a ConcurrencyException with error code ConcurrencyException.SIGNAL_ATTEMPTED_BEFORE_WAIT is thrown. Under certain conditions, once this exception propagates without being handled, ConcurrencyManager.releaseReadLock() is never called on CacheKey objects whose numberOfReaders field was previously incremented.

In essence, we end up with CacheKey objects on which acquireDeferredLock() can never complete, because their numberOfReaders is never decremented. Because we set eclipselink.concurrency.manager.allow.concurrency.exception=false, this leads to a permanent deadlock once enough threads try to acquire a deferred lock on an affected object.

Even if the exception is allowed (property set to true), the problem is not resolved because a thrown ConcurrencyException or WaitWasInterruptedException still does not decrement numberOfReaders.
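
To make the mechanism concrete, here is a heavily simplified, hypothetical model. ToyCacheKey and waitForReadersToDrain() are illustrative names, not EclipseLink API. The sketch assumes, as described above, that acquiring a deferred lock waits while numberOfReaders is non-zero. A single skipped releaseReadLock() is then enough to make that wait never finish. The timeout exists only so the demo terminates.

```java
/**
 * Hypothetical, heavily simplified model of a cache key's reader counter.
 * This is NOT EclipseLink's CacheKey or ConcurrencyManager; it only assumes that a
 * deferred lock waits while numberOfReaders is non-zero, as described in this issue.
 */
class ToyCacheKey {
    private int numberOfReaders = 0;

    synchronized void acquireReadLock() {
        numberOfReaders++;
    }

    synchronized void releaseReadLock() {
        if (numberOfReaders == 0) {
            // Loosely analogous to SIGNAL_ATTEMPTED_BEFORE_WAIT: releasing a read lock nobody holds.
            throw new IllegalStateException("release attempted with no readers");
        }
        numberOfReaders--;
        notifyAll(); // wake threads waiting for the readers to drain
    }

    /**
     * Stands in for the wait inside acquireDeferredLock(): waits until no readers remain.
     * Returns false if the timeout elapses or the wait is interrupted.
     */
    synchronized boolean waitForReadersToDrain(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (numberOfReaders > 0) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // with a leaked reader, timing out is the only way out
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyCacheKey key = new ToyCacheKey();
        key.acquireReadLock();
        // The matching releaseReadLock() is skipped, as when an exception escapes the release loop.
        System.out.println(key.waitForReadersToDrain(100)); // false: the leaked reader blocks it
        key.releaseReadLock();
        System.out.println(key.waitForReadersToDrain(100)); // true: no readers remain
    }
}
```

Without the timeout, as with a plain wait() loop, the first call would block forever. That matches the hang observed here when concurrency exceptions are disabled.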

Root Cause and Reproduction
The issue originates in org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(), at line 1119. The effect of a SIGNAL_ATTEMPTED_BEFORE_WAIT thrown from ConcurrencyManager#releaseReadLock depends on whether it is thrown while releasing the last CacheKey in the AbstractSession#objectsLockedForClone collection.

If it is not the last CacheKey, the loop is aborted. The remaining CacheKey objects never have releaseReadLock() called, so their numberOfReaders fields are never decremented. Consequently, any later call to acquireDeferredLock() on those CacheKey objects waits indefinitely.

Problematic Code Block (in UnitOfWorkImpl.cloneAndRegisterObject):

```java
if (identityMapLocked) {
    this.parent.getIdentityMapAccessorInstance().releaseWriteLock();
} else {
    if (rootOfCloneRecursion) {
        if (this.objectsLockedForClone == null) {
            parentCacheKey.releaseReadLock();
        } else {
            for (Iterator iterator = this.objectsLockedForClone.values().iterator(); iterator.hasNext();) {
                // If releaseReadLock() throws here before all read locks are released,
                // the remaining CacheKeys become effectively blocked.
                ((CacheKey)iterator.next()).releaseReadLock();
            }
            this.objectsLockedForClone = null;
        }
        executeDeferredEvents();
    }
}
```
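
The effect of an exception in the middle of this loop can be simulated outside EclipseLink. The sketch below is hypothetical: FakeKey stands in for CacheKey. The failing key is assumed to already have zero readers, which is what SIGNAL_ATTEMPTED_BEFORE_WAIT reports; why that happens is the open question of this issue. In the real code, "last" means last in the iteration order of objectsLockedForClone.values().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of the release loop in cloneAndRegisterObject().
 * FakeKey stands in for CacheKey; none of this is EclipseLink code.
 */
class AbortedReleaseLoop {

    static final class FakeKey {
        int numberOfReaders;

        FakeKey(int numberOfReaders) { this.numberOfReaders = numberOfReaders; }

        void releaseReadLock() {
            if (numberOfReaders == 0) {
                // Stand-in for SIGNAL_ATTEMPTED_BEFORE_WAIT: there is no reader to release.
                throw new IllegalStateException("simulated SIGNAL_ATTEMPTED_BEFORE_WAIT");
            }
            numberOfReaders--;
        }
    }

    /**
     * Builds keyCount read-locked keys, makes the key at failingIndex fail on release,
     * runs the same plain loop as the original code, and returns how many keys still hold a reader.
     */
    static int leakedAfterRelease(int keyCount, int failingIndex) {
        List<FakeKey> keys = new ArrayList<>();
        for (int i = 0; i < keyCount; i++) {
            keys.add(new FakeKey(i == failingIndex ? 0 : 1));
        }
        try {
            for (FakeKey key : keys) {
                key.releaseReadLock(); // an exception here aborts the rest of the loop
            }
        } catch (IllegalStateException e) {
            // EclipseLink propagates the exception; the sketch catches it only to inspect the keys.
        }
        int leaked = 0;
        for (FakeKey key : keys) {
            if (key.numberOfReaders > 0) {
                leaked++;
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println(leakedAfterRelease(5, 4)); // 0: the failing key was the last one
        System.out.println(leakedAfterRelease(5, 1)); // 3: keys 2, 3 and 4 keep a reader forever
    }
}
```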

Observed Stack Traces:

Exception during read lock release:

org.eclipse.persistence.exceptions.ConcurrencyException: 
Exception Description: A signal was attempted before wait() on ConcurrencyManager. This normally means that an attempt was made to 
commit or rollback a transaction before it was started, or to rollback a transaction twice.
	at org.eclipse.persistence.exceptions.ConcurrencyException.signalAttemptedBeforeWait(ConcurrencyException.java:86)
	at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:742)
	at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:483)
	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1119)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildWorkingCopyCloneNormally(ObjectBuilder.java:966)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInUnitOfWork(ObjectBuilder.java:903)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInternal(ObjectBuilder.java:786)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:741)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:719)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.buildObject(ObjectLevelReadQuery.java:861)
	at org.eclipse.persistence.queries.ReadAllQuery.registerResultInUnitOfWork(ReadAllQuery.java:987)
	at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:598)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1232)
	at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:913)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1191)
	at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:485)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1279)
	at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:3029)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1898)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1880)
	at org.eclipse.persistence.mappings.CollectionMapping.executeBatchQuery(CollectionMapping.java:979)
	at org.eclipse.persistence.mappings.ForeignReferenceMapping.extractResultFromBatchQuery(ForeignReferenceMapping.java:650)
	at org.eclipse.persistence.mappings.CollectionMapping.extractResultFromBatchQuery(CollectionMapping.java:960)
	at org.eclipse.persistence.internal.indirection.BatchValueHolder.instantiate(BatchValueHolder.java:60)
	at org.eclipse.persistence.internal.indirection.QueryBasedValueHolder.instantiate(QueryBasedValueHolder.java:122)
	at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:97)
	at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:175)
	at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:238)
	at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:97)
	at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:275)
	at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:458)
	at org.eclipse.persistence.indirection.IndirectList.stream(IndirectList.java:922)

Stack trace from a blocked thread, captured when the application was halted and killed because all threads were blocked:

org.eclipse.persistence.exceptions.ConcurrencyException: 
Exception Description: Wait was interrupted. 
Message: [null]
	at org.eclipse.persistence.exceptions.ConcurrencyException.waitWasInterrupted(ConcurrencyException.java:110)
	at org.eclipse.persistence.internal.helper.ConcurrencyManager.acquireDeferredLock(ConcurrencyManager.java:304)
	at org.eclipse.persistence.internal.identitymaps.CacheKey.acquireDeferredLock(CacheKey.java:218)
	at org.eclipse.persistence.internal.identitymaps.AbstractIdentityMap.acquireDeferredLock(AbstractIdentityMap.java:86)
	at org.eclipse.persistence.internal.identitymaps.IdentityMapManager.acquireDeferredLock(IdentityMapManager.java:153)
	at org.eclipse.persistence.internal.sessions.IdentityMapAccessor.acquireDeferredLock(IdentityMapAccessor.java:83)
	at org.eclipse.persistence.internal.sessions.AbstractSession.retrieveCacheKey(AbstractSession.java:5357)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:1016)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectInternal(ObjectBuilder.java:788)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObject(ObjectBuilder.java:741)
	at org.eclipse.persistence.internal.descriptors.ObjectBuilder.buildObjectsInto(ObjectBuilder.java:1396)
	at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:605)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1232)
	at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:913)
	at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1191)
	at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:485)
	at org.eclipse.persistence.internal.sessions.AbstractSession.internalExecuteQuery(AbstractSession.java:3367)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1898)
	at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1880)
	at org.eclipse.persistence.mappings.CollectionMapping.executeBatchQuery(CollectionMapping.java:979)
	at org.eclipse.persistence.mappings.ForeignReferenceMapping.extractResultFromBatchQuery(ForeignReferenceMapping.java:650)
	at org.eclipse.persistence.mappings.CollectionMapping.extractResultFromBatchQuery(CollectionMapping.java:960)
	at org.eclipse.persistence.internal.indirection.BatchValueHolder.instantiate(BatchValueHolder.java:60)
	at org.eclipse.persistence.internal.indirection.QueryBasedValueHolder.instantiate(QueryBasedValueHolder.java:122)
	at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:97)
	at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:175)
	at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:238)
	at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:97)
	at org.eclipse.persistence.indirection.IndirectMap.buildDelegate(IndirectMap.java:129)
	at org.eclipse.persistence.indirection.IndirectMap.getDelegate(IndirectMap.java:416)
	at org.eclipse.persistence.indirection.IndirectMap$1.<init>(IndirectMap.java:226)
	at org.eclipse.persistence.indirection.IndirectMap.entrySet(IndirectMap.java:225)

Proposed Solution
In my view, this is a bug: all acquired read locks should be released before an exception is propagated.

Below is a rough example (for testing/reference only) of how the finally block could be modified to handle the exception more gracefully and attempt to release every lock:

```java
} else {
    if (rootOfCloneRecursion) {
        if (this.objectsLockedForClone == null) {
            parentCacheKey.releaseReadLock();
        } else {
            ConcurrencyException caughtMainException = null;
            // getReleaseAllLocksProperty() is a proposed opt-in switch, not an existing EclipseLink API.
            boolean releaseAllLocksOnSignalAttemptedBeforeWaitException = getReleaseAllLocksProperty();

            for (Iterator iterator = this.objectsLockedForClone.values().iterator(); iterator.hasNext(); ) {
                try {
                    ((CacheKey) iterator.next()).releaseReadLock();
                } catch (ConcurrencyException e) {
                    // Collect the exception only if:
                    // 1. The property is enabled AND this is a SIGNAL_ATTEMPTED_BEFORE_WAIT exception,
                    //    OR
                    // 2. We are already collecting exceptions (because we've encountered at least one
                    //    SIGNAL_ATTEMPTED_BEFORE_WAIT with the property enabled).
                    if ((releaseAllLocksOnSignalAttemptedBeforeWaitException &&
                            e.getErrorCode() == ConcurrencyException.SIGNAL_ATTEMPTED_BEFORE_WAIT
                    ) || caughtMainException != null) {
                        if (caughtMainException == null) {
                            caughtMainException = e;
                        }
                        // Continue releasing other locks even after catching the exception.
                    } else {
                        // If the exception is not the specific one, or the property is disabled,
                        // revert to the original EclipseLink logic (throw immediately).
                        throw e;
                    }
                }
            }
            this.objectsLockedForClone = null;
            if (caughtMainException != null) {
                throw caughtMainException; // Propagate after cleanup
            }
        }
        executeDeferredEvents();
    }
}
```
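
A variant of the same idea is sketched below as a hypothetical standalone helper. BestEffortRelease and ReadLockHolder are illustrative names, not EclipseLink API. Every holder gets a release attempt and the first failure is rethrown at the end. Later failures are attached with Throwable.addSuppressed() instead of being discarded.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper sketching a best-effort release: every lock gets a release attempt,
 * the first failure is rethrown afterwards, and later failures are attached as suppressed.
 */
final class BestEffortRelease {

    /** Stand-in for anything with a releaseReadLock() method (for example, a CacheKey). */
    interface ReadLockHolder {
        void releaseReadLock();
    }

    static void releaseAll(Iterable<? extends ReadLockHolder> holders) {
        RuntimeException first = null;
        for (ReadLockHolder holder : holders) {
            try {
                holder.releaseReadLock();
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;              // remember the first failure, keep releasing
                } else {
                    first.addSuppressed(e); // keep later failures visible for diagnostics
                }
            }
        }
        if (first != null) {
            throw first;                    // propagate only after every release was attempted
        }
    }

    public static void main(String[] args) {
        int[] released = {0};
        List<ReadLockHolder> holders = Arrays.asList(
                () -> released[0]++,
                () -> { throw new IllegalStateException("first failure"); },
                () -> released[0]++,
                () -> { throw new IllegalStateException("second failure"); },
                () -> released[0]++);
        try {
            releaseAll(holders);
        } catch (IllegalStateException e) {
            // The three healthy holders were still released before the first failure surfaced.
            System.out.println(released[0] + " released; rethrown: " + e.getMessage()
                    + "; suppressed: " + e.getSuppressed().length);
        }
    }

    private BestEffortRelease() {}
}
```

Unlike the proposal above, this sketch does not filter by error code or depend on a property. It always attempts every release. The proposal instead keeps the original fail-fast behavior for any other ConcurrencyException.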

Clarification Request
If this is not considered a bug and is intended behavior, could you please advise on existing EclipseLink features or best practices to ensure that read locks are always correctly released, even in exceptional scenarios?

Steps to Reproduce
Reproduction requires a highly concurrent environment: the issue was reproduced by calling endpoints that work with the affected entities from several hundred threads simultaneously.

The L2 cache must be enabled.
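
A minimal load harness for this kind of reproduction could look like the sketch below. This is an assumption about how to drive the load, not the harness actually used. ConcurrentCaller and runConcurrently() are hypothetical names, and the Runnable stands in for a call to an endpoint that loads the affected entities. A start latch releases all workers at once to maximize contention. A result of -1 means the workers did not finish in time, which is the hang this issue describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stress harness: starts threadCount calls at the same instant. */
final class ConcurrentCaller {

    /** Returns the number of calls that threw, or -1 if not all calls finished within the timeout. */
    static int runConcurrently(int threadCount, Runnable task, long timeoutSeconds) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        CountDownLatch startGate = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < threadCount; i++) {
            pool.execute(() -> {
                try {
                    startGate.await();          // every worker blocks here first...
                    task.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (RuntimeException e) {
                    failures.incrementAndGet(); // count failed calls (e.g. a ConcurrencyException)
                }
            });
        }
        startGate.countDown();                  // ...then all of them are released at once
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow();             // interrupt stuck workers; a hang is the symptom sought
                return -1;
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return -1;
        }
        return failures.get();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Hypothetical stand-in for an HTTP call to an endpoint that reads the affected entities.
        Runnable callEndpoint = calls::incrementAndGet;
        int result = runConcurrently(300, callEndpoint, 30);
        System.out.println("failures=" + result + ", calls=" + calls.get());
    }

    private ConcurrentCaller() {}
}
```

Interrupting stuck workers via shutdownNow() mirrors what happens when a hung application is killed. It would surface as the "Wait was interrupted" trace shown in the stack traces.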

Environment:

EclipseLink version: 2.7.15

Java/JDK version: OpenJDK 17

Entity Characteristics: Several problematic entities are involved, some with @ManyToMany relationships and others with @ManyToOne relationships.

Expected Behavior
In the event of an exception during the release of acquired read locks, the system should attempt to release all acquired read locks before propagating the exception. At the very least, a best-effort cleanup should be performed to prevent leaving the cache in a permanently locked state.