Skip to content

fix: add deepcopy to not clone rm and cond_rm#423

Open
jazken wants to merge 3 commits intoapache:masterfrom
jazken:fix/deepcopy_assertion
Open

fix: add deepcopy to not clone rm and cond_rm#423
jazken wants to merge 3 commits intoapache:masterfrom
jazken:fix/deepcopy_assertion

Conversation

@jazken
Copy link

@jazken jazken commented Mar 1, 2026

Background Problem

We currently use pycasbin in our Django backend for checking access control. We scaled up the number of casbin backends and workers and introduced a Redis watcher. It worked perfectly in our development environment but it failed in our production environment. After much debugging, we realised it was partially due to the slower hardware in our production environment which created a race condition.

Why?

Python's default deepcopy traverses the entire object graph recursively. RoleManager (rm) and ConditionalRoleManager (cond_rm) contain bidirectionally linked Role objects where role.roles and role.users reference each other, forming a cyclic graph. Under concurrent load, two failure modes arise:

  1. RecursionError — deepcopy traverses the cyclic Role graph indefinitely.

Python's memo dict should catch this, but the cycle goes through reduce/getstate reconstruction paths that register the memo entry too late, allowing infinite recursion before the cycle is detected.

  1. RuntimeError: dictionary changed size during iteration — deepcopy iterates RoleManager.all_roles to copy it.

If a concurrent thread calls rm.add_link() via build_role_links() at the same moment, Python raises this error because the dict is mutated mid-iteration. This race window is wider on slower hardware where deepcopy takes longer, which is why the failure is non-deterministic and hardware-dependent.

Why excluding rm and cond_rm is safe

Both attributes are always rebuilt immediately after load_policy() completes via build_role_links() and build_conditional_role_links(). The deepcopy in load_policy() is only a snapshot of policy data (rules, tokens, values) — it never needs to carry live role manager state. Setting them to None in the copy is equivalent to the state of a freshly constructed Assertion before any role links have been built.

Transparency

The above is debugged together with the help of Claude, but the traceback and errors are validated by human.

@CLAassistant
Copy link

CLAassistant commented Mar 1, 2026

CLA assistant check
All committers have signed the CLA.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modifies Casbin’s Assertion model copying behavior to avoid deep-copying live role-manager state (rm/cond_rm) during policy reloads, addressing production race conditions observed when using watchers under concurrent load.

Changes:

  • Add a custom Assertion.__deepcopy__ implementation.
  • Exclude rm and cond_rm from deep copy to avoid recursive/cyclic graph traversal and concurrent mutation errors.
  • Add copy import to support the new deepcopy logic.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +120 to +124
for k, v in self.__dict__.items():
if k in ("rm", "cond_rm"):
setattr(result, k, None)
else:
setattr(result, k, copy.deepcopy(v, memo))
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting rm/cond_rm to None during deepcopy changes behavior when auto_build_role_links is disabled (or role links aren’t rebuilt after load_policy()): enforce_ex() builds the g() function from ast.rm/ast.cond_rm, so None makes g() fall back to name1 == name2 and effectively ignores any existing role manager state. Consider preserving the existing rm/cond_rm references (shallow copy / direct assignment) instead of forcing them to None, while still avoiding deep-copying their internal graphs.

Copilot uses AI. Check for mistakes.
Comment on lines +113 to +125
def __deepcopy__(self, memo):
"""Custom deepcopy implementation that excludes rm and cond_rm attributes.
This stems from an issue with the watcher concurrent reloading causing an edge case of deepcopy error.
"""
cls = self.__class__
result = cls.__new__(cls)
memo[id(self)] = result
for k, v in self.__dict__.items():
if k in ("rm", "cond_rm"):
setattr(result, k, None)
else:
setattr(result, k, copy.deepcopy(v, memo))
return result
Copy link

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces custom Assertion.__deepcopy__ behavior that affects load_policy() (which uses copy.deepcopy(self.model)), but there doesn’t appear to be a unit test covering the new deepcopy semantics (e.g., that rm/cond_rm are not deep-copied and that role-link behavior remains correct). Adding a focused test that exercises copy.deepcopy(Assertion) (and/or load_policy() on a model containing a RoleManager/ConditionalRoleManager) would help prevent regressions.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants