Skip to content

gh-150490: Raise PyType_Modified for insertion into split dictionary (take 2)#152914

Open
DinoV wants to merge 2 commits into
python:mainfrom
DinoV:sharedkeys_type2
Open

gh-150490: Raise PyType_Modified for insertion into split dictionary (take 2)#152914
DinoV wants to merge 2 commits into
python:mainfrom
DinoV:sharedkeys_type2

Conversation

@DinoV

@DinoV DinoV commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Take 2 of #150489 - what's changed is that we now acquire the type lock in insert_split_key before we acquire the keys lock. We never do any work below the dictionary keys lock so we maintain an invariant that we don't have inverted lock acquisition leading to a deadlock. Also adds @nascheme's test case. ./python -m test --tsan-parallel --parallel-threads=4 -j4 -W` is no longer hanging.

When we insert into a split dictionary we update the shared keys version - this is used to invalidate caches for loading methods and loading class values and requires us to check the keys version. Instead we can raise PyType_Modified which lets us rely on the type version check + has inline values check instead of validating that we have the correct keys version. This gets rid of loading the cached keys version, the objects type (although likely the compiler eliminates this already), loading the cached keys from the type, and then loading the keys version and comparing it against the cached value.

@DinoV DinoV force-pushed the sharedkeys_type2 branch from 43d2f71 to e256c6b Compare July 2, 2026 19:47
@nascheme

nascheme commented Jul 2, 2026

Copy link
Copy Markdown
Member

Nice solution! This looks good, with the following minor comments:

  • I think the acquire of types mutex in insert_split_key() should have a comment beside it, explaining the lock ordering that is being relied on. It might not be obvious to people modifying that code in the future.
  • The _PyDict_SplitKeysInvalidated() function doesn't make it clear that it requires that the types mutex is held. I think this could just be a static function, it's only called from dictobject.c
  • Minor point: by switching to calling _PyType_Modified_Unlocked(), we lose the locking skip when tp_version_tag == 0. I don't know if that actually matters, probably not.

@nascheme

nascheme commented Jul 2, 2026

Copy link
Copy Markdown
Member

Claude Fable has some comments on the PR:

Additional findings, new to this commit:

  • Lost fast-path: PyType_Modified() had a lock-free quick-exit when tp_version_tag == 0. Now types.mutex is acquired unconditionally whenever the lock-free lookup misses - including when no insert happens (dk_usable == 0, or a racing thread already inserted) and for types that were never specialized. All first-time attribute inserts interpreter-wide serialize on one global mutex. It's one-time per key per class, so likely fine, but worth a scaling sanity check for workloads creating many classes.
  • Test quality: the print() should be removed; the four stopper threads are daemon and never joined, which risks dangling-thread environment-changed failures under regrtest (set stop_all and join them); barrier waits have no timeout, so a writer exception hangs the test instead of failing it.

Fable also suggests that we need to change the read order in specialize_instance_load_attr(). This was a race that existed in the previous version of this optimization as well. The fix only works in this version, since it requires that the type lock is held across both the invalidation and the key publish.

diff --git a/Python/specialize.c b/Python/specialize.c
index 3579c4acea9..899b0147013 100644
--- a/Python/specialize.c
+++ b/Python/specialize.c
@@ -979,13 +979,23 @@ do_specialize_instance_load_attr(PyObject* owner, _Py_CODEUNIT* instr, PyObject*
 static int
 specialize_instance_load_attr(PyObject* owner, _Py_CODEUNIT* instr, PyObject* name)
 {
-    // 0 is not a valid version
-    uint32_t shared_keys_version = 0;
-    bool shadow = instance_has_key(owner, name, &shared_keys_version);
     PyObject *descr = NULL;
     unsigned int tp_version = 0;
     PyTypeObject *type = Py_TYPE(owner);
+    // The type version must be read (and assigned, if currently unset)
+    // *before* checking the shared keys.  insert_split_key() invalidates
+    // the type version and publishes the new key while holding the type
+    // lock, so with this order a concurrent first insert of `name` either
+    // invalidates the version we cache here (the guard deopts), or is
+    // observed by instance_has_key() below (shadow == true, so the
+    // *_WITH_VALUES specializations are not used).  If the keys were read
+    // first, an insert landing between the two reads could give us a
+    // version assigned after the invalidation while `name` looks absent,
+    // and _GUARD_TYPE_VERSION would never deopt for the inserted key.
     DescriptorClassification kind = analyze_descriptor_load(type, name, &descr, &tp_version);
+    // 0 is not a valid version
+    uint32_t shared_keys_version = 0;
+    bool shadow = instance_has_key(owner, name, &shared_keys_version);
     int result = do_specialize_instance_load_attr(owner, instr, name, shadow, shared_keys_version, kind, descr, tp_version);
     Py_XDECREF(descr);
     return result;

The previously dangerous case - a fresh version paired with a stale "key absent" observation - can no longer occur because the keys read is now the later of the two. _Py_Specialize_StoreAttr() already uses the safe order.

@DinoV

DinoV commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Nice solution! This looks good, with the following minor comments:

  • I think the acquire of types mutex in insert_split_key() should have a comment beside it, explaining the lock ordering that is being relied on. It might not be obvious to people modifying that code in the future.
  • The _PyDict_SplitKeysInvalidated() function doesn't make it clear that it requires that the types mutex is held. I think this could just be a static function, it's only called from dictobject.c
  • Minor point: by switching to calling _PyType_Modified_Unlocked(), we lose the locking skip when tp_version_tag == 0. I don't know if that actually matters, probably not.

I just went ahead and inlined _PyDict_SplitKeysInvalidated as it only has the single caller. We do still get a check for tp_version_tag when we call _PyType_Modified_Unlocked but obviously that's after we hit the lock. I just don't think there's any avoiding the lock in this case. I think in practice this isn't going to be a big deal - most types are probably going to be initialized during startup rather than in parallel.

@DinoV DinoV marked this pull request as ready for review July 2, 2026 22:34
@DinoV DinoV requested review from markshannon and methane as code owners July 2, 2026 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants