Do we want to have weakref support?
Here's another thing to consider regarding completeness of the language implementation: do we want to support weak references (the weakref module)?
On one hand, it's a niche feature that is rarely used. On the other hand, it would help a lot with low-memory usage: it gives the ability to cache an object in memory, but safely and transparently free it when there's memory pressure.
But an efficient implementation of weakrefs is expensive: each object needs to grow to contain a pointer to its associated weakref. Given that weakrefs are rare, a less runtime-efficient but much more memory-efficient implementation can be used instead, one that keeps a separate mapping from each object to its weakref.
Thoughts/comments are welcome.
py: add weakref module with `ref` and `finalize` classes
Summary
This adds support for the standard weakref module, to make weak references to Python objects and have callbacks for when an object is reclaimed by the GC.
This feature was requested by PyScript, to allow control over the lifetime of external proxy objects (distinct from JS<->Python proxies).
Addresses issue #646 (that's a nearly 12-year-old issue!).
Functionality added here:
- `weakref.ref(object [, callback])`: create a simple weak reference with an optional callback to be called when the object is reclaimed by the GC
- `weakref.finalize(object, callback, /, *args, **kwargs)`: create a finalize object that holds a weak reference to an object and allows more convenient callback usage and state change
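As a rough illustration of the two entry points, here is a minimal sketch written against the CPython `weakref` API. The `Resource` class, the callback names and the `events` list are hypothetical, and `finalize.alive` is the attribute CPython documents; whether this port exposes it is an assumption. On a tracing GC the object is only reclaimed once a collection runs and no root (including a stale stack slot) still points at it, so the explicit `gc.collect()` is there for portability.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an external proxy object."""

    def __init__(self, name):
        self.name = name


events = []


def on_reclaimed(ref):
    # A ref callback receives the (now dead) weak reference object.
    events.append("ref callback")


def cleanup(name, reason="collected"):
    # A finalize callback receives the extra args/kwargs given at creation.
    events.append((name, reason))


obj = Resource("proxy")
r = weakref.ref(obj, on_reclaimed)
f = weakref.finalize(obj, cleanup, "proxy", reason="collected")

assert r() is obj  # the referent is still alive, so the ref resolves
assert f.alive     # CPython attribute; assumed here, not confirmed for this port

del obj       # drop the only strong reference
gc.collect()  # needed on a tracing GC; harmless on CPython

print(r())     # None once the object has been reclaimed
print(events)  # both callbacks have run; their relative order is not relied on
```

Compared with `ref`, `finalize` keeps the callback arguments together with the weak reference, so the callback does not have to recover its state from the dead reference.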
Docs and tests are added. The module is enabled on the unix coverage build and on the webassembly pyscript variant.
Implementation details
I tried to make this as efficient as possible, by adding another bit-per-block to the garbage collector, the WTB (weak table). Similar to the finalizer bit (FTB), if a GC block has its corresponding WTB bit set then a weak reference to that block is held. The details of that weak reference are stored in a global map, mp_weakref_map, which maps weak reference to ref/finalize objects, allowing the callbacks to be efficiently found when the object is reclaimed.
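The bit-plus-map idea can be modelled in a few lines of Python. This is a toy sketch only, not the C code in this PR: `ToyHeap`, its fields and the choice to key the map by block index are all assumptions made for illustration. It shows why the sweep stays cheap: the per-block test is a single bit, and the map is consulted only for blocks that have that bit set.

```python
class ToyHeap:
    """Toy model: one 'weak' bit per block plus a side map for the details."""

    def __init__(self, num_blocks):
        self.live = [False] * num_blocks       # stands in for allocation state
        self.weak_bits = [False] * num_blocks  # stands in for the WTB
        self.weak_map = {}                     # block index -> list of callbacks

    def alloc(self):
        # Take the first free block (raises ValueError if the heap is full).
        block = self.live.index(False)
        self.live[block] = True
        return block

    def add_weakref(self, block, callback):
        # Setting the bit is what makes the sweep look this block up later.
        self.weak_bits[block] = True
        self.weak_map.setdefault(block, []).append(callback)

    def sweep(self, reachable):
        pending = []
        for block, is_live in enumerate(self.live):
            if not is_live or block in reachable:
                continue
            self.live[block] = False
            if self.weak_bits[block]:  # cheap per-block test
                self.weak_bits[block] = False
                pending.extend(self.weak_map.pop(block))
        # Run callbacks only after the sweep has finished freeing blocks.
        for callback in pending:
            callback()
        return len(pending)


heap = ToyHeap(8)
a = heap.alloc()
b = heap.alloc()
fired = []
heap.add_weakref(a, lambda: fired.append("a reclaimed"))
heap.sweep(reachable={b})  # block a is unreachable, block b survives
print(fired)
```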
With this feature enabled the overhead is:
- 1/128th of the available memory is used for the new WTB table (eg a 128k heap now needs an extra 1k for the WTB).
- Code size is increased.
- At garbage collection time, there is a small overhead to check if the collected objects had weak references. This check is the same as the existing FTB finaliser scan, so shouldn't add much overhead. If there are weak reference objects alive (ref/finalize objects) then additional time is taken to call the callbacks and do some accounting to clean up the used weak reference.
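The 1/128 figure in the first item can be checked with a little arithmetic. The sketch below assumes 16-byte GC blocks, which is what that ratio implies (one bit per 128-bit block); the helper name `wtb_bytes` is hypothetical, and the result is approximate because a real heap also shares its space with the other GC tables.

```python
def wtb_bytes(heap_bytes, block_bytes=16):
    """Bytes needed for a table holding one bit per GC block."""
    blocks = heap_bytes // block_bytes
    return (blocks + 7) // 8  # round up to whole bytes


# A 128 KiB heap of 16-byte blocks has 8192 blocks, so 8192 bits = 1024 bytes.
print(wtb_bytes(128 * 1024))  # 1024
```

With larger blocks the table shrinks proportionally, for example `wtb_bytes(128 * 1024, block_bytes=32)` gives 512.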
Testing
Tests were added which should cover all new code, on both the unix and webassembly ports (webassembly behaves differently with the GC so needs separate tests).
Trade-offs and Alternatives
This is only enabled at the "everything" level, so should not affect any other port or build.
I tried to keep the added complexity in the GC to a minimum, and I think I achieved that given the requirements for weak references. Most of the complexity is in the new py/modweakref.c module.
The ref and finalize objects are very similar behind the scenes, so I combined their implementation to reduce code size.