Friday 15 January 2010

Locating the instances of Python classes

There is no straightforward way to obtain the instances of a Python class. But it is functionality that is occasionally useful. As I often have to write code to track down instances, and I have seen other broken solutions elsewhere, I'm going to post my code here in case it is of use to others.

Python itself does not track the instances of a class directly, but it does track them indirectly within its garbage collection and reference counting system. So, provided you know what to look for, you can make use of the garbage collection module to track down class instances.

Tracking down direct instances of a class is relatively straightforward.

>>> import gc, types
>>> class SomeClass: pass
...
>>> instance = SomeClass()
>>> referrers = [
...     v
...     for v
...     in gc.get_referrers(SomeClass)
...     if type(v) is types.InstanceType and
...        isinstance(v, SomeClass)
... ]
>>> referrers[0] is instance
True
Of course, the class you are interested in can be inherited by other classes, and instances of those inheriting classes do not have direct references to the class of interest. But classes which inherit other classes have an indirect reference to those other classes through their __bases__ attribute. So if the class of interest is referred to by a tuple, which is in turn referred to by a class, then that class inherits the class of interest.

Tracking down instances related through inheritance is a little less straightforward.
    import gc
    import types

    def ReferringClass(bases):
        # Find the class which owns the given __bases__ tuple.
        matches = [
            v
            for v
            in gc.get_referrers(bases)
            if type(v) is types.ClassType or
               type(v) is types.TypeType
        ]
        if len(matches) == 1:
            return matches[0]

    def FindInstances(class_):
        instances = {}
        for v in gc.get_referrers(class_):
            if (type(v) is types.InstanceType and
                    isinstance(v, class_)):
                # A direct instance of the class.
                if class_ not in instances:
                    instances[class_] = []
                instances[class_].append(v)
            elif type(v) is tuple and class_ in v:
                # Possibly the __bases__ tuple of an inheriting class.
                rclass_ = ReferringClass(v)
                if rclass_ is not None:
                    instances.update(FindInstances(rclass_))
        return instances
A call to FindInstances will return a dictionary of the located instances. The class of interest, and each class that inherits it at some level, will have an entry mapped to a list of the instances of the given class.

The following demonstrates FindInstances.
>>> class ClassOfInterest:
...     pass
...
>>> class SomeOtherClass:
...     pass
...
>>> class SingleInheritingClass(ClassOfInterest):
...     pass
...
>>> class MultiInheritingClass(SomeOtherClass,
...                            ClassOfInterest):
...     pass
...
>>> class IndirectClass(MultiInheritingClass):
...     pass
...
>>>
>>> coi = ClassOfInterest()
>>> soc = SomeOtherClass()
>>> mic = MultiInheritingClass()
>>> ic = IndirectClass()
>>>
>>> d = FindInstances(ClassOfInterest)
>>> for class_, instances in d.iteritems():
...     print class_.__name__, len(instances)
...
IndirectClass 1
MultiInheritingClass 1
ClassOfInterest 1
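
The code above depends on Python 2's old-style classes and types.InstanceType, neither of which exists in Python 3. As a rough sketch only, and not the approach described above, a Python 3 version could scan every object the garbage collector tracks and group the matches by their exact type. The name find_instances and the demonstration classes are hypothetical, and objects the collector does not track will not be found.

```python
import gc


def find_instances(class_):
    """Group live, gc-tracked instances of class_ and its subclasses by exact type.

    Hypothetical Python 3 sketch: it scans gc.get_objects() instead of
    following referrers, so it only sees objects the collector tracks.
    """
    found = {}
    for obj in gc.get_objects():
        # Use type(obj) rather than isinstance() to avoid triggering
        # custom __class__ lookups on proxy-like objects.
        if issubclass(type(obj), class_):
            found.setdefault(type(obj), []).append(obj)
    return found


class ClassOfInterest:
    pass


class SomeOtherClass:
    pass


class InheritingClass(SomeOtherClass, ClassOfInterest):
    pass


coi = ClassOfInterest()
soc = SomeOtherClass()
inh = InheritingClass()

result = find_instances(ClassOfInterest)
for class_, instances in result.items():
    print(class_.__name__, len(instances))
```

Scanning every tracked object is slower than following referrers, but it does not depend on how a given interpreter version links instances to their classes.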

Wednesday 13 January 2010

Sorrows mudlib: An event notification/subscription model, part 3

Previous post: Sorrows mudlib: An event notification/subscription model, part 2

I've settled on three methods of registering for events using my module. My events module will only directly support the first two, and while the third is my preferred approach, it requires the legwork to be done externally.

  1. Manual registration of object instances.
  2. Manual registration of direct callbacks.
  3. Automatic registration of object instances.
All of the following examples assume the presence of a pre-existing instance of EventHandler, named events. Remember that the events an instance is registered for are intended to be determined by the names of the functions within the instance's class; specifically, functions named with the event_ prefix.
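
To make the naming convention concrete, here is a minimal sketch of how a class might be scanned for functions with the event_ prefix. The helper find_event_functions is hypothetical and is not taken from events.py; the real module may well do this differently.

```python
def find_event_functions(class_, prefix="event_"):
    """Return (event_name, function_name) pairs for prefixed callables on class_.

    Hypothetical helper, for illustration only.
    """
    matches = []
    # dir() returns names in sorted order and includes inherited attributes.
    for function_name in dir(class_):
        if not function_name.startswith(prefix):
            continue
        if not callable(getattr(class_, function_name)):
            continue
        matches.append((function_name[len(prefix):], function_name))
    return matches


class SomeObject:
    def event_SomeEvent(self):
        pass

    def event_OtherEvent(self):
        pass

    def Helper(self):
        pass


pairs = find_event_functions(SomeObject)
print(pairs)
```

Because dir() is used, event functions inherited from base classes are picked up as well, which may or may not be what is wanted.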

Manual registration of object instances

The most likely use of this approach is an instance registering itself by a call within its __init__ function.
    class SomeObject:
        def __init__(self):
            events.Register(self)
This approach is problematic, however, if you are using a code reloading system, as I am. Adding a call to Register within __init__ in a change to a relevant class will leave existing instances unregistered for any events you have added. Removing a call to Register within __init__ in a change to a relevant class will leave existing instances registered for any events they were previously registered for.

Manual registration of direct callbacks

Often there is a need to dynamically register a callback for events, which is what this approach allows.
    class SomeObject:
        def Callback(self):
            pass

    instance = SomeObject()
    events.SomeEvent.Register(instance.Callback)
It also suffers from the same code reloading problems as the previous approach.

Automatic registration of object instances

As mentioned at the top of this post, this approach has to be driven by external legwork. In my case, this will be my code reloading system. All a developer will have to do is give any function within a class that is intended to be notified of an event a name with the event_ prefix.
    class SomeObject:
        def event_SomeEvent(self):
            pass
The external legwork would look something like the following.
    def OnClassChanged(class_, instances):
        oldEvents = getattr(class_, "__EVENTS__", set())

        # Each match is an (eventName, functionName) pair.
        matches = events.ProcessClass(class_)
        if len(matches):
            ## Handle instances yet to be created.
            old_init = class_.__init__
            def new_init(self, *args, **kwargs):
                events.Register(self)
                old_init(self, *args, **kwargs)
            class_.__init__ = new_init

        ## Handle existing instances.
        newEvents = set([ t[0] for t in matches ])

        # Work out which registrations to remove and which to add.
        removedEvents = oldEvents - newEvents
        addedEvents = newEvents - oldEvents
        matchesLookup = dict(matches)
        for instance in instances:
            for eventName in removedEvents:
                events._Unregister(eventName, instance)
            for eventName in addedEvents:
                functionName = matchesLookup[eventName]
                events._Register(eventName, instance, functionName)

        class_.__EVENTS__ = newEvents
The code reloading system would be required to provide a callback notifying that a class had been changed, and also a list of the instances of the class that are in existence. This is information that the code reloading system would most likely have.

Conclusion

The event model is now ready for use in my MUD framework. I still need to decide how to make it available for use, whether through a built-in EventHandler instance or otherwise. And I still need to link it up to the code reloading system.

Source code: events.py

Monday 11 January 2010

Sorrows mudlib: An event notification/subscription model, part 2

Previous post: Sorrows mudlib: An event notification/subscription model, part 1

At this time, my preferred approach is to identify event subscriptions by how functions are named within a given object.

    def event_OnServicesStarted(self):
However, when I think about functions simply receiving an event when it happens, using a simple decorator has some appeal.
    @event
    def OnServicesStarted(self):
But I am not sure how extensive my event model is going to get. Do I want to allow a subscription to specify when it gets received, or how it gets received?

When events might get received

For some events, it is useful to know at which stage of a broadcast you are being notified: before the event has had side-effects through its subscribers, when it is appropriate to act on it and cause side-effects, or after all direct side-effects have happened.

A simple way to structure this is to allow subscription for one of three different partial broadcasts. The first, a pre-broadcast. The second, the actual broadcast. And the third, a post-broadcast.

Using the decorator syntax, subscription might happen in the following way.
    @event.pre
    def OnServicesStarted(self): pass

    @event
    def OnServicesStarted(self): pass

    @event.post
    def OnServicesStarted(self): pass
Each decorated function will have the same name, and will overwrite any previous attribute on the class, specifically the event function before it. This means that this specific approach using decorators will not allow an object to register for more than one stage in the broadcast of a given event.
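
The overwriting can be demonstrated with a small sketch. The event object below is a hypothetical stand-in that does nothing more than tag each function with the stage it was decorated for; it is not the decorator from any real module.

```python
class _EventDecorator:
    """Hypothetical stand-in for @event; it only tags the decorated function."""

    def __call__(self, func):
        func.event_stage = "broadcast"
        return func

    def pre(self, func):
        func.event_stage = "pre"
        return func

    def post(self, func):
        func.event_stage = "post"
        return func


event = _EventDecorator()


class Subscriber:
    @event.pre
    def OnServicesStarted(self): pass

    @event
    def OnServicesStarted(self): pass

    @event.post
    def OnServicesStarted(self): pass


# Collect the stage of every tagged function left in the class namespace.
stages = [
    value.event_stage
    for value in vars(Subscriber).values()
    if hasattr(value, "event_stage")
]
print(stages)
```

Only the last definition survives in the class namespace, so only the post stage would ever be registered.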

Using the name-based syntax, subscription might happen in the following way.
    def event_OnServicesStarted_pre(self): pass

    def event_OnServicesStarted(self): pass

    def event_OnServicesStarted_post(self): pass
This is a little more versatile than the decorator approach, as it allows an object to register for every stage in the broadcast of an event. Of course, to address this the function naming could be brought partially into the decorator approach, where the use of the _pre or _post suffix would have the same effect.
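
As a sketch of how the name-based syntax might be interpreted, the following splits a function name into an event name and a broadcast stage. The function parse_event_function_name and the stage label "broadcast" are hypothetical names chosen for illustration.

```python
# Suffixes that select a stage other than the actual broadcast.
STAGE_SUFFIXES = ("pre", "post")


def parse_event_function_name(function_name, prefix="event_"):
    """Split a function name into (event_name, stage), or return None.

    Hypothetical helper: names without the prefix are not event functions.
    """
    if not function_name.startswith(prefix):
        return None
    event_name = function_name[len(prefix):]
    for stage in STAGE_SUFFIXES:
        suffix = "_" + stage
        if event_name.endswith(suffix):
            return event_name[:-len(suffix)], stage
    # No recognised suffix, so this is the actual broadcast.
    return event_name, "broadcast"


for name in ("event_OnServicesStarted_pre",
             "event_OnServicesStarted",
             "event_OnServicesStarted_post"):
    print(name, parse_event_function_name(name))
```

One consequence of this scheme is that an event whose own name ends in _pre or _post could not be distinguished from a stage suffix.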

How events might get received

Using Stackless Python, my MUD framework cooperatively schedules the microthreads it is built upon. Cooperative scheduling is a lot simpler to program than preemptive scheduling, because you know where your code is going to block. You do not always want your code to block when you want to send an event, although at other times it might be acceptable.

So the broadcaster needs to be in control of how events get sent. It needs to be able to specify whether a broadcast can block or not.

It might be done through a global function.
    event("OnServicesStarted")
Or avoiding the clunky string used for an event name.
    event.OnServicesStarted()
It is reasonable to assume this is how blocking event broadcasts are made, given that the natural expectation for a function call is for it to do processing and return. So, that leaves the question of how to do non-blocking event broadcasts.

One possibility is a special attribute that injects differing behaviour.
    event.noblock.OnServicesStarted()
Here, the noblock attribute would simply return a secondary event object that starts event broadcasts in a new microthread.

Does this affect subscribers? Not unless a subscriber has the ability to interfere with or interrupt a broadcast, which is not a desired part of this model. Event subscription does not need to know about, or cater for, this. It is a broadcast-related detail.
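
A minimal sketch of the broadcast side might look like the following. It runs on standard Python, so a threading.Thread stands in for a Stackless microthread, and the class EventBroadcaster and its subscribe method are hypothetical names used only for illustration.

```python
import threading


class EventBroadcaster:
    """Hypothetical sketch: attribute access yields a broadcast function."""

    def __init__(self, subscribers=None, blocking=True):
        # The subscriber table is shared with the noblock variant.
        self._subscribers = {} if subscribers is None else subscribers
        self._blocking = blocking

    def subscribe(self, event_name, callback):
        self._subscribers.setdefault(event_name, []).append(callback)

    @property
    def noblock(self):
        # A secondary object that broadcasts without blocking the caller.
        return EventBroadcaster(self._subscribers, blocking=False)

    def __getattr__(self, event_name):
        # Only reached for names not found normally, i.e. event names.
        if event_name.startswith("_"):
            raise AttributeError(event_name)

        def broadcast(*args, **kwargs):
            callbacks = list(self._subscribers.get(event_name, ()))

            def deliver():
                for callback in callbacks:
                    callback(*args, **kwargs)

            if self._blocking:
                deliver()
                return None
            # Assumption: a thread stands in for a new microthread here.
            worker = threading.Thread(target=deliver)
            worker.start()
            return worker

        return broadcast


event = EventBroadcaster()
received = []
event.subscribe("OnServicesStarted", lambda: received.append("started"))

event.OnServicesStarted()                    # Blocks until delivered.
worker = event.noblock.OnServicesStarted()   # Returns immediately.
worker.join()
print(received)
```

The subscriber callback is identical in both cases, which matches the point that blocking is purely a decision for the broadcaster.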

To decorate, or not

Not. There is no compelling reason, beyond appreciation of how it looks, to choose to use decorators. The function naming approach is fine for now, and does not cost an extra line of source code.

When are objects registered

It is all very well for objects to specify functions named in such a way that it is recognised they should be called when events happen. However, the object still needs to be registered for those events. What is the best way to go about this? How is the use of code reloading affected by this?

It is possible, within the code reloading system, to set things up in such a way that when an object is instantiated, it automatically gets registered for events. This would work by applying event-related post-processing to a class when it is first loaded, or subsequently reloaded. The code reloading system does not currently support this, but it is an easy feature to add to it.

A simpler approach is to have any object which defines events make an explicit call to the event system to register it for any events that it may handle. This would have to happen within the constructor for that object, the __init__ method. The problem is that this method is called when the object is instantiated, and if the registration call is added within a subsequent code change, existing instances of the object will never get registered. It complicates the model a programmer has to hold in their head about how effective code reloading is.
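
The pitfall can be seen in a small simulation. Both stand-ins here are assumptions made for illustration: a plain list plays the part of the event system's registry, and assigning a new __init__ plays the part of a code reload.

```python
# Hypothetical stand-in for the event system's registry.
registered = []


class SomeObject:
    def __init__(self):
        pass  # Original version: no registration call.


# Created before the "reload".
existing = SomeObject()


def reloaded_init(self):
    # Changed version: registration added to the constructor.
    registered.append(self)


# Simulate a code reload replacing the constructor.
SomeObject.__init__ = reloaded_init

# Created after the "reload".
fresh = SomeObject()

print(existing in registered, fresh in registered)
```

The instance created before the change never runs the new constructor, so it stays unregistered until something outside of __init__ registers it.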

Taking care of registration with the aid of the code reloading system looks like the way to go.

Conclusion

I am pretty sure I have gone over what I need out of an event system. I have a preferred approach to how event subscriptions are declared. I have a preferred approach to how event broadcasts are made. At this point, implementation looks like the next step.

Next post: Sorrows mudlib: An event notification/subscription model, part 3
Source code: events.py