Thursday, 26 February 2009

Implementing code reloading, part 2

This post continues on from Implementing code reloading, part 1.

The biggest limitation when it comes to code reloading is running code.

  • Running code has local variables and external changes to these cannot be made.
  • Running code has bytecode which is in mid-execution and which cannot be replaced until that execution has finished.
Some illustration of these problems will be given below.

Local variables

This will create a function which when run as a Stackless tasklet, blocks in mid-execution on a Stackless channel.
>>> import stackless
>>> c = stackless.channel()
>>> def f():
... a = 1
... b = {}
... c.receive()
Next the tasklet is created and run, leaving it blocked on the channel.
>>> t = stackless.tasklet(f)()
>>> t.run()
The function in mid-execution has a frame encapsulating its running state, like the local variables within that executing function. These variables can be accessed as a dictionary. Sort of.
>>> dir(t)
[..., 'frame', ...]
>>> dir(t.frame)
[..., 'f_locals', ...]
>>> t.frame.f_locals
{'a': 1, 'b': {}}
>>> t.frame.f_locals['a'] = 2
>>> t.frame.f_locals
{'a': 1, 'b': {}}
What this demonstrates is that there is no real local variable dictionary. The one obtained above is actually constructed on request.

The repercussions of this are that code in mid-execution cannot be satisfactorily updated to reflect code reloading changes. That code has to be able to continue executing without error using whatever objects it is holding onto, whatever the approach taken to code reloading is.

Methods

A method is a reference to a function, whether it is obtained from a class or an instance. In the following case, a method is being obtained from a class.
>>> class X:
... def f(self):
... pass
...
>>> X.f
<unbound method X.f>
Methods have attributes which refer to the class (im_class) and function involved (im_func), and possibly also an instance (im_self). Bound methods are taken from instances and have values for all three attributes, whereas unbound methods are taken from classes and do not have an instance. All of these attributes are read only and cannot be changed.
>>> dir(f)
[..., 'im_class', 'im_func', 'im_self']
>>> f.im_class
<class __main__.X at 0x01A1F720>
>>> f.im_class = X
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: readonly attribute
So methods stored as local variables of frames are effectively unchangeable references to the classes and functions they were taken from, both of which can be updated by code reloading. The existing references held in the methods stay in use until the code the frame is associated with, exits.

If the approach taken to code reloading relies on old versions of updated classes not being used after the update, then this is problematic. Whatever approach taken to code reloading, the stale function reference may continue to be used, and programmers need to be aware of that as well.

Wednesday, 25 February 2009

Implementing code reloading, part 1

This post continues on from A custom namespacing system for Python, part 2.

Code reloading

There are many situations where it is useful to be able to update the code an application is using while it is running. Some of the most common are when:

  • A service is being provided by the application and interruption to that service for its users is undesirable.
  • Starting up the application takes an excessive amount of time and modifying its behaviour without restarting it can allow for more productive development.
  • The state which the application maintains is built up after it was started, and the ability to modify behaviour of the application without restarting it allows interaction with the accumulated state.
The obvious way to implement the process of code reloading is to take new versions of objects from a changed file and replace the existing versions which were taken from the file before it changed. This approach will be explored first.

Code reloading through overwriting

With the custom namespacing system previously created, a script file can be loaded and its contents placed into a given namespace.

First a script needs to be loaded.
scriptFile = LoadScript("scripts\example.py", "game")
Next the script is run, placing its contents into the given namespace.
RunScript(scriptFile)
This can easily be hooked up to file change events.
def OnFileChanged(filePath):
scriptFile = LoadScript("scripts\example.py", "game")
RunScript(scriptFile)
A simple overwriting code reloading system can be implemented by repeating this process every time a file changes, which is exactly what would happen if there were actually a file change event hooked up in the manner shown. In response to a file change, the previously created objects from before the change are overwritten by the newly created versions from the freshly changed file.

Overwriting functions

If each time a function is called it is accessed via the namespace, the newest version of the function will be the one which is called. This makes sense, as this is the whole point of the overwriting process which happens on a reload. If a programmer wants to keep calling the same version of the function and not start calling new versions as they get put in place, they need to keep their own reference to the function to call, avoiding use of the namespace.

Overwriting classes

Let's try a few things. To start with, assume that "example.py" provides a base class called BaseObject.
class BaseObject:
def PrintClassName(self):
print self.__class__.__name__ +" (original)"
If there a class is defined, it is likely that at some point that class will be instantiated.
import game
baseObjectBeforeReload = game.BaseObject()
At some point "example.py" is changed, specifically the BaseObject class, triggering a reload. This is the new version of the class:
class BaseObject:
def PrintClassName(self):
print self.__class__.__name__ +" (reloaded)"
Now the namespace entry game.BaseObject will be a new class with the old version overwritten. And since the previously created instance baseObjectBeforeReload was created from the old version of the class, it will still be using that old version of the class.

Executing this code:
baseObjectAfterReload = game.BaseObject()
print baseObjectAfterReload.__class__ is not baseObjectBeforeReload.__class__
baseObjectBeforeReload.PrintClassName()
baseObjectAfterReload.PrintClassName()
Gives this output:
False
"BaseObject (original)"
"BaseObject (reloaded)"
For certain applications, this might be a desirable result. It won't update the behaviour of existing instances which are in use within the application, but it will give a different behaviour for new instances created from the newer classes. If a programmer still wants to create instances from older versions of the class, they can do so by taking a reference to the class from the namespace before a reload has happened and the processing of a changed file has overwritten it.

Limitations to naive class overwriting

However, things get more complicated when inheritance is involved. For the purposes of illustration, here are two classes, BaseObject and its subclass Object.

In the file "example1.py", BaseObject is defined:
class BaseObject:
def TokenFunction(self):
print "TokenFunction called"
In the file "example2.py", Object inherits from BaseObject:
import game

class Object(game.BaseObject):
def TokenFunction(self):
game.BaseObject.TokenFunction(self)
There are two things to note about the Object class:
  • Object.__class__ is a direct reference to the whatever game.BaseObject was when the class was loaded.
  • game.BaseObject in TokenFunction is obtained when the function is executed, this means it is not guaranteed to be the same class as Object.__class__.
When these two classes differ, calling TokenFunction will cause the following error:
TypeError: unbound method TokenFunction() must be called with
BaseObject instance as first argument (got Object instance instead)
This is the primary limitation in the overwriting approach. If this approach is used, then a subclass can never refer to its base class through a namespace reference. It needs to do so through a reference to a version of the base class which won't change. self.__class__.__bases__[0] fits the bill, but using it each time is too unwieldy, and it is easier to store a reference to the version of the class used.
import game
BaseObject = game.BaseObject

class Object(BaseObject):
def TokenFunction(self):
BaseObject.TokenFunction(self)
If a programmer can live with this limitation and does not find the need to keep track of the older references too cumbersome, this form of naive overwriting might be more than sufficient. But to be realistic, this requirement is too impractical to work with.

Overwriting classes "intelligently"

By extending the reloading process it may be possible to remove the need to keep track of those older references. When a class is updated, all the subclasses which inherit the old version need to be updated to be compatible with the new version. The base classes of a subclass can be found in the __bases__ attribute of the subclass. If the reference to the old version of a reloaded class in this attribute is replaced with a reference to the new version, then the above error does not occur.

This can be done in the following manner:
import gc, types
for ob1 in gc.get_referrers(oldBaseObjectClass):
# Class '__bases__' references are stored in a tuple.
if type(ob1) is tuple:
for ob2 in gc.get_referrers(ob1):
if type(ob2) in (types.ClassType, types.TypeType):
if ob2.__bases__ is ob1:
__bases__ = list(ob2.__bases__)
idx = __bases__.index(oldBaseObjectClass)
__bases__[idx] = game.BaseObject
ob2.__bases__ = tuple(__bases__)
But the opposite case is now a problem. If existing references to the old version of the class are stored directly before the reload and used after it, the error can still occur.

This approach can be further extended to cover additional cases, like replacing references to the old class in global dictionaries, but there are references which cannot be replaced. This still leaves the burden on the programmer having to know what cases they need to avoid, in order for this code reloading solution to be workable.

Unit tests illustrating the overwriting approach and general limitations in code reloading approaches can be found here.

What next?

It is useful to know the limitations to code reloading, in order to determine how they apply to a given approach, and how they affect the behaviour of any reloaded code whatever the approach taken.

This post is followed by Implementing code reloading, part 2.

Edited: 2009/02/25, 6:00PM. Rewrote the last couple of paragraphs and linked to the next post.