Monday, 1 February 2010

Patching through code modification

Previous post: Tracking class instantiations

While exploring how to patch the __init__ of classes loaded by my code reloading framework, so that I can track the creation of instances, I have been considering other approaches.

In the previous post, where a class had an existing __init__ method, I renamed it and had my replacement __init__ call it before registering the freshly created instance. But I can do better: if I modify the bytecode of the existing method, I can inject my registration call directly into it. As an optimisation, this does not gain much in this case. But it is interesting to look into, and this sort of functionality could perhaps be added in a more general way within the code reloading framework.
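
For comparison, here is a minimal sketch of the rename-and-wrap approach, written for Python 3 rather than the Python 2 used in the rest of this post. The names patch_init, registered and _original_init are made up for illustration and are not part of the framework.

```python
registered = []  # hypothetical stand-in for the framework's instance registry


def patch_init(cls):
    """Wrap cls.__init__ so that every new instance is recorded."""
    original_init = cls.__init__
    # Keep the existing method reachable under a new (hypothetical) name.
    cls._original_init = original_init

    def __init__(self, *args, **kwargs):
        # Run the original logic first, then register the fresh instance.
        original_init(self, *args, **kwargs)
        registered.append(self)

    cls.__init__ = __init__


class Test:
    def __init__(self, value):
        self.value = value


patch_init(Test)
t = Test(3)
```

The wrapper costs one extra function call per instantiation, which is the overhead that injecting the call into the existing bytecode would avoid.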

I found three commonly mentioned bytecode manipulating frameworks:

  • bytecodehacks: No longer maintained and out of date for 2.6.
  • BytecodeAssembler: Lots of dependencies and it only allows creation of bytecode, not modification of existing bytecode.
  • byteplay: One file, allows modification of existing code, works out of the box.
byteplay looks like the only suitable candidate that I can just pick up and use.

Code to be modified:
>>> class Test:
...     def __init__(self):
...         if f():
...             print 1
...             return
...         if g():
...             print 2
...             return
...         print 3
...
I want to make my injected call after the logic in the function has been executed, but before it returns. In this function, there are multiple return points.
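
As an aside, the return points of a function can also be listed with the standard library's dis module. This is a rough sketch for Python 3, not the byteplay route taken in this post. Opcode names vary between interpreter versions, so it matches on the RETURN prefix instead of one exact name.

```python
import dis


def f():
    return False


def g():
    return False


class Test:
    def __init__(self):
        if f():
            print(1)
            return
        if g():
            print(2)
            return
        print(3)


# Collect every instruction that leaves the function. The exact opcode
# names depend on the interpreter version, hence the prefix match.
return_points = [
    ins for ins in dis.get_instructions(Test.__init__)
    if ins.opname.startswith("RETURN")
]

for ins in return_points:
    print(ins.offset, ins.opname)
```

How many entries this prints depends on how the compiler of a given Python version lays out the function.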

Passing the code into byteplay:
>>> import byteplay
>>> c = byteplay.Code.from_code(Test.__init__.func_code)
>>> print c.code

3 1 LOAD_GLOBAL f
2 CALL_FUNCTION 0
3 JUMP_IF_FALSE to 13
4 POP_TOP

4 6 LOAD_CONST 1
7 PRINT_ITEM
8 PRINT_NEWLINE

5 10 LOAD_CONST None
11 RETURN_VALUE
>> 13 POP_TOP

6 15 LOAD_GLOBAL g
16 CALL_FUNCTION 0
17 JUMP_IF_FALSE to 27
18 POP_TOP

7 20 LOAD_CONST 2
21 PRINT_ITEM
22 PRINT_NEWLINE

8 24 LOAD_CONST None
25 RETURN_VALUE
>> 27 POP_TOP

9 29 LOAD_CONST 3
30 PRINT_ITEM
31 PRINT_NEWLINE
32 LOAD_CONST None
33 RETURN_VALUE
Basically I want to inject my call before each LOAD_CONST None/RETURN_VALUE pair.

Code to inject:
>>> def f(self):
...     events.Register(self)
Passing the code into byteplay:
>>> c2 = byteplay.Code.from_code(f.func_code)
>>> print c2.code

2 1 LOAD_GLOBAL events
2 LOAD_ATTR Register
3 LOAD_FAST self
4 CALL_FUNCTION 1
5 POP_TOP

6 LOAD_CONST None
7 RETURN_VALUE
Basically I want to take the bytecode entries shown as displayed lines 1 through 7 and insert them in place of each existing pair described above. Something these bytecode listings do not show is that line numbers are also marked up with bytecode entries. So I need to make sure I neither obliterate existing line numbers in the code I am modifying, nor copy over line numbers from the code I am injecting.
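
To make the line number concern concrete, here is a small sketch that uses plain Python lists as a stand-in for byteplay's code lists. Opcodes are ordinary strings, and the "SetLineno" string is only a placeholder for the line number marker entries, so none of this touches byteplay itself.

```python
# Stand-in for the tail of the code being modified: (opcode, argument) pairs.
target = [
    ("SetLineno", 9),
    ("LOAD_CONST", 3),
    ("PRINT_ITEM", None),
    ("PRINT_NEWLINE", None),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]

# Stand-in for the code to inject, which carries its own line number marker.
payload = [
    ("SetLineno", 2),
    ("LOAD_GLOBAL", "events"),
    ("LOAD_ATTR", "Register"),
    ("LOAD_FAST", "self"),
    ("CALL_FUNCTION", 1),
    ("POP_TOP", None),
    ("LOAD_CONST", None),
    ("RETURN_VALUE", None),
]

# Drop the payload's line number markers so they are not copied across.
injected = [entry for entry in payload if entry[0] != "SetLineno"]

# Replace the trailing LOAD_CONST None / RETURN_VALUE pair. The target's own
# marker sits outside the replaced slice, so it is left untouched.
patched = target[:-2] + injected
```

The patched list keeps the single marker for line 9 and still ends with the LOAD_CONST None / RETURN_VALUE pair, now supplied by the payload.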

Injecting the call before the returns:
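# Walk backwards so that splicing does not shift offsets still to be visited.
offset = len(c.code) - 1
lastInstruction = None
while offset >= 0:
    instruction, value = c.code[offset]
    # A LOAD_CONST immediately followed by RETURN_VALUE is a return point.
    if lastInstruction == byteplay.RETURN_VALUE and \
       instruction == byteplay.LOAD_CONST:
        # c2.code[0] is the line number entry, so skip it.
        c.code[offset:offset+2] = c2.code[1:]
    lastInstruction = instruction
    offset -= 1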
The resulting bytecode:
>>> print c.code

3 1 LOAD_GLOBAL f
2 CALL_FUNCTION 0
3 JUMP_IF_FALSE to 18
4 POP_TOP

4 6 LOAD_CONST 1
7 PRINT_ITEM
8 PRINT_NEWLINE

5 10 LOAD_GLOBAL events
11 LOAD_ATTR Register
12 LOAD_FAST self
13 CALL_FUNCTION 1
14 POP_TOP
15 LOAD_CONST None
16 RETURN_VALUE
>> 18 POP_TOP

6 20 LOAD_GLOBAL g
21 CALL_FUNCTION 0
22 JUMP_IF_FALSE to 37
23 POP_TOP

7 25 LOAD_CONST 2
26 PRINT_ITEM
27 PRINT_NEWLINE

8 29 LOAD_GLOBAL events
30 LOAD_ATTR Register
31 LOAD_FAST self
32 CALL_FUNCTION 1
33 POP_TOP
34 LOAD_CONST None
35 RETURN_VALUE
>> 37 POP_TOP

9 39 LOAD_CONST 3
40 PRINT_ITEM
41 PRINT_NEWLINE
42 LOAD_GLOBAL events
43 LOAD_ATTR Register
44 LOAD_FAST self
45 CALL_FUNCTION 1
46 POP_TOP
47 LOAD_CONST None
48 RETURN_VALUE
The next step is to make f, g and events, and to execute the modified bytecode.

Testing the bytecode:
>>> def f(): return False
...
>>> def g(): return False
...
>>> Test.__init__.im_func.func_code = c.to_code()
>>> class Events:
...     def Register(self, instance):
...         print "REGISTERED", instance
...
>>> events = Events()
>>> t = Test()
3
REGISTERED <__main__.Test instance at 0x01D2DAD0>
Excellent. I'll have to think about the possibilities for this. It has potential to allow the creation of all sorts of interesting features in a code reloading framework.
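
The code object swap used in the test carries over to Python 3, where the attribute is spelled __code__ instead of func_code. A minimal sketch follows, in which a hand-written replacement function takes the place of the byteplay-generated code. The names replacement and registry are made up for illustration.

```python
class Test:
    registry = []  # hypothetical registry, kept on the class for brevity

    def __init__(self):
        self.value = 3


def replacement(self):
    # Same argument list and no closure variables, so this code object is
    # compatible with the function it is being assigned to.
    self.value = 3
    type(self).registry.append(self)


# Python 3 spelling of assigning to im_func.func_code.
Test.__init__.__code__ = replacement.__code__

t = Test()
```

Test.__init__ is still the same function object after the assignment. Only the code it runs has changed, which is why existing references to the method pick up the new behaviour.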