Friday 17 December 2010

Pickling

One of the use cases I have for unpickling is context-dependent whitelisting. You can easily do this by instantiating a cPickle Unpickler object and assigning a replacement find_global function to it.

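    import cPickle
    from cStringIO import StringIO

    # namespaceSubstitutions maps (moduleName, className) to a replacement
    # (moduleName, className) pair; namespaceWhitelist holds "module" and
    # "module.Class" strings. Both are defined elsewhere.

    # Hypothetical wrapper: the original excerpt shows only the body of the
    # enclosing function, so this name is illustrative.
    def unpickle_from_wire(packet):
        def find_global(moduleName, className):
            t = namespaceSubstitutions.get((moduleName, className), None)
            if t is not None:
                moduleName, className = t

            # Check the whitelist before importing, so a module that is not
            # whitelisted never gets its top-level code run.
            if (moduleName + "." + className not in namespaceWhitelist
                    and moduleName not in namespaceWhitelist):
                raise cPickle.UnpicklingError("%s.%s is not whitelisted for unpickling" %
                                              (moduleName, className))

            mod = __import__(moduleName, globals(), locals(), [])
            # This won't have given us "X.B", but rather "X". So get "B" from "X".
            # (This only handles names at most two levels deep.)
            idx = moduleName.rfind(".")
            if idx != -1:
                subModuleName = moduleName[idx+1:]
                mod = getattr(mod, subModuleName)

            return getattr(mod, className)

        unpickler = cPickle.Unpickler(StringIO(packet))
        unpickler.find_global = find_global

        return unpickler.load()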
This lets a whitelist be enforced only where it is wanted, for instance on objects coming over the wire but not on objects read from disk.
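For comparison, Python 3 turns this hook into a documented method: you subclass pickle.Unpickler and override find_class(module, name), since the find_global attribute is specific to cPickle. Below is a minimal Python 3 sketch, assuming a whitelist of the same shape as namespaceWhitelist (whole modules or "module.Class" names). WhitelistUnpickler, loads_from_wire and WIRE_WHITELIST are hypothetical names, and the substitution step is left out for brevity.

```python
import io
import pickle

# Hypothetical whitelist: entries are either a whole module ("collections")
# or a single "module.name" global ("datetime.date").
WIRE_WHITELIST = {"datetime.date", "collections"}


class WhitelistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not named in its whitelist."""

    def __init__(self, file, whitelist):
        super().__init__(file)
        self.whitelist = whitelist

    def find_class(self, module, name):
        # Check before importing anything, so a module that is not
        # whitelisted never has its top-level code executed.
        if module in self.whitelist or module + "." + name in self.whitelist:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "%s.%s is not whitelisted for unpickling" % (module, name))


def loads_from_wire(data, whitelist=WIRE_WHITELIST):
    # Only data arriving over the wire goes through the whitelist;
    # data read from disk can keep using plain pickle.loads.
    return WhitelistUnpickler(io.BytesIO(data), whitelist).load()
```

With this, loads_from_wire(pickle.dumps(datetime.date(2010, 12, 17))) returns the date, while a pickled fractions.Fraction is refused with UnpicklingError because fractions.Fraction is not on the list.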

Another use case I have, this time for pickling, is transforming objects as they are pickled into forms that whatever unpickles them can handle. Say your server application is built on a framework so complicated and heavyweight that whatever machine it runs on literally groans under the effort. Let's call it Twifted [1]. Your client application, however, has to be extremely lightweight and doesn't need all the functionality that the server does. To make programming on the server more natural, you let programmers send objects over the wire to the client in handy Twifted form, perhaps as return values from your RPC calls. The client can't unpickle Twifted types, so they have to be converted on the way out.

To do this, you hook into pickling. copy_reg.pickle lets you install functions that transform objects of a given type before they are pickled, but those functions are global: they apply to every pickle made in the process. You want to do it case by case, for the objects that get sent over the wire but not for the ones that go to disk. Unfortunately, the Pickler object doesn't have a friendly overridable hook for this, the way find_global does for Unpickler.

This is one possibility:
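    import cPickle
    import copy_reg

    # Convertors that should only apply to objects sent over the wire.
    reducers = {}

    def register_pickle_convertor(source_type, convertor_func):
        # Remember any existing global registration, so the trial install
        # below does not destroy it.
        previous = copy_reg.dispatch_table.get(source_type)
        # Use the limited copy_reg API to do a trial install and to validate its correctness.
        copy_reg.pickle(source_type, convertor_func)
        # Now unregister it directly, because there is no API to do this.
        if previous is None:
            del copy_reg.dispatch_table[source_type]
        else:
            copy_reg.dispatch_table[source_type] = previous

        # Put the validated reducer in our back pocket for use as required.
        print "Registered sake RPC convertor for objects of type %r" % (source_type,)
        reducers[source_type] = convertor_func

    def modified_cPickle_dumps(obj):
        # Temporarily install our convertors in the global table, remembering
        # any global entries they displace. This is not thread-safe: another
        # thread pickling at the same moment will also see the convertors.
        displaced = dict((t, copy_reg.dispatch_table[t])
                         for t in reducers if t in copy_reg.dispatch_table)
        copy_reg.dispatch_table.update(reducers)
        try:
            return cPickle.dumps(obj, cPickle.HIGHEST_PROTOCOL)
        finally:
            # Remove our influence and put back whatever we displaced.
            for source_type in reducers:
                del copy_reg.dispatch_table[source_type]
            copy_reg.dispatch_table.update(displaced)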
I haven't put that much thought into it, but I'd like to find a better solution.
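For what it's worth, Python 3.3 and later provide the per-case hook wished for here: a pickle.Pickler instance accepts its own dispatch_table attribute, which is consulted instead of the global copyreg.dispatch_table for that pickler only. Below is a minimal Python 3 sketch; TwiftedResult, reduce_twifted_result, WIRE_DISPATCH and dumps_for_wire are hypothetical names, and the convertor simply reduces the object to a plain dict that a lightweight client can rebuild.

```python
import copyreg
import io
import pickle


class TwiftedResult:
    """Hypothetical stand-in for a heavyweight server-side framework type."""

    def __init__(self, payload):
        self.payload = payload


def reduce_twifted_result(obj):
    # Convertor: on unpickling, the client calls dict(payload) and gets a
    # plain dict, so it never needs the TwiftedResult class.
    return (dict, (obj.payload,))


# Hypothetical wire-only table: a snapshot of the global table plus our
# convertors. The global copyreg.dispatch_table is never modified.
WIRE_DISPATCH = copyreg.dispatch_table.copy()
WIRE_DISPATCH[TwiftedResult] = reduce_twifted_result


def dumps_for_wire(obj):
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, pickle.HIGHEST_PROTOCOL)
    # Per-instance dispatch table (Python 3.3+): applies to this pickler only.
    pickler.dispatch_table = WIRE_DISPATCH
    pickler.dump(obj)
    return buf.getvalue()
```

Because WIRE_DISPATCH is a copy taken at import time, global copyreg registrations made later won't be seen by wire pickles. Python 3.8 also added Pickler.reducer_override, which achieves the same per-pickler conversion by overriding a method on a Pickler subclass.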

[1] I've never used Twisted, and the choice of this name has no bearing on all the suggestions to do so that I have ignored over the years.

3 comments:

  1. You're assuming moduleName is 'mod' or 'pkg.mod'. That single getattr won't work for 'pkg.subpkg.mod'; you need a loop. Or you could use this trick:

    mod = __import__(moduleName, globals(), locals(), ['*'])

    and avoid the need of getattr() altogether.

    How safe do you think this kind of whitelisting is?

  2. It seems like a bad idea to rely on this as a security mechanism, since you wouldn't be able to tell everything that gets executed on any access to the pickled object.

    If you have other motivations, such as preventing errors from bugs during event dispatch, then ignore this ;-)

  3. Marius, thanks for the tips. My known use cases are luckily two deep so do not require more complex logic. If you're using pickle, you're not really going for a secure solution. Personally, the biggest benefit to me is catching when I unintentionally send something across the wire. I'd be curious if you can think of problems with this WRT security though.

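The nested-package limitation raised in the first comment can be checked directly. Below is a small Python 3 sketch: resolve_global is a hypothetical helper, and importlib.import_module (also available in Python 2.7) is the standard-library call that returns the leaf module for a dotted name, which would replace the rfind/getattr step in find_global for names of any depth.

```python
import importlib


def resolve_global(module_name, class_name):
    """Hypothetical helper: fetch module_name.class_name for any nesting depth."""
    # import_module returns the leaf module itself ("pkg.subpkg.mod"),
    # not the top-level package, so no rfind/getattr walk is needed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    name = "xml.etree.ElementTree"
    # Empty fromlist: __import__ hands back the top-level package.
    print(__import__(name, globals(), locals(), []).__name__)     # xml
    # Non-empty fromlist (the trick from the comment): the leaf module.
    print(__import__(name, globals(), locals(), ["*"]).__name__)  # xml.etree.ElementTree
    print(resolve_global(name, "ElementTree").__name__)           # ElementTree
```

In find_global, the whitelist check would still come first; resolve_global only changes how an already-approved name is looked up.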