Wednesday, 4 February 2009

A custom namespacing system for Python, part 2

This post continues on from A custom namespacing system for Python, part 1.

Disclaimer

Feedback on these custom namespacing system posts has been almost all negative. Comments range from suggesting I use __import__ instead, to suggesting that doing this is outright wrong.

One assumption seems to be that I do not know about __import__, which is incorrect. Another seems to be a disbelief that there should be any attempt to do something different in Python, like this for instance. Another might be that I use something like this because I don't like using the standard system which comes with Python. I can't guess what anyone reads into a blog post any more than a reader can guess why anyone would document the implementation of a system like this.

To me, one of the best aspects of Python is its support for meta-programming: things like the package system are not forced on you, and you have the freedom and flexibility to create something like this.

Making the system more usable

In any case, the goal of the system was that all files within a given directory contributed the objects created within them to the same namespace, where the namespace was to match the directory hierarchy. However, in order to be usable, a system like this requires some additional functionality.

Namely:

  • Dependency resolution.
  • Intelligent filtering of namespace elements.

Dependency resolution

The current version of LoadScript in the ScriptDirectory class looks like this:
def LoadScript(self, filePath, namespacePath):
    scriptFile = ScriptFile(filePath)
    namespace = self.CreateNamespace(namespacePath)
    self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)

    return scriptFile
The problem is that loading a set of scripts in this way prevents dependencies from existing between them. This is an unrealistic constraint. For any reasonably complex set of scripts, there are going to be dependencies, and one of the most common cases will be classes defined in one script subclassing classes defined in another. Dependency resolution is required for this system to be usable.

If the execution related aspects are removed from the loading of scripts, then all scripts can be prepared before any are executed. The next step is then to do all the execution as a batch, with dependencies resolved as part of the process.

So LoadScript needs to be broken into two parts. The new version of LoadScript should be limited to loading the code and the execution related aspects can be put into a new function called RunScript.
def LoadScript(self, filePath, namespacePath):
    return ScriptFile(filePath, namespacePath)

def RunScript(self, scriptFile):
    scriptFile.Run()

    namespace = self.CreateNamespace(scriptFile.namespacePath)
    self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)
As LoadScript delegates the actual loading and execution to the ScriptFile class, this needs to be split up in the same way.

The current version of Load in the ScriptFile class looks like this:
def Load(self, filePath):
    self.filePath = filePath

    script = open(self.filePath, 'r').read()
    self.codeObject = compile(script, self.filePath, "exec")

    self.scriptGlobals = {}
    eval(self.codeObject, self.scriptGlobals, self.scriptGlobals)
This needs to be broken into two parts in the same way: a Load function to read in and compile the script file's source code, and a Run function to attempt to execute the resulting compiled code.

However, the dependency resolution process needs to track which files failed to run. And if the dependencies of some script files can never be located, preventing the startup process from completing, then knowing what those files were trying to import is essential for any programmer using this system to work out what they did wrong. So we will handle both aspects by returning a flag to indicate success and, on failure, storing information about the import failure.
# Requires "import sys" and "import traceback" at the top of the module.
def Load(self, filePath):
    self.filePath = filePath

    script = open(self.filePath, 'r').read()
    self.codeObject = compile(script, self.filePath, "exec")

def Run(self):
    self.scriptGlobals = {}
    try:
        eval(self.codeObject, self.scriptGlobals, self.scriptGlobals)
    except ImportError:
        # Keep the formatted traceback so it can be reported later.
        self.lastError = traceback.format_exception(*sys.exc_info())
        return False

    return True
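
As a self-contained illustration of the split between compiling and running, here is a minimal sketch. The ScriptSketch class is a hypothetical stand-in for ScriptFile that takes source text directly rather than a file path, and the module name in the failing script is made up so that the import is sure to fail. It uses exec in its function-call form so that it also runs on Python 3.

```python
import sys
import traceback


class ScriptSketch(object):
    """Hypothetical stand-in for ScriptFile that takes source text directly."""

    def __init__(self, source, name="<sketch>"):
        self.lastError = None
        # Phase 1: compiling checks the syntax but does not run any of the code.
        self.codeObject = compile(source, name, "exec")

    def Run(self):
        # Phase 2: execute the compiled code in a fresh globals dictionary.
        self.scriptGlobals = {}
        try:
            exec(self.codeObject, self.scriptGlobals)
        except ImportError:
            # format_exception returns a list of newline-terminated strings.
            self.lastError = traceback.format_exception(*sys.exc_info())
            return False
        return True


good = ScriptSketch("value = 1 + 1")
# Assumption: no module with this name exists, so the import fails.
bad = ScriptSketch("import module_that_does_not_exist_for_this_sketch")

goodResult = good.Run()
badResult = bad.Run()
```

Both scripts compile without complaint. Only running the second one fails, and its stored lastError is a list of traceback lines of the kind LogLastError later writes out one at a time.
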
The RunScript which was rewritten above will also need to be changed again to return the success flag up to its caller, but this is a simple change.
def RunScript(self, scriptFile):
    if not scriptFile.Run():
        return False

    namespace = self.CreateNamespace(scriptFile.namespacePath)
    self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)

    return True
The next step is to rewrite the Load function in the ScriptDirectory class. Before, it was enough to just load all the script files, executing them as part of the process.
def Load(self):
    self.LoadDirectory(self.baseDirPath)
Now the two distinct steps need to be handled. The start of Load remains the same, as that now only handles the loading. But the second step of executing the loaded script files while resolving the encountered dependencies needs to follow it.

This can be done in a simple manner with a straightforward algorithm.
  1. Make a list of all the known script files.
  2. Try and execute each script file in the list one by one.
    • If a script file is executed successfully, remove it from the list.
  3. Note that one more attempt has been made to execute all the remaining scripts.
  4. If more than a reasonable number of attempts have been made, give up.
  5. Otherwise, go back to step 2.
Or, as implemented in Python code:
scriptFilesToLoad = set(self.filesByPath.itervalues())
attemptsLeft = self.dependencyResolutionPasses
while len(scriptFilesToLoad) and attemptsLeft > 0:
    scriptFilesLoaded = set()
    for scriptFile in scriptFilesToLoad:
        if self.RunScript(scriptFile):
            scriptFilesLoaded.add(scriptFile)

    # Update the set of scripts which have yet to be loaded.
    scriptFilesToLoad -= scriptFilesLoaded

    attemptsLeft -= 1
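
To see the retry loop working in isolation, here is a rough standalone sketch of the same idea. Everything in it is hypothetical scaffolding: resolve_in_passes, the dependencies table and the toy run function are not part of the ScriptDirectory class. A "script" here is just a name, and run returns False where RunScript would have caught an ImportError.

```python
def resolve_in_passes(scripts, run, max_passes=10):
    """Run every script, retrying the failures for up to max_passes passes.

    run(script) must return True on success. The return value is the set of
    scripts which never ran successfully (empty if everything was resolved).
    """
    remaining = set(scripts)
    attempts_left = max_passes
    while remaining and attempts_left > 0:
        succeeded = set(s for s in remaining if run(s))
        # Update the set of scripts which have yet to be loaded.
        remaining -= succeeded
        attempts_left -= 1
    return remaining


# Toy data: each "script" names the one script it depends on (or None).
dependencies = {
    "base": None,
    "derived": "base",
    "leaf": "derived",
    "orphan": "missing",  # depends on something that never appears
}
loaded = []


def run(name):
    needed = dependencies[name]
    if needed is not None and needed not in loaded:
        return False  # stands in for an ImportError inside RunScript
    loaded.append(name)
    return True


unresolved = resolve_in_passes(dependencies, run)
```

Whatever order the set happens to be iterated in, "base" ends up loaded before "derived", which ends up loaded before "leaf". The "orphan" entry is still unresolved once the passes run out.
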
If this loop exits with scripts remaining to be loaded, then the loading process has failed, and the user should be notified so they can fix their errors, circular dependencies or whatever else they may have done wrong. Each script file will have recorded the error that occurred when it was last executed, so that information can be relayed to the user.
if len(scriptFilesToLoad):
    logging.error("ScriptDirectory.Load failed to resolve dependencies")

    # Log information about the problematic script files.
    for scriptFile in scriptFilesToLoad:
        scriptFile.LogLastError()
The LogLastError function is also rather straightforward.
def LogLastError(self, flush=True):
    if self.lastError is None:
        logging.error("Script file '%s' unexpectedly missing a last error", self.filePath)
        return

    logging.error("Script file '%s'", self.filePath)
    for line in self.lastError:
        logging.error("%s", line.rstrip("\r\n"))

    if flush:
        self.lastError = None
The function which created the ScriptDirectory instance and asked it to load also needs to be able to tell that the process failed. Returning a success flag finishes off Load.
def Load(self):
    ## Pass 1: Load all the valid scripts under the given directory.
    self.LoadDirectory(self.baseDirPath)

    ## Pass 2: Execute the scripts, ordering for dependencies and then add the namespace entries.
    scriptFilesToLoad = set(self.filesByPath.itervalues())
    attemptsLeft = self.dependencyResolutionPasses
    while len(scriptFilesToLoad) and attemptsLeft > 0:
        scriptFilesLoaded = set()
        for scriptFile in scriptFilesToLoad:
            if self.RunScript(scriptFile):
                scriptFilesLoaded.add(scriptFile)

        # Update the set of scripts which have yet to be loaded.
        scriptFilesToLoad -= scriptFilesLoaded

        attemptsLeft -= 1

    if len(scriptFilesToLoad):
        logging.error("ScriptDirectory.Load failed to resolve dependencies")

        # Log information about the problematic script files.
        for scriptFile in scriptFilesToLoad:
            scriptFile.LogLastError()

        return False

    return True
And with the addition of dependency resolution support, the custom namespacing solution is now usable. However, with a sufficiently complex set of scripts, the algorithm may not be sufficient. But that's an easy problem for future users to solve, for now.
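
One way a fixed number of passes could fall short is a dependency chain with more links than there are passes, since in the worst case each pass only gets one link further. A possible refinement, which is not part of the system described here, is to keep going for as long as a pass makes progress and to give up only when a whole pass loads nothing. The sketch below uses hypothetical names throughout (resolve_until_stuck, the dependencies table, the toy run function).

```python
def resolve_until_stuck(scripts, run):
    """Retry failed scripts until a whole pass succeeds in loading nothing.

    Returns the scripts which could not be run and the number of passes made.
    """
    remaining = set(scripts)
    passes = 0
    while remaining:
        succeeded = set(s for s in remaining if run(s))
        passes += 1
        if not succeeded:
            # No progress: what is left is circular or otherwise broken.
            break
        remaining -= succeeded
    return remaining, passes


# Toy data: a 20 link chain, plus two scripts which need each other.
dependencies = {"s0": None, "a": "b", "b": "a"}
for i in range(1, 20):
    dependencies["s%d" % i] = "s%d" % (i - 1)
loaded = set()


def run(name):
    needed = dependencies[name]
    if needed is not None and needed not in loaded:
        return False  # stands in for a failed import
    loaded.add(name)
    return True


stuck, passes = resolve_until_stuck(dependencies, run)
```

The whole chain loads however long it is, and the two mutually dependent scripts are reported after a single fruitless pass rather than after every remaining attempt has been used up.
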

Intelligent filtering of namespace elements

A script file can be looked at as containing two different sets of objects: those which were imported from elsewhere and those which were created within the script file. Only the latter should be exported to the namespace the file contributes to. The former should be filtered out.

In an ideal world, there would be some way of determining what was actually created locally. But in this world, it is only possible to identify certain kinds of externally sourced objects.

Modules are one of the most commonly imported types of objects. These are never created within a script file, so they can always be filtered out.

if type(v) is types.ModuleType:
    continue
Classes are another commonly imported type of object, and it is simple for us to distinguish between the ones which were created locally and the ones which weren't. Because the globals dictionary a script is executed in starts out empty and has no __name__ entry, the __module__ attribute of a class will be "__builtin__" if it was created locally, and it will have already been set to something else if it was imported from somewhere else.
if type(v) in (types.ClassType, types.TypeType):
    if v.__module__ != "__builtin__":
        continue

    v.__module__ = moduleName
The kinds of objects which cannot be filtered out are those which have values that are simple types like strings, numbers and so forth.
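
The two checks can be tried out together with a small standalone sketch. It is written for Python 3, where the equivalent of "__builtin__" is "builtins" and there are no old-style classes, so a single isinstance check against type stands in for the ClassType and TypeType test. The filter_exports function and the "game.objects" namespace name are made up for the example.

```python
import types


def filter_exports(scriptGlobals, moduleName):
    """Return the entries of scriptGlobals which look locally created."""
    exports = {}
    for k, v in scriptGlobals.items():
        if k == "__builtins__":
            continue  # added to the globals automatically by exec
        if isinstance(v, types.ModuleType):
            continue  # modules are never created within a script
        if isinstance(v, type):
            if v.__module__ != "builtins":  # "__builtin__" on Python 2
                continue  # a class imported from some other module
            v.__module__ = moduleName
        exports[k] = v
    return exports


source = """
import os
from collections import OrderedDict
from math import pi

class Local(object):
    pass

LIMIT = 10
"""

scriptGlobals = {}
exec(compile(source, "<sketch>", "exec"), scriptGlobals)
exports = filter_exports(scriptGlobals, "game.objects")
```

The os module and the imported OrderedDict class are dropped, and Local is kept and relabelled as belonging to the hypothetical "game.objects" namespace. The imported pi is exported alongside the locally defined LIMIT, because nothing about a plain float says where it came from.
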

A runnable form of the code shown above can be found here.

The followup to this post can be found here.

Edit: Added a link to the next post in the set.
