A custom namespacing system for Python, part 2
This post continues on from A custom namespacing system for Python, part 1.
Disclaimer
Feedback on these custom namespacing system posts has been almost all negative. Comments range from suggesting I use __import__ instead, to suggesting that doing this is outright wrong.
One assumption seems to be that I do not know about __import__, which is incorrect. Another seems to be a disbelief that there should be any attempt to do something different in Python, like this for instance. Another might be that I use something like this because I don't like using the standard system which comes with Python. I can't guess what anyone reads into a blog post any more than a reader can guess why anyone would document the implementation of a system like this.
To me, one of the best aspects of Python programming is its ability to do meta-programming: things like the package system are not forced on you, and you have the freedom and flexibility to create something like this.
Making the system more usable
In any case, the goal of the system was that all files within a given directory contributed the objects created within them to the same namespace, where the namespace was to match the directory hierarchy. However, in order to be usable, a system like this requires some additional functionality.
Namely:
- Dependency resolution.
- Intelligent filtering of namespace elements.
The current version of LoadScript in the ScriptDirectory class looks like this:

    def LoadScript(self, filePath, namespacePath):
        scriptFile = ScriptFile(filePath)
        namespace = self.CreateNamespace(namespacePath)
        self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)
        return scriptFile

The problem is that loading a set of scripts in this way prevents dependencies existing between them. This is an unrealistic constraint. For any reasonably complex set of scripts, there are going to be dependencies and one of the most common cases will be classes defined in one script subclassing classes defined in another. Dependency resolution is required for this system to be usable.
If the execution related aspects are removed from the loading of scripts, then all scripts can be prepared before any are executed. The next step is then to do all the execution as a batch, with dependencies resolved as part of the process.
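The idea of preparing every script before executing any of them can be sketched roughly as follows. This is not the code from this system: the `scripts` dictionary and its file names are hypothetical stand-ins for files read from disk, and the sketch uses `exec` where the code in this post uses `eval`.

```python
# Sketch only: "scripts" maps hypothetical file names to source text,
# standing in for script files read from disk.
scripts = {
    "a.py": "VALUE = 1\n",
    "b.py": "OTHER = 2\n",
}

# Step 1: prepare every script. compile() parses the source but runs none of it.
prepared = {}
for name, source in scripts.items():
    prepared[name] = compile(source, name, "exec")

# Step 2: execute as a batch, each script in its own globals dictionary.
results = {}
for name, code_object in prepared.items():
    script_globals = {}
    exec(code_object, script_globals)
    results[name] = script_globals
```

Because nothing has run by the end of step 1, step 2 is free to execute the prepared code objects in any order, or to retry the ones that fail.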
So LoadScript needs to be broken into two parts. The new version of LoadScript should be limited to loading the code and the execution related aspects can be put into a new function called RunScript.

    def LoadScript(self, filePath, namespacePath):
        return ScriptFile(filePath, namespacePath)

    def RunScript(self, scriptFile):
        scriptFile.Run()
        namespace = self.CreateNamespace(scriptFile.namespacePath)
        self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)

As LoadScript delegates the actual loading and execution to the ScriptFile class, this needs to be split up in the same way.
The current version of Load in the ScriptFile class looks like this:

    def Load(self, filePath):
        self.filePath = filePath
        script = open(self.filePath, 'r').read()
        self.codeObject = compile(script, self.filePath, "exec")
        self.scriptGlobals = {}
        eval(self.codeObject, self.scriptGlobals, self.scriptGlobals)

This needs to be broken into two parts in the same way. A Load function to read in and compile the script file's source code and a Run function to attempt to execute the resulting compiled code.
However, the dependency resolution process will need to track the files which failed to run. And if there turn out to be script files whose dependencies cannot be located, preventing the startup process from being completed, knowing what those files were trying to import is essential if a programmer using this system is to work out what they did wrong. So we will handle both these aspects by returning a flag to indicate success and, on failure, storing information about the import failure.

    def Load(self, filePath):
        self.filePath = filePath
        script = open(self.filePath, 'r').read()
        self.codeObject = compile(script, self.filePath, "exec")

    def Run(self):
        self.scriptGlobals = {}
        try:
            eval(self.codeObject, self.scriptGlobals, self.scriptGlobals)
        except ImportError:
            self.lastError = traceback.format_exception(*sys.exc_info())
            return False
        return True

The RunScript which was rewritten above will also need to be changed again to return the success flag up to its caller, but this is a simple change.

    def RunScript(self, scriptFile):
        if not scriptFile.Run():
            return False
        namespace = self.CreateNamespace(scriptFile.namespacePath)
        self.InsertModuleAttributes(scriptFile.scriptGlobals, namespace)
        return True

The next step is to rewrite the Load function in the ScriptDirectory class. Before it was enough to just load all the script files, executing them as part of the process.

    def Load(self):
        self.LoadDirectory(self.baseDirPath)

Now the two distinct steps need to be handled. The start of Load remains the same, as that now only handles the loading. But the second step of executing the loaded script files while resolving the encountered dependencies needs to follow it.
This can be done in a simple manner with a straightforward algorithm.
1. Make a list of all the known script files.
2. Try to execute each script file in the list, one by one.
3. If a script file is executed successfully, remove it from the list.
4. Note that one more attempt has been made to execute all the remaining scripts.
5. If more than a reasonable number of attempts have been made, give up.
6. Otherwise, go back to step 2.

    scriptFilesToLoad = set(self.filesByPath.itervalues())
    attemptsLeft = self.dependencyResolutionPasses
    while len(scriptFilesToLoad) and attemptsLeft > 0:
        scriptFilesLoaded = set()
        for scriptFile in scriptFilesToLoad:
            if self.RunScript(scriptFile):
                scriptFilesLoaded.add(scriptFile)

        # Update the set of scripts which have yet to be loaded.
        scriptFilesToLoad -= scriptFilesLoaded
        attemptsLeft -= 1

If this loop exits with scripts remaining to be loaded, then the loading process has failed, and the user should be notified so they can fix their errors, circular dependencies or whatever else they may have done wrong. Each script file will have recorded the error that occurred when it was last executed, so that information can be relayed to the user.

    if len(scriptFilesToLoad):
        logging.error("ScriptDirectory.Load failed to resolve dependencies")
        # Log information about the problematic script files.
        for scriptFile in scriptFilesToLoad:
            scriptFile.LogLastError()

The LogLastError function is also rather straightforward.

    def LogLastError(self, flush=True):
        if self.lastError is None:
            logging.error("Script file '%s' unexpectedly missing a last error", self.filePath)
            return

        logging.error("Script file '%s'", self.filePath)
        for line in self.lastError:
            logging.error("%s", line.rstrip("\r\n"))

        if flush:
            self.lastError = None

The function which created the ScriptDirectory instance and asked it to load also needs to be able to tell that the process failed. Adding the return of success flags finishes off Load.

    def Load(self):
        ## Pass 1: Load all the valid scripts under the given directory.
        self.LoadDirectory(self.baseDirPath)

        ## Pass 2: Execute the scripts, ordering for dependencies and then add the namespace entries.
        scriptFilesToLoad = set(self.filesByPath.itervalues())
        attemptsLeft = self.dependencyResolutionPasses
        while len(scriptFilesToLoad) and attemptsLeft > 0:
            scriptFilesLoaded = set()
            for scriptFile in scriptFilesToLoad:
                if self.RunScript(scriptFile):
                    scriptFilesLoaded.add(scriptFile)

            # Update the set of scripts which have yet to be loaded.
            scriptFilesToLoad -= scriptFilesLoaded
            attemptsLeft -= 1

        if len(scriptFilesToLoad):
            logging.error("ScriptDirectory.Load failed to resolve dependencies")
            # Log information about the problematic script files.
            for scriptFile in scriptFilesToLoad:
                scriptFile.LogLastError()
            return False

        return True

And with the addition of dependency resolution support, the custom namespacing solution is now usable. However, with a sufficiently complex set of scripts, the algorithm may not be sufficient. But that's an easy problem for future users to solve, for now.
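To see the retry behaviour in isolation, here is a minimal self-contained sketch of the same multi-pass idea. It is an illustration under assumptions, not the ScriptDirectory code: the `demo_ns` module, the two in-memory scripts and the `run_script` helper are all hypothetical, and the sketch uses `exec` so that it also runs on Python 3.

```python
import sys
import types

# Hypothetical shared namespace, registered so that scripts can import from it.
namespace = types.ModuleType("demo_ns")
sys.modules["demo_ns"] = namespace

# Hypothetical scripts; "derived.py" depends on a class from "base.py"
# and is deliberately listed first so that its first attempt fails.
sources = [
    ("derived.py", "from demo_ns import Base\nclass Derived(Base):\n    pass\n"),
    ("base.py", "class Base(object):\n    pass\n"),
]
pending = [(name, compile(text, name, "exec")) for name, text in sources]

def run_script(code_object):
    """Execute one script; export its names on success, return False on ImportError."""
    script_globals = {}
    try:
        exec(code_object, script_globals)
    except ImportError:
        return False
    for key, value in script_globals.items():
        if not key.startswith("__"):
            setattr(namespace, key, value)
    return True

attempts_left = 3  # stand-in for dependencyResolutionPasses
passes_used = 0
while pending and attempts_left > 0:
    # Keep only the scripts which failed to run on this pass.
    pending = [(name, code) for name, code in pending if not run_script(code)]
    attempts_left -= 1
    passes_used += 1

unresolved = [name for name, _ in pending]
```

In this sketch the first pass fails for derived.py, because Base has not been exported yet, and succeeds for base.py. The second pass then succeeds for derived.py. A chain of N scripts encountered in the worst possible order would need N passes, which is one way the fixed limit on attempts can turn out to be insufficient.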
Intelligent filtering of namespace elements
A script file can be looked at as containing two different sets of objects. Those which were imported from elsewhere and those which were created within the script file. The only set which should be exported to the namespace the file contributes to, are the latter. The former should be filtered out.
In an ideal world, there would be some way of determining what was actually created locally. But in this world, it is only possible to identify certain kinds of externally sourced objects.
Modules are one of the most commonly imported types of objects. These are never created within a script file, so they can always be filtered out.

    if type(v) is types.ModuleType:
        continue

Classes are another commonly imported type of object. And it is simple for us to distinguish between the ones which were created locally and the ones which weren't. The __module__ attribute will be "__builtin__" if it was created locally, and it will have already been set to something else if it was imported from somewhere else.

    if type(v) in (types.ClassType, types.TypeType):
        if v.__module__ != "__builtin__":
            continue
        v.__module__ = moduleName

The kinds of objects which cannot be filtered out are those which have values that are simple types like strings, numbers and so forth.
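For readers who want to try the filtering on its own, the following is a rough sketch of the same checks. It assumes a hypothetical in-memory script and a hypothetical namespace name ("demo.namespace"), and it accepts both "__builtin__" (Python 2) and "builtins" (Python 3) as the marker for a locally created class. As far as I can tell, that marker appears because the bare globals dictionary has no __name__ entry of its own, so the lookup falls through to the builtins module.

```python
import types

# Execute a hypothetical script in a bare globals dictionary, as the loader does.
source = (
    "import os\n"
    "from collections import OrderedDict\n"
    "class Local(object):\n"
    "    pass\n"
    "LIMIT = 10\n"
)
script_globals = {}
exec(compile(source, "example.py", "exec"), script_globals)

# Assumption: locally created classes carry the builtins module's name.
BUILTIN_NAMES = ("__builtin__", "builtins")

exported = {}
for k, v in script_globals.items():
    if k.startswith("__"):
        continue  # skip __builtins__ and similar entries
    if isinstance(v, types.ModuleType):
        continue  # modules are always imported, never created locally
    if isinstance(v, type):
        if v.__module__ not in BUILTIN_NAMES:
            continue  # class was imported from elsewhere
        v.__module__ = "demo.namespace"  # hypothetical namespace name
    exported[k] = v
```

Here only Local and LIMIT end up in `exported`. LIMIT gets through because a plain integer carries no record of where it came from, which is exactly the limitation described above.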
A runnable form of the code shown above can be found here.
The followup to this post can be found here.
Edit: Added a link to the next post in the set.