Sunday 7 May 2006

Porting Python to the Nintendo DS

What was involved

When I decided to port Python to the Nintendo DS, I had no idea what was involved; it was a spur-of-the-moment thing. There is a development environment, devkitarm (part of devkitpro), which compiles homebrew projects into 'roms' that can be loaded onto a Nintendo DS and executed. This environment includes support for standard output and basic text display. Another homebrew coder (Headspin) had written a rom that provides keyboard input: it shows a picture of a keyboard on the bottom screen, and tapping keys with the stylus writes the text on the upper screen. So I simply compiled the Python source code against this basic setup.

And apart from some straightforward cross-compilation issues, which an existing patch easily addressed, it was that easy. Python compiled and ran on the Nintendo DS as is.

However, the Stackless Python features involved a little more work. For Stackless to switch between tasklets by replacing sections of the stack, it needs to be able to do two things at the assembly level:

  1. Get the current address of the stack pointer.
  2. Change the current address of the stack pointer.
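To make the first of these two operations concrete, here is a minimal sketch that can be tried on a desktop machine. It assumes gcc or clang, whose `__builtin_frame_address` builtin returns the frame address of the current function, which is only an approximation of the stack pointer. The function name `current_frame_address` is hypothetical and is not part of Stackless. The second operation, changing the stack pointer, has no portable C equivalent, which is why assembly is needed.

```c
#include <stdint.h>

/* Hypothetical helper, not part of Stackless: returns the frame address of
   this function as an integer. This approximates the stack pointer; it is
   not the exact value of the sp register. Requires gcc or clang. */
__attribute__((noinline))
uintptr_t current_frame_address(void) {
    return (uintptr_t)__builtin_frame_address(0);
}
```

Calling `current_frame_address()` from functions at different call depths gives a rough picture of how far the stack has grown.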
The other supported platforms, except for x64 under Visual Studio, did this using inline assembly. x64 used masm and a standalone assembly file because that combination of tools and platform does not support inline assembler. So, first, I tried the common, easy approach: writing inline assembler.
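    static int slp_switch(void) {
        register int *stackref, stsizediff;
        /* Empty asm: tells gcc the registers in REGS_TO_SAVE are clobbered, so it preserves them. */
        __asm__ volatile ("" : : : REGS_TO_SAVE);
        /* Step 1: read the current stack pointer. */
        __asm__ ("mov %0, sp" : "=g" (stackref) : );
        {
            /* Stackless macro: saves the current stack slice and sets stsizediff. */
            SLP_SAVE_STATE(stackref, stsizediff);
            /* Step 2: move the stack pointer by the size difference. */
            __asm__ volatile (
                "mov r0, %0\n"
                "add sp, sp, r0\n"
                : /* no outputs */
                : "g" (stsizediff)
                : "r0"
            );
            /* Stackless macro: restores the stack slice of the target tasklet. */
            SLP_RESTORE_STATE();
            return 0;
        }
        __asm__ volatile ("" : : : REGS_TO_SAVE);
    }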
However, this did not seem to work. After littering the Stackless source code with printf statements and rerunning it over and over, adding more each time, I still had no idea why. The next step was to try to run it in a debugger. Insight (a graphical frontend to gdb) comes with devkitarm. It does not emulate the DS platform, so roms cannot be run within it, but when it works it is handy for looking at the disassembly of functions within a compiled binary. That was not an option either, because the version of Insight that came with devkitarm crashed when a binary was loaded into it. So I turned to the emulators, some of which sport debugging functionality. This also turned out to be a lot more work than I expected.
  • Only two of the emulators, Dualis and Desmume, would even run my rom. Even then, Dualis would crash to the desktop when my rom crashed inside it. Desmume was a little better: it kept running when my rom crashed, so I could still see the printf output produced up to the point of the crash.
  • There were no breakpoints. To get the emulator to pause at a specific point in the code, I had to add a loop which held it there for a prolonged period of time. Desmume had single stepping, and could step a given number of instructions, but Dualis did not. What Dualis did have, that Desmume did not, was syntax highlighting.
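The delay-loop substitute for a breakpoint described in the second point might look something like the sketch below. The name `debug_hold` and its parameters are hypothetical and are not taken from the original port; the spin count needed to hold an emulator long enough to inspect it is also an assumption that would have to be tuned by hand.

```c
#include <stdio.h>

/* Hypothetical helper, not from the original port: prints a label and then
   spins for a fixed number of iterations, giving time to pause or inspect
   the emulator while execution sits at this point. */
unsigned long debug_hold(const char *label, unsigned long spins) {
    /* volatile stops the compiler from optimising the empty loop away. */
    volatile unsigned long counter = 0;

    printf("HOLD %s\n", label);
    fflush(stdout);

    while (counter < spins) {
        counter++;
    }
    return counter;
}
```

A call such as `debug_hold("before slp_switch", 50000000UL)` placed just ahead of the code of interest keeps execution at a recognisable spot in the disassembly view.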
What the disassembly in the emulators, especially the more readable Dualis, showed me was the reason my Stackless support did not work. The code generated by gcc was preserving the stack pointer (in register r7), which made the inline assembly ineffectual. It also made the Stackless tasklet switching unstable: the stack was being restored with the contents from the tasklet being switched to, but the stack pointer itself could not be adjusted to match, because gcc prevented this from happening. I could alter the inline assembly to set r7 directly, and this did work, but it was a hack and could not necessarily be relied on to work consistently. This left one option: write the whole switching function in assembler.
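Why the contents and the stack pointer have to change together can be illustrated with a small model. This is a conceptual sketch only, not how Stackless is implemented: the "stack" is an ordinary byte array that grows downwards, and the names `FakeStack`, `SavedSlice`, `save_slice` and `restore_slice` are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define FAKE_STACK_SIZE 64

/* Hypothetical model of a downward-growing stack: the live region is
   memory[sp .. FAKE_STACK_SIZE). */
typedef struct {
    unsigned char memory[FAKE_STACK_SIZE];
    size_t sp;
} FakeStack;

/* Hypothetical saved copy of one tasklet's live stack region. */
typedef struct {
    unsigned char saved[FAKE_STACK_SIZE];
    size_t size;
} SavedSlice;

/* Copy the live region of the stack into a saved slice. */
void save_slice(const FakeStack *stack, SavedSlice *out) {
    out->size = FAKE_STACK_SIZE - stack->sp;
    memcpy(out->saved, stack->memory + stack->sp, out->size);
}

/* Switch to a saved slice: move sp by the size difference, then copy the
   saved contents back. Returns the difference that was added to sp. */
long restore_slice(FakeStack *stack, const SavedSlice *in) {
    size_t current_size = FAKE_STACK_SIZE - stack->sp;
    long size_diff = (long)current_size - (long)in->size;

    /* Without this adjustment, sp would still describe the old region. */
    stack->sp = (size_t)((long)stack->sp + size_diff);
    memcpy(stack->memory + stack->sp, in->saved, in->size);
    return size_diff;
}
```

If the adjustment of `sp` in `restore_slice` is skipped, the restored bytes and the stack pointer disagree about where the top of the stack is, which is the kind of mismatch described above.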

The masm/x64 support was my base for this. I basically wrote equivalent code in ARM Thumb assembly (I chose Thumb over normal ARM assembler because, while it is slower, it generates smaller code, and memory is at a premium on the DS). Once I had this written, it worked perfectly.

Lessons learned
  • Porting a shell-based application to the Nintendo DS is relatively straightforward, given a method of input, a method of output and a display for the output to go to, especially since Headspin's keyboard code and libnds already provide these to a large degree.
  • Adding support to Stackless Python for another platform is relatively straightforward. The hardest part of my experience was caused by the lack of debugging support for Nintendo DS homebrew roms.
  • Just because Python runs on another platform doesn't mean it is of much use there. For it to be useful, the DS hardware needs to be exposed in some way, so that games and applications can be driven by it.
Next steps

As the last of the lessons learned suggests, for Python on the DS to be of any use, the DS hardware needs to be exposed. The best way to do this is probably to port PyGame. Why PyGame? Because SDL has been ported to the DS for the most part (sound and threading still need work, sound because it relies on the threading support), and PyGame, from what I hear, is a wrapper around SDL.