Thursday, 25 February 2010

Abusing Mailman on Windows

Previous post: Mailman-style mailing list archives

I finished my hack of Mailman for mailing list archive generation a few weeks back. Apart from monkey-patching in replacements or stubs for Linux-specific functionality that either does not exist on Windows or works differently there, the main problem was line endings. Mailman persists its marshalled state in a way that is prone to line-ending corruption on Windows. The underlying mbox archive was also read in a way that is not Windows-compatible, which typically resulted in corrupt posts in the generated archive.
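To illustrate the line-ending problem, here is a minimal sketch in modern Python 3 (not the Python 2 code Mailman used at the time). The names save_state, load_state and simulate_text_mode_write are hypothetical helpers of mine, not Mailman functions. The idea is simply that marshalled data is bytes, so it has to be written and read in binary mode, otherwise every newline byte gets rewritten as a carriage return plus newline on Windows.

```python
import marshal
import os
import tempfile


def save_state(path, state):
    # "wb" is binary mode: Windows will not rewrite \n bytes as \r\n.
    with open(path, "wb") as fp:
        marshal.dump(state, fp)


def load_state(path):
    # "rb" for the same reason: no newline translation while reading.
    with open(path, "rb") as fp:
        return marshal.load(fp)


def simulate_text_mode_write(data):
    # Hypothetical helper: mimics what a Windows text-mode write does
    # to a byte stream, so the damage can be seen on any platform.
    return data.replace(b"\n", b"\r\n")


# The value 10 and the embedded newline both marshal to a 0x0A byte.
state = {"subject": "line one\nline two", "sequence": 10}

path = os.path.join(tempfile.mkdtemp(), "state.marshal")
save_state(path, state)
restored = load_state(path)

raw = marshal.dumps(state)
damaged = simulate_text_mode_write(raw)

print(restored == state)        # binary round trip preserves the data
print(len(raw), len(damaged))   # the translated stream has grown
```

The same reasoning applies to reading the mbox file: opening it in binary mode keeps the message bytes exactly as they were stored.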

Mailman, in the way I was using it, didn't generate a proper "by thread" view. This was because of an eight-year-old bug whose patch had never been applied and had instead been marked as discarded. I fumbled around in Mailman's bug tracker and created a new bug, which promptly got closed, with the old one being reopened and its patch then being applied. Threading now worked, excellent.

The other problems I encountered were not the fault of Mailman, but rather the result of an extended scope for the project. My original mbox archive, which I had generated from web pages, only covered 80% of the life of the mailing list. Someone else had the remainder in their Gmail account and offered to extract it for me. However they generated the mbox archive they gave me, the emails in it were out of order, which was not something Mailman liked. Between that, writing scripts to merge the two mbox archives, correcting the dates on the emails and stripping extraneous headers...
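A merge script of that kind might look roughly like the following. This is a sketch using Python 3's standard mailbox module, not the scripts I actually wrote. The function names and the header names in UNWANTED_HEADERS are hypothetical, and it assumes that sorting on each message's Date header is enough to get the posts into an order Mailman accepts.

```python
import mailbox
from email.utils import parsedate_to_datetime

# Hypothetical examples of headers to strip; adjust to the real archive.
UNWANTED_HEADERS = ("X-Gmail-Labels", "X-GM-THRID")


def message_timestamp(msg):
    """Return the Date header as a Unix timestamp, or 0.0 if unusable."""
    date_header = msg["Date"]
    if date_header is None:
        return 0.0
    try:
        return parsedate_to_datetime(str(date_header)).timestamp()
    except (TypeError, ValueError, OverflowError, OSError):
        # Missing or malformed dates sort to the front for manual review.
        return 0.0


def merge_mboxes(source_paths, dest_path):
    """Merge several mbox files into dest_path, ordered by date."""
    messages = []
    for path in source_paths:
        box = mailbox.mbox(path, create=False)
        try:
            # Iterating a mailbox yields its messages, fully loaded.
            messages.extend(box)
        finally:
            box.close()

    # list.sort is stable, so same-date messages keep their input order.
    messages.sort(key=message_timestamp)

    # Note: if dest_path already exists, messages are appended to it.
    dest = mailbox.mbox(dest_path)
    try:
        for msg in messages:
            for name in UNWANTED_HEADERS:
                del msg[name]  # no error if the header is absent
            dest.add(msg)
    finally:
        dest.close()  # flushes pending changes to disk
    return len(messages)
```

Calling merge_mboxes(["web.mbox", "gmail.mbox"], "merged.mbox") with two such files (the file names here are made up) would write one date-ordered archive. Messages whose dates are wrong rather than missing would still need correcting by hand before the sort means anything.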

And finally... a merged and generated browsable Mailman mailing list archive.