Monday, 7 June 2010

Visual Studio 2010 & profiling

Visual Studio 2008 Professional has no profiler, so rather than upgrading to the Team System variant, I chose to try to obtain Visual Studio 2010 Ultimate. After recompiling all my static library dependencies, I was ready to try out the profiler.

The first result was a little demoralising.

Every function in which time was being wasted was lumped in as part of the body of the main function. I disabled the default exclusion of small functions from instrumentation, but that merely pulled in a raft of unimportant functions that deserved to be excluded. A bit of googling about missing functions in Visual Studio profiling yielded the following page. While not entirely relevant, it did suggest making sure that the PDB files were present. After a few project configuration changes, plus judicious use of Process Monitor to make sure the files were in the right place, the problem was indeed solved.

One of my original suspicions was that file access was a cause of the slow loading process, and indeed the above report shows that 30% of the loading time is spent there. It turns out that any time a file is accessed by name, the routines iterate over all the files in the archive until they find the one of the given type and with the given name. By adding a dictionary for each file type, mapping each file name to a pointer to its archive header, this cost should just go away.
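To make the difference concrete, here is a minimal sketch of the two lookup strategies. It is not the Soulfu code: file_header, file_index, INDEX_SLOTS and the FNV-1a hash are all stand-ins I have assumed for illustration, and the real archive header layout is different.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an archive header; the real layout is not shown here. */
typedef struct {
    char name[9];   /* up to eight characters plus a terminator */
    int  type;
} file_header;

/* The slow path: walk every header until both type and name match. */
file_header *find_linear(file_header *headers, size_t count, int type, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (headers[i].type == type && strncmp(headers[i].name, name, 8) == 0)
            return &headers[i];
    return NULL;
}

#define INDEX_SLOTS 64  /* assumed: a power of two larger than the files per type */

typedef struct {
    file_header *slots[INDEX_SLOTS];
    size_t used;
} file_index;

/* FNV-1a over at most the first eight characters (an assumed hash, not stb's). */
static unsigned int name_hash(const char *s)
{
    unsigned int h = 2166136261u;
    for (int i = 0; i < 8 && s[i]; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Build once after the archive is loaded, one index per file type.
   Returns 0 on success, -1 if the table would become full. */
int index_build(file_index *idx, file_header *headers, size_t count, int type)
{
    memset(idx, 0, sizeof *idx);
    for (size_t i = 0; i < count; i++) {
        if (headers[i].type != type)
            continue;
        if (idx->used >= INDEX_SLOTS - 1)
            return -1;                       /* always keep one empty slot */
        unsigned int slot = name_hash(headers[i].name) & (INDEX_SLOTS - 1);
        while (idx->slots[slot])             /* linear probing on collision */
            slot = (slot + 1) & (INDEX_SLOTS - 1);
        idx->slots[slot] = &headers[i];
        idx->used++;
    }
    return 0;
}

/* The fast path: jump to the hashed slot and probe until a match or a gap. */
file_header *index_find(const file_index *idx, const char *name)
{
    unsigned int slot = name_hash(name) & (INDEX_SLOTS - 1);
    while (idx->slots[slot]) {
        if (strncmp(idx->slots[slot]->name, name, 8) == 0)
            return idx->slots[slot];
        slot = (slot + 1) & (INDEX_SLOTS - 1);
    }
    return NULL;
}
```

The index is built once per file type after the archive is read, so each later lookup touches a handful of slots instead of every header in the archive.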

The first solution which came to mind was Sean's Tool Box, a public domain header which includes a raft of useful C functionality, including the ability to create custom dictionaries. It provides a default string-to-pointer dictionary, stb_sdict, which is almost suitable, but the original Soulfu code only matches on the first eight characters of a filename, whereas stb_sdict hashes and compares the full string. I had to replicate and extend it in order to get a version that suited my needs.

#define STB_DEFINE
#include "stb.h"

unsigned int sdf_file_name_hash(char *str)
{
    unsigned int hash = 0;
    unsigned int i = 0;
    /* Hash at most the first eight characters of the name. */
    while (*str && i < 8)
    {
        hash = (hash << 7) + (hash >> 25) + *str++;
        i++;
    }
    return hash + (hash >> 16);
}

stb_declare_hash(STB_noprefix, pheadermap, pheadermap_, char *, SDF_PHEADER);
stb_define_hash_base(STB_noprefix, pheadermap, void*arena;, pheadermap_, pheadermapinternal_, 0.85f,
                     char *, NULL, STB_SDEL, stb_sdict__copy, stb_sdict__dispose,
                     STB_safecompare, !strcmp, STB_equal, return sdf_file_name_hash(k);,
                     SDF_PHEADER, STB_nullvalue, NULL);

pheadermap *pheadermap_array[SDF_FILETYPE_COUNT];

stb.h is composed of macros, so if you get an argument wrong, the C compiler more often than not gives a confusing error message. The main problem I had was not realising that the macro arguments of the form return function(arg); needed to be written out exactly that way, as a complete statement. There is a potential bug here which I am ignoring for now: the use of !strcmp, which compares full strings and so breaks the behaviour of only checking the first eight characters.
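If I do come back to that bug, the fix would probably be a comparison that stops at eight characters, so that it agrees with the hash. A minimal sketch follows; sdf_file_name_equal and SDF_NAME_LENGTH are hypothetical names of my own, and I have not tried passing this to stb_define_hash_base in place of !strcmp.

```c
#include <string.h>

#define SDF_NAME_LENGTH 8   /* assumed: only the first eight characters matter */

/* Hypothetical helper: nonzero when two names agree on their first eight
   characters, which is the same prefix the hash function looks at. */
int sdf_file_name_equal(const char *a, const char *b)
{
    return strncmp(a, b, SDF_NAME_LENGTH) == 0;
}
```

The point is that any two keys the comparison treats as equal must also hash to the same value, otherwise a lookup can miss an entry that is actually in the table.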

At one stage I moved the #include to the top of the file, which caused problems. As I am developing on Windows, stb.h defines symbols that are named the same as those which come from the standard Windows headers. If these get defined before the Windows headers are first included, then the Windows headers will choke.

With that, the file access overhead is gone, leaving a huge 75% of the time spent within the display_loadin function. All this function does is draw the "Loadin'" text and update the progress bar below it, so this is surprising.

However, this is done on every iteration of any loop that participates in updating the loading screen. By only calling this function every 1/10th of a second rather than pointlessly often, this overhead should also go away.
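The throttling itself is only a few lines. This is a rough sketch rather than the actual change: loadin_throttle and loadin_redraw_due are names I have made up, and it assumes the caller can supply a millisecond tick count from whatever timer the game already uses.

```c
#define LOADIN_INTERVAL_MS 100   /* 1/10th of a second */

/* Hypothetical state for the throttle: whether anything has been drawn yet,
   and when the last redraw happened. */
typedef struct {
    int started;
    unsigned long last_ms;
} loadin_throttle;

/* Returns 1 if the loading screen should be redrawn now, 0 if the previous
   redraw was less than LOADIN_INTERVAL_MS ago. The first call always draws. */
int loadin_redraw_due(loadin_throttle *t, unsigned long now_ms)
{
    if (t->started && now_ms - t->last_ms < LOADIN_INTERVAL_MS)
        return 0;
    t->started = 1;
    t->last_ms = now_ms;
    return 1;
}
```

Each loading loop would then check loadin_redraw_due with the current tick count and only call display_loadin when it returns 1. Keeping the timing decision separate from the drawing also makes it easy to test without a window.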

At this point the majority of the loading time is spent in external code, doing things like decompressing Ogg or JPEG files. While it took me a night to download the Visual Studio ISO to New Zealand, and a full day to get everything compiling and profiling, it was time well spent. Possible next steps are looking at Sean's Ogg Vorbis or JPEG and PNG decoders.

Edit: After reverting the upgraded VS2010 solution and trying the changes with VS2008, I see startup time decrease from 3m50s to 0m25s.
