Saturday 12 June 2010

Soulfu forums

The guy who hosts the Soulfu forums let the domain name expire because he didn't consider the activity within them to warrant renewing it. However, he still hosts the forums under his own domain name.

Monday 7 June 2010

Visual Studio 2010 & profiling

Not having the ability to profile in Visual Studio 2008 Professional, rather than upgrading to the Team System variant, I chose to try obtain Visual Studio 2010 Ultimate. After recompiling all my static library dependencies, I was ready to try out the profiler.


The first result was a little demoralising.


Every function in which time was being wasted, was lumped in as part of the function body of the main function. I disabled the default exclusion of small functions from instrumentation, but that merely included a raft of unimportant functions that deserved to be excluded. A bit of googling about missing functions in Visual Studio profiling yielded the following page. While not entirely relevant, it did suggest making sure that the PDB files were present. And after a bit of project configuration changes, in addition to judicious use of ProcessMonitor to make sure they were in the right place, the problem was indeed solved.



One of my original suspicions was that the file access was a cause of the slow loading process, and indeed the above report shows that 30% of the loading time is spent there. It turns out that anytime a file is accessed by name, the routines iterated over all the files in the archive until it found the one of the given type and with the given name. By adding a dictionaries for each file type, indexing the different files to a pointer to their archive header, this should just go away.

The first solution which came to mind was Sean's Tool Box, a public domain header which includes a raft of useful C functionality, including the ability to create custom dictionaries. It provides a default string to pointer dictionary stb_sdict which is almost suitable enough to use, but the original Soulfu code only matches on the first eight characters of a filename whereas this does full string comparisons and hashes. I had to replicate and extend it, in order to get a version that suited my needs.


#define STB_DEFINE
#include "stb.h"

unsigned int sdf_file_name_hash(char *str)
{
unsigned int hash = 0;
unsigned int i = 0;
while (*str && i < 8)
{
hash = (hash << 7) + (hash >> 25) + *str++;
i++;
}
return hash + (hash >> 16);
}

stb_declare_hash(STB_noprefix, pheadermap, pheadermap_, char *, SDF_PHEADER);
stb_define_hash_base(STB_noprefix, pheadermap, void*arena;, pheadermap_, pheadermapinternal_, 0.85f,
char *, NULL, STB_SDEL, stb_sdict__copy, stb_sdict__dispose,
STB_safecompare, !strcmp, STB_equal, return sdf_file_name_hash(k);,
SDF_PHEADER, STB_nullvalue, NULL);

pheadermap *pheadermap_array[SDF_FILETYPE_COUNT];
stb.h is composed of macros so if you get an argument wrong, the C compiler more often than not gives a confusing error message. The main problem I had was not realising that the macro arguments styled along the form return function(arg); needed to be specified that way. There is a potential bug here which I ignoring for now, which is the use of !strcmp which breaks the behaviour of only checking the first eight characters.

At one stage I moved the #include to the top of the file, which caused problems. As I am developing on Windows, stb.h defines symbols that are named the same as those which come from the standard Windows headers. If these get defined before the Windows headers are first included, then the Windows headers will choke.


And the file access overhead is now removed, leaving the huge 75% of time spent within the display_loadin function. All this function does is draw the "Loadin'" text and update the progress bar below it, so this is surprisingly odd.


However this is done every iteration of any loop that partipates in updating the loading screen. By only calling this function every 1/10th of a second rather than pointlessly often, this overhead should also go away.


At this point the majority of the loading time is spent in external code, doing things like decompressing ogg or jpeg files. While it took me a night to download the Visual Studio ISO to New Zealand, and a full day to get everything compiling and profiling, it was time well spent. Possible next steps are looking at Sean's ogg vorbis or jpeg and png decoders.

Edit: Reverting the upgraded VS2010 solution, and trying the changes with VS2008 I see a decreased startup time from 3m50s to 0m25s.

Sunday 6 June 2010

Visual Studio 2008 & profiling problems

The project I am currently working on is an open sourced project that was originally written for speed, at the cost of clarity and maintainability. While the changes the community have made to clean it up have served that purpose, they have also increased the time it takes to start the game from almost nothing to around four minutes.

When testing every change means sitting through this delay before the results can be seen, development becomes overly time consuming. In order to address this problem, I decided to profile starting up the game. I have Visual Studio 2008 Professional installed, and unfortunately it does not come with a profiler. If you want to profile with Visual Studio 2008, you need to go with its more extensive (and expensive) Team System variant. But there are references to installing standalone extras which in theory allow command line profiling with the less extensive variants.

The first step was downloading and installing the standalone profiler from MSDN. With this installed and a DOS window open, changing the current directory to that of the executable to be profiled. Then adding the profiling tools into the PATH environment variable, so that they could be used locally.

set path=%path%;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Team Tools\Performance Tools


Instrumenting

Profiling the executable requires that it be instrumented.

>vsinstr _SoulFu_.exe
Microsoft (R) VSInstr Post-Link Instrumentation 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.

Error VSP1011 : Unable to obtain debug information. Link with
the /Profile linker switch.

The executable that is to be profiled needs to have been linked with the /profile switch, which can be done with a simple recompile after changing the corresponding configuration option.



But now Visual Studio fails to compile the solution, with the message LNK2033: bad dll or entry point shown for several failing projects. Googling for this indicates that link.exe and msobj80.dll need to have the same version. It turns out that msobj80.dll which is located in C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE, has been clobbered by an older version during the installation of the performance tools. After more googling, I located a barely comprehensible post by an English as a second language poster, who suggested replacing it with the version in C:\Program Files\Common Files\microsoft shared\VSA\9.0\VsaEnv. Then as this Intel article indicates, killing the mspdbsrv.exe process with the task manager. After this experience, I could now recompile the solution with the required profiling switch.

Trying the instrumentation again..
>vsinstr _SoulFu_.exe
Microsoft (R) VSInstr Post-Link Instrumentation 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.

File to Process:
_SoulFu_.exe --> _SoulFu_.exe
Original file backed up to _SoulFu_.exe.orig

Successfully instrumented file _SoulFu_.exe

Profiling

The process goes along these lines, the profiler is first started, then the instrumented executable is run and exercised in the ways that would benefit from profiling, then the profiler is stopped and the resulting profiling information is processed.

Starting the profiler..
>vsperfcmd /start:trace /output:soulfu.vsp

Microsoft (R) VSPerf Command Version 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.
Running the instrumented executable..
>_SoulFu_.exe
Stopping the profiler..
>vsperfcmd /shutdown
Microsoft (R) VSPerf Command Version 9.0.30729 x86
Copyright (C) Microsoft Corp. All rights reserved.


Shutting down the Profile Monitor
------------------------------------------------------------
Processing the profiling data..
>vsperfreport soulfu.vsp /summary:all

Microsoft (R) VSPerf Report Generator, Version 9.0.0.0
Copyright (C) Microsoft Corporation. All rights reserved.

File opened
Successfully opened the file.
A report file, soulfu_Header.csv, has been generated.
A report file, soulfu_MarksSummary.csv, has been generated.
A report file, soulfu_ProcessSummary.csv, has been generated.
A report file, soulfu_ThreadSummary.csv, has been generated.
...
With the .vsp and .csv files in the same directory, PerfConsole can now be run. At its internal prompt, load soulfu.vsp gets it to load in the profiling information. However, the information it displayed was confused with the external and internal time percentages being large numbers rather than actual percentages. It also was unable to determine whether the data was intrument or sample, so defaulted to sample mode. This meant I was unable to see the call tree and other useful information, presuming that information would have been displayed correctly.

A little more searching indicated that I needed to set environment variables to indicate whether the instrumentation data was what classified as sampled or instrumented. I decided to try again, but profiling with explicit instrumentation, rather than sampling.

>VsPerfCLREnv /traceon
Enabling VSPerf Trace Profiling of managed applications
(excluding allocation profiling).

Current Profiling Environment variables are:
COR_ENABLE_PROFILING=1
COR_PROFILER={6468ec6c-94bd-40d3-bd93-4414565dafbf}
COR_LINE_PROFILING=0
COR_GC_PROFILING=0
Rerunning the profiler, the instrumented executable and then stopping the profiler gave me a new .vsp file for processing.

So I started the processing step again and waited..
>vsperfreport soulfu.vsp /summary:all

Microsoft (R) VSPerf Report Generator, Version 9.0.0.0
Copyright (C) Microsoft Corporation. All rights reserved.

File opened
Successfully opened the file.
A report file, soulfu_Header.csv, has been generated.
A report file, soulfu_MarksSummary.csv, has been generated.
A report file, soulfu_ProcessSummary.csv, has been generated.
A report file, soulfu_ThreadSummary.csv, has been generated.
I waited and waited. I left it going overnight, for over nine hours. It sat at around 50% usage of the CPU core it was running on for this entire time.


Next steps

At this point I gave up. Between the corrupt information shown by PerfConsole and the inability to generate .csv files for explicitly instrumentated profiling data I have wasted enough time on this. My next step is to just download Visual Studio 2010 Ultimate from MSDN and use that. I am going to have to waste more time rebuilding dependencies, but on the other hand I will get visual profiling tools and hopefully it will just work.