Friday 30 January 2009

gamedev.net: Why do devs not want customer feedback?

Link: Why do devs not want customer feedback?

A pretty good thread on the value of customer feedback and the reasons it is or isn't as valuable as different factions think it is.

stimarco wrote: [Link]

I've seen behind the scenes of a certain, well-known, very popular MMO based here in the UK. The tools they have to use are just awful; easily the worst scripting language I've ever used, and I've used one which only supported single-bit variables. The company makes a big noise about its utterly irrelevant employee perks and awards for same -- baskets of fruit brought to your desk; games room; etc. -- to attract new people, but the place was like The Stepford Wives. (I was there for a two-day interview. Before the end of the first day, I already knew I didn't want the job; I'd wasted half a day trying to write a trivial game in their scripting language which I could have rattled off in 680x0 assembly language in about 30 minutes! I stuck it out because I was intrigued by the actual game design issues they faced and wanted to know more about it. They have some great people working there, but they're seriously crippled by the tool-chain, IMNSHO.)

Now, this company rakes in six million dollars a *month*, but they didn't want to replace the tool-chain with something that's actually fit for purpose, and pay some interns to convert the old scripts to the new system. The developers were well aware that their tools sucked, but the powers that be weren't willing to take a little short-term pain to radically improve their employees' productivity over the long-term. Bah!
Nice story.

Oluseyi wrote: [Link]
It takes a tremendous amount of work to obtain operable data from user feedback, even when you're sitting there looking at the user and videotaping him/her. Many companies simply don't have the infrastructure in place for this, and for products that do not generate the sorts of revenues that Microsoft had seen for Halo and Halo 2, even a scaled-down version of the test facility may be cost prohibitive.
One post stands out amongst the rest of the opinions: it references a publication detailing previous industry efforts.

Anyway, there are a lot of interesting posts with a high signal to noise ratio, including discussion of the silent majority and the vocal minority, who should have what say, and whether their perspectives are relevant.

Wednesday 28 January 2009

Databases: Information System Base Language (ISBL)

Link: Thread on reddit.com

A year ago, someone named weavejester on reddit.com posted an interesting reference to a query language called ISBL.

Some excerpts from the post to illustrate why it is interesting:

Compare this SQL query:

    SELECT DISTINCT username, comment_body FROM stories
    INNER JOIN comments ON stories.story_id = comments.story_id
    INNER JOIN users ON comments.user_id = users.user_id
    WHERE stories.story_id = 10

with the equivalent ISBL query:

    stories * comments * users % (username, comment_body) : story_id = 10
..., ISBL manages to support the whole range of relational database theory with only 6 basic table operations:
  • union +
  • difference -
  • natural join *
  • intersection .
  • projection %
  • restriction :
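To illustrate how little machinery those six operations need, here's a minimal Python sketch of my own (not from the post), modelling a relation as a set of rows, where each row is a frozenset of (attribute, value) pairs:

def union(r, s):             # ISBL: r + s
    return r | s

def difference(r, s):        # ISBL: r - s
    return r - s

def intersection(r, s):      # ISBL: r . s
    return r & s

def project(r, attrs):       # ISBL: r % (a, b, ...)
    # Duplicate rows collapse automatically, since a relation is a set.
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def restrict(r, pred):       # ISBL: r : condition
    return {row for row in r if pred(dict(row))}

def natural_join(r, s):      # ISBL: r * s
    result = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            # Rows combine when they agree on all shared attributes.
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                result.add(frozenset(dict(dx, **dy).items()))
    return result

Given stories, comments and users as relations in this representation, the example query reads as follows, with restriction applied before projection so that story_id is still available to the predicate:

joined = natural_join(natural_join(stories, comments), users)
result = project(restrict(joined, lambda row: row["story_id"] == 10),
                 {"username", "comment_body"})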
Links of interest:
  • HOPL: The History of Programming Languages entry.
  • Paper: Optimization of Single Expressions in a Relational Database System.
  • Paper: The Peterlee Relational Test Vehicle - A system overview.
  • Powerpoint presentation: CS319 Theory of Databases / Relational Database Models.

Sunday 25 January 2009

A Note on Distributed Computing

Link: A Note on Distributed Computing

The paper isn't the most straightforward read, but it has some good points to take from it, which apply as much to the game programming I do today as they did back in 1994 when it was written.

...it is a mistake to attempt to construct a system that is “objects all the way down” if one understands the goal as a distributed system constructed of the same kind of objects all the way down.
Absolutely.

When you are programming server code and you use an object which is not necessarily located on the server that code is running on, you know this; you are using it precisely because of this. When you are programming client code the same thing applies: when you use objects which are located on the server, you do so because you intend to and need to. If your code invokes other code which does something remotely and you're not aware of it, either you don't fully understand the code you are using, or you know it doesn't matter in the context of what you are doing.

Now, if you are using microthreads, and Stackless Python in particular, you could in theory decide that you don't want to allow remote calls, and simply disable the ability of anything you call to block.

Like this:

import stackless

def some_function(self):
    # ...
    # Trap any attempt by the code we call to block on a channel.
    current = stackless.getcurrent()
    old_block_trap = current.block_trap
    current.block_trap = True
    try:
        some_other_function()
    finally:
        # Always restore the previous trapping state.
        current.block_trap = old_block_trap
Doing this when you have no idea where some_other_function is going to lead is a bad idea; who knows which operation will error when it blocks the tasklet to perform some IO. But doing it when you are the operation being performed, and unexpected blocking will interfere with you, is very useful.

One case where I do this is when I run unit tests; there are two reasons to do so (a sketch of this follows the list).
  • Stackless is a framework which schedules running microthreads. Unit testing isn't the only thing going on, and if we let these other things slice in and do their own thing, there's a chance that they'll get clobbered by changes to their environment. For instance, mock objects may be sitting in place of any of the modules or resources which they are trying to access.
  • There is no reason for a unit test to actually block; the blocking resources used should be mocked out. If the blocking actually needs to happen for the testing to be done, then this is an indication that too much is being tested, or that functional testing needs to be done instead.
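Here's a minimal sketch of how I might wrap that up, assuming the standard unittest module and the block_trap attribute shown above:

import stackless
import unittest

class NonBlockingTestCase(unittest.TestCase):
    """Base class which traps blocking for the duration of each test.

    Any channel operation that would block the current tasklet raises
    an error instead of letting other microthreads get scheduled in.
    """

    def setUp(self):
        current = stackless.getcurrent()
        self._old_block_trap = current.block_trap
        current.block_trap = True

    def tearDown(self):
        # Restore whatever the trapping state was before the test ran.
        stackless.getcurrent().block_trap = self._old_block_trap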
This is a very interesting aspect of RPC to work on.
Differences in latency, memory access, partial failure, and concurrency make merging of the computational models of local and distributed computing both unwise to attempt and unable to succeed.
The paper and I both agree that pretending remote objects are local, or that they can be used the same way as local ones, is impractical and a bad idea. It's interesting to go over the different reasons it gives for why this is the case.
The most obvious difference between a local object invocation and the invocation of an operation on a remote (or possibly remote) object has to do with the latency of the two calls.
There's nothing to add here. If you are doing a remote call, you are doing it intentionally, you know the objects you are using involve it and you factor in the latency it causes.
A more fundamental (but still obvious) difference between local and remote computing concerns the access to memory in the two cases—specifically in the use of pointers.
The paper refers to this in the context of lower level, more general programming. I am talking in terms of game programming with a dynamic scripting language like Python, so the thoughts I have on this aren't directly related to the paper.

In general, for the purposes of actual game programming, it is best to just disallow remote attribute access. It is too slow for general game use, and it is easier to lock down remotely callable functions and keep in mind a model of who can access what than it is to also try to lock down remote attribute access. But when the code being run isn't tied to gameplay and security concerns don't come into play, it can be very useful functionality.

One use case is functional testing of a client running against a game server. Here, arbitrary attribute access on any object, local or remote, is a very powerful tool for invoking functionality and checking state. I could obtain an object which represents an object on the server from the server itself, or I could obtain an object which represents an object on the client which in turn represents that object on the server. That is, going through my RPC layer to the server, as compared to going through my RPC layer to the client and then through the client's RPC layer to the server. This allows me to test the server object from two different perspectives: local usage and remote usage.
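To make this concrete, here's a minimal sketch of the kind of proxy involved; the rpc connection object and its get_attribute call are assumptions of mine, not any real API:

class RemoteProxy:
    """Represents an object living on the other side of an RPC layer."""

    def __init__(self, rpc, object_id):
        self._rpc = rpc              # assumed RPC connection object
        self._object_id = object_id  # identifies the remote object

    def __getattr__(self, name):
        # Each attribute access becomes a remote round trip. For
        # attributes which are themselves remote objects, the RPC
        # layer would hand back another proxy.
        return self._rpc.get_attribute(self._object_id, name)

Chaining client to server then gives a proxy to a proxy, which is exactly the two perspectives described above.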
Partial failure requires that programs deal with indeterminacy. When a local component fails, it is possible to know the state of the system that caused the failure and the state of the system after the failure. No such determination can be made in the case of a distributed system.
Basically, when a remote operation fails, part of it might have completed, and you might not know how much. Any attempt at recovery might then proceed in a way which doesn't reflect what actually happened, making things worse.

In Python, when an error happens, you get a traceback. If these errors happen in the client, someone will eventually report one and be able to provide it to you. If they happen in your server, you can catch them, abstract and record them, raising a warning if they are happening too often. The point is that you can know for sure that partial failure is happening, and can react to it with due expediency.

The server should be authoritative. The client won't make any decisions which matter unless the server has given the thumbs up first. The server will keep the definitive state in the database, and will make important decisions by performing an atomic operation which has to reconcile with that state in the database. If the database doesn't perform the operation, the game logic doesn't proceed with the action.
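A minimal sketch of that pattern, where the db object and its atomic conditional update are hypothetical stand-ins rather than any real API:

def buy_item(db, player_id, item_id, cost):
    # The database performs the decisive, atomic state change against
    # the definitive state it holds.
    paid = db.decrement_if_at_least("players", player_id, "gold", cost)
    if not paid:
        # The database didn't perform the operation, so the game
        # logic doesn't proceed with the action.
        return False
    db.insert("inventory", {"player_id": player_id, "item_id": item_id})
    return True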

An error on the Python level will in general only affect the objects involved in the call stack, both on the called side and on the remote calling side. In the worst case, where all use of some functionality is broken, you can warn people away from using it; using it anyway may otherwise leave them as sitting ducks for players and NPCs who encounter them while they are in this state.

However, there may be errors which affect the whole server, making it unplayable for the clients connected to it. Game entities which are local to each other are more than likely managed on the same server. This means that if the server goes down for one player, it goes down for all the other players who might otherwise be able to interact with them face to face in the game, and of course for NPCs, which might decide to engage in aggression against them.
... the distinction between local and distributed objects as we are using the terms is not exhaustive. In particular, there is a third category of objects made up of those that are in different address spaces but are guaranteed to be on the same machine. ...

... Parameter and result passing can be done via shared memory if it is known that the objects communicating are on the same machine. At the very least, marshalling of parameters and the unmarshalling of results can be avoided.
This is definitely something worth keeping in mind. It is all too easy to abstract these same-machine objects away behind socket communication, pushing them through the same layer of marshalling and networking as objects that are actually remote.
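A minimal sketch of the kind of shortcut the paper suggests; the proxy attributes and helpers here are all assumptions of mine:

def invoke(proxy, method_name, *args):
    if proxy.in_process:
        # Same address space: just a plain local call.
        return getattr(proxy.target, method_name)(*args)
    if proxy.on_same_machine:
        # Different address space, same machine: hand arguments over
        # through shared memory and skip marshalling entirely.
        return proxy.shared_memory_call(method_name, args)
    # Actually remote: pay the full marshalling and networking cost.
    return proxy.connection.call(proxy.object_id, method_name,
                                 proxy.marshal(args))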

As with many other elements of a game engine, there is only so much time to work on any given aspect, and you just have to make do with what you can actually get done. It is easy to lose track of things like this.

Scaling in games and virtual worlds

Link: Scaling in games and virtual worlds

This is an article written by someone who works on Sun's Project Darkstar.

Hiding threading and distribution is, in the general case, probably not a good idea (see http://research.sun.com/techrep/1994/abstract-29.html for a full argument). Game and world servers tend to follow a very restricted programming model, however, in which we believe we can hide both concurrency and distribution.
The linked paper is quite old. I haven't read it all yet, only the first page, but I've printed it out and thrown it on that stack on the floor just over there, where there are things I found interesting enough to want to read and print out, but never quite find the time for. The gist so far is that when you can refer to anything as an object, and you have remote applications you talk to, you tend to want to make calls to the remote applications just like calls to local objects. This is actually one of the idioms of Stackless Python which I appreciate the most.

The blocking which Stackless provides can transparently block the entire call stack (whether it contains Python, C code, C++ code or whatever), allowing you to wrap asynchronous IO so that a microthread can block waiting for its IO to complete while the others continue to run in the meantime. Now, you could write your own custom API around the asynchronous IO your application uses, perhaps making it clear that you're not really using a normal blocking socket. But that just results in you writing more complex code which is tied to that unique API - you're not just writing code, you're writing code with extra boilerplate.
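As a minimal sketch of that idiom (my own, around a hypothetical callback-based asynchronous API), the wrapper blocks only the calling tasklet on a channel:

import stackless

def blocking_call(start_async_op):
    """Present an asynchronous operation behind a synchronous-looking
    API. start_async_op is assumed to take a completion callback which
    is invoked from another tasklet (e.g. an IO pump)."""
    channel = stackless.channel()
    start_async_op(lambda result: channel.send(result))
    # receive() blocks only this tasklet; the scheduler keeps running
    # the other microthreads until the callback sends the result.
    return channel.receive()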

When you can just use the same API provided for standard synchronous IO, you can get on with the business of writing code with less complication. This comes back to the reference the article makes to the paper: that hiding threading and distribution is probably not a good idea. In the case of Stackless, it certainly isn't - you need to know where the blocking happens. You're writing logic to run as microthreads; if you pretend within each microthread that the others don't exist, you're denying that you're running within a framework, and if you think that's going to work, you're in for a surprise.

I'd say that pretending the blocking isn't happening is definitely a bad idea. But making it possible to write remote calls just like local ones is a huge benefit, as long as the person writing the code understands the framework they are using. I should definitely read the whole paper and see what it actually has to say.
Measuring the performance of the system is made especially challenging by the lack of any clear notion of what the requirements of the target servers are. Game developers are notoriously secretive, and the notion of a characteristic load for a game or virtual world is not something that is well documented. We have some examples that have been written by the team or by people we know in the game world, but we cannot be sure that these are accurate reflections of what is being written by the industry.
Quite often I think about various implementation details of computer games. I visit dozens of forums, scanning what is being posted and doing searches, and can find no-one discussing anything related to them. If I post about them, there's rarely any response, and none with any insight. It's unfortunate.