Xna Multi Threading
Example: Balls
5. Conclusions
6. Downloads
7. References

In this tutorial you will learn how to use multi-threading in your XNA games. The tutorial starts with a short introduction to multi-threading, the Xbox 360 architecture, and the advantages and disadvantages of using multi-threading for XNA games. This is followed by a very brief look at the classes and primitives that we will use in the rest of the article. After this, the main part of the tutorial focuses on using multi-threading for the main game loop. You will learn how to separate the code for drawing and updating your game and run the two tasks in parallel. There are other ways to use multi-threading in your games, but they are not covered by this tutorial. As a closing word, we will draw some conclusions and look at future developments.

1. Introduction

As game developers, we always want our games to be better. We want better graphics, better physics, better A.I., and so on, just to make our customers happy. However, we always have so little time to do all of this. I'm not talking about implementation time, which is of little interest to our players, but about running time. Players are really picky. They expect the games they play to run smoothly, which means our games should run at 30 or even 60 FPS (frames per second). This leaves us with 33.33 milliseconds per frame at 30 FPS, and only 16.66 milliseconds at 60 FPS, in which we have to squeeze all the physics, graphics, gameplay and A.I. we want. For some games this is more than enough. For others, it is a painfully low threshold.

So what's this about multi-threading? Normally, your game starts and runs on a single thread, so all the tasks run sequentially, one after the other. Thus, the total running time of a frame is equal to the sum of the running times of all the tasks done during that frame.
Using multiple threads would mean taking some of these tasks, and running them in parallel, at the same time as other tasks are running. In this configuration, the total running time of a frame will be roughly equal to the running time of the slowest of these parallel sets of tasks. Thus, the overall frame time is lower than when using a single thread, so the performance of the game is higher, yielding better framerates and smoother animations.
There are a few important things you need to remember after looking at this picture.
o Not all tasks need to be parallelized. In this example, we left the input handling running serially. Maybe both game logic and animations need the input, or maybe you just feel better having a certain task run on its own, without anything else running in parallel. Just remember that it can be done, and in some cases you will probably want this to happen.
o The tasks need to be independent from one another. As they are running in parallel, it's not that easy for them to communicate, so we need some mechanism to pass data between parallel tasks. Ideally, you would have completely independent tasks, but in a game this is rarely possible. Animation needs physics and A.I. data, A.I. also needs physics, rendering needs information from all the other processes, and so on. We will look at this later in the article.
o One of the most important things to remember is that the speed gained by using multi-threading is still limited by the speed of the slowest branch. In the example, even though we use two threads, the frame time is not quite half of the initial frame time. Even if we create a separate thread for each task, the running time of the frame will be at least the running time of the physics task (the slowest in our example). Also, communication and synchronization between threads introduce some overhead, which adds to the total frame time.

Now let's see where we will run these threads. In recent years, PCs have evolved and are now equipped with processors having 2 or 4 cores. This means that we can have up to 4 threads running completely independently of one another, without sharing processor time. Of course, no one is limiting you to one thread per core, but it's better to do so for computationally intensive tasks.

The Xbox 360 has a different architecture. It is equipped with a special IBM processor with 3 cores, each of them capable of running two independent hardware threads. So in total, we have 6 hardware threads on the 360.
Unfortunately, two of them are reserved, and cannot be used by us, as they are used by the XNA Framework, and other system tasks. But having 4 hardware threads is not that bad either. Below we can see what these threads are.
Hardware thread   Core   Notes
0                 0      Not available. Reserved for the XNA Framework.
1                 0      Available.
2                 1      Not available. Reserved for the XNA Framework.
3                 1      Available.
4                 2      Available.
5                 2      Available.

So what are the advantages of using multi-threading for your games? The main advantages are greater performance, higher frame rates, and the possibility of adding more complex simulations for physics, A.I., or other things.

The main disadvantage is the added complexity needed to write your game for multiple threads. Many times, your game will run just fine on a single thread. Separating tasks and taking care of all the issues associated with multi-threading is a complex job that requires complex data structures and synchronization code, and it may not be worth the effort. You need to have a good understanding of how threads and memory sharing work, or else your multi-threaded code might run even slower than your single-threaded code.

Another major drawback is that multi-threaded code is very hard to debug. The errors that occur because of the interaction between threads are very difficult to reproduce and identify, and thus difficult to remove.

With that said, let's go on and see what classes and keywords you need to understand in order to follow this tutorial.

2. A Look at Multi-Threading Primitives
This section will not be a complete guide to threading in C#. For that, you can read a free ebook with exactly that name: Threading in C#. But I also can't simply give you that link and go on with the article, so I'll cover the primitives I'll be using throughout the article.

In C#, the Thread class creates and controls a thread, sets its priority and reads its status. When creating an instance of the Thread class, we need to pass a ThreadStart or a ParameterizedThreadStart delegate to its constructor. A ThreadStart delegate represents a method with no arguments that will be run on the thread. For example, if we have a method a(), we can create a new Thread on which to run that method with the following code.

    void a()
    {
        [...]
    }
    [...]
    ThreadStart threadStart = new ThreadStart(a);
    Thread newThread = new Thread(threadStart);
    newThread.Start();

or simply

    Thread t = new Thread(new ThreadStart(a));
    t.Start();
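To see the pieces working together, here is a minimal console sketch of my own (it is not part of the tutorial's sample project). The names ThreadStartSketch, A and result are placeholders I made up; the only calls it relies on are Thread.Start() and Thread.Join(), the latter being used to wait for the worker before reading its result.

```csharp
using System;
using System.Threading;

static class ThreadStartSketch
{
    // Written by the worker thread, read by the caller only after Join().
    static int result;

    // Stand-in for the method a() from the text: any parameterless
    // void method can be wrapped in a ThreadStart delegate.
    static void A()
    {
        result = 6 * 7;
    }

    public static int Run()
    {
        result = 0;
        Thread worker = new Thread(new ThreadStart(A));
        worker.Start();
        // Join() blocks until A() has returned, so reading result is safe here.
        worker.Join();
        return result;
    }

    static void Main()
    {
        Console.WriteLine("Worker computed: " + Run()); // Worker computed: 42
    }
}
```

Without the Join() call, the main thread could read result before the worker has written it, which is exactly the kind of problem the next primitives address.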
But simply starting functions on threads is not enough. Data that can be accessed by more than one thread at once needs to be protected. To do this, the lock statement can be used. The lock keyword marks a statement block as a critical section by obtaining the mutual-exclusion lock for a given object, executing the block, and then releasing the lock. In other words, while a thread is in a section of code protected by a lock on an object, no other thread may enter any area of code protected by a lock on the same object.

    lock (lockObject)
    {
        // This code, and all other code surrounded by a lock on lockObject,
        // can only be executed by one thread at a time. All other threads
        // will be blocked until the thread holding the lock leaves the block.
    }
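As an illustration, the sketch below has two threads increment a shared counter. It is only a small console example written for this section, with made-up names (LockSketch, AddMany, counter). Because every increment happens inside the lock, no update is lost and the final value is exactly the number of increments performed.

```csharp
using System;
using System.Threading;

static class LockSketch
{
    static readonly object lockObject = new object();
    static int counter;

    static void AddMany()
    {
        for (int i = 0; i < 100000; i++)
        {
            // Only one thread at a time can be inside this block, so the
            // read-modify-write of counter++ is never interleaved.
            lock (lockObject)
            {
                counter++;
            }
        }
    }

    public static int Run()
    {
        counter = 0;
        Thread first = new Thread(new ThreadStart(AddMany));
        Thread second = new Thread(new ThreadStart(AddMany));
        first.Start();
        second.Start();
        first.Join();
        second.Join();
        return counter;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 200000
    }
}
```

If you remove the lock, the program still compiles and runs, but the result will usually be lower than 200000, and different on each run.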
The last class I will briefly explain is the AutoResetEvent. This class allows threads to communicate with each other using signals. In order to start waiting for a signal, a thread calls the WaitOne() function of an AutoResetEvent object. If the event has already been signaled, the thread resumes execution immediately. Otherwise, the thread is blocked and waits until someone signals the AutoResetEvent by calling the Set() function. The class is called AutoResetEvent because, when a thread that was waiting for a signal is released, the AutoResetEvent is automatically reset and enters the non-signaled state. The following code shows how to declare and use an AutoResetEvent.

    AutoResetEvent myEvent = new AutoResetEvent(false); // you can specify the initial state through the constructor's parameter
    [...]
    // when a thread tries to enter a protected region of code, it calls WaitOne()
    myEvent.WaitOne();
    [...]
    // the thread stays blocked until some other thread signals the event using
    myEvent.Set();
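The signaling behaviour, including the automatic reset, can be tried out with a small console sketch like the following. The names (AutoResetEventSketch, dataReady, sharedValue) are mine and purely illustrative. A producer thread prepares a value and signals; the main thread waits for the signal, reads the value, and then checks with a zero-timeout wait that the event has gone back to the non-signaled state.

```csharp
using System;
using System.Threading;

static class AutoResetEventSketch
{
    static readonly AutoResetEvent dataReady = new AutoResetEvent(false);
    static int sharedValue;

    static void Producer()
    {
        sharedValue = 123;  // prepare the data first...
        dataReady.Set();    // ...then signal whoever is waiting for it
    }

    public static int Run(out bool stillSignaled)
    {
        sharedValue = 0;
        Thread producer = new Thread(new ThreadStart(Producer));
        producer.Start();

        // Blocks until Producer() calls Set(). If Set() was already called,
        // the event is still signaled and WaitOne() returns immediately.
        dataReady.WaitOne();
        int seen = sharedValue;
        producer.Join();

        // Releasing the waiter reset the event automatically, so a wait
        // with a zero timeout returns false instead of succeeding.
        stillSignaled = dataReady.WaitOne(0);
        return seen;
    }

    static void Main()
    {
        bool stillSignaled;
        int value = Run(out stillSignaled);
        Console.WriteLine(value);          // 123
        Console.WriteLine(stillSignaled);  // False
    }
}
```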
These are the most important primitives we will use. For more details, there are lots of places where you can read about threading primitives, the most important being MSDN and the free ebook mentioned above.

3. Multi-threading Update/Draw

The most popular multi-threading task that we hear about in games is the separation of updating and rendering the game world onto two different threads. The reason for this is obvious. When using a single thread, after the state of the world is updated, we have to wait until all drawing is done in order to update again. This seems like a waste, because the update computations don't need any information from the rendering computations. With multi-threading, while the system is busy drawing the current scene, we can use the other available processor cores to compute the next state of the world. This way, the update computations no longer have to wait, and by executing them in parallel with the rendering code, we reduce the overall time of a frame. But if this sounds so simple in theory, why are there so few examples of how to actually do it?

3.1. Double Buffering

The main problem with separating Update/Draw is the fact that the rendering process needs the data computed by the update process. Moreover, we need to make sure that the data is correct and consistent for all entities that are drawn. To do this, we have to use synchronization primitives. But we also want to avoid waiting inside locks until each piece of data is processed, or else all the waiting will reduce our performance instead of increasing it.

The solution to this is the concept of double buffering. The update thread computes the world state in an area of memory (a buffer) containing information about the game world. When it is done, the rendering thread begins rendering the world using the information from this buffer, and at the same time the update thread begins computing the next state of the world in another buffer. When both processes are done, they simply switch the buffers they use. The illustration below should help clarify this.
At frame 1, the rendering thread draws the world using the state stored in Buffer 0. Meanwhile, the update thread computes the next state of the world and writes it to Buffer 1. When frame 2 starts, the rendering thread starts drawing the world using the new state stored in Buffer 1, and the update thread can start computing the next state in Buffer 0. At the start of the 3rd frame, they switch again, and so on. So at each step, the rendering thread uses the state of the world computed by the update thread in the previous frame, while the update thread computes a new state and stores it in the buffer which is not currently used for rendering.

While the main idea seems simple, this is where things start to get complicated. Let's analyze this. What is the data contained in the buffers, and how do we organize it? The most natural way of thinking, especially if you've done Object Oriented Programming before, is to say: "So this is an entity in our game. It needs some data for physics, like acceleration, velocity, collision primitives, positions, rotations. It also needs some data for animation, like bones, movement constraints, etc. To draw the object, we use the same positions and rotations which also describe the physics of the object. And not to forget the game-specific data, like health points, A.I. scripts, etc." And then you go on and create a GameEntity class which describes your object.

Now where do we keep all these entities? The worst thing you can do is consider the buffers as the main structure for holding game data. When the update thread needs to compute the next state of the world, it needs to know the current state. If the state of the game is only held in these buffers, then the current state is, at that moment, in the buffer that the rendering thread uses. The buffer the update thread holds contains the state from the frame before that, and is about to be overwritten with a state two frames newer.

So the next decision we can make is to take all the game data and store it in some other place in memory. Then, during the update function, a new state is computed for this data and is written to the buffer we have access to. This way, the update thread always has the most current state of the world.

Some of you may have noticed one thing we could do to improve performance. Not all objects change their state each frame, so why write them all to the buffer? Why not simply update the buffers only at those positions corresponding to the objects that have changed? To understand why this is not a good solution, take a look at the following sequence of events. We assume the data is currently identical in both buffers, and represents the current state of the game. We are ready to start frame number k.
In frame k, the update thread sees that some of the objects (1 and 3) have changed their states. The changes are written to Buffer 0 (currently owned by the update thread), and then the buffers are swapped. In frame k+1, the update thread detects changes in the states of some objects (2 and 3) and writes the updates to Buffer 1, while the render thread draws the contents of Buffer 0. So far so good.

Now, after the buffers are swapped and frame k+2 starts, we have a problem. When the rendering thread draws object 1 from Buffer 1, the state of that object is not the most recent one. Its state is actually two frames old, and does not contain the changes made during frame k. The states of objects 2 and 3 are good, because their most recent changes were made while the update thread was controlling this buffer. You can easily see that, even if we render the correct states for objects 2 and 3 this frame, in the next frame, when the rendering thread reads from the other buffer, it will be reading older states for them.

So by trying to save time doing in-place updates, we introduced some complex bugs which might give us headaches. There's got to be a better solution, right? Fortunately, there is. Next we will look at the solution proposed in a Gamefest presentation given by Ian Lewis, and then implement it using the XNA Framework.

3.2. Change Buffers
The basic idea is the following. Use different data structures for the update thread and the render thread. Usually, during rendering, you only need a subset of the data contained in a game entity: the model and textures, the World matrix, the animation bones, etc. So for each object, besides the main entity data, we can use a smaller structure which holds only the render data. This render data duplicates some of the fields in the normal entity data. The update thread only works with the entity data, while the render thread only works with the render data. All that is left to do is keep these structures synchronized. Sound familiar? This is what the double buffer was for. But now, instead of using the buffers to hold all the entity data, we will use them only to notify the render thread of changes in an object's state. The buffers will be used as a sort of "message" buffers, where each "message" describes what has changed about an object.
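Before walking through the frames, here is a rough single-threaded simulation of the idea, so that only the data flow is visible and no synchronization is involved. Everything in it is an assumption made for illustration: the ChangeMessage layout (an entity index plus a new position), the use of plain integer positions, and the choice to store messages in a List. Objects 1, 2 and 3 from the text correspond to array indices 0, 1 and 2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message: "the entity at Index now has position X".
struct ChangeMessage
{
    public int Index;
    public int X;
    public ChangeMessage(int index, int x) { Index = index; X = x; }
}

static class ChangeBufferSketch
{
    public static int[] Run()
    {
        int[] entityX = { 0, 0, 0 };  // full game state, touched only by "update"
        int[] renderX = { 0, 0, 0 };  // render copy, touched only by "render"
        List<ChangeMessage>[] buffers =
            { new List<ChangeMessage>(), new List<ChangeMessage>() };
        int updateBuffer = 0, renderBuffer = 1;

        // Frame k moves objects 1 and 3, frame k+1 moves objects 2 and 3,
        // frame k+2 moves nothing.
        int[][] movedPerFrame = { new[] { 0, 2 }, new[] { 1, 2 }, new int[0] };

        foreach (int[] moved in movedPerFrame)
        {
            // Render side: apply the notifications written last frame,
            // then (in a real game) draw using renderX.
            foreach (ChangeMessage m in buffers[renderBuffer])
                renderX[m.Index] = m.X;

            // Update side: clear its buffer, advance the state, and
            // record a message for every entity that changed.
            buffers[updateBuffer].Clear();
            foreach (int i in moved)
            {
                entityX[i] += 10;
                buffers[updateBuffer].Add(new ChangeMessage(i, entityX[i]));
            }

            // End of frame: the two buffers trade roles.
            int tmp = updateBuffer;
            updateBuffer = renderBuffer;
            renderBuffer = tmp;
        }
        return renderX;
    }

    static void Main()
    {
        int[] r = Run();
        Console.WriteLine(r[0] + ", " + r[1] + ", " + r[2]); // 10, 10, 20
    }
}
```

After the third frame the render copy matches the game state exactly, including the change to object 1 that the in-place scheme lost.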
As you can see, in frame k, the update thread works on the objects and changes the states of some of them (1 and 3). Because these objects have changed, the update thread enters notifications into the message buffer. These notifications have to contain enough information to reflect the new state of these objects. Then, the buffers are swapped.

In frame k+1, the update thread clears its buffer, and then proceeds to update the game state. During this, it observes that the states of objects 2 and 3 have changed. It writes notifications about these changes to the buffer. Meanwhile, the rendering thread reads its buffer, where it sees the notifications about objects 1 and 3. It uses these notifications to update the render data of these objects, and then starts rendering the scene. When both are finished, we swap the buffers.

This was the step where the in-place update method failed. But now, as the update thread does its next update, the render thread looks at its buffer. It sees that the states of objects 2 and 3 have changed, and makes the appropriate changes to the render data of these objects. The data held by the rendering thread is consistent and correct, and the objects are drawn in their correct states. And this will continue to be the case in the next frame as well. So because each thread has its own copy of the data, and the buffers are only used to pass messages about changes to this data, everything stays consistent.

But what about memory? Doesn't this approach use much more memory than the previous ones? Actually, it does not. The game data used by the update thread is about the same size as one of the buffers used in the previous methods. The render data is much smaller than the game data, so in total, the game data plus the render data is significantly smaller than two full buffers. There are also the buffers storing the changes, but these should be fairly small. So all in all, we use less memory. And because less data is written to and read from memory each frame, the overall process should be faster. The downside is the increased complexity of the operations used to transmit the updates.

3.3. Implementation

And now the fun part begins. It's going to take a while, and more theoretical explanations will be given, but we'll get there. If you are too eager to run or see the code, check the end of this chapter, where an example is provided.
For those of you interested in bearing with me through my explanations, continue reading. First, here are some things that we need to take into consideration:

o The Game class does some GraphicsDevice handling behind the scenes for us, so if we want to avoid issues, or prefer not to rewrite those things ourselves, we will keep the rendering on the main Game thread (hardware thread 1 on the Xbox).
o This means that we will move the Update processing to another thread. For this we could use thread 3, 4, or 5.
o Some data is needed by both threads, such as the current GameTime. We will get and store this value in a shared location before the threads start executing each frame, and provide it to them as needed. The same mechanism could be used if we find other data that needs to be identical for both threads.

3.3.1. Classes

Before we start coding, let's think about the classes we need and put them down as a class diagram.

o Obviously we will need a class for the game data of each entity. As discussed earlier, we will also need a class to hold the rendering data. The actual fields of these classes depend highly on the type of game you're making. For some organization, we will have two classes called UpdateManager and RenderManager, which will contain arrays of the two classes mentioned earlier. For this article, we will consider that an entity is identified by its position in these arrays. So the object at position 4 in the UpdateManager's list of game data corresponds to position 4 in the RenderManager's list of render data. This could be replaced with some sort of identification scheme, using globally unique identifiers for objects and hash tables or dictionaries for finding an object with a certain ID, but we use this method for simplicity and ease of indexing.
o The change messages will be stored in structures which we will call ChangeMessage. We will discuss the implementation of these structures in detail a little bit later. Until then, we decide to store these messages in collections called ChangeBuffers. And because we want to use double buffering, we will define a class called DoubleBuffer, which will contain two ChangeBuffers and give each of them to the update or render thread, as requested.

Right now, a general diagram of this system looks like this:
Now we will start implementing these classes.

3.3.2. Threading and Synchronization

As you will see in the next few paragraphs, there's really not much threading code to write. The most important parts of a multi-threaded game are the planning and the data structures, plus a few carefully picked lines of code for synchronization. So we only need to: synchronize the update and render threads at the start and end of each frame, make sure that the correct buffer is accessed by each thread, and lock objects that may be accessed by both threads at the same time. Fortunately, because of how we set up the data structures, and how we time the access to these structures, we will only need a few synchronization instructions.

For good reasons (like debugging), I prefer to keep all threading and synchronization code in a single class, and this class will be the DoubleBuffer class. There is a sequence of steps that we will do each frame:

o A new frame starts
o If there are any operations that we want to do while a single thread is active, we do them now
o We swap the buffers, to prepare the previous update buffer as the new render buffer
o We signal the update and render threads that we are ready to start processing a new frame
o The update and render threads request their current active buffer, which they will use this frame
o The update and render threads do their computations
o We synchronize the threads by waiting for both of them to finish
o The current frame is done

The DoubleBuffer class needs to contain several fields. We need an array of two ChangeBuffers. We will use two integers that hold the index of the buffer currently used for rendering and the index of the buffer currently used for updating. We declare these as volatile, to make sure we always get the correct values and caching doesn't play tricks on us. You read earlier about AutoResetEvents. We will use four AutoResetEvents to signal the beginning, and wait for the end, of the render thread's and the update thread's work. Lastly, we will add a field to store this frame's GameTime, which we might need in both threads. This field might be read at the same time by both threads, but it will always be written before the threads start their execution, so we don't need a lock, even for this field. But we do need to make it volatile, to make sure it is not cached.

    class DoubleBuffer
    {
        private ChangeBuffer[] buffers;
        private volatile int currentUpdateBuffer;
        private volatile int currentRenderBuffer;

        private AutoResetEvent renderFrameStart;
        private AutoResetEvent renderFrameEnd;
        private AutoResetEvent updateFrameStart;
        private AutoResetEvent updateFrameEnd;

        private volatile GameTime gameTime;

        // the constructor and methods shown in the rest of this section go here
    }
Next, we initialize these values in the constructor, write a function that resets all fields to their starting values, and write a function that cleans up when we're done and releases system resources.

    public DoubleBuffer()
    {
        //create the buffers
        buffers = new ChangeBuffer[2];
        buffers[0] = new ChangeBuffer();
        buffers[1] = new ChangeBuffer();

        //create the WaitHandles
        renderFrameStart = new AutoResetEvent(false);
        renderFrameEnd = new AutoResetEvent(false);
        updateFrameStart = new AutoResetEvent(false);
        updateFrameEnd = new AutoResetEvent(false);

        //reset the values
        Reset();
    }

    public void Reset()
    {
        //reset the buffer indices
        currentUpdateBuffer = 0;
        currentRenderBuffer = 1;

        //set all events to non-signaled
        renderFrameStart.Reset();
        renderFrameEnd.Reset();
        updateFrameStart.Reset();
        updateFrameEnd.Reset();
    }

    public void CleanUp()
    {
        //release the system resources held by the events
        renderFrameStart.Close();
        renderFrameEnd.Close();
        updateFrameStart.Close();
        updateFrameEnd.Close();
    }
The function that swaps the buffers only needs to switch the values stored in the currentUpdateBuffer and currentRenderBuffer variables. Since we will only call this function at times when execution is done on a single thread, before the signal that starts the threads is sent, we don't need to lock anything. And the fact that these fields are declared as volatile ensures we won't have any problems due to processor caching.

    private void SwapBuffers()
    {
        currentRenderBuffer = currentUpdateBuffer;
        currentUpdateBuffer = (currentUpdateBuffer + 1) % 2;
    }
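To convince yourself that the two indices really alternate and never point at the same buffer, you can run the swap logic on its own. The sketch below copies the two assignments from SwapBuffers() into a small console program; the SwapSketch class and its Run() helper are mine, added only for this illustration, and the fields are plain ints because everything runs on one thread.

```csharp
using System;

static class SwapSketch
{
    static int currentUpdateBuffer;
    static int currentRenderBuffer;

    // Same two assignments as DoubleBuffer.SwapBuffers() in the text.
    static void SwapBuffers()
    {
        currentRenderBuffer = currentUpdateBuffer;
        currentUpdateBuffer = (currentUpdateBuffer + 1) % 2;
    }

    // Returns one "update=U render=R;" entry per simulated frame.
    public static string Run(int frames)
    {
        // Same starting values as DoubleBuffer.Reset().
        currentUpdateBuffer = 0;
        currentRenderBuffer = 1;

        string log = "";
        for (int frame = 0; frame < frames; frame++)
        {
            SwapBuffers();
            log += "update=" + currentUpdateBuffer
                 + " render=" + currentRenderBuffer + ";";
        }
        return log;
    }

    static void Main()
    {
        // update=1 render=0;update=0 render=1;update=1 render=0;
        Console.WriteLine(Run(3));
    }
}
```

Note that the first swap happens before the first frame is processed, so the very first update goes to buffer 1 and the first render reads the still-empty buffer 0.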
Next, we write a function that represents the start of multi-threaded processing, and call it GlobalStartFrame(). This function receives the current GameTime as a parameter, so it can store its value and make it available to the other threads. We also need a function that waits for both the render and update threads to finish, and only then returns to normal execution.

    public void GlobalStartFrame(GameTime gameTime)
    {
        this.gameTime = gameTime;
        SwapBuffers();

        //signal the render and update threads to start processing
        renderFrameStart.Set();
        updateFrameStart.Set();
    }

    public void GlobalSynchronize()
    {
        //wait until both threads signal that they are finished
        renderFrameEnd.WaitOne();
        updateFrameEnd.WaitOne();
    }
The last functions we need to add to this class are the functions that will be called by the render and update threads to get references to their current buffers, and the functions called by these threads when they are done with processing. When one of the threads calls one of the start methods, it starts by waiting on the corresponding WaitHandle until it is signaled by the GlobalStartFrame() function. After this signal is received, we know that the required values are initialized correctly, so we can pass them to the calling thread through the out parameters and then return, allowing the calling thread to resume execution. The ending functions simply set the updateFrameEnd and renderFrameEnd events, so the GlobalSynchronize() function can continue.

    public void StartUpdateProcessing(out ChangeBuffer updateBuffer, out GameTime gameTime)
    {
        //wait for start signal
        updateFrameStart.WaitOne();
        //get the update buffer
        updateBuffer = buffers[currentUpdateBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void StartRenderProcessing(out ChangeBuffer renderBuffer, out GameTime gameTime)
    {
        //wait for start signal
        renderFrameStart.WaitOne();
        //get the render buffer
        renderBuffer = buffers[currentRenderBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void SubmitUpdate()
    {
        //update is done
        updateFrameEnd.Set();
    }

    public void SubmitRender()
    {
        //render is done
        renderFrameEnd.Set();
    }
Now all the synchronization code and threading primitives that we use are encapsulated inside this class. The sequence we mentioned above now becomes something similar to the following:

o At the beginning of the game, or after finishing the last frame, the update thread and render thread call the functions StartUpdateProcessing() and StartRenderProcessing(), declaring that they are ready to start and are waiting for their data. Because the renderFrameStart and updateFrameStart events are not set, they go to sleep waiting for these events.
o Somewhere in our Game class's code, we call the GlobalStartFrame() function, which swaps the buffers and stores the gameTime, preparing all the data that will be given to the render and update threads. After this, it sets the events renderFrameStart and updateFrameStart.
o At this moment, the render and update threads, which are waiting in their start functions, wake up and return from those functions. The update thread starts computing a new game state, while the render thread starts drawing the scene.
o Back in our Game class, after we have called the GlobalStartFrame() function, we call the GlobalSynchronize() function, which begins waiting for the render and update threads to finish by waiting on the renderFrameEnd and updateFrameEnd events.
o When the update and render threads are done, they call SubmitUpdate() and SubmitRender(), respectively.
o At this moment, the GlobalSynchronize() function wakes up and returns, so the Game class can leave the XNA Framework to do whatever it needs to do before starting a new frame.

At first look, this seems fine, since the buffer swapping is done while a single thread is running, and the update and render threads never work on the same buffer. It almost seems too nice to be true. And indeed it is. While the threads never work on the same buffer, some problems can still appear because of caching. When the update thread finishes its work, the data does not always go directly into main memory.
The processor caches data, and delays writing it to main memory in order to improve performance. But this means that some data may not reach main memory before the render thread tries to read it, so the render thread will get old data. The same thing can happen on the render thread. The processor caches the data it reads, so when the thread reads from the buffer, what it sees might not match the contents of main memory. So, again, we could get old data.

The solution is to make sure that everything one thread has written is visible to the other thread before that thread starts reading. To do this, we use the function Thread.MemoryBarrier(). It inserts a full memory fence: reads and writes issued before the call cannot be reordered past it, and reads and writes issued after it cannot be moved before it. In theory, it should be sufficient to add this on the update thread, right after it is done computing the new state, and on the render thread, just before it begins consuming the message buffer. To be extra safe, we will add it in both the update and render threads, at the beginning and at the end of the computations.

As a side note, using a lock over the whole update and render code would automatically do this for us, because acquiring and releasing a lock implies the necessary memory barriers. However, it seems a little odd to use a lock when the buffer data will never be accessed by both threads at the same time. So Thread.MemoryBarrier() will suffice. The last four functions we talked about are modified as shown below:

    public void StartUpdateProcessing(out ChangeBuffer updateBuffer, out GameTime gameTime)
    {
        //wait for start signal
        updateFrameStart.WaitOne();
        //ensure cache coherency
        Thread.MemoryBarrier();

        //get the update buffer
        updateBuffer = buffers[currentUpdateBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void StartRenderProcessing(out ChangeBuffer renderBuffer, out GameTime gameTime)
    {
        //wait for start signal
        renderFrameStart.WaitOne();
        //ensure cache coherency
        Thread.MemoryBarrier();

        //get the render buffer
        renderBuffer = buffers[currentRenderBuffer];
        //get the game time
        gameTime = this.gameTime;
    }

    public void SubmitUpdate()
    {
        //ensure cache coherency
        Thread.MemoryBarrier();
        //update is done
        updateFrameEnd.Set();
    }

    public void SubmitRender()
    {
        //ensure cache coherency
        Thread.MemoryBarrier();
        //render is done
        renderFrameEnd.Set();
    }
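The whole handshake can be tried outside of XNA with a stripped-down console sketch. This is not the DoubleBuffer class itself: the two ChangeBuffers are replaced by two plain integers, GameTime is replaced by a frame counter, the main thread plays both the synchronizing role and the render role (so only the two update events are needed), and all names other than the event and index fields are my own placeholders.

```csharp
using System;
using System.Threading;

static class FrameHandshakeSketch
{
    static readonly AutoResetEvent updateFrameStart = new AutoResetEvent(false);
    static readonly AutoResetEvent updateFrameEnd = new AutoResetEvent(false);
    static readonly int[] buffers = new int[2];  // stand-ins for the two ChangeBuffers
    static int currentUpdateBuffer, currentRenderBuffer;
    static int frameNumber;                      // stand-in for GameTime
    static bool running;

    static void UpdateLoop()
    {
        while (true)
        {
            updateFrameStart.WaitOne();  // StartUpdateProcessing: wait for start signal
            Thread.MemoryBarrier();
            if (!running) return;
            buffers[currentUpdateBuffer] = frameNumber * 100;  // "compute" the new state
            Thread.MemoryBarrier();
            updateFrameEnd.Set();        // SubmitUpdate: update is done
        }
    }

    // Returns what the "render" side saw in each frame.
    public static int[] Run(int frames)
    {
        currentUpdateBuffer = 0;
        currentRenderBuffer = 1;
        buffers[0] = 0;
        buffers[1] = 0;
        running = true;
        int[] rendered = new int[frames];

        Thread updateThread = new Thread(new ThreadStart(UpdateLoop));
        updateThread.Start();

        for (int frame = 1; frame <= frames; frame++)
        {
            // GlobalStartFrame: the update thread is waiting, so this part
            // effectively runs single-threaded.
            frameNumber = frame;
            currentRenderBuffer = currentUpdateBuffer;
            currentUpdateBuffer = (currentUpdateBuffer + 1) % 2;
            updateFrameStart.Set();

            // "Render": read the state the update thread produced last frame.
            rendered[frame - 1] = buffers[currentRenderBuffer];

            // GlobalSynchronize: wait for the update thread to finish.
            updateFrameEnd.WaitOne();
        }

        running = false;
        updateFrameStart.Set();  // wake the update thread so it can exit
        updateThread.Join();
        return rendered;
    }

    static void Main()
    {
        int[] r = Run(3);
        Console.WriteLine(r[0] + ", " + r[1] + ", " + r[2]); // 0, 100, 200
    }
}
```

The output shows the one-frame delay that double buffering introduces: in each frame the render side sees the value written by the update side during the previous frame.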
I hope this helps you form a good idea about what actually happens with these synchronization primitives.

At this point, the watchful reader might have noticed that, as currently explained, this system actually uses three threads: one for rendering, one for updating, and one that only calls the GlobalStartFrame() and GlobalSynchronize() functions to synchronize everything. I made the explanations this way on purpose, because it is easier to understand like that. In the example presented later in this tutorial, the render thread and the synchronization thread will be a single thread (the main Game thread), while the updating code will run on a separate thread. The code we will actually use in the Game class will be something like this:

o Call GlobalStartFrame()
o Execute rendering code
o Call GlobalSynchronize()

So in truth, the AutoResetEvents used for rendering are not necessary, but I put them in there for a clearer image of how things work. Feel free to leave them out in your own code. The question remains of how you make the Update function start executing on another thread, but we will get to that later.

3.3.3. Change Buffers and Change Messages

This is another place where things get interesting, especially if you target the Xbox. As you probably know by now, when using XNA Game Studio, it is vital to keep the garbage generated each frame to the minimum. But our architecture requires the creation of ChangeMessages each frame. And sometimes we will create lots of change messages, depending on what happens in the scene. It should be obvious that the ChangeMessage data type cannot be a class, and has to be a structure, because structures are value types: creating one does not allocate a separate object on the heap, so they leave nothing for the garbage collector to clean up.

However, another problem arises. We will usually have more than one message type. For example, some messages will deal with updating the world matrix of an entity, others with updating some other data needed for rendering, such as highlight colors, states of an entity, or animation bones. If we were to use classes, we could have a base class for a message, and a subclass for each type of message. If we didn't have to take care of garbage, this solution might be preferred, but as it is, we have to work with structures. One solution is to create a structure containing all the possible variables that we might need to pass from the update thread to the render thread. This is clearly not a good solution, because such a structure would be large, and most of its space would be wasted in any given message. The solution we will use was inspired by Frank Savage's presentation at Gamefest 2008, regarding performance in XNA Game Studio.
In his presentation, he shows us how unions are possible in C#. I know this sounds crazy (I thought the same thing), but it is actually possible. Some of you may not know what a union is. A union is a data structure that stores one of several types of data in a single memory location. For example, if we declare a union to contain an int and a float, both these fields would reside in the same place in memory. The size of an int is 4 bytes, the size of a float is 4 bytes, but the size of the union is still 4 bytes (unlike a struct, where the size would have been 8 bytes). So when assigning a value to the int of the union, or to the float of the union, both operations write to the same memory location. While this may not seem useful at first, it actually is for our scenario. While the structure of the message remains the same, we can interpret the data contained in it as whatever type of message we need. I hope this is not too confusing, but if it is, you'll probably understand better after we have some code, a little later. Before moving forward, let's make some decisions. For this tutorial, we will consider we could have the following message types:
o UpdateCameraView, which we use to give the render thread a new View matrix to be used with the camera
o UpdateWorldMatrix, which we use to update the World matrix of an object
o UpdateHighlightColor, which we use to update the highlight color of an object
o CreateNewRenderData, which we use to signal the render thread that a new object has been created, and pass the new RenderData to it
o DeleteRenderData, which we use to signal the render thread that a certain object has been destroyed, and doesn't need to be rendered any more

I'm sure you can think of other types of messages, depending on your game and scene, but this will suffice to illustrate our method. (Not all of these will be used in the example, but having them in the explanation helps.) We create an enumeration to hold all these types of messages.

    public enum ChangeMessageType
    {
        UpdateCameraView,
        UpdateWorldMatrix,
        UpdateHighlightColor,
        CreateNewRenderData,
        DeleteRenderData,
    }
Next we will define the structure of a message. To make a structure behave like a union, there are some steps we have to take. First, we need to add the [StructLayout(LayoutKind.Explicit)] attribute to the structure's declaration. This allows us to specify, for each field, the offset in memory where that field is written to and read from. To specify this, we need to add an attribute when declaring each field of the structure. The attribute is [FieldOffset(x)], where x is the memory offset in bytes. In our structure, the first field will be a ChangeMessageType, which will indicate how an instance of this structure should be interpreted. This field will have the field offset of 0, because it is the first field in the structure. We will have to ensure that no other field will use this memory location. The size of a ChangeMessageType variable is 4 bytes, so all the next fields should use offsets of 4 or more. Now we'll continue to add fields to this structure, based on the possible types of messages.
o The UpdateCameraView message needs to pass a Matrix from update to render. So, we add a field of type Matrix at offset 4.
o We notice that all other messages tend to refer to a certain object. We said earlier that we identify these objects by an integer index, which we will call ID. We add a field of type int at offset 4. The individual fields of each of the remaining message types will need to start at offset 8.
o The UpdateWorldMatrix message needs to send the new World matrix to the render thread. We will add a field of type Matrix at offset 8.
o The UpdateHighlightColor message needs to send a new Vector4 containing the new color. We add a field of type Vector4 at offset 8.
o The CreateNewRenderData message will send a position and a color that will be used by the rendering thread to create a new RenderData object. We add the position at offset 8, and the color at offset 20 (a Vector3 occupies 12 bytes).
o Finally, the DeleteRenderData message doesn't need anything besides the ID of the RenderData to be deleted, so we don't need to add anything else.
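Before putting the full message together, the int/float union described earlier can be tried in isolation. This is only a sketch: the names IntFloatUnion and UnionDemo are made up for the illustration and are not part of the tutorial's code.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example type: an int and a float sharing the same 4 bytes.
[StructLayout(LayoutKind.Explicit)]
public struct IntFloatUnion
{
    [FieldOffset(0)] public int AsInt;
    [FieldOffset(0)] public float AsFloat;
}

public static class UnionDemo
{
    // Writes a float and reads the very same bytes back as an int.
    public static int BitsOf(float value)
    {
        IntFloatUnion u = new IntFloatUnion();
        u.AsFloat = value;  // write through the float view
        return u.AsInt;     // read through the int view
    }

    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(IntFloatUnion))); // prints 4
        Console.WriteLine(BitsOf(1.0f).ToString("X8"));           // prints 3F800000
    }
}
```

Both fields sit at offset 0, so the structure is 4 bytes long, and the int we read back is simply the bit pattern of the float we wrote.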
The final structure declaration can be seen below. Also an image illustrates how the structure resides in memory, and how fields are accessed depending on the message type.

    //StructLayout and FieldOffset are declared in this namespace
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Explicit)]
    public struct ChangeMessage
    {
        //this appears in all messages
        //identifies how this message should be interpreted
        [FieldOffset(0)]
        public ChangeMessageType MessageType;

        //this is the field required when this message is of type UpdateCameraView
        [FieldOffset(4)]
        public Matrix CameraViewMatrix;

        //this field is used for all messages dealing with entities
        [FieldOffset(4)]
        public int ID;

        //this is the field required when this message is of type UpdateWorldMatrix
        [FieldOffset(8)]
        public Matrix UpdatedWorldMatrix;

        //this is the field required when this message is of type UpdateHighlightColor
        [FieldOffset(8)]
        public Vector4 HighlightColor;

        //these are the fields required when this message is of type CreateNewRenderData
        [FieldOffset(8)]
        public Vector3 Position;
        [FieldOffset(20)]
        public Vector3 Color;

        //nothing is required when this message is of type DeleteRenderData
    }
Below you can see how the fields are placed in memory.
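The layout can also be measured rather than drawn. The sketch below assumes stand-in versions of Vector3, Vector4 and Matrix (plain structs of 3, 4 and 16 floats, so it runs without XNA), and the LayoutDemo helper is hypothetical. The offsets are the ones chosen in the list above.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-ins for the XNA math types, matching their sizes (12, 16 and 64 bytes).
public struct Vector3 { public float X, Y, Z; }
public struct Vector4 { public float X, Y, Z, W; }
public struct Matrix
{
    public float M11, M12, M13, M14, M21, M22, M23, M24;
    public float M31, M32, M33, M34, M41, M42, M43, M44;
}

public enum ChangeMessageType
{
    UpdateCameraView, UpdateWorldMatrix, UpdateHighlightColor,
    CreateNewRenderData, DeleteRenderData,
}

[StructLayout(LayoutKind.Explicit)]
public struct ChangeMessage
{
    [FieldOffset(0)] public ChangeMessageType MessageType;
    [FieldOffset(4)] public Matrix CameraViewMatrix;
    [FieldOffset(4)] public int ID;
    [FieldOffset(8)] public Matrix UpdatedWorldMatrix;
    [FieldOffset(8)] public Vector4 HighlightColor;
    [FieldOffset(8)] public Vector3 Position;
    [FieldOffset(20)] public Vector3 Color;
}

public static class LayoutDemo
{
    // Byte offset of a field inside ChangeMessage.
    public static int OffsetOf(string fieldName)
    {
        return (int)Marshal.OffsetOf(typeof(ChangeMessage), fieldName);
    }

    // Size in bytes of one ChangeMessage.
    public static int TotalSize()
    {
        return Marshal.SizeOf(typeof(ChangeMessage));
    }

    public static void Main()
    {
        string[] fields = { "MessageType", "CameraViewMatrix", "ID",
                            "UpdatedWorldMatrix", "HighlightColor", "Position", "Color" };
        foreach (string name in fields)
            Console.WriteLine(name + " starts at byte " + OffsetOf(name));
        Console.WriteLine("Total size: " + TotalSize() + " bytes"); // Total size: 72 bytes
    }
}
```

The largest extent is the world matrix, which starts at byte 8 and is 64 bytes long, and that is where the total comes from.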
As you can see, the total size of this structure is only 72 bytes, but it can be used as five different types of messages. For example, assume the update thread creates the following two messages:

    //create a message to update the camera view matrix
    ChangeMessage updateCamera = new ChangeMessage();
    updateCamera.MessageType = ChangeMessageType.UpdateCameraView;
    updateCamera.CameraViewMatrix = Matrix.CreateLookAt(...);

    //create a message to update the world matrix of the object with index 5
    ChangeMessage updateWorld = new ChangeMessage();
    updateWorld.MessageType = ChangeMessageType.UpdateWorldMatrix;
    updateWorld.ID = 5;
    updateWorld.UpdatedWorldMatrix = ...;

As you can see, the structure ChangeMessage is used once as an UpdateCameraView message, and once as an UpdateWorldMatrix message. In each case, we are only interested in setting the fields relevant to that message type. Now assume that these messages are entered into a buffer, and later, the rendering thread takes each message in the buffer and analyzes it. The code would look something like:

    //processing a ChangeMessage with the name msg
    switch (msg.MessageType)
    {
        case ChangeMessageType.UpdateCameraView:
            camera.View = msg.CameraViewMatrix;
            break;
        case ChangeMessageType.UpdateWorldMatrix:
            renderObjects[msg.ID].World = msg.UpdatedWorldMatrix;
            break;
        [...]
    }

So based on msg.MessageType, we can treat the message in the intended way, and use only the relevant fields. Having defined the ChangeMessage structure, a change buffer will simply contain a list of such change messages.
    public class ChangeBuffer
    {
        public List<ChangeMessage> Messages { get; set; }

        public ChangeBuffer()
        {
            Messages = new List<ChangeMessage>();
        }

        public void Add(ChangeMessage msg)
        {
            Messages.Add(msg);
        }

        public void Clear()
        {
            Messages.Clear();
        }
    }
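One property of this design is worth checking: List<T>.Clear() sets the count to zero but keeps the backing array, so a buffer that is cleared and refilled every frame stops allocating once it has grown large enough. The sketch below uses a made-up 8-byte TinyMessage in place of ChangeMessage, and the BufferReuseDemo helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 8-byte message, standing in for the tutorial's ChangeMessage.
public struct TinyMessage
{
    public int ID;
    public float Value;
}

public static class BufferReuseDemo
{
    // Fills and clears one list for several "frames" and returns how many
    // frames after the first one had to grow the list's backing array.
    public static int CountGrowthsAfterFirstFrame(int frames, int messagesPerFrame)
    {
        List<TinyMessage> messages = new List<TinyMessage>();
        int growths = 0;
        for (int frame = 0; frame < frames; frame++)
        {
            messages.Clear();                        // Count becomes 0, Capacity is kept
            int capacityBefore = messages.Capacity;
            for (int i = 0; i < messagesPerFrame; i++)
            {
                TinyMessage msg = new TinyMessage(); // a struct: no heap object of its own
                msg.ID = i;
                msg.Value = frame;
                messages.Add(msg);                   // copied into the backing array
            }
            if (frame > 0 && messages.Capacity != capacityBefore)
                growths++;
        }
        return growths;
    }

    public static void Main()
    {
        Console.WriteLine(CountGrowthsAfterFirstFrame(10, 198)); // prints 0
    }
}
```

If you know roughly how many messages a frame can produce, passing that number to the List constructor as the initial capacity avoids even the growth during the first frame.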
Again, based on your game, you could make this class more complex, by adding other functionality. But for educational purposes, it is fine as it is.

3.3.4. Using the Buffers

We are nearing the finish line. At the moment, we have the DoubleBuffer and ChangeBuffer classes and the ChangeMessage structure. The next step is the GameData and RenderData classes. I won't go into much detail about them here, because, as I've previously said, the fields of these classes depend on the game you make. The code sample contains an example of this. A simple example might look like below.

    class RenderData
    {
        public Vector3 HighlightColor;
        public Matrix WorldMatrix;
        public Model Model;
        public bool IsAlive;
    }

    class GameData
    {
        public Vector3 Acceleration;
        public Vector3 Velocity;
        public Vector3 Position;
        public Matrix Rotation;
        public bool IsAlive;
    }
In a simple form, the update manager would contain a reference to the double buffer, and a list of GameData objects. Other fields can be added as needed.

    class UpdateManager
    {
        public List<GameData> GameDataOjects { get; set; }
        private DoubleBuffer doubleBuffer;
        private GameTime gameTime;
        protected ChangeBuffer messageBuffer;
        protected Game game;

        public UpdateManager(DoubleBuffer doubleBuffer, Game game)
        {
            this.doubleBuffer = doubleBuffer;
            this.game = game;
            this.GameDataOjects = new List<GameData>();
        }
        [...]
    }
We need to add a function that will be called each frame, and will contain the update code. We will separate this function in two. One function will take care of synchronizing with the doubleBuffer, and the other is where the update code would normally be placed. We do this separation to provide an easy point at which to extend the class. A specialized class that inherits UpdateManager simply has to override the Update function, and everything else is taken care of. We actually do this in the sample.

    public void DoFrame()
    {
        doubleBuffer.StartUpdateProcessing(out messageBuffer, out gameTime);

        this.Update(gameTime);

        doubleBuffer.SubmitUpdate();
    }
The final step is to make this function execute on a separate thread. We add a field to keep track of that thread (for example, if the main thread wants to exit, it can use this field to shut this thread down), and a function that, when called, starts execution on a new thread. The actual function that is started on a new thread will simply call the DoFrame function in a loop. If we are running this on the Xbox, we also need to manually set the processor affinity, in order for the function to be executed on a different hardware thread. To do this, we call the Thread.SetProcessorAffinity function, specifying as the parameter the hardware thread we wish to run on.

    public Thread RunningThread { get; set; }

    private void run()
    {
    #if XBOX
        Thread.CurrentThread.SetProcessorAffinity(5);
    #endif
        while (true)
        {
            DoFrame();
        }
    }

    public void StartOnNewThread()
    {
        ThreadStart ts = new ThreadStart(run);
        RunningThread = new Thread(ts);
        RunningThread.Start();
    }
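Because run() loops forever, the thread has to be ended from the outside when it is no longer needed. One cooperative way to do that is sketched below, assuming a stop flag that the loop checks every time it wakes up. The FrameWorker class and its members are hypothetical and only mimic the shape of UpdateManager; the counter stands in for DoFrame().

```csharp
using System;
using System.Threading;

// Hypothetical worker, illustrative only; not the tutorial's UpdateManager.
public class FrameWorker
{
    private readonly AutoResetEvent frameStart = new AutoResetEvent(false);
    private readonly AutoResetEvent frameEnd = new AutoResetEvent(false);
    private volatile bool stopRequested;
    private Thread runningThread;

    public int FramesDone; // read by the caller only after frameEnd or Join

    private void Run()
    {
        while (true)
        {
            frameStart.WaitOne();      // wait for the next frame, or for a stop request
            if (stopRequested) break;  // leave the loop cleanly
            FramesDone++;              // stands in for DoFrame()
            frameEnd.Set();
        }
    }

    public void StartOnNewThread()
    {
        runningThread = new Thread(Run);
        runningThread.IsBackground = true; // do not keep the process alive by itself
        runningThread.Start();
    }

    public void RunOneFrame()
    {
        frameStart.Set();
        frameEnd.WaitOne();
    }

    public void Stop()
    {
        stopRequested = true;
        frameStart.Set();              // wake the thread so it can see the flag
        runningThread.Join();
    }
}

public static class FrameWorkerDemo
{
    public static int RunFrames(int count)
    {
        FrameWorker worker = new FrameWorker();
        worker.StartOnNewThread();
        for (int i = 0; i < count; i++)
            worker.RunOneFrame();
        worker.Stop();
        return worker.FramesDone;
    }

    public static void Main()
    {
        Console.WriteLine(RunFrames(3)); // prints 3
    }
}
```

A flag like this lets the thread finish its current frame before exiting. It also does not rely on Thread.Abort(), which newer .NET runtimes (.NET Core and later) no longer support.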
When we need to start the update thread, we simply call StartOnNewThread().

The RenderManager is similar to the UpdateManager. We don't include here the mechanism to start the RenderManager on a separate thread, because it would be identical to the one used in UpdateManager. Besides, we won't use it in the sample code and we will simply call DoFrame from the main thread, because we keep the rendering operations on the main thread, as previously discussed.

    class RenderManager
    {
        public List<RenderData> RenderDataOjects { get; set; }
        private DoubleBuffer doubleBuffer;
        private GameTime gameTime;
        protected ChangeBuffer messageBuffer;
        protected Game game;

        public RenderManager(DoubleBuffer doubleBuffer, Game game)
        {
            this.doubleBuffer = doubleBuffer;
            this.game = game;
            this.RenderDataOjects = new List<RenderData>();
        }

        public virtual void LoadContent()
        {
        }

        public void DoFrame()
        {
            doubleBuffer.StartRenderProcessing(out messageBuffer, out gameTime);
            this.Draw(gameTime);
            doubleBuffer.SubmitRender();
        }
    }
And now to put everything together. To use these classes, you would first add code to the Update and Draw functions of the managers, where you would make sure to write updates to buffers in the update manager, and read them in the render manager. Next, let's see how you would add all this into the Game class. Before anything, you need some fields for the double buffer, update manager and render manager.

    public class Game1 : Microsoft.Xna.Framework.Game
    {
        DoubleBuffer doubleBuffer;
        RenderManager renderManager;
        UpdateManager updateManager;
        [...]

Then, during LoadContent, you can initialize them. Here, before they start running in parallel, you can load data for objects, and add them to the UpdateManager's GameData list, and to the RenderManager's RenderData list. But you have to be careful to add them correctly, such that the GameData present in position x in the GameDataObjects list corresponds to the RenderData at the same position in the RenderDataObjects list. As I said before, you can use a more complex identification policy, if you want, but we keep it simple for educational purposes. At the end, after you've loaded the data, you can tell the UpdateManager to start executing on another thread.

    protected override void LoadContent()
    {
        [...]
        doubleBuffer = new DoubleBuffer();
        renderManager = new RenderManager(doubleBuffer, this);
        renderManager.LoadContent();
        updateManager = new UpdateManager(doubleBuffer, this);

        //here, you can load data and add it to the RenderDataObjects list and GameDataObjects list
        renderManager.RenderDataOjects.Add(...);
        updateManager.GameDataOjects.Add(...);
        [...]

        updateManager.StartOnNewThread();
    }

We said before that we keep the drawing code on the main thread. So now, in the Draw function of the Game class, we put the synchronization code. We signal the start of the new frame. Once we do this, the thread of the UpdateManager, which was waiting for this signal, starts to execute. Now we can also tell the RenderManager to draw its frame. After the render manager is done, we wait for the UpdateManager to finish its work, by calling doubleBuffer.GlobalSynchronize(). When we exit this function, we know that the update thread is waiting for us to signal a new frame.

    protected override void Draw(GameTime gameTime)
    {
        doubleBuffer.GlobalStartFrame(gameTime);

        graphics.GraphicsDevice.Clear(Color.Black);
        renderManager.DoFrame();
        base.Draw(gameTime);

        doubleBuffer.GlobalSynchronize();
    }
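The frame loop above can be tried in a console program to see which data the render side actually gets in each frame. This is a sketch under heavy simplification: MiniDoubleBuffer is a hypothetical stand-in for the DoubleBuffer, the "messages" are plain ints, and the render events are left out, as allowed when rendering stays on the main thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Simplified, hypothetical stand-in for the DoubleBuffer; messages are plain ints.
public class MiniDoubleBuffer
{
    private readonly List<int>[] buffers = { new List<int>(), new List<int>() };
    private int updateIndex = 0;  // buffer written by the update thread
    private int renderIndex = 1;  // buffer read by the main (render) thread
    private readonly AutoResetEvent updateFrameStart = new AutoResetEvent(false);
    private readonly AutoResetEvent updateFrameEnd = new AutoResetEvent(false);

    public void GlobalStartFrame() { updateFrameStart.Set(); }

    public void GlobalSynchronize()
    {
        updateFrameEnd.WaitOne();  // wait for the update thread to finish
        int tmp = updateIndex;     // then swap the two buffers
        updateIndex = renderIndex;
        renderIndex = tmp;
    }

    public List<int> StartUpdateProcessing()
    {
        updateFrameStart.WaitOne();
        Thread.MemoryBarrier();
        return buffers[updateIndex];
    }

    public void SubmitUpdate()
    {
        Thread.MemoryBarrier();
        updateFrameEnd.Set();
    }

    public List<int> RenderBuffer { get { return buffers[renderIndex]; } }
}

public static class FrameLoopDemo
{
    // Runs a number of frames and records what the render side saw in each one.
    public static List<int> Run(int frames)
    {
        MiniDoubleBuffer db = new MiniDoubleBuffer();
        Thread update = new Thread(delegate()
        {
            for (int frame = 1; frame <= frames; frame++)
            {
                List<int> buffer = db.StartUpdateProcessing();
                buffer.Clear();
                buffer.Add(frame);         // one "message": the frame that produced it
                db.SubmitUpdate();
            }
        });
        update.Start();

        List<int> seen = new List<int>();
        for (int frame = 1; frame <= frames; frame++)
        {
            db.GlobalStartFrame();           // the update thread starts working
            List<int> rb = db.RenderBuffer;  // meanwhile, "render" from the other buffer
            seen.Add(rb.Count == 0 ? 0 : rb[0]);
            db.GlobalSynchronize();          // wait for the update, then swap
        }
        update.Join();
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run(4))); // prints 0,1,2,3
    }
}
```

The output shows the price of running the two tasks in parallel: the render side always works with the messages produced during the previous frame, and in the very first frame it has nothing to consume.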
You probably noticed that we put no code in the Update() function of the Game class. Here, you can put any code that you want to be executed serially, because at the time when the Update() function is called, no other thread is executing. So here is a good place for code that you don't want to be parallelized, such as transitions between screen states, code that deals with the Guide, and others.

Now one more important thing that needs to be done is shutting down the UpdateManager's thread when we don't need it anymore. You can do this by calling the Abort() function on its thread. For example, before exiting the game, we can make sure we don't leave that thread alive.

    protected override void OnExiting(object sender, EventArgs args)
    {
        doubleBuffer.CleanUp();
        if (updateManager.RunningThread != null)
            updateManager.RunningThread.Abort();
    }

And normally, when your game has multiple states and screens, you will probably only want to use multithreading during your gameplay screen, so when exiting that screen, you should make sure you shut down the update thread, and clean up after the doubleBuffer. The code from this part can be found at the end of the article, and can be used as a starting point for your multithreaded games. In the next part, we will look at the code sample accompanying this tutorial, and talk a little about it.

4. Example: Balls

The code sample shows a scene filled with a great number of balls, which bounce off each other and the walls.
To be totally honest, my skills at writing physics code are not that great, so at times you will see some balls go through the walls around the scene and wander out into the blue, but otherwise, the code does its job. What I did was extend the UpdateManager class, and create a new class called BallsUpdater. In this class, I added code for camera control, code for input handling, and code that updates the physics of the world. Most of the code is collision handling. One interesting thing to notice is that the function UpdatePhysics(), which updates the physics of a single ball, returns a boolean value, which tells us if the position of that ball has changed this frame. So, in the Update function of the class, the following code can be seen.

    public override void Update(GameTime gameTime)
    {
        messageBuffer.Clear();
        HandleInput();

        for (int i = 0; i < GameDataOjects.Count; i++)
        {
            GameData gd = GameDataOjects[i];
            if (UpdatePhysics(gd, (float)gameTime.ElapsedGameTime.TotalSeconds))
            {
                Matrix newWorldMatrix = gd.rotation * Matrix.CreateTranslation(gd.position);

                ChangeMessage msg = new ChangeMessage();
                msg.MessageType = ChangeMessageType.UpdateWorldMatrix;
                msg.ID = i;
                msg.UpdatedWorldMatrix = newWorldMatrix;
                messageBuffer.Add(msg);
            }
        }

        UpdateCamera();
        base.Update(gameTime);
    }
As you can see, because we inherit from the UpdateManager class, we only need to write code in the Update() function, and we don't deal with any multi-threading code anymore. We just need to use the ChangeBuffer to transmit data to the rendering thread.

The first thing we do is to clear the messageBuffer. After this, we call the function that handles the input. Inside this function, if you press the B button, a new ball is created, and a message of type CreateNewRenderData is added to the buffer. Then, we go through each of the objects in the GameDataObjects list. Note that we didn't use foreach, because we want to have access to the index of that object. That index is used as an ID in this example. So, for each object we call UpdatePhysics on that object, which moves the ball, and updates the physics. Then, if the ball has moved this frame, we compute the new world matrix, and create a message of type UpdateWorldMatrix. We then put this message into the buffer, to be consumed by the rendering thread at a later time. If the ball didn't move this frame, no message will be sent. Here you can see that if our messages had been created as classes, instead of structs, we would have generated quite a lot of garbage during the frames when balls move. This way, we have no garbage generated. Lastly, we call the UpdateCamera() function, which computes the new position and orientation of the camera, based on the position and orientation of the player's ball, creates a message of type UpdateCameraView, and puts it in the buffer.

For rendering, I extended the RenderManager class, and created a new class called BallsRenderer. In the LoadContent() function, we load the assets we need, like a model for the balls, a model for the table, effects, and so on. The most important function in the context of this tutorial is the Draw() function.
Here, at the beginning of the function, we need to consume the messages which were created in the previous frame by the update manager.

    foreach (ChangeMessage msg in messageBuffer.Messages)
    {
        switch (msg.MessageType)
        {
            case ChangeMessageType.UpdateCameraView:
                viewMatrix = msg.CameraViewMatrix;
                break;
            case ChangeMessageType.UpdateWorldMatrix:
                RenderDataOjects[msg.ID].worldMatrix = msg.UpdatedWorldMatrix;
                break;
            case ChangeMessageType.CreateNewRenderData:
                if (RenderDataOjects.Count == msg.ID)
                {
                    RenderData newRD = new RenderData();
                    newRD.color = msg.Color;
                    newRD.worldMatrix = Matrix.CreateTranslation(msg.Position);
                    RenderDataOjects.Add(newRD);
                }
                break;
            default:
                break;
        }
    }
    // draw scene

So taking each message from the buffer, we look at its type. If it is a message for setting the camera's view matrix, we take the value stored in the field CameraViewMatrix, and assign it to our local variable. If the message is of type UpdateWorldMatrix, we modify the world matrix of the RenderData object identified by the ID stored in the message. If the message is of type CreateNewRenderData, we create a new render data, and add it to the list of RenderDataObjects. Because we don't destroy balls in this sample, the new balls created by the update thread should always be added to the end of the list, with a new index. If we had code to destroy the balls, we would have needed more complex logic to handle the IDs of the objects. But as it is, we don't need anything else. Now we can go ahead and render the scene.

Another good thing to know is that not all objects we render need to depend on the Update thread. For example, the table is never moving, so all the code that deals with it is solely in the BallsRenderer class.

At first, we simply had 197 spherical balls, and no other special effects. But as the Xbox has a very powerful GPU, and CPU floating point code is not that fast, the physics thread took much longer than the rendering thread, and the gains from using multi-threading were not that spectacular (only about 30%-40% lower frame times). So I decided to give some work to that GPU.
First step, I took the spherical balls, and made a new model for them, which has about 9000 polygons. But even this (197 * 9000 polygons per frame) was a walk in the park for the Xbox's GPU.
So I began writing code, and added a cartoon shader for the object, which basically draws each object twice (for normal+depth, and color), and then applies some post-processing effects.
Now things were balanced, and the difference between the physics and rendering times was smaller. So what are the numbers? The code was run on an Xbox 360 and the following numbers were obtained:

                                              Average Frames / Second
Single-threaded                               16-19 FPS
Multi-threaded     30-40    25-26    30-40    26-33 FPS

So even though adding multi-threading added a couple of milliseconds to both the rendering time and physics time, the total frame time was significantly reduced, and the gain in the framerate was an important one. You can easily modify the number of balls (Game1.LoadContent()) and see what other numbers you get. Also, when running the sample, you can see the number of messages in the buffers varies from 1 (when all balls are still) to 198 (when all balls are moving). The link to the archive is at the end of the article.

The controls are as follows:

o Left Stick to turn the camera left or right
o Keep A down to accelerate the player's ball
o Press Y to give the player's ball a short burst of speed
o Press X to give all the balls a short burst of speed
o Press B to create a new ball, at the position of the player's ball, and shoot it forward
5. Conclusions

So that's about it, for now. In this tutorial, I tried to explain and show how to make a multi-threaded game architecture in XNA. The focus of the tutorial was separating the updating and drawing code on parallel threads, and keeping data correctly synchronized between these two threads. We saw an example of using this architecture in practice, and we could see the performance we gained by multi-threading.

However, this is not the only way to use multi-threading. There are plenty of other ways not covered in this tutorial. You can use it for animated loading screens (which can be seen in the GameStateManagement sample on creators.xna.com), complex asynchronous A.I. computations, loading data during gameplay without a loading screen, etc. All these require different architectures for the code, and come with their own pitfalls and sensitive spots, but they are doable. The results vary from a little extra polish that makes your game feel right, to more intelligent enemies, or to that extra bit of performance we're all craving.

6. Downloads

The code for the framework alone can be downloaded here: MultithreadingFramework.zip.
The code for the Balls example can be downloaded here: MultithreadedBalls.zip.

7. References

o Gamefest 2007 - Multicore Programming, Two Years Later, by Ian Lewis
o Gamefest 2008 - XNA Game Studio Performance 2008: Multithreading and GPU, by Frank Savage
o Threading in C#, by Joseph Albahari
o Multi-Threading in .Net, by Jon Skeet
o Gamasutra - Multithreaded Game Engine Architectures, by Ville Mönkkönen
o Gamasutra - Threading 3D Game Engine Basics, by Henry Gabb and Adam Lake
o Multithreaded Rendering and Physics Simulation, by Rajshree Chabukswar, Adam T. Lake, and Mary R. Lee
o XNA - Multithreaded Rendering and Physics Simulation, by Nerdy Inverse