Implementing Direct3D for fun and profit

I can’t believe I’m writing this – it’s been, what, two months? A lot happened during that time – I went to the conference and gave an hour-long talk about our SPU rendering stuff (which was more or less well received); I’ve almost completed a rasterization-based occlusion subsystem, which is giving good results; and the financial crisis has finally hit the company I work at – some projects are frozen due to lack of funding, and some people have been let go. It’s kind of sad walking through half-empty offices… Anyway, I know I promised to write often, but between actively developing my pet engine at home and a lot of work at my day job, time is a scarce resource for me. My blog/todo.txt file is already 20 entries long; some things are too small to deserve a post, and others demand a lengthy series. I’ll try to select something interesting from time to time and blog about it. Now, on to today’s topic.

Every object in core Direct3D (I’ll be talking about 9 today, but the same thing should apply to 10 and 11) is an interface. This means that the details of the actual implementation are hidden from us, but it also means that we can implement those interfaces ourselves. Why would we want to do that?

Reverse engineering

If you work in the game industry/computer graphics or, I suppose, any other IT-related field, you should be constantly gaining new knowledge; otherwise your value as a specialist will decrease very fast. There are lots of ways to learn, and one of the best is to learn from others’ experience. Unfortunately, while there is a lot of information on the technology behind some titles, most are not described at all. Also, the descriptions are sometimes inaccurate – after all, the devil is in the details. So what you can do is take an existing title and reverse-engineer it – that is, gain information about implementation details from the outside. Disclaimer: of course, this information is provided for educational value only. Reverse engineering can violate the laws of your country and/or the EULA of the product. Don’t use it if it does.

In the PC / Direct3D world there are two primary tools that allow such introspection – NVIDIA PerfHUD and Microsoft PIX. There is also a beta of Intel GPA (which is, by the way, quite promising, if lacking polish), but it is more or less like PIX. Using PIX does not require modifying the host program; however, PIX does not work for some titles (it might crash), is slow (especially for titles with complex scenes, lots of draw calls, etc.), and is not very convenient to use as a reverse engineering tool for other reasons.

PerfHUD is more useful in some areas, but you need to create the Direct3D device with a special adapter and in REF mode in order for PerfHUD to work. While some games already have this kind of support in the released version (notable examples include The Elder Scrolls 4: Oblivion and S.T.A.L.K.E.R.: Shadow of Chernobyl), others are more careful (I hope that if you’re reading this blog you have a build configuration such as Master or Retail, which sets appropriate defines to compile development-only stuff, such as asset reloading, profiling or NVPerfHUD support, out of the executable). But if you manage to intercept the call to Direct3DCreate9 (which can be done, for example, by creating a DLL, calling it d3d9.dll and putting it next to the game executable), you can return a proxy IDirect3D9 object that forwards all calls to the actual object, except that it modifies the adapter/device type that are passed to CreateDevice. In fact, such proxy objects are used by both PIX and GPA, though the injection technique is more complex.
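The argument-rewriting proxy can be sketched roughly like this. Note that this is a simplified stand-in interface for illustration, not the real COM IDirect3D9 (which has many more methods, every one of which a real proxy must forward verbatim), and the adapter/device-type constants are made up:

```cpp
#include <cassert>

// Simplified stand-in for IDirect3D9.
struct IFactory {
    virtual ~IFactory() {}
    virtual int CreateDevice(unsigned adapter, unsigned deviceType) = 0;
};

// Plays the role of the object returned by the real Direct3DCreate9.
struct RealFactory : IFactory {
    int CreateDevice(unsigned adapter, unsigned deviceType) override {
        return int(adapter * 100 + deviceType); // pretend device handle
    }
};

// Forwards everything to the real object, but rewrites the CreateDevice
// arguments to select the PerfHUD adapter and REF device type.
struct ProxyFactory : IFactory {
    explicit ProxyFactory(IFactory* inner) : inner(inner) {}

    int CreateDevice(unsigned /*adapter*/, unsigned /*deviceType*/) override {
        const unsigned kPerfHudAdapter = 1; // illustrative values only
        const unsigned kRefDeviceType = 2;
        return inner->CreateDevice(kPerfHudAdapter, kRefDeviceType);
    }

    IFactory* inner;
};
```

The replacement d3d9.dll would export a Direct3DCreate9 that wraps the real object in such a proxy and returns it to the game.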

There are even some programs that automate this for you, allowing you to run any title in PerfHUD-compatible mode.

Multithreaded rendering

In fact, this is already described in a Gamefest 2008 presentation, “Practical Parallel Rendering with DirectX 9 and 10: Windows PC Command Buffer Recording” (you can get slides and example code here). Basically, since neither Direct3D9 nor Direct3D10 supports proper multithreading (creating the device as multithreaded means that all device calls are synchronized with one per-device critical section), you can emulate it via a special proxy device, which records all rendering calls into a buffer and then uses the buffer to replay the command stream via the real device. This lets the other rendering work you do alongside API calls run in multiple threads, and it is a good stub for the deferred context functionality that’s available on other platforms (including Direct3D11 and all console platforms). I use this technique in my pet engine mainly for portability – I can render different parts of the scene into different contexts simultaneously, and then “kick” the deferred context via the main one. On PS3 the “kick” part is very lightweight, so the savings are huge; on Windows the “kick” part replays the command stream, so it can be quite heavy, but it’s still faster than doing everything in one thread, and the code works the same way. When I start supporting Direct3D11, the same code will work concurrently, provided good driver/runtime support, of course.

Note that I don’t use the Emergent library as is – I consider it too heavyweight and obscure for my purposes. They try to support all Direct3D calls, while I use only a handful – I don’t use the FFP, I don’t create resources via this device, etc. My implementation is simple and straightforward, and is only 23 KB in size (11 of which are reused in another component – see below). If anybody wants to use it to save an hour of work, just drop a comment and I’ll provide the code.

Currently my implementation has a fixed-size command buffer, so if you exceed it, you’re doomed. There are several more or less obvious ways to fix this, but I hope that by the time I get to it I’ll already have D3D11 in place.

Asset pipeline

My asset pipeline is more or less the same for all asset types – there is a source for the asset (Maya/Max scene, texture, sound file, etc.), which is converted via some set of actions to a platform-specific binary that can be loaded by the engine. In this way the complexity of dealing with different resource formats, complex structures, data not suitable for runtime use, etc. is moved from the engine to the tools, which is great since it reduces the amount of runtime code, making it more robust and easier to maintain. The data is saved to a custom format which is optimized for loading time (target endianness, platform-specific data layout/format for graphics resources, compression). I think I’ll blog about some interesting aspects/choices in the future as time permits (for example, about my experience of using build systems, such as SCons and Jam, for data builds), but for now I’ll focus on a tool that builds textures.

This tool loads the texture file, generates mipmap levels if necessary (if the source was not a DDS with a mip chain, and the target texture requires mipmaps), compresses it to DXTn if necessary (again, that depends on the source format and build settings), and performs some other actions, both platform-specific and platform-independent. For it to work, I need an image library that can load the image formats I care about, including DDS with DXTn contents (so that I don’t need to unpack/repack it every time, and so that artists can tweak DXT compression settings in the Photoshop plugin – in my experience there is rarely a visible difference, but if they give me a texture and I compress it to DXT and there are some artifacts, I’m to blame – and if they use Photoshop, it’s not my scope :)). As it turns out, D3DX is a good enough image loading library, at least it works for me (although in retrospect I probably should’ve used DevIL, and perhaps I will switch to it in the future).
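As a tiny aside on the mip generation step: the number of levels in a full mip chain follows directly from the texture dimensions. A quick sketch (assuming a full chain down to 1×1 and ignoring DXT block-size constraints):

```cpp
#include <cstdint>

// Number of levels in a full mip chain for a w x h texture:
// keep halving the larger dimension until it reaches 1.
uint32_t mipLevelCount(uint32_t w, uint32_t h) {
    uint32_t levels = 1;
    uint32_t m = w > h ? w : h;
    while (m > 1) { m /= 2; ++levels; }
    return levels;
}
```

For example, a 256x256 texture has a 9-level chain (256, 128, 64, 32, 16, 8, 4, 2, 1).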

Anyway, to load a texture via D3DX, you need a Direct3D device. As it turns out, while you can create a working REF device in under 10 lines of code (using the desktop window and hardcoded settings), you can’t create any device, including NULLREF, if your PC does not have a monitor attached. This problem appeared once I got my pipeline working via IncrediBuild, and texture building would sometimes fail on some machines. Since I did not want to modify my code too much, I ended up implementing another proxy device, which is suitable for loading a texture with D3DX functions. This time it was slightly harder, because I needed implementations of some functions of IDirect3DDevice9, IDirect3DTexture9 and IDirect3DSurface9, but again the resulting code is quite small and simple – 6 KB (plus the 11 KB dummy device I mentioned earlier) – and I can load any 2D texture. Of course I’ll need to add some code to load cubemaps and even more code for volume textures, but for now it’s fine the way it is.

So these are some examples of situations where implementing Direct3D interfaces can prove useful. The next post is going to be either about multithreading or about some asset pipeline-related stuff; I guess I’ll decide once I get to writing it.

UPDATE 25 OCT 2010: Here is the example code:

dummydevice.h – this is just an example of a dummy device implementation; it implements all device methods with stubs that trigger a debug break if called. This is useful as a base for partial implementations.
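The dummy-device pattern might look roughly like this. The interface is a simplified stand-in, and DL_BREAK here is a portable placeholder (the real macro wraps __debugbreak, as noted in the comments below):

```cpp
#include <cstdlib>

// Placeholder for a debug-break macro; halts if a stub is ever reached.
#define DL_BREAK() std::abort()

// Simplified stand-in for a large device interface.
struct IDevice {
    virtual ~IDevice() {}
    virtual int GetWidth() = 0;
    virtual int GetHeight() = 0;
};

// Dummy device: every method is a stub that must never be called.
struct DummyDevice : IDevice {
    int GetWidth() override { DL_BREAK(); return 0; }
    int GetHeight() override { DL_BREAK(); return 0; }
};

// A partial implementation derives from the dummy and overrides only the
// methods its caller (e.g. D3DX) actually invokes; any unexpected call
// lands in a stub and breaks immediately instead of misbehaving silently.
struct TextureDevice : DummyDevice {
    int GetWidth() override { return 256; }
};
```

The payoff is that you discover exactly which methods the caller needs, one debug break at a time, instead of implementing the whole interface up front.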

deferreddevice.h – this is the implementation of the device that records various rendering calls into a buffer and then allows replaying them on some other device. Note that it records into a fixed-size memory buffer, which can be easily changed, and that it implements only a subset of rendering-related functions (i.e. no FFP).

texturedevice.h – this is the implementation of the device that works with D3DXCreateTextureFromFile for 2D textures and cubemaps (3D texture support is missing but can be added in the same way).

This entry was posted in Asset pipeline, Direct3D, Multithreading.

16 Responses to Implementing Direct3D for fun and profit

  1. Tomat says:

    put hand in assert pipeline is more preferable

  2. The last two are really good ideas. There are many times I wish I could create a device without a monitor.

  3. hax says:

    1) For PC, there is also 3D Ripper DX: http://www.deep-shadows.com/hax/3DRipperDX.htm
    2) Do you mean the build PCs do not have a monitor, or a video card? Maybe it’s better to implement a monitor emulation driver, or just plug a pair of resistors into the VGA jack to trick the video card into thinking that a monitor is plugged in.

  4. hax: A monitor emulation driver is another possible solution (although less automatic); I’m not a hardware guy, so I did not know of the resistor solution till now.

  5. hax says:

    Here is a detailed description of the wiring: http://www.mp3car.com/vbulletin/general-hardware-discussion/1763-how-does-video-card-detect-vga-monitor-present.html
    However, I checked our renderfarm PCs today – they do not have monitors, but DirectX works fine, and Windows reports a "Default monitor" attached without any HW modifications. In your case, the farm PC probably does not have a video card at all, and DirectX won’t work. In that case, you can try the SwiftShader emulator: http://www.transgaming.com/business/swiftshader/, which is practically a software D3D9 implementation.

  6. I'm not quite sure about the build PCs – I thought they did have a video card; I'll check. If they do, it's an NVIDIA one, and the OS is XP, if it matters. As for SwiftShader – well, it obviously works – after all, they're doing the same thing I do, only they implement much more functionality – but again it's heavyweight compared to my small stub device, and the licensing terms are not clear.

  7. Ste says:

    Hi Arseny, do you have your multi-threaded rendering slides available? :)

  8. Ste, oh, I completely forgot about it. The original slides are in Russian; I'm preparing a quick-and-dirty translation, which should be ready today – I'll post the link.

  9. fazeel says:

    Arseny, thanks for posting this. Can you share your implementation of the minimal DirectX emulator, please? Thanks, Fazeel

  10. ChenA says:

    I implemented a proxy DLL, but PerfHUD didn't work; nothing happens, except it doesn't show "The application is not configured to use PerfHUD". Could you give me your implementation? Thanks.

  11. Last time I used this, the trick was to name the DLL differently (not d3d9.dll – for example, z3d9.dll) and replace the DLL reference in the executable (the DLL name length is the same, so it's a safe operation, unless StarForce is used, of course); PerfHUD seems to have additional protection against simply replacing d3d9.dll, but it can't do anything if the DLL is renamed.

  12. Electro says:

    I'd appreciate a copy as well. I'm especially wary of the unclear licenses, too.

  13. Anonymous says:

    Good day. Excuse me, may I ask to have a look at the source? It's purely for educational purposes (I'm learning to program); I'd like to understand the finer points of DirectX. I would be very grateful. alexpolt [at] gemabank.ru

  14. Tyler says:

    Very interesting. We came across a similar problem here. Due to time constraints we ended up never solving it. I didn’t know creating such a proxy device was even possible (I’m pretty new to the graphics industry). I would love to learn how it all works by studying your code if you’re still willing to send it.

  15. zeuxcg says:

    I’ve updated the post with links to three example source files; the last two depend on the first one.

    DL_BREAK is a variant of __debugbreak, DL_ASSERT is a custom assertion macro; the rest should be obvious.

  16. Pingback: Source code: Implementing Direct3D for fun and profit | What your mother never told you about graphics development
