The birth of a new API

This past week we made available a pre-beta of Ashes of the Singularity, our upcoming massive-scale real-time strategy game. Amongst other firsts, it utilizes DirectX 12, which became available as part of the Windows 10 launch last month. Our game also includes a 3D benchmark for users to play with.

Unfortunately, we have to make some corrections, because as always there is misinformation circulating. There are incorrect statements regarding issues with MSAA, specifically claims that the application has a bug which invalidates the test. We assure everyone that is absolutely not the case. Our code has been reviewed by Nvidia, Microsoft, AMD, and Intel. It has passed the very thorough D3D12 validation system provided by Microsoft, which is specifically designed to catch incorrect API usage. All IHVs have had access to our source code for over a year, and we can confirm that both Nvidia and AMD compile our very latest changes on a daily basis and have been running our application in their labs for months. Fundamentally, the MSAA path is essentially unchanged between DX11 and DX12. Any statement that there is a bug in the application should be disregarded as inaccurate.

So what is going on then? Our analysis indicates that any D3D12 problems are quite mundane: new API, new drivers. Some optimizations that the drivers perform in DX11 just aren't working in DX12 yet. Oxide believes it has identified some of the issues with MSAA and is working to implement workarounds in our code. None of this affects the validity of a DX12-to-DX12 test, since the exact same workload gets sent to everyone's GPUs. This sort of wrinkle is simply the nature of brand-new APIs with immature drivers.

Immature drivers are nothing to be concerned about. DirectX 12 is brand new, and it will take time for developers and graphics vendors to optimize their use of it. We remember the first days of DX11: nothing worked, it was slower than DX9, buggy, and so forth. It took years for it to be solidly better than the previous technology. DirectX 12, by contrast, is in far better shape than DX11 was at launch. Regardless of the hardware, DirectX 12 is a big win for PC gamers. It allows games to make full use of both the GPU and the CPU by eliminating the serialization of graphics commands between the processor and the graphics card.

I don’t think anyone will be surprised when I say that DirectX 12 performance, on your hardware, will get better and better as drivers mature.

Untapped potential 

Oxide has deep roots in PC gaming. You might say it's in our DNA. When we founded the studio two and a half years ago, one of our key motivating factors was our desire to push PC games further than they had gone before. We had noticed with disappointment how much performance remained unused, and from our experience with countless PC games we knew there was enormous untapped potential waiting to be exploited.

There are many challenges in developing a modern engine, one that can run efficiently on many CPU cores and take advantage of advanced SSE instructions. Some of these problems were clearly caused by developers failing to leverage modern CPU architectures. We took these lessons to heart as we started to build Nitrous and Ashes of the Singularity.

The limitations of DirectX 11 

Some performance problems, however, we couldn't solve. The biggest of these was that the interface into the graphics layer was becoming the dominant bottleneck of the system. It didn't matter how many bullets, units, terrain features, trees, AI routines, etc. the engine could handle if the graphics stack (i.e., D3D11) couldn't process them.

Enter D3D12. We've been working with Microsoft for some years now, giving them regular code drops of our engine and developing internal benchmarks designed to generate realistic workloads. As part of this effort, we're proud to begin unveiling some of the exciting things about Direct3D 12.

Although the game itself still has a great deal of work ahead of it, the underlying 3D engine it is built on is fairly mature.

What should you expect out of a non-synthetic benchmark? 

But what exactly are you going to see in a benchmark that measures actual gameplay performance? If you run the Ashes of the Singularity benchmark, what you are seeing is not a synthetic benchmark. Synthetic benchmarks can be useful, but they do not give an end user an accurate picture of what to expect in real-world scenarios.

Our benchmark run dumps a huge amount of data, which we caution may take time and analysis to interpret correctly. For example, though we felt obligated to include an overall FPS average, we don't feel it's a very useful number. As a practical matter, PC gamers tend to be more interested in the minimum performance they can expect.

People want a single number to point to, but the reality is that things just aren't that simple. Real-world tests and data are like that. The benchmark mode of Ashes isn't actually a dedicated benchmark application; rather, it's simply a three-minute game script executing with a few adjustments to increase consistency from run to run.

What makes it not a dedicated benchmark application? By that, we mean that every part of the game is running and executing: AI scripts, audio processing, physics, firing solutions, and so on. It's what we use to measure the impact of gameplay changes so that we can better optimize our code.

Because games have different draw call needs, we've divided the benchmark into different subsections, trying to give equal weight to each one. Under the normal scenario, the driver overhead differences between D3D11 and D3D12 will not be huge on a fast CPU. Under the medium and heavy scenarios, however, the differences start to show up, growing until we see massive performance gaps. Keep in mind that these are whole-application performance numbers, not just graphics.

Understanding the numbers 

Some of the fields might need a little explaining, since they contain new information that was not possible to calculate under D3D11. The first new number is the percent GPU bound. Under D3D12 it is possible, with a high degree of accuracy, to calculate whether we are GPU or CPU bound. For the technically savvy: what we are doing is tracking the status of the GPU fence to see whether the GPU has completed its work before we are about to submit the next frame. If it hasn't, then the CPU must wait for the GPU. There will sometimes be a few frames in a run where the CPU isn't waiting on the GPU but the GPU is still mostly full. Therefore, if you see this number above 99%, it's generally within the margin of error. Also keep in mind that any Windows system events can cause spikes here; things like Skype or Steam notifications can actually block us. These are not indicative of driver performance.
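To make the fence check concrete, here is a minimal sketch of the idea in C++. This is not our actual engine code; the function and parameter names are illustrative, and it assumes the engine signals the fence with a monotonically increasing value at the end of each submitted frame.

```cpp
#include <d3d12.h>

// Minimal illustration of detecting a GPU-bound frame with a D3D12 fence.
// Assumes frameFence is signaled with an increasing value per frame.
bool IsGpuStillBusy(ID3D12Fence* frameFence, UINT64 lastSubmittedValue)
{
    // If the fence has not yet reached the value signaled for the last
    // submitted frame, the GPU is still working; the CPU would have to
    // wait before submitting the next frame, so this frame counts as
    // GPU bound.
    return frameFence->GetCompletedValue() < lastSubmittedValue;
}
```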

The second interesting number is the CPU framerate. This is an estimate of what the framerate would be if the GPU could keep up with the CPU; in other words, a very accurate estimate of what would happen if you put in an infinitely fast GPU. Likewise, we have another mode which, instead of blocking on the GPU, does all the work but throws away the frame. This can be useful for measuring CPU performance. However, if you do this, be sure to use the same video card and driver when comparing different CPUs, as some of the measurement will be driver related.
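As a rough sketch of how such an estimate can be derived (the names and structure here are hypothetical, not the benchmark's actual code), one can average only the CPU-side time per frame, with any time spent blocked on the GPU fence already excluded from each sample:

```cpp
#include <vector>

// Estimate the framerate an infinitely fast GPU would allow by averaging
// only CPU-side frame times (simulation + command submission), assuming
// GPU-wait time has been excluded from each sample.
double EstimateCpuFrameRate(const std::vector<double>& cpuTimesMs)
{
    if (cpuTimesMs.empty())
        return 0.0;
    double totalMs = 0.0;
    for (double ms : cpuTimesMs)
        totalMs += ms;
    return 1000.0 * cpuTimesMs.size() / totalMs;  // frames per second
}
```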

What is fascinating about the CPU framerate is that it demonstrates how much more headroom D3D12 has over D3D11. D3D12 will not show its true CPU benefits in average frame rates while the GPU is saturated. One thing to consider is that we are often pairing 28nm GPUs with 14nm CPUs. Next year, when GPUs move to a smaller process, you're going to see a huge jump in GPU performance. This means that the gap between D3D11 and D3D12 will not only grow, but D3D12 may well become essential to achieving full performance on the coming GPU architectures.

So what numbers do matter? 

So what number should someone look at? What settings do we recommend? Our current sweet spot for performance and visual quality is High settings at 2560x1440 with 4x MSAA on a high-end GPU. However, until all the vendors have their graphics drivers optimized for DirectX 12, you may want to disable the MSAA setting.

Ashes looks substantially better at higher resolutions, though 4K GPU performance is still not good enough to recommend it at high settings on a single GPU at this time. If the GPU-bound graph is close to 100%, then you can be sure you are measuring GPU performance, not CPU or driver performance. The best CPU score is the CPU frame rate on the Heavy Batches sub-benchmark; for the lighter scenes, there may not be enough work for the job scheduler to spread across all the cores.

We also introduce the concept of a weighted frame rate. This is a very simple calculation: we square the millisecond timing of every frame, average the squares, and take the square root at the end. This weights slow frames more heavily than fast frames, which matters because we care far more about the slow frames of our game than the fast ones. We don't care about our fast frames going from 60fps to 120fps as much as we care about our 30fps frames going to 60fps.
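In code, the calculation described above amounts to a root-mean-square of frame times, converted back to frames per second. A minimal sketch (the function name is ours, and it assumes the averaging step described above):

```cpp
#include <cmath>
#include <vector>

// Weighted frame rate: square each frame time in milliseconds, average,
// take the square root, then convert the resulting RMS frame time back
// to frames per second. Squaring amplifies slow frames, so they dominate.
double WeightedFrameRate(const std::vector<double>& frameTimesMs)
{
    if (frameTimesMs.empty())
        return 0.0;
    double sumSquares = 0.0;
    for (double ms : frameTimesMs)
        sumSquares += ms * ms;
    double rmsMs = std::sqrt(sumSquares / frameTimesMs.size());
    return 1000.0 / rmsMs;
}
```

The effect is that a run of mostly fast frames with occasional stutters reports a number pulled toward the slow frames, more so than a plain average would.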

The benchmark also generates a log file containing more data than is displayed, including the frame timings for every frame of the run. We include this information because some of the data-gathering tools do not yet work on Direct3D 12. Users also have the option of uploading their data to our leaderboards, which will help us collect information to better tune the defaults for our game.

Being fair to all the graphics vendors 

We often get asked about fairness; that is, usually in regards to whether we treat Nvidia and AMD equally. Are we working more closely with one vendor than another? The answer is that we have an open access policy. Our goal is to make our game run as fast as possible on everyone's machine, regardless of what hardware our players have.

To this end, we have made our source code available to Microsoft, Nvidia, AMD, and Intel for over a year, and we have received a huge amount of feedback. For example, when Nvidia noticed that a specific shader was taking a particularly long time on their hardware, they offered an optimized version that made things faster, which we integrated into our code.

We have only two requirements for implementing vendor optimizations: the optimization must not be a loss for other hardware, and it must not move the engine architecture backward (that is, we will not jeopardize the future for the present).

How useful is the benchmark? 

It should not be assumed that because the game is not yet publicly out, it's not a legitimate test. While there are still optimizations to be had, Ashes of the Singularity in its pre-beta stage is as optimized as, or more optimized than, most released games. What's the point of optimizing code six months after a title is released, after all? Certainly, things will change a bit between now and release, but PC games with digital updates are always changing; we certainly won't hold back from making big changes post-launch if we feel they make the game better!

DirectX 11 vs. DirectX 12 performance

There may also be some cases where D3D11 is faster than D3D12 (the difference should be relatively small). This may happen under lower CPU load conditions and does not surprise us. First, D3D11 has five years of optimizations behind it, whereas D3D12 is brand new. Second, D3D11 has more opportunities for driver intervention. The problem with driver intervention is that it comes at the cost of extra CPU overhead, and it can only be done by the hardware vendor's driver team. On a closed system, it may not be the best choice to burn more power on the CPU to make the GPU faster. It can also lead to instability or visual corruption if the hardware vendor does not keep their optimizations in sync with a game's updates.

While Oxide is showing off D3D12 support, we are also very proud of our DX11 engine. As a team, we were among the first to use DX11, on Sid Meier's Civilization V, so we've been using it longer than almost anyone and know exactly how to get the most performance out of it. However, it took three engines and six years to get to this point. We believe that Nitrous is one of the fastest, if not the fastest, DX11 engines ever made.

It would have been easy to engineer a game or benchmark that showed D3D12 simply destroying D3D11 in terms of performance, but the truth is that not all players will have access to D3D12, and this benchmark is about yielding real data so that the industry as a whole can learn. We've worked tirelessly over the last few years with the IHVs and have quite literally seen D3D11 performance more than double in that time; if you happen to have an older driver lying around, you'll see just that. Still, despite these huge gains, we're just about out of runway.

Unfortunately, our data tells us that we are near the absolute limit of what D3D11 can do. What we are finding is that as long as the total dispatch overhead fits within a single thread, D3D11 performance is solid. But eventually one core is not enough to handle the rendering, and once that core is saturated, we get no more performance. The constructs for threading in D3D11 turned out not to be viable. Thus, if we want to scale past four cores, D3D12 is critical, as the sketch below illustrates.
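This is a simplified sketch, not our engine's code; the function and structure are hypothetical. The point it illustrates is that D3D12 lets each worker thread record its own command list, so the expensive recording work spreads across cores and only the final submission is serialized:

```cpp
#include <d3d12.h>
#include <thread>
#include <vector>

// Record draw work on several threads, then submit all command lists at
// once. Command list recording in D3D12 is free-threaded as long as each
// list (and its allocator) is owned by a single thread.
void RecordAndSubmitParallel(ID3D12CommandQueue* queue,
                             std::vector<ID3D12GraphicsCommandList*>& lists)
{
    std::vector<std::thread> workers;
    for (ID3D12GraphicsCommandList* list : lists)
    {
        workers.emplace_back([list]
        {
            // Each worker issues its share of draw calls here, then
            // closes its list. Under D3D11, this recording was funneled
            // through one immediate context on one core.
            list->Close();
        });
    }
    for (std::thread& t : workers)
        t.join();

    // Submission is serialized on the queue, but the costly part,
    // the recording, happened in parallel. Derived command list
    // pointers are passed via the usual base-interface cast.
    queue->ExecuteCommandLists(
        static_cast<UINT>(lists.size()),
        reinterpret_cast<ID3D12CommandList* const*>(lists.data()));
}
```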

SLI and CrossFire 

Another question we often get concerns SLI and CrossFire configurations. D3D12 gives us explicit multi-GPU support, which allows our engine to drive the GPUs directly. While we have some prototypes of this working, it isn't yet ready to distribute for public review. There may still be OS-level changes that need to happen, and we are still working with Microsoft to figure it out. However, we do expect it to be ready for public review before our game goes into beta. Our current expectation is that scaling should be close to optimal, with minimal frame variance.

So that about sums it up. We hope that gamers everywhere will find our test useful and will help us make PC gaming better than ever.