Benchmarking - How and Why?: Benchmarks exist to determine how a particular piece of hardware performs in relation to itself and to others. The question is: are readers really getting the information they need?
With the boom in computer use over the past few years, we have also seen an extreme rise in the number of users on the internet, many of whom want to know more about their computers and how to upgrade them most effectively. Sites such as Tom's Hardware, AnandTech and HardOCP have sprung up as a direct result. As a reader of other people's reviews, a consumer, and also a hardware reviewer, I try to look at reviews (my own included) to see what is good in them and what could be improved. The following is more or less what I've inferred from my own reading and from what I've seen others say. This article is meant as a minor rant, with some logic (hopefully) behind it.

One of the major tools any site uses to convey performance is benchmarks (this site included). But do most sites convey all the information that they can? Are they as impartial as they can be? Is there anything that sites such as Viperlair can do to provide a more complete picture of the products we look at? This article will look at a few ways that we, as reviewers, can give more information in greater detail in a few areas.

Video Card Benchmarks

When you buy a 3D video card, you buy it to play games, not to run programs that just spit out a number at you. The benchmarks used in video card reviews haven't really changed in a long while, which is saddening. Many people still rely on programs such as Quake III and 3D Mark 2001SE to test video cards. But with Quake III we can see that the video card is no longer really the limiter, even at 1600*1200 (results taken from Hubert's review of the PNY Ti4600). Many people consider anything over 60fps to be playable, and even average current and previous generation graphics cards have no trouble reaching frame rates over 100fps, for a card that costs only about $100 (US).
We can see that even the more mainstream video cards today can easily play at 1600*1200, making Quake III a benchmark with little meaning for newer video cards: most people would not be hindered by buying one of these cards instead of a faster, more expensive one. Another thing to note is that the frame rate axis runs from 100fps to 160fps, which exaggerates the differences between the cards tested here.

What about 3D Mark 2001? With 3D Mark 2001 we get a number that is supposed to represent how your computer will perform in games, but it is not always accurate. Overall it is a decent program to look at, but it has many flaws. Some people obviously optimize their systems for this program without gaining any performance in normal games. Also, the number you are given is just that, a number; it doesn't necessarily reflect the true performance of the card. As an example, let us look at how the Kyro II performs in 3D Mark 2001 compared to the GeForce 2 Pro, a card it matches in most games.
As you can see from a sampling of the top five results in the MadOnion database (for all Athlon XPs between 1481-1547MHz), the Kyro II scores about 2500 'Marks' less than the GF2 Pro, yet in most games it performs as well as or better than a GF2 Pro. This helps show the ineffectiveness of 3D Mark 2001 at reflecting the real-life performance of a video card: it tries to reduce game performance to a single number, and in some cases fails. This is a problem with synthetic benchmarks in general, because they only simulate real games, or in some cases don't even do that. It is also worth noting that Max Payne is based on the same engine as parts of 3D Mark 2001, so many similar tests can be run with the actual game.

We seem to have 'bashed' some of the most used benchmarks, so let us change focus and look at what can be done to improve the benchmarking procedure. As I stated before, we buy video cards to play games, so we should focus on games instead of synthetic results that don't mean much in the way of real-life performance. We also want to be flexible to a certain extent, while holding on to some benchmarks for comparison's sake. Quake III might be a choice for comparisons, but many programs that came out after it are just as good: a couple of examples are Jedi Knight II (same engine as Quake III), Serious Sam SE, or even Max Payne. We as reviewers want to be flexible because the gaming world is changing at a fast pace; new games are being released with new features, and there is a need to show how video cards perform in these environments, which can't be done with older games (think Doom 3). This brings us to the next point. Until the release of UT2003, most DirectX benchmarks used in reviews consisted of 3D Mark 2001. There are other options for DirectX benchmarks; one just has to look around.
One site devised a way to benchmark Max Payne, which turned out to be a very good benchmark. Another interesting benchmark I came across is for a DirectX 8 game called Ballistics, a very nice racing game that can be used to test video cards in a non-FPS title. The last one I'll mention is AquaMark; it is harder to come by, but is an interesting half-real, half-synthetic benchmark.

Another thing I have noticed sorely lacking from many reviews (even some of mine, and those of others who write for this site) is quality testing, and more specifically 2D quality testing. Since you are looking at a 2D display the entire time you use your computer (unless you have some new 3D display), it deserves, rather requires, testing to see its quality in everyday use. The best way I've seen is a blind test with multiple people (preferably ones who represent regular computer users), using one video card as a base for comparison against the others. Good test material is very important as well, including text and color images, to test the saturation and quality of the images produced by the video card (not the monitor, as it is a control in this case). If more sites did this, it would help many people with their purchases (mine included), as I don't want to buy a video card with pathetic 2D quality (and yes, many are still sold today).

One last thing we will look at in video card benchmarking is detail. We need to provide you, the reader, with as much unbiased information as possible to counter any bias we might have. How can we do this? An interesting way is to provide a graph of the second-by-second frame rate, along with the minimum/average/maximum frame rates. This has been made possible by the program Fraps, which doesn't use up much in the way of resources (~1% of the frame rate) and allows the frame rate to be recorded every second.
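As a sketch of what such per-second data makes possible, a few lines of code can reduce a log of one frame-rate sample per second to the minimum/average/maximum figures discussed above. The list of samples here is invented for illustration and the format is a simplified assumption, not Fraps' actual output:

```python
# Sketch: summarize a per-second frame-rate log into min/avg/max.
# fps_log stands in for samples recorded once per second by a tool
# such as Fraps; the values are made up for illustration.

def summarize_fps(samples):
    """Return (minimum, average, maximum) for a list of fps samples."""
    if not samples:
        raise ValueError("no samples recorded")
    return min(samples), sum(samples) / len(samples), max(samples)

fps_log = [48, 52, 0, 45, 60, 58, 0, 51, 55, 47]  # hypothetical run
lo, avg, hi = summarize_fps(fps_log)
print(f"min {lo} fps, avg {avg:.1f} fps, max {hi} fps")
```

Note that charting `fps_log` second by second, rather than reporting only the average, is precisely what exposes momentary drops that a bar graph of averages would hide.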
Let's look at an example of this in action, using the results from Serious Sam SE at 1600*1200 from my Parhelia review. First we'll look at just a bar graph, and then at the graph found in that review. Nothing unusual here; it looks as if the Parhelia is just a little bit faster than the Kyro II, and apart from that we can't infer much else. What if we chart the results second by second?
With the graph, as well as the minimum/average/maximum frame rates, we can see much more about the performance of the card. We see that the Parhelia is more CPU limited than the Kyro II, and that the Kyro II drops to 0fps twice, not just the once visible in the table of results. The graph shows us the most information and allows us to learn more about the performance of a card at a given resolution. Now let us look at benchmarking for CPUs and motherboards.

CPU & Motherboard Benchmarking

Just as with video cards, the benchmarks used for testing CPUs and motherboards haven't changed much, with very few pieces of software being used by many sites. Some places use only synthetic software in their reviews, such as SiSoft Sandra, and lately questions have been raised regarding the validity and bias of some benchmarks, such as Sysmark 2002. While there are many more real-life benchmarks for CPUs than for video cards, some sites don't seem to acknowledge them. Sysmark 2002, Business Winstone and Content Creation Winstone are examples of benchmarks based on genuinely useful programs. But many of these programs aren't open about the times each process takes; just like synthetic programs, they spit out a number. An MS Office benchmark I ran across a while ago, called Office Bench (not very original), showed all the times taken by the program being tested (load up, run, shut down) instead of spitting out a single number. As an added bonus, it allowed testing in a more 'real-world' setting, with background tasks running (Media Player, etc.) and incorporated into the test. Unfortunately, this program is now sold for about $50, which puts it out of the reach of smaller sites that don't wish to spend money on a program that serves no purpose other than as a benchmark. While office applications are used every day, there are other, free applications that can be used.
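The Office Bench approach of reporting the time for each phase, instead of one composite score, is easy to approximate. Below is a minimal sketch; the phase names and workloads are invented stand-ins for illustration, not taken from any real benchmark:

```python
# Sketch: time each phase of a task separately, Office Bench style,
# instead of collapsing everything into a single score.
import time

def run_phases(phases):
    """phases: list of (name, callable). Returns {name: seconds}."""
    results = {}
    for name, work in phases:
        start = time.perf_counter()
        work()
        results[name] = time.perf_counter() - start
    return results

# Hypothetical stand-ins for "load up", "run", and "shut down".
timings = run_phases([
    ("load",  lambda: sum(range(100_000))),
    ("run",   lambda: sorted(range(200_000, 0, -1))),
    ("close", lambda: None),
])
for name, secs in timings.items():
    print(f"{name}: {secs * 1000:.1f} ms")
```

Reporting per-phase times like this lets a reader see which part of the run dominates, rather than hiding everything behind one opaque number.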
Let us look at some other benchmarks that can be used in place of these (especially Sysmark 2002, which costs a fair amount of money) and are less biased than the 'larger' benchmark programs. For example, those with Photoshop can use a benchmark that runs 21 filter tests on a picture, letting you see the performance differences between systems when using Photoshop with many different filters. Another benchmark that is very useful for testing the differences between CPUs, and for finding the minute differences between motherboards, is DivX encoding, along with MPEG-2 encoding. Both tests are very CPU dependent, and the software to run them is free or available as a trial. Many people today are converting video to DivX, or even converting home-made movies to DVDs, so these programs are very relevant to current trends. The video used for testing can be almost anything (though a standard clip would be good). Apart from video tests there is also the very popular MP3 encoding, using the 'free' LAME codec in conjunction with one of the front ends for this encoder. Another very CPU dependent group of programs is 3D modeling/rendering. Programs from the beginner-level Truespace up to the more advanced 3D Studio Max, Maya and Lightwave are all very CPU dependent, especially when rendering the final product. This is one area that some sites have looked into and should look into more, as it is an area that genuinely needs CPU power. The users of these programs are always looking for ways to improve rendering speed while keeping the final render at an extremely high quality level.

What about synthetic benchmarks such as SiSoft Sandra and related software: are they good pieces of benchmarking software? The answer is both yes and no. While synthetic tests give you exact numbers, they focus solely on one area, completely removing the other variables from the equation.
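The encoding and rendering tests mentioned above are also easy to make repeatable. A minimal sketch of a repeat-and-report harness follows; the workload here is a dummy CPU loop, where a real review would invoke the encoder on the standardized clip:

```python
# Sketch: run a CPU-bound task several times and report the median
# wall-clock time, which is less noisy than a single run. In a real
# review the task would wrap an encoder invocation (e.g. via
# subprocess) on a standard clip; a dummy workload stands in here.
import statistics
import time

def median_runtime(task, runs=5):
    """Time `task` `runs` times and return the median in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        times.append(time.perf_counter() - start)
    return statistics.median(times)

def dummy_encode():
    # Placeholder CPU work standing in for a DivX/MP3 encode pass.
    sum(i * i for i in range(200_000))

print(f"median: {median_runtime(dummy_encode):.3f} s")
```

Using the median of several runs, rather than one run, smooths out interference from background tasks and disk caching, which matters when hunting for the minute differences between motherboards.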
Thus, while in real life having 128MB of RAM instead of 512MB+ can lead to severe penalties in 3D rendering, these synthetic benchmarks would score the same in either case. However, as with any synthetic benchmarking tool, they are adequate for finding the general performance of a certain piece of hardware, and can help find the reasons behind a strange performance boost or decrease.

Conclusions

What can we say in conclusion? First, let's see how we as reviewers can write better reviews by using better benchmarking tools. We can relegate synthetic benchmarks such as 3D Mark 2001 to their proper place: that of a secondary test. We need to focus on the games people play, and give as much information as possible, perhaps by using per-second graphs to show how a card performs during every second. We also do not want to forget the other tests we can run on video cards, most importantly 2D quality testing. With benchmarks for CPUs and motherboards, we need to look not only at office applications, games and synthetic applications; we must also look at the other programs many people use: 3D rendering, video editing/encoding, and more open office benchmarks.

So before we as reviewers use a benchmark, or you as the reader look at the results, we should ask ourselves a few questions. Is this benchmark actually played (games) or used (applications) by people? Am I using this benchmark because I want to bias the reader into accepting an erroneous conclusion? Are there any other benchmarks I can use to find out the full potential of this hardware? Do I want to spend the time and effort on a benchmark to make the review better? (If you answer no to this last question, please do everyone else a favor and don't write any more reviews.) As a final statement, please remember that these are my opinions, based on what I've read, so take them as personally as you want to.