If your brand new 3.6GHz CPU and $500 video card are still waiting
60 seconds for a Doom 3 level to load, you'll know that the hard
drive is probably the most significant bottleneck in your PC.
It's all about finding the right balance.
IDE and SATA hard drives have not evolved much in terms of speed,
particularly within the past 18 months. Sure, fast 10 000 rpm drives
have been introduced, and manufacturers are beefing up the cache,
but hard drives have not increased in speed the same way as CPUs
and video cards have. Building an extremely fast PC is more than
just dropping in the fastest processor you can find. A CPU is
one part of the equation, but you'll need a fast video card to
draw the images as quickly as the CPU can feed it. A fast subsystem
will keep data moving efficiently, whereas a slower bus will bottleneck
the data flow.
One problem that exists for almost all current drives is the
way they access data. Unlike RAM, hard drives are mechanical devices
that are limited by the speed of their internal components when
accessing data. Like a record player, the "needle" needs
to move from one spot to another to retrieve information. Rotational
and seek latencies are the big hurdles here, which is why many
SCSI or 10 000 rpm SATA drives seem so much faster than
"standard" IDE or SATA drives. Faster spinning hard
drives alleviate the problem somewhat by increasing the speed
of the motors, but as a quick price check can tell you, these
drives are very expensive, and in order to make them attractive
price-wise, they often feature lower capacities. The larger cache
mentioned earlier can also speed up data access, but it doesn't
completely solve the problem, as cached data is useless
if it isn't the data the program needs.
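To put a rough number on the rotational part of that latency, here is a small sketch. It assumes the usual rule of thumb that, on average, the platter has to turn half a revolution before the requested sector reaches the head. The 5400 and 7200 rpm figures are our own picks for "standard" drives and are only there for comparison.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for a sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60 000 ms in a minute
    return ms_per_revolution / 2.0


# Spindle speeds chosen for illustration (assumption, not from a spec sheet)
for rpm in (5400, 7200, 10_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms")
```

By this estimate a 10 000 rpm drive waits about 3 ms per access on rotation alone, against a little over 4 ms at 7200 rpm. Seek time comes on top of that.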
How a Drive Accesses Data
All hard drives work the same way overall: the CPU makes a
data request, the drive seeks to where the data is located,
retrieves it, and sends it back to the CPU. The data is stored
on tracks, and unlike recordable CDs, hard drive tracks are written
from the outer edge of the platter, working their way toward the inner
edge. In a typical hard drive, data reads and writes begin
on the bottom platter, referred to as Disc 0, and the first read/write
head, which is head 0. After one track of data is complete (track
0 on head 0), the drive moves to the other side of the disc (track
0 on head 1). Once that is done, the drive moves to the next head
on the second disc (track 0 on head 2). Once the last head on
the last side of the last disc finishes, the cycle repeats with
track 1 and head 0.
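The ordering described above can be sketched as a simple nested loop. This is only an illustration of the sequence. The function name and the two-disc, four-head layout are our own assumptions, and real drives have far more tracks.

```python
from typing import Iterator, Tuple


def fill_order(num_tracks: int, num_heads: int) -> Iterator[Tuple[int, int]]:
    """Yield (track, head) pairs in the order the text describes."""
    for track in range(num_tracks):    # track 0 is the outermost track
        for head in range(num_heads):  # head 0 is the first head on Disc 0
            yield track, head


# Hypothetical drive: two discs, so four heads. Only two tracks shown.
order = list(fill_order(num_tracks=2, num_heads=4))
print(order)
```

Every head gets its turn on track 0 before any head moves on to track 1, which keeps head movement to a minimum during sequential access.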
The same track across all of the heads is collectively known as a cylinder.
As outlined earlier, data is written sequentially to the
cylinders until the inner diameter of the disc is reached. Ideally,
program data is written in order and reads follow the order the data
was written. This of course is rarely the case, and in some cases
the data isn't where the program expects it to be, forcing the drive
to look elsewhere (which makes a strong case for keeping your
discs defragmented).
One issue that plagues Parallel ATA drives is that although you
can speed up the mechanics, the drive still needs to be efficient
at retrieving data. Ideally, a drive will know where to pick up
data "A", and know where data "B" is located.
It should know when it needs "E" before "D", and
so on. The best way to do this is through queuing, which, at the
system bus level, organizes the data that needs to be retrieved.
Legacy Command Queuing (LCQ) has some limitations though, one
of which is that the bus is occupied until the drive completes
the reordering and retrieval of data. Given the mechanical nature
of hard drives, if requests are being made faster than the drive
can fulfill them, we still get bottlenecked, even with more cache
and faster motors.
Native Command Queuing
Native Command Queuing (NCQ) was developed to address the problems
of LCQ. Introduced with the Serial ATA II spec, this is a feature
that can only be found in native SATA hard drives. Unlike LCQ,
NCQ works by allowing a drive to accept multiple outstanding commands at
the same time. The drive can reschedule or reorder these commands
as it sees fit, and the host can issue new requests while the drive is
still retrieving data for a previous request.
NCQ is tied in closely with Hyper-Threading, and combined with
HT capable hardware and software, the performance differences
should be quite substantial when compared to non-NCQ drives. Tagged
command queuing is supported, meaning commands are reordered based
on seek and rotational optimization. We mentioned that those two
items are a big part of a drive's performance (or lack thereof),
and by reordering the commands based on the linear and angular
position of the data, the process will be much more efficient.
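As a rough illustration of reordering by linear and angular position, here is a toy scheduler that always picks the pending request that is cheapest to reach next. Everything in it is an assumption made for the sketch: the cost weights, the request values, and the greedy strategy itself. Actual drive firmware uses its own proprietary algorithms, and this model ignores the fact that the platter keeps turning while the head seeks.

```python
from typing import List, Tuple

Request = Tuple[int, float]  # (track number, angular position in degrees)


def access_cost(pos: Request, target: Request,
                seek_per_track: float = 1.0,
                rot_per_degree: float = 0.02) -> float:
    """Toy cost: seek distance plus the rotation needed to reach the target."""
    seek = abs(target[0] - pos[0]) * seek_per_track
    rotation = ((target[1] - pos[1]) % 360) * rot_per_degree
    return seek + rotation


def reorder(start: Request, pending: List[Request]) -> List[Request]:
    """Greedy reordering: repeatedly service the cheapest pending request."""
    remaining, pos, ordered = list(pending), start, []
    while remaining:
        nxt = min(remaining, key=lambda r: access_cost(pos, r))
        remaining.remove(nxt)
        ordered.append(nxt)
        pos = nxt
    return ordered


def total_cost(start: Request, order: List[Request]) -> float:
    """Sum the toy cost of servicing requests in the given order."""
    cost, pos = 0.0, start
    for r in order:
        cost += access_cost(pos, r)
        pos = r
    return cost


start = (0, 0.0)
queue = [(90, 180.0), (10, 45.0), (80, 270.0), (15, 90.0)]  # arrival order
print("arrival order:", total_cost(start, queue))
print("reordered:    ", total_cost(start, reorder(start, queue)))
```

In this made-up queue, servicing the requests in arrival order sends the head back and forth across the platter, while the reordered queue sweeps across it once, for roughly a third of the cost.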
In addition to these algorithms, NCQ is capable of communicating
the status of the commands being performed at any time. This is
referred to as the Race-Free Status Return Mechanism, and in essence,
the drive is able to report the completion of several commands at the
same time, without needing to wait for a handshake from the host.
Interrupt Aggregation is another feature, where the completions of
multiple commands can be aggregated into one interrupt. Normally, the
host would be interrupted once for each command, but with Interrupt
Aggregation, this can happen only once for several commands.
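To show why aggregation matters, here is a toy count of interrupts with and without it. The model is our own simplification, not taken from the SATA specification: any command that completes while the host is still servicing an earlier interrupt is folded into that interrupt. The completion times and the host latency are made-up values.

```python
from typing import Sequence


def count_interrupts(completion_times: Sequence[float],
                     host_latency: float,
                     aggregate: bool) -> int:
    """Count host interrupts for a sorted list of command completion times."""
    if not aggregate:
        return len(completion_times)  # one interrupt per command
    interrupts = 0
    busy_until = float("-inf")        # host is idle to begin with
    for t in completion_times:
        if t >= busy_until:           # host is idle: raise a new interrupt
            interrupts += 1
            busy_until = t + host_latency
        # otherwise the completion rides along with the pending interrupt
    return interrupts


# Hypothetical completion times (ms) for five queued commands
times = [0.0, 0.1, 0.2, 5.0, 5.05]
print(count_interrupts(times, host_latency=0.5, aggregate=False))  # 5
print(count_interrupts(times, host_latency=0.5, aggregate=True))   # 2
```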
Finally, NCQ has the ability to set up the direct memory access
(DMA) operation for a data transfer without host software intervention.
First Party DMA (FPDMA) allows the drive to process a number of
commands without any intervention from the CPU and/or software.
How NCQ Works
When a command is given to the hard drive, the device needs to
determine if this command is to be queued or processed right away.
In order for NCQ to work efficiently, two commands were added
to the SATA II specification, which are Read FPDMA Queued and
Write FPDMA Queued. We mentioned Hyper-Threading earlier; its
advantage here is that, normally, an application requests one piece
of data at a time, whereas with Hyper-Threading, several applications
can request data at once. While this can happen without Hyper-Threading,
the technology allows queues to be built more efficiently.
To simplify how NCQ works, a good example would be an elevator.
Say a person has three packages to deliver using the elevator,
where each package represents a data request for an application
(which would be the company the package is intended for).
Say the packages need to go to floors 2, 3, and 4, but they are
stacked in a random order on the elevator floor. The delivery person
ends up dropping off the packages as they are currently stacked,
which would be on floors 4, 2, and 3. Naturally, this is inefficient,
and it would be better to deliver the boxes in sequential order.
[Figure: No NCQ vs. NCQ]
To represent queuing, the delivery person sorts the packages
so that they are dropped off in sequential order. Hyper-Threading
can be represented by having a second delivery person sorting
out the boxes while the other drops them off. In any case, this
was a simplified example, but that's the general idea.
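The elevator example is easy to put into numbers. The sketch below counts how many floors the elevator travels for each delivery order. The assumption that the elevator starts on floor 1 is ours.

```python
from typing import Sequence


def floors_travelled(start_floor: int, stops: Sequence[int]) -> int:
    """Total floors moved when visiting the stops in the given order."""
    travelled, current = 0, start_floor
    for floor in stops:
        travelled += abs(floor - current)
        current = floor
    return travelled


stacked = [4, 2, 3]  # the order the packages happen to be stacked in
print(floors_travelled(1, stacked))          # 6 floors without queuing
print(floors_travelled(1, sorted(stacked)))  # 3 floors with queuing
```

Same three deliveries, half the travel. On a hard drive, the "floors" are the distance the heads have to move.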
Who and What Supports NCQ?
As we've already pointed out, NCQ is only present in native SATA
drives. The majority of initial SATA drives (Seagate's were the exception)
were not native drives upon debut, but rather IDE drives with
SATA interfaces. Only drives which qualify under the Serial ATA II
specification can be NCQ enabled. At the time of
this writing, only Seagate and Maxtor offer these drives. The
Seagate Barracuda 7200.7, which we'll be looking at today, is
readily available, and coming soon is their Barracuda 7200.8,
which increases the capacity and doubles the buffer to 16MB. Maxtor's
offering is the DiamondMax 10, which should be available now,
but we were unable to find any of these drives locally.
NCQ drives don't mean much without a controller, and like the
drives, the controller support is a little slim right now. Intel's
ICH6R, which can be found with their 915 and 925X chipsets, fully
supports NCQ from the get-go. Silicon Image recently demo'd
their SiI 3124 controller at IDF 2004, and should have their part
out soon. As for Promise, Siig, NVIDIA, ATI and VIA, none of them
currently have controllers that support NCQ. No doubt, these controllers
will come, but at the moment (as in, going to the store and buying
something today), only Intel boards based on Grantsdale and Alderwood
will support NCQ.
As for software support, for desktop users, the performance gain
from "regular" applications will be minimal. NCQ thrives
on multithreaded and multitasking situations, and the truth is,
not many desktop applications and games tap into this. Workstation
and server level systems, on the other hand, may see a boost, but
again, that will depend on the application.
The important thing to remember is that NCQ drives will work on all
SATA controllers, but on controllers without NCQ support, the NCQ
functionality will be disabled.
Now that we've discussed NCQ, let's have a look at a couple of Seagate
Barracuda 7200.7s.