OCZ DDR PC-4200 Dual Channel

Memory by KeithSuppe @ 2003-10-28

"OCZ Performance Series PC-4200 Dual Channel memory using state of the art HyperSpeed? technology is capable of achieving outrageous speeds of up to 533 MHz at CL 3-4-4-8. OCZ PC-4200 Performance Series Dual Channel memory has been designed and fully optimized for use on Intel i865/i875 chipset-based systems." Liquid3D puts this statement to the test.. and then some!

Introduction

OCZ Technology makes a splash with its Premier PC4200

[Image: OCZ Premier Series PC4200 (Madshrimps)]



OCZ Technology has been finding some very inventive ways to extract every possible MHz from modern DDR ICs, and they have done it once again with their Premier series PC4200. Dressed in an attractive black heat spreader with gold clips, it's much lighter than their Gold series, and to be honest I prefer it over that much heavier line. The jet-black, light-weight aluminium heat spreader simply exudes "fast" and "cool." First off, let me say I've been running this memory for almost a week now at 280FSB with absolutely no stability problems. I originally ran it on my Abit IS7-E, and the bulk of the benchmarks were run on the venerable Asus P4C800E Deluxe using a 1GB kit. I have not removed the heat spreaders to verify the ICs; I believe them to be Hynix, although I haven't empirically verified this. Suffice it to say, if they turn out to be Samsung TCB3 DDR333, then I'll shave my head in honour of belt sanding, CO2 lasers, or pixie dust. Given that no fab is actually designing and building 4.0ns ICs, the first stage is binning, and that in itself will only get you so far. What I do know is that OCZ has made significant strides in researching different PCBs. At these speeds, given the jump-off point (silicon limitations), choosing the correct PCB has as large an impact on performance as the ICs themselves.

I don't remove heat spreaders all that often; unless there's some compelling mystery to warrant it, I see no reason to. In fact, I was slightly suspicious when Lost Circuits decided to "unveil" the "hidden" secret under OCZ's Gold 3700 heat spreaders some time ago, and I found it to be in poor taste to suggest OCZ was claiming there was actual gold in them. I've come to respect Lost Circuits for their in-depth, most often educational articles. Yet I must say that removing one heat spreader, on one particular memory brand, of all the memory they've reviewed over time, seemed somewhat nefarious. So long as the module does what it's supposed to do, why bother? If OCZ were specifying 3.10VDIMM as the default required voltage, I'd most likely become suspicious. Nonetheless, Anand Shimpi countered in his article on the Gold 3700, and coming from Anandtech, that's a powerful endorsement. When Lost Circuits attempted to discredit OCZ with the words "belt sander" struck through, I found that to be personal. As they say, "that's all ICs under the heat spreader..." Speaking of Anandtech, they published their review of OCZ's Gold series 4200EL just recently. I like to run memory for at least a week, to allow the silicon to undergo what some may label "burn-in."

This article will not only be a review, it will discuss the discrepancies between the label on your DDR and the chipset's bandwidth claims. As overclockers (enthusiasts), we can be an overly demanding and fickle bunch. We are insatiable when it comes to hardware performance, and whether it be DDR, graphics cards, or CPUs, we plug them in and immediately begin seeking the performance ceiling. Yet when we overclock our memory, we overclock the FSB, and therefore the processor. While we can use memory dividers, any experienced enthusiast knows this simply slows the system; even with the best latencies, the divider defeats the purpose. This is especially true so long as we're discussing the Canterwood/Springdale chipsets, for which OCZ Technology's PC4200 was generally designed. At present the i875 MCH is the best "peak bandwidth" chipset available on the market.

i875 Testing and Memory Bandwidth

The i875 in theory can provide an amazing 6.4GB/s of peak bandwidth, and it is exactly this term which inspired my delving into this subject matter. Next to the processor's physical cache, the North Bridge, and more specifically its MCH (Memory Controller Hub), plays a critical role in system performance. The architecture of the MCH may be as important as any component in your system, certainly as much as the processor itself. Where the chipset is concerned, many theoretical figures are thrown at us by marketing departments, yet I wonder how many of us truly realize the distinction between peak and effective bandwidth?

I've come to realize that, to the overclocker, the word "default," albeit seven letters, leaves a four-letter connotation on his or her best carpet. I, for one, just can't live with my P4 2.4C running at 2400MHz; after 3500MHz it's simply offensive. Therefore I need components which will keep pace at these overclocked speeds. And while memory can be overclocked, it has nowhere near the flexibility or the performance ceiling of most processors. I need DDR500 to see 3GHz without using a divider. It is, however, somewhat disheartening not to reach the "labelled" bandwidth on my modules, or the bandwidth claimed by the chipset maker. For these reasons, we will investigate whether the industry has aggrandized these values, or whether the majority of systems are simply performing below their specifications. All too often we (I) fail to meet the ostentatious bandwidth claims associated with today's memory and chipsets. Are these figures theoretical, and therefore rarely, if ever, attained? Or are the values associated with today's memory "real-world" numbers, empirically verifiable? First, the formula:

Bus Size x Clock Speed = Bandwidth

DDR = 64-bit bus = 8 bytes per transfer

8 x 533 = 4264 MB/s (sold as PC4200, so you get 64 MB/s extra for your money!)
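For anyone who wants to plug in their own numbers, here is the same formula as a throwaway Python sketch (purely illustrative, nothing more):

BUS_BYTES = 8  # DDR's 64-bit bus moves 8 bytes per transfer

def peak_mb_s(effective_mhz: int) -> int:
    # Peak bandwidth = bus width (bytes) x effective clock (MHz)
    return BUS_BYTES * effective_mhz

print(peak_mb_s(533))  # 4264 MB/s -> the PC4200 label
print(peak_mb_s(570))  # 4560 MB/s -> DDR570, as run later in this review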

I've been running OCZ PC4200 at a constant 285FSB for six days now. Since we're using DDR (Double Data Rate) memory, this implies an effective data rate of 570MHz; ergo 8 x 570 = PC4560. Since I paid for PC4200 and have been running the memory completely stable at PC4560 speed, I'd say things are good. Yet when we look at the screenshot below, one becomes befuddled. First, I must qualify the results of this screenshot: at this speed, running with PAT enabled was not possible given the voltage available to the DIMMs. And it is around this attribute of the 875 chipset that our discussion will revolve:

[Screenshot: SiSoft Sandra memory bandwidth at 285FSB, PAT disabled (Madshrimps)]


First off, let's look at the bandwidth. Since we're operating on a dual-channel DDR platform, the figure needs to be divided in half: 5914 MB/s divided by 2 = 2957 MB/s per channel. Subtract that from the theoretical single-channel bandwidth of 4560 MB/s and you're left 1603 MB/s short. What happened to that 1603 MB/s, or 3206 MB/s across both channels? I want it back. Even Sandra tells me my "estimated" bandwidth could be 9120 MB/s ((peak) 4560 x 2 = 9120 MB/s). Do I call OCZ, Intel, Asus, who? Who is going to give me back the 1603 MB/s per channel of which I (we) have been deprived? The fact is we never really see this kind of "peak" bandwidth. And why is that? Some may call it false advertising, sand-bagging, perhaps even duplicitous. I simply call it "wishful bandwidth thinking."
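Tallying the shortfall explicitly (the same illustrative Python style as above, using the Sandra reading from the screenshot):

measured = 5914          # Sandra buffered reading, MB/s (dual channel, PAT off)
per_channel_peak = 4560  # theoretical single-channel peak at DDR570, MB/s

per_channel_measured = measured / 2
deficit = per_channel_peak - per_channel_measured
print(f"per channel: {per_channel_measured:.0f} MB/s, "
      f"missing {deficit:.0f} MB/s ({2 * deficit:.0f} MB/s across both)")
# -> per channel: 2957 MB/s, missing 1603 MB/s (3206 MB/s across both)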

The clearest explanation I've found is in the article by Peter Rundberg, Memory Bandwidth Explained:

...there is a difference between peak bus bandwidth and effective memory bandwidth. Where the peak bus bandwidth is just the product of the bus width and the bus frequency, the effective memory bandwidth includes addressing, and other things that is needed to perform a memory read or write. The bold figures of DDR-SDRAM and DRDRAM does not indicate how these new memory technologies perform in real life. In this article we will look into the cause of the failed promises of these technologies by focusing on the most important part of memory performance, latency. What neither of these two new memory technologies gives us is reduced memory latency, which is the time it takes to look something up in memory. This is because they are both based on DRAM. The latency is not so much an issue of the memory interface as it is the memory cell itself, and since both these two new memories use DRAM, the latency is not improved. As we will see in the following sections, latency is more important than peak bus bandwidth when it comes to providing effective memory bandwidth... (Ins and Outs of Memory Bandwidth, Peter Rundberg.)


The quoted article details the many processes which conspire to slow your memory below the bandwidth on its label. What may shock you is that the primary culprit responsible for slowing your RAM's bandwidth ironically happens to be the same device responsible for speeding it up: your CPU's cache. A cache operates on the principle of locality, "temporal" and "spatial." Temporal locality assumes that if a given program uses a piece of data, it will use that same data again soon. Spatial locality states that if a program uses a specific piece of data, it will soon use data in close proximity. When data is requested which is not in the processor's cache, a "miss" occurs, and in reality the only time main memory (RAM) is accessed is during a cache miss. When the data is retrieved from RAM, the spatial locality effect also determines what is retrieved. The problem with accessing main memory is that data cannot simply be retrieved in one block: DRAM is stored in a matrix of rows and columns so that it can be "easily" found. Below is a basic example of how data is organized, and therefore retrieved, from SDRAM:

RAS - Row Access Strobe. A signal indicating that the row address is being transferred.
CAS - Column Access Strobe. A signal indicating that the column address is being transferred.
tRCD - Time between RAS and CAS.
tRP - The RAS Precharge delay. Time to switch memory row.
tCAC - Time to access a column.

The CPU addresses the memory with row and bank during the time RAS is held (tRP).
After a certain time, tRCD, the CPU addresses the memory with the column of interest during the time CAS is held (tCAC).
The addressed data is now available for transfer over the 64-bit memory bus.
The immediately following 64 bits are transferred the next cycle, and so on for the whole cache block.

For SDRAM these times are usually presented as 3-2-2 or 2-2-2, where these numbers indicate tCAC, tRP, and tRCD. Thus for 2-2-2 memory the first 64-bit chunk is transferred after 6 cycles. (Ins and Outs of Memory Bandwidth, Peter Rundberg.)
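To make that cycle counting concrete, here is a minimal sketch (my own illustrative Python, modelling the quote's plain-SDRAM 2-2-2 example and ignoring DDR's dual-edge transfers and bank interleaving):

def line_fill_cycles(tcac, trp, trcd, line_bytes=64, bus_bytes=8):
    # First 64-bit chunk arrives after precharge + RAS-to-CAS + column access;
    # the rest of the cache block then streams one bus-width chunk per cycle.
    first_word = trp + trcd + tcac
    return first_word + (line_bytes // bus_bytes - 1)

peak_cycles = 64 // 8               # 8 cycles if data flowed every cycle
actual = line_fill_cycles(2, 2, 2)  # the quote's 2-2-2 example
print(f"{actual} cycles per 64-byte block "
      f"({100 * peak_cycles / actual:.0f}% of peak bandwidth)")
# -> 13 cycles per 64-byte block (62% of peak bandwidth)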

It's very simple: the numbers associated with peak bandwidth fail to account for the above processes, as well as the write-back/write-allocate steps. To better understand the difference between peak and effective bandwidth, one must understand how the FSB (cache) and NB-MCH (North Bridge-Memory Controller Hub) interact, as well as the relationship between the MCH and the memory itself.

nVidia revolutionized memory throughput with its introduction of the TwinBank memory architecture in its nForce chipset. TwinBank combines two distinct 64-bit DDR channels via an arbiter to form 128-bit throughput to the FSB. This effectively doubles the bandwidth, and it is combined with another revolutionary North Bridge architecture known as DASP; in fact, they patented it. DASP, or Dynamic Adaptive Speculative Pre-processor, was designed to reduce latency by acting as an additional prefetcher along the memory bus: it predicts what the next data blocks will be through temporal locality, then fetches and stores data concurrently for the processor. In fact, it's been labelled an "L3 cache." The concept of bringing memory closer to the CPU is not so unique: the Athlon 64 FX-51 moves the memory controller onto the CPU die, which basically renders a separate MCH moot. Of course, registered memory is required, although I'm sure given time that will not be so. DASP is still the strongest performance element of the nForce chipset, and while TwinBank certainly improves performance dramatically, there's a reason nVidia decided to release two versions of its nForce2 400: the nForce2 400 being single channel, and the nForce2 Ultra 400 incorporating TwinBank. Of course, DASP is common to both.
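As a rough picture of why two 64-bit channels double the peak figure, here is a toy interleaving model (my own illustrative Python, not anything from nVidia's or Intel's documentation):

BUS_BYTES = 8  # one 64-bit channel moves 8 bytes per transfer

def channel_for(address: int) -> int:
    # Interleave on the 8-byte chunk index: even chunks -> channel 0, odd -> 1
    return (address // BUS_BYTES) % 2

def cycles_to_stream(nbytes: int, channels: int) -> int:
    # Ideal, contention-free streaming: n channels move n chunks per cycle
    chunks = nbytes // BUS_BYTES
    return -(-chunks // channels)  # ceiling division

print([channel_for(a) for a in range(0, 32, 8)])  # [0, 1, 0, 1]
print(cycles_to_stream(64, 1))  # 8 cycles on a single channel
print(cycles_to_stream(64, 2))  # 4 cycles dual channel: peak doubled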

PAT on/off

On the Intel front, it's taken a bit of time to find a chipset to rival the nForce, and while Intel's efforts were originally mired down by Rambus influence and RDRAM considerations, they finally decided to adopt the more widely accepted DDR standard. Intel's 850 chipset was their first dual-channel effort, although being based on RDRAM it was only a plateau in their chipset ascent. Granite Bay was the predecessor of the venerable Canterwood, and was Intel's first synchronous dual-channel DDR chipset. Albeit significantly slower than its offspring at 4.2GB/s, it was nonetheless the break from the RDRAM stranglehold. RDRAM, while an excellent memory solution, was simply too costly, and its patent holders seemed to be involved in one litigation after another. Politics aside, Granite Bay exhibits many of the attributes of Canterwood: both are 1005-BGA chips, feature the same die size, and share 8x AGP. Except for Gigabit LAN and PAT, Granite Bay and Canterwood are closely related. But it is PAT which will be our focus, and its ability to reduce latency.

From the screenshot on the previous page, running OCZ's PC4200 at CAS 3-4-4-8 / 2.85V and 285FSB (1:1), Sandra reads a 5924MB/s buffered bandwidth result. Now we shall run the system at 280FSB / CAS 3-4-4-8 / 2.85V, again with PAT disabled:


[Screenshot: Sandra memory bandwidth at 280FSB, CAS 3-4-4-8, PAT disabled (Madshrimps)]



While 5975MB/s comes nowhere near the theoretical bandwidth, it's still decent given the latencies and the disabling of Performance Acceleration Technology. Now we'll run the system at 280FSB again (1:1) at CAS 3-4-4-8 / 2.85V, and enable PAT:


[Screenshot: Sandra memory bandwidth at 280FSB, CAS 3-4-4-8, PAT enabled (Madshrimps)]


As you can clearly see, enabling PAT improves bandwidth by 349MB/s (roughly a 6% gain); perhaps not a huge amount, but significant nonetheless. Now, for further comparison, we'll run the system again at approximately 280FSB (277FSB), except change the CAS to 2.5. Ergo the parameters will be as follows: 277FSB / CAS 2.5-4-4-8 / VDIMM 2.85:


[Screenshot: Sandra memory bandwidth at 277FSB, CAS 2.5-4-4-8 (Madshrimps)]


Although this may seem obvious to many, it is nonetheless empirical evidence of "real-world," or effective, bandwidth with Intel's PAT and a BIOS reduction of CAS latency. And to show the effect temperature has on the NB, this next screenshot was taken when I'd removed the Maze-4 system to add the Z-chipset block. The Vantec Tornado 92mm/119CFM is currently the most powerful fan available on the market, and when Thermalright designed its SLK-947-U and SP-94 to accommodate it, you can be sure they knew what they were doing insofar as overlap is concerned: a great deal of air comes in contact with the surrounding MOSFETs and especially the NB. Herein lies the empirical evidence:

[Screenshot: Sandra memory bandwidth surpassing the 875's theoretical figure, NB at 22°C on air cooling (Madshrimps)]


We've now surpassed the theoretical bandwidth of the 875 chipset, an excellent achievement. And 22°C is an excellent temperature for air cooling. This is all attributable to the Thermalright SP94, Vantec Tornado 92mm, and AS5.

Conclusion


OCZ has done an excellent job finding the optimum platform (PCB) for the ICs available on the semiconductor market, and has always found a unique approach to squeezing every MHz from DDR memory. One reason I've been loyal to this company is their R&D efforts. I'm very grateful to Bhulinder Sethi, who provided me with valuable insights into the semiconductor field, and for providing the preview RAM itself.


[Image: OCZ Premier Series PC4200 (Madshrimps)]




Questions and Comments can be shared with the other readers in our forums.