Stabilizing your memory overclock on Core i7 platform - Back-to-Back Cas Delay Investigated

OC-Team.be by massman @ 2009-06-30

In today's short article, we'll look at the effect of just one memory timing, the Back-to-Back Cas Delay, which seems to be one of the more important timings for both performance and stability. If you're trying to get the most out of a high-frequency memory kit, squeeze more performance out of low-frequency memory, or stabilize a high BCLK/high memory frequency combination, this is something for you.

Introduction

It has been a while since my last article, for which I blame exams at school and, of course, the Gigabyte Open Overclocking Competition, which eventually took me to the great city of Taipei. However, since I now have a few weeks of vacation, I can present some of the Madshrimps team's findings regarding the Core i7 platform.

Maybe you remember the X58 motherboard round-up, which was published four months ago, close to the release of the Core i7 platform. Having 7+ X58 motherboards at home (I was sent a couple more after the round-up) gave me and my colleagues the opportunity to dig deeper into the different aspects of overclocking an i7 processor. In fact, we focused on three different aspects:

  • BCLK frequency
  • Memory frequency
  • The combination of BCLK and memory frequency

    In this article, we'll spend time on the one setting that seems to offer the best solution when you run into either stability or performance problems while overclocking the memory. It seems to be one of the key elements in stabilizing memory running at high frequencies.

    Back-to-Back Cas Delay

    Madshrimps (c)


    The setting I'm talking about is a memory timing called "Back-to-Back Cas Delay" (referred to as B2B in this article). To explain why this particular timing is so interesting, I need to tell you the story of how we figured all this out, since it's vital to understanding its impact on stability. First of all, this timing seems to be available on Asus motherboards only. Why? We don't know, but on the Asus motherboards it defaults to "0" when set to auto in the BIOS. But before we dig deeper: the background story.

    To start with, I have to explain a concept called "low-clock challenges", which is mainly used within the enthusiast community to work out the most efficient software/hardware configuration possible. Basically, you limit the maximum frequency of the CPU and then try to get the best possible score in a certain benchmark, in this case Superpi 32M. This allows overclockers to compare their tweaking skills and, if necessary, track down hardware-related performance problems.

    Now, when tweaked properly, a 4GHz Core i7 combined with 1GHz CL7 (2000CL7) memory will give you a time of around 8 minutes 50 seconds in the 32M benchmark. However, certain people seem to be able to run at least 10 seconds faster, which cannot be explained by software tweaks alone. To keep the story short: it occurred to me that almost all of the fast configurations were Asus-equipped.

    On the Madshrimps forums, I wrote quite a lengthy post about my performance issues in the 32M low-clock challenge. Initially, I believed the hardware prefetcher options in the BIOS were the cause, since the Asus motherboards were the only ones that offered them. However, after testing both the Rampage II Gene and the Foxconn Bloodrage (whose new BIOS has the prefetcher option), I know that those two prefetchers are not the cause. The main thing to take from that post is that there's a significant difference in clock-for-clock performance between certain motherboards.
    (Link: 32M and i7, a motherboard's choice?)

    The second part of the story is actually located in another thread, but since the information is spread over numerous forums, I'll give you the details in short. Basically, Jody (3oh6, hardwarecanucks.com) and I were testing the Rampage II Gene at the same time, and both our motherboards were incapable of running an Elpida-based memory kit over 930MHz, no matter what voltage or Cas latency we used. After a few hours (okay, 10+), I found a reasonably stable configuration by playing with the Back-to-Back Cas Delay timing. There are a couple of weird issues with this timing, though.

    1) The instability caused by this timing doesn't scale as you'd expect. It's as unstable at "0" as it is at "10", whereas you'd expect it to be more unstable at a lower value.
    2) The instability itself is not typical of memory instability: instead of producing a BSOD or a reboot, the system just locks up.
    3) The instability seems to be very particular to certain benchmarks. I can run Superpi 2M as often as I want, and even copying large files doesn't cause the system to lock up, but after 2 seconds of Superpi 4M the system hangs.
    4) In a dual-channel configuration, there's absolutely no issue whatsoever with this timing: set to "0", it is perfectly stable.
    5) The issue cannot be solved by increasing the voltage or loosening the timings. The setting "10" is as unstable at 1.8V Vdimm and 1.6V Vqpi/dram as it is at 1.65V/1.50V.


    A fellow overclocker tested the influence of this timing on 32M performance: changing the value from 0 to 12 already makes the benchmark 1 second slower by loop 4. Taking into account that the benchmark has 24 loops in total, changing this timing would cost you about 6 seconds, maybe 7. And, coincidence or not, this is almost exactly what I'm losing in comparison to the best scores. This "finding" also holds on other motherboards: initially I was using the DFI Lanparty DK X58-T3eH6 for testing purposes and again got times around 8 minutes 50 seconds. But this board gives me perfect 200/2000 stability out of the box, so I assume its Back-to-Back Cas Delay timing is set to 12 by default to maximize stability.
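
The extrapolation above is simple proportional arithmetic, and the following minimal Python sketch spells it out. It assumes the slowdown accumulates linearly per loop, which the article does not verify; the function name `extrapolate_loss` is purely illustrative.

```python
def extrapolate_loss(loss_so_far_s, loops_done, total_loops):
    """Linearly scale a time loss measured after some loops to the full run."""
    if loops_done <= 0 or total_loops < loops_done:
        raise ValueError("need 0 < loops_done <= total_loops")
    return loss_so_far_s * total_loops / loops_done


# Figures from the text: about 1 second lost by loop 4, 24 loops in total.
estimated = extrapolate_loss(1.0, 4, 24)
print(f"Estimated loss over the full 32M run: {estimated:.1f} s")
```

A loss that grows slightly faster than linearly in the later loops would account for the "maybe 7" seconds mentioned above.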

    In short: the Back-to-Back Cas Delay timing has a significant effect on the stability of your memory overclock, but it seems to affect performance as well. On the next page, we try to find out how much effect the timing has on performance.

    Findings and data

    Data on performance:

    Since we already know that this particular timing influences the stability of the memory overclock, we now focus on its effect on performance. As you all know, the higher a memory timing is set, the less performance you get. Why? To keep it simple: the higher the value, the longer you have to wait before the command controlled by that timing can be issued. The lower the value, the shorter the waiting period, and thus the sooner the command is issued.
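
To put a number on that waiting period, a timing expressed in clock cycles can be converted to nanoseconds. The sketch below does this for the Cas Latency values used in this article. The B2B line assumes that B2B is also counted in memory clock cycles, which the article does not confirm; the function name `cycles_to_ns` is illustrative.

```python
def cycles_to_ns(cycles, memory_clock_mhz):
    """Convert a timing given in memory clock cycles to nanoseconds."""
    if memory_clock_mhz <= 0:
        raise ValueError("memory clock must be positive")
    return cycles * 1000.0 / memory_clock_mhz


# "2000CL7" means 2000 effective, i.e. a 1000 MHz real clock, at Cas Latency 7.
examples = [
    ("CL7 at 1000 MHz (2000CL7)", 7, 1000),
    ("CL7 at 800 MHz (1600CL7)", 7, 800),
    ("B2B 12 at 1000 MHz", 12, 1000),  # assumption: B2B counted in memory clocks
]
for label, cycles, clock_mhz in examples:
    print(f"{label}: {cycles_to_ns(cycles, clock_mhz):.2f} ns")
```

The same number of cycles costs more time at a lower clock, which is why timings are only comparable at equal frequency.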

    We used Superpi 32M and Lavalys Everest to show the performance differences.

    As you can see, a good B2B-setting can mean the difference between a good and a very bad 32M result.

    Quite a spectacular decrease in the memory read bandwidth going from 6 to 12 or even 10 to 12.

    The decrease in memory write bandwidth seems to be more subtle than what we saw in the graph above, but going from 10 to 12 still has quite a big effect on performance.

    Very big decrease in performance, once again!

    As you can see, this timing has no effect on memory latency, only on bandwidth.

    Findings:

    Going over the different graphs, it's more than clear that this timing has a dramatic effect on performance, more specifically on memory bandwidth. Below you'll find an overview of how each aspect of tuning the memory subsystem affects the different benchmarks.

  • Memory frequency: 1600CL7 versus 2000CL7
  • Cas Latency: 2000CL9 versus 2000CL7
  • Back-to-Back Cas Delay: 1600CL7_12 versus 1600CL7_6

    Basically, I calculated the effect of changing one of the three variables from the tests in the section above. Each set of bars represents the effect of the three variables in a certain benchmark. The longer the bar, the more effect a variable has in that specific test.
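
The article does not state the exact formula behind those bars. One plausible reading is a percentage difference between two configurations that differ in a single variable, as in the minimal Python sketch below. The function name `relative_effect` is illustrative, and all numbers are placeholders, not measurements from the graphs.

```python
def relative_effect(baseline, tuned, lower_is_better=False):
    """Return the improvement of `tuned` over `baseline` as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change = (tuned - baseline) / baseline * 100.0
    # For timed benchmarks a lower value is the improvement, so flip the sign.
    return -change if lower_is_better else change


# Placeholder values only, NOT results from the article.
read_b2b12 = 10000.0  # MB/s, hypothetical read bandwidth at 1600CL7_12
read_b2b6 = 12500.0   # MB/s, hypothetical read bandwidth at 1600CL7_6
print(f"Read bandwidth: {relative_effect(read_b2b12, read_b2b6):.1f} %")

time_b2b12 = 530.0  # seconds, hypothetical 32M time at 1600CL7_12
time_b2b6 = 524.0   # seconds, hypothetical 32M time at 1600CL7_6
print(f"32M time: {relative_effect(time_b2b12, time_b2b6, lower_is_better=True):.2f} %")
```

Expressing every benchmark as a percentage is what makes a bandwidth figure in MB/s comparable with a benchmark time in seconds.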

    Another way of looking at this would be to find a match of low-clock and high-clock settings in terms of performance:

    The most interesting result is of course the 'LE - copy' result, where 1600CL8 can outperform 2000CL7. Who needs high-frequency memory anyway?

    Conclusive thoughts

    Looking back at the findings in terms of stability and performance, the Back-to-Back Cas Delay timing reminds me of the Performance Level setting on the Core 2 Duo platforms we've been playing with for so long. In fact, I think it's pretty much the equivalent: the Performance Level timing is described as tRD, or Read Delay, which is pretty similar to the other name the B2B timing goes by, Burst Read Delay. Hardware enthusiasts will agree with me that tRD was one of the most powerful timings on the C2D platform, especially in terms of performance.

    As we explained in the second part of the first page, this timing is vital when trying to stabilize a high-frequency memory overclock. Both Leeghoofd, my fellow Madshrimps reviewer, and I have experienced exactly the same behavior when trying to improve stability above 1GHz memory (2GHz effective): increase B2B to 10 or even 12 and you'll be able to get it running flawlessly. The downside is of course the loss in performance.

    For those who have an i7 processor with a locked multiplier, this timing might be the key to a higher BCLK frequency, especially in combination with high-frequency memory. As already said, on the Rampage II Gene I was only able to run 200/2000 after increasing the B2B timing to 12. For those who want to tune their memory for the highest performance, this timing is also interesting when your memory kit isn't from the highest bin. As the table on the previous page showed, 1600CL8 isn't by definition slower than 2000CL8, as long as you're able to keep the B2B value as low as possible.
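
On the Core i7 platform, both the CPU and the memory frequency are derived from BCLK through multipliers, which is why a locked CPU multiplier forces BCLK (and with it the memory) upward. The Python sketch below works through the "200/2000" shorthand. The multiplier values (x20 for the CPU, x10 effective for the memory) are assumptions chosen to reproduce the 4GHz and 2000 figures used in this article, and the function names are illustrative.

```python
def derived_clocks(bclk_mhz, cpu_multiplier, memory_multiplier):
    """Return (CPU MHz, effective memory rate, real memory clock in MHz)."""
    cpu_mhz = bclk_mhz * cpu_multiplier
    memory_effective = bclk_mhz * memory_multiplier
    return cpu_mhz, memory_effective, memory_effective / 2


def bclk_needed(target_cpu_mhz, locked_multiplier):
    """BCLK required to reach a CPU frequency when the multiplier is locked."""
    return target_cpu_mhz / locked_multiplier


# Assumed multipliers: x20 CPU, x10 (effective) memory, at BCLK 200.
cpu, mem_effective, mem_real = derived_clocks(200, 20, 10)
print(f"CPU {cpu} MHz, memory {mem_effective} effective ({mem_real:.0f} MHz real)")
print(f"BCLK for 4000 MHz at a locked x20: {bclk_needed(4000, 20):.0f} MHz")
```

With the CPU multiplier fixed, the only way to lower the memory frequency at a given BCLK is a lower memory multiplier, so a stable high-frequency memory setting directly limits how far BCLK can go.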

    We have already sent this feedback to different motherboard manufacturers, and MSI has already given us a beta BIOS to play with the B2B timing. Strangely enough, apart from Asus and MSI, no other motherboard manufacturer has exposed this timing in the BIOS. Judging from overclocking capabilities and memory performance, most motherboards have this particular timing set to 10 or 12. Also, the Asus motherboard reports the auto setting as "0", but we are not entirely sure the timing is really set to "0", since first tests lead us to think the auto setting is actually "4" rather than a real "0". Let's hope other manufacturers will follow and give the end-user the opportunity to change this rather important memory timing manually.

    More tests will be conducted soon and you'll hear from us in the forums!

    To end with, I'd like to thank:

  • Milan from Asus for the Rampage II Gene
  • Manu from Tones for the Core i7 965
  • Leona, Hendry and Eric from MSI for the motherboard and taking the time to answer my mails
  • Albrecht from Madshrimps for providing me with factual data on the B2B timing.