Intel Core i7 In-Depth Performance Scaling Analysis

OC-Team.be by massman @ 2009-08-11

In this article we take a closer look at the overclocking aspects of the Core i7 platform. Core, uncore, QPI, memory, BCLK: all have been analyzed with performance scaling tests to help you extract the most performance from your setup.

Introduction

Madshrimps (c)


The more attentive Madshrimps reader is probably aware that we've already published a couple of 'in-depth performance scaling' articles in our review collection, more specifically regarding AMD's Phenom and Phenom II CPUs. In those articles, we came to the conclusion that overclocking the integrated memory controller has a bigger effect on performance than we'd originally assumed, a finding that helped us reach new heights when benchmarking the Phenom II under LN2. Today we turn towards the blue side of computing as we have a look at Intel's Core i7 technology and how overclocking its different aspects affects performance.

As you probably remember from our Core i7 launch article, overclocking Intel's latest isn't exactly the same as overclocking the C2D. With Core 2 Duo configurations, it basically came down to increasing the voltage, decreasing temperatures and hitting the '+' key on your keyboard to raise the FSB frequency. This is a bit oversimplified (at a certain level, there's always more tweaking involved), but it points out the difference in overclocking ease between the old and new technology: with the Core i7, you have to worry about a lot more than those three factors. There's the BCLK, which seems to be limited quite badly by the X58 chipset; the memory, whose voltage shouldn't exceed 1.65V to prevent hardware failure; Hyperthreading, which adds quite some heat and so isn't ideal for those who like low temperatures; and much more to take into account.

In our last DDR3 memory round-up, we noticed that manufacturers are already pushing triple-channel kits to perform at very high frequencies. Although it's not that difficult to find low-priced triple-channel i7-ready memory kits, certain manufacturers are now releasing kits with the so-called next-gen Elpida chips, which are pretty much guaranteed to work at 1GHz with 7-8-7 timings. Sounds great ... until you see the price.

In this article, we'll not only try to find out what's important when overclocking, we'll also be spending quite some time on the memory performance scaling. For those who are interested in those high-priced memory kits, it'll be a true eye-opener!

Test setup and methodology

Test setup

We prepared the following test system to be used for all performance results mentioned on the next pages:

Madshrimps' Intel Test Setup

CPU: Intel Core i7 965
Cooling: Noctua NH-U12P
Mainboard: DFI Lanparty DK X58-T3eH6
Memory: 3 x 2GB OCZ Reaper PC-14400
Other:
  • Sapphire 4870X2
  • Antec 1000W PSU
  • Western Digital 320GB SATA HDD
  • Windows XP SP3


Methodology

    The following benchmarks were used:

  • Lavalys Everest: Memory latency
  • SuperPi 1M
  • Wprime 32M
  • 3DMark2001SE: Nature

QPI Link Frequency Performance Scaling

    QuickPath Interconnect Link

    Most people unfamiliar with the technical side of Intel's Core 2 and Core i7 platforms state that the BCLK is for i7 what the FSB is for Core 2. While that would indeed simplify things a lot, it's just not correct: the FSB, or Front Side Bus, is the link between the Northbridge and the processor, which in i7 terms would be the QPI link. Basically, all data sent by the graphics card, SATA, audio and so on travels through the IOH (Input/Output Hub) via the QPI link to the processor. In contrast to the C2D platform, the QPI link does not transfer any data from the memory banks, since the memory controller has moved from the Northbridge into the CPU.

    So, to translate all this to scaling: in theory, by increasing the QPI link frequency, we should only see a gain in 3D performance.

    Although the QPI link is the replacement of the FSB, which was one of the biggest factors in overclocking performance on the older generation of Intel processors, the link speed is now so high that even at the lowest setting available the performance is pretty much maxed out. We see a difference of less than 2% when increasing the frequency by 33%, which says enough.

    We expect the QPI link to have a bigger effect once the link is used more heavily, for instance in setups with two or more i7 processors. However, for desktop purposes, it's not necessary to even bother overclocking the QPI frequency. For the overclockers among us, especially those with a locked CPU, it's in any case best to set the QPI link frequency multiplier to the lowest setting available.

    Uncore Frequency Performance Scaling

    Uncore: an introduction

    As already mentioned on the previous page, the memory controller is now integrated inside the processor. Although the term integrated memory controller seems perfectly suitable for this part of the processor, Intel decided differently. Apparently, the think tank inside Intel couldn't come up with anything other than uncore ... which basically comes from un-core, or non-core.

    Now, the uncore serves as the gate between the memory banks and the processor core, so increasing its frequency should have quite a nice impact on performance, especially in memory-related applications. Also, since i7 now has a triple-channel memory configuration, increases in performance should be more noticeable, since the combined bandwidth of three memory channels is higher than that of two.

    Although the performance increase is not spectacular, we see that a higher Uncore frequency has a significant effect on 3D performance: an increase of over 5% with the Uncore frequency raised by 44%.

    The biggest advantage of overclocking the Uncore frequency is not the performance, however, but the support for higher clocked memory. Since the Uncore frequency has to be at least two times the frequency of the memory, an increased Uncore will lead you to higher memory clock frequencies.
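
    The "Uncore at least two times the memory" rule can be turned into a quick calculation. The sketch below is illustrative only: the function names are made up for this example, and it assumes that "memory frequency" means the rated speed of the kit (the "2000" in 2000CL7), which is consistent with the 4GHz uncore mentioned in the conclusion of this article.

```python
def min_uncore_mhz(memory_rate_mhz: float, ratio: float = 2.0) -> float:
    """Lowest uncore frequency allowed for a given memory frequency.

    Assumes the 'uncore >= 2 x memory' rule described in the article.
    """
    return ratio * memory_rate_mhz


def max_memory_rate_mhz(uncore_mhz: float, ratio: float = 2.0) -> float:
    """Highest memory frequency a given uncore frequency can support."""
    return uncore_mhz / ratio


if __name__ == "__main__":
    # Example memory speeds; 2000 corresponds to the 2000CL7 kits in the text.
    for memory in (1333, 1600, 2000):
        print(f"memory {memory} MHz -> uncore >= {min_uncore_mhz(memory):.0f} MHz")
```

    Read the other way around, an uncore that tops out at 3600MHz would cap the memory at 1800MHz under this rule, whatever the memory kit itself is capable of.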

    CPU Frequency Performance Scaling

    The Core i7 processor

    More transistors inside means a larger die, so the processor socket size had to be increased. Instead of the 775 pins we are used to, we now use 1366 pins for the main component in our system. The Core i7 is actually quite an interesting product from Intel, since they've now integrated the memory controller inside the processor, whereas it used to be inside the Northbridge. Also, with the re-introduction of Hyperthreading, users now have 8 threads running instead of the 4 on a regular Core 2 Quad processor. 8 threads require more memory bandwidth, so Intel also decided to add a third memory channel to feed them with all the necessary bandwidth.

    As you can see: a lot of new technology, but how does it scale?

    I believe it doesn't come as a shock to anyone that an increase in raw CPU frequency gives you the best performance increase. In 3D applications, we see that the increase in performance can be up to 75% with an increase of 131% in CPU frequency, which is far from bad. SuperPi and Wprime scale almost 1:1 with CPU frequency.
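
    To compare the clock domains tested so far on equal footing, one can divide the performance gain by the frequency gain. The snippet below is a rough sketch using the figures quoted in the text; the QPI and Uncore numbers are stated there as bounds ("less than 2%", "over 5%"), so the resulting ratios are approximations, and the function name is chosen for this example only.

```python
def scaling_efficiency(perf_gain_pct: float, clock_gain_pct: float) -> float:
    """Performance gain obtained per unit of frequency gain (1.0 = perfect 1:1 scaling)."""
    return perf_gain_pct / clock_gain_pct


# (performance gain %, frequency gain %) as quoted in the article text.
results = {
    "QPI link": (2, 33),        # "less than 2%" for +33%
    "Uncore (3D)": (5, 44),     # "over 5%" for +44%
    "CPU core (3D)": (75, 131), # "up to 75%" for +131%
}

if __name__ == "__main__":
    for name, (perf, clock) in results.items():
        print(f"{name}: {scaling_efficiency(perf, clock):.2f}")
```

    On these numbers the CPU core returns roughly 0.57% of 3D performance per 1% of extra frequency, against about 0.11 for the Uncore and about 0.06 for the QPI link.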

    BCLK Frequency Performance Scaling

    Base clock

    The BCLK frequency is a new term in the Intel technology terminology, at least for most of us. Actually, the term BCLK has been used ever since the beginning of Intel's product line, even on the Pentium 1. This clock frequency has nothing to do with sending data, but is merely a reference clock to derive all other frequencies from. In other words: raising the BCLK frequency itself has no effect on the performance whatsoever, but it does have an effect on other clock frequencies (that are derived from the BCLK frequency with multipliers).
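
    The relationship between the reference clock and the derived clocks can be sketched as follows. The multiplier values are hypothetical and picked only to give round numbers; they are not the settings used in our tests. The point is that two different BCLK values yield an identical set of derived clocks once the multipliers are adjusted, which is why the BCLK on its own should not change performance.

```python
def derived_clocks(bclk_mhz: int, multipliers: dict) -> dict:
    """Every clock domain runs at BCLK x its own multiplier."""
    return {domain: bclk_mhz * mult for domain, mult in multipliers.items()}


# Hypothetical multiplier sets, chosen so both configurations end up
# with the same core, uncore and memory frequencies.
low_bclk = derived_clocks(150, {"core": 24, "uncore": 16, "memory": 8})
high_bclk = derived_clocks(200, {"core": 18, "uncore": 12, "memory": 6})

if __name__ == "__main__":
    print(low_bclk)
    print(high_bclk)
    print("identical derived clocks:", low_bclk == high_bclk)
```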

    As expected, there is no real performance increase when overclocking the BCLK frequency.

    Memory frequency Performance Scaling

    The memory configuration scheme has become a little bit different on Core i7 as Intel decided a third memory channel was needed to fully support the bandwidth needs of the processor.

    It seems that increasing the memory frequency doesn't have that much of an effect, except in the Lavalys Everest Read Bandwidth benchmark where the gain is very noticeable. However, for gaming purposes it doesn’t really pay off.

    In-depth memory performance scaling - Latency

    Latencies

    Next to the memory frequency we have, of course, the latencies of the memory, which also determine the performance of your memory. In short: the lower the latencies, the faster data reads and writes will be executed, and thus the sooner data is available.
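
    Latencies are specified in clock cycles, so their real duration depends on the memory frequency. As a minimal sketch, the helper below (a name made up for this example) converts a CAS latency into nanoseconds, assuming the usual DDR convention that the I/O clock runs at half the rated data rate. It covers the CAS component only, not the total latency a tool like Everest measures.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Duration of the CAS latency in nanoseconds.

    DDR memory transfers data twice per clock, so the I/O clock in MHz
    is half the rated data rate in MT/s.
    """
    io_clock_mhz = data_rate_mts / 2
    return cas_cycles / io_clock_mhz * 1000


if __name__ == "__main__":
    # Two configurations mentioned in the article: 1400CL9 and 2000CL7.
    for rate, cl in ((1400, 9), (2000, 7)):
        print(f"{rate}CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

    Under these assumptions 2000CL7 works out to 7 ns and 1400CL9 to roughly 12.9 ns, so the gap on paper is considerably larger than the gap the benchmarks show.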

    Although there is a very small difference, CL6 isn't that much faster than CL8 or even CL9.

    Changing from 1T to 2T has barely any effect: not even 1%!

    In-depth memory performance scaling - Channels

    Triple channel

    As already mentioned before, one of the novelties introduced with the Core i7 launch is the triple channel memory configuration. Apparently Intel believes it's necessary to add a third channel to the platform to keep the processor happy in terms of memory bandwidth. Let's have a look.

    The performance increase going from single to dual channel is indeed quite noticeable, even in gaming environments. The third memory channel, however, doesn't really add that much to the whole performance picture.
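
    For reference, the theoretical peak bandwidth per channel count is easy to work out. The sketch below assumes a standard 64-bit DDR3 channel and uses a 1600 MT/s data rate purely as an example; these are paper figures, not measurements from our test system.

```python
def peak_bandwidth_gbs(data_rate_mts: float, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return data_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    for channels in (1, 2, 3):
        print(f"{channels} channel(s): {peak_bandwidth_gbs(1600, channels):.1f} GB/s")
```

    On paper the third channel adds another 50% over dual channel, which makes the small real-world gain all the more striking.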

    Gathering all information and conclusive thoughts

    Gathering all information

    As we expected, overclocking the processor core frequency gives the biggest performance boost: not only in processor-related benchmarks such as SuperPi and Wprime (where the increase is almost 1:1), but also in 3D environments, where a higher-clocked CPU can make the performance go up. If we look at the performance increase for an i7 920 (2.67GHz) overclocked to 4GHz, we see an increase of 10-15% or more in the 3DMark01 Nature subtest, which is quite a nice boost given how relatively easy it is to reach 4GHz with a 920, even on plain air cooling.

    The Uncore frequency comes in second, but with the recent findings on the Core i5, it seems that this might be more related to a marketing scheme than a pure technical reason.
    "... For Intel-based platforms, the story isn't that much different: the memory controller is now integrated in the processor and the clock frequency of it can form a bottleneck with high-frequency memory. The novelty about the i7 was the newly introduce third memory channel, which should increase the memory bandwidth significantly. Many tests, however, confirm that the extra channel doesn't have that much of an effect in most benchmarks, let alone in daily computing activities. And that is a big problem when trying to sell the product: who wants to pay more for something that doesn't work in the first place? The technique is quite simple: make it look like it works. And that's where the limitation of "uncore >= 2 x memory" kicks in: with an even lower uncore frequency; the added memory channel would have had even less effect than it has now. Less than almost insignificant, that's bad PR. Technically, it seems possible for the uncore to run at a lower ratio than 2:1, but weirdly enough none of the motherboard manufacturers seem to have added this option to their bios, although it would help people reaching 2000CL7 on air cooling since the memory overclock is very often limited by the uncore frequency ..."
    (~ Forum post: "A first look at the MSI P55-GD80")
    In short: the third memory channel increases the bandwidth SO much that even a 4GHz uncore can't really keep up with the enormous amount of data coming from the memory. Overclocking helps to reduce this problem.

    Last in line would be overclocking the memory. Quite frankly, after having reviewed the figures I couldn't believe the memory has so little effect on the performance; only in memory-related benchmarks such as Lavalys Everest can you really see a difference. In any other 24/7 application, 1400CL9 is almost as fast as 2000CL7. To back this up, here is a quote from an X58 memory round-up in which 8 memory kits were tested.
    "... suppose you got an average performing 1333 kit (almost all do cas 7 nowadays), and compare it to the best performing midrange kit. The average performance increase you can expect from your system is 3,59%. I’ll let it up to you to decide whether this is a worthwhile performance increase, it is certainly something you will not be noticing in your day to day pc tasks (you need about 10% speed difference to really note a difference). The conclusion of today must therefore be that any of the tested kits will do fine in your i7 rig, of course, if you do want the last bit of performance, and if you’re an avid benchmark enthusiast (like myself), just go on and pick up that upper midrange kit you’ve been drooling on …"
    (~ "X58 Triple Channel DDR3 Memory Roundup! 8 Mid-range Kits Tested")
    Again, it's very simple: if you don't want the absolute maximum performance, it's just not worth looking at memory kits in the higher regions of the pricing charts; for most, if not all, daily applications you'll see no difference whatsoever!

    By the way, in case someone missed it, on some X58 motherboards it's actually quite easy to tune your low-clocking memory kit to have the performance of a highly overclocked memory kit by changing the Back-to-back Cas Delay memory timing.
    "... For those who want to tune their memory for highest performance, this timing might also be interesting when your memory kit isn't of the most high-binned stack. As the table on the previous page already showed: 1600CL8 isn't slower than 2000CL8 by definition, as long as you're able to keep the B2B value as low as possible …"
    (~ "Stabilizing your memory overclock on Core i7 platform - Back-to-Back Cas Delay Investigated")


    Conclusive thoughts

    Maybe it's because of the many quotes and references to other articles on the very last page of this one, but it's more than obvious that there's so much more to tell about the Core i7 and X58 technology than what you can find in this article. In fact, I tried to stick to the subject (performance scaling) as much as possible, since elaborating would only make things more complicated. Even so, if you're interested in reading more about the overclocking aspects of the Core i7 technology, you can also read the forum posts on the Madshrimps forums, for example the findings using the MSI X58 Eclipse SLI motherboard, in which we talk about the QPI frequency and how it can be a limitation in slow-mode.

    I have to put an end somewhere and I think this is the ideal moment to wave you all a goodbye and shout out the famous words 'till the next one'. With the next new platform, Core i5, almost ready to be launched, I'm sure that we'll have a new one in a couple of months. Until then, have a good one!


