Intel Core 2 on 45nm: Performance, Overclocking, Power Usage

CPU by piotke @ 2007-10-29

Intel is launching the successor to the popular Conroe CPU. Built on a 45nm manufacturing process, it boasts reduced power consumption and has 50% more L2 cache. The first product out the door is a quad core beast dubbed QX9650. We put this new creation through its paces, comparing performance and power consumption, and venture into overclocking land, where sub-zero cooling is the norm.

Introduction

A little over a year ago we saw Intel take back the performance crown when they launched their new CPU based on the Pentium M series. Dubbed “Core 2 Duo”, this CPU held 2 physical cores under one heatspreader, and its performance was stellar even at lower clock speeds. Built on a 65nm process, the Core 2 Duo could be manufactured at reduced cost and proved to have quite a bit of headroom in the speed department. The top of the line model was the Core 2 X6800, clocked at 2.93GHz with 4MB L2 cache and a 266MHz FSB.

Since then more affordable CPUs have been added to the Core 2 line-up, with lower end models receiving less L2 cache to reduce cost. Newer revisions released this year got an FSB bump to 333MHz, and we have one in for test here today too: the Core 2 E6850 is clocked at 3GHz (9x333) and has 4MB L2 cache. This CPU surpasses the performance of the original X6800 but costs only ~$280 at the time of writing. AMD has yet to reveal a CPU which can match the Core 2 in price/performance, and while Intel is still in the lead, they are not sitting by idly.

Back in October last year Intel released a press statement regarding the switch to a 45nm manufacturing process. The 65nm CPU had the code name “Conroe”; the 45nm CPUs got a new one: Penryn. We are now at the end of October 2007 and Intel is going public with its 45nm processors. You’ll see a large collection of reviews and articles on the web this week, covering not only the Penryn, but also its larger brother, the “Yorkfield”. The latter is a Quad Core CPU, and where the Penryn has 1x6MB L2 cache, the Quad Core has 2x6MB L2.

Madshrimps (c)


The CPU Intel provided us was an Engineering Sample of the QX9650, running at 3GHz on a 333MHz FSB (a multiplier of 9x), with an estimated retail price of $999. Still on socket 775, this new CPU should work on most recent motherboards.
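
The relationship between FSB speed, multiplier and core clock that is used throughout this review can be sketched in a few lines of Python. This is only an illustration: the function name is ours, the nominal "333" and "266" FSB figures are treated as 1000/3 and 800/3 MHz, and the X6800's 11x multiplier is not stated above but follows from its 2.93GHz clock on a 266MHz bus.

```python
def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is the FSB base clock times the CPU multiplier."""
    return fsb_mhz * multiplier


# Nominal bus speeds; "333" and "266" are rounded marketing figures (assumption).
FSB_333 = 1000 / 3
FSB_266 = 800 / 3

cpus = {
    "QX9650 (9 x 333)": core_clock_mhz(FSB_333, 9),
    "E6850 (9 x 333)": core_clock_mhz(FSB_333, 9),
    "X6800 (11 x 266)": core_clock_mhz(FSB_266, 11),  # 11x is inferred, see above
}

for name, mhz in cpus.items():
    print(f"{name}: {mhz:.0f} MHz")
```

The same formula explains why overclocking later in the review is done by raising the FSB: with the multiplier fixed, the core clock rises in proportion to the bus speed.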

Before we get started, a sincere thank you goes out to Forcom.be for providing us with most of the hardware used in this review.

What we have in store for you today:

  • New Intel Core 2 Quad QX9650
  • Intel Core 2 Duo E6850
  • Motherboard compatibility and support
  • Performance comparison single/multi core with different applications and games
  • Power Usage and Temperatures
  • Overclocking and Performance Scaling

    Let’s take a look at our test setup ->

    Motherboard Support, Test Setup & Benchmarks

    Motherboard Support

    When the Core 2 Duo was launched, Intel tweaked their existing 975 chipset to support the new processor. It was but a quick patch, and while 975 based boards did okay with the new CPU at default settings, it wasn’t long before enthusiasts wanted more than stock speeds. Here they quickly hit a motherboard limit, as the 975 based boards were incapable of reaching the high FSB speeds needed for overclocking. The answer to this problem came in the form of the more affordable P965 chipset; this one was built from the ground up with Core 2 support in mind, and it delivered impressive results. Then the first Core 2 Quad processor came along, and enthusiasts were again met with lower than expected FSB overclocks on different motherboards. Earlier this year Intel released the P35, an updated mid-range chipset with above average Quad Core support; even overclocking was possible to an extent.

    QX9650 on the Asus P5K before we updated the BIOS


    Back to the QX9650 Quad Core sample we have in the lab today. We first tested the CPU on an Asus P5K motherboard, based on the P35 chipset. The BIOS recognized the processor correctly, and after a fresh install of Windows XP we were set to start our battery of tests. It didn’t take long for us to find something odd happening: with the CPU running at default speed, results were fluctuating between runs, by up to 40% in some cases. Only when we disabled Multi Core and ran with a single core did the QX9650 act normally on the P5K board, which was less than ideal of course. Since we already had the latest BIOS for the Asus board, there was not much else to do than try another board.

    Another Asus board to the rescue: the older P5B Deluxe. Based on the P965 chipset, this board is known for good Quad Core support, but the latest BIOS did not allow for multiplier changes, so we continued our search for a motherboard which could profit from all the power this QX9650 sample has to offer.

    Our local computer shop Forcom.be helped us out by lending us a Gigabyte X38 DQ6. This motherboard is brand new and based on the recently released X38 chipset. The Gigabyte board features an 8-phase digital power regulator (PWM); with it, the new Quad Core Yorkfield delivered repeatable benchmark results and our overclocking reached new heights.


    Test setup and Benchmarks


    Test Setup

  • CPU: Intel Core 2 Quad QX9650 "Yorkfield" and Intel Core 2 Duo E6850 "Conroe"
  • Mainboard: Gigabyte X38 DQ6 (by Forcom.be)
  • Video card: Jetway HD 2900 XT (by Dollarshops.eu)
  • Memory: 2 x 1024MB DDR2 PC7200 EPP OCZ
  • Other: OCZ 600 watt PSU

    Benchmarks


  • Hexus PiFast: PiFast is an easy-to-use package written by Xavier Gourdon to compute pi with a very large number of digits. PiFast is available on several platforms, download it from here. PiFast can also compute E and a large family of user defined constants.

  • SiSoftware Sandra 2008: Arithmetic and Multimedia CPU benchmarks.

  • SuperPi Mod v1.5 XS: testing PI calculations which stress co-processor and memory sub-systems. We run 1M and 8M calculations.

  • Wprime: Wprime is a simple, easy to use multithreaded benchmark application that can quickly test your processor performance. Unlike most other simple benchmark applications, Wprime is written to take full advantage of processors with multiple cores, like the new Intel Core 2 Duo or AMD Athlon 64 X2.

  • 3DMark2001 SE: Discontinued freeware version; however, this benchmark is still valuable as a tool for testing 3D and memory performance.

  • 3DMark06: Freeware version from Futuremark; tests CPU, memory and graphics.

    3DMark06 CPU Benchmark ~ Multi Core Support


  • MAXON CineBench 9.5: this benchmark stresses the CPU and graphics system primarily using OpenGL.

    Cinebench Rendering Benchmark ~ Multi Core Support


  • TechArp X264 bench: Simply put, this test measures how fast your machine can encode a short, DVD quality MPEG-2 video clip into a high-quality x264 video clip.

    x264 Encoding Benchmark ~ Multi Core Support


    What's x264, you ask? x264 is a free software library for encoding H.264/MPEG-4 AVC video stream. More info about H.264 can be found here. It's ideal for a benchmark because the application (x264.exe) reports fairly accurate compression results (in frames per second) for each pass of the video encoding process and it uses multi-core processors very efficiently.


  • F.E.A.R. Built-In Benchmark.
  • Crysis Single Player Demo manual run-through, average FPS logged with FRAPS.

    Our first batch of tests has both processors at stock frequency (9x333) and using SPD memory timings (400MHz 4-4-4-10).

    Benchmarks – Multi Core

    The benchmarks were run on the QX9650 and E6850 with multi core enabled. In benchmarks which benefit from the extra cores of the QX9650 you will see a large difference, but not all benchmarks support multiple cores, and in those the advantage of the QX is lost.

    First up we compare the raw CPU performance with PiFast, SuperPi and Wprime:

    PiFast and SuperPi take no advantage of the extra cores on the QX9650, but the extra L2 cache does pay off: SuperPi 1M improves by ~12%, 32M sees less effect at only ~3%, and PiFast even less at only ~1%. Wprime does support multi core: the quick Wprime 32M calculation is ~53% faster on the Quad Core QX9650, and the longer Wprime 1024M improves by the same amount.
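
For time-based benchmarks such as SuperPi and Wprime, a "percent faster" figure can be computed in two common ways, and the two give very different numbers. The sketch below shows both; the function names and the 20 second and 10 second run times are hypothetical and are not taken from our charts.

```python
def time_reduction_pct(t_old: float, t_new: float) -> float:
    """Percentage of run time saved (lower-is-better benchmarks)."""
    return (t_old - t_new) / t_old * 100


def speedup_pct(t_old: float, t_new: float) -> float:
    """Percentage gain in throughput, i.e. work done per second."""
    return (t_old / t_new - 1) * 100


# Hypothetical run times, for illustration only.
dual_core_seconds = 20.0
quad_core_seconds = 10.0

print(f"time saved:      {time_reduction_pct(dual_core_seconds, quad_core_seconds):.0f}%")
print(f"throughput gain: {speedup_pct(dual_core_seconds, quad_core_seconds):.0f}%")
```

A run that takes half the time is 50% shorter but has 100% more throughput, so it pays to check which convention a quoted percentage uses before comparing it with other results.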

    Moving on to the synthetic benchmarks of SiSoft Sandra 2008:

    The multimedia results more than double; the Multimedia Int score improves by 135%! The Arithmetic tests are even more impressive, increasing by up to 147%.

    Cinebench includes single core as well as multi core results. You can see that the QX9650 already has a small lead of ~10% over the E6850 when both are using only 1 core. The benchmark does not scale perfectly with the number of cores, otherwise the E6850 should be at 1000 and the QX9650 at 2000.
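
How far a benchmark falls short of perfect scaling can be expressed as a scaling efficiency: the multi core score divided by the single core score times the number of cores. The sketch below uses made-up scores purely to show the arithmetic; neither the function name nor the numbers come from our Cinebench chart.

```python
def scaling_efficiency(single_core_score: float,
                       multi_core_score: float,
                       cores: int) -> float:
    """1.0 means perfect scaling; lower values mean lost performance."""
    return multi_core_score / (single_core_score * cores)


# Hypothetical scores, for illustration only.
single = 500.0
multi_on_four_cores = 1700.0

eff = scaling_efficiency(single, multi_on_four_cores, cores=4)
print(f"scaling efficiency: {eff:.0%}")
```

With these example numbers the four cores together deliver 85% of what four perfectly independent cores would.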

    The x264 encoding test next:

    The more cores the merrier: close to a 100% increase in the 1st pass, and up to 123% faster in the 2nd pass.

    Let’s take a look at the game benchmarks ->

    Game Benchmarks

    Futuremark releases synthetic gaming benchmarks every few years. In our chart we’ve included all their 3DMark benchmarks since 2001SE; until 3DMark06 none of those benchmarks supported multiple cores, so results should be close for most:

    As expected, once you no longer run CPU specific benchmarks, or do tasks which depend on more factors than just raw CPU power, the difference diminishes. Only 3DMark06 shows a noticeable difference between the two processors: the 3DMark06 CPU test shows close to 100% performance scaling when the number of cores is doubled. The overall score goes up by ~12%.

    Our test setup is equipped with an ATI HD 2900 XT, which can be considered a “gamer” video card with enough pixel pushing power to run the latest games. To find out if the extra 2MB L2 cache and 45nm manufacturing process yield a noticeable improvement in games, we chose two FPS games. First up is FEAR, an older game which doesn’t support multi core:

    At 1280x960 High Quality the video card is not the bottleneck, as you can see from the 170+ average FPS results. Still, the CPU hardly makes a difference here; the extra L2 cache is responsible for a ~1% difference...

    On to a brand new game, Crysis from Crytek is a long awaited spiritual successor of Far Cry, this new game is known to bring the latest and greatest hardware to its knees. We picked up this intriguing tidbit about the game from Shacknews:

    Shack: What is the main limiter for Crysis in terms of GPU, CPU, or RAM? If users are near the low end of the requirements, which should they upgrade first?
    Cevat Yerli: We would say first CPU, then GPU, then memory. But it must be in balance. If you are balanced, we are more CPU bound then GPU, but at the same time at higher CPU configurations we scale very well for GPUs.

    Shack: Is there dedicated support for 64-bit and dual- and quad-core processors, and if so how does the game distribute its tasks? Do you suggest a higher-clocked dual-core over a quad-core, or is quad-core performance enough to give it the edge?
    Cevat Yerli: We support both 64-bit and multi-cores. Multi-core will be beneficial in the experience, particularly in faster but also smoother framerates. 64-bit and higher memory will yield quicker loading times. We recommend quad core over higher clock.

    Crytek is claiming that you should first upgrade your CPU before you buy a new VGA card if you want better performance in Crysis. Also, according to Cevat Yerli, instead of increasing the speed of your Core 2 Duo CPU, you are better off with a lower clocked Quad Core processor. Let’s put these claims to the test, shall we?

    At 1280x1024 (a rather modest resolution) we are happy to get close to 20 FPS with the HD 2900 XT. Moving from the Core 2 Duo to a Core 2 Quad CPU which has more L2 is an underwhelming experience: we get a boost of ~2%. Worth the extra cost?

    For those people playing the Crysis Demo right now on their Windows XP machine we found a nice little tweak which allows you to enable the "Very High" quality settings under XP/DX9, making the game look very close to what you get in Vista/DX10.

    Let’s forget about multi cores for a moment and compare single core performance ->

    Benchmarks - Single Core

    The following benchmarks will show you the advantage of the newer Penryn 45nm core with 6MB L2 over the older Conroe. While these results will not be 100% identical to those of the final Penryn CPUs, the single core results will be very close.

    We disabled the extra cores in Windows and ran through the benchmarks with only 1 core enabled. Since both CPUs are running at the same frequency, any difference found can be attributed to the 50% extra L2 cache and other silicon improvements:

    SuperPi really reacts well to the extra L2 cache; the QX9650 is ~12% better. Wprime shows less difference, but still 6%. The attentive reader might have noticed that the SuperPi 1M result with 1 core enabled is slightly higher compared to having all cores enabled; the difference is negligible, but it’s there.

    Next up is the x264 encoding benchmark:

    Higher is better, and the QX9650 manages to outdo the E6850 in this benchmark as well, by 5 to 11%.

    Last up are the Futuremark 3DMarks:

    Strangely enough, in the 3DMark2001SE benchmark, which has no multi core support, the score drops by quite a lot for the E6850; the QX9650 also scores lower, but the drop is less spectacular. The results for 3DMark06 are lower, as expected; the difference is minimal and in favor of the QX9650.

    Overclocking next ->

    Overclocking

    Before the official launch of the Yorkfield, Intel had some samples out “in the wild”. These showed promising overclocking results, some reaching up to 4.6GHz with air cooling (+53%)! So our hopes were high when we received this engineering sample from Intel.

    We mounted the standard Intel heatsink, left the vcore setting at default in the BIOS and raised the FSB until we noticed instability. Intel reports the default vcore for the QX9650 as 1.25v; the Gigabyte X38 board was more generous, providing 1.33v at its default setting. Even with the extra juice the maximum overclock was 3.3GHz. Time to raise the vcore: at 1.65v the CPU ran stable at 3.85GHz. Do note that we are still using the stock Intel cooler!
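
The overclock percentages quoted in this section are simply the gain over the 3GHz stock clock. A minimal sketch of that calculation, using the clock speeds mentioned above (the function and label names are ours):

```python
STOCK_MHZ = 3000  # QX9650 default clock (9 x 333)


def overclock_pct(stock_mhz: float, oc_mhz: float) -> float:
    """Clock speed gain over stock, in percent."""
    return (oc_mhz - stock_mhz) / stock_mhz * 100


results_mhz = {
    "pre-launch samples on air": 4600,
    "our sample, default vcore": 3300,
    "our sample, 1.65v, stock cooler": 3850,
}

for label, mhz in results_mhz.items():
    print(f"{label}: {mhz} MHz (+{overclock_pct(STOCK_MHZ, mhz):.0f}%)")
```

This reproduces the +53% figure for the 4.6GHz pre-launch samples and puts our own air cooled results at roughly +10% and +28%.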

    Overclocking on air


    Time to take it a step further:

    Gigabyte offers an impressive vcore range on their X38 DQ6 motherboard, but to be able to cope with the extra heat we need to go more extreme; air cooling won’t do, and even water cooling will run into trouble. So we installed an Asetek VapoChill LS. This compressor based cooling system works pretty much like a refrigerator, but instead of cooling a large area it focuses on a small surface the size of the CPU heatspreader. At idle the VapoChill LS can reach operating temperatures as low as -60°C.

    A bit of info on the voltage control in the BIOS of the Gigabyte X38:
  • CPU voltage: adjustable from 0.5 Volt up to 2.35 Volt, and even then there are some options to raise it a bit extra.
  • DDR2 voltage: can be raised in steps from +0.05 V up to +1.55 V. This means that you can go as high as 1.8 V (default) + 1.55 V = 3.35 Vddr2, which is deadly for most memory modules. So we didn't push higher than 2.5 Volt in total, which is already high.
  • Chipset voltage: can also be raised, by +0.2 up to +0.35 Volt.
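
The DDR2 setting in this BIOS is an offset on top of the 1.8 V default, so the voltage the memory actually sees has to be added up by hand. The sketch below does that arithmetic; the constant and function names are ours, and the 0.05 V step size is our reading of the BIOS option rather than a documented value.

```python
DDR2_DEFAULT_V = 1.8   # JEDEC default for DDR2, as quoted above
MAX_OFFSET_V = 1.55    # highest offset the BIOS offers
STEP_V = 0.05          # assumed step size of the BIOS option


def total_vddr(offset_v: float) -> float:
    """Voltage at the memory for a given BIOS offset."""
    return round(DDR2_DEFAULT_V + offset_v, 2)


def offset_for(total_v: float) -> float:
    """BIOS offset needed to reach a given total memory voltage."""
    return round(total_v - DDR2_DEFAULT_V, 2)


max_total = total_vddr(MAX_OFFSET_V)
our_offset = offset_for(2.5)
steps_used = round(our_offset / STEP_V)

print(f"maximum possible Vddr2: {max_total} V")
print(f"offset for our 2.5 V limit: +{our_offset} V ({steps_used} steps)")
```

Under these assumptions our self-imposed 2.5 V limit corresponds to an offset of +0.7 V, less than half of what the board allows.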

    We gradually raised the vcore in small steps until we reached 2v, which is insanely high for a 45nm CPU. At this voltage we could run stable at an impressive 4.9GHz with all 4 cores enabled.

    The maximum OC resulted in a SuperPi 1M calculation in less than 9.4 seconds, an impressive score without a doubt. The memory was running at only 452MHz, so there was still room for fine-tuning.


    Overclocking on Single Stage cooling


    Since an overclock is limited by its slowest factor, with 4 cores the “worst” core will determine the maximum stable speed. We therefore disabled 3 cores and continued our overclocking tests with only 1 core. The results were the following:

    At 5191MHz SuperPi 1M finishes after only 8.9 seconds, which puts this score in the top 10 bracket of the world’s fastest SuperPi 1M scores.

    Let’s take a look at how the performance scales and how much power this overclocked system is using ->

    Power usage and performance scaling

    Performance scaling

    We ran through the SuperPi 1M and Wprime 32 benchmark at different CPU speeds to get an idea what to expect if CPU speeds were to scale beyond speeds we were able to reach today.

    Do note that these performance scaling charts give but a rough idea of how things will “play out”.

    At 5750MHz, SuperPi 1M can be expected to score a little under 8 seconds. It will take more than 6GHz to go below 7 seconds though, and that remains to be seen, as the prediction is a bit optimistic.
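
A back-of-the-envelope way to sanity check such a prediction is to assume that run time is inversely proportional to clock speed and to extrapolate from one measured point. This is not necessarily how the chart was produced, and it is optimistic because memory speed does not scale along with the CPU. The sketch below starts from our 5191MHz, 8.9 second single core result; the function names are ours.

```python
REF_MHZ = 5191      # measured single core overclock
REF_SECONDS = 8.9   # measured SuperPi 1M time at that clock


def predicted_seconds(target_mhz: float) -> float:
    """Ideal run time at target_mhz, assuming time is proportional to 1/frequency."""
    return REF_SECONDS * REF_MHZ / target_mhz


def required_mhz(target_seconds: float) -> float:
    """Ideal clock needed for a target run time under the same assumption."""
    return REF_SECONDS * REF_MHZ / target_seconds


print(f"5750 MHz -> {predicted_seconds(5750):.2f} s")
print(f"4900 MHz -> {predicted_seconds(4900):.2f} s (measured: just under 9.4 s)")
print(f"7.0 s needs about {required_mhz(7.0):.0f} MHz")
```

Under this idealised model 5750MHz lands at roughly 8 seconds, and breaking the 7 second barrier would take well over 6GHz, which is in line with the rough picture painted above.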

    The Wprime 32 benchmark shows similar gain in performance as the frequency increases, although the difference is slightly lower compared to SuperPi, as the multi core performance doesn’t scale perfectly.

    Next, a look at voltage vs frequency: we increased the vcore steadily in 0.05v steps and tested for the maximum overclock at each step:

    With each 0.05v step we see a speed increase of 100 to 200MHz, which is not too bad at all. Since the default vcore is quite conservative, a faster edition of the QX9650 would only require a slightly higher default vcore.
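
The average gain per voltage step can be cross-checked from two of the sub-zero results quoted elsewhere in this article: 4050MHz at 1.7v and 4900MHz at 2v. The helper below is a simple sketch with names of our own choosing; it averages over the whole range and says nothing about how even the individual steps were.

```python
def mhz_per_step(v_low: float, mhz_low: float,
                 v_high: float, mhz_high: float,
                 step_v: float = 0.05) -> float:
    """Average clock gain per voltage step between two measured points."""
    steps = (v_high - v_low) / step_v
    return (mhz_high - mhz_low) / steps


avg_gain = mhz_per_step(1.70, 4050, 2.00, 4900)
print(f"average gain: {avg_gain:.0f} MHz per 0.05v step")
```

The result, a little over 140MHz per step, sits inside the 100 to 200MHz range seen in the chart.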

    Power usage

    While we’ve been focusing on the performance benefits of 45nm and the extra L2 cache, there are those who like to keep an eye on the power consumption of their machine. While a single desktop will hardly make a difference in the larger picture, a million dollar server park with hundreds of machines each using a few percent less power will show a difference at the end of the month.

    So here’s what you can expect for full system usage with an ATI HD 2900 XT video card installed, please note that both CPU and VGA are loaded in this test:

    At idle the difference between the Quad Core QX9650 and Dual Core E6850 is practically nonexistent. Even under load the difference is minimal. As expected, the QX9650 with all cores enabled shows the highest numbers, but let’s be honest: 291 Watt at full system load for a Quad Core powered system with a power hungry ATI VGA card is hardly “a lot”. The E6850 uses only ~20 Watt less under full load.

    Comparing the single core vs multi core results shows us that the extra cores don’t take up that much more power; between the QX9650 with a single core (SC) and with all cores (MC) the difference is only ~21W. Do note that the cores were disabled in Windows only, not through hardware or the BIOS.

    So the new 45nm Yorkfield proves to be quite power friendly at intended voltage and frequency, let’s see what happens when we start to overclock, adding extra voltage along the way. We only stressed the CPU this time around; at default setting this gave a maximum system usage of 210 Watt.

    It’s hard not to notice the discrepancy in this chart. At 3850MHz using 1.65v the system is using close to 300 Watt; this was the highest stable air cooled speed we could reach with all 4 cores enabled. The next result is with the sub-zero cooling from Asetek. At 4050MHz and 1.7v vcore the CPU is running much cooler, and somehow power usage goes down, dropping 35 Watt to 265. From that point forward it increases again, topping out at 386W at 4900MHz with 2v vcore. Does anybody know what’s going on here regarding the 3.85GHz vs 4.05GHz result?

    Update: We received this answer from Intel, explaining the higher power usage at higher temperatures:

    Matty @ Intel:

    Yes, the power consumption is reduced when the temperature of the processor is lowered.

    There are many things that happen in a CPU when the temperature is changed and to elaborate further on the processor specific causes we have to look at the origin of the power consumption. We can divide the total consumed power into two main parts, static power (Ps) and dynamic power (Pd).

    The static power consumption is what we usually call the leakage. In an ideal transistor, it should completely shut off the channel between the source-drain, gate-source and gate-drain. Transistors are far from ideal, and the current leaks between these parts and the substrate of the processor, and this is heavily dependent on the temperature.
    For example, going from room temperature to 85C (~60C difference) increases the leakage power by a factor of more than 50. Thus, reducing the temperature with the same amount will make a huge impact on Ps.

    Dynamic power consumption is emitted during the short amount of time that the transistor switches. Lower temperature reduces the resistance in the processor which results in shorter delay/faster switching of the transistors. Shorter delays and less noisy signals also reduce Pd.

    I hope this explanation give you some clarity to the relation between power consumption and temperature. This can even be seen with air cooling: The power consumption is lower just after a load is applied compared to after a while when the temperature has levelled out, even though the load is the same.
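
To put the dynamic part in perspective, the textbook first-order approximation for CMOS logic is that dynamic power grows with the square of the voltage times the frequency. That formula is our addition and not part of Intel's answer. The sketch below applies it to the two sub-zero results quoted above (4050MHz at 1.7v and 4900MHz at 2v) and compares the outcome with the measured whole-system figures of 265 and 386 Watt; the function name is ours.

```python
def dynamic_power_ratio(v_old: float, f_old: float,
                        v_new: float, f_new: float) -> float:
    """Relative change in dynamic power, assuming P_d is proportional to V^2 * f."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# CPU settings of the two sub-zero runs.
model_ratio = dynamic_power_ratio(1.70, 4050, 2.00, 4900)

# Measured wall power of the whole system at the same two settings.
measured_system_ratio = 386 / 265

print(f"modelled CPU dynamic power: x{model_ratio:.2f}")
print(f"measured system power:      x{measured_system_ratio:.2f}")
```

The model predicts that CPU dynamic power rises by about two thirds, while the whole system draws a little under 50% more. The two cannot be compared directly, since the wall measurement also includes components whose draw does not scale with the CPU settings, as well as leakage and power supply losses.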

    Conclusive Thoughts

    When we first requested a sample from Intel Benelux we were told not to expect an improvement like the one we saw last year when Core 2 Duo was launched. After all, this 45nm CPU is a tweaked version of the existing Conroe.

    By moving to a 45nm manufacturing process Intel can produce more CPUs per wafer, thereby lowering their production costs. Other benefits of the die shrink are reduced power consumption and the possibility to run at higher frequencies.
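
As a rough illustration of the "more CPUs per wafer" argument: if every feature of a chip shrank in proportion to the process node, die area would scale with the square of the node size. Real designs never shrink this ideally, and the 45nm parts also carry more cache, so the sketch below is an upper bound under that stated assumption and not a statement about actual die sizes. The function name is ours.

```python
def ideal_area_scale(old_nm: float, new_nm: float) -> float:
    """Relative die area after an ideal linear shrink from old_nm to new_nm."""
    return (new_nm / old_nm) ** 2


area_scale = ideal_area_scale(65, 45)
dies_per_wafer_gain = 1 / area_scale

print(f"ideal die area: {area_scale:.0%} of the 65nm original")
print(f"ideal dies per wafer: x{dies_per_wafer_gain:.2f}")
```

In this idealised case the same design would take a little under half the area at 45nm, so roughly twice as many dies would fit on a wafer of the same size.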

    Intel launches their top of the line 45nm CPU, dubbed QX9650, with 4 cores and a 2x6MB L2 cache, which is 50% more cache compared to Conroe. Our performance results show that clock for clock the Yorkfield is slightly faster (5~12%) in CPU intensive applications and benchmarks.

    Gamers will find that sticking with a Dual Core CPU will be enough for now; the extra 2 cores on the QX9650 could not show a noticeable advantage in Crysis, one of the most anticipated and multi-core ready games of this year.

    A wafer of 45nm CPUs (image courtesy of Intel)


    If you look past the $999 price tag and the (for now) quirky motherboard support for this latest CPU from Intel, you’ll find that there is nothing else currently on the market which matches its raw multi core power.

    Those into overclocking will certainly enjoy the extra headroom provided by the lower default vcore. While our overclocking attempts with air cooling were below average, once we switched to single stage phase change cooling this Quad Core scaled beyond what its 65nm siblings have to offer, reaching close to 5GHz. Very impressive!

    At the end of the day though you have to wonder if this $999 processor, which is only marginally faster than a $270 counterpart in most everyday applications, is worth the extra cost. Of course not; as with the launch of every new processor, the top of the line model comes at a large premium. Money conscious people will wait until this 45nm goodness trickles down to the mid range and low end products, where it will surely shine in the performance/price department. No news of those gems yet, but keep your eyes open in the next few months.

    Intel is not resting on their laurels; they are aggressively pushing up the performance of their products, introducing new technologies at breakneck speed. Can AMD keep up? The future will tell.

    We thank Intel Benelux for their cooperation, until next time!