Late 2007 saw the release of AMD’s new native Quad-Core Opteron processors. Designed initially for servers and workstations and codenamed ‘Barcelona’, the early versions ran at clock frequencies of up to 2 GHz, with faster varieties to follow. In common with contemporary releases from rival Intel, Barcelona was available in special edition, standard and low-power versions.
AMD’s approach of placing four cores on the same piece of silicon represents a true move forward in mainstream processor manufacturing. Intel approached the problem of getting four cores together essentially by welding two dual-core processors together in a single package. While Intel’s approach seems intuitively simpler (and indeed the release of its quad-core processors considerably pre-dated AMD’s efforts), there are potential problems with the resource sharing that is required: shared caches can cause bottlenecks, while the bus connecting the two dual-core dies may present something of a nightmare when trying to optimise multi-threaded applications. Four cores on one piece of silicon provide a custom solution that is designed from the start to do the job to which it is applied.
Because of the architecture, the way in which instructions are handled and processed, and the physical constraints inherent in sequential processing, multi-core processors handle certain applications very well, while others see little or no improvement. Applications with multiple independent threads can take advantage of four processing centres by running four threads at once, and applications that require intensive floating-point arithmetic also fare well. Processor-intensive software such as video and image editing and encoding, ray tracing and, tellingly, the benchmarks used to compare processors will usually perform extremely well. Simpler applications may see little or no improvement over a single-core processor, but there should be no degradation of performance. With AMD boasting benchmark results 65-70% quicker than the highest-clocked dual-core Opteron (Santa Rosa, a 3.0 GHz Opteron 2222), it is evident that the quad cores can make a big difference in the right situation.
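Why some applications gain so much and others so little can be illustrated with Amdahl's law, a standard rule of thumb that the discussion above does not name but which fits it. The sketch below is idealised: it assumes a fixed fraction of the work can be spread perfectly across cores and ignores cache, memory and scheduling effects. The function name and the sample fractions are illustrative choices, not AMD figures.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time; the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: fully serial, half parallel, mostly parallel, fully parallel.
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(f"parallel fraction {fraction:.1f}: "
              f"{amdahl_speedup(fraction, 4):.2f}x on 4 cores")
```

Under these assumptions a purely sequential program gains nothing from four cores, while one that is 90% parallel gets roughly a threefold speedup, which is the pattern described above.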
AMD’s Barcelona was designed to significantly improve the Opteron’s SSE unit, and several of its key widths and bandwidths have doubled. The following table compares Barcelona with its predecessors:
Metric | Pre-Barcelona | Barcelona |
---|---|---|
SSE execution width | 64 bits wide | 128 bits wide |
Instruction fetch bandwidth | 16 bytes/cycle | 32 bytes/cycle |
Data cache bandwidth | 2 x 64-bit loads/cycle | 2 x 128-bit loads/cycle |
L2 cache/memory controller bandwidth | 64 bits/cycle | 128 bits/cycle |
Floating-point scheduler depth | 36 dedicated x 64-bit ops | 36 dedicated x 128-bit ops |
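
To make the first row of the table concrete, the sketch below works out how many passes a full 128-bit SSE operation needs at each execution width, and how many packed values fit in one pass. It is simple arithmetic on the widths in the table and assumes nothing about latency, scheduling or real instruction throughput. The function names are illustrative.

```python
SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def passes_per_sse_op(execution_width_bits: int) -> int:
    """Passes needed to push one full 128-bit SSE operation through the unit."""
    return -(-SSE_REGISTER_BITS // execution_width_bits)  # ceiling division


def elements_per_pass(execution_width_bits: int, element_bits: int) -> int:
    """Packed elements of the given size handled in a single pass."""
    return min(execution_width_bits, SSE_REGISTER_BITS) // element_bits


if __name__ == "__main__":
    for name, width in (("Pre-Barcelona", 64), ("Barcelona", 128)):
        print(f"{name}: {passes_per_sse_op(width)} pass(es) per 128-bit op, "
              f"{elements_per_pass(width, 32)} single-precision or "
              f"{elements_per_pass(width, 64)} double-precision values per pass")
```

On this simplified view, a 64-bit unit handles a 128-bit operation in two halves, while a 128-bit unit handles it in one pass.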
Barcelona processors use three levels of cache. Levels 1 and 2 (64KB and 512KB respectively) are dedicated to a particular core, as with previous Opteron and Athlon CPUs, while the Level 3 cache (2MB) is shared among all cores.
Key to the improvements in similar chips released by Intel was the reduction in the power required to run them. While later Pentium chips were essentially power-hungry convection heaters, Intel’s Penryn chips were comparatively chilled. As mobile devices become more compact and chips become more powerful, keeping energy usage and heat output within sensible levels becomes an interesting problem.
Barcelona is fabricated on AMD’s 65nm SOI process, allowing lower voltages and TDPs than were previously possible. At release, Barcelona’s TDP was around 95W. AMD’s technology allows separate power levels to be applied to the CPU cores and to the memory controllers. This behaviour is dynamic and application-independent, so if the processor detects heavy memory usage but comparatively low core utilisation (or vice versa), it can change voltage delivery accordingly. Also in use is an enhanced version of AMD’s PowerNow technology, which allows individual cores to operate at differing clock frequencies depending on their requirements.
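The benefit of lowering voltage and clocking cores independently can be sketched with the usual first-order rule for CMOS logic, in which dynamic power scales with voltage squared times frequency. This is a textbook approximation, not AMD’s published power model. It ignores leakage and the memory controller, and the scale factors below are hypothetical values chosen for illustration.

```python
def relative_dynamic_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power relative to full voltage and clock, using P ~ V^2 * f."""
    return voltage_scale ** 2 * frequency_scale


def relative_core_power(voltage_scale: float, core_frequency_scales: list[float]) -> float:
    """Average relative power across cores sharing one voltage but clocked separately."""
    per_core = [relative_dynamic_power(voltage_scale, f) for f in core_frequency_scales]
    return sum(per_core) / len(per_core)


if __name__ == "__main__":
    # Hypothetical case: one busy core at full clock, three lightly loaded at half clock.
    clocks = [1.0, 0.5, 0.5, 0.5]
    print(f"full voltage: {relative_core_power(1.0, clocks):.3f} of peak core power")
    print(f"90% voltage:  {relative_core_power(0.9, clocks):.3f} of peak core power")
```

Because voltage enters as a square, even a modest voltage reduction compounds with the savings from slowing the idle cores.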
AMD did the sensible thing and tried its best to ensure backwards compatibility with both existing hardware and software, claiming that the processor will run on existing AM2 socket motherboards with just a simple BIOS upgrade, though AM2+ socket boards are required for the full benefit of the processor.