In early 2000, Intel unveiled details of its first new IA-32 core since the Pentium Pro – introduced in 1995. Previously codenamed Willamette – after a river that runs through Oregon – it was announced a few months later that the new generation of microprocessors would be marketed under the brand name Pentium 4 and be aimed at the advanced desktop market rather than servers.
Representing the biggest change to Intel’s 32-bit architecture since the Pentium Pro in 1995, the Pentium 4’s increased performance is largely due to architectural changes that allow the device to operate at higher clock speeds and logic changes that allow more instructions to be processed per clock cycle. Foremost amongst these is the Pentium 4 processor’s internal pipeline – referred to as Hyper Pipeline – which comprises 20 pipeline stages versus the ten for the P6 microarchitecture.
A typical pipeline has a fixed amount of work that is required to decode and execute an instruction. This work is performed by individual logical operations called gates, each of which consists of multiple transistors. By increasing the number of stages in a pipeline, fewer gates are required per stage. Because each gate requires some amount of time (delay) to provide a result, decreasing the number of gates in each stage allows the clock rate to be increased. It also allows more instructions to be in flight, at various stages of decode and execution, in the pipeline. Although these benefits are offset somewhat by the overhead of additional gates required to manage the added stages, the overall effect of increasing the number of pipeline stages is a reduction in the number of gates per stage, which allows a higher core frequency and enhances scalability.
In absolute terms, the maximum frequency that can be achieved by a pipeline in an equivalent silicon production process can be estimated as:
1/(pipeline time in ns/number of stages) * 1,000 (to convert to megahertz) = maximum frequency
Accordingly, the maximum frequency achievable by a five-stage, 10-ns pipeline is: 1/(10/5) * 1,000 = 500MHz
In contrast, a 15-stage, 12-ns pipeline can achieve: 1/(12/15) * 1,000 = 1,250MHz or 1.25GHz
Additional frequency gains can be achieved by changing the silicon process and/or using smaller transistors to reduce the amount of delay caused by each gate.
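The frequency estimate above can be checked with a short script. This is only a sketch of the rule of thumb given in the text: it assumes the pipeline's total delay is divided evenly across its stages, and the function name `max_frequency_mhz` is made up for illustration.

```python
def max_frequency_mhz(pipeline_time_ns: float, stages: int) -> float:
    """Estimate the peak clock rate (MHz) of an evenly divided pipeline.

    Equivalent to 1 / (pipeline_time_ns / stages) * 1000, rearranged so the
    arithmetic stays exact for the round figures used in the text.
    """
    if pipeline_time_ns <= 0 or stages <= 0:
        raise ValueError("pipeline time and stage count must be positive")
    return 1000.0 * stages / pipeline_time_ns


if __name__ == "__main__":
    # The two worked examples from the text.
    print(f"{max_frequency_mhz(10, 5):.0f} MHz")   # five-stage, 10-ns pipeline
    print(f"{max_frequency_mhz(12, 15):.0f} MHz")  # 15-stage, 12-ns pipeline
```

Running it reproduces the 500MHz and 1,250MHz figures quoted above. Note that the estimate ignores the overhead of the extra gates needed to manage additional stages.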
Other new features introduced by the Pentium 4’s new micro-architecture – dubbed NetBurst – include:
- an innovative Level 1 cache implementation comprising – in addition to an 8KB data cache – an Execution Trace Cache, that stores up to 12K of decoded x86 instructions (micro-ops), thus removing the latency associated with the instruction decoder from the main execution loops
- a Rapid Execution Engine that pushes the processor’s ALUs to twice the core frequency resulting in higher execution throughput and reduced latency of execution – the chip actually uses three separate clocks: the core frequency, the ALU frequency and the bus frequency
- a very deep, out-of-order speculative execution engine – referred to as Advanced Dynamic Execution – that avoids the stalls that can occur while instructions are waiting for dependencies to resolve by providing a large window of instructions from which the execution units can choose
- a 256KB Level 2 Advanced Transfer Cache that provides a 256-bit (32-byte) interface that transfers data on each core clock, thereby delivering a much higher data throughput channel – 44.8 GBps (32 bytes x 1 data transfer per clock x 1.4 GHz) – for a 1.4GHz Pentium 4 processor
- Streaming SIMD Extensions 2 (SSE2) – the latest iteration of Intel’s Single Instruction Multiple Data technology, which integrates 76 new SIMD instructions and improvements to 68 integer instructions, allowing the chip to grab 128 bits at a time in both floating-point and integer operations and thereby accelerate CPU-intensive encoding and decoding operations such as streaming video, speech, 3D rendering and other multimedia procedures
- the industry’s first 400MHz system bus, providing a 3-fold increase in throughput compared with Intel’s current 133MHz bus.
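The throughput figures in the list above follow from simple arithmetic, sketched here using only numbers quoted in the text. The helper name `peak_bandwidth_gb_per_s` is hypothetical, and the results are theoretical peaks rather than measured performance.

```python
def peak_bandwidth_gb_per_s(bytes_per_transfer: int,
                            transfers_per_clock: int,
                            clock_ghz: float) -> float:
    """Theoretical peak throughput in GB/s: bytes x transfers/clock x GHz."""
    return bytes_per_transfer * transfers_per_clock * clock_ghz


CORE_GHZ = 1.4  # the 1.4GHz Pentium 4 used as the example in the text

# Level 2 Advanced Transfer Cache: 256-bit (32-byte) interface,
# one transfer on each core clock.
l2_bandwidth = peak_bandwidth_gb_per_s(32, 1, CORE_GHZ)

# Rapid Execution Engine: the ALUs run at twice the core frequency.
alu_ghz = 2 * CORE_GHZ

# System bus: 400MHz compared with the earlier 133MHz bus.
bus_speedup = 400 / 133

if __name__ == "__main__":
    print(f"L2 cache peak bandwidth: {l2_bandwidth:.1f} GB/s")
    print(f"ALU clock: {alu_ghz:.1f} GHz")
    print(f"Bus frequency ratio: {bus_speedup:.2f}x")
```

The cache figure matches the 44.8 GBps quoted above, and the bus ratio comes out at roughly three, consistent with the 3-fold increase claimed.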
Based on Intel’s ageing 0.18-micron process, the new chip comprised a massive 42 million transistors. Indeed, the chip’s original design would have resulted in a significantly larger chip still – and one that was ultimately deemed too large to build economically at 0.18 micron. Features that had to be dropped from the Willamette’s original design included a larger 16KB Level 1 cache, two fully functional FPUs and 1MB of external Level 3 cache. What this reveals is that the Pentium 4 really needed to be built on 0.13-micron technology – something that was finally to happen in early 2002.
The first Pentium 4 shipments – at speeds of 1.4GHz and 1.5GHz – occurred in November 2000. Early indications were that the new chip offered the best performance improvements on 3D applications – such as games – and on graphics intensive applications such as video encoding. On everyday office applications – such as word processing, spreadsheets, Web browsing and e-mail – the performance gain appeared much less pronounced.
One of the most controversial aspects of the Pentium 4 was its exclusive support – via its associated chipsets – for Direct Rambus DRAM (DRDRAM). This made Pentium 4 systems considerably more expensive than systems from rival AMD that allowed use of conventional SDRAM, for little apparent performance gain. Indeed, the combination of an AMD Athlon CPU and DDR SDRAM outperformed Pentium 4 systems equipped with DRDRAM at a significantly lower cost.
During the first half of 2001 rival core logic providers SiS and VIA decided to exploit this situation by releasing Pentium 4 chipsets that did support DDR SDRAM. Intel responded in the summer of 2001 with the release of its i845 chipset. However, even this climbdown appeared half-hearted, since the i845 supported only PC133 SDRAM and not the faster DDR SDRAM. It was not until the beginning of 2002 that the company finally went the whole hog, re-releasing the i845 chipset to extend support to DDR SDRAM as well as PC133 SDRAM.
During the course of 2001 a number of faster versions of the Pentium 4 CPU were released. The 1.9GHz and 2.0GHz versions released in the summer of 2001 were available in both the original 423-pin Pin Grid Array (PGA) socket interface and a new Socket 478 form factor. The principal difference between the two is that the newer format socket features a much more densely packed arrangement of pins known as a micro Pin Grid Array (µPGA) interface. It allows both the size of the CPU itself and the space occupied by the interface socket on the motherboard to be significantly reduced.
The introduction of the Socket 478 form factor at this time was designed to pave the way for the Willamette’s 0.13-micron successor, known as Northwood.