The Pentium CPU Revolution // Part-2
Part 2 – Inside the Pentium: The Architecture That Changed CPU Design
Opening the Pentium
When Intel introduced the Pentium P5 processor on March 22, 1993, it wasn't simply releasing a faster 486—it was introducing an entirely new microarchitecture. While the Pentium still executed the familiar x86 instruction set, almost everything inside the chip had been redesigned.
Instead of issuing at most one instruction per clock cycle like its predecessors, the Pentium could issue two integer instructions in the same cycle under the right conditions. This was a major milestone in CPU evolution and the beginning of superscalar computing for mainstream PCs.
The original Pentium featured:
| Feature | Pentium (P5) |
|---|---|
| Year | 1993 |
| Transistors | 3.1 Million |
| Process | 0.8 µm BiCMOS |
| Clock Speeds | 60–66 MHz (later up to 200 MHz) |
| External Data Bus | 64-bit |
| L1 Cache | 8 KB Instruction + 8 KB Data |
| Pipelines | 2 (U and V) |
| FPU | Fully redesigned |
| Branch Prediction | Dynamic |
The Pentium at a Glance
                 Pentium Processor
    +--------------------------------------+
    |        Branch Prediction Unit        |
    +------------------+-------------------+
                       |
               Instruction Fetch
                       |
              Instruction Decoder
                       |
           +-----------+-----------+
           |                       |
           ▼                       ▼
      U Pipeline              V Pipeline
           |                       |
           +-----------+-----------+
                       |
                 Register File
                       |
               Integer Execution
                       |
           +-----------+-----------+
           |                       |
           ▼                       ▼
     L1 Data Cache        Floating Point Unit
           |                       |
           +-----------+-----------+
                       |
               64-bit Memory Bus
                       |
                      RAM
Although this diagram is simplified, it captures the core innovation: parallel instruction execution.
Why Two Pipelines Matter
The 486 processor could generally issue one instruction per clock cycle.
| Cycle | Instruction |
|---|---|
| 1 | Instruction A |
| 2 | Instruction B |
| 3 | Instruction C |
The Pentium could often execute two independent instructions at once.
| Cycle | First Instruction | Second Instruction |
|---|---|---|
| 1 | Instruction A | Instruction B |
| 2 | Instruction C | Instruction D |
| 3 | Instruction E | Instruction F |
This effectively doubled instruction throughput in favorable situations.
U Pipeline and V Pipeline
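Intel named the two execution pipelines the U pipeline and the V pipeline.
The U pipeline was the primary execution engine and could execute any instruction.
The V pipeline handled a second instruction whenever possible, but it accepted only simple, commonly used instructions.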
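    Instruction Stream:  ADD  MOV  SUB  INC  CMP
                              |
                           Decoder
                      +---------------+
                      | Pairing Check |
                      +---------------+
                         /         \
                  U Pipeline    V Pipeline
The Pentium had no out-of-order scheduler. Pairing logic in the decode stage checked whether two adjacent instructions could safely execute in parallel. Issue was strictly in order: if the pair failed the check, the second instruction waited for the next cycle.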
Parallel Execution Example
Imagine this sequence:
MOV EAX, EBX
ADD ECX, EDX
Neither instruction depends on the other.
Therefore:
| Clock Cycle | U Pipeline | V Pipeline |
|---|---|---|
| 1 | MOV EAX, EBX | ADD ECX, EDX |
Both complete together.
Now consider:
MOV EAX, EBX
ADD EAX, 1
The second instruction requires the result of the first.
Therefore:
| Clock Cycle | Instruction |
|---|---|
| 1 | MOV |
| 2 | ADD |
The CPU cannot issue them simultaneously because of the dependency.
Instruction Dependencies
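Before pairing two adjacent instructions, the Pentium compares the registers they use. If the second instruction reads or writes a register that the first one writes, the pair is split:
    Instruction A (produces EAX)  →  Instruction B (needs EAX)  →  B must wait
But if the instructions use different registers, and both are simple enough to pair, they can issue together:
    Instruction A (uses EAX)  +  Instruction B (uses ECX)  →  can execute together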
Modern processors still perform dependency analysis, though with much more sophisticated hardware.
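The pairing decision described above can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: the `Instr` class, `can_pair`, and `issue_cycles` are hypothetical names invented for illustration, only register dependencies are checked, and the real Pentium applied additional rules (flags, instruction type, addressing) that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction, reduced to the registers it reads and writes."""
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def can_pair(first: Instr, second: Instr) -> bool:
    """True if `second` may issue in the same cycle as `first`.

    Only register dependencies are modelled: the second instruction must
    not read or write any register that the first instruction writes.
    """
    return not (first.writes & (second.reads | second.writes))


def issue_cycles(program):
    """Greedy in-order issue: try to pair each instruction with the next one."""
    schedule = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            schedule.append((program[i].text, program[i + 1].text))
            i += 2
        else:
            schedule.append((program[i].text, None))
            i += 1
    return schedule


independent = [
    Instr("MOV EAX, EBX", frozenset({"EBX"}), frozenset({"EAX"})),
    Instr("ADD ECX, EDX", frozenset({"ECX", "EDX"}), frozenset({"ECX"})),
]
dependent = [
    Instr("MOV EAX, EBX", frozenset({"EBX"}), frozenset({"EAX"})),
    Instr("ADD EAX, 1", frozenset({"EAX"}), frozenset({"EAX"})),
]

for name, program in (("independent", independent), ("dependent", dependent)):
    for cycle, (u, v) in enumerate(issue_cycles(program), start=1):
        print(f"{name}: cycle {cycle}: U={u}  V={v or '(empty)'}")
```

In this model the independent pair issues in one cycle, while the dependent pair takes two, matching the two examples from the previous section.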
The Register File
The Pentium retained the 32-bit register model introduced by the 80386.
General Registers
EAX
EBX
ECX
EDX
ESI
EDI
EBP
ESP
Each register stores 32 bits.
    +-------------------------------+
    | 31                          0 |
    +-------------------------------+
Register Purposes
EAX – Accumulator
Used by arithmetic instructions; MUL and DIV use it implicitly.
ADD EAX, EBX
EBX – Base Register
Frequently stores memory addresses.
MOV EAX,[EBX]
ECX – Counter
Often used as a loop counter; the LOOP instruction decrements ECX and jumps while it is nonzero.
LOOP Start
EDX – Data Register
Holds the upper half of a 64-bit multiplication result and the remainder after a division.
ESP – Stack Pointer
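Always points to the top of the stack, that is, the most recently pushed item. The x86 stack grows toward lower addresses, so a PUSH decrements ESP.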
      Stack
    +--------+
    |  Data  |
    +--------+
    |  Data  |
    +--------+
    |  Data  |
    +--------+
        ↑
       ESP
EBP – Base Pointer
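Used as a fixed reference point within a function's stack frame. With the conventional prologue (PUSH EBP, then MOV EBP, ESP), arguments sit above EBP and local variables below it.
    Arguments           [EBP+8], [EBP+12], ...
    Return address      [EBP+4]
    Saved EBP           [EBP]        ← EBP
    Local variables     [EBP-4], [EBP-8], ...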
This organization is still recognizable in modern x86-64 systems.
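To make the roles of ESP and EBP concrete, here is a toy Python model of a downward-growing stack. It is a sketch, not a description of any particular compiler: the `Stack` class, the starting address `0x1000`, and the return address are made up for illustration, and the frame layout assumes the conventional `PUSH EBP` / `MOV EBP, ESP` prologue.

```python
WORD = 4  # bytes per 32-bit stack slot


class Stack:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x1000):
        self.esp = top       # arbitrary, made-up starting address
        self.memory = {}     # address -> 32-bit value

    def push(self, value):
        self.esp -= WORD                             # PUSH decrements ESP first...
        self.memory[self.esp] = value & 0xFFFFFFFF   # ...then stores the value

    def pop(self):
        value = self.memory.pop(self.esp)            # POP reads the top slot...
        self.esp += WORD                             # ...then increments ESP
        return value


stack = Stack()
stack.push(42)          # caller pushes one argument
stack.push(0x401005)    # CALL pushes the return address (made-up value)

# Callee prologue: PUSH EBP / MOV EBP, ESP / SUB ESP, 4
stack.push(0)           # saved caller EBP (dummy value)
ebp = stack.esp         # EBP now marks the base of this frame
stack.esp -= WORD       # reserve room for one local variable
stack.memory[ebp - 4] = 7

print(hex(stack.esp))            # 0xff0
print(stack.memory[ebp + 8])     # 42, the argument at [EBP+8]
print(stack.memory[ebp - 4])     # 7, the local variable at [EBP-4]
```

Because EBP stays fixed while ESP moves with every push and pop, offsets from EBP give each argument and local a stable address for the lifetime of the function.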
Segment Registers
The Pentium retained the x86 segment registers for compatibility.
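| Register | Name |
|---|---|
| CS | Code Segment |
| DS | Data Segment |
| ES | Extra Segment |
| FS | Additional data segment |
| GS | Additional data segment |
| SS | Stack Segment |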
Although modern operating systems often use a flat memory model, these registers remain part of the architecture.
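The arithmetic behind segmentation is small enough to sketch in Python. The two helper functions below are hypothetical names for illustration. The first shows the classic real-mode calculation; the second shows the protected-mode idea of adding an offset to a segment base, which in a flat model is simply zero. Descriptor tables, limits, and privilege checks are deliberately left out.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Real mode: the segment value is shifted left 4 bits and added to the offset."""
    return (segment << 4) + offset


def protected_mode_linear(base: int, offset: int) -> int:
    """Protected mode: the base comes from the segment's descriptor (checks omitted)."""
    return (base + offset) & 0xFFFFFFFF


FLAT_BASE = 0  # in a flat model every segment starts at address 0

print(hex(real_mode_linear(0xB800, 0x0010)))                 # 0xb8010
print(hex(protected_mode_linear(FLAT_BASE, 0x00401000)))     # 0x401000
```

With a base of zero, the offset and the linear address are the same number, which is why flat-model programs can mostly ignore the segment registers.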
Instruction Pointer
EIP
The Instruction Pointer contains the address of the next instruction to execute.
    Memory
    1000
    1004
    1008
    100C
    1010  ← EIP
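The processor advances EIP automatically as each instruction executes, unless a jump, call, return, or interrupt loads a new value. x86 instructions vary in length, so the even 4-byte spacing shown here is only illustrative.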
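A short Python sketch shows how EIP steps through variable-length instructions. The program and the start address `0x1000` are made up for illustration, and `trace_eip` is a hypothetical helper; the byte lengths correspond to the common encodings noted in the comments.

```python
# (mnemonic, encoded length in bytes) for a tiny illustrative program
PROGRAM = [
    ("MOV EAX, EBX", 2),   # 89 D8
    ("ADD EAX, 1", 3),     # 83 C0 01
    ("CMP EAX, 10", 3),    # 83 F8 0A
    ("NOP", 1),            # 90
]


def trace_eip(program, start):
    """Return the address of each instruction and the final EIP value."""
    eip = start
    trace = []
    for text, length in program:
        trace.append((eip, text))
        eip += length          # EIP advances by the length of the instruction
    return trace, eip


trace, final_eip = trace_eip(PROGRAM, 0x1000)
for address, text in trace:
    print(f"{address:04X}  {text}")
print(f"next EIP = {final_eip:04X}")
```

The instructions land at 1000, 1002, 1005, and 1008, so the processor has to decode each instruction's length before it knows where the next one begins.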
EFLAGS Register
The Pentium's EFLAGS register records the outcome of operations and controls processor behavior.
    Bit    11  10   9   8   7   6   4   2   0
    Flag   OF  DF  IF  TF  SF  ZF  AF  PF  CF
Some important flags include:
| Flag | Meaning |
|---|---|
| CF | Carry Flag |
| ZF | Zero Flag |
| SF | Sign Flag |
| OF | Overflow Flag |
| IF | Interrupt Enable |
| DF | Direction Flag |
Example:
CMP EAX, EBX
JE Equal
If EAX == EBX, the comparison sets the Zero Flag (ZF), allowing JE (Jump if Equal) to branch to the Equal label.
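The way CMP sets the flags can be reproduced in Python. This is a sketch of the standard 32-bit subtraction rules; `cmp_flags`, `je_taken`, and `jl_taken` are hypothetical helper names, and only four of the flags are modelled.

```python
MASK32 = 0xFFFFFFFF


def sign_bit(value: int) -> int:
    """Bit 31 of a 32-bit value."""
    return (value >> 31) & 1


def cmp_flags(a: int, b: int) -> dict:
    """Flags produced by CMP a, b: a 32-bit subtraction whose result is discarded."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),
        "SF": sign_bit(result),
        "CF": int(a < b),  # borrow out of the unsigned subtraction
        "OF": int(sign_bit(a) != sign_bit(b) and sign_bit(result) != sign_bit(a)),
    }


def je_taken(flags: dict) -> bool:
    """JE branches when ZF is set."""
    return flags["ZF"] == 1


def jl_taken(flags: dict) -> bool:
    """JL (signed less-than) branches when SF differs from OF."""
    return flags["SF"] != flags["OF"]


print(cmp_flags(5, 5), je_taken(cmp_flags(5, 5)))
print(cmp_flags(3, 10), jl_taken(cmp_flags(3, 10)))
```

The conditional jump never looks at the registers themselves. It only tests the flag bits that the preceding CMP left behind.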
Pipeline Stages
Each integer instruction passes through five stages, shown here with Intel's abbreviations:
    Prefetch (PF) → Decode (D1) → Address Generation (D2) → Execute (EX) → Write Back (WB)
Multiple instructions occupy different stages simultaneously.
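| Clock Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Instruction A | Fetch | Decode | Execute | Write | | |
| Instruction B | | Fetch | Decode | Execute | Write | |
| Instruction C | | | Fetch | Decode | Execute | Write |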
This overlap is called instruction pipelining.
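The benefit of pipelining is easy to quantify with a small Python model. The sketch below assumes an ideal single pipeline with no stalls, and `pipeline_chart` is a hypothetical helper; under those assumptions, n instructions through k stages need k + n - 1 cycles instead of k × n.

```python
STAGES = ["Fetch", "Decode", "AddrGen", "Execute", "WriteBack"]


def pipeline_chart(n_instructions, stages=STAGES):
    """Build a chart: row i is instruction i, column c is clock cycle c + 1."""
    total_cycles = len(stages) + n_instructions - 1
    chart = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name      # instruction i enters stage s at cycle i + s
        chart.append(row)
    return chart


chart = pipeline_chart(3)
for i, row in enumerate(chart):
    print(f"I{i}: " + " ".join(f"{cell:<9}" for cell in row))
print("total cycles:", len(chart[0]))
```

Three instructions finish in 7 cycles rather than 15, and once the pipeline is full, one instruction completes every cycle.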
Branch Prediction
One of the Pentium's most important innovations was dynamic branch prediction.
Consider this code:
CMP EAX, 10
JL LoopTop
The CPU doesn't know immediately whether the branch will be taken.
If it waits, the pipeline becomes idle.
Instead, the Pentium predicts the outcome and continues fetching instructions.
    CMP → Predict Taken → Fetch Loop → Correct?
                                         YES → Continue
                                         NO  → Flush Pipeline
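A correct prediction keeps the pipeline full. A misprediction forces a flush and costs a few clock cycles, roughly three to four on the Pentium.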
Modern CPUs have highly advanced branch predictors, but the Pentium brought this capability to mainstream PCs.
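Dynamic prediction is usually explained with a two-bit saturating counter, and that textbook scheme is what the Python sketch below implements. It is not a model of the Pentium's actual branch target buffer: the class name, the starting state, and the loop pattern are all assumptions chosen for illustration.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of branch outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken 9 times and then falls through, executed twice.
outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(outcomes, TwoBitPredictor()))   # 4
```

Out of 20 branches only 4 are mispredicted: two while the counter warms up, plus one at each loop exit. The second bit of history is what stops a single loop exit from flipping the prediction for the next run of the loop.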
Split L1 Cache
Unlike the 486, whose on-chip cache was unified, the Pentium separated instructions and data into distinct caches.
                CPU
            +---------+
            | Decoder |
            +---------+
             /       \
     Instruction     Data
        Cache        Cache
        8 KB         8 KB
This design allows the processor to fetch an instruction while simultaneously reading or writing data, reducing contention and improving throughput.
64-bit External Data Bus
The 486 featured a 32-bit external memory bus.
CPU ======== Memory
32 bits
The Pentium doubled this width.
CPU ================= Memory
64 bits
Advantages:
- Eight bytes transferred per bus cycle instead of four
- Improved performance in memory-intensive applications
- Better support for graphics, multimedia, and scientific workloads
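A back-of-the-envelope calculation shows what the wider bus means in numbers. The Python sketch below computes a theoretical upper bound only. The 33 MHz and 66 MHz bus clocks are assumed, representative values, `peak_bandwidth_mb_s` is a hypothetical helper, and real transfers were slower because not every bus clock carries data.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, bus_clock_mhz: float) -> float:
    """Upper bound in MB/s, assuming one full-width transfer on every bus clock."""
    return bus_width_bits / 8 * bus_clock_mhz


print(peak_bandwidth_mb_s(32, 33))   # 32-bit bus at 33 MHz: 132.0
print(peak_bandwidth_mb_s(64, 66))   # 64-bit bus at 66 MHz: 528.0
```

At the same bus clock, doubling the width doubles the peak figure; the larger gap in this example comes from the faster bus clock assumed for the Pentium system.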
Cache Access
When executing:
MOV EAX,[1000h]
The processor checks:
    L1 Cache → Hit?
                 YES → Return Data
                 NO  → RAM
An L1 hit is typically served in a single clock cycle, while a miss must go out over the external bus and takes many cycles.
Keeping frequently used instructions and data in the L1 cache dramatically improves overall performance.
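The hit-or-miss decision can be simulated with a small set-associative cache model in Python. This is a sketch: the class is hypothetical, the default geometry (8 KB, 2-way, 32-byte lines) follows commonly cited figures for the Pentium data cache and should be treated as an assumption, and the replacement policy is plain LRU.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Minimal set-associative cache with LRU replacement (tags only, no data)."""

    def __init__(self, size_bytes=8 * 1024, line_bytes=32, ways=2):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)       # mark as most recently used
            self.hits += 1
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)    # evict the least recently used line
        entries[tag] = True
        self.misses += 1
        return False


cache = SetAssociativeCache()
for _ in range(2):                                  # walk a 256-byte array twice
    for address in range(0x1000, 0x1000 + 256, 4):
        cache.access(address)
print(cache.hits, cache.misses)                     # 120 8
```

Only the first touch of each 32-byte line misses. Every later access to the same line hits, which is why loops over small, contiguous data run almost entirely out of L1.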
Why This Architecture Was Revolutionary
The Pentium combined several innovations into one cohesive design:
- Superscalar execution enabled up to two instructions per cycle.
- Dual pipelines increased throughput without merely raising clock speeds.
- Dynamic branch prediction reduced wasted cycles.
- Split instruction and data caches improved memory efficiency.
- A 64-bit data bus doubled the amount of data moved per bus cycle.
- A redesigned floating-point unit (FPU) delivered major gains for graphics, CAD, engineering, and scientific software.
Together, these features allowed the Pentium to outperform the 486 by a substantial margin, even at similar clock speeds. Performance no longer depended solely on faster clocks; smarter hardware could do more work each cycle, especially when compilers ordered instructions so that they paired well.