Opening the Pentium

When Intel introduced the Pentium P5 processor on March 22, 1993, it wasn't simply releasing a faster 486—it was introducing an entirely new microarchitecture. While the Pentium still executed the familiar x86 instruction set, almost everything inside the chip had been redesigned.

Instead of processing one instruction at a time like its predecessors, the Pentium could execute two integer instructions simultaneously under the right conditions. This was a major milestone in CPU evolution and the beginning of superscalar computing for mainstream PCs.

The original Pentium featured:

Feature	Pentium (P5)
Year	1993
Transistors	3.1 Million
Process	0.8 µm BiCMOS
Clock Speeds	60–66 MHz (later up to 200 MHz)
External Bus	64-bit
L1 Cache	8 KB Instruction + 8 KB Data
Pipelines	2 (U and V)
FPU	Fully redesigned
Branch Prediction	Dynamic

The Pentium at a Glance


                 Pentium Processor

        +--------------------------------------+
        |      Branch Prediction Unit          |
        +----------------+---------------------+
                         |
                  Instruction Fetch
                         |
               Instruction Decoder
                         |
         +---------------+---------------+
         |                               |
         ▼                               ▼
     U Pipeline                     V Pipeline
         |                               |
         +---------------+---------------+
                         |
                Register File
                         |
               Integer Execution
                         |
        +----------------+----------------+
        |                                 |
        ▼                                 ▼
      L1 Data Cache               Floating Point Unit
        |                                 |
        +----------------+----------------+
                         |
                   64-bit Memory Bus
                         |
                        RAM

Although this diagram is simplified, it captures the core innovation: parallel instruction execution.

Why Two Pipelines Matter

The 486 processor could generally issue one instruction per clock cycle.


Cycle 1
Instruction A

Cycle 2
Instruction B

Cycle 3
Instruction C

The Pentium could often execute two independent instructions at once.


Cycle 1
Instruction A
Instruction B

Cycle 2
Instruction C
Instruction D

Cycle 3
Instruction E
Instruction F

This effectively doubled instruction throughput in favorable situations.

U Pipeline and V Pipeline

Intel named the execution pipelines:


U Pipeline

and


V Pipeline

The U pipeline was the primary execution engine and could execute any instruction.

The V pipeline handled a second instruction whenever possible, but only simple, commonly used ones.


           Instruction Stream

ADD
MOV
SUB
INC
CMP

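          Decoder

       +---------------+
       | Pairing Check |
       +---------------+

        /         \

   U Pipeline   V Pipeline

The pairing logic in the decode stage determined whether two adjacent instructions could safely execute in parallel. The Pentium issues instructions strictly in program order, so it only ever considers the next two instructions in the stream.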

Parallel Execution Example

Imagine this sequence:


MOV EAX, EBX
ADD ECX, EDX

Neither instruction depends on the other.

Therefore:


Clock Cycle

U Pipeline

MOV EAX,EBX

V Pipeline

ADD ECX,EDX

Both complete together.

Now consider:


MOV EAX, EBX
ADD EAX, 1

The second instruction requires the result of the first.

Therefore:


Cycle 1

MOV

Cycle 2

ADD

The CPU cannot issue them simultaneously because of the dependency.

Instruction Dependencies

Before pairing two instructions, the Pentium checks whether the second depends on the first.


Instruction A

Produces EAX

↓

Instruction B

Needs EAX

↓

Must Wait

But if instructions use different registers:


Instruction A

Uses EAX

Instruction B

Uses ECX

↓

Can Execute Together

Modern processors still perform dependency analysis, though with much more sophisticated hardware.
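The register checks described above can be sketched in a few lines of Python. This is a rough model, not the Pentium's actual pairing logic: the `Instr` record, `can_pair`, and `count_cycles` are hypothetical names invented for illustration, and the sketch assumes every instruction takes one cycle and ignores flags, memory operands, and the rule that only simple instructions may pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: text plus registers written and read."""
    text: str
    writes: frozenset
    reads: frozenset


def can_pair(first: Instr, second: Instr) -> bool:
    """Return True if `second` may issue alongside `first` (register checks only)."""
    raw = first.writes & second.reads    # second needs a result that first produces
    waw = first.writes & second.writes   # both write the same register
    return not raw and not waw


def count_cycles(stream: list) -> int:
    """Greedy in-order issue: pair adjacent instructions when the check allows it."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            i += 2   # both instructions issue this cycle (U and V)
        else:
            i += 1   # only the U pipeline issues this cycle
        cycles += 1
    return cycles


mov_eax_ebx = Instr("MOV EAX, EBX", frozenset({"EAX"}), frozenset({"EBX"}))
add_ecx_edx = Instr("ADD ECX, EDX", frozenset({"ECX"}), frozenset({"ECX", "EDX"}))
add_eax_1 = Instr("ADD EAX, 1", frozenset({"EAX"}), frozenset({"EAX"}))

independent_cycles = count_cycles([mov_eax_ebx, add_ecx_edx])
dependent_cycles = count_cycles([mov_eax_ebx, add_eax_1])
print(independent_cycles, dependent_cycles)  # prints: 1 2
```

The independent pair issues in a single cycle, while the dependent pair needs two, matching the two examples in the previous section.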

The Register File

The Pentium retained the 32-bit register model introduced by the 80386.


General Registers

EAX
EBX
ECX
EDX

ESI
EDI

EBP
ESP

Each register stores 32 bits.


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|31                            0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Register Purposes

EAX – Accumulator

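Serves as an implicit operand for some arithmetic instructions, such as MUL and DIV, and as a general-purpose register everywhere else.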


ADD EAX, EBX

EBX – Base Register

Frequently stores memory addresses.


MOV EAX,[EBX]

ECX – Counter

Often used for loops.


LOOP Start

EDX – Data Register

Commonly holds the upper half of a multiplication result or the remainder of a division.

ESP – Stack Pointer

Always points to the top of the stack.


Stack

+--------+
| Data   |
+--------+
| Data   |
+--------+
| Data   |
+--------+
     ↑
    ESP
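To make the role of ESP concrete, here is a small Python model of 32-bit pushes and pops. It is only a sketch: the `Stack32` class and the starting address `0x1000` are assumptions chosen for illustration. It does reflect two real x86 behaviors: a 32-bit PUSH decrements ESP by 4 before storing, and a POP reads the value and then increments ESP by 4.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack; memory is a dict of address -> value."""

    def __init__(self, esp: int) -> None:
        self.esp = esp
        self.memory = {}

    def push(self, value: int) -> None:
        # The stack grows toward lower addresses.
        self.esp = (self.esp - 4) & 0xFFFFFFFF
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)
        self.esp = (self.esp + 4) & 0xFFFFFFFF
        return value


stack = Stack32(esp=0x1000)      # arbitrary starting address (assumption)
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp     # 0x1000 - 8
top = stack.pop()                # the last value pushed comes off first
esp_after_pop = stack.esp
print(hex(esp_after_pushes), hex(top), hex(esp_after_pop))
# prints: 0xff8 0x22222222 0xffc
```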

EBP – Base Pointer

Used to access local variables within stack frames.


Function

EBP

↓

Arguments

↓

Local Variables

This organization is still recognizable in modern x86-64 systems.

Segment Registers

The Pentium retained the x86 segment registers for compatibility.


CS

Code Segment

DS

Data Segment

ES

Extra Segment

FS

Additional Data Segment

GS

Additional Data Segment

SS

Stack Segment

Although modern operating systems often use a flat memory model, these registers remain part of the architecture.

Instruction Pointer

EIP

The Instruction Pointer contains the address of the next instruction to execute.


Memory

1000
1004
1008
100C
1010

        ↑
       EIP

After each instruction, EIP advances automatically to the next instruction unless a jump, call, return, or interrupt loads a different address.

EFLAGS Register

The Pentium's EFLAGS register records the outcome of operations and controls processor behavior.


+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|CF|PF|AF|ZF|SF|TF|IF|DF|OF|... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Some important flags include:

Flag	Meaning
CF	Carry Flag
ZF	Zero Flag
SF	Sign Flag
OF	Overflow Flag
IF	Interrupt Enable
DF	Direction Flag

Example:


CMP EAX, EBX
JE Equal

If EAX == EBX, the comparison sets the Zero Flag (ZF), allowing JE (Jump if Equal) to branch to the Equal label.
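The flag behavior behind CMP can be modeled in Python as a 32-bit subtraction whose result is discarded. The helper names `cmp_flags`, `je_taken`, and `jl_taken` are made up for this sketch. The flag rules themselves follow the standard x86 definitions: JE tests ZF, and JL is taken when SF and OF differ.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a: int, b: int) -> dict:
    """Compute ZF, SF, CF, and OF for CMP a, b (a - b on 32-bit values)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "ZF": int(result == 0),
        "SF": sign_r,
        "CF": int(a < b),  # unsigned borrow
        "OF": int(sign_a != sign_b and sign_r != sign_a),  # signed overflow
    }


def je_taken(flags: dict) -> bool:
    """JE branches when the Zero Flag is set."""
    return flags["ZF"] == 1


def jl_taken(flags: dict) -> bool:
    """JL (signed less-than) branches when SF differs from OF."""
    return flags["SF"] != flags["OF"]


equal = cmp_flags(5, 5)
less = cmp_flags(3, 10)
print(je_taken(equal), je_taken(less), jl_taken(less))
# prints: True False True
```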

Pipeline Stages

Each instruction passes through several stages.


Fetch

↓

Decode

↓

Address Generation

↓

Execute

↓

Write Back

Multiple instructions occupy different stages simultaneously.


Clock →          1        2        3        4        5        6        7

Instruction A    Fetch    Decode   AddrGen  Execute  Write

Instruction B             Fetch    Decode   AddrGen  Execute  Write

Instruction C                      Fetch    Decode   AddrGen  Execute  Write
This overlap is called instruction pipelining.
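The benefit of overlapping stages is easy to quantify with a short Python sketch. It assumes an idealized pipeline with the five stages listed above, one instruction entering per cycle, and no stalls. The function names `pipeline_schedule` and `total_cycles` are hypothetical.

```python
STAGES = ["Fetch", "Decode", "Address Generation", "Execute", "Write Back"]


def pipeline_schedule(num_instructions: int, stages=STAGES) -> dict:
    """Map each clock cycle (1-based) to the (instruction, stage) pairs in flight."""
    schedule = {}
    for instr in range(num_instructions):
        for stage_index, stage in enumerate(stages):
            cycle = instr + stage_index + 1  # instruction i enters Fetch in cycle i + 1
            schedule.setdefault(cycle, []).append((instr, stage))
    return schedule


def total_cycles(num_instructions: int, num_stages: int = len(STAGES)) -> int:
    """Cycles to finish with no stalls: fill the pipeline, then one per extra instruction."""
    return num_stages + num_instructions - 1 if num_instructions else 0


schedule = pipeline_schedule(3)
pipelined = total_cycles(3)
unpipelined = 3 * len(STAGES)  # each instruction runs all stages before the next starts
print(pipelined, unpipelined)  # prints: 7 15
print(schedule[3])
# prints: [(0, 'Address Generation'), (1, 'Decode'), (2, 'Fetch')]
```

In cycle 3 all three instructions are in flight at once, each in a different stage, which is where the savings come from.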

Branch Prediction

One of the Pentium's most important innovations was dynamic branch prediction.

Consider this code:


CMP EAX, 10
JL  LoopTop

The CPU doesn't know immediately whether the branch will be taken.

If it waits, the pipeline becomes idle.

Instead, the Pentium predicts the outcome and continues fetching instructions.


CMP

↓

Predict Taken

↓

Fetch Loop

↓

Correct?

YES

Continue

NO

Flush Pipeline

Each correct prediction avoids a pipeline flush and the few clock cycles it would take to refill the pipeline.

Modern CPUs have highly advanced branch predictors, but the Pentium brought this capability to mainstream PCs.
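A common textbook way to implement dynamic prediction is a two-bit saturating counter per branch. The Python sketch below shows that generic scheme. It is not claimed to reproduce the Pentium's exact predictor, and the class name, function name, and starting state are all assumptions.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1) -> None:
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes: list) -> int:
    """Run one branch's outcome history through a fresh predictor."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken 9 times, then falling through once when the loop exits.
loop_outcomes = [True] * 9 + [False]
mispredictions = count_mispredictions(loop_outcomes)
print(mispredictions)  # prints: 2
```

Only the first iteration and the loop exit are mispredicted, so the remaining iterations proceed without a flush.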

Split L1 Cache

Unlike the 486, which used a single unified cache, the Pentium separated instructions and data into distinct caches.


            CPU

        +---------+
        | Decoder |
        +---------+

      /             \

Instruction      Data
   Cache         Cache

 8 KB            8 KB

This design allows the processor to fetch an instruction while simultaneously reading or writing data, reducing contention and improving throughput.

64-bit External Data Bus

The 486 featured a 32-bit external data bus.


CPU ======== Memory

32 bits

The Pentium doubled this width.


CPU ================= Memory

64 bits

Advantages:

More data transferred per memory cycle
Improved performance in memory-intensive applications
Better support for graphics, multimedia, and scientific workloads
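The effect of the wider bus can be estimated with simple arithmetic, sketched here in Python. The figures are theoretical peaks only. The sketch assumes one transfer per bus clock and ignores wait states and bus protocol overhead. The 66 MHz bus clock and the 32-byte block are illustrative assumptions, and both function names are hypothetical.

```python
def peak_bandwidth_mb_s(bus_width_bits: int, bus_clock_mhz: float) -> float:
    """Theoretical peak in MB/s: bytes per transfer times millions of transfers per second."""
    return (bus_width_bits / 8) * bus_clock_mhz


def transfers_needed(block_bytes: int, bus_width_bits: int) -> int:
    """Bus transfers required to move a block, rounding up."""
    bytes_per_transfer = bus_width_bits // 8
    return -(-block_bytes // bytes_per_transfer)  # ceiling division


narrow = peak_bandwidth_mb_s(32, 66)  # 32-bit bus at an assumed 66 MHz
wide = peak_bandwidth_mb_s(64, 66)    # 64-bit bus at the same clock
print(narrow, wide)  # prints: 264.0 528.0
print(transfers_needed(32, 32), transfers_needed(32, 64))  # prints: 8 4
```

At the same bus clock, the 64-bit bus moves a 32-byte block in half as many transfers.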

Cache Access

When executing:


MOV EAX,[1000h]

The processor checks:


L1 Cache

↓

Hit?

YES

Return Data

NO

↓

RAM

A cache hit is extremely fast, while a cache miss requires a slower memory access.

Keeping frequently used instructions and data in the L1 cache dramatically improves overall performance.
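The hit-or-miss decision can be illustrated with a tiny cache model in Python. This sketch uses a direct-mapped organization for brevity, which is a simplification and not a description of the Pentium's actual cache design. The `DirectMappedCache` class and the 32-byte line size are assumptions made for the example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only which line occupies each slot."""

    def __init__(self, size_bytes: int = 8 * 1024, line_bytes: int = 32) -> None:
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the line and return False."""
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # which slot the line maps to
        tag = line_number // self.num_lines    # identifies the line within that slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag                 # fetch from RAM, replacing the old line
        self.misses += 1
        return False


cache = DirectMappedCache()
first = cache.access(0x1000)                # miss: cache starts empty
second = cache.access(0x1000)               # hit: line is now cached
neighbor = cache.access(0x1004)             # hit: same 32-byte line
conflict = cache.access(0x1000 + 8 * 1024)  # miss: same slot, different tag
again = cache.access(0x1000)                # miss: the earlier line was evicted
print(cache.hits, cache.misses)  # prints: 2 3
```

The `neighbor` access shows why nearby data benefits from a single miss: the whole line is loaded at once.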

Why This Architecture Was Revolutionary

The Pentium combined several innovations into one cohesive design:

Superscalar execution enabled multiple instructions per cycle.
Dual pipelines increased throughput without merely raising clock speeds.
Dynamic branch prediction reduced wasted cycles.
Split instruction and data caches improved memory efficiency.
A 64-bit external data bus doubled the data moved per bus transfer compared with the 486.
A redesigned floating-point unit (FPU) delivered major gains for graphics, CAD, engineering, and scientific software.

Together, these features allowed the Pentium to outperform the 486 by a substantial margin—even at similar clock speeds. Software no longer depended solely on faster clocks; it could benefit from smarter hardware capable of doing more work each cycle.

The Pentium CPU Revolution // Part-2