ATAD #15 – Beyond the CPU clock speed
To complete a task, the computer splits each instruction into a series of independent steps. A processor driven by a clock works on an instruction one step at a time and proceeds to the next instruction in the pipeline when the next clock signal arrives. So naturally, the faster the clock speed, the faster the instructions waiting in the pipeline are completed. A generic pipeline has four independent steps, one performed per clock cycle:
- Fetch: Read an instruction
- Decode and Register Fetch
- Execute, which might involve memory access
- Write Back
(Note: newer processors are built to perform more than one of the steps in the pipeline during one clock cycle. E.g., the upcoming Intel Nehalem processor can decode up to four instructions at a time.)
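The benefit of overlapping the four steps can be sketched with a little arithmetic. This is a minimal illustration, assuming an idealized pipeline where every stage takes exactly one cycle and nothing ever stalls; the function names are made up for this example.

```python
# Idealized 4-stage pipeline: one stage per clock cycle, no stalls (assumption).
STAGES = ["Fetch", "Decode", "Execute", "Write Back"]

def cycles_unpipelined(n_instructions, n_stages=len(STAGES)):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages=len(STAGES)):
    """A new instruction enters the pipeline every cycle once it is full."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

def pipeline_schedule(n_instructions):
    """Map each cycle (starting at 1) to the (instruction, stage) pairs active in it."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(STAGES):
            schedule.setdefault(i + s + 1, []).append((i, stage))
    return schedule

if __name__ == "__main__":
    print(cycles_unpipelined(5), cycles_pipelined(5))
    for cycle, work in sorted(pipeline_schedule(3).items()):
        print(cycle, work)
```

Under these assumptions, five instructions need 20 cycles without pipelining and 8 cycles with it. Real pipelines are deeper and do stall, so the gain is smaller in practice.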
Processing instructions requires memory access. A CPU cache is employed here, which is simply a very fast memory that can be accessed in very few cycles. Modern processors use caches of different sizes at various levels or steps of instruction processing. An efficient cache design is also pivotal in improving processing speed and clock frequency.
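To make the idea of a cache concrete, here is a small sketch of one possible organization, a direct-mapped cache, that only counts hits and misses. The class name, line count and line size are assumptions chosen for illustration; real caches are usually set-associative and far larger.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each memory block maps to exactly one line."""

    def __init__(self, n_lines, line_size):
        self.n_lines = n_lines          # number of cache lines
        self.line_size = line_size      # bytes per line
        self.tags = [None] * n_lines    # which block each line currently holds
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        block = address // self.line_size
        index = block % self.n_lines
        tag = block // self.n_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag
        self.misses += 1
        return False

if __name__ == "__main__":
    cache = DirectMappedCache(n_lines=4, line_size=16)
    for address in range(64):           # sequential walk over 64 bytes
        cache.access(address)
    print(cache.hits, cache.misses)
```

A sequential walk like the one above misses only on the first byte of each line and hits on the rest, which is why access patterns with good locality benefit so much from caching.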
Instruction flow is not purely sequential; it can branch based on conditional evaluation and execution. When a taken branch requires a different set of instructions to be executed, the pipeline must be stalled or flushed. Modern microarchitectures have introduced techniques such as branch prediction and speculative execution to reduce such penalties.
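One classic textbook scheme for branch prediction is a two-bit saturating counter. The sketch below is an illustration of that scheme only, not a description of how any particular processor predicts branches; the class and function names are invented for the example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2  # start at "weakly taken" (arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Move one step toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def count_mispredictions(outcomes):
    """Run the predictor over a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong

if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]   # loop runs 9 times, then exits
    print(count_mispredictions(loop_branch))
```

For a loop branch that is taken nine times and then falls through, this predictor is wrong only once, at the loop exit. Each misprediction corresponds to a pipeline flush, so fewer wrong guesses mean fewer wasted cycles.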
Advances in processor design and VLSI techniques, such as superscalar processors, allow several instructions to be executed in parallel per cycle. The keys to superscalar execution are: an instruction fetch unit that can fetch more than one instruction at a time from cache; instruction decoding logic that can decide when instructions are independent and can thus be executed simultaneously; and sufficient execution units to process several instructions at one time.
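The independence check performed by the decoding logic can be sketched roughly as follows. This is a simplified model, assuming each instruction writes one register and reads a few others, and that instructions are issued in program order; `Instr`, `independent` and `issue_groups` are hypothetical names for this example.

```python
from collections import namedtuple

# One destination register and a tuple of source registers (simplified model).
Instr = namedtuple("Instr", ["dest", "srcs"])

def independent(earlier, later):
    """True if `later` can run alongside `earlier` without a register hazard."""
    raw = earlier.dest in later.srcs    # later reads what earlier writes
    waw = earlier.dest == later.dest    # both write the same register
    war = later.dest in earlier.srcs    # later overwrites what earlier reads
    return not (raw or waw or war)

def issue_groups(program, width=2):
    """Greedily pack in-order instructions into groups of up to `width`."""
    groups = []
    current = []
    for instr in program:
        fits = len(current) < width
        if fits and all(independent(prev, instr) for prev in current):
            current.append(instr)
        else:
            groups.append(current)
            current = [instr]
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    program = [
        Instr("r1", ("r2", "r3")),
        Instr("r4", ("r5", "r6")),   # independent of the first
        Instr("r7", ("r1", "r4")),   # needs both earlier results
        Instr("r8", ("r7",)),        # needs r7
    ]
    for group in issue_groups(program):
        print([i.dest for i in group])
```

Here the first two instructions can issue together, while the last two each have to wait for a result, so the four instructions take three issue slots instead of two.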
Keeping more instructions on hand, in larger caches and instruction buffers, opens up the possibility of out-of-order execution: the processor executes later, independent instructions while an older instruction waits on the cache, then re-orders the results to make it appear that everything happened in the programmed order.
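The idea of finishing out of order but retiring in order can be sketched with a toy timing model. It assumes unlimited execution units and that an instruction starts as soon as its inputs are ready; the tuple layout and the function name are made up for this illustration.

```python
def simulate_out_of_order(program):
    """program: list of (name, latency, deps); deps are indices of earlier entries.

    Returns (finish_times, completion_order, commit_times).
    Assumes unlimited execution units (a deliberate simplification).
    """
    finish = []
    for _name, latency, deps in program:
        ready = max((finish[d] for d in deps), default=0)
        finish.append(ready + latency)

    # Instructions complete whenever they are done, regardless of program order.
    by_finish = sorted(range(len(program)), key=lambda i: finish[i])
    completion_order = [program[i][0] for i in by_finish]

    # Results are committed strictly in program order.
    commit = []
    for t in finish:
        commit.append(max(t, commit[-1]) if commit else t)
    return finish, completion_order, commit

if __name__ == "__main__":
    program = [
        ("load", 10, []),   # slow: waits on memory
        ("add", 1, [0]),    # needs the load result
        ("mul", 3, []),     # independent
        ("sub", 1, [2]),    # needs the mul result
    ]
    finish, completed, commit = simulate_out_of_order(program)
    print(completed)
    print(commit)
```

In this example the `mul` and `sub` finish long before the slow `load`, but their results are only committed after it, so from the program's point of view everything still happened in order.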
Other well-known concepts are multiprocessing, where two or more CPUs are used in a single computer, and multithreading, where thread-level and instruction-level parallelism are targeted.