Every processor repeats a similar cycle of operations, typically the following steps:
- Fetch the instruction
- Decode the instruction
- Calculate the effective address and read operands
- Execute the instruction
- Write back
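The cycle above can be sketched as a toy interpreter for an invented single-accumulator machine (the opcodes and the three-instruction program are purely illustrative, not any real ISA):

```python
# Toy fetch-decode-execute loop for an invented accumulator machine.
# Each instruction is (opcode, operand); the opcodes are made up.
def run(program):
    pc = 0          # program counter
    acc = 0         # accumulator register
    while pc < len(program):
        opcode, operand = program[pc]   # fetch
        pc += 1
        if opcode == "LOAD":            # decode + execute
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            break
    return acc                          # "write back" the result

# Load 2, add 3, halt:
print(run([("LOAD", 2), ("ADD", 3), ("HALT", 0)]))  # 5
```

A real CPU performs these same steps in hardware, with the decode step selecting control signals instead of Python branches.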
The processor reads and executes the instructions encoded in a program. In simpler and older CPUs, each instruction cycle completes before the next one begins, which is slow; it is better to execute instructions concurrently and in parallel. Most modern CPUs use several of the following techniques to speed up instruction execution.
- Instruction set choice – the number and complexity of instructions is very important: on the one hand, the processor designer would prefer simplicity, but on the other, a user (or compiler) would prefer more complex instructions that are easier to program with. We distinguish the following concepts:
- ZISC – Zero Instruction Set Computer
- MISC – Minimal Instruction Set Computer
- RISC – Reduced Instruction Set Computer
- CISC – Complex Instruction Set Computer
- VLIW – Very Long Instruction Word
- EPIC – Explicitly Parallel Instruction Computing
Many modern CPUs (notably x86) present a CISC instruction set externally while translating it into RISC-like micro-operations in the core.
- Instruction pipelining – the execution of an instruction is divided into stages, each of which should take a similar amount of time. While one stage of an instruction is in progress, the processor can already execute a stage of the next instruction, because a block that has finished its own work would otherwise sit idle. In this way pipelining can be compared to a manufacturing assembly line, in which different parts of a product are assembled at the same time.
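The speedup is easy to quantify under the simplifying assumption that every stage takes exactly one clock cycle and the pipeline never stalls:

```python
# Cycle counts for purely sequential vs. pipelined execution,
# assuming each stage takes exactly one clock cycle and no stalls.
def sequential_cycles(n_instructions, n_stages):
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages):
    # After the pipeline fills (n_stages cycles for the first
    # instruction), one instruction completes every cycle.
    return n_stages + (n_instructions - 1)

print(sequential_cycles(100, 5))  # 500
print(pipelined_cycles(100, 5))   # 104
```

With a 5-stage pipeline, 100 instructions finish in 104 cycles instead of 500, approaching one instruction per cycle as the program grows.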
- Branch prediction – it plays a critical role in achieving high performance with pipelining. Programs contain conditional jump instructions (for example, in loops), and the processor does not know which instruction should be executed after such a jump until the condition is resolved. This is a serious problem because conditional jumps are very frequent in typical programs. The branch predictor is responsible for choosing the likely next instruction.
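A classic scheme is the 2-bit saturating counter: one wrong guess does not immediately flip the prediction, which works well for loop branches. A minimal sketch (the initial state and the branch history are illustrative):

```python
# A 2-bit saturating-counter branch predictor.
# States 0-1 predict "not taken", states 2-3 predict "taken".
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken"

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch taken 9 times, then not taken on exit:
p = TwoBitPredictor()
correct = 0
for taken in [True] * 9 + [False]:
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 9 of 10 predicted correctly
```

Only the final loop exit is mispredicted; a 1-bit predictor would additionally mispredict the first iteration of the next run of the loop.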
- Cache memory – the CPU cache is very fast SRAM that reduces the average cost of accessing data in main memory; in most modern CPUs the cache has three levels. Associativity can be organized in the following ways:
- Direct-mapped cache – good best-case time, but slow in the worst case, because conflicting addresses evict each other
- N-way set-associative cache – a compromise; a direct-mapped cache can be viewed as a 1-way set-associative cache
- Fully associative cache – the best hit rate, but practical only for small memories
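The index computation behind these schemes is simple modular arithmetic. A sketch, assuming illustrative sizes of 64-byte lines and 64 sets:

```python
# How an address maps to a cache set; line size and set count
# are illustrative, not from any particular CPU.
LINE_SIZE = 64

def cache_set(address, n_sets):
    line = address // LINE_SIZE   # strip the byte-offset bits
    return line % n_sets          # index bits select the set

# Direct-mapped (1-way): two addresses 4096 bytes apart land in
# the same set of a 64-set cache and evict each other repeatedly.
print(cache_set(0, 64), cache_set(4096, 64))   # 0 0
# A 2-way set-associative cache with the same 64 sets can keep
# both lines resident, one per way, avoiding the conflict.
```

This is why alternating accesses to addresses a power-of-two stride apart can be pathologically slow on a direct-mapped cache even though the cache is mostly empty.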
- Superscalar – the processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units. Note that the processor is single, but its execution units are multiple.
- Out-of-order execution – sometimes the needed data is not present in the cache, and waiting for it would waste time, because other instructions could be performed while the data is being fetched. This technique allows the processor to reorder instructions accordingly.
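The idea can be sketched as a tiny scheduler that, each cycle, issues any instruction whose inputs are ready instead of stalling on the oldest one (instruction names and latencies are invented for illustration):

```python
# Minimal out-of-order issue sketch: each cycle, issue every
# instruction whose source values are already available.
def schedule(instrs):
    # instrs: list of (name, source_names, latency_in_cycles)
    done_at = {}        # value name -> cycle it becomes available
    order = []          # issue order actually chosen
    cycle = 0
    pending = list(instrs)
    while pending:
        for ins in list(pending):
            name, srcs, lat = ins
            if all(done_at.get(s, 0) <= cycle for s in srcs):
                done_at[name] = cycle + lat
                order.append(name)
                pending.remove(ins)
        cycle += 1
    return order

# "load" misses the cache (latency 10); "add2" does not depend on
# it, so it issues before "add1", which must wait for the load.
prog = [("load", [], 10), ("add1", ["load"], 1), ("add2", [], 1)]
print(schedule(prog))  # ['load', 'add2', 'add1']
```

A real CPU does this in hardware with reservation stations and a reorder buffer so that results are still committed in program order.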
- Register renaming – instructions often reuse the same architectural registers; this technique avoids unnecessarily serializing the execution of otherwise independent instructions that merely happen to reuse the same registers
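A minimal renaming sketch, assuming an unlimited pool of physical registers (the register names and the example program are invented): every write to an architectural register is given a fresh physical register, which removes false write-after-write and write-after-read dependencies.

```python
# Register renaming sketch: each write to an architectural register
# allocates a fresh physical register, so the two uses of r1 below
# become independent and can execute in parallel.
from itertools import count

def rename(instructions):
    fresh = count()        # allocator for physical register names
    mapping = {}           # architectural -> current physical name
    out = []
    for dst, srcs in instructions:
        # Sources read the current physical names.
        phys_srcs = [mapping.get(s, s) for s in srcs]
        # The destination gets a brand-new physical register.
        mapping[dst] = f"p{next(fresh)}"
        out.append((mapping[dst], phys_srcs))
    return out

# r1 is written twice; after renaming, the second chain (p2, p3)
# no longer depends on the first (p0, p1).
prog = [("r1", ["r2"]), ("r3", ["r1"]), ("r1", ["r4"]), ("r5", ["r1"])]
for ins in rename(prog):
    print(ins)
```

Only true read-after-write dependencies survive renaming, which is exactly what out-of-order execution needs to find independent work.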
- Multithreading – executing multiple threads or processes on a single CPU or on a single core of a multi-core processor; it has to be supported by the operating system (Intel offers an interesting technology in this field – Hyper-Threading)
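From the programmer's side, multithreading looks like this minimal sketch: two threads share one counter, and the OS schedules them onto the available core(s). A lock is needed because the increment is not atomic.

```python
# Two threads incrementing a shared counter; the OS decides how
# they are scheduled onto cores. The lock prevents lost updates.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:          # read-modify-write must be atomic
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000
```

Without the lock, the two read-modify-write sequences can interleave and some increments are lost, which is the classic data race that multithreaded code must guard against.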
- Multiprocessing – using more than one CPU
The techniques mentioned above increase IPC (instructions per cycle).
Processor development is very fast, which steadily increases the complexity of processors. For us, the users, performance is what matters most, and it keeps getting higher.