Wednesday, February 4, 2009

CPU (introduction and socket)


A central processing unit (CPU) is an electronic circuit that can execute computer programs. This broad definition can easily be applied to many early computers that existed long before the term "CPU" ever came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s (Weik 1961). The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation has remained much the same.
Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this costly method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are suited for one or many purposes. This standardization trend generally began in the era of discrete transistor mainframes and minicomputers and has rapidly accelerated with the popularization of the integrated circuit (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers. Both the miniaturization and standardization of CPUs have increased the presence of these digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from automobiles to cell phones to children's toys.
Prior to the advent of machines that resemble today's CPUs, computers such as the ENIAC had to be physically rewired in order to perform different tasks. These machines are often referred to as "fixed-program computers," since they had to be physically reconfigured in order to run a different program. Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.
The idea of a stored-program computer was already present during ENIAC's design, but was initially omitted so the machine could be finished sooner. On June 30, 1945, before ENIAC was even completed, mathematician John von Neumann distributed the paper entitled "First Draft of a Report on the EDVAC." It outlined the design of a stored-program computer that would eventually be completed in August 1949 (von Neumann 1945). EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the large amount of time and effort it took to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory. [1]
While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him such as Konrad Zuse had suggested similar ideas. Additionally, the so-called Harvard architecture of the Harvard Mark I, which was completed before EDVAC, also utilized a stored-program design using punched paper tape rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.
Being digital devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. Prior to commercial acceptance of the transistor, electrical relays and vacuum tubes (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building direct current sequential logic circuits out of relays requires additional hardware to cope with the problem of contact bounce. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning altogether.[2] Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing component so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster but less reliable than electromechanical (relay based) computers.


Tube computers like EDVAC tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I failed very rarely (Weik 1961:238). In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.
The introduction of the microprocessor in the 1970s significantly affected the design and implementation of CPUs. Since the introduction of the first microprocessor (the Intel 4004) in 1971 and the first widely used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures, and eventually produced instruction-set-compatible microprocessors that were backward compatible with their older hardware and software. Combined with the advent and eventual vast success of the now ubiquitous personal computer, the term "CPU" is now applied almost exclusively to microprocessors.
Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic capacitance. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.
While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer, as well as to expand the usage of parallelism and other methods that extend the usefulness of the classical von Neumann model.
The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback.
The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units.[3] Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).
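As a rough sketch of the idea (the memory contents, encoding, and instruction length below are invented for illustration and belong to no real ISA), the fetch step amounts to a memory read at the address held in the PC, followed by an increment:

    # Hypothetical fetch step: read the instruction word at the address in the
    # program counter (PC), then advance the PC past it. All values are invented.
    program_memory = [0x1105, 0x1207, 0x3312, 0x0000]  # made-up instruction words
    pc = 0                   # the PC identifies the current position in the program
    INSTRUCTION_LENGTH = 1   # one memory unit per instruction in this sketch

    def fetch():
        global pc
        instruction = program_memory[pc]  # retrieve the instruction from program memory
        pc += INSTRUCTION_LENGTH          # PC now points at the next-in-sequence instruction
        return instruction

    print(hex(fetch()), pc)  # -> 0x1105 1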
The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA).[4] Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.
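To make the field splitting concrete, here is a minimal decoding sketch for a hypothetical 16-bit instruction word; the layout (4-bit opcode, 4-bit register field, 8-bit immediate) is invented for illustration and matches no real instruction set:

    # Decode a hypothetical 16-bit instruction word laid out as:
    #   bits 15-12: opcode | bits 11-8: destination register | bits 7-0: immediate
    word = 0x1105  # say, "load the immediate value 0x05 into register 1"

    opcode    = (word >> 12) & 0xF   # which operation to perform
    dest_reg  = (word >> 8)  & 0xF   # operand: where the result should go
    immediate =  word        & 0xFF  # operand: a constant (immediate) value

    print(opcode, dest_reg, immediate)  # -> 1 1 5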
After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set.
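A minimal sketch of that overflow behavior, assuming invented 8-bit registers: if the true sum does not fit in the register width, the result wraps around and a flag records the overflow:

    # ALU addition on 8-bit values: when the sum exceeds what 8 bits can hold,
    # the result wraps and an overflow/carry flag is set in a flags register.
    WORD_MASK = 0xFF  # 8-bit registers, purely for illustration

    def alu_add(a, b):
        total = a + b
        overflow = total > WORD_MASK      # result too large for the CPU to handle
        return total & WORD_MASK, overflow

    result, flag = alu_add(200, 100)
    print(result, flag)  # -> 44 True (300 wraps modulo 256, and the flag is set)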
The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs.[5] Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.
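The interplay between a compare instruction, the flags register, and a later conditional jump can be sketched like this (the flag names and instruction behavior are invented for illustration):

    # A "compare" instruction writes its outcome into a flags register rather than
    # producing result data; a conditional jump then reads the flag to redirect the PC.
    flags = {"greater": False}
    pc = 10  # pretend the PC has already been incremented past the jump instruction

    def compare(a, b):
        flags["greater"] = a > b   # record which value is greater

    def jump_if_greater(target):
        global pc
        if flags["greater"]:
            pc = target            # manipulate the PC instead of producing data

    compare(7, 3)
    jump_if_greater(42)
    print(pc)  # -> 42, so the next fetch comes from the jump target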
After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "Classic RISC pipeline," which is in fact quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of CPU cache, and therefore the memory access stage of the pipeline.
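Tying the four steps together, the following toy interpreter runs a complete program on a made-up three-instruction machine (LOADI, ADD, and HALT are invented encodings for illustration, not a real instruction set):

    # A toy fetch-decode-execute-writeback loop. Each cycle fetches the instruction
    # at the PC, advances the PC, decodes the opcode, executes, and writes back.
    LOADI, ADD, HALT = 1, 2, 0

    program = [
        (LOADI, 0, 5),   # r0 = 5
        (LOADI, 1, 7),   # r1 = 7
        (ADD,   0, 1),   # r0 = r0 + r1
        (HALT,  0, 0),
    ]
    registers = [0, 0, 0, 0]
    pc = 0

    while True:
        opcode, a, b = program[pc]   # fetch the instruction at the PC
        pc += 1                      # increment the PC past it
        if opcode == LOADI:          # decode, execute, and write back
            registers[a] = b
        elif opcode == ADD:
            registers[a] = registers[a] + registers[b]
        else:                        # HALT ends the cycle
            break

    print(registers[0])  # -> 12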
Most CPUs, and indeed most sequential logic devices, are synchronous in nature.[8] That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal.
This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).
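As a back-of-the-envelope illustration (all figures invented), the clock period is chosen above the worst-case propagation delay, which directly caps the achievable clock rate:

    # The clock period must exceed the worst-case signal propagation delay.
    worst_case_delay_ns = 8.0   # slowest path through the CPU (invented figure)
    safety_margin_ns = 2.0      # headroom above the worst case (invented figure)

    clock_period_ns = worst_case_delay_ns + safety_margin_ns
    max_clock_hz = 1e9 / clock_period_ns   # frequency is the reciprocal of the period

    print(max_clock_hz / 1e6, "MHz")  # -> 100.0 MHz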

However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the same delays as any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
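The clock-rate/heat relationship described here is commonly approximated by the dynamic-power formula P ≈ α·C·V²·f (activity factor, switched capacitance, supply voltage, and clock frequency); the numbers below are invented simply to show the proportionality:

    # Dynamic switching power scales linearly with clock frequency: doubling f
    # roughly doubles the heat the cooling solution must remove.
    alpha = 0.2   # activity factor: fraction of gates switching each cycle (invented)
    C = 1e-8      # total switched capacitance in farads (invented)
    V = 1.2       # supply voltage in volts (invented)

    for f in (1e9, 2e9):                # 1 GHz versus 2 GHz
        print(f / 1e9, "GHz ->", round(alpha * C * V**2 * f, 2), "W")
    # -> 1.0 GHz -> 2.88 W
    # -> 2.0 GHz -> 5.76 W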
One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs.[9] Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire asynchronous CPUs have been built without utilizing a global clock signal. Two notable examples of this are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers (Garside et al. 1999).
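The effect of clock gating can be sketched conceptually in software (the units, cycle counts, and enable map below are invented; real gating is done in hardware logic, not code): units whose clock is gated off never switch, and so never burn dynamic power.

    # Clock gating, conceptually: the clock only reaches units whose enable
    # bit is set, so idle units do not switch at all.
    class Unit:
        def __init__(self, name):
            self.name, self.switches = name, 0
        def tick(self):
            self.switches += 1   # stands in for the energy cost of one switch

    alu, fpu = Unit("ALU"), Unit("FPU")
    clock_enabled = {"ALU": True, "FPU": False}   # the idle FPU is gated off

    for _ in range(1000):                         # 1000 clock cycles
        for unit in (alu, fpu):
            if clock_enabled[unit.name]:          # the clock gate
                unit.tick()

    print(alu.switches, fpu.switches)  # -> 1000 0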
A CPU socket or CPU slot is a connector on a computer's motherboard that accepts a CPU and forms an electrical interface with it. As of 2007, most desktop and server computers, particularly those based on the Intel x86 architecture, include socketed processors.

Most CPU socket interfaces are based on the pin grid array (PGA) architecture, in which short, stiff pins on the underside of the processor package mate with holes in the socket. To minimize the risk of bent pins, zero insertion force (ZIF) sockets allow the processor to be inserted without any resistance, then grip the pins firmly to ensure a reliable contact after a lever is flipped.
As of 2007, land grid array (LGA) sockets are becoming increasingly popular, with several current and upcoming socket designs using this scheme. With LGA sockets, the socket contains pins that make contact with pads or lands on the bottom of the processor package.
In the late 1990s, many x86 processors fit into slots, rather than sockets. CPU slots are single-edged connectors similar to expansion slots, into which a PCB holding a processor is inserted. Slotted CPU packages offered two advantages: L2 cache memory could be upgraded by installing an additional chip onto the processor PCB, and processor insertion and removal was often easier. However, slotted packages require longer traces between the CPU and chipset, and therefore became unsuitable as clock speeds passed 500 MHz. Slots were abandoned with the introduction of AMD's Socket A and Intel's Socket 370.
AMD
Desktop
• Super Socket 7 - AMD K6-2, AMD K6-III; Rise mP6.
• Slot A - AMD Athlon
• Socket A (also known as "Socket 462", 462-contact PGA) - AMD socket supporting Athlon, Duron, Athlon XP, Athlon XP-M, Athlon MP, Sempron, and Geode processors.
• Socket 754 (754-contact PGA) - AMD single-processor socket featuring single-channel DDR-SDRAM. Supports AMD Athlon 64, Sempron, Turion 64 processors.
• Socket 939 (939-contact PGA) - AMD single-processor socket featuring dual-channel DDR-SDRAM. Supports Athlon 64, Athlon 64 FX, Athlon 64 X2 to 4800+, and Opteron 100-series processors.[4] Superseded by Socket AM2 about two years after launch.
• Socket 940 (940-contact PGA) - AMD single- and multi-processor socket featuring registered (ECC) DDR-SDRAM. Intended for Opteron servers, but also used for "SledgeHammer" series Athlon 64 FX processors.
• Socket AM2 (940-contact PGA) - AMD single-processor socket featuring DDR2-SDRAM. Replaces Socket 754 and Socket 939[4] (Socket AM2 is sometimes confused with Socket 940, which is used for server processors). Supports Athlon 64, Athlon 64 X2, Athlon 64 FX, Opteron, Sempron and Phenom processors.
• Socket AM2+ (940-contact PGA) - AMD socket for single-processor systems. Features support for DDR2 and HyperTransport 3 with separated power lanes. Replaces Socket AM2, and is electrically compatible with it. Supports Athlon 64, Athlon 64 X2, Athlon 64 FX, Opteron, and Phenom processors.
• Socket AM3 (938-contact PGA) - Future AMD socket for single-processor systems. Features support for DDR3-SDRAM and HyperTransport 3 with separated power lanes. Planned to launch in the second quarter of 2009, replacing Socket AM2+.
Mobile
• Socket 563 - AMD low-power mobile Athlon XP-M (563-contact µ-PGA, mostly mobile parts).
• Socket 754
• Socket S1 - AMD socket for mobile platforms featuring DDR2-SDRAM. Replaces Socket 754 for mobile processors (638-contact PGA).
• Socket FS1 - future Fusion processors for notebook market with CPU and GPU functionality (codenamed Swift), supporting DDR3 SDRAM, to be released in 2009.
Server
• Socket 940 - AMD single and multi-processor socket featuring DDR-SDRAM. Supports AMD Opteron[4] (2xx and 8xx Series), Athlon 64 FX processors (940-contact PGA).
• Socket A
• Socket F (also known as "Socket 1207") - AMD multi-processor socket featuring DDR2-SDRAM. Supports AMD Opteron[4] (2xxx and 8xxx Series) and Athlon 64 FX processors. Replaces Socket 940 (1207-contact LGA); partially compatible with Socket F+.
• Socket F+ - Future AMD multi-processor socket featuring a higher-speed HyperTransport interconnect of up to 2.6 GHz. Replaces Socket F, with Socket F processors remaining supported for backward compatibility.
• A future processor in development under the Fusion project codename will employ Socket FS1 and two other sockets.
• Socket G34 - successor to Socket F+; originally planned as Socket G3, paired with a Socket G3 Memory Extender for servers to expand memory.
Intel
Desktop
• Slot 1 - Intel Celeron, Pentium II, Pentium III
• Socket 370 - Intel Pentium III, Celeron; Cyrix III; VIA C3
• Socket 423 - Intel Pentium 4[5] and Celeron processors (Willamette core)
• Socket 478 (also known as Socket N) - Intel Pentium 4, Celeron, Celeron D, Pentium 4 Extreme Edition[5], Pentium M (Northwood, Prescott, and Willamette cores)
• Socket B (LGA 1366)[6] - a new socket for Intel CPUs incorporating an integrated memory controller and the Intel QuickPath Interconnect (Bloomfield)
• Socket T (also known as Socket 775 or LGA 775) - Intel Pentium 4, Pentium D, Celeron D, Pentium Extreme Edition, Core 2 Duo, Core 2 Extreme, Celeron[5], Xeon 3000 series, Core 2 Quad (Northwood, Prescott, Conroe, Kentsfield, Cedar Mill, Wolfdale and Yorkfield cores)
Mobile
• Socket 441 - Intel Atom
• Socket 479 - Intel Pentium M and Celeron M (Banias and Dothan cores)
• Socket 495 - also known as PPGA-B495; used for the mobile Pentium III (Coppermine) and Celerons[7]
• Socket M - Intel Core Solo, Intel Core Duo and Intel Core 2 Duo (some Merom cores and all Yonah cores)
• Micro-FCBGA - Intel Mobile Celeron, Core 2 Duo (mobile), Core Duo, Core Solo, Celeron M, Pentium III (mobile)
• Socket P - Intel-based; replaces Socket 479 and Socket M. Released May 9, 2007 (Merom and Penryn cores)
• Socket 956 - Intel Core 2 Duo (Penryn core)
Server
• Socket 8 - Intel Pentium Pro
• Slot 2 - Intel Pentium II Xeon, Pentium III Xeon
• Socket 603 - Intel Xeon (Northwood and Willamette cores)
• Socket 604 - Intel Xeon
• PAC418 - Intel Itanium
• PAC611 - Intel Itanium 2, HP PA-RISC 8800 and 8900
• Socket J (also known as Socket 771 or LGA 771) - Intel Xeon (Woodcrest core)
• Socket N - Intel Dual-Core Xeon LV
Others
• Socket 463 (also known as Socket NexGen) - NexGen Nx586
• Socket 499 - Alpha 21164
• Slot B - Alpha 21264
• Slotkets - adapters for using socket processors in bus-compatible slot motherboards
Early sockets
Prior to Intel's introduction of the proprietary Slot 1 in 1997, CPU sockets were de facto open standards and were often used by multiple manufacturers.[1]
• DIP socket (40 contacts) - Intel 8086, Intel 8088
• PLCC socket (68 contacts) - Intel 80186[2][3]
• PLCC socket - Intel 80286
• PLCC socket - Intel 80386
• Socket 1 - 80486
• Socket 2 - 80486
• Socket 3 - 80486 (3.3 V and 5 V) and compatibles
• Socket 4 - Intel Pentium 60/66 MHz
• Socket 5 - Intel Pentium 75-133 MHz; AMD K5; IDT WinChip C6, WinChip 2
• Socket 6 - Designed for the 80486, but little used
• Socket 7 - Intel Pentium, Pentium MMX; AMD K6; some Cyrix CPUs
