# FPGA vs CPU vs GPU vs Microcontroller: How Do They Fit into the Processing Jigsaw Puzzle?

5 Oct 2018



Decades ago, central processing units (CPUs) were implemented in discrete transistors and, later, in integrated-circuit logic devices. That all changed when the first microprocessor, Intel's 4-bit 4004, made its debut in 1971. For many years, products from Intel and its competitors were the only choice for engineers who wanted programmable processing power in their designs.

#### **CPU vs MPU**

Now the CPU is a component in a larger system. A standalone microprocessor unit (MPU) bundles the CPU with peripheral interfaces such as DDR3 and DDR4 memory management, PCIe, and serial buses such as USB 2.0, USB 3.0, Ethernet and more. These designs are flexible and versatile, and they are built to run multitasking high-level operating systems (OSes) such as Windows, iOS and Linux. For more compact designs, a microcontroller unit (MCU) combines the CPU core with internal memory and many peripherals on a single integrated circuit or in a single package; MCUs typically run a stripped-down OS.

The microprocessor holds sway in laptop and desktop computing, and microcontrollers are ubiquitous in embedded applications such as automotive instrument panels, motor controllers or smart cards. The CPU hardware architecture of these devices is intended for general-purpose use but often includes specialized blocks, such as floating-point units (FPUs) for mathematical operations. Low-end CPUs execute operations in a sequential manner, but multiple processing cores are now standard in high-end microprocessors and microcontrollers. Intel's Xeon has up to 22 cores.

#### **FPGA and ASIC**

Although these CPUs are adequate for general-purpose computation, a slew of computationally demanding applications that require more specialized architectures have emerged in recent years. Examples include high-speed search; machine learning and artificial intelligence (AI); high-performance computing (HPC) in data centers; real-time graphics processing, including virtual reality and video gaming; industrial products such as digital motor control; and automotive applications such as advanced driver assistance systems (ADAS) and, soon, autonomous vehicles.

Designers in these fields can draw upon three additional processing choices: the graphics processing unit (GPU), the field-programmable gate array (FPGA) and a custom-designed application-specific integrated circuit (ASIC).

#### **GPU vs FPGA**

The GPU was first introduced in the 1980s to offload simple graphics operations from the CPU. As graphics expanded into 2D and, later, 3D rendering, GPUs became more powerful. Parallel operation is a major advantage when processing an image composed of millions of pixels, because the same calculation can be applied to many pixels at once, so current-generation GPUs include thousands of cores designed for efficient execution of mathematical functions. Nvidia's latest device, the Tesla V100, contains 5,120 CUDA cores for single-cycle multiply-accumulate operations and 640 tensor cores for single-cycle matrix multiplication. Many algorithms in other fields lend themselves to parallel execution, so GPUs have spread far beyond their initial application. Many of the world's fastest supercomputers, for example, include thousands of both GPUs and CPUs.
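To make the multiply-accumulate idea concrete, here is a minimal Python sketch of how a dot product breaks down into independent partial sums. It is not GPU code: the function names and the `lanes` parameter are illustrative assumptions, and the lanes run one after another here, whereas parallel hardware could run them at the same time.

```python
def mac_sequential(a, b):
    """Dot product as a sequential chain of multiply-accumulate steps."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # one multiply-accumulate (MAC) per element pair
    return acc


def mac_data_parallel(a, b, lanes=4):
    """Same dot product, split into independent per-lane partial sums.

    Each lane works on a disjoint slice of the inputs, so no lane depends
    on another lane's result. In this sketch the lanes are evaluated in a
    plain loop; the point is only that they could be evaluated in any
    order, or all at once.
    """
    partials = []
    for lane in range(lanes):
        # Lane k handles elements k, k + lanes, k + 2 * lanes, ...
        partials.append(mac_sequential(a[lane::lanes], b[lane::lanes]))
    # A final reduction step combines the per-lane results.
    return sum(partials)


a = list(range(1, 9))  # 1, 2, ..., 8
b = [2] * 8
print(mac_sequential(a, b), mac_data_parallel(a, b))  # prints: 72 72
```

The two functions return the same value; what changes is the dependency structure. The sequential version is one long chain, while the lane version exposes work that many simple cores could share.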

The FPGA consists of internal hardware blocks with user-programmable interconnects to customize operations for a specific application. In contrast to the other devices mentioned, the connections between blocks can readily be reprogrammed, changing the internal operation of the hardware and allowing an FPGA to accommodate changes to a design, or even support a new application, during the lifetime of the part. This flexibility makes the FPGA a great choice for applications in which standards are evolving, such as digital television, consumer electronics, cybersecurity systems and wireless communications.
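As a rough illustration of reprogrammable hardware, the sketch below models a lookup table (LUT), a logic element found in many FPGAs, as a small truth table that can be rewritten. Everything here is a simplified assumption for teaching purposes: the `Lut2` class, the two-input size and the `half_adder` wiring are hypothetical, and real devices are configured through vendor toolchains rather than code like this.

```python
class Lut2:
    """Toy 2-input lookup table: its behavior is just four stored bits."""

    def __init__(self, truth_table):
        self.program(truth_table)

    def program(self, truth_table):
        # truth_table[i] is the output for inputs (a, b), where i = 2 * a + b.
        if len(truth_table) != 4 or any(bit not in (0, 1) for bit in truth_table):
            raise ValueError("expected four bits, one per input combination")
        self.truth_table = tuple(truth_table)

    def evaluate(self, a, b):
        return self.truth_table[2 * a + b]


AND_BITS = (0, 0, 0, 1)
OR_BITS = (0, 1, 1, 1)
XOR_BITS = (0, 1, 1, 0)


def half_adder(a, b, sum_lut, carry_lut):
    """Stand-in for interconnect: route the same inputs to two LUTs."""
    return sum_lut.evaluate(a, b), carry_lut.evaluate(a, b)


sum_lut = Lut2(XOR_BITS)
carry_lut = Lut2(AND_BITS)
print(half_adder(1, 1, sum_lut, carry_lut))  # (sum, carry) for 1 + 1

# "Reprogramming" the same element changes what the circuit computes,
# without replacing any hardware in this model.
carry_lut.program(OR_BITS)
print(carry_lut.evaluate(1, 0))
```

The same object first acts as an AND gate and then as an OR gate. That is the property the paragraph above describes: the function is held in configuration data, so it can change during the lifetime of the part.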

At the other end of the spectrum, an ASIC is designed specifically for its intended application. It has only the blocks required for optimum operation, which may include a CPU, GPU, memory and so on. Although designers can incorporate third-party IP cores such as the Arm Cortex CPU, or predesigned blocks for standard functions such as an Ethernet physical layer, an ASIC is a ground-up design. Because of the cost and time that design requires, it's best suited for high-volume applications. The Tensor Processing Unit (TPU), for example, is an accelerator developed by Google specifically for neural-network machine learning. Google also offers TPU access to outside companies through its cloud-computing unit.

## The Future?

Increasingly, we're seeing these categories converge as vendors search for the optimum feature set for emerging applications. SoC FPGAs come with hard- or soft-IP CPUs, GPUs and DSP blocks. CPUs include ASIC-style hardware accelerators for cryptographic functions, and Nvidia's Tesla T4 GPU includes dedicated Tensor Cores for AI inference applications.

Regardless, this is a lot of information. To help you keep track of everything, we've created a couple of tables that summarize the main features of the four devices, together with some of their relative strengths and weaknesses.


|             | CPU | FPGA | GPU | ASIC |
|-------------|-----|------|-----|------|
| Overview    | Traditional sequential processor for general-purpose applications | Flexible collection of logic elements and IP blocks that can be configured and changed in the field | Originally designed for graphics; now used in a wide range of computationally intensive applications | Custom integrated circuit optimized for the end application |
| Processing  | Single- and multi-core MCUs and MPUs, plus specialized blocks: FPU, etc. | Configured for application; SoCs include hard or soft IP cores (e.g., Arm) | Thousands of identical processor cores | Application-specific: may include third-party IP cores |
| Programming | OSes, APIs run huge range of high-level languages; assembly language | Traditionally HDL (Verilog, VHDL); newer systems include C/C++ via OpenCL & SDAccel | OpenCL & Nvidia's CUDA API allow general-purpose programming (e.g., C, C++, Python, Java, Fortran) | Application-specific: TensorFlow open-source framework for Google's TPU; CPU manufacturers (e.g., Intel) include tools with new ASIC releases |
| Peripherals | Wide choice of analog and digital peripherals in MCUs; MPUs include digital bus interfaces | SoCs include many transceiver blocks, configurable I/O banks | Very limited; e.g., only cache memory | Tailored to application: may include industry-standard functions (USB, Ethernet, etc.) |
| Strengths   | Versatility, multitasking, ease of programming | Configurable for specific application; configuration can be changed after installation; high performance per watt; accommodates massively parallel operation; wide choice of features: DSPs, CPUs | Massive processing power for target applications: video processing, image analysis, signal processing | Custom-designed for application with optimum combination of performance and power consumption |
| Weaknesses  | OS capability adds high overhead; optimized for sequential processing with limited parallelism | Relatively difficult to program; second-longest development time; poor performance for sequential operations; not good for floating-point operations | High power consumption, not suited to some algorithms; problems must be reformulated to take advantage of parallelism, but API frameworks provide abstraction | Longest development time; high cost; cannot be changed without redesigning the silicon |

It's also worth considering how these choices stack up in some common applications. As shown in the table, designers can often use any or all of the options either alone or, more likely, in combination.

| Applications | CPU | FPGA | GPU | ASIC | Comments |
|--------------|-----|------|-----|------|----------|
| Vision & image processing |     | ~ | ~   | ~ | FPGA may give way to ASIC in high-volume applications |
| AI training |     |   | ~   |   | GPU parallelism well-suited for processing terabyte data sets in reasonable time |
| AI inference | ~   | * | ~   | ✓ | Everyone wants in! FPGAs perhaps leading; high-end CPUs (e.g., Intel's Xeon) and GPUs (e.g., Nvidia's T4) address this market |
| High-speed search | ~   | ~ | ~   | ~ | Microsoft's Bing uses FPGAs; Google uses TPU ASIC; CPU needed for coordination & control |
| Industrial motor control | (•) | ~ |     | ~ | Many motor-control MCUs and ASICs available; FPGAs offer a quick-turn ASIC alternative |
| Supercomputer HPC | ~   |   | ~   |   | Majority of TOP500 supercomputers uses some combination of CPUs and GPUs |
| General-purpose computing | ~   |   | (•) |   | CPU most versatile, flexible option; GPUs beginning to perform some tasks |
| Embedded control | ~   | ~ |     | ~ | CPUs (→ MCU) dominant in low-cost, space-constrained, low-power, mobile applications |
| Prototyping, low-volume |     | ~ |     |   | FPGAs best choice for low-volume, high-end applications; also pre-silicon validation, post-silicon validation and firmware development |
