Ben Keller, John Wright, Colin Schmidt, Palmer Dabbelt, Keertana Settaluri, Alon Amid, Jarno Salomaa, Stevo Bailey, Pi-Feng Chiu
The Hurricane project leverages circuit and architectural techniques to implement a spatial computing fabric that pushes the boundaries of speed and energy efficiency. Fast integrated on-chip voltage regulators, adaptive clocking, integrated body bias generators, and an on-board control processor enable fast per-core DVFS. The chip architecture is based on the RISC-V ISA and optimized for energy-efficient deep convolutional neural network processing. An integrated DDR PHY and high-speed serial links coupled with coherent, shared caches provide a realistic memory system for long-running program execution.
Stevo Bailey, Paul Rigge, Angie Wang, Amy Whitcombe
ASIC design requires a significant investment of time and money. Many designs share similar analog and digital computational elements, but inflexible IP prevents design reuse and forces frequent redesign. The Craft project aims to reduce ASIC design time and cost by creating a new methodology leveraging parameterized digital and analog hardware generators to support a wide array of DSP applications. Craft 1 taped out in 2016 to test the new technology and early design work. We taped out two more chips in 2017 to prove the efficacy of this methodology. The first chip performs a standard radar receive processing DSP algorithm to demonstrate Chisel digital generators with a BAG (Berkeley Analog Generator) ADC. This chip was done in collaboration with Northrop Grumman Corporation and Cadence Design Systems. The second chip implements a BAG-generated high-speed SERDES, a variation on the first chip's BAG ADC, a custom mixer-first ADC, and a new Chisel FFT following the Craft 1 design.
Pi-Feng Chiu, Brian Zimmer
Improving energy efficiency is critical to increasing computing capability, from mobile devices operating with limited battery capacity to servers operating under thermal constraints. The widely accepted solution to improving energy efficiency is dynamic voltage and frequency scaling (DVFS), where each block in a design operates at the minimum voltage required to meet performance constraints at a given time. However, variation-induced SRAM bitcell failures in caches at low voltage limit voltage scaling and therefore energy-efficiency improvements in advanced process nodes.
SWERVE (SRAMs with ECC and reprogrammable Redundancy to aVoid Errors) contains a RISC-V processor and uses architectural-level techniques to bypass failing bitcells at low voltages. Additionally, a highly programmable BIST engine and in-situ pipeline ECC measure the contribution of intermittent SRAM error sources such as random telegraph noise and aging. Finally, a lower-offset sense amplifier replaces standard sense amplifiers in compiled memory arrays. Since the sense amplifier circuit is widely used in all sorts of memory circuits, improving the offset voltage of the sense amplifier could have a tremendous impact.
Luis Esteban Hernandez, Rachel Hochman
SPLASH (Single-chip Planetary Low-power ASIC Spectrometer with High-resolution) is a low-power, high-resolution digital spectrometer ASIC with an on-chip ADC. The ASIC will provide a compact, low-power, and radiation-tolerant digital polyphase filterbank for analyzing microwave thermal emission following the first downconversion in a microwave radiometer. This spectrometer is intended for use in future microwave sounders observing the atmospheres of planets such as Mars or Venus and the moons of Jupiter and Saturn, and may also find applications in terrestrial CubeSat radiometers. The mixed-signal ASIC spectrometer will have significant improvements in terms of mass, power, volume, and sensitivity to environmental conditions compared to other technologies such as chirp-transform, acousto-optical, and digital autocorrelator spectrometers. The largely digital nature of the implementation will also reduce calibration requirements and increase stability compared to alternative approaches. SPLASH integrates an ADC, digital spectroscopy, and spectral integration functions onto a single mixed-signal ASIC. Our novel approach to this ASIC uses an on-chip ultra-low-power 3 GS/s ADC (for a bandwidth of 1.5 GHz) utilizing capacitive successive approximation and self-calibration. The ADC is followed by a 16k-channel digital 4-tap polyphase filter bank to achieve excellent channel-to-channel isolation. The design is based on a digital architecture described in Simulink that has been field-tested on FPGA platforms for radio astronomy applications. The architecture maximizes the utilization of operators to nearly 100%. A test structure has been added to the design to support the detection of soft errors, since space-borne applications expose the circuit to high-energy particles.
The ASIC was fabricated on a naturally radiation-tolerant 65 nm CMOS process and measures 3.16 mm × 3.16 mm. The chip consumes less than 1 W and delivers a throughput of 300 billion operations per second. Chip testing will begin in January 2015.
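To make the polyphase filter bank step concrete, here is a minimal Python sketch of a 4-tap critically sampled filter bank on a toy 16-point transform. It is not the SPLASH design: the Hamming-windowed sinc prototype, the naive DFT, the tiny transform size, and all function names are assumptions chosen for illustration. The helper `channel_spacing_hz` also assumes that "16k" means 16384 channels spread evenly across the 1.5 GHz band.

```python
import cmath
import math

TAPS = 4  # taps per polyphase branch, matching the 4-tap bank in the text


def prototype_filter(n_fft, taps=TAPS):
    """Hamming-windowed sinc low-pass of length taps * n_fft (assumed design)."""
    length = taps * n_fft
    coeffs = []
    for i in range(length):
        t = (i - length / 2) / n_fft
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        coeffs.append(sinc * window)
    return coeffs


def pfb_frame(samples, coeffs, n_fft, taps=TAPS):
    """Compute one magnitude spectrum from taps * n_fft real input samples."""
    # Weight the input by the prototype filter, then fold the taps
    # onto n_fft points (the polyphase sum).
    folded = [
        sum(samples[p * n_fft + n] * coeffs[p * n_fft + n] for p in range(taps))
        for n in range(n_fft)
    ]
    # Naive DFT for clarity; a hardware implementation would use an FFT.
    spectrum = []
    for k in range(n_fft // 2):  # real input: keep only the lower half
        acc = sum(
            folded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n in range(n_fft)
        )
        spectrum.append(abs(acc))
    return spectrum


def channel_spacing_hz(sample_rate_hz, n_channels):
    """Width of one channel when n_channels span the Nyquist band."""
    return (sample_rate_hz / 2) / n_channels


# Toy check: a tone centered on bin 3 should peak in channel 3.
N_FFT = 16
TONE_BIN = 3
signal = [math.cos(2 * math.pi * TONE_BIN * t / N_FFT) for t in range(TAPS * N_FFT)]
spectrum = pfb_frame(signal, prototype_filter(N_FFT), N_FFT)
peak_channel = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak channel:", peak_channel)
print("assumed channel spacing (Hz):", channel_spacing_hz(3e9, 16384))
```

The weighting by a prototype filter longer than the transform is what distinguishes a polyphase filter bank from a plain FFT spectrometer: it narrows each channel's response and reduces leakage into neighboring channels, at the cost of a few extra multiplies per sample.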
Brian Zimmer, Yunsup Lee, Jaehwa Kwak, Milovan Blagojevic, Ruzica Jevtic, Alberto Puggelli, Ben Keller, Stevo Bailey, Pi-Feng Chiu, Palmer Dabbelt, Colin Schmidt
Manycore processors will require separate voltage and clock domains for each core to maximize energy efficiency. This project looks at a specific implementation of a low-power manycore processor and optimizes the energy for a given program completion time. It considers the efficiency of the DC-to-DC converter itself, as well as the supply voltage and CPU clock frequency, when finding the minimum energy point. The goal is to create an on-chip hardware block which actively minimizes the core's energy given a known load.
The DC-to-DC converters are energy-optimized and can supply one of four possible voltage levels. Another available knob used to minimize energy is the core body bias level. Even with just these two knobs, numerous configurations exist, and choosing the one, or the combination, that minimizes the core's energy is essential. Operating along the Pareto-optimal energy-delay curve requires jumping between possible supply voltage levels and CPU clock periods. Once a general formula that minimizes energy for a specific load has been found, the formula must be mapped into a dedicated hardware block. This hardware block will choose the operating parameters necessary to compute with minimal energy.
Using these techniques, we have demonstrated a RISC-V vector microprocessor implemented in 28nm FDSOI with fully-integrated non-interleaved switched-capacitor DCDC (SC-DCDC) converters and adaptive clocking that generates four on-chip voltages between 0.5V and 1V using only a 1V core and 1.8V IO voltage input. The design pushes the capabilities of SC-DCDC conversion by enabling fast transitions (20ns), high conversion efficiency (80-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices.
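The operating-point selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's formula or hardware block: the four voltage/frequency/power/efficiency values are hypothetical, body bias is not modeled, transition overhead is ignored, and the core is assumed to draw no power once the work is done. The names `OperatingPoint` and `min_energy_plan` are invented for this sketch.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class OperatingPoint:
    vdd: float           # core supply voltage (V)
    freq_hz: float       # clock frequency sustainable at this voltage
    core_power_w: float  # power delivered to the core
    efficiency: float    # DC-DC conversion efficiency at this level

    @property
    def input_power_w(self):
        # Power drawn from the input rail, including converter loss.
        return self.core_power_w / self.efficiency


# Hypothetical numbers, for illustration only.
POINTS = [
    OperatingPoint(0.50, 100e6, 0.010, 0.80),
    OperatingPoint(0.67, 300e6, 0.045, 0.84),
    OperatingPoint(0.80, 500e6, 0.110, 0.86),
    OperatingPoint(1.00, 800e6, 0.280, 0.85),
]


def min_energy_plan(points, cycles, deadline_s):
    """Return (energy_j, [(point, seconds), ...]), or None if infeasible."""
    best = None
    # Option 1: run at a single point and finish at or before the deadline.
    for p in points:
        t = cycles / p.freq_hz
        if t <= deadline_s:
            energy = p.input_power_w * t
            if best is None or energy < best[0]:
                best = (energy, [(p, t)])
    # Option 2: split the deadline between a slower and a faster point
    # so that the work finishes exactly on time.
    required_hz = cycles / deadline_s
    for slow, fast in combinations(sorted(points, key=lambda p: p.freq_hz), 2):
        if not (slow.freq_hz < required_hz < fast.freq_hz):
            continue
        t_fast = deadline_s * (required_hz - slow.freq_hz) / (fast.freq_hz - slow.freq_hz)
        t_slow = deadline_s - t_fast
        energy = slow.input_power_w * t_slow + fast.input_power_w * t_fast
        if best is None or energy < best[0]:
            best = (energy, [(slow, t_slow), (fast, t_fast)])
    return best


energy_j, plan = min_energy_plan(POINTS, cycles=2e8, deadline_s=1.0)
for point, seconds in plan:
    print(f"{point.vdd:.2f} V for {seconds:.3f} s")
print(f"input energy: {energy_j * 1e3:.2f} mJ")
```

With these made-up numbers, the required average rate of 200 MHz falls between two available levels, and alternating between the two lowest voltages costs less input energy than running at any single level that meets the deadline. That is the effect the text refers to as jumping between supply voltage levels along the energy-delay curve.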