Laboratory for Sub-100nm Design*

Single-event effects
Soft errors resulting from single-event effects (SEEs) in logic constitute an important, and possibly dominant, failure mode in future computing systems. Our research to date has focused on low-cost SEE-reliability-aware and SEE-reliability-driven design based on the principles of fault avoidance. Specifically, the focus is on
  • Modeling and simulation of SEEs at the device and circuit levels for both classical CMOS and emerging multi-gate technologies to facilitate
  • Physically-accurate, variation-tolerant design optimization for robustness to SEEs.

Reliability and logic gates: Theory and practice
The proposed research involves complementary work on the theory and practice of computation using unreliable circuits. We are currently investigating
  • Information-theoretic, circuit-structure-independent limits for computation in the presence of noise,
  • Techniques for reliability estimation that are accurate, robust, and scalable with design complexity, and
  • Seamless exploration of the trade-offs between performance (delay and energy) and reliability during design.

Nanoelectronic devices: Reliability and performance variability
This is a collaborative effort that proposes to
  • Develop the modeling and simulation capability to systematically understand and predict the effects of variability and defects on the performance and reliability of various nanoscale transistors including silicon nanowire, carbon nanotube, and graphene nanoribbon transistors, and
  • Develop a formal approach to automated circuit synthesis that conforms to the foundations of a traditional synthesis flow, yet optimizes for performance robustness and reliability across abstraction levels.

Low-cost error detection and correction for logic circuits
Circuits with concurrent error detection and correction features can detect and correct both temporary and permanent faults, and are widely used in systems where dependability and data integrity are important. Conventional techniques for synthesizing such fault-tolerant multilevel logic circuits focus on guaranteeing 100% coverage of broad classes of errors and generally incur very large area, power, and performance overhead (usually in excess of 100%). However, in the next decade, as fault tolerance becomes necessary for cost-sensitive, high-volume mainstream applications, most of these existing techniques will be overkill. This research advocates a paradigm shift, where the goal is to meet coverage requirements at minimum cost instead of trying to guarantee operation under broad classes of errors.

LSRVM: Loosely-synchronized, redundant virtual machines
The goal of the LSRVM project is to provide high levels of reliability by tolerating hardware faults at all levels of the system at very low cost. Historically, such hardware fault tolerance has been achievable only with custom-designed hardware and proprietary operating systems. Today, however, technological trends and economic factors are driving a reduction in the amount of custom-designed hardware. We believe that this path should be followed to its ultimate conclusion: a highly available, fault-tolerant computing system based entirely on commodity hardware and open-source operating systems. Our revolutionary approach uses virtualization to efficiently provide redundancy on modern commodity hardware. When combined with existing application-level fault tolerance mechanisms, LSRVM will provide very high levels of reliability at extremely low cost.

We gratefully acknowledge the support of our sponsors: NSF, TI, AMD, Intel, FLA, and the A. Richard Newton Graduate Scholarship.

* There is a puzzle that relates the Laboratory for Sub-100nm Design to this web page. A $10.24 reward, successively halved, will be awarded to anyone who emails me the solution to this puzzle.