RNGD: A 5nm Tensor-Contraction Processor for Power-Efficient Inference on Large Language Models

Sang Min Lee, Hanjoon Kim, Jeseung Yeon, Minho Kim, Changjae Park, Byeongwook Bae, Yojung Cha, Wooyoung Choe, Jonguk Choi, Younggeun Choi, Ki Jin Han, Seokha Hwang, Kiseok Jang, Jaewoo Jeon, Hyunmin Jeong, Yeonsu Jung, Hyewon Kim, Sewon Kim, Suhyung Kim, Won Kim, Yongseung Kim, Youngsik Kim, Hyukdong Kwon, Jeong Ki Lee, Juyun Lee, Kyungjae Lee, Seokho Lee, Minwoo Noh, Junyoung Park, Jimin Seo, June Paik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

There is a need for an AI accelerator optimized for large language models (LLMs) that combines high memory bandwidth with dense compute power while minimizing power consumption. Traditional architectures [1]-[4] typically map tensor contractions, which are the core computational task in machine learning models, onto matrix multiplication units. However, this approach often falls short of fully leveraging the parallelism and data locality inherent in tensor contractions. In this work, tensor contraction is used as a primitive instead of matrix multiplication, enabling massive parallelism and time-axis pipelining similar to vector processors. Large coarse-grained PEs can be split into smaller compute units called slices, as illustrated in Fig. 16.2.1. Depending on the configuration of the fetch network connecting the slices, they can function either as one large processing element or as small, independent compute units. Input data are continuously fetched in a pipelined manner through the fetch network, allowing high throughput and efficient data reuse. Since the operation units compute deterministically as configured, accurate cost models for performance and energy can be developed for optimization. The chip specifications are also shown in Fig. 16.2.1.
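The contrast the abstract draws can be sketched in NumPy: a batched, attention-style tensor contraction expressed directly over its batch axes, versus the conventional lowering to a loop of 2D matrix multiplies. This is an illustrative sketch only; the shapes and names below are hypothetical, and the abstract states only that tensor contraction, not matmul, is RNGD's hardware primitive.

```python
import numpy as np

rng = np.random.default_rng(0)
B, H, S, D = 2, 4, 8, 16  # hypothetical batch, heads, sequence, head-dim sizes
Q = rng.standard_normal((B, H, S, D))
K = rng.standard_normal((B, H, S, D))

# Direct tensor contraction over the shared D axis, with B and H kept as
# parallel batch axes -- the kind of parallelism and data reuse a
# contraction-native unit can exploit in one operation.
scores_contract = np.einsum("bhsd,bhtd->bhst", Q, K)

# The conventional lowering onto matrix-multiplication units: flatten the
# batch axes and issue B*H separate 2D matmuls, losing the batch structure.
Q2, K2 = Q.reshape(B * H, S, D), K.reshape(B * H, S, D)
scores_matmul = np.stack([Q2[i] @ K2[i].T for i in range(B * H)])
scores_matmul = scores_matmul.reshape(B, H, S, S)

# Both routes compute the same result; only the mapping differs.
assert np.allclose(scores_contract, scores_matmul)
```

The deterministic, statically configured dataflow described in the abstract is what makes such contractions amenable to accurate compile-time cost models.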

Original language: English
Title of host publication: 2025 IEEE International Solid-State Circuits Conference, ISSCC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 284-286
Number of pages: 3
ISBN (Electronic): 9798331541019
DOIs
State: Published - 2025
Event: 72nd IEEE International Solid-State Circuits Conference, ISSCC 2025 - San Francisco, United States
Duration: 16 Feb 2025 → 20 Feb 2025

Publication series

Name: Digest of Technical Papers - IEEE International Solid-State Circuits Conference
ISSN (Print): 0193-6530

Conference

Conference: 72nd IEEE International Solid-State Circuits Conference, ISSCC 2025
Country/Territory: United States
City: San Francisco
Period: 16/02/25 → 20/02/25
