On Mon September 06, 2021


Hanjoon Kim (FuriosaAI / CTO)


Challenges of building inference chips for data centers


According to papers from hyperscale data center operators[1][2], the demand for deep learning inference in data centers is growing rapidly. While energy efficiency is important for reducing TCO (total cost of ownership), high performance is also essential for serving large models in production. At the same time, hyperscalers have emphasized the importance of programmability and flexibility in inference accelerators so that they can keep up with DNN progress[1]. To build a production accelerator that meets all of these challenging requirements, the architecture should not be optimized for a specific model; instead, it should expose its raw capabilities to the software through well-defined abstractions, so that the parallelism and energy efficiency of DNN models can be maximized. The software stack, in turn, should exploit every opportunity for parallelism and energy efficiency in each operator and model. To accomplish such cross-layer optimizations across algorithm, architecture, and software, small, highly skilled teams must communicate deeply and closely, and the design methodologies and infrastructure must support these communication structures.
[1] Norman P. Jouppi et al., Ten Lessons From Three Generations Shaped Google’s TPUv4i: Industrial Product, ISCA'21
[2] Michael Anderson et al., First-Generation Inference Accelerator Deployment at Facebook, https://arxiv.org/abs/2107.04140


Hanjoon Kim is co-founder and CTO of FuriosaAI Inc. He leads AI chip development, setting the engineering direction and technology vision. Prior to FuriosaAI Inc., he led the development of a memory-centric accelerator architecture targeting hyperscale data centers at Samsung Memory. He holds a PhD in Computer Science from KAIST.