Abstract
Resistive Random Access Memory (RRAM) has emerged as a promising technology for deep neural network (DNN) accelerators, but programming every weight in a DNN onto RRAM cells for inference can be both time-consuming and energy-intensive, especially when switching between different DNN models. This paper introduces a hardware-aware multi-model merging (HA3M) framework designed to minimize the need for reprogramming by maximizing weight reuse while accounting for the accelerator's hardware constraints. The framework includes three key approaches: Crossbar (XB)-aware Model Mapping (XAMM), Block-based Layer Matching (BLM), and Multi-Model Retraining (MMR). XAMM reduces the XB usage of the pre-programmed model on RRAM XBs while preserving the model's structure. BLM reuses pre-programmed weights in a block-based manner, ensuring the inference process remains unchanged. MMR then equalizes the block-matched weights across multiple models. Experimental results show that the proposed framework significantly reduces programming cycles in multi-DNN switching scenarios while maintaining or even enhancing accuracy and eliminating the need for reprogramming.
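To make the block-based reuse idea concrete, the following is a minimal, hypothetical sketch (not the paper's XAMM/BLM/MMR algorithms): it splits weight matrices into fixed-size blocks, greedily matches each block of a new model to the closest block already programmed on the RRAM crossbars, and counts how many blocks would still require reprogramming. The block size, distance metric, reuse tolerance, and function names are illustrative assumptions.

```python
# Toy illustration of block-based weight reuse between a pre-programmed model
# and a new model; all parameters and the greedy strategy are assumptions,
# not the HA3M framework itself.
import numpy as np


def to_blocks(weights: np.ndarray, block: int = 64) -> np.ndarray:
    """Flatten a weight matrix and split it into fixed-size blocks (zero-padded)."""
    flat = weights.ravel()
    pad = (-flat.size) % block
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, block)


def greedy_block_match(programmed: np.ndarray, new: np.ndarray, tol: float = 1e-2):
    """Match each new block to the closest unused pre-programmed block.

    A new block counts as 'reused' if its mean absolute difference to the
    matched programmed block is below `tol`; otherwise it must be reprogrammed.
    """
    used = np.zeros(len(programmed), dtype=bool)
    reused, reprogram = 0, 0
    for blk in new:
        # Distance to every still-available programmed block.
        dists = np.abs(programmed - blk).mean(axis=1)
        dists[used] = np.inf
        best = int(np.argmin(dists))
        if dists[best] < tol:
            used[best] = True
            reused += 1
        else:
            reprogram += 1
    return reused, reprogram


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_old = rng.standard_normal((128, 128)).astype(np.float32)
    # A hypothetical "new" model whose weights partially overlap with the old one.
    w_new = w_old.copy()
    w_new[:32] = rng.standard_normal((32, 128)).astype(np.float32)
    reused, reprogram = greedy_block_match(to_blocks(w_old), to_blocks(w_new))
    print(f"blocks reused: {reused}, blocks to reprogram: {reprogram}")
```

In this toy setup, only the blocks whose values actually changed are flagged for reprogramming, which mirrors the paper's goal of cutting programming cycles when switching models; the real framework additionally constrains the mapping to crossbar geometry (XAMM) and retrains the models so that matched blocks converge to shared values (MMR).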
| Original language | English |
| --- | --- |
| Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
| DOIs | |
| State | Accepted/In press - 2025 |
Keywords
- deep neural network
- model switching
- multiple DNN application
- resistive random access memory
- RRAM programming
- RRAM-based DNN accelerator