Abstract:
The design of Last Level Caches (LLCs) using Static Random Access Memory (SRAM)
is increasingly being challenged by emerging memory technologies like Non-Volatile
Memories (NVM). Among these, Spin-Transfer Torque RAM (STT-RAM) stands out for
its higher density and lower static power consumption. However, its drawbacks—higher
write latency, increased write power, and limited write endurance—pose significant
challenges. The primary obstacle to widespread adoption of STT-RAM in LLCs is its
limited write endurance, a problem aggravated by the uneven distribution of write
operations across the cache, which wears out heavily written cells early. Existing
techniques address this by minimizing either inter-set (InterV) or intra-set (IntraV)
write variation to prolong the lifetime of STT-RAM-based LLCs.
Additionally, STT-RAM’s high write latency can lead to congestion in the read-write
queue of the LLC.
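
The abstract does not state how InterV and IntraV are computed. As a minimal sketch,
assuming the coefficient-of-variation formulation commonly used in the wear-leveling
literature, both metrics can be derived from a table of per-block write counts. The
function name `write_variation` and the layout `w[set][way]` are illustrative
assumptions, not definitions taken from this work.

```python
import math


def write_variation(w):
    """Return (inter_v, intra_v) for write counts w[set][way].

    Assumed formulation (not taken from the source text):
      inter_v = sample std-dev of per-set mean writes / overall mean writes
      intra_v = mean over sets of the within-set sample std-dev / overall mean writes
    Requires at least 2 sets and at least 2 ways per set.
    """
    n_sets = len(w)
    assoc = len(w[0])
    set_means = [sum(row) / assoc for row in w]
    w_avg = sum(set_means) / n_sets
    if w_avg == 0:
        return 0.0, 0.0  # no writes yet, so no variation

    inter_v = math.sqrt(
        sum((m - w_avg) ** 2 for m in set_means) / (n_sets - 1)
    ) / w_avg

    intra_v = sum(
        math.sqrt(sum((x - m) ** 2 for x in row) / (assoc - 1))
        for row, m in zip(w, set_means)
    ) / (n_sets * w_avg)

    return inter_v, intra_v
```

Under this formulation, a cache whose writes are concentrated in one set shows high
InterV and zero IntraV, while a cache whose writes are concentrated in one way of every
set shows the opposite.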
To address these challenges, two techniques have been proposed to enhance endurance
while maintaining performance. The first, PROLONG, is a dynamic write bypassing
approach that redirects write-backs from the L2 cache to an SRAM buffer or main memory.
This decision is guided by two parameters: the write hotness of the cache set and the
liveness score of the incoming block. The second, LiveWay, dynamically bypasses writes
based on their placement in write-hot ways and their liveness scores. Both methods
significantly improve wear leveling, reducing InterV and IntraV by approximately 84% and
53%, respectively, while achieving a Relative Lifetime Improvement (RLI) of up to 22×.
Additionally, by alleviating write congestion in the LLC read-write queue, these
techniques limit the performance impact of the high write latency.
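
The bypass decision described for PROLONG can be illustrated with a short sketch. The
abstract names only the two inputs (set write hotness and block liveness) and the two
bypass targets (SRAM buffer and main memory); the thresholds, the mapping from inputs
to targets, and all identifiers below are hypothetical choices made for illustration.

```python
from enum import Enum


class Target(Enum):
    STT_RAM_LLC = "llc"
    SRAM_BUFFER = "sram_buffer"
    MAIN_MEMORY = "main_memory"


def route_writeback(set_writes, avg_set_writes, liveness,
                    hot_factor=2.0, live_threshold=0.5):
    """Choose a destination for an L2 write-back.

    Assumed policy (hypothetical, not specified in the source):
      - a set is write-hot if it has received more than hot_factor times
        the average number of writes per set;
      - writes to sets that are not hot go to the STT-RAM LLC as usual;
      - writes to hot sets are bypassed: blocks likely to be reused
        (liveness >= live_threshold) go to the SRAM buffer, the rest
        go to main memory.
    """
    set_is_hot = set_writes > hot_factor * avg_set_writes
    if not set_is_hot:
        return Target.STT_RAM_LLC
    if liveness >= live_threshold:
        return Target.SRAM_BUFFER
    return Target.MAIN_MEMORY
```

For example, `route_writeback(100, 10, 0.1)` models a dead block arriving at a
write-hot set and returns `Target.MAIN_MEMORY` under these assumed thresholds.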
Generic wear-leveling techniques primarily focus on extending LLC lifetime by reducing
InterV and IntraV, but block-level wear leveling remains rare. To
address this gap, a decoupled cache architecture has been proposed. In this design, the
set-associative SRAM tag array is separated from the fully associative data array, with
the two linked via forward and backward pointers that maintain a 1:1 mapping. Two
techniques are introduced within this architecture. The Primal Approach swaps writes
between write-hot and write-cold blocks based on their write counts, with each block
maintaining an individual write counter. The Hardware-Efficient Approach categorizes
blocks into buckets using simple hashing. Writes from write-hot buckets are then redirected
to write-cold buckets. These methods can achieve an RLI of up to 13.07×.
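
The decoupled organization and the counter-based swap of the Primal Approach can be
sketched as follows. This is a simplified model under stated assumptions: the tag array
is flattened to a list of entries, the swap trigger is a fixed difference in write
counts, and relocating the displaced cold data costs one extra write. The class name,
the `swap_threshold` parameter, and the trigger rule are hypothetical and are not taken
from the source.

```python
class DecoupledCache:
    """Toy model of a tag array linked 1:1 to a fully associative data array."""

    def __init__(self, n_blocks, swap_threshold):
        # fwd[tag_entry] -> data block; bwd[data_block] -> tag entry
        self.fwd = list(range(n_blocks))
        self.bwd = list(range(n_blocks))
        self.writes = [0] * n_blocks          # one write counter per data block
        self.swap_threshold = swap_threshold  # assumed to be a positive integer

    def write(self, tag_idx):
        """Write to the block owned by tag_idx; return the data block used."""
        hot = self.fwd[tag_idx]
        cold = min(range(len(self.writes)), key=self.writes.__getitem__)
        if self.writes[hot] - self.writes[cold] >= self.swap_threshold:
            other_tag = self.bwd[cold]
            # Exchange the two mappings so the pointers stay 1:1.
            self.fwd[tag_idx], self.fwd[other_tag] = cold, hot
            self.bwd[cold], self.bwd[hot] = tag_idx, other_tag
            # Assumption: the displaced cold data is rewritten into the hot block.
            self.writes[hot] += 1
            hot = cold
        self.writes[hot] += 1
        return hot


# Usage: hammer a single tag entry and observe that the writes are spread out.
cache = DecoupledCache(n_blocks=4, swap_threshold=4)
for _ in range(20):
    cache.write(0)
```

In this model, 20 writes to one tag entry are distributed over all four data blocks
instead of landing on a single block, at the cost of a few extra relocation writes.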
In a multi-core setup, an attacker needs access to just one core to repeatedly
target specific memory locations and thereby accelerate lifetime degradation. To
expose this vulnerability in STT-RAM-based LLCs, we propose four distinct attacks:
Recurring Location Attack (RLA), Recurring Toggle Attack (RTA), Random Location
Attack (RnLA), and Random Toggle Attack (RnTA). These Targeted Endurance Attacks
illustrate the impact of malicious benchmarks on modern counter-based wear-leveling
techniques and reveal how wear leveling influences the effectiveness of such attacks.