dc.description.abstract |
In this article, we present the first GPU implementations of FrodoKEM-976, NewHope-1024, and Kyber-1024. These
algorithms belong to three different classes of post-quantum algorithms: Learning with errors (LWE), Ring-LWE, and Module-LWE.
We show the practical applicability of the algorithms in different scenarios using two different implementation approaches. Moreover,
we achieve highly efficient realizations of computationally expensive operations such as the NTT (Number Theoretic Transform), matrix
multiplication, and Keccak. Since these are the most common operations in lattice-based cryptographic algorithms, the techniques
presented in this article will likely benefit other similar algorithms. Using an NVIDIA QUADRO GV100 graphics card, we undertook a
detailed experimental study. For NewHope and Kyber, we were able to perform approximately 504K and 473K key exchanges per
second, demonstrating speedups of almost 53.1× and 51.05×, respectively, compared to the reference C implementation. Compared to the optimized
AVX2 versions, we obtain speedups of 25.7× and 14.6×, respectively. Further, our implementation of FrodoKEM resulted in speedups of
50.6×, 44.2×, and 36.9× for the KeyGen, Encaps, and Decaps operations, respectively. Compared to its AVX2 counterpart, we achieved speedups of
about 7.3×, 4.7×, and 4.9×, respectively. We also show that using multiple streams resulted in a further speedup of about 28–38 percent. |
en_US |