Area and Power Efficient FFT/IFFT Processor for FALCON Post-Quantum Cryptography
AI Breakdown
Get a structured breakdown of this paper — what it's about, the core idea, and key takeaways for the field.
Abstract
Quantum computing is an emerging technology on the verge of reshaping industries, while simultaneously challenging existing cryptographic algorithms. FALCON, a recent standard quantum-resistant digital signature, presents a challenging hardware implementation due to its extensive non-integer polynomial operations, necessitating FFT over the ring <inline-formula><tex-math notation="LaTeX">$\mathbb {Q}[x]/(x^{n}+1)$</tex-math><alternatives><mml:math><mml:mrow><mml:mi mathvariant="double-struck">Q</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mi>x</mml:mi><mml:mo>]</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mi>n</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math><inline-graphic xlink:href="alsuhli-ieq1-3407124.gif"/></alternatives></inline-formula>. This paper introduces an ultra-low-power and compact processor tailored for FFT/IFFT operations over the ring for efficient FALCON implementation. The proposed processor incorporates various optimization techniques, including twiddle factor compression and conflict-free scheduling. In an ASIC implementation using a 22 nm GF process, the proposed processor demonstrates an area occupancy of 0.15 mm<inline-formula><tex-math notation="LaTeX">$^{2}$</tex-math><alternatives><mml:math><mml:msup><mml:mrow/><mml:mn>2</mml:mn></mml:msup></mml:math><inline-graphic xlink:href="alsuhli-ieq2-3407124.gif"/></alternatives></inline-formula> and a power consumption of 12.6 mW/28.1 mW at an operating frequency of 167 MHz/500 MHz for the non-pipelined/pipelined version of the processor. Since a hardware implementation of FFT/IFFT over the ring is currently non-existent, the execution time achieved by this processor is compared to the reference software implementation of FFT/IFFT of FALCON on a Raspberry Pi 4 with Cortex-A72, where the proposed pipelined processor achieves a speedup up to 3.8×. Furthermore, in comparison to dedicated state-of-the-art hardware accelerators for classic FFT, the pipelined architecture occupies 42% less area and consumes 64% less power, on average. The quantified speedup in the context of FALCON suggests that the proposed hardware design offers a promising solution for the efficient implementation of FALCON.