[riot-devel] AES crypto optimizations, speed and size

Oleg Artamonov oleg at unwds.com
Tue Jan 2 18:55:29 CET 2018

I believe the results will be even better on most Cortex-M parts, with a nice performance gain from run-time T-tables on high-speed CPUs. As a rule, it is easier to speed up an MCU core than its flash, so a 168 MHz high-end Cortex-M4F may have the same flash memory as an entry-level Cortex-M0, limited to an effective clock speed of 20-30 MHz.

For example, on STM32F4 (flash wait states vs. core clock: https://yadi.sk/i/SEI7mXtK3RAFxg) the effective flash clock is 20 MHz. The same holds for Atmel SAM3U (https://yadi.sk/i/B4iA0ik33RAGVK, 24 MHz effective flash clock), for EFM32, etc.
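To make the arithmetic behind "effective flash clock" concrete, here is a rough sketch in C. It assumes a simplified linear model in which each flash access costs (wait states + 1) core cycles and in which prefetch buffers and caches are ignored; real parts specify wait states in a datasheet table that also depends on supply voltage. The function names and the numbers in the usage note are illustrative, not taken from any vendor header.

```c
#include <stdint.h>

/* Simplified model (an assumption, not a datasheet formula):
 * the flash can complete one access per 1/flash_max_hz seconds, and
 * the core inserts wait states until one access fits.
 * flash_max_hz must be non-zero. */
static uint32_t flash_wait_states(uint32_t core_hz, uint32_t flash_max_hz)
{
    if (core_hz <= flash_max_hz) {
        return 0;   /* flash keeps up with the core, no wait states */
    }
    /* ceil(core_hz / flash_max_hz) - 1 */
    return (core_hz + flash_max_hz - 1) / flash_max_hz - 1;
}

/* Rate at which the core can actually fetch from flash when every
 * access costs (wait_states + 1) core cycles. */
static uint32_t effective_flash_hz(uint32_t core_hz, uint32_t wait_states)
{
    return core_hz / (wait_states + 1);
}
```

With a hypothetical 24 MHz flash, a 168 MHz core would need 6 wait states under this model and would still read flash at only 24 MHz, while a 20 MHz core would need none and would read it at 20 MHz. That is why table lookups from flash do not get faster as the core clock goes up.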

TI CC1310 and CC2650 have an 8 KB flash cache, so they may perform better with in-flash T-tables, but that's a very specific case (not to mention that the flash cache can be disabled and used as regular RAM, and it often is, as there's only 20 KB of regular RAM available). Anyway, they have a hardware AES accelerator too.

On low-end MCUs (20 MHz or less, 8-bit and 16-bit architectures) in-flash T-tables should be faster, but most such MCUs have a very limited amount of flash as well, so wasting 7 KB on T-tables is not an option anyway.
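For readers unfamiliar with the term, a minimal sketch of what "run-time T-tables" can look like: the S-box and one encryption T-table are computed into RAM at startup instead of being stored as constants in flash. This is not the RIOT implementation; the names aes_sbox, aes_te0 and aes_tables_init are hypothetical, and only the first table (commonly called Te0) is built here.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not RIOT's API). */
static uint8_t  aes_sbox[256];   /* 256 bytes of RAM */
static uint32_t aes_te0[256];    /* 1 KB of RAM */

/* Rotate an 8-bit value left by n bits (0 < n < 8). */
static uint8_t rotl8(uint8_t x, unsigned n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Multiply by 2 in GF(2^8) modulo the AES polynomial 0x11B. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* Fill the S-box and the Te0 table in RAM; call once before use. */
void aes_tables_init(void)
{
    uint8_t p = 1, q = 1;

    /* Walk all non-zero field elements, keeping p * q == 1. */
    do {
        p = (uint8_t)(p ^ xtime(p));        /* p = p * 3 */

        q ^= (uint8_t)(q << 1);             /* q = q / 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80) {
            q ^= 0x09;
        }

        /* Affine transformation of the inverse q gives S(p). */
        aes_sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^
                                rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);

    aes_sbox[0] = 0x63;                     /* 0 has no inverse */

    /* Te0[x] packs the bytes (2*S[x], S[x], S[x], 3*S[x]). */
    for (unsigned i = 0; i < 256; i++) {
        uint8_t s  = aes_sbox[i];
        uint8_t s2 = xtime(s);
        uint8_t s3 = (uint8_t)(s2 ^ s);

        aes_te0[i] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                     ((uint32_t)s << 8)   | (uint32_t)s3;
    }
}
```

The other three encryption tables are byte rotations of Te0, so they can either be filled the same way at the cost of more RAM or derived with a rotate at lookup time. Which trade-off is better depends on how much RAM the target can spare.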

Sincerely yours,
Oleg Artamonov
+7 (916) 631-34-90

02.01.2018, 19:27, "Ludwig Knüpfer" <ludwig.knuepfer at fu-berlin.de>:
> Hello,
> First of all thank you for sharing your insights!
> While I'm not entirely sure I get all the implications right away, I do have the following thoughts:
> I assume the results cannot be generalized across the CPU architecture, because CPU clock and flash read speed vary independently.
> I expect the result depends on the concrete product and configuration you're looking at. A factor of 2 difference in memory access speed among Cortex-M products does not seem very unlikely to me.
> Did you factor this in to your conclusion? Any thoughts?
> Cheers,
> Ludwig
