ARM MTE Performance in Practice (Extended Version)
We present the first comprehensive analysis of ARM MTE hardware performance on four different microarchitectures: ARM Big (A7x), Little (A5x), and Performance (Cortex-X) cores on the Google Pixel 8 and Pixel 9, and on Ampere Computing's AmpereOne CPU core. We also include preliminary analysis of MTE on Apple's M5 chip. We investigate performance in MTE's primary application -- probabilistic memory safety -- on both SPEC CPU benchmarks and in server workloads such as RocksDB, Nginx, PostgreSQL, and Memcached. While MTE often exhibits modest overheads, we also see performance slowdowns up to 6.64x on certain benchmarks. We identify the microarchitectural cause of these overheads and where they can be addressed in future processors. We then analyze MTE's performance for more specialized security applications such as memory tracing, time-of-check time-of-use prevention, sandboxing, and CFI. In some of these cases, MTE offers significant advantages today, while the benefits for other cases are negligible or will depend on future hardware. Finally, we explore where prior work characterizing MTE performance has either been incomplete or incorrect due to methodological or experimental errors.