From 159afbc5d5f1ccea22b21f8ca77be1b2054e216e Mon Sep 17 00:00:00 2001
From: Patrick Rudolph
Date: Sun, 17 Aug 2025 20:01:32 +0200
Subject: [PATCH] lib/lzmadecode: Increase decoding speed by 30%

When CONFIG_SSE is enabled, use the "prefetchnta" instruction to load
the next chunk of data into the CPU cache. This only works when the
input stream is covered by an MTRR. When the input stream is read from
the SPI ROM MMIO area, prefetching keeps the SPI controller busy
fetching new data, which is automatically placed into the CPU cache.
The result is less I/O wait on the CPU side and faster decompression.

When the input stream is not cacheable, the prefetch instruction has no
effect.

The SPI interface on the tested device runs at 100 Mbit/s, and the
Sandy Bridge mobile CPU has quite some work to do decompressing the
LZMA stream. That gives the SPI controller enough time to preload data
into the cache.

The payload of 1100213 bytes is now read in 164 msec, resulting in an
input bandwidth of 53 Mbit/s.

TEST=Booted on Lenovo X220 and used cbmem -t:
Before:
  16:finished LZMA decompress (ignore for x86)  1,218,418 (210,054)
After:
  16:finished LZMA decompress (ignore for x86)  1,170,949 (164,868)

Boots 46msec faster than before or 30% faster than before.

Change-Id: I3b2ed7fe0883f271553ecd1ab4191e4848ad0299
Signed-off-by: Patrick Rudolph
Reviewed-on: https://review.coreboot.org/c/coreboot/+/88813
Tested-by: build bot (Jenkins)
Reviewed-by: Angel Pons
---
 src/lib/lzmadecode.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/src/lib/lzmadecode.c b/src/lib/lzmadecode.c
index 5c6baa4160..f68cab61c4 100644
--- a/src/lib/lzmadecode.c
+++ b/src/lib/lzmadecode.c
@@ -25,6 +25,17 @@
 #define __lzma_attribute_Ofast__
 #endif
 
+/* When the input stream is covered by an MTRR the "prefetch" instruction
+ * will load the next chunk of data into the CPU cache ahead of time.
+ * On a 100MBit/s SPI interface this reduces the time spent in I/O wait
+ * by 5usec for every cache-line (64bytes) prefetched.
+ */
+#if CONFIG(SSE)
+	#define __lzma_prefetch(x) {asm volatile("prefetchnta %0" :: "m" (x));}
+#else
+	#define __lzma_prefetch(x)
+#endif
+
 #include "lzmadecode.h"
 #include
 
@@ -68,6 +79,11 @@
 		RC_TEST; \
 		Range <<= 8; \
 		Code = (Code << 8) | RC_READ_BYTE; \
+		if (!((uintptr_t)Buffer & 63)) { \
+			if ((BufferLim - Buffer) >= 128) { \
+				__lzma_prefetch(Buffer[64]); \
+			} \
+		} \
 	}
 
 #define IfBit0(p) \