lib/lzmadecode: Increase decoding speed by 30%
When CONFIG_SSE is enabled, use the "prefetchnta" instruction to load the next chunk of data into the CPU cache. This only works when the input stream is covered by an MTRR.

When the input stream is read from the SPI ROM MMIO area, this keeps the SPI controller busy fetching new data, which is automatically placed into the CPU cache. The result is less I/O wait on the CPU side and faster decompression. When the input stream is not cacheable, the prefetch instruction has no effect.

The SPI interface on the tested device runs at 100 Mbit/s, and the Sandy Bridge mobile CPU has a significant amount of work to do decompressing the LZMA stream. That gives the SPI controller enough time to preload data into the cache. The 1100213-byte payload is now read in 164 msec, an input bandwidth of 53 Mbit/s.

TEST=Booted on Lenovo X220 and used cbmem -t:
Before:
16:finished LZMA decompress (ignore for x86) 1,218,418 (210,054)
After:
16:finished LZMA decompress (ignore for x86) 1,170,949 (164,868)

Boots 46 msec faster than before; the LZMA decompression step is about 30% faster.

Change-Id: I3b2ed7fe0883f271553ecd1ab4191e4848ad0299
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com>
Reviewed-on: https://review.coreboot.org/c/coreboot/+/88813
Tested-by: build bot (Jenkins) <no-reply@coreboot.org>
Reviewed-by: Angel Pons <th3fanbus@gmail.com>
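The throughput figures can be checked with a little arithmetic. This is a minimal C sketch. The helper names input_mbit_per_s() and line_fetch_usec() are hypothetical. The constants come from the message (payload size, read time, SPI speed) and from the 64-byte cache line mentioned in the patch comment.

```c
#include <assert.h>

/* Figures quoted in the commit message and in the patch comment. */
#define PAYLOAD_BYTES 1100213.0 /* compressed payload, bytes */
#define READ_TIME_S   0.164     /* time to read it after the change, seconds */
#define SPI_BIT_RATE  100e6     /* SPI interface speed on the X220, bit/s */
#define CACHE_LINE    64.0      /* bytes fetched per prefetch */

/* Effective input bandwidth in Mbit/s for `bytes` read in `seconds`. */
static double input_mbit_per_s(double bytes, double seconds)
{
	assert(seconds > 0.0);
	return bytes * 8.0 / seconds / 1e6;
}

/* Time in usec the SPI bus needs to deliver `line_bytes` at `bits_per_s`. */
static double line_fetch_usec(double line_bytes, double bits_per_s)
{
	assert(bits_per_s > 0.0);
	return line_bytes * 8.0 / bits_per_s * 1e6;
}
```

With these inputs, input_mbit_per_s(PAYLOAD_BYTES, READ_TIME_S) gives about 53.7 Mbit/s. line_fetch_usec(CACHE_LINE, SPI_BIT_RATE) gives 5.12 usec. These match the 53 Mbit/s in the message and the 5usec per cache line in the patch comment.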
commit 159afbc5d5
parent 0b8ad35ac1
1 changed file with 16 additions and 0 deletions
@@ -25,6 +25,17 @@
 #define __lzma_attribute_Ofast__
 #endif
 
+/* When the input stream is covered by an MTRR the "prefetch" instruction
+ * will load the next chunk of data into the CPU cache ahead of time.
+ * On a 100MBit/s SPI interface this reduces the time spent in I/O wait
+ * by 5usec for every cache-line (64bytes) prefetched.
+ */
+#if CONFIG(SSE)
+#define __lzma_prefetch(x) {asm volatile("prefetchnta %0" :: "m" (x));}
+#else
+#define __lzma_prefetch(x)
+#endif
+
 #include "lzmadecode.h"
 #include <types.h>
 
@@ -68,6 +79,11 @@
 		RC_TEST; \
 		Range <<= 8; \
 		Code = (Code << 8) | RC_READ_BYTE; \
+		if (!((uintptr_t)Buffer & 63)) { \
+			if ((BufferLim - Buffer) >= 128) { \
+				__lzma_prefetch(Buffer[64]); \
+			} \
+		} \
 	}
 
 #define IfBit0(p) \
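The trigger added to RC_NORMALIZE can also be tried outside coreboot. The sketch below illustrates the idea under stated assumptions and is not the decoder's code. struct byte_stream, stream_read_byte(), stream_sum() and LINE_SIZE are hypothetical names. GCC/Clang's __builtin_prefetch(p, 0, 0) replaces the inline "prefetchnta" asm. Like the patch, the sketch issues a hint only when the read pointer has just crossed a 64-byte boundary and at least 128 bytes remain, so the prefetched line always lies inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64 /* assumed cache-line size, same as the patch */

#if defined(__GNUC__)
/* rw = 0 (read), locality = 0 (no temporal locality); on x86 with SSE,
 * GCC typically emits prefetchnta for this. */
#define stream_prefetch(p) __builtin_prefetch((p), 0, 0)
#else
/* No prefetch support: the hint is dropped, like the !CONFIG(SSE) case. */
#define stream_prefetch(p) ((void)(p))
#endif

/* Hypothetical cursor over the compressed input. */
struct byte_stream {
	const uint8_t *buf;  /* next byte to read, like Buffer */
	const uint8_t *lim;  /* one past the last byte, like BufferLim */
	unsigned prefetches; /* number of hints issued, for illustration */
};

/* Read one byte, then hint the line 64 bytes ahead if the cursor has
 * just stepped onto a new cache line. */
static uint8_t stream_read_byte(struct byte_stream *s)
{
	assert(s->buf < s->lim); /* the decoder's RC_TEST does this check */
	uint8_t b = *s->buf++;
	if (!((uintptr_t)s->buf & (LINE_SIZE - 1))) {
		/* Only hint while at least two more lines remain. */
		if (s->lim - s->buf >= 2 * LINE_SIZE) {
			stream_prefetch(s->buf + LINE_SIZE);
			s->prefetches++;
		}
	}
	return b;
}

/* Consume the whole stream and return a simple byte sum. */
static uint32_t stream_sum(struct byte_stream *s)
{
	uint32_t sum = 0;
	while (s->buf < s->lim)
		sum += stream_read_byte(s);
	return sum;
}
```

Checking alignment first keeps the common path to a single AND and branch per byte. One hint per 64-byte line is enough because a prefetch always fetches a whole line. A prefetch is only a hint, so dropping it never changes the decoded result. That is why the non-SSE fallback can expand to nothing.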