CPU cache and cacheline

Posted on 2023-11-10 Edited on 2023-11-11 In Tech, Performance

A CPU cache is a hardware cache that reduces the cost of accessing main memory. It is a smaller, faster memory located closer to a CPU core than main memory. Most modern CPUs have multiple levels of cache, such as L1, L2, and L3. When reading or writing main memory, the processor first checks whether the data is already in its cache.

Modern CPUs have at least three independent caches:

Instruction cache - Used to speed up executable instruction fetch
Data cache - Used to speed up data fetch and store
Translation lookaside buffer (TLB) - Used to speed up virtual-to-physical address translation for both executable instructions and data. The TLB is part of the memory management unit (MMU) and is not directly related to the CPU caches.

Let’s look at how the CPUs and caches appear on a physical server.

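The following server has 48 logical CPUs (2 sockets × 12 cores per socket × 2 threads per core, i.e. 24 physical cores), and each core is served by three levels of cache (L1, L2 and L3). The L1 cache is split into an instruction cache (L1i) and a data cache (L1d). On this kind of CPU, L1 and L2 are private to each core, while L3 is typically shared by the cores in a socket.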

[root@init531-e43 ~]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                48
On-line CPU(s) list:   0-47
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 106
Model name:            Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
Stepping:              6
CPU MHz:               1506.042
CPU max MHz:           3600.0000
CPU min MHz:           800.0000
BogoMIPS:              6000.00
Virtualization:        VT-x
L1d cache:             48K
L1i cache:             32K
L2 cache:              1280K
L3 cache:              18432K
NUMA node0 CPU(s):     0-11,24-35
NUMA node1 CPU(s):     12-23,36-47
Flags:                 [...]
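
The CPU(s) value reported above counts logical CPUs, which can be cross-checked against the topology fields. A minimal sketch in Python, with the values copied by hand from the lscpu output (the function name logical_cpus is only for illustration):

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs the OS sees for a given topology."""
    return sockets * cores_per_socket * threads_per_core


# Values copied from the lscpu output above.
sockets, cores_per_socket, threads_per_core = 2, 12, 2

physical_cores = sockets * cores_per_socket
print(physical_cores)                                             # 24
print(logical_cpus(sockets, cores_per_socket, threads_per_core))  # 48
```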

The getconf command prints the size, associativity and line size of each cache level.

[root@init531-e43 ~]# getconf -a | grep CACHE
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_ICACHE_ASSOC                8
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 49152
LEVEL1_DCACHE_ASSOC                12
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_SIZE                  1310720
LEVEL2_CACHE_ASSOC                 20
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_SIZE                  18874368
LEVEL3_CACHE_ASSOC                 12
LEVEL3_CACHE_LINESIZE              64
LEVEL4_CACHE_SIZE                  0
LEVEL4_CACHE_ASSOC                 0
LEVEL4_CACHE_LINESIZE              0

The output above shows the cache line size for each cache level. In this example it is 64 bytes at every level.
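
If the same check needs to be scripted, the getconf output is easy to parse. The sketch below is one possible way to do it, not a standard tool: the helper name parse_cache_info is made up for illustration, and the sample text is a subset of the output shown above rather than a live call to getconf.

```python
def parse_cache_info(getconf_output: str) -> dict:
    """Parse `getconf -a | grep CACHE` style lines into {name: value}."""
    info = {}
    for line in getconf_output.splitlines():
        parts = line.split()
        # Expect exactly "NAME VALUE"; skip anything else (e.g. empty values).
        if len(parts) == 2 and parts[1].isdigit():
            info[parts[0]] = int(parts[1])
    return info


# A subset of the output shown above, pasted in as sample input.
sample = """\
LEVEL1_ICACHE_LINESIZE             64
LEVEL1_DCACHE_SIZE                 49152
LEVEL1_DCACHE_ASSOC                12
LEVEL1_DCACHE_LINESIZE             64
LEVEL2_CACHE_LINESIZE              64
LEVEL3_CACHE_LINESIZE              64
"""

info = parse_cache_info(sample)
line_sizes = {v for k, v in info.items() if k.endswith("_LINESIZE")}
print(line_sizes)  # {64}
```

On a live system, the sample string could be replaced by the captured output of the getconf command.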

When the processor accesses memory that is not already in the cache, it loads the whole chunk of memory containing the accessed address into the cache, so that nearby data can be served from the cache on later accesses. These chunks are called cache lines, and their size is called the cache line size. Common cache line sizes are 32, 64 and 128 bytes.
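
Cache lines are aligned to the line size, so the line that holds a given address can be found by clearing the low bits of that address. The following Python sketch illustrates this for the 64-byte line size of this example; the helper names line_base and same_line are hypothetical, and the addresses are arbitrary values chosen for illustration.

```python
LINE_SIZE = 64  # bytes, as reported by getconf in this example


def line_base(addr: int, line_size: int = LINE_SIZE) -> int:
    """Start address of the cache line containing addr.

    Assumes line_size is a power of two.
    """
    return addr & ~(line_size - 1)


def same_line(a: int, b: int, line_size: int = LINE_SIZE) -> bool:
    """True if both addresses fall in the same cache line."""
    return line_base(a, line_size) == line_base(b, line_size)


print(hex(line_base(0x1234)))     # 0x1200
print(same_line(0x1234, 0x123F))  # True: both lie in 0x1200..0x123F
print(same_line(0x123F, 0x1240))  # False: 0x1240 starts the next line
```

Two variables that land in the same line are loaded together by a single miss, while data that straddles a line boundary needs two lines.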

A cache can hold only a limited number of cache lines, which is determined by the cache size. In this example, the L1 data cache is 48 KiB and can hold 768 cache lines (49152 / 64).
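
The same arithmetic can be applied to every level. The sketch below uses the sizes from the getconf output; the function name cache_geometry is made up for illustration. It also derives the number of sets from the ASSOC value, assuming a set-associative cache in which every set holds that many lines, which goes slightly beyond what is discussed here.

```python
def cache_geometry(size_bytes: int, line_size: int, ways: int):
    """Return (number of cache lines, number of sets).

    Assumes a set-associative cache where each set holds `ways` lines.
    """
    lines = size_bytes // line_size
    sets = lines // ways
    return lines, sets


# (size, line size, associativity) taken from the getconf output above.
print(cache_geometry(49152, 64, 12))    # L1d: (768, 64)
print(cache_geometry(32768, 64, 8))     # L1i: (512, 64)
print(cache_geometry(1310720, 64, 20))  # L2:  (20480, 1024)
```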