

## Side-Channel Security

Chapter 1. Introduction

**Daniel Gruss** 

March 6, 2025

Graz University of Technology





## Team



Daniel Gruss

Rishub Nagpal

Lukas Giner



Roland Czerny

Sudheendra Neela

- 2 persons per team
- $\rightarrow$ register via <code>https://tinyurl.com/mw6exaz7</code> until Friday, March 14, 08:00am
- $\rightarrow$ include name, email, and matriculation number of both team members
- 2 exercises, each 15 points
- submission via git tag
- points based on exercise interview
- minimum of 4 points per exercise sheet to pass

- aim for the best or drop out now!
- 26 of 30 points  $\rightarrow$  1
- 22 of 30 points  $\rightarrow$  2
- 18 of 30 points  $\rightarrow$  3
- 15 of 30 points  $\rightarrow$  4
- minimum of 4 points per exercise sheet to pass

- Same Task
- Deadline: 1 week after negative exercise interview
- No penalty to reach 50%, then 25% reduction!
- e.g., 5 points on ex1  $\rightarrow$  missing 10 points on ex2: No reduction until 10 points are achieved on ex2, then -25%

Ex1: Software Security

- Presentation: Thursday, 06.03.
- Deadline 1: Thursday, 20.03., 08:00am
- Deadline 2: Thursday, 08.05., 08:00am

Ex2: Hardware Security

- Presentation: Thursday, 15.05.
- (Preliminary) Deadline: Thursday, 26.06., 08:00am

up to 15 points from:

- Task 1: Introduction [0.5 P]
- Task 2: Flush+Reload Attack on PIN Entry [3 P]
- Task 3: Covert Channel [5.5 P]
- Task 4: Spectre [3 P]
- Task 5: KASLR is bad, please break it [3 P]

Both team members:

- Clone your repo, pull from upstream (accessible a few days after team registration, check Discord) https://git.teaching.isec.tugraz.at/scs/scs25/upstream.git
- Make a histogram (use F+R calibration tool in demo folder)
- Choose a good threshold (the tool will not tell you what is "good")

- Flush+Reload attack on a PIN entry library
- Library checks each PIN digit and calls 1 of 2 functions
- Recover the key by checking which functions were called

- Cross-core cache covert channel
- Real/random *binary* data
- Raw capacity, bit error rate  $\rightarrow$  Lukas redrabbyte@Discord
- Speed records: https://www.isec.tugraz.at/teaching/materials/scs/exercises/ex1/ and Discord
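The two covert-channel metrics named above can be sketched as follows. This is a minimal illustration, not the course's evaluation tooling: it assumes that sender and receiver bit strings are already aligned and of equal length, and the helper names `bit_error_rate` and `raw_capacity` are made up for this example.

```python
def bit_error_rate(sent: str, received: str) -> float:
    """Fraction of bit positions that differ between two equal-length bit strings."""
    if len(sent) != len(received):
        raise ValueError("bit strings must have the same length")
    if not sent:
        raise ValueError("bit strings must not be empty")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)


def raw_capacity(bits_transmitted: int, seconds: float) -> float:
    """Raw capacity in bits per second, ignoring transmission errors."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return bits_transmitted / seconds


sent = "1011001110001111"
received = "1011001010001101"           # two bits flipped during transmission
print(bit_error_rate(sent, received))   # 2 errors out of 16 bits
print(raw_capacity(len(sent), 0.002))   # bits per second for a 2 ms transfer
```

A channel that loses or inserts bits would need an alignment step before this position-by-position comparison is meaningful.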

- Run a Spectre attack on a provided library
- Leak a secret string by exploiting speculative execution
- Be as fast as possible

- Break KASLR using one of the demonstrated approaches
- Simplest approach: use timing of prefetch instructions
- Bonus Points for using Data Bounce attack (older Intel only)
- Visualize the output of your program
- Use Intel if you can, or ask us about AMD!

- Lecture materials and *exercise hints* at https://www.isec.tugraz.at/scs/
- Discord: ISEC, SCS channel

## Getting started

Measuring timing leakage

Exploiting timing leakage

CPU caches

Cache attacks




- safe software infrastructure $\rightarrow$ no bugs, e.g., Heartbleed
- does not mean safe execution
- information *leaks* because of the *hardware* it runs on
- no "bug" in the sense of a mistake $\rightarrow$ lots of performance optimizations
- $\rightarrow$ crypto and sensitive information, e.g., keystrokes and mouse movements



- shared across cores
- fast
- $\rightarrow$ fast cross-core attacks!

- caches improve performance
- SRAM is expensive  $\rightarrow$  small caches
- different timings for memory accesses
  - data is cached  $\rightarrow$  cache hit  $\rightarrow$  fast
  - data is not cached  $\rightarrow$  cache miss  $\rightarrow$  slow

## Measuring timing leakage

How every timing attack works:

- learn timing of different corner cases
- later, we recognize these corner cases by timing only

git clone https://git.teaching.isec.tugraz.at/scs/scs25/upstream.git

cd library2/demos/calibration/fr

make

./calibration

1. build two cases: cache hits and cache misses
2. time each case many times (get rid of noise)
3. we have a *histogram*!
4. find a *threshold* to distinguish the two cases

Loop (cache hits):

1. measure time
2. access variable (always a cache hit)
3. measure time
4. update histogram with delta

Loop (cache misses):

1. measure time
2. access variable (always a cache miss)
3. measure time
4. update histogram with delta
5. flush variable (clflush instruction)
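The two measurement loops can be sketched as a small simulation. This is not the calibration tool from the demo folder: Python cannot issue `rdtsc` or `clflush`, so the hypothetical `measure` function below draws latencies from made-up distributions (around 70 cycles for hits and 260 for misses, both assumptions) purely to show how the histograms are filled.

```python
import random
from collections import Counter


def measure(cached: bool, rng: random.Random) -> int:
    """Stand-in for 'rdtsc; access; rdtsc': returns a simulated latency in cycles.

    The means and spreads are made-up values, not measurements.
    """
    mean, spread = (70, 8) if cached else (260, 25)
    return max(1, round(rng.gauss(mean, spread)))


def build_histogram(cached: bool, samples: int, rng: random.Random) -> Counter:
    """Run the measurement loop and count how often each delta occurs."""
    histogram = Counter()
    for _ in range(samples):
        delta = measure(cached, rng)   # steps 1-3: time one access
        histogram[delta] += 1          # step 4: update histogram with delta
        # step 5 (miss loop only): a real tool would flush the variable here
    return histogram


rng = random.Random(1)
hits = build_histogram(True, 10_000, rng)
misses = build_histogram(False, 10_000, rng)
for name, hist in (("hit", hits), ("miss", misses)):
    print(name, "min", min(hist), "max", max(hist),
          "most common", hist.most_common(1)[0][0])
```

On real hardware the shape of both distributions depends on the CPU and on system noise, which is why the histogram has to be measured on the target machine.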

- very short timings
- rdtsc instruction: cycle-accurate timestamps

```
[...]
rdtsc
function()
rdtsc
[...]
```

- do you measure what you think you measure?
- *out-of-order* execution  $\rightarrow$  what is really executed

| Possible order 1 | Possible order 2 | Possible order 3 |
|------------------|------------------|------------------|
| rdtsc            | rdtsc            | rdtsc            |
| function()       | [...]            | rdtsc            |
| [...]            | rdtsc            | function()       |
| rdtsc            | function()       | [...]            |

- use pseudo-serializing instruction rdtscp (recent CPUs)
- and/or use serializing instructions like cpuid
- and/or use fences like mfence

Intel, How to Benchmark Code Execution Times on Intel IA-32 and IA-64 Instruction Set Architectures White Paper, December 2010.





Choosing the threshold:

- as high as possible
- most cache hits are below it
- no cache miss is below it
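One way to turn these three criteria into code is sketched below. It assumes an access counts as a cache hit when its latency is strictly below the threshold; the function name `choose_threshold` and the sample values are invented for illustration and are not taken from the calibration tool.

```python
def choose_threshold(hit_times, miss_times):
    """Return (threshold, coverage) for two lists of measured latencies.

    The threshold is as high as possible while no miss lies below it;
    coverage is the fraction of hits that lie below the threshold.
    """
    if not hit_times or not miss_times:
        raise ValueError("need samples of both cases")
    threshold = min(miss_times)                       # no cache miss below
    hits_below = sum(t < threshold for t in hit_times)
    return threshold, hits_below / len(hit_times)     # most hits should be below


hit_times = [62, 64, 66, 66, 68, 70, 72, 74, 190]     # one noisy outlier
miss_times = [180, 230, 240, 250, 255, 260, 270]
threshold, coverage = choose_threshold(hit_times, miss_times)
print(threshold, coverage)
```

With noisy measurements a single unusually fast miss would drag this threshold down, so in practice it may be worth inspecting the histogram by eye rather than trusting the minimum alone.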

## Exploiting timing leakage

- cache attacks  $\rightarrow$  exploit timing differences of memory accesses
- attacker monitors which lines are accessed, not the content
- covert channel: two processes communicating with each other
  - not allowed to do so, e.g., across VMs
- side-channel attack: one malicious process spies on benign processes
  - e.g., steals crypto keys, spies on keystrokes

- locate key-dependent memory accesses
- with cache template attacks

[Figure: attacker address space and victim address space]

1. Cache is empty
2. Attacker triggers an event
3. Attacker checks one address for cache hits ("Reload")
4. Update number of cache hits per event
5. Attacker flushes shared memory
6. Repeat for higher accuracy
7. Continue with next address
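The profiling loop can be sketched as a toy simulation. Everything here is hypothetical: the victim is modelled as a dictionary `VICTIM_ACCESSES` that maps each event to the shared-memory offsets it touches, and `noise` is an assumed probability that unrelated activity caches the probed line. Real profiling would use Flush+Reload on shared memory instead.

```python
import random

# Hypothetical victim: which shared-memory offsets each event touches.
VICTIM_ACCESSES = {"key_a": {0x40, 0x1C0}, "key_b": {0x80, 0x1C0}}
ADDRESSES = [0x40, 0x80, 0xC0, 0x1C0]


def profile(events, addresses, repetitions, noise, rng):
    """Return {address: {event: cache-hit ratio}} for the simulated victim."""
    template = {}
    for address in addresses:                      # continue with next address
        template[address] = {}
        for event in events:
            hits = 0
            for _ in range(repetitions):           # repeat for higher accuracy
                cached = set()                     # cache is empty (after flush)
                cached |= VICTIM_ACCESSES[event]   # attacker triggers an event
                if rng.random() < noise:           # unrelated activity hits the line
                    cached.add(address)
                if address in cached:              # "Reload": was it a cache hit?
                    hits += 1                      # update hit count per event
                # flushing shared memory is modelled by the reset at loop start
            template[address][event] = hits / repetitions
    return template


rng = random.Random(0)
template = profile(list(VICTIM_ACCESSES), ADDRESSES, 200, 0.05, rng)
for address, row in template.items():
    print(hex(address), row)
```

Addresses with a high hit ratio for exactly one event distinguish that event; an address that is hot for every event (here `0x1C0`) only reveals that some event happened.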

```
# ps -A | grep gedit
# cat /proc/<pid>/maps
00400000-00489000 r-xp 00000000 08:11 396356    /usr/bin/gedit
7f5a96991000-7f5a96a51000 r-xp 00000000 08:11 399365    /usr/lib/x86_64-linux-gnu/libgdk-3.so.0.1400.14
...
```

Fields per line: memory range, access rights, offset, device, inode, file name
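As a small illustration of this field layout, the following sketch splits one maps line into its parts. The `Mapping` type and `parse_maps_line` helper are names chosen for this example; the sample line is the gedit entry shown above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    device: str
    inode: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Split one line of /proc/<pid>/maps into its fields."""
    fields = line.split(maxsplit=5)
    address_range, perms, offset, device, inode = fields[:5]
    # Anonymous mappings have no file name, so the sixth field may be absent.
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in address_range.split("-"))
    return Mapping(start, end, perms, int(offset, 16), device, int(inode), path)


line = "00400000-00489000 r-xp 00000000 08:11 396356    /usr/bin/gedit"
mapping = parse_maps_line(line)
print(hex(mapping.start), hex(mapping.end), mapping.perms, mapping.path)
print("executable:", "x" in mapping.perms, "size:", mapping.end - mapping.start)
```

For profiling, the memory range, offset, and file name are the parts that identify which file-backed region to probe.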

cd cta\_examples/profiling/generic\_low\_frequency\_example

\# the first parameter is the cache miss threshold

./spy

\# start the targeted program

sleep 2; ./spy 200 400000-489000 -- 20000 -- -- /usr/bin/gedit

... and hold down a key in the targeted program. Save the addresses with peaks!

cd cta\_examples/exploitation/generic

./spy threshold file offset

## CPU caches

- Every memory reference goes through the cache
- Transparent to OS and programs

[Figure: a memory address is looked up in a cache whose entries hold a tag and data.]

#### Problem: working on congruent addresses

$\rightarrow$ replacement policy
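To make "congruent" concrete, the sketch below splits an address into tag, set index, and line offset and checks whether two addresses fall into the same set. The parameters (64-byte lines, 64 sets) are assumptions for illustration; real last-level caches additionally use slices and are indexed by physical addresses, which this sketch ignores.

```python
LINE_SIZE = 64    # bytes per cache line (assumed)
NUM_SETS = 64     # number of cache sets (assumed, e.g. a 32 KiB 8-way cache)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2 of the number of sets


def split_address(address: int):
    """Split an address into (tag, set index, line offset)."""
    offset = address & (LINE_SIZE - 1)
    index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def congruent(a: int, b: int) -> bool:
    """Two addresses are congruent if they map to the same cache set."""
    return split_address(a)[1] == split_address(b)[1]


base = 0x12345
stride = LINE_SIZE * NUM_SETS   # addresses this far apart share a set
print(split_address(base))
print(congruent(base, base + stride), congruent(base, base + LINE_SIZE))
```

Once more congruent lines are in use than the set has ways, one of them must be evicted, and the replacement policy decides which.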



- L1 and L2 are private
- last-level cache:
  - divided in *slices*
  - shared across cores
  - inclusive

- L1 cache: 4 cycles
- L2 cache: 12 cycles
- L3 cache: 26-31 cycles
- DRAM memory: >120 cycles

User programs can optimize cache usage:

- prefetch: suggests that the CPU load data into the cache
- clflush: throws data out of all caches

... based on virtual addresses

## Cache attacks

- cache-based keylogging
- crypto key recovery
  - various implementations (AES, RSA, ECC, ...)
  - up to 97% key bits recovered after 1 encryption
- cross-VM, cross-core, even cross-CPU
- any CPU vendor

- using the *inclusive* property
- last-level cache is a superset of L1 and L2
- data evicted from last-level cache $\rightarrow$ evicted from L1 and L2
- a core can evict lines in the private L1 of another core
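The effect of inclusivity can be sketched with a toy model. The class `InclusiveHierarchy` is hypothetical and heavily simplified: each L1 has unlimited capacity, L2 is omitted, and the last-level cache is a single fully associative LRU structure rather than sliced and set-associative.

```python
from collections import OrderedDict


class InclusiveHierarchy:
    """Toy model: one private L1 per core plus a shared, inclusive, LRU last-level cache."""

    def __init__(self, cores: int, llc_lines: int):
        self.l1 = [set() for _ in range(cores)]
        self.llc = OrderedDict()          # line -> None, ordered from LRU to MRU
        self.llc_lines = llc_lines

    def access(self, core: int, line: int) -> None:
        if line in self.llc:
            self.llc.move_to_end(line)    # refresh the LRU position
        else:
            if len(self.llc) >= self.llc_lines:
                victim, _ = self.llc.popitem(last=False)   # evict the LRU line
                for l1 in self.l1:        # inclusive: back-invalidate every L1
                    l1.discard(victim)
            self.llc[line] = None
        self.l1[core].add(line)

    def in_l1(self, core: int, line: int) -> bool:
        return line in self.l1[core]


caches = InclusiveHierarchy(cores=2, llc_lines=4)
caches.access(core=0, line=100)           # victim on core 0 caches line 100
before = caches.in_l1(0, 100)
for line in range(4):                     # attacker on core 1 fills the LLC
    caches.access(core=1, line=line)
after = caches.in_l1(0, 100)
print(before, after)
```

The attacker never touches core 0, yet its accesses remove line 100 from core 0's private L1, which is the property the bullets describe.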

Attacker monitors its own activity to find sets accessed by victim.



Same techniques for covert and side channels

- Shared library / load binary twice / page deduplication
- clflush throws data out of cache
- $\rightarrow$ We can throw other shared code out of the cache
- rdtsc / rdtscp give accurate timing information
- $\rightarrow$ We can measure whether shared code is in the cache

- Measure timing of cached memory
- Measure timing of non-cached memory (flush before measuring)
- Draw a histogram



step 0: attacker maps shared library $\rightarrow$ shared memory, shared in cache

step 1: attacker flushes the shared line

step 2: victim loads data while performing encryption

step 3: attacker reloads data $\rightarrow$ fast access if the victim loaded the line
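Steps 1 to 3 can be sketched as a toy simulation. This is not a working attack: the `SharedCache` class, the latencies, the threshold of 150 cycles, and the victim that touches the shared line only when its secret bit is 1 are all assumptions made for this example. A real attack would use `clflush` and `rdtsc` on memory shared with the victim.

```python
THRESHOLD = 150                        # cycles; assumed result of a calibration run
HIT_LATENCY, MISS_LATENCY = 70, 260    # made-up latencies for the model


class SharedCache:
    """Toy cache shared by attacker and victim, tracking which lines are present."""

    def __init__(self):
        self.lines = set()

    def flush(self, line: int) -> None:
        self.lines.discard(line)       # models clflush

    def load(self, line: int) -> int:
        latency = HIT_LATENCY if line in self.lines else MISS_LATENCY
        self.lines.add(line)           # a load always brings the line into the cache
        return latency


def victim_encrypt(cache: SharedCache, secret_bit: int, line: int) -> None:
    """Hypothetical victim: touches the shared line only if the secret bit is 1."""
    if secret_bit:
        cache.load(line)


def flush_reload(cache: SharedCache, secret_bits, line: int = 0x40):
    recovered = []
    for bit in secret_bits:
        cache.flush(line)                    # step 1: flush the shared line
        victim_encrypt(cache, bit, line)     # step 2: victim runs
        latency = cache.load(line)           # step 3: reload and time the access
        recovered.append(1 if latency < THRESHOLD else 0)
    return recovered


secret = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = flush_reload(SharedCache(), secret)
print(recovered)
```

The attacker learns only whether the line was accessed, not its content, so the attack works when the access pattern itself depends on the secret.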

Pros: fine granularity (1 line)

Cons: restrictive

1. needs clflush instruction (not available, e.g., in JS)
2. needs shared memory

- Flush+Flush [1]
- Evict+Reload [2] on ARM [4]

#### **Prime+Probe**



step 0: attacker fills the cache (prime)

step 1: victim evicts cache lines while performing encryption

step 2: attacker probes data to determine if the set was accessed
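The three steps can be sketched on a single simulated cache set. The model below is an assumption-laden toy: one 4-way set with true LRU replacement, an attacker who already knows an eviction set, and a hypothetical victim that accesses the set only when its secret bit is 1. Real replacement policies are more complex and real probing relies on timing rather than a direct hit or miss signal.

```python
from collections import OrderedDict

WAYS = 4   # associativity of the modelled cache set (assumed)


class CacheSet:
    """One cache set with LRU replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()            # ordered from LRU to MRU

    def access(self, line: str) -> bool:
        """Access a line; return True on a hit, False on a miss."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)
        else:
            if len(self.lines) >= self.ways:
                self.lines.popitem(last=False)   # evict the LRU line
            self.lines[line] = None
        return hit


def prime_probe(secret_bits):
    cache_set = CacheSet(WAYS)
    eviction_set = [f"attacker{i}" for i in range(WAYS)]   # congruent attacker lines
    observed = []
    for bit in secret_bits:
        for line in eviction_set:             # step 0: prime the set
            cache_set.access(line)
        if bit:                               # step 1: victim accesses the set
            cache_set.access("victim")
        # step 2: probe; any miss means the victim displaced an attacker line
        misses = sum(not cache_set.access(line) for line in eviction_set)
        observed.append(1 if misses > 0 else 0)
    return observed


secret = [0, 1, 1, 0, 1, 0, 0, 1]
observed = prime_probe(secret)
print(observed)
```

Because the attacker only sees activity per set, any victim line that maps to the same set produces the same signal, which is the coarser granularity compared with Flush+Reload.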

Pros: less restrictive

1. no need for clflush instruction (not available, e.g., in JS)
2. no need for shared memory

Cons: coarser granularity (1 set)

We need to evict cache lines without clflush or shared memory:

1. which addresses do we access to have congruent cache lines?
2. without any privilege?
3. and in which order do we access them?



## References

- [1] Gruss, D., Maurice, C., Wagner, K., and Mangard, S. (2016). Flush+Flush: A Fast and Stealthy Cache Attack. In DIMVA'16.
- [2] Gruss, D., Spreitzer, R., and Mangard, S. (2015). Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium (USENIX Security'15).
- [3] Gullasch, D., Bangerter, E., and Krenn, S. (2011). Cache Games: Bringing Access-Based Cache Attacks on AES to Practice. In IEEE Symposium on Security and Privacy (S&P'11).
- [4] Lipp, M., Gruss, D., Spreitzer, R., and Mangard, S. (2015). ARMageddon: Last-Level Cache Attacks on Mobile Devices. ArXiv e-prints.
- [5] Liu, F., Yarom, Y., Ge, Q., Heiser, G., and Lee, R. B. (2015). Last-Level Cache Side-Channel Attacks are Practical. In IEEE Symposium on Security and Privacy (S&P'15).
- [6] Maurice, C., Neumann, C., Heen, O., and Francillon, A. (2015). C5: Cross-Cores Cache Covert Channel. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA'15).
- [7] Percival, C. (2005). Cache Missing for Fun and Profit. URL: http://daemonology.net/hyperthreading-considered-harmful/.
- [8] Yarom, Y. and Falkner, K. (2014). FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium (USENIX Security'14).