Securing TEEs against re-entry attacks

Fabrizio Genovese ,  Lev Stambler · January 8, 2025

For a deeper dive, read our paper about this!

Happy new year everyone! In this post, we talk about TEEs and how we can use quantum resources to make them more secure. The acronym TEE stands for Trusted Execution Environment. In a nutshell it denotes a piece of hardware – such as a processor – where operations can be performed securely. This happens, more or less, as follows:

  • The TEE has some root of trust hardcoded at the hardware level. You can think about this as a private key of sorts. The root of trust should not be extractable from the TEE without physically destroying it;
  • From the root of trust, the TEE derives a public key which is communicated to the outside world;
  • Everyone can use the public key to encrypt some input (be it a program, data or whatnot) and give it to the TEE;
  • The input is decrypted only within the TEE. All the relationship between the TEE and various peripherals such as the memory are also encrypted;
  • The consequence of this is that the TEE is the only one that has access to the operations being performed. It will spit out an output in clear only when the computation ends.

This simple idea has many applications, the most obvious one being hard wallets: A hard wallet is nothing more than a secure enclave where your seed phrase works as the root of trust. Every operation, such as signing a transaction, happens within the TEE and the signing private key never leaves the Trusted Environment, making sure that everything stays secure.

As of late, TEEs have seen rising interest from the cryptocurrency community as they can be used to solve many longstanding problems related to MEV extraction. Unfortunately this solution is unsatisfactory for many reasons:

  • The most obvious one is that one should trust that the hardware is effectively reliable, that it is made according to spec and that the specs themselves do not have weaknesses. As with every complicated feat of engineering we know this is not always the case: The root of trust may be extracted using clever techniques such as side-channel attacks. Furthermore, if an attacker has direct access to the TEE and enough resources, the number of available techniques skyrockets;
  • Then, one needs to trust that the hardware manufacturer does not copy and store the root of trust during the production process. This is a big requirement given the proven track record of intelligence agencies such as the NSA interfering with hardware producers to install backdoors;
  • Finally, one needs to trust the TEE owner. The owner has physical access to the TEE and can tamper with it to extract the root of trust. Furthermore, the owner can strategically decide to replay programs. These attacks are actually easier to pull off than you may think. See this paper and this paper for a deeper dive.

Protecting against the root of trust being leaked out is very important, but outside of the scope of this post. For now, we will focus on point 3 on our list above, and precisely on protecting ourselves against replay attacks.

Replay attacks

The concept of replay attack is very simple. Imagine I have some program $P$, whichy I encrypt with the TEE public key and send to you. So you do not have acces to $P$, but your TEE does. From your perspective, the TEE acts like a black box: You give it an input $x$, and receive its evaluation $P(x)$, without necessarily knowing anything about $P$. Still, you may not like the output $P(x)$ too much: For instance, imagine that $P$ is some algorithm evaluating how to allocate some resource, and that $x$ is the aggregate of all bids from people interested in receiving a slice. Maybe, it turns out that $P(x)$ dictates that you receive much less cake than you expected. If you are in control of the TEE, you can fiddle with the input $x$ until you get an answer that you like. This works because in principle the TEE is not able to tell if the same program has been run multiple times.

There are some techniques to avoid this: For instance, if the input $x$ is assembled from multiple sources, the TEE may require $x$ to be pre-encrypted with some ephemeral key. In this setting, you may not be able to fiddle with the part of the input provided by the other parties. Still, you are able to fiddle with your part of the input, and replaying the program by changing that little part and replaying may be enough to influence the result in a way you want.

In this post, we look at a way to ensure that a program – any program – is executed only once. There are other techniques available, such as the TEE computing ephemeral keys that get discarded at the end of the session. Still, this sort of computation requires a trusted source of randomness, and must be implemented with the right care anyway. Our technique instead relies on the use of quantum resources, and does not require any source of randomness within the TEE.

A table is all you need

Given any bitstring $s$, we will denote with $s_i$ the $i^\text{th}$ bit of $s$. $s_i$ will often be rendered as s_i in code listings. Similarly, note that writings such as $s^x$ will be often rendered as s^x in code listings.

We will make use of the Hamming distance between binary strings. We will denote $x \sim y$ two strings that are close enough, that is, so that their Hamming distance is below a fixed treshold.

Fix a program $P$ and some input $x$. We want to evaluate $P(x)$ securely while also making sure that, once $P(x)$ is evaluated, we won’t be able to evaluate $P(x’)$ on any other $x’ \neq x$. To do this, we will define a new program $P’$ which accepts two inputs: $x$ and a keystring $r$ which depends on $x$. We want that:

  • $P’(x,r) = P(x)$ with overwhelming probability;
  • $P’(y,r) = \bot$ with overwhelming probability for any $y \neq x$.

Consider some cryptographic hash function $H$ which always outputs strings of $n$ bits, so $H(x)$ has length $n$ for any $x$. Consider a $2 \times n$ table $T$ and populate it with random binary entries, like so:

A 2xn table, randomly populated

Now, define $P’$ as follows:

read input $x$;
read input $r$;
h = Hash x;
define s binary string of length n;
for i = 1, i=n, i++ {
  if h_i = 0 {
    s_i = 0-th row, i-th column of T;
  } else {
    s_i = 1-st row, i-th column of T;
  }
}

if r ~ s {
  return P(x);
} else {
  return "Error";
}

As we can see, $P’$ is very simple. It has the table $T$ hardcoded into the program. The bits of the hash $H(x)$ are used to look up the table entries. The entries are used to build a string $s$.

For each column i, we use the i-th hash bit to decide from which row we want to read. This results in a bitstring of length n.

After having done this, $P’$ compares $s$ with the second input $r$ and evaluates to $P(x)$ only if the difference between $s$ and $r$ is small enough (more on this later!). Notice that, by definition, there may be many $r \sim s$ such that $P’(x,r)$ evaluates to $P(x)$.

In any case, this is the program that we send to the TEE. As this program is sent to the TEE encrypted, the user has no direct access to $T$. Still, given an input $x$, the user still needs to provide a matching $r$ to compute $P(x)$. Where does she get this from? That’s where quantum comes in!

The quantum sparkle

The quantum world is very different from the classical world. Whereas a classical bit can only be in one of two states – $0$ or $1$ – a quantum bit $\ket{\psi}$ can be represented as any point laying on a surface called the block sphere, which looks like so:

The Block sphere, with the three canonical complementary bases and a state psi highlighted.

Quantum operations on the bit consist in reflecting or rotating this vector $\ket{\psi}$ by a given phase. Most importantly for us, we can measure any quantum bit along a particular, chosen axis. The most used axes are the three orthogonal axes showing in the picture above, which are usually referred to as the $\ket{0}$–$\ket{1}$ basis, the $\ket{+}$–$\ket{-}$ basis, and the $\ket{+i}$–$\ket{-i}$ basis.

Measurements in the quantum world yield probabilistic results, and change the system being measured in a non-reversible way. A measurement is specified by choosing a given axis, for instance the $\ket{0}$–$\ket{1}$ basis. Measuring makes our state $\ket{\psi}$ collapse to either the state $\ket{0}$ or the state $\ket{1}$ with a certain probability, which is proportional to how close our state $\ket{\psi}$ was to either of these points. This is known as the Born rule. To visualize this, consider the example below:

The previous Block sphere, with the projections of psi along each basis highlighted.

Here, $\ket{\psi}$ is tilted towards $\ket{0}$, $\ket{-}$ and $\ket{+i}$. If we were to measure, say, in the $\ket{0}$–$\ket{1}$ basis, it is more likely that our measurement outcome would yield $\ket{0}$ than to $\ket{1}$. The bias towards $\ket{0}$ is proportional to the red highlighted line, which in turn depends on the red angle. Similarly, if we were to measure in the $\ket{+}$–$\ket{-}$ basis, we would be way more likely to obtain $\ket{-}$, with a bias proportional to the blue line. For the mathematically versed, quantum measurements can be realized as projections is the linear-algebraic sense.

Notice how since the axes we are choosing are ortogonal, the state $\ket{0}$ is completely unbiased with respect to the $\ket{+}$–$\ket{-}$ basis: Measuring along this axis would produce outcomes $\ket{+}$ or $\ket{-}$ with equal probability (50%). So, in particular, measuring $\ket{\phi}$ in the $\ket{0}$–$\ket{1}$ basis and then measuring it along the $\ket{+}$–$\ket{-}$ basis would yield $\ket{+}$ or $\ket{-}$ with the same probability (50%), since both possible outcomes of the first measurement are unbiased with respect to the second measurement basis.

Orthogonal states are unbiased with respect to each other.

A last, very important ingredient to our toolbox is the no cloning theorem, which says that there is no procedure to clone arbitrary quantum states. So we cannot make a copy of $\ket{\psi}$, and unless we know how it was prepared, the only way we have to probe it is by measuring it. But in doing so we would modify $\ket{\psi}$ forever. This suggests that quantum states are very different from classical states in that they are more similar to physical resources (like, say, matter): We cannot freely clone or destroy them, and, exactly like running a mass spectrometry to probe the chemical nature of something destroys the sample, so do quantum measurements with our quantum states.

We can use these facts to pack two different classical bits into a quantum system in such a way that only one of the bits can be recovered. There are two different techniques to do so, which we are going to briefly cover.

QRACs

Suppose you have a sequence of two classical bits. There are four possible sequences, namely $00,01,10,11$. Let us consider the circle on the Block sphere that lies on the same plane of the axes $\ket{0}$–$\ket{1}$ and $\ket{+}$–$\ket{-}$. We encode our four possible sequences into the following four states, all laying in the plane spanned by the $\ket{0}$–$\ket{1}$ and $\ket{+}$–$\ket{-}$ bases (shown in detail in the second image below):

Depiction of a QRAC on the Block sphere.
Depiction of the same QRAC as a bove, restricted to the relevant Block sphere circumference spanned by the complementary bases used.

As we can see, each of these states is placed in a precise way so that it is biased both along the $\ket{0}$–$\ket{1}$ basis and the $\ket{+}$–$\ket{-}$ basis. The recipe for extracting the classical bit is simple:

  • If you want the first bit, measure in the $\ket{0}$–$\ket{1}$ basis. Here, obtaining $\ket{0}$ is taken to mean “the first bit is $0$”, whereas obtaining $\ket{1}$ means “the first bit is $1$”;
  • If you want the second bit instead, measure in the $\ket{+}$–$\ket{-}$ basis. Again, $\ket{+}$ means “$0$”, whereas $\ket{-}$ means “$1$”.

We see that in both cases, the user will recover the i-th bit with overwhelming probability. Most importantly, as soon as the first measurement is performed the resulting state becomes completely unbiased towards the other basis, making the recovery of the other bit impossible. All the encoded information about the second bit has been destroyed by the first measurement!

The technique detailed here is a simplified version of a technique called Quantum Random Access Coding – QRAC in short. Optimized QRACs allow for the successful recovery of the chosen bit with ~85% of probability.

Conjugate coding

Conjugate coding was invented in the ’60s by Wiesner, which is also the original quantum money guy. Compared to QRACs, conjugate coding eliminates the probability of error – the end user can always recognize one of the two classical bits with certainty. This happens at the expense of using more quantum resources – double the number of qubits compared to QRACs. The idea of conjugate coding is easy:

  • For the first bit, we encode $0$ in the state $\ket{0}$, $1$ in the state $\ket{1}$;
  • For the second bit, we encode $0$ in the state $\ket{+}$, $1$ in the state $\ket{-}$;

So now we have two quantum bits, $q_0, q_1$, each encoding one of the two classical bits. We know that we can recover the first bit with certainty by measuring $q_0$ in the $\ket{0}–\ket{1}$ basis, whereas $q_1$ must be measured in the $\ket{+}–\ket{-}$ basis.

We now concatenate $q_0$ and $q_1$ randomly in a sequence of two qubits, that can either be $q_0 q_1$ or $q_1 q_0$. Without knowing in which order the qubits were concatenated, it is impossible to recover both bits at the same time:

  • If we measure along the $\ket{0}$–$\ket{1}$ basis, we obtain a sequence that certainly contains the value of the first bit, plus another bit of pure noise;
  • If we measure along the $\ket{+}$–$\ket{-}$ basis, we obtain a sequence that certainly contains the value of the second bit, plus another bit of pure noise;
  • If we measure the first qubit along a basis and the second along another, we may either obtain both bits with certainty, or a purely random string.

…The point is that it is impossible to establish in which of these scenarios we end up being without knowing the concatenation order. So no information can be extracted without knowing it.

1-shot QRAC-based programs

Probably by now you’re starting to see how the final protocol works. To make our programs 1-shot using QRACs, our user also receives $n$ qubits along with the encrypted $P’$, call them $q_1, q_n$. To obtain an accepting keystring $r$, the user must do the following:

h = Hash x;
define r binary string of length n;
for i = 1, i=n, i++ {
  if h_i = 0 {
    r_i = outcome of 0\1 measurement on q_i;
  } else {
    r_i = outcome of +\- measurement on q_i;
  }
}
return r;

Now you probably also see why, in the program $P’$, we don’t require $r = s$ but instead $r \sim s$: The bitstrings $r$ and $s$ should match only up to some point to account for the fact that QRACs probabilistic in nature, and so they will fail to produce the correct output from time to time.

Notice furthermore that if our user was particularly unlucky with her measurements, thus obtaining a keystring such that $P(x,r) = \bot$, it is very likely that an accepting string $r’$ lies withing a small hamming distance from $r$. So she could try to bruteforce the situation evaluating $P(x,r’)$ with $r’ \sim r$. She would have to be EXTREMELY unlucky and basically get a huge amount of the measurements wrong to make this technique unfruitful.

Security

To intuitively see why this protocol works, suppose that an attacker has $P’$ running encrypted on a TEE, qubits $q_0,\dots,q_n$, and an input $x$. He measures the qubits to obtain a string $r$, lets the TEE compute $P’(x,r)$ and obtains $P(x)$. Using $H(x)$ and $r$, attacker can guess with certainty the corresponding cells of the table $T$, that is, the deterministically inferred string $s$. Indeed, the fact that $P’(x,r)$ is accepted gives only reasonable certainty about the corresponding entries in $T$, as in the example below:

Depiction of how we can use the measurement string r to guess the original hardcoded string s.

But exactly as we said at the end of the last section, if attacker knows the Hamming distance threshold (which should be considered public information), he can replay $P’$ on inputs $(x,r’)$ with $r’ \sim r$, and infer $s$ with certainty observing which $P(x,r’)$ are accepted and which ones are not. Notice that replaying $P’$ in this way is not a problem, since every run will return $P(x)$. So, as of now, our attacker can just evaluate $P$ on input $x$, and the program is still one-shot. Importantly, this doesn’t tell the attacker anything with respect to the values of all the other cells.

Indeed, imagine that attacker wants to to guess the value at cell $(1,1)$ in the figure above. This is outside of the entries he knows with certainty. To probe the cell $(1,1)$ while leveraging the information accumulated so far, the attacker should first produce an input $x’$ such that:

  • The first bit of $H(x’)$ is $1$;
  • Every other bit of $H(x’)$ is as $H(x)$.

With this information, the attacker could determine the value at $(1,1)$ by evaluating $P(x’,r’)$ with enough $r’$ such that $r_i \sim s_i$ for bits $2,\dots,n$. Yet, this problem entails finding a preimage for $H$ given $H(x’)$. Generalizing on this, the attacker should devise a way to find a $x’$ such that $H(x’) \sim H(x)$, The farther away $H(x’)$ is from $H(x)$ the more bruteforce he has to do, which means we should be secure as long as $H$ is preimage-resistant1.

We can also harden the protocol further by requiring $T$ to be of size $2 \times m \cdot n$, and concatenate $H(x)$ $m$ times to extract the string $s$. This boosts security as the entries of $T$ are completely uncorrelated, at the expense of having to share more qubits.

1-shot Conjugate coding-based programs

To implement the conjugate-coding based version of one-time programs, our user must receive $2n$ qubits. She has to run the following program to obtain the string $r$:

h = Hash x;
define r binary string of length 2n;
for i = 1, i=n, i++ {
  if h_i = 0 {
    r_(2i-1) = outcome of 0\1 measurement on q_(2i-1);
    r_(2i) = outcome of 0\1 measurement on q_(2i);
  } else {
    r_(2i-1) = outcome of +\- measurement on q_(2i-1);
    r_(2i) = outcome of +\- measurement on q_(2i);
  }
}
return r;

Notice that in this case the string $r$ is twice as long as the one in the QRAC case, and half of it will be noise.

The program $P’$ must also be slightly edited: It needs to contain the hardcoded information about the conjugate coding order for each couple of bits. These are encoded as $n$ numbers, denoted $c_1,\dots,c_n$. Every $c_i$ tells us which qbit of the conjugate pair encoded the first bit. $P’$ must run a clean up routine to clear $r$ from the noise, like so:

read input $x$;
read input $r$;
h = Hash x;

// cleanup r from noise
define t binary string of length n;
for i = 1, i=n, i++ {

    \\ we want the first bit of the couple
    if h_i = 0 {
      \\ first bit is encoded in the first qbit
      if c_i = 0 {
        t_i = r_(2i-1);
      }
      \\ first bit is encoded in the second bit
      else {
        t_i = r_(2i);
      }
    \\ we want the second bit of the couple
    } else {
      \\ second bit is encoded in the second qbit
      if c_i = 0 {
        t_i = r_(2i);
      }
      \\ second bit is encoded in the first qbit
      else {
        t_i = r_(2i-1);
      }
    }
}
// t is the string r without the noise

// from here everything is as before
define s binary string of length n;
for i = 1, i=n, i++ {
  if h_i = 0 {
      s_i = 0-th row, i-th column of T;
  } else {
      s_i = 1-st row, i-th column of T;
  }
}

// Notably, we can use an hamming distance of 0 here
// Since conjugate coding always gives certain measurements
if t = s {
  return P(x);
} else {
  return "Error";
}

Security

Proof of security here is similar to the one for QRACs. Suppose that an attacker has $P’$ running encrypted on a TEE, qubits $q_0,\dots,q_{2n}$, and an input $x$. He measures the qubits to obtain a string $r$, lets the TEE compute $P’(x,r)$ and obtains $P(x)$. Again, using $H(x)$ and $r$, attacker can guess with certainty:

  • The secret string $s$, and so the corresponding cells of the table $T$;
  • For each couple of qubits $q_i,q_{i+1}$, which one encoded the first/second bit.

As before, the attacker does this by replaying $P’$ with slightly different versions of $r$ along with the same input $x$, and observing the resuts.

Notice how point 2 above is not a problem, since the attacker gets this information only after having measured all $q_0,\dots,q_{2n}$. Any data about ordering is completely useless at this stage since measurement already destroyed information about the complementary bits.

So, the attacker is pretty much in the same situation it was with QRACs: To probing other entries of $T$ is as hard as inverting the hash function $H$2.

Applications and bottlenecks

There are many obvious applications for such a protocol, such as one-shot signatures, which become trivially feasible in this setup.

As for bottlenecks, the biggest one is around storing and sharing the qubits to be measured. Sharing can be done using currently available QKD infrastructure. Unfortunately, albeit growing, this kind of infra is far from being ubiquitous at the moment. The storing part is harder, as quantum memories still have very short lifespans. It is worth noticing though that in this setup we do not need to store any complex information such as entangled states: Each qubit $q_i$ is on its own and completely independent from any other qubit $q_j$ (with $i \neq j$ of course!) this makes the requirements on quantum memory much leaner.

Notes

  1. To be completely precise, we require something more than simple preimage-resistance here: Given our target $H(x')$, a preimage $x''$ that evaluates to $H(x'')$ close enough to $H(x')$ may be enough, as the protocol exhibits some fault tolerance with respect to $r$. This slightly complicates proving security if one doesn't want to assume additional properties on $H$.

  2. Notice how in this case assuming pre-image resistance is really enough, since this protocol does not use Hamming distance anymore!

Twitter, Facebook

Comments