Equivariance Encryption for Private LLM Inference

Jul 14, 2025

This article shows how we can leverage equivariant transformations to run LLM inference on encrypted data without losing model performance.

The problem is clear: you want to use powerful LLMs on sensitive data, but sending your private information to cloud providers is a privacy nightmare. Medical records, financial documents, legal contracts - all of it needs LLM processing, but traditional inference means exposing everything in plaintext.

What if we could encrypt the input, run inference on the encrypted data, and get back meaningful results? That's exactly what equivariant encryption enables.

Equivariance is a mathematical property where transforming the input leads to a predictable transformation of the output. The formal definition:

$f(g(x)) = h(f(x))$

Where:

  • g is our input transformation (encryption)

  • h is the corresponding output transformation

  • f is our model

For LLM inference, this means we can encrypt user inputs, process them through a modified model, and decrypt the outputs to get the same results as if we processed the original data directly.
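To see the property in action, here's a minimal sketch (our own toy example, not the LLM setup): let $g$ and $h$ be the same permutation and $f$ a softmax. Softmax is permutation-equivariant, so the identity holds exactly:

```python
import numpy as np

def f(x):
    """A toy model: softmax over a vector of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(42)
x = rng.normal(size=5)        # input
perm = rng.permutation(5)     # g and h are the same permutation here

# Equivariance: f(g(x)) == h(f(x))
assert np.allclose(f(x[perm]), f(x)[perm])
```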

But implementing this is tricky. Most encryption schemes either destroy the semantic relationships that LLMs depend on or add significant latency. We need a scheme that hides the data while preserving enough structure for the model to work, without slowing inference down.

Why permutation?

We're starting with the simplest approach: vocabulary permutation. Think of it as a secret code where every word maps to a different word, but the relationships stay intact.

Let's walk through an example: "The weather is sunny"

With permutation encryption:

  • "The" → token 45 → encrypted token 892

  • "weather" → token 156 → encrypted token 23

  • "is" → token 89 → encrypted token 445

  • "sunny" → token 234 → encrypted token 67

The encrypted sequence becomes: [892, 23, 445, 67]
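Decryption is just the inverse lookup. Here's a toy sketch of that bookkeeping in Python, hard-coding the illustrative mapping above (a real system maps the full vocabulary):

```python
# Toy permutation over token ids, using the mapping from the example above.
encrypt_map = {45: 892, 156: 23, 89: 445, 234: 67}
decrypt_map = {v: k for k, v in encrypt_map.items()}

token_ids = [45, 156, 89, 234]           # "The weather is sunny"
encrypted = [encrypt_map[t] for t in token_ids]
assert encrypted == [892, 23, 445, 67]

# Decryption inverts the mapping.
assert [decrypt_map[t] for t in encrypted] == token_ids
```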

The beautiful part? We can rewire the model to understand this encrypted vocabulary while keeping intact all the semantic relationships it learned during training.

Our approach modifies three key components:

Encrypted Tokenizer: $\text{encrypted\_id} = \pi(\text{original\_id})$

where $\pi$ is our secret permutation function generated from a client key.
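As a sketch of how $\pi$ might be derived (this mirrors the SHA-256-seeded setup described in the experiments below; a production system would swap NumPy's PRNG for a proper keyed PRF):

```python
import hashlib
import numpy as np

def permutation_from_key(client_key: bytes, vocab_size: int) -> np.ndarray:
    """Derive a deterministic vocabulary permutation pi from a client key.

    The SHA-256 digest seeds the generator, so the same key always yields
    the same permutation. (NumPy's PCG64 is a stand-in here; a production
    system would use a keyed PRF / CSPRNG.)
    """
    seed = int.from_bytes(hashlib.sha256(client_key).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.permutation(vocab_size)   # pi[original_id] = encrypted_id

pi = permutation_from_key(b"my-secret-client-key", vocab_size=128_256)
pi_inv = np.argsort(pi)                  # inverse permutation, for decryption
```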

Permuted Embedding Layer: Instead of the original embedding matrix $E$, we create: $E_{\text{new}}[\text{encrypted\_id}] = E_{\text{original}}[\text{original\_id}]$

Permuted Output Head: The language modeling head outputs probabilities over encrypted tokens, maintaining the same distributions but in permuted space.

This creates perfect equivariance: $\text{Model}_{\text{encrypted}}(\pi(\text{tokens})) = \pi(\text{Model}_{\text{original}}(\text{tokens}))$
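Here's a self-contained sketch with a toy model (our own construction, not the Llama code) that checks this identity end to end: permuting the embedding rows and output-head rows makes the encrypted pipeline produce exactly the permuted logits.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 4                       # toy vocabulary size and hidden size

E = rng.normal(size=(V, d))        # embedding matrix
W = rng.normal(size=(V, d))        # output head

def model(tokens, E, W):
    """Toy 'LLM': embed, mean-pool, project back to vocabulary logits."""
    h = E[tokens].mean(axis=0)
    return W @ h

pi = rng.permutation(V)            # pi[original_id] = encrypted_id
inv = np.argsort(pi)               # inverse permutation

# E_new[pi[j]] = E[j]  <=>  E_new = E[inv]; same for the output head.
E_new, W_new = E[inv], W[inv]

tokens = np.array([3, 7, 1])
logits_plain = model(tokens, E, W)               # original pipeline
logits_enc = model(pi[tokens], E_new, W_new)     # encrypted pipeline

# Model_encrypted(pi(tokens)) = pi(Model_original(tokens)):
# the encrypted logit for token pi[j] equals the plain logit for token j.
assert np.allclose(logits_enc[pi], logits_plain)
```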

We implemented this on Llama 3.2 1B to test real-world performance. Our experimental setup:

  • Model: Llama 3.2 1B Instruct

  • Hardware: T4 GPU (CUDA required)

  • Encryption: Cryptographically secure permutation from SHA-256 derived seeds

  • Test prompts: Various categories from factual to creative

  • Metrics: Generation quality, inference speed, encryption overhead

The implementation replaces only the embedding and output layers - the core transformer blocks remain unchanged. This is crucial for maintaining the model's learned representations.
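Here's a sketch of that surgery using the Hugging Face transformers API (the loading details and the `permute_model` helper are illustrative, not the exact production code; when the embedding and LM head share storage, as in tied-embedding configs, one in-place permutation covers both):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

perm = torch.randperm(model.config.vocab_size)  # stand-in for the key-derived pi
inv = torch.argsort(perm)                       # pi^(-1), used for decryption

@torch.no_grad()
def permute_model(model, inv):
    """Rearrange embedding and LM-head rows in place so the model
    consumes and produces permutation-encrypted token ids."""
    emb = model.get_input_embeddings()
    idx = inv.to(emb.weight.device)
    emb.weight.copy_(emb.weight[idx])           # E_new[i] = E[pi^(-1)(i)]
    head = model.get_output_embeddings()
    if head is not None and head.weight.data_ptr() != emb.weight.data_ptr():
        head.weight.copy_(head.weight[idx])     # skipped when weights are tied

permute_model(model, inv)

# Client side: encrypt the prompt ids, decrypt the generated ids.
input_ids = tokenizer("The weather is", return_tensors="pt").input_ids
out = model.generate(perm[input_ids].cuda(), max_new_tokens=20)
print(tokenizer.decode(inv[out.cpu()][0], skip_special_tokens=True))
```

One caveat worth noting: after the swap, special-token ids (eos, pad) also live in encrypted space, so generation-stopping logic would need the encrypted ids in a real system.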

We only rearrange the embedding rows - no additional mathematical transformations. This preserves the semantic distances that make LLMs work:

$W_{\text{new}}[i] = W_{\text{original}}[\pi^{-1}(i)]$
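A quick NumPy check makes this concrete: rearranging rows leaves every pairwise similarity intact; it only relabels which row holds which vector.

```python
import numpy as np

rng = np.random.default_rng(1)
E = rng.normal(size=(10, 4))      # toy embedding matrix
pi = rng.permutation(10)
E_new = E[np.argsort(pi)]         # W_new[i] = W_original[pi^(-1)(i)]

# The vector for encrypted id pi[j] is exactly the original vector for j,
# so every dot product / cosine distance the model relies on is unchanged.
G_orig = E @ E.T                  # Gram matrix of the original embeddings
G_new = E_new @ E_new.T
assert np.allclose(G_new[np.ix_(pi, pi)], G_orig)
```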

Practical Performance:

  • Encryption/decryption: <1ms overhead per request

  • Inference speed: Same as the original model (the permuted weights are precomputed, so the forward pass is unchanged)

  • Memory usage: Identical to base model

  • Model quality: Preserved (semantic relationships intact)

What's next?

This permutation approach is just the beginning. The equivariance framework opens up possibilities for more sophisticated encryption schemes:

  • Orthogonal transformations in embedding space

  • Multi-layer encryption with different transformations per layer

  • Dynamic permutations that change based on context

Can we build encryption schemes that are cryptographically secure while still preserving model performance? How do we balance privacy, security, and utility in production systems?

Stay tuned for more updates as we explore the frontier of private LLM inference!