A compressed cryptic dinner with approximations

That's why I like the friendly people of the Guix community: a casual dinner discussing various topics about life mixed with current hot Guix issues, and in the middle of the discussion, bang, a whole new unknown world! An excerpt:

– Yeah, tomorrow I’ll have a meeting about post-quantum cryptography.

– Wait, post-quantum cryptography?

– It assumes that a quantum computer is available to the attacker but not to Alice & Blake, who run the current good ol’ computers. Therefore the most widely used public-key algorithms, relying on the integer factorization problem, the discrete logarithm problem, the elliptic-curve discrete logarithm problem, etc., are defeated.

– …

– Do you know about error-correcting codes?

And then we dived into some detailed discussion. Well, I’ve learnt many things, and I poorly expressed a naive, still dumb intuition. Hence this post: a memory note for my future self about that new world, and some explanations to back my naive and incorrect “intuition”.

Well, the last time I wrote about error-correcting codes was in 2006. Ouch, time flies! And I’m not familiar with the finite field \(\mathbb{F}_{2}\). Therefore, please take all this with a large grain of salt. And on top of this salty plate, please dredge some Dunning–Kruger effect. Plonk!

Error-correcting codes

Let us consider a \(t\)-error-correcting \((n, k)\) linear code where:

  • \(\mathcal{G}\) is an efficient decoding algorithm,
  • \(G\) is any generator \((k, n)\) matrix of this linear code.

Such error correcting code holds the property:

\begin{equation} \label{org730345f} \Big(xG + y\Big)\mathcal{G} = x \quad\quad \forall x,\quad \forall y,\quad ||y||_{0}\leq t \end{equation}

It means: once the message \(x\) of length \(k\) is encoded, i.e., \(xG\) of length \(n\), it can still be decoded although it contains at most \(t\) errors. Here the link between \((n, k)\) and \(t\) is left as an exercise for the reader.

Please note that \(||\cdot ||_{0}\), defined as a procedure that counts the non-zero elements, isn’t a mathematical norm; it’s an abuse of terminology and notation. Despite not being a norm, the associated metric, known as the Hamming distance, is a valid distance.

The notation might appear weird, especially the application on the right; why not? Concretely, the message \(x\) is a row vector of length \(k\), so the encoded message is of length \(n>k\). In summary:

  • Encoding: apply the matrix-vector product \(xG\),
  • Decoding: apply the algorithm \(\Big(\cdot\Big)\mathcal{G}\).
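To make the encode/decode pair concrete, here is a minimal sketch in Python of the classic Hamming(7,4) code, a \((7, 4)\) linear code that corrects \(t = 1\) error; the systematic generator matrix below is the textbook one.

```python
import numpy as np

# Classic Hamming(7,4): an (n=7, k=4) linear code over F_2 correcting t=1 error.
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])    # systematic (k, n) generator matrix
H = np.hstack([P.T, np.eye(3, dtype=int)])  # (n-k, n) parity-check matrix

def encode(x):
    """Encoding: the vector-matrix product xG over F_2."""
    return x @ G % 2

def decode(y):
    """Syndrome decoding: corrects at most one flipped bit, returns x."""
    s = y @ H.T % 2                          # syndrome of length n-k
    if s.any():
        # a single error at position i yields the i-th column of H as syndrome
        i = next(j for j in range(7) if np.array_equal(H[:, j], s))
        y = y.copy()
        y[i] ^= 1                            # flip the erroneous bit back
    return y[:4]                             # systematic code: message = first k bits

x = np.array([1, 0, 1, 1])
c = encode(x)                                # codeword of length n=7
c[2] ^= 1                                    # inject one error: ||y||_0 = 1 <= t
assert np.array_equal(decode(c), x)          # (xG + y) decoded back to x
```

For this toy code, the property above can be checked exhaustively over all \(2^4\) messages and all 7 single-error positions.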

McEliece cryptosystem

The McEliece cryptosystem is an encryption algorithm developed by Robert McEliece in 1978 and based on error-correcting codes. Although probably not initially designed for post-quantum cryptography, it is now a candidate for post-quantum cryptography, because the claim1 – without knowing \(\mathcal{G}\), decoding is NP-hard – resists quantum attacks2. Hence, the McEliece cryptosystem exploits two strategies:

  • The encoding matrix \(G\) is obfuscated,
  • As many errors as possible are added, up to \(t\), when encrypting.

Thus, the asymmetric cryptosystem reads:

  • \(\Big\{\mathcal{G}, S, P\Big\}\) is the private key, where:
    • \(S\) is a random non-singular \((k, k)\) matrix,
    • \(P\) is a random permutation \((n, n)\) matrix.
  • \(\Big\{H, t\Big\}\) is the public key, where \(H = SGP\).

The encryption of the message \(m\) is straightforward: \(c = mH + e\) where \(||e||_{0}=t\). Then the encrypted message \(c\) can be exchanged, and only the person knowing the private key is able to decode it.

Deciphering applies \(P^{-1}\mathcal{G}S^{-1}\) to the encrypted message \(c\). Namely, it reads:

\begin{eqnarray*} cP^{-1}\mathcal{G}S^{-1} &=& \Big(mH + e \Big)P^{-1}\mathcal{G}S^{-1} \\ &=& \Big(mSGP + e \Big)P^{-1}\mathcal{G}S^{-1} \\ &=& \Big(mSGPP^{-1} + eP^{-1} \Big)\mathcal{G}S^{-1} \\ &=& \Big(mSG + eP^{-1} \Big)\mathcal{G}S^{-1} \end{eqnarray*}

Considering \eqref{org730345f}, a part of the last line from above reads:

\begin{equation*} \Big(mSG + eP^{-1} \Big)\mathcal{G} = \Big(xG + y \Big)\mathcal{G} = x = mS \end{equation*}

where \(||y||_{0}=||eP^{-1}||_0=||e||_{0}=t\) because \(P^{-1}\) is a permutation matrix. Finally,

\begin{eqnarray*} cP^{-1}\mathcal{G}S^{-1} &=& \Big(mS \Big)S^{-1} \\ &=& m \end{eqnarray*}

So far, so good!
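As a sanity check, the whole round trip can be sketched in Python on a toy instance: the Hamming(7,4) code (so \(t = 1\)) as the secret code, a hand-picked involutory \(S\) (so \(S^{-1} = S\) over \(\mathbb{F}_2\)) and a random permutation \(P\). This is purely illustrative; real McEliece instances use binary Goppa codes with far larger parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Secret code: Hamming(7,4), which corrects t = 1 error.
Pb = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), Pb])     # secret (k, n) generator matrix
Hc = np.hstack([Pb.T, np.eye(3, dtype=int)])  # parity-check matrix

def correct(y):
    """The efficient decoder: syndrome decoding for a single error."""
    s = y @ Hc.T % 2
    if s.any():
        i = next(j for j in range(7) if np.array_equal(Hc[:, j], s))
        y = y.copy()
        y[i] ^= 1
    return y[:4]                              # systematic: message = first k bits

# Private scrambling: S is chosen involutory over F_2 (S @ S = I mod 2),
# so S^-1 = S; P is a permutation matrix, so P^-1 = P^T.
S = np.array([[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]])
Pm = np.eye(7, dtype=int)[rng.permutation(7)]

H_pub = S @ G @ Pm % 2                        # public key: {H = SGP, t = 1}

m = np.array([1, 0, 1, 1])                    # the plaintext message
e = np.zeros(7, dtype=int)
e[rng.integers(7)] = 1                        # error vector with ||e||_0 = t = 1
c = (m @ H_pub + e) % 2                       # encryption: c = mH + e

# Decryption: apply P^-1, decode, then apply S^-1.
mS = correct(c @ Pm.T % 2)                    # (mSG + eP^-1) decoded -> mS
m_rec = mS @ S % 2                            # S^-1 = S for this involutory S
assert np.array_equal(m_rec, m)
```

Picking \(S\) involutory is only a convenience to avoid implementing a matrix inverse over \(\mathbb{F}_2\); any random non-singular \(S\) works the same way.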

What’s one annoying problem with the McEliece cryptosystem? The size of the keys, especially the public key, since the public key might be broadly shared with many peers.

Now, what follows draws my naive “intuition” which is surely incorrect and wrong!

Matrix approximation

Let us consider \(\mathcal{C}\), an algorithm able to “approximate” a matrix, e.g., \(A=\Big(G\Big)\mathcal{C}\). Such an approximation means the encoding produces errors, i.e., \(xA = xG + \varepsilon_{x}\). Therefore, the question becomes the control of \(||\varepsilon||_0\) compared to the \(t\) errors that the error-correcting code is able to correct. It’s probably very difficult to get an explicit formal bound, thus the engineer in me would be tempted to evaluate it: generate random messages \(x_i\) and compute \(\mathbb{E}(||x_iA - x_iG||_0)\). Arf.
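Such a Monte Carlo evaluation is easy to sketch in Python. Here the “approximation” \(A\) is simulated by flipping a single entry of a random \(G\) over \(\mathbb{F}_2\), a made-up stand-in for the algorithm \(\mathcal{C}\):

```python
import numpy as np

rng = np.random.default_rng(7)
k, n = 4, 7
G = rng.integers(0, 2, (k, n))                 # some (k, n) generator matrix over F_2

# Hypothetical approximation A: simulated here by flipping one entry of G.
A = G.copy()
A[rng.integers(k), rng.integers(n)] ^= 1

# Monte Carlo estimate of E(||x_i A - x_i G||_0) over random messages x_i.
xs = rng.integers(0, 2, (10000, k))
weights = ((xs @ A - xs @ G) % 2).sum(axis=1)  # Hamming weight of each error
print(weights.mean())
```

For a single flipped entry at row \(r\), the error weight of \(x_iA - x_iG\) is exactly the bit \(x_{i,r}\), so the expectation is \(1/2\); a real approximation algorithm would of course produce a very different error structure.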

Again, \(A\) is an approximation of the encoding application of \(G\), so \(A\) and \(G\) have the same matrix size. The idea is then to reduce the public key size by a lossless compression, while still keeping the two key ingredients (obfuscation and as many errors as possible).

Assuming such an approximation algorithm, the public key becomes \(F=SAP\) and the encrypted message \(c^\prime = mF + e^\prime\), where \(||e^\prime||_0\) depends on \(t\) and also on the approximation error \(||\varepsilon||_0\).

The deciphering reads similarly as before:

\begin{eqnarray*} c^{\prime}P^{-1}\mathcal{G}S^{-1} &=& \Big(mF + e^{\prime} \Big)P^{-1}\mathcal{G}S^{-1} \\ &=& \Big(mSAP + e^{\prime} \Big)P^{-1}\mathcal{G}S^{-1} \\ &=& \Big(mSA + e^\prime P^{-1} \Big)\mathcal{G}S^{-1} \\ &=& \Big(mSG + \varepsilon + e^{\prime}P^{-1} \Big)\mathcal{G}S^{-1} \end{eqnarray*}

The algorithm \(\mathcal{G}\) is able to decode when the number of errors is at most \(t\). In other words, here \eqref{org730345f} holds if \(||\varepsilon + e^{\prime}P^{-1}||_0\leq t\).

This is where everything becomes very tedious:

  1. It’s probably difficult to get a bound for \(||\varepsilon||_0\).
  2. The triangle inequality doesn’t help much with \eqref{org730345f}.

All that said, it doesn’t answer these fundamental questions either:

  • Does the public key \(F\) compress losslessly better than \(H\) does?
  • A good rate of lossless compression means the public key \(F\) owns an internal structure; therefore, does it open attack surfaces?

Approximating by compressed sensing

Warning: the usual notation in the compressed sensing community uses column vectors, not row vectors. Well, it’s a matter of left or right matrix-vector products and some transposes here and there. For consistency, the previous notation is kept; it does not matter much. Be aware when reading the literature!

Let \(x\) be a signal of length \(p=kn\). Let \(\Psi\) be a \((p, p)\) basis matrix such that \(x=\chi\Psi\) and \(||\chi||_0\leq ||x||_0\). Let \(\Phi\) be a \((p, q)\) measurement matrix with \(q < p\); it yields \(q\) samples of \(x\).

The compressed sensing problem solves:

\begin{equation*} s = \operatorname*{argmin}_{z}~||z||_{i} \quad \text{such that} \quad ||x\Phi - z\Psi\Phi||_{j} \leq \epsilon \end{equation*}

where the choice of the norms is left as an exercise3 for the reader. The key here is the choice of the pair \((\Psi, \Phi)\): it must satisfy the compressed sensing properties, e.g., the Restricted Isometry Property (RIP), the Exact Reconstruction Principle, the Uniform Uncertainty Principle (UUP), etc.

The idea of compressed sensing is to capture a sparse representation of a signal with few measurements. And under some constrained conditions, there is no loss between the original signal and its sparse representation.

Let us consider the following approximation algorithm for a \((k, n)\) matrix \(X\):

  • \(x=\Big(X\Big)\mathcal{V}\), where \(\mathcal{V}\) is an algorithm which transforms the \((k, n)\) matrix into a vector of length \(kn\); such an algorithm is assumed invertible.
  • Solve the compressed sensing problem.
  • \(y=s\Psi\).
  • Return \(Y=\Big(y\Big)\mathcal{V}^{-1}\), where \(\mathcal{V}^{-1}\) transforms the vector of length \(kn\) back into a \((k, n)\) matrix.
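As a floating-point illustration of the recovery step (and of footnote 3’s norms), here is a sketch that recovers a sparse vector from few random measurements using orthogonal matching pursuit, a standard greedy stand-in for the \(\ell_1\) minimization; \(\Psi\) is taken as the identity, and all the sizes below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

p, q, k_sparse = 64, 40, 4                 # signal length, measurements, sparsity
chi = np.zeros(p)
chi[rng.choice(p, k_sparse, replace=False)] = rng.standard_normal(k_sparse)

Psi = np.eye(p)                            # identity basis: x is sparse itself
x = chi @ Psi
Phi = rng.standard_normal((p, q)) / np.sqrt(q)  # random Gaussian measurement matrix
y = x @ Phi                                # only q < p measurements of x

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse s with y ~ A s."""
    r, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ r))))   # most correlated atom
        z, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ z                          # update the residual
    s = np.zeros(A.shape[1])
    s[support] = z
    return s

s = omp(Phi.T, y, k_sparse)                # row-vector convention: y^T = Phi^T chi^T
assert np.allclose(s @ Psi, x, atol=1e-8)  # lossless recovery from q < p samples
```

Whether an analogous recovery can be made to work over \(\mathbb{F}_2\), with \(||\cdot||_0\) in place of \(\ell_1\) and \(\ell_2\), is exactly the open question of this section.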

The question becomes the control of the approximation error. For instance, is it possible to bound \(||x - s\Psi||\) in terms of the number of errors \(t\)?

If we consider \(A\), the encoding matrix approximated by such a compressed sensing algorithm, then the faulty encoding process is \(xA = xG + \varepsilon_{x}\) for any message \(x\). The question: is \(||\varepsilon_{x}||_0\) controllable with the decoding error rate \(t\)?

If yes – huge assumption! – it means one ingredient is satisfied; what about the other? Is \(A\) obfuscated enough? If yes – another huge assumption! – then \(A\) can be used as a public key. The approximated matrix is built to be sparse, hence compresses well losslessly, but does such compression provide elements for an attack?

If the approximated encoding matrix \(A\) isn’t obfuscated enough, then the public key generation needs the application of the matrices \(S\) and \(P\). Does the result keep the sparsity property required for lossless compression? If not, doomed!

If the error isn’t controllable with the decoding error rate \(t\), woof, almost doomed too.

That’s all for my naive, still dumb intuition. Ready for yet another compressed cryptic dinner with approximations?

Footnotes:

1

A claim that is proved, I guess.

2

Based on my poor understanding.

3

Considering floating-point numbers, \(i\) reads \(\ell_{1}\) and \(j\) might be \(\ell_2\) or something else. For other fields, I’ve no idea!


© 2014-2026 Simon Tournier <simon (at) tournier.info >

(last update: 2026-04-10 Fri 21:12)