
© The Author(s) 2022. Published by Oxford University Press on behalf of The British Computer Society.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution,
and reproduction in any medium, provided the original work is properly cited.
Advance Access publication on 11 March 2022. https://doi.org/10.1093/comjnl/bxac019

A New Neural Distinguisher Considering Features Derived From Multiple Ciphertext Pairs

Yi Chen, Yantian Shen, Hongbo Yu∗ and Sitong Yuan



Department of Computer Science and Technology, Tsinghua University, Haidian District, Beijing
100084, P. R. China
∗ Corresponding author: [email protected]

Neural-aided cryptanalysis is a challenging topic, in which the neural distinguisher (ND) is a core
module. In this paper, we propose a new ND that considers multiple ciphertext pairs simultaneously.
Besides, the multiple ciphertext pairs are constructed from different keys. The motivation is that the
distinguishing accuracy can be improved by exploiting features derived from multiple ciphertext
pairs. To verify this motivation, we have applied this new ND to five different ciphers. Experiments
show that taking multiple ciphertext pairs as input indeed brings an accuracy improvement. Then,
we prove that our new ND applies to two different neural-aided key recovery attacks. Moreover,
the accuracy improvement is helpful for reducing the data complexity of the neural-aided statistical
attack. The code is available at https://github.com/AI-Lab-Y/ND_mc.

Keywords: neural distinguisher; cryptanalysis; multiple ciphertext pairs; data reuse


Received 20 July 2021; Editorial Decision 30 January 2022; Accepted 30 January 2022
Handling editor: Chris Mitchell

1. INTRODUCTION

In CRYPTO'19, Gohr improved attacks on round-reduced Speck32/64 using deep learning [1], which created a precedent for neural-aided cryptanalysis. The neural distinguisher (ND) proposed by Gohr plays a core role in [1]. Its target is to distinguish real ciphertext pairs (C0, C1) corresponding to plaintext pairs with a specific difference from random ciphertext pairs. The ND takes a ciphertext pair (C0, C1) as input and gives the classification result.

The performance of the ND is important for neural-aided cryptanalysis. For Gohr's key recovery attack [1], the most important step is identifying the right plaintext structure that passes the differential placed before the ND. To attack 11-round Speck32/64, Gohr adopted a 6-round ND and a 7-round ND for identifying the right plaintext structure. The identification result is given by the 6-round ND instead of the 7-round ND. Compared with the 7-round ND, the 6-round ND achieves higher distinguishing accuracy. This implies that a stronger ND is more helpful for Gohr's attacks. Recently, Chen et al. [2] proposed a generic neural-aided statistical attack (NASA) for cryptanalysis. The data complexity of NASA is strongly related to the distinguishing accuracy of the ND.

To improve the performance of the ND, researchers have explored NDs from different directions. The most popular direction is adopting different neural networks. In [3], Jain et al. proposed a multi-layer perceptron network (MLP) to build NDs against PRESENT reduced to 3, 4 rounds. In [4], Yadav et al. also built an MLP-based 3-round ND against Speck32/64. In [5], Bellini et al. compared MLP-based and convolutional neural network-based distinguishers with classic distinguishers. In [6], Pareek et al. proposed a fully-connected network-based distinguisher against the key scheduling algorithm of PRESENT. Another popular direction is changing the input of the ND. In [7], Baksi et al. used the ciphertext difference C0 ⊕ C1 as the input. In [2], Chen et al. suggested that the ND can be built by flexibly taking some bits of a ciphertext pair as input. In [8], Hou et al. investigated the influence of the input difference pattern on the accuracy of NDs against round-reduced Simon32/64.

The above NDs can be viewed as the same type since only features hidden in a single ciphertext pair are exploited. Thus, another natural way is taking more ciphertexts as the input. In [9], Benamira et al. initially tested this idea as follows. First, a group of B ciphertexts is constructed from the same


key. Second, take a group of B ciphertexts as the input of the ND. Finally, based on a large B, the accuracy of NDs against 5-round and 6-round Speck32/64 is increased to 100%, which is a huge improvement.

Previous findings, especially the work in [9], inspired us to think about the deeper motivation of taking more ciphertexts as input. We believe that the deeper motivation stands for a generic method for improving the ND. The ND processing a group of B ciphertexts has two important characteristics: (1) the input contains more ciphertexts, (2) all the ciphertexts in a group share the same key. Since Ankele and Kölbl [10], as well as Gohr [1], reported significant key-dependency in the output distribution in round-reduced Speck, we wonder whether the same key is a core factor that brings significant improvement.

1.1. Our contributions

In this paper, our work contains five contributions:

• By introducing a clear deep motivation, we propose a new ND considering multiple ciphertext pairs simultaneously. The motivation is as follows. When ciphertext pairs corresponding to plaintext pairs with a specific difference obey a non-uniform distribution, there are some derived features from multiple ciphertext pairs. Once neural networks capture these features, the ND would obtain a performance improvement.
• We prove that the same key is not the core factor that brings the significant improvement in [9]. We made the conclusion by testing the accuracy of NDs against round-reduced Speck32/64 under two different scenarios: one is that ciphertext pairs in a group share the same key, one is that ciphertext pairs in a group adopt different keys. In the first scenario, the key for generating a ciphertext group each time is randomly selected. Experiments show that the same key has small or no influence on NDs.
• We design a verification framework for further directly checking that derived features from multiple ciphertext pairs are learned. This framework is composed of two tests: false-negative test (FNT), false-positive test (FPT).
• We build two types of NDs for five round-reduced ciphers: Speck32/64, Chaskey, PRESENT, DES, SHA3-256. The first one is the ND proposed by Gohr, and the other one is our new ND. These experiments further prove the advantage of taking multiple ciphertext pairs as input and support the presented deep motivation.
• We prove that the ND taking multiple ciphertext pairs as input applies to key recovery, which is not discussed in previous research. At the time of writing, there are only two key recovery attacks [1, 2] based on the ND proposed by Gohr. We show how to apply the new ND to these two attacks. Due to the performance improvement, the data complexity of the attack [2] can be reduced by using the new ND.

1.2. Outlines

This paper is organized as follows:

• Section 2 presents preliminaries, including some important notations and five related ciphers.
• In Section 3, the ND proposed by Gohr and two key recovery attacks are briefly reviewed.
• Section 4 presents the new ND including the motivation, model, the neural network for implementing the new ND, and the training pipeline.
• Section 5 presents the verification framework.
• In Section 6, we build NDs for five ciphers and perform an analysis.
• In Section 7, we show how to perform key recovery attacks using the new ND. A data reuse strategy is also proposed in this section.

2. PRELIMINARIES

2.1. Notations

P, C — Plaintext, Ciphertext
α — Plaintext difference
N, M — The number of plaintext or ciphertext pairs
NDk=? — ND with k ciphertext pairs as input
Z — The output of an ND
r — The number of reduced rounds

2.2. Five ciphers

We choose five different ciphers for supporting our work.

• Speck32/64 [11] is a lightweight block cipher whose block size is 32 bits. Its non-linear component is the modular addition.
• Chaskey [12] is a Message Authentication Code (MAC) algorithm whose intermediate state size is 128 bits. Its non-linear component is the modular addition.
• Present64/80 [13] is a block cipher whose block size is 64 bits. Its non-linear component is a 4 × 4 Sbox.
• DES [14] is a block cipher whose block size is 64 bits. Its non-linear component is given by eight different 6 × 4 Sboxes.
• SHA3-256 [15] is a hash function whose intermediate state size is 1600 bits. Its non-linear component can be seen as the application of a 5-bit Sbox applied in parallel 320 times.

We refer readers to [11–15] for more details of these ciphers.
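Since several of the later illustrative sketches need a concrete cipher, a compact Python implementation of Speck32/64 encryption following the public specification [11] is given below. This code is ours and only a reference sketch (the paper itself contains no cipher code); the released repository https://github.com/AI-Lab-Y/ND_mc contains the implementations actually used in the experiments.

```python
MASK16 = 0xffff  # Speck32/64 works on two 16-bit words

def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK16

def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK16

def speck_round(x, y, k):
    # One Speck32/64 round: rotate, modular addition, XOR subkey, rotate, XOR
    x = (ror(x, 7) + y) & MASK16
    x ^= k
    y = rol(y, 2) ^ x
    return x, y

def speck_key_schedule(key_words, rounds=22):
    # key_words = (l2, l1, l0, k0): four 16-bit words of the 64-bit master key
    l = list(key_words[:3][::-1])          # l0, l1, l2
    ks = [key_words[3]]
    for i in range(rounds - 1):
        # the key schedule reuses the round function with the round index as "subkey"
        lx, kx = speck_round(l[i], ks[i], i)
        l.append(lx)
        ks.append(kx)
    return ks

def speck_encrypt(p, key_words, rounds=22):
    x, y = p
    for k in speck_key_schedule(key_words, rounds):
        x, y = speck_round(x, y, k)
    return x, y
```

Reducing the `rounds` argument gives the round-reduced variants studied in the rest of the paper.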


2.3. Computing resources

In this paper, the available computing resources are: an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz and a graphics card (NVIDIA GeForce GTX 1060 6GB).

3. RELATED WORK

3.1. Gohr's neural distinguisher

In [1], Gohr built NDs against round-reduced Speck32/64. The ND proposed by Gohr is a generic distinguisher since it only requires a plaintext difference constraint.
Consider a cipher E and a plaintext difference α. Gohr's ND aims at distinguishing two classes of ciphertext pairs

Y(C0, C1) = 1, if P0 ⊕ P1 = α;  Y(C0, C1) = 0, if P0 ⊕ P1 ≠ α    (1)

where (C0, C1) is the ciphertext pair corresponding to the plaintext pair (P0, P1), and Y is the label of (C0, C1).
We denote ciphertext pairs corresponding to plaintext pairs with the target difference α as positive samples, and denote ciphertext pairs corresponding to plaintext pairs with a random difference as negative samples.
If a neural network achieves a distinguishing accuracy higher than 0.5 over randomly selected ciphertext pairs, the neural network is a valid ND.
In [1], Gohr chose a residual network [16] with one output neuron. Thus, the output Z of Gohr's ND is also used as the following posterior probability

Pr(Y = 1 | (C0, C1)) = F1(f(C0, C1)),  0 ≤ Pr(Y = 1 | (C0, C1)) ≤ 1    (2)

where f(C0, C1) stands for the features learned by the ND from (C0, C1), and F1(·) is the posterior probability estimation function learned by the ND. If Pr(Y = 1 | (C0, C1)) > 0.5, the label of (C0, C1) predicted by the ND is 1.

3.2. Gohr's key recovery attack

Given an ND, we denote the output of the ND as Z. Positive samples are expected to obtain a higher posterior probability than negative samples, which is the core idea of Gohr's key recovery attack [1].
Consider an (r + 1)-round cipher E and an r-round ND built over a plaintext difference α. Gohr's attack recovers the subkey of the (r + 1)-th round as follows:
1. Generate m positive samples with α randomly.
2. For each possible subkey guess kg:
   (a) Decrypt the m positive samples with kg.
   (b) Feed the partially decrypted samples into the ND and collect the outputs Zi, i ∈ [1, m].
   (c) Compute the rank score Vkg of kg as:

       Vkg = Σ_{i=1}^{m} log2( Zi / (1 − Zi) )    (3)

   (d) If Vkg exceeds a threshold c1, save kg as a subkey candidate.
3. Return the kg with the highest key rank score as the final subkey guess.
The values of c1 and m are set experimentally.
A differential ΔP → α can be placed before the ND to extend the rounds covered by the attack (see Fig. 1). With the help of neutral bits [17], ciphertext structures consisting of m positive samples or negative samples can be generated. Then a high rank score occurs only when a structure consisting of positive samples is decrypted by the true subkey. More details can be found in [1].

3.3. Neural-aided statistical attack

The NASA proposed by Chen et al. [2] is performed as follows:
1. Randomly generate N plaintext pairs with a difference ΔP.
2. Collect the ciphertext pairs.
3. For each possible subkey guess kg:
   (a) Decrypt the N ciphertext pairs with kg.
   (b) Feed the partially decrypted ciphertext pairs into the ND and collect the outputs Zi, i ∈ [1, N].
   (c) Count the following statistic T:

       T = Σ_{i=1}^{N} φ(Zi),  φ(Zi) = 1 if Zi > c2, φ(Zi) = 0 if Zi ≤ c2    (4)

   (d) If T exceeds a decision threshold t, save kg as a subkey candidate.
4. Return all the surviving subkey candidates.
Chen et al. proposed a theoretical framework to estimate N and t. The value of c2 is set in advance, which does not influence the estimation of N, t.
According to Figure 1, Chen et al. summarized three types of probabilities:

Pr(Z > c2 | S0 ⊕ S1 = α, kg = sk) = p1    (5)
Pr(Z > c2 | S0 ⊕ S1 = α, kg ≠ sk) = p2    (6)
Pr(Z > c2 | S0 ⊕ S1 ≠ α) = p3    (7)

where sk is the true subkey. These three probabilities p1, p2, p3 are related to the ND.
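For concreteness, both statistics above are simple functions of a vector of ND outputs. The following NumPy sketch is ours (not code from [1] or [2]); `outputs` is assumed to hold the posterior probabilities Zi returned by a trained ND for one subkey guess:

```python
import numpy as np

def rank_score(outputs, eps=1e-12):
    """Key rank score V_kg of Equation (3): sum of log2(Z_i / (1 - Z_i))."""
    z = np.clip(np.asarray(outputs, dtype=np.float64), eps, 1.0 - eps)
    return float(np.sum(np.log2(z / (1.0 - z))))

def nasa_statistic(outputs, c2=0.5):
    """NASA counting statistic T of Equation (4): number of outputs above c2."""
    z = np.asarray(outputs, dtype=np.float64)
    return int(np.sum(z > c2))

# Toy usage with made-up ND outputs for a single subkey guess.
example_outputs = [0.91, 0.40, 0.73, 0.66]
v_kg = rank_score(example_outputs)    # compared against the threshold c1 in Gohr's attack
t_kg = nasa_statistic(example_outputs)  # compared against the decision threshold t in NASA
```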


FIGURE 1. The key recovery process. The prepended differential ΔP → α is satisfied with a probability p0. The intermediate state pair is (S0, S1).

NASA returns all the possible subkey candidates. Besides, NASA allows us to set two ratios β0, β1 in advance. The ratio β0 is the expected probability that the true subkey sk survives the attack. The ratio β1 is the expected probability that wrong subkey guesses survive the attack.
Based on p0, p1, p2, p3, β0, β1, the required N is:

√N = ( z_{1−β0} × v0 + z_{1−β1} × v1 ) / ( (p1 − p2) × p0 )    (8)

where

v0 = sqrt( p0 × p1(1 − p1) + (1 − p0) × p3(1 − p3) ),
v1 = sqrt( p0 × p2(1 − p2) + (1 − p0) × p3(1 − p3) ),

and z_{1−β0}, z_{1−β1} are the quantiles of the standard normal distribution.
The decision threshold t is:

t = μ0 − z_{1−β0} × σ0    (9)

where

μ0 = N × ( p0 p1 + (1 − p0) p3 ),
σ0 = sqrt( N × p0 × p1(1 − p1) + N × (1 − p0) × p3(1 − p3) ).

If c2 = 0.5, the distinguishing accuracy of the ND is (p1 + 1 − p3) × 0.5. Thus, the data complexity of NASA is strongly related to the ND. We refer readers to [2] for more details of NASA.

FIGURE 2. P1(x): a Gaussian distribution. P2(x): a uniform distribution.

4. NEW NEURAL DISTINGUISHER

4.1. Motivations

The motivations of our new ND contain two aspects.
First, in the machine learning community, providing more features is a common method to improve the accuracy of neural networks. For example, depth map estimation [18] and action recognition [19] are both tackled by feeding various features (e.g. stereo knowledge [20], depth maps [21]) into neural networks simultaneously.
Second, there are some useful features among multiple samples drawn from the same non-uniform distribution. Figure 2 shows a simple example. If we randomly draw two samples (x1^1, x2^1) / (x1^2, x2^2) from a Gaussian distribution or a uniform distribution, the average distance of the two samples is d1 / d2, respectively. Then, it is expected that d1 < d2, which is useful for distinguishing the two distributions.
Based on these two common phenomena, we obtain the idea of building a new ND by considering multiple ciphertext pairs.

4.2. New distinguisher model

Our new ND needs to distinguish two types of ciphertext groups (C1,1, C1,2, · · · , Ck,1, Ck,2):

Y = 1, if Pj,1 ⊕ Pj,2 = α, j ∈ [1, k];  Y = 0, if Pj,1 ⊕ Pj,2 ≠ α, j ∈ [1, k]    (10)

where Y is the label of the ciphertext group, and (Cj,1, Cj,2) is the ciphertext pair corresponding to the plaintext pair (Pj,1, Pj,2), j ∈ [1, k].
According to the introduced motivation, the requirement is that ciphertext pairs in a group are randomly sampled from the same distribution. To minimize influencing factors, we ask that a ciphertext group is constructed from k random keys if the cipher needs a key. This ensures that the k ciphertext pairs do not have any same properties except for the same plaintext difference constraint.
Our new ND can be described as

Pr(Y = 1 | X1, · · · , Xk) = F2( f(X1), · · · , f(Xk), ϕ(f(X1), · · · , f(Xk)) ),  Xi = (Ci,1, Ci,2), i ∈ [1, k]    (11)
FIGURE 3. The network architecture of our new ND. Conv stands for a convolution layer with Nf filters. The size of each filter is Ks ×Ks . Module
2 also adopts the skip connection [16]. FC is a fully connected layer that has d1 or d2 neurons. BN is batch normalization. Relu and Sigmoid are
two different activation functions. The output of Sigmoid ranges from 0 to 1.

where f(Xi) represents the basic features extracted from the ciphertext pair Xi, ϕ(·) is the derived features, and F2(·) is the new posterior probability estimation function.
The motivation also puts forward some design guidelines for the neural network to be used. Since we hope more features ϕ(f(X1), · · · , f(Xk)) are extracted from the distribution of the basic features f(Xi), i ∈ [1, k], the ND should learn basic features from each ciphertext pair first. From the perspective of neural networks, this requirement can be satisfied by placing 1D convolutional layers before 2D convolutional layers.

4.3. Residual network

4.3.1. Network architecture
The network architecture adopted by Gohr [1] is also applied in this article. According to the requirement of the motivation, except for the first 1D convolutional layer, the remaining 1D convolutional layers are replaced by 2D convolutional layers. Figure 3 shows the neural network architecture. The input consisting of k ciphertext pairs is arranged in a k × w × (2L/w) array. L represents the block size of the target cipher and w is the size of a basic unit. For example, L is 32 and w is 16 for Speck32/64.
The network architecture contains two core modules. The first one (Module 1) is a bit-slice layer that contains convolution kernels with a size of 1 × 1. This layer can learn basic features from each input ciphertext pair that is arranged in a 1 × w × (2L/w) array. The second one (Module 2) is a residual block that is built over a 2D convolutional layer. The 2D filters with a size of Ks × Ks can learn derived features from the k ciphertext pairs. In this article, we use one residual block for building our new NDs.

4.3.2. Training pipeline
New NDs are obtained by following three processes:
1. Data generation: Consider a plaintext difference ΔP and a cipher E. Randomly generate k plaintext pairs with ΔP. If E needs a key, randomly generate k keys. Collect the k ciphertext pairs with E and the k keys. Regard these k ciphertext pairs as a ciphertext group with a size of k, and the label is Y = 1. We denote a ciphertext group with Y = 1 as a positive sample. If the plaintext differences of the k plaintext pairs are random, the label of the resulting ciphertext group is Y = 0, and we denote it as a negative sample. A training set is composed of N/(2k) positive samples and N/(2k) negative samples. A testing set is composed of M/(2k) positive samples and M/(2k) negative samples. We need to generate a training set and a testing set.
2. Training: Train the neural network (Fig. 3) on the training dataset.
3. Testing: Test the distinguishing accuracy of the trained neural network on the testing dataset. If the test accuracy exceeds 0.5, return the neural network as a valid ND. Otherwise, choose a different α and start from the data generation process again.
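The data generation step can be sketched in a few lines of Python. The sketch below is ours and purely illustrative: `encrypt` stands for any round-reduced encryption routine taking a plaintext word pair and a key (for example the Speck32/64 sketch in Section 2.2, with delta = (0x0040, 0x0)), and the conversion of the resulting word tuples into the bit-array layout of Section 4.3.1 is omitted.

```python
import numpy as np

def make_groups(encrypt, num_groups, k, delta, word_bits=16, rng=None):
    """Generate labelled ciphertext groups: label 1 uses the fixed difference delta,
    label 0 uses a random difference. Every pair in a group gets its own random key."""
    rng = rng or np.random.default_rng()
    top = 1 << word_bits
    X, Y = [], []
    for _ in range(num_groups):
        label = int(rng.integers(2))            # roughly balanced positive/negative groups
        group = []
        for _ in range(k):
            key = tuple(int(w) for w in rng.integers(0, top, size=4))
            p0 = (int(rng.integers(0, top)), int(rng.integers(0, top)))
            if label == 1:                      # fixed plaintext difference delta
                p1 = (p0[0] ^ delta[0], p0[1] ^ delta[1])
            else:                               # random difference for negative samples
                p1 = (int(rng.integers(0, top)), int(rng.integers(0, top)))
            # store one pair as (c0_x, c0_y, c1_x, c1_y); conversion to bits comes later
            group.append(encrypt(p0, key) + encrypt(p1, key))
        X.append(group)
        Y.append(label)
    return np.array(X, dtype=np.uint16), np.array(Y, dtype=np.uint8)
```

A usage example would be `make_groups(speck_encrypt, 10**6, k=2, delta=(0x0040, 0x0))`, followed by unpacking each 16-bit word into bits to obtain the k × w × (2L/w) input array.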


In the training phase, the neural network is trained for Es epochs with a batch size of Bs. The cyclic learning rate scheme in [1] is adopted. Optimization is performed against the following loss function:

loss = Σ_{i=1}^{N/k} ( Z_{i,p} − Y_i )² + λ × ‖W‖    (12)

where Z_{i,p} is the output of the ND, Y_i is the true label, W is the parameters of the neural network and λ is the penalty factor. The Adam algorithm [22] with default parameters in Keras [23] is applied to the optimization.

TABLE 1. Parameters for constructing our new ND.
Nf = 32, d1 = 64, d2 = 64, Ks = 3, Bs = 1000
λ = 10^−5, Lr = 0.002 → 0.0001, Es = 10, N = 10^7, M = 10^6

5. THE VERIFICATION FRAMEWORK

Although the distinguishing accuracy of the new NDk is the best evidence for supporting the motivation of taking k ciphertext pairs as input, we propose an auxiliary verification framework to further show that the new NDk captures features derived from multiple ciphertext pairs. This framework is composed of two tests: FNT and FPT.
The idea of FPT and FNT is as follows. When the features f(Xi), i ∈ [1, k] hidden in a single ciphertext pair do not lead to the right classification, only the derived features ϕ(f(X1), · · · , f(Xk)) can provide useful clues for classification.
It is hard to directly select k ciphertext pairs that satisfy the above requirement based on the NDk itself. Thus, an ND that takes a single ciphertext pair as input is used to select k wrongly classified ciphertext pairs. This is an approximate but reasonable method that is based on the following reasons:

• When we build NDk, all the ciphertext pairs are constructed from different keys. This ensures that only two types of features are available: one is features hidden in a single ciphertext pair, the other one is features derived from multiple ciphertext pairs.
• When the ND that takes one ciphertext pair as input has high accuracy, it means that features hidden in the ciphertext pair provide strong clues leading to wrong classifications. If the new NDk still correctly classifies such k ciphertext pairs with a high probability, we can believe that this is due to features derived from multiple ciphertext pairs.

5.1. False negative test

If k ciphertext pairs with label 1 are all wrongly classified by the ND that takes a single ciphertext pair as input,

p(Y = 1 | X1) = F1(f(X1)) < 0.5,
· · ·
p(Y = 1 | Xk) = F1(f(Xk)) < 0.5,    (13)

such ciphertext pairs are false negative samples. These k samples are combined into a ciphertext group and fed into NDk. Generate a large number of such ciphertext groups and feed them to NDk. What we care about is the following pass ratio

F2( f(X1), · · · , f(Xk), ϕ(f(X1), · · · , f(Xk)) ) ≥ 0.5    (14)

Now, the classification is determined by ϕ(f(X1), · · · , f(Xk)). The final pass ratio under such a setting can show whether derived features have been learned and their effects. If NDk can obtain a non-negligible pass ratio, then ϕ(f(X1), · · · , f(Xk)) can offset the negative influence of f(Xi), i ∈ [1, k]. If the pass ratio is high, derived features from the k ciphertext pairs play a vital role in classification for this kind of ciphertext pair.

5.2. False positive test

Similarly, if k ciphertext pairs with label 0 are wrongly classified,

p(Y = 1 | X1) = F1(f(X1)) ≥ 0.5,
· · ·
p(Y = 1 | Xk) = F1(f(Xk)) ≥ 0.5,    (15)

such ciphertext pairs are false positive samples. These k samples are combined into a ciphertext group and fed into NDk. Now what we care about is the following pass ratio

F2( f(X1), · · · , f(Xk), ϕ(f(X1), · · · , f(Xk)) ) < 0.5    (16)

6. APPLICATIONS TO FIVE CIPHERS

We apply our new ND as well as Gohr's ND to the five ciphers introduced in Section 2.2. The training pipeline of Gohr's ND is presented in [1]. Table 1 summarizes the parameters related to the residual network and the training pipeline that are introduced in Section 4.3.
Since Gohr provided NDs against round-reduced Speck32/64 in CRYPTO'19, we perform an in-depth analysis by taking the application to Speck32/64 as an example. Applications to the remaining four ciphers are listed as supporting materials. For convenience, we denote Gohr's ND as NDk=1.
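As a rough illustration of Figure 3 with the parameters of Table 1, one possible Keras sketch is given below. It is our own reading of the architecture description (a 1 × 1 bit-slice convolution, one 2D residual block with Ks × Ks filters, two fully connected layers of d1 and d2 neurons, and a Sigmoid output); layer ordering and minor details are assumptions, and the released implementation at https://github.com/AI-Lab-Y/ND_mc is authoritative.

```python
from tensorflow.keras import layers, models, regularizers

def build_nd_k(k, L=32, w=16, Nf=32, d1=64, d2=64, Ks=3, lam=1e-5):
    """Sketch of the ND_k network: bit-slice 1x1 conv, one 2D residual block, FC head."""
    reg = regularizers.l2(lam)
    inp = layers.Input(shape=(k, w, 2 * L // w))          # k ciphertext pairs, k x w x (2L/w)
    # Module 1: bit-slice layer, 1x1 kernels learn basic features per ciphertext pair
    x = layers.Conv2D(Nf, kernel_size=1, padding='same', kernel_regularizer=reg)(inp)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    # Module 2: one residual block built over Ks x Ks 2D convolutions (derived features)
    shortcut = x
    for _ in range(2):
        x = layers.Conv2D(Nf, kernel_size=Ks, padding='same', kernel_regularizer=reg)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation('relu')(x)
    x = layers.Add()([x, shortcut])                       # skip connection [16]
    # Prediction head: FC(d1) -> FC(d2) -> Sigmoid output in [0, 1]
    x = layers.Flatten()(x)
    x = layers.Dense(d1, kernel_regularizer=reg)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(d2, kernel_regularizer=reg)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)
    out = layers.Dense(1, activation='sigmoid')(x)
    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss='mse', metrics=['acc'])  # Eq. (12): MSE plus L2 penalty
    return model
```

The cyclic learning rate of [1] (here 0.002 → 0.0001 over Es = 10 epochs) would be added through a Keras learning-rate scheduler callback, which is omitted for brevity.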


6.1. Experiments on Speck32/64

6.1.1. Neural distinguishers
The plaintext difference is α = (0x0040, 0), introduced in [24]. We built NDk, k ∈ {2, 4, 8, 16}, against Speck32/64 reduced to 5, 6 and 7 rounds respectively.

TABLE 2. Distinguishing accuracy of NDs against Speck32/64.
r | NDk=1 | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.926 | 0.9739 | 0.9914 | 0.9991 | 0.9999
6 | 0.784 | 0.8667 | 0.9358 | 0.9528 | 0.9786
7 | 0.607 | 0.6396 | 0.6847 | 0.7009 | 0.6493

Table 2 lists the accuracy of the NDs. Compared with NDk=1, all the NDk, k > 1 achieve an accuracy improvement. Besides, we find that the overfitting phenomenon [25] always appears in the training process of NDk=16 against 7-round Speck32/64. If this problem could be solved, it is possible to further improve the accuracy.
In the above setting, our distinguishers take k ciphertext pairs as input while Gohr's distinguishers take one ciphertext pair as input. To prove the positive influence of features derived from multiple ciphertext pairs, we compare the distinguishing accuracy under a fair setting.
The concrete process is as follows:
1. Generate n ciphertext pairs with the same sample label. It is worth noticing that n random keys are used.
2. For Gohr's distinguisher NDk=1, feed the n ciphertext pairs into NDk=1, and use the median value of the n outputs to give the prediction label of the n ciphertext pairs.
3. For our new distinguishers NDk>1, collect m ciphertext groups by uniformly sampling from the n ciphertext pairs, feed the m ciphertext groups into NDk>1, and use the median value of the m outputs to give the prediction label.
4. Repeat the above steps 10^6 times and count the distinguishing accuracy.
Such a setting ensures that our distinguishers do not use more prior knowledge. Taking NDk=2, NDk=4 as examples, we have performed several experiments under this setting. Table 3 summarizes our experiment results. Under the fair setting, our distinguishers achieve higher accuracy. This proves that some features derived from multiple ciphertext pairs have been captured by our distinguishers, and these features bring an accuracy improvement.

TABLE 3. Distinguishing accuracy of NDs against Speck32/64 under the fair setting.
r / n | NDk=1 | NDk=2, m = n/k | NDk=4, m = n/k
6 / 8 | 0.9573 | 0.9767 | 0.9823
7 / 8 | 0.7333 | 0.7421 | 0.7506
7 / 16 | 0.7859 | 0.8020 | 0.8090
7 / 32 | 0.8352 | 0.8682 | 0.8751
7 / 64 | 0.8757 | 0.9282 | 0.9387

Besides, we find that the distinguishing accuracy can be further improved if we increase m by adopting the data reuse strategy that will be introduced in Section 7.1.

6.1.2. The impact of the same key setting
As introduced in Section 1, Benamira et al. [9] also tested the idea of taking multiple ciphertext pairs as input. The difference with our NDs is that the ciphertext pairs belonging to a group are constructed from the same key in [9].
To prove that the same key setting is not the core factor that brings the huge accuracy improvement in [9], we build NDs by adopting the same key setting as follows:
• we randomly generate a key for each ciphertext group;
• the k ciphertext pairs belonging to a group are constructed from the same key.
Then, we test the distinguishing accuracy of these NDs over two kinds of testing sets:
• Testing set 1: the k ciphertext pairs of a group are constructed from k different keys.
• Testing set 2: the k ciphertext pairs of a group are constructed from the same key.

TABLE 4. Distinguishing accuracy of NDs over two kinds of testing sets. These NDs are built under the same key setting.
Testing set 1:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.9744 | 0.9906 | 0.9989 | 0.9999
6 | 0.8663 | 0.9317 | 0.9561 | 0.9762
Testing set 2:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.9745 | 0.9903 | 0.9990 | 0.9999
6 | 0.8662 | 0.9309 | 0.9557 | 0.9770

Table 4 summarizes the accuracy of the NDs over the two kinds of testing sets. Based on the comparison with the results shown in Table 2, we find that the same key setting has small or no influence on the accuracy.

6.1.3. The comparison of neural network parameters
Since the neural network adopted in this paper is different from the neural network adopted by Gohr in [1], we also focus on the comparison of neural network parameters.


TABLE 5. The comparison of neural network parameters as well as the accuracy of NDk, k ∈ {1, 2}.
NDk=1, 1 residual block:
r | Parameters | Accuracy
5 | 44,321 | 0.926
6 | 44,321 | 0.784
NDk=1, 10 residual blocks:
r | Parameters | Accuracy
5 | 102,497 | 0.929
6 | 102,497 | 0.788
NDk=2:
r | Parameters | Accuracy
5 | 89,377 | 0.9738
6 | 89,377 | 0.8613

Table 5 summarizes the comparison of neural network parameters as well as the accuracy of some NDs. Gohr reported the best accuracy of the 5-round and 6-round NDk=1 by using 10 residual blocks. Besides, Gohr also provided NDk=1 using 1 residual block. These two kinds of distinguishers achieve almost the same accuracy. Compared with NDk=1 with 10 residual blocks, our new distinguisher NDk=2 achieves a significant accuracy improvement but contains fewer parameters. This comparison proves that taking more ciphertext pairs as input is the reason that brings the accuracy improvement.

6.1.4. The results of FPT and FNT
We further perform the FPT and FNT. The corresponding pass ratios are presented in Table 6. For each NDk, there is at least one type of pass ratio higher than 0. This further proves that NDk captures derived features from the k ciphertext pairs.

TABLE 6. Pass ratios of FPT and FNT of NDk against Speck32/64.
False negative test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.0112 | 0.0013 | 0.0001 | 0
6 | 0.0331 | 0.0143 | 0.0081 | 0.0048
7 | 0.0511 | 0.0212 | 0.0283 | 0.0917
False positive test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.3068 | 0.6268 | 0.6748 | 0.7228
6 | 0.1519 | 0.1432 | 0.3723 | 0.4375
7 | 0.0659 | 0.0233 | 0.0157 | 0.0691

6.2. Experiments on Chaskey

Based on the plaintext difference α = (0x8400, 0x0400, 0, 0) [12], we build NDs against Chaskey reduced to 3, 4 rounds. The accuracies are presented in Table 7. Table 8 summarizes the results of the FPT and FNT.

TABLE 7. Distinguishing accuracy of NDs against Chaskey.
r | NDk=1 | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.8608 | 0.8958 | 0.9583 | 0.9887 | 0.9986
4 | 0.6161 | 0.6589 | 0.6981 | 0.7603 | 0.7712

TABLE 8. Pass ratios of FPT and FNT of NDs against Chaskey.
False negative test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.1156 | 0.0635 | 0.0373 | 0.0087
4 | 0.1412 | 0.1749 | 0.1481 | 0.1675
False positive test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.4027 | 0.4032 | 0.3976 | 0.4705
4 | 0.8369 | 0.7439 | 0.7298 | 0.5591

6.3. Experiments on Present64/80

Based on the plaintext difference α = (0, 0, 0, 0x9) provided in [26], we build NDs against Present64/80 reduced to 6 and 7 rounds respectively. The penalty factor is 10^−4 and the other related parameters are the same as in Table 1. The distinguishing accuracies are presented in Table 9. Table 10 summarizes the results of the FPT and FNT.

TABLE 9. Distinguishing accuracy of NDs against Present64/80.
r | NDk=1 | NDk=2 | NDk=4 | NDk=8 | NDk=16
6 | 0.6584 | 0.7198 | 0.7953 | 0.8308 | 0.8259
7 | 0.5486 | 0.5503 | 0.5853 | 0.5786 | 0.5818

6.4. Experiments on DES

Based on the analysis of DES in [27], the plaintext difference α = (0x40080000, 0x04000000) is adopted. We build NDs against DES reduced to 5, 6 rounds.
The batch size is adjusted to 5000. The penalty factor is increased to 8 × 10^−4. The other related parameters are the same as in Table 1. The distinguishing accuracies are presented in Table 11. The pass ratios of the FPT and FNT of the NDs are presented in Table 12.
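The FNT/FPT pass ratios reported in Tables 6, 8, 10, 12 and 14 all follow the measurement procedure of Section 5. A minimal sketch of that measurement is given below; it is our own illustration, assuming a trained single-pair model `nd1`, a trained group model `ndk`, and single-pair inputs already stored in the network's per-pair layout so that grouping is just a reshape.

```python
import numpy as np

def pass_ratios(nd1, ndk, pairs, labels, k):
    """FNT/FPT pass ratios (Section 5): how often ND_k overturns pairs that ND_1 misclassified."""
    p1 = nd1.predict(pairs, verbose=0).flatten()
    fn = pairs[(labels == 1) & (p1 < 0.5)]    # false negatives of the single-pair ND
    fp = pairs[(labels == 0) & (p1 >= 0.5)]   # false positives of the single-pair ND
    ratios = {}
    for name, pool, want_high in (('FNT', fn, True), ('FPT', fp, False)):
        n_groups = len(pool) // k
        if n_groups == 0:
            ratios[name] = None
            continue
        groups = pool[:n_groups * k].reshape((n_groups, k) + pool.shape[1:])
        pk = ndk.predict(groups, verbose=0).flatten()
        # a "pass" means ND_k assigns the correct label despite the misleading basic features
        ratios[name] = float(np.mean(pk >= 0.5) if want_high else np.mean(pk < 0.5))
    return ratios
```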


TABLE 10. Pass ratios of FNT and FPT of NDs against Present64/80.
False negative test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
6 | 0.0277 | 0.0097 | 0.0258 | 0.0751
7 | 0.1796 | 0.0587 | 0.1214 | 0.1488
False positive test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
6 | 0.0147 | 0.0046 | 0.0068 | 0.0183
7 | 0.0533 | 0.0126 | 0.0324 | 0.0302

TABLE 11. Distinguishing accuracy of NDs against DES.
r | NDk=1 | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.6261 | 0.7209 | 0.8382 | 0.9318 | 0.9585
6 | 0.5493 | 0.5653 | 0.5568 | 0.5507 | 0.5532

TABLE 12. Pass ratios of FNT and FPT of NDs against DES.
False negative test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.0046 | 0.0034 | 0.0132 | 0.0131
6 | 0.0802 | 0.2348 | 0.2526 | 0.3207
False positive test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
5 | 0.0594 | 0.0627 | 0.0566 | 0.0518
6 | 0.0462 | 0.0598 | 0.0921 | 0.0809

6.5. Experiments on SHA3-256

SHA3-256 is a hash function. When one message block is fed into reduced SHA3-256, we collect the first 32 bytes of the output state after the r-round permutation is applied to this message block. Given a message difference α = 1, we build NDs against SHA3-256 reduced up to 3 rounds.
The number of ciphertext pairs is N = 2 × 10^6. The batch size is 500, and the penalty factor is 10^−5. The accuracies are presented in Table 13. The pass ratios of the FPT and FNT of the NDs are presented in Table 14.

TABLE 13. Distinguishing accuracy of NDs against SHA3-256.
r | NDk=1 | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.7228 | 0.8149 | 0.9241 | 0.971 | 0.9904

TABLE 14. Pass ratios of FNT and FPT of NDs against SHA3-256.
False negative test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.2249 | 0.2347 | 0.3336 | 0.2711
False positive test:
r | NDk=2 | NDk=4 | NDk=8 | NDk=16
3 | 0.1045 | 0.0961 | 0.0171 | 0.0088

7. KEY RECOVERY ATTACKS

In this section, we propose a data reuse strategy for reducing the data complexity. Then we prove that our ND can be applied to the two key recovery attacks introduced in Section 3. Since the data complexity of NASA is directly related to the performance of NDs, NASA is performed first to highlight the extra superiority of our NDs.

7.1. Data reuse strategy for reducing data complexity

There is a potential problem when we directly apply our new ND to key recovery attacks.
Assume Gohr's distinguisher and our new NDk have the same performance, and a certain attack requires M random inputs. If we directly reshape M × k ciphertext pairs into M ciphertext groups, the data complexity of our NDk is k times as much as the data complexity of Gohr's distinguisher.
Given M ciphertext pairs Xi = (Ci,0, Ci,1), i ∈ [1, M], there are a total of C(M, k) options for composing a ciphertext group, which is much larger than M/k. Thus, we can randomly select M ciphertext groups from the C(M, k) options. Such a strategy can help reduce the data complexity. In fact, it is equivalent to attaching more importance to the derived features from k ciphertexts.
However, the subsequent key recovery attacks using this naive strategy do not obtain good results. The main reason is that the sampling randomness of the M ciphertext groups is destroyed. Two new concepts are proposed for overcoming this problem.
Maximum reuse frequency: During the generation of the M ciphertext groups, a ciphertext pair is likely to be reused several times. We denote the reuse frequency of the ith ciphertext pair as RFi, i ∈ [1, M]. The maximum reuse frequency (MRF) is defined as the maximum value of RFi:

MRF = max RFi, i ∈ [1, M]    (17)

Sample similarity degree: For any two ciphertext groups Gi, Gj, the similarity of these two ciphertext groups is defined as the number of the same ciphertext pairs. As for M ciphertext groups, the sample similarity degree (SSD) is defined as the
N Ds are presented in Table 14. text groups, sample similarity degree (SSD) is defined as the


maximum of any two ciphertext groups' similarity:

SSD = max |Gi ∩ Gj|, i, j ∈ [1, M], i ≠ j,
Gi = {Xi1, · · · , Xik}, Gj = {Xj1, · · · , Xjk},
i1, · · · , ik, j1, · · · , jk ∈ [1, M]    (18)

MRF can ensure that the contribution of each ciphertext pair is similar. SSD can increase the distribution uniformity of the M ciphertext groups as much as possible. Based on the above two concepts, we propose the following Data Reuse Strategy (see Algorithm 1) that can reduce the data complexity and maintain the sampling randomness.

7.2. Application to NASA

When we replace Gohr's distinguisher with our new ND, the process of NASA does not change. The only difference is the data collection.

7.2.1. Data collection
Consider the attack process as shown in Figure 1. Assume that our new ND is built with α. Now, we need to generate ciphertext groups.
Generate k plaintext pairs (P0^i, P1^i), i ∈ [1, k] with the difference ΔP. Collect the corresponding ciphertexts (C0^i, C1^i), i ∈ [1, k]. The intermediate states are (S0^i, S1^i), i ∈ [1, k].
According to the introduction in Section 4.2, these k ciphertext pairs should satisfy

S0^i ⊕ S1^i = α, or S0^i ⊕ S1^i ≠ α, i ∈ [1, k]

simultaneously. We use neutral bits [17] to generate such k ciphertext pairs.
Here we briefly review the definition of neutral bits. Let E denote the encryption function. We focus on the following conforming pairs

P0 ⊕ P1 = ΔP, E(P0) ⊕ E(P1) = α.

If the condition E(P0 ⊕ ej) ⊕ E(P1 ⊕ ej) = α always holds where ej = 1 << j, the jth bit is a neutral bit.
Thus, we can generate 2^m ≥ k ciphertext pairs using m neutral bits. The probability that these k ciphertext pairs satisfy the difference transition ΔP → α simultaneously is still p0. Then, N ciphertext groups with a size of k can be generated as
1. Randomly generate N plaintext pairs with ΔP.
2. Generate N plaintext structures using the m neutral bits.
3. Randomly pick k plaintext pairs from a structure and collect the ciphertext pairs.
The total data complexity is N × k.
It is worth noticing that the data reuse strategy is still applicable here. More precisely, the data collection is performed as
1. Randomly generate N/M plaintext pairs with ΔP.
2. Generate N/M plaintext structures using the m neutral bits.
3. Randomly pick M plaintext pairs from a structure, and generate M ciphertext groups using the data reuse strategy (Algorithm 1).
The total data complexity is N now.
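Algorithm 1 (the concrete data reuse procedure) is given in the original article as a pseudo-code box and is not reproduced here. The following Python sketch is only our illustration of one way to sample groups from a pool of pairs while respecting upper bounds on MRF (Equation 17) and SSD (Equation 18); the authors' exact algorithm may differ, and the released code at https://github.com/AI-Lab-Y/ND_mc should be consulted for the real procedure.

```python
import random

def reuse_groups(pairs, m_groups, k, max_mrf=2, max_ssd=1, max_tries=100000, seed=None):
    """Sample m_groups groups of k distinct pairs, bounding reuse frequency and group overlap.

    pairs:   list of ciphertext pairs (any indexable objects, e.g. word tuples).
    max_mrf: bound on how often a single pair may be reused (Eq. 17).
    max_ssd: bound on the number of shared pairs between any two groups (Eq. 18).
    """
    rng = random.Random(seed)
    use_count = [0] * len(pairs)
    groups = []
    tries = 0
    while len(groups) < m_groups and tries < max_tries:
        tries += 1
        available = [i for i in range(len(pairs)) if use_count[i] < max_mrf]
        if len(available) < k:
            break                                   # reuse budget exhausted
        candidate = set(rng.sample(available, k))
        # reject candidates that overlap too much with an already chosen group
        if any(len(candidate & g) > max_ssd for g in groups):
            continue
        groups.append(candidate)
        for i in candidate:
            use_count[i] += 1
    return [[pairs[i] for i in g] for g in groups]
```

With MRF = 2 and SSD = 1 (the values used later in Section 7.3.1), each ciphertext pair appears in at most two groups and any two groups share at most one pair, which keeps the sampled groups close to independent while still reusing data.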


7.2.2. Experiments on Speck32/64
To prove that our ND applies to NASA, we perform experiments on Speck32/64.
Our new ND achieves higher accuracy than the ND proposed by Gohr. Since the data complexity of NASA is related to the accuracy of the ND, it is possible to reduce the data complexity of NASA by adopting our new ND.
Experiment settings. We adopt a 2-round differential ΔP = 0x211/0xa04 → α = 0x40/0x0 with probability p0 = 2^−6 as the prepended differential. Let β0 = 0.005, β1 = 2^−16, c2 = 0.5. The meaning of these parameters is defined in Section 3.3. Since c2 is set, the values of p1, p3 are experimentally estimated based on the NDs.
The estimation of p2 is complex. Let p2|d denote the estimated value of p2 where d is the Hamming distance between the correct key tk and wrong keys kg. According to the introduction in [2], when d increases, p2|d will decrease. Moreover, when p2 increases, the data complexity of NASA also increases. Thus, if we hope the Hamming distance between tk and the surviving kg does not exceed d, the value of p2 is

p2 = max{ p2|i | i ∈ [d + 1, 16] }.    (19)

In this paper, we choose two different settings: d = 2 and d = 1.
Comparison of data complexity. Tables 15 and 16 show the comparison of the data complexity under the two experiment settings respectively.
The second row corresponds to the data complexity when Gohr's ND is adopted. These results are also used as the baseline. When an r-round ND with a group size of k is adopted, the corresponding data complexity is displayed in bold if it is smaller than the baseline.

TABLE 15. Data complexity comparisons when p0 = 2^−6, d = 2, β0 = 0.005, β1 = 2^−16, c2 = 0.5. The prepended differential is a 3-round differential that is extended from 0x211/0xa04 → 0x40/0x0 (probability p0) without loss of transition probability.
Distinguisher | log2 N (r = 5) | log2 N (r = 6) | log2 N (r = 7)
NDk=1 | 14.212 | 16.911 | 20.509
NDk=2 | 13.165 | 15.821 | 19.761
NDk=4 | 12.901 | 14.764 | 18.886
NDk=8 | 12.19 | 14.681 | 18.744
NDk=16 | 13.107 | 14.720 | 20.215
The bold values show that the data complexity is reduced when our new ND is used.

TABLE 16. Data complexity comparisons when p0 = 2^−6, d = 1, β0 = 0.005, β1 = 2^−16, c2 = 0.5. The prepended differential is a 3-round differential that is extended from 0x211/0xa04 → 0x40/0x0 (probability p0) without loss of transition probability.
Distinguisher | log2 N (r = 5) | log2 N (r = 6) | log2 N (r = 7)
NDk=1 | 14.72 | 17.456 | 21.081
NDk=2 | 13.979 | 16.484 | 20.349
NDk=4 | 14.187 | 15.606 | 19.49
NDk=8 | 14.085 | 15.750 | 19.38
NDk=16 | 15.399 | 16.361 | 20.949
The bold values show that the data complexity is reduced when our new ND is used.

We test 12 new NDs in total. Tables 15 and 16 show that the data complexity is reduced in most cases. There is only one case in which the data complexity is not reduced.
Analysis of the data complexity. There are two questions to be explained: (1) why does the accuracy improvement of the NDs bring the reduction of the data complexity? (2) why is the data complexity not reduced in the only failed case shown in Table 16?
To answer the first question, we need to analyze how the data complexity is influenced by p1, p3. Based on Equation (8) in Section 3.3, we get the two following conclusions:
• when p1 (p1 ≥ 0.5) increases, the data complexity N decreases;
• when p3 (p3 ≤ 0.5) decreases, the data complexity N decreases.
During the training of NDs, the accuracy can be formulated as

acc = 0.5 × (TPR + TNR)

where TPR is the true positive rate and TNR is the true negative rate.
If we set c2 = 0.5, the following conclusions hold

TPR = p1, TNR = 1 − p3, acc = 0.5 × (p1 + 1 − p3).    (20)

Thus, when the accuracy acc of the NDs increases, there are three phenomena: p1 increases, or p3 decreases, or the former two phenomena both occur.
No matter which phenomenon occurs, it is helpful for reducing the data complexity. This is why the data complexity is reduced in most cases shown in Tables 15 and 16.
To answer the second question, we need to consider the impact of p2. For convenience, we summarize the values of p1, p2, p3 related to the 5-round NDs in Table 17.

TABLE 17. The values of p1, p2, p3 related to the 5-round NDs when c2 = 0.5, d = 1, r = 5, p0 = 2^−6.
Distinguisher | p1 | p2 | p3 | log2 N
NDk=1 | 0.8977 | 0.3335 | 0.0462 | 14.72
NDk=2 | 0.9665 | 0.4802 | 0.0185 | 13.979
NDk=4 | 0.9894 | 0.6927 | 0.0069 | 14.187
NDk=8 | 0.9988 | 0.8604 | 0.0007 | 14.085
NDk=16 | 0.9999 | 0.9672 | 1.92 × 10^−5 | 15.399
The bold values show that the data complexity is reduced when our new ND is used.

The value of p2 also increases as shown in Table 17. Chen et al. [2] presented that the impact of p1, p2 on N is O((p1 − p2)^−2). Therefore, the increase of p2 has a negative impact on the data complexity. If p2 is very close to p1, the positive impact of the accuracy improvement may be offset. This is why the data complexity is not reduced when the 5-round NDk=16 is adopted. Actually, when p0 becomes smaller, the reduction of the data complexity is more significant. Table 18 shows an example.

TABLE 18. The values of p1, p2, p3 related to the 5-round NDs when c2 = 0.5, d = 1, r = 5, p0 = 2^−12.
Distinguisher | p1 | p2 | p3 | log2 N
NDk=1 | 0.8977 | 0.3335 | 0.0462 | 26.657
NDk=2 | 0.9665 | 0.4802 | 0.0185 | 25.811
NDk=4 | 0.9894 | 0.6927 | 0.0069 | 25.834
NDk=8 | 0.9988 | 0.8604 | 0.0007 | 24.888
NDk=16 | 0.9999 | 0.9672 | 1.92 × 10^−5 | 24.021
The bold values show that the data complexity is reduced when our new ND is used.


Practical experiments. Based on the attack settings shown in Table 15, we perform NASA against 10-round Speck32/64 based on NDk=1 and NDk=2 (r = 6) respectively. The target is to recover sk10. Since d = 2, the number of surviving subkey guesses should not exceed 137 × (1 − β0) + (2^16 − 137) × β1 = 137.31.
Since the data complexity presented in Table 15 is not low, the attack may take too much time. We adopt an optimization method proposed in [2] to accelerate this attack. This method builds a student distinguisher to reduce the key space to be searched. The student distinguisher is built over 14 ciphertext bits {30 ∼ 23, 14 ∼ 7}. Then in the first stage, we guess 8 subkey bits sk10[8 ∼ 0]. In the second stage, we guess the complete sk10 based on the surviving guesses of sk10[8 ∼ 0].
To filter sk10[8 ∼ 0], the student distinguisher with k = 1 requires 2^18.888 plaintext pairs. In the second stage, we select N = 2^16.911 plaintext pairs from the 2^18.888 plaintext pairs. When we perform NASA with Gohr's 6-round distinguishers 100 times, the results are
1. the true subkey sk10 survives in 97 trials;
2. the average numbers of surviving subkey guesses in the two stages are 14.98 and 15.16, respectively;
3. in all the 100 trials, the number of surviving subkey guesses is lower than 137.31.
To filter sk10[8 ∼ 0], the student distinguisher with k = 2 requires 2^17.785 plaintext pairs. In the second stage, we select N = 2^15.821 plaintext pairs from the 2^17.785 plaintext pairs. When we perform NASA with our 6-round NDk=2 100 times, the results are
1. the true subkey sk10 survives in 90 trials;
2. the average numbers of surviving subkey guesses in the two stages are 11.82 and 25.07, respectively;
3. in all the 100 trials, the number of surviving subkey guesses is lower than 137.31.
Figure 4 shows the runtime comparison of the 200 experiments. The practical experiments further prove that our new NDs can be applied to NASA. Besides, with a smaller data complexity, the NASA based on our ND achieves a competitive result.

FIGURE 4. The runtime of 100 experiments. When Gohr's 6-round distinguisher NDk=1 is used, the average runtime of NASA is 612 seconds. When our 6-round distinguisher NDk=2 is used, the average runtime of NASA is 492 s.

7.3. Application to Gohr's Attack

Gohr's attack is not directly related to the distinguishing accuracy of NDs. Thus, we mainly verify whether our new ND applies to Gohr's attack.
In [1], Gohr performed a key recovery attack on 11-round Speck32/64. In this section, we first perform the same attack using our new NDk=2. Then we present a deeper discussion.

7.3.1. Key recovery attack on 11-round Speck32/64
The target of this attack is to recover the last two subkeys (sk11, sk10). This attack returns a pair of subkey guesses (kg11, kg10). If kg11 = sk11 and kg10 is different from sk10 in at most 2 bits, this attack is viewed as a success [1].
Experiment settings. A 6-round and a 7-round NDk=2 are built over α = (0x40, 0x0). A prepended 3-round differential is extended from the 2-round differential ΔP = (0x211, 0xa04) → α = (0x40, 0x0) with probability p0 = 2^−6. Six neutral bits {14, 15, 20, 21, 22, 23} are used to generate plaintext structures consisting of 64 plaintext pairs. The data reuse strategy is also adopted by letting MRF = 2 and SSD = 1.
The whole attack is performed as
1. Randomly generate 100 plaintext pairs with a difference ΔP.
2. Generate 100 plaintext structures using the 6 neutral bits above, and collect the corresponding ciphertext structures.
3. For each ciphertext structure:
   (a) Collect possible kg11 using the method introduced in Section 3.2.
   (b) For each possible kg11:
       i. Decrypt the current ciphertext structure with kg11.
       ii. Collect possible subkey guess pairs (kg11, kg10) using the method introduced in Section 3.2.
4. Return the surviving (kg11, kg10) with the highest rank score as the final subkey guess.
In Section 3.2, we have reviewed how Gohr's attack recovers the subkey sk_{r+1} with an r-round ND. This method needs a rank score threshold. In steps 3a and 3(b)ii, we need thresholds c3, c4 respectively. In this paper, let c3 = 18 and c4 = 150.
Experiment results. We run 1000 experiments each time, and repeat this 5 times. These experiments based on Gohr's distinguishers NDk=1 were also performed using the same ciphertexts. Table 19 summarizes the success rates.

TABLE 19. Success rates of performing 1000 experiments (Gohr's attack), repeated 5 times. The first row represents the success rate when Gohr's distinguishers are used. The second row represents the success rate when our new distinguishers with k = 2 are used.
Distinguisher | 1 | 2 | 3 | 4 | 5
NDk=1 | 0.533 | 0.52 | 0.501 | 0.557 | 0.523
NDk=2 | 0.536 | 0.534 | 0.512 | 0.552 | 0.529


7.3.2. Posterior probability analysis
We have proved that our ND applies to Gohr's attack. Moreover, the attack based on our NDs shows a minor advantage in terms of the success rate. This minor advantage is interesting since the success rate of Gohr's attack is not directly determined by the distinguishing accuracy. To better understand the influence of the accuracy improvement on Gohr's attack, we perform a deeper analysis from the perspective of the key rank score.
Consider an (r + 1)-round cipher E. We first build an r-round ND based on a difference α. Then we collect numerous ciphertext pairs corresponding to plaintext pairs with a difference α. We decrypt these ciphertext pairs with a subkey guess kg and feed the partially decrypted ciphertext pairs into the ND.
Let tk denote the true subkey of the (r + 1)-th round. Besides, the Hamming distance between tk and kg is d. We focus on the expectation of the following conditional posterior probability

Z = Pr(Y = 1 | X, d) = F(X)    (21)

where X is the input of the ND, and F is the ND. If the ND is Gohr's distinguisher, X is a decrypted ciphertext pair. If the ND is our distinguisher NDk, X is a ciphertext group consisting of k decrypted ciphertext pairs.
Taking NDk=2 against Speck32/64 reduced to 6, 7 rounds as examples, we estimate the expectations of the above conditional posterior probability. As a comparison, we also estimate the expectations based on NDk=1. The final estimation results are shown in Figures 5 and 6.

FIGURE 5. The expectations of the conditional posterior probability (Equation (21)) of 6-round NDs against Speck32/64.

FIGURE 6. The expectations of the conditional posterior probability (Equation (21)) of 7-round NDs against Speck32/64.

There are two important phenomena. First, compared with Gohr's distinguishers NDk=1, our distinguishers NDk=2 bring higher expectations Pr(Y = 1 | X, d = 0). Second, the value of Pr(Y = 1 | X, d = 0) − Pr(Y = 1 | X, d = i), i ∈ [1, 16], increases.
The first phenomenon means that a large key rank score threshold (e.g. c3 = 18, c4 = 150) is applicable. The second phenomenon makes the gap between the rank score of the true key and that of wrong keys increase. By setting a high key rank score threshold, wrong keys are less likely to obtain a key rank score higher than the threshold. Thus, a higher success rate is more likely to be obtained by replacing Gohr's NDs with our NDs.

TABLE 20. Distinguishing accuracy of NDs against Speck32/64 under the fair setting. If v > 0 (see Formula (22)), the prediction label is 1.
r / n | NDk=1 | NDk=2, m = n/2 | NDk=4, m = n/2
6 / 8 | 0.987 | 0.9853 | 0.9873
6 / 10 | 0.9947 | 0.9934 | 0.9942
7 / 8 | 0.778 | 0.7579 | 0.7636

8. OPEN PROBLEMS

Our work in this paper raises some open problems:
• What features derived from multiple ciphertext pairs are learned by our distinguishers?
• The influence of features derived from multiple ciphertext pairs is rather complex. More exactly, except for its positive influence, we find that these features also have a negative influence. For example, when we compare the distinguishing accuracy of NDs under a fair setting (see Section 6.1.1), if we give the prediction label based on the following metric:

  v = log( Z1 / (1 − Z1) ) + · · · + log( Zm / (1 − Zm) ),    (22)

  where Zi, i ∈ [1, m] is the output of the NDs, our distinguishers have a tiny or no advantage in terms of the distinguishing accuracy. Table 20 shows our experiment


results based on the above metric. Thus, an important problem is how to make full use of these features and bring a more significant positive influence?
These problems are out of the scope of this paper. We will explore them in future research.

9. CONCLUSIONS

In this paper, we focus on the ND, which is the core module in neural-aided cryptanalysis. By considering multiple ciphertext pairs simultaneously, we propose a new ND and have performed a deep exploration of it. Compared with the ND considering a single ciphertext pair, this new ND achieves higher distinguishing accuracy, which is verified by applications to five different ciphers. Moreover, we prove that the accuracy improvement results from features derived from multiple ciphertext pairs.
Our new ND also applies to key recovery attacks. We show how to perform two different key recovery attacks based on our new ND. The first one is the NASA. Due to the accuracy improvement, the data complexity of NASA is reduced by adopting our new neural distinguisher. A data reuse strategy is proposed to strengthen this advantage. The second one is the key recovery attack proposed by Gohr at CRYPTO'19. Our new neural distinguisher applies to this attack but does not bring a significant positive influence, since this attack is not related to the distinguishing accuracy.
Our new ND is full of potential. In the future, as long as neural-aided key recovery attacks are related to the performance of the ND, our new ND could be a priority choice. Besides, our neural distinguisher also introduces a novel cryptanalysis direction by considering multiple ciphertext pairs simultaneously.

DATA AVAILABILITY

The data underlying this article are available at https://github.com/AI-Lab-Y/ND_mc.

FUNDING

National Key Research and Development Program of China (2018YFB0803405, 2017YFA0303903).

SUPPLEMENTARY MATERIAL

Supplementary material is available at www.comjnl.oxfordjournals.org.

REFERENCES

[1] Gohr, A. (2019) Improving attacks on round-reduced Speck32/64 using deep learning. In Boldyreva, A., Micciancio, D. (eds) Advances in Cryptology - CRYPTO 2019 - 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 18-22, 2019, Proceedings, Part II, Lecture Notes in Computer Science (Vol. 11693), pp. 150–179. Springer.
[2] Chen, Y. and Yu, H. (2020) Neural aided statistical attack for cryptanalysis. IACR Cryptol. ePrint Arch., 2020, 1620.
[3] Jain, A., Kohli, V. and Mishra, G. (2020) Deep learning based differential distinguisher for lightweight cipher PRESENT. IACR Cryptol. ePrint Arch., 2020, 846.
[4] Yadav, T. and Kumar, M. (2020) Differential-ML distinguisher: Machine learning based generic extension for differential cryptanalysis. IACR Cryptol. ePrint Arch., 2020, 913.
[5] Bellini, E. and Rossi, M. (2020) Performance comparison between deep learning-based and conventional cryptographic distinguishers. IACR Cryptol. ePrint Arch., 2020, 953.
[6] Pareek, M., Mishra, G. and Kohli, V. (2020) Deep learning based analysis of key scheduling algorithm of PRESENT cipher. IACR Cryptol. ePrint Arch., 2020, 981.
[7] Baksi, A., Breier, J., Chen, Y. and Dong, X. (2021) Machine learning assisted differential distinguishers for lightweight ciphers. In Design, Automation & Test in Europe Conference & Exhibition, DATE 2021, Grenoble, France, February 1-5, 2021, pp. 176–181. IEEE.
[8] Hou, Z., Ren, J. and Chen, S. (2021) Cryptanalysis of round-reduced SIMON32 based on deep learning. IACR Cryptol. ePrint Arch., 2021, 362.
[9] Benamira, A., Gérault, D., Peyrin, T. and Tan, Q.Q. (2021) A deeper look at machine learning-based cryptanalysis. In Canteaut, A., Standaert, F. (eds) Advances in Cryptology - EUROCRYPT 2021 - 40th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Zagreb, Croatia, October 17-21, 2021, Proceedings, Part I, Lecture Notes in Computer Science (Vol. 12696), pp. 805–835. Springer.
[10] Ankele, R. and Kölbl, S. (2018) Mind the gap - A closer look at the security of block ciphers against differential cryptanalysis. In Cid, C., Jacobson Jr., M.J. (eds) Selected Areas in Cryptography - SAC 2018 - 25th International Conference, Calgary, AB, Canada, August 15-17, 2018, Revised Selected Papers, Lecture Notes in Computer Science (Vol. 11349), pp. 163–190. Springer.
[11] Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B. and Wingers, L. (2015) The SIMON and SPECK lightweight block ciphers. In Proceedings of the 52nd Annual Design Automation Conference, San Francisco, CA, USA, June 7-11, 2015, pp. 175:1–175:6. ACM.
[12] Mouha, N., Mennink, B., Herrewege, A.V., Watanabe, D., Preneel, B. and Verbauwhede, I. (2014) Chaskey: An efficient MAC algorithm for 32-bit microcontrollers. In Joux, A., Youssef, A.M. (eds) Selected Areas in Cryptography - SAC 2014 - 21st International Conference, Montreal, QC, Canada, August 14-15, 2014, Revised Selected Papers, Lecture Notes in Computer Science (Vol. 8781), pp. 306–323. Springer.
[13] Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J.B., Seurin, Y. and Vikkelsoe, C. (2007) PRESENT: an ultra-lightweight block cipher. In Paillier, P., Verbauwhede, I. (eds) Cryptographic Hardware and Embedded Systems - CHES 2007, 9th International Workshop, Vienna, Austria, September 10-13,


2007, Proceedings, Lecture Notes in Computer Science (Vol. 4727), pp. 450–466. Springer.
[14] Coppersmith, D., Holloway, C.L., Matyas, S.M. and Zunic, N. (1997) The data encryption standard. Inf. Secur. Tech. Rep., 2, 22–24.
[15] Huang, S., Wang, X., Xu, G., Wang, M. and Zhao, J. (2017) Conditional cube attack on reduced-round Keccak sponge function. In Coron, J., Nielsen, J.B. (eds) Advances in Cryptology - EUROCRYPT 2017 - 36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, April 30 - May 4, 2017, Proceedings, Part II, Lecture Notes in Computer Science (Vol. 10211), pp. 259–288. Springer.
[16] He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778. IEEE Computer Society.
[17] Biham, E. and Chen, R. (2004) Near-collisions of SHA-0. In Franklin, M.K. (ed) Advances in Cryptology - CRYPTO 2004, 24th Annual International Cryptology Conference, Santa Barbara, California, USA, August 15-19, 2004, Proceedings, Lecture Notes in Computer Science (Vol. 3152), pp. 290–305. Springer.
[18] Lee, J., Heo, M., Kim, K. and Kim, C. (2018) Single-image depth estimation based on Fourier domain analysis. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 330–339. Computer Vision Foundation / IEEE Computer Society.
[19] Schüldt, C., Laptev, I. and Caputo, B. (2004) Recognizing human actions: A local SVM approach. In 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, August 23-26, 2004, pp. 32–36. IEEE Computer Society.
[20] Tosi, F., Aleotti, F., Poggi, M. and Mattoccia, S. (2019) Learning monocular depth estimation infusing traditional stereo knowledge. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 9799–9809. Computer Vision Foundation / IEEE.
[21] Chen, Y., Yu, L., Ota, K. and Dong, M. (2019) Hierarchical posture representation for robust action recognition. IEEE Trans. Comput. Soc. Syst., 6, 1115–1125.
[22] Kingma, D.P. and Ba, J. (2015) Adam: A method for stochastic optimization. In Bengio, Y., LeCun, Y. (eds) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.
[23] Chollet, F. et al. (2015) Keras. https://github.com/fchollet/keras.
[24] Abed, F., List, E., Lucks, S. and Wenzel, J. (2014) Differential cryptanalysis of round-reduced Simon and Speck. In Cid, C., Rechberger, C. (eds) Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014, Revised Selected Papers, Lecture Notes in Computer Science (Vol. 8540), pp. 525–545. Springer.
[25] Roelofs, R., Shankar, V., Recht, B., Fridovich-Keil, S., Hardt, M., Miller, J. and Schmidt, L. (2019) A meta-analysis of overfitting in machine learning. In Wallach, H.M., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E.B., Garnett, R. (eds) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 9175–9185.
[26] Wang, M. (2008) Differential cryptanalysis of reduced-round PRESENT. In Vaudenay, S. (ed) Progress in Cryptology - AFRICACRYPT 2008, First International Conference on Cryptology in Africa, Casablanca, Morocco, June 11-14, 2008, Proceedings, Lecture Notes in Computer Science (Vol. 5023), pp. 40–49. Springer.
[27] Biham, E. and Shamir, A. (1991) Differential cryptanalysis of DES-like cryptosystems. J. Cryptol., 4, 3–72.
