Acoustic Triangulation Attack
A FINAL YEAR PROJECT REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF INFORMATION ENGINEERING DEPARTMENT OF INFORMATION ENGINEERING THE CHINESE UNIVERSITY OF HONG KONG May, 2006
Abstract
We examine the problem of keyboard acoustic emanations, an important area of side channel attacks. After reviewing the two previous approaches, proposed by researchers at IBM and by Professor J. D. Tygar's group, we propose a new and more powerful approach for conducting the attack: the time-difference approach. Instead of one microphone, we use two microphones and simple triangulation techniques. We believe we are the first to propose this approach, and we plan to submit our findings to IEEE Transactions on Information Forensics and Security. Our approach does not need complicated techniques or specialist knowledge of sound processing, and it could be easier and more powerful than the two previous approaches. In this project we study the feasibility and properties of the new approach. We choose three to five of the 104 keys on the keyboard and study the performance of our approach through several kinds of simple experiments; we achieved a recognition rate of up to 80% in a noisy environment using the simplest equipment and algorithms.
Contents
Abstract
Contents
Chapter 1 Introduction
1.1 Overview
1.2 Side Channel Attacks
1.3 Keyboard Acoustic Triangulation & Emanation Attacks
1.4 The Threat of the Attack
1.5 Our Project
1.6 Comparison of the three approaches of the attack
Chapter 2 Background
2.1 Preliminaries
2.2 The Approach of Dmitri Asonov and Rakesh Agrawal
2.2.1 The hypothesis of their approach
2.2.2 Procedures of their approach
2.2.3 Results
2.2.4 Limitations of their approach
2.3 The Approach of Li Zhuang, Feng Zhou, J. D. Tygar
2.3.1 The hypothesis of their approach
2.3.2 Procedures of their approach
2.3.3 Results
2.3.4 Discussion
2.4 Our proposed attack, the time-difference approach
2.4.1 Hypothesis
2.4.2 Procedures of the attack
2.4.3 Expected Advantages of our approach over previous approaches
2.4.4 Feasibility study of our new approach
Chapter 3 Project Description
3.1 Progress Report and Task Distributions
3.2 The time-difference approach
3.2.1 Hypothesis
3.2.2 The expected value of received-time-difference
3.2.3 Experiments of studying our new approach
3.2.3.1 The experimental set up
3.2.3.2 The general procedure of our experiments
3.2.3.3 Expected Experimental Error
3.2.3.4 Brief of different types of experiments
3.2.3.4.1 Experiment type 1
3.2.3.4.2 Experiment type 2
3.2.3.5 Brief summary of our findings
3.2.3.6 The more advanced classification and recognition means
3.2.4 Suggestion of defending methods
Chapter 1 Introduction
1.1 Overview
Researchers at the IBM Almaden Research Center have shown that PC keyboards, notebook keyboards, telephone pads and ATM pads are vulnerable to acoustic emanation attacks. The keyboard acoustic emanations attack was brought to attention again by researchers at the University of California, who found another way to turn the clicks and clacks of typing on a computer keyboard into a startlingly accurate transcript of what is being typed: by analyzing just ten minutes of recorded typing, they pieced together as much as 96 percent of the typed text. To raise awareness of how serious and harmful this kind of attack is, our project shows how easily the attack can be conducted. We studied the procedures of the approaches proposed by previous researchers, analyzed the processing techniques they use, and drew ideas from techniques used in speech recognition. Starting from an initial idea given by our supervisor, we proposed a new approach to the keyboard acoustic emanations attack, called the time-difference approach, and demonstrated its feasibility by experiments. We believe we are the first to propose this approach, and that our findings contribute to the understanding of this security issue. With the help of our tutor, we plan to submit our findings to IEEE Transactions on Information Forensics and Security.
In this report, Chapter 2 briefly reviews the two previous approaches and introduces our approach. Chapter 3 presents the details of our approach, including the hypothesis, computational methods, experimental procedures and our major findings. We discuss future development in Chapter 4 and conclude with a summary in Chapter 5. Before going into the details, we first give a short description of the research topic.
Keyboard acoustic attacks belong to the category of side channel attacks. Side channel attacks are attacks based on side channel information [3], that is, information that can be retrieved from an encryption device and is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process. The underlying principle of this kind of attack is that physical effects caused by the operation of a cryptosystem (on the side) can provide useful extra information about secrets in the system. Side channel attacks are an emerging security issue in cryptography, and they are receiving increasing attention because of the threat they pose. They are particularly dangerous because the attacker does not need to know how the system operates internally in order to perform the attack. Besides keyboard acoustic attacks, other common types of side channel attacks include timing attacks, architectural side-effect attacks, power monitoring attacks and acoustic cryptanalysis.
Figure 1.2 An example of using the keyboard acoustic emanations attack to commit crime (microphones X and Y)
"It is only a matter of time before criminals begin using similar techniques!" said Bruce Schneier, chief technology officer at Counterpane Internet Security and author of Applied Cryptography.
Chapter 2 Background
2.1 Preliminaries
This chapter introduces the two previous approaches and our proposed approach. It is divided into three parts:
1. A brief introduction of the approach proposed by Dmitri Asonov and Rakesh Agrawal of the IBM Almaden Research Center
2. A brief introduction of the approach proposed by Li Zhuang, Feng Zhou and J. D. Tygar of the University of California
3. A brief introduction of our proposed approach, the time-difference approach
Fig 2.1 A graph showing why the sounds of key clicks differ from each other
2.2.3 Results
They achieved a relatively high recognition rate (approximately 80%). However, this rate could only be achieved when the neural networks were trained with text-labeled sound samples recorded from the same keyboard and typed by the same person [1] [2].
Figure 2.2 Overview of the approach of Li Zhuang, Feng Zhou and J. D. Tygar
The first phase of their approach (Figure 2.2(a)):
1. Collect data: They recorded a user typing English text.
2. Feature extraction: They located the successive keystrokes in the sound recordings. To extract features, they used cepstrum features [9] [20], which gave better results than FFT features.
3. Unsupervised key recognition: They clustered each keystroke extracted in step 2 (unlabeled training data) into one of K classes using standard data clustering methods, with K chosen to be slightly larger than the number of keys on the keyboard. Once the conditional distributions of the classes were determined, they found the most likely sequence of keys given the sequence of classes of the keystrokes, using a Hidden Markov Model (HMM) [21]. An HMM models a stochastic process with hidden state; here it captures the correlation between keys typed in sequence. For example, if the current key could be either "h" or "j" (e.g. because they are physically close on the keyboard) and the previous key is known to be "t", then the current key is more likely to be "h", because "th" is more common than "tj". Using these correlations, both the keys and the key-to-class mapping distributions were efficiently estimated with standard HMM algorithms. This step yielded character accuracy rates of slightly over 60%, which in turn yielded word accuracy rates of over 20% [2].
4. Spelling and grammar checking: They used dictionary-based spelling correction and a simple statistical model of English grammar, combined in a single Hidden Markov Model. This increased the character accuracy rate to over 70%, yielding a word accuracy rate of about 50% or more. At this point, the text was quite readable [2].
5. Feedback-based training: They used the previously corrected results as labeled training samples. Since the corrected results were not 100% correct, they used heuristics to select the words that were more likely to be correct [2].
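The "th" versus "tj" reasoning in step 3 is what HMM decoding does. The sketch below is a toy Viterbi decoder, not Zhuang et al.'s implementation: the three-key alphabet, the class labels "c1"/"c2" and every probability are hypothetical numbers chosen only to reproduce the example, whereas the real attack estimates these distributions from the recording.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden key sequence for a sequence of observed classes.

    start_p[s], trans_p[prev][s] and emit_p[s][obs] are probabilities (all non-zero);
    scores are kept in log space to avoid underflow on long recordings.
    """
    # best[s] = (log-probability of the best path ending in key s, that path)
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # pick the predecessor key that maximises the path score into s
            score, path = max(
                (best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]

# Hypothetical toy model: class "c1" sounds like "t"; class "c2" is ambiguous
# between the neighbouring keys "h" and "j".
states = ["t", "h", "j"]
start_p = {"t": 0.6, "h": 0.2, "j": 0.2}
trans_p = {  # bigram model: "th" is far more common than "tj"
    "t": {"t": 0.05, "h": 0.9, "j": 0.05},
    "h": {"t": 0.4, "h": 0.2, "j": 0.4},
    "j": {"t": 0.4, "h": 0.4, "j": 0.2},
}
emit_p = {
    "t": {"c1": 0.9, "c2": 0.1},
    "h": {"c1": 0.1, "c2": 0.9},
    "j": {"c1": 0.1, "c2": 0.9},
}
print(viterbi(["c1", "c2"], states, start_p, trans_p, emit_p))
```

Because "h" and "j" emit class "c2" equally, only the bigram probabilities break the tie, and the decoder picks "h" after "t".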
2.3.3 Results
They recovered the typed text with an accuracy of up to 96% of typed characters.
2.3.4 Discussion
The approach of Li Zhuang, Feng Zhou and J. D. Tygar was much more general, and more serious in its implications, than that of Dmitri Asonov and Rakesh Agrawal, because it did not need a labeled training recording. Attackers could put it to use in the real world with just a cheap microphone and a computer. Moreover, a recognizer bootstrapped in this way could even recognize random text such as passwords [2]. The ease and low cost of their approach showed the great danger of the keyboard acoustic emanations attack. However, its techniques and algorithms are difficult for people without much computing and mathematical knowledge. In addition, it needs a large number of unlabeled samples, so the high recognition rate can only be achieved when the typed text is long. To show further that the keyboard acoustic emanations attack is far more serious than is generally expected, we proposed a new approach, the time-difference approach, which is much easier and simpler than the approaches above.
2.4.1 Hypothesis
The hypothesis of the time-difference approach is based on the different locations of the keys on the keyboard or keypad. We use two microphones: when the user hits a key, the emitted sound travels through the air to both microphones. Since each key lies at a different pair of distances from the two microphones, we can tell which key has been pressed by computing the difference between the times at which the two microphones receive the keystroke (the received-time-difference).
Mathematical explanation: two microphones, microphone X and microphone Y, are placed as shown in Figure 2.3.
Figure 2.3 Description of our approach, the time-difference approach (microphones X and Y)
Let $D_{ix}$ be the distance between key i and microphone X, and $D_{iy}$ the distance between key i and microphone Y. Let V be the velocity of sound waves in air. When key i is pressed, the sound waves produced are received by both microphones. Let x(t) be the sound wave received by microphone X and y(t) the sound wave received by microphone Y. Then
$$x(t) = y(t - t_i), \qquad t_i = \frac{D_{ix} - D_{iy}}{V}$$
where $D_{ix} - D_{iy}$ is the difference between the distances from key i to the two microphones, and different keys have different values of $D_{ix} - D_{iy}$. Since the velocity of sound in air is constant, we can tell which key has been pressed from the received-time-difference $t_i$ of the two microphones.
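As a minimal sketch of the model $t_i = (D_{ix} - D_{iy})/V$, the function below computes $t_i$ from planar coordinates. The microphone spacing and key coordinates are hypothetical values for illustration, not measurements from our set-up; V = 344 m/s is the value used later in this report.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, the value used later in this report

def received_time_difference(key_pos, mic_x, mic_y, v=SPEED_OF_SOUND):
    """t_i = (D_ix - D_iy) / V for a key at key_pos and microphones at mic_x, mic_y.

    Positions are (x, y) coordinates in metres. A positive result means the
    sound reaches microphone X later than microphone Y.
    """
    d_ix = math.dist(key_pos, mic_x)
    d_iy = math.dist(key_pos, mic_y)
    return (d_ix - d_iy) / v

# Hypothetical layout (metres): microphones 0.5 m apart, three keys in one row
mic_x, mic_y = (0.0, 0.0), (0.5, 0.0)
keys = {"left": (0.10, 0.10), "middle": (0.25, 0.10), "right": (0.40, 0.10)}
for name, pos in keys.items():
    t_i = received_time_difference(pos, mic_x, mic_y)
    print(f"{name}: {t_i * 1e6:.1f} microseconds")
```

Keys that share the same value of $D_{ix} - D_{iy}$ lie on one hyperbola with the microphones as foci (for example, every key on the perpendicular bisector of the microphones gives $t_i = 0$), so $t_i$ alone cannot separate such keys.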
Moreover, the previous approaches require substantial mathematical and signal-processing knowledge, such as FFT and cepstrum analysis. By contrast, our approach is easy to understand and implement for anyone with basic knowledge of the physics of sound.
The following time chart shows the progress of our project:
[Time chart, Sept 2005 to Apr 2006: literature reviews & speech recognition techniques study; keystroke analysis; Thesis I; experiments; Thesis II]
The following table shows our individual contributions to the project:
[Table: My contributions / My Partner's contributions for each task. Literature review: 1. previous research review; 2. speech recognition techniques study. Feasibility study of the time-difference approach: 1. recording experiments; 2. program feature extraction; 3. program signal classification; 4. results analysis]
Figure 3.1 Our proposed approach, the time-difference approach (microphones X and Y)
3.2.1 Hypothesis
Recall from Chapter 2: let $D_{ix}$ be the distance between key i and microphone X, $D_{iy}$ the distance between key i and microphone Y, x(t) the sound wave received by microphone X, y(t) the sound wave received by microphone Y, and V the velocity of sound in air. Then
$$x(t) = y(t - t_i), \qquad t_i = \frac{D_{ix} - D_{iy}}{V}$$
By finding the value of $t_i$, we can tell which key has been pressed.
Thus the minimum value of $t_i$ is
$$t_{i,\min} = \frac{\min\,(D_{ix} - D_{iy})}{V} = \frac{0.04\ \text{m}}{344\ \text{m/s}} \approx 116.3\ \mu\text{s},$$
which corresponds to about 11.2 samples at a sampling rate of 96,000 samples per second.
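The unit conversion above can be checked numerically. This sketch only reuses the three numbers from the text (0.04 m, 344 m/s, 96,000 samples/s); it confirms that the delay is on the order of microseconds and about eleven samples.

```python
SPEED_OF_SOUND = 344.0   # m/s, as in the text
SAMPLE_RATE = 96_000     # samples per second, as in the text
MIN_PATH_DIFF = 0.04     # m, the least value of (D_ix - D_iy) used in the text

t_min = MIN_PATH_DIFF / SPEED_OF_SOUND   # seconds
samples = t_min * SAMPLE_RATE            # delay expressed in samples

print(f"{t_min * 1e6:.1f} microseconds = {samples:.1f} samples")
```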
Figure 3.2 Our experimental set-up (microphones X and Y)
Equipment
Microphones: multimedia condenser microphones; sensitivity -42 dB (0 dB = 1 V/Pa, 1 kHz); frequency response 20 Hz to 20 kHz; impedance 2000 ohm; cord length 120 cm; 3.5 mm stereo jack; price HK$20 each.
Keyboard: Dell PS/2 keyboard, P/N 7N242.
Software used
GoldWave: a professional digital audio editor. We used it to record the keystrokes and to pre-process the recordings for later computation.
Matlab: a high-performance language for technical computing that integrates computation, visualization and programming in an easy-to-use environment, where problems and solutions are expressed in familiar mathematical notation. We developed a simple Matlab program to compute the received-time-difference and to calculate the recognition rates of our approach.
3.2.3.2 The general procedures of our experiments
1. Use two microphones to record the keystrokes.
2. Pre-process the recorded sounds with GoldWave.
3. Extract the keystrokes from the two recorded sounds.
4. Compute the received-time-differences between the two microphones in Matlab.
5. Output the results.
1. Using two microphones to record the keystrokes
Synchronization of microphones X and Y
To keep the set-up simple, we used two separate microphones without synchronizing them, because the lack of synchronization does not affect the results.
Proof: In our experiments, microphones X and Y were not synchronized; for example, microphone X may start recording at time 0 s while microphone Y starts at 0.5 s. Let $t_s$ be the difference between the starting times of the two recordings. For a particular keystroke of key i, the computed received-time-difference is then
$$\hat{t}_i = t_s + t_i$$
Since $t_s$ is the same for all the keystrokes in the recording, the computed received-time-difference still differs between keys because of $t_i$, so we can still recognize the key from it.
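The argument can be illustrated with a few lines of code. This sketch assumes, as the proof does, that $t_s$ stays constant throughout a recording (for instance, that both channels share one sample clock); the per-key values of $t_i$ are hypothetical.

```python
def measured_differences(true_tdoa, start_offset):
    """Computed received-time-difference = t_s + t_i for every key (in samples)."""
    return {key: start_offset + t_i for key, t_i in true_tdoa.items()}

# Hypothetical true received-time-differences t_i, in samples
true_tdoa = {"z": 270, "3": -250}

# Unknown recording start offsets t_s, in samples (48,000 samples = 0.5 s at 96 kHz)
for t_s in (0, 48_000, -15_000):
    measured = measured_differences(true_tdoa, t_s)
    # The gap between the two keys does not depend on t_s
    print(t_s, measured, measured["z"] - measured["3"])
```

A constant $t_s$ shifts every key's value by the same amount, so the gaps between keys, and therefore a classifier trained and tested on the same recording set-up, are unaffected.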
2. Pre-processing the recorded sounds
This stage applies some operations to the two recorded sounds to prepare them for the later computation. The two main operations are trimming the two recordings and filtering out noise with the default noise filter in GoldWave. For example, background noise was recorded along with the keystrokes, so it had to be reduced with GoldWave's noise reduction filter. After processing, the keystroke signal is clearly visible (Figure 3.3).
Figure 3.3 A keystroke signal before noise reduction (left) and after noise reduction (right)
3. Extracting the keystrokes
The recorded sound contains many keystrokes, so we need to extract them one by one for comparison. A typical keystroke contains pushing peaks and releasing peaks, corresponding to pressing and releasing the key. Extracting a keystroke means finding a starting point and an end point that enclose both the pushing peaks and the releasing peaks.
First, we read the two sound waves into Matlab. Then we scan the samples sequentially from the first point. If the absolute amplitude of the sound wave is greater than a preset threshold, it is regarded as the beginning of a keystroke. According to our observation, a key pressed with normal strength can usually be detected with a threshold of 0.1. The first point that satisfies this condition is taken as Point A, and the starting point is set to Point A minus 2000 points (Figure 3.4). This ensures that the complete beginning of the keystroke is included.
After the starting point is found, we look for the end point using a second threshold. We found that, after noise filtering, samples with absolute amplitude below 0.02 are generally not part of a keystroke signal, so the second threshold is set to 0.02. When there are 2000 successive points with amplitude below 0.02, we take the last point as Point B. The end point is Point B plus 2000 points, so that the complete keystroke is obtained (Figure 3.4).
Figure 3.4 The starting point and end point of a typical keystroke (pushing peak and releasing peak marked)
To compute the received-time-difference, we have to ensure that the starting point and end point enclose the keystroke in both recorded sound waves, as shown in Fig 3.5. We compare the two pairs of start and end points obtained from the two recordings to get a common pair: the earlier of the two start points is chosen as the common start point, and the later of the two end points is chosen as the common end point.
Figure 3.5 The selection of the common start and end point
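The threshold-based extraction and the common-window rule can be sketched as follows. This is a Python illustration, not our Matlab program. The thresholds 0.1 and 0.02 and the 2000-point margins come from the text; the reading of "the last point" is an assumption: Point B is taken here as the last sample at or above 0.02 before the first run of 2000 quiet samples.

```python
def find_keystroke(signal, start_thr=0.1, end_thr=0.02, quiet_run=2000, margin=2000):
    """Return (start, end) sample indices of the first keystroke in signal, or None.

    Assumption: Point B is the last sample at or above end_thr before the
    first run of quiet_run consecutive samples below end_thr.
    """
    # Point A: first sample whose absolute amplitude exceeds the start threshold
    a = next((i for i, s in enumerate(signal) if abs(s) > start_thr), None)
    if a is None:
        return None
    last_loud, quiet = a, 0
    for i in range(a, len(signal)):
        if abs(signal[i]) < end_thr:
            quiet += 1
            if quiet == quiet_run:      # keystroke has died away
                break
        else:
            last_loud, quiet = i, 0     # still inside the keystroke
    b = last_loud                       # Point B
    start = max(a - margin, 0)
    end = min(b + margin, len(signal) - 1)
    return start, end

def common_window(win_x, win_y):
    """Common pair: the earlier of the two starts and the later of the two ends."""
    return min(win_x[0], win_y[0]), max(win_x[1], win_y[1])

# Hypothetical clean keystroke, arriving 10 samples later on microphone Y
burst = [0.5, -0.3, 0.05, 0.03]
sig_x = [0.0] * 5000 + burst + [0.0] * 3000
sig_y = [0.0] * 5010 + burst + [0.0] * 2990
print(common_window(find_keystroke(sig_x), find_keystroke(sig_y)))
```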
4. Calculating the received-time-difference
There is noise in the experimental environment. Although the keystrokes recorded by the two microphones come from the same source, noise interference makes the two received waveforms differ greatly from each other, as shown in Figure 3.6. This noise makes it very difficult to compute the received-time-difference.
Figure 3.6 The signals received by the two microphones from the same keystroke (same source)
Since the first semester, we have tried different ways to compute the received-time-difference of a keystroke, and we have settled on two suitable methods: using the position of the maximum peak, and using correlation.
Using the maximum peak as the reference point
From some simple earlier experiments, we found that keystrokes contain some very sharp peaks. Assuming the noise is not large enough to shift the positions of the maximum peaks much, we can find the received-time-difference by comparing the positions of the maximum peaks of the keystrokes received by the two microphones.
Figure 3.7 Using the maximum peaks as reference points
For example, in Figure 3.7 the received-time-difference is T2 - T1. This method assumes that the noise does not shift the positions of the sharp peaks much. In practice, however, noise does shift the position of the maximum peak randomly, so the calculated received-time-differences are expected to vary.
By correlation of the received keystrokes
To reduce the effect of noise, we adopted a second way of calculating the received-time-differences: computing the correlation of the two received keystrokes.
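The maximum-peak method can be sketched in a few lines. Two assumptions are made here: "maximum peak" is read as the sample with the largest absolute amplitude, and the sign follows the convention $x(t) = y(t - t_i)$ of the hypothesis (peak of x minus peak of y), which may not match how T1 and T2 are labelled in Figure 3.7.

```python
def peak_index(signal):
    """Index of the sample with the largest absolute amplitude."""
    return max(range(len(signal)), key=lambda i: abs(signal[i]))

def tdoa_by_max_peak(x, y):
    """Received-time-difference in samples: peak position in x minus peak position in y."""
    return peak_index(x) - peak_index(y)

# Hypothetical extracted keystrokes: x is y delayed by 3 samples
y = [0, 0, 0.2, -0.9, 0.3, 0, 0, 0, 0, 0]
x = [0, 0, 0, 0, 0, 0.2, -0.9, 0.3, 0, 0]
print(tdoa_by_max_peak(x, y))
```

A single noisy sample larger than the true peak moves the estimate arbitrarily far, which is the weakness the text describes.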
Explanation: let x(i) be the waveform received by microphone X and y(i) the waveform received by microphone Y from the same keystroke, i.e. $x(i) = y(i - t_i)$, where $t_i$ is the received-time-difference. To find the value of $t_i$, we compute the correlation of x(i) and y(i) with the correlation function
$$r(d) = \sum_i x(i)\, y(i - d)$$
r(d) reaches its maximum when the overlapping area of x(i) and y(i - d) is largest, as shown in Figure 3.8, and this overlap is largest when $d = t_i$. Thus, by plotting r(d) against d in Matlab, we can find the value of $t_i$.
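A direct, unoptimised sketch of this search evaluates $r(d) = \sum_i x(i)\,y(i-d)$ for every lag in a window and returns the best one. The lag bound `max_lag` is a hypothetical parameter added to keep the search finite; samples outside either signal are treated as zero. For long recordings a vectorised routine such as NumPy's `numpy.correlate` would be faster, but its output index must then be mapped back to the lag d.

```python
def cross_correlation_delay(x, y, max_lag):
    """Return the lag d in [-max_lag, max_lag] that maximises r(d) = sum_i x[i] * y[i - d].

    Terms whose index falls outside either signal are skipped (zero padding).
    """
    best_lag, best_r = 0, float("-inf")
    for d in range(-max_lag, max_lag + 1):
        r = 0.0
        for i in range(len(x)):
            j = i - d
            if 0 <= j < len(y):
                r += x[i] * y[j]
        if r > best_r:
            best_lag, best_r = d, r
    return best_lag

# Hypothetical keystroke pulse; x is y delayed by 5 samples, i.e. x(i) = y(i - 5)
y = [0.0] * 20 + [1.0, 3.0, -2.0, 1.0] + [0.0] * 20
x = [0.0] * 5 + y[:-5]
print(cross_correlation_delay(x, y, max_lag=10))
```

Unlike the maximum-peak method, every sample of the keystroke contributes to r(d), so a single noisy sample has less influence on the estimated lag.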
Discussion: The correlation method can reduce the effect of noise if the noise at the two microphones generally behaves in the same way. We therefore expect it to give more accurate received-time-differences, and we test this expectation in the experiments.
5. Output the results
The main operations in this stage are to output the computed received-time-differences, analyze them, and calculate the recognition rate.
How to recognize: There are many advanced algorithms and complex statistical models for classification and recognition. However, we use the simplest one, the minimum distance approach. We believe the time-difference approach is powerful enough to achieve a high recognition rate even with the simplest recognition method; limited time was another reason for choosing it.
The minimum distance approach: First, we calculate the mean received-time-difference of each key from a set of training data, forming a set of feature pairs {key, mean received-time-difference}. For a test keystroke, we compute the distance between its received-time-difference and the mean received-time-difference of each key, and assign the keystroke to the key with the smallest distance.
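The minimum distance classifier and the recognition rate can be sketched as follows. The training and test values are hypothetical numbers (in samples) chosen for illustration, not data from our experiments.

```python
from statistics import mean

def train(training):
    """training: dict key -> list of received-time-differences; returns key -> mean."""
    return {key: mean(values) for key, values in training.items()}

def classify(tdoa, centroids):
    """Assign a test keystroke to the key whose mean is closest to its value."""
    return min(centroids, key=lambda k: abs(tdoa - centroids[k]))

def recognition_rate(tests, centroids):
    """Fraction of test keystrokes assigned to their true key."""
    total = correct = 0
    for key, values in tests.items():
        for v in values:
            total += 1
            correct += classify(v, centroids) == key
    return correct / total

# Hypothetical data (samples)
centroids = train({"z": [1700, 1760, 1800], "3": [1200, 1250, 1220]})
tests = {"z": [1650, 1400], "3": [1300]}
print(centroids, recognition_rate(tests, centroids))
```

With one feature per keystroke, the decision boundary between two keys is simply the midpoint of their means, which is why widely separated keys are easy to recognize.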
Figure 3.9 The minimum distance approach (distances 1 to N between the test value and the mean of each key)
3.2.3.3 Expected Experimental Error
Large-scale error
Noise: Noise is always a main source of experimental error, and it was impossible to find a place with no noise. We conducted the experiments in the IE laboratory, where the main sources of noise were the air-conditioning, computers, and noise made by IE students.
Small-scale error
1. Hitting variance: As stated above, the hypothesis of our approach is based on the different positions of the keys relative to the two microphones. However, it is very difficult to hit a key at exactly the same spot every time, so hitting it at different spots introduces variation.
2. Shift of the microphones and the keyboard: The keyboard and the two microphones are placed on the same table, so hitting the keys also hits the table. The keyboard and the microphones may shift because of the table's vibration, which introduces error.
3.2.3.4 Brief of different types of experiments
We divided our experimental research into several stages, each with its own objectives and findings, backed by a different type of experiment. The two main types of experiments are discussed below.
3.2.3.4.1 Experiment type 1
Objective: To see whether the received-time-differences of two keys are distinguishable.
First, we recorded keystrokes of key 1 and key 9 on the number pad (shown in Figure 3.10), thirty times each, to see whether their received-time-differences cluster into separate regions.
Figure 3.10 The positions of keys 1 and 9
A typical result is presented in the following line chart:
The results computed by correlation:
Key | Average | Median | Range | Variation
Key 1 | (15549.74) | (15558.00) | (15451.00) - (15611.00) | 1852.93
Key 9 | (15559.22) | (15563.00) | (15519.00) - (15599.00) | 548.72
The results computed by the maximum peak positions:
[Table: keys 1 and 9 (Average, Median, Range, Variation)]
Discussion: As the two line charts show, the computed received-time-differences of the two keys did not cluster into separate regions. Instead, they overlapped, which made it impossible to distinguish the keys by their received-time-differences. We also found that the received-time-differences of keystrokes of the same key varied considerably. As explained above, this variation was due to interference from random background noise, hitting variance, and shifts of the microphones and the keyboard. In addition, the received-time-differences computed from the maximum peak positions varied more than those computed by correlation, which matched our expectation before the experiments.
These results were contrary to our expectation, as they suggested that the keys could not be recognized from the received-time-differences. However, we still believed there should be unique and separate clusters of received-time-differences for different keys. There were two possible explanations for the indistinguishable clusters:
1. The amplitude of the noise effect covers the difference between the received-time-differences of the two keys.
Mathematical explanation:
$$t_i = \frac{D_{ix} - D_{iy}}{V}$$
$$t_i - t_j = \frac{D_{ix} - D_{iy}}{V} - \frac{D_{jx} - D_{jy}}{V} = \frac{(D_{ix} - D_{iy}) - (D_{jx} - D_{jy})}{V} = \frac{2 D_{ij}}{V}$$
where $D_{ij} = \frac{1}{2}\left[(D_{ix} - D_{iy}) - (D_{jx} - D_{jy})\right]$ is the separation of key i and key j. If the separation is too small, it cannot be detected and recognized because of the large noise interference.
2. The microphones and the sound card have limited sensitivity to the sound wave, and the difference may be too small for them to detect.
As stated in reason 1, the effect of noise combined with the small separation of key 1 and key 9 caused these results. We therefore believed the received-time-differences would form distinguishable clusters if we chose two keys with a larger separation. To test this, we repeated the experiment with key z and key 3 instead of key 1 and key 9; key z and key 3 have a much larger separation, as shown in Figure 3.12.
Figure 3.12 (positions of key z and key 3)
Figure 3.13 The line charts of a typical result of the second experiment of Experiment type 1
The results computed by correlation:
Key | Average | Median | Range | Variation
Key z | 1758.93 | 1685.50 | 1400.00 - 2328.00 | 41003.31
Key 3 | 1222.80 | 1240.00 | 877.00 - 1377.00 | 11074.44
The results computed by the maximum peak positions:
[Table: keys z and 3 (Average, Median, Range, Variation)]
Recognition rate computation: As stated before, we used the minimum distance approach for recognition. For each key, the first 10 keystrokes were used as training data and the remaining 20 keystrokes as test keystrokes. The recognition was perfect: we achieved a recognition rate of 100% with both the correlation and the maximum peak position methods.
Discussion: The results supported our hypothesis. The received-time-differences of key z and key 3 clustered into two different, separate regions and could be distinguished from each other perfectly. Although the received-time-differences still varied considerably, we could recognize the two keys with 100% accuracy. Contrary to our expectation, the data computed from the maximum peak positions varied less than those computed by correlation. This may be due to changes in the nature of the noise during recording, which affected the correlation method more than the maximum peak method. Nevertheless, both methods achieved a recognition rate of 100%.
3.2.3.4.2 Experiment type 2
Experiment type 1 showed that we could perfectly recognize two widely separated keys by the received-time-differences of two microphones. Next, we further investigated the performance of our approach by trying to recognize three keys and five keys instead of two. We chose key z, key ? and key 3 for the three-key experiments, and key z, key b, key ?, key up-arrow and key 3 for the five-key experiments. The positions of the keys are shown in Figure 3.14.
[Figure 3.14: positions of key z, key b, key ?, key up-arrow and key 3 on the keyboard]
A typical result of the three-key experiment (keys z, ? and 3) is shown below. For clarity, we used the average received-time-difference of key ? as the reference point and present all values relative to it; values in parentheses are negative.
Figure 3.15 The line charts of a typical result of 3-key experiments

The result computed by correlation means:

Key     Average    Median     Range                 Variation
Key z   358.17     284.74     (0.76) - 927.24*      41003.31
Key ?   0.00       (6.76)     (118.76) - 168.76     3152.05
Key 3   (177.96)   (160.76)   (23.76) - (523.76)    11074.44

* 233.24 - 526.24 without counting the abnormal peaks
The result computed by the maximum peak positions:

[Table: Average, Median, Range and Variation of key z, key ? and key 3]

Discussion: In the three-key experiments, the minimum separation between keys was already halved compared with Experiment type 1. Nevertheless, the received-time-differences of the three keys still clustered into three separate regions. Although there were a few extreme abnormal points caused by random noise, we still achieved high recognition rates, over 90% on average.
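The minimum-distance recognition used in Experiment types 1 and 2 can be sketched as a nearest-centre rule on the one-dimensional received-time-differences, with the first 10 keystrokes of each key used for training and the remaining ones for testing. The sketch assumes that each key's centre is the mean of its training values (the choice between mean and median is an assumption). The synthetic data draw 30 keystrokes per key around the averages in the three-key correlation table, with an arbitrary standard deviation of 40; they are hypothetical, not measured.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

Samples = Dict[str, List[float]]

def split_keystrokes(samples: Samples, n_train: int = 10) -> Tuple[Samples, Samples]:
    """First n_train keystrokes of each key for training, the rest for testing."""
    train = {key: values[:n_train] for key, values in samples.items()}
    test = {key: values[n_train:] for key, values in samples.items()}
    return train, test

def train_centres(train: Samples) -> Dict[str, float]:
    """Centre of each key's cluster (assumed: mean of its training values)."""
    return {key: mean(values) for key, values in train.items()}

def classify(value: float, centres: Dict[str, float]) -> str:
    """Minimum-distance rule: the key whose centre is closest to the value."""
    return min(centres, key=lambda key: abs(value - centres[key]))

def recognition_rate(test: Samples, centres: Dict[str, float]) -> float:
    """Fraction of test keystrokes assigned to the key that produced them."""
    results = [classify(v, centres) == key for key, values in test.items() for v in values]
    return sum(results) / len(results)

# Hypothetical data: 30 keystrokes per key, centred on the three-key averages.
rng = random.Random(1)
true_centres = {"z": 358.17, "?": 0.0, "3": -177.96}
samples = {key: [rng.gauss(c, 40.0) for _ in range(30)] for key, c in true_centres.items()}

train, test = split_keystrokes(samples)
centres = train_centres(train)
print(f"recognition rate: {recognition_rate(test, centres):.0%}")
```

Because the rule only compares distances to the class centres, it degrades gracefully as clusters start to overlap, which is the behaviour seen when moving from three to five keys.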
To investigate the performance of the attack further, we conducted the five-key experiment. A typical result is presented below. As before, we used the average received-time-difference of key ? as the reference point and present all values relative to it; values in parentheses are negative.
Figure 3.16(b) The line charts of a typical result of five-key experiments

The result computed by correlation means:

Key            Average    Median     Range                 Variation
Key z          358.17     284.74     (0.76) - 927.24*      41003.31
Key b          229.47     226.24     173.24 - 302.24       5170.66
Key ?          0.00       (6.76)     (118.76) - 168.76     3152.05
Key up-arrow   (144.66)   (142.76)   (265.76) - (12.76)    5176.66
Key 3          (177.96)   (160.76)   (23.76) - (523.76)    11074.44

* 233.24 - 526.24 without counting the abnormal points
The result computed by max peak position:

Key            Average    Median     Range                  Variation
Key z          230.00     228.03     (3.97) - 526.03        7253.21
Key b          124.17     102.03     (11.97) - 960.03       29392.95
Key ?          0          31.53      (191.97) - 229.03      10043.76
Key up-arrow   (181.80)   (147.47)   (1109.97) - (10.97)*   36030
Key 3          (334.57)   (326.47)   (257.97) - (504.97)    3874.73

* (288.97) - (72.97) without counting the abnormal point

Discussion: In the five-key experiments, the minimum key separation is further reduced to four keys. Because of the smaller separations, the clusters of the received-time-differences of the keys overlap more, although they are still separable from each other. As a result, the recognition rates dropped: the average recognition rate was around 60% - 70%.
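The statistics in the tables above, expressed relative to the average of key ?, can be reproduced with a few lines of code. This is a sketch under two assumptions: that "Variation" denotes the sample variance, and that parenthesized figures are negative values. The input data in the usage example are hypothetical.

```python
from statistics import mean, median, variance
from typing import Dict, List

def summarise(per_key: Dict[str, List[float]], reference_key: str = "?") -> Dict[str, dict]:
    """Average, median, range and variation of each key's received-time-differences,
    shifted so that the reference key's average becomes 0."""
    ref = mean(per_key[reference_key])
    table = {}
    for key, values in per_key.items():
        shifted = [v - ref for v in values]
        table[key] = {
            "average": mean(shifted),
            "median": median(shifted),
            "range": (min(shifted), max(shifted)),
            "variation": variance(shifted),  # sample variance (assumed meaning)
        }
    return table

def fmt(x: float) -> str:
    """Accounting-style formatting: negative numbers in parentheses."""
    return f"({abs(x):.2f})" if x < 0 else f"{x:.2f}"

# Hypothetical raw received-time-differences for two keys.
stats = summarise({"?": [20.0, 0.0, 10.0], "z": [110.0, 90.0, 100.0]})
for key, s in stats.items():
    low, high = s["range"]
    print(f"Key {key}: {fmt(s['average'])} {fmt(s['median'])} "
          f"{fmt(low)} - {fmt(high)} {fmt(s['variation'])}")
```

Shifting every value by the same reference changes the average, median and range but leaves the variation unchanged, so the re-centred presentation does not affect how spread out each cluster is.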
Please bear in mind that what we have done is very preliminary. In the experiments we used very simple techniques and approaches, since we aimed only at studying the feasibility and properties of the received-time-difference approach rather than at achieving high recognition rates; even so, these simple techniques achieved high recognition rates. Moreover, based on the findings from these simple experiments, we designed a more practical and more elaborate classification and recognition means for our approach, which we believe could achieve very high recognition rates when recovering typed text from recorded keystrokes. In addition, unlike the experiments we conducted before, the new algorithm does not need training data.

3.2.3.6 The more advanced classification and recognition means
Building on the previous research by Li Zhuang, Feng Zhou and J. D. Tygar, we designed a more advanced classification and recognition means for our approach. The procedure is as follows:
1. Collect data: Record a user typing English text with two microphones.
2. Feature extraction: Extract the keystrokes and calculate their respective received-time-differences.
3. Clustering: Cluster the computed received-time-differences into K classes, one class per key. Unlike the two previous approaches, we can cluster the received-time-differences into their K classes directly from their values. For example, the lowest and the highest received-time-differences will belong to the two keys closest to the two microphones respectively. After this initial clustering, the accuracy of clustering can be improved by exploiting the likely sequence of English text by means of Hidden Markov Models (HMM) [2][17][21][22][24]. For example, if the current key could be either h or j (e.g. because they are physically close on the keyboard) and the previous key is known to be t, then the current key is more likely to be h, because "th" is more common than "tj".
4. Spelling and grammar checking: This is the final stage. Dictionary-based spelling correction and a simple statistical model of English grammar can be used to further correct the output text.
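Step 3 of the procedure (clustering followed by language-model correction) can be illustrated with a deliberately simple sketch. Two substitutions are made and should be read as assumptions: the initial clustering cuts the sorted received-time-differences at their K-1 widest gaps (one simple way of clustering "directly from their values"), and a greedy bigram lookup stands in for full Hidden Markov Model decoding. The bigram counts are hypothetical.

```python
from typing import Dict, List, Sequence

def cluster_by_gaps(values: Sequence[float], k: int) -> List[int]:
    """Label each received-time-difference with one of k clusters (1 <= k <= len(values))
    by cutting the sorted values at the k-1 widest gaps. Cluster 0 holds the lowest
    values (the key nearest one microphone), cluster k-1 the highest."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [(values[order[j + 1]] - values[order[j]], j) for j in range(len(order) - 1)]
    cuts = {j for _, j in sorted(gaps, reverse=True)[: k - 1]}
    labels = [0] * len(values)
    cluster = 0
    for j, idx in enumerate(order):
        labels[idx] = cluster
        if j in cuts:  # a wide gap follows this value: start the next cluster
            cluster += 1
    return labels

def pick_by_bigram(prev: str, candidates: Sequence[str], bigram_counts: Dict[str, int]) -> str:
    """Greedy stand-in for HMM decoding: among acoustically plausible keys,
    choose the one that most often follows the previous character."""
    return max(candidates, key=lambda c: bigram_counts.get(prev + c, 0))

rtd = [-300.0, -290.0, 5.0, 0.0, 350.0, 340.0]
print(cluster_by_gaps(rtd, k=3))

counts = {"th": 100, "tj": 1}  # hypothetical bigram counts
print(pick_by_bigram("t", ["h", "j"], counts))
```

A practical system would replace the greedy choice with Viterbi decoding over the whole keystroke sequence, so that an early ambiguity can be resolved by the characters that follow it.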
Chapter 5 Conclusion
We have shown how easily the attack can be conducted with the completely new but simple approach we invented. As computing speed continues to advance, conducting the attack will become even easier. In the words of Bruce Schneier, chief technology officer at Counterpane Internet Security and author of Applied Cryptography, "It was only a matter of time before criminals began using it", and it is not surprising that people have made use of it to commit crimes. It is crucial for people to pay attention to this threat and prepare defences against the attack. We hope our research raises people's awareness of this kind of attack. We would also like people to bear in mind that their typed information is now at real risk. They need to understand the attacking techniques and protect their typed information from such attacks.
Reference
[1] D. Asonov and R. Agrawal, "Keyboard Acoustic Emanations", In Proceedings of the IEEE Symposium on Security and Privacy, 2004.
[2] L. Zhuang, F. Zhou and J. D. Tygar, "Keyboard Acoustic Emanations Revisited", In Proceedings of the 12th ACM Conference on Computer and Communications Security, 2005.
[3] H. Bar-El, "Introduction to Side Channel Attacks", Discretix Technologies Ltd.
[4] M. A. Casey, "Sound Classification and Similarity Tools", in Introduction to MPEG-7: Multimedia Content Description Language, J. Wiley, 2001.
[5] J. Harrington and S. Cassidy, Techniques in Speech Acoustics, The Netherlands: Kluwer Academic Publishers, 1999.
[6] H. Hermansky, "Analysis in Automatic Recognition of Speech", in Speech Processing, Recognition and Artificial Neural Networks, G. Chollet, M. G. Di Benedetto, A. Esposito and M. Marinaro Eds., London: Springer-Verlag, 1999, pp. 115-137.
[7] R. de Mori, "Statistical Methods for Automatic Speech Recognition", in Speech Processing, Recognition and Artificial Neural Networks, G. Chollet, M. G. Di Benedetto, A. Esposito and M. Marinaro Eds., London: Springer-Verlag, 1999, pp. 165-189.
[8] J. P. Haton, "Neural Networks for Automatic Speech Recognition: A Review", in Speech Processing, Recognition and Artificial Neural Networks, G. Chollet, M. G. Di Benedetto, A. Esposito and M. Marinaro Eds., London: Springer-Verlag, 1999.
[9] J. W. Tukey, B. P. Bogert and M. J. R. Healy, "The Frequency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum, and Saphe-Cracking", Proceedings of the Symposium on Time Series Analysis (M. Rosenblatt, Ed.), Chapter 15, pp. 209-243, New York: Wiley.
[10] "On Naïve Bayes in Speech Recognition", Int. J. Appl. Math. Comput. Sci., 2005, Vol. 15, No. 2.
[11] D. Tran, M. Do, M. Wagner and T. Van Le, "A Proposed Decision Rule for Speaker Identification Based on a Posteriori Probability".
[12] Chou and Juang, Pattern Recognition in Speech and Language Processing, CRC Press.
[13] F. J. Owens, Signal Processing of Speech, Macmillan New Electronics: Introduction to Advanced Topics.
[14] R. P. Ramachandran and R. J. Mammone, Modern Methods of Speech Processing, CAIP Center, Rutgers University, Kluwer Academic Publishers.
[15] Proakis, Rader, Ling, Nikias, Moonen and Proudler, Algorithms for Statistical Signal Processing, Prentice Hall.
[16] G. Chollet, M. Di Benedetto, A. Esposito and M. Marinaro, Speech Processing, Recognition and Artificial Neural Networks, Proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello", Springer.
[17] P. Larrañaga and J. A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers.
[18] A. M. Kondoz, Digital Speech, second edition, Wiley.
[19] R. Price, J. Willmore and W. Roberts, "Genetically Optimised Feedforward Neural Networks for Speaker Identification", Information Technology Division, Electronics and Surveillance Research Laboratory.
[20] G. White, "Cepstrum Analysis", DLI Engineering, 1998.
[21] L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, 1989.
[22] S. Balakrishnama and A. Ganapathiraju, "Linear Discriminant Analysis: A Brief Tutorial", Institute for Signal and Information Processing, Department of Electrical and Computer Engineering, Mississippi State University.
[23] I. Potamitis, N. Fakotakis and G. Kokkinakis, "Speech Recognition Based on Feature Extraction with Variable Rate Frequency Sampling", Wire Communications Lab., Electrical & Computer Engineering Dept., University of Patras.
[24] M. Vidal-Naquet and S. Ullman, "Object Recognition with Informative Features and Linear Classification", Faculty of Mathematics and Computer Science, The Weizmann Institute of Science.