Acoustic Triangulation Attack

This document summarizes a student's final year project on a new approach to keyboard acoustic attacks called the time-difference approach. The student proposes using two microphones rather than one to triangulate keyboard positions based on the time difference of arrival of keystroke sounds. Experiments were able to achieve a recognition rate of up to 80% in noisy environments using simple equipment and algorithms. The report reviews previous approaches, describes the new time-difference method and experimental setup in detail, and plans for future work and publication of the new findings.
Copyright
© Attribution Non-Commercial (BY-NC)

ERG4920CM Thesis II

Keyboard Acoustic Triangulation Attack

Student Name (ID): Hui Shun Pak (03630022)

A FINAL YEAR PROJECT REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF INFORMATION ENGINEERING

DEPARTMENT OF INFORMATION ENGINEERING

THE CHINESE UNIVERSITY OF HONG KONG

May, 2006

Abstract
We examine the problem of keyboard acoustic emanations, an important area of side channel attacks. After reviewing the two previous approaches, suggested by IBM and by Professor J. D. Tygar, we propose a new and more powerful approach to conducting the attack: the time-difference approach. Instead of one microphone, we use two microphones and simple triangulation techniques. We believe we are the first to propose this approach, and we are planning to publish our findings in the IEEE Transactions on Information Forensics and Security. Our approach does not require complicated techniques or specialist knowledge of sound processing, and it could be easier and more powerful than the two previous approaches. In this project, we study the feasibility and properties of our new approach. We choose three to five of the 104 keys on the keyboard and study the performance of our approach through several kinds of simple experiments; we achieved a recognition rate of up to 80% in a noisy environment with the simplest equipment and algorithms.

Contents
Abstract
Chapter 1 Introduction
1.1 Overview
1.2 Side Channel Attacks
1.3 Keyboard Acoustic Triangulation & Emanation Attacks
1.4 The Threat of the Attack
1.5 Our Project
1.6 Comparison of the three approaches of the attack
Chapter 2 Background
2.1 Preliminaries
2.2 The Approach of Dmitri Asonov and Rakesh Agrawal
2.2.1 The hypothesis of their approach
2.2.2 Procedures of their approach
2.2.3 Results
2.2.4 Limitations of their approach
2.3 The Approach of Li Zhuang, Feng Zhou, J. D. Tygar
2.3.1 The hypothesis of their approach
2.3.2 Procedures of their approach
2.3.3 Results
2.3.4 Discussion
2.4 Our proposed attack, the time-difference approach
2.4.1 Hypothesis
2.4.2 Procedures of the attack
2.4.3 Expected Advantages of our approach over previous approaches
2.4.4 Feasibility study of our new approach
Chapter 3 Project Description
3.1 Progress Report and Task Distributions
3.2 The time-difference approach
3.2.1 Hypothesis
3.2.2 The expected value of received-time-difference
3.2.3 Experiments of studying our new approach
3.2.3.1 The experimental set up
3.2.3.2 The general procedure of our experiments
3.2.3.3 Expected Experimental Error
3.2.3.4 Brief of different types of experiments
3.2.3.3.1 Experiment type 1
3.2.3.3.2 Experiment type 2
3.2.3.5 Brief summary of our findings
3.2.3.6 The more advanced classification and recognition means
3.2.4 Suggestion of defending methods
Chapter 4 Future Plan
Chapter 5 Conclusion
References

Chapter 1 Introduction

1.1 Overview
Researchers at the IBM Almaden Research Center have shown that PC keyboards, notebook keyboards, telephone keypads and ATM pads are vulnerable to acoustic attacks. The keyboard acoustic emanations attack was brought to attention again by researchers at the University of California, who found another way to turn the clicks and clacks of typing on a computer keyboard into a startlingly accurate transcript of what is being typed. By analyzing just ten minutes of someone's typing sounds, they reconstructed as much as 96 percent of what had been typed. Aiming to raise awareness of how serious and harmful this kind of attack is, our project shows how easily the attack can be conducted. We studied the procedures of the approaches suggested by previous researchers and analyzed the various processing techniques they used; we also drew ideas from techniques used in speech recognition. Starting from an initial idea given by our supervisor, we proposed a new approach to keyboard acoustic emanation attacks, called the time-difference approach, and demonstrated its feasibility by experiments. We have made substantial progress in studying the new approach; we believe we are the first to propose it, and we expect our findings to contribute significantly to security research. With the help of our tutor, we are planning to publish our findings in the IEEE Transactions on Information Forensics and Security. In this report, Chapter 2 briefly reviews the two previous approaches and introduces our approach. Chapter 3 presents the details of our approach, including the hypothesis, computational methodology, experimental procedures and our major findings.
We discuss future development in Chapter 4 and conclude with a summary in Chapter 5. Before going into the details, let us first give a short description of our research topic.

1.2 Side Channel Attacks

Keyboard acoustic attacks belong to the category of side channel attacks. Side channel attacks are attacks based on side channel information [3], that is, information that can be retrieved from an encryption device and that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process. The underlying principle of this kind of attack is that physical effects caused by the operation of a cryptosystem (on the side) can provide useful extra information about secrets in the system. Side channel attacks are an emerging security issue in cryptography, and more and more people are paying attention to them because of the threat they pose. They are particularly dangerous because the attacker does not need to know how the system works internally in order to perform a side channel attack. Besides keyboard acoustic attacks, other common types of side channel attacks include timing attacks, architectural side-effect attacks, power monitoring attacks and acoustic cryptanalysis.

1.3 Keyboard Acoustic Attacks


Keyboard acoustic attacks use the sounds of a user's typing to obtain the typed text. Every system that uses keyboards or keypads is potentially threatened by them. A simple version of the keyboard acoustic emanation attack was already used during World War II (the 1940s): experts of the Allied forces recorded the sounds made by Axis teletypes as messages were typed in for encoding, examined the sounds by "looking" at how they appeared on an oscilloscope, and were soon able to tell exactly what the operator was typing. Decades later, Dmitri Asonov and Rakesh Agrawal of the IBM Almaden Research Center successfully employed a neural network to recognize the key being pressed [1]. Following their research, Li Zhuang, Feng Zhou and J. D. Tygar of the University of California presented a novel attack that takes as input a 10-minute sound recording of a user typing English text on a keyboard and recovers up to 96% of the typed characters [2]. We renamed our time-difference approach the Keyboard Acoustic Triangulation Emanation Attack because it is completely different from the two previous approaches. "Triangulation" refers to the fact that our approach uses a technique similar to triangulation, which is commonly used to locate objects in surveying, navigation, metrology and astrometry. By detecting and measuring the differences in the arrival times of keystroke sounds at two microphones, the pressed keys can be identified (Figure 1.1).

Figure 1.1 An acoustic triangulation attack (microphones X and Y)

1.4 The Threats of the Attack


The cost of conducting keyboard acoustic attacks is very low, which makes them more dangerous than many other kinds of attacks: anyone who knows the technique can conduct one cheaply. Both the previous approaches and ours can be carried out with microphones that cost only tens of dollars! Many important systems use keyboards or keypads that can easily be attacked. For example, keypads are widely used in banking and in housing estates, so criminals could attack them to obtain bank account passwords and the entrance passwords of housing estates. Figure 1.2 shows an example of how serious the attack can be: when you use an ATM, anyone in the queue behind you could obtain your password by recording your typing sounds with a microphone!

Figure 1.2 An example of using the keyboard acoustic emanations attack to commit crime

"It is only a matter of time before criminals begin using similar techniques!" said Bruce Schneier, chief technology officer at Counterpane Internet Security and the author of Applied Cryptography.

1.5 Our Project


What have we done?
Other researchers have already studied this topic. By studying the techniques they used and learning techniques from speech recognition, we proposed a new approach, the time-difference approach. We conducted experiments to prove the feasibility of the new approach and to study its accuracy.

Why did we suggest and study a new approach of the attack?


As mentioned above, the low cost of the attacks makes them very dangerous. However, the techniques used in the two previous approaches are difficult, so they have not yet been commonly used by attackers. Given the high computing power of today's computers, we believe the attack can also be conducted with simple algorithms. The time-difference approach suggested in this paper does not require the difficult techniques of the previous approaches, and it is cheap and easy to implement. We believe attackers will soon discover similar techniques and put them into practice; it would not be surprising if they are already using them. People therefore need to pay attention to this kind of attack and find ways to defend against it now!

How do our findings contribute to the issue?


We have made substantial progress on the study of our new approach. With precise experimental data, we proved the feasibility of our new approach and showed how powerful it is. Moreover, based on the experimental data, we suggested a complete algorithm for conducting the attack in practice! In the first semester, we focused on studying the two previous approaches and some important sound processing techniques. In the second semester, we conducted experiments to prove the feasibility of the time-difference approach and calculated the recognition rates. We successfully proved the feasibility of our suggested approach, and we believe we are the first to propose it. We expect our findings to contribute significantly to security research.

1.6 Comparison of the three approaches of the attack


Before going into details of describing the two previous approaches and our approach, let us have a simple comparison between them.
| Approach | Cost | Difficulty | Accuracy | Can it be put into real practice? |
|---|---|---|---|---|
| Dmitri Asonov and Rakesh Agrawal | Cheap | Difficult | High | No |
| Li Zhuang, Feng Zhou, J. D. Tygar | Cheap | Very difficult | Very high (for long typed text) | Yes |
| Our time-difference approach | Cheap | Easy | Under investigation | Yes (expected) |

Chapter 2 Background
2.1 Preliminaries
This chapter briefly introduces the two previous approaches and our proposed approach. It is divided into three parts:
1. A brief introduction of the approach suggested by Dmitri Asonov and Rakesh Agrawal of the IBM Almaden Research Center
2. A brief introduction of the approach suggested by Li Zhuang, Feng Zhou and J. D. Tygar of the University of California
3. A brief introduction of our proposed approach, the time-difference approach

2.2 The Approach of Dmitri Asonov and Rakesh Agrawal [1]


2.2.1 The hypothesis of their approach
Their approach is based on the fundamental hypothesis that the sound of keys may differ slightly from key to key. Although the clicks of different keys sound similar to the human ear [1], the computing power of modern computers makes it possible to distinguish keys from recorded clicks using suitable sound processing techniques. The sounds may differ because different parts of the keyboard plate produce different sounds when a nearby key is pushed [1]. This can be explained by analogy with a drum: striking it at different locations produces different timbres.
Fig 2.1 Why the sounds of clicks differ from key to key: striking a key at different locations on the plastic plate provides different timbres

2.2.2 Procedures of their approach


1. Feature extraction: They recorded 100 clicks of each key and extracted features from them [23]. These features were used to train the neural network [1] [5] [6] [8] [12] [13] [16].
2. Training: They trained the neural network [8] with {key, feature} pairs.
3. Testing: They recorded a set of test clicks (100 clicks per key) and extracted their features. The features were fed into the neural network, and the network's output was compared with the key that was actually pressed [7] [12] [19].
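The three steps above can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not the authors' implementation: the clicks are synthetic decaying tones produced by a hypothetical helper `synthetic_click`, the features are magnitudes of a naive DFT computed with the standard library, and a simple nearest-centroid classifier is swapped in for the neural network used in [1].

```python
import cmath
import math
from statistics import fmean


def synthetic_click(freq_bin, n=64, phase=0.0):
    """Hypothetical stand-in for a recorded click: a decaying tone."""
    return [math.exp(-t / 16) * math.sin(2 * math.pi * freq_bin * t / n + phase)
            for t in range(n)]


def fft_magnitudes(samples, n_bins=16):
    """Feature extraction: magnitudes of the first n_bins DFT coefficients (naive DFT)."""
    n = len(samples)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
            for k in range(n_bins)]


def train(labelled_clicks):
    """Training: average the feature vectors of each key's {key, feature} pairs."""
    grouped = {}
    for key, samples in labelled_clicks:
        grouped.setdefault(key, []).append(fft_magnitudes(samples))
    return {key: [fmean(col) for col in zip(*vectors)] for key, vectors in grouped.items()}


def classify(centroids, samples):
    """Testing: return the key whose average feature vector is closest (Euclidean)."""
    features = fft_magnitudes(samples)
    return min(centroids, key=lambda key: math.dist(features, centroids[key]))


# Usage: two hypothetical keys whose clicks peak in different frequency bins
training = [("a", synthetic_click(3, phase=p)) for p in (0.0, 0.3, 0.6)]
training += [("b", synthetic_click(7, phase=p)) for p in (0.0, 0.3, 0.6)]
centroids = train(training)
print(classify(centroids, synthetic_click(7, phase=1.0)))
```

A nearest-centroid rule is used here only because it fits in a few lines; it shares the key property of the original scheme, namely that it needs labeled training clicks from the same keyboard.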

2.2.3 Results
They achieved a relatively high recognition rate (approximately 80%). However, this rate could only be achieved when the neural network was trained with text-labeled sound samples from the same keyboard typed by the same person [1] [2].

2.2.4 Limitations of their approach


Their research showed that it is possible to recover typed information from typing sounds. However, their approach had several limitations, listed below, which made it impractical for real attacks.
Limitations:
1. Their approach needed training data: the attacker had to obtain labeled training samples from the same keyboard typed by the same person, which is rarely possible in practice.
2. Because the attack worked well only under the same settings as the training recording (the same keyboard, person, recording environment, etc.), suitable training data were hard to obtain in typical cases. Training on one keyboard and recognizing on another keyboard of the same model yielded lower accuracy, around 25% [1]. Lower recognition rates were also observed when the model was trained on one person's typing and used on another's.
3. The combination of classification techniques left room for improvement. Researchers from the University of California found techniques superior to FFT features and neural-network classifiers [2]. Their approach is discussed in the next section.


2.3 The approach of Li Zhuang, Feng Zhou, J. D. Tygar [2]


2.3.1 The hypothesis of their approach
Building on the research of Asonov and Agrawal, Li Zhuang, Feng Zhou and J. D. Tygar further assumed that typed text is often not random. When one types English text, the limited number of English words limits the possible sequences of keys, and English grammar limits the possible word combinations. One can therefore first cluster keystrokes (using unsupervised methods) into a number of classes based on their sound. Given sufficient unlabeled training samples, a most-likely mapping between these classes and the actual typed characters can be established using these language constraints [2].

2.3.2 Procedures of their approach [2]


Their approach was based on a combination of machine learning and speech recognition techniques. It contained two phases (as shown below).

Figure 2.2 Overview of the approach of Li Zhuang, Feng Zhou and J. D. Tygar

The first phase of their approach (Figure 2.2(a))
1. Collect data: They took a recording of a user typing English text.
2. Feature extraction: They located the individual keystrokes in the sound recording. As features, they used cepstrum [9] [20] features, which gave better results than FFT features.
3. Unsupervised key recognition: They clustered each keystroke extracted in step 2 (unlabeled training data) into one of K classes using standard data clustering methods, with K chosen to be slightly larger than the number of keys on the keyboard. Once the conditional distributions of the classes were determined, they used a Hidden Markov Model (HMM) [21] to find the most likely sequence of keys given the sequence of classes. An HMM models a stochastic process with hidden state; here it captured the correlation between keys typed in sequence. For example, if the current key could be either 'h' or 'j' (e.g. because they are physically close on the keyboard) and the previous key is known to be 't', then the current key is more likely to be 'h', because "th" is more common than "tj". Using these correlations, both the keys and the key-to-class mapping distributions were efficiently estimated with standard HMM algorithms. This step yielded character accuracy rates of slightly over 60%, which in turn yielded word accuracy rates of over 20% [2].
4. Spelling and grammar checking: They used dictionary-based spelling correction and a simple statistical model of English grammar, combined in a single Hidden Markov Model. This increased the character accuracy rate to over 70%, yielding a word accuracy rate of about 50% or more. At this point, the text was quite readable [2].
5. Feedback-based training: They used the previously obtained corrected results as labeled training samples. Although the corrected results were not 100% correct, they used heuristics [2] to select the words that were more likely to be correct [2].
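The decoding idea in step 3 can be sketched with the standard Viterbi algorithm. This is a minimal illustration, not the code of [2]: the three keys, the two sound classes `c1` and `c2`, and all probabilities below are made-up values chosen to mirror the "th" versus "tj" example. The sketch only decodes a sequence; it does not estimate the key-to-class distributions, which [2] learned from unlabeled data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden key sequence for a sequence of sound classes."""
    # best[s] = (log-probability, path) of the best path that ends in key s
    best = {s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
            for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Extend the best predecessor path, weighted by the key-to-key transition
            score, path = max((prev_score + math.log(trans_p[prev][s]), prev_path)
                              for prev, (prev_score, prev_path) in best.items())
            step[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = step
    return max(best.values())[1]


keys = ["t", "h", "j"]
start = {k: 1 / 3 for k in keys}
# Illustrative bigram statistics: "th" is far more common than "tj"
trans = {
    "t": {"t": 0.35, "h": 0.60, "j": 0.05},
    "h": {"t": 0.40, "h": 0.20, "j": 0.40},
    "j": {"t": 0.40, "h": 0.40, "j": 0.20},
}
# 'h' and 'j' sound alike, so both usually fall into the same sound class "c2"
emit = {
    "t": {"c1": 0.9, "c2": 0.1},
    "h": {"c1": 0.1, "c2": 0.9},
    "j": {"c1": 0.1, "c2": 0.9},
}
print(viterbi(["c1", "c2"], keys, start, trans, emit))
```

Because 'h' and 'j' are equally likely to produce class `c2`, the choice between them is made entirely by the transition probabilities, which is the effect the example in step 3 describes.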

The second phase of their approach (Figure 2.2(b))


The second phase of their approach, the recognition phase, used the trained keystroke classifier to recognize new sound recordings. If the text consisted of random strings, such as passwords, the result was output directly. For English text, the spelling and grammar language model described above was used to further correct the result. To distinguish between the two types of input, random or English, they applied the correction and checked whether reasonable text was produced. Meaningful text recovered in the recognition phase during an attack could also be fed back to the first phase, where these new samples, together with the existing ones, could be used to train an even more accurate keystroke classifier. This method improved the recognition rate over time [2].

2.3.3 Results
They recovered typed text with an accuracy of up to 96% of the typed characters.

2.3.4 Discussion
The approach used by Li Zhuang, Feng Zhou and J. D. Tygar was much more general and more serious in nature than that of Dmitri Asonov and Rakesh Agrawal, because it did not need a labeled training recording. Attackers could put it into use in the real world with just a cheap microphone and a computer. Moreover, the recognizer bootstrapped this way could even recognize random text such as passwords [2]. The ease and low cost of their approach showed the great danger of keyboard acoustic emanation attacks. However, the techniques and algorithms of their approach are difficult for people without much computing and mathematical knowledge. In addition, their approach needed a large number of unlabeled samples, so the high recognition rate could only be achieved when the typed text was long. To further show that the seriousness of keyboard acoustic emanation attacks is far beyond what people currently expect, we proposed a new approach, the time-difference approach, which is much easier and simpler than the approaches above.

2.4 Our proposed approach: the time-difference approach


In this section, we briefly describe our new approach. A detailed description is given in Section 3.2 of the next chapter.

2.4.1 Hypothesis
The time-difference approach is based on the different locations of the keys on the keyboard or keypad. We use two microphones: when the user hits a key, the emitted sound travels through the air to both microphones. Because each key lies at a different pair of distances from the two microphones, we can determine which key was pressed by computing the difference between the times at which the keystroke arrives at the two microphones (the received-time-difference).


Mathematical explanation: two microphones, microphone X and microphone Y, are placed as shown in Figure 2.3.

Figure 2.3 Description of our approach: the time-difference approach (microphones X and Y)

Let $D_{ix}$ be the distance between key $i$ and microphone X, and $D_{iy}$ the distance between key $i$ and microphone Y. Let $V$ be the speed of sound in air. When key $i$ is pressed, the sound waves it produces are received by both microphones. Let $x(t)$ be the sound wave received by microphone X and $y(t)$ the sound wave received by microphone Y. Then

$$x(t) = y(t - t_i), \qquad t_i = \frac{D_{ix} - D_{iy}}{V}$$

where $D_{ix} - D_{iy}$ is the difference between the distances from key $i$ to the two microphones. Different keys generally have different values of $D_{ix} - D_{iy}$. Since the speed of sound is effectively constant during a recording, we can tell which key has been pressed from the received-time-difference $t_i$ between the two microphones.
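To make the relation concrete, the following sketch computes $t_i$ for a hypothetical row of keys and identifies a key from a measured time difference by picking the key with the closest expected value. The key coordinates and microphone positions are assumptions for illustration (the 2 cm spacing and the 344 m/s speed of sound are the values used in Section 3.2.2); this is not the authors' program.

```python
import math

SPEED_OF_SOUND = 344.0  # m/s, approximate value at room temperature

# Hypothetical layout: one row of keys 2 cm apart, with the two
# microphones placed in line with the row, one on each side (metres)
MIC_X = (-0.10, 0.0)
MIC_Y = (0.28, 0.0)
KEYS = {key: (0.02 * n, 0.0) for n, key in enumerate("asdfghjkl")}


def expected_time_difference(key_pos, mic_x=MIC_X, mic_y=MIC_Y, v=SPEED_OF_SOUND):
    """t_i = (D_ix - D_iy) / V for a key located at key_pos, in seconds."""
    return (math.dist(key_pos, mic_x) - math.dist(key_pos, mic_y)) / v


def identify_key(measured):
    """Return the key whose expected t_i is closest to the measured value (seconds)."""
    return min(KEYS, key=lambda k: abs(expected_time_difference(KEYS[k]) - measured))


for key, pos in KEYS.items():
    print(f"{key}: t_i = {expected_time_difference(pos) * 1e6:+.1f} us")
```

With this in-line layout, adjacent keys differ by about 116 µs, matching the estimate in Section 3.2.2, so a measurement error smaller than half of that still selects the right key.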

2.4.2 Expected Advantages of our approach over previous approaches


The two previous approaches are based on the hypothesis that each key makes a different sound, so the best defense against them is a keyboard or keypad whose keys have similar typing sounds that cannot easily be distinguished. Our time-difference approach cannot be defended against in this way, because its hypothesis is based on the differences between the distances from each key to the two microphones.


Moreover, the previous approaches require considerable mathematical and sound processing knowledge, such as the FFT and cepstrum analysis. Our approach, in contrast, is easy to understand and implement for anyone with basic knowledge of the physics of sound!

2.4.3 Feasibility study of our new approach


We have conducted simple experiments to study the feasibility and properties of our approach. The detailed set up of our experiments, the challenges we encountered, and our findings are explained in Section 3.2 of the next chapter.


Chapter 3 Project Description


There are two sections in this chapter. In Section 3.1, we introduce our plan for the project and our individual contributions to it. In Section 3.2, we explain our proposed approach, the time-difference approach, in detail, describe how we conducted experiments to study its feasibility and properties, and present the results.

3.1 Progress Report and Task Distributions


We divided our project into two phases: literature review and study of the new approach. In the first semester we mainly reviewed the literature and conducted simple experiments. By reading the papers of previous researchers and books on speech recognition techniques, we gained a basic understanding of keyboard acoustic emanation attacks. In the second semester we entered the second phase, in which we conducted experiments to further study the feasibility and properties of our approach and tried to implement it. The following table shows the schedule and effort of each stage: Stage Previous Researches Review Speech Recognition techniques Study Feasibility study of time-difference approach Experiments 2nd 15 1st 2
nd

Semester 1st 1st 5 8 8

Effort (time unit)


The following time chart shows the progress of our project from September 2005 to April 2006:

[Time chart: Literature Reviews & Speech Recognition Techniques study; Keystrokes Analysis; Thesis I; Experiments; Thesis II, plotted over Sept 2005 to Apr 2006]

The following table shows our individual contributions to the project: My contributions Literature review 1. Previous Researches Review 2. Speech Recognition techniques Study Feasibility study of time-difference approach 1. Recording experiments 2. Program Feature Extraction 3. Program Signal Classification 4. Results Analysis Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes My Partner's contributions


3.2 The time-difference approach


In this section, we describe our proposed approach, the time-difference approach, in detail.

Figure 3.1 Our proposed approach: the time-difference approach (microphones X and Y)

3.2.1 Hypothesis
Recall from Chapter 2: let $D_{ix}$ be the distance between key $i$ and microphone X; $D_{iy}$ the distance between key $i$ and microphone Y; $x(t)$ the sound wave received by microphone X; $y(t)$ the sound wave received by microphone Y; and $V$ the speed of sound in air. Then

$$x(t) = y(t - t_i), \qquad t_i = \frac{D_{ix} - D_{iy}}{V}$$

By finding the value of $t_i$, we can tell which key has been pressed.

3.2.2 The expected value of the received-time-difference $t_i$


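The speed of sound in air varies with the air temperature; it is approximately 344 m/s at a room temperature of 20 °C (68 °F). Assuming the two microphones are placed in line with a row of keys, one on each side, moving from one key to the adjacent key increases the distance to one microphone by the key separation and decreases the distance to the other by the same amount. The least change in $D_{ix} - D_{iy}$ between adjacent keys is therefore

$$\Delta(D_{ix} - D_{iy})_{\min} = 2 \times \text{separation of adjacent keys} = 2 \times 2\ \text{cm} = 0.04\ \text{m}$$

Thus the minimum difference in $t_i$ between adjacent keys is

$$\Delta t_{i,\min} = \frac{0.04\ \text{m}}{344\ \text{m/s}} \approx 1.163 \times 10^{-4}\ \text{s} = 116.3\ \mu\text{s}$$

which corresponds to $116.3\ \mu\text{s} \times 96\,000\ \text{s}^{-1} \approx 11.2$ samples at a sampling rate of 96 kHz.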
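The arithmetic above can be checked with a few lines of Python. The 344 m/s speed of sound, 2 cm key spacing and 96 kHz rate come from the text; the 44.1 kHz and 48 kHz rates are added only for comparison, as common audio sampling rates.

```python
SPEED_OF_SOUND = 344.0  # m/s at about 20 °C (value from the text)
KEY_SEPARATION = 0.02   # m, spacing of adjacent keys (value from the text)


def min_time_difference(separation=KEY_SEPARATION, v=SPEED_OF_SOUND):
    """Smallest change in t_i between adjacent keys, 2 * separation / V, in seconds."""
    return 2 * separation / v


def samples_per_step(rate_hz, separation=KEY_SEPARATION, v=SPEED_OF_SOUND):
    """The same time difference expressed in samples at the given sampling rate."""
    return min_time_difference(separation, v) * rate_hz


for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz: {min_time_difference() * 1e6:.1f} us = "
          f"{samples_per_step(rate):.1f} samples")
```

Halving the sampling rate halves the number of samples separating adjacent keys, which makes a high rate such as 96 kHz attractive for this kind of measurement.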

3.2.3 Experiments studying our new approach


In this section, we describe the experiments we conducted to study our new approach.

3.2.3.1 The experimental setup

Figure 3.2 Our experimental setup (microphone X and microphone Y)

Equipment
Microphones: multimedia condenser microphones; sensitivity -42 dB (0 dB = 1 V/Pa, 1 kHz); frequency response 20 Hz to 20 kHz; impedance 2000 ohms; cord length 120 cm; 3.5 mm stereo jack; price HK$20 each.
Keyboard: Dell PS/2 keyboard, P/N 7N242.

Software
GoldWave: GoldWave is a professional digital audio editor. We used it to record the keystrokes and to pre-process the recordings so that they are better suited to computation.
Matlab: Matlab is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. We developed a simple Matlab program to compute the received-time-differences and the recognition rates of our approach.

3.2.3.2 The general procedures of our experiments
1. Record the keystrokes with two microphones.
2. Pre-process the recorded sounds with GoldWave.
3. Extract the keystrokes from the two recorded sounds.
4. Compute the received-time-differences between the two microphones in Matlab.
5. Output the results.

1. Using two microphones to record the keystrokes
Synchronization of microphones X and Y
To keep the setup simple, we used two separate microphones without synchronizing them, because the lack of synchronization does not affect the results.
Proof: Because microphones X and Y were not synchronized, microphone X may, for example, start recording at time 0 s while microphone Y starts at 0.5 s. Let $t_s$ be the difference between the starting times of the two recordings. For a particular keystroke, the computed received-time-difference is then the sum of $t_i$ and $t_s$:

computed received-time-difference $= t_s + t_i$

Since $t_s$ is the same for every keystroke in a recording (assuming neither recording drifts relative to the other), the computed received-time-difference still differs from key to key because of $t_i$. We can therefore still recognize the key from the computed received-time-difference.
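The argument can be checked numerically with a small Python sketch. The key names, the $t_i$ values and the 0.5 s offset are hypothetical; like the proof, the sketch assumes that $t_s$ is the same for every keystroke in the recording.

```python
# Hypothetical true received-time-differences t_i for three keys, in samples.
true_tdoa = {"z": 120.0, "b": 60.0, "3": -120.0}

# Hypothetical constant start-time offset t_s: 0.5 s, i.e. 48,000 samples
# at 96,000 samples/s.
t_s = 48_000.0

# Each computed received-time-difference is t_s + t_i.
computed = {key: t_s + t_i for key, t_i in true_tdoa.items()}

# The offset cancels in every pairwise difference between keys ...
for a in true_tdoa:
    for b in true_tdoa:
        assert computed[a] - computed[b] == true_tdoa[a] - true_tdoa[b]

# ... so the keys keep the same order, and per-key means learned from the
# same recording absorb t_s automatically.
order_true = sorted(true_tdoa, key=true_tdoa.get)
order_computed = sorted(computed, key=computed.get)
print(order_true == order_computed)  # True
```

This is also why the absolute values in the experiments below (for example around -15,550 for key 1 and key 9) are large: they include the unknown offset $t_s$, and only their differences matter.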


2. Pre-processing the recorded sounds
This stage prepares the two recordings for the later computation. Its two main operations are trimming the two recordings and removing noise with GoldWave's default noise-reduction filter. For example, background noise was recorded together with the keystrokes, so it had to be reduced with GoldWave's noise-reduction function. After processing, the keystroke signal is much clearer (Figure 3.3).

Figure 3.3 A keystroke signal before noise reduction (left) and after noise reduction (right)

3. Extracting the keystrokes
Each recording contains many keystrokes, so we need to extract them one by one for comparison. A typical keystroke contains push peaks and release peaks, corresponding to pressing and releasing the key. Extracting a keystroke means finding its start point and end point so that the segment contains both the push peaks and the release peaks.
First, we read the two sound waves into Matlab. Then we scan the samples sequentially from the beginning. If the absolute amplitude of a sample exceeds a preset threshold, it is regarded as the beginning of a keystroke. From our observation, a key pressed with normal force can usually be detected with a threshold of 0.1. The first sample that exceeds this threshold is called Point A, and the start point is set to Point A minus 2000 samples (Figure 3.4). This ensures that the complete beginning of the keystroke is included.
After the start point is found, we search for the end point using a second threshold. We found that, after noise filtering, samples with absolute amplitudes below 0.02 are generally not part of a keystroke signal, so the second threshold is set to 0.02. When 2000 successive samples have amplitudes below 0.02, we take the last of these samples as Point B. The end point is Point B plus 2000 samples, so that the complete keystroke is captured (Figure 3.4).
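The extraction rule above can be sketched in Python as follows. The thresholds (0.1 and 0.02) and the 2000-sample margins come from the text; the amplitude scale (samples normalized to [-1, 1]), reading "the last point" as the last sample of the quiet run, and the function names are our assumptions. This is not the Matlab program used in the project.

```python
def find_keystroke(signal, start_from=0, onset_thr=0.1, quiet_thr=0.02,
                   quiet_run=2000, margin=2000):
    """Return (start, end) sample indices of the next keystroke, or None."""
    n = len(signal)
    # Point A: first sample whose absolute amplitude exceeds the onset threshold.
    point_a = next((i for i in range(start_from, n)
                    if abs(signal[i]) > onset_thr), None)
    if point_a is None:
        return None
    start = max(point_a - margin, 0)

    # Point B: last sample of the first run of `quiet_run` consecutive samples
    # whose absolute amplitude is below the quiet threshold.
    point_b = n - 1  # fall back to the end of the recording
    run = 0
    for i in range(point_a, n):
        run = run + 1 if abs(signal[i]) < quiet_thr else 0
        if run == quiet_run:
            point_b = i
            break
    end = min(point_b + margin, n - 1)
    return start, end


def extract_keystrokes(signal, **params):
    """Repeatedly apply find_keystroke, resuming after each end point."""
    segments, pos = [], 0
    while (seg := find_keystroke(signal, pos, **params)) is not None:
        segments.append(seg)
        pos = seg[1] + 1
    return segments


# Synthetic test signal: two 100-sample bursts separated by silence.
sig = [0.0] * 5000 + [0.5] * 100 + [0.0] * 10000 + [0.5] * 100 + [0.0] * 5000
print(extract_keystrokes(sig))  # [(3000, 9099), (13100, 19199)]
```

Because the search resumes after each end point, a keystroke that begins within 2000 samples of the previous end point would be missed; fast typing would need a smaller margin.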

Figure 3.4 The start point and end point of a typical keystroke (push peak and release peak marked)
To compute the received-time-difference, the start and end points must enclose the keystroke in both recordings, as shown in Figure 3.5. From the two pairs of start and end points obtained from the two recordings, we derive a common pair: the earlier of the two start points is chosen as the common start point, and the later of the two end points is chosen as the common end point.
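Choosing the common window is a one-line rule; the Python sketch below also slices both recordings with it. It assumes (our assumption) that the k-th keystroke found in recording X is paired with the k-th keystroke found in recording Y, and the function names are hypothetical.

```python
def common_window(seg_x, seg_y):
    """Earlier start and later end of two (start, end) segments."""
    return min(seg_x[0], seg_y[0]), max(seg_x[1], seg_y[1])


def cut_common(x, y, seg_x, seg_y):
    """Slice both recordings with the common window (end index inclusive)."""
    start, end = common_window(seg_x, seg_y)
    return x[start:end + 1], y[start:end + 1]


print(common_window((3000, 9099), (2990, 9120)))  # (2990, 9120)
```

Cutting both signals with the same window keeps their relative timing intact, which is what the later time-difference computation relies on.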


Figure 3.5 The selection of the common start and end points

4. Calculating the received-time-difference
There is noise in the experimental environment. Although the keystrokes recorded by the two microphones come from the same source, noise makes the two received waveforms differ considerably, as shown in Figure 3.6. This makes the received-time-difference very difficult to compute.

Figure 3.6 The signals received by the two microphones from the same keystroke (same source)


Since the first semester, we have tried different ways of computing the received-time-difference of a keystroke, and we settled on two suitable ones: using the position of the maximum peak, and using correlation.

Using the maximum peak as the reference point
From some simple earlier experiments, we found that keystrokes contain some very sharp peaks. Assuming that the noise is not large enough to shift the positions of the maximum peaks much, we can find the received-time-difference by comparing the positions of the maximum peaks of the keystroke as received by the two microphones.
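A minimal Python sketch of the maximum-peak method, assuming each keystroke has already been cut out with the common start and end points. We take the received-time-difference as (peak index in x) minus (peak index in y), which matches the sign of $t_i$ in $x(t) = y(t - t_i)$; whether this equals T2 - T1 in Figure 3.7 depends on which signal the figure labels T1, so treat the convention as an assumption. The test waveform is synthetic.

```python
def peak_position(signal):
    """Index of the sample with the largest absolute amplitude."""
    return max(range(len(signal)), key=lambda i: abs(signal[i]))


def tdoa_by_max_peak(x, y):
    """Received-time-difference in samples from the maximum peaks.

    Positive when the keystroke reaches microphone X later than microphone Y.
    """
    return peak_position(x) - peak_position(y)


# Synthetic keystroke: the same short waveform arrives 11 samples later at X.
y = [0.0] * 300
y[100:104] = [0.3, -0.9, 0.5, -0.2]
x = [0.0] * 11 + y[:-11]
print(tdoa_by_max_peak(x, y))  # 11
```

The method looks at a single sample per signal, so a noise spike larger than the true peak moves the result directly; this is the weakness the correlation method is meant to address.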

Figure 3.7 Using the maximum peaks as reference points

In Figure 3.7, the received-time-difference is T2 - T1. This approach assumes that noise does not shift the positions of the sharp peaks much. In practice, noise does shift the position of the maximum peak randomly, so we expect the computed received-time-differences to vary.

By correlation of the received keystrokes
To reduce the effect of noise, we devised a second way to calculate the received-time-difference: computing the correlation of the two received keystrokes.


Explanation: Let x(i) be the waveform received by microphone X and y(i) the waveform received by microphone Y from the same keystroke, i.e. $x(i) = y(i - t_i)$, where $t_i$ is the received-time-difference. To find the value of $t_i$, we compute the correlation of x(i) and y(i) with the correlation function

$r(d) = \sum_i x(i)\, y(i - d)$

r(d) is largest when x(i) and y(i - d) overlap best, as shown in Figure 3.8. Since $x(i) = y(i - t_i)$, the best overlap occurs at $d = t_i$. Thus, by plotting r(d) against d in Matlab and locating its maximum, we can find the value of $t_i$.
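The same search can be written directly from the definition of r(d). This Python sketch is a brute-force illustration on synthetic data (a short waveform delayed by 11 samples, plus a small alternating disturbance standing in for noise); the function names and the `max_lag` search window are our assumptions, and for long recordings an FFT-based cross-correlation would be much faster.

```python
def r(x, y, d):
    """r(d) = sum over i of x(i) * y(i - d), skipping indices outside y."""
    return sum(x[i] * y[i - d] for i in range(len(x)) if 0 <= i - d < len(y))


def tdoa_by_correlation(x, y, max_lag):
    """Lag d in [-max_lag, max_lag] that maximises r(d); an estimate of t_i."""
    return max(range(-max_lag, max_lag + 1), key=lambda d: r(x, y, d))


# Synthetic keystroke received by microphone Y.
y = [0.0] * 300
y[100:105] = [0.2, 1.0, -0.6, 0.3, -0.1]

# Microphone X receives the same waveform 11 samples later, x(i) = y(i - 11),
# plus a small alternating disturbance standing in for noise.
x_clean = [0.0] * 11 + y[:-11]
x = [s + (0.05 if i % 2 == 0 else -0.05) for i, s in enumerate(x_clean)]

print(tdoa_by_correlation(x, y, max_lag=50))  # 11
```

Every sample of the keystroke contributes to r(d), so a small disturbance on individual samples shifts r(d) only slightly and leaves its maximum at the true lag.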

Figure 3.8 An example of r(d) against d


Discussion: The correlation approach uses the whole keystroke waveform rather than a single peak, so it should be less sensitive to random noise. We therefore expected it to compute the received-time-difference more accurately, and we tested this expectation in the experiments.

5. Output the results
The main operations in this stage are outputting the computed received-time-differences, analyzing them, and calculating the recognition rate.
How to recognize: There are many advanced algorithms and complex statistical models for classification and recognition. However, we used the simplest one, the minimum distance approach, because we believed the time-difference approach was strong enough to achieve a high recognition rate even with the simplest recognition method. Limited time was another reason for this choice.
The minimum distance approach: First, we calculate the mean received-time-difference of each key from a set of training data, forming a set of feature pairs {key, mean received-time-difference}. For a test keystroke, we compute the distance between its received-time-difference and the mean received-time-difference of each key, and assign the keystroke to the key with the smallest distance.
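The minimum distance approach, together with the training/testing split used in the experiments of Section 3.2.3.4 (the first 10 keystrokes of each key for training, the remaining 20 for testing), can be sketched in Python as follows. The function names and the data are hypothetical; the sketch is not our Matlab program.

```python
from statistics import mean


def train_means(training):
    """Feature pairs {key: mean received-time-difference} from training data.

    training: dict mapping a key name to a list of received-time-differences.
    """
    return {key: mean(values) for key, values in training.items()}


def classify(tdoa, means):
    """Minimum distance rule: pick the key whose mean is closest to tdoa."""
    return min(means, key=lambda key: abs(tdoa - means[key]))


def recognition_rate(recordings, n_train=10):
    """Train on the first n_train values of each key, test on the rest.

    recordings: dict mapping a key name to its received-time-differences
    in recording order. Returns the fraction of test keystrokes recognised.
    """
    means = train_means({k: v[:n_train] for k, v in recordings.items()})
    results = [classify(t, means) == key
               for key, values in recordings.items()
               for t in values[n_train:]]
    return sum(results) / len(results)


# Hypothetical received-time-differences (in samples) for two keys.
data = {
    "z": [1750 + (i % 7) * 20 for i in range(30)],
    "3": [1200 + (i % 5) * 15 for i in range(30)],
}
print(classify(1600, train_means(data)))  # z
print(recognition_rate(data))             # 1.0
```

With only one feature per keystroke, the decision boundary between two keys is simply the midpoint of their means, which is why widely separated keys are recognized reliably while overlapping clusters cause errors.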


Figure 3.9 The minimum distance approach: the received-time-difference of the test keystroke minus the mean received-time-difference of key 1, key 2, ..., key N gives Distance 1, Distance 2, ..., Distance N; if Distance N is the smallest, the test keystroke is key N.

3.2.3.3 Expected experimental errors
Large-scale error
Noise: Noise is always a main source of experimental error, and it was impossible to find a place with no noise. We conducted our experiments in the IE laboratory, where the main sources of noise were the air-conditioning, the computers, and the IE students.
Small-scale errors
1. Hitting variance: As stated above, the hypothesis of our approach relies on the different positions of the keys relative to the two microphones. However, it is very difficult to hit a key at exactly the same spot every time, and hitting it at different spots causes variation.
2. Shift of the microphones and the keyboard: The keyboard and the two microphones were placed on the same table, so hitting the keys also struck the table. The resulting vibration could shift the keyboard and the microphones, causing errors.

3.2.3.4 Overview of the types of experiments
We divided our experimental research into several stages, each with different objectives and findings and supported by different types of experiments. The two main types of experiments are discussed below.

3.2.3.4.1 Experiment type 1
Objective: To see whether the received-time-differences of two keys are distinguishable.
First, we recorded thirty keystrokes each of key 1 and key 9 on the numeric keypad (Figure 3.10) to see whether their received-time-differences cluster into separate regions.
Figure 3.10 The positions of key 1 and key 9

A typical result is shown in the following line chart:

Figure 3.11 Line chart of a typical result of experiment type 1

Results computed by correlation:
Key 1: average -15549.74; median -15558.00; range -15611.00 to -15451.00; variation 1852.93
Key 9: average -15559.22; median -15563.00; range -15599.00 to -15519.00; variation 548.72

Results computed from the maximum peak positions:
Key 1: average -15550.87; median -15573.00; range -15613.00 to -15295.00; variation 5680.57
Key 9: average -15577.00; median -15587.00; range -15620.00 to -15401.00; variation 3874.73

Discussion: As the two line charts show, the received-time-differences of the two keys did not cluster into separate regions. Instead, they overlapped, which made it impossible to distinguish the keys by their received-time-differences. We also found that the received-time-differences of repeated keystrokes of the same key varied considerably. As explained above, this variation is due to random background noise, hitting variance, and shifts of the microphones and the keyboard. In addition, the received-time-differences computed from the maximum peak positions varied more than those computed by correlation, which matched our expectation before the experiments.
These results contradicted our expectation, since they suggested that the keys could not be recognized from their received-time-differences. However, we still believed that each key should have its own separate cluster of received-time-differences. There were two possible explanations for the indistinguishable clusters:
1. The noise was large enough to mask the difference between the received-time-differences of the two keys. Mathematically, since $t_i = (D_{ix} - D_{iy}) / V$,

$t_i - t_j = \frac{D_{ix} - D_{iy}}{V} - \frac{D_{jx} - D_{jy}}{V} = \frac{(D_{ix} - D_{iy}) - (D_{jx} - D_{jy})}{V} = \frac{2 D_{ij}}{V}$, where $D_{ij} = \frac{1}{2}\left[(D_{ix} - D_{iy}) - (D_{jx} - D_{jy})\right]$

$D_{ij}$ is approximately the separation of key i and key j along the line joining the microphones (compare Section 3.2.2). If this separation is too small, the resulting time difference cannot be detected because of the large noise interference.
2. The microphones and the sound card have limited sensitivity, so the difference may be too small for them to detect.
As stated in explanation 1, we believed that the noise, combined with the small separation between key 1 and key 9, caused these results. We therefore expected distinguishable clusters of received-time-differences for two keys with a larger separation. To test this, we repeated the experiment with key z and key 3 instead of key 1 and key 9; these keys are much farther apart, as shown in Figure 3.12.
Figure 3.12 The positions of key z and key 3

The results are as follows:

Figure 3.13 Line charts of a typical result of the second experiment of experiment type 1

Results computed by correlation:
Key z: average 1758.93; median 1685.50; range 1400.00 to 2328.00; variation 41003.31
Key 3: average 1222.80; median 1240.00; range 877.00 to 1377.00; variation 11074.44

Results computed from the maximum peak positions:
Key z: average 1734.97; median 1733.00; range 1501.00 to 2031.00; variation 7253.21
Key 3: average 1160.40; median 1178.50; range 1000.00 to 1247.00; variation 3874.73

Recognition rate computation: As stated before, we used the minimum distance approach for recognition. For each key, the first 10 keystrokes served as training data and the remaining 20 keystrokes as test keystrokes. Both the correlation method and the maximum-peak method achieved a recognition rate of 100%.

Discussion: The results supported our hypothesis. The received-time-differences of key z and key 3 clustered into two separate regions, and the two keys could be distinguished perfectly. Although the received-time-differences still varied considerably, we could recognize the two keys with 100% accuracy. Contrary to our expectation, the data computed from the maximum peak positions varied less than those computed by correlation. This may be due to changes in the nature of the noise during recording, which affected the correlation method more than the maximum-peak method. Nevertheless, both methods achieved a recognition rate of 100%.

3.2.3.4.2 Experiment type 2
Experiment type 1 showed that we could perfectly recognize two widely separated keys from the received-time-differences of two microphones. In the next step, we investigated the performance of our approach further by trying to recognize three keys and five keys instead of two. We chose key z, key ? and key 3 for the three-key experiments, and key z, key b, key ?, key up-arrow and key 3 for the five-key experiments. The positions of the keys are shown in Figure 3.14.
Figure 3.14 The positions of key z, key b, key ?, key up-arrow and key 3

A typical result of the three-key experiment (keys z, ? and 3) is shown below. For clarity, the average received-time-difference of key ? is used as the reference point, and only values relative to it are presented.

Figure 3.15 Line charts of a typical result of the three-key experiments

Results computed by correlation (relative to the average of key ?):
Key z: average 358.17; median 284.74; range -0.76 to 927.24 (233.24 to 526.24 excluding the abnormal peaks); variation 41003.31
Key ?: average 0.00; median -6.76; range -118.76 to 168.76; variation 3152.05
Key 3: average -177.96; median -160.76; range -523.76 to -23.76; variation 11074.44

Results computed from the maximum peak positions (relative to the average of key ?):
Key z: average 230.00; median 228.03; range -3.97 to 526.03; variation 7253.21
Key ?: average 0.00; median 31.53; range -191.97 to 229.03; variation 10043.76
Key 3: average -334.57; median -326.47; range -504.97 to -257.97; variation 3874.73

Discussion: In the three-key experiments, the minimum separation between keys is half that of experiment type 1. Nevertheless, the received-time-differences of the three keys still clustered into three separate regions. Although random noise produced a few extreme abnormal points, we still achieved high recognition rates, over 90% on average.

To investigate the performance of the attack further, we conducted the five-key experiment. A typical result is presented below; as before, the average received-time-difference of key ? is used as the reference point, and only values relative to it are presented.

Figure 3.16(a) Line charts of a typical result of the five-key experiments

Figure 3.16(b) Line charts of a typical result of the five-key experiments

Results computed by correlation (relative to the average of key ?):
Key z: average 358.17; median 284.74; range -0.76 to 927.24 (233.24 to 526.24 excluding the abnormal points); variation 41003.31
Key b: average 229.47; median 226.24; range 173.24 to 302.24; variation 5170.66
Key ?: average 0.00; median -6.76; range -118.76 to 168.76; variation 3152.05
Key up-arrow: average -144.66; median -142.76; range -265.76 to -12.76; variation 5176.66
Key 3: average -177.96; median -160.76; range -523.76 to -23.76; variation 11074.44

Results computed from the maximum peak positions (relative to the average of key ?):
Key z: average 230.00; median 228.03; range -3.97 to 526.03; variation 7253.21
Key b: average 124.17; median 102.03; range -11.97 to 960.03; variation 29392.95
Key ?: average 0.00; median 31.53; range -191.97 to 229.03; variation 10043.76
Key up-arrow: average -181.80; median -147.47; range -1109.97 to -10.97 (-288.97 to -72.97 excluding the abnormal point); variation 36030
Key 3: average -334.57; median -326.47; range -504.97 to -257.97; variation 3874.73

Discussion: In the five-key experiments, the minimum key separation is further reduced to four keys. Because of this smaller separation, the clusters of received-time-differences of the keys overlap more, although they remain separable. The recognition rates therefore decreased, to an average of around 60% to 70%.

3.2.3.5 Brief summary of our findings

We demonstrated the feasibility of our approach. Remarkably, we achieved high recognition rates with the simplest algorithm, the cheapest microphones and a very noisy environment. Our other findings are as follows:
1. The main source of error was noise interference, whose effect depends on the particular environment at the particular time. Because of noise, adjacent keys may be quite difficult to distinguish, but the effect of noise is much smaller for widely separated keys.
2. The mean received-time-difference between two adjacent keys was between 26.0 samples (130/5) and 34.0 samples (170/5).
3. For computing the received-time-differences, the maximum-peak-position approach generally gave less varied results and slightly higher recognition rates.

Please bear in mind that our work is only an initial study. We used very simple techniques because we aimed only to study the feasibility and properties of the received-time-difference approach, not to achieve high recognition rates, although these simple techniques still achieved high recognition rates. Based on the findings from our simple experiments, we designed a more practical and more elaborate classification and recognition method for our approach, which we believe could recover typed text from recorded keystrokes with high accuracy. Moreover, the new method does not need the training data required in our earlier experiments.

3.2.3.6 A more advanced classification and recognition method
Building on the previous research by Li Zhuang, Feng Zhou and J. D. Tygar [2], we designed a more advanced classification and recognition method for our approach. The procedure is as follows:
1. Collect data: Record a user typing English text with two microphones.
2. Feature extraction: Extract the keystrokes and calculate their received-time-differences.
3. Clustering: Cluster the computed received-time-differences into K classes, one per key. Unlike the two previous approaches, we can cluster the received-time-differences into their K classes directly from their values. For example, since $t_i = (D_{ix} - D_{iy}) / V$, the lowest received-time-difference belongs to the key closest to microphone X and the highest to the key closest to microphone Y. After this initial clustering, its accuracy can be improved using the likely letter sequences of English text, by means of Hidden Markov Models (HMMs) [2][17][21][22][24]. For example, if the current key could be either h or j (because they are physically close on the keyboard) and the previous key is known to be t, then the current key is more likely to be h, because "th" is far more common in English than "tj".
4. Spelling and grammar checking: In this final stage, dictionary-based spelling correction and a simple statistical model of English grammar can be used to correct the output text further.
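The clustering and language-model steps can be illustrated with a deliberately simplified Python sketch. It is not the HMM-based method of [2]: clustering here splits the sorted received-time-differences at the K - 1 largest gaps (a simple one-dimensional technique chosen for illustration), and disambiguation uses raw letter-bigram counts from a tiny sample text instead of a trained HMM. All function names and the sample text are hypothetical.

```python
from collections import Counter


def cluster_1d(values, k):
    """Split 1-D values into k clusters at the k-1 largest gaps.

    Returns one cluster label per input value; label 0 holds the smallest
    values, so labels are roughly ordered by key position between the
    two microphones.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = sorted(range(len(order) - 1),
                  key=lambda p: values[order[p + 1]] - values[order[p]],
                  reverse=True)[:k - 1]
    cuts = set(gaps)
    labels = [0] * len(values)
    cluster = 0
    for pos, idx in enumerate(order):
        labels[idx] = cluster
        if pos in cuts:  # a large gap follows this value
            cluster += 1
    return labels


def bigram_counts(text):
    """Count adjacent letter pairs within the words of a sample text."""
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        counts.update(zip(letters, letters[1:]))
    return counts


def pick_candidate(previous, candidates, counts):
    """Choose the candidate letter that most often follows `previous`."""
    return max(candidates, key=lambda c: counts[(previous, c)])


print(cluster_1d([10, 11, 50, 52, 100, 9], k=3))  # [0, 0, 1, 1, 2, 0]
counts = bigram_counts("the thin path that they think through the theatre")
print(pick_candidate("t", ["h", "j"], counts))     # h
```

Gap-based splitting works only when the clusters are well separated, as for keys z and 3; for closer keys, the language model has to resolve the overlap, which is the role the HMM plays in the proposed method.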


3.2.4 Suggested defense methods

One major advantage of our approach over the two previous approaches is that it is very difficult to defend against. Previous researchers suggested building keyboards or keypads whose keys sound alike; this defense does not work against our approach, because our approach relies on the different positions of the keys relative to the microphones, not on distinct acoustic features of the keystrokes. One suitable defense is to increase the noise level in the typing environment, since our experiments showed that noise greatly reduces the recognition rate. Another defense is to type on a virtual keyboard (Figure 3.17), which does not emit sound when typing.

Figure 3.17 A virtual keyboard


Chapter 4 Future Plan


We have made progress on the time-difference approach, but there is still much room for further investigation and improvement. For example, using three microphones instead of two should give better clustering and recognition results. Within the limited time available, we initiated research on this topic, demonstrated its feasibility with experimental data, and suggested a practical way to implement it. This is only a first step, and we hope that students in coming years will continue this research and reach the implementation milestone, using the algorithm suggested in the previous chapter or one of their own. Such work would be worthwhile and would contribute to computer security.

Other ways of improvement

1. Use better noise reduction techniques: As stated in the previous chapter, noise is the main source of error for our approach. The recognition rate should improve considerably if advanced noise reduction techniques are used.
2. Use better equipment: The sensitivity of the microphones and sound cards also affects the recognition rate. Better equipment, for example parabolic microphones instead of our cheap microphones, should give better results.
3. Combine with the approach of Dmitri and Rakesh [1]: The time-difference approach could be combined with the approach of Dmitri Asonov and Rakesh Agrawal, since the two exploit completely different characteristics of keyboard keys. The combined information should increase the recognition rate.


Chapter 5 Conclusion

We have shown how easily the attack can be carried out, using a completely new but simple approach. As computers continue to become faster, the attack will become even easier to carry out. In the words of Bruce Schneier, chief technology officer of Counterpane Internet Security and author of Applied Cryptography, "It was only a matter of time before criminals began using it"; and it is not surprising that people have already used it to commit crimes. It is crucial for people to pay attention to this threat and prepare defenses against it. We hope our research raises awareness of this kind of attack. We also want people to bear in mind that their typed information is at risk: they need to understand these attack techniques in order to protect their typed information from them.


References
[1] D. Asonov and R. Agrawal, "Keyboard Acoustic Emanations," in Proceedings of the IEEE Symposium on Security and Privacy, 2004.
[2] L. Zhuang, F. Zhou and J. D. Tygar, "Keyboard Acoustic Emanations Revisited," in Proceedings of the 12th ACM Conference on Computer and Communications Security, 2005.
[3] Hagai Bar-El, "Introduction to Side Channel Attack," Discretix Technologies Ltd.
[4] M. A. Casey, "Sound Classification and Similarity Tools," in Introduction to MPEG-7: Multimedia Content Description Language, J. Wiley, 2001.
[5] J. Harrington and S. Cassidy, Techniques in Speech Acoustics, The Netherlands: Kluwer Academic Publishers, 1999.
[6] H. Hermansky, "Analysis in Automatic Recognition of Speech," in Speech Processing, Recognition and Artificial Neural Networks, G. Chollet, M. G. Di Benedetto, A. Esposito and M. Marinaro, Eds., London: Springer-Verlag, 1999, pp. 115-137.
[7] R. de Mori, "Statistical Methods for Automatic Speech Recognition," in Speech Processing, Recognition and Artificial Neural Networks, G. Chollet, M. G. Di Benedetto, A. Esposito and M. Marinaro, Eds., London: Springer-Verlag, 1999, pp. 165-189.
[8] J. P. Haton, "Neural Networks for Automatic Speech Recognition: A Review," in Speech Processing, Recognition and Artificial Neural Networks, G. Cho…
[9] J. W. Tukey, B. P. Bogert and M. J. R. Healy, "The Frequency Analysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum, and Saphe-Cracking," in Proceedings of the Symposium on Time Series Analysis, M. Rosenblatt, Ed., Chapter 15, pp. 209-243, New York: Wiley.
[10] "On Naïve Bayes in Speech Recognition," Int. J. Appl. Math. Comput. Sci., Vol. 15, No. 2, 2005.
[11] Dat Tran, Minh Do, Michael Wagner and T. Van Le, "A Proposed Decision Rule for Speaker Identification Based on a Posteriori Probability."
[12] Chou and Juang, Pattern Recognition in Speech and Language Processing, CRC Press.
[13] F. J. Owens, Signal Processing of Speech, Macmillan New Electronics: Introduction to Advanced Topics.
[14] Ravi P. Ramachandran and Richard J. Mammone, CAIP Center, Rutgers University, Modern Methods of Speech Processing, Kluwer Academic Publishers.
[15] Proakis, Rader, Ling, Nikias, Moonen and Proudler, Algorithms for Statistical Signal Processing, Prentice Hall.
[16] G. Chollet, M. Di Benedetto, A. Esposito and M. Marinaro, Speech Processing, Recognition and Artificial Neural Networks, Proceedings of the 3rd International School on Neural Nets "Eduardo R. Caianiello," Springer.
[17] Pedro Larrañaga and Jose A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, Kluwer Academic Publishers.
[18] A. M. Kondoz, Digital Speech, second edition, Wiley.
[19] Richard Price, Jonathan Willmore and William Roberts, "Genetically Optimised Feedforward Neural Networks for Speaker Identification," Information Technology Division, Electronics and Surveillance Research Laboratory.
[20] Glenn White, Cepstrum Analysis, DLI Engineering, 1998.
[21] Lawrence R. Rabiner (Fellow, IEEE), "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition."
[22] S. Balakrishnama and A. Ganapathiraju, "Linear Discriminant Analysis: A Brief Tutorial," Institute for Signal and Information Processing, Department of Electrical and Computer Engineering, Mississippi State University.
[23] Ilyas Potamitis, Nikos Fakotakis and George Kokkinakis, "Speech Recognition Based on Feature Extraction with Variable Rate Frequency Sampling," Wire Communications Lab., Electrical & Computer Engineering Dept., University of Patras.
[24] Michel Vidal-Naquet and Shimon Ullman, "Object Recognition with Informative Features and Linear Classification," Faculty of Mathematics and Computer Science, The Weizmann Institute of Science.
