Steganography: Seminar Report
STEGANOGRAPHY
Seminar Report
Submitted by ALPHONSA THOMAS
Under the guidance of
STEGANOGRAPHY
ABSTRACT
We propose a new method for strengthening the security of information through a combination of signal processing, cryptography and steganography. Cryptography provides security by concealing the contents of information, and steganography provides security by concealing the existence of the information being communicated. Signal processing adds further security by compressing and transforming the information. The proposed method, the Steganography Based Information Protection Method (SBIPM), consists of scanning, coding, encryption, reshaping, cover processing and embedding steps. We then turn to data hiding in images. Steganography in images has truly come of age with the invention of fast, powerful computers, and software for hiding data inside images is readily available on the Internet. Some of this software is designed to fight illegal distribution of images by stamping a recognizable feature into them. The most popular technique, which we will look at, is least significant bit (LSB) insertion. We also look at more complex methods, such as masking and filtering, and algorithms and transformations, which offer the most robustness to attack; one example is the Patchwork method, which exploits the human eye's weakness to luminance variation. We take a brief look at steganalysis, the science of detecting hidden messages and destroying them. We conclude that steganography offers great potential for securing data, protecting copyright and detecting infringers. Soon, through steganography, personal messages, files, artistic creations, pictures and songs may be protected from piracy.
Chapter 1 Introduction
1.1 What is Steganography?
Steganography comes from the Greek and literally means "covered or secret writing". Although related to cryptography, the two are not the same. Steganography's intent is to hide the existence of the message, while cryptography scrambles a message so that it cannot be understood. Steganography is one of various data hiding techniques; it aims at transmitting a message on a channel where some other kind of information is already being transmitted. This distinguishes steganography from covert channel techniques, which instead try to transmit data between two entities that were previously unconnected. The goal of steganography is to hide messages inside other, harmless messages in a way that does not allow any enemy even to detect that a second, secret message is present. The only information missing for the enemy is the secret key, a short, easily exchangeable random number sequence. Without the secret key, the enemy should not have the slightest chance of even becoming suspicious that hidden communication might be taking place on an observed channel.
The information to be hidden in the cover data is known as the "embedded" data. The "stego" data is the data containing both the cover signal and the "embedded" information. The process of putting the hidden, or embedded, data into the cover data is known as embedding. Occasionally, especially when referring to image steganography, the cover image is known as the container.
The following formula provides a very generic description of the pieces of the steganographic process: cover medium + hidden data + stego_key = stego_medium
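The formula can be made concrete with a minimal sketch. It assumes that the cover medium is a flat list of 8-bit values (for example, grey-scale pixels), that the hidden data is a byte string, and that the stego_key is any value used to seed a pseudorandom generator that picks which values carry the message bits in their least significant bits. The function names are hypothetical, and Python's `random.Random` is not cryptographically secure, so a real tool would derive the positions with a keyed cryptographic function instead.

```python
import random

def _positions(cover_len, n_bits, stego_key):
    """Derive the same pseudorandom positions from the key on both sides (hypothetical helper)."""
    if n_bits > cover_len:
        raise ValueError("cover medium too small for the hidden data")
    rng = random.Random(stego_key)  # NOT cryptographically secure; illustration only
    return rng.sample(range(cover_len), n_bits)

def _to_bits(data):
    """Most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def embed(cover, hidden, stego_key):
    """cover medium + hidden data + stego_key -> stego medium."""
    bits = _to_bits(hidden)
    stego = list(cover)
    for pos, bit in zip(_positions(len(cover), len(bits), stego_key), bits):
        stego[pos] = (stego[pos] & ~1) | bit  # overwrite only the least significant bit
    return stego

def extract(stego, n_bytes, stego_key):
    """Recover n_bytes of hidden data, given the same key."""
    positions = _positions(len(stego), 8 * n_bytes, stego_key)
    bits = [stego[p] & 1 for p in positions]
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
```

For example, `extract(embed(cover, b"Hi", "shared secret"), 2, "shared secret")` returns `b"Hi"`. The receiver needs both the key and the message length; without the key it cannot tell which values to read.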
when each of these substances is heated, it darkens and becomes visible to the human eye. In Ancient Greece, messengers were selected and their heads shaved, and a message was then written on the scalp. Once the message had been written, the hair was allowed to grow back. After the hair had grown back, the messenger was sent to deliver the message, and the recipient would shave off the messenger's hair to see the secret message. Another method used in Greece was to scrape the wax off a wax-covered tablet, write a message underneath and then re-apply the wax. The recipient would simply remove the wax from the tablet to view the message.
Embedded data needs to be invisible; it is possible for data to be hidden while it remains in plain sight. The embedded data should be encoded directly into the media, rather than into a header or wrapper, to maintain data consistency across formats. The embedded data should be as immune as possible to modifications, whether from intelligent attacks or from anticipated manipulations such as filtering and resampling. Some distortion or degradation of the embedded data can be expected when the cover data is modified; to minimize this, error-correcting codes should be used. The embedded data should be self-clocking or arbitrarily re-entrant, so that it can still be extracted when only portions of the cover data are available. For example, if only part of an image is available, the embedded data should still be recoverable.
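As a simple illustration of the error-correcting idea, the sketch below uses a triple repetition code with majority voting, one of the simplest error-correcting codes. It is a stand-in chosen for clarity, not a code the text prescribes; practical systems would use stronger codes such as BCH or Reed-Solomon. The function names are hypothetical.

```python
def encode_repetition(bits, r=3):
    """Repeat every message bit r times before embedding (hypothetical helper)."""
    return [b for b in bits for _ in range(r)]

def decode_repetition(received, r=3):
    """Majority vote over each group of r received bits."""
    out = []
    for i in range(0, len(received), r):
        group = received[i:i + r]
        out.append(1 if sum(group) * 2 > len(group) else 0)
    return out
```

With r = 3, one flipped bit in each group of three is corrected, at the cost of tripling the number of bits that must be embedded.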
The illegal distribution of documents through modern electronic means, such as electronic mail, allows infringers to make identical copies of documents without paying royalties or revenues to the original author. To counteract this potential wide-scale piracy, a method has been proposed for marking printable documents with a unique codeword that is indiscernible to readers but can be used to identify the intended recipient of a document just by examining a recovered copy. The proposed techniques are intended to be used in conjunction with standard security measures; for example, documents should still be encrypted prior to transmission across a network. Primarily, the techniques are intended for use after a document has been decrypted, once it is readable to all. An added advantage of the system is that it is not prone to distortion by methods such as photocopying, and can thus be used to trace paper copies back to their source. An additional application of text steganography suggested by Bender et al. is annotation, that is, checking that a document has not been tampered with. Hidden data in text could even be used by mail servers to check whether documents should be posted or not. The marking techniques described are applied either to an image representation of a document or to a document format file, such as PostScript or TeX files. The idea is that a codeword (such as a binary number) is embedded in the document by altering particular textual features. By applying each bit of the codeword to a particular document feature, we can encode the codeword. It is the type of feature that identifies a particular encoding method. Three features are described in the following subsections:
DEPT OF CSE, MLMCE
However, word-shifting can also be detected and defeated, in either of two ways. If one knows the algorithm used by the formatter for text justification, the actual spaces between words could be measured and compared to the formatter's expected spacing; the differences in spacing would reveal the encoded data. A second method is to take two or more distinctly encoded, uncorrupted documents and perform page-by-page, pixel-wise difference operations on the page images. One could then quickly pick up the word shifts and the size of the word displacement. By respacing the shifted words back to the original spacing produced by the formatter, or merely applying random horizontal shifts to all words in the document not found at column edges, an attacker could eliminate the encoding. However, it is felt that these methods would be time-consuming and painstaking.
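Word-shift coding, and the first attack on it (comparing measured spacing with the formatter's expected spacing), can be sketched on a single text line. The sketch assumes each word is represented only by its horizontal start position in pixels, that the formatter's expected positions are known, and that one bit is carried by a one-pixel shift of an interior word (right for 1, left for 0) while the words at the column edges stay put. These conventions and the function names are hypothetical, chosen for illustration.

```python
def shift_encode(expected_x, bits, unit=1):
    """Shift interior words of one line: +unit pixels for bit 1, -unit for bit 0.

    The first and last words stay at the column edges (hypothetical convention).
    """
    if len(bits) > len(expected_x) - 2:
        raise ValueError("line has too few interior words")
    shifted = list(expected_x)
    for i, bit in enumerate(bits, start=1):
        shifted[i] += unit if bit else -unit
    return shifted

def shift_decode(measured_x, expected_x, n_bits):
    """Compare measured word positions with the formatter's expected positions."""
    return [1 if measured_x[i] > expected_x[i] else 0 for i in range(1, n_bits + 1)]
```

The decoder here is exactly the first attack described above: anyone who can reproduce the formatter's spacing can read the shifts and then respace the words to erase them.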
conjunction with feature coding, for example. Efforts such as this can place enough impediments in the attacker's way to make his job difficult and time-consuming.
The syntactic and semantic methods are particularly interesting. In syntactic methods, multiple methods of punctuation are harnessed to encode data. For example, the two phrases below are both considered correct, although the first has an extra comma:
    bread, butter, and milk
    bread, butter and milk
Alternation between these two forms of listing can be used to represent binary data. Other methods of syntactic encoding include the controlled use of contractions and abbreviations. Although such syntactic encoding is very possible in the English language, the amount of data that could be encoded would be very low, on the order of several bits per kilobyte of text. The final category of data hiding suggested by Bender et al. is semantic methods. By assigning values to synonyms, data could be encoded into the actual words of the text. For example, the word big might be given a value of one and the word large a value of zero. Then, when the word big is encountered in the coded text, a value of one can be decoded. Larger sets of synonyms allow more bits to be encoded per word. However, these methods can sometimes interfere with the nuances of meaning.
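The semantic method can be sketched with a tiny codebook built on the text's own example (big = 1, large = 0). The extra synonym pairs, the function names and the sample sentence are hypothetical, the sketch only handles lower-case words, and a practical scheme would need a much larger synonym table and care to keep the meaning intact.

```python
# Hypothetical codebook: in each pair, the first word encodes 1 and the second encodes 0.
SYNONYM_PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
WORD_TO_BIT = {w: bit for pair in SYNONYM_PAIRS for w, bit in zip(pair, (1, 0))}
WORD_TO_PAIR = {w: pair for pair in SYNONYM_PAIRS for w in pair}

def hide(words, bits):
    """Replace each codebook word, in order, with the synonym that encodes the next bit."""
    bits = iter(bits)
    out = []
    for w in words:
        if w in WORD_TO_PAIR:
            bit = next(bits, None)
            if bit is not None:
                pair = WORD_TO_PAIR[w]
                w = pair[0] if bit else pair[1]
        out.append(w)
    if next(bits, None) is not None:
        raise ValueError("not enough codebook words to carry all bits")
    return out

def reveal(words):
    """Read one bit from every codebook word in the text."""
    return [WORD_TO_BIT[w] for w in words if w in WORD_TO_BIT]
```

Hiding the bits 0, 1, 0 in "a big dog will begin a quick run" yields "a large dog will begin a fast run", from which `reveal` reads back 0, 1, 0.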
many steganography experts recommend using images featuring 256 shades of grey as the palette, for reasons that will become apparent. Grey-scale images are preferred because the shades change very gradually between palette entries, which increases the image's ability to hide information. When dealing with 8-bit images, the steganographer needs to consider the palette as well as the image. An image with large areas of solid color is a poor choice, as variances created by embedded data might be noticeable. Once a suitable cover image has been selected, an image encoding technique needs to be chosen.
More complex encoding can be done to embed the message only in "noisy" areas of the image, which will attract less attention. The message may also be scattered randomly throughout the cover image. The most common approaches to information hiding in images are least significant bit (LSB) insertion, masking and filtering techniques, and algorithms and transformations.
Each of these can be applied to various images, with varying degrees of success. Each suffers, to varying degrees, from operations performed on images, such as cropping, a reduction in resolution, or a decrease in color depth.
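Embedding only in "noisy" areas, as mentioned above, can be sketched under a few assumptions: the image is a list of rows of 8-bit grey values, and a pixel counts as noisy when its 7 high-order bits differ from those of its right-hand neighbour by more than a threshold. The measure deliberately ignores least significant bits, so the receiver, looking only at the stego-image, selects exactly the same positions. The threshold and the function names are hypothetical.

```python
def noisy_positions(image, threshold=4):
    """Positions whose high 7 bits differ sharply from the right-hand neighbour.

    Only p >> 1 is compared, so changing LSBs never changes the selection.
    """
    positions = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs((row[c] >> 1) - (row[c + 1] >> 1)) > threshold:
                positions.append((r, c))
    return positions

def embed_noisy(image, bits, threshold=4):
    """Write message bits into the LSBs of noisy pixels only."""
    positions = noisy_positions(image, threshold)
    if len(bits) > len(positions):
        raise ValueError("not enough noisy pixels for the message")
    stego = [list(row) for row in image]
    for (r, c), bit in zip(positions, bits):
        stego[r][c] = (stego[r][c] & ~1) | bit
    return stego

def extract_noisy(stego, n_bits, threshold=4):
    """Recompute the noisy positions from the stego-image and read their LSBs."""
    return [stego[r][c] & 1 for r, c in noisy_positions(stego, threshold)[:n_bits]]
```

Flat regions are never touched, which is the point of the technique; the price is lower capacity, since only busy regions carry data.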
The bit string to be hidden for the letter A is taken here to be 101101101 (this is not A's ASCII code, 01000001; nine bits are used so that one bit falls in each of the nine bytes of three 24-bit pixels). Inserting these bits into the least significant bits of the three pixels, starting from the top-left byte, would result in:
(00100111 11101000 11001001)
(00100111 11001000 11101001)
(11001001 00100110 11101001)
Only the least significant bit of each byte is written, and any byte whose LSB already matches the message bit is left unchanged. The main advantage of LSB insertion is that data can be hidden in the least and second-least significant bits without the human eye being able to notice it. When using LSB techniques on 8-bit images, more care needs to be taken, as 8-bit formats are not as forgiving to data changes as 24-bit formats are. Care also needs to be taken in the selection of the cover image, so that changes to the data will not be visible in the stego-image. Commonly known images (such as famous paintings like the Mona Lisa) should be avoided, since an observer could compare the stego-image with the well-known original. In fact, a simple picture of your dog would be quite sufficient. When the LSBs of an 8-bit image are modified, the pointers to entries in the palette change. It is important to remember that a change of even one bit could mean the difference between a shade of red and a shade of blue. Such a change would be immediately noticeable on the displayed image and is thus unacceptable. For this reason, data-hiding experts recommend using grey-scale palettes, where the differences between shades are not as pronounced. Alternatively, images consisting mostly of one color can be used, such as the so-called Renoir palette, named because it comes from a 256-color version of Renoir's "Le Moulin de la Galette".
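The worked example above can be checked with a short sketch: reading the least significant bit of each of the nine bytes, in order, recovers 101101101. The embedding function shows the corresponding write step. The names are hypothetical, and the pixel data is assumed to be a plain list of 8-bit values (R, G and B for each pixel in turn).

```python
def lsb_embed(cover_bytes, bits):
    """Write one message bit into the least significant bit of each byte."""
    if len(bits) > len(cover_bytes):
        raise ValueError("message longer than cover")
    stego = list(cover_bytes)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | bit
    return stego

def lsb_extract(stego_bytes, n_bits):
    """Read back the first n_bits least significant bits."""
    return [b & 1 for b in stego_bytes[:n_bits]]

# The three stego pixels (nine bytes) from the worked example.
stego_pixels = [
    0b00100111, 0b11101000, 0b11001001,
    0b00100111, 0b11001000, 0b11101001,
    0b11001001, 0b00100110, 0b11101001,
]
print("".join(map(str, lsb_extract(stego_pixels, 9))))  # prints 101101101
```

Re-embedding the same bits leaves these bytes unchanged, which illustrates why, on average, only about half of the LSBs touched by a random message actually change.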
Technically, watermarking is not a form of steganography. Strictly, steganography conceals data in the image, whereas watermarking extends the image information and becomes an attribute of the cover image, providing license, ownership or copyright details. Masking techniques are more suitable than LSB insertion for lossy JPEG images because of their relative immunity to image operations such as compression and cropping.
advantage of the human eye's weakness to luminance variation. Using redundant pattern encoding to scatter hidden information repeatedly throughout the cover image, like a patchwork, Patchwork can hide a reasonably small message many times in an image. In the Patchwork method, n pairs of image points (a, b) are randomly chosen. The brightness of each a is increased by one and the brightness of each b is decreased by one. For a labeled image, the expected value of the sum of the differences (a - b) over the n pairs is then 2n, whereas for an unlabeled image it is about zero. Bender shows that after JPEG compression with the quality factor set to 75, the message can still be decoded with 85% certainty. This algorithm is more robust to image processing such as cropping and rotating, but at the cost of message size. Techniques such as Patchwork are ideal for watermarking images: even if the image is cropped, there is a good probability that the watermark will still be readable. Other methods attempt to mark labels into images by altering the brightness of pixel blocks by a selected value k, where k depends on a lower-quality JPEG-compressed version of the labeled block. This method is fairly resistant to JPEG compression, depending on the size of the pixel blocks used, and offers low visibility of the label. Unfortunately, it is not very suitable for real-time applications. Other techniques encrypt the hidden data and scatter it throughout the image in some predetermined manner. It is assumed that even if the message bits are extracted, they will be useless without the algorithm and stego-key to decode them. Although such techniques do help protect against hidden message extraction, they are not immune to destruction of the hidden message through image manipulation.
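The Patchwork statistic described above can be sketched with single pixels: the stego key seeds the choice of n distinct pairs, each a is brightened by one and each b darkened by one, and the detector recomputes the sum S of the differences a - b. This is a simplified single-pixel illustration; the flat pixel list, the key handling and the function names are hypothetical.

```python
import random

def choose_pairs(num_pixels, n, key):
    """Key-seeded choice of n pairs (a, b) using 2n distinct pixel positions."""
    rng = random.Random(key)  # hypothetical key handling; not cryptographically secure
    idx = rng.sample(range(num_pixels), 2 * n)
    return list(zip(idx[:n], idx[n:]))

def patchwork_embed(pixels, n, key):
    """Brighten every a by one and darken every b by one (clipped to 0..255)."""
    out = list(pixels)
    for a, b in choose_pairs(len(out), n, key):
        out[a] = min(255, out[a] + 1)
        out[b] = max(0, out[b] - 1)
    return out

def patchwork_statistic(pixels, n, key):
    """S = sum of (a - b) over the key's n pairs."""
    return sum(pixels[a] - pixels[b] for a, b in choose_pairs(len(pixels), n, key))
```

Because each pair's difference grows by exactly 2 when no value is clipped, S for the labeled image exceeds S for the same unlabeled image by exactly 2n. In practice n must be large, because that shift has to stand out from the natural spread of S over unlabeled images.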
kHz). Sampling rate puts an upper bound on the usable portion of the frequency range. Generally, usable data space increases at least linearly with increased sampling rate. Another digital representation that should be considered is the ISO MPEG-Audio format, a perceptual encoding standard. This format drastically changes the statistics of the signal by encoding only the parts the listener perceives, thus maintaining the sound but changing the signal.
For decoding, the sequence is synchronized before the data are decoded. The segment length, the DFT points and the data interval must be known at the receiver. The value of the underlying phase of the first segment is then detected as 0 or 1, which represents the coded binary string.
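Phase coding of a single segment can be sketched as follows, assuming the scheme commonly described for it: each data bit sets the phase of one low-frequency DFT bin to +π/2 for a 0 or -π/2 for a 1, magnitudes are kept, and the mirrored bin is set to the complex conjugate so that the inverse transform stays real. The segment length, the bins used (1 to m) and the function names are hypothetical parameters that, as the text says, the receiver must know. A plain O(N²) DFT keeps the sketch dependency-free, and the adjustment of later segments' phases used in full phase coding is omitted.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def phase_embed(segment, bits, min_mag=1.0):
    """Encode bits in the phases of bins 1..len(bits) of one segment."""
    n = len(segment)
    if len(bits) >= n // 2:
        raise ValueError("segment too short for the data")
    spectrum = dft(segment)
    for k, bit in enumerate(bits, start=1):
        mag = max(abs(spectrum[k]), min_mag)  # avoid an undefined phase at zero magnitude
        spectrum[k] = cmath.rect(mag, -math.pi / 2 if bit else math.pi / 2)
        spectrum[n - k] = spectrum[k].conjugate()  # conjugate symmetry keeps the signal real
    return idft(spectrum)

def phase_extract(segment, n_bits):
    """Positive phase decodes as 0, negative phase as 1."""
    spectrum = dft(segment)
    return [0 if cmath.phase(spectrum[k]) > 0 else 1 for k in range(1, n_bits + 1)]
```

Only phases are altered (apart from raising near-zero magnitudes to `min_mag`), which is why phase coding aims to leave the audible magnitude spectrum of the segment essentially intact.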
As a binary one is represented by a certain delay y and a binary zero by a certain delay x, detecting the embedded signal simply involves detecting the spacing between the echoes. Echo hiding was found to work exceptionally well on sound files where there is no additional degradation, such as from line noise or lossy encoding, and where there are no gaps of silence. Work to eliminate these drawbacks is being done.
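Echo hiding can be sketched with two assumed delays, x for a zero and y for a one (here 40 and 60 samples), a fixed echo amplitude, and a decoder that simply compares each segment's autocorrelation at the two delays. White noise stands in for real audio, and the autocorrelation comparison is a simplified substitute for the cepstrum-based detection usually described for echo hiding. All parameter values and names are hypothetical.

```python
import random

DELAY_ZERO, DELAY_ONE = 40, 60  # hypothetical delays x and y, in samples
ALPHA = 0.5                     # hypothetical echo amplitude

def echo_embed(signal, bits, seg_len):
    """Add an echo with delay x (bit 0) or y (bit 1) inside each segment."""
    if len(signal) < len(bits) * seg_len:
        raise ValueError("signal too short for the data")
    out = list(signal)
    for s, bit in enumerate(bits):
        d = DELAY_ONE if bit else DELAY_ZERO
        start = s * seg_len
        for t in range(start + d, start + seg_len):
            out[t] += ALPHA * signal[t - d]  # the echo stays within its own segment
    return out

def autocorr(seg, lag):
    return sum(seg[t] * seg[t + lag] for t in range(len(seg) - lag))

def echo_extract(signal, n_bits, seg_len):
    """Decide each bit by which delay shows the stronger self-similarity."""
    bits = []
    for s in range(n_bits):
        seg = signal[s * seg_len:(s + 1) * seg_len]
        bits.append(1 if autocorr(seg, DELAY_ONE) > autocorr(seg, DELAY_ZERO) else 0)
    return bits
```

The decoder only measures the spacing of the echo, as the text describes; it needs no copy of the original signal.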
Chapter 5 Steganalysis
Whereas the goal of steganography is to avoid drawing suspicion to hidden messages in other data, steganalysis aims to discover and render useless such covert messages. Hiding information within electronic media requires alterations of the media properties that may introduce some form of degradation or unusual characteristics. These characteristics may act as signatures that broadcast the existence of the embedded message, thus defeating the purpose of steganography. Attacks and analysis on hidden information may take several forms: detecting, extracting, and disabling or destroying hidden information. An attacker may also embed counter-information over the existing hidden information. Two approaches are examined here: detecting messages or their transmission, and disabling embedded information. These approaches (attacks) vary depending on the methods used to embed the information into the cover media. Some amount of distortion and degradation may occur to carriers of hidden messages even though such distortions cannot be detected easily by the human perceptual system. Such distortion may appear anomalous compared with a "normal" carrier and, when discovered, may point to the existence of hidden information. Steganography tools vary in their approaches for hiding information. Without knowing which tool was used and which stego-key, if any, was used, detecting the hidden information may become quite complex. However, some steganographic approaches have characteristics that act as signatures for the method or tool used.
casual observer. However, appended spaces and "invisible" characters can easily be revealed by opening the file with a common word processor. The text may look "normal" if typed out on the screen, but if the file is opened in a word processor, the spaces, tabs and other characters distort the text's presentation. Images too may display distortions from hidden information. Selecting the proper combination of steganography tools and carriers is the key to successful information hiding. Some images may become grossly degraded with even small amounts of embedded information, and this visible noise will give away the existence of hidden information. The same is true with audio: echoes and shadow signals reduce the chance of audible noise, but they can be detected with little processing. Only after evaluating many original images and stego-images as to color composition, luminance and pixel relationships do anomalies point to characteristics that are not "normal" in other images. Patterns become visible when evaluating many images used for applying steganography. Such patterns include unusual sorting of color palettes, relationships between colors in color indexes, and exaggerated "noise". An approach used to identify such patterns is to compare the original cover images with the stego-images and note visible differences (a known-cover attack). Minute changes are readily noticeable when comparing the cover and stego-images. In making these comparisons with numerous images, patterns begin to emerge as possible signatures of steganography software. Some of these signatures may be exploited automatically to identify the existence of hidden messages and even the tools used to embed them. With this knowledge base, even if the cover images are not available for comparison, the derived known signatures are enough to imply the existence of a message and identify the tool used to embed it.
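The known-cover comparison can be sketched as a pixel-wise difference between a cover image and a suspect stego-image. The summary it returns (how many values changed, and by how much) feeds one hypothetical "signature" check: if every changed value differs from the cover only in its least significant bit, LSB embedding is a likely suspect. The function name, the flat-list image representation and the choice of summary fields are assumptions.

```python
from collections import Counter

def compare_cover_stego(cover, stego):
    """Summarize pixel-wise differences between a cover image and a suspect stego-image."""
    if len(cover) != len(stego):
        raise ValueError("images must have the same size")
    changed_pairs = [(c, s) for c, s in zip(cover, stego) if c != s]
    return {
        "changed_fraction": len(changed_pairs) / len(cover),
        "difference_histogram": dict(Counter(s - c for c, s in changed_pairs)),
        # True only if something changed and every change flipped just the LSB.
        "only_lsb_changes": bool(changed_pairs)
        and all((c ^ s) == 1 for c, s in changed_pairs),
    }
```

Running such a comparison over many cover/stego pairs is how the recurring patterns described above would begin to emerge; this sketch covers only a single pair.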
However, in some cases recurring, predictable patterns are not readily apparent even if distortion between the cover and stego-images is noticeable. A number of disk analysis utilities are available that can report and filter on hidden information in unused clusters or partitions of storage devices. A steganographic file system may also be vulnerable to detection through analysis of the system's partition information.
Filters can also be applied to capture TCP/IP packets that contain hidden or invalid information in the packet headers. Internet firewalls are becoming more sophisticated and allow for much customization. Just as filters can be set to determine whether packets originate from within the firewall's domain and whether the SYN and ACK bits are valid, so too can the filters be configured to catch packets that have information in supposedly unused or reserved space.
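A filter of this kind can be sketched for TCP. Following the TCP header layout in RFC 9293, the low four bits of byte 12 are reserved and should be zero, and the 16-bit urgent pointer is meaningful only when the URG flag is set, so non-zero values in either place are treated here as suspicious. The function names, the field values in the test header and the choice of checks are illustrative assumptions; a production rule would be written in the firewall's own configuration language.

```python
import struct

URG, SYN = 0x20, 0x02  # TCP flag bits in byte 13

def inspect_tcp_header(header: bytes) -> list:
    """Return warnings for data found in reserved or unused TCP header space."""
    if len(header) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack_no, offset_byte, flags,
     window, checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", header[:20])
    warnings = []
    if offset_byte & 0x0F:  # low nibble of byte 12 is reserved
        warnings.append("reserved bits set")
    if urgent_ptr and not flags & URG:  # urgent pointer is unused without URG
        warnings.append("urgent pointer set without URG flag")
    return warnings

def make_header(offset_byte=0x50, flags=SYN, urgent_ptr=0):
    """Build a 20-byte TCP header with illustrative field values (hypothetical helper)."""
    return struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, offset_byte, flags, 65535, 0, urgent_ptr)
```

A header with a data offset of 5 and only SYN set passes cleanly, while the same header with reserved bits set, or with an urgent pointer but no URG flag, is flagged.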
Hidden information may also be overwritten. If information is added to some media such that the added information cannot be detected, then there exists some amount of additional information that may be added or removed within the same threshold, which will overwrite or remove the embedded covert information. Audio and video are vulnerable to the same methods of disabling as images. Manipulation of the signals will alter embedded signals at the noise level (LSB), which may be enough to overwrite or destroy the embedded message. Filters can be used in an attempt to cancel out echoes or subtle signals, but this may not be as successful as expected. Caution must be used in hiding information in unused space in files or file systems. File headers and reserved spaces are common places to look for out-of-place information. In file systems, unless the steganographic areas are in some way protected (as in a partition), the operating system may freely overwrite the hidden data, since the clusters are thought to be free. This is a particular annoyance of operating systems that do a lot of caching and creating of temporary files. Utilities are also available that "clean" or wipe unused storage areas. In wiping, clusters are overwritten several times to ensure any data has been removed. Even in this extreme case, utilities exist that may recover portions of the overwritten information. As with unused or reserved space in file headers, TCP/IP packet headers can also be reviewed easily. Just as firewall filters are set to test the validity of the source and destination IP addresses and the SYN and ACK bits, so too can the filters be configured to catch packets that have information in supposedly unused or reserved space. If IP addresses are altered or spoofed to pass covert information, a reverse lookup in the domain name service (DNS) can verify the address. If the IP address is false, the packet can be terminated.
Using this technique to hide information is risky as TCP/IP headers may get overwritten in the routing process. Reserved bits can be overwritten and passed along without impacting the routing of the packet.
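The overwriting attack on noise-level (LSB) data described above is easy to demonstrate. The sketch below assumes a message stored one bit per sample in the least significant bits, as in LSB insertion, and re-randomizes every LSB with a pseudorandom generator. Each sample changes by at most one level, yet the recovered bits become essentially random. The function names and the seed are hypothetical.

```python
import random

def lsb_bits(samples, n):
    """Read the first n least significant bits."""
    return [s & 1 for s in samples[:n]]

def scrub_lsbs(samples, seed=None):
    """Overwrite every least significant bit with a random bit (each value moves by at most 1)."""
    rng = random.Random(seed)
    return [(s & ~1) | rng.getrandbits(1) for s in samples]
```

The attacker needs no knowledge of the embedding key or even of whether a message exists, which is why such blanket manipulation is an effective way of disabling LSB-based hiding.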
6.1.4 GIFShuffle
The program gifshuffle is used to conceal messages in GIF images by shuffling the color map, which leaves the image visibly unchanged. Gifshuffle works with all GIF images, including those with transparency and animation, and in addition provides compression and encryption of the concealed message. http://www.darkside.com.au/gifshuffle/
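Hiding data in the order of a color map can be sketched with a permutation code. With n distinct palette colors there are n! possible orderings, so the order alone can carry any integer below n! (for a 256-color palette, log2(256!) is just under 1,684 bits). The sketch below maps an integer to an ordering through the factorial number system and remaps pixel indices so that every pixel keeps its color. This is an assumed, generic scheme for illustration, not gifshuffle's actual algorithm, and the names are hypothetical.

```python
import math

def order_palette(palette, value):
    """Arrange distinct colors so that their order encodes value (0 <= value < n!)."""
    remaining = sorted(palette)  # canonical order that both sides agree on
    n = len(remaining)
    if len(set(remaining)) != n or not 0 <= value < math.factorial(n):
        raise ValueError("need distinct colors and 0 <= value < n!")
    ordered = []
    for i in range(n - 1, -1, -1):
        index, value = divmod(value, math.factorial(i))
        ordered.append(remaining.pop(index))
    return ordered

def read_palette(ordered):
    """Recover the integer encoded by the palette order."""
    remaining = sorted(ordered)
    value = 0
    for i, color in enumerate(ordered):
        index = remaining.index(color)
        value += index * math.factorial(len(ordered) - 1 - i)
        remaining.pop(index)
    return value

def remap_pixels(pixels, old_palette, new_palette):
    """Rewrite palette indices so every pixel keeps its color under the new order."""
    position = {color: i for i, color in enumerate(new_palette)}
    return [position[old_palette[p]] for p in pixels]
```

Because the pixels are remapped along with the palette, the displayed image is identical; only the order of the color table, which a viewer never shows, carries the data.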
6.1.5 WbStego
WbStego is a tool that hides any type of file in bitmap images, text files, HTML files or Adobe PDF files. The file in which you hide the data is not visibly changed. http://www.wbailer.com/wbstego
6.1.6 StegoVideo
MSU StegoVideo allows you to hide any file in a video sequence. When the program was created, different popular codecs were analyzed and an algorithm was chosen that provides small data loss after video compression. You can use MSU StegoVideo as a VirtualDub filter or as a standalone .exe program, independent of VirtualDub. http://compression.ru/video/stego_video/index_en.html
performing statistical analysis, does not make it any easier to find the concealed information. http://sourceforge.net/project/showfiles.php?group_id=13903
and fingerprinting, for use in detection of unauthorized, illegally copied material, is continually being realized and developed. Also, in places where standard cryptography and encryption are outlawed, steganography can be used for covert data transmission. Steganography, formerly just an interest of the military, is now gaining popularity among the masses. Soon, any computer user will be able to put his own watermark on his artistic creations.