High Density Data Storage in DNA Using an Efficient Message Encoding Scheme

International Journal of Information Technology Convergence and Services (IJITCS) Vol.2, No.2, April 2012

HIGH DENSITY DATA STORAGE IN DNA USING AN EFFICIENT MESSAGE ENCODING SCHEME
Rahul Vishwakarma¹ and Newsha Amiri²
¹ Tata Consultancy Services, India
² Bangalore University, India

ABSTRACT
This paper proposes a message encoding scheme that stores small text files in nucleotide strands to achieve ultra-high data density storage in DNA. The scheme reaches high data density by relying on sequence transformation algorithms. Compressing small text files poses special requirements because such files provide little context. Applying a transformation algorithm first generates better context information for the subsequent compression with Huffman encoding. We tested the proposed scheme on a collection of small text files. The results showed that the scheme reduced the number of nucleotides needed to represent a text message compared with an existing method, thereby realizing high-density data storage in DNA.

KEYWORDS
Encoding, Nucleotides, Compression, Text

1. INTRODUCTION
DNA consists of double-stranded polymers of four different nucleotides: adenine (A), cytosine (C), guanine (G) and thymine (T). The primary role of DNA is the long-term storage of genetic information. This feature of DNA is analogous to a digital data sequence, in which two binary digits, 0 and 1, are used to store digital data. Because there are four nucleotides, each one can represent two bits, and this analogy between DNA nucleotides and binary bits can be exploited to build artificial nucleotide data memory [1] [2]. For example, a small text message can be encoded into a synthetic nucleotide sequence and inserted into the genome of a living organism for long-term data storage. To further increase the data density of the encoded message, the original text can be compressed before encoding. Many lossless compression algorithms currently exist for large text files. All of them need sufficient context information for compression, but such context is difficult to obtain in small files (50 kB to 100 kB). In small files, context information is sufficient only when the data are processed character by character, so character-based compression is the most suitable choice for small files of up to 100 kB. We therefore need either a compression algorithm [3] that requires only a small context, or an algorithm that transforms the data into another form. One transformation-based approach is to apply the Burrows-Wheeler transform (BWT) followed by the Move-to-Front (MTF) transform, after which Huffman encoding converts the transformed data into a compressed file. This paper proposes such a compression scheme for small text messages, together with a mapping table that encodes the compressed data into a nucleotide sequence to increase data density.

The rest of the paper is organized as follows. Section 2 presents a method for data preparation using transforms, along with the compression scheme. Section 3 describes the mapping function for encoding the message into a nucleotide sequence. Section 4 describes the method for message encoding and retrieval. Section 5 presents the performance results, and Section 6 concludes the paper.
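The introduction outlines a four-stage pipeline: BWT, then MTF, then Huffman compression, then mapping the bits to nucleotides. The sketch below illustrates how those stages could fit together in Python, with retrieval included, under several assumptions. It is not the authors' implementation. The 2-bit mapping 00→A, 01→C, 10→G, 11→T is a hypothetical stand-in for the paper's own mapping table, which Section 3 defines. The helper names (`bwt`, `inverse_bwt`, `mtf_encode`, `huffman_code`, `bits_to_dna`, `encode`, `decode` and so on) are invented for illustration. The code works on UTF-8 bytes rather than characters, and its BWT uses the naive rotation-sorting construction, which is adequate only for small inputs.

```python
import heapq
from collections import Counter

# HYPOTHETICAL 2-bit mapping table (assumption); the paper defines its own in Section 3.
BASE_FOR = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR = {base: bits for bits, base in BASE_FOR.items()}

def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column."""
    n = len(data)
    if n == 0:
        return b"", 0
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[i - 1] for i in order)  # last byte of the rotation starting at i
    return last, order.index(0)  # row of the original string among sorted rotations

def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT by walking the column mapping forward through the text."""
    if not last:
        return b""
    nxt = sorted(range(len(last)), key=lambda i: last[i])  # stable sort of the last column
    out = bytearray()
    j = nxt[primary]
    for _ in range(len(last)):
        out.append(last[j])
        j = nxt[j]
    return bytes(out)

def mtf_encode(data: bytes) -> list[int]:
    """Move-to-Front: replace each byte by its current rank, then move it to the front."""
    table, ranks = list(range(256)), []
    for b in data:
        r = table.index(b)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks

def mtf_decode(ranks: list[int]) -> bytes:
    table, out = list(range(256)), bytearray()
    for r in ranks:
        b = table.pop(r)
        out.append(b)
        table.insert(0, b)
    return bytes(out)

def huffman_code(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free Huffman code from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) <= 1:  # zero or one distinct symbol: trivial code
        return {s: "0" for s in freq}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, lo = heapq.heappop(heap)
        w2, _, hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_decode(bits: str, code: dict[int, str]) -> list[int]:
    lookup = {c: s for s, c in code.items()}
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in lookup:  # prefix-free, so the first match is the symbol
            out.append(lookup[cur])
            cur = ""
    return out

def bits_to_dna(bits: str) -> str:
    """Map each 2-bit pair to one nucleotide, zero-padding an odd-length tail."""
    if len(bits) % 2:
        bits += "0"
    return "".join(BASE_FOR[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bits(dna: str, nbits: int) -> str:
    return "".join(BITS_FOR[b] for b in dna)[:nbits]  # drop any padding bit

def encode(text: str):
    """Text -> BWT -> MTF -> Huffman bits -> nucleotide string (plus side information)."""
    last, primary = bwt(text.encode("utf-8"))
    ranks = mtf_encode(last)
    code = huffman_code(ranks)
    bits = "".join(code[r] for r in ranks)
    return bits_to_dna(bits), len(bits), primary, code

def decode(dna: str, nbits: int, primary: int, code: dict[int, str]) -> str:
    ranks = huffman_decode(dna_to_bits(dna, nbits), code)
    return inverse_bwt(mtf_decode(ranks), primary).decode("utf-8")

msg = "banana bandana banana"
dna, nbits, primary, code = encode(msg)
assert decode(dna, nbits, primary, code) == msg
print(len(dna), "bases vs", 4 * len(msg.encode("utf-8")), "with an uncompressed 2-bit mapping")
```

Besides the DNA string, `encode` returns the true bit length, the BWT primary index and the Huffman code table, because `decode` needs all three. A real storage format would also have to encode this side information in nucleotides, and its size counts against any density gain on very short messages.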

DOI: 10.5121/ijitcs.2012.2204