Surveyondatamanagementsystemfor Final
net/publication/312218717
All content following this page was uploaded by Thulasi Accottillam on 12 January 2017.
III. COMMON CONCEPTS
A. Sharding
When dealing with large volumes of data, a single machine cannot store or process all of it, given its limited RAM and the limited input/output capacity of its disk drives. This led to the concept of scaling. Sharding is horizontal scaling: the data is stored across multiple servers, or shards. Each shard is an independent database with high availability and consistency, and together the shards form a single logical database [6].
Figure 1: Sharding
B. Consistent hashing
Consistent hashing allows a cluster to scale incrementally without rehashing existing values. Many NoSQL databases use it to implement sharding. When the hash table is resized, consistent hashing needs to remap only k/n keys on average, where k is the number of keys and n is the number of shards or slots. Servers are therefore not overwhelmed by a remapping of the entire key space. The method uses a hash ring: each object is mapped to a point on the edge of a circle and is assigned to the first bucket encountered when walking around the circle from that point [7].
C. MapReduce
MapReduce is a programming paradigm suited to processing large amounts of data in parallel. The framework takes its input as key-value pairs; the map function performs filtering and sorting, and the reduce function performs a summary operation and produces the output. The main contribution of MapReduce is the scalability and fault tolerance achieved for a variety of domains by optimizing the execution engine once [8].
Figure 3: MapReduce
IV. DATA MODELS
NoSQL data models differ from the RDBMS model. The challenges faced by traditional systems became the requirements for this new class of data stores: storage of non-relational data in a distributed environment, open source, effective horizontal scalability, schema-less design, replication support, eventual consistency and user-friendly APIs. NoSQL data models are mainly categorized into four:
A. Key Value Store
A key-value store holds simple key-value pairs and follows a hash- or dictionary-like storage mechanism. The key is a unique identifier used to manage its value. The model is schema-free and suits distributed environments, but it lacks relational structure, indexing and data-level querying [9].
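As a minimal sketch, assuming Python and one in-memory dictionary per shard, the key-value model can be combined with the hash ring of Section III-B. The class `ShardedKeyValueStore`, its method names and the shard names are hypothetical, MD5 is an arbitrary choice of hash, and virtual nodes (which real systems typically add to balance the load) are omitted for brevity.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a point on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class ShardedKeyValueStore:
    """Toy key-value store whose keys are placed on shards via a hash ring."""

    def __init__(self, shard_names):
        # Each shard is an independent dictionary standing in for a server.
        self.shards = {name: {} for name in shard_names}
        # Sorted list of (ring position, shard name) pairs.
        self.ring = sorted((ring_position(name), name) for name in shard_names)

    def shard_for(self, key: str) -> str:
        # Walk clockwise from the key's position to the first shard encountered,
        # wrapping around to the start of the ring if necessary.
        positions = [pos for pos, _ in self.ring]
        index = bisect.bisect_right(positions, ring_position(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_for(key)].get(key, default)

    def add_shard(self, name):
        """Add a shard and move only the keys that now belong to it."""
        self.shards[name] = {}
        bisect.insort(self.ring, (ring_position(name), name))
        moved = 0
        for other, data in self.shards.items():
            if other == name:
                continue
            # Copy the matching keys first so the dictionary is not
            # modified while it is being iterated.
            for key in [k for k in data if self.shard_for(k) == name]:
                self.shards[name][key] = data.pop(key)
                moved += 1
        return moved


store = ShardedKeyValueStore(["shard-a", "shard-b", "shard-c"])
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

moved = store.add_shard("shard-d")
print("keys moved to the new shard:", moved)
print("lookup after resharding:", store.get("user:42"))
```

Values are reachable only through their key, which reflects the lack of indexing and data-level querying noted above. When `shard-d` joins, only the keys that fall on its arc of the ring are relocated; the remaining keys stay on their original shards.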
● OrientDB: It is a multi-model data store that supports key-value, document and graph structures, but relationships are managed mainly through the graph model. It uses MVRB-Tree as its indexing solution, offers user-preferred security and provides a simple query layer for traversal [16].
D. Graph Stores
As the name implies, a graph store is an interconnected collection of nodes. Each node is an entity, and the edges represent the relationships between nodes. Nodes can be interpreted in different ways depending on their relationships. Edges, or relationships, have directional significance, which reveals interesting patterns in the dataset [16].
NoSQL is not an alternative to RDBMS, and migration between relational databases and NoSQL remains a significant undertaking [18]. The relational database is a mature field, and combining its traditional features with NoSQL is also challenging.
Another important research challenge relates to privacy and security. The distributed architecture of NoSQL has many limitations, such as inadequate encryption support, poor authentication between client and server, and vulnerability to SQL injection and Denial of Service attacks.
VI. CONCLUSION
In this paper we presented the limitations of conventional relational databases and the evolution of NoSQL. The common concepts of NoSQL data management and the different data models were discussed, and some major research challenges were identified. Even though NoSQL is the first preference for big data management, it is not an ultimate solution. More collaborative research is needed to manage big data more effectively.
REFERENCES
[1] E. F. Codd, "A relational model of data for large shared data banks," Communications of the ACM, vol. 13, no. 6, pp. 377-387, June 1970.