Unit 1
Next Generation Technologies
MODULE-1: Big Data, NoSQL, Introducing MongoDB
This is to certify that the e-book titled "Big Data, NoSQL, Introducing MongoDB" comprises all
elementary learning tools for a better understanding of the relevant concepts. This e-book is comprehensively
compiled as per the predefined eight parameters and guidelines.
Signature
Ms. Seema Bhatkar
Assistant Professor Date: 13-06-2019
Department of IT
DISCLAIMER: The information contained in this e-book is compiled and distributed for educational purposes
only. This e-book has been designed to help learners understand relevant concepts with a more dynamic
interface. The compiler of this e-book and Vidyalankar Institute of Technology give full and due credit to the
authors of the contents, developers and all websites from wherever information has been sourced. We
acknowledge our gratitude towards the websites YouTube, Wikipedia, and Google search engine. No
commercial benefits are being drawn from this project.
Unit I Big Data, NoSQL, Introducing MongoDB
Contents :
Big Data: Getting Started, Big Data, Facts About Big Data, Big Data Sources, Three Vs of Big Data,
Volume, Variety, Velocity, Usage of Big Data, Visibility, Discover and Analyze Information,
Segmentation and Customizations, Aiding Decision Making, Innovation, Big Data Challenges, Policies
and Procedures, Access to Data, Technology and Techniques, Legacy Systems and Big Data, Structure
of Big Data, Data Storage, Data Processing, Big Data Technologies
NoSQL: SQL, NoSQL, Definition, A Brief History of NoSQL, ACID vs. BASE, CAP Theorem (Brewer's
Theorem), The BASE, NoSQL Advantages and Disadvantages, Advantages of NoSQL, Disadvantages
of NoSQL, SQL vs. NoSQL Databases, Categories of NoSQL Databases
Introducing MongoDB: History, MongoDB Design Philosophy, Speed, Scalability, and Agility, Non-
Relational Approach, JSON-Based Document Store, Performance vs. Features, Running the Database
Anywhere, SQL Comparison
Recommended Books :
1. Practical MongoDB by Shakuntala Gupta Edward and Navin Sabharwal, published by Apress
2. Beginning jQuery by Jack Franklin and Russ Ferguson second edition published by Apress
3. Next Generation Databases by Guy Harrison published by Apress
4. Beginning JSON by Ben Smith published by Apress
Prerequisites/linkages
Unit: Big Data, NoSQL, Introducing MongoDB
Pre-requisites (drawn from Sem. I–VI): WP, Python, DBMS, CJ, Project
Chapter 1
Big Data
Big data is data that has high volume, is generated at high velocity, and comes in multiple
varieties. Let's look at a few facts and figures about big data.
Source:- https://www.youtube.com/watch?v=eVSfJhssXUA
Three Vs of Big Data
1. Volume
Volume in big data means the size of the data. As businesses become more transaction-
oriented, the ever-increasing number of transactions generates huge amounts of data. This
huge volume of data is the biggest challenge for big data technologies. The storage and
processing power needed to store, process, and make the data accessible in a timely and cost-
effective manner is massive.
2. Variety
The data generated from various devices and sources follows no fixed format or structure.
Unlike traditional text, CSV, or RDBMS data, it ranges from text files, log files, streaming
videos, photos, meter readings, stock ticker data, PDFs, and audio to various other
unstructured formats.
New sources and structures of data are being created at a rapid pace. So the onus is on
technology to find a solution to analyze and visualize the huge variety of data that is out
there. As an example, to provide alternate routes for commuters, a traffic analysis application
needs data feeds from millions of smartphones and sensors to provide accurate analytics on
traffic conditions and alternate routes.
3. Velocity
Velocity in big data is the speed at which data is created and the speed at which it must be
processed. If data cannot be processed at the required speed, it loses its significance.
With data streaming in from social media sites, sensors, tickers, metering, and monitoring,
it is important for organizations to process data speedily, both while it is in motion and
while it is at rest.
5. Innovation
Big data enables the innovation of new ideas in the form of products and services. It enables
innovation in existing ones in order to reach out to large segments of people. Using data
gathered from actual products, manufacturers can not only innovate to create the next-
generation product, but they can also innovate their sales offerings.
As an example, real-time data from machines and vehicles can be analyzed to provide insight
into maintenance schedules; wear and tear on machines can be monitored to make more
resilient machines; fuel consumption monitoring can lead to higher efficiency engines. Real-
time traffic information is already making life easier for commuters by providing them
options to take alternate routes.
2. Data Storage
Legacy systems use big servers and NAS and SAN systems to store the data. As the data
increases, the server size and the backend storage size have to be increased. Traditional legacy
systems typically work in a scale-up model, where more and more compute, memory, and
storage must be added to a single server to meet the increased data needs. Hence the processing
time increases steeply, which defeats the other important requirement of big data,
which is velocity.
3. Data Processing
The algorithms in legacy systems are designed to work with structured data such as strings
and integers. They are also limited by the size of the data. Thus, legacy systems are not capable
of processing unstructured data, huge volumes of such data, or the speed at which the
processing needs to be performed.
As a result, to capture value from big data, we need to deploy newer technologies in the field
of storing, computing, and retrieving, and we need new techniques for analyzing the data.
Big Data Technologies
The recent technology advancements that enable organizations to make the most of their big
data are the following:
1. New storage and processing technologies designed specifically for large
unstructured data
2. Parallel processing
3. Clustering
4. Large grid environments
5. High connectivity and high throughput
6. Cloud computing and scale-out architectures
Chapter 2
NoSQL
SQL
SQL databases guarantee the ACID properties for every transaction:
• Atomic implies either all changes of a transaction are applied completely or not applied at
all.
• Consistent means the data is in a consistent state after the transaction is applied. This
means after a transaction is committed, the queries fetching a particular data will see the
same result.
• Isolated means the transactions that are applied to the same set of data are independent of
each other. Thus, one transaction will not interfere with another transaction.
• Durable means the changes are permanent in the system and will not be lost in case of any
failures.
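The atomicity property can be demonstrated with Python's built-in sqlite3 module. This is only an illustrative sketch; the accounts table and its values are invented for the example. A transfer is attempted in which the second statement deliberately fails, so the whole transaction is rolled back and no partial change survives.

```python
import sqlite3

# In-memory database with a simple, invented accounts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

# Transfer 30 from alice to bob; the second UPDATE is deliberately broken,
# so the whole transaction must be rolled back (atomicity: all or nothing).
try:
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE no_such_table SET balance = balance + 30")  # fails
    conn.commit()
except sqlite3.OperationalError:
    conn.rollback()

# Neither change was applied: alice still has her original balance.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100, 'bob': 50}
```

Without the rollback, the database would be left in an inconsistent state where money had been debited but never credited, which is exactly what atomicity forbids.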
NoSQL
NoSQL is a term used to refer to non-relational databases. Thus, it encompasses the majority of
data stores that are not based on conventional RDBMS principles and are used for
handling large data sets on an Internet scale.
NoSQL is an umbrella term for data stores that don’t follow the RDBMS principles.
The CAP theorem states that at any point in time a distributed system can fulfil only two of the
three guarantees of consistency, availability, and partition tolerance.
The BASE
Eric Brewer coined the BASE acronym. BASE can be explained as follows:
Basically Available means the system will be available in terms of the CAP theorem.
Soft state indicates that even if no input is provided to the system, the state will change over
time. This is in accordance with eventual consistency.
Eventual consistency means the system will attain consistency in the long run, provided no
input is sent to the system during that time.
You have seen that NoSQL databases are eventually consistent, but the eventual consistency
implementation may vary across different NoSQL databases.
NRW is the notation used to describe how the eventual consistency model is implemented across
NoSQL databases, where
N is the number of data copies the database maintains,
R is the number of copies that a read operation consults, and
W is the number of copies that a write operation must update before it is marked successful.
Using these NRW configurations, the databases implement the model of eventual consistency.
Write Operations
N=W implies that the write operation will update all data copies before returning the control
to the client and marking the write operation as successful. This is similar to how the
traditional RDBMS databases work when implementing synchronous replication. This setting
will slow down the write performance.
If write performance is a concern, meaning you want writes to happen fast, you can set
W=1, R=N. This implies that a write will update just one copy and be marked successful,
but whenever a user issues a read request, all copies are read to return the result. If any
copy is outdated, it is brought up to date first, and only then does the read succeed. This
implementation slows down read performance.
Hence most NoSQL implementations use N>W>1. This implies that more than one node
needs to be updated successfully; however, not all nodes need to be updated at the same time.
Read Operations
If R is set to 1, the read operation will read any one data copy, which may be outdated.
If R>1, more than one copy is read, and the most recent value among them is returned.
However, this can slow down the read operation.
Using N<W+R always ensures that a read operation retrieves the latest value. This is because
the number of copies written plus the number of copies read is greater than the total number of
copies, so the read set always overlaps the write set in at least one copy holding the latest version.
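The overlap guarantee can be sketched with a toy replica set in Python. This is a model of the idea only, not any real database's implementation; the values and the choice of which copies are written or read are invented for illustration. With N=3, W=2, R=2 we have W+R > N, so even a read that samples the copies written last must overlap the write set in one fresh copy.

```python
# Toy replica set: N copies, each holding a (value, version) pair.
N, W, R = 3, 2, 2                      # W + R > N guarantees overlap

replicas = [("old", 0)] * N

def write(value, version):
    # A write updates only W of the N copies before returning success.
    for i in range(W):
        replicas[i] = (value, version)

def read():
    # A read consults R copies -- here deliberately the ones least likely
    # to be fresh -- and returns the value with the highest version.
    sampled = replicas[N - R:]
    return max(sampled, key=lambda pair: pair[1])[0]

write("new", 1)
print(read())  # new
```

Even though one replica still holds the stale ("old", 0) pair, the read quorum of 2 copies necessarily includes one of the 2 freshly written copies, so the latest value wins by version number.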
Advantages of NoSQL
1. High scalability : RDBMSs scale up by moving to bigger and bigger servers, an approach that
fails when transaction rates and fast response requirements increase. In contrast to this, the
new generation of NoSQL databases is designed to scale out (i.e., to expand horizontally
using low-end commodity servers).
2. Manageability and administration : NoSQL databases are designed to mostly work with
automated repairs, distributed data, and simpler data models, leading to lower manageability
and administration overhead.
3. Low cost : NoSQL databases are typically designed to work with a cluster of cheap
commodity servers, enabling the users to store and process more data at a low cost.
4. Flexible data models : NoSQL databases have a very flexible data model, enabling them to
work with any type of data; they don’t comply with the rigid RDBMS data models. As a
result, any application changes that involve updating the database schema can be easily
implemented.
Disadvantages of NoSQL
1. Maturity: Most NoSQL databases are pre-production versions with key features still to
be implemented. Thus, when deciding on a NoSQL database, you should analyze the
product properly to ensure its features are fully implemented and not still on the to-do list.
2. Support: Support is one limitation that you need to consider. Most NoSQL databases come
from start-ups and are open source. As a result, support is minimal compared to that of
enterprise software companies and may not have global reach or support resources.
3. Limited Query Capabilities: Since NoSQL databases are generally developed to meet the
scaling requirements of web-scale applications, they provide limited querying
capabilities. Even a simple querying requirement may involve significant programming expertise.
5. Expertise : Since NoSQL is an evolving area, expertise on the technology is limited in the
developer and administrator community.
SQL vs. NoSQL Databases

Types
SQL: All types support the SQL standard.
NoSQL: Multiple types exist, such as document stores, key-value stores, column databases, etc.

Development History
SQL: Developed in the 1970s.
NoSQL: Developed in the 2000s.

Examples
SQL: SQL Server, Oracle, MySQL.
NoSQL: MongoDB, HBase, Cassandra.

Data Storage Model
SQL: Data is stored in rows and columns in a table, where each column is of a specific type. The tables are generally created on principles of normalization. Joins are used to retrieve data from multiple tables.
NoSQL: The data model depends on the database type; for example, data is stored as key-value pairs in key-value stores, while in document-based databases it is stored as documents. The data model is flexible, in contrast to the rigid table model of the RDBMS.

Schemas
SQL: Fixed structure and schema, so any change to the schema involves altering the database.
NoSQL: Dynamic schema; new data types or structures can be accommodated by expanding or altering the current schema. New fields can be added dynamically.

Scalability
SQL: A scale-up approach is used; as the load increases, bigger, more expensive servers are bought to accommodate the data.
NoSQL: A scale-out approach is used; the data load is distributed across inexpensive commodity servers.

Supports Transactions
SQL: Supports ACID and transactions.
NoSQL: Supports partitioning and availability, and compromises on transactions. Transactions exist at certain levels, such as the database level or the document level.

Consistency
SQL: Strong consistency.
NoSQL: Depends on the product. A few choose to provide strong consistency, whereas others provide eventual consistency.

Support
SQL: A high level of enterprise support is provided.
NoSQL: Open source model; support comes through third parties or companies building on the open source products.

Maturity
SQL: Has been around for a long time.
NoSQL: Some products are mature; others are evolving.

Querying Capabilities
SQL: Available through easy-to-use GUI interfaces.
NoSQL: Querying may require programming expertise and knowledge. Rather than a UI, the focus is on functionality and programming interfaces.

Expertise
SQL: A large community of developers has been leveraging the SQL language and RDBMS concepts to architect and develop applications.
NoSQL: A small community of developers works on these open source tools.
Chapter 3
Introducing MongoDB
History
In the later part of 2007, Dwight Merriman, Eliot Horowitz, and their team decided to
develop an online service. The intent of the service was to provide a platform for developing,
hosting, and auto-scaling web applications, much in line with products such as the Google App
Engine or Microsoft Azure. Soon they realized that no open source database platform suited the
requirements of the service.
A year later, the database for the service was ready to use. The service itself was never
released, but the team decided in 2009 to open source the database as MongoDB. In March
2010, the release of MongoDB 1.4.0 was considered production-ready. The latest production
release at the time of writing is 3.0, released in March 2015. MongoDB was built under the
sponsorship of 10gen, a New York–based startup.
example:
{
  "Name": "ABC",
  "Phone": ["1111111", "222222"],
  "Fax": ""
}
As mentioned, keys and values come in pairs. The value of a key in a document can be left
blank. In the above example, the document has three keys, namely "Name", "Phone", and "Fax".
The "Fax" key has no value.
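Because MongoDB documents are JSON-based, such a document can be built and inspected with Python's standard json module. This is a minimal sketch reusing the example's field names; a key with no value is represented here by an empty string, since JSON itself requires every key to carry some value.

```python
import json

# The example document: "Fax" is present but carries no value.
text = '{"Name": "ABC", "Phone": ["1111111", "222222"], "Fax": ""}'
doc = json.loads(text)

print(doc["Name"])        # ABC
print(len(doc["Phone"]))  # 2
print(doc["Fax"] == "")   # True
```

Note that "Phone" holds an array as its value, something a single relational column cannot do; this multi-value capability is revisited in the SQL comparison below.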
4. Performance vs. Features
To make MongoDB high-performance and fast, certain features commonly
available in RDBMS systems are not available in MongoDB. MongoDB is a document-oriented
DBMS where data is stored as documents. It does not support JOINs, and it does not have fully
generalized transactions. However, it does provide support for secondary indexes, it enables
users to query using query documents, and it provides support for atomic updates at the per-
document level. It provides replica sets, a form of master-slave replication with
automated failover, and it has built-in horizontal scaling.
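The idea of querying with query documents can be illustrated with a simplified matcher in Python. This is only a sketch of the concept, not MongoDB's query engine: the real engine supports comparison operators, nested fields, and indexes, whereas this toy version (with invented sample data) handles only exact equality on top-level fields.

```python
# A query document lists field/value pairs a document must match exactly.
def matches(doc, query):
    return all(doc.get(field) == value for field, value in query.items())

users = [
    {"Name": "ABC", "City": "Mumbai"},
    {"Name": "XYZ", "City": "Pune"},
]

# Equivalent in spirit to db.users.find({"City": "Mumbai"}) in the mongo shell.
result = [d for d in users if matches(d, {"City": "Mumbai"})]
print(result)  # [{'Name': 'ABC', 'City': 'Mumbai'}]
```

The key point is that the query itself is just another document, so the same data format serves for both storage and querying.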
SQL Comparison
The following are the ways in which MongoDB is different from SQL.
1. MongoDB uses documents for storing its data, which offer a flexible schema (documents in the
same collection can have different fields). This enables users to store nested or multi-
value fields such as arrays, hashes, etc. In contrast, RDBMS systems offer a fixed schema
where a column's values must all be of the same data type. Also, it's not possible to store arrays
or nested values in a cell.
2. MongoDB doesn't provide support for JOIN operations like SQL does. However, it enables the
user to store all relevant data together in a single document, which avoids the need for
JOINs at the periphery; this embedding of related data is its workaround for the lack of JOINs.
3. MongoDB doesn’t provide support for transactions in the same way as SQL. However, it
guarantees atomicity at the document level. Also, it uses an isolation operator to isolate write
operations that affect multiple documents, but it does not provide “all-or-nothing” atomicity
for multi-document write operations.
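The embedding workaround from point 2 can be sketched in Python with an invented order example: what a relational design splits across two joined tables lives in one document, so a single lookup (and, in MongoDB, a single atomic update) touches all the related data.

```python
# Relational style: two tables linked by order_id, combined via a JOIN.
orders = [{"order_id": 1, "customer": "ABC"}]
items = [
    {"order_id": 1, "product": "pen", "qty": 2},
    {"order_id": 1, "product": "book", "qty": 1},
]

# Document style: the related rows are embedded in a single document,
# so no JOIN is needed and the whole order can be updated atomically.
order_doc = {
    "order_id": 1,
    "customer": "ABC",
    "items": [
        {"product": "pen", "qty": 2},
        {"product": "book", "qty": 1},
    ],
}

# Retrieving an order with its items is a single lookup, not a JOIN.
print(len(order_doc["items"]))  # 2
```

The trade-off is that data embedded this way is harder to share across documents, which is why document modeling favors data that is read and written together.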
Source:- https://www.youtube.com/watch?v=EE8ZTQxa0AM
Questions
1. What is Big Data? What are the different sources of Big Data? (Nov-2018)
2. Explain the three Vs of Big Data. (Nov-2018)
3. Compare ACID vs. BASE. (Nov-2018)
4. With the help of a neat diagram, explain the CAP theorem. (Nov-2018)
5. What are the advantages and disadvantages of NoSQL databases? (Nov-2018)
6. What are the different categories of NoSQL database? Explain each with an example.
(Nov-2018)
7. What are the different challenges that Big Data poses? (May-2019)
8. Differentiate between SQL and NoSQL Databases. (May-2019)
9. What is MongoDB Design Philosophy? Explain. (May-2019)
10. Write a short note on Non-Relational Approach. (May-2019)
11. Discuss the various applications of Big Data. (May-2019)
5. MongoDB is a _________ database that provides high performance, high availability, and
easy scalability.
a) Graph
b) key value
c) Document
d) all of the mentioned
9. NoSQL databases are used mainly for handling large volumes of ______________ data.
a) Unstructured
b) Structured
c) semi-structured
d) all of the mentioned
10. MongoDB uses a ____________ lock that allows concurrent read access to a database but
exclusive write access to a single write operation.
a) Readers
b) readers-writer
c) writer
d) none of the mentioned