Research about MongoDB
Spring 2024
CSE 301
Supervisor: M. Ali
ENG: Hassan
Name: Kareem Mohammed Elsayed
ID: 200030812
MONGO-DB :-
Abstract—MongoDB is one of the most popular NoSQL databases. It is a strong tool for building data warehouses, especially because it can fully utilize a so-called “shared-nothing cluster architecture.” It is also open source, which adds to its appeal for building high-performance data warehouses. This paper reviews various aspects of MongoDB and frames some key issues, any of which could be the subject of future research. In this way, the paper opens some areas for research on MongoDB databases.
Components of MongoDB :-
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB is built around the concepts of collections and documents.
Uses of MongoDB
MongoDB has document-oriented storage: data is stored as JSON-style documents (encoded internally as BSON, a binary form of JSON). Any field can be indexed. Typical use cases for MongoDB include:
• Big Data
• Content Management and Delivery
• Mobile and Social Infrastructure
• User Data Management
• Data Hub
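To illustrate schema-less documents and indexing on an arbitrary field, here is a minimal Python sketch under stated assumptions: plain dicts stand in for JSON-style documents, and the hypothetical helpers `build_index` and `find_by` only approximate what a single-field secondary index does. In the MongoDB shell, the comparable steps would be `db.users.createIndex({ city: 1 })` followed by `db.users.find({ city: "Cairo" })`; the sketch does not talk to a real server.

```python
from collections import defaultdict

# Illustrative documents: plain dicts playing the role of JSON-style documents.
# They deliberately do not share one schema (only "_id" is common to all).
users = [
    {"_id": 1, "name": "Ana", "city": "Cairo", "tags": ["admin"]},
    {"_id": 2, "name": "Omar", "city": "Giza"},
    {"_id": 3, "name": "Mona", "city": "Cairo", "age": 29},
]

# Lookup by _id, similar in spirit to MongoDB's default index on _id.
by_id = {doc["_id"]: doc for doc in users}


def build_index(docs, field):
    """Hypothetical helper: map each value of `field` to the _ids holding it.

    In this sketch, documents lacking the field are left out of the index
    (a real, non-sparse MongoDB index would record them under null).
    """
    index = defaultdict(list)
    for doc in docs:
        if field in doc:
            index[doc[field]].append(doc["_id"])
    return index


def find_by(id_lookup, index, value):
    """Hypothetical helper: fetch matching documents via the index, no full scan."""
    return [id_lookup[_id] for _id in index.get(value, [])]


city_index = build_index(users, "city")
print([doc["name"] for doc in find_by(by_id, city_index, "Cairo")])  # ['Ana', 'Mona']
```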
Advantages of MongoDB
1. Schema-less: MongoDB does not require predefined
schemas, allowing flexibility in data storage and easier
scalability compared to relational databases.
2. Document-oriented: MongoDB utilizes documents that
map to native data types in programming languages,
reducing the need for joins and lowering costs.
3. Scalability: MongoDB offers horizontal scalability,
distributing data across a cluster of machines through
sharding, making it suitable for big data applications.
4. Third-party support: MongoDB supports multiple storage
engines and provides APIs for third-party development,
enhancing flexibility and customization.
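5. Aggregation: MongoDB features built-in aggregation capabilities, including an aggregation pipeline and a mapReduce command that runs map-reduce code directly on the database (mapReduce has been deprecated since MongoDB 5.0 in favor of the pipeline). It also provides GridFS, a specification for storing large files by splitting them into chunks, similar in purpose to the Hadoop Distributed File System (HDFS).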
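As a rough illustration of built-in aggregation, the sketch below writes a pipeline in MongoDB's own stage syntax (`$match`, `$group`, `$sum`) as Python data and runs it with a toy interpreter. The interpreter `run_pipeline` and the `orders` data are hypothetical and support only equality matching and `$sum` over a field; against a real server, the same `pipeline` list could be passed to PyMongo's `Collection.aggregate`. Grouping plays the "map" role and summation the "reduce" role.

```python
# Illustrative input documents (hypothetical data).
orders = [
    {"customer": "A", "status": "paid", "amount": 30},
    {"customer": "B", "status": "paid", "amount": 20},
    {"customer": "A", "status": "paid", "amount": 15},
    {"customer": "A", "status": "cancelled", "amount": 50},
]

# Pipeline in MongoDB's stage syntax: keep paid orders, then total per customer.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]


def run_pipeline(docs, stages):
    """Toy interpreter: equality-only $match and $group with {"$sum": "$field"}."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = dict(stage["$group"])
            key_field = spec.pop("_id").lstrip("$")
            groups = {}
            for d in docs:
                key = d.get(key_field)  # "map": emit the grouping key
                out = groups.setdefault(key, {"_id": key, **{name: 0 for name in spec}})
                for name, acc in spec.items():
                    out[name] += d.get(acc["$sum"].lstrip("$"), 0)  # "reduce": sum
            docs = list(groups.values())
        else:
            raise ValueError(f"stage not supported in this sketch: {stage}")
    return docs


print(run_pipeline(orders, pipeline))
# [{'_id': 'A', 'total': 45}, {'_id': 'B', 'total': 20}]
```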
Overall, MongoDB's features make it a versatile and scalable option for various data storage and processing needs, and it integrates with other data processing frameworks such as Hadoop and Spark.
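Point 3 above describes horizontal scaling through sharding. The sketch below shows the core idea of hashed sharding: a stable hash of the shard key decides which shard stores a document, so keys spread evenly across machines. The shard names are hypothetical, MD5 is used only because it is stable and in the standard library, and taking the hash modulo the shard count is a simplification; MongoDB itself divides the hashed key space into chunks and lets a balancer move chunks between shards.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(shard_key: str, shards=SHARDS) -> str:
    """Route a shard-key value to a shard using a stable hash (simplified).

    Python's built-in hash() is salted per process, so it is not used here.
    """
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # first 64 bits as an integer
    return shards[bucket % len(shards)]


# The same key always lands on the same shard, and many keys spread
# roughly evenly across all shards.
placement = Counter(shard_for(f"user-{i}") for i in range(9000))
print(placement)
```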
Disadvantages of MongoDB
1. Continuity: MongoDB's automatic failover strategy promises continuity, but failover is not instantaneous: switching to a new primary (master) node can take up to a minute. This contrasts with databases like Cassandra, whose masterless design lets every node accept writes, giving higher availability.
2. Write limits: Within a replica set, every write must be recorded on the single primary (master) node, so write throughput is limited by what that one node can handle. Sharding relieves this limit by giving each shard its own primary.
3. Data consistency: MongoDB does not enforce referential integrity through foreign-key constraints, which can affect data consistency and integrity.
4. Security: By default, MongoDB does not enable user authentication, which leaves unconfigured databases vulnerable to malicious attacks. Although newer versions block unsecured remote connections by default (binding only to localhost), security remains a concern, especially given past attacks targeting unsecured MongoDB systems.
Why use MongoDB?
• Simple queries
• Functionality applicable to most web applications
• Easy and fast data integration
• No entity-relationship diagram (ERD) required
• Caveat: not well suited to systems with heavy, complex transactions
MongoDB history
MongoDB was created by Dwight Merriman and Eliot Horowitz,
who encountered development and scalability issues with traditional
relational database approaches while building web applications at
DoubleClick, an online advertising company that is now owned by
Google Inc. The name of the database was derived from the word
humongous to represent the idea of supporting large amounts of
data.
Merriman and Horowitz helped form 10gen Inc. in 2007 to
commercialize MongoDB and related software. The company was
renamed MongoDB Inc. in 2013 and went public in October 2017
under the ticker symbol MDB.
The DBMS was released as open source software in 2009 and has
been kept updated since.
Organizations like the insurance company MetLife have used
MongoDB for customer service applications, while other websites
like Craigslist have used it for archiving data. The CERN physics
lab has used it for data aggregation and discovery. Additionally, The
New York Times has used MongoDB to support a form-building
application for photo submissions.
MongoDB: CAP approach
Focus on Consistency and Partition tolerance
• Consistency
  • all replicas contain the same version of the data
• Availability
  • the system remains operational when nodes fail
• Partition tolerance
  • multiple entry points
  • the system remains operational when it is split
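The preference for consistency over availability can be made concrete with replica-set elections: a node can act as primary only while it reaches a strict majority of the voting members, so after a network split at most one side can accept writes. The minimal Python sketch below models only that majority rule; the function names are hypothetical, and priorities, arbiters, and election timing are ignored.

```python
def can_elect_primary(reachable_voters: int, total_voters: int) -> bool:
    """A side of a partition can hold a primary only with a strict majority."""
    return reachable_voters > total_voters // 2


def writable_sides(partition_sizes):
    """Return the partition sides (by size) that could still accept writes."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if can_elect_primary(size, total)]


# 5 voting members split 3 / 2: only the 3-node side keeps a primary.
print(writable_sides([3, 2]))     # [3]
# Split 2 / 2 / 1: no side has a majority, so no writes are accepted anywhere
# (consistency is preserved at the cost of write availability).
print(writable_sides([2, 2, 1]))  # []
# An even-sized set split down the middle also loses its primary.
print(can_elect_primary(2, 4))    # False
```

The last case is also why replica sets are usually deployed with an odd number of voting members.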
MongoDB: Hierarchical Objects
• A MongoDB instance may have zero or more ‘databases’.
• A database may have zero or more ‘collections’.
• A collection may have zero or more ‘documents’.
• A document may have one or more ‘fields’.
• MongoDB ‘indexes’ function much like their RDBMS counterparts.
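The hierarchy above maps naturally onto nested Python containers. In this sketch (database, collection, and field names are hypothetical), an instance is a dict of databases, a database is a dict of collections, a collection is a list of documents, and a document is a dict of fields; the `database.collection` label mirrors how MongoDB names a collection's namespace.

```python
# instance -> databases -> collections -> documents -> fields
instance = {
    "shop": {                                   # database
        "customers": [                          # collection
            {"_id": 1, "name": "Ana"},          # document with two fields
            {"_id": 2, "name": "Omar", "vip": True},
        ],
        "orders": [],                           # a collection with zero documents
    },
}


def describe(inst):
    """List each collection as 'database.collection' with document count and field names."""
    lines = []
    for db_name, collections in inst.items():
        for coll_name, docs in collections.items():
            fields = sorted({field for doc in docs for field in doc})
            lines.append(f"{db_name}.{coll_name}: {len(docs)} documents, fields {fields}")
    return lines


for line in describe(instance):
    print(line)
# shop.customers: 2 documents, fields ['_id', 'name', 'vip']
# shop.orders: 0 documents, fields []
```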
----------------------------------------------------------------------
IOT DB :-
An Internet of Things (IoT) database is an updatable, queryable
dataset of data points gathered from a wide range of sources. IoT
data sources could include analog and digital sensors, industrial
control systems, wearables, etc.
IoT sensors and other devices usually output massive quantities of time-series data continuously and tend to be spread across both digital and physical areas. For these reasons, the best database for IoT applications must collect data in real time and store that big data so that it remains usable and searchable within an IoT database architecture.
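Because sensors emit continuous time-series data, one common way to keep it usable and searchable is to group readings into fixed time buckets per sensor and store a summary for each bucket. The Python sketch below assumes readings arrive as hypothetical `(sensor_id, unix_seconds, value)` tuples and uses one-hour buckets; it illustrates the idea rather than any particular database's storage format.

```python
from collections import defaultdict


def bucket_readings(readings, window_s=3600):
    """Group (sensor_id, ts, value) tuples into per-sensor time windows and summarize them."""
    buckets = defaultdict(list)
    for sensor_id, ts, value in readings:
        window_start = ts - ts % window_s  # align the timestamp to the start of its window
        buckets[(sensor_id, window_start)].append(value)
    return {
        key: {"count": len(vals), "min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for key, vals in buckets.items()
    }


readings = [
    ("temp-1", 0, 20.0),
    ("temp-1", 1800, 22.0),
    ("temp-1", 3600, 25.0),   # falls into the next one-hour window
    ("hum-7", 60, 40.0),
]
summary = bucket_readings(readings)
print(summary[("temp-1", 0)])  # {'count': 2, 'min': 20.0, 'max': 22.0, 'avg': 21.0}
```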
Characteristics of IoT Databases :-
IoT applications connect to IoT devices that number in the millions
or billions and generate real-time data in many formats. These IoT
devices connect to data centers or the cloud and are distributed
across ‘edge’ locations. Running these real-time IoT applications at
scale with high speed and low-latency connectivity demands real-
time data solutions with scalability, fault tolerance, flexible data
modeling, and cloud/multi-region availability.
Database Requirements for IoT :-
Varied IoT database schema. Data from IoT sensors can take many forms. A growing IoT ecosystem demands a database that can accommodate different data schemas easily and tier data automatically.
Scalability. IoT devices generate data in massive quantities. To
avoid performance issues and downtime, an IoT database must scale
simply and automatically.
Data support. To reduce disk space and optimize data queries, support for both operational and time-series data is essential.
Flexible deployment. IoT databases should have on-premises,
cloud/edge, and data center capabilities as well as multi-cloud and
platform flexibility to manage different platforms, tools, and
approaches.
Real-time analytics readiness. Data must be kept accessible, searchable, and functional for real-time analytics.
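The varied-schema requirement can be handled by mapping each device's payload onto a small common envelope while keeping every device-specific field. The key names in `DEVICE_KEYS` and `TIME_KEYS` below are hypothetical examples of how different vendors might label the same information; a real deployment would derive them from its own device catalogue.

```python
DEVICE_KEYS = ("device_id", "id", "mac")   # hypothetical aliases for the device identifier
TIME_KEYS = ("ts", "timestamp", "time")    # hypothetical aliases for the timestamp


def first_present(payload, keys):
    """Return (key, value) for the first alias present in the payload."""
    for key in keys:
        if key in payload:
            return key, payload[key]
    raise KeyError(f"none of {keys} found in payload")


def to_envelope(payload):
    """Normalize one device payload; all remaining fields are kept as metrics."""
    dev_key, device = first_present(payload, DEVICE_KEYS)
    ts_key, ts = first_present(payload, TIME_KEYS)
    metrics = {k: v for k, v in payload.items() if k not in (dev_key, ts_key)}
    return {"device": device, "ts": ts, "metrics": metrics}


# Two devices with different schemas end up in the same envelope shape.
print(to_envelope({"device_id": "t1", "ts": 0, "temp_c": 21.5}))
print(to_envelope({"mac": "aa:bb", "timestamp": 5, "hr": 72, "spo2": 98}))
```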
There are several types of IoT databases. Managed IoT database
design can be hot or cold—or streaming or static—depending on the
designer’s goals for the application.
Hot databases. “Hot” or streaming databases are a kind of IoT
database that is used to store data that is queried, updated, or
accessed often. These provide read and write access capabilities at
the lowest cost with little latency, so they are often a good choice for
simply storing data. Performance-oriented NoSQL databases are
commonly used for this use case.
To manage the load and scale requirements of real-time data
collection, hot databases are generally distributed. The features most
associated with hot databases are flexible data formats, messaging
and queueing capability, querying abilities, and tiered memory
models.
Cold databases. “Cold” databases, also called static or batch databases, store and manage data that changes very little after it is written, keeping it at rest in its original state. A common use case for this kind of database is storing specific access-management metadata for sensitive records. This kind of data is often managed by a database management system (DBMS) that typically, though not always, uses SQL.
Organizations that use streaming systems can still benefit from including a static database component in their NoSQL database system. In this way, they can create a larger, unified database that has both streaming and static capabilities.
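The split between hot and cold storage is often driven by data age. The sketch below assumes each record carries a Unix timestamp `ts` and uses a hypothetical `hot_window_s` threshold to route recent records to a hot tier and older ones to a cold tier; real systems might run such a policy periodically or rely on a database's own expiry or tiering features instead.

```python
def tier_records(records, now, hot_window_s):
    """Split records into (hot, cold) lists by age relative to `now`."""
    hot, cold = [], []
    for rec in records:
        age = now - rec["ts"]
        (hot if age <= hot_window_s else cold).append(rec)
    return hot, cold


records = [
    {"sensor": "s1", "ts": 9_000, "value": 1.2},   # 1000 s old
    {"sensor": "s1", "ts": 6_400, "value": 1.1},   # exactly one hour old
    {"sensor": "s2", "ts": 5_000, "value": 0.9},   # older than one hour
]
hot, cold = tier_records(records, now=10_000, hot_window_s=3_600)
print([r["ts"] for r in hot], [r["ts"] for r in cold])  # [9000, 6400] [5000]
```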
IoT graph databases. IoT graph databases preserve and leverage the relationships between data generated by real-time IoT devices, so those relationships can be queried in real time with good performance. Popular use cases for IoT graph databases include fraud detection, 24-hour customer service, knowledge graphs, network management, personalization, and other areas. Next-generation graph databases use artificial intelligence and machine learning for identity and access management, recommendation engines, supply chain management, scientific research, and more.
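Network management, one of the use cases listed above, shows why stored relationships help: if device links are kept as a graph, finding every device affected by a failed gateway is a simple traversal. The topology below is hypothetical, and a plain adjacency dict with breadth-first search stands in for a graph database's query engine.

```python
from collections import deque

# Hypothetical topology: each node maps to the nodes it serves.
topology = {
    "gateway-1": ["router-a", "router-b"],
    "router-a": ["sensor-1", "sensor-2"],
    "router-b": ["sensor-3"],
    "gateway-2": ["sensor-4"],
}


def downstream(graph, start):
    """Breadth-first search: every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream(topology, "gateway-1")))
# ['router-a', 'router-b', 'sensor-1', 'sensor-2', 'sensor-3']
```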
Why is an Effective Database for IoT Important?
IoT database systems store the data transmitted from different IoT devices and form an essential component of an IoT network. They help integrate data in real time across a wide range of IoT data sources.
IoT database infrastructures process a vast amount of data in real-
time and assign meaningful context tags. The system can route
tagged data within an IoT infrastructure using MQTT, HTTP, or
CoAP, so the central application can use the data generated by
smart devices and sensors.
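The tagging and routing step can be sketched with MQTT-style topic filters. In MQTT, topic levels are separated by `/`, the `+` wildcard matches exactly one level, and `#` matches all remaining levels (including none) and must come last. The `ROUTES` table and tag names below are hypothetical, the matcher ignores the special handling of topics that begin with `$`, and HTTP or CoAP routing is not shown.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Check an MQTT topic against a filter using '+' and '#' wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard: matches this level and everything below
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level does not match
    return len(f_levels) == len(t_levels)


# Hypothetical routing table: topic filter -> context tag.
ROUTES = {
    "factory/+/temperature": "thermal",
    "factory/#": "factory",
    "city/traffic/#": "traffic",
}


def tag_message(topic):
    """Return the sorted context tags of every filter that matches the topic."""
    return sorted(tag for flt, tag in ROUTES.items() if topic_matches(flt, topic))


print(tag_message("factory/line1/temperature"))  # ['factory', 'thermal']
print(tag_message("city/traffic/cam3/count"))    # ['traffic']
```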
Databases for IoT applications enable more efficient data analysis,
improved security functionalities, and more agile data storage. The
challenges of IoT include an expensive app development life cycle
and the complexity of protecting large volumes of data against
potential threats.
Smart Cities: IoT Data can be used to monitor and manage urban
infrastructure, such as traffic, waste management, and energy
consumption.
Industrial IoT: IoT Data enables predictive maintenance, real-time
monitoring of machinery, and optimization of manufacturing
processes.
Healthcare: IoT Data from wearable devices and medical sensors
can help in remote patient monitoring, personalized treatments, and
preventive healthcare.
Retail: IoT Data can be used to track inventory, analyze customer
behavior, and improve supply chain management.
Agriculture: IoT Data helps monitor soil moisture, temperature,
and crop health for precision farming and efficient resource
management.
Why IoT Data is Important
IoT Data provides businesses with valuable insights that can be used
to optimize operations, improve efficiency, and drive innovation. By
collecting and analyzing IoT Data, organizations can gain a deeper
understanding of their processes, monitor equipment performance,
detect anomalies, and make data-driven decisions.
Why Dremio Users Would Be Interested in IoT Data?
Dremio is a powerful data lakehouse platform that enables users to
optimize, update, and migrate their data infrastructure. For Dremio
users, IoT Data presents a valuable opportunity to incorporate real-
time and sensor-generated data into their analytics workflows.
Dremio's advanced capabilities in data ingestion, integration, and
processing make it well-suited for handling the large volumes and
diverse data formats typical of IoT Data. With Dremio, users can
easily connect to IoT data sources, perform complex data
transformations, and leverage machine learning algorithms to
extract insights from IoT Data.
Technologies Related to IoT Data :-
There are several technologies and terms closely related to IoT
Data:
Big Data: IoT Data often contributes to big data sets, which require
advanced storage and processing technologies.
Data Analytics: Analyzing IoT Data allows businesses to extract
meaningful insights and patterns to support decision-making.
Cloud Computing: IoT Data is often stored and processed in the cloud
due to its scalability and accessibility.
Edge Computing: Edge devices located close to IoT sensors can
perform real-time data processing and reduce latency.
Artificial Intelligence: AI techniques can be applied to IoT Data to
enable predictive analytics, anomaly detection, and automated
decision-making.
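Edge computing and simple anomaly detection combine naturally: an edge device can summarize a window of readings locally and forward only the summary plus any outliers. The sketch below uses a z-score rule with a hypothetical threshold of 3; it is one basic technique, not a description of any particular product.

```python
from statistics import mean, pstdev


def summarize_window(values, z_threshold=3.0):
    """Summarize one window of readings and flag values far from the window mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the window
    anomalies = [v for v in values if sigma > 0 and abs(v - mu) / sigma > z_threshold]
    return {"count": len(values), "mean": mu, "anomalies": anomalies}


window = [20.0] * 19 + [80.0]  # one spike in 20 readings
print(summarize_window(window))  # {'count': 20, 'mean': 23.0, 'anomalies': [80.0]}
```

With the population standard deviation, a single outlier in a window of n readings can reach a z-score of at most √(n − 1), so with a threshold of 3 a window needs more than 10 readings before one spike can be flagged.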
Internet of Things Use Cases :-
According to 451 Research, 65% of companies are using IoT. 69% of
organizations gather data from end points and 94% of those companies
use it for business purposes. The highest usage is among Utilities (92%)
and Manufacturing (77%).
The IoT data comes from:
Datacenter IT Equipment (51%)
Cameras and Surveillance Equipment (34%)
Smartphones and End Users (30%)
Buildings and Other Structures (21%)
Environmental Sensors (15%)
Factory Equipment (14%)
Automobiles/Fleet Equipment (11%)
Retail Operations (8%)
Medical Devices (7%)
Conclusion & Use Cases
IoT is a major trend today. As we can see in the chart, it has a tremendous economic impact across industries.
In conclusion, IoT databases play a crucial role in managing the massive volumes of data generated by IoT devices.
The ever-growing amount of data to be collected and managed on IoT edge devices poses challenges that IoT engineers and database management system vendors must continually research and address. The amount of data collected by IoT, and the methods of collecting it, seem to be growing almost as fast as the number of new systems. While some requirements remain constant, such as the need for low resource consumption, data processing demands only grow. Edge database management and analytics must be small and fast, yet powerful enough to enhance device functionality. Highly configurable database management therefore becomes key: some devices lack the resources to run anything beyond a simple data collection task, while others are more capable. Intermittent and variable connectivity must be considered and planned for, while authorized access should be seamless.