Unit 4: NoSQL

NoSQL databases are designed to manage large amounts of unstructured or semi-structured data, offering high scalability and performance without a predefined schema. The CAP theorem states that in distributed systems, one can only guarantee two of the three properties: Consistency, Availability, and Partition Tolerance, leading to trade-offs in system design. Aggregate data models in NoSQL group related data into collections, simplifying data access and improving performance, while graph databases like Neo4j focus on complex relationships between data.

Uploaded by

Dhanya Sonu

What is NOSQL? Explain the CAP theorem.

NoSQL refers to a category of database management systems that do not use the
traditional relational model. Instead, NoSQL databases are designed to handle
large amounts of distributed, unstructured, or semi-structured data, often at a
scale that exceeds the capabilities of relational databases. NoSQL databases are
particularly suited for applications requiring high scalability, flexibility, and fast
performance.

Characteristics of NoSQL Databases

1. Schema-less: No predefined schema; data can be stored in various formats.
2. Horizontal Scalability: Easily scales across distributed systems. Capacity and
performance grow by adding more machines (nodes) to the system rather than
by upgrading the existing hardware; this is also known as scale-out.
3. Data Models: Includes key-value, document, column-family, and graph
databases.
4. High Performance: Optimized for fast reads and writes.

CAP Theorem
The CAP theorem, proposed by Eric Brewer, states that in a
distributed data system it is impossible to simultaneously guarantee all three
of the following properties:
1. Consistency (C): Every read receives the most recent write or an error.
2. Availability (A): Every request receives a (non-error) response, without a
guarantee that it contains the most recent data.
3. Partition Tolerance (P): The system continues to operate despite network
partitions.
 At most two of the three properties can be attained; the third is
always compromised.
 The system requirements should define which two properties are
chosen over the rest.
Trade-offs
According to the theorem:
 CP Systems: Prioritize Consistency and Partition Tolerance. Availability
might be sacrificed during partition failures.
o Example: MongoDB, HBase.
o Banking systems, where consistent data is critical.

 AP Systems: Prioritize Availability and Partition Tolerance. Consistency
may be eventual rather than immediate.
o Example: Cassandra, DynamoDB.
o Social media applications, where availability is more critical than
immediate consistency.
 CA Systems: Not feasible in real-world distributed systems, since Partition
Tolerance must be present. Traditional databases like MySQL or PostgreSQL
in single-node setups are effectively CA systems: they ensure consistency and
availability but do not tolerate network partitions.
o Use case: Internal systems like payroll or accounting, where there is no
distributed setup.

The CAP theorem underscores the necessity of trade-offs when designing
distributed databases.
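The trade-off can be illustrated with a toy two-replica store (an illustrative sketch; the classes and method names are hypothetical, not any real database client): in CP mode a partitioned replica refuses requests rather than serve possibly stale data, while in AP mode it keeps answering with whatever it has.

```python
class Replica:
    def __init__(self):
        self.value = None

class TinyStore:
    """Toy two-replica store illustrating the CP vs AP choice (hypothetical)."""
    def __init__(self, mode):
        self.mode = mode                # "CP" or "AP"
        self.a, self.b = Replica(), Replica()
        self.partitioned = False        # True: writes cannot reach replica b

    def write(self, value):
        self.a.value = value
        if not self.partitioned:
            self.b.value = value        # replication succeeds only without a partition

    def read_from_b(self):
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data
            raise RuntimeError("unavailable during partition")
        return self.b.value             # AP: answer even if the value may be stale

cp = TinyStore("CP"); cp.write("v1"); cp.partitioned = True; cp.write("v2")
ap = TinyStore("AP"); ap.write("v1"); ap.partitioned = True; ap.write("v2")
# cp.read_from_b() now raises; ap.read_from_b() returns the stale "v1"
```

During the partition, the CP store sacrifices availability (it errors) while the AP store sacrifices consistency (it returns "v1" even though "v2" was written).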

Comparison of Relational Databases (RDBMS) and NoSQL Stores

Data Model
  RDBMS: Table-based with predefined schemas.
  NoSQL: Flexible data models: key-value, document, column-family, or graph.
Schema
  RDBMS: Rigid schema; data must adhere to a predefined structure.
  NoSQL: Schema-less or dynamic schema.
Scalability
  RDBMS: Vertically scalable (requires more powerful hardware).
  NoSQL: Horizontally scalable (distributed across nodes).
Query Language
  RDBMS: SQL (Structured Query Language) with standardized syntax.
  NoSQL: Varies by database; custom APIs or query languages (e.g., CQL for Cassandra, JSON-based queries for MongoDB).
Performance
  RDBMS: Optimized for complex queries and transactions.
  NoSQL: High-speed reads/writes for massive datasets; trade-offs on complex queries.
Transaction Support
  RDBMS: Strong ACID compliance (Atomicity, Consistency, Isolation, Durability).
  NoSQL: Often follows BASE (Basically Available, Soft state, Eventual consistency).
Use Cases
  RDBMS: Best for structured data, complex queries, and analytics (e.g., financial systems).
  NoSQL: Ideal for unstructured or semi-structured data and scalability needs (e.g., IoT, real-time applications).
Data Relationships
  RDBMS: Strong support for complex relationships using joins.
  NoSQL: Limited or no joins; designed for denormalized data.
Scaling
  RDBMS: Scale-up by upgrading hardware.
  NoSQL: Scale-out by adding more servers to a cluster.
Examples
  RDBMS: MySQL, PostgreSQL, Oracle DB, Microsoft SQL Server.
  NoSQL: MongoDB, Cassandra, Redis, DynamoDB, Neo4j.
Consistency Model
  RDBMS: Strong consistency.
  NoSQL: Can vary: eventual consistency, strong consistency, or configurable.
1. What is NOSQL? Explain briefly about aggregate data models with a neat
diagram considering example of relations and aggregates.
NoSQL databases are a type of database designed to provide high performance,
scalability, and flexibility for managing large amounts of unstructured, semi-structured, or
structured data. Unlike traditional relational databases, NoSQL databases do not use a tabular
schema (rows and columns) and often trade strict consistency for other advantages like speed
and scalability.
Aggregate Data Models in NoSQL
Aggregate data models group data into collections or aggregates that can be
treated as a single unit of data. This contrasts with relational databases, where data
is spread across multiple tables and relationships. Aggregate models are
particularly useful for distributed systems because they allow related data to be
stored and retrieved together, reducing the complexity of queries and increasing
efficiency.
Types of Aggregate Models
1. Key-Value Stores:
o Data is stored as key-value pairs.
o Example: Redis, DynamoDB.
o Aggregate: A single key-value pair.
2. Document Stores:
o Data is stored in a document format (e.g., JSON, BSON).
o Example: MongoDB, CouchDB.
o Aggregate: A document containing nested structures or arrays.
3. Column-Family Stores:
o Data is organized into rows and columns but with flexible schemas.
o Example: Cassandra, HBase.
o Aggregate: A row with a set of columns.
4. Graph Databases:
o Data is stored as nodes, edges, and properties.
o Example: Neo4j.
o Aggregate: A subgraph representing a set of related entities.

Example: Relations vs. Aggregates


Consider an e-commerce system where we need to model Orders and their associated
Products:
Relational Model
In a relational database, the data is normalized into separate tables:
1. Orders Table: Contains order details (OrderID, CustomerID, OrderDate).
2. Products Table: Contains product details (ProductID, Name, Price).
3. Order_Product Table: Represents the many-to-many relationship between
Orders and Products.
Query: To retrieve an order with its products, multiple joins are required.

Aggregate Model
In a NoSQL document store (e.g., MongoDB), the same data can be
represented as:
{
  "OrderID": "12345",
  "CustomerID": "789",
  "OrderDate": "2024-12-12",
  "Products": [
    { "ProductID": "001", "Name": "Laptop", "Price": 1000 },
    { "ProductID": "002", "Name": "Mouse", "Price": 50 }
  ]
}
Here, all related data (Order and Products) are grouped into a single
aggregate document.
Advantages of Aggregate Models
 Simplified data access (no joins).
 Efficient storage and retrieval for distributed systems.
 Flexibility to store nested and hierarchical data.
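The difference in access patterns can be sketched in Python, with plain dictionaries standing in for the relational tables and the aggregate store (an illustrative sketch; the names are made up for this example): the normalized form needs join-like assembly across three lookups, while the aggregate form is a single lookup.

```python
# Normalized "tables": reading an order means assembling rows from three places.
orders = {"12345": {"CustomerID": "789", "OrderDate": "2024-12-12"}}
products = {"001": {"Name": "Laptop", "Price": 1000},
            "002": {"Name": "Mouse", "Price": 50}}
order_product = {"12345": ["001", "002"]}

def order_with_products(order_id):
    """Join-like assembly: order row + link table + product rows."""
    row = dict(orders[order_id])
    row["Products"] = [products[pid] for pid in order_product[order_id]]
    return row

# Aggregate store: the same data is one self-contained document, one lookup.
aggregates = {"12345": {"CustomerID": "789", "OrderDate": "2024-12-12",
                        "Products": [{"Name": "Laptop", "Price": 1000},
                                     {"Name": "Mouse", "Price": 50}]}}
```

Both paths yield the same product list, but the aggregate version retrieves it without any cross-table assembly, which is what makes aggregates easy to distribute.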

Purpose
  Relations: Establish associations between entities.
  Aggregates: Group related entities into a single unit of consistency.
Consistency Boundaries
  Relations: No specific consistency boundary.
  Aggregates: Defined consistency boundary (the aggregate root).
Management
  Relations: Managed through foreign keys, joins, etc.
  Aggregates: Managed through business logic and the aggregate root.
Entity Lifecycle
  Relations: Entities can be independent.
  Aggregates: Entities are managed through the aggregate root.
Examples
  Relations: Foreign-key relations between entities.
  Aggregates: A ShoppingCart containing CartItem entities.

2. What is NOSQL Graph database? Explain Neo4j.

A NoSQL Graph Database is a type of NoSQL database designed to handle and
represent data in the form of graphs. In this database model, data is stored as nodes
(entities) and edges (relationships), with properties attached to both. It is particularly suited
for scenarios where relationships between data are complex and require traversal for
analysis.

Key Features:
 Nodes: Represent entities (e.g., people, places, things).
 Edges: Represent relationships between nodes (e.g., "friends with", "likes").
 Properties: Key-value pairs associated with nodes and edges to store metadata or
attributes.
 Traversal: Enables efficient querying of relationships, such as shortest paths or network
connections.
 Schema-less: Flexible data models allow for dynamic and evolving schemas.

Graph databases are ideal for applications such as social networks, recommendation engines,
fraud detection, and network management, where relationships are as important as the
entities themselves.

Neo4j: A Leading NoSQL Graph Database


Neo4j is a prominent open-source graph database management system. It is designed to represent and
analyze highly interconnected data. Neo4j uses a property graph model, which consists of nodes,
edges, and properties.
Key Characteristics:
1. Property Graph Model:
o Nodes represent entities and can have labels (e.g., User, Product).
o Relationships (edges) connect nodes and can have types (e.g., FRIENDS_WITH,
PURCHASED).
o Properties are key-value pairs that add metadata to both nodes and relationships.
2. Query Language - Cypher: Neo4j uses Cypher, a declarative query language specifically
designed for graph data. It provides an intuitive syntax for querying and manipulating graph
data.
Example of a Cypher query, given the graph
(Alice)-[:FRIENDS_WITH]->(Bob)-[:FRIENDS_WITH]->(Charlie):
MATCH (a:Person)-[:FRIENDS_WITH]->(b:Person)
WHERE a.name = 'Alice'
RETURN b.name;
This query retrieves the names of Alice's friends.
3. High Performance for Relationship Queries: Neo4j excels at complex queries involving
relationships because it uses index-free adjacency, meaning nodes store direct references to
their adjacent nodes.
4. ACID Compliance: Ensures data consistency, reliability, and transaction management.
5. Scalability: Neo4j supports horizontal scaling with features like sharding and clustering,
making it suitable for enterprise-level applications.
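The index-free adjacency idea from point 3 can be sketched in plain Python, with each node holding direct references to its neighbours so traversal follows stored pointers rather than index lookups (a toy illustration of the concept, not Neo4j's actual storage engine):

```python
class Node:
    """Graph node storing direct references to its neighbours (index-free adjacency)."""
    def __init__(self, name):
        self.name = name
        self.friends = []               # outgoing FRIENDS_WITH edges

    def friends_with(self, other):
        self.friends.append(other)

alice, bob, charlie = Node("Alice"), Node("Bob"), Node("Charlie")
alice.friends_with(bob)
bob.friends_with(charlie)

# Each traversal hop follows a stored pointer; no index lookup or join per hop.
friends = [f.name for f in alice.friends]                                 # ["Bob"]
friends_of_friends = [g.name for f in alice.friends for g in f.friends]   # ["Charlie"]
```

Because each hop is a pointer dereference, the cost of traversal depends on the number of edges visited, not on the total size of the graph, which is why relationship-heavy queries stay fast.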

Applications:
 Social Networks: Modelling users, connections, and interactions.
 Recommendation Engines: Suggesting products, content, or friends based on relationships.
 Fraud Detection: Identifying unusual patterns in financial transactions or networks.
 Knowledge Graphs: Organizing and connecting domain-specific knowledge.
Neo4j has become one of the most widely used graph databases due to its robust features, ease
of use, and extensive ecosystem of tools and integrations.

3. Briefly describe the value of Relational databases.


Value of Relational Databases
Relational databases are a cornerstone of modern data management, offering robust
and reliable systems for structured data. Here are the key values they bring:
1. Structured and Organized Data
 Data is stored in tables with predefined schemas, ensuring consistency.
 Rows represent records, and columns represent attributes, making data easy to
organize and query.
2. Data Integrity
 Support for ACID properties (Atomicity, Consistency, Isolation, Durability)
ensures data reliability and consistency, even during system failures.
 Constraints (e.g., primary keys, foreign keys, and unique constraints) enforce data rules.
3. SQL Query Language
 SQL (Structured Query Language) provides a powerful, standardized way to retrieve,
manipulate, and analyze data.
 Rich functionality for joining, filtering, grouping, and aggregating data.
4. Relationships Between Data
 Use foreign keys to establish and enforce relationships between tables, enabling
complex queries and data joins.
5. Scalability and Performance
 Suitable for medium to large-scale applications with consistent and predictable
workloads.
 Modern relational databases support horizontal and vertical scaling options to handle
growing data volumes.
6. Data Security
 Role-based access control (RBAC) and permission settings ensure data protection and
restrict unauthorized access.
 Support for encryption at rest and in transit further enhances security.
7. Proven Reliability
 Decades of development and optimization make relational databases highly reliable
and widely adopted across industries.
 Major systems like Oracle, MySQL, PostgreSQL, and Microsoft SQL Server are
trusted for mission-critical applications.
8. Ecosystem and Tooling
 A rich ecosystem of tools and integrations for database management, reporting, and
analytics.
 Strong support for backups, recovery, and monitoring.
9. Versatility
 Suitable for various applications, including finance, healthcare, e-commerce, and
more.
 Can handle structured data efficiently, making them ideal for transactional systems and
analytical reporting.
In summary, relational databases are valuable for their structured data management,
data integrity, query capabilities, and reliability, making them a go-to choice for
countless organizations.

4. Write short notes on

a. Consequences of Aggregate Orientation


Aggregate orientation is a design principle used in certain NoSQL databases (e.g.,
document or key-value stores) where data is grouped into self-contained units called
aggregates. An aggregate is a collection of related data that is treated as a single unit
for operations such as reading, writing, or updating.

Consequences of Aggregate Orientation:


1. Improved Performance for Specific Queries
o Aggregates allow related data to be stored together, reducing the need for joins and
improving query performance for operations within the aggregate.
2. Data Denormalization
o Aggregates often involve denormalized data, meaning the same data might be repeated
in multiple places. While this simplifies reads, it can complicate updates as changes
must be propagated to multiple locations.
3. Simplicity in Data Retrieval
o Aggregates simplify data retrieval by grouping all related information into a single
unit. This aligns well with common application use cases and reduces the complexity
of queries.
4. Limited Flexibility for Ad Hoc Queries
o Since data is grouped into aggregates, querying across multiple
aggregates can be inefficient and may require additional logic in the
application layer.
5. Scalability and Partitioning
o Aggregates serve as natural boundaries for data partitioning and
sharding in distributed systems, enabling efficient horizontal scaling.
6. Challenges with Relationships
o Aggregate orientation works well for hierarchical or nested data but
struggles with highly relational data where many-to-many
relationships are common. In such cases, managing relationships
across aggregates can become cumbersome.
7. Atomicity within Aggregates
o Operations are atomic at the aggregate level, ensuring consistency
within each aggregate. However, maintaining atomicity across
multiple aggregates is challenging and often requires additional
mechanisms.
8. Suitability for Domain-Driven Design
o Aggregate orientation aligns with domain-driven design (DDD)
principles, where each aggregate represents a distinct domain entity or
bounded context.

b. Key valued data model


The Key-Value Data Model is a foundational type of NoSQL database
that stores data as a collection of key-value pairs. It is a simple and
efficient model that is well-suited for scenarios requiring high
performance, scalability, and flexibility.
Structure:
 Key: A unique identifier used to retrieve or store a value. Typically a string,
number, or other unique identifier.
 Value: The associated data, which can be of any format (e.g., string, JSON,
binary, etc.).
Features:
1. Simplicity:
o The data model is easy to understand and implement, focusing solely
on storing and retrieving values by their keys.
2. High Performance:
o Direct access to data using keys ensures extremely fast reads and
writes.
3. Schema-less:
o Values can have arbitrary structures, offering flexibility in how data is
stored and updated.
4. Scalability:
o Key-value stores are designed for horizontal scaling, making them
suitable for large-scale, distributed systems.
Advantages:
 Speed: Ideal for use cases requiring low-latency access to data.
 Flexibility: No predefined schema allows for storing diverse types of data.
 Ease of Use: Simple operations like GET and PUT make it easy to work
with.
Disadvantages:
 Limited Query Capabilities:
o Queries are restricted to key-based lookups. Filtering or searching
within the value is not natively supported.
 No Relationships:
o It doesn't support relationships or joins, making it unsuitable for
complex relational data.
Use Cases:
1. Session Management:
o Storing session data for users in web applications (e.g., session
tokens).
2. Caching:
o Used as a high-performance cache for frequently accessed data (e.g.,
Redis, Memcached).
3. Shopping Carts:
o Storing cart details for e-commerce applications, where each cart is
uniquely identified by a key.
4. Real-Time Applications:
o Applications requiring quick access to data, such as leaderboards or
IoT devices.
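The model's simple GET/PUT/DELETE interface can be sketched with a minimal in-memory store (an illustrative class, not a real key-value database client):

```python
class KeyValueStore:
    """Minimal in-memory key-value store (illustrative sketch)."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value         # value may be any structure (schema-less)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

# Session-management use case: the session token is the key, the value is opaque.
sessions = KeyValueStore()
sessions.put("session:789", {"user": "alice", "cart": ["001", "002"]})
```

Note that the store never inspects the value; all access goes through the key, which is exactly why key-based lookups are fast and value-based queries are not natively supported.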

c. Document data model


The Document Data Model is a type of NoSQL database model where
data is stored as documents, typically in a structured format like JSON,
BSON, or XML. It provides flexibility in representing hierarchical and
semi-structured data, making it ideal for modern applications.
Structure:
 Document: A self-contained unit of data that contains fields (key-value
pairs) and their associated values.
 Collection: A group of related documents, similar to a table in relational
databases.
Key Features:
1. Schema-less:
o Documents can have different structures, allowing for flexibility and
ease of updates without strict schema enforcement.
2. Rich Data Representation:
o Documents can store complex, hierarchical data, such as nested arrays
and objects.
3. Indexing and Querying:
o Fields within documents can be indexed, enabling efficient querying
of specific attributes.
4. Atomicity:
o Operations are typically atomic at the document level, ensuring
consistency during updates.
Advantages:
 Flexibility:
o Documents can adapt to evolving application requirements without
requiring schema modifications.
 Ease of Use:
o JSON-like formats align closely with modern programming
paradigms, simplifying integration.
 Scalability:
o Designed for horizontal scaling, supporting distributed systems with
sharding and replication.
 Efficient Data Retrieval:
o Hierarchical data can be stored and retrieved in a single query,
reducing the need for joins.
Disadvantages:
 Inconsistent Structure:
o The absence of a strict schema can lead to inconsistencies in
document design.
 Complex Relationships:
o Managing relationships between documents (e.g., joins) requires
additional effort or denormalized designs.
 Storage Overhead:
o Denormalized data may lead to duplication and increased storage
requirements.
Use Cases:
1. Content Management Systems:
o Storing articles, blogs, and metadata with varying attributes.
2. E-commerce:
o Product catalogs with nested attributes like categories, specifications,
and reviews.
3. IoT Applications:
o Capturing real-time sensor data in structured and semi-structured
formats.
4. Personalization Engines:
o Storing user profiles, preferences, and behavioural data.
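A toy Python sketch of a document collection shows the key contrast with key-value stores: documents can be queried by their fields, not just by a key (the Collection class and its find syntax are invented for illustration and are far simpler than a real document database's query language):

```python
class Collection:
    """Toy document collection: schema-less docs queried by field values."""
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        self.docs.append(doc)           # each doc may have a different shape

    def find(self, **criteria):
        """Return docs whose top-level fields match all given criteria."""
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

catalog = Collection()
catalog.insert({"sku": "001", "name": "Laptop", "category": "electronics",
                "specs": {"ram_gb": 16}})   # nested structure, extra field
catalog.insert({"sku": "002", "name": "Mouse", "category": "electronics"})
catalog.insert({"sku": "003", "name": "Desk", "category": "furniture"})

electronics = catalog.find(category="electronics")
```

The three documents have different shapes (only one has a nested "specs" object), yet all live in the same collection and remain queryable by shared fields.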

d. Column family stores.

Column Family Stores are a type of NoSQL database that organizes


data into column families, where data is stored and retrieved by rows and
columns. Unlike relational databases, which use a fixed schema, column
family stores allow dynamic column definitions, providing flexibility for
large-scale, distributed data management.
Structure:
 Rows: Each row is identified by a unique key, and it contains multiple
columns grouped into column families.
 Columns: Each column is part of a column family and stores a key-value
pair.
 Column Families: Groups of related columns that are logically organized
and stored together.
Key Features:
1. Flexible Schema:
o Each row can have different columns, allowing schema flexibility.
2. Efficient Reads and Writes:
o Column families are optimized for fast data access, especially for
wide tables with many columns.
3. Wide-Column Data:
o Ideal for use cases where rows contain large numbers of columns with
sparse or variable data.
4. Horizontal Scalability:
o Supports distributed storage with automatic sharding and replication
for handling large datasets.
Advantages:
 High Scalability:
o Designed to handle massive datasets and scale horizontally across
clusters.
 Sparse Data Optimization:
o Stores only non-empty columns, optimizing storage for sparsely
populated datasets.
 Custom Query Patterns:
o Column families allow for designing data layouts to match specific
query patterns, improving performance.
Disadvantages:
 Complexity:
o Querying data is less intuitive than SQL-based relational databases
and often requires custom query languages.
 Limited Relationship Support:
o Not suitable for applications requiring complex relationships between
data entities.
 Schema Evolution Challenges:
o While flexible, poorly designed column families can lead to
inefficiencies.
Use Cases:
1. Time-Series Data:
o Storing logs, metrics, or sensor data where rows represent timestamps
and columns store related readings.
2. Data Warehousing:
o Aggregating and querying large datasets with optimized columnar
storage.
3. Recommendation Systems:
o Organizing user preferences and behaviour in wide tables for quick
lookups.
4. Social Media Analytics:
o Managing relationships, activity feeds, and user data.
Examples of Column Family Databases:
 Apache Cassandra:
o Distributed database known for high availability and scalability.
 HBase:
o A Hadoop-based database optimized for large-scale, real-time
applications.
 ScyllaDB:
o A high-performance, Cassandra-compatible column store.
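The row/column-family structure can be sketched with nested Python dictionaries (a conceptual illustration of the layout, not how Cassandra or HBase actually store data on disk); note that absent columns are simply not stored, which is the sparse-data optimization described above:

```python
# Rows keyed by row key; each column family holds a sparse set of columns.
table = {
    "sensor:42:2024-12-12T10:00": {
        "readings": {"temp_c": 21.5, "humidity": 0.4},   # column family "readings"
        "meta": {"firmware": "1.2"},                     # column family "meta"
    },
    "sensor:42:2024-12-12T10:01": {
        "readings": {"temp_c": 21.7},                    # humidity column absent
    },
}

def get_column(row_key, family, column, default=None):
    """Look up one column inside a family for a given row."""
    return table.get(row_key, {}).get(family, {}).get(column, default)
```

The row key here encodes a sensor ID and timestamp, matching the time-series use case: each row holds whatever readings arrived, and missing columns cost no storage.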

5. What are the distribution models? Briefly explain two paths of data
distribution.

Distribution models in databases refer to strategies used to distribute data across
multiple nodes or servers in a distributed database system. This distribution ensures
scalability, fault tolerance, and optimized performance.

Two Paths of Data Distribution


1. Replication
Replication involves copying data across multiple nodes to ensure redundancy
and availability. Each node contains a replica (or copy) of the same data.
Characteristics:
o Redundancy: Increases fault tolerance by ensuring data is available even if
some nodes fail.
o Consistency Models: Can vary from strong consistency (all replicas are
updated before the write is acknowledged) to eventual consistency (updates
are propagated over time).
Advantages:
o High Availability: Data is accessible even if a node goes down.
o Load Balancing: Reads can be distributed across replicas, reducing load on a
single node.
o Fault Tolerance: Data remains safe and recoverable in case of hardware
failures.
Disadvantages:
o Update Overheads: Changes must be synchronized across replicas, which can
increase latency.
o Conflict Resolution: Requires mechanisms to resolve conflicts if updates
occur simultaneously on different replicas.
Use Cases:
o Content delivery networks (CDNs).
o Systems requiring high read throughput, such as caching layers.

2. Sharding (Partitioning)
Sharding involves splitting the database into smaller, independent pieces
called shards, with each shard containing a subset of the data.
Characteristics:
o Horizontal Partitioning: Data is distributed based on specific criteria
(e.g., a range of IDs or a hash of keys).
o Key-Based Access: Each shard manages its data independently and is
identified by a shard key.
Advantages:
o Scalability: Enables horizontal scaling by adding more nodes to the
system.
o Efficient Resource Utilization: Each shard handles only a subset of
the data, reducing the load on individual nodes.
o Cost-Effectiveness: Allows scaling with commodity hardware.
Disadvantages:
o Complex Querying: Queries spanning multiple shards may require
additional coordination and can reduce performance.
o Rebalancing Challenges: Adding or removing shards requires
redistributing data, which can be complex.
o Single Point of Failure: Without a properly configured shard map,
failure of a central shard can disrupt the system.
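Hash-based shard routing can be sketched in a few lines (illustrative only; production systems typically use consistent hashing or range-based shard maps to ease the rebalancing problem noted above):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]    # hypothetical shard servers

def shard_for(key, nodes=NODES):
    """Hash-based routing: the shard key deterministically picks a node."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Every request for the same key is always routed to the same shard.
owner = shard_for("user:789")
```

The modulo step is what makes rebalancing hard: changing the node count remaps almost every key, which is the motivation for consistent hashing in real systems.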
6. Write short notes on
a. Single Server
A Single Server refers to a database deployment model where the entire
database system, including both the application and its data, is hosted on
a single server (machine). This is the simplest setup for database
management, where both the database engine and all related resources
(e.g., storage, computation, and network) are managed in one place.
Characteristics:
1. Centralized Architecture:
o All operations, including database management, are handled by a
single physical machine or server.
2. Simple Deployment:
o Setting up a single server is easy and requires minimal configuration,
making it suitable for small applications or early-stage development.
3. Limited Scalability:
o The server can handle only a limited amount of data and requests.
Performance bottlenecks may occur as the workload grows, leading to
issues like slow queries or reduced reliability.
4. Resource Contention:
o Since all database processes and application functions run on the same
server, there may be contention for resources such as CPU, memory,
and disk space, especially under heavy load.
Advantages:
1. Cost-Effective:
o Initial setup and maintenance are cheaper because only one server is
required.
2. Simplicity:
o No need for complex configurations like clustering or replication,
making it easy to manage.
3. Low Latency:
o As the database and application run on the same machine, data access
is fast due to minimal network overhead.
Disadvantages:
1. Limited Scalability:
o Performance is constrained by the capabilities of the single server.
Scaling requires moving to more complex architectures, like adding
more servers.
2. Single Point of Failure:
o If the server fails (e.g., due to hardware issues or crashes), the entire
system becomes unavailable, risking data loss or downtime.
3. Resource Constraints:
o As data volume or user load increases, the server may struggle to keep
up, leading to slower performance or crashes.

b. Combining sharding and Replication.


Combining Sharding and Replication is a common strategy used in
distributed databases to achieve both scalability and high availability. By
using both techniques, a system can handle large datasets and high traffic
while maintaining data redundancy and fault tolerance.
How It Works:
1. Sharding:
o Sharding involves dividing the data into smaller, manageable pieces
called shards. Each shard is stored on a separate server or group of
servers.
o Shards can be distributed based on a partition key (e.g., user ID or
region), allowing the database to scale horizontally by adding more
servers as data grows.
2. Replication:
o Replication involves copying the data (or shards) across multiple
servers to ensure redundancy and high availability.
o Each shard typically has multiple replicas (copies), and each replica
can serve read requests. In the event of a server failure, replicas ensure
that data is still available.
Benefits of Combining Sharding and Replication:
1. Scalability:
o Sharding allows data to be distributed across multiple servers, making
it possible to scale the database horizontally. This enables handling of
massive datasets and high traffic loads.
2. High Availability:
o Replication ensures that multiple copies of each shard are maintained.
If one replica or server goes down, other replicas can serve requests,
minimizing downtime.
3. Load Balancing:
o Replication helps distribute read requests across multiple replicas,
balancing the load and improving read performance. Writes are
typically directed to the primary shard or replica, but read requests can
be handled by any replica.
4. Fault Tolerance:
o With both sharding and replication, the system can recover from
hardware failures or network issues. If a shard server fails, the replica
can take over, and new data can be automatically redirected to healthy
nodes.
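The combination can be sketched with a hypothetical Cluster class (real systems add failure detection, consensus, and rebalancing): each key hashes to a primary node, the write is copied to the next node in the ring as a replica, and a read can be served by any surviving owner.

```python
import hashlib

class Cluster:
    """Sketch: each key maps to a primary shard plus N replicas on following nodes."""
    def __init__(self, nodes, replicas=1):
        self.nodes = nodes
        self.replicas = replicas
        self.storage = {n: {} for n in nodes}   # per-node key-value storage

    def _owners(self, key):
        """Primary node by hash, then the next `replicas` nodes in the ring."""
        start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas + 1)]

    def write(self, key, value):
        for node in self._owners(key):          # write to primary and replicas
            self.storage[node][key] = value

    def read(self, key):
        for node in self._owners(key):          # any surviving owner can serve reads
            if key in self.storage[node]:
                return self.storage[node][key]
        return None

c = Cluster(["n1", "n2", "n3"], replicas=1)
c.write("user:789", {"name": "Alice"})
primary = c._owners("user:789")[0]
del c.storage[primary]["user:789"]              # simulate loss of the primary copy
```

After the primary copy is lost, the read still succeeds from the replica, showing how sharding provides the scale-out and replication provides the fault tolerance.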

7. Differentiate between Key value and document-oriented data models.

Data Structure
  Key-Value: Simple key-value pairs.
  Document: Complex documents (JSON/BSON/XML).
Query Capabilities
  Key-Value: Key-based lookups only.
  Document: Complex querying on document fields.
Schema
  Key-Value: Schema-less, no constraints.
  Document: Flexible schema; similar structure per collection.
Use Cases
  Key-Value: Caching, session management, real-time data.
  Document: Content management, e-commerce, IoT.
Data Representation
  Key-Value: Simple, non-relational.
  Document: Rich and hierarchical.
Scalability
  Key-Value: Horizontal scalability (easy).
  Document: Horizontal scalability (more complex).

8. Explain ACID properties, CAP theorem and BASE properties.


ACID Properties
ACID is an acronym for four properties that guarantee reliable
transaction processing in a database system. These properties are:
1. Atomicity: A transaction is an indivisible unit of work, meaning that either
all operations within the transaction are completed successfully, or none are.
If any part of the transaction fails, the entire transaction is rolled back.
2. Consistency: A transaction must move the database from one consistent
state to another. It ensures that a transaction will not violate any database
constraints or rules, maintaining data integrity.
3. Isolation: Transactions should execute independently of each other. The
intermediate state of a transaction is not visible to other transactions until it
is completed. This prevents transactions from interfering with one another,
maintaining data consistency.
4. Durability: Once a transaction has been committed, its changes are
permanent, even in the case of a system crash. The data is stored in a non-
volatile memory or storage to ensure persistence.
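Atomicity and rollback can be sketched with a snapshot-and-restore transaction over an in-memory dictionary (a toy model; real engines use write-ahead logs and locking rather than full snapshots, and the class and helper names here are invented):

```python
class MiniTxn:
    """Sketch of atomicity: apply all operations or roll back to the snapshot."""
    def __init__(self, db):
        self.db = db

    def run(self, *operations):
        snapshot = dict(self.db)            # keep a copy for rollback
        try:
            for op in operations:
                op(self.db)
            return True                     # commit: every operation applied
        except Exception:
            self.db.clear()
            self.db.update(snapshot)        # rollback: none applied
            return False

accounts = {"alice": 100, "bob": 50}

def debit(who, amount):
    def op(db):
        if db[who] < amount:
            raise ValueError("insufficient funds")
        db[who] -= amount
    return op

def credit(who, amount):
    def op(db):
        db[who] += amount
    return op

txn = MiniTxn(accounts)
ok = txn.run(debit("alice", 30), credit("bob", 30))        # commits
bad = txn.run(debit("alice", 1000), credit("bob", 1000))   # rolls back
```

The failed transfer leaves both balances exactly as they were before it started, which is the bank-transfer scenario atomicity exists to protect.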
CAP Theorem
The CAP Theorem (also known as Brewer’s Theorem) describes the
trade-offs in distributed systems and states that a distributed system can
guarantee at most two of the following three properties:
1. Consistency: Every read operation will return the most recent write for a
given piece of data, ensuring that all nodes have the same data at the same
time.
2. Availability: Every request (read or write) will receive a response, even if
some nodes in the system are unavailable. The system remains operational
but may return outdated data in certain cases.
3. Partition Tolerance: The system can continue to function even if network
partitions or failures occur between nodes, ensuring the system's availability
and consistency in such cases.
The theorem asserts that you cannot achieve all three simultaneously, and
systems must choose a balance between them based on their use cases.
BASE Properties
BASE is a concept used in distributed systems to address the
challenges posed by the CAP Theorem. It is an alternative approach to
ACID for scenarios where high availability and partition tolerance are
prioritized, and strict consistency is relaxed:
1. Basically Available: The system guarantees availability, meaning that the
system will respond to every request (either success or failure), but the
responses may not always be up-to-date.
2. Soft State: The system may not be in a consistent state at all times. It allows
for temporary inconsistencies as long as the system can eventually resolve
them.
3. Eventual Consistency: Over time, the system will eventually reach a
consistent state. Data across all nodes will become synchronized after some
time, but not necessarily immediately after every update.
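Eventual consistency can be illustrated with a last-write-wins merge, a simplified version of the anti-entropy synchronization some NoSQL stores use (the replica contents and timestamps below are invented for the sketch). Each replica stores a (value, timestamp) pair; after a synchronization round, every replica holds the newest write.

```python
def merge(a, b):
    # last-write-wins: keep whichever write carries the higher timestamp
    return a if a[1] >= b[1] else b

replicas = [
    {"x": ("old", 1)},   # stale replica
    {"x": ("new", 2)},   # replica that received the latest write
    {"x": ("old", 1)},   # another stale replica
]

# One anti-entropy round: every pair of replicas exchanges and merges state.
for i in range(len(replicas)):
    for j in range(len(replicas)):
        if i != j:
            replicas[j]["x"] = merge(replicas[i]["x"], replicas[j]["x"])

print([r["x"][0] for r in replicas])  # ['new', 'new', 'new']
```

Before the round, a read could return either "old" or "new" depending on the replica contacted (the "soft state"); after it, all replicas agree, which is the "eventual" in eventual consistency.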
In summary, ACID focuses on strict consistency and reliability in database
transactions, CAP Theorem highlights the trade-offs in distributed systems
between consistency, availability, and partition tolerance, and BASE offers a
more flexible model for distributed systems that prioritize availability and
partition tolerance while allowing eventual consistency.

9. Explain the Impedance mismatch with a suitable example

Impedance Mismatch refers to the problems that arise when there is
a discrepancy between the data models or structures used by two systems
that need to interact, particularly in the context of databases and application
layers. This is most commonly seen when an object-oriented programming
(OOP) language, like Java or C#, interacts with a relational database
management system (RDBMS), which uses a different data representation
(tables, rows, columns).

In simple terms, the mismatch occurs because object-oriented
programming languages represent data as objects, while relational
databases represent data in a tabular form (tables). The structures are
different, making it difficult to map objects to relational tables and vice
versa.

Mismatch Problems:
1. Relational to Object Mapping: The relational database stores data in rows
and columns, while objects in OOP have properties and methods.
Converting a database row into an object (or vice versa) can be complex, as
relational data does not inherently have behaviors (methods).
For example, converting the database row of Alice (EmployeeID = 1, Name
= Alice, Department = HR) into an instance of the Java Employee class
requires mapping the columns to the object’s properties. Similarly, when
saving an object into the database, the object's methods and the relationships
between objects have to be translated into tables and foreign keys.
2. Normalization and Relationships: In relational databases, data is often
normalized to reduce redundancy, meaning data might be split across
multiple tables (e.g., a separate Department table). In OOP, you might want
to model relationships as objects (like a Department object inside an
Employee object). This requires additional logic to handle the mapping.
3. Inheritance: Object-oriented programming often uses inheritance, where a
subclass inherits properties and behaviors from a parent class. Relational
databases do not have a direct way to model this concept. For instance, if
you have an Employee class and a subclass Manager in Java, translating this
hierarchy to relational tables (which are flat) is not straightforward.
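The row-to-object mapping from point 1 can be written out by hand. The `Employee` class, its `greeting` method, and the column names below are illustrative assumptions matching the Alice example above; the point is that the object gains behavior (a method) that the flat row never had.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    employee_id: int
    name: str
    department: str

    def greeting(self):  # behavior that a table row cannot carry
        return f"Hi, I am {self.name} from {self.department}"

# A row as a database driver might return it: just columns, no behavior.
row = {"EmployeeID": 1, "Name": "Alice", "Department": "HR"}

# The mapping step: columns -> object properties.
alice = Employee(row["EmployeeID"], row["Name"], row["Department"])
print(alice.greeting())  # Hi, I am Alice from HR
```

Writing this glue code by hand for every class and every query is exactly the burden that ORM frameworks (discussed below) aim to remove.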
Solutions to Impedance Mismatch:
1. Object-Relational Mapping (ORM): ORM frameworks (such as Hibernate
in Java, Entity Framework in .NET, or Django ORM in Python) help bridge
the impedance mismatch by automatically handling the conversion between
objects and relational tables. These frameworks map object properties to
columns in a table and provide a mechanism to persist objects to a database
and retrieve them back as objects.
2. Data Transfer Objects (DTOs): Instead of working directly with domain
objects, one can use data transfer objects to explicitly separate the concerns
of business logic and database access. This allows different data structures
for the database and the application while maintaining a clear separation
between them.
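The core idea behind an ORM can be sketched in miniature with only the standard library: a generic mapper that turns any table's rows into instances of a matching class. This is a simplified illustration, not how Hibernate or Django's ORM is implemented; real frameworks add querying, caching, change tracking, and relationship handling on top of this mapping step.

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class Employee:
    EmployeeID: int
    Name: str
    Department: str

def fetch_all(conn, cls, table):
    # Map each row to an instance of cls by matching field names to columns.
    cols = ", ".join(f.name for f in fields(cls))
    return [cls(*row) for row in conn.execute(f"SELECT {cols} FROM {table}")]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (EmployeeID INTEGER, Name TEXT, Department TEXT)"
)
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 'HR')")

employees = fetch_all(conn, Employee, "employees")
print(employees[0])  # Employee(EmployeeID=1, Name='Alice', Department='HR')
```

Application code now works with `Employee` objects rather than raw rows, which is the essence of how ORMs paper over the impedance mismatch.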
