03) Introduction to Database data management approaches
Data can be characterized by its volume, variety, velocity, and veracity. The
increasing volume of data presents storage and processing challenges. The variety
of data formats requires flexible tools for analysis. The velocity of data generation
necessitates real-time processing capabilities. Data veracity is crucial for ensuring
the reliability of insights derived from data analysis.
Big data refers to datasets that are so large, complex, and rapidly growing
that traditional data processing techniques are inadequate.
Big data typically exhibits the four Vs: high volume, high variety,
high velocity, and potentially lower veracity due to its sheer size.
Big data requires specialized tools and infrastructure for storage, processing,
and analysis.
There's no strict definition of "big." It depends on an organization's data
processing capabilities and the value derived from the data.
As data volume, variety, and velocity increase, data may become "big
data" at a certain point.
The challenge of managing big data arises when traditional processing
tools become overwhelmed by the data's size and complexity.
Information is processed and organized data that has meaning and
context.
Data becomes information when it's analyzed, interpreted, and presented
in a way that reveals patterns, trends, or relationships.
Examples of information:
A sales report showing top-selling products
A weather forecast based on temperature and pressure data
A customer profile based on purchase history and demographics
Data on its own isn't very useful. It's when we process and analyze data that
it becomes information. Information helps us understand the world around
us, make decisions, and solve problems. By applying context and analysis,
we can transform raw data into insights that can be acted upon.
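As a minimal sketch of this transformation, the Python snippet below turns a handful of raw sale records into a small "top-selling products" report. The records, the product names, and the `top_selling` helper are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical raw data: one (product, quantity) record per sale.
raw_sales = [
    ("keyboard", 2),
    ("mouse", 5),
    ("monitor", 1),
    ("mouse", 3),
    ("keyboard", 4),
]

def top_selling(sales, n=2):
    """Aggregate quantities per product and return the n best sellers."""
    totals = Counter()
    for product, quantity in sales:
        totals[product] += quantity
    return totals.most_common(n)

# The aggregated, ranked result is the "information".
report = top_selling(raw_sales)
for product, total in report:
    print(f"{product}: {total} units sold")
```

On their own the individual records say little; the aggregated and ranked totals are what a reader can act on.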
A database (DB) is a structured collection of data organized for electronic
storage and retrieval.
It acts like an electronic library where information is stored and categorized
for efficient access.
Databases allow us to:
Store large amounts of data efficiently
Organize data in a structured way
Access and retrieve data quickly and easily
Share data with authorized users
Think of a database as a digital filing cabinet for information. It provides a
systematic way to store and organize data, making it easier to find what
you need. Databases are essential for managing large amounts of data in
various fields, from business to healthcare to scientific research.
A Database Management System (DBMS) is a software application used to
create, manage, and maintain databases.
It acts as an interface between the database and users or applications that
need to access the data.
Key features of a DBMS:
Data definition: Define the structure and organization of the data in the database.
Data manipulation: Insert, update, and delete data within the database.
Data retrieval: Query the database to retrieve specific information.
Security: Control access to the database and ensure data integrity.
A DBMS is the software that helps us interact with a database. It provides
tools for creating the database structure, storing and manipulating data,
and retrieving information. DBMS software also enforces security measures
to protect data integrity and control access. Popular DBMS examples
include MySQL, Oracle Database, and Microsoft SQL Server.
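The key features of a DBMS can be sketched with Python's built-in `sqlite3` module. SQLite is not one of the systems named here; it is assumed only because it runs without a server. The `products` table and its rows are hypothetical. Access control is not shown because SQLite has no user accounts, so the integrity side of security is illustrated with a constraint instead.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of the data.
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE,"
    " price REAL NOT NULL CHECK (price >= 0))"
)

# Data manipulation: insert, update, and delete rows.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("keyboard", 25.0), ("mouse", 10.0), ("monitor", 120.0)],
)
conn.execute("UPDATE products SET price = ? WHERE name = ?", (12.5, "mouse"))
conn.execute("DELETE FROM products WHERE name = ?", ("monitor",))

# Data retrieval: query for specific information.
rows = conn.execute(
    "SELECT name, price FROM products ORDER BY name"
).fetchall()
print(rows)

# Integrity: the DBMS rejects a row that violates the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO products (name, price) VALUES (?, ?)", ("cable", -1.0)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("negative price rejected:", rejected)

conn.close()
```

The application never touches the storage format directly; every operation goes through the DBMS, which is what allows it to enforce rules such as the price check.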
What is the file-based approach?
It stores data in separate computer files, each managed by a specific
application program.
Think of it like a traditional filing cabinet with separate folders for
different categories of documents.
The file-based approach is a straightforward method of data storage. Each
application manages its own data files, similar to how you might organize
documents in a filing cabinet. While simple to set up, it comes with
limitations as your data needs grow.
CSV files (Comma-Separated Values)
Description: Imagine a table with rows and columns, just like a spreadsheet.
Each line of a CSV file is one row, and the values in that row are separated
by commas. This simple format makes it easy for humans to read and for
different software programs to understand the data structure.
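A minimal sketch of writing and reading CSV data with Python's standard `csv` module follows. The column names and values are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical table: a header row followed by one row per product.
rows = [
    ["name", "price"],
    ["keyboard", "25.00"],
    ["mouse", "12.50"],
]

# Write the rows as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back; DictReader uses the header row for column names.
records = list(csv.DictReader(io.StringIO(csv_text)))
for record in records:
    print(record["name"], float(record["price"]))
```

Every value comes back as a string, so the reading program has to know which columns are numbers. The file itself carries no type information.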
Text files for logs
Description: Think of a plain text document where each line represents a
single log entry. Logs typically contain timestamps, messages, and sometimes
even error codes. These files are simple and efficient for storing a
chronological record of events.
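The sketch below parses a few log lines of this kind in Python. The line layout (`<timestamp> <LEVEL> <message>`), the sample entries, the error code `E42`, and the `parse_line` helper are all assumptions made for illustration; real log formats vary by application.

```python
from datetime import datetime

# Hypothetical log lines in the form "<timestamp> <LEVEL> <message>".
log_text = """\
2024-01-15T09:00:01 INFO Service started
2024-01-15T09:00:07 ERROR E42 Could not open data file
2024-01-15T09:01:30 INFO Retry succeeded
"""

def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    stamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), level, message

# One parsed entry per non-empty line, in chronological order.
entries = [parse_line(line) for line in log_text.splitlines() if line]

# Keep only the messages of entries logged at ERROR level.
errors = [message for _, level, message in entries if level == "ERROR"]
print(errors)
```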
XML files for configuration
Description: Imagine a family tree with parents and children. XML files use
a similar structure with elements and attributes. Each element represents a
piece of data, and attributes provide additional details. This hierarchical
structure allows for complex configurations to be defined in a readable and
organized way.
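As a rough illustration, the following Python sketch reads a small XML configuration with the standard `xml.etree.ElementTree` module. The element names, attributes, and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: elements nest like parents and children,
# and attributes carry extra details about each element.
config_xml = """<config>
  <database host="localhost" port="5432">
    <name>inventory</name>
  </database>
  <logging level="INFO"/>
</config>"""

root = ET.fromstring(config_xml)
database = root.find("database")

# Attributes are read with get(); child element text with findtext().
host = database.get("host")
port = int(database.get("port"))
db_name = database.findtext("name")
log_level = root.find("logging").get("level")

print(host, port, db_name, log_level)
```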
Advantages of the file-based approach:
Simplicity: Easy to set up and use, particularly for small datasets.
Low Cost: No need for expensive DBMS software.
Flexibility: Developers have more control over file structure and data access.
Familiarity: Programmers are usually already familiar with file handling techniques.
Disadvantages of the file-based approach:
Data Redundancy: The same data may be stored in multiple files, leading to
wasted storage and inconsistencies.
Data Inconsistency: Updates to one file may not be reflected in others,
causing discrepancies.
Data Sharing and Access Control Difficulties: Sharing data between
applications can be complex. Access control mechanisms might be limited.
Scalability Limitations: Managing and maintaining large datasets in
separate files can be challenging.
Data Integrity Issues: Ensuring data integrity becomes more difficult without
the features of a DBMS.
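To make the redundancy and inconsistency problems concrete, the sketch below simulates two applications that each keep their own copy of a customer's email address. The file contents, the application names, and the `load` and `find_mismatches` helpers are hypothetical; in-memory strings stand in for the files.

```python
import csv
import io

# Hypothetical files owned by two separate applications. Both store the
# customer's email, so the same fact exists in two places (redundancy).
billing_file = "customer_id,email\n1,ana@example.com\n2,li@example.com\n"
shipping_file = "customer_id,email\n1,ana@example.com\n2,li@example.com\n"

def load(text):
    """Return {customer_id: email} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["customer_id"]: row["email"] for row in reader}

# The billing application updates its own copy; nothing updates the other.
billing = load(billing_file)
billing["2"] = "li.new@example.com"
shipping = load(shipping_file)

def find_mismatches(a, b):
    """List customer ids whose email differs between the two copies."""
    return sorted(cid for cid in a.keys() & b.keys() if a[cid] != b[cid])

mismatches = find_mismatches(billing, shipping)
print(mismatches)
```

Nothing in the file-based setup detects this drift automatically. Each application would need its own checking code, which is the kind of work a DBMS centralizes.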
When to use the file-based approach:
Small, Simple Applications: For applications with limited data and user
needs, a file-based approach might be sufficient.
Prototyping or Proof-of-Concept Projects: During initial development stages,
a file-based approach can offer a quick and easy way to manage data.