4. GE ELECT 1 - Data and Databases
Introduction
Welcome to the module on Data and Databases. In this module, we will explore the fundamental concepts of
data, information, and knowledge, and understand the importance of databases for managing data resources.
Additionally, we will delve into the concept of big data, various types of databases, and the basics of SQL.
1. Data, Information, and Knowledge
1.1 Data
• Definition: Data consists of raw, unprocessed facts and figures without context. It can include
numbers, text, images, and other types of raw input.
• Examples:
o A list of customer names and their contact numbers.
o A series of numerical values like 25, 30, 35, 40.
1.2 Information
• Definition: Information is data that has been processed, organized, and structured to be meaningful
and useful. It provides context and relevance to raw data.
• Examples:
o A report showing the average age of customers based on the numerical data.
o A summary of customer contact information categorized by region.
1.3 Knowledge
• Definition: Knowledge is derived from information through analysis and interpretation. It involves
understanding patterns and making informed decisions based on information.
• Examples:
o Using customer age data to create marketing strategies tailored to different age groups.
o Predicting future customer needs based on historical purchasing patterns.
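To make the progression concrete, this short Python sketch turns the raw ages from the Data example into information (an average and a grouping of contacts by region), then applies a simple interpretation as "knowledge". The customer names, phone numbers, regions, and the 35-year threshold are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Data: raw values with no context (the numbers from the Data example).
ages = [25, 30, 35, 40]

# Hypothetical customer records: (name, contact number, region).
customers = [
    ("Ana", "555-0101", "North"),
    ("Ben", "555-0102", "South"),
    ("Cara", "555-0103", "North"),
]

# Information: the same data processed and organized so it means something.
average_age = sum(ages) / len(ages)

contacts_by_region = defaultdict(list)
for name, phone, region in customers:
    contacts_by_region[region].append((name, phone))

# Knowledge: an interpretation that supports a decision.
# The 35-year threshold is an illustrative assumption, not a rule from this module.
if average_age < 35:
    strategy = "target campaigns at younger adults"
else:
    strategy = "target campaigns at older adults"

print("Average customer age:", average_age)
for region, entries in sorted(contacts_by_region.items()):
    print(region, entries)
print("Suggested strategy:", strategy)
```

The code can only compute the average and the grouping; deciding what the numbers mean for a marketing plan is the human judgment that turns information into knowledge.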
MJ Pagay-Cierva Property
Why Understanding These Concepts is Important:
2. Big Data
2.1 Definition
• Big Data: Refers to extremely large datasets that are complex and difficult to process using traditional
data processing tools.
2.2 Characteristics
• Volume: Refers to the enormous amount of data generated every second. For example, social media
platforms generate terabytes of data daily.
• Velocity: Refers to the speed at which data is generated and processed. For instance, real-time data
streaming from sensors.
• Variety: Refers to the different types of data formats, such as structured data (tables), semi-structured
data (XML, JSON), and unstructured data (text, images).
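The Variety characteristic is easiest to see side by side. This sketch, using made-up sample records, reads the same kind of customer data in structured form (CSV with fixed columns), semi-structured form (JSON whose fields can differ per record), and unstructured form (free text).

```python
import csv
import io
import json

# Structured: fixed columns known in advance, like a database table.
structured = "id,name,age\n1,Ana,25\n2,Ben,30\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys; records need not share the same fields.
semi_structured = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Ben"}]'
records = json.loads(semi_structured)

# Unstructured: free text with no predefined fields at all.
unstructured = "Ana posted: loving the new store layout!"
word_count = len(unstructured.split())

print(rows[0]["name"], rows[0]["age"])        # columns are fixed
print([r.get("tags", []) for r in records])   # fields vary between records
print(word_count)                             # any structure must be inferred
```

Note that the CSV reader returns every value as text; interpreting "25" as a number is up to the program, which is one reason structured data is usually loaded into a typed database table.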
2.3 Traditional Data vs. Big Data
• Traditional Data: Often structured and manageable with conventional tools and databases.
• Big Data: Requires advanced technologies and tools (e.g., Hadoop, Spark) to handle its scale and
complexity.
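Hadoop and Spark scale by splitting work into steps that can run on many machines at once. The sketch below shows the map, shuffle, and reduce idea behind Hadoop's MapReduce model as a word count on a single machine. It is plain Python written for illustration, not the Hadoop or Spark API; in those systems each phase would be distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Emit a (word, 1) pair for every word in one document.
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    # Group all emitted values by their key (the word).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine the values for each key into a single total.
    return {key: sum(values) for key, values in groups.items()}

documents = ["big data needs big tools", "spark and hadoop handle big data"]

# Each document could be mapped independently, which is what allows parallelism.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # 3 2
```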
Why It Matters:
• Scalability: Big Data technologies allow organizations to process vast amounts of information
efficiently.
• Insights: Analyzing big data can reveal patterns and trends that inform strategic decisions.
3. Databases
3.1 Data Models and Relational Databases
• Data Model: A framework that defines the structure of data, including how data is stored, organized,
and manipulated.
• Relational Database: A type of database that uses tables to represent data and relationships between
data. Each table consists of rows (records) and columns (fields).
o Example: A table for storing employee information with columns for Employee ID, Name,
Department, and Salary.
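The employee table described above can be sketched with SQLite, a relational database engine bundled with Python's standard library. The column names and the two sample rows are made up for illustration; an in-memory database is used so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        name        TEXT NOT NULL,
        department  TEXT,
        salary      REAL
    )
    """
)

# Each tuple becomes one row (record); each position maps to a column (field).
conn.executemany(
    "INSERT INTO employee (employee_id, name, department, salary) VALUES (?, ?, ?, ?)",
    [
        (1, "Ana Cruz", "Finance", 45000.0),
        (2, "Ben Reyes", "IT", 52000.0),
    ],
)

for row in conn.execute("SELECT employee_id, name, department, salary FROM employee ORDER BY employee_id"):
    print(row)
```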
3.2 Designing a Database
• Schema: The blueprint of a database that outlines the structure, including tables, fields, and
relationships.
• Normalization: The process of organizing data to reduce redundancy and ensure data integrity. It
involves dividing a database into tables and defining relationships to minimize duplication.
o Example: Splitting a customer table into separate tables for customer details and orders.
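The normalization example can be sketched as two related SQLite tables: customer details are stored once, and each order refers to its customer by key instead of repeating the name and email. The table and column names and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Unnormalized design (for contrast): customer details repeated on every order row,
# e.g. orders_flat(order_id, customer_name, customer_email, product).

# Normalized design: one table per entity, linked by a foreign key.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product     TEXT NOT NULL
    );
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Ana Cruz', 'ana@example.com')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, "Laptop"), (102, 1, "Mouse")],
)

# A JOIN rebuilds the combined view without storing the email twice.
rows = conn.execute(
    """
    SELECT o.order_id, c.name, c.email, o.product
    FROM orders AS o JOIN customer AS c ON o.customer_id = c.customer_id
    ORDER BY o.order_id
    """
).fetchall()
for row in rows:
    print(row)
```

If Ana's email changes, only one row in the customer table needs updating, which is the redundancy reduction the Normalization bullet describes.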
3.3 Data Types
• Definition: The types of data that can be stored in a database, such as integers, floating-point numbers,
strings, dates, and more.
o Example: An integer data type for storing age, and a string data type for storing names.
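This sketch declares a column for each kind of data listed above and asks SQLite which storage type it actually used for each value. One SQLite-specific detail: it has no separate DATE type, so dates are commonly stored as ISO-8601 text; other database systems such as PostgreSQL and MySQL do provide a DATE type. The table and sample values are hypothetical.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        name     TEXT,     -- string
        age      INTEGER,  -- whole number
        height_m REAL,     -- floating-point number
        born_on  TEXT      -- date stored as ISO-8601 text in SQLite
    )
    """
)
conn.execute(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    ("Ana Cruz", 30, 1.62, date(1994, 5, 17).isoformat()),
)

# typeof() reports the storage type SQLite used for each stored value.
types = conn.execute(
    "SELECT typeof(name), typeof(age), typeof(height_m), typeof(born_on) FROM person"
).fetchone()
print(types)  # ('text', 'integer', 'real', 'text')
```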
Why It Matters:
• Efficient Storage: Proper database design and normalization improve data storage efficiency and
reduce redundancy.
• Data Integrity: Good design helps ensure that data remains accurate and consistent.
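One common way a database design protects integrity is with constraints that reject invalid rows. The particular rules in this sketch (NOT NULL, UNIQUE, and a CHECK on age) are illustrative choices, not requirements stated in this module; each bad insert raises sqlite3.IntegrityError and leaves the table unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,     -- no missing or duplicate emails
        age         INTEGER CHECK (age >= 0)  -- no negative ages
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'ana@example.com', 30)")

bad_rows = [
    (2, "ana@example.com", 25),  # duplicate email
    (3, None, 40),               # missing email
    (4, "ben@example.com", -5),  # impossible age
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row[0], str(exc)))

for customer_id, reason in rejected:
    print(customer_id, reason)
```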
4. Database Management Systems (DBMS)
4.1 Definition
• DBMS: Software that manages and organizes databases, allowing users to store, retrieve, and
manipulate data efficiently.
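Beyond storing and retrieving rows, a DBMS keeps data consistent when a change involves several steps. This sketch uses SQLite (a small DBMS bundled with Python) to run a hypothetical money transfer as a transaction: if any step fails, the whole change is rolled back. The account table, names, and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL CHECK (balance >= 0))"
)
with conn:  # commits the inserts if no error occurs
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)", [(1, "Ana", 100.0), (2, "Ben", 50.0)]
    )

def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = transfer(conn, 1, 2, 30.0)    # Ana -> Ben, succeeds
second_ok = transfer(conn, 2, 1, 500.0)  # would make Ben's balance negative

# The failed transfer had already credited Ana; the rollback undid that step too.
balances = dict(conn.execute("SELECT owner, balance FROM account"))
print(first_ok, second_ok, balances)
```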
4.2 Functions
Why It Matters:
• Centralized Management: A DBMS provides a central repository for data, making it easier to manage
and maintain.
• Scalability: DBMSs can handle large volumes of data and complex queries.
5. SQL (Structured Query Language)
5.1 Definition
• SQL: A language used to interact with relational databases. It allows users to perform operations such
as querying, updating, and managing data.
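The three kinds of operations named in the definition (managing, updating, and querying) look like this in practice. The sketch runs standard SQL statements through SQLite; the product table and prices are hypothetical. The "?" placeholders pass values separately from the SQL text, which is the usual way to avoid SQL injection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Managing structure: define a table.
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Updating data: insert, change, and remove rows.
cur.executemany(
    "INSERT INTO product (name, price) VALUES (?, ?)",
    [("Pen", 10.0), ("Notebook", 45.0), ("Stapler", 120.0)],
)
cur.execute("UPDATE product SET price = price * 1.10 WHERE name = ?", ("Notebook",))
cur.execute("DELETE FROM product WHERE name = ?", ("Stapler",))
conn.commit()

# Querying data: ask for exactly the rows and columns needed.
for name, price in cur.execute("SELECT name, price FROM product ORDER BY name"):
    print(f"{name}: {price:.2f}")
```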
Why It Matters:
• Data Manipulation: Statements such as INSERT, UPDATE, and DELETE let users add, change, and remove data directly.
• Query Efficiency: A single declarative query can filter, join, group, and sort data, and the DBMS decides how to execute it.
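As an example of a more complex query, this sketch summarizes hypothetical employee salaries by department in one statement: it groups the rows, computes a count and an average per group, keeps only departments with at least two employees, and sorts the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ana", "IT", 52000),
        ("Ben", "IT", 48000),
        ("Cara", "Finance", 45000),
        ("Dan", "Finance", 41000),
        ("Eve", "HR", 39000),
    ],
)

# GROUP BY forms groups, COUNT/AVG aggregate them, HAVING filters whole groups,
# and ORDER BY sorts the final summary.
query = """
    SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
    FROM employee
    GROUP BY department
    HAVING COUNT(*) >= 2
    ORDER BY avg_salary DESC
"""
summary = conn.execute(query).fetchall()
for department, headcount, avg_salary in summary:
    print(department, headcount, avg_salary)
```

HR is left out because it has only one employee, so it fails the HAVING condition.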
6. Types of Databases
6.1 NoSQL Databases
• Definition: Databases designed for handling unstructured and semi-structured data. They offer flexible
data models and are suitable for large-scale applications.
o Example: MongoDB (document-oriented), Cassandra (column-family).
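The document-oriented model used by databases such as MongoDB can be illustrated with plain Python dictionaries. This is only a sketch of the data model, not MongoDB's API: the find helper is hypothetical, and the customer documents are made up. The point is that documents in one collection can carry different fields and nested data without a fixed schema.

```python
import json

# A "collection" of documents; each is self-describing and may differ in shape.
customers = [
    {"_id": 1, "name": "Ana", "email": "ana@example.com",
     "orders": [{"product": "Laptop", "qty": 1}]},   # nested data inside the document
    {"_id": 2, "name": "Ben", "phone": "555-0102"},  # no email, no orders
]

def find(collection, **criteria):
    """Hypothetical helper: return documents whose top-level fields match every criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

# Add a new field to one document; no table structure has to change first.
customers[1]["loyalty_tier"] = "gold"

print(find(customers, name="Ben"))
print(json.dumps(customers[0], indent=2))  # documents map naturally to JSON
```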
6.2 Object-Oriented Databases
• Definition: Databases that store data as objects, similar to how data is represented in object-oriented
programming.
o Example: db4o, ObjectDB.
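The idea of storing objects directly, without mapping them to rows and columns, can be sketched with Python's standard-library shelve module. shelve is not an object database like db4o or ObjectDB; it is a simple key-value store that serializes Python objects, but it shows the same programming style. The Customer class and its data are hypothetical.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    orders: list = field(default_factory=list)

    def add_order(self, product):
        self.orders.append(product)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "customers")  # shelve may add its own file extensions

    # Store the object itself under a key; no table design is written by hand.
    with shelve.open(path) as db:
        ana = Customer("Ana")
        ana.add_order("Laptop")
        db["ana"] = ana

    # Reopen the store and get a Customer object back, with its methods usable.
    with shelve.open(path) as db:
        restored = db["ana"]
        restored.add_order("Mouse")
        db["ana"] = restored  # write the modified object back
        print(type(restored).__name__, restored.orders)
```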
Why It Matters:
• Flexibility: Different types of databases are tailored for various use cases and data requirements.
• Specialized Applications: NoSQL databases suit large-scale applications with flexible or semi-structured
data, while object-oriented databases suit applications that work directly with programming-language objects.
Conclusion
Understanding the distinctions between data, information, and knowledge, and the role of database
technology, is crucial for effective data management. This module has provided an overview of these concepts
along with practical insights into big data, database design, and SQL. Mastery of these topics will equip you
with the foundational knowledge needed to manage and use data resources efficiently.
Prepared by:
MJ Pagay-Cierva Property