Data and Data Storage
JSON (JavaScript Object Notation)
{
  "name": "Alice",
  "age": 30,
  "skills": ["Python", "Data Analysis"]
}
XML (Extensible Markup Language)
<person>
  <name>Alice</name>
  <age>30</age>
  <skills>
    <skill>Python</skill>
    <skill>Data Analysis</skill>
  </skills>
</person>
YAML (YAML Ain't Markup Language; originally "Yet Another Markup Language")
name: Alice
age: 30
skills:
- Python
- Data Analysis
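To make the comparison concrete, here is a minimal sketch that parses the same "Alice" record from JSON and from XML using only Python's standard library, and checks that both yield the same data. The helper names `person_from_json` and `person_from_xml` are illustrative assumptions, not part of the slides. YAML is left out because parsing it in Python needs a third-party package.

```python
import json
import xml.etree.ElementTree as ET

# The same record as on the slide, once as JSON and once as XML.
json_text = '{"name": "Alice", "age": 30, "skills": ["Python", "Data Analysis"]}'

xml_text = """
<person>
  <name>Alice</name>
  <age>30</age>
  <skills>
    <skill>Python</skill>
    <skill>Data Analysis</skill>
  </skills>
</person>
""".strip()


def person_from_json(text):
    """Parse the JSON text into a plain dictionary (hypothetical helper)."""
    return json.loads(text)


def person_from_xml(text):
    """Parse the XML text into the same dictionary shape (hypothetical helper)."""
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        # XML carries no types, so the age must be converted explicitly.
        "age": int(root.findtext("age")),
        "skills": [skill.text for skill in root.findall("./skills/skill")],
    }


from_json = person_from_json(json_text)
from_xml = person_from_xml(xml_text)
print(from_json == from_xml)
```

One difference worth noticing: JSON distinguishes numbers, strings, and lists natively, while every XML value arrives as text and has to be converted by the reader.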
Data Storage Options
• Relational databases: store structured data in tables made up of
rows and columns; tables are related to one another through shared keys.
• NoSQL databases: store data in non-tabular formats (for example
key-value pairs or documents), which suits unstructured data.
• Data warehouse: a centralized repository of cleaned, processed,
structured, historical data, consolidated from multiple sources
for business analysis and reporting.
• Data lake: a centralized repository that stores, processes, and
secures large amounts of raw data, both structured and
unstructured, at any scale.
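As a rough illustration of the relational option, the sketch below stores the "Alice" record in two related tables using Python's built-in `sqlite3` module and joins them back together. The table and column names (`person`, `skill`, `person_id`) are assumptions chosen for this example.

```python
import sqlite3

# In-memory database, so the example leaves no file behind.
conn = sqlite3.connect(":memory:")

# Two tables; skill.person_id relates each skill row to a person row.
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE skill (person_id INTEGER REFERENCES person(id), skill TEXT)")

conn.execute("INSERT INTO person VALUES (1, 'Alice', 30)")
conn.executemany(
    "INSERT INTO skill VALUES (?, ?)",
    [(1, "Python"), (1, "Data Analysis")],
)

# The join follows the relationship between the two tables.
rows = conn.execute(
    "SELECT p.name, s.skill "
    "FROM person AS p JOIN skill AS s ON s.person_id = p.id "
    "ORDER BY s.skill"
).fetchall()
print(rows)
```

The list of skills, which JSON keeps nested inside the record, becomes a separate table here; that split is what "tables related to each other" means in practice.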
Non-Relational Databases (NoSQL databases)
They handle large volumes of unstructured data efficiently.
• Key-value databases: store and manage an associative array (a dictionary or hash table),
that is, a collection of key-value pairs in which each key serves as a unique identifier for
retrieving its associated value. Values can be anything from simple objects, like integers or
strings, to more complex objects, like JSON structures.
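The key-value model can be sketched in a few lines of Python. This is a toy in-memory store, not how any particular product is implemented; the class name `KeyValueStore` and the key naming scheme (`user:1`, `visits:home`) are assumptions for illustration.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}  # the associative array: unique key -> stored value

    def put(self, key, value):
        # Values are serialized to JSON text, so simple and complex
        # objects are stored the same way.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", {"name": "Alice", "age": 30})  # complex value
store.put("visits:home", 42)                       # simple value

print(store.get("user:1"))
print(store.get("visits:home"))
print(store.get("user:2"))  # unknown key falls back to the default
```

The store can only look things up by key; there is no query such as "all users older than 25" unless the application maintains its own index, which is the main trade-off against a relational table.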
Database Schema Levels (three-schema architecture)
• The internal schema describes how the data is physically stored and accessed, using the facilities provided by a
particular DBMS.
• The conceptual schema describes the organization of the data into tables and columns.
• The external schemas specify views that enable different users of the data to see it in different ways.
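The three schema levels can be loosely illustrated with `sqlite3`: the table definition plays the role of the conceptual schema, an index stands in for an internal-level storage decision, and views act as external schemas. The `employee` table, the second employee "Bob", and the view names are hypothetical, invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: how the data is organized into tables and columns.
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)

# Internal level (approximation): an index changes how the DBMS accesses
# the data physically without changing the table definition.
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")

conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Alice", "Analytics", 5000.0), (2, "Bob", "Sales", 4000.0)],
)

# External schemas: different views of the same data for different users.
conn.execute("CREATE VIEW employee_directory AS SELECT name, department FROM employee")
conn.execute("CREATE VIEW payroll AS SELECT id, salary FROM employee")

directory = conn.execute("SELECT * FROM employee_directory ORDER BY name").fetchall()
payroll = conn.execute("SELECT * FROM payroll ORDER BY id").fetchall()
print(directory)
print(payroll)
```

Users of `employee_directory` never see salaries, and dropping or adding the index would not change what either view returns.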
What is a data center?
• A data center is a physical location that stores computing machines
and their related hardware equipment. It contains the computing
infrastructure that IT systems require, such as servers, data storage
drives, and network equipment. It is the physical facility that stores
any company’s digital data.
• Key components of enterprise data center infrastructure:
o Compute
o Storage
o Network
Types of Data Centers
• On-premises (enterprise)
• Public cloud
• Colocation
• Cloud
• Hybrid
Enterprise (on-premises) data centers
• In this data center model, all IT infrastructure and data are hosted on-premises. Many companies
choose to run their own on-premises data centers because they feel they have more control
over information security and can more easily comply with regulations such as the European
Union General Data Protection Regulation (GDPR) or the U.S. Health Insurance Portability and
Accountability Act (HIPAA). In an enterprise data center, the company is responsible for all
deployment, monitoring, and management tasks.
• On-premises data centers are fully owned company data centers that store sensitive data and
critical applications for that company. You set up the data center, manage its ongoing operations,
and purchase and maintain the equipment.
• Benefits: An enterprise data center can give better security because you manage risks internally.
You can customize the data center to meet your requirements.
• Limitations: It is costly to set up your own data center and to cover ongoing staffing and running
costs. You also need more than one data center, because a single facility is a single point of
failure.
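The single-point-of-failure argument can be put in numbers. The sketch below assumes that each site is available a given fraction of the time and that sites fail independently, so the service is down only when every site is down at once: availability = 1 - (1 - a)^n. The 99% figure is a made-up input for illustration, not a measured value, and real sites rarely fail fully independently.

```python
def combined_availability(site_availability, sites):
    """Availability of a service that stays up if at least one site is up.

    Assumes independent failures; site_availability is a fraction in [0, 1].
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The service is unavailable only when all sites are down together.
    return 1.0 - (1.0 - site_availability) ** sites


# Hypothetical per-site availability of 99%.
for n in (1, 2, 3):
    print(n, round(combined_availability(0.99, n), 6))
```

Under these assumptions each added site shrinks the expected downtime by a factor of 100, which is the benefit that has to be weighed against the cost of building and staffing another facility.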
Public cloud data centers
• Cloud data centers (also called cloud computing data centers) house IT
infrastructure resources for shared use by multiple customers—from scores
to millions of customers—via an Internet connection.
Colocation and managed data centers
• In a colocation facility, the client company owns its hardware and rents space for it in a
data center operated by a third party.
• Benefits: Colocation facilities reduce ongoing maintenance costs and provide fixed
monthly costs to house your hardware. You can also distribute hardware geographically
to minimize latency and be closer to your end users.
• Limitations: It can be challenging to source colocation facilities in every
geographical area you target. Costs can also add up quickly as you expand.
• In a managed data center, the client company leases dedicated servers, storage and
networking hardware from the data center provider, and the data center provider
handles the administration, monitoring and management for the client company.
Cloud data centers
• A cloud data center moves a traditional on-prem data center off-site.
Instead of personally managing their own infrastructure, an organization
leases infrastructure managed by a third-party partner and accesses data
center resources over the Internet. Under this model, the cloud service
provider is responsible for maintenance, updates, and meeting service level
agreements (SLAs) for the parts of the infrastructure stack under their
direct control.
• Benefits: A cloud data center reduces both hardware investment and the
ongoing maintenance cost of any infrastructure. It gives greater flexibility in
terms of usage options, resource sharing, availability, and redundancy.
Differences between Cloud and On-Premises Data Centers
No. | Cloud | On-premises data center
--- | --- | ---
1 | The cloud is a virtual resource that helps businesses store, organize, and operate on data efficiently. | A data center is a physical resource that helps businesses store, organize, and operate on data efficiently.
2 | Scaling the cloud requires comparatively little investment. | Scaling a data center requires a much larger investment than the cloud.
3 | Maintenance cost is lower because the service provider maintains the infrastructure. | Maintenance cost is higher because the organization's own developers do the maintenance.
4 | A third party must be trusted with the organization's data. | The organization's own developers are trusted with the data stored in the data center.
5 | Performance is high relative to the investment. | Performance is lower relative to the investment.
6 | Customizing the cloud requires a plan. | It is easily customizable without extensive planning.
7 | It requires a stable internet connection to function. | It may or may not require an internet connection.
8 | The cloud is easy to operate and is considered a viable option. | Data centers require experienced developers to operate and are often considered less viable for that reason.
9 | Data is generally collected from the internet. | Data is collected from the organization's network.