Project Work 1

The document outlines the history and evolution of big data, detailing its progression from pre-digital records to modern technologies like AI and cloud computing. It also explains the data science life cycle, applications of data analysis across various industries, and the importance of data visualization. Additionally, it covers concepts like data warehousing, machine learning, and text mining, highlighting their significance in today's data-driven world.

Uploaded by

haeinhong512

Project work (session 17)
Name- Pooja Pal
Email ID- [email protected]
Phase 1
1- Write about the history and evolution of big data.

The history and evolution of big data spans several decades, with key developments shaped
by advancements in technology and the growing need to manage and analyze vast amounts of
information.
1. Pre-Digital Era: Early data management relied on physical records like tally marks and
paper documents.
2. 1950s-1970s (Computers & Databases): The invention of early computers and of database
systems, first hierarchical ones like IBM’s IMS and then the relational model introduced in
the 1970s, laid the foundation for managing digital data.
3. 1980s-1990s (Internet Growth): The rise of the internet, personal computing, and online
platforms like Amazon and eBay increased data generation, straining traditional database
systems.
4. 2000s (Big Data Emerges): The term "big data" emerged as companies started using
technologies like Hadoop to handle vast amounts of diverse data from the web, social
media, and mobile devices.
5. 2010s-Present (Mainstream Adoption): Big data became central to industries, powered
by advancements in AI, machine learning, cloud computing, and real-time data
processing. Tools like Tableau and Power BI made data insights more accessible.
6. Challenges: Issues like data privacy, security, and algorithmic bias grew, prompting the
need for strong data governance.
7. Future: Emerging technologies like edge computing, quantum computing, and AI-
powered analytics will continue to shape the future of big data.
2- Explain data science life cycle.

The Data Science Life Cycle is a series of stages used to solve data problems. Here’s a brief
overview:
1. Problem Definition: Define the problem or question to be solved.
2. Data Collection: Gather relevant data from various sources.
3. Data Cleaning & Preprocessing: Clean the data by handling missing values, duplicates,
and formatting issues.
4. Exploratory Data Analysis (EDA): Analyze the data to find patterns, correlations, and
insights.
5. Modeling: Build and train machine learning models to address the problem.
6. Model Evaluation: Assess model performance using metrics and validation techniques.
7. Deployment: Deploy the model into a production environment for real-world use.
8. Monitoring & Maintenance: Continuously monitor and update the model to maintain its
performance.
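The data cleaning and preprocessing stage can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the records, field names, and the `clean` helper are hypothetical, and dropping rows with missing values is just one of several possible strategies.

```python
# Hypothetical raw records; the field names and values are made up for illustration.
raw_records = [
    {"customer": " Alice ", "amount": "120.50"},
    {"customer": "Bob", "amount": ""},            # missing value
    {"customer": " Alice ", "amount": "120.50"},  # duplicate row
    {"customer": "carol", "amount": "75"},
]


def clean(records):
    """Fix formatting, drop rows with missing amounts, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        name = row["customer"].strip().title()  # formatting: trim spaces, normalize case
        amount_text = row["amount"].strip()
        if not amount_text:                      # missing value: skip the row
            continue
        amount = float(amount_text)              # formatting: text to number
        key = (name, amount)
        if key in seen:                          # duplicate: keep only the first copy
            continue
        seen.add(key)
        cleaned.append({"customer": name, "amount": amount})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

In practice, missing values are often filled in (for example with an average) rather than dropped; which choice is appropriate depends on the problem defined in the first stage.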
3. Explain the application of data analysis.

Data analysis is widely used across industries to improve decision-making and efficiency.
In business and marketing, it helps understand customer behavior, optimize marketing
strategies, and predict sales trends, enabling personalized campaigns and better product
offerings. In healthcare, it enhances patient care and aids in disease detection.
Finance uses it for fraud detection and risk management, while e-commerce leverages
data to personalize recommendations and optimize pricing. In manufacturing, data
analysis supports predictive maintenance and inventory management.
Lastly, education uses it to assess student performance and improve teaching methods.
Phase 2
4. Difference between OLAP and OLTP.
5. Explain data warehousing and its types.

Data warehousing is the process of storing large volumes of data from different sources in a centralized
system, optimized for analysis and reporting.
Types of Data Warehouses:

Enterprise Data Warehouse (EDW): A centralized warehouse integrating data across the entire
organization for comprehensive analysis.

Operational Data Store (ODS): Stores real-time or near-real-time operational data for immediate
reporting and decision-making.

Data Mart: A smaller, department-specific version of a data warehouse, focusing on a particular business
unit like marketing or sales.

Cloud Data Warehouse: A cloud-based solution offering scalable, cost-effective data storage and
processing, such as Amazon Redshift or Snowflake.
Each type serves different purposes, from organizational-wide data analysis to specific departmental
needs.
6- Difference between descriptive business analysis and predictive business analysis.
• Descriptive Business Analysis:
• Purpose: Focuses on what has happened in the past.
• Function: Analyzes historical data to understand trends, patterns, and behaviors.
• Techniques: Uses tools like reporting, dashboards, and basic data aggregation (e.g., average
sales, total revenue).
• Outcome: Provides insights into past performance, helping businesses understand past
actions and outcomes.
• Example: Analyzing last quarter's sales data to determine which products performed best.
• Predictive Business Analysis:
• Purpose: Focuses on what is likely to happen in the future.
• Function: Uses historical data and statistical models to forecast future trends, outcomes,
and behaviors.
• Techniques: Involves machine learning, regression models, time series analysis, and other
forecasting techniques.
• Outcome: Helps businesses anticipate future trends and make data-driven predictions
about upcoming events or behaviors.
• Example: Using sales data from previous quarters to forecast next quarter's demand.
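The contrast between the two kinds of analysis can be sketched in Python. This is a minimal illustration, not a recommended forecasting method: the sales figures are made up, the function names `describe` and `forecast_next` are hypothetical, and a straight-line least-squares fit stands in for the regression models mentioned above.

```python
from statistics import mean

# Hypothetical monthly sales figures, made up for illustration.
monthly_sales = [100.0, 110.0, 120.0, 130.0]


def describe(sales):
    """Descriptive analysis: summarize what has already happened."""
    return {"total": sum(sales), "average": mean(sales)}


def forecast_next(sales):
    """Predictive analysis: fit a least-squares line and extend it one period."""
    n = len(sales)
    periods = list(range(n))
    x_mean, y_mean = mean(periods), mean(sales)
    slope = sum(
        (x - x_mean) * (y - y_mean) for x, y in zip(periods, sales)
    ) / sum((x - x_mean) ** 2 for x in periods)
    intercept = y_mean - slope * x_mean
    return intercept + slope * n  # value of the fitted line at the next period


print(describe(monthly_sales))
print(forecast_next(monthly_sales))
```

The descriptive function only aggregates past values, while the predictive function uses the same history to estimate a value that has not been observed yet.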
Phase 3 (8 marks)

7- Why is data visualization important?


Data visualization is important for several reasons:
1. Simplifies Complex Data: It turns complex datasets into visual formats like charts, graphs, and maps,
making it easier to understand trends, patterns, and outliers at a glance.
2. Improves Decision-Making: By providing clear, immediate insights, data visualization helps
decision-makers understand data quickly, leading to more informed, data-driven decisions.
3. Enhances Data Interpretation: It helps people see connections between different data points,
which might not be obvious in raw data, and enables deeper insights through visual
representation.
4. Increases Engagement: Visuals are more engaging and memorable than raw numbers, making it
easier to communicate insights to diverse audiences, including non-technical stakeholders.
5. Identifies Trends & Patterns: Data visualization allows for quick identification of trends,
correlations, and anomalies, aiding in forecasting and spotting business opportunities or risks.
6. Facilitates Storytelling: It allows data to tell a story, making it easier to convey the narrative
behind the numbers and highlight key insights in a compelling way.
8- What is machine learning?

Machine learning (ML) is a type of artificial intelligence that enables computers to learn
from data and improve their performance without being explicitly programmed. It uses
algorithms to identify patterns in data, make predictions, and adapt over time.
• Learning from Data: ML models typically improve as they are exposed to more relevant data.
• Types of Learning:
o Supervised Learning: The model learns from labeled data to make predictions.
o Unsupervised Learning: The model finds patterns in unlabeled data.
o Reinforcement Learning: The model learns by receiving feedback from its actions.
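Supervised learning can be illustrated with a very small Python sketch. It assumes a 1-nearest-neighbor rule, chosen here only because it is short; the training data, labels, and the `predict` function are hypothetical.

```python
import math

# Hypothetical labeled training data: (height_cm, weight_kg) -> label.
training_data = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
]


def predict(features, examples):
    """Return the label of the closest labeled example (1-nearest neighbor)."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(predict((152.0, 52.0), training_data))
print(predict((182.0, 88.0), training_data))
```

The labels in `training_data` are what make this supervised: the model predicts a label for new input by comparing it with examples whose answers are already known.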
9- What are the applications of machine learning?
Machine learning (ML) is applied in various fields to improve efficiency and
decision-making:
Healthcare: Used for disease diagnosis, predictive analytics, and personalized
treatments.
Finance: Helps detect fraud, assess credit risk, and automate trading.
E-Commerce: Powers recommendation systems, customer segmentation, and
dynamic pricing.
Autonomous Vehicles: Used in self-driving cars and traffic prediction.
Natural Language Processing: Enables speech recognition, text analysis, and
machine translation.
Manufacturing: Supports predictive maintenance and quality control.
Cybersecurity: Detects anomalies and classifies malware.
Entertainment: Powers content recommendations and game AI.
Retail: Improves inventory management and customer support.
Agriculture: Assists with crop prediction and disease detection.
10- What is text mining?
Text mining is the process of extracting meaningful information and patterns from
unstructured text data using computational and statistical techniques. It involves analyzing
large volumes of text (such as documents, articles, social media posts, and emails) to discover
hidden insights, trends, and relationships.
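One of the simplest text mining techniques, counting frequent terms, can be sketched in Python. This is a minimal illustration under stated assumptions: the review texts are made up, the stop-word list is deliberately tiny, and `top_terms` is a hypothetical helper name.

```python
import re
from collections import Counter

# A tiny stop-word list chosen for this example; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "and", "of", "to", "was"}


def top_terms(documents, n=3):
    """Count how often each non-stop word appears across all documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())  # split text into lowercase words
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical customer reviews, made up for illustration.
reviews = [
    "The delivery was fast and the product is great",
    "Great product, fast delivery",
    "The product arrived late",
]

print(top_terms(reviews, 1))
```

Frequent terms such as these are a starting point; finding trends and relationships usually requires further steps beyond simple counting.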