Big Data Chapter 1

Big data refers to large, complex data sets characterized by high volume, variety, and velocity, which traditional data processing software cannot manage. It encompasses structured, unstructured, and semi-structured data from various sources like social media, e-commerce, and telecommunications, and is crucial for analytics that drive business insights. Applications of big data span multiple sectors, including healthcare, finance, and government, enabling improved decision-making and operational efficiency.

Uploaded by

Pawan Hire

Introduction to Big Data


1.1 Introduction to Big data

What exactly is big data?


Big data is data that contains greater variety, arriving in increasing volumes and with more velocity. These three properties are known as the three “Vs.”

Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data
processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able
to tackle before.

Data that is very large in size is called Big Data. Normally we work with data measured in megabytes (Word documents, Excel sheets) or at most gigabytes
(movies, code), but data measured in petabytes, i.e. 10^15 bytes, is called Big Data. It is often stated that almost 90% of today's data has been generated in the past 3 years.
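
To make the scale concrete, here is a minimal sketch that converts a byte count into decimal storage units, where each unit is 1000 times the previous one and 1 PB = 10^15 bytes as stated above. The function name `human_size` and the sample values are illustrative assumptions, not part of the original notes.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_size(num_bytes: int) -> str:
    """Return a byte count as a string in the largest fitting decimal unit."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the last unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:g} {unit}"
        size /= 1000

print(human_size(5 * 10**6))  # a document-sized file
print(human_size(10**15))     # one petabyte
```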

Sources of Big Data

o Social networking sites: Sites such as Facebook, Google, and LinkedIn generate huge amounts of data every day, as they have billions of users
worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge amounts of logs from which users' buying trends can be traced.
o Weather stations: Weather stations and satellites produce very large amounts of data, which are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly, and for this they store the data of
their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.

3V's of Big Data

o Velocity: Data is being generated at a very fast rate. It is estimated that the volume of data will double every 2 years.
o Variety: Nowadays data is not only stored in rows and columns. Data is structured as well as unstructured. Log files and CCTV footage are unstructured data.
Data that can be saved in tables, like the transaction data of a bank, is structured data.
o Volume: The amount of data we deal with is very large, on the order of petabytes.
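
The doubling estimate under Velocity implies exponential growth. As a rough sketch, assuming a hypothetical starting volume of 1 PB and a fixed two-year doubling period (the function name and numbers are illustrative only):

```python
def projected_volume(start_volume: float, years: float, doubling_period: float = 2.0) -> float:
    """Volume after `years` if it doubles once every `doubling_period` years."""
    return start_volume * 2 ** (years / doubling_period)

start_pb = 1.0  # hypothetical starting volume in petabytes
for years in (2, 4, 10):
    print(years, "years:", projected_volume(start_pb, years), "PB")
```

Under this assumption the volume grows 32-fold in ten years, which is why storage that is adequate today is quickly outgrown.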

1.2 Types of Digital data


1. Structured Data :

Structured data is created using a fixed schema and is maintained in tabular format. The elements in structured data are addressable for effective
analysis. It includes all data that can be stored in an SQL database in a tabular format. It is the simplest kind of data to manage and process.

Examples –
Relational data, Geo-location, credit card numbers, addresses, etc.
As an example of relational data, suppose a university has to maintain a record of each student: the student's ID, name, address, and email.
The following relational schema and table can be used to store these records.

S_ID    S_Name    S_Address    S_Email
1001    A         Delhi        [email protected]
1002    B         Mumbai       [email protected]
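
The student table above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a prescribed design: the table name `student`, the column types, and the email values are assumptions (the emails in the original are redacted, so placeholders are used).

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The fixed schema: every row must have exactly these columns.
conn.execute(
    """
    CREATE TABLE student (
        s_id      INTEGER PRIMARY KEY,
        s_name    TEXT NOT NULL,
        s_address TEXT,
        s_email   TEXT
    )
    """
)

rows = [
    (1001, "A", "Delhi", "a@example.com"),   # placeholder email (assumption)
    (1002, "B", "Mumbai", "b@example.com"),  # placeholder email (assumption)
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", rows)

# Because every element is addressable by column, queries are straightforward.
delhi_students = conn.execute(
    "SELECT s_id, s_name FROM student WHERE s_address = ?", ("Delhi",)
).fetchall()
print(delhi_students)
```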

2. Unstructured Data :
Unstructured data is data that does not follow a pre-defined standard or any organized format. This kind of data is
not a good fit for a relational database, because a relational database expects data in a pre-defined, organized form.
Unstructured data is nevertheless very important in the big data domain, and there are many platforms for storing and managing it,
such as NoSQL databases.

Examples –
Word, PDF, text, media logs, etc.

3. Semi-Structured Data :

Semi-structured data is information that does not reside in a relational database but that has some organizational properties that make it easier to
analyze. With some processing it can be stored in a relational database, although this is very hard for some kinds of semi-structured data. The semi-structured
form exists to ease storage.
Example –
XML data.
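
As a small illustration of why XML counts as semi-structured, the sketch below parses two records that share tags but not the same set of fields. It uses Python's standard `xml.etree.ElementTree`; the element names and values are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: the second student has a field the first one lacks.
xml_text = """
<students>
  <student id="1001">
    <name>A</name>
    <address>Delhi</address>
  </student>
  <student id="1002">
    <name>B</name>
    <address>Mumbai</address>
    <phone>hypothetical-number</phone>
  </student>
</students>
"""

root = ET.fromstring(xml_text)
records = []
for student in root.findall("student"):
    # Tags give each value a name, but no schema forces every record
    # to carry the same fields.
    record = {"id": student.get("id")}
    for field in student:
        record[field.tag] = field.text
    records.append(record)

print(records)
```

Loading such records into a fixed relational table would require deciding what to do with fields that appear in only some of them.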

1.3 Big Data Analytics

What is big data analytics?


Big data analytics is the process of collecting, examining, and analysing large amounts of data to discover market trends, insights, and patterns that can help
companies make better business decisions. This information is available quickly and efficiently so that companies can be agile in crafting plans to maintain
their competitive advantage.

Technologies such as business intelligence (BI) tools and systems help organisations gather unstructured and structured data from multiple sources. Users
(typically employees) input queries into these tools to understand business operations and performance. Big data analytics uses four data analysis
methods, described below, to uncover meaningful insights and derive solutions.

For example, big data analytics is integral to the modern health care industry, whose systems must manage thousands of patient records,
insurance plans, prescriptions, and vaccine information.

Types of big data analytics

1. Descriptive analytics

Descriptive analytics summarises data in a form that can be easily read and interpreted. It helps create reports and visualise information, for example to detail company
profits and sales.

Example: During the pandemic, a leading pharmaceutical company conducted data analysis on its offices and research labs. Descriptive analytics helped
them identify unutilised spaces and departments that could be consolidated, saving the company millions of pounds.
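
A minimal sketch of descriptive analytics: summarising raw records into a few readable figures. The sales records below are hypothetical, and a real report would draw on far larger data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, amount)
sales = [("North", 120.0), ("South", 80.0), ("North", 100.0), ("South", 60.0)]

# Overall summary figures.
total_sales = sum(amount for _, amount in sales)
average_sale = mean(amount for _, amount in sales)

# Breakdown by region, the kind of table a report or chart is built from.
sales_by_region = defaultdict(float)
for region, amount in sales:
    sales_by_region[region] += amount

print("total:", total_sales)
print("average:", average_sale)
print("by region:", dict(sales_by_region))
```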

2. Diagnostic analytics

Diagnostic analytics helps companies understand why a problem occurred. Big data technologies and tools allow users to mine and recover data that helps
dissect an issue and prevent it from happening in the future.

Example: An online retailer’s sales have decreased even though customers continue to add items to their shopping carts. Diagnostic analytics helped the retailer
understand that the payment page had not been working correctly for a few weeks.
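
The retailer example can be sketched as a funnel analysis: compare how many sessions reach each checkout step and look for the step with the sharpest drop. The step names and counts below are hypothetical.

```python
# Hypothetical counts of sessions reaching each checkout step.
funnel = [
    ("added_to_cart", 1000),
    ("opened_payment_page", 900),
    ("payment_succeeded", 90),
    ("order_confirmed", 88),
]

def step_conversion(funnel):
    """Return (from_step, to_step, rate) for each consecutive pair of steps."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rates.append((prev_name, name, count / prev_count))
    return rates

rates = step_conversion(funnel)

# The transition with the lowest conversion rate is the first place to investigate.
worst = min(rates, key=lambda r: r[2])
print(worst)
```

Here the drop sits between opening the payment page and a successful payment, which points at the payment page rather than at customer interest.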
3. Predictive analytics

Predictive analytics looks at past and present data to make predictions. With artificial intelligence (AI), machine learning, and data mining, users can analyse
the data to predict market trends.

Example: In the manufacturing sector, companies can use algorithms based on historical data to predict if or when a piece of equipment will malfunction or
break down.
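
As a rough sketch of the manufacturing example, the code below fits a straight line to hypothetical historical sensor readings and extrapolates to the point where a failure threshold would be crossed. The data, the threshold, and the choice of a simple least-squares line are all assumptions for illustration; real predictive models are usually more elaborate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: operating hours vs. measured vibration level.
hours = [0, 100, 200, 300, 400]
vibration = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(hours, vibration)

failure_threshold = 5.0  # hypothetical vibration level at which the machine fails
predicted_failure_hour = (failure_threshold - intercept) / slope
print("predicted failure at about", round(predicted_failure_hour), "hours")
```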
4. Prescriptive analytics

Prescriptive analytics recommends a solution to a problem, relying on AI and machine learning to gather and use data for risk management.

Example: Within the energy sector, utility companies, gas producers, and pipeline owners identify factors that affect the price of oil and gas to hedge risks.
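
One simple way to picture prescriptive analytics is as choosing the action with the best expected outcome across possible scenarios. The sketch below compares buying at the market price with a fixed-price contract; the probabilities, prices, and the use of expected cost as the only criterion are assumptions for illustration.

```python
# Hypothetical price scenarios: (probability, market price per unit)
scenarios = [(0.5, 80.0), (0.3, 100.0), (0.2, 130.0)]
units_needed = 1000
fixed_contract_price = 95.0  # hypothetical hedged price per unit

def expected_cost(price_for_scenario):
    """Probability-weighted total cost, given a rule mapping market price to price paid."""
    return sum(p * price_for_scenario(price) * units_needed for p, price in scenarios)

options = {
    # Pay whatever the market price turns out to be.
    "buy_at_market": expected_cost(lambda price: price),
    # Pay the agreed price regardless of the market.
    "fixed_contract": expected_cost(lambda price: fixed_contract_price),
}

# The recommendation is the option with the lowest expected cost.
recommended = min(options, key=options.get)
print(options, "->", recommended)
```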

Big data analytics tools

• Hadoop: An open-source framework that stores and processes big data sets. Hadoop can handle and analyse structured and unstructured data.
• Spark: An open-source cluster computing framework for real-time processing and data analysis.
• Data integration software: Programs that allow big data to be streamlined across different platforms, such as MongoDB, Apache Hadoop, and Amazon EMR.
• Stream analytics tools: Systems, such as Kafka, that filter, aggregate, and analyse data that might be stored in different platforms and formats.
• Distributed storage: Databases that can split data across multiple servers and can identify lost or corrupt data, such as Cassandra.
• Predictive analytics hardware and software: Systems that process large amounts of complex data, using machine learning and algorithms to predict future
outcomes, such as fraud detection, marketing, and risk assessments.
• Data mining tools: Programs that allow users to search within structured and unstructured big data.
• NoSQL databases: Non-relational data management systems ideal for dealing with raw and unstructured data.
• Data warehouses: Storage for large amounts of data collected from many different sources, typically using predefined schemas.
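
The idea behind the distributed storage entry above, splitting data across servers and detecting corrupt data, can be sketched in a few lines. This is a toy model under stated assumptions and not how Cassandra or any specific database is implemented: the "servers" are plain dictionaries, placement uses a hash of the key, and corruption is detected with a stored checksum.

```python
import hashlib

NUM_SERVERS = 3  # hypothetical cluster size

def server_for(key: str) -> int:
    """Pick a server by hashing the key, so placement is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SERVERS

def checksum(value: str) -> str:
    """Fingerprint of a value, stored next to it to detect later corruption."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Each "server" is just an in-memory dictionary in this sketch.
servers = [dict() for _ in range(NUM_SERVERS)]

def put(key: str, value: str) -> None:
    servers[server_for(key)][key] = (value, checksum(value))

def get(key: str) -> str:
    value, stored = servers[server_for(key)][key]
    if checksum(value) != stored:
        raise ValueError(f"corrupt data for key {key!r}")
    return value

put("user:1", "Delhi")
put("user:2", "Mumbai")
print(get("user:1"))
```

Real systems additionally keep several copies of each item on different servers, so that lost or corrupt data can be restored from a replica.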

1.4 Application of Big data


1. Travel and Tourism
Travel and tourism are major users of Big Data. It enables the industry to forecast the travel facilities required at multiple locations, improve business through
dynamic pricing, and much more.

2. Financial and banking sector

The financial and banking sectors use big data technology extensively. Big data analytics helps banks understand customer behaviour on the basis of investment
patterns, shopping trends, motivation to invest, and inputs obtained from personal or financial backgrounds.

3. Healthcare
Big data has started making a massive difference in the healthcare sector. With the help of predictive analytics, medical professionals and health care
personnel can provide personalized healthcare to individual patients.

4. Telecommunication and media

Telecommunications and the multimedia sector are among the main users of Big Data. They generate data on the scale of zettabytes, and handling data at
this scale requires big data technologies.

5. Government and Military

The government and military also use big data technology heavily. Governments keep records on a very large scale. In the military, a fighter
plane is said to need to process petabytes of data.

Government agencies use Big Data to run many services, such as managing utilities, dealing with traffic jams, and tackling crimes like hacking and online
fraud.

Aadhaar Card: The government has a record of 1.21 billion citizens. This vast data is stored and analyzed to find things like the number of youth in the country.
Some schemes are built to target the maximum population. Data on this scale cannot be stored in a traditional database, so it is stored and analyzed using Big
Data Analytics tools.

6. E-commerce
E-commerce is also an application of Big Data. It helps maintain relationships with customers, which is essential for the e-commerce industry. E-commerce
websites use Big Data to market merchandise to customers, manage transactions, and implement better strategies and innovative ideas to
improve their businesses.
