Big Data Analytics
                  GROUP 12
   University Roll No.                 Name
11500221052              Shuvam Pal
11500221054              Hrishikesh Kumar Chaudhary
11500221055              Ishika Rana
11500221056              Dipanjan Saha
What is big data?
“Big Data refers to data volumes in the range of
exabytes and beyond”
  1.   Sam Madden of the Massachusetts Institute of
       Technology (MIT) considers “Big Data” to be
       data that is too big, too fast, or too hard for
       existing tools to process.
  2.   It refers to the massive amounts of data
       collected over time that are difficult to
       analyze and handle using common database
       management tools.
Types of big data
Structured Data
Structured data refers to highly organized information that is easily searchable and typically stored in relational databases or spreadsheets. It adheres to a rigid schema, meaning each data element is clearly defined and accessible in a fixed field within a record or file.
      ● Customer names and addresses in a CRM system
      ● Transactional data
      ● Employee data
Unstructured Data
Unstructured data lacks a pre-defined data model, making it more difficult to collect, process and analyze. It comprises the majority of data generated today, and includes formats such as:
      ● Textual content
      ● Multimedia content
      ● Data from IoT devices
Semi-structured Data
Semi-structured data occupies the middle ground between structured and unstructured data. While it does not reside in a relational database, it contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data.
      ● JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) files
      ● Email
      ● NoSQL databases
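The three categories above can be illustrated with a small Python sketch. All records, field names and values here are made up for illustration; the point is only how each kind of data is accessed.

```python
import json

# Structured: every record has the same fixed fields (a rigid schema).
# These rows are invented sample data, not taken from any real system.
structured_rows = [
    {"customer_id": 1, "name": "A. Rao", "city": "Kolkata"},
    {"customer_id": 2, "name": "B. Sen", "city": "Delhi"},
]

# Semi-structured: JSON carries its own tags and nesting, and
# different records need not share the same fields.
semi_structured = json.loads(
    '{"from": "a@example.com", "tags": ["invoice"],'
    ' "body": {"text": "Payment received"}}'
)

# Unstructured: free text with no data model; any structure has to be
# inferred, here by a naive whitespace split.
unstructured = "Sensor 7 reported a high temperature at noon"


def column(rows, field):
    """With a fixed schema, a field can be read directly from every row."""
    return [row[field] for row in rows]


cities = column(structured_rows, "city")
nested_text = semi_structured["body"]["text"]
tokens = unstructured.lower().split()
```

The structured rows can be queried by field name, the JSON document has to be navigated through its nesting, and the free text yields nothing until some parsing step imposes structure on it.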
                       Big Data Analytics
Big data analytics refers to the systematic
processing and analysis of large amounts of
data and complex data sets, known as big data,
to extract valuable insights. Big data analytics
allows for the uncovering of trends, patterns
and correlations in large amounts of raw data to
help analysts make data-informed decisions.
This process allows organizations to leverage
the exponentially growing data generated from
diverse sources, including internet-of-
things (IoT) sensors, social media, financial
transactions and smart devices to derive
actionable intelligence through advanced
analytic techniques.
                         How big data analytics works
 1. Collect Data
• Data collection looks different for every organization. With today’s technology, organizations can gather both structured and unstructured data from a variety of sources.
• Raw or unstructured data that is too diverse or complex for a warehouse may be assigned metadata and stored in a data lake.
 2. Process Data
• Once data is collected and stored, it must be organized properly to get accurate results on analytical queries, especially when it’s large and unstructured.
• Batch processing looks at large data blocks over time.
• Stream processing looks at small batches of data at once, shortening the delay time between collection and analysis for quicker decision-making.
 3. Clean Data
• Data big or small requires scrubbing to improve data quality and get stronger results; all data must be formatted correctly, and any duplicative or irrelevant data must be eliminated or accounted for.
• Dirty data can obscure and mislead, creating flawed insights.
 4. Analyze Data
• Getting big data into a usable state takes time. Once it’s ready, advanced analytics processes can turn big data into big insights. Some of these big data analysis methods include:
  • Data mining
  • Predictive analytics
  • Deep learning
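The cleaning and processing steps above can be sketched in plain Python. This is a minimal single-machine illustration, not how a production system would do it: the sensor events are hypothetical, the deduplication is deliberately naive (it treats identical records as duplicates), and `batch_size` is an assumed parameter.

```python
from collections import defaultdict

# Hypothetical raw events: (sensor_id, reading). None marks a missing value.
raw_events = [
    ("s1", 20.5), ("s2", 19.0), ("s1", 20.5),  # third record duplicates the first
    ("s2", None), ("s1", 22.5), ("s2", 21.0),
]


def clean(events):
    """Drop records with missing readings and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for event in events:
        if event[1] is None or event in seen:
            continue
        seen.add(event)
        cleaned.append(event)
    return cleaned


def _averages(totals):
    """Turn {sensor: [sum, count]} into {sensor: average}."""
    return {sensor: total / count for sensor, (total, count) in totals.items()}


def batch_average(events):
    """Batch processing: compute per-sensor averages over the whole data set at once."""
    totals = defaultdict(lambda: [0.0, 0])
    for sensor, reading in events:
        totals[sensor][0] += reading
        totals[sensor][1] += 1
    return _averages(totals)


def stream_averages(events, batch_size=2):
    """Stream processing: update running averages after each small batch
    and yield a snapshot, so results are available before all data arrives."""
    totals = defaultdict(lambda: [0.0, 0])
    for start in range(0, len(events), batch_size):
        for sensor, reading in events[start:start + batch_size]:
            totals[sensor][0] += reading
            totals[sensor][1] += 1
        yield _averages(totals)


cleaned = clean(raw_events)
batch_result = batch_average(cleaned)
stream_snapshots = list(stream_averages(cleaned))
```

Both approaches arrive at the same final averages; the streaming version simply exposes intermediate snapshots earlier, which is the shorter delay between collection and analysis described above.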
The five V's of big data analytics
Volume
The sheer volume of data generated today, from social media feeds, IoT devices, transaction records and more, presents a significant challenge.
Velocity
The velocity at which data flows into organizations requires robust processing capabilities to capture, process and deliver accurate analysis in near real-time.
Variety
Data comes in many formats; this variety demands flexible data management systems to handle and integrate disparate data types for comprehensive analysis.
Veracity
Veracity refers to the data's trustworthiness, encompassing data quality, noise and anomaly detection issues.
Value
Big data analytics aims to extract actionable insights that offer tangible value. This involves turning vast data sets into meaningful information that can inform strategic decisions.
                             Types of big data analytics
  Descriptive analytics.
• This is the simplest form of analytics, where data is analyzed for general assessment and summarization. For example, in sales reporting, an organization can analyze the efficiency of marketing from such data.
  Diagnostic analytics.
• This refers to analytics that determine why a problem occurred. For example, this could include gathering and studying competitor pricing data to determine when a product's sales fell off because the competitor undercut it with a price drop.
  Predictive analytics.
• This refers to analysis that predicts what comes next. For example, this could include monitoring the performance of machines in a factory and comparing that data to historical data to determine when a machine is likely to break down or require maintenance or replacement.
  Prescriptive analytics.
• This form of analysis follows diagnostics and predictions. After an issue has been identified, it provides a recommendation of what can be done about it. For example, this could include addressing inconsistencies in supply chain that are causing pricing problems by identifying suppliers whose performance is unreliable, suggesting their replacement.
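To make the difference between descriptive and predictive analytics concrete, here is a minimal sketch based on the factory-machine example. The vibration readings and the failure threshold are invented for illustration, and a straight-line trend is an assumption; real predictive maintenance would use richer models and far more data.

```python
from statistics import mean

# Hypothetical daily vibration readings (mm/s) from one factory machine.
days = [1, 2, 3, 4, 5]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
FAILURE_THRESHOLD = 6.0  # assumed level at which maintenance is needed


def describe(values):
    """Descriptive analytics: summarize what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def predict_threshold_day(xs, ys, threshold):
    """Predictive analytics: extrapolate the fitted trend to estimate
    the day on which the reading reaches the threshold."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None  # no upward trend, so no crossing is predicted
    return (threshold - intercept) / slope


summary = describe(vibration)
predicted_day = predict_threshold_day(days, vibration, FAILURE_THRESHOLD)
```

`summary` only reports on past readings, while `predicted_day` extrapolates the trend forward. A prescriptive step would go one further and turn that estimate into a recommendation, such as scheduling maintenance before the predicted day.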
        Benefits of using big data analytics
 01  Real-time intelligence
 02  Better-informed decisions
 03  Cost savings
 04  Better customer engagement
 05  Optimized risk management strategies
 06  Market insights and product development
                Big challenges of big data
 Making big data accessible.
Collecting and processing data becomes more difficult as the amount of data grows. Organizations must make data easy and convenient for data owners of all skill levels to use.
 Maintaining quality data.
With so much data to maintain, organizations are spending more time than ever before scrubbing for duplicates, errors, absences, conflicts, and inconsistencies.
 Keeping data secure.
As the amount of data grows, so do privacy and security concerns. Organizations will need to strive for compliance and put tight data processes in place before they take advantage of big data.
 Finding the right tools and platforms.
New technologies for processing and analyzing big data are developed all the time. Organizations must find the right technology to work within their established ecosystems and address their particular needs.
  Big data analytics technologies and tools
 ● Hadoop
 ● Tableau
 ● Python and R
 ● Machine Learning Frameworks (e.g., TensorFlow)
 ● NoSQL databases
 ● MapReduce
 ● YARN
 ● Spark
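As one illustration of the tools listed above, the MapReduce programming model can be imitated in a few lines of plain Python. This is only a single-process sketch of the idea (map, shuffle, reduce) applied to a word count; it does not use the Hadoop API, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict

# Made-up input: each string stands in for one document or input split.
documents = ["big data big insights", "data lake data warehouse"]


def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework handling the shuffle between them; here everything runs sequentially to keep the structure visible.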
             Conclusion
Big Data Analytics is a game-changer that’s shaping a
smarter future. From improving healthcare and
personalizing shopping to securing finances and predicting
demand, it’s transforming various aspects of our lives.
However, challenges like managing overwhelming data and
safeguarding privacy are real concerns. In our world flooded
with data, Big Data Analytics acts as a guiding light. It helps
us make smarter choices, offers personalized experiences,
and uncovers valuable insights. It’s a powerful and stable
tool that promises a better and more efficient future for
everyone.
Thank You!!
We express our profound gratitude to our mentors and group members
for their support, guidance, invaluable assistance and
encouragement.