Big data refers to datasets containing a large amount of diverse data. The devices we use on a daily basis constantly collect and use data to perform functions and help us see important information clearly. This data is generated daily from many sources: business processes, machines, social media platforms, networks, human interactions, and many more.

Big data can:
- decrease labor needs and drastically reduce costs by providing greater access to information and enabling timely, informed decision-making, among other things
- handle routine tasks and save time
- help analyze social media platforms
- improve customer experiences
- support innovation

5Vs of Big Data
Volume: the amount of data being collected.
Velocity: the rate at which data is coming in.
Variety: the different kinds of data (data types, formats, etc.) coming in for analysis.
Value: the usefulness of the collected data.
Veracity: the quality of the data coming in from different sources.

Big Data Process
1. Data Collection: data is gathered from various sources such as social media, sensors, transactional systems, and customer reviews.
2. Data Storage: data is stored using specialized storage technologies capable of handling large volumes, so that it can be easily accessed and analyzed later.
   - Cloud storage: cloud service providers (such as Amazon Web Services, Microsoft Azure, or Google Cloud Platform) take responsibility for managing and storing the data, which can be accessed easily and quickly through an API.
   - Hadoop: an open-source software framework that can store and process large amounts of data at once.
3. Data Processing: the data is cleaned and organized to remove errors and inconsistencies, then transformed into a format suitable for analysis.
4. Data Analysis: tools such as statistical models and machine learning algorithms are used to identify patterns, relationships, and trends.
5. Data Visualization: results are presented in visual formats such as graphs, charts, and dashboards, making it easier for decision-makers to understand and act upon them.

Big Data Tools
• Apache Hadoop
• Apache Spark
• Apache Cassandra
• Apache Flink
• Apache Kafka
• Splunk
• Talend
• Tableau
• Apache NiFi
• QlikView

Big Data Best Practices
• Define clear business objectives
• Collect and store relevant data only
• Collaborate with partners to assess the situation and plan
• Start slowly and move quickly in later stages
• Ensure data quality
• Use appropriate tools and technologies
• Establish data security and privacy policies
• Leverage machine learning and artificial intelligence
• Focus on data visualization

Big Data Challenges
Data Growth, Data Security, Data Integration

Advantages
• Improved decision-making
• Increased efficiency
• Better customer targeting
• New revenue streams
• Competitive advantage

Disadvantages
• Privacy concerns
• Risk of data breaches
• Technical challenges
• Difficulty in integrating data sources
• Complexity of analysis

Implementation Across Industries
Industry: Use of Big Data
Healthcare: Analyze patient data to improve healthcare outcomes, identify trends and patterns, and develop personalized treatment.
Retail: Track and analyze customer data to personalize marketing campaigns and improve inventory management.
Finance: Detect fraud, assess risks, and make informed investment decisions.
Manufacturing: Optimize supply chain processes, reduce costs, and improve product quality through predictive maintenance.
Transportation: Optimize routes, improve fleet management, and enhance safety by predicting accidents before they happen.
Energy: Monitor and analyze energy usage patterns, optimize production, and reduce waste through predictive analytics.
Telecommunications: Manage network traffic, improve service quality, and reduce downtime through predictive maintenance and outage prediction.
Government and public: Address issues such as preventing crime, improving traffic management, and predicting natural disasters.
Advertising and marketing: Understand consumer behavior, target specific audiences, and measure the effectiveness of campaigns.
Education: Personalize learning experiences, monitor student progress, and improve teaching.

CLASSIFICATION
Big Data can be structured, semi-structured, or unstructured, and it is collected from different sources. In the past, data was collected only from databases and spreadsheets, but these days it comes in an array of forms, such as PDFs, emails, audio, social media posts, photos, and videos.

Structured data: Structured data has a defined schema with all the required columns. It is in tabular form and is stored in a relational database management system. OLTP (Online Transaction Processing) systems are built to work with structured data, which is stored in relations, i.e., tables.

Semi-structured data: The schema is not strictly defined, e.g., JSON, XML, CSV, TSV, and email.

Unstructured data: Files with no predefined structure, such as log files, audio files, and image files. Some organizations have a lot of this data available but do not know how to derive value from it because the data is raw.

Quasi-structured data: Textual data with inconsistent formats that can be formatted with effort, time, and some tools.

BIG DATA EXAMPLES TO KNOW
Marketing: forecast customer behavior and product strategies.
Transportation: assist in GPS navigation, traffic, and weather alerts.
Government and public administration: track tax, defense, and public health data.
Business: streamline management operations and optimize costs.
Healthcare: access medical records and accelerate treatment development.
Cybersecurity: detect system vulnerabilities and cyber threats.

Big Data Examples in Marketing
Like Facebook and Google, Amazon got sucked into the adtech business by the sheer amount of consumer data at its disposal. Since its founding in 1994, the company has collected reams of information on what millions of people buy, where those purchases are delivered, and which credit cards they use. In recent years, Amazon has begun offering more and more companies, including marketing companies, access to its self-service ad portal, where they can buy ad campaigns and target them to ultra-specific demographics, including past purchasers.

Big Data Examples in Transportation
As a rideshare company, Uber monitors its data in order to predict spikes in demand and variations in driver availability. That information allows the company to price rides appropriately and provide incentives to drivers so that enough vehicles are available to keep up with demand. Data analysis also forms the basis of Uber’s estimated time of arrival predictions, which go a long way toward customer satisfaction.

Big Data Examples in Business
The premise of Netflix’s first original TV show, the David Fincher-directed political thriller House of Cards, had its roots in big data. Netflix invested $100 million in the first two seasons of the show, which premiered in 2013, because consumers who watched the original British series House of Cards also watched movies directed by David Fincher and movies starring Kevin Spacey. Executives correctly predicted that a series combining all three would be a hit. Today, big data impacts not only which series Netflix invests in, but also how those series are presented to subscribers. Viewing histories, including the points at which users hit pause in any given show, reportedly influence everything from the thumbnails that appear on their homepages to the contents of the “Popular on Netflix” section.
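
Worked sketch of the Big Data process
The five steps of the Big Data process described earlier (collection, storage, processing, analysis, visualization) can be illustrated at toy scale. The following is only a minimal sketch under stated assumptions, not how any company mentioned above actually works: the ride-wait records, the field names (hour, wait_min), and all function names are hypothetical, and a plain Python list stands in for cloud storage or Hadoop.

```python
import json
from collections import defaultdict
from statistics import mean

# Steps 1 and 2 (collection and storage): hypothetical raw records, kept in a
# plain list that stands in for a real storage system.
RAW_EVENTS = [
    '{"hour": 8, "wait_min": 4.0}',
    '{"hour": 8, "wait_min": 6.0}',
    '{"hour": 18, "wait_min": 12.0}',
    '{"hour": 18, "wait_min": -1}',   # invalid: negative wait time
    'not valid json',                 # invalid: corrupt record
    '{"hour": 18, "wait_min": 8.0}',
]


def process(raw_lines):
    """Step 3 (processing): parse records and drop corrupt or inconsistent ones."""
    clean = []
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt record
        if not isinstance(rec, dict):
            continue
        hour, wait = rec.get("hour"), rec.get("wait_min")
        if isinstance(hour, int) and isinstance(wait, (int, float)) and wait >= 0:
            clean.append({"hour": hour, "wait_min": float(wait)})
    return clean


def analyze(records):
    """Step 4 (analysis): average wait time for each hour of the day."""
    by_hour = defaultdict(list)
    for rec in records:
        by_hour[rec["hour"]].append(rec["wait_min"])
    return {hour: mean(waits) for hour, waits in sorted(by_hour.items())}


def visualize(stats):
    """Step 5 (visualization): one text bar per hour, one '#' per minute (rounded)."""
    return [f"{hour:02d}:00 | {'#' * round(avg)} {avg:.1f}" for hour, avg in stats.items()]


if __name__ == "__main__":
    for row in visualize(analyze(process(RAW_EVENTS))):
        print(row)
```

In this sketch the two invalid records are discarded during processing, so the analysis sees only four clean records. At real Big Data scale each step would typically be handled by dedicated tools such as those listed under Big Data Tools.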