50% found this document useful (2 votes)
2K views

Data Analysis in Microsoft Excel

Uploaded by

sami6114
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
50% found this document useful (2 votes)
2K views

Data Analysis in Microsoft Excel

Uploaded by

sami6114
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 320
Deliver Awesome Analytics in 3 Easy Steps Using VLOOKUPS, Pivot Tables, Charts And More A | | i AaraAAALE \ » | fa a i | Exce Alex Holloway DATA ANALYSIS IN MICROSOFT EXCEL DELIVER AWESOME ANALYTICS IN 3 EASY STEPS USING VLOOKUPS, PIVOT TABLES, CHARTS AND MORE ALEX HOLLOWAY © Copyright 2023 by Alex Holloway - All rights reserved. The content within this book may not be reproduced, duplicated, or transmitted without direct written permission from the author or the publisher. Under no circumstances will any blame or legal responsibility be held against the publisher, or author, for any damages, reparation, or monetary loss due to the information contained within this book, either directly or indirectly. Legal Notice: This book is copyright protected. It is only for personal use, You cannot amend, distribute, sell, use, quote, or paraphrase any part of the content within this book, without the consent of the author or publisher. Disclaimer Notice: Please note the information contained within this document is for educational and entertainment purposes only. All effort has been expended to present accurate, up- to-date, reliable, and complete information. No warranties of any kind are declared or implied. Readers acknowledge that the author is not engaged in the rendering of legal, financial, medical, or professional advice. The content within this book has been derived from various sources. Please consult a licensed professional before attempting any techniques outlined in this book. By reading this document, the reader agrees that under no circumstances is the author responsible for any losses, direct or indirect, that are incurred as a result of the use of the information contained within this document, including, but not limited to, errors, omissions, or inaccuracies. This book is dedicated to Meg, who inspired me to follow my dreams. CONTENTS Introduction 1 Getting Started 1. What Is Excel and Why Is It Useful? 2. What Is Data and Data Analysis? 3, The 3-Step System in Data Analysis um. Let’s Go 4, Getting to Know Excel . The Project . Prepare . Analyse . Consider w oN OD m. Over to You 9. Complete a Project 10. What Next? Notes Bibliography About the Author INTRODUCTION Are you tired of spending hours sifting through data in Excel, trying to find insights and patterns? Do you feel overwhelmed by the amount of information and unsure of where to start? You're not alone. Data anal- ysis can be a daunting task, but it's a necessary one in today's world of business and technology. In fact, data is created in huge volumes every day, and the direction of travel is one way: up! Did you know: - An estimated 1.145 trillion megabytes of data are produced daily?! - By 2025, the total amount of data created, captured, copied and consumed globally is forecast to reach over 180 zettabytes (that’s 180 billion terabytes or 1,800,000,000,000,000 gigabytes)?2 That's why I wrote Data Analysis in Microsoft Excel to help you navigate the complex world of data analysis in Excel. In this book, you'll learn how to use Excel to organise and analyse data, identify trends and pat- terns, and make data-driven decisions. With this book, you'll not only save time but also gain a competitive advantage in your career. You'll be able to present data in a meaningful way to stake- holders, make more informed decisions and, ulti- mately, drive business success. If you are a student, this book will help you better utilise data in your studies, come to better conclusions and be able to vi- sualise and describe your observations. As an experienced data analyst and Excel user, I un- derstand the challenges you face and the pain you experience. That's why I've written this book in a way that's easy to understand and follow, with prac- tical examples and step-by-step instructions. In Data Analysis in Microsoft Excel, you'll learn how to create powerful data visualizations, use Excel functions to write formulas, and much more. You'll also gain a deeper understanding of data analysis concepts and best practices. So, whether you're a beginner or an experienced Excel user, this book is for you. I'm confident that by the end of the book, you'll feel empowered to tackle any data analysis project that comes your way. Let's get started! PART ONE GETTING STARTED Data analysis can be performed using a variety of tools and techniques, including Excel, R, Python, and SQL. It is important to note that the process of data analysis requires a combination of technical skills and domain (subject area) knowledge, as well as critical thinking, creativity and the ability to com- municate the results effectively. In Part I, we will learn about Excel itself, what we mean by data and data analysis, and introduce the 3- Step System for producing awesome analysis. CHAPTER 1 WHAT IS EXCEL AND WHY IS IT USEFUL? "Excel is the Swiss Army knife of business tools." GUY KAWASAKI, AMERICAN MARKETING SPECIALIST, AUTHOR, AND SILICON VALLEY VENTURE CAPITALIST xcel is a powerful tool for managing and E analysing data. It is a spreadsheet program that allows users to organize and manipulate data in a variety of ways. A spreadsheet is a computer application that simulates a paper worksheet, where users can input data, perform calculations and cre- ate charts and graphs. One of the reasons Excel is so popular is that it is user-friendly and easy to learn. It is also versatile and can be used for a wide range of tasks, from basic data entry to complex financial modelling. Some of the main reasons Excel is used in data analysis in- clude the ability to organize and analyse data, create charts and graphs, and perform complex calcula- tions. In addition to its many uses in business, Excel can also be used for personal and educational purposes. For example, it can be used to create a budget, track expenses, or even plan a vacation. The possibilities are endless, and with a little creativity, you can use Excel to solve a wide range of problems. SPREAD... WHAT? A spreadsheet is a software program allowing users to organise, analyse, and store data in a tabular format. The data can be manipulated, sorted, and calculated using formulas and presented in various forms, such as charts and graphs. Microsoft Excel is the most popular spreadsheet software, but other options are available, such as Google Sheets and Apple Numbers. The concept of a spreadsheet dates back to the 1960s when computer scientists at MIT (Massachu- setts Institute of Technology) developed the first electronic spreadsheet program called VisiCalc. This program allowed users to create spreadsheets on a computer rather than on paper, which made data manipulation and analysis much more efficient. Since then, spreadsheet software has evolved to in- clude a wide range of features and capabilities. In the 1980s and 1990s, Lotus 1-2-3 was a popular appli- cation which has since been eclipsed in popularity by Microsoft’s Excel, owing in no small part to its bundling within the Office suite of products and the widescale adoption of Microsoft software in organi- sations in all sectors. Spreadsheets are widely used in both personal and professional settings. In a business setting, they are often used to organize and analyse financial data, such as budgeting and forecasting. They can also be used to track and analyse sales data, inventory levels, and customer information. In a personal set- ting, spreadsheets can be used for tasks such as bud- geting, tracking personal expenses, and creating a household inventory. One of the key benefits of using a spreadsheet is the ability to perform calculations on the data. Spreadsheets use formulas, which are a set of in- structions that tell the software how to manipulate the data. Formulas can be used to perform calcula- tions such as adding, subtracting, multiplying, and dividing. They can also be used to perform more complex calculations, such as finding the average of a set of numbers or calculating a running total. Another benefit of spreadsheets is the ability to cre- ate charts and graphs that can help to visualize the data. These charts and graphs can be used to quickly identify trends and patterns in the data. They can also be used to communicate the data to others, such as in a business setting where a manager may need to present the data to a team or upper management. Spreadsheets also have the ability to store and or- ganize large amounts of data. This can be a great advantage in a business setting as it allows for easy access to historical data, which can be used for fore- casting, budgeting and analysis. In a personal set- ting, it can be used to store and organize information such as household expenses, your shopping list or how much you spent while away travelling. Overall, spreadsheets are an essential tool for organizing, analysing, and presenting data. They are widely used in both personal, educational, and professional settings and have many useful features suited to relevant contexts. With the help of spread- sheets, data can be transformed into valuable infor- mation that can be used to make decisions and im- prove outcomes. WHY IS EXCEL SO POPULAR? One reason for its popularity is its integration with other programs in the Microsoft Office suite. Excel is included in the Microsoft Office suite and is often used in conjunction with other programs such as Word and PowerPoint. This makes it easy for users to work on documents and presentations that require data analysis and visualization. Additionally, Excel has been around for a long time and has a large user base. It has been on the market for over 30 years, meaning that many people have learned how to use it and are familiar with its inter- face. This makes it easy for new users to learn and for experienced users to continue to use Excel. Another reason for its popularity is its wide range of applications. Excel is used in many different in- dustries and for many different purposes. It is used in finance, accounting, marketing, and many other fields. It is also widely used in businesses of all sizes and in both public and private sectors. Finally, Excel is widely supported by third-party software and tools. Many Excel add-ins, templates, and other tools are available to extend its function- ality. This makes it easy for users to customize Excel to meet their specific needs and automate repetitive tasks. So, Excel's popularity is due to a combination of its wide range of features and capabilities, integration with other programs in the Microsoft Office suite, long history, wide range of applications, and wide support from third-party software and tools. This makes it a versatile and powerful tool that is widely used and well-liked among users. CHAPTER SUMMARY/KEY TAKEAWAYS. - Excel is a popular spreadsheet program that allows users to organize and manipulate data in a variety of ways. -It is user-friendly and easy to learn, with a wide range of uses from basic data entry to complex financial modelling. - Excel can be used for personal, educational, and professional purposes, including budgeting, tracking expenses, and data analysis. + It is integrated with other programs in the Microsoft Office suite, making it easy for users to work on documents and presentations that require data analysis and visualization. + Excel has been around for over 30 years and has a large user base, making it easy for new users to learn and for experienced users to continue to use it. -It is used in many different industries and for many different purposes and is widely supported by third-party software and tools. In the next chapter, we will explore two key ques- tions: What is Data? And What is Data Analysis? Excel is our tool of choice, but understanding Data and the art and practice of Data Analysis is the key skill of the data analyst. CHAPTER 2 WHAT IS DATA AND DATA ANALYSIS? ata is all around us. It is the information we use to make decisions, understand pat- terns, and make predictions. Simply put, data is a collection of facts and figures that can be analysed to reveal insights and trends. It can come in many forms, such as numbers, words, or images. For exam- ple, a weather app on your phone uses data to show you the temperature, humidity, and forecast for your location. Online retailers use data to track cus- tomer behaviour and recommend products. Social media platforms use data to show you which posts are most popular among your friends. All these ex- amples show how data is used in our everyday lives. In this chapter, we will explore the basics of data and how it can be analysed to gain valuable insights. WHERE DOES DATA COME FROM? Data is information that we record and store for various reasons. It can come from a wide range of sources, such as observations, surveys, experiments, and measurements. For example, a weather station might record temperature, humidity, and precipita- tion levels, while a retail store might record sales numbers and customer information. The data can be collected by individuals, organisations, or even ma- chines. The reason for recording and storing data can vary, but it is often done for simple record keep- ing, to help make decisions, track progress, or better understand a particular phenomenon. Data can be used for things like tracking inventory, monitoring customer behaviour, or even guiding public policy decisions. Essentially, data helps us make sense of the world around us and make more informed deci- sions. Did you know your weekly food shopping list jotted down on a scrap of paper is actually data? You may not be able to easily analyse this data or even want or need to, but you have created data (the items on the list) via a process of data entry (pen and paper) and stored it (the piece of paper). With enough scraps of paper and knowledge of when they were writ- ten, you could analyse your favourite foods, estimate when you would run out of a particular item or pre- dict the price of your weekly food bill. DATA CREATION AND COLLECTION Data is created by a variety of methods. Typically, these fall into the following two categories: 1. Administrative/transactional - this data is collected for record keeping and is generally essential to the functioning of the business, school, charity, or system in question. Examples would include your grocery store recording what you bought, how much you spent and whether you used your loyalty card. 2. Curated - this data has been consciously gathered to serve some purpose. Unlike administrative/transactional data, it is typically information which is not readily available through our day-to-day interactions as customers or users of a service. An example of this would be how a group of consumers felt about a new range of products the grocery store released this year. How much you spent and on what is available to the grocery store because it needs this information to charge you at the till and to manage the stock levels on the shelves. It is critical to capture this informa- tion. However, doing so is also relatively easy as each item is scanned at the till. Conversely, how you feel about that new pizza you bought last week is not information that the store has access to. It can make assumptions about how popular the pizza is by using its sales data, but it can never know what you liked about it and how it could be even better. It is also less convenient for the store to gather this information as a) at the point of buying the item, you haven't yet tried it, b) even if you could explain why you bought it, it would hold up the next customer, and c) people don’t necessarily want to publicly discuss their buying decisions with a stranger! Here are some familiar examples of administrative/ transactional data collection you may know: - Computer Systems: Businesses can collect data from various computer systems, such as website analytics, customer relationship management (CRM) systems, and enterprise resource planning (ERP) systems. This data can include information on website traffic, customer interactions, and financial transactions. - Social Media: Businesses can collect data from various social media platforms such as Facebook, Twitter, and Instagram. This data can include information on customer sentiment, brand mentions, and engagement metrics. + Mobile Apps: Businesses can collect data from mobile apps used by customers. This data can include information on app usage, location, and customer behaviour. - Electronic Point of Sale (POS): Businesses can collect data from electronic point of sale systems used in retail stores and restaurants. This data can include information on sales, inventory, and customer behaviour. + Internet of Things (IoT) Devices: Businesses can collect data from IoT devices such as sensors, cameras, and smart appliances. This data can include information on customer behaviour, environment, and usage patterns. Here are some examples of curated data collection you may know or even have participated in: + Surveys: which are used to gather information from a large group of people through self- administered questionnaires. + Interviews: which are used to gather in-depth information from a smaller group of people through face-to-face or phone conversations. «Focus Groups: which are used to gather information from a group of people through moderated discussion. - Observations: which are used to gather information by observing people and their behaviour in a natural setting. - Experiments: which are used to gather information by manipulating one or more variables and measuring the effect ona specific outcome. «Document Review: which are used to gather information by reviewing existing documents. It's worth noting that both types of data collection can be used together and complement each other. For example, administrative data can be used to identify patterns and trends, while curated data can be used to gather more in-depth information and context. SO WHAT IS DATA ANALYSIS, THEN? Data analysis is the process of evaluating, organiz- ing, and interpreting data in order to extract useful information and insights. It involves a variety of techniques and methods for cleaning, transforming, and modelling data, as well as visualizing and com- municating the results. The goal of data analysis is to identify patterns, trends, and relationships within the data that can be used to make informed deci- sions and improve organizational performance. Data analysis can be applied to a wide range of fields and industries, including business, finance, health- care, and science. For example, in business, data analysis can be used to identify customer segments, predict sales, and optimize marketing strategies. In finance, data analysis can be used to identify market trends, evaluate investments, and manage risk. In healthcare, data analysis can be used to improve pa- tient outcomes, reduce costs, and identify areas for improvement. There are several different types of data analysis, in- cluding descriptive, diagnostic, predictive, and pre- scriptive. Descriptive analysis is used to summarize and describe the data, such as calculating means, medians, and standard deviations. Diagnostic anal- ysis is used to identify potential issues or prob- lems within the data, such as identifying outliers or missing values. Predictive analysis is used to make predictions about future events or outcomes, such as forecasting sales or identifying potential fraud. Prescriptive analysis is used to recommend actions or decisions based on the data, such as identifying the best marketing strategy or the most efficient supply chain. Some of these are considered advanced techniques and are thus not covered in this book. However, once you master the basics, you will be well positioned to move on to these more advanced practices. Data analysis can be done using a variety of tools and techniques, including Excel, R, Python, and SQL. It is important to note that the process of data analysis requires a combination of technical skills and domain knowledge, as well as critical thinking, creativity and ability to communicate the results effectively. Excel is often considered the first tool in this toolset with which to gain experience and skill and remains useful even to an experienced Data Scientist. WHAT CAN YOU DO WITH IT? Effective Data Analysis is very valuable to an organ- isation trying to make the best use of its resources: whether to make a profit or simply to have the great- est positive impact on its stakeholders. Thousands or millions of dollars can be added to a business’ rev- enue or saved from its costs by the effective use of the data analyst’s skillset and tools. How is that pos- sible, you might ask? For example: - Targeting customers more effectively: by analysing data on customer demographics, purchase history, and browsing behaviour, a business can identify which segments of its customer base are most likely to be interested in its products or services. This allows the business to focus its advertising and marketing efforts on those segments, resulting in a more efficient use of resources and higher conversion rates. For example, a fashion retailer may discover that a large portion of its customer base is composed of young professional women and can then tailor its advertising and product offerings to that demographic, resulting in increased sales and customer loyalty. Reducing operational costs: by analysing data on business operations, a company can identify areas that are not performing as well as others and make adjustments to reduce costs. For example, a manufacturing company may use data analysis to determine that a certain production line is not operating at optimal efficiency, resulting in increased costs and lower output. The company can then make changes to that production line, such as reallocating resources or investing in new equipment to improve efficiency and lower costs, resulting in increased profitability. «Improving problem-solving: by analysing data, a business can make more informed decisions and avoid costly pitfalls. For example, a retail company may use data analysis to identify that a particular product line is not selling well. By analysing data on the product's sales and customer feedback, the company can make a decision to discontinue the product, resulting in cost savings anda more efficient use of resources. - Developing new products: by collecting and analysing data, a business can obtain more accurate information that can inform its future strategies and plans. For example, a food and beverage company may use data analysis to determine the customer preferences and trends of the market. Based on that information, the company can develop new products that align with customer preferences, resulting in increased sales and customer loyalty. + Identifying and addressing customer complaints and feedback: by collecting and analysing customer feedback, a business can identify common complaints and issues and take steps to address them. For example, a restaurant may use data analysis to identify that a high number of customers are complaining about long wait times. The restaurant can then take steps to address the issue, such as hiring additional staff or implementing a reservation system, resulting in improved customer satisfaction and retention. - Improving logistics and supply chain management: by analysing data on inventory levels, sales, and shipping times, a business can identify bottlenecks and inefficiencies in its logistics and supply chain operations. For example, a distributor may use data analysis to identify that a certain product is consistently out of stock, resulting in lost sales. The distributor can then work with its suppliers to improve delivery times or increase inventory levels, resulting in improved customer service and increased sales. The answer then is found in understanding and in- sight, which leads to high quality decision-making, and this applies not just in business but also in areas from academic research, public health and your per- sonal finances. CHAPTER SUMMARY/KEY TAKEAWAYS + Data is collected and stored for many reasons —much of this is to enable an organisation to function, but it can be deliberately collected to find out more about a phenomenon or group of people. - Data analysis is the process of evaluating, organising, and interpreting data in order to extract useful information and insights. + There are several different types of data analysis, including descriptive, diagnostic, predictive, and prescriptive. They all seek to better understand the world or make better decisions through the use of data. - Effective data analysis can have a huge impact on an organisation, whether through the better use of limited resources or the identification of new opportunities to attract customers or develop products, and so on. CHAPTER 3 THE 3-STEP SYSTEM IN DATA ANALYSIS W hatever your trade, it is likely the process by which you take an idea or some raw ma- terials to a finished product—one which contains a number of key steps. For example, making an item of clothing would involve initially considering who it is for, what size they are and what they want. Second, you might come up with a design on paper and show it to your friend or customer and get their views on it. If they’re happy, you can carry on sourcing some materials, making some initial cuts, pinning the ma- terial together and start seeing the garment's shape appear. Next, you might check in with the customer and establish if you’re heading in the right direc- tion with it. Hopefully, you've interpreted what they wanted correctly and can keep on stitching away until you produce the finished item. In an ideal world, they’ll be delighted with what you produce, and it won’t require any adjustment, but typically there’ll be something you need to tweak to make it fit just right. The better you know your customer, the easier it will be to get it right the first time, but this takes time and experience and an understanding of what they like, their body shape and style. You might get lucky from time to time, but you shouldn’t really rely on luck! Instead, true craftsmanship is a process by which you minimise wasted effort and produce excellent results in a repeatable way. The process of data analysis always follows 3 key steps: Prepare, Analyse, and Consider. PREPARE Prepare is the first step in this process. We consider what we are doing, why, for whom and, therefore, how it might be achieved. It can be tempting to con- sider “How?” first when asked to do something but resist the urge to act before deciding how to place your first step. An excellent place to start your thinking is with the problem statement. What is a problem statement? A problem statement is a concise description of a problem or issue that needs to be addressed. It should clearly define the problem, its significance, and the desired outcome. A good problem statement should be: 1. Specific: The problem should be clearly defined and focused without being too broad or vague. 2. Measurable: The problem should be measurable so that progress towards resolving it can be tracked. 3. Relevant: The problem should be relevant and important and have a direct impact on the organisation or individual. 4. Time-bound: The problem should have a clear deadline or timeframe for resolution. 5. Objective: The problem statement should be free of personal opinions or biases and instead be based on facts and evidence. A problem statement should be no more than a few sentences long and provide a clear and concise description of the problem. It should also include information on the scope of the problem and any relevant constraints or limitations. These examples from across different industries might give some in- sight into what this looks like: 1, Customer Service: "There is a significant increase in customer complaints about long wait times at our restaurant, leading to decreased customer satisfaction and repeat business. We need to find a solution to reduce wait times and improve customer experience by the end of Q2 2023”. 2. Healthcare: "Patients at our hospital are experiencing long wait times for appointments with specialists, leading to decreased satisfaction and decreased patient retention. We need to find a solution to reduce wait times and improve patient experience by the end of Q4 2022”. 3. Retail: "Our store is facing a decline in sales due to increased competition from online retailers. We need to find a solution to increase sales and improve customer engagement by the end of Q3 2023”. 4. Education: "Student performance in mathematics at our school is consistently below average, leading to a low graduation rate in STEM fields. We need to find a solution to improve student performance in mathematics by the end of the 2022-2023 school year”. 5. Manufacturing: "Our manufacturing plant is facing increased production costs due to outdated equipment and processes. We need to find a solution to reduce production costs and improve efficiency by the end of Q2 2023”. 6. Government: "The city is facing a high rate of traffic congestion during rush hour, leading to increased travel time and decreased commuter productivity. We need to find a solution to reduce traffic congestion and improve commuter experience by the end of Q4 2022”. As a data analyst, you often will not be handed a problem statement. More often, it will be for you to ask or determine what the context for a request of you actually entails. This contextualisation of what you are being asked to do is one of the key non- technical skills of the role. You may even find that the person asking you to do something doesn’t really know why they are asking you to do it! Being able to dig a little deeper and ask around is key to minimis- ing wasted effort. Some assumptions are useful in life—others are not. Having framed your analysis and understood what you are trying to do and why, you can next consider for whom you are doing this analysis. What we are considering here is audience. The person or peo- ple who will be reviewing your analysis are typically looking to learn something and then take action. In order for your work to be effective, it is worth considering what your ‘customer’ can do with the output: are they the CEO or a team leader in an operations team? Taking the example of reducing wait times at a restaurant, the CEO of the restau- rant chain and the shift manager who schedules how many staff have to work at any one time have different interests and “levers” (what actions they can take within the scope of their role and respon- sibilities). Whilst your analysis might lead you to conclude either way that “shifts are poorly sched- uled, and we are often understaffed”, you wouldn't have approached the shift manager’s request from the perspective that expanding the restaurant or moving to new premises would be in scope. The shift manager would have been much more inter- ested in understanding, say, staff-to-booking ratios throughout the day compared to another restaurant in the chain where complaints were lower—as one of the levers they control in the role is how many staff to schedule. They cannot acquire new premises. Audience is an essential consideration in our next step, Analyse, but we will come to that later. Once you have a problem statement and understand your audience, you can finally move on to consider how you are going to tackle the analysis. Here we start to consider the data itself. Specifically: + What data do I have/can I access? + What do I need to do to it (if anything) to begin exploring the issues? + What are the limitations of the data? What can and can’t it tell me? When considering what data you have or can access, time is a major factor. Like the supply of housing in an economy or the range of pharmaceutical drugs available to treat a health condition, the amount of data you can access in the short term to consider a problem is essentially fixed. In the long term, (al- most) anything is possible. To illustrate this further, here are some examples: «Ina business context, you can only access data created in the company’s systems which you can also then extract to analyse (into an Excel readable format, in this case). You may also have access to publicly available data, such as industry metrics or any intelligence the company buys and makes available such as market research. If you need to analyse a problem and have only been given a few days to do it, even getting the accounting team to raise a purchase order to buy some industry data may be out of scope. If the company has an IT team pulling data out of systems intoa useable format for analysts such as yourself, getting access to data you know is being gathered but cannot access may take weeks or months. -In an academic context, you may only have access to publicly available data or research available to you through university systems. You may have been asked to gather some data yourself by setting up a study or focus group, or controlled experiment, which takes time. If your initial focus group’s data created some interesting insights, a follow-up research piece is out of scope in the short term and may leave your first set of conclusions clouded by a set of second questions and hypotheses which merit further investigation. - Ina personal context, you may decide you want to take better control of your finances. You could probably access your banking data and download it, but it is very unlikely to be categorised by categories such as “Mortgage/ Rent”, “Food & Drink”, “Car Expenses”, and so on. So you might only be able to produce an “Income and Expenditure” analysis the first time you attempt it. This leads nicely to the second consideration noted above: What do you need to do with the data be- fore it becomes useful? If you are determined to save more money each month, is it enough to know that you are spending more than you earn? How much more valuable would it be to know that actually, your grocery budget has doubled over the last year or that your spending on fuel for your car now that your commute is an extra 10 miles is really cutting into your budget? In this case, your banking data is made much more valuable by taking the time to cat- egorise the spending. Do you recall the example about the shift manager tackling the wait times at the restaurant? Would it be more useful to have a list of the names, contact details, and booking times of everyone who attended the restaurant last week or a summary of how many bookings there were each day in, say, 30-minute time slots? Both examples reveal the value of preparing data for analysis as, in its raw form, it cannot always answer the questions you pose to it. In example one, we are enriching the data by categorising each transaction from the banking statement. In example two, we are summarising the restaurant booking data to make it more useable. The final consideration is to ask what the limita- tions of the data are? Data is typically gathered with one or more purposes in mind, and its value only stretches so far beyond this purpose. A bank records very accurate transaction data so as to keep your ac- count balance up to date and provides this back to you so that you can understand where your money has gone. Imagine how much less trusting of a bank you would be if they could only tell you how much money they think you have and couldn’t prove it. However, if you had access to, say, 10 people’s bank- ing information for a year, could you accurately an- alyse how wealthy they are? You might be able to ap- proximate an answer using some assumptions such as ‘the more you earn and save, the more you can invest’, but really you only have part of the picture. Many forms of wealth are held outside of the bank- ing system, and even if you have a mortgage, the balance on your mortgage account only describes what you owe, not what your property is worth. To take this a step further, your current account might only show payments into your mortgage account— not its balance. Are you a new home owner, making your initial mortgage payments on your first home, or are you about to pay off the mortgage and be free of this debt: you cannot tell and do not have enough information. Another key limitation, particularly common in a business context, is that of data quality. If you have a dataset about 1,000 customer orders, you can be fairly sure that you know what was ordered and how much the customer was billed (even then, er- rors can occur—what if the business was interna- tional, charged customers in their local currency, but your report is in US dollars and the exchange rate conversion wasn’t performed correctly for your dataset. . .?). A business couldn’t function without this information. However, if you wanted to calcu- late the gross profit on each sale (sale price—cost of item) and someone within the business was respon- sible for manually calculating what an item costs to produce—how confident would you be that this was always right? Have you ever given fake contact de- tails when buying something online to avoid being contacted with marketing emails? Have you said you never buy a certain type of product during a survey so that you don't have a salesman give you their pitch? What about a time you were genuinely pre- pared to share your mobile number with a retailer to get postal tracking information, only to miskey it and never hear anything until the package ar- rived? The analyst's role then includes data cleaning/ cleansing—another form of preparing data for use. Sometimes you can fix an issue, and sometimes you must throw the data out. Other times, you have to say, “Only 80% of the data was useful, so the analysis is based on that”. Spotting issues, adapting to them and being clear about the approach you have taken is another key skill. With that, we have covered the main elements of Prepare. Chapter Six considers this in more detail, and our example project will cover some practical examples of dealing with these challenges. ANALYSE We arrive then at Analyse. You would be forgiven for thinking this would make up 80% of the con- tent of this book. The most common mistake would- be analysts make is thinking that turning data into information is the extent of the role and that they can hand this over for someone else to use. However, what you are really responsible for producing is value. This means different things to different people —but you know this already because you just read the Prepare step. What you are trying to achieve, why, and for whom dictates the output that is valu- able. Some people will tell you that a good analyst always produces Insight. Qlik, the software company which produces Business Intelligence tools, defines it as such: The broad definition of insight is a deep under- standing of a situation (or person or thing). In the context of data and analytics, the word insight refers to an analyst or business user discovering a pattern in data or a relationship between vari- ables that they didn’t previously know existed.1 Insights can have tremendous value to an organ- isation, and indeed, this is the most value-adding activity of the data analysis function. However, as an analyst in the real world, you are often simply trying to make an organisation function better, one piece of work at a time. Sometimes, you are just summarising table bookings at a restaurant so that a shift manager can see how busy the place gets on a Tuesday night! What then is the process and skillset required dur- ing the Analyse step? We have our data and an un- derstanding of the problem and audience, so we are initially concerned with discovering the shape of the data—how much do we have? How many rows and columns are there? If it covers a period of time, how many years’ or months’ worth do we have? We can then consider more specific questions. Is it customer data? If so, what do we know about them? Do we have their age? How much have they spent? What have they bought? If it is booking data, do we know when they booked? How did they book? How many people are in the booking party? Adults, chil- dren? Who took the booking? We can use our technical skills to explore these ques- tions and produce summary data, tables and charts to begin looking for patterns and relationships in the answers to these questions. Initially, we are both ex- ploring and developing new hypotheses. As we work through the analysis step, our hypotheses become more refined and specific—and along the way, we discard any which prove to be uninteresting or irrel- evant. How then to assess relevance? Enter domain knowl- edge. This is where the value of experience comes to bear when analysing a problem and is often why it is hard to get an interview or a job offer for a data ana- lyst role (or any role, in fact). Businesses understand the value of experienced staff when delivering their roles, and being able to apply both knowledge and skill is where you can truly create value. Consider again the example of the restaurant where cus- tomers experience long waiting times. The follow- ing thoughts might occur to an analyst considering the issue. From first to last, we move from common sense to domain experience: 1. When the restaurant is busiest, waiting times are the longest. 2. It can get particularly busy at lunchtimes, evenings and weekends. 3. Bookings in advance help the restaurant plan for busy periods. 4. Last-minute cancellations frustrate planning but create availability in the short term. 5. The greater the lead time between booking and reservation date, the more opportunity to plan effectively. 6. A fixed availability of fixtures and fittings (e.g., high chairs for young children) effectively limits the capacity for simultaneous bookings for certain family types and group sizes. 7. Split-shift (staff member has to go home and come back between lunch and evening service) staff rotations are unpopular and could lead to an increase in unplanned staff absence and last-minute understaffing issues. Technical Effective Domain Skill PE Ast} Knowledge Having both technical skill and domain knowledge allows you quickly (technical skill) and effectively (domain knowledge) consider hypotheses that will add value. In exploring these hypotheses, you ap- proach your problem much like a sculptor aiming to turn a block of marble into a chiselled figure: an un- formed block (dataset) whittled down into the vague shape of a person (broad stroke analysis—what do I have, how much data is there?), finally adding detail (specific analysis of a data item or segment to gain a new insight). What do we mean by a hypothesis? In simple terms, a hypothesis is an educated guess or a proposed ex- planation for something that is yet to be proven or tested. For example, let's say you're curious about why plants in your garden seem to grow better when you water them with rainwater instead of tap water. Your hypothesis might be: "Plants grow better with rainwater because it has natural nutrients that tap water lacks”. The purpose of a hypothesis is to provide a starting point for research and experimentation. Once you have a hypothesis, you can design analyses to test it and see if it is supported by evidence. If the evidence supports your hypothesis, it may become a theory or a widely accepted explanation for a phenomenon. If the evidence does not support your hypothesis, you may need to revise or come up with a new one. With the benefit of domain knowledge, you can come up with better hypotheses more quickly (and waste less time on naive hypotheses). The process of analysis can also lead you to identify new limitations in the data. We previously consid- ered the purpose and method of data collection and data quality as limitations of a dataset. We may find additional limitations through the process of analy- sis. Imagine an analysis of sales at a car dealership. A comparison of sales this year vs last year to de- termine what types of vehicles are experiencing the fastest sales growth might reveal that the sales of pink SUVs doubled from one year to the next and that you were able to increase your profit on the sales of these vehicles by 50% over that period! However, you sold only one last year and two this year, which is unlikely to form the basis of a new stock pur- chasing strategy for a dealership selling 1,000 vehi- cles a year. The limitation here is that you do not have enough information to draw any conclusions. You have enough contextual data to know that pink

You might also like