Basics of Data Visualization A Necessity

Basics of data visualisation. This document contains the basics and concepts of data visualisation, with lecture notes. It is suitable for those pursuing data science as a career and for students looking for exam notes. NOTE: Most of the material is in question-and-answer format, which suits exam preparation and is easy to follow.

Uploaded by

sowmenp567
Copyright
© All Rights Reserved
1. What is data visualisation?

Data visualisation is the graphical or pictorial representation of data. It is a technique in which data is represented in the form of charts and graphs. Data visualisation helps us understand and analyse massive amounts of data in a short time.

In The Visual Display of Quantitative Information, Edward Tufte defines "graphical displays" and the principles for effective graphical display in the following passage:

"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should:
* show the data
* induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
* avoid distorting what the data have to say
* present many numbers in a small space
* make large data sets coherent
* encourage the eye to compare different pieces of data
* reveal the data at several levels of detail, from a broad overview to the fine structure
* serve a reasonably clear purpose: description, exploration, tabulation, or decoration
* be closely integrated with the statistical and verbal descriptions of a data set."

Data visualisation not only helps us understand the data but also brings out its crucial aspects, which are nearly impossible to see when the data is in tabular or numerical form. These insights help a data analyst make better data-driven decisions and so help the organisation progress. Many tools make visualising data easier and faster, e.g. Microsoft Excel, Microsoft Power BI, Tableau, and Cognos Impromptu by IBM.

2. What is the need/importance of data visualisation?

Ans:
We live in a data-driven world. Every field, be it finance, medicine, technology, politics, education, or sports, has data as its core entity.
Commonly cited estimates put daily data generation at about 2.5 quintillion bytes, and the total amount of data created in 2020 at about 59 zettabytes (one zettabyte is a trillion gigabytes). The volume keeps growing rapidly: an estimated 74 zettabytes were generated in 2021. It is therefore important to clean, sort, analyse and visualise data.

Humans grasp information more easily through visualisation. In a business context, visualisation helps convey a story to decision makers, allowing them to act more quickly than if the data were presented as reports. The following use cases stress the importance of data visualisation:
* Helping decision makers understand how the business data is being interpreted to determine business decisions.
* Leading the target audience to focus on business insights to discover areas that require attention.
* Handling large amounts of data in a pictorial format to provide a summary of unseen patterns in the data, revealing insights and the story behind the data to establish a business goal.
* Visualising business data to manage growth and converting trends into business strategies by making sense of the information.
* Revealing previously unnoticed key points about the data sources to help decision makers compose data analysis reports.

3. What are data visualisation techniques? Explain each of them in detail.

Ans:
A. Data visualisation techniques are the types of graphs or plots that are used for data visualisation.
B. The major types of visualisation techniques are:

1. Line graph:
A line graph (also known as a line plot or a line chart) is a graph that uses lines to connect individual data points. These points typically display quantitative values over a specified time interval.
Line graphs consist of two axes: the x-axis (horizontal) and the y-axis (vertical), so each point is denoted as (x, y). Line graphs are used across many different fields, but their most common function is to depict changes in values over time. In finance, for example, they show how the prices of securities change over time, and in technical analysis they help the user visualise trends.

2. Bar graph:
A bar chart or bar graph presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.
Categorical data is data grouped into discrete groups, such as months of the year, age groups, shoe sizes, and animals. These categories are usually qualitative. In a column (vertical) bar chart, the categories appear along the horizontal axis and the height of each bar corresponds to the value of its category.
Bar charts have a discrete domain of categories and are usually scaled so that all the data fits on the chart. When there is no natural ordering of the categories being compared, the bars may be arranged in any order. Bar charts arranged from highest to lowest incidence are called Pareto charts.

3. Pie charts:
A pie chart (or circle chart) is a circular statistical graphic divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently its central angle and area) is proportional to the quantity it represents. Although it is named for its resemblance to a sliced pie, there are variations in the way it can be presented.
The earliest known pie chart is generally credited to William Playfair's Statistical Breviary of 1801. Pie charts are very widely used in the business world and the mass media. However, they have been criticised, and many experts recommend avoiding them, because research has shown that it is difficult to compare different sections of a given pie chart, or to compare data across different pie charts. In most cases a pie chart can be replaced by another plot such as a bar chart, box plot, or dot plot.

4. Histogram:
A histogram is an approximate representation of the distribution of numerical data. The term was first introduced by Karl Pearson. To construct a histogram, the first step is to "bin" (or "bucket") the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins must be adjacent and are often (but not necessarily) of equal size.

5. Scatter plots:
A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display values for, typically, two variables of a data set. Its most common use is to display the relationship between two variables and observe the nature of that relationship. The relationship observed can be positive or negative, linear or non-linear, and strong or weak.

6. Column charts:
A column chart is a data visualisation in which each category is represented by a rectangle whose height is proportional to the value being plotted. Column charts are also known as vertical bar charts. They are particularly useful when:
* The data has a small number of discrete categories, with a single value for each category. Where there are multiple values per category, variants such as small multiples, clustered column charts, and stacked column charts are superior.
* The goal is to compare the values of each category.
* The intent is to make it simple for the viewer.
Column charts are arguably sometimes the best of all visualisations, as they tap into our instinctive ability to understand heights, whereas most other data visualisations require some degree of training for the reader to decode.

4. What are data collection structures?

Ans:
* A data structure is a data organisation, management, and storage format that enables efficient access and modification.
* More precisely, a data structure is a collection of data values, the relationships among them, and the functions or operations that can be applied to the data.
* Types of data structures are as follows:

a. List:
* A list is an ordered collection of items and is mutable.
* A single list can contain heterogeneous data types.
* Each element or value inside a list is called an item.
* Eg: list1 = [1, 2, 3.14, "pi", True]

b. Tuple:
* A tuple is also an ordered collection of items.
* The major difference between a list and a tuple is that a tuple is immutable: its items cannot be deleted or changed, but the tuple as a whole can be deleted.
* Eg: tuple1 = (0, 1, "Hello", 3.14)

c. Dictionary:
* It is a collection of key-value pairs.
* Keys and values are separated by a colon (":").
* Keys must be immutable (hashable), while values can be of any type and can be changed.
* Eg: dict1 = {"Subject": "Data Visualisation", "Topic": "Unit 1", "Rollno": 3510}

d. DataFrames:
* DataFrames hold data in a tabular format.
* DataFrames can hold any data type (bool, int, string).
* A DataFrame is a 2-dimensional structure consisting of rows and columns.
* Besides data, you can also specify the index and column names for your DataFrame. The index labels the rows, while the column names label the columns. These two components come in handy when you are manipulating your data.

5.
Write a note on File I/O processing.

Ans:
a. File I/O processing mainly consists of the following steps:
* Opening a file
* Performing operations
* Closing a file

b. A file can be opened in the following modes:
* "r": reading from a file
* "w": writing to a file (overwrites an existing file, or creates a new one if the file does not exist)
* "a": appending data to the existing file
* "r+": both reading and writing
* "b": binary file

Eg:
    Chocolates = ["kitkat", "snickers", "mars", "munch"]  # Entering data
    ChocolatesFile = open("Chocolate.txt", "w")  # Opening the file using open()
    for i in Chocolates:
        ChocolatesFile.write(i + "\n")  # Writing data using "w" and a for loop, one name per line
    ChocolatesFile.close()  # Closing the file using close()
    ChocolatesFile = open("Chocolate.txt", "r")  # Reopening the file in read mode
    MyFile = ChocolatesFile.read()  # Reading the file
    print(MyFile)  # Printing the data in the file
    ChocolatesFile.close()  # Closing read mode

6. Write a code to read and write data in a CSV file (you can give your own example).

Ans:
Reading data from CSV files:
    import csv
    myfile = open("filename.csv", "r")
    mycsvdata = csv.reader(myfile)
    for i in mycsvdata:
        print(i)  # each row is returned as a list of strings
    myfile.close()

Writing data to CSV files:
    import csv
    # newline="" stops the csv module from writing blank lines between rows on some platforms
    myfile = open("Filename.csv", "w", newline="")
    mydata = csv.writer(myfile)
    mydata.writerow(["Sr.No", "Name", "Age"])
    mydata.writerow([1, "John Doe", 32])
    mydata.writerow([2, "Jane Doe", 28])
    myfile.close()

7. Write a note on RegEx.

Ans:
A regular expression is a sequence of characters that defines a search pattern. These expressions are often used for pattern matching, string matching, or replacing. RegEx (short for regular expressions) is widely used for validating emails and passwords and for replacing strings of data in a dataset. We import the re module to use regular expressions.
RegEx consists of 3 parts:

1. Quantifiers:
In regular expressions, quantifiers match the preceding character or character set a number of times.

    Symbol   Meaning
    ?        Matches the preceding element zero or one time
    +        Matches the preceding element one or more times
    *        Matches the preceding element zero or more times
    {n}      Matches exactly n times
    {n,}     Matches n or more times
    {m,n}    Matches at least m and at most n times
    ()       Groups the pattern to be matched
    |        OR: specifies either of the patterns to be matched
    \        Escape character
    []       Specifies a set of characters to be matched
    ^        Caret: checks whether a string starts with a certain character
    $        Checks whether a string ends with a certain character
    .        Matches any single character, including a space, except a newline. Use re.DOTALL to also match a newline.

* Lazy and greedy quantifiers:
1. Quantifiers are greedy by default.
2. When we add a ? after any greedy quantifier, it becomes a lazy quantifier.
3. A lazy quantifier matches an element as few times as possible, whereas a greedy quantifier matches as many times as possible.
4. A lazy quantifier stops the moment it gets the first match, whereas a greedy quantifier stops after it finds the last match.

2. Special sequences:
What are special sequences in Python? A special sequence represents a basic predefined character class with a unique meaning. Each special sequence makes a specific common pattern more comfortable to use.

    Sequence   Meaning
    \d         Matches digits
    \D         Matches non-digits
    \w         Matches letters, digits and the underscore ([a-zA-Z0-9_] for ASCII text)
    \W         Negation of \w
    \A         Matches the specified characters at the beginning of the string
    \b         Matches the specified characters at the beginning or at the end of a word
    \B         Negation of \b
    \s         Matches whitespace characters
    \Z         Matches the specified characters at the end of the string

Eg:
Validating a mobile number. The first digit should be 7, 8 or 9. The entire number should be 10 digits.
Solution: ^[789]\d{9}$

Validating a landline number. The first set of digits should be 022. Then add the remaining 8 digits.
Solution: ^(022)[0-9]{8}$

Validating a password with the following rules: it should contain at least one uppercase letter and a special character (*, &, ...). The length should be 8 or more but less than 15.
Solution (this pattern also requires a lowercase letter and a digit): ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,14}$

3. Functions:
The re module offers a set of functions that allow us to search a string for a match:

1. findall():
This method returns a list of strings containing all matches.
eg.
    import re
    string = "Hello 12 there 34"  # illustrative input
    pattern = r"\d+"  # illustrative pattern: one or more digits
    output = re.findall(pattern, string)
    print(output)

2. search():
This method takes two arguments: a pattern and a string. It looks for the first location where the RegEx pattern produces a match with the string.
eg.
    import re
    string = "Hello There"
    pattern = r"\s"
    output = re.search(pattern, string)
    if output:
        print("The characters have spaces in between")
    else:
        print("The characters do not have spaces in between")

3. sub():
This method returns a string in which the matched occurrences are replaced with the content of the replace variable.
subn() is the same as sub(), but it also returns how many substitutions have been made.
eg.
    import re
    string = "Hello There"  # illustrative input
    pattern = r"\s"  # illustrative pattern: a whitespace character
    replace = "-"  # illustrative replacement text
    output = re.sub(pattern, replace, string)
    print(output)

4. match():
This method checks for a match only at the beginning of the string. It returns a match object if the pattern matches, and None otherwise.
    import re
    pattern = r'^a\w*t$'
    string = 'arrest'
    result = re.match(pattern, string)
    if result:
        print("String matches with the pattern.")
    else:
        print("String does not match with the pattern.")

5. match.group():
Returns one or more subgroups of the match. If there is a single argument, the result is a single string. If there are multiple arguments, the result is a tuple with one item per argument.
    import re
    string = 'hell0 have a gr3at day!'
    pattern = r'(\d) (\D)'
    match = re.search(pattern, string)
    if match:
        print(match.group())
    else:
        print("pattern not found")

6. split():
The split method splits the string where there is a match and returns a list of strings where the splits have occurred.
    import re
    string = 'Hello There'
    pattern = r'\s'
    output = re.split(pattern, string)
    print(output)

7. Raw string (r):
* When the r or R prefix is used before a regular expression, it means a raw string. For example, "\n" is a new line, whereas r"\n" means two characters: a backslash \ followed by n.
* The backslash \ is used to escape various characters, including all metacharacters. However, using the r prefix makes Python treat \ as a normal character.
    import re
    string = "Test for printing \n escape sequence"
    print(string)
    result = re.findall(r'\n', string)
    print(result)
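
The difference between normal and raw strings can be checked with a short sketch. It is only an illustration: the variable names, the sample text and the helper is_valid_mobile are assumptions made for this example, and the mobile-number pattern is the one from the validation example in these notes.

```python
import re

# In a normal string literal, "\n" is one character (a newline).
normal = "\n"
# With the r prefix the backslash is kept, so this is two characters: "\" and "n".
raw = r"\n"
print(len(normal), len(raw))

text = "first line\nsecond line"  # sample text (assumed for this sketch)

# The regex engine itself understands the two-character sequence \n,
# so the raw pattern still finds the real newline in the text.
newlines = re.findall(r"\n", text)

# Raw strings matter most for sequences such as \b:
# r"\b" is a word boundary, but "\b" in a normal string is a backspace character.
words_raw = re.findall(r"\bline\b", text)
words_plain = re.findall("\bline\b", text)
print(words_raw, words_plain)

# Hypothetical helper: validates a mobile number with a raw-string pattern.
MOBILE_PATTERN = r"^[789]\d{9}$"

def is_valid_mobile(number):
    """Return True if number has 10 digits and starts with 7, 8 or 9."""
    return re.match(MOBILE_PATTERN, number) is not None

print(is_valid_mobile("9876543210"))
print(is_valid_mobile("1234567890"))
```

Writing every pattern as a raw string is a common habit because the pattern then reaches the re module exactly as typed, with no escape sequence changed by Python first.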
