
FDA Practical_Book

The document outlines a series of practical exercises aimed at teaching data analysis techniques using Excel and Tableau, including the application of pivot tables, descriptive statistics, histogram analysis, and various regression methods. Each practical section provides a clear aim, theoretical background, and step-by-step instructions for executing the analyses. The conclusion emphasizes Excel's utility in data analysis despite its limitations, making it a preferred tool for many users.

Uploaded by

kiradij268
Copyright © All Rights Reserved

INDEX

Sr. No.   Name of Experiment
1         Apply pivot table of Excel to perform data analysis
2         Perform Descriptive statistics of given dataset using Data Analysis Toolbox of Excel
3         Perform the Histogram Analysis of given dataset using Data Analysis Toolbox of Excel
4         Perform Simple Linear Regression using Data Analysis Toolbox of Excel or with Python and Interpret the regression table
5         Perform Multiple Linear Regression using Data Analysis Toolbox of Excel or with Python and Interpret the regression table
6         Perform the Logistic Regression on given dataset and Interpret the regression table
7         Install Tableau, Understand User Interface, Dimensions, Measures, Pages, Filters, Marks and Show Me, Dataset Connections and Create a visualization
8         Various graphs in Tableau, Integration of Map and geo-locations, Creating Interactive Dashboard and Publishing your Dashboard to Tableau Public Site
9         Scatter Plots, Data Highlighter, Pages and Cards, Annotations, Creating Story and publishing on Tableau Public
10        Given a case study: Perform Interactive Data Visualization with Tableau


Practical No. 01

Aim :
Apply pivot table of Excel to perform data analysis.

Theory:
Data analysis on a large set of data is quite often necessary and important. It
involves summarizing the data, obtaining the needed values and presenting the
results.
Excel provides the PivotTable feature so that you can summarize thousands of data values
easily and quickly and obtain the required results.
Consider the following table of sales data. From this data, you might have to
summarize total sales region wise, month wise, or salesperson wise. The easy way
to handle these tasks is to create a PivotTable that you can dynamically modify to
summarize the results the way you want.
Creating PivotTable

To create PivotTables, ensure the first row has headers.

• Click the table.
• Click the INSERT tab on the Ribbon.
• Click PivotTable in the Tables group. The PivotTable dialog box appears.

As you can see in the dialog box, you can use either a Table or Range from the current
workbook or use an external data source.
• In the Table / Range box, type the table name.
• Click New Worksheet to tell Excel where to keep the PivotTable.
• Click OK.
A Blank PivotTable and a PivotTable fields list appear.

Recommended PivotTables

In case you are new to PivotTables or you do not know which fields to select from the
data, you can use the Recommended PivotTables that Excel provides.
• Click the data table.
• Click the INSERT tab.
• Click on Recommended PivotTables in the Tables group. The
Recommended PivotTables dialog box appears.
In the Recommended PivotTables dialog box, the possible customized PivotTables
that suit your data are displayed.
• Click each of the PivotTable options to see the preview on the right side.
• Click the PivotTable Sum of Order Amount by Salesperson and Month.
• Click OK. The selected PivotTable appears on a new worksheet. You can observe
the PivotTable fields that were selected in the PivotTable fields list.

PivotTable Fields
The headers in your data table will appear as the fields in the PivotTable.

You can select / deselect them to instantly change your PivotTable to display only the
information you want and in a way that you want. For example, if you want to display
the account information instead of order amount information, deselect Order Amount
and select Account.
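
The summarizing that a PivotTable performs can be sketched in a few lines of Python. This is only an illustration of the idea, not Excel's implementation: the sales rows, the names, and the helper `pivot_sum` are hypothetical, since the manual's dataset is not reproduced here.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, salesperson, month, order amount).
sales = [
    ("East", "Asha", "Jan", 1200.0),
    ("East", "Asha", "Feb", 800.0),
    ("West", "Ravi", "Jan", 500.0),
    ("West", "Ravi", "Feb", 700.0),
    ("East", "Meera", "Jan", 300.0),
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum rows[value_key] for each (row_key, col_key) pair,
    like a PivotTable that shows the Sum of a value field."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    return {r: dict(cols) for r, cols in table.items()}


# Rows = salesperson (index 1), columns = month (index 2), values = amount (index 3).
by_person_month = pivot_sum(sales, 1, 2, 3)
# Changing the row field to region (index 0) re-summarizes the same data.
by_region_month = pivot_sum(sales, 0, 2, 3)

for person, months in by_person_month.items():
    print(person, months)
```

Swapping the row or column index plays the same role as selecting and deselecting fields in the PivotTable fields list.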

Input Dataset:
Output:
Practical No. 02

Aim :
Perform Descriptive statistics of given dataset using Data Analysis Toolbox of Excel.

Theory:
Data Analysis Toolbox of Excel
Excel provides a data analysis tool called Descriptive Statistics which produces a
summary of the key statistics for a data set.
Example 1: Provide a table of the most common descriptive statistics for the scores
in column A of Figure 1.

Figure 1 – Output from Descriptive Statistics data analysis tool

The output from the tool is shown in the right side of Figure 1. To use the tool,
select Data > Analysis | Data Analysis and choose the Descriptive Statistics
option.
A dialog box appears as in Figure 2

Input Dataset:
Figure 2 – Dialog box for Excel’s data analysis tool

Now click on Input Range and highlight the scores in column A (i.e. cells A3:A14).
If you include the heading, as is done here, check the Labels in first row. Since we
want the output to start in cell C3, click the Output Range radio button and insert C3
(or click on cell C3). Finally,
click the Summary statistics checkbox and press the OK button. Note that if we
had also checked the Kth Largest checkbox, the output would also contain the value
for LARGE(A4:A14, k) where k is the number we insert in the box to the right of the
label Kth Largest. Similarly, checking the Kth Smallest checkbox outputs
SMALL(A4:A14, k).
The Confidence Interval for the Mean option generates a confidence interval
using the t distribution as explained in One Sample t-Test.

To generate descriptive statistics for given scores, execute the following steps.
1. On the Data tab, in the Analysis group, click Data Analysis. ...
2. Select Descriptive Statistics and click OK.
3. Select the range A2:A15 as the Input Range.
4. Select cell C1 as the Output Range.
5. Make sure Summary statistics is checked.
6. Click OK.
Example 2
Use Excel’s Descriptive Statistics data analysis tool.
Find the following
Mean
Standard error
Median
Mode
Standard Deviation
Sample Variance
Kurtosis
Range
Minimum
Maximum
Sum
Count
Find these for the table given above by using the Data Analysis toolbox.
Solution: Use the Data Analysis tool in Excel, following the method explained above.
Applying it to the table data gives the result shown in Figure 3.
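
For readers working in Python instead, the same summary can be approximated with the standard library. This is a sketch under stated assumptions: the `scores` list is hypothetical, the helper `describe` is illustrative, and the kurtosis line uses the sample excess kurtosis formula (the one documented for Excel's KURT function), which needs at least four values.

```python
import math
import statistics

# Hypothetical scores; replace with the values from your own column.
scores = [23, 45, 12, 45, 67, 34, 45, 56, 29, 41, 38, 50]


def describe(data):
    """Return the statistics listed in Example 2 as a dictionary."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    # Sample excess kurtosis (assumed to match Excel's KURT; requires n >= 4).
    fourth = sum(((x - mean) / sd) ** 4 for x in data)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * fourth \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return {
        "Mean": mean,
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```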
OUTPUT:
Conclusion:
MS Excel covers almost every aspect of data analysis, which has made it the default
first step into the world of data analysis. Although Excel offers almost everything
needed for data crunching and analysis, the package still has some limitations:
• Handling huge data volumes
• Handling streaming data from disparate sources
• Scientific applications
• Many useful functions and tests are still not available in Excel and its Analysis
Tools
Even with these shortcomings, Excel is the preferred tool for data analysis for many
users. Even an experienced user in the field of data analytics will find that MS Excel
offers something new to learn.
Practical No. 03

Aim :
Perform the Histogram Analysis of given dataset using Data Analysis Toolbox of
Excel.
Theory:
Histogram in Excel
What is a histogram?
"Histogram is a graphical representation of the distribution of numerical data."
Absolutely true, and… totally unclear :) Well, let's think about histograms in another
way.
Have you ever made a bar or column chart to represent some numerical data? I bet
everyone has. A histogram is a specific use of a column chart where each column
represents the frequency of elements in a certain x range. In other words, a
histogram graphically displays the number of elements within the consecutive non-
overlapping intervals, or bins.
For example, you can make a histogram to display the number of days with a
temperature between 61-65, 66-70, 71-75, etc. degrees, the number of sales with
amounts between $100-$199, $200-$299, $300-$399, the number of students with
test scores between 41-60, 61-80, 81-100, and so on.
The following screenshot gives an idea of what an Excel histogram can look like:

How to create a histogram in Excel using Analysis ToolPak


This example teaches you how to create a histogram in Excel.
1. First, enter the bin numbers (upper levels) in the range C4:C8.
2. On the Data tab, in the Analysis group, click Data Analysis.
Note: if you can't find the Data Analysis button, load the Analysis ToolPak add-in
first.
3. Select Histogram and click OK.
4. Select the range A2:A19.
5. Click in the Bin Range box and select the range C4:C8.
6. Click the Output Range option button, click in the Output Range box and select
cell F3.
7. Check Chart Output

8. Click OK.

9. Click the legend on the right side and press Delete.


10. Properly label your bins.
11. To remove the space between the bars, right click a bar, click Format Data
Series and change the Gap Width to 0%.
12. To add borders, right click a bar, click Format Data Series, click the
Fill & Line icon, click Border and select a color.
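
The counting behind these steps can be sketched in Python. The sketch assumes, as in step 1, that each bin number is an upper limit: a value is counted in the first bin whose upper limit is greater than or equal to it, and anything above the last limit goes to a "More" bin. The data and the function name `excel_style_histogram` are hypothetical.

```python
from bisect import bisect_left


def excel_style_histogram(values, bin_uppers):
    """Count values per bin, where each bin covers (previous upper, upper].
    Values above the last upper limit are counted under 'More'."""
    uppers = sorted(bin_uppers)
    counts = [0] * (len(uppers) + 1)  # the last slot is the "More" bin
    for v in values:
        # bisect_left returns the index of the first upper limit >= v
        counts[bisect_left(uppers, v)] += 1
    labels = [str(u) for u in uppers] + ["More"]
    return list(zip(labels, counts))


# Hypothetical data and bin upper limits.
data = [18, 20, 21, 25, 25, 27, 30, 33, 35, 38, 40, 41, 45, 52]
bins = [20, 25, 30, 35, 40]

frequency_table = excel_style_histogram(data, bins)
for label, count in frequency_table:
    print(f"{label:>5} | {'#' * count} ({count})")
```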

INPUT :
OUTPUT:

Parts of a Histogram
1. The title: The title describes the information included in the
histogram.
2. X-axis: The X-axis shows the intervals (bins) into which
the measurements fall.
3. Y-axis: The Y-axis shows the number of times that the values
occurred within the intervals set by the X-axis.
4. The bars: The height of the bar shows the number of times that the
values occurred within the interval, while the width of the bar
shows the interval that is covered. For a histogram with equal bins,
the width should be the same across all bars.
Importance of a Histogram
Creating a histogram provides a visual representation of data distribution.
Histograms can display a large amount of data and the frequency of the
data values. The shape of the distribution, and roughly where its median lies, can be
read from a histogram. In addition, it can show any outliers or gaps in the data.
Practical No. 04

Aim :
Perform Simple Linear Regression using Data Analysis Toolbox of Excel or with
Python and Interpret the regression table
Theory:
Linear regression analysis, in general, is a statistical method that shows or predicts
the relationship between two variables or factors.
There are 2 types of factors in regression analysis:
Dependent variable (y): It’s also called the ‘criterion variable’, ‘response’, or
‘outcome’ and is the factor being predicted.
Independent variable (x): Also known as the ‘explanatory variable’ or
‘predictor’. It is the factor used to predict the dependent variable because of its
influence or effect on that variable.
Usually, this type of analysis is used when one is trying to find or establish the
correlation between variables.
Here’s the linear regression formula:
y = bx + a + ε
As you can see, the equation shows how y is related to x.
On an Excel chart, there’s a trendline you can see which illustrates the regression
line — the rate of change.
Here’s a more detailed definition of the formula’s parameters:
y (dependent variable)
b (the slope of the regression line)
x (independent variable)
a (y-intercept of the regression line)
ε (the error term, which accounts for the variability in y that can’t be explained by
the analysis)
The analysis accounts for error since it can’t be completely eliminated,
especially in a predictive analysis such as this.
But don’t be surprised if you can’t find the error term in Excel. The program does it in
the background.
In summary, here’s what you need to do to insert a scatter plot in Excel:
Format your data in such a way that the independent variable is on the left column
and the dependent variable on the right.
Highlight your data.
Find and click the ‘Scatter’ icon in the ‘Charts’ group on the ‘Insert’ tab
of the ribbon.

To draw the regression line, let’s add a trendline on the chart. Click on any of the
data points and right-click. Select ‘Add Trendline’.
After that, a window will open at the right-hand side.
‘Linear’ is the default ‘Trendline Options’. If it’s not selected, click on it.
Also, if you like to show the equation on the chart, tick the ‘Display Equation on
chart’ box.
How to interpret the results
Primarily, what you’re looking for in a simple linear regression is the correlation
between the variables. Fortunately, in Excel, the trendline does it all for you.
The trendline will tell you if the relationship of your variables is positive or negative.
Positive: If the line shows an upward trend. This indicates that as the
independent variable increases, the dependent variable also increases. For
example, if pageviews are the independent variable and sales the dependent
variable, we can expect sales to rise as pageviews increase.
Negative: If the line shows a downward trend. This suggests that as the
independent variable increases, the dependent variable decreases.
None at all: This is easy to spot. There is no correlation between the variables
(therefore, no way to predict the next values) when the points in the scatter plot
don’t resemble a line as they are scattered. You can still see a line if you add a
trendline no matter how random the points are, but the line is usually close to a
horizontal line.

Excel Regression Analysis Output Explained


1. Multiple R. This is the correlation coefficient. It tells you how strong the linear
relationship is. For example, a value of 1 means a perfect linear relationship
and a value of zero means no linear relationship at all. It is the square root of R squared.
2. R squared. This is r², the Coefficient of Determination. It tells you what proportion
of the variation in y is explained by the model. For example, 80% means that 80% of
the variation of y-values around the mean is explained by the x-values. It does not
mean that 80% of the points fall on the regression line.
3. Adjusted R square. The adjusted R-square adjusts for the number of terms in a
model. You’ll want to use this instead of #2 if you have more than one x variable.
4. Standard Error of the regression: An estimate of the standard deviation of the
error ε. This is not the same as the standard error in descriptive statistics! The
standard error of the regression measures the typical distance between the observed
y-values and the regression line. Each coefficient also has its own standard error in
the coefficients table; if a coefficient is large compared to its standard error, then
the coefficient is probably different from 0.
5. Observations. Number of observations in the sample.
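
Since the aim allows Python as an alternative, the five summary values above can be reproduced with a short sketch. It assumes a single x variable and uses a small hypothetical dataset; the function name `simple_linear_regression` is illustrative and the formulas are the usual least-squares ones, not output copied from Excel.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = b*x + a by least squares and return the summary values
    that appear at the top of a regression output table."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    predicted = [a + b * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total
    r2 = 1 - ss_res / ss_tot
    k = 1  # number of x variables
    return {
        "slope": b,
        "intercept": a,
        "Multiple R": math.sqrt(r2),
        "R Square": r2,
        "Adjusted R Square": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "Standard Error": math.sqrt(ss_res / (n - k - 1)),
        "Observations": n,
    }


# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = simple_linear_regression(x, y)
for name, value in result.items():
    print(f"{name}: {value}")
```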
INPUT:
OUTPUT:
Practical No. 05

Aim :
Perform Multiple Linear Regression using Data Analysis Toolbox of Excel or with
Python and Interpret the regression table.
Theory:
Regression Analysis With Excel
In the real world, you will probably never conduct multiple regression analysis by
hand. Most likely, you will use computer software (SAS, SPSS, Minitab, Excel, etc.).

Excel is a widely-available software application that supports multiple regression. In
this lesson, we use Excel to demonstrate multiple regression analysis. (Other
software packages produce outputs similar to Excel; so if you understand the
outputs from Excel, you will understand similar outputs from other software.)

Sample Problem With Excel


Consider the table below. It shows three performance measures for 10 students.

In this lesson, using data from the table, we are going to complete the following
tasks:

• Develop a least-squares regression equation to predict test score, based on (1) IQ
and (2) the number of hours that the student studied.
• Assess how well the regression equation predicts test score, the dependent
variable.
• Assess the contribution of each independent variable (i.e., IQ and study hours) to
the prediction.
These are common tasks in regression analysis. With the right software, they are
easy to accomplish. We'll walk you step by step through each task, starting with
setting up Excel.

How to Enable Excel

• Open Excel.
• Click the Data tab.
• If you see the Data Analysis button in the upper right corner, the Analysis
ToolPak is enabled and you are ready to go.

If the Data Analysis button is not visible, the Analysis ToolPak is not enabled. In that
case, do the following:

• Click the File tab.
• Select Options to open the Excel Options dialog box.
• Click the Add-Ins item, from the left column. This opens the View and Manage
Microsoft Office Add-ins screen.
• From the Manage drop-down box, choose Excel Add-Ins and click the Go
button. This opens the Add-Ins dialog box.
• From the Add-Ins dialog, check the box beside Analysis ToolPak and click OK.

This enables the Analysis ToolPak. Now, when you click the Data tab, you will see a
Data Analysis button in the upper right corner under the Data tab. (If this explanation
of how to enable the Analysis ToolPak is unclear,
go to https://stattrek.com/anova/excel-analysis-toolpak for more detailed instruction.)
Data Entry With Excel
Data entry with Excel is easy. There are three main steps:
o Enter data on spreadsheet.
o Identify independent and dependent variables.
o Specify desired analyses.
To illustrate the process, we'll walk through each step, using data from our sample
problem. First, we want to enter data on an Excel spreadsheet.

Next, we want to identify the independent and dependent variables. Begin by
clicking the Data tab and the Data Analysis button, then select Regression from the
list of analysis tools and click OK.
Excel will display the Regression dialog box. This is where you identify data fields
for the independent and dependent variables. In the Input Y Range, enter
coordinates for the dependent variable. In the Input X Range, enter coordinates for
the independent variable(s). If you include column labels in these input ranges,
check the Labels box. In the example below, we have included labels, so the Labels
box is checked.

By default, Excel will produce a standard set of outputs. For this sample problem,
that's all we need; so click OK to generate standard regression outputs.

Note: If desired, you can request additional outputs in the form of residual plots and
normal probability plots. To produce the plots, check the appropriate box(es) under
Output options on the Regression dialog box.

Data Analysis With Excel

Excel provides everything we need to address the tasks we defined for this sample
problem. Recall that we wanted to do three things:

• Develop a least-squares regression equation to predict test score, based on
(1) IQ and (2) the number of hours that the student studied.
• Assess how well the regression equation predicts test score, the dependent
variable.
• Assess the contribution of each independent variable (i.e., IQ and study hours)
to the prediction.

Regression Equation

The first task in our analysis is to define a linear, least-squares regression equation
to predict test score, based on IQ and study hours. Since we have two independent
variables, the equation takes the following form:

ŷ = b0 + b1x1 + b2x2

In this equation, ŷ is the predicted test score. The independent variables are IQ and
study hours, which are denoted by x1 and x2, respectively. The regression
coefficients are b0, b1, and b2. On the right side of the equation, the only unknowns
are the regression coefficients; so to specify the equation, we only need to find
values for the coefficients.

Excel does all the hard work behind the scenes, and displays the result in a
regression coefficients table:

Here, we see that the regression intercept (b0) is 23.156, the regression coefficient
for IQ (b1) is 0.509, and the regression coefficient for study hours (b2) is 0.467. So
the least-squares regression equation can be re-written as:

ŷ = 23.156 + 0.509 * IQ + 0.467 * Hours

This is the only linear equation that satisfies a least-squares criterion. That means
this equation fits the data from which it was created better than any other linear
equation.
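
What Excel does "behind the scenes" can be sketched in Python by solving the normal equations for b0, b1 and b2. This is a minimal illustration, not Excel's algorithm: the student data is hypothetical (the lesson's table is not reproduced here), the scores are synthetic and built from known coefficients so that the fit can be checked, and `fit_least_squares` is an illustrative helper that assumes the x variables are not perfectly collinear.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # leading 1 gives the intercept b0
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = []
    for i in range(k):
        row = [sum(x[i] * x[j] for x in X) for j in range(k)]
        row.append(sum(x[i] * yi for x, yi in zip(X, y)))
        A.append(row)
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical (IQ, study hours) pairs for six students.
students = [(100, 10), (110, 5), (120, 20), (95, 15), (105, 8), (130, 12)]
# Synthetic test scores generated from known coefficients 20, 0.5 and 0.4.
scores = [20 + 0.5 * iq + 0.4 * hours for iq, hours in students]

b0, b1, b2 = fit_least_squares(students, scores)
print(f"y-hat = {b0:.3f} + {b1:.3f} * IQ + {b2:.3f} * Hours")
```

Because the synthetic scores contain no noise, the fitted coefficients should recover 20, 0.5 and 0.4; with real data the equation would only approximate the scores.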

Coefficient of Multiple Determination

To judge how well the equation fits, researchers look at the coefficient of multiple
determination (R²). The coefficient of
multiple determination measures the proportion of variation in the dependent
variable that can be predicted from the set of independent variables in the
regression equation. When the regression equation fits the data well, R² will be large
(i.e., close to 1); and vice versa.

The coefficient of multiple determination can be defined in terms of sums of squares:


SSR = Σ ( ŷ − ȳ )²

SSTO = Σ ( y − ȳ )²

R² = SSR / SSTO

where SSR is the sum of squares due to regression, SSTO is the total sum of
squares, ŷ is the predicted value of the dependent variable, ȳ is the dependent
variable mean, and y is the dependent variable raw score.

Luckily, you will never have to compute the coefficient of multiple determination by
hand. It is a standard output of Excel (and most other analysis packages), as shown
below.

A quick glance at the output suggests that the regression equation fits the data
pretty well. The coefficient of multiple determination is 0.905. For our sample
problem, this means 90.5% of test score variation can be explained by IQ and by
hours spent in study.

An Alternative View of R²

The coefficient of multiple determination (R²) is the square of the correlation between
actual and predicted values of the dependent variable. Thus,

R² = (r_{y,ŷ})²

where y is the dependent variable raw score, ŷ is the predicted value of the
dependent variable, and r_{y,ŷ} is the correlation between y and ŷ.
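
The two views of the coefficient of determination can be checked against each other with a short Python sketch. The numbers are hypothetical: `y` holds raw scores and `y_hat` holds the predictions of a least-squares line fitted with an intercept (here ŷ = 2.2 + 0.6x for x = 1..5). The agreement between the two results relies on that least-squares fit; it does not hold for arbitrary predictions.

```python
import math


def r_squared_two_ways(y, y_hat):
    """Return (SSR / SSTO, squared correlation between y and y_hat)."""
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((p - y_bar) ** 2 for p in y_hat)   # sum of squares due to regression
    ssto = sum((v - y_bar) ** 2 for v in y)      # total sum of squares
    # Pearson correlation between actual and predicted values
    p_bar = sum(y_hat) / n
    cov = sum((v - y_bar) * (p - p_bar) for v, p in zip(y, y_hat))
    var_p = sum((p - p_bar) ** 2 for p in y_hat)
    r = cov / math.sqrt(ssto * var_p)
    return ssr / ssto, r ** 2


# Hypothetical raw scores and their least-squares predictions.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

from_sums_of_squares, from_correlation = r_squared_two_ways(y, y_hat)
print(from_sums_of_squares, from_correlation)
```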

ANOVA Table

Another way to evaluate the regression equation would be to assess the statistical
significance of the regression sum of squares. For that, we examine the ANOVA
table produced by Excel:
This table tests the statistical significance of the independent variables as predictors
of the dependent variable. The last column of the table shows the results of an
overall F test. The F statistic (33.4) is big, and the p value (0.00026) is small. This
indicates that one or both independent variables have explanatory power beyond
what would be expected by chance.

Like the coefficient of multiple determination, the overall F test found in the ANOVA
table suggests that the regression equation fits the data well.

Significance of Regression Coefficients

With multiple regression, there is more than one independent variable; so it is
natural to ask whether a particular independent variable contributes significantly to
the regression after effects of other variables are taken into account. The answer to
this question can be found in the regression coefficients table:

The regression coefficients table shows the following information for each
coefficient: its value, its standard error, a t-statistic, and the significance of the t-
statistic. In this example, the t-statistics for IQ and study hours are both statistically
significant at the 0.05 level. This means that IQ contributes significantly to the
regression after effects of study hours are taken into account. And study hours
contribute significantly to the regression after effects of IQ are taken into account.

Note: This analysis omits any consideration of multicollinearity, a topic we will cover
in the next lesson. Be aware, however, that it is best practice to assess
multicollinearity in the independent variables before testing significance of
regression coefficients.
INPUT:

OUTPUT:
Final Thoughts / Conclusion

This lesson was all about multiple regression analysis. We used Excel, but the
analysis would be much the same with other software packages. All major software
packages (SAS, SPSS, Minitab, etc.) produce three key outputs:

 Regression coefficients, based on a least-squares criterion.


 Measures of goodness of fit, like a coefficient of multiple determination and/or
an overall F test.
 Significance tests for individual regression coefficients.
Practical No. 06

Aim :
Perform Logistic Regression on the given dataset and interpret the regression
table.

Theory: Logistic regression is a method that we use to fit a regression model
when the response variable is binary.

This tutorial explains how to perform logistic regression in Excel.

Example: Logistic Regression in Excel

Use the following steps to perform logistic regression in Excel for a dataset that
shows whether or not college basketball players got drafted into the NBA (draft: 0 =
no, 1 = yes) based on their average points, rebounds, and assists in the previous
season.

Step 1: Input the data.

First, input the following data:

Step 2: Enter cells for regression coefficients.


Since we have three explanatory variables in the model (pts, rebs, ast), we will
create cells for three regression coefficients plus one for the intercept in the model.
We will set the values for each of these to 0.001, but we will optimize for them later.

Next, we will have to create a few new columns that we will use to optimize for these
regression coefficients including the logit, elogit, probability, and log likelihood.

Step 3: Create values for the logit.

Next, we will create the logit column by using the following formula:
Step 4: Create values for elogit.

Next, we will create values for elogit by using the following formula:

Step 5: Create values for probability.

Next, we will create values for probability by using the following formula:

Step 6: Create values for log likelihood.


Next, we will create values for log likelihood by using the following formula:

Log likelihood = LN(Probability)

Step 7: Find the sum of the log likelihoods.

Lastly, we will find the sum of the log likelihoods, which is the number we will
attempt to maximize to solve for the regression coefficients.
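
Steps 3 to 7 can be sketched in Python as follows. The worksheet formulas for the logit, elogit and probability columns are not reproduced in this text, so the sketch uses the standard logistic formulation as an assumption: logit = b0 + b1*pts + b2*rebs + b3*ast, elogit = e^logit, and P(draft = 1) = elogit / (1 + elogit). Unlike the worksheet described here, it models P(draft = 1) directly, so no sign reversal would be needed afterwards. The three data rows are hypothetical.

```python
import math


def log_likelihood(coeffs, rows, outcomes):
    """Sum of log likelihoods for coefficients [b0, b1, ..., bk].
    This is the quantity that the Solver step would maximize."""
    total = 0.0
    for x, y in zip(rows, outcomes):
        logit = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
        elogit = math.exp(logit)
        p_one = elogit / (1 + elogit)           # assumed P(draft = 1)
        probability = p_one if y == 1 else 1 - p_one
        total += math.log(probability)          # Log likelihood = LN(Probability)
    return total


# Hypothetical players: (pts, rebs, ast) and whether they were drafted.
rows = [(14, 4, 5), (20, 7, 3), (8, 2, 1)]
outcomes = [1, 1, 0]

# Starting values of 0.001 for the intercept and the three coefficients.
ll_start = log_likelihood([0.001, 0.001, 0.001, 0.001], rows, outcomes)
# With all coefficients at zero every probability is 0.5.
ll_zero = log_likelihood([0.0, 0.0, 0.0, 0.0], rows, outcomes)
print(ll_start, ll_zero)
```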
Step 8: Use the Solver to solve for the regression coefficients.

If you haven’t already installed the Solver in Excel, use the following steps to do so:

• Click File.
• Click Options.
• Click Add-Ins.
• Click Solver Add-In, then click Go.
• In the new window that pops up, check the box next to Solver Add-In, then
click OK.

Once the Solver is installed, go to the Analysis group on the Data tab and
click Solver. Enter the following information:

• Set Objective: Choose cell H14 that contains the sum of the log likelihoods.
• By Changing Variable Cells: Choose the cell range B15:B18 that contains
the regression coefficients.
• Make Unconstrained Variables Non-Negative: Uncheck this box.
• Select a Solving Method: Choose GRG Nonlinear.

Then click Solve.

The Solver automatically calculates the regression coefficient estimates:


By default, the regression coefficients can be used to find the probability that draft =
0.

However, typically in logistic regression we’re interested in the probability that the
response variable = 1.

So, we can simply reverse the signs on each of the regression coefficients:

Now these regression coefficients can be used to find the probability that draft = 1.

For example, suppose a player averages 14 points per game, 4 rebounds per game,
and 5 assists per game. The probability that this player will get drafted into the NBA
can be calculated as:

P(draft = 1) = e^(3.681193 + 0.112827*(14) − 0.39568*(4) − 0.67954*(5)) / (1 + e^(3.681193 + 0.112827*(14) − 0.39568*(4) − 0.67954*(5))) = 0.57.

Since this probability is greater than 0.5, we predict that this player would get drafted
into the NBA.
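
The same calculation can be checked in Python. The coefficient values are the sign-reversed estimates quoted above; the dictionary keys and the function name `draft_probability` are only illustrative.

```python
import math

# Sign-reversed coefficient estimates quoted in the worked example.
coefficients = {
    "intercept": 3.681193,
    "pts": 0.112827,
    "rebs": -0.39568,
    "ast": -0.67954,
}


def draft_probability(pts, rebs, ast):
    """P(draft = 1) = e^logit / (1 + e^logit)."""
    logit = (coefficients["intercept"]
             + coefficients["pts"] * pts
             + coefficients["rebs"] * rebs
             + coefficients["ast"] * ast)
    return math.exp(logit) / (1 + math.exp(logit))


# A player averaging 14 points, 4 rebounds and 5 assists per game.
p = draft_probability(14, 4, 5)
predicted_drafted = p > 0.5
print(round(p, 2), predicted_drafted)
```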
INPUT:
Practical No. 07

Aim :
Install Tableau, Understand the User Interface, Dimensions, Measures, Pages,
Filters, and Marks and Show Me, Dataset Connections and Create a visualization.
Theory:
There are two points to consider here:
• Tableau Public is free.
• Tableau Desktop is a paid, commercial product; a free trial is available.

Downloading and Installing Tableau Public

1- Visit the URL https://public.tableau.com/en-us/s/download on your web browser.


Once the window opens, enter your email id when asked, and click on the “Download
the App” button.

2- The file will start downloading in “.exe” format. You can view the download progress
on the bottom-left corner of the tab.

3- Once the progress is 100 percent, open the file. Accept the terms and conditions
by selecting the checklist boxes and click on the “Install” button.
4- Once the installation is complete, open Tableau. The start screen of Tableau
Public appears.

Downloading and Installing Tableau Desktop

1- Enter this URL https://www.tableau.com/products/desktop on your web browser.

2- Click on the “TRY NOW” button in the top-right corner of the website as shown
below.
3- Once you click on the “TRY NOW” button, you will be redirected to a page that will
ask you to feed in your official email address. After filling in the email address, click
on the “DOWNLOAD FREE TRIAL” button.

4- The latest version of Tableau Desktop will start downloading, and you will be able
to view the download progress in the bottom-left corner of the screen.

5- Once downloaded, open the file. Accept the terms and conditions, and click on the
“Install” button.

6- A pop-up option will appear asking for the approval of the administrator to install
the software. Click on “YES” to approve and move further.

7- On approval, the installation will start. On the completion of the installation, open
Tableau.

8- This is the final stage that asks for registration. Click on “Activate Tableau” and
enter your license details or credentials.

9- Click on “Start Trial Now” and wait for the registration process to complete.

10- Once it is completed, open the Tableau screen as shown below.


Introduction to Tableau Desktop Software Workspace and Navigation

1- To open the Tableau Workspace, go to File in the Start window and click on “New.”
The Tableau Workspace looks like the following screenshot.

Description of the Tableau Workspace

Now, let us understand the individual sections of the Tableau Workspace:

• Menu Bar: It is the topmost bar; it contains File, Data, Worksheet,
Dashboard, Story, Analysis, Map, Format, Server, Window, and Help.
These options include features such as data exporting, file saving, etc.
• Toolbar: The bar just below the Menu Bar is the Toolbar. It is
basically used for editing the workbook and consists of various options
such as view, undo, redo, slideshow, text edit, etc.
• Dimension Shelf: The dimensions involved in the data sources can be
viewed in the dimension shelf.
• Marks Card: The options in the marks card are used to design the
visualizations in the Tableau workspace.
• Measures Shelf: All measures that are present in the data source can be
viewed in the Measures Shelf.
• Sets and Parameters Shelf: All user-defined sets and parameters can be
viewed on this shelf. The existing sets and parameters can also be edited
with the help of the options.
• Tableau Repository: Tableau Repository is usually located in the file path
C:\Users\User\Documents\My Tableau Repository. The repository is used
to store all Tableau files; the repository is segregated into various folders
such as Bookmarks, Connectors, Logs, Data Sources, etc.

Tableau Navigation

The following image shows what Tableau navigation looks like:

Now, let us look at the individual sections of Tableau navigation:

• Data Source: It is typically used for either the addition of a new data
source or the modification of an existing source.
• Current Sheet: In the image, what you see as “Sheet 1” refers to the
current sheet. All sheets and dashboards in the current workbook can be
viewed here.
• New Sheet: The first squared box with a “+” sign refers to this option. It is
used for creating new worksheets in Tableau Desktop.
• New Dashboard: The second squared box with a “+” sign refers to this
option. This icon is used to create a new dashboard in the Tableau workbook.
• New Story: The third squared box with a “+” sign refers to this option. This
icon is used to create a new story in the Tableau workbook.
INPUT & OUTPUT:
Practical No. 08

Aim :
Various graphs in Tableau, Integration of Map and geo-locations, Creating
Interactive Dashboard and Publishing your Dashboard to Tableau Public Site.

Theory:
Why Is Spatial Data Important?

Spatial data is one of the most in-demand data types because it can help us:

• better understand the answer to the question “Where?”

• determine relationships

• identify patterns

• make predictions

Maps are certainly a great way to display spatial data. Nowadays, you don’t need
any GIS software to create and publish maps. BI tools are now capable of creating
thematic maps.

Many Business Intelligence tools, such as Tableau, Qlik Sense, Power BI, Looker,
and MicroStrategy, help you create powerful and effective graphs, dashboards, and
visualisations.

This practical uses two data files:

• London Boroughs: GeoJSON file of the city’s neighbourhoods, acquired from
Inside Airbnb

• London Borough Profile: CSV file of London Borough Profiles, acquired from the
Greater London Authority
Step 1: Install Tableau Public and Create Profile

Tableau Public is free software that allows anyone to connect to a spreadsheet or
file and create interactive data visualizations for the web. You can download Tableau
Public for Windows and Mac from the Tableau Public website.

You also need to create a profile to present your visualisations on the internet.

Step 2: Connect Your File To Tableau Public

When you open the Tableau Public application, the “Connect” area appears at the
upper left.

To connect your spatial file (LondonBorough.GeoJson), select Spatial File, browse to
your file and open it. When you open your spatial file, the connection screen appears
as shown below. Your file must include a Geometry column of type MultiPolygon.
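Before connecting the file, the geometry requirement above can be checked outside Tableau. The following is a minimal Python sketch, not part of the Tableau workflow itself. The small FeatureCollection and the property key `neighbourhood` are assumptions made for illustration; replace them with the contents of your own GeoJSON file.

```python
import json

# Hypothetical miniature FeatureCollection standing in for the boroughs file.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"neighbourhood": "Camden"},
     "geometry": {"type": "MultiPolygon",
                  "coordinates": [[[[0, 0], [0, 1], [1, 1], [0, 0]]]]}},
    {"type": "Feature",
     "properties": {"neighbourhood": "Hackney"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[2, 2], [2, 3], [3, 3], [2, 2]]]}}
  ]
}
"""


def geometry_types(feature_collection):
    """Return the set of geometry types present in a FeatureCollection."""
    return {
        feature["geometry"]["type"]
        for feature in feature_collection["features"]
        if feature.get("geometry") is not None
    }


def non_multipolygon_names(feature_collection, name_key="neighbourhood"):
    """List the names of features whose geometry is not a MultiPolygon."""
    return [
        feature["properties"].get(name_key)
        for feature in feature_collection["features"]
        if (feature.get("geometry") or {}).get("type") != "MultiPolygon"
    ]


data = json.loads(SAMPLE_GEOJSON)
found_types = geometry_types(data)
offenders = non_multipolygon_names(data)
print(sorted(found_types))
print(offenders)
```

For a real file, the text passed to `json.loads` would be read from disk instead of the inline sample. Any name reported by `non_multipolygon_names` points to a feature worth inspecting before loading the file.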
To add your tabular data (LondonBoroughProfile.csv), click the Add button in the
Connections area and then select “Text file”.

Browse to your LondonBoroughProfile.csv file and click Open. Now you have two
connections and two files: one is a spatial file and the other is a “Text file”.
If you want to preview your tabular data, click the “View Data” button.

To create a relationship between your two files, select the LondonBoroughProfile.csv
file and drag and drop it onto the relationship area.
Now you have to select the column that the two files have in common (the borough
name). In LondonBoroughs.geojson this column is named “Neighbourhood”, and in
LondonBoroughProfile.csv it is named “Borough Name”.

After creating the relationship between your files, you can see connections as shown
below.
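The relationship described above matches rows from the two files wherever the borough name is the same. The idea can be sketched in Python as a lookup on the shared key. The sample rows, the shortened column name `Crime Rate`, and the function `relate` are hypothetical and exist only for illustration; the values are placeholders, not real figures.

```python
import csv
import io

# Made-up placeholder rows; not the real Greater London Authority figures.
PROFILE_CSV = """Borough Name,Crime Rate
Camden,10.0
Hackney,20.0
"""

# Names as they might appear in the spatial file's "Neighbourhood" column.
neighbourhoods = ["Camden", "Hackney", "Islington"]


def relate(names, profile_csv_text):
    """Match each neighbourhood to its profile row on the shared borough name."""
    rows = csv.DictReader(io.StringIO(profile_csv_text))
    by_name = {row["Borough Name"].strip(): row for row in rows}
    # None marks a neighbourhood that has no matching profile row.
    return {name: by_name.get(name.strip()) for name in names}


related = relate(neighbourhoods, PROFILE_CSV)
unmatched = [name for name, row in related.items() if row is None]
print(unmatched)
```

Listing the unmatched names is a quick way to spot spelling differences between the two files, which would otherwise show up as boroughs without data on the map.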
Step 3: Creating a Map Sheet

To create a map visualisation, click Sheet 1 at the lower left.

Now you have a basic map of London Boroughs. You need to configure your map
with some appearance features such as label, colour, tooltip, etc.

To create a label: Select Neighbourhood and drag and drop it onto Label.

To create a thematic map: Select one feature (under LondonBoroughProfile.csv)
that you want to see as a thematic map and drag and drop it onto the Colour area.
In this tutorial, I selected “Crime rates per thousand population 2014/2015”.
Now you have a thematic map based on Crime Rate. You can configure its appearance
by changing colours, labels, etc. You can also add a filter, legend, title, etc. to
make the map easier to understand.
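To see what a thematic (choropleth) map does with a measure, the following Python sketch scales each value linearly between the minimum and maximum and then interpolates between a light and a dark colour. This is only an illustration of the general idea under stated assumptions: the borough names, the rates, and the two RGB colours are invented, and Tableau’s own colour palettes and stepping options may behave differently.

```python
def normalise(values):
    """Scale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        # All values are equal, so every area gets the lightest shade.
        return {name: 0.0 for name in values}
    return {name: (v - lo) / span for name, v in values.items()}


def shade(t, light=(255, 245, 235), dark=(127, 39, 4)):
    """Interpolate between a light and a dark RGB colour; return a hex string."""
    rgb = [round(l + (d - l) * t) for l, d in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Placeholder rates for three hypothetical areas.
rates = {"A": 50.0, "B": 75.0, "C": 100.0}
scaled = normalise(rates)
colours = {name: shade(t) for name, t in scaled.items()}
print(colours)
```

The area with the lowest rate receives the lightest shade and the area with the highest rate the darkest, which is the visual effect produced by dropping a measure onto Colour.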

OUTPUT:
Practical No. 09

Aim :
Scatter Plots, Data Highlighter, Pages and Cards, Annotations, Creating a Story and
Publishing on Tableau Public
Theory:
Add new features as a Tooltip:
A tooltip provides additional information about attributes. You can add more
attributes that the user sees when the cursor moves over the map. Select one
or more attributes and drag and drop them onto the Tooltip area. In this tutorial I
selected the “Employment Rate (%) 2015”, “Average Age 2017” and “Number of
Cars, (2011 Census)” attributes and renamed them as shown below:

• Employment Rate

• Average Age

• Number of Cars

Step 4: Creating Graphs

To create a new graph, click New Worksheet at the lower left.
After you create the new worksheet, an empty worksheet appears. Select Borough
Name and drag and drop it onto the Columns area. Select Crime Rates and drag and
drop it onto the Rows area.

To colourise and add a label, select Crime Rates and drag and drop it onto Colour,
then drag and drop Crime Rates again onto Label.
There is no data for the “City of London” borough, so you need to remove this record
from your graph. Click the arrow next to Borough Name and select Filter. Then
uncheck City of London.
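The filter step above removes a record that has no value for the plotted measure. The same rule can be sketched in Python. The rows below are placeholders, the column name `Crime Rate` is shortened for illustration, and the assumption that missing data appears as an empty cell may not match how the real CSV encodes it.

```python
# Placeholder rows; an empty string stands in for the missing value.
rows = [
    {"Borough Name": "Camden", "Crime Rate": "10.0"},
    {"Borough Name": "City of London", "Crime Rate": ""},
    {"Borough Name": "Hackney", "Crime Rate": "20.0"},
]


def has_value(row, column):
    """True when the row holds a non-empty value in the given column."""
    return (row.get(column) or "").strip() != ""


def drop_missing(records, column):
    """Keep only the records that have data in the given column."""
    return [row for row in records if has_value(row, column)]


kept = drop_missing(rows, "Crime Rate")
kept_names = [row["Borough Name"] for row in kept]
print(kept_names)
```

Filtering on the presence of a value, rather than on one hard-coded name, also catches any other borough that turns out to have no data.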

Step 5: Creating a Dashboard

You can create more worksheets that contain different graphs and then combine
these worksheets into a dashboard. To create a new dashboard, click the New
Dashboard button; an empty dashboard appears.

Change Size to “Automatic”.

Drag Sheet1 (Thematic Map) and Sheet2 (Bar Chart) and drop them onto the
dashboard area.

I placed these two sheets as shown below.


If you want the visualisations to interact with each other, you need to enable the
“Use as Filter” button.

Then, if you select a borough on the map or the graph, the other graphs update
dynamically to show the results for that borough.

Step 6: Publishing

You have now prepared a dashboard that includes your map and graph. Next, you need
to publish this dashboard to your Tableau Public profile. Click File and select Save to
Tableau Public As.
After you save the dashboard, your Tableau Public profile opens automatically. You
can use the Full-Screen button to maximize your dashboard.

By following the steps above with a different chart type, you can create a scatter
plot or any other type of data visualisation.

OUTPUT:
Practical No. 10

Aim: Given a case study: Perform Interactive Data Visualization with Tableau.

Case Study:
