information systems (GIS). Our detailed examples use Excel to generate tables and charts, and we discuss several software packages that can be used for advanced data visualization. The appendix to this chapter covers the use of XLMiner (an Excel add-in) for data visualization.

3.1 Overview of Data Visualization

Decades of research studies in psychology and other fields show that the human mind can process visual images such as charts much faster than it can interpret rows of numbers. However, these same studies also show that the human mind has certain limitations in its ability to interpret visual images and that some images are better at conveying information than others. The goal of this chapter is to introduce some of the most common forms of visualizing data and demonstrate when these forms are appropriate.

Microsoft Excel is a ubiquitous tool in business for basic data visualization. Software tools such as Excel make it easy for anyone to create many standard examples of data visualization. However, as discussed in this chapter, the default settings for tables and charts created with Excel can be altered to increase clarity. New types of software that are dedicated to data visualization have appeared recently. We focus our techniques on Excel in this chapter, but we also mention some of these more advanced software packages for specific data visualization uses.

Effective Design Techniques

One of the most helpful ideas for creating effective tables and charts for data visualization is the idea of the data-ink ratio, described by Edward R. Tufte in his book The Visual Display of Quantitative Information (first published in 1983; second edition 2001).
The data-ink ratio measures the proportion of what Tufte terms "data-ink" to the total amount of ink used in a table or chart. Data-ink is the ink used in a table or chart that is necessary to convey the meaning of the data to the audience. Non-data-ink is ink used in a table or chart that serves no useful purpose in conveying the data to the audience.

Let us consider the case of Gossamer Industries, a firm that produces fine silk clothing products. Gossamer is interested in tracking the sales of one of its most popular items, a particular style of women's scarf. Table 3.1 and Figure 3.3 provide examples of a table and a chart with low data-ink ratios used to display sales of this style of women's scarf. The data used in this table and figure represent product sales by day. Both of these examples are similar to tables and charts generated with Excel using common default settings. In Table 3.1, most of the grid lines serve no useful purpose. Likewise, in Figure 3.3, the horizontal lines in the chart also add little additional information. In both cases, most of these lines can be deleted without reducing the information conveyed. However, an important piece of information is missing from Figure 3.3: labels for axes. Axes should always be labeled in a chart unless both the meaning and unit of measure are obvious.

TABLE 3.1 EXAMPLE OF A LOW DATA-INK RATIO TABLE

+---------------------------+
|    Scarf Sales By Day     |
+-----+-------+-----+-------+
| Day | Sales | Day | Sales |
+-----+-------+-----+-------+
|   1 |   150 |  11 |   170 |
+-----+-------+-----+-------+
|   2 |   170 |  12 |   160 |
+-----+-------+-----+-------+
|   3 |   140 |  13 |   290 |
+-----+-------+-----+-------+
|   4 |   150 |  14 |   200 |
+-----+-------+-----+-------+
|   5 |   180 |  15 |   210 |
+-----+-------+-----+-------+
|   6 |   180 |  16 |   110 |
+-----+-------+-----+-------+
|   7 |   210 |  17 |    90 |
+-----+-------+-----+-------+
|   8 |     ? |  18 |   140 |
+-----+-------+-----+-------+
|   9 |   140 |  19 |   150 |
+-----+-------+-----+-------+
|  10 |   200 |  20 |   230 |
+-----+-------+-----+-------+
(? = illegible in source)

FIGURE 3.3 EXAMPLE OF A LOW DATA-INK RATIO CHART
[Chart titled "Scarf Sales by Day": sales for days 1-20, drawn with horizontal grid lines and without axis labels]

Table 3.2 shows a modified table in which all grid lines have been deleted except for those around the title of the table. Deleting the grid lines in Table 3.1 increases the data-ink ratio because a larger proportion of the ink used in the table is used to convey the information (the actual numbers).
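In a plain-text setting, the data-ink idea can be made concrete with a rough proxy. Every non-whitespace character counts as "ink," and only the characters of the column headers and values count as "data-ink." This is an assumption made for illustration; Tufte was describing printed ink, not characters, and the helper names `render_table` and `data_ink_ratio` are hypothetical. The sketch renders the first four days of scarf sales with and without grid lines and compares the two ratios.

```python
# Crude text proxy for Tufte's data-ink ratio (an illustrative assumption):
#   data-ink ratio = data-ink / total ink used in the display
# "Ink" = non-whitespace characters; "data-ink" = characters of the headers
# and values. Grid characters (+, -, |) count as non-data-ink.

ROWS = [("Day", "Sales"), (1, 150), (2, 170), (3, 140), (4, 150)]


def render_table(rows, grid):
    """Render two-column rows as text, optionally boxed in grid lines."""
    cells = [(str(a), str(b)) for a, b in rows]
    w1 = max(len(a) for a, _ in cells)
    w2 = max(len(b) for _, b in cells)
    if not grid:
        # Right-aligned columns separated by white space, no rules at all.
        return "\n".join(f"{a:>{w1}}  {b:>{w2}}" for a, b in cells)
    border = "+" + "-" * (w1 + 2) + "+" + "-" * (w2 + 2) + "+"
    lines = [border]
    for a, b in cells:
        lines.append(f"| {a:>{w1}} | {b:>{w2}} |")
        lines.append(border)  # a rule after every row, like a default grid
    return "\n".join(lines)


def ink(text):
    """Count non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace())


def data_ink_ratio(rows, grid):
    """Share of the rendered table's characters that belong to headers and values."""
    data_ink = sum(ink(str(a)) + ink(str(b)) for a, b in rows)
    return data_ink / ink(render_table(rows, grid))


if __name__ == "__main__":
    for grid in (True, False):
        print(render_table(ROWS, grid))
        print(f"data-ink ratio: {data_ink_ratio(ROWS, grid):.2f}\n")
```

With the grid, fewer than one character in five carries data. Without it, every visible character does, which is the text analog of moving from Table 3.1 to Table 3.2.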
Similarly, deleting the unnecessary horizontal lines in Figure 3.3, as shown in Figure 3.4, increases the data-ink ratio. Note that deleting these horizontal lines and removing (or reducing the size of) the markers at each data point can make it more difficult to determine the exact values plotted in the chart. However, as we discuss later, a simple chart is not the most effective way of presenting data when the audience needs to know exact values. In these cases, it is better to use a table.

In many cases, white space in a table or chart can improve readability. This principle is similar to the idea of increasing the data-ink ratio. Consider Table 3.2 and Figure 3.4. Removing the unnecessary lines has increased the white space, making it easier to read both the table and the chart. The fundamental idea in creating effective tables and charts is to make them as simple as possible in conveying information to the reader.

TABLE 3.2 INCREASING THE DATA-INK RATIO BY REMOVING UNNECESSARY GRIDLINES

-------------------------
    Scarf Sales By Day
-------------------------
Day  Sales     Day  Sales
  1    150      11    170
  2    170      12    160
  3    140      13    290
  4    150      14    200
  5    180      15    210
  6    180      16    110
  7    210      17     90
  8      ?      18    140
  9    140      19    150
 10    200      20    230
(? = illegible in source)

FIGURE 3.4 INCREASING THE DATA-INK RATIO BY ADDING LABELS TO AXES AND REMOVING UNNECESSARY LINES AND LABELS
[Chart titled "Scarf Sales by Day" with axis titles Sales and Day and no grid lines]

NOTES AND COMMENTS

1. Tables have been used to display data for more than a thousand years. However, charts are much more recent inventions. The famous seventeenth-century French mathematician René Descartes is credited with inventing the now familiar graph with horizontal and vertical axes. William Playfair invented bar charts, line charts, and pie charts in the late eighteenth century, all of which we will discuss in this chapter. More recently, individuals such as William Cleveland, Edward R. Tufte, and Stephen Few have introduced design techniques for both clarity and beauty in data visualization.

2. Many of the default settings in Excel are not ideal for displaying data using tables and charts that communicate effectively. Before presenting Excel-generated tables and charts to others, it is worth the effort to remove unnecessary lines and labels.

3.2 Tables

The first decision in displaying data is whether a table or a chart will be more effective. In general, charts can often convey information faster and more easily to readers, but in some cases a table is more appropriate. Tables should be used when

1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.

Consider the case in which the accounting department of Gossamer Industries is summarizing the company's annual data for completion of its federal tax forms. In this case, the specific numbers corresponding to revenues and expenses are important and not just the relative values. Therefore, these data should be presented in a table similar to Table 3.3.

Similarly, if it is important to know exactly by how much revenues exceed expenses each month, then this would also be better presented as a table rather than as a line chart, as seen in Figure 3.5. Notice that it is very difficult to determine the monthly revenues and costs in Figure 3.5.
We could add these values using data labels, but they would clutter the figure. The preferred solution is to combine the chart with the table into a single figure, as in Figure 3.6, to allow the reader to easily see the monthly changes in revenues and costs while also being able to refer to the exact numerical values.

TABLE 3.3 TABLE SHOWING EXACT VALUES FOR COSTS AND REVENUES BY MONTH FOR GOSSAMER INDUSTRIES

Month               1        2        3        4        5        6      Total
Costs ($)      48,123   56,458   64,125   52,158   54,718   50,985    326,567
Revenues ($)   64,124   66,128   67,125   48,178   51,785   55,687    353,027

FIGURE 3.5 LINE CHART OF MONTHLY COSTS AND REVENUES AT GOSSAMER INDUSTRIES
[Line chart: Revenues ($) and Costs ($) plotted against Month (1-6)]

FIGURE 3.6 COMBINED LINE CHART AND TABLE FOR MONTHLY COSTS AND REVENUES AT GOSSAMER INDUSTRIES
[Line chart of Revenues ($) and Costs ($) by Month (1-6), placed above a table of the same monthly values and totals shown in Table 3.3]

Now suppose that you wish to display data on revenues, costs, and headcount for each month. Costs and revenues are measured in dollars ($), but headcount is measured in number of employees. Although all these values can be displayed on a line chart using multiple vertical axes, this is generally not recommended. Because the values have widely different magnitudes (costs and revenues are in the tens of thousands, whereas headcount is approximately 10 each month), it would be difficult to interpret changes on a single chart. Therefore, a table similar to Table 3.4 is recommended.

TABLE 3.4 TABLE DISPLAYING HEADCOUNT, COSTS, AND REVENUES AT GOSSAMER INDUSTRIES

Month               1        2        3        4        5        6      Total
Headcount           ?        9        ?        9        9        9
Costs ($)      48,123   56,458   64,125   52,158   54,718   50,985    326,567
Revenues ($)   64,124   66,128   67,125   48,178   51,785   55,687    353,027
(? = illegible in source)
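"By how much do revenues exceed costs each month?" is exactly the kind of precise comparison that is hard to read off a line chart. The short Python sketch below computes it from the Table 3.3 values and checks the Total column. The variable names are illustrative, and the output layout follows the right-aligned table style used in this section.

```python
# Monthly Gossamer Industries data from Table 3.3 (months 1-6).
costs = [48_123, 56_458, 64_125, 52_158, 54_718, 50_985]
revenues = [64_124, 66_128, 67_125, 48_178, 51_785, 55_687]

# Profit = revenue - cost for each month; negative values are losses.
profits = [r - c for r, c in zip(revenues, costs)]

rows = {"Costs ($)": costs, "Revenues ($)": revenues, "Profits ($)": profits}

# Right-align every numeric column so the final digits line up.
print(f"{'Month':<13}" + "".join(f"{m:>9}" for m in range(1, 7)) + f"{'Total':>10}")
for label, values in rows.items():
    cells = "".join(f"{v:>9,}" for v in values)
    print(f"{label:<13}{cells}{sum(values):>10,}")
```

The negative profits in months 4 and 5 are losses of 3,980 and 2,933 dollars. In a table, their exact size is visible at once, whereas in Figure 3.5 the reader can only estimate it.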
Table Design Principles

In designing an effective table, keep in mind the data-ink ratio and avoid the use of unnecessary ink in tables. In general, this means that we should avoid using vertical lines in a table unless they are necessary for clarity. Horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place. Consider Figure 3.7, which compares several forms of a table displaying Gossamer's costs and revenue data. Most people find Design D, with the fewest grid lines, easiest to read. In this table, grid lines are used only to separate the column headings from the data and to indicate that a calculation has occurred to generate the Profits row and the Total column.

FIGURE 3.7 COMPARING DIFFERENT TABLE DESIGNS
[Several versions of a table of Gossamer's monthly Costs ($), Revenues ($), and Profits ($) for months 1-6 with a Total column, differing in their use of grid lines; Design D uses the fewest]

In large tables, vertical lines or light shading can be useful to help the reader differentiate the columns and rows. Table 3.5 breaks out the revenue data by location for nine cities and shows 12 months of revenue and cost data. In Table 3.5, every other column has been lightly shaded. This helps the reader quickly scan the table to see which values correspond with each month. The horizontal line between the revenue for Academy and the Total row helps the reader differentiate the revenue data for each location and indicates that a calculation has taken place to generate the totals by month. If one wanted to highlight the differences among locations, the shading could be done for every other row instead of every other column.

TABLE 3.5 LARGER TABLE SHOWING REVENUES BY LOCATION FOR 12 MONTHS OF DATA
[Table: Revenues by Location ($) for nine locations, the last of which is Academy, for months 1-12, followed by a Total row; every other column is lightly shaded; cell values are not legible in the source]

We depart from these guidelines in some figures and tables in this textbook to more closely match Excel's output.

Notice also the alignment of the text and numbers in Table 3.5. Columns of numerical values in a table should be right-aligned; that is, the final digit of each number should be aligned in the column. This makes it easy to see differences in the magnitude of values. If you are showing digits to the right of the decimal point, all values should include the same number of digits to the right of the decimal. Also, use only the number of digits that are necessary to convey the meaning in comparing the values; there is no need to include additional digits if they are not meaningful for comparisons.
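The alignment rules above are easy to apply when a table is produced programmatically. The sketch below right-aligns numbers so their final digits line up, gives every value the same number of decimal places, and left-aligns text labels so their first letters line up. The function names are hypothetical, and the comma thousands separator is a formatting choice rather than part of the rule.

```python
def format_numbers(values, decimals=0):
    """Right-align numbers with a thousands separator and a fixed number of decimals."""
    texts = [f"{v:,.{decimals}f}" for v in values]
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]


def format_labels(labels):
    """Left-align text labels so their first letters line up."""
    width = max(len(s) for s in labels)
    return [s.ljust(width) for s in labels]


if __name__ == "__main__":
    # Six-month totals for Gossamer Industries (Tables 3.3 and 3.4).
    labels = format_labels(["Costs ($)", "Revenues ($)", "Profits ($)"])
    numbers = format_numbers([326_567, 353_027, 26_460])
    for label, number in zip(labels, numbers):
        print(f"{label}  {number}")
```

Passing `decimals=2` gives every value exactly two digits after the decimal point. For example, `[3.5, 120.25, 47.1]` becomes `"  3.50"`, `"120.25"`, and `" 47.10"`, so the decimal points line up as well.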
In many business applications, we report financial values, in which case we often round to the nearest dollar or include two digits to the right of the decimal if such precision is necessary. Additional digits to the right of the decimal are usually unnecessary. For extremely large numbers, we may prefer to display data rounded to the nearest thousand, ten thousand, or even million. For instance, if we need to include, say, $3,457,982 and $10,124,390 in a table when exact dollar values are not necessary, we could write these as 3,458 and 10,124 and indicate that all values in the table are in units of $1,000.

It is generally best to left-align text values within a column in a table, as in the Revenues by Location (the first) column of Table 3.5. In some cases, you may prefer to center text, but you should do this only if the text values are all approximately the same length. Otherwise, aligning the first letter of each data entry promotes readability. Column headings should either match the alignment of the data in the columns or be centered over the values, as in Table 3.5.

Crosstabulation

A useful type of table for describing data of two variables is a crosstabulation, which provides a tabular summary of data for two variables. To illustrate, consider the following application based on data from Zagat's Restaurant Review. Data on the quality rating, meal price, and the usual wait time for a table during peak hours were collected for a sample of 300 Los Angeles area restaurants. Table 3.6 shows the data for the first ten restaurants.

Types of data such as categorical and quantitative are discussed in Chapter 2.
Quality ratings are an example of categorical data, and meal prices are an example of quantitative data.

TABLE 3.6 QUALITY RATING AND MEAL PRICE FOR 300 LOS ANGELES RESTAURANTS

Restaurant  Quality Rating  Meal Price ($)  Wait Time (min)
         1  Good                        18                5
         2  Very Good                    ?                6
         3  Good                        28                1
         4  Excellent                   38                ?
         5  Very Good                   33                6
         6  Good                        28                5
         7  Very Good                   19                ?
         8  Very Good                    ?                ?
         9  Very Good                    ?               13
        10  Good                         ?                1
(? = illegible in source)

TABLE 3.7 CROSSTABULATION OF QUALITY RATING AND MEAL PRICE FOR 300 LOS ANGELES RESTAURANTS

                          Meal Price
Quality Rating   $10-19   $20-29   $30-39   $40-49   Total
Good                 42       40        2        0      84
Very Good            34       64       46        6     150
Excellent             2       14       28       22      66
Total                78      118       76       28     300

For now, we will limit our consideration to the quality rating and meal price variables. A crosstabulation of the data for quality rating and meal price is shown in Table 3.7. The left and top margin labels define the classes for the two variables. In the left margin, the row labels (Good, Very Good, and Excellent) correspond to the three classes of the quality rating variable. In the top margin, the column labels ($10-19, $20-29, $30-39, and $40-49) correspond to the four classes (or bins) of the meal price variable. Each restaurant in the sample provides a quality rating and a meal price. Thus, each restaurant in the sample is associated with a cell appearing in one of the rows and one of the columns of the crosstabulation. For example, restaurant 5 is identified as having a very good quality rating and a meal price of $33. This restaurant belongs to the cell in row 2 and column 3. In constructing a crosstabulation, we simply count the number of restaurants that belong to each of the cells in the crosstabulation.

Table 3.7 shows that the greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20-29 range. Only two restaurants have an excellent rating and a meal price in the $10-19 range. Similar interpretations of the other frequencies can be made. In addition, note that the right and bottom margins of the crosstabulation give the frequency of quality rating and meal price separately. From the right margin, we see that data on quality ratings show 84 good restaurants, 150 very good restaurants, and 66 excellent restaurants. Similarly, the bottom margin shows the counts for the meal price variable. The value of 300 in the bottom right corner of the table indicates that 300 restaurants were included in this data set.

PivotTables in Excel

A crosstabulation in Microsoft Excel is known as a PivotTable. We will first look at a simple example of how Excel's PivotTable is used to create a crosstabulation of the Zagat's restaurant data shown previously. Figure 3.8 illustrates a portion of the data contained in the file Restaurants; the data for the 300 restaurants in the sample have been entered into cells B2:D301.

To create a PivotTable in Excel, we follow these steps:

Step 1. Click the INSERT tab on the Ribbon
Step 2. Click PivotTable in the Tables group
Step 3. When the Create PivotTable dialog box appears:
  Choose Select a Table or Range
  Enter A1:D301 in the Table/Range: box
  Select New Worksheet as the location for the PivotTable Report
  Click OK

The resulting initial PivotTable Field List and PivotTable Report are shown in Figure 3.9.
FIGURE 3.8 EXCEL WORKSHEET CONTAINING RESTAURANT DATA
[Worksheet: column A Restaurant, column B Quality Rating, column C Meal Price ($), column D Wait Time (min); rows 2-26 show restaurants 1-25]

FIGURE 3.9 INITIAL PIVOTTABLE FIELD LIST AND PIVOTTABLE FIELD REPORT FOR THE RESTAURANT DATA
[Screenshot: empty PivotTable report beside the PivotTable Fields task pane, which lists the fields Restaurant, Quality Rating, Meal Price ($), and Wait Time (min) and the areas for filters, columns, rows, and values]

Each of the four columns in Figure 3.8 [Restaurant, Quality Rating, Meal Price ($), and Wait Time (min)] is considered a field by Excel. Fields may be chosen to represent rows, columns, or values in the body of the PivotTable Report. The following steps show how to use Excel's PivotTable Field List to assign the Quality Rating field to the rows, the Meal Price ($) field to the columns, and the Restaurant field to the body of the PivotTable report.

In Excel versions prior to Excel 2013, the Rows and Columns areas are labeled Row Labels and Column Labels, respectively.

Step 4. In the PivotTable Fields task pane, go to Drag fields between areas below:
  Drag the Quality Rating field to the ROWS area
  Drag the Meal Price ($) field to the COLUMNS area
  Drag the Restaurant field to the VALUES area
Step 5. Click on Sum of Restaurant in the VALUES area
Step 6. Select Value Field Settings from the list of options
Step 7. When the Value Field Settings dialog box appears:
  Under Summarize value field by, select Count
  Click OK

Figure 3.10 shows the completed PivotTable Field List and a portion of the PivotTable worksheet as it now appears. To complete the PivotTable, we need to group the columns representing meal prices and place the row labels for quality rating in the proper order.

FIGURE 3.10 COMPLETED PIVOTTABLE FIELD LIST AND A PORTION OF THE PIVOTTABLE REPORT FOR THE RESTAURANT DATA (COLUMNS H:AK ARE HIDDEN)
[Screenshot: Count of Restaurant with row labels Excellent, Good, and Very Good, one column for each individual meal price, and a Grand Total row and column; the task pane shows Quality Rating in ROWS, Meal Price ($) in COLUMNS, and Count of Restaurant in VALUES]

Step 8. Right-click in cell B4 or any cell containing a meal price column label
Step 9. Select Group from the list of options
Step 10. When the Grouping dialog box appears:
  Enter 10 in the Starting at: box
  Enter 49 in the Ending at: box
  Enter 10 in the By: box
  Click OK
Step 11. Right-click on "Excellent" in cell A5
Step 12. Select Move and click Move "Excellent" to End

FIGURE 3.11 FINAL PIVOTTABLE REPORT FOR THE RESTAURANT DATA
[Screenshot: Count of Restaurant by quality rating (rows Good, Very Good, Excellent) and grouped meal price (columns 10-19, 20-29, 30-39, 40-49), with a Grand Total row and column]

The final PivotTable, shown in Figure 3.11, provides the same information as the crosstabulation in Table 3.7. The values in Figure 3.11 can be interpreted as the frequencies of the data. For instance, row 8 provides the frequency distribution for the data over the quantitative variable of meal price. Seventy-eight restaurants have meal prices of $10 to $19. Column F provides the frequency distribution for the data over the categorical variable of quality. One hundred fifty restaurants have a quality rating of Very Good.
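Steps 8 through 12 bin each meal price into a $10-wide group and order the quality ratings, after which counting proceeds as in Table 3.7. The Python sketch below mirrors that logic outside Excel. It is not how Excel implements PivotTables. The function names are hypothetical, and the sample is the six restaurants in Table 3.6 whose rating and meal price are legible (restaurants 1, 3, 4, 5, 6, and 7).

```python
from collections import Counter

RATINGS = ["Good", "Very Good", "Excellent"]  # order after moving "Excellent" to the end
BINS = ["$10-19", "$20-29", "$30-39", "$40-49"]


def price_bin(price, start=10, end=49, width=10):
    """Group a meal price like Step 10 (Starting at 10, Ending at 49, By 10).

    Excel handles values outside the range differently; this sketch simply
    rejects them.
    """
    if not start <= price <= end:
        raise ValueError(f"price {price} is outside {start}-{end}")
    low = start + (price - start) // width * width
    return f"${low}-{low + width - 1}"


def crosstab(records):
    """Count (rating, price bin) pairs and append row and column totals."""
    counts = Counter((rating, price_bin(price)) for rating, price in records)
    table = {}
    for rating in RATINGS:
        row = [counts[(rating, b)] for b in BINS]
        table[rating] = row + [sum(row)]
    # Column totals; the last entry is the grand total.
    table["Total"] = [sum(table[r][i] for r in RATINGS) for i in range(len(BINS) + 1)]
    return table


# (quality rating, meal price) for restaurants 1, 3, 4, 5, 6, and 7 of Table 3.6.
sample = [("Good", 18), ("Good", 28), ("Excellent", 38),
          ("Very Good", 33), ("Good", 28), ("Very Good", 19)]

print(f"{'':<10}" + "".join(f"{b:>8}" for b in BINS) + f"{'Total':>8}")
for name, row in crosstab(sample).items():
    print(f"{name:<10}" + "".join(f"{v:>8}" for v in row))
```

Run on all 300 restaurants, the same counting would reproduce the frequencies in Table 3.7 and Figure 3.11.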
We can also use a PivotTable to create percent frequency distributions, as shown in the following steps:

Step 1. Click the Count of Restaurant in the VALUES area
Step 2. Select Value Field Settings... from the list of options
Step 3. When the Value Field Settings dialog box appears, click the tab for Show Values As
Step 4. In the Show values as area, select % of Grand Total from the drop-down menu
  Click OK

Figure 3.12 displays the percent frequency distribution for the Restaurant data as a PivotTable. The figure indicates that 50 percent of the restaurants are in the Very Good quality category and that 26 percent have meal prices between $10 and $19.

FIGURE 3.12 PERCENT FREQUENCY DISTRIBUTION AS A PIVOTTABLE FOR THE RESTAURANT DATA

Rating          10-19     20-29     30-39     40-49   Grand Total
Good           14.00%    13.33%     0.67%     0.00%        28.00%
Very Good      11.33%    21.33%    15.33%     2.00%        50.00%
Excellent       0.67%     4.67%     9.33%     7.33%        22.00%
Grand Total    26.00%    39.33%    25.33%     9.33%       100.00%

You can also filter data in a PivotTable by dragging the field that you want to filter the data on to the FILTERS area in the PivotTable Fields task pane.

PivotTables in Excel are interactive, and they may be used to display statistics other than a simple count of items. As an illustration, we can easily modify the PivotTable in Figure 3.11 to display summary information on wait times instead of counts of restaurants.

Step 1. Click the Count of Restaurant field in the VALUES area
  Select Remove Field
Step 2. Drag the Wait Time (min) field to the VALUES area
Step 3. Click on Sum of Wait Time (min) in the VALUES area
Step 4. Select Value Field Settings... from the list of options
Step 5. When the Value Field Settings dialog box appears:
  Under Summarize value field by, select Average
  Click Number Format
  In the Category: area, select Number
  Enter 1 for Decimal places:
  Click OK
  When the Value Field Settings dialog box reappears, click OK

The completed PivotTable appears in Figure 3.13. This PivotTable replaces the counts of restaurants with values for the average wait time for a table at a restaurant for each grouping of meal prices ($10-19, $20-29, $30-39, $40-49). For instance, cell B7 indicates that the average wait time for a table at an Excellent restaurant with a meal price of $10-$19 is 25.5 minutes. Column F displays the overall average wait time for tables in each quality rating category. We see that Excellent restaurants have the longest average waits of 35.2 minutes and that Good restaurants have average wait times of only 2.5 minutes. Finally, cell D7 shows us that the longest wait times can be expected at Excellent restaurants with meal prices in the $30-$39 range (34 minutes).

FIGURE 3.13 PIVOTTABLE REPORT FOR THE RESTAURANT DATA WITH AVERAGE WAIT TIMES ADDED
[Screenshot: Average of Wait Time (min) by quality rating (rows Good, Very Good, Excellent) and grouped meal price (columns 10-19, 20-29, 30-39, 40-49), with a Grand Total row and column]

We can also examine only a portion of the data in a PivotTable using the Filter option in Excel. To filter data in a PivotTable, click on the Filter Arrow next to Row Labels or Column Labels and then uncheck the values that you want to remove from the PivotTable. For example, we could click on the arrow next to Row Labels and then uncheck the Good value to examine only Very Good and Excellent restaurants.

The steps for modifying and formatting charts have been changed in Excel 2013. The steps shown here apply to Excel 2013. In Excel versions prior to Excel 2013, most chart formatting options can be found in the Layout tab in the Chart Tools Ribbon. In previous versions of Excel, this is where you will find options for adding a Chart Title, Axis Titles, Data Labels, and so on.
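The % of Grand Total option in the percent frequency steps divides every cell, row total, and column total by the overall count. The sketch below reproduces Figure 3.12 from the Table 3.7 counts. It is a stand-in for Excel's feature, not a description of how Excel computes it, and `percent_of_grand_total` is a hypothetical name.

```python
# Counts from Table 3.7: rows are quality ratings, columns are the
# meal price bins $10-19, $20-29, $30-39, $40-49.
COUNTS = {
    "Good": [42, 40, 2, 0],
    "Very Good": [34, 64, 46, 6],
    "Excellent": [2, 14, 28, 22],
}


def percent_of_grand_total(counts):
    """Express every cell, row total, and column total as a percent of the grand total."""
    grand = sum(sum(row) for row in counts.values())
    table = {name: [100 * v / grand for v in row + [sum(row)]]
             for name, row in counts.items()}
    column_totals = [sum(col) for col in zip(*counts.values())]
    table["Grand Total"] = [100 * v / grand for v in column_totals + [grand]]
    return table


for name, row in percent_of_grand_total(COUNTS).items():
    print(f"{name:<12}" + "".join(f"{p:>9.2f}%" for p in row))
```

The last column gives the percent frequency distribution of quality rating (Very Good is 50 percent). The last row gives the distribution of meal price (26 percent in the $10-19 bin).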
3.3 Charts

Charts (or graphs) are visual methods of displaying data. In this section, we introduce some of the most commonly used charts to display and analyze data, including scatter charts, line charts, and bar charts. Excel is the most commonly used software package for creating simple charts. We explain how to use Excel to create scatter charts, line charts, sparklines, bar charts, bubble charts, and heat maps. In the chapter appendix, we demonstrate the use of the Excel add-in XLMiner to create a scatter chart matrix and a parallel coordinates plot.

Scatter Charts

A scatter chart (introduced in Chapter 2) is a graphical presentation of the relationship between two quantitative variables. As an illustration, consider the advertising/sales relationship for an electronics store in San Francisco. On ten occasions during the past three months, the store used weekend television commercials to promote sales at its stores. The managers want to investigate whether a relationship exists between the number of commercials shown and sales at the store the following week. Sample data for the ten weeks, with sales in hundreds of dollars, are shown in Table 3.8.

TABLE 3.8 SAMPLE DATA FOR THE SAN FRANCISCO ELECTRONICS STORE

Week   Number of Commercials (x)   Sales ($100s) (y)
   1                           2                  50
   2                           5                  57
   3                           1                  41
   4                           3                  54
   5                           4                  54
   6                           1                  38
   7                           5                  63
   8                           3                  48
   9                           4                  59
  10                           2                  46

We will use the data from Table 3.8 to create a scatter chart using Excel's chart tools and the data in the file Electronics:

Step 1. Select cells B2:C11
Step 2. Click the INSERT tab in the Ribbon
Step 3. Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
Step 4. When the list of scatter chart subtypes appears, click the Scatter button
Step 5. Click the DESIGN tab under the CHART TOOLS Ribbon
Step 6. Click Add Chart Element in the Chart Layouts group
  Select Chart Title, and click Above Chart
  Click on the text box above the chart, and replace the text with Scatter Chart for the San Francisco Electronics Store
Step 7. Click Add Chart Element in the Chart Layouts group
  Select Axis Title, and click Primary Horizontal
  Click on the text box under the horizontal axis, and replace "Axis Title" with Number of Commercials
Step 8. Click Add Chart Element in the Chart Layouts group
  Select Axis Title, and click Primary Vertical
  Click on the text box next to the vertical axis, and replace "Axis Title" with Sales ($100s)
Step 9. Right-click on one of the horizontal grid lines in the body of the chart, and click Delete
Step 10. Right-click on one of the vertical grid lines in the body of the chart, and click Delete

Hovering the pointer over the chart type buttons in Excel 2013 will display the names of the buttons and short descriptions of the types of chart.

Steps 9 and 10 are optional, but they improve the chart's readability. We would want to retain the gridlines only if they were important for the reader to determine more precisely where data points are located relative to certain values on the horizontal and/or vertical axes.

Scatter charts are often referred to as scatter plots or scatter diagrams. Similarly, a line chart for time series data is often called a time series plot.

We can also use Excel to add a trendline to the scatter chart. A trendline is a line that provides an approximation of the relationship between the variables. To add a linear trendline using Excel, we use the following steps:

Step 1. Right-click on one of the data points in the scatter chart, and select Add Trendline...
Step 2. When the Format Trendline task pane appears, select Linear under TRENDLINE OPTIONS

In Excel 2013, Step 1 of this procedure should open the Format Trendline task pane. In previous versions of Excel, it opens the Format Trendline dialog box, where you can select Linear under Trend/Regression Type.

Figure 3.14 shows the scatter chart and linear trendline created with Excel for the data in Table 3.8. The number of commercials (x) is shown on the horizontal axis, and sales (y) are shown on the vertical axis. For week 1, x = 2 and y = 50. A point is plotted on the scatter chart at those coordinates; similar points are plotted for the other nine weeks. Note that during two of the weeks, one commercial was shown, during two of the weeks, two commercials were shown, and so on.

FIGURE 3.14 SCATTER CHART FOR THE SAN FRANCISCO ELECTRONICS STORE
[Worksheet columns Week, No. of Commercials, and Sales Volume beside a scatter chart titled "Scatter Chart for the San Francisco Electronics Store" with a linear trendline; horizontal axis: Number of Commercials; vertical axis: Sales ($100s)]

The completed scatter chart in Figure 3.14 indicates a positive linear relationship (or positive correlation) between the number of commercials and sales: higher sales are associated with a higher number of commercials. The linear relationship is not perfect because not all of the points are on a straight line. However, the general pattern of the points and the trendline suggest that the overall relationship is positive. From Chapter 2, we know that this implies that the covariance between sales and commercials is positive and that the correlation coefficient between these two variables is between 0 and +1.
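Excel's Linear trendline is the ordinary least-squares line through the points. As a check on the discussion above, the sketch below computes that line, the sample covariance, and the correlation coefficient for the Table 3.8 data using only the Python standard library. The function names are illustrative.

```python
import math

# Table 3.8: number of commercials (x) and sales in $100s (y) for ten weeks.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    sx = math.sqrt(sample_covariance(x, x))
    sy = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (sx * sy)


slope, intercept = least_squares_line(x, y)
print(f"trendline: y = {intercept:.2f} + {slope:.2f}x")
print(f"covariance = {sample_covariance(x, y):.2f}, correlation = {correlation(x, y):.2f}")
```

For these data, the trendline is approximately y = 36.15 + 4.95x, the sample covariance is 11, and the correlation is about 0.93. That is positive and between 0 and +1, as the scatter chart suggests.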
TABLE3.9 MONTHLY SALES DATA OF AIR COMPRESSORS AT KIRKLAND INDUSTRIES Month Sales ($1005) Jan 150 Feb 4s Mar 185 Ape 195 May: 170 Jun 1s nl 210 kirkland Be eS Sep 10 Ot bo Nov us Dec 20 wirts are similar fo scatter charts, but @ Hine connects the points in the chart, Line charts are very useful for time series data collected over @ period of time (minutes, hours, days, years, ete,). Asan example, Kirkland Industries sells air compressors to manufacturing companies. Table 3.9 contains total sales amounts (in $100s) for air compressors during each month in the most recent calendar year. Figure 3.15 displays a scatter chart and a line chart created in Excel for these sales data, The line chart connects the points ofthe seater chart. The addition of lines between the points suggests continuity, and it is easier for the "Bei args ‘Sogo wane 20 88 Chapter 3. Deta Visualization Inthe line chart FIGURE 3.15 SCATTER CHART AND LINE CHART FOR MONTHLY SALES DATA AT in Figure 3.15, KIRKLAND INDUSTRIES we have kp he markers at each dia point. Tis ‘Scatter Chart for Monthly Sales Data Line Chart for Monthly Sales Data isa rmater of 250 250, personal taste, It rersving the markers tend to ee suggest rh the data are conina- ‘ns shen infact we han only one data pont pe ‘month. 200] 150] 100] Sales ($1005) Seles ($1005) so} "SESE SSSFOSS "PES ESSISASSF ‘Dt era 8 To cteate the fine chart in Figure 3.15 in Excel, we follow these steps weehille Step 1. Select cells A2:B13, lep 2 Click the INSERT tab on the Ribbon ‘Kiana Step 3. Click the Insert Line Chart butten 320 in the Charts group Step 4. When the list of Ime chart subtypes appears, click the Lane with Markers Inversion of vel prior ition (2% unde 50 Lie 10 Excel 2013, you can ‘add tits oat aad aor ‘This creates a line chart for sales with a basic layout and minimum ‘tart abel by liking on formatting ‘ne Lasoutabinthe Chan ‘Stepp. Select the line char that was just created to reveal the CHART TOOLS Tos Rion. 
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 6. Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace "Axis Title" with Sales ($100s)
Step 7. Click Add Chart Element in the Chart Layouts group
Select Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace "Chart Title" with Line Chart for Monthly Sales Data
Step 8. Right-click on one of the horizontal lines in the chart, and click Delete

Excel assumes that line charts will be used to graph time series data. The Line Chart tool in Excel is the most intuitive for creating charts that include text entries for the horizontal axis (for example, the month labels of Jan, Feb, Mar, etc. for the monthly sales data in Figures 3.15 and 3.16). When the horizontal axis represents numerical values (1, 2, 3, etc.), it is easier to go to the Charts group under the INSERT tab in the Ribbon, click the Insert Scatter (X,Y) or Bubble Chart button, and then choose the Scatter with Straight Lines and Markers button.

Line charts can also be used to graph multiple lines. Suppose we want to break out Kirkland's sales data by region (North and South), as shown in Table 3.10. We can create a line chart in Excel that shows sales in both regions, as in Figure 3.16, by following similar steps but selecting cells A2:C14 in the file KirklandRegional before creating the line chart. Figure 3.16 shows an interesting pattern. Sales in both the North and South regions seemed to follow the same increasing/decreasing pattern until October. Starting in October, sales in the North continued to decrease while sales in the South increased. We would probably want to investigate any changes that occurred in the North region around October.
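The October observation can also be checked with numbers rather than by eye, by listing each region's month-to-month changes. This is a minimal sketch using the Table 3.10 values; the helper names `monthly_changes` and `changes_from` are hypothetical.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Table 3.10, sales in $100s
north = [95, 100, 120, 115, 100, 85, 135, 110, 100, 50, 40, 40]
south = [40, 45, 55, 65, 60, 50, 75, 65, 60, 70, 75, 80]

def monthly_changes(series):
    """Change from each month to the next: series[i + 1] - series[i]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

def changes_from(series, month):
    """Month-to-month changes, starting with the change INTO `month`."""
    start = MONTHS.index(month)
    if start == 0:
        raise ValueError("there is no change into the first month")
    return monthly_changes(series)[start - 1:]

print(changes_from(north, "Oct"))  # [-50, -10, 0]
print(changes_from(south, "Oct"))  # [10, 5, 5]
```

In these figures the North falls in October and November and holds flat in December, while the South rises in each of the last three months.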
In the line chart in Figure 3.16, we have replaced Excel's default legend with text boxes labeling the lines corresponding to sales in the North and South. This can often make the chart look cleaner and easier to interpret.

WEBfile: KirklandRegional

TABLE 3.10 REGIONAL SALES DATA BY MONTH FOR AIR COMPRESSORS AT KIRKLAND INDUSTRIES

         Sales ($100s)
Month    North   South
Jan      95      40
Feb      100     45
Mar      120     55
Apr      115     65
May      100     60
Jun      85      50
Jul      135     75
Aug      110     65
Sep      100     60
Oct      50      70
Nov      40      75
Dec      40      80

[Figure 3.16: Line Chart of Regional Sales Data at Kirkland Industries; vertical axis: Sales ($100s); horizontal axis: Jan to Dec; one line each for North and South]

A special type of line chart is a sparkline, which is a minimalist type of line chart that can be placed directly into a cell in Excel. Sparklines contain no axes; they display only the line for the data. Sparklines take up very little space, and they can be used effectively to provide information on overall trends for time series data. Figure 3.17 illustrates the use of sparklines in Excel for the regional sales data. To create a sparkline in Excel:
Step 1. Click the INSERT tab on the Ribbon
Step 2. Click Line in the Sparklines group
Step 3. When the Create Sparklines dialog box opens:
Enter B3:B14 in the Data Range: box
Enter B15 in the Location Range: box
Click OK
Step 4. Copy cell B15 to cell C15
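Excel draws sparklines itself; the sketch below only imitates the idea in plain text, to show why a sparkline conveys trend but not magnitude. Each value is scaled between the series' own minimum and maximum and mapped to one of eight Unicode block characters. The character set and the rounding rule are our own choices, not Excel's.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight bar heights, lowest to highest

def sparkline(values):
    """Text sparkline: one block character per value, scaled between the
    series' own minimum and maximum (no axes, so magnitude is lost)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BLOCKS[0] * len(values)  # flat series: every bar at the bottom
    top = len(BLOCKS) - 1
    # int(x + 0.5) rounds halves up, which keeps the mapping predictable
    return "".join(BLOCKS[int((v - lo) / (hi - lo) * top + 0.5)] for v in values)

north = [95, 100, 120, 115, 100, 85, 135, 110, 100, 50, 40, 40]  # Table 3.10
south = [40, 45, 55, 65, 60, 50, 75, 65, 60, 70, 75, 80]
print("North", sparkline(north))
print("South", sparkline(south))
```

Because each line is scaled to its own range, multiplying every North value by 10 produces exactly the same sparkline, which is why the sparklines in cells B15 and C15 show the trend of each region but not how large its sales are.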
[Figure 3.17: Sparklines for the Regional Sales Data at Kirkland Industries; worksheet with Month, North, and South sales ($100s) in rows 2 to 14, sparklines in cells B15 and C15, and the Create Sparklines dialog box]

The sparklines in cells B15 and C15 do not indicate the magnitude of sales in the North and South regions, but they do show the overall trend for these data. Sales in the North appear to be decreasing in recent months, and sales in the South appear to be increasing overall. Because sparklines are input directly into the cell in Excel, we can also type text directly into the same cell that will then be overlaid on the sparkline, or we can add shading to the cell, which will appear as the background. In Figure 3.17, we have shaded cells B15 and C15 to highlight the sparklines. As can be seen, sparklines provide an efficient and simple way to display basic information about a time series.

Bar Charts and Column Charts

Excel differentiates between bar and column charts based on whether the bars are horizontal or vertical, so we maintain these definitions. However, in common usage, both of these types of charts may be referred to as bar charts.

WEBfile: AccountsManaged

Bar charts and column charts provide a graphical summary of categorical data. Bar charts use horizontal bars to display the magnitude of the quantitative variable. Column charts use vertical bars to display the magnitude of the quantitative variable. Bar and column charts are very helpful in making comparisons between categorical variables. Consider the regional supervisor who wants to examine the number of accounts being handled by each manager. Figure 3.18 shows a bar chart created in Excel displaying these data. To create this bar chart in Excel:
Step 1.
Select cells A2:B9
Step 2. Click the INSERT tab on the Ribbon
Step 3. Click the Insert Bar Chart button in the Charts group
Step 4. When the list of bar chart subtypes appears:
Click the Clustered Bar button in the 2-D Bar section
Step 5. Select the bar chart that was just created to reveal the CHART TOOLS Ribbon
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 6. Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Horizontal
Click on the text box under the horizontal axis, and replace "Axis Title" with Accounts Managed
Step 7. Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace "Axis Title" with Manager
Step 8. Click Add Chart Element in the Chart Layouts group
Select Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace "Chart Title" with Bar Chart of Accounts Managed
Step 9. Right-click on one of the vertical lines in the chart, and click Delete

[Figure 3.18: Bar Charts for Accounts Managed Data; worksheet with Manager (Davis, Edwards, Francois, Gentry, Jones, Lopez, Smith, Williams) and Accounts Managed in cells A1:B9, and a bar chart titled "Bar Chart of Accounts Managed" with horizontal axis Accounts Managed and vertical axis Manager]

From Figure 3.18 we can see that Gentry manages the greatest number of accounts and Williams the fewest. We can make this bar chart even easier to read by ordering the results by the number of accounts managed. We can do this with the following steps:
Step 1. Select cells A1:B9
Step 2. Right-click any of the cells A1:B9
Choose Sort
Click Custom Sort
Step 3.
When the Sort dialog box appears:
Make sure that the check box for My data has headers is checked
Choose Accounts Managed in the Sort by box under Column
Choose Smallest to Largest under Order
Click OK

[Figure 3.19: Sorted Bar Chart for Accounts Managed Data; managers listed in order of accounts managed, from Williams (fewest) to Gentry (most)]

[Figure 3.20: Bar Chart with Data Labels for Accounts Managed Data; the sorted bar chart with the number of accounts shown at the end of each bar]

In the completed bar chart in Excel, shown in Figure 3.19, we can easily compare the relative number of accounts managed for all managers. However, note that it is difficult to interpret from the bar chart exactly how many accounts are assigned to each manager. If this information is necessary, these data are better presented as a table or by adding data labels to the bar chart, as in Figure 3.20, which is created in Excel using the following steps:
Step 1. Select the bar chart just created to reveal the CHART TOOLS Ribbon
Step 2. Click the DESIGN tab in the CHART TOOLS Ribbon
Step 3. Click Add Chart Element in the Chart Layouts group
Select Data Labels
Click Outside End
This adds labels of the number of accounts managed to the end of each bar so that the reader can easily look up exact values displayed in the bar chart.

Alternatively, you can add Data Labels by right-clicking on a bar in the chart and selecting Add Data Label.
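The effect of the sort and of Outside End data labels can be imitated with a small text bar chart. This is a sketch under stated assumptions: the manager names and counts are hypothetical (not the Figure 3.18 data), and the top-to-bottom order reflects our understanding that Excel, by default, draws the first worksheet row nearest the horizontal axis, so sorting smallest to largest puts the largest bar at the top of the chart.

```python
def text_bar_chart(counts, width=20):
    """Horizontal text bar chart with an 'outside end' label on each bar.

    Rows are sorted smallest to largest (as in the Custom Sort step) and then
    printed in reverse, so the largest value appears at the top, as it does
    in the sorted Excel bar chart."""
    ordered = sorted(counts.items(), key=lambda item: item[1])  # smallest first
    largest = ordered[-1][1]
    name_width = max(len(name) for name in counts)
    lines = []
    for name, value in reversed(ordered):  # top row of the chart first
        bar = "#" * round(value / largest * width)  # bar length scaled to the maximum
        lines.append(f"{name:<{name_width}} | {bar} {value}")  # exact value at bar end
    return lines

# Hypothetical accounts-managed data
accounts = {"Ames": 12, "Bell": 7, "Cruz": 19, "Diaz": 15}
for line in text_bar_chart(accounts):
    print(line)
```

The bars alone make the ranking easy to read; the number printed after each bar plays the role of the data label, giving the exact count that bar length alone does not convey.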
A Note on Pie Charts and 3-D Charts

Pie charts are another common form of chart used to compare categorical data. However, many experts argue that pie charts are inferior to bar charts for comparing data. The pie chart in Figure 3.21 displays the data for the number of accounts managed in Figure 3.18. Visually, it is still relatively easy to see that Gentry has the greatest number of accounts and that Williams has the fewest. However, it is difficult to say whether Lopez or Francois has more accounts. Research has shown that people find it very difficult to perceive differences in area. Compare Figure 3.21 to Figure 3.19. Making visual comparisons is much easier in the bar chart than in the pie chart (particularly when using a limited number of colors for differentiation). Therefore, we recommend against using pie charts in most situations and suggest bar charts for comparing categorical data instead.

Because of the difficulty in visually comparing area, many experts also recommend against the use of three-dimensional (3-D) charts in most settings. Excel makes it very easy to create 3-D bar, line, pie, and other types of charts. In most cases, however, the 3-D effect simply adds unnecessary detail that does not help explain the data. As an alternative, consider the use of multiple lines on a line chart (instead of adding a z-axis), employing multiple charts, or creating bubble charts where the size of the bubble can represent the z-axis value. Never use a 3-D chart when a two-dimensional chart will suffice.

Bubble Charts

A bubble chart is a graphical means of visualizing three variables in a two-dimensional graph and is therefore sometimes a preferred alternative to a 3-D graph. Suppose that we want to compare the number of billionaires in various countries. Table 3.11 provides a sample of six countries, showing, for each country, the number of billionaires per 10 million residents, the per capita income, and the total number of billionaires.
[Figure 3.21: Pie Chart of Accounts Managed; legend: Davis, Edwards, Francois, Gentry, Jones, Lopez, Smith, Williams]

WEBfile: Billionaires

TABLE 3.11 SAMPLE DATA ON BILLIONAIRES PER COUNTRY

                  Billionaires per   Per Capita   Number of
Country           10M Residents      Income       Billionaires
United States     13.2               48,300       412
China             0.9                8,300        115
Russia            7.1                15,800       101
Mexico            0.4                14,900       40
Hong Kong         51.0               49,300       36
United Kingdom    5.3                35,000       33

We can create a bubble chart using Excel to further examine these data:
Step 1. Select cells B2:D7
Step 2. Click the INSERT tab on the Ribbon
Step 3. In the Charts group, click the Insert Scatter (X,Y) or Bubble Chart button
In the Bubble subgroup, click the Bubble button
Step 4. Select the chart that was just created to reveal the CHART TOOLS Ribbon
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 5. Click Add Chart Element in the Chart Layouts group
Choose Axis Title from the drop-down menu
Click Primary Horizontal
Click on the text box under the horizontal axis, and replace "Axis Title" with Billionaires per 10 Million Residents
Step 6. Click Add Chart Element in the Chart Layouts group
Choose Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace "Axis Title" with Per Capita Income
Step 7. Click Add Chart Element in the Chart Layouts group
Choose Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace "Chart Title" with Billionaires by Country
Step 8. Click Add Chart Element in the Chart Layouts group
Choose Gridlines from the drop-down menu
Deselect Primary Major Horizontal and Primary Major Vertical to remove the gridlines from the bubble chart

In Excel 2013, Step 9 opens a task pane for Format Data Labels on the right-hand side of the Excel window.

Figure 3.22 shows the bubble chart that has been created by Excel. However, we can make this chart much more informative by taking a few additional steps to add the names of the countries:
Step 9.
Click Add Chart Element in the Chart Layouts group
Choose Data Labels from the drop-down menu
Click More Data Label Options...
Step 10. Click the Label Options icon
Under LABEL OPTIONS, select Value From Cells, and click the Select Range button
Step 11. When the Data Label Range dialog box opens, select cells A2:A7 in the Worksheet. This will enter the value "=SheetName!$A$2:$A$7" into the Select Data Label Range box, where "SheetName" is the name of the active Worksheet
Click OK
Step 12. In the Format Data Labels task pane, deselect Y Value in the LABEL OPTIONS area, and select Right under Label Position

[Figure 3.22: Bubble Chart Comparing Billionaires by Country; worksheet of Table 3.11 in cells A1:D7 and a bubble chart titled "Billionaires by Country" with horizontal axis Billionaires per 10 Million Residents and vertical axis Per Capita Income]

Steps 9 to 12 add the name of each country to the right of the appropriate bubble in the bubble chart. The process of adding text data labels to bubble charts is greatly improved in Excel 2013. In prior versions of Excel, the task is quite time-consuming, so we do not describe the specific steps here.

The completed bubble chart in Figure 3.23 enables us to easily associate each country with the corresponding bubble.
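In a bubble chart the third variable is carried by bubble size, and the fair way to do that is to make bubble area, not width, proportional to the value. As we understand it, this matches Excel's default "Size represents: Area of bubbles" setting in the Format Data Series pane. The sketch below computes radii that way for the Number of Billionaires column of Table 3.11; `bubble_radii` and the 30-unit maximum radius are our own illustrative choices.

```python
from math import pi, sqrt

def bubble_radii(sizes, max_radius=30.0):
    """Radius for each bubble so that bubble AREA is proportional to size.

    Area = pi * r**2, so the radius must grow with the square root of the
    size; the largest size is drawn with max_radius."""
    largest = max(sizes)
    return [max_radius * sqrt(s / largest) for s in sizes]

# Number of billionaires, Table 3.11
billionaires = {"United States": 412, "China": 115, "Russia": 101,
                "Mexico": 40, "Hong Kong": 36, "United Kingdom": 33}

radii = bubble_radii(list(billionaires.values()))
for country, r in zip(billionaires, radii):
    print(f"{country:<15} radius {r:5.1f}  area {pi * r ** 2:8.1f}")
```

The United States has about 11 times as many billionaires as Hong Kong, so its bubble has about 11 times the area but only about 3.4 times the radius. Scaling the radius directly by the value would instead exaggerate the difference in area roughly elevenfold again.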
The figure indicates that Hong Kong has the most billionaires per 10 million residents but that the United States has many more billionaires overall (Hong Kong has a much smaller population than the United States). From the relative bubble sizes, we see that China and Russia also have many billionaires but that there are relatively few billionaires per 10 million residents in these countries and that these countries overall have low per capita incomes. The United Kingdom, United States, and Hong Kong all have much higher per capita incomes. Bubble charts can be very effective for comparing categorical variables on two different quantitative values.

[Figure 3.23: Bubble Chart Comparing Billionaires by Country with Data Labels Added; each bubble is labeled with its country name]

Heat Maps

A heat map is a two-dimensional graphical representation of data that uses different shades of color to indicate magnitude. Figure 3.24 shows a heat map indicating the magnitude of changes for a metric called same-store sales, which are commonly used in the retail industry to measure trends in sales. The cells shaded grey in Figure 3.24 (which are shaded red in the full-color version) indicate declining same-store sales for the month, and cells shaded blue indicate increasing same-store sales for the month. Column N in Figure 3.24 also contains sparklines for the same-store sales data.

Both the heat map and the sparklines described here can also be created using the Quick Analysis button in Excel 2013. To display this button, select cells B2:M17. The Quick Analysis button will appear at the bottom right of the selected cells. Click the button to display options for heat maps, sparklines, and other data analysis tools.

WEBfile: SameStoreSales

Figure 3.24 can be created in Excel by following these steps:
Step 1. Select cells B2:M17
Step 2. Click the HOME tab on the Ribbon
Step 3. Click Conditional Formatting in the Styles group
Choose Color Scales and click on Blue-White-Red Color Scale
To add the sparklines in Column N, we use the following steps:
Step 4. Select cell N2
Step 5. Click the INSERT tab on the Ribbon
Step 6. Click Line in the Sparklines group
Step 7. When the Create Sparklines dialog box opens:
Enter B2:M2 in the Data Range: box
Enter N2 in the Location Range: box and click OK
Step 8. Copy cell N2 to N3:N17

[Figure 3.24: Heat Map and Sparklines for Same-Store Sales Data; one row per store location (including Austin, Chicago, Cincinnati, and Pittsburgh), columns JAN to DEC, and a SPARKLINES column]

The heat map in Figure 3.24 helps the reader to easily identify trends and patterns. We can see that Austin has had positive increases throughout the year, while Pittsburgh has had consistently negative same-store sales results. Same-store sales at Cincinnati started the year negative but then became increasingly positive after May. In addition, we can differentiate between strong positive increases in Austin and less substantial positive increases in Chicago by means of color shadings. A sales manager could use the heat map in Figure 3.24 to identify stores that may require intervention and stores that may be used as models.
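A three-color scale like Blue-White-Red can be sketched as two linear blends: lowest value to red, a midpoint to white, highest value to blue. As we understand Excel's default for three-color scales, the midpoint is the 50th percentile (the median), and the scale is computed over the whole selected range (B2:M17 here), so all cells should be passed in as one flat list. The pure red, white, and blue RGB values below are an assumption for illustration; Excel's own shades are softer. The change values are hypothetical.

```python
from statistics import median

RED, WHITE, BLUE = (255, 0, 0), (255, 255, 255), (0, 0, 255)

def blend(c1, c2, t):
    """Linearly interpolate between RGB colors c1 (t = 0) and c2 (t = 1)."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def blue_white_red(values):
    """Lowest value -> red, median -> white, highest -> blue; other values
    are shaded proportionally along each half of the scale."""
    lo, mid, hi = min(values), median(values), max(values)
    colors = []
    for v in values:
        if v <= mid:
            # if the minimum equals the median, v is the median itself: white
            t = 1.0 if mid == lo else (v - lo) / (mid - lo)
            colors.append(blend(RED, WHITE, t))
        else:
            t = (v - mid) / (hi - mid)  # hi > mid whenever some v > mid
            colors.append(blend(WHITE, BLUE, t))
    return colors

def to_hex(rgb):
    """Format an (r, g, b) tuple as a hex color string such as #FF0000."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)

# Hypothetical monthly same-store sales changes (%)
changes = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([to_hex(c) for c in blue_white_red(changes)])
```

Because the midpoint is a percentile rather than zero, white marks the typical cell, not "no change"; if most stores were declining, some small declines could be shaded toward blue. That is one reason the sparklines are a useful companion to the colors.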
Heat maps can be used effectively to convey data over different areas, across time, or both, as seen here. Because heat maps depend strongly on the use of color to convey information, one must be careful to make sure that the colors can be easily differentiated and that they do not become overwhelming. To avoid problems with interpreting differences in color, we can add the sparklines in Column N of Figure 3.24. The sparklines clearly show the overall trend (increasing or decreasing) for each location. However, we cannot gauge differences in the magnitudes of increases and decreases among locations using sparklines. The combination of a heat map and sparklines here is a particularly effective way to show both trend and magnitude.

Additional Charts for Multiple Variables

WEBfile: KirklandRegional

Figure 3.25 provides an alternative display for the regional sales data of air compressors for Kirkland Industries. The figure uses a stacked column chart to display the North and the South regional sales data previously shown in a line chart in Figure 3.16. We could also use a stacked bar chart to display the same data by using horizontal bars instead of vertical. To create the stacked column chart shown in Figure 3.25, we use the following steps:

Note that here we have not included the additional steps for formatting the chart in Excel, but the steps are similar to those used to create the previous charts.

Step 1. Select cells A2:C14
Step 2. Click the INSERT tab on the Ribbon
Step 3.
In the Charts group, click the Insert Column Chart button
Click the Stacked Column button under the 2-D Column

[Figure 3.25: Stacked Column Chart for Regional Sales Data for Kirkland Industries; North and South sales ($100s) stacked in one column per month, Jan to Dec]

Clustered column (bar) charts are also referred to as side-by-side column (bar) charts.

Stacked column and bar charts allow the reader to compare the relative values of quantitative variables for the same category in a bar chart. However, stacked column and bar charts suffer from the same difficulties as pie charts because the human eye has difficulty perceiving small differences in areas. As a result, experts often recommend against the use of stacked column and bar charts for more than a couple of quantitative variables in each category. An alternative chart for these same data is called a clustered column (or bar) chart. It is created in Excel following the same steps but selecting Clustered Column under the 2-D Column in Step 3. Clustered column and bar charts are often superior to stacked column and bar charts for comparing quantitative variables, but they can become cluttered for more than a few quantitative variables per category.

An alternative that is often preferred to both stacked and clustered charts, particularly when many quantitative variables need to be displayed, is to use multiple charts. For the regional sales data, we would include two column charts: one for sales in the North and one for sales in the South. For additional regions, we would simply add additional column charts. To facilitate comparisons between the data displayed in each chart, it is important to maintain consistent axes from one chart to another.
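Keeping consistent axes across multiple charts amounts to choosing one vertical range from all the series together and applying it to every chart, rather than letting each chart pick its own. A minimal sketch, assuming non-negative data (so the axis starts at 0) and a rounding step of 20, which we chose so the result matches the 0 to 140 range used for the Kirkland charts; the function name is hypothetical.

```python
import math

def shared_axis_range(*series, step=20):
    """One (minimum, maximum) vertical-axis range for every chart: starts at 0
    and ends at the largest value in ANY series, rounded up to a multiple of step."""
    largest = max(max(s) for s in series)
    return 0, math.ceil(largest / step) * step

north = [95, 100, 120, 115, 100, 85, 135, 110, 100, 50, 40, 40]  # Table 3.10
south = [40, 45, 55, 65, 60, 50, 75, 65, 60, 70, 75, 80]

print(shared_axis_range(north, south))  # (0, 140)
```

If each chart were scaled separately, the South chart would top out near 80 and its columns would look as tall as the North's, hiding the fact that the North sells more in most months.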
The categorical variables should be listed in the same order in each chart, and the axis for the quantitative variable should have the same range. For instance, the vertical axis for both North and South sales starts at 0 and ends at 140. This makes it easy to see that, in most months, the North region has greater sales. Figure 3.26 compares the stacked, clustered, and multiple column chart approaches for the regional sales data.

Figure 3.26 shows that the multiple column charts require considerably more space than the stacked and clustered column charts. However, when comparing many quantitative variables, using multiple charts can often be superior even if each chart must be made smaller. Stacked column and bar charts should be used only when comparing a few quantitative variables and when there are large differences in the relative values of the quantitative variables within the category.

An especially useful chart for displaying multiple variables is the scatter chart matrix. Table 3.12 contains a partial listing of the data for each of New York City's subboroughs (a designation of a community within New York City) on monthly median rent, percentage of college graduates, poverty rate, and mean travel time to work. Suppose we want to examine the relationship between these different variables.
[Figure 3.26: Comparing Stacked, Clustered, and Multiple Column Charts for the Regional Sales Data for Kirkland Industries; panels: Stacked Column Chart, Clustered Column Chart, and Multiple Column Charts (one for North, one for South); vertical axes: Sales ($100s)]

[Table 3.12: Data for New York City Subboroughs (partial listing); columns: Area, Median Monthly Rent ($), Percentage College Graduates (%), Poverty Rate (%), Travel Time (min)]

[Figure 3.27: Scatter Chart Matrix for New York City Rent Data; rows and columns 1 to 4: MedianRent, CollegeGraduates, PovertyRate, CommuteTime]

The scatter charts along the diagonal in a scatter chart matrix (for instance, in row 1, column 1 and in row 2, column 2) display the relationship between a variable and itself. Therefore, the points in these scatter charts will always fall along a straight line at a 45-degree angle, as shown in Figure 3.27.

Figure 3.27 displays a scatter chart matrix (scatter plot matrix) for data related to rentals in New York City. A scatter chart matrix allows the reader to easily see the relationships among multiple variables. Each scatter chart in the matrix is created in the same manner as for creating a single scatter chart. Each column and row in the scatter chart matrix corresponds to one variable. For instance, row 1 and column 1 in Figure 3.27 correspond to the median monthly rent variable. Row 2 and column 2 correspond to the percentage of college graduates variable. Therefore, the scatter chart shown in row 1, column 2 shows
the relationship between median monthly rent (on the y-axis) and the percentage of college graduates (on the x-axis) in New York City subboroughs. The scatter chart shown in row 2, column 3 shows the relationship between the percentage of college graduates (on the y-axis) and poverty rate (on the x-axis).

Figure 3.27 allows us to infer several interesting findings. Because the points in the scatter chart in row 1, column 2 generally get higher moving from left to right, this tells us that subboroughs with higher percentages of college graduates appear to have higher median monthly rents. The scatter chart in row 1, column 3 indicates that subboroughs with higher poverty rates appear to have lower median monthly rents. The data in row 2, column 3 show that subboroughs with higher poverty rates tend to have lower percentages of college graduates. The scatter charts in column 4 show that the relationships between the mean travel time and the other variables are not as clear as relationships in other columns.

The scatter chart matrix is very useful in analyzing relationships among variables. Unfortunately, it is not possible to generate a scatter chart matrix using native Excel functionality. In the appendix at the end of this chapter, we demonstrate how to create a scatter chart matrix similar to Figure 3.27 using the Excel Add-In XLMiner. Statistical software packages such as R, NCSS, and SAS can also be used to create these matrixes.

PivotCharts in Excel

WEBfile: Restaurant

In Excel versions prior to Excel 2013, PivotCharts can be created by clicking the Insert tab on the Ribbon, clicking on the arrow under PivotTable from the Tables group, and then selecting PivotChart.

To summarize and analyze data with both a crosstabulation and charting, Excel pairs PivotCharts with PivotTables. Using the restaurant data introduced in Table 3.7 and Figure 3.7, we can create a PivotChart by taking the following steps:
Step 1.
Click the INSERT tab on the Ribbon
Step 2. In the Charts group, choose PivotChart
Step 3. When the Create PivotChart dialog box appears:
Choose Select a Table or Range
Enter A1:D301 in the Table/Range: box
Choose New Worksheet as the location for the PivotTable Report
Click OK
Step 4. In the PivotChart Fields area, under Choose fields to add to report:
Drag the Quality Rating field to the AXIS (CATEGORIES) area
Drag the Meal Price ($) field to the LEGEND (SERIES) area
Drag the Wait Time (min) field to the VALUES area
Step 5. Click on Sum of Wait Time (min) in the Values area
Step 6. Click Value Field Settings... from the list of options that appear
Step 7. When the Value Field Settings dialog box appears:
Under Summarize value field by, choose Average
Click Number Format
In the Category: box, choose Number
Enter 1 for Decimal places:
Click OK
When the Value Field Settings dialog box reappears, click OK
Step 8. Right-click in cell B2 or any cell containing a meal price column label
Step 9. Select Group from the list of options that appears
Step 10. When the Grouping dialog box appears:
Enter 10 in the Starting at: box
Enter 49 in the Ending at: box
Enter 10 in the By: box
Click OK
Step 11. Right-click on "Excellent" in cell A5
Step 12. Select Move and click Move "Excellent" to End

The completed PivotTable and PivotChart appear in Figure 3.28. The PivotChart is a clustered column chart whose column heights correspond to the average wait times and are clustered into the categorical groupings of Good, Very Good, and Excellent. The columns are shaded to differentiate the wait times at restaurants in the various meal price ranges. Figure 3.28 shows that Excellent restaurants have longer wait times than Good and Very Good restaurants. We also see that Excellent restaurants in the price range of $30-$39 have the longest wait times.
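What the PivotTable computes in Steps 4 to 10 can be sketched directly: group meal prices into $10 bins (Starting at 10, Ending at 49, By 10), then average wait time for each quality rating and price group. The rows below are hypothetical (Table 3.7's 300 restaurants are not reproduced here), the helper names are our own, and the "<10" and ">49" labels for out-of-range prices are our own convention, which may not match the labels Excel shows.

```python
from collections import defaultdict

def price_group(price, start=10, end=49, size=10):
    """Label a meal price with its group, mirroring the Grouping dialog
    settings Starting at 10, Ending at 49, By 10 (10-19, 20-29, 30-39, 40-49)."""
    if price < start:
        return f"<{start}"  # out-of-range labels are our own convention
    if price > end:
        return f">{end}"
    low = start + int((price - start) // size) * size
    return f"{low}-{low + size - 1}"

def average_wait(rows):
    """rows: (quality_rating, meal_price, wait_minutes) tuples.
    Returns {(quality_rating, price_group): average wait, 1 decimal place}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rating, price, wait in rows:
        key = (rating, price_group(price))
        sums[key] += wait
        counts[key] += 1
    return {key: round(sums[key] / counts[key], 1) for key in sums}

# Hypothetical restaurants: (quality rating, meal price $, wait time min)
rows = [("Good", 14, 4), ("Good", 18, 8), ("Very Good", 22, 12),
        ("Very Good", 27, 14), ("Excellent", 33, 30), ("Excellent", 38, 36),
        ("Excellent", 45, 25)]

# Row order with "Excellent" moved to the end, as in Step 12
for rating in ["Good", "Very Good", "Excellent"]:
    cells = {g: w for (r, g), w in average_wait(rows).items() if r == rating}
    print(rating, dict(sorted(cells.items())))
```

Each printed cell corresponds to one column height in the clustered PivotChart: the rating selects the cluster and the price group selects the shaded column within it.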
The PivotChart displays the same information as that of the PivotTable in Figure 3.13, but the column chart used here makes it easier to compare the restaurants based on quality rating and meal price.

Like PivotTables, PivotCharts are interactive. You can use the arrows on the axes and legend labels to change the categorical data being displayed. For example, you can click on the Quality Rating horizontal axis label and choose to look at only Very Good and Excellent restaurants.

WEBfile: Restaurant

[Figure 3.28: PivotTable and PivotChart for the Restaurant Data; PivotTable of Average of Wait Time (min) with row labels Good, Very Good, Excellent, and Grand Total and column labels 10-19, 20-29, 30-39, 40-49, and Grand Total; PivotChart of clustered columns by Quality Rating, shaded by Meal Price ($) group]

1. A new feature in Excel 2013 is the inclusion of Chart Buttons to quickly modify and format charts. Three new buttons appear next to a chart whenever you click on it to make it active in Excel 2013. The first is the Chart Elements button (a plus sign). Clicking it brings up a list of check boxes to quickly add and remove axes, axis titles, a chart title, data labels, trendlines, and more. The second is the Chart Styles button (a paintbrush), which allows you to quickly choose from many preformatted chart styles to change the look of your chart. The third is the Chart Filter button (a funnel), and it allows you to select the data to include in your chart. The Chart Filter button is very useful for performing additional data analysis.

2. Color is frequently used to differentiate elements in a chart. However, be wary of the use of color to differentiate for several reasons: (1) Many people are colorblind and may not be able to differentiate colors. (2) Many charts are printed in black and white as handouts, which reduces or eliminates the impact of color. (3) The use of too many colors in a chart can make the chart appear too busy and distract or even confuse the reader. In many cases, it is preferable to differentiate chart elements with dashed lines, patterns, or labels.

3. Histograms and box plots (discussed in Chapter 2 in relation to analyzing distributions) are other effective data visualization tools for summarizing the distribution of data.

Advanced Data Visualization

In this chapter, we have presented only some of the most basic ideas for using data visualization effectively both to analyze data and to communicate data analysis to others. The charts discussed so far are the most commonly used ones and will suffice for most data visualization needs. However, many additional concepts, charts, and tools can be used to improve your data visualization techniques. In this section we briefly mention some of them.
