Violin Plot Journal
To cite this article: Jerry L. Hintze & Ray D. Nelson (1998) Violin Plots: A Box Plot-Density Trace
Synergism, The American Statistician, 52:2, 181-184, DOI: 10.1080/00031305.1998.10480559
To link to this article: https://doi.org/10.1080/00031305.1998.10480559
© 1998 American Statistical Association The American Statistician, May 1998 Vol. 52, No.2 181
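The density trace d(x|h) discussed on this page is defined in a part of the article that is not reproduced in this excerpt. Assuming the usual boxcar form, which is consistent with the definitions of n, h, and δ_i given on this page, it can be written as:

```latex
d(x \mid h) = \frac{\sum_{i=1}^{n} \delta_i}{n h}
```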
where n is the sample size, h is the interval width, and δ_i is one when the ith data value is in the interval [x - h/2, x + h/2] and zero otherwise. In order to plot the density trace, first select a value for h and then compute d(x|h) on a dense grid of equally spaced x values. Connect the d(x|h) values with lines. The shape of the d(x|h) curve is essentially driven by the interval length, h. It is very smooth for large values of h, and "wiggly" for small values.

Unfortunately, several density traces shown side by side are difficult to compare. Contrasting the distributions of several batches of data, however, is a common task. In order to add information to the box plot and still make comparisons possible, Benjamini (1988) suggested "opening the box" of the box plot. He makes the width of the box proportional to the estimated density. The violin plot builds on the Benjamini proposal by combining the advantages of box plots with density traces.

The violin plot, as shown in Figure 1, combines the box plot with density traces. The density trace is plotted symmetrically to the left and the right of the (vertical) box plot. There is no difference in these density traces other than the direction in which they extend. Adding two density traces gives a symmetric plot, which makes it easier to see the magnitude of the density. This hybrid of the density trace and the box plot allows quick and insightful comparison of several distributions.

3. SPECIFICATION OF INTERVAL WIDTH

As with other density estimators, achieving an acceptable density trace requires experience and judgment in determining the appropriate amount of smoothing. As with the selection of the bin width in the histogram, the interval width h, which is usually specified as a percentage of the data range, must be selected. Experience suggests that values near 15% of the data range often give good results.

The choice of h, however, must be tempered by the size of the sample. The density trace is subject to the same sample size restrictions and challenges that apply to any density estimator. For small data sets, too small a value for h gives a wiggly density trace that suggests features that are simply artifacts of the individual data points. The oversmoothed density estimate that results from too large h values gives the illusion of knowing the shape of the distribution, while in reality the data set is too small for any conclusions. As a rule of thumb based on practice, the density trace tends to do a reasonable job with samples of at least 30 observations. Even with sample sizes of several hundred, however, choosing too large a value for h causes the density trace to oversmooth the data. In general, values of h greater than 40% of the range usually result in oversmoothed densities, while values less than 10% of the range result in undersmoothed densities. Hence, percentages between 10 and 40 percent are recommended.

4. ILLUSTRATIONS AND APPLICATIONS

With the addition of the density trace to the box plot, violin plots provide a better indication of the shape of the distribution. This includes showing the existence of clusters in data. The density trace highlights the peaks, valleys, and bumps in the distribution. Three applications and examples of violin plots illustrate these advantages. The first example demonstrates the ability of violin plots to distinguish among the shapes of known distributions. The second highlights

Figure 2. Comparison of Box Plots and Violin Plots to Known Distributions. (a) Box plots; (b) violin plots. (Each panel shows Bimodal, Uniform, and Normal samples on a vertical axis running from -30 to 30.)

Figure 3. Additional Information in Violin Plots. Two examples from the density estimation literature: (a) annual snowfall for Buffalo, NY, 1910-1972; (b) Old Faithful eruption length.
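
The density trace and the interval-width guidance on this page can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes the boxcar form of the trace (the number of data values in [x - h/2, x + h/2], divided by n times h), takes h as a fraction of the data range (15% by default, following the text), and mirrors the trace about a vertical axis to obtain the violin outline. The function names `density_trace` and `violin_outline`, the 200-point grid, and the `half_width` scaling are hypothetical choices made for illustration.

```python
from bisect import bisect_left, bisect_right


def density_trace(data, h_fraction=0.15, grid_size=200):
    """Return (xs, ds): grid points and boxcar density trace values.

    Assumed form: d(x|h) = (count of values in [x - h/2, x + h/2]) / (n * h),
    where h is h_fraction times the data range.
    """
    values = sorted(data)
    n = len(values)
    if n < 2 or values[0] == values[-1]:
        raise ValueError("need at least two distinct values")
    lo, hi = values[0], values[-1]
    h = h_fraction * (hi - lo)
    # Dense grid of equally spaced x values spanning the data range.
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + k * step for k in range(grid_size)]
    ds = []
    for x in xs:
        # Count the values inside the closed interval using binary search.
        count = bisect_right(values, x + h / 2) - bisect_left(values, x - h / 2)
        ds.append(count / (n * h))
    return xs, ds


def violin_outline(xs, ds, center=0.0, half_width=0.4):
    """Mirror the trace to the left and right of a vertical axis at `center`.

    Returns (left, right) lists of (horizontal, vertical) points. The peak
    density is scaled to `half_width` so several violins fit side by side.
    """
    peak = max(ds)
    left = [(center - half_width * (d / peak), x) for x, d in zip(xs, ds)]
    right = [(center + half_width * (d / peak), x) for x, d in zip(xs, ds)]
    return left, right
```

Calling `density_trace(sample, h_fraction=0.15)` and passing the result to `violin_outline` gives two mirrored polylines that a plotting library could draw around a box plot. Values of `h_fraction` between 0.10 and 0.40 correspond to the range recommended in Section 3. Smaller values make the outline wiggly, and larger values oversmooth it.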