MERLOCK Frank Sp20ThesisFinal
by Frank Merlock
Spring 2020
The Valuation of Songwriting Techniques:
An Analysis of How Song Elements Affect Song Value
by
Frank Merlock
APPROVED:
First, I would like to thank my family for their constant love and support. I would
like to thank Dr. Anderson for all of her fantastic guidance throughout this project. Dr.
Anderson was vital in helping me overcome several setbacks and learn the
process of putting together an academic paper. I would also like to thank Dr. de Clercq
for pointing me in the right direction on a topic with which he is rather familiar. I would
like to thank my roommate Oscar Fernandez for listening to hundreds of songs with me
because he didn’t have a choice. I would also like to thank Cedric Gilmer, Zack Medic,
Maxwell Jamerson, and Stephen Borders for their friendship despite many hours working
on this project. I would like to thank Kyle Mackulak for telling me to cut the skits from
the introductions of my own songs. He was right all along. I would like to thank J.K.
Simmons for teaching me about keeping rhythm. I would also like to thank Austin
Rochez, Laura Les, Dylan Brady, David Duchovny and my cats for providing me endless
inspiration.
Table of Contents
Abstract
Background
Literature Review
Thesis Statement
Hypotheses
Methodology
Results
Discussion
Conclusion
Works Cited
Appendices
Abbreviations & Definitions
1. Audio Streams: Audio streams are plays through online interactive services (which allow
customers to choose specific songs) like Spotify and Apple Music, as well as
will count on a 1:1 basis on all streams charts. There will not be any weighting for
Concerning video streams, BuzzAngle states that, "Music Video rankings reflect the
Music-intensive films will be eligible in cases where digital and/or physical track
such titles as music videos, but some that are released theatrically will not be
2. Song Sales: Song sales are actual sales through online services such as iTunes or
Google Music. According to BuzzAngle, “Songs that are priced under $0.49 will not
be counted within the first three months after the song's release” (“Methodology”
2020).
1. NOST - Number of Song Titles: How many times the name of the song appears
2. LTFCA - Length to First Chorus Average: The average of recorded lengths of
6. P – Peak: The highest chart number the song received from release to 8/30/2019,
8. MSFSA – Multiple Songs from the Same Album: If the song has multiple songs
from the same album on the charts simultaneously, it receives a 1. If the song
doesn’t have other songs from the same album charting, it receives a 0.
9. STO – Sales Total, On: Total sales while the song was on the chart.
10. SWAO – Sales Weekly Average, On: Weekly average of sales while the song was
on the chart.
11. RTO - Radio Total, On: Total radio spins while the song was on the chart.
12. STTO – Streams Total, On: Total streams while the song was on the chart.
13. STOB - Sales Total, Off Before: Total sales before the song charted.
14. RTOB – Radio Total, Off Before: Total radio spins before the song was on the
chart.
15. STTOB – Streams Total, Off Before: Total streams before the song appeared on
the chart.
16. STOA – Sales Total, Off After: Total sales of a song while not on the chart
after charting.
17. RTOA – Radio Total, Off After: Total radio spins of a song while not on the chart
after charting.
18. STTOA – Streams Total, Off After: Total streams of a song while not on the chart
after charting.
Tables & Figures
Tables
Appendix Tables
16. Table C.9: Regression Analysis: Pop/Contemporary – Sales
Figures
Abstract
Although the music industry continues to capitalize on the power of big data and
analytics, the job of predicting a song's future value is left to Artists and Repertoire
(A&R) representatives who must trust their experiences and use their gut instinct. There
remains an opportunity for analytics to unearth the science behind what gives popular
music value. This paper analyzes four quantitative structural elements of a song to
determine how they impact a song's value. Using a systematic method of listening and
data mining, each element was measured and tested for a relationship with the song's
sales, radio spins, and streams. The songs analyzed appeared on various
Billboard charts between 2015 and 2018. The difficulty of data cleansing, data
including the repeated lyrics and length of the intro, did show some relationship with
song value, and the extent to which this is true is also emphasized. While the model does
not explain all elements that impact value, this paper could serve to start the discussion
on using big data and analytics to guide music labels on predicting a song's value.
Background
Many aspiring musical artists want to write or perform a song that will catapult
their careers to financial stability. However, the path to financial sustainability and
success is unclear, since many aspects of music’s value are intangible, such as creating
societal connectivity and a community of belief (Reimer et al. 2002, 3). Perhaps resulting
from this understanding of intangible value, the music industry uses an above-average
amount of intuition on the corporate side of the business (Schrieber and Rieple 2018).
That said, some companies eschew the use of intuition in the decision-making process
and instead leverage big data and analytic research. The business analytics industry
grew by $20 billion between 2013 and 2016 (Grover 2018, 390). A report in 2015
showed that companies across all industries that implemented big data and analytics
saw a 15–20% increase in annual ROI (“Marketing & Sales Big Data…” 2015, 25).
This is evidence of big data’s effectiveness in decision making, as well as its growing
dollars of revenue in music sales alone between 2015 and 2017, one can expect the same
usage of big data and analytics within the music industry (Friedlander 2018, 1). Several
business performance. For example, in 2016, Warner Music Group organized a new
business structure and data analytics department with the hiring of an official Chief
Information Officer and Chief Data Officer (Schneider 2016). Their press release
emphasized the firm’s mission of implementing big data to improve performance in artist
payouts and finances (Schneider 2016). Alternatively, in November 2019, Sony Music
introduced a new listening system called 360 Reality Audio, which is a reinvention of
stereo imaging (Introducing Sony’s 360… 2019). The company plans to rerelease older
However, major music companies and distributors have not publicized any
attempts to use big data to analyze musical structure and how it is connected to a song’s
value. The purpose of this study is to analyze trends in a specific set of structural
elements in American popular music to determine how they contribute to the song’s
commercial value (i.e., its “popularity”), how these relationships evolve through time,
and how they have changed during the rising popularity of music streaming. Focusing on
the years between 2015 and 2018, this study will use sales data from BuzzAngle to assess
the value of elements for the industry’s most popular genres (BuzzAngle 2017, 11–12).
Furthermore, by focusing on elements of songs that are quantitative, one can determine
whether big data and trend analysis can be used effectively in the music industry to determine the
possible value of specific song structures and therefore lead to making impactful business
decisions. While subjectivity plagues the usefulness of many musical studies in the
business sense, this study aims to determine whether quantitative elements can be of use
Music has existed for centuries; however, it was not until the invention of the
phonograph and reproducible musical recordings that companies could profit from music
beyond the revenue gained from live performances. While there will always be niche
markets for styles and genres of music, it is evident that some types of music, through
sales and other mediums, are more profitable than others. The terms used to describe
different types of music can be easily confused, though. “Popular music,” as described by
Philip Tagg, is recorded music that is funded by free enterprise for mass consumption
(1982, 42). This is not to be confused with “pop music,” which is a specific genre derived
from the rock-n-roll revolution beginning in the late 1950s (Middleton et al. 2019). While
other genres of music may be produced and released for many reasons, popular music’s
worth is found through maximizing value for those who release it.
As popular music entered into the recording era, music in the mainstream grew to
fit an incredibly specific mold of structural elements. The first elements are “sectional
elements,” which are elements or themes that are repeated in the song. This is in contrast
classical music (2004, 7). Sectional elements in popular music include verses, choruses,
and bridges. “Verses” are the details and story of the song, and the lyrics typically vary
with each iteration. The “chorus” or “refrain” is normally a repeated, catchy segment of
the song. The “bridge” is a transitional element within the song and can tie the
storytelling of the verse with the theme of the chorus (Davidson 1997).
Songwriter and Canadian Country Music Hall of Fame member Ralph Murphy
noted that there are only a handful of combinations that make up the larger portion of
popular music. While some song structures can be defined through the lyrical content
alone, several songs and genres use a linked combination of the above-stated elements.
popular combination of structural elements in country and rock music. Ralph Murphy
Chorus is seen statistically as the most profitable structure (“Ralph Murphy Lecture,”
2011). These elements of song structure can be difficult to identify, because the subjectivity
inherent to music means that different interpretations of the same song can be equally
valid. For example, in a study by Soundfly on the chord progressions of
the most popular songs of 2017, the author admits that many songs can be interpreted
differently. In the song “That’s What I Like” by Bruno Mars, researcher Dean Olivet
states that depending on which chords were focused on, the song’s tonal center could be
From a business perspective, the technology involved with how music reaches the
consumer has evolved systemically in the last 120 years. As would be expected, these
differences have created changes in the financial landscape of the music industry. At one
point, music could only be accessed by attending live performances. This changed with
the invention of the phonograph by Thomas Edison in 1877 and the improvement of flat
disc technology with the gramophone by Emile Berliner. By 1902, these flat discs could
be mass-produced, and by 1910, they were the best-selling medium of commercial music
(Starr and Waterman 2018, 65). This trend continued with the introduction of 8-track
music industry. For example, when cassettes surged in popularity during the 1980s,
people could record “mixtapes,” or a compilation of specific songs (Starr and Waterman
2018, 466). This meant that some people could not only bootleg albums but also
create cassettes with only the songs they wanted. Another example of this
market disruption comes with the introduction of the CD. When record companies would
produce vinyl singles, they would include only one popular song by an artist. However,
the majority of CDs would include an entire album. Music consumers were being
prompted to purchase entire albums instead of having the freedom to purchase single
songs. This would change again when streaming services such as Spotify or Apple Music
gave the power of choice back to consumers (Starr and Waterman 2018, 562–563).
Regardless of the period, every technological innovation within the music world has
in recorded music sales. CDs, which at their peak generated $13.2 billion in revenue in
2000, have been waning in popularity (RIAA 2016). Recently, vinyl has seen a
Strategies, vinyl will easily outpace CD sales if both new and used markets are
considered (Rosenblatt 2019). Download purchases from iTunes and other paid download
music providers are also trending downwards. Streaming services currently make up
75–80% of music sales, which is an estimated $8.9 billion according to The International
mixtapes and single-song vinyl records, people have more control over the songs they
choose to listen to. This is in contrast to CDs, which require one to buy a collection of
songs without the ability to customize one’s selection. Streaming services also have an
element of convenience, as they can be played from mobile devices. These are all
The business model for streaming services is not complex. For example, Spotify
offers two accounts for its service: a free account and a premium account. Free Spotify
accounts do not require a subscription fee but offer limited features on the platform. For
example, users are not able to pick individual songs and are subject to advertisements
every thirty minutes. Premium Spotify accounts charge users a monthly fee, but they
allow for unlimited online and offline streaming options. In 2019, Spotify charged $9.99
a month and included a subscription to the video streaming service Hulu. Spotify's main
expense is paying music labels and artists for using their music. Generally, payouts are
based on how much an artist is streamed relative to all other artists. For example, if a
major artist accounted for 10% of all streams on the service and Spotify generated $1
million from streams, the artist would receive $100,000. While most artists are paid on an
individual basis through aggregators (services that help put music on streaming services),
Spotify has reportedly secured blanket licenses from some major labels in an attempt to
reduce its payout amounts. However, Spotify has faced some lawsuits concerning
illegal practices on paying out mechanical licenses, so how effective these systems will
discussion. There is insufficient evidence to prove that Apple Music and Spotify, the two
major interactive streaming services, use profitable business models. Apple Music is a
subsidiary of Apple, and therefore information on the streaming service’s finances is not
publicly available. Apple’s CEO Tim Cook said in an interview with Fast Company that Apple is
“…not in [music streaming] for the money.” Some observers, including Bobby
Owsinski from Forbes, have stated that Apple considers Apple Music a loss leader.
Apple's main objective is to keep users within their interface and make their mobile
reported that Apple Music had 60 million current paying subscribers, although the
company later confirmed this number also included people who were on free trial
subscriptions. It may be concluded that Apple Music's role as a loss leader has been
effective. However, this does not mean that music streaming currently has a viable
business model.
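The pro-rata rule described earlier in this section, in which an artist’s payout equals its share of all streams times the revenue pool, reduces to a single line of arithmetic. The Python sketch below is a simplification under stated assumptions: one revenue pool, no label, distributor, or publisher splits, and every stream weighted equally. The function name `pro_rata_payout` is hypothetical and is not part of any Spotify interface.

```python
def pro_rata_payout(artist_streams: int, total_streams: int, payout_pool: float) -> float:
    """Return an artist's share of a payout pool under a simple pro-rata rule.

    Assumptions of this sketch: a single pool, no label/distributor/publisher
    splits, and every stream weighted equally (free and premium alike).
    """
    if total_streams <= 0:
        raise ValueError("total_streams must be positive")
    if not 0 <= artist_streams <= total_streams:
        raise ValueError("artist_streams must be between 0 and total_streams")
    # Multiply before dividing so whole-number inputs stay exact where possible.
    return artist_streams * payout_pool / total_streams


# The example from the text: an artist with 10% of all streams and a $1 million pool.
print(pro_rata_payout(artist_streams=10, total_streams=100, payout_pool=1_000_000))  # 100000.0
```

Under this rule the effective per-stream rate is the pool divided by total streams, so it falls whenever total streams grow faster than the pool.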
Spotify, the other major streaming service, strictly does business in music
streaming. However, Spotify's financial reporting has shown a very concerning trend in
the music streaming industry. Spotify has recorded net losses every single year between
2008 and 2018, and it continues to tell investors that these losses are due to attempts to
increase membership (Spotify 2020, 30). Between 2016 and 2017, Spotify recorded a net
operating loss of almost one billion dollars. In 2019, Spotify saw its first positive net
operating income, but only in the first two quarters. Spotify had a net operating loss
overall in 2019, and firms such as Bernstein Bank and Wells Fargo both believe the
Apple Music and Spotify, it is clear that the long-term viability of music streaming
understated, especially as concerts and live events are being cancelled in the wake of the
COVID-19 pandemic. Overall, live performances are the largest revenue stream for
musical artists. A study by the Music Industry Research Association showed that 80% of
artist revenue came from live shows and performances (Krueger et al. 2018). It can be
expected that the use of big data and analytics will grow in importance as businesses and
landscape.
Generally speaking, the two ways a business can increase operating profits are to
increase sales or reduce costs. Streaming services are already under fire for their payout
statistics. Streaming services offer little insight into the mathematics behind their per
song payout. Multiple sources (The Trichordist, Soundcharts, etc.) report different
per-stream payouts for the same period. Regardless, artists have noticed that even as
revenue for Spotify and other services has increased, payouts have decreased.
According to The Trichordist, artists would need approximately 333,334 Spotify streams
every month to make minimum wage, based on the Federal minimum wage of $7.25 and
physical CDs and made a net profit of ten dollars per CD (Christman 2017, 11), they
would only need to sell 147 units to make minimum wage. To put this into perspective,
an individual would have to stream a ten-track album 229 times to earn the artist the
same amount as one CD sale. Streaming services run the risk
of dissuading artists from using their services if these numbers get any lower. However,
SoundExchange reported that in 2018 there were 51 million paying customers of the
music streaming business (Glanz 2018). Therefore, looking at how to increase sales and
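The streaming-versus-CD comparison in the Background can be reproduced as back-of-the-envelope arithmetic. The sketch below assumes that the 333,334-stream figure and the 147-CD figure describe the same monthly income target (147 × $10 = $1,470), which lets us back out the per-stream payout the two numbers imply; the constant and function names are illustrative. It yields roughly 227 plays of a ten-track album per CD, close to the figure in the text; the exact value depends on the per-stream rate and rounding behind the original sources.

```python
# Figures quoted in the text. Treating them as the same monthly income
# target is an assumption of this sketch, not a claim from the sources.
STREAMS_FOR_MIN_WAGE = 333_334  # Spotify streams per month
CDS_FOR_MIN_WAGE = 147          # CDs per month
PROFIT_PER_CD = 10.00           # net profit per CD, in dollars
TRACKS_PER_ALBUM = 10


def implied_per_stream_rate(streams: int, cds: int, profit_per_cd: float) -> float:
    """Dollars per stream implied by equating the two monthly incomes."""
    monthly_income = cds * profit_per_cd
    return monthly_income / streams


def album_plays_per_cd(rate: float, profit_per_cd: float, tracks: int) -> float:
    """Plays of a full album needed to earn what one CD sale earns."""
    streams_per_cd = profit_per_cd / rate
    return streams_per_cd / tracks


rate = implied_per_stream_rate(STREAMS_FOR_MIN_WAGE, CDS_FOR_MIN_WAGE, PROFIT_PER_CD)
plays = album_plays_per_cd(rate, PROFIT_PER_CD, TRACKS_PER_ALBUM)
print(f"implied rate: ${rate:.5f} per stream")  # implied rate: $0.00441 per stream
print(f"album plays per CD: {plays:.1f}")       # album plays per CD: 226.8
```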
Literature Review
most focus on subjective elements and are not related to the song’s financial value. A
number of studies on how specific song elements affect a song’s popularity or value state
there is no clear determinant of what makes a song popular. Salganik and Dodes suggest
that popular trends may stem from cultural phenomena that are beyond our ability to
predict (2016). Jonah Berger and Grant Packard suggest that songs with lyrics that are
atypical of the genre tend to be the most popular (2018). Their study was done on 1,879
high-charting songs between the years of 2014 and 2016 and focused on buzzwords and
predict whether a song would make the Billboard Hot 100 chart with 73.4% accuracy (Henard
and Rossetti 2014). The study showed that the most common themes in the Billboard Hot
100 songs from 1950 to 2009 were loss, desire, aspiration, breakup, pain, inspiration, and
nostalgia. Breakup songs were found to be the most prevalent theme throughout every
decade. It is difficult to make business decisions based on interpreting lyrical themes due
observations about song elements that would allow the music industry to make decisions
Many elements can be identified in songs. Some of these include song length,
number of choruses, lyric content, and tempo. Many other factors have been covered in
other studies with specific focuses. For example, a paper published in Royal Society Open
Science analyzed pop songs that charted in the United Kingdom between 1985 and 2015
and put them in categories depending on the song’s mood. It was found that songs have
become sadder in terms of content, yet the instrumentals were more danceable (Interiano
et al. 2018). In another study entitled “What Has America Been Singing About?”
researchers suggested that music has become more materialistic, whereas older pop music
was more romantic and sexual in nature (Christenson 2018). A study by Michael Tauburg
found that streaming services like Spotify are reinventing classic song titles, with newer
songs having either one- or two-word titles or titles of seven or more words, with less
relevance to the song’s subject matter (“Spotify Is Killing Song Titles” 2018). The major
element of these studies, however, is a personal assessment by the author on what makes
music “materialistic” or “sad” in nature. This highlights the major difficulty in music
Music Genres and Why Simplicity Sells” by Gamaliel Percino, Peter Klimek and Stefan
Thurner published in 2014 by PLOS One. By using the Discogs database of Amazon
sales rankings, they attempted to see how the complexity of instrumentation affected a
surface, the study strictly defined this through information found on the Discogs
database. The authors assumed that the instrumentational complexity of a style “is related
to the set of specialized skills that are typically required of musicians to play that style"
and therefore only took into account how many and what variety of instruments were
played by contributors of the project (Percino et al. 2014, 3). They found that
As a style increases its number of albums, i.e. attracts a growing number of artists,
its variety also increases. At the same time the style's uniformity becomes smaller,
i.e. a unique stylistic and complex expression pattern emerges. Album sales
numbers of a style, however, typically increase with decreasing complexity
(Percino et al. 2014, 13).
Unlike other studies, the authors were confident their model could accurately
predict popular music. However, their methodology does assume that open source data
about the subjective topic of instrumentation is accurate. The ability to apply this in a
Thesis Statement
elements in American popular music to determine how they contribute to the song’s
commercial value (i.e., its “popularity”), how these relationships evolve through time,
and how they change within different forms of consumption. Focusing on the years
between 2015 and 2018, this study will use data from BuzzAngle to assess the value of
elements for the industry’s most popular genres (BuzzAngle 2017, 11–12) and the impact
each has on song value in the digital age of music, as measured by streaming data.
Furthermore, by focusing on elements of songs that are quantitative, one can determine
whether big data and trend analysis can be used effectively in the music industry to determine
the possible value of specific song structures and therefore lead to making impactful
business decisions. While subjectivity plagues the usefulness of many musical studies in
a business sense, this study aims to determine whether quantitative elements can be of
Hypotheses
1. The length of the song will be related to song value (Ciampaglia et al. 2015).
a) The relationship between song value and the length of the song will vary
across genres.
b) The relationship between the length of the song and specific sales data
will vary across mediums.
2. The length of the song’s intro will be related to song value (Frank 2009).
a) The relationship between song value and the length of the song’s intro will
vary across genres.
b) The relationship between the length of the song’s intro and specific sales data
will vary across mediums.
3. The number of times the song title occurs within the lyrics will show a weak
a) The strength of the relationship between song value and the number of times
the song title occurs within the lyrics will vary across genres.
b) The relationship between the number of times the song title occurs within the
lyrics and specific sales data (streams, programmed streams, etc.) will vary across mediums.
4. The length to the first chorus will be related to song value (Murphy 2011).
a) The relationship between song value and the length to the first chorus of the
song will vary across genres.
b) The relationship between the length to the first chorus and specific sales data
will vary across mediums.
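Each hypothesis asks whether an element is related to a measure of song value. The simplest version of that test is a one-variable least-squares fit plus a Pearson correlation, sketched below in Python. This is an illustration only: the regressions in this study also include control variables (peak chart position, Multiple Charts, and Multiple Songs From the Same Album, described in the Methodology), and the numbers in the example are invented rather than drawn from the sample.

```python
from __future__ import annotations

from math import sqrt
from typing import Sequence


def simple_ols(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Fit y = a + b*x by ordinary least squares and return (a, b, r).

    a is the intercept, b the slope, and r the Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Invented numbers for illustration only: intro length (seconds)
# against average weekly streams.
intro_seconds = [5, 10, 15, 20]
weekly_streams = [900, 800, 700, 600]
print(simple_ols(intro_seconds, weekly_streams))  # (1000.0, -20.0, -1.0)
```

A negative slope reads as "longer intros, fewer streams," but a correlation on its own does not show that the element causes the change in value.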
habits or guides to music writing technique. Hypothesis 1 follows from the study “The
Production of Information in the Attention Economy,” which used internet traffic to test
how people’s attention span for product and information is getting smaller in the digital
In his 2009 book Futurehit.DNA, Jay Frank states that the digital revolution is
pushing songs to have shorter intros to appeal to their consumers (38). Based on the
For Hypothesis 3, I test the findings in the study “An Analysis of Common
Songwriting and Production Practices in 2014-2015 Billboard Hot 100 Songs” by David
Tough (2017). He analyzed the relationship between the number of times a song’s title
appeared and its success on the Billboard Hot 100 chart. He compared this number to
peak date as well as the trend direction of the song by its charting number to calculate an
index number, which he used to define success. He found there was “a weak negative
non-significant relationship” between a song’s success and the number of times the song
In his book Murphy’s Laws of Songwriting: The Book, Ralph Murphy suggests
keeping the length to the song’s first chorus short and that successful songs follow this
trend. He states that the song’s chorus and catchy hook or phrase should happen before
the 60-second mark of the song (2011). In this study, I test a relationship between the
Methodology
The value of a financial investment is simply the present value of all future
expected cash flows. However, valuing a song is less clear cut. The study considers a
variety of measures of song value. In addition, I gather data on specific song elements to
The first phase of the study was collecting the raw data and creating a database of
songs to analyze. The songs analyzed were the top five songs per chart at the end of each
month between the years of 2015 and 2018. These songs were collected from Billboard
Pro, the official database for Billboard chart occurrences. According to a study by
BuzzAngle in 2017, the four most popular genres of music by album consumption are
Hip-Hop/R&B, Country, Rock, and Pop (11-12). Since the focus of this study is on a
song’s profitability, I will focus on these four genres. Specifically, Pop music was
analyzed through the Adult Contemporary chart, Hip-Hop and R&B music was analyzed
through the Hot R&B/Hip-Hop chart, Country was analyzed through the Hot Country
Songs chart, and Rock music was analyzed through the Mainstream Rock chart. In
addition, a fifth category, the Billboard Hot 100, was analyzed; it accounts for each
month’s most popular songs regardless of genre. This offers insight into which genres
consumers treat as most popular, as well as whether the most popular songs derive their
popularity from their genre or from elements within the song.
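The sampling rule above (the top five songs on each chart at the end of each month) can be expressed as a filter over weekly chart rows. The sketch below assumes that "end of month" means the last chart issue dated within that month, and it uses a hypothetical `ChartEntry` record rather than Billboard Pro’s actual export format. Applied to five charts over the 48 months from 2015 through 2018, the rule yields 5 × 48 × 5 = 1,200 chart occurrences, which matches the initial count reported later in this section.

```python
from __future__ import annotations

from datetime import date
from typing import NamedTuple


class ChartEntry(NamedTuple):
    """One row of weekly chart data (field names are hypothetical)."""
    chart: str   # e.g. "Hot Country Songs"
    week: date   # chart issue date
    rank: int    # 1 = top of the chart
    song: str


def month_end_top_n(entries: list[ChartEntry], n: int = 5) -> list[ChartEntry]:
    """Return the top-n entries from the last chart week of each chart and month."""
    # Find the latest issue date within each (chart, year, month).
    last_week: dict[tuple[str, int, int], date] = {}
    for e in entries:
        key = (e.chart, e.week.year, e.week.month)
        if key not in last_week or e.week > last_week[key]:
            last_week[key] = e.week
    # Keep only that week's top-n positions.
    picked = [
        e for e in entries
        if e.rank <= n and e.week == last_week[(e.chart, e.week.year, e.week.month)]
    ]
    return sorted(picked, key=lambda e: (e.chart, e.week, e.rank))


# Toy data: two January 2015 issues of one chart, six positions each.
jan = [ChartEntry("Hot 100", date(2015, 1, d), r, f"song-{d}-{r}")
       for d in (24, 31) for r in range(1, 7)]
print([e.song for e in month_end_top_n(jan)])
# ['song-31-1', 'song-31-2', 'song-31-3', 'song-31-4', 'song-31-5']
```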
After the list of songs was finalized, I analyzed each song individually based on
four elements. One methodological limitation of the study is the subjective nature of
analyzing music. Specific details such as chord progressions or tempo proved too
interpretive to offer clean data for analyzing profitability. To address this, the four elements
of this study were chosen due to their quantitative nature, making cross-analysis easier.
While not entirely objective, the elements studied are less interpretive than others that
could have been chosen. The elements analyzed are the length of the song, the length of
the intro, the length to the first chorus, and the number of times the song title occurs
within the lyrics. All encoding was done by listening to each song individually.
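Because every element is measured by ear, the data-handling side is simple: store the repeated timings for each song and average them. The Python sketch below mirrors the abbreviations used in this study (LOIA, LTFCA, LOSA, NOST) and the three-listen averaging described in the following paragraphs; the `SongElements` class is a hypothetical structure, and the example values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class SongElements:
    """Manual measurements for one song (class and field names are hypothetical).

    Timed elements hold one value per listen, in seconds; the study
    averaged three listens per song.
    """
    title: str
    intro: list[float]          # length of intro, per listen
    first_chorus: list[float]   # length to first chorus, per listen
    song_length: list[float]    # length of song, per listen
    title_count: int            # times the title occurs (counted on a separate listen)

    def summary(self) -> dict[str, float]:
        """Average each timed element across listens; pass the count through."""
        for name, listens in (("intro", self.intro),
                              ("first_chorus", self.first_chorus),
                              ("song_length", self.song_length)):
            if not listens:
                raise ValueError(f"no listens recorded for {name}")
        return {
            "LOIA": mean(self.intro),
            "LTFCA": mean(self.first_chorus),
            "LOSA": mean(self.song_length),
            "NOST": self.title_count,
        }


song = SongElements(
    title="Example Song",               # invented values, not study data
    intro=[12.0, 12.5, 13.0],
    first_chorus=[44.0, 44.5, 45.0],
    song_length=[218.0, 219.0, 220.0],
    title_count=10,
)
print(song.summary())
# {'LOIA': 12.5, 'LTFCA': 44.5, 'LOSA': 219.0, 'NOST': 10}
```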
The following definitions of the four song elements apply throughout the study. The length
of the song will be defined as the time from when the music begins to when no additional
music is being played. This includes fade-outs but does not include echoes or
delays. The length of the intro will be defined as the time from the start of the song to the
first downbeat of the next structural element, such as verse or chorus. If a vocalist begins
singing before the first beat of the verse, for example, the length of the intro will still be
measured as extending to the first downbeat of the verse. This approach will also be used
to determine the length to the first chorus. For the number of times the song’s title occurs in
the lyrics, variations of the title will not be counted. Lastly, to add consistency, the
original album release of the song will be the one used in the study. Songs that are
and stopped the time upon clicking the mouse. For consistency, all songs were listened to
using Apple Music, and to avoid latency issues, all songs were downloaded through the
streaming service. To ensure accuracy, each song was listened to three times, and the
average values of the timed elements, length of intro (“LOIA”), length to first chorus
(“LTFCA”), and length of song (“LOSA”), were used in the regression analysis. A fourth
listen of the song was used to note the number of times the song’s title appeared in the chorus
(“NOST”) which was measured by identifying each occurrence, and cross-referencing the
To validate my method and my personal analysis, I analyzed 10 songs that were also
analyzed by Hit Songs Deconstructed, a song analysis database used at universities for
reference purposes. After discounting two unclear data inconsistencies within Hit Songs
Deconstructed’s analysis and differences in methodology, my results matched theirs
exactly (see Appendix A). In the case of one inconsistency, in the song “Do I Wanna
Know” by the Arctic Monkeys, I noted that the song title was included six times within
the song, whereas Hit Songs Deconstructed said it occurred five times, with one partial
occurrence. The difference is due to the fact that for
One aspect of this study was to see how well songs performed relative to their
time on each of the selected charts. Billboard Pro provides a database of charting
appearances that shows the charting position of each song every week. From there, I
recorded the number of weeks each song was “on the chart” and “off the chart.” Each
song’s time off the chart was divided into pre-charting and post-charting appearances.
Pre-charting weeks are the weeks before the song’s first appearance on the respective
chart. Post-charting weeks are all weeks the song was off the chart after its first
appearance on the chart.
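The before/on/after split can be stated precisely as a labeling rule over a song’s weekly history. The sketch below assumes each song’s weeks run from release to the August 30, 2019 cut-off and that its chart appearances are given as a set of chart dates; `label_chart_phases` is a hypothetical helper, not a Billboard Pro feature. Note that, following the definitions above, an off-chart week between two separate chart runs counts as "after."

```python
from __future__ import annotations

from datetime import date, timedelta


def label_chart_phases(weeks: list[date], charting_weeks: set[date]) -> list[str]:
    """Label each week 'before', 'on', or 'after' relative to a song's chart run.

    'before' covers weeks prior to the first chart appearance; 'after' covers
    every off-chart week once the song has charted, including gaps between runs.
    """
    if not charting_weeks:
        raise ValueError("song never charted")
    first = min(charting_weeks)
    labels = []
    for week in weeks:
        if week in charting_weeks:
            labels.append("on")
        elif week < first:
            labels.append("before")
        else:
            labels.append("after")
    return labels


release = date(2018, 1, 6)
weeks = [release + timedelta(weeks=i) for i in range(6)]
on_chart = {weeks[2], weeks[3], weeks[5]}
print(label_chart_phases(weeks, on_chart))
# ['before', 'before', 'on', 'on', 'after', 'on']
```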
After recording these elements, I cross referenced each song to BuzzAngle’s sales
transaction in its finest detail” (“Our Platform” 2020). Similar to Nielsen, this database
provided sales information for each song in my database. From it, I was able to get
weekly data for each song in three categories: interactive online streams, digital sales,
and radio spins. These will be referred to as “audio streams,” “song sales,” and “spins.”
Specific limitations of these three categories are provided in the
Abbreviations & Definitions section. These three categories were split into three
recorded the sales data from the song's release date until the end of the sample period
studied, August 30th, 2019. A cut-off date was chosen when data collection began, because
songs can continue to sell indefinitely after their release date. From the initial recording of 1,200
chart occurrences, the number of songs used in the final database was 356. However, for
the sake of regression analysis, songs that appeared on multiple charts were recorded
reoccurrences, the database contained 417 songs. This breakdown of the sample studied
is shown in Table 1.
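The difference between 1,200 chart occurrences, 356 songs, and 417 regression observations comes partly from how repeats are counted. Part of the counting rule is missing from this copy, so the sketch below rests on an assumption: repeated monthly appearances of a song on one chart collapse to a single observation, while a song that charted on several charts keeps one observation per chart. In the actual study the counts also reflect songs dropped for incomplete data, which this hypothetical helper does not model.

```python
from __future__ import annotations


def count_songs_and_observations(occurrences: list[tuple[str, str]]) -> tuple[int, int]:
    """Count unique songs and unique (song, chart) observations.

    occurrences: one (song_id, chart) pair per monthly top-five appearance,
    so the same pair can repeat across months.
    """
    unique_songs = {song for song, _chart in occurrences}
    unique_observations = set(occurrences)
    return len(unique_songs), len(unique_observations)


occ = [
    ("Song A", "Hot 100"), ("Song A", "Hot 100"),   # repeat month on one chart
    ("Song A", "Hot Country Songs"),                 # same song, second chart
    ("Song B", "Mainstream Rock"),
]
print(count_songs_and_observations(occ))  # (2, 3)
```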
Table 1: Song Breakdown of Observations for Regression
Next, I calculated several values from this database. I calculated the total and
weekly averages of audio streams, song sales, and spins while they were on-chart, before
they were on the chart, and after they were on the chart. I also calculated the maximum
number of weeks the song consecutively charted. Other elements that were recorded were
the peak charting number (1, 2, 3, etc.) and the first date in which the song reached its
peak charting number. These were used as control variables. Other control variables
include "Multiple Charts" and "Multiple Songs From the Same Album." "Multiple
charts" denotes when a song appears on multiple charts simultaneously; a "1" in the
database signifies that this is true, and a "0" that it is false. "Multiple Songs From
The Same Album" denotes when multiple songs from the same album were charting
simultaneously, and it was coded the same way as the "Multiple Charts" value. To make
sure the data sample was of high quality, songs were removed if any data were
unavailable. The criteria for having an incomplete dataset
20
consisted of any missing information in the song’s sales figures and charting time or the
Appendix B.
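The per-song values described above (totals and weekly averages before, on, and after the chart run, the peak charting number "P", the first date the song reached that peak, and the longest consecutive run on the chart "MCWOC") can be sketched in Python. This is a minimal illustration, not the procedure actually used for the thesis: the `Week` record layout and the `summarize` function are hypothetical, the weekly list is assumed to contain every week with none missing, and off-chart weeks that fall between two separate chart runs are left out of all three segments.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Week:
    """One week of data for one song (hypothetical record layout)."""
    week_ending: date
    streams: float
    sales: float
    spins: float
    chart_position: Optional[int] = None  # None = not on the chart that week


MEASURES = ("streams", "sales", "spins")


def summarize(weeks: List[Week]) -> dict:
    """Totals and weekly averages before, on, and after the chart run,
    plus the charting control variables P, peak date, and MCWOC."""
    charting = [i for i, w in enumerate(weeks) if w.chart_position is not None]
    if not charting:
        raise ValueError("song never appeared on the chart")
    first, last = charting[0], charting[-1]
    segments = {
        "before": weeks[:first],
        "on": [weeks[i] for i in charting],
        # Off-chart weeks between two chart runs fall into no segment here.
        "after": weeks[last + 1:],
    }

    result = {}
    for name, seg in segments.items():
        for m in MEASURES:
            total = sum(getattr(w, m) for w in seg)
            result[f"{m}_{name}_total"] = total
            result[f"{m}_{name}_avg"] = total / len(seg) if seg else 0.0

    # Peak charting number (lowest position) and the first week it was reached.
    peak = min(weeks[i].chart_position for i in charting)
    result["P"] = peak
    result["peak_date"] = next(w.week_ending for w in weeks if w.chart_position == peak)

    # Longest run of consecutive charting weeks (MCWOC).
    best = run = 0
    for w in weeks:
        run = run + 1 if w.chart_position is not None else 0
        best = max(best, run)
    result["MCWOC"] = best
    return result
```

The binary controls ("Multiple Charts" and "MSFSA") depend on other songs' chart runs, so in this sketch they would be computed separately and stored as 0/1 flags.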
Results
The first step was completing univariate analysis on the song database. This
included calculating the mean, median, standard deviation and the minimum and
maximum for each of the four song elements ("LOIA," "LOSA," "LTFCA," and
"NOST"). The univariate results are shown in Table 2 for the entire sample and for each
genre.
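A minimal sketch of this univariate step using Python's standard `statistics` module. The row layout (a dict with a `genre` key plus the four element keys) is hypothetical, and `statistics.stdev` returns the sample standard deviation, which is assumed here to be the one reported in Table 2.

```python
import statistics
from collections import defaultdict

ELEMENTS = ("LOIA", "LOSA", "LTFCA", "NOST")


def univariate(rows):
    """Mean, median, SD, min, and max of each element, per genre and overall.

    rows: iterable of dicts with a 'genre' key and one key per element
    (hypothetical layout)."""
    groups = defaultdict(list)
    for r in rows:
        groups["All"].append(r)          # whole sample
        groups[r["genre"]].append(r)     # per-genre subsample
    table = {}
    for genre, members in groups.items():
        for el in ELEMENTS:
            vals = [m[el] for m in members]
            table[(genre, el)] = {
                "mean": statistics.mean(vals),
                "median": statistics.median(vals),
                # Sample SD needs at least two observations.
                "sd": statistics.stdev(vals) if len(vals) > 1 else 0.0,
                "min": min(vals),
                "max": max(vals),
            }
    return table
```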
Across the whole database, the average song mentioned the title in the lyrics
10.23 times, with a 12.1 second intro, 44.26 seconds to the first chorus, and an overall
length of 218.84 seconds. On average, songs that appeared on the Billboard Hot 100
mentioned the title in the lyrics more often than songs on any genre chart, with 13.14
occurrences. Mainstream Rock showed the fewest mentions of the title in the lyrics on
average (8.72 occurrences) with the smallest standard deviation of 7.85 occurrences.
Mainstream Rock also had the longest average time to first chorus, at 52.64 seconds. In
comparison to other genres, Hip-Hop/R&B had the highest number of songs that began
with the chorus and, consequently, the shortest average length to first chorus, at 39.52
seconds. Pop/Contemporary songs had the shortest average intro at 8.73 seconds (and
the smallest standard deviation of 6.63 seconds), while Mainstream Rock had the
longest average intro at 19.33 seconds. Pop/Contemporary charts are based heavily on
radio play, which may explain the preference for short introductions. Country songs had
the shortest average total length at 205.23 seconds, while Mainstream Rock songs were
the longest on average at 234.43 seconds. The extreme values of Mainstream Rock’s
elements may help explain why the genre had the lowest sales of the group, perhaps
because it serves a niche market.
market. Differences in observation among elements in the same genre can be explained
Next, a Tobit regression method was chosen because of the nature of the database
and the sales values. A Tobit regression is well suited here because the measure of value
is censored at zero: sales, streams, and spins can never be negative. A number of
regressions were run to predict sales, streams, and radio spins from the four song
elements for the songs in the database. Because of the large numerical values of sales,
streams, and radio spins (the dependent variables), each value was divided by one
million. The regressions were run by genre and for the total sample, for each of the three
measures of value. In addition, models were estimated for total values as well as for
pre-charting and post-charting values. Additional independent variables were added
when running the regression over the period from the song's release. These include the
song's peak charting number ("P"), the song's longest consecutive run on the chart
("MCWOC"), and whether multiple songs from the same album were charting
simultaneously ("MSFSA"). Because of the large number of regressions, I focus on the
most important findings and my interpretation of them. The results are divided according
to the hypotheses made about each element and highlight significant values that support
a working model, as well as the importance of which time segment of the data is
analyzed. The regression results not discussed in the text are provided in
Appendix C.
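The study's models were estimated in Stata, whose `tobit` command takes an `ll(0)` option to declare a lower limit of zero. As a rough, non-authoritative sketch of what such a model estimates, the Python code below maximizes the Tobit log likelihood directly with NumPy and SciPy. The function names are hypothetical, `y` is assumed to have already been divided by one million as described above, and `X` must include a column of ones for the constant.

```python
import numpy as np
from scipy import optimize, stats


def tobit_neg_loglik(params, X, y, lower=0.0):
    """Negative log likelihood of a Tobit model left-censored at `lower`."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimize log(sigma) so sigma stays positive
    xb = X @ beta
    censored = y <= lower
    ll = np.where(
        censored,
        stats.norm.logcdf((lower - xb) / sigma),          # P(y* <= lower)
        stats.norm.logpdf((y - xb) / sigma) - log_sigma,  # density of observed y
    )
    return -ll.sum()


def fit_tobit(X, y, lower=0.0):
    """Maximum-likelihood Tobit fit. Returns (beta, sigma, log likelihood)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # OLS estimates serve as starting values for the optimizer.
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sd = np.std(y - X @ beta0) or 1.0
    start = np.append(beta0, np.log(resid_sd))
    res = optimize.minimize(tobit_neg_loglik, start, args=(X, y, lower), method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1])), -res.fun
```

Optimizing log σ rather than σ keeps the search unconstrained. With real data, Stata's reported standard errors and p-values would remain the reference results; this sketch only recovers the point estimates and the log likelihood.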
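Tobit regression analysis does not have an equivalent to the R-squared reported for
OLS regression. In OLS regression, the regression sum of squares and the error sum of
squares always add up to the total sum of squares. This is not the case when the
regressions are non-linear. There are a number of pseudo R-squared formulas that attempt
to provide a goodness-of-fit statistic. Stata, the software used for this study, calculates
McFadden’s pseudo R-squared. A description of McFadden’s pseudo R-squared can be
found in the book Urban Travel Demand: A Behavioral Analysis (Domencich and
McFadden 1975). McFadden’s pseudo R-squared is calculated as one minus the ratio of
the full model’s log likelihood to the constant-only model’s log likelihood:

$R^2_{\text{McFadden}} = 1 - \dfrac{\ln \hat{L}_{\text{full}}}{\ln \hat{L}_{\text{constant}}}$

From this, it can be understood that larger pseudo R-squared values indicate a
stronger fit for the model, and smaller values a weaker one. Negative values are also
possible for Tobit models: because the likelihood of a continuous outcome is built from
densities rather than probabilities, the log likelihood can be positive, and the ratio is then
no longer bounded between zero and one.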
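The formula can be checked with a few lines of arithmetic. In this hypothetical sketch the two log likelihoods are plain numbers (for example, the values reported for a fitted model and for a constant-only model). The second call shows how a better-fitting model can still yield a negative value once both log likelihoods are positive, which can happen when the dependent variable has been divided by one million and the fitted densities exceed one.

```python
def mcfadden_r2(ll_full: float, ll_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(full model) / lnL(constant-only model)."""
    if ll_null == 0:
        raise ValueError("constant-only log likelihood must be non-zero")
    return 1.0 - ll_full / ll_null


# Both log likelihoods negative (the usual case): result between 0 and 1.
print(mcfadden_r2(-80.0, -100.0))  # about 0.2
# Both positive: the full model fits better (30 > 20), yet the value is negative.
print(mcfadden_r2(30.0, 20.0))     # -0.5
```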
Total vs. Before Charting vs. After Charting
The initial regression analysis covered the period from the song’s release until
the end of the sample period, August 30th, 2019. However, it quickly became obvious
that a significant predictor of song value would not be found in total values or in
after-charting data. Once a song is on the chart, it is popular, so using the time on the
chart to indicate value is redundant. By the time a song has left the chart, many outside
factors may influence its value. Referencing Billboard Hot 100 streams in Table 3, NOST
(p = .007), LOIA (p = .036), and LOSA (p = .024) were all significant predictors. Longer
songs with short introductions that used the title of the song extensively throughout the
lyrics were correlated with higher value. Restricting the regressions to pre-charting data
shows the strongest method of valuing a song through these elements.
Length of Song’s Intro
significant relationship to value. Of the four song elements chosen, the length of the
song’s intro shows significance in several places. Referencing Table 4 and Table 5, LOIA
is significantly related to STOB (p = .005, with a pseudo R-squared of 2.26%) and
STTOB (p = .001, with a pseudo R-squared of 0.57%) for all songs included in the
database. In both cases, songs with shorter introductions are associated with higher
streaming and sales numbers. A 1 standard deviation increase in LOIA corresponds to a
decrease of 6.88 in the number of streams as measured by STOB. This effect is much
larger than for streams, which see a minute decrease of 0.044 in the number of streams
as measured by
Referencing Table 3, Billboard Hot 100 songs saw a decrease of 39.72 in the number of
streams for every 1 standard deviation increase in LOIA (p = .05, with a pseudo
R-squared of 1.31%).
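The "per 1 standard deviation" figures above multiply a coefficient by the predictor's standard deviation. The text does not say whether they refer to the latent Tobit index or to the expected observed value, so the hypothetical sketch below computes both. The second uses the standard McDonald-Moffitt result for a model censored at zero, in which the marginal effect on E[y | x] equals the coefficient times Φ(xβ/σ). Both are linear approximations for a one-standard-deviation change, and all input values are illustrative.

```python
from statistics import NormalDist


def per_sd_effects(beta_j: float, sd_j: float, xb: float, sigma: float):
    """Approximate change from a 1 SD increase in predictor j in a Tobit model
    left-censored at zero.

    Returns (latent, expected):
      latent   = beta_j * sd_j, the change in the uncensored index y*
      expected = latent * Phi(xb / sigma), the McDonald-Moffitt marginal
                 effect on E[y | x], evaluated at the index value xb
    """
    latent = beta_j * sd_j
    share_uncensored = NormalDist().cdf(xb / sigma)  # P(y > 0 | x)
    return latent, latent * share_uncensored


# Hypothetical inputs: coefficient -0.5 (millions per second of intro),
# SD of 10 seconds, evaluated where the index is 0 with sigma = 1.
print(per_sd_effects(-0.5, 10.0, 0.0, 1.0))  # (-5.0, -2.5)
```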
Length of Song
Table 7: Regression Analysis: Country - Streams
relationship to value. Results for the Country genre are consistent with Hypothesis 2.
Referencing Table 6 and Table 7, LOSA is significantly related to STOB (p = .045, with
a pseudo R-squared of 8.12%) and STTOB (p = .025, with a pseudo R-squared of 0.75%)
for all Country songs included in the database. In both cases, longer songs are associated
with higher streaming and sales numbers. A 1 standard deviation change in LOSA results in
showed far less effect on sales in the Country genre, showing only a 0.03 increase of
0.044 in the number of sales as measured by STOB. This can also be for songs that
STTOB (p = .05, with a pseudo R-squared of 1.31%). We can reject the null hypothesis
in these instances.
Number of Times Title of Song Appears in Lyrics
Hypothesis 3 predicted that the number of times the title of a song appears in the
lyrics would not have a strong relationship with the song’s value. Although this held
true for the sample as a whole, songs on the Billboard Hot 100 chart showed a significant
relationship between NOST and both sales and streams. Referencing Table 3 and Table 8,
NOST is significantly related to STOB (p = .017, with a pseudo R-squared of -55.11%)
and STTOB (p = .024, with a pseudo R-squared of 1.31%) for all Billboard Hot 100 songs
in the database. For every 1 standard deviation increase in NOST, there is a 49.96
STOB is only a 0.03 increase for each 1 standard deviation in NOST. According to this
model, the number of times the title of a song appears in its lyrics does have a
relationship with value for Billboard Hot 100 songs.
Length to the Song’s First Chorus
Hypothesis 4 predicted that the length to a song’s first chorus would have a
significant relationship to value. Of the four song elements chosen, the length to the
song’s first chorus shows the least significance across the genres selected.
LTFC is significantly related to STOB (p = .041, with a pseudo R-squared of 1.9%) for
all songs included in the database from the Mainstream Rock genre. The sooner a
Mainstream Rock song arrived at its first chorus, the higher its sales tended to be. For
every 1 standard deviation increase in LTFC, there is a decrease of 1.83 in the number of
sales as measured by STOB. This contrasts with the sample as a whole, which did not
show significance for this element. According to the database, the songs with the longest
time to the first chorus were mainly in the Mainstream Rock genre, which makes these
findings surprising.
A significance level of .05 (p < .05) was used in my analysis. However, because
multiple tests were conducted, adjusting the alpha level may be a necessary step for
future tests. Several methods to correct alpha levels exist, including the Bonferroni
correction, which divides the alpha level by the number of tests conducted. In this case,
regressions were calculated on five genres, as well as on the five genres collectively,
over three time periods. By Bonferroni’s correction, this would make the critical value
p < .0028 (.05 divided by 18 tests), or p < .0083 (.05 divided by 6) if time periods are not
counted. This would eliminate the majority of significant results found throughout the
regressions. However, due to the complexity of the study and its methodologies, each test
was treated independently, without a correction, so as not to discard most findings, at the
cost of a higher risk of false positives. Future studies, however, may benefit from
applying such a correction.
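The Bonferroni arithmetic can be sketched in a few lines. The group and period counts below follow the text (five genre charts plus the pooled sample, over three time periods); the function names are hypothetical, and the three p-values are the Billboard Hot 100 stream results quoted earlier, reused only as an illustration.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(p_values, alpha: float, n_tests: int):
    """Flag which p-values fall below the Bonferroni-corrected threshold."""
    threshold = bonferroni_alpha(alpha, n_tests)
    return [p < threshold for p in p_values]


GROUPS = 6   # five genre charts plus all genres pooled
PERIODS = 3  # total, before charting, after charting

print(round(bonferroni_alpha(0.05, GROUPS * PERIODS), 4))  # 0.0028
print(round(bonferroni_alpha(0.05, GROUPS), 4))            # 0.0083
print(significant([0.007, 0.036, 0.024], 0.05, GROUPS * PERIODS))  # [False, False, False]
```

Counting the three measures of value (sales, streams, spins) as separate tests would lower the threshold further still.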
Discussion
The results of how songwriting elements affect a song’s value were surprising.
Some elements were more strongly associated with value than others. For example, the
length of the song was a better predictor of value than the length to the song’s first
chorus across all genres. Some genres were more affected by certain elements than
others. For example, Mainstream Rock songs were more valuable when they reached the
first chorus sooner, unlike songs in other genres. Billboard Hot 100 songs were more
valuable when they had shorter introductions and the title of the track was used
frequently in the lyrics. However,
the four factors analyzed do not offer a straightforward answer to predicting a song’s
automatically, the model does not provide a surefire way of answering the question of
what drives value. This is perhaps to be expected in the digital age, when so many
influences can affect a song’s value. These include popularity through viral means
(including memes and other social media posts), movie placements, and non-quantitative
artistic elements. Quantifying these factors remains difficult, and these findings are
similar to those of other studies. For example, the 2018 study by Minna Reiman and
Philippa Örnell attempted to use machine learning algorithms such as Gaussian Naïve
Bayes and Support Vector Machines to predict whether songs would reach the Billboard
Hot 100. Their model was only able to predict songs with 60% accuracy (Reiman and
Örnell 2018).
A key element of my hypotheses was that value would vary depending on the
consumption medium. This turned out to be true, as the elements overall proved to be
valuable indicators of streaming numbers but not of sales numbers. This may be due to
differences between the demographics that purchase music and those that use streaming
services. However, streaming is the most valuable medium to predict, as nearly 80% of music
In 2014, a viral video by Sir Mashalot showed that six chart-topping songs could
be laced together seamlessly through minor editing in Pro Tools. These six songs were
“Sure Be Cool If You Did” by Blake Shelton, “Drunk on You” by Luke Bryan, “Chillin’
It” by Cole Swindell, “Close Your Eyes” by Parmalee, “This is How We Roll” by Florida
Georgia Line, and “Ready, Set, Roll” by Chase Rice (“Sir Mashalot…” 2014). On the
surface, certain genres appear more prone to following formulaic song structures than
others. However, my regression analysis found significance for Country songs only in the
length of the song, and only in streams before charting. This may suggest that measuring
song structure in seconds rather than in bars is not effective. However, bar-based analysis
is more subjective, which makes quantitative, objective data collection more difficult.
Until a more sophisticated computer-based listening methodology is created, human
subjectivity may always be necessary for
However, songwriters should not dismiss these elements completely. The
strongest correlations were found within songs on the Billboard Hot 100, which is a good
indicator of what is popular among listeners at the time. Specifically, songs with short
introductions and titles that are repeated within the lyrics were linked to an increase in
value through streams. Songwriters aiming for widespread popularity may want to
consider these factors.
Of all consumption mediums, radio spins showed very few correlations throughout
the study. Unlike with streaming and direct sales, the consumer has little choice in
deciding what they are consuming. As covered in a 2020 Rolling Stone article, despite
formal investigations in 2004, the practice of payola and “pay-to-play” is still rampant
within the industry (Leight 2020). While these allegations are far from proven, they do
raise the question of how useful radio spins are in studies that determine the value of a
song.
music sales, charting data and information on song elements. As I experienced over the
course of this study, the data available on music is neither easy to obtain nor always
accurate upon review. Individual labels and companies are in charge of reporting data,
and some songs, specific to certain labels, had large amounts of missing sales data.
Billboard treats third-party collection of its chart data as copyright infringement. Outside
of large established labels that have access to this data, the process remains tedious. With
advances in technology, access to data and its accuracy should continue to improve, so
that independent labels and lower-level artists may have the ability to use this data to the
Conclusion
This paper shows one example of how big data and analytics can be used to
predict song value. As companies continue to implement big data and analytics
throughout the business world, it seems all but certain that they will attempt to automate
all facets of the business, including A&R jobs at major labels and music conglomerates.
Although my model does not show any groundbreaking relationships between my chosen
elements and song value, it does highlight the possibilities involved with creating a model
that could predict song value. In a market that is rewarding sustainable online revenue in
the face of a pandemic, the use of big data and analytics to maximize these profits in the
foreseeable future is vital. With the continued growth of these technologies, it may one day be
Works Cited
“2019-2020 Streaming Price Bible: YouTube Is STILL The #1 Problem To Solve.” The
Trichordist, March 5, 2020.
https://thetrichordist.com/2020/03/05/2019-2020-streaming-price-bible-youtube-is-still-the-1-problem-to-solve/.
Berger, Jonah, and Grant Packard. “Are Atypical Things More Popular?” Psychological
Science 29, no. 7 (July 2018): 1178–84. doi:10.1177/0956797618759465.
Christenson, Peter G., et al. “What Has America Been Singing about? Trends in Themes
in the U.S. Top-40 Songs: 1960–2010.” Psychology of Music (January 2018): 1–19.
EBSCOhost, doi:10.1177/0305735617748205.
Christman, Ed. “Life at the Margins.” Billboard, 14 Apr. 2007, pp. 11.
Ciampaglia, G. L., A. Flammini, and F. Menczer. “The Production of Information in
the Attention Economy.” Scientific Reports 5 (2015): 9452. doi:10.1038/srep09452.
Domencich, Thomas A., and Daniel McFadden. Urban Travel Demand: A Behavioral
Analysis. Amsterdam: North-Holland Pub. Co., 1975.
Friedlander, Joshua. “News and Notes on 2017 RIAA Revenue Statistics.” RIAA, last
modified March 22, 2018, www.riaa.com/wp-content/uploads/2018/03/RIAA-
Year-End-2017-News-and-Notes.pdf.
Grover, Varun, et al. “Creating Strategic Business Value from Big Data Analytics: A
Research Framework.” Journal of Management Information Systems 35, no. 2
(2018) pp. 388–423. EBSCOhost, doi:10.1080/07421222.2018.1451951.
Henard, David H, and Christian L Rossetti. “All You Need Is Love? Communication
Insights From Pop Music’s Number-One Hits.” Journal of Advertising Research
54 (June 1, 2014): 53–66. https://doi.org/10.2501/JAR-54-2-178-191.
“Introducing Sony's 360 Reality Audio | The Future of Music.” YouTube, 2019.
https://www.youtube.com/watch?v=h4ZbhQgE9_k.
Jacobson, Erin M. “Spotify May Have To Pay Songwriters $345 Million.” Forbes. Forbes
Magazine, July 19, 2017.
https://www.forbes.com/sites/legalentertainment/2017/07/19/spotify-may-have-to-pay-songwriters-345-million/#4d79c25c193d.
Krueger, Alan B., et al. “Inaugural Music Industry Research Association (MIRA) Survey
of Musicians.” 22 June 2018,
https://img1.wsimg.com/blobby/go/53aaa2d4-793a-4400-b6c9-95d6618809f9/downloads/1cgjrbs3b_761615.pdf.
Leight, Elias. “Want to Get on the Radio? Have $50,000?” Rolling Stone, January 16,
2020. https://www.rollingstone.com/music/music-features/radio-stations-hit-pay-for-play-867825/.
“Marketing & Sales Big Data, Analytics, and the Future of Marketing & Sales.”
McKinsey & Company, March 2015,
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/Marketing%20and%20Sales/Our%20Insights/EBook%20Big%20data%20analytics%20and%20the%20future%20of%20marketing%20sales/Big-Data-eBook.ashx.
Middleton, Richard, et al. “Pop.” Grove Music Online. (January 2001) pp. 1-54.
Accessed 9 Jan. 2019,
www.oxfordmusiconline.com.ezproxy.mtsu.edu/grovemusic/view/10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-0000046845.
Murphy, Ralph. Murphy’s Laws of Songwriting: The Book. Nashville, Tennessee:
Murphy Music Consulting, 2011.
Nicolaou, Anna. “US Music Sales Set 13-Year Record After Streaming Surge.” Financial
Times. Financial Times, February 25, 2020.
https://www.ft.com/content/448e544a-57e1-11ea-a528-dd0f971febbc.
Olivet, Dean, et al. “Chartmania!! I Broke Down Every Song That Reached the Billboard
Top 5 in 2017.” Soundfly, 11 Jan. 2018,
flypaper.soundfly.com/write/chartmania-breaking-down-billboard-top-40-songs-2017/.
Owsinski, Bobby. “This Is Why Spotify Is Beating Apple Music.” Forbes. Forbes
Magazine, September 16, 2017.
https://www.forbes.com/sites/bobbyowsinski/2017/09/16/spotify-apple-music/#503f01a3170b.
Reiman, Minna, and Philippa Örnell. “Predicting Hit Songs with Machine Learning.”
Dissertation, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229705.
Reimer, Bennett, Anthony J. Palmer, Thomas A. Regelski, and Wayne D. Bowman.
“Why Do Humans Value Music?” Philosophy of Music Education Review 10, no.
1 (2002): 41. Project MUSE, muse.jhu.edu/article/408670.
Rosenblatt, Bill. “Vinyl Is Bigger Than We Thought. Much Bigger.” Forbes. Forbes
Magazine, March 29, 2019.
https://www.forbes.com/sites/billrosenblatt/2018/09/18/vinyl-is-bigger-than-we-thought-much-bigger/#3926d2eb1c9c.
Salganik, M. J., P. S. Dodds, and D. J. Watts. “Experimental Study of Inequality and
Unpredictability in an Artificial Cultural Market.” Science 311 (2006): 854–856.
Schneider, Marc. “Warner Music Group Hires Chief Information Officer, Adds New
Data Role.” Billboard. Prometheus Global Media, LLC, December 14, 2016.
https://www.billboard.com/articles/business/7624487/warner-music-group-ralph-munsen-vinnie-freda.
Schreiber, David, and Alison Rieple. “Uncovering the Influences on Decision Making in
the Popular Music Industry; Intuition, Networks and the Desire for Symbolic
Capital.” Creative Industries Journal 11, no. 3 (2018): pp. 245–262. EBSCOhost,
doi:10.1080/17510694.2018.1490146.
Sribney, William. “Pseudo-R2 for Tobit.” Stata. StataCorp, 2020. Accessed March 22,
2020. https://www.stata.com/support/faqs/statistics/pseudo-r2/.
“Sir Mashalot: Mind-Blowing SIX Song Country Mashup.” Sir Mashalot, published on
November 4th, 2014, YouTube video, 03:55,
https://www.youtube.com/watch?v=FY8SwIvxj8o.
Spotify. "Annual Report for 2019." Spotify (2019): 1-238. Spotify. February 12, 2020.
https://s22.q4cdn.com/540910603/files/doc_financials/quarterly/2019/601c445e-1d37-4938-b854-e5344850c3f9.pdf.
Starr, Larry, and Christopher Alan Waterman. American Popular Music: from Minstrelsy
to MP3. Oxford University Press, 2018.
Tagg, Philip. “Analysing Popular Music: Theory, Method and Practice.” Popular Music,
vol. 2 (1982): pp. 37–67. JSTOR, www.jstor.org/stable/852975.
Tauberg, Michael. “Spotify Is Killing Song Titles.” Medium, last modified March 23,
2018, www.medium.com/@michaeltauberg/spotify-is-killing-song-titles-5f48b7827653.
Webster, James. Haydn's "Farewell" Symphony and the Idea of Classical Style: Through-
Composition and Cyclic Integration in His Instrumental Music. Cambridge Univ.
Press, 2004.
“What Are Pseudo R-Squareds?” UCLA. IDRE Stats, October 20, 2011.
https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.
Appendices
Appendix A
Table A.1: Data Collection Comparison – Hit Songs Deconstructed vs. Personal Analysis
Note: This chart presents a comparison between my analysis and that of Hit Songs
Deconstructed, a database used in universities for in-depth song analysis.
Due to the possible subjectivity of music analysis, it is important to show the
similarities and/or differences between my analysis and that of other published
sources. The abbreviation "CNIR" means that the report had conflicting information
Appendix B
Table B.1: Songs Analyzed in Study – Complete List (Continued)
Appendix C
Country – Radio
Hip-Hop/R&B – Sales
Hip-Hop/R&B – Radio
Hip-Hop/R&B – Streams
Mainstream Rock – Radio
Pop/Contemporary – Sales
Pop/Contemporary – Radio
Pop/Contemporary – Streams