There seems to be a constant antagonism between lovers of older and newer music. A common argument between the two camps revolves around the perceived ‘repetitiveness’ of today’s music compared with the music of the Baby Boomers and Generation X.
I want to study whether this argument has legitimate statistical merit. Although repetitiveness can be defined in terms of melody, beat, lyrics, or even cadence, we will use lyrics as the basis of analysis for this project. Lyrical repetition tends to be the focus of most of these debates, and I expect it to be a solid starting point for a meaningful analysis.
Lyrical repetition itself can be measured with numerous metrics. I will begin the exploratory data analysis by touring a few candidate metrics, weighing their strengths and weaknesses to find meaningful conclusions in the data.
The data set I am using comes from a Kaggle user who scraped the top 100 Billboard hits from 1964 to 2015. Each entry includes the title of the song; the year in which it ranked among the top 100 most popular songs (a song can chart in multiple years, and each of those years appears as a separate entry in the table); its rank within that year; its artist; and its lyrics.
This data set lets us state our question more precisely: “Do the most popular songs of each year become more lyrically repetitive as time progresses?”
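library(dplyr)    # select(), mutate(), group_by(), summarize(), joins, filter()
library(tidyr)    # separate(), gather()
library(ngram)    # wordcount()
library(ggplot2)  # plotting
# Six columns: Rank, Song, Artist, Year, Lyrics, Source (NA lets R choose Source's type)
top100_1964_2015_df <- read.csv("billboard_lyrics_1964-2015.csv", colClasses = c("numeric", "character", "character", "numeric", "character", NA))
top100_1964_2015_df <- top100_1964_2015_df %>% select(-Source)
head(top100_1964_2015_df)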
## Rank Song
## 1 1 wooly bully
## 2 2 i cant help myself sugar pie honey bunch
## 3 3 i cant get no satisfaction
## 4 4 you were on my mind
## 5 5 youve lost that lovin feelin
## 6 6 downtown
## Artist Year
## 1 sam the sham and the pharaohs 1965
## 2 four tops 1965
## 3 the rolling stones 1965
## 4 we five 1965
## 5 the righteous brothers 1965
## 6 petula clark 1965
## Lyrics
## 1 sam the sham miscellaneous wooly bully wooly bully sam the sham the pharaohs domingo samudio uno dos one two tres quatro matty told hatty about a thing she saw had two big horns and a wooly jaw wooly bully wooly bully wooly bully wooly bully wooly bully hatty told matty lets dont take no chance lets not belseven come and learn to dance wooly bully wooly bully wooly bully wooly bully wooly bully matty told hatty thats the thing to do get you someone really to pull the wool with you wooly bully wooly bully wooly bully wooly bully wooly bully lseven the letter l and the number 7 when typed they form a rough square l7 so the lyrics mean lets not be square
## 2 sugar pie honey bunch you know that i love you i cant help myself i love you and nobody elsein and out my life you come and you go leaving just your picture behind and i kissed it a thousand timeswhen you snap your finger or wink your eye i come arunning to you im tied to your apron strings and theres nothing that i can docant help myself no i cant help myselfsugar pie honey bunch im weaker than a man should be i cant help myself im a fool in love you seewanna tell you i dont love you tell you that were through and ive tried but every time i see your face i get all choked up insidewhen i call your name girl it starts the flame burning in my heart tearing it all apart no matter how i try my love i cannot hidecause sugar pie honey bunch you know that im weak for you cant help myself i love you and nobody elsesugar pie honey bunch do anything you ask me to cant help myself i want you and nobody elsesugar pie honey bunch you know that i love you i cant help myself i cant help myself
## 3
## 4 when i woke up this morning you were on my mind and you were on my mind i got troubles whoaoh i got worries whoaoh i got wounds to bind so i went to the corner just to ease my pains yeah just to ease my pains i got troubles whoaoh i got worries whoaoh i came home again when i woke up this morning you were on my miiiind and you were on my mind i got troubles whoaoh i got worries whoaoh i got wounds to bind and i got a feelin down in my shooooooes said way down in my shooooes yeah i got to ramble whoaoh i got to move on whoaoh i got to walk away my blues when i woke up this morning you were on my mind you were on my mind i got troubles whoaoh i got worries whoaoh i got wounds to bind
## 5 you never close your eyes anymore when i kiss your lips and theres no tenderness like before in your fingertips youre trying hard not to show it but baby baby i know ityou lost that lovin feelin whoa that lovin feelin you lost that lovin feelin now its gone gone gone wohnow theres no welcome look in your eyes when i reach for you and now youre starting to criticize little things i do it makes me just feel like crying cause baby something beautifuls dyinyou lost that lovin feelin whoa that lovin feelin you lost that lovin feelin now its gone gone gone wohbaby baby id get down on my knees for you if you would only love me like you used to do yeah we had a love a love a love you dont find everyday so dont dont dont dont let it slip awaybaby baby baby baby i beg you please please please please i need your love need your love i need your love i need your love so bring it on back so bring it on back bring it on back bring it on backbring back that lovin feelin whoa that lovin feelin bring back that lovin feelin cause its gone gone gone and i cant go on wohbring back that lovin feelin whoa that lovin feelin bring back that lovin feelin cause its gone gone gone
## 6 when youre alone and life is making you lonely you can always go downtown when youve got worries all the noise and the hurry seems to help i know downtownjust listen to the music of the traffic in the city linger on the sidewalk where the neon signs are pretty how can you lose the lights are much brighter there you can forget all your troubles forget all your caresso go downtown things will be great when youre downtown no finer place for sure downtown every things waiting for youdont hang around and let your problems surround you there are movie shows downtown maybe you know some little places to go to where they never close downtownjust listen to the rhythm of a gentle bossa nova youll be dancing with em too before the night is over happy again the lights are much brighter there you can forget all your troubles forget all your caresso go downtown where all the lights are bright downtown waiting for you tonight downtown youre gonna be alright nowdowntownand you may find somebody kind to help and understand you someone who is just like you and needs a gentle hand to guide them along so maybe ill see you there we can forget all our troubles forget all our caresso go downtown things will be great when youre downtown dont wait a minute more downtown everything is waiting for you downtown downtown downtown downtown
As a naive approach, we count the words in each song, the unique words per song, and the proportion of unique words to total word count.
We first make sure the Lyrics column is stored as characters rather than as a factor, so that we can operate on it, and then count the words in each song using the wordcount function from library(ngram). Later, we remove the songs with missing lyrics and count the unique words in each song.
To compare the total word count with the number of unique words, we split each song's lyrics into one column per word with tidyr::separate, reshape the result with tidyr::gather into one row per word, and then summarize the number of distinct words per song.
The final data frame holds, for each song, the proportion of unique words to total word count: our naive word-repetition metric.
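For a single song the naive metric is simple to state: split the lyrics into words, count them, count the distinct ones, and divide. Below is a minimal base-R sketch of that calculation. `lyric_stats()` is a hypothetical helper, not part of the original code or of ngram, and it assumes that splitting on runs of whitespace is close enough to how `wordcount()` tokenizes for illustration.

```r
# Hypothetical helper (not from the original analysis): naive lyric statistics
# for a character vector of lyrics, one element per song.
lyric_stats <- function(lyrics) {
  # Split on runs of whitespace; a leading space yields an empty token,
  # so empty strings (and NA) are dropped before counting.
  tokens <- lapply(strsplit(lyrics, "\\s+"),
                   function(t) t[!is.na(t) & nzchar(t)])
  wc <- lengths(tokens)                                     # words per song
  unique_words <- vapply(tokens, function(t) length(unique(t)), integer(1))
  data.frame(wc = wc,
             unique_words = unique_words,
             prop = ifelse(wc > 0, unique_words / wc, NA_real_))
}

# Toy input, not taken from the data set:
# wc = 4, 5; unique_words = 2, 5; prop = 0.5, 1.0
lyric_stats(c(" wooly bully wooly bully", "you were on my mind"))
```

A song with no usable lyrics gets `wc = 0` and `prop = NA`, which plays the same role as the `wc > 4` filter applied later in the pipeline.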
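top100_1964_2015_df <- top100_1964_2015_df %>%
  mutate(Lyrics = as.character(Lyrics)) %>%
  # group_by_(.dots = ...) is deprecated; group_by() with bare names is equivalent
  group_by(Song, Artist, Year) %>%
  # each group is normally a single song, so wc is that song's word count
  mutate(wc = wordcount(Lyrics)) %>%
  arrange(desc(wc))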
head(top100_1964_2015_df)
## # A tibble: 6 x 6
## # Groups: Song, Artist, Year [6]
## Rank Song Artist Year Lyrics wc
## <dbl> <chr> <chr> <dbl> <chr> <int>
## 1 49.0 im a flirt r kell… 2007 " im a im a im a im a f… 1156
## 2 19.0 been around the world puff d… 1998 " intro mase yo yo this… 1149
## 3 88.0 forever drake … 2009 " it may not mean nothi… 1050
## 4 71.0 forever drake … 2010 " it may not mean nothi… 1050
## 5 40.0 air force ones nelly … 2003 " big boy big boy big b… 1042
## 6 93.0 back in the day ahmad 1994 " back in the days when… 1035
# Split each song's lyrics into one column per word, named `1` to `1156` (the largest word count); lyrics that start with a space leave an empty string in column `1`
top100_1964_2015_by_word_df <- separate(top100_1964_2015_df, Lyrics, as.character(c(1:1156)), sep=" ")
head(top100_1964_2015_by_word_df)
## # A tibble: 6 x 1,161
## # Groups: Song, Artist, Year [6]
## Rank Song Artist Year `1` `2` `3` `4` `5` `6` `7` `8`
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 49.0 im a… r kel… 2007 "" im a im a im a im
## 2 19.0 been… puff … 1998 "" intro mase yo yo this mase youk…
## 3 88.0 fore… drake… 2009 "" it may not mean noth… to yall
## 4 71.0 fore… drake… 2010 "" it may not mean noth… to yall
## 5 40.0 air … nelly… 2003 "" big boy big boy big boyi said
## 6 93.0 back… ahmad 1994 "" back in the days when i was
## # ... with 1,149 more variables: `9` <chr>, `10` <chr>, `11` <chr>,
## # `12` <chr>, `13` <chr>, `14` <chr>, `15` <chr>, `16` <chr>,
## # `17` <chr>, `18` <chr>, `19` <chr>, `20` <chr>, `21` <chr>,
## # `22` <chr>, `23` <chr>, `24` <chr>, `25` <chr>, `26` <chr>,
## # `27` <chr>, `28` <chr>, `29` <chr>, `30` <chr>, `31` <chr>,
## # `32` <chr>, `33` <chr>, `34` <chr>, `35` <chr>, `36` <chr>,
## # `37` <chr>, `38` <chr>, `39` <chr>, `40` <chr>, `41` <chr>,
## # `42` <chr>, `43` <chr>, `44` <chr>, `45` <chr>, `46` <chr>,
## # `47` <chr>, `48` <chr>, `49` <chr>, `50` <chr>, `51` <chr>,
## # `52` <chr>, `53` <chr>, `54` <chr>, `55` <chr>, `56` <chr>,
## # `57` <chr>, `58` <chr>, `59` <chr>, `60` <chr>, `61` <chr>,
## # `62` <chr>, `63` <chr>, `64` <chr>, `65` <chr>, `66` <chr>,
## # `67` <chr>, `68` <chr>, `69` <chr>, `70` <chr>, `71` <chr>,
## # `72` <chr>, `73` <chr>, `74` <chr>, `75` <chr>, `76` <chr>,
## # `77` <chr>, `78` <chr>, `79` <chr>, `80` <chr>, `81` <chr>,
## # `82` <chr>, `83` <chr>, `84` <chr>, `85` <chr>, `86` <chr>,
## # `87` <chr>, `88` <chr>, `89` <chr>, `90` <chr>, `91` <chr>,
## # `92` <chr>, `93` <chr>, `94` <chr>, `95` <chr>, `96` <chr>,
## # `97` <chr>, `98` <chr>, `99` <chr>, `100` <chr>, `101` <chr>,
## # `102` <chr>, `103` <chr>, `104` <chr>, `105` <chr>, `106` <chr>,
## # `107` <chr>, `108` <chr>, …
# Reshape to one row per (song, word) pair, then count distinct words per song
top100_1964_2015_by_word_df_uniquewords <- top100_1964_2015_by_word_df %>%
  gather(key = "position", value = "word", -Rank, -Artist, -Year, -Song, -wc, na.rm = TRUE) %>%
  group_by(Song, Artist, Year) %>%
  summarize(unique_words = n_distinct(word))
# Join back to the one-row-per-song table (Rank and wc are identical to the wide table's)
top100_prop_unique_words <- top100_1964_2015_by_word_df_uniquewords %>%
  inner_join(top100_1964_2015_df, by = c("Song", "Artist", "Year")) %>%
  mutate(prop = unique_words / wc) %>%
  arrange(Year, Rank) %>%
  select(Rank, Song, Artist, Year, unique_words, wc, prop) %>%
  # drop songs with missing or near-empty lyrics
  filter(wc > 4, unique_words > 4)
head(top100_prop_unique_words)
## # A tibble: 6 x 7
## # Groups: Song, Artist [6]
## Rank Song Artist Year unique_words wc prop
## <dbl> <chr> <chr> <dbl> <int> <int> <dbl>
## 1 1.00 wooly bully sam the sham a… 1965 65 125 0.520
## 2 2.00 i cant help myself… four tops 1965 95 204 0.466
## 3 4.00 you were on my mind we five 1965 45 152 0.296
## 4 5.00 youve lost that lo… the righteous … 1965 89 232 0.384
## 5 6.00 downtown petula clark 1965 121 239 0.506
## 6 7.00 help the beatles 1965 76 228 0.333
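The wide separate()/gather() route hard-codes the maximum word count (1156) as column names, and lyrics that begin with a space may contribute an empty-string "word" to `n_distinct()`. Here is a base-R sketch of the same one-row-per-word reshaping that sizes itself from the data and drops empty tokens. The toy data frame and its values are hypothetical. Within the tidyverse, `tidyr::separate_rows()` is another way to reach the same long format.

```r
# Hypothetical toy data frame whose columns mirror Song/Artist/Year/Lyrics.
songs <- data.frame(
  Song   = c("song a", "song b"),
  Artist = c("artist a", "artist b"),
  Year   = c(1965, 2015),
  Lyrics = c(" la la la love", "hey hey you there"),
  stringsAsFactors = FALSE
)

# One character vector of words per song; empty tokens from leading spaces are dropped.
words <- lapply(strsplit(songs$Lyrics, "\\s+"), function(t) t[nzchar(t)])

# Long format: repeat each song's identifying columns once per word it contains.
long <- data.frame(
  songs[rep(seq_len(nrow(songs)), lengths(words)), c("Song", "Artist", "Year")],
  word = unlist(words),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Number of distinct words for each (Song, Artist, Year).
unique_counts <- aggregate(word ~ Song + Artist + Year, data = long,
                           FUN = function(w) length(unique(w)))
names(unique_counts)[names(unique_counts) == "word"] <- "unique_words"
unique_counts
```

Because the number of rows per song comes from `lengths(words)`, no column count needs to be known in advance.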
Let us graph the average proportion of unique words per song over time.
top100_prop_avgtop100_prop_unique_words <- top100_prop_unique_words %>%
group_by(Year) %>%
mutate(avg_prop = mean(prop))
ggplot() +
geom_line(data=top100_prop_avgtop100_prop_unique_words, aes(x=Year, y=avg_prop), color="Red") +
labs(title = "Average Proportion of Unique Words per Song, by Year",
x = "Year",
y = "Mean proportion of unique words")
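Because `mutate()` keeps every song row, each year's average is repeated once per song in that year. That is harmless for the line plot, but for modeling it is cleaner to have exactly one row per year. Here is a base-R sketch using `aggregate()`. The toy input is hypothetical and stands in for `top100_prop_unique_words`; with dplyr, `summarize()` instead of `mutate()` would give the same collapse.

```r
# Hypothetical toy input: one row per song with its Year and prop
# (unique words / word count).
song_props <- data.frame(
  Year = c(1965, 1965, 1965, 2015, 2015),
  prop = c(0.50, 0.40, 0.30, 0.20, 0.30)
)

# One row per year: the mean proportion of unique words across that year's songs.
yearly <- aggregate(prop ~ Year, data = song_props, FUN = mean)
names(yearly)[names(yearly) == "prop"] <- "avg_prop"
yearly   # 1965 -> 0.40, 2015 -> 0.25
```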
Our initial analysis leads us to believe there is a correlation between Year and the proportion of unique words used in the top 100 Billboard hits. Let us perform a formal test.
# Linear regression
metric1 <- lm(Year ~ avg_prop, data = top100_prop_avgtop100_prop_unique_words)
summary(metric1)
##
## Call:
## lm(formula = Year ~ avg_prop, data = top100_prop_avgtop100_prop_unique_words)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.5359 -3.2188 -0.3057 4.3298 12.1235
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2139.6472 0.9547 2241 <2e-16 ***
## avg_prop -404.7903 2.5786 -157 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.954 on 4829 degrees of freedom
## Multiple R-squared: 0.8361, Adjusted R-squared: 0.8361
## F-statistic: 2.464e+04 on 1 and 4829 DF, p-value: < 2.2e-16
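We ran a linear regression to measure whether there is a significant relationship between ‘Year’ and our proposed metric: the average, over each year's songs, of the proportion of unique words to total word count, which we call the ‘average repetition metric’. Note that the formula as written, `Year ~ avg_prop`, makes Year the response rather than the independent variable. In a simple linear regression the slope's t-statistic, p-value, and R-squared are the same whichever variable is the response, so the test below is unaffected, but the coefficient of -404.79 should not be read as the change in the metric per year.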
We use a two-tailed t-test on the regression slope to check whether there is a significant relationship between the two variables.
Null hypothesis: there is no linear relationship between Year and our average repetition metric (the slope is zero).
Alternative hypothesis: there is a linear relationship between Year and our average repetition metric (the slope is nonzero).
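Since our p-value is essentially 0 (< 2.2e-16), we reject the null hypothesis: there is a significant relationship between Year and our average repetition metric. The R-squared value of 0.8361 is very high. One caution: because avg_prop is a yearly average copied onto every song from that year, the model treats 4,831 rows as independent observations when there is only one distinct value per year, so the reported degrees of freedom and p-value overstate the evidence. Refitting on one row per year would give a more honest test.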
We therefore find a significant correlation between Year and the proportion of unique words per song, and the relationship is clearly inverse: as time goes on, the top 100 Billboard songs of each year tend to use a smaller proportion of unique words relative to their total word count.
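The test can also be set up with Year as the explanatory variable and one row per year. Below is a sketch on synthetic yearly averages (made-up numbers for illustration, not the real results). It also checks numerically that swapping the roles of the two variables leaves the slope's t statistic and R-squared unchanged.

```r
# Synthetic yearly averages for illustration only (NOT the real results):
# one row per year, with a downward trend plus noise.
set.seed(1)
yearly <- data.frame(Year = 1965:2015)
yearly$avg_prop <- 0.55 - 0.004 * (yearly$Year - 1965) +
  rnorm(nrow(yearly), sd = 0.01)

# Year is the explanatory variable and the metric is the response.
fit <- lm(avg_prop ~ Year, data = yearly)
coefs <- summary(fit)$coefficients
slope   <- coefs["Year", "Estimate"]   # estimated change in avg_prop per year
p_value <- coefs["Year", "Pr(>|t|)"]   # two-sided t-test of H0: slope = 0
df.residual(fit)                       # 49 = 51 years - 2 parameters

# Swapping the roles (Year ~ avg_prop) changes the coefficients but not the
# slope's t statistic or R-squared, so the significance test is the same.
fit_swapped <- lm(Year ~ avg_prop, data = yearly)
c(t_original = coefs["Year", "t value"],
  t_swapped  = summary(fit_swapped)$coefficients["avg_prop", "t value"])
```

With one row per year the residual degrees of freedom reflect the 51 years actually observed rather than the number of songs.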
Although this metric can be useful, let’s look at Groove Armada’s 1997 hit ‘At The River’ to see where it falls short:
“If you’re fond of sand dunes and salty air, Quaint little villages here and there.
If you’re fond of sand dunes and salty air, Quaint little villages here and there.
If you’re fond of sand dunes and salty air, Quaint little villages here and there.”
This excerpt has 45 words, 14 of them unique, for a unique-word proportion of 14/45 ≈ 0.3111. That is low, and we can indeed hear how repetitive the song is. But what happens if we build a kind of word-level anagram, rearranging the order of the words?
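The counts for this excerpt can be checked with a few lines of base R. The tokenization rule used here (lower-case, keep apostrophes, drop other punctuation) is an assumption chosen to resemble how the data set's lyrics appear.

```r
excerpt_line <- "If you're fond of sand dunes and salty air, Quaint little villages here and there."
excerpt <- paste(rep(excerpt_line, 3), collapse = " ")

# Lower-case, strip punctuation other than apostrophes, split on whitespace.
tokens <- strsplit(gsub("[^a-z' ]", "", tolower(excerpt)), "\\s+")[[1]]
tokens <- tokens[nzchar(tokens)]

wc <- length(tokens)                     # 45
unique_words <- length(unique(tokens))   # 14
round(unique_words / wc, 4)              # 0.3111
```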
“If you’re fond of sand dunes and salty air, Quaint little villages here and there.
Here of Quaint little salty air villages, and there if you’re fond of sand dunes.
There, if you’re here and fond of little salty air, sand dunes, and villages quaint.”
The meaning of the lyrics deteriorates as the words are shuffled, yet the excerpt still has 45 words, 14 of them unique, so its unique-word proportion stays at 0.3111 even though we can hear that the repetition has decreased significantly. Repetition is harder to pin down than a single ratio suggests: it can occur at the level of individual words, phrases, or entire lines, and because the unique-word proportion ignores word order, it captures only the first of these.
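One possible direction for a metric that does notice word order (our suggestion, not something this analysis has defined yet) is to count adjacent word pairs, or bigrams, instead of single words. Here is a base-R sketch. The helpers `tokenize()` and `unique_bigram_prop()` are hypothetical names, and the tokenization rule is the same assumption as in the word-count check.

```r
# Hypothetical alternative metric (not from the original analysis):
# proportion of distinct word bigrams among all bigrams in a text.
tokenize <- function(text) {
  t <- strsplit(gsub("[^a-z' ]", "", tolower(text)), "\\s+")[[1]]
  t[nzchar(t)]
}

unique_bigram_prop <- function(text) {
  w <- tokenize(text)
  if (length(w) < 2) return(NA_real_)
  bigrams <- paste(head(w, -1), tail(w, -1))   # adjacent word pairs
  length(unique(bigrams)) / length(bigrams)
}

line1 <- "If you're fond of sand dunes and salty air, Quaint little villages here and there."
original <- paste(rep(line1, 3), collapse = " ")
shuffled <- paste(
  line1,
  "Here of Quaint little salty air villages, and there if you're fond of sand dunes.",
  "There, if you're here and fond of little salty air, sand dunes, and villages quaint."
)

unique_bigram_prop(original)   # 15 / 44, about 0.341
unique_bigram_prop(shuffled)   # 28 / 44, about 0.636
```

Both texts have 45 words and 14 unique words, but only the bigram ratio registers that the shuffled version repeats far fewer phrases. Longer n-grams would extend the same idea to repeated phrases and lines.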
Perhaps we should define repetition using a different metric.
In Progress…