If you’ve ever wondered how best-selling authors went from Elinor Dashwood to Bella Swan, maths — believe it or not — can help.
Tyler Vigen, the statistician who brought us hilarious spurious correlations, did some work with words, too. He analysed several of the most popular novels from the early 1800s to today, focusing on elements like sentence length and punctuation.
“I chose these books because they were seven of the all-time best-selling novels (which sold more than 50 million copies) that were written in English … that were spaced in time periods,” Vigen told Business Insider via email.
Now, these books might not represent their respective time periods, but the data provides interesting insight nonetheless. Some of the charts exhibit clear trends, while others seem more random.
Sentence length appears to be declining over time. It’s also important to note the number of words and sentences in each book, shown in the chart below.
“Sense and Sensibility” and “Twilight” contain about the same number of words, roughly 119,000 each. But Jane Austen wrote fewer than half as many sentences as Stephenie Meyer: 5,179 compared with 12,386.
While this trend could partly reflect later authors writing for younger audiences, it’s fair to say that 19th-century readers had a greater appreciation for paragraph-long sentences than we do today.
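Dividing words by sentences turns those counts into an average sentence length. The short Python sketch below does that arithmetic, assuming the rounded 119,000-word figure applies to both books, so the results are approximate rather than Vigen's own numbers.

```python
# Approximate figures quoted in the article; the shared word count is rounded.
WORDS = 119_000
SENTENCES = {"Sense and Sensibility": 5_179, "Twilight": 12_386}


def words_per_sentence(words: int, sentences: int) -> float:
    """Average sentence length in words."""
    return words / sentences


for title, count in SENTENCES.items():
    # Prints about 23.0 for Austen and about 9.6 for Meyer.
    print(f"{title}: {words_per_sentence(WORDS, count):.1f} words per sentence")
```

On these figures, an average Austen sentence runs to roughly 23 words, against fewer than 10 for Meyer.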
In Vigen’s charts, “unique” means “different from every previously encountered word in the book,” according to Vigen.
“Twilight” included greater word variation than many other novels, including Austen’s work. But the most interesting relationship appears when you consider words used per unique word, shown in the chart below.
The lower the number in the last column, the more varied the author’s vocabulary. By this measure, Mark Twain takes the top spot: on average, one in every nine words he wrote had not appeared earlier in “The Adventures of Tom Sawyer.”
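The article doesn't say exactly how Vigen split the text into words, so the Python sketch below is only an illustration of the “words per unique word” ratio under a simple, hypothetical rule: lowercase runs of letters and apostrophes count as words. A word counts as unique the first time it appears, following the definition quoted above.

```python
import re


def vocabulary_stats(text: str) -> tuple[int, int, float]:
    """Return (total words, unique words, words per unique word)."""
    # Hypothetical tokeniser: lowercase runs of letters/apostrophes are words.
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    unique = 0
    for word in words:
        # A word is "unique" only the first time it is encountered.
        if word not in seen:
            seen.add(word)
            unique += 1
    ratio = len(words) / unique if unique else 0.0
    return len(words), unique, ratio


sample = "The cat saw the dog, and the dog saw the cat."
print(vocabulary_stats(sample))  # (11, 5, 2.2)
```

In the toy sentence, 11 words contain only 5 distinct ones, a ratio of 2.2. A ratio of about 9 for a full novel would match Twain's figure: one new word for every nine written.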
Adjective frequency appears to be increasing over time.
The decline of the semicolon in writing is a clear trend.
“Sense and Sensibility” included 1,572 semicolons — one every 3.3 sentences on average. In a book of about the same number of words, Meyer used only 224 (one every 55.3 sentences).
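The “one every N sentences” figures come from dividing each book's sentence count by its semicolon count. The Python sketch below reproduces that arithmetic from the numbers quoted in the article.

```python
def sentences_per_semicolon(sentences: int, semicolons: int) -> float:
    """Average number of sentences between semicolons."""
    return sentences / semicolons


print(f"{sentences_per_semicolon(5_179, 1_572):.1f}")  # Sense and Sensibility: 3.3
print(f"{sentences_per_semicolon(12_386, 224):.1f}")   # Twilight: 55.3
```

Both results match the averages cited above, so semicolons turn up about 17 times less often per sentence in “Twilight” than in “Sense and Sensibility.”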
Commas and other punctuation marks present less of a trend.