Characterizing the Google Books Corpus: Strong Limits to Inferences of Socio-Cultural and Linguistic Evolution
by Danforth, Christopher M.; Dodds, Peter Sheridan; Pechenick, Eitan Adam
in Culture / Datasets / Digitization / Divergence / Evolution / Humans / Information services / Information theory / Language / Libraries / Library collections / Linguistic analysis (Linguistics) / Linguistics / Linguistics - trends / Mathematics / Metadata / Reading / Science / Services / Social networks / Texts
2015
Journal Article
Overview
It is tempting to treat frequency trends from the Google Books data sets as indicators of the "true" popularity of various words and phrases. Doing so allows us to draw quantitatively strong conclusions about the evolution of cultural perception of a given topic, such as time or gender. However, the Google Books corpus suffers from a number of limitations which make it an obscure mask of cultural popularity. A primary issue is that the corpus is in effect a library, containing one of each book. A single, prolific author is thereby able to noticeably insert new phrases into the Google Books lexicon, whether the author is widely read or not. With this understood, the Google Books corpus remains an important data set to be considered more lexicon-like than text-like. Here, we show that a distinct problematic feature arises from the inclusion of scientific texts, which have become an increasingly substantive portion of the corpus throughout the 1900s. The result is a surge of phrases typical to academic articles but less common in general, such as references to time in the form of citations. We use information theoretic methods to highlight these dynamics by examining and comparing major contributions via a divergence measure of English data sets between decades in the period 1800-2000. We find that only the English Fiction data set from the second version of the corpus is not heavily affected by professional texts. Overall, our findings call into question the vast majority of existing claims drawn from the Google Books corpus, and point to the need to fully characterize the dynamics of the corpus before using these data sets to draw broad conclusions about cultural and linguistic evolution.
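The overview does not name the divergence measure. As a minimal sketch, assuming a Jensen-Shannon divergence between the word-frequency distributions of two decades, the comparison and the per-word "major contributions" might be computed as below. The function names and the toy word counts are hypothetical and are not taken from the article or from the Google Books data.

```python
import math
from collections import Counter


def normalize(counts):
    """Turn raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def jsd_contributions(p, q):
    """Per-word contributions to the Jensen-Shannon divergence (in bits).

    p and q map words to probabilities. A word missing from one
    distribution is treated as having probability zero there.
    """
    contributions = {}
    for word in set(p) | set(q):
        pw = p.get(word, 0.0)
        qw = q.get(word, 0.0)
        mw = 0.5 * (pw + qw)  # mixture distribution, never zero here
        term = 0.0
        if pw > 0:
            term += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            term += 0.5 * qw * math.log2(qw / mw)
        contributions[word] = term
    return contributions


def jsd(p, q):
    """Total Jensen-Shannon divergence: 0 for identical, 1 for disjoint."""
    return sum(jsd_contributions(p, q).values())


# Hypothetical toy counts standing in for two decades of a corpus.
decade_a = normalize(Counter({"the": 50, "time": 5, "love": 5}))
decade_b = normalize(Counter({"the": 50, "time": 5, "fig": 5}))

# Rank words by how much they drive the divergence between the decades.
ranked = sorted(jsd_contributions(decade_a, decade_b).items(),
                key=lambda item: item[1], reverse=True)
for word, value in ranked:
    print(f"{word}\t{value:.4f}")
```

Ranking words by their individual terms is what lets an analysis of this kind point at specific vocabulary, for instance citation-style references, as the source of change between decades rather than reporting a single number.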