Vocabulary Analysis of the Gutenberg Collection


I found this page while looking for material on vocabulary density – something relevant to reading books designed for language learners. The guy who wrote it is also interesting, in that he took a non-traditional career path into academia.

He presents an analysis of ~2000 Gutenberg texts based on their vocabulary – the kind of thing I like to muck around with. It is “unpublished” work, so it lacks a few things, like references and axis labels, which makes it less useful than it otherwise might be.
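The page doesn't say exactly how vocabulary was measured, so the following is only a rough sketch of the kind of numbers you might compute yourself, not the author's method. It assumes two common measures: the type-token ratio (distinct words divided by total words) as a proxy for vocabulary density, and "coverage", meaning the fraction of running words in a text that fall within the k most frequent words across a corpus. That second figure is roughly what a learner who knows those k words would recognise. The helper names (`tokenize`, `type_token_ratio`, `top_k_words`, `coverage`) and the crude regex tokenizer are my own illustrative choices.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer (an assumption): lower-case, then keep runs of ASCII
    # letters and apostrophes. Accented letters, digits and punctuation are dropped.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    # Distinct word forms (types) divided by total running words (tokens).
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_k_words(corpus: list[list[str]], k: int) -> set[str]:
    # The k most frequent words over every tokenized text in the corpus.
    counts: Counter[str] = Counter()
    for tokens in corpus:
        counts.update(tokens)
    return {word for word, _ in counts.most_common(k)}


def coverage(tokens: list[str], known: set[str]) -> float:
    # Fraction of running words a reader who knows `known` would recognise.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


if __name__ == "__main__":
    # Hypothetical toy "books" standing in for Gutenberg texts.
    books = {
        "simple": "The cat sat on the mat. The cat sat on the hat.",
        "dense": "Sombre clouds gathered; the mariner's weathered vessel groaned.",
    }
    tokenized = {name: tokenize(text) for name, text in books.items()}
    known = top_k_words(list(tokenized.values()), k=5)
    for name, tokens in tokenized.items():
        print(name, round(type_token_ratio(tokens), 2), round(coverage(tokens, known), 2))
```

One caveat with the type-token ratio: it falls as texts get longer, because common words keep repeating while new ones arrive more slowly. Comparing books of very different lengths therefore needs something like fixed-size samples. Coverage against a fixed word list is less sensitive to length and maps more directly onto the language-learner question.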