Readability Zones

I’ve just been updating my database of French readers and observing the types of books or stories in the different ranges of my current preferred readability measure.

Scores under 4 are ridiculously easy for people with an English-speaking background. Currently this range consists only of episodes 1 and 2 of my Gnomeville comics. Sentences are short and vocabulary is highly constrained, exploiting French-English cognates.

Scores in the 4-4.99 range are very easy: Bonjour Luc, A First French Reader by Whitmarsh, and Histoires pour les grands. They tend to be conversation-based.

Scores in 5-5.99 tend to be the short illustrated graded readers such as Bibliobus, as well as La Spiga’s Zazar for grands débutants (target vocabulary of 150). Gnomeville Episode 3 sits here due to having longer sentences compared to the first two episodes.

Scores in 6-6.99 tend to have longer sentences, including some classic graded readers such as Si nous lisions and Contes Dramatiques, as well as the Teen Reader Catastrophe au Camping des Roses, which has a 300-word vocabulary.

Scores in 7-7.99 also include the more text-like graded readers, such as Sept-d’un-Coup by Otto Bond, which tends to have long sentences but well-controlled vocabulary.

In the 8-8.99 range I find the first story for native-speaking children, as well as more graded readers, including one with a target vocabulary of 1000 words.

The first books for adult native speakers occur with scores between 10 and 12.
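
The zones above can be read as a simple lookup from score to description. Here is a minimal sketch, assuming upper bounds are exclusive (so 4-4.99 becomes 4 <= score < 5) and that scores falling in no described range, such as 9-9.99, simply get no label. The names `ZONES` and `zone_for` are hypothetical, and the labels are paraphrases of the descriptions above, not part of the readability measure itself.

```python
# Score ranges taken from the zone descriptions above.
# Assumption: each range is treated as low <= score < high.
ZONES = [
    (float("-inf"), 4.0, "ridiculously easy (Gnomeville episodes 1 and 2)"),
    (4.0, 5.0, "very easy, conversation-based readers"),
    (5.0, 6.0, "short illustrated graded readers"),
    (6.0, 7.0, "graded readers with longer sentences"),
    (7.0, 8.0, "more text-like graded readers"),
    (8.0, 9.0, "first stories for native-speaking children"),
    (10.0, 12.0, "first books for adult native speakers"),
]

NO_ZONE = "no zone described in the post"


def zone_for(score):
    """Return the description of the zone containing score, if any."""
    for low, high, label in ZONES:
        if low <= score < high:
            return label
    return NO_ZONE


if __name__ == "__main__":
    for s in (3.5, 7.2, 9.49, 11.0):
        print(s, "->", zone_for(s))
```

Scores between 9 and 10 are deliberately left unlabelled here, since no zone is described for them.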

Looking at the stories in the list, my own level seems to be from 7 to 10, suggesting I should continue reading more challenging graded readers in addition to stories written for French children. That is pretty much what I have been doing for a while, as well as incidental reading on the web and elsewhere.

A quick look at the relationship between stated vocabulary sizes and the 95th percentile that I have been using indicates that the required vocabulary is roughly 1.5x + 2600, where x is the stated vocabulary size. However, I am using a token-based vocabulary whereas most would use a word-family one. If I assume token vocabulary sizes are 5 times word family sizes, then the equivalence point for this model is when the vocabulary is about 770, meaning that the vocabulary load will be excessive for stated vocabulary sizes less than 770 but fine for sizes greater than 770. That’s reasonably reassuring. Mind you, this is an extremely rough estimate.
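
The equivalence point can be worked through in a few lines. This is only a sketch using the rounded coefficients quoted above (slope 1.5, intercept 2600) and the assumed factor of 5 tokens per word family; the function names are hypothetical. With those rounded numbers the crossover lands near 743 rather than 770, which is in the same ballpark. The gap presumably comes from rounding in the quoted coefficients, but that is an assumption.

```python
# Rounded coefficients quoted in the text; the real fit may differ slightly.
SLOPE = 1.5              # required tokens per unit of stated vocabulary
INTERCEPT = 2600         # required tokens when stated vocabulary is zero
TOKENS_PER_FAMILY = 5    # assumed ratio of token vocabulary to word families


def required_tokens(stated_vocab):
    """Token vocabulary the model says a text needs."""
    return SLOPE * stated_vocab + INTERCEPT


def available_tokens(stated_vocab):
    """Token vocabulary a reader has, under the 5x assumption."""
    return TOKENS_PER_FAMILY * stated_vocab


def crossover():
    """Solve 5x = 1.5x + 2600 for x."""
    return INTERCEPT / (TOKENS_PER_FAMILY - SLOPE)


if __name__ == "__main__":
    print("crossover at about", round(crossover()))
    for x in (300, 1000):
        print(x, required_tokens(x), available_tokens(x))
```

Below the crossover the required token vocabulary exceeds what the reader is assumed to have, and above it the reverse holds, which matches the reading given above.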

This work was based on about 100 words from the start of the text of 40 stories, but it does seem to sort things fairly usefully. The outlier based on my experience of reading the stories is Aventure en Normandie, with a score of 9.49. I don’t recall it being a difficult read.

Meanwhile I am making more progress on Episode 3 of my comic book. I decided to divide one page into three pages, as it had a lot of text and too many new language concepts for a single page. So Episode 3 will probably be 32 pages long, breaking the standard Gnomeville pattern of 28-page episodes. Hopefully it will be ready within a month.


One thought on “Readability Zones”

  1. Further information on this. Looking at the Lexique French lexicon, which has both types and lemmas, shows that there are about twice as many types as lemmas. Word families tend to be larger.
