Category Archives: Research

These articles are about research related to language learning or text.

Different Ways of Beginning with Graded Reading

14 July, 2026 | Personal, Research | comics, comprehensible input, education, extensive reading, graded readers, language, language acquisition, language learning, vocabulary, writing | Sandra Bogerd

There isn’t really just one way of starting out with reading in a language that you want to learn. Through my observations and reflections, I have found the following different approaches for absolute beginners and near beginners.

All new words are illustrated. This can be a book that is like an illustrated dictionary, such as some board books I have read. Alternatively, it can be like Le Français par la méthode nature, where new words are slowly added and the text slowly becomes more complicated.
Assume exact cognates (words with similar appearance and meaning) to give a starting pool of vocabulary, then introduce new words gradually. This is the approach in Gnomeville. Unlike the previous option, this one requires an assumption about the first language of the learner. But the benefit is a greater starting vocabulary, allowing for more interesting content. It only works well with related language pairs, like French and English.
Small vocabulary stories with much repetition. This is the TPRS/Wayside Publishing approach, and it’s valuable for absorbing language thanks to the repetition. These also tend to use cognates, which means there is an assumption about the first language of the learner.
Parallel texts. Some people are really keen on these. It can be like your own personal Rosetta Stone (the stone, not the app), which can be fun – especially where there is a different writing system. I think these are most useful where the text in the language being learnt isn’t too hard, so that the translation is just used to check things occasionally.
Reading a story you know well in your first language. This is a bit like using parallel texts. I think it is too difficult for starting out but some people like to intensively read (that is, slowly, with translation and lookups of unfamiliar words) a favourite book rather than fluently read something easier. The argument against the approach is that very little text is read per minute, reducing the opportunity to be exposed to more text.
Bootstrapping a text. This is an idea that I have explored a few times. The idea is to filter a book’s sentences based on difficulty, starting with the easiest sentences and gradually adding more complex sentences and vocabulary. Finally, you read the full book. My earlier experiments with this idea were not successful. My current version is “Bootstrapping the Three Musketeers”, which starts with mostly one-word sentences consisting of names, interjections and a few cognates, and slowly builds vocabulary. As more is added, the book snippets get a bit longer, being either multiple sentences or longer sentences. The bootstrap book is organised like a form of spaced repetition, where new words occur several times in the chapters. So far, the extracts are not long enough to allow people to deduce the meaning of words through context, so there are short (5-8 word) glossaries at the start of each chapter.
Another approach that has been used on occasion is to start a story in the person’s main language and slowly add more of the target language words, resulting in a mixture of both languages. I have seen this done to teach Chinese characters in a story written in English. I’ve also read a paper on the approach being applied to English-German. In a way it is not too different to the Gnomeville method of adding a new word periodically, in a cognate-rich text, except in Gnomeville you are reading the target language immediately. For more distant languages, such as Chinese and English, the approach is necessarily different.
Related to the previous idea, it might be possible to have text that is in the learner’s main language but structured according to the target language. This might only work where there is enough similarity or simplicity in the target language. For example, you could have sentences like “I it to him have given”, to provide the flow of the language but with fully understandable vocabulary. Then, as for the other graded readers that incrementally increase their difficulty, words could be switched from main language to target language.
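The bootstrapping idea described above (easiest sentences first, vocabulary growing a little at a time) can be sketched in a few lines of Python. This is only an illustration, not the procedure behind “Bootstrapping the Three Musketeers”: the function names, the `max_new` limit and the toy sentences are all assumptions made for the example.

```python
import re

def tokenize(sentence):
    """Lowercase a sentence and split it into runs of letters."""
    return re.findall(r"[^\W\d_]+", sentence.lower())

def bootstrap_order(sentences, known_words, max_new=1):
    """Greedily order sentences so that each adds at most `max_new` unknown words.

    Returns (ordered, remaining): `ordered` is a list of
    (sentence, new_words) pairs, `remaining` holds sentences that never
    became easy enough under this limit.
    """
    known = {w.lower() for w in known_words}
    remaining = list(sentences)
    ordered = []
    progress = True
    while remaining and progress:
        progress = False
        for sentence in list(remaining):
            new_words = set(tokenize(sentence)) - known
            if len(new_words) <= max_new:
                ordered.append((sentence, sorted(new_words)))
                known |= new_words  # words met once now count as known
                remaining.remove(sentence)
                progress = True
    return ordered, remaining

# Toy data (hypothetical): names and cognates form the starting pool.
SENTENCES = [
    "Athos !",
    "Le roi arrive.",
    "Athos arrive.",
    "Le roi et Athos arrivent à Paris.",
]
STARTING_POOL = {"athos", "paris"}
```

A real version would also need the spaced-repetition side, for example by only accepting a new word if it recurs in later extracts. This sketch treats a word as known after a single encounter.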

Is there a best approach? The research emphasises that time spent reading and interest in what you are reading are the most important factors. As for level of difficulty, the optimum is considered to be knowing 95-98% of the vocabulary in the text. But there is quite a spread for this across individuals. The main factor seems to be whether you are comfortable with the amount you don’t know, and are happy inferring meaning despite not knowing the definition of some words. It is a skill worth developing. I think we have it as children and lose it at some point, and then need to regain the skill for language learning. Certainly my recollection of childhood is of happily reading comics in Dutch without knowing the meaning of every word. Then, a couple of decades later, I was frustrated that there were so many words I didn’t know in Dutch children’s books. I’m now back to reading books without stressing about unknown words in Dutch, French and German. For other languages, I still need beginner material.

Gnomeville Comics are Easier than I Thought

18 April, 2026 | My Comic Books, My Publications, Research, Resources | comics, comprehensible input, easy french, Easy French comics, education, extensive reading, French, French comics, French comprehensible input, French language, graded readers, language, language acquisition, language learning, readability, vocabulary, writing | Sandra Bogerd

On reviewing my readability measure results for various items in my collection, I suddenly thought, “hang on, how can the expected vocabulary size for Gnomeville Episode 1 be 25 when only 12 very frequent words are introduced?” Clearly something had gone wrong somewhere.

I blame the fact that part of my analysis is manual, and I probably didn’t follow the procedure very well. I run various scripts to produce a ranked list of words in the text in the frequency order of a large corpus of written French (mostly from Project Gutenberg). The manual bit is counting up cognates, or at least starting at the least frequent word end and counting up until I find 5% of the words that are not cognates or names. I think I went astray previously by having a less reliable process.
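One way to make the manual counting step less error-prone would be to script it. The sketch below is my reading of the procedure described above, not the actual scripts: it assumes the 5% is counted over running words, that cognates and names are supplied as a hand-made set, and that words missing from the frequency list count as rarer than anything in it. The function name, parameters and toy data are hypothetical.

```python
import math

def expected_vocab_size(tokens, corpus_rank, known_anyway=(), unknown_share=0.05):
    """Frequency rank a reader needs so that at most `unknown_share` of the
    running words are unknown.

    tokens: running words of the extract.
    corpus_rank: dict of word -> rank in a general frequency list (1 = most frequent).
    known_anyway: cognates and names, treated as understood whatever their rank.
    """
    free = {w.lower() for w in known_anyway}
    words = [t.lower() for t in tokens]
    # Number of running words that are allowed to stay unknown.
    allowance = math.floor(unknown_share * len(words))
    # Assumption: unlisted words rank just below the rarest listed word.
    unlisted = max(corpus_rank.values(), default=0) + 1
    # Ranks of the words that count, rarest first.
    ranks = sorted(
        (corpus_rank.get(w, unlisted) for w in words if w not in free),
        reverse=True,
    )
    # Skip the rarest `allowance` words; the next one sets the required size.
    needed = ranks[allowance:]
    return needed[0] if needed else 0

# Toy extract of 20 running words (illustrative ranks only).
TOY_RANKS = {"le": 1, "de": 2, "et": 3, "se": 25, "on": 40}
TOY_TOKENS = ["le"] * 8 + ["de"] * 5 + ["et"] * 3 + ["se", "on", "gnome", "village"]
TOY_COGNATES = {"gnome", "village"}
```

In this toy extract a single rare function word is enough to push the result to 25, which shows how sensitive the measure is in short texts.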

Results can differ depending on decisions that are made, such as whether to include titles (which I treat as sentences), the “Présentation” section that has brief notes about each character, and what is counted as a cognate. It is reasonably clear-cut for Gnomeville, but for other texts, it is less clear. Should “habiter” be considered a cognate due to its similarity to “inhabit”? And there are other words that are cognates in the linguistic sense but not particularly obvious from a learner perspective. The choice of general frequency list will also make a difference. Spoken text has different characteristics to written text, especially in French. Also, the very frequent words used for Episodes 1 and 2 are the 20 most frequent in French newspapers, which is not the same set of words as in any other corpus of text. The text I use for calculating expected vocabulary size has some of those words at lower ranks (“se” at 25, “au” at 31, and “on” at 40), which explains why there was the potential for the expected vocabulary size to be larger than the number of words introduced. But unless those words made up about 5% of the extract it was unlikely they would receive those scores.

Anyway, on revisiting my incorrect assessments of the Gnomeville episodes, I have the following updated vocabulary sizes.

Episode	Old Expected Vocab Size	New Expected Vocab Size	New Readability Score
1	25	3	2.20
2	16	14	3.23
3	40	17	3.83
4		15	3.66

You may notice that Episode 4 has a lower expected vocabulary size at 95% and a lower readability score than Episode 3. There’s not a lot in it, but Episode 3 had longer sentences in the extract.

Well, there you are. Gnomeville’s expected vocabulary size is much smaller than originally calculated – at least for Episodes 1 and 3.

The Book Flood Study

8 August, 2025 | Research | applied linguistics, ESL, extensive reading, language acquisition, language learning, vocabulary | Sandra Bogerd

In 1983, Elley and Mangubhai published their influential study comparing the reading of high-interest stories with ordinary language instruction. They found considerable improvement in reading comprehension and other measures in the two reading-based groups compared to the language instruction group.

I’ve been reminded recently that the paper is behind a paywall, so I thought I would produce a few figures from it here and highlight some of the aspects of the study.

The study participants were primary school students in Fiji, who normally received instruction in their native Fijian for the first three years, switching to English in Class 4.

Here are the residual gains for each Class 4 group (300 students from 12 primary schools) and each type of assessment. The shared book group experienced the teacher reading aloud, sharing the story in an enlarged format, with students joining in to read easier sections, and doing story-related activities. The silent reading group read books of their own choice for 20-30 minutes a day. The control group did the normal curriculum (SPC/Tate audio-lingual program).

Another table showed that the gains a year later, with the same reading activities continuing, were even greater. Exam marks in other subjects, including maths, also improved.

Comic Books versus Text-Only Books for Language Learning

1 March, 2025 | Research, Resources, Reviews | book-review, books, comic books, comics, extensive reading, French, French language, graded readers, language, language acquisition, language learning, readability, reading, vocabulary, writing | Sandra Bogerd

Recently I have been reading a few comics in French, mainly by French-Canadian authors, or translated by them. The target audience for most of them is children and young adults. It had me thinking again about how best to grade comics in terms of difficulty.

My experience in attempting to read various Japanese books for children or learners showed me that it is possible to read a picture book that is really just an illustrated vocabulary without knowing any of the words beforehand. At the other extreme, it is theoretically possible to read everything in a parallel text, since the translation is right there to refer to, just very slow if every sentence needs to be analysed. That is known as “intensive reading”, which has been shown to be less useful than “extensive reading” for language acquisition. Complete glosses similarly make it possible to read a text without prior knowledge of the language, albeit with lots of interruptions to look things up.

Translations and glosses aside, a comic book will be easier than its text presented without illustration, since the illustrations provide clues to what is happening. It is also easier than text that describes in words the scenes the illustrations show – a point that was made elsewhere in favour of learning language from comic books. In other words, “a picture paints a thousand words”.

In general, there is more dialogue and less descriptive text in comics, compared to novels, so the sentences are shorter on average. (This also applies to scripts of plays.) In addition, the pictures give clues as to what the text is about. A further benefit is that comics often provide more examples of speech than would be found in a novel – or at least, more as a proportion of the text read. This can be useful for absorbing speech patterns, particularly for people who are not exposed to much speech directly.

While the shorter average sentence length means that comic book text will generally be scored as easier than text from novels by readability measures, I think that a measure of difficulty of a comic may need to consider whether concrete nouns are illustrated when used. For example, a picture containing a wild boar with the text clearly indicating that it is “un sanglier” could be almost as easy as reading a French-English cognate, such as “village”. Or perhaps it is roughly equivalent to having a gloss entry, albeit introduced in the story instead of in a footnote.

Either way, comic books should be easier to read than books that have no illustrations. See my list of easy comic books in French for some that are a good starting point for beginners.

Phonics readers and the birth of Rod the Red Rat

2 October, 2022 | Red Rat Phonics, Research | early reading, graded readers, phonics, satpin | Sandra Bogerd

I’ve been looking at and writing phonics readers lately. These are beginner reading books that either use a small vocabulary of letters and sounds, or are simply written, with a focus on a set of letters and sounds.

One of the popular systems out there is called Letters and Sounds, which is in widespread use in the UK. It famously starts by introducing the letters s a t p i n, or satpin. The programme is quite comprehensive, adding different grapheme-phoneme pairs – such as “ch” pronounced as in “church” – one at a time over many months.

Systematic phonics has been shown to be highly effective in getting children reading, and to be most effective when done right from the start, but the ideal system is not clear. One research group suggests that teaching a combination of decoding and recognising words by sight, as two strategies for reading, is superior to focusing purely on phonics.

The origins of satpin were recently written about by Cochrane and Brooks (2022), who found that it can be traced back to the 1960s and was selected based on a number of criteria including initially avoiding pairs of sounds and letters that might be confused, sticking to one sound (phoneme) for each letter/grapheme, and ensuring a large set of simple short words can be written from a small set of letters/grapheme-phonemes.

The original research that the set of six letters was derived from was based on American English pronunciation. There are separate analyses for British standard English, such as one by Gontijo et al. (2003). These would also be useful for Australian and New Zealand English.

Before discovering all this, my initial approach was to look at letter, bigram and word frequencies, in descending frequency order (as a percentage of text), see what words could be created, then attempt to create stories. Through this process Rod the Red Rat was born, which has inspired my series of phonics readers, the first of which should be published soon.
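As a rough illustration of that frequency-first approach, the sketch below ranks letters by their share of a text and then checks which words can be spelt from a chosen letter set. It is a simplification made for this example: it treats each letter as one sound and ignores digraphs such as “ch”, and the function names and word list are hypothetical.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, share of all letters) pairs, most frequent first."""
    letters = [c for c in text.lower() if c.isalpha()]
    total = len(letters)
    return [(letter, n / total) for letter, n in Counter(letters).most_common()]

def spellable_words(words, letters):
    """Keep the words that use only letters from the allowed set."""
    allowed = set(letters.lower())
    return [w for w in words if w and set(w.lower()) <= allowed]

# Hypothetical candidate word list.
CANDIDATES = ["sat", "pin", "tap", "rat", "red", "nip", "spin"]

print(spellable_words(CANDIDATES, "satpin"))
print(spellable_words(CANDIDATES, "redat"))
```

Running the same check as each new letter is added shows how quickly the pool of usable story words grows, which is one of the criteria mentioned above for choosing a starting set.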

Personal Experiments in Extensive Reading

28 March, 2020 | Personal, Research | extensive reading, games, japanese, Japanese language, language, language acquisition, reading, stories, writing | Sandra Bogerd

As mentioned in my previous post about learning Japanese, I’ve been applying extensive reading principles to improving my Japanese language skills. At best, in Japanese, I can be described as a “false beginner”, as I don’t have the skills yet to pass the lowest level Japanese Language Proficiency Test. I have dabbled in learning the language whenever I have found a fun resource (Let’s Learn Japanese TV series, Kimono text books); completed a half-semester beginner course, which taught Hiragana and some introductory phrases; and visited Japan briefly three times, each trip providing immersive language learning experiences.

I don’t recall exactly when I started, but from either December or early January I attempted to read something in Japanese for ten minutes per day. I keep a database of my language books, and used the data in it to produce an approximate order of difficulty for attempting the books. I probably read through about 80 booklets, plus the chapters of Kimono level 1 and the starting chapters of Kimono level 2. I stopped about a week ago, as it was frustratingly difficult, and I had other things occupying my mind. For the moment I am filling the gap by playing Kana Quest, which is keeping my kana recognition alive.

So in a way, this has been an experiment in seeing how well extensive reading goes in a language where you are still at the beginning stages. In a couple of readability studies I’ve led, I’ve used a 5-point scale to indicate how easy something is to read:

1. Very easy, understood everything
2. Easy, a few words were not known, but it didn’t impact reading comprehension
3. Not easy, but it was possible to follow the story
4. Difficult, a dictionary is needed to make sense of it
5. Very difficult, a dictionary won’t help

Number 5 is rarely chosen by anyone, and it has been argued that maybe it doesn’t happen. However, if I were to attempt to read something in Chinese, a dictionary wouldn’t help me, as I don’t know how to look up hanzi characters based on stroke count. Sure, I might get there if I persisted, but life is too short for that. So I guess Number 5 is more about lacking willpower when encountering very difficult text.

In terms of optimal language learning, ratings of 2-3 are ideal, since new language is being encountered, but the reading is not at the frustration level.

I “extensively read” in a few languages. In Dutch, my mother tongue (but not my best language, which is English), I can comfortably read novels written for children. I would need a dictionary for anything technical if it was important for me to understand the nuances of meaning, but I would still be able to “follow the story” for most texts I come across.

In French, a language I have studied both in courses and on my own, I am also at the stage where most texts would not be classed as difficult. For German, I read graded readers up to B1 level, giving me texts in the target zone. Thanks to my Dutch background and some study of the language, it would not take too much for me to be able to read texts for native German speakers and be able to “follow the story”.

Now we come to Japanese. The booklets I read went from rating 1 to rating 4. Naturally, the rating 4 ones led to frustration and reduced willpower to continue. My approach to reading them was to read straight through, and then afterward, allow myself to look up a word or two that seemed to be important for understanding, or that had occurred a few times in my reading. I will make the following observations of my experience.

My vocabulary definitely improved, but I think that stopping will lead to much of it being forgotten again.
My ability to recognize kanji improved, and I think that this may actually last a bit longer in my memory than the pronunciation, particularly where there is some obvious logical connection between the ideogram and the meaning, e.g. the words for “above” and “below”.
Illustrated books that are basically an illustrated vocabulary are very easy to read, even if you don’t know any of the words beforehand and forget most of them afterwards. However, it can be challenging to make them interesting.
Booklets with illustrations and repeated sentences can be easy. For example, where the concrete noun, such as a type of animal, is substituted into a template sentence, and the sentence has an illustration of the concrete noun on the page. Even if the sentence isn’t fully understood, the substituted noun will be.
Where it is clear from the illustrations what is being said, the meaning of the text can be deduced.
In languages that I can read reasonably well I don’t like to read stories that I already know, but I was grateful for the known stories presented in Japanese, to allow me to deduce what the text meant. While the text may have been just as difficult as other stories, the fact that the story was already known meant that the text was better understood and learnt from.
What also fascinated me was how focused and absorbed I was in the reading and sense-making task – much as I have observed young children when they are following a story in a book that is being read to them. It was as though my entire brain was switched on – until I got to that frustration point recently.
Attractive illustrations make the experience much more enjoyable.

A take home lesson for me is: once an alphabet is known, it is possible to read something, such as illustrated vocabulary books. My Japanese collection includes colour+object combinations, transport, trains, illustrated loanwords in katakana, cities and countries. The difficulty is perhaps learning the alphabet in the absence of known vocabulary. In our first languages we have clues from words we know, such as B for banana. For languages we haven’t learnt yet, authors would need to resort to other tricks such as loanwords, cognates, international words and names and place names. This is one of the tricks I use in my comic book series for French.

The Kimono series simplifies the reading a little by rendering all words that would be in katakana in romaji (our regular Western Latin alphabet), but spelt as they would be when rendered in katakana. I like this idea for slowly ramping up the difficulty. Books for Japanese native speakers also slowly ramp things up. Children’s books might be in hiragana, hiragana plus katakana with hiragana support, or kana and kanji with support, depending on the target age group.

One study I read about extensive reading and vocabulary acquisition examined what happened if you re-read stories. Each time you re-read you pick up more of the vocabulary. I think I will explore this next, and see how many re-readings of the easier books will make it easier for me to advance to the more difficult ones. Stay tuned for the answer!

Function word frustrations

9 September, 2019 | My Comic Books, Personal, Research, Reviews, Writing | comics, extensive reading, French, French language, language, language acquisition, language learning, movies, reading, research, stories, vocabulary, writing | Sandra Bogerd

I recently re-watched Dilili in Paris, which is a fabulous animated movie for children, with French dialogue that is slow enough for French language learners to follow. I originally watched the movie during the Melbourne French Film Festival and considered buying the movie later so I could try watching it without English subtitles.

Frustration 1: Memory

There is a frequently repeated phrase when Dilili meets new people: “Je suis heureuse de vous rencontrer”. It was semi-humorous, and certainly designed to be remembered, to teach how to be polite when meeting someone new. However, what I actually remembered after a week or two was: “Je suis heureuse __ vous rencontrer.” Despite being exposed to many occurrences, the function word was lost. Function words don’t provide semantic content and therefore appear to be harder to retain. There is certainly research evidence that concrete nouns are easier to remember than various other types of words. This movie brought that home to me in a big way.

Frustration 2: Resources

(Not really about function words…)

I bought the DVD of the movie, and then when viewing it, discovered that the subtitles could not be switched off, and that the only subtitles were in English. I don’t know who makes these decisions when preparing DVDs for sale, but perhaps they don’t really consider their audience carefully enough. A French movie sold in Australia would have various audience segments: French ex-pats – possibly including some French people who are hard of hearing, Australian francophiles, Australians learning French. To me, movies and TV episodes are highly useful for practising comprehension of the spoken language. Ideally it can be done at three levels of difficulty (with L2 referring to the language being learnt and L1 to the native language):

L2 audio with L1 subtitles,
L2 audio with L2 subtitles,
L2 audio without subtitles

I even do this with DVDs that were originally in English. I’ve watched two entire series of Perry Mason with French audio, which was quite illuminating. If you are short of practice material, check your DVD collection for audio in your target language. You may be pleasantly surprised to find a good selection amongst your favourite shows.

Frustration 3: Vocabulary Size

(Function words are frequent words…)

One of the excellent things about some graded readers was that they were designed for a specific vocabulary size. For me, vocabulary makes all the difference between a readable text and an unreadable one. CLE International used to publish books targeting a specific vocabulary size. For example, Niveau 1 had vocabularies of 400-700 words. Through extensive reading, I have successfully moved from 300-word vocabulary books to 700-1000 word ones, and I hope to continue to progress through further reading. However, as with other publishers, the publications have now been converted to CEFR levels: A1, A2 etc. and as far as I can tell, the subtleties of vocabulary size have been removed from the book information.

I have completed a CEFR B1 in French, yet I’m most comfortable reading A1 texts (and texts with vocabularies of fewer than 1000 words). With few exceptions the grammar in these is too easy for me, but the books are still sometimes challenging vocabulary-wise. What frustrates me is that A2 covers such a wide range of vocabularies, depending on the source material, from readable to incomprehensible. Published vocabulary sizes for A2, where they occur at all, vary from 400 to >1200 words. The level of frustration with some of these graded readers is the same as for texts written for native speakers. I oscillate between A1, A2, native texts and back again. The original memoirs of Céleste de Chabrillan are as easy as, and more exciting than, many A2 texts.

CEFR is designed, as far as I can tell, to describe a person’s practical skill in a language, and for that it is useful. However, the jumps between levels are quite large, so that the defined levels are not very useful for the learner themselves. Some publishers solve this by dividing up levels. ELI uses A0, A1, A1.1. The Danish Teen Readers/Easy Readers also divide up the levels, and still appear to quote target vocabulary sizes. Indie publishers tend to ignore vocabulary size in their writing. However, writers and publishers should remember that:

Extensive reading is at its best if learners are reading at a comfortable level while not being familiar with all vocabulary. Ideally learners should know 98% of the words in text they are reading.
Readability of text largely consists of grammar and vocabulary components.
The more readable AND interesting reading material is, the more learners will read, the better their vocabularies will become, and the better their skill in a language will be.
Publishing vocabulary levels required for 95-98% coverage of the text will assist learners in finding materials of the right level for them at any point. Vocabulary levels should be (loosely) based on general word frequency.
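For a learner, the same idea can be turned around: given the words you already know, how much of a text do they cover? The sketch below is a minimal illustration using the 95-98% band quoted above. The band labels are my own shorthand, not established terms, and the function names are hypothetical.

```python
def known_word_coverage(tokens, known_words):
    """Share of running words in `tokens` that appear in `known_words`."""
    if not tokens:
        return 0.0
    known = {w.lower() for w in known_words}
    hits = sum(1 for t in tokens if t.lower() in known)
    return hits / len(tokens)

def comfort_band(coverage, low=0.95, high=0.98):
    """Label a coverage value relative to the 95-98% band (labels are illustrative)."""
    if coverage < low:
        return "too hard"
    if coverage <= high:
        return "target zone"
    return "easy"
```

For example, knowing 19 of 20 running words gives a coverage of 0.95, which `comfort_band` places in the target zone. Counting running words rather than distinct words is an assumption here, and a real tool would also need to decide how to treat names and cognates.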

This is why I write my comic books for language learners. This is why I research extensive reading, readability and language acquisition.

Readability Zones

19 January, 2019 | My Comic Books, Research, Resources, Reviews | extensive reading, French, French easy reader, French graded reader, French language, French reader, graded readers, language, language acquisition, language learning, languages, readability, reading, stories, vocabulary, word frequencies | Sandra Bogerd

I’ve just been updating my database of French readers and observing the types of books or stories in the different ranges of my current preferred readability measure.

Scores under 4 are ridiculously easy for people with an English speaking background. Currently this consists only of episodes 1 and 2 of my Gnomeville comics. Sentences are short and vocabulary is highly constrained, exploiting French-English cognates.

Scores in the 4-4.99 range are very easy: Bonjour Luc, A First French Reader by Whitmarsh, and Histoires pour les grands. They tend to be conversation-based.

Scores in 5-5.99 tend to be the short illustrated graded readers such as Bibliobus, as well as La Spiga’s Zazar for grands débutants (target vocabulary of 150). Gnomeville Episode 3 sits here due to having longer sentences compared to the first two episodes.

Scores in 6-6.99 tend to have longer sentences, including some classic graded readers such as Si nous lisions and Contes Dramatiques, as well as the 300 word vocabulary Teen Reader Catastrophe au Camping des Roses.

Scores 7-7.99 also have the more text-like graded readers, including Sept-d’un-Coup by Otto Bond, which tends to have long sentences but well-controlled vocabulary.

In the 8-8.99 range I find the first story for native speaking children, as well as more graded readers, including one with a target vocabulary of 1000 words.

The first books for adult native speakers occur with scores between 10 and 12.

Looking at the stories in the list, my own level seems to be from 7 to 10, suggesting I should continue reading more challenging graded readers in addition to stories written for French children. That is pretty much what I have been doing for a while, as well as incidental reading on the web and elsewhere.

A quick look at the relationship between stated vocabulary sizes and the 95th percentile that I have been using indicates that the required vocabulary is roughly 1.5x + 2600, where x is the stated vocabulary size. However, I am using a token-based vocabulary whereas most would use a word family one. If I assume token vocabulary sizes are 5 times word family sizes, then the equivalence point for this model is when the vocabulary is about 770, meaning that the vocabulary load will be excessive for stated vocabulary sizes less than 770 but be OK for sizes greater than 770. That’s reasonably reassuring. Mind you, this is an extremely rough estimate.
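The equivalence point can be worked through explicitly. If the stated (word family) vocabulary is x, the model above requires about 1.5x + 2600 tokens, while a reader with that stated vocabulary is assumed to know 5x tokens. The two meet where 5x = 1.5x + 2600. The sketch below just solves that equation. The function and parameter names are hypothetical, and the inputs are the rounded figures quoted above.

```python
def crossover_vocab(slope=1.5, intercept=2600.0, tokens_per_family=5.0):
    """Stated vocabulary size at which required and assumed token vocabularies match.

    Solves tokens_per_family * x = slope * x + intercept for x.
    """
    if tokens_per_family <= slope:
        raise ValueError("no crossover: the two lines never meet for positive sizes")
    return intercept / (tokens_per_family - slope)

print(round(crossover_vocab()))
```

With these rounded inputs the crossover comes out near 743 rather than 770. I am assuming the gap comes from rounding in the quoted slope and intercept; either way it fits the description of this as an extremely rough estimate.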

This work was based on about 100 words from the start of the text of 40 stories, but it does seem to sort things fairly usefully. The outlier based on my experience of reading the stories is Aventure en Normandie, with a score of 9.49. I don’t recall it being a difficult read.

Meanwhile I am making more progress on Episode 3 of my comic book. I decided to divide one page into three pages, as it had a lot of text and too many new language concepts for a single page. So Episode 3 will probably be 32 pages long, breaking the standard Gnomeville pattern of 28 page episodes. Hopefully it will be ready within a month.

Ford & Hicks’s Elementary New French Reader – a review

2 January, 2019 | Research, Reviews, Writing | bilingualism, extensive reading, French, French language, graded readers, language acquisition, language learning, readability, writing | Sandra Bogerd

Ford & Hicks published this reader back in 1939, with the intention of making an easier graded reader than their other book, “A New French Reader”, by using present tense to start with, and introducing the other verb tenses in later stories.

The stories are mostly interesting, with a couple sourced from books I have not encountered before (“Deux jeunes aviateurs” and “Le secret du château”). It also includes Cosette, which had me sighing “not again”, since every publisher does stories from Les Misérables. However, Ford & Hicks may have been one of the first to make a simplified reader from it, so I shouldn’t grumble. What I will grumble about is the extract from Comte de Monte Cristo. You may recall from my review of Otto Bond’s series that the escape episode of this was a highlight, and made me want to read more.

In the Ford & Hicks version, we get the initial backstory in English before reading the scene that led to the unjust incarceration of Dantès. The story then includes more English breaks between sections of French, and given the Otto Bond version, I don’t see the purpose of these interruptions to immersive reading. The Ford & Hicks version covers more of the story, and at the end summarises the remaining plot. Then they end the summary with the statement “The interest of the story weakens after the discovery of the treasure”. Unlike Bond’s version, which had me wanting to read on, this version annoyed me with the English interruptions and further annoyed me by taking away my interest in reading more of the story.

Recent trends in language acquisition research have focused on “translanguaging”, which seems to be what I’ve done most of my life: mixing languages together in order to keep communicating fluently at my level of knowledge (of Dutch). There seem to be benefits to doing so, but I’m starting to think it might not be so good for reading. This is the opposite of my previous thinking on the topic, when I supervised the building of prototype bilingual ereaders that present foreign language stories with the most difficult parts shown in the reader’s native language.

What may be more beneficial is reading foreign language text that resembles the reader’s native language. That is, text using cognates, simple non-idiomatic forms of expression and sentence structures that are not too unusual. This is what happens with stories in French written for native English speakers, such as my Gnomeville series (Episode 1 and Episode 2 currently available) and the books by Otto Bond and by Ford & Hicks. It also happens to a large extent in stories by non-native writers. For example, I’ve seen some English graded readers written by a Chinese author, where the English is very Chinese in style, so would not be classed as “good English”. However, as long as readers know that it is not “English” English, it is probably a helpful starting point for people with a Chinese-speaking background. Then there should be a transition to more English-like English at a later stage. I’m somewhat more forgiving of this idea now than I used to be.

Are some graded readers not worth reading?

25 September, 2018 | Research, Reviews | extensive reading, French, graded readers, language acquisition, language learning, languages | Sandra Bogerd

Something I have been pondering lately is the enormous vocabulary load that occurs in some graded readers that are intended for beginners. The grammar is simple but the vocabulary load is huge. Sure, things are often glossed, which speeds up the process of finding out the meaning of words, but it still prevents fluent reading.

When I first found booklets from the Bibliobus series published by Mary Glasgow, I thought they were wonderful. I only had 3 of them, at levels 6 and 8. I also loved Le Chapeau Rouge released by the same publisher. I’ve since collected more Bibliobus stories, and also acquired a collection of Lire Davantage booklets. What is clear to me, and I have been reminded of it by a friend who has been reading them lately, is that there is not much text but a lot of vocabulary load. The stories do vary a little in terms of quality and difficulty within the published levels, so some are probably of greater value than others.

In theory, these high vocabulary load stories provide language exposure that will increase a learner’s skill, since there are things that are unknown. Provided the learner can read them quickly, they have some value for extensive reading. However, if there is a choice of another story with lower vocabulary load, more text and a smoother gain in vocabulary, then that would be better. It’s all down to availability. Ultimately, though, what matters is whether the story appeals to you enough that you are keen to read it. If not, it is best to find something else to read. As long as you are reading at least 10 minutes per day at a level that lets you read fluently and follow the story without already knowing all the language you encounter, you will improve your language knowledge.

Here are a couple more books/stories I’ve analysed for general vocabulary size at the ~95% cut-off, based on the first 100 words. Note that just using this figure in isolation is a bit misleading, because books like the one by Ford and Hicks use a lot of repetition and a relatively small vocabulary overall, making it possible to learn relatively easily. It just isn’t necessarily all highly frequent vocabulary. This is where vocabulary density is also a useful guide, so I’ve included this figure as well.
If the 95% general vocabulary size is high and the vocabulary density is low, it means that you may be able to read comfortably, learning the vocabulary of the given text, but its relative usefulness will depend on whether it matches the vocabulary that you need for your language goals.

Title	Author	Publisher/Series	General vocab size at 95%	Vocab density (first n words)
Reading approach to French	Ford and Hicks	J.M. Dent and Sons (Canada) Ltd.	12,059	0.39 (109)
Le Visiteur	Sue Finnie	Mary Glasgow/Bibliobus	11,260	0.50 (121)
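
A minimal sketch of how two figures like these could be computed from a text sample. It rests on assumptions that are not spelled out above: vocabulary density is taken to be distinct words divided by total words in the sample, and the general vocabulary size at 95% is taken to be the smallest frequency rank that covers 95% of the words in the sample. The function names and the `rank_of` dictionary (word to rank in a general frequency list) are hypothetical, and the tokeniser is deliberately crude.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and return runs of letters as words (crude: splits on apostrophes)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def vocab_density(tokens, n=100):
    """Assumed definition: distinct words / total words over the first n tokens."""
    sample = tokens[:n]
    if not sample:
        raise ValueError("empty sample")
    return len(set(sample)) / len(sample)


def vocab_size_at_coverage(tokens, rank_of, coverage=0.95, n=100):
    """Assumed definition: smallest frequency rank R such that at least
    `coverage` of the first n tokens have rank <= R.
    Words missing from the frequency list count as infinitely rare."""
    sample = tokens[:n]
    if not sample:
        raise ValueError("empty sample")
    ranks = sorted(rank_of.get(word, float("inf")) for word in sample)
    index = math.ceil(coverage * len(ranks)) - 1
    return ranks[index]
```

With definitions like these, a single rare word in a short sample is enough to push the 95% figure very high, which is one reason the density figure is worth reading alongside it.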

This table follows on from my previous table of figures. When using general vocabulary rank frequency lists, it certainly seems normal for graded readers to effectively use a very wide vocabulary, leading to an expected general (raw) vocabulary size of 4,000 to 12,000. Achieving a considerably smaller figure requires careful vocabulary control, such as occurs in my Gnomeville series, which achieves this by exclusively using French-English cognates and the most frequently occurring words. Initially the text may seem a little artificial, but it becomes more natural and flowing as the stories progress. A similar approach is used in Si Nous Lisions, in that a very small vocabulary is used initially, and then a new word is added every 90 or so words. While I came up with the idea independently, the concept of vocabulary control is attributed to Michael West.
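
One way to check how tightly a text controls its vocabulary is to look at how far apart the new words are. The sketch below is only an illustration, not the method used by any of the books mentioned: it assumes the text has already been split into lowercase words, and the function names and the optional `known` starting vocabulary (for example, assumed cognates) are hypothetical.

```python
def new_word_positions(tokens, known=()):
    """Return the index of each token that introduces a word not seen before.
    `known` is an optional starting vocabulary that does not count as new."""
    seen = set(known)
    positions = []
    for index, word in enumerate(tokens):
        if word not in seen:
            seen.add(word)
            positions.append(index)
    return positions


def mean_gap(positions):
    """Average number of running words between consecutive new words,
    or None if fewer than two new words were found."""
    if len(positions) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)
```

A text that adds a new word every 90 or so words would show a mean gap near 90 under this measure, while a reader with a heavy vocabulary load would show a much smaller one.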

At some point I’ll publish a comprehensive list of graded readers with these statistics, but I’ll first need to automate the process a bit more and get rid of a few bugs. Meanwhile, let’s keep reading at least 10 minutes a day of easy but not too easy text in the languages we want to learn.