
Gnomeville Comics are Easier than I Thought

On reviewing my readability measure results for various items in my collection, I suddenly thought, “hang on, how can the expected vocabulary size for Gnomeville Episode 1 be 25 when only 12 very frequent words are introduced?” Clearly something had gone wrong somewhere.

I blame the fact that part of my analysis is manual, and I probably didn’t follow the procedure very carefully. I run various scripts to produce a list of the words in the text, ranked by their frequency in a large corpus of written French (mostly from Project Gutenberg). The manual part is counting cognates: starting from the least frequent end of the list, I count upward until I have found the 5% of words that are not cognates or names. I think I previously went astray by using a less reliable process.
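As a rough sketch of how such a measure could work (my own reconstruction under stated assumptions, not the actual scripts): rank the text’s distinct words by reference-corpus frequency, then walk from the rare end, skipping cognates and names, until the remaining unfamiliar words exceed a 5% budget of the text’s tokens.

```python
from collections import Counter

def expected_vocab_size(tokens, corpus_rank, cognates):
    """Sketch of an 'expected vocabulary size at 95%' measure, assuming:
    distinct words are ranked by frequency in a reference corpus
    (rank 1 = most frequent, unranked words last), cognates and names
    are 'free', and a reader can tolerate unknown words making up 5%
    of the text's tokens. The corpus rank at which that budget is
    exhausted is reported as the expected vocabulary size."""
    counts = Counter(tokens)
    budget = 0.05 * len(tokens)                    # 5% of tokens
    ranked = sorted(counts, key=lambda w: corpus_rank.get(w, float("inf")))
    unknown = 0
    for word in reversed(ranked):                  # rarest words first
        if word in cognates:
            continue                               # free for the learner
        unknown += counts[word]
        if unknown > budget:                       # this word must be known
            return corpus_rank.get(word, len(corpus_rank))
    return 0                                       # everything fits the budget

# Hypothetical toy text and corpus ranking:
tokens = ["le"] * 10 + ["de"] * 5 + ["dragon"] * 3 + ["ruisseau"] * 2
ranks = {"le": 1, "de": 2, "dragon": 812, "ruisseau": 2941}
print(expected_vocab_size(tokens, ranks, cognates={"dragon"}))  # → 2941
```

In this toy case, “dragon” is treated as a free cognate, so the two occurrences of “ruisseau” already exceed the 5% budget and its corpus rank becomes the expected vocabulary size.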

Results can differ depending on decisions made along the way: whether to include titles (which I treat as sentences), whether to include the “Présentation” section that gives brief notes on each character, and what counts as a cognate. This is reasonably clear-cut for Gnomeville, but less so for other texts. Should “habiter” be considered a cognate because of its similarity to “inhabit”? Other words are cognates in the linguistic sense but not particularly obvious from a learner’s perspective.

The choice of general frequency list also makes a difference. Spoken text has different characteristics from written text, especially in French. Also, the very frequent words used for Episodes 1 and 2 are the 20 most frequent in French newspapers, which is not the same set of words as in any other corpus. The text I use for calculating expected vocabulary size has some of those words at lower ranks (“se” at 25, “au” at 31, and “on” at 40), which explains how the expected vocabulary size could exceed the number of words introduced. But unless those words made up about 5% of the extract, it was unlikely they would receive those scores.

Anyway, on revisiting my incorrect assessments of the Gnomeville episodes, I have the following updated vocabulary sizes.

Episode   Old Expected Vocab Size   New Expected Vocab Size   New Readability Score
1         25                        3                         2.20
2         16                        14                        3.23
3         40                        17                        3.83
4         –                         15                        3.66

You may notice that Episode 4 has a lower expected vocabulary size at 95% and a lower readability score than Episode 3. There’s not much in it, but the Episode 3 extract had longer sentences.

Well, there you are. Gnomeville’s expected vocabulary size is much smaller than originally calculated – at least for Episodes 1 and 3.

Gnomeville Episode 4 Soon to be Released!

Slowly (6 years!) but surely, my next comic for learners of French has been completed! I am holding a launch party for it on Sunday, where attendees will hear the Gnomeville songs performed, and have the opportunity to buy the comics at greatly reduced prices. Then, the physical comics will appear in the Square store, and not too much later, I intend to publish the ebook “wide”, as they call it, meaning it will be available from Kobo, Apple, and other ebook platforms. I intend to make Episodes 1 to 3 available in a bundle format for the platforms that haven’t had the comics before. So, more work to do. But first, we have the launch on Sunday!

[Image: Book cover with musketeer holding a boot, saying “Diable !”]

Bootstrapping the Three Musketeers

Those who have visited my blog this year will know that I have put up some “filtered French”, such as a list of the most common one-word sentences in French classic literature, and sentences that fit the highly constrained vocabulary of my comic books. After musing on language acquisition, in particular how babies learn, not to mention our experience of picking up a few words and phrases in a foreign language by ear, I thought I’d try a different approach. The result is a book (with more volumes to come) in which I filter Les trois mousquetaires, adding vocabulary one word at a time based on which word will complete the most sentences. Using a combination of manual and automatic filtering, I have created extracts whose vocabulary repeats often enough for readers to become familiar with the words.
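The greedy selection step described above can be sketched as follows (my own illustration, not the actual process): at each step, among sentences that are exactly one word short of being fully readable with the current vocabulary, pick the missing word that would complete the most of them.

```python
from collections import Counter

def next_word(sentences, vocab):
    """Greedy step: return the word that would complete the most
    sentences, i.e. the most common sole-missing word given the
    current vocabulary. Returns None when no sentence is exactly
    one word away from being fully readable."""
    counts = Counter()
    for sentence in sentences:
        missing = {w for w in sentence if w not in vocab}
        if len(missing) == 1:          # one new word unlocks this sentence
            counts[missing.pop()] += 1
    return counts.most_common(1)[0][0] if counts else None

# Toy example with hypothetical sentences (as token lists):
sentences = [["je", "suis"], ["je", "suis"], ["je", "parle"]]
print(next_word(sentences, {"je"}))    # → suis
```

Repeating this step, adding each chosen word to the vocabulary, yields the one-word-at-a-time growth described above; duplicate sentences in the source naturally weight frequent constructions.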

It has been fascinating to see what happens as each new word is added. The algorithm tends to find dialogue first, with the average sentence length gradually increasing; short non-dialogue sentences only began to appear after the 93rd vocabulary word was added.

Anyway, if you’d like to have a look, it’s on Amazon, with a substantial preview.

(Affiliate links in this post.)