Tag Archives: computational linguistics

Common One-Word Sentences in French, Revisited

I’ve been playing with my scripts for filtering French lately. I previously published a list of the top one-word sentences in a corpus of French classic texts, as well as lists of very easy extracts based on the language repertoire covered by my Gnomeville comics. Today, while waiting for my very inefficient scripts to finish processing my old download of the French texts from Project Gutenberg, I revisited the frequent one-word sentences. This time I decided to keep the exclamation marks and question marks, so it is clear whether a word is being used as a question or not. Here is what is coming up so far…

  1. Ah !
  2. Oh !
  3. Eh !
  4. Hélas !
  5. Oui.
  6. Non !
  7. Non.
  8. Oui !
  9. Comment !
  10. Quoi !
  11. Bah !
  12. <name>. (most likely names of characters in a play, the first one being Bonaparte.)
  13. Pourquoi ?
  14. Bon !
  15. etc. (probably an artifact of how things were processed)
  16. Allons !
  17. Ha !
  18. Tiens !
  19. Hé !
  20. Moi.

There’s quite a bit in common with the previous list of one-word sentences. The exclamations that showed up in the previous list (Diable ! Parbleu !) still occur in the top 30, so there isn’t a lot of change despite the much larger corpus. I expect further changes to be quite minor as the processed corpus grows.
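For anyone wanting to try something similar, here is a minimal Python sketch of counting one-word sentences while keeping the final punctuation mark. It is not my actual script: the function name and the very naive sentence splitting (anything up to a run of ".", "!" or "?") are assumptions made for illustration, and a leading dash or quotation mark would count as an extra word.

```python
import re
from collections import Counter

# Naive sentence pattern (an assumption): some text, then a run of end marks.
SENTENCE_RE = re.compile(r"([^.!?]+)([.!?]+)")


def count_one_word_sentences(text):
    """Count sentences made of exactly one word, keeping the end mark."""
    counts = Counter()
    for body, marks in SENTENCE_RE.findall(text):
        words = body.split()
        if len(words) != 1:
            continue
        mark = marks[0]
        # French typography puts a space before "!" and "?", but not before ".".
        key = words[0] + ("." if mark == "." else " " + mark)
        counts[key] += 1
    return counts


sample = "Ah ! Oui. Il est ici. Ah ! Pourquoi ? Non."
for sentence, count in count_one_word_sentences(sample).most_common():
    print(count, sentence)
```

Keeping the mark in the key is what lets "Non !" and "Non." show up as separate entries, as in the list above.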

Beginner French Resources

TL;DR: Easy French sentences from classics here.

Years ago I was tinkering with creating my beginner comic book in French, and then researching what makes text easy to read in French for people with an English-speaking background. I learnt that the two main aspects that characterise text difficulty are grammar and vocabulary, with other aspects usually playing a much smaller role. Through my own research, inspired by my own frustration and anecdotal experience, I learnt that for French the typical readability measures, which use word length or even word frequency as a proxy for vocabulary difficulty, just don’t work for people with an English-speaking background. This is because so many of the longer “difficult” words in French are identical to those in English, or close enough not to matter. My experiment demonstrated that you may as well just use sentence length, the simplest measure of grammatical complexity, to decide on difficulty. Despite this, vocabulary matters. It’s just that the difficult words are distributed differently than in languages that don’t have this peculiar French-English relationship.
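To make the sentence-length idea concrete, here is a small Python sketch that ranks sentences from shortest to longest by word count. The function name and the splitting rule are assumptions for illustration rather than the method used in my experiment, and elided forms such as "l'homme" are counted as two words.

```python
import re


def rank_by_length(text):
    """Return (word_count, sentence) pairs, shortest sentences first."""
    # Split after an end mark followed by whitespace (a naive assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # \w+ is Unicode-aware in Python 3, so accented letters count as word characters.
    scored = [(len(re.findall(r"\w+", s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0])


sample = "Le chat dort. Ah ! Il regarde la grande maison de la ville."
for length, sentence in rank_by_length(sample):
    print(length, sentence)
```

Counting with `\w+` rather than splitting on spaces means the space before "!" or "?" in French punctuation does not inflate the word count.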

In another of my experiments, I tried to filter a large collection of French text to find extracts that are easy for English speakers. While the very easy extracts are not long, they do exist. It’s a matter of playing around with the constraints to get something sizeable. It should also be noted that the text I used consists of French classics, which can be challenging to read. Anyway, it’s been a while since I looked at this. The other day I created a page on this site that contains all the sentences and extracts I found that restrict themselves to the vocabulary and grammar of Episode 1 of my comic book (le, la, les, de, du, des, et, est, se, que, and present tense third person singular of -er verbs), plus cognates and names. I hope it is useful. More to come.
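As a rough illustration of this kind of filter, here is a Python sketch that keeps a sentence only if every word is in an allowed set. The function words are the Episode 1 ones listed above; the verb, cognate and name lists are tiny made-up samples, and none of the names here come from my real scripts, which would need much larger lists.

```python
import re

# Function words from Episode 1, as listed above.
FUNCTION_WORDS = {"le", "la", "les", "de", "du", "des", "et", "est", "se", "que"}

# Hypothetical sample lists, for illustration only.
ER_VERBS_3SG = {"regarde", "arrive", "parle"}  # third person singular, present tense
COGNATES = {"table", "dragon", "village"}
NAMES = {"Jacques", "Marie"}

ALLOWED = FUNCTION_WORDS | ER_VERBS_3SG | COGNATES


def is_easy(sentence):
    """True if every word is a known name or an allowed lower-cased word."""
    words = re.findall(r"\w+", sentence)
    if not words:
        return False
    return all(w in NAMES or w.lower() in ALLOWED for w in words)


for s in ["Marie regarde le dragon.", "Il mange une pomme."]:
    print(is_easy(s), s)
```

Loosening the constraints, for example by adding more verbs or allowing a small number of unknown words per sentence, is the sort of adjustment that yields longer extracts.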