Solving Cryptograms: Letter Frequency Analysis

Cryptograms are excellent puzzles to solve, and solving them can also help you to improve your spelling and vocabulary. You are presented with a seemingly random collection of letters, and your job is to crack the cipher and discover the message! But this is easier said than done ... where do you start? Read on to find out!


Tip: If you are serious about improving your spelling, then Troy and I highly recommend you try the popular spelling-improvement program, Ultimate Spelling. Click Ultimate Spelling for further information.*


Cryptograms are what's known as substitution ciphers. This is a cipher where each letter is replaced by another letter, number, or symbol. For example, A may be encrypted as S, B as F, C as E, D as L, and so on. Sometimes the substitutions follow a set and predictable pattern (e.g. A=Z, B=Y, C=X, D=W, or A=1, B=2, C=3, etc.), and sometimes they are random. Another rule of cryptograms is that no letter is encrypted as itself, so you'll never see F=F, for example. Most puzzle books of cryptograms use random substitution ciphers.
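If you like to see ideas in code, here is a minimal Python sketch of the two kinds of key described above: the predictable A=Z, B=Y pattern, and a random key in which no letter is encrypted as itself. The function names (atbash_key, random_key, encrypt) are my own, chosen just for illustration, and this is only one simple way of building such a key.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def atbash_key():
    """Predictable pattern from the text: A=Z, B=Y, C=X, D=W, ..."""
    return dict(zip(ALPHABET, reversed(ALPHABET)))


def random_key(rng=None):
    """Random key in which no letter is encrypted as itself."""
    rng = rng or random.Random()
    letters = list(ALPHABET)
    while True:
        shuffled = letters[:]
        rng.shuffle(shuffled)
        # Reshuffle until no letter maps to itself (the cryptogram rule).
        if all(plain != cipher for plain, cipher in zip(letters, shuffled)):
            return dict(zip(letters, shuffled))


def encrypt(message, key):
    """Substitute each letter; leave spaces and punctuation unchanged."""
    return "".join(key.get(ch, ch) for ch in message.upper())


print(encrypt("THE CAT", atbash_key()))  # GSV XZG
```

Calling encrypt("THE CAT", random_key()) gives a different cryptogram each time, because the key is shuffled afresh on every call.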

(NB: a code is not a cipher. A code is a whole new language, where one word is encrypted as another, so the word "spy" might be encrypted as "purple". Codes are very difficult to crack; you generally have to have stolen the enemy's code book to solve them. The very necessity for and existence of such code books is their weakness, though.)

You will have greater success at solving substitution cryptograms when you have a better understanding of English and a little-known field called letter frequency analysis.

Letter frequency analysis is the study of (you guessed it) the frequency with which various letters appear in a language. This information is hugely valuable when it comes to cracking ciphers, and has been used by military intelligence for hundreds of years to crack substitution ciphers. Each language has different letter frequencies. Because this is English Language Skills, after all, we will just be looking at English here.

So, let's learn a little about letter frequencies in English.


When passages of English are analysed, it is found that E is the most common letter.

The "top five letters" are E T A O and N.

The exact sequence for letter frequency in English varies depending on which texts are analysed. One commonly used sequence is: ETAON RISHD LFCMU GYPWB VKXJQZ.

This tells you that the most commonly seen letters in a piece of English will probably be E, T, A, O, and N, and the least commonly seen letters will probably be J, Q, and Z.
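To try this on a real cryptogram, you can count the letters with a few lines of Python. The sketch below ranks the letters of a ciphertext from most to least common, then lines that ranking up against the ETAON sequence quoted above to produce a first guess. The names letter_ranking and first_guess are my own. Treat the result as a starting point only: short puzzles rarely match the average frequencies exactly.

```python
from collections import Counter

# The frequency sequence quoted in the article, most common letter first.
ENGLISH_ORDER = "ETAONRISHDLFCMUGYPWBVKXJQZ"


def letter_ranking(text):
    """Return the letters A-Z found in text, most frequent first."""
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    return [letter for letter, _ in counts.most_common()]


def first_guess(ciphertext):
    """Pair the most common cipher letter with E, the next with T, etc."""
    return dict(zip(letter_ranking(ciphertext), ENGLISH_ORDER))


print(first_guess("XXX YY Z"))  # {'X': 'E', 'Y': 'T', 'Z': 'A'}
```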


Now we can move beyond single letter frequency to short letter patterns. This information will also help in cracking cryptograms.

The top five most commonly seen letters that start words in English are, in order: T, A, I, S, and O. This doesn't mean that there are more words starting with "T" in the dictionary, but that in a given piece of writing, more words will start with "T" than any other letter. This is from the use of words such as THE, THIS, THAN, THEM, THESE, THAT, THERE, THEY, and so on.

The top five most commonly seen letters that end words in English are, in order: E, S, T, N, and D. So, in a typical piece of writing, more words end with an E than with any other letter.
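The same counting idea works for the starts and ends of words. Here is a small Python sketch (the function name edge_letter_counts is my own) that tallies first and last letters in a piece of text. In a cryptogram, the most common last letter is then a reasonable candidate for E, and the most common first letter for T. Note that this simple version splits words at apostrophes, so DON'T is counted as DON and T.

```python
import re
from collections import Counter


def edge_letter_counts(text):
    """Count the first and last letter of every word in text."""
    # A "word" here is any unbroken run of the letters A-Z.
    words = re.findall(r"[A-Z]+", text.upper())
    firsts = Counter(word[0] for word in words)
    lasts = Counter(word[-1] for word in words)
    return firsts, lasts


firsts, lasts = edge_letter_counts("The cat sat on the mat")
print(firsts.most_common(1))  # [('T', 2)]
print(lasts.most_common(1))   # [('T', 3)]
```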

Q is (almost) always followed by a U.

The consonant that most frequently follows a vowel (A, E, I, O or U) is N: AN, EN, IN, ON, UN.

The most common three-letter word in English is THE.

The most common four-letter word in English is THAT.

One-letter words are almost always A or I.

Two-letter words almost always consist of one vowel + one consonant.

The top five commonly-seen double letters in English are, in order: LL, EE, SS, OO, and TT.
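Double letters are easy to spot by eye, but for a long cryptogram a short Python sketch can list them for you. The function name double_letters is my own, and the regular expression simply looks for any letter followed immediately by the same letter. The most frequent doubles in the ciphertext are then candidates for LL, EE, SS, OO, or TT.

```python
import re
from collections import Counter


def double_letters(text):
    """Count each doubled letter (LL, EE, ...) that appears in text."""
    # ([A-Z])\1 matches a letter followed directly by the same letter.
    pairs = re.finditer(r"([A-Z])\1", text.upper())
    return Counter(match.group(0) for match in pairs)


print(double_letters("A balloon seller"))  # Counter({'LL': 2, 'OO': 1})
```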


Pattern words are very helpful in cracking ciphers. They are just words with repeated letters in them. One of the most useful pattern words to remember is THAT. If you see a cipher word with the letter pattern 1231 (i.e., a four-letter word where the first and last letters are the same), it very probably is the word THAT (the most common four-letter word in English). Be careful, though, as it could also be AREA, EASE, ELSE, HIGH, NEON, SAYS, TROT, or a bunch of other (less common) words!

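Another useful pattern to remember is 12-1-2 (a six-letter word where the first and fourth letters are the same, and the second and sixth letters are the same; the dashes stand for other letters), which can be PEOPLE (most common), ASIANS, INDIAN, or PROPER.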
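Pattern matching is also easy to sketch in Python. The idea is to number each new letter in the order it first appears, so that THAT becomes 1, 2, 3, 1 and PEOPLE becomes 1, 2, 3, 1, 4, 2. Any cipher word with the same numbers could be hiding that word. The function names word_pattern and candidates are my own, and the word list you pass in is up to you (any list of English words will do).

```python
def word_pattern(word):
    """Number each new letter in order of first appearance."""
    numbers = {}
    pattern = []
    for letter in word.upper():
        if letter not in numbers:
            numbers[letter] = len(numbers) + 1
        pattern.append(numbers[letter])
    return tuple(pattern)


def candidates(cipher_word, word_list):
    """Return the words in word_list that share the cipher word's pattern."""
    target = word_pattern(cipher_word)
    return [word for word in word_list if word_pattern(word) == target]


print(word_pattern("THAT"))    # (1, 2, 3, 1)
print(word_pattern("PEOPLE"))  # (1, 2, 3, 1, 4, 2)
print(candidates("XQRX", ["that", "this", "area", "with"]))  # ['that', 'area']
```

I have used a tuple of numbers rather than a string like "1231" so that words with more than nine different letters still work.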

There is a heap more detail I could go into here (and there are whole technical books on letter frequency analysis and even pattern letter dictionaries), but this is generally enough information to get a cryptogram out.

The next article in this series takes you step-by-step through solving a cryptogram.


*Troy and I recommend only products that we have tried and tested. These include Ultimate Spelling. We have agreed to receive a commission from some sales of Ultimate Spelling software because we are happy to endorse that software.

 

Last modified on Friday, 27 November 2015 23:45
English Language Skills (Denise)

I'm a syndicated puzzle writer, with 8 puzzle books to my name, including Word Searches for Dummies and Cracking Codes and Cryptograms for Dummies (with Mark Koltko-Rivera). I have a background in science and graphic design, and am a trained indexer. My favourite puzzles are cryptic crosswords, and my favourite books are murder mysteries and cookbooks. I am also a very keen knitter.

I write a blog all about puzzles, called Puzzling.

Website: sutherland-studios.com.au