
The Distribution of Letters in Crossword Puzzles

In the overall corpus of English, the most frequently seen letters are E, T, A, O, I, and N, in that order.

The prevalence of "E" and "T" is probably due to the commonness of the word "the". The letters "A", "N", "O", and "I" are common because the words "a", "of", "on", "in", and "I" are common, and so forth.

But in crossword puzzles, there is no requirement for articles or prepositions; indeed, answers don't even need to be words! So the distribution of letters changes dramatically. To further confuse the distribution, some answers in the puzzle must be formed entirely out of "ending letters" of other words while other answers are formed entirely from "starting letters."

Thus, for your edification we provide the following information about the distribution of letters in crossword puzzles.
The overall distribution of letters looks like this (numbers are percentages, so "E" represents 14.26% of all letters in a crossword puzzle on average):

(percent, letter)

14.26 E
11.65 A
 8.95 S
 7.61 R
 7.27 T
 7.26 O
 6.23 N
 6.16 I
 5.34 L
 3.69 D
 2.76 C
 2.73 M
 2.63 P
 2.19 H
 2.12 U
 1.85 G
 1.72 B
 1.27 Y
 0.95 W
 0.94 F
 0.88 K
 0.82 V
 0.23 X
 0.21 Z
 0.18 J
 0.08 Q

The distribution of letters that form the first letter of some answer is:

(percent, letter)

13.44 A
11.81 S
 9.00 E
 6.80 T
 6.08 R
 5.60 O
 4.66 L
 4.65 C
 4.60 P
 4.53 I
 4.03 D
 4.00 M
 3.61 B
 3.44 N
 2.93 H
 2.41 G
 1.90 F
 1.65 U
 1.43 W
 0.89 K
 0.75 V
 0.72 Y
 0.55 J
 0.24 Z
 0.16 Q
 0.10 X

Finally, the distribution of letters that form the last letter of some answer is:

(percent, letter)

18.15 E
17.03 S
 8.59 T
 7.83 A
 7.22 R
 6.84 N
 5.46 D
 5.11 O
 4.08 L
 3.16 Y
 2.64 I
 2.19 P
 2.00 M
 1.71 H
 1.57 G
 1.35 C
 1.11 K
 0.90 W
 0.89 B
 0.80 U
 0.59 F
 0.36 X
 0.19 V
 0.17 Z
 0.04 J
 0.04 Q
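
For readers who want to produce tables like the three above from their own puzzles, here is a minimal sketch of one way to tally them. It assumes the input is a flat list of answers (across and down together); the function names and the sample answers are hypothetical and are not the data or method behind the tables above.

```python
from collections import Counter


def letter_percentages(letters):
    """Return {letter: percent} for an iterable of letters, most common first."""
    counts = Counter(letters)
    total = sum(counts.values())
    return {
        letter: round(100 * n / total, 2)
        for letter, n in counts.most_common()
    }


def crossword_distributions(answers):
    """Tally overall, first-letter, and last-letter percentages."""
    # Keep only A-Z so that spaces or punctuation in an answer are ignored.
    cleaned = ["".join(ch for ch in a.upper() if "A" <= ch <= "Z") for a in answers]
    cleaned = [a for a in cleaned if a]  # drop answers that ended up empty
    return {
        "overall": letter_percentages(ch for a in cleaned for ch in a),
        "first": letter_percentages(a[0] for a in cleaned),
        "last": letter_percentages(a[-1] for a in cleaned),
    }


# Hypothetical sample answers, for illustration only.
sample = ["AREA", "ERAS", "OREO", "STET"]
dists = crossword_distributions(sample)
for name, table in dists.items():
    print(name, table)
```

Counting letters answer by answer means a square shared by an across and a down answer is counted twice in the "overall" tally. If every square is shared in this way, the percentages are unchanged; otherwise it would be more accurate to count grid squares directly.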

Are there any surprises (where the distribution differs significantly from the standard "ETAOIN" distribution)? Perhaps the overall commonness of "C" in crossword puzzles; or the commonness of "C" and "P" at the start of answers; or the commonness of "D", "Y", and "P" at the end of answers; or the diminished role of "U" and "H".
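
The paragraph above compares the crossword figures with standard English. A related check, which needs only the tables in this article, is to ask how much more (or less) common a letter is at the start or end of an answer than it is overall. The sketch below computes those ratios for the letters named above; the figures are copied from the three tables, and the function name is hypothetical.

```python
# (overall %, first-letter %, last-letter %), copied from the tables in
# this article, for the letters named in the preceding paragraph only.
TABLE = {
    "C": (2.76, 4.65, 1.35),
    "P": (2.63, 4.60, 2.19),
    "D": (3.69, 4.03, 5.46),
    "Y": (1.27, 0.72, 3.16),
    "U": (2.12, 1.65, 0.80),
    "H": (2.19, 2.93, 1.71),
}


def position_ratios(table):
    """Return {letter: (first/overall, last/overall)}, rounded to 2 places."""
    return {
        letter: (round(first / overall, 2), round(last / overall, 2))
        for letter, (overall, first, last) in table.items()
    }


ratios = position_ratios(TABLE)
for letter, (start, end) in ratios.items():
    print(f"{letter}: start x{start:.2f}, end x{end:.2f}")
```

A ratio above 1 means the letter is over-represented in that position relative to its overall share of the puzzle; a ratio below 1 means it is under-represented there.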

