A bigram or digram is a sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. A bigram is an n-gram for n=2.
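As a minimal sketch of the definition above, the following Python function pairs each token with its successor. The function name `bigrams` is chosen here for illustration and does not come from any particular library; the same function works for letters (a string) and for words (a list of strings).

```python
def bigrams(tokens):
    """Return the list of adjacent pairs in a token sequence."""
    tokens = list(tokens)
    # zip stops at the shorter input, so the last token has no successor pair.
    return list(zip(tokens, tokens[1:]))


# Letter bigrams of a single word.
letter_pairs = bigrams("bigram")
# [('b', 'i'), ('i', 'g'), ('g', 'r'), ('r', 'a'), ('a', 'm')]

# Word bigrams of a tokenized phrase.
word_pairs = bigrams("the quick brown fox".split())
# [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
```

A sequence of n tokens yields n - 1 bigrams, and a sequence with fewer than two tokens yields none.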
Gappy bigrams, or skipping bigrams, are word pairs that allow gaps between the two words, for example to skip over connecting words or to approximate the dependencies found in a dependency grammar.
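One way to make the idea of a gap concrete is to bound the number of skipped words. The sketch below assumes that convention: the parameter `max_gap` and the function name `skip_bigrams` are hypothetical, and other definitions (for instance, pairs drawn from a dependency parse) are equally possible.

```python
def skip_bigrams(tokens, max_gap):
    """Return ordered word pairs with at most max_gap tokens skipped between them.

    With max_gap=0 this reduces to ordinary (adjacent) bigrams.
    """
    tokens = list(tokens)
    pairs = []
    for i, left in enumerate(tokens):
        # The right-hand token may sit up to max_gap + 1 positions after the left.
        for j in range(i + 1, min(i + 2 + max_gap, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


pairs = skip_bigrams(["a", "b", "c", "d"], max_gap=1)
# [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

Allowing one skipped word here adds the pairs ('a', 'c') and ('b', 'd') to the three adjacent bigrams.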
Some activities in logology or recreational linguistics involve bigrams. These include attempts to find English words beginning with every possible bigram,[2] or words containing a string of repeated bigrams, such as logogogue.[3]
Bigram frequency in the English language
The frequencies of the most common letter bigrams in a large English corpus are:[4]
Bigram  Frequency    Bigram  Frequency    Bigram  Frequency
th      3.56%        of      1.17%        io      0.83%
he      3.07%        ed      1.17%        le      0.83%
in      2.43%        is      1.13%        ve      0.83%
er      2.05%        it      1.12%        co      0.79%
an      1.99%        al      1.09%        me      0.79%
re      1.85%        ar      1.07%        de      0.76%
on      1.76%        st      1.05%        hi      0.76%
at      1.49%        to      1.05%        ri      0.73%
en      1.45%        nt      1.04%        ro      0.73%
nd      1.35%        ng      0.95%        ic      0.70%
ti      1.34%        se      0.93%        ne      0.69%
es      1.34%        ha      0.93%        ea      0.69%
or      1.28%        as      0.87%        ra      0.69%
te      1.20%        ou      0.87%        ce      0.65%
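Frequencies of this kind can be estimated from any text by counting letter bigrams and dividing by the total. The sketch below is illustrative only: it assumes that bigrams are counted within words (never across spaces), that case is ignored, and that non-letter characters are dropped. The source does not say how the cited corpus was processed, so these choices and the function name `letter_bigram_frequencies` are assumptions.

```python
from collections import Counter


def letter_bigram_frequencies(text):
    """Return the relative frequency of each letter bigram in text.

    Assumptions: case-insensitive, bigrams do not span whitespace,
    and non-letter characters are removed before pairing.
    """
    counts = Counter()
    for word in text.lower().split():
        letters = [c for c in word if c.isalpha()]
        # Join each adjacent letter pair into a two-character string.
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bigram: n / total for bigram, n in counts.items()}


freqs = letter_bigram_frequencies("The then")
# 'th' and 'he' each occur twice and 'en' once, out of five bigrams in total.
```

Multiplying each value by 100 gives percentages comparable in form to those in the table, although a small sample will not reproduce the corpus figures.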