We will use the Pumping Lemma, in order to understand whether it is possible to describe the prime numbers using a Regular Grammar.

Problem Statement

For simplicity let's consider the unary numeral system (number N is represented by word XN, which consists of N consecutive symbols '1').
Also, let's use the following notation: |Xi| in order to express the amount of '1' symbols inside the word Xi.

So, having the infinite set of words, which corresponds to prime numbers:

X1 = 11
X2 = 111
X3 = 11111
X4 = 1111111
X5 = 11111111111
...

where |Xi| is a prime number, for every i,

we would like to understand, whether it is possible to express all words from this language using a regular grammar?

In other words, is it possible construct the finite state machine, which will be able to recognize all prime numbers (and only the prime numbers), expressed in unary numeral system?

Pumping Lemma

The Pumping Lemma states, that every sufficiently long word of a regular language can be infinitely many times expanded in a special way, such that every new word (obtained after expansion of the original word) will belong to this language as well.

Formally saying, for every infinite regular language L: every word Xi (such, that length of the word is greater than some threshold) can be splitted into 3 consecutive parts: X, Y and Z, and every new word (e.g. XYYZ, XYYYZ, ..., XYnZ) will belong to the language L as well.

The lemma can be understood in such way, that the reguar expression for recognizing of any infinite regular language must contain a Kleene operator.

For example, lets's consider a regular language L, which corresponds to the following regular expression: ( 1 | 11 | 111 | 111(1)*1 )

Let's select some sufficiently long word of this language:
Wi = 111111

One of the possible ways to split Wi is:
Wi = XYZ = 111 _ 11 _ 1
(underscore characters are not a part of the alphabet of the language L, but I am using these characters just to display the separation between X, Y and Z)

So, the parts of the word Wi are:
X = 111
Y = 11
Z = 1

We can infinitely many times replicate the part Y ("pump" the word Wi), and all generated words: XY2Z, XY3Z, ..., XYnZ will belong to the langauge L as well:

XYZ = 111111
XY2Z = 11111111
XY3Z = 1111111111

The Pumping Lemma is only a necessary condition (and not a sufficient condition) for the language being a regular.

However, this lemma is a very convenient tool for proving, that the language is not regular: proving that every sufficiently long word of the language can't be "pumped" - implies that the language is not regular.

Regular grammar for prime numbers?

The unary representation Wp of every prime number p (such that |Wp| = p) can be splited into 3 parts: X, Y and Z.

Which, in general case, implies that:
|Wp| = |XYZ| = |X| + |Y| + |Z|

XYnZ means, that we just should replicate n times the Y part of the word Wp.

So, the amount of symbols '1' inside the generated word can be represented as:
|XYnZ| = |X| + n×|Y| + |Z| = (|X| + |Y| + |Z|) + (n - 1)×|Y| = |Wp| + (n - 1)×|Y|

We can choose the n = |Wp| + 1
Which implies, that:
|XYnZ| = |Wp| + (n - 1)×|Y| = |Wp| + |Wp|×|Y| = (|Y| + 1)×|Wp|
Hence, XYnZ is a composite number.

Hence, for any arbitrary splitting of Wp (any arbitrary prime number), we can choose such n, that XYnZ will be a composite number.

Our implication contradicts to the Pumping Lemma, which leads to the conclusion, that the language, which consists of the prime numbers can’t be expressed using a Regular Grammar.