Section 3

Section 3 - Frequency of Primes

In this section, I will present more specific results concerning the frequency of occurrence of primes, in terms of formulae that provide infinite supplies of primes and formulae that provide a high density of primes. I will also consider the counting of primes of various forms and investigate prime gaps.

We have already established the infinitude of primes. Since all but the first prime are odd, it is obvious that there is an infinity of primes of the form 2n+1 (or 2n-1) since all odd numbers can be represented in this form. What if we restrict consideration further to numbers of the form 4n+1, 4n+3, 8n+1, etc. ? In the final analysis, it was proved by Dirichlet that for any positive integers a and b such that gcd (a, b) = 1, there is an infinity of primes of the form an+b. However, the proof of this involves mathematics too advanced for inclusion here. Nevertheless, simpler maths can still be used to provide short proofs for particular forms, as we shall see. First, I need to develop some additional theory.

Definition : Let p be an odd prime and gcd(a, p) = 1. If the congruence x² º a (mod p) has a solution, then a is a quadratic residue of p, and if not then a is a quadratic non-residue of p.

This concept arises naturally from attempts to find solutions of quadratic congruences of the more general form ax² + bx + c (mod p).

Lemma 3.1 (Euler's Criterion). Let p be an odd prime and gcd(a, p) = 1. Then a is a quadratic residue of p if and only if a^(p^-1)/2 º 1 (mod p).

Proof : Suppose a is a quadratic residue. Then x² º a (mod p) for some x. Since gcd(a, p) = 1, we have gcd(x, p) = 1, so by Fermat's Little Theorem, a^(p^-1)/2 º x^p^-1 º 1 (mod p), as required. Now suppose that a^(p^-1)/2 º 1 (mod p), and let r be a primitive root of p. Then a º r^k (mod p) for some k, so r^k(p^-1)/2 º 1 (mod p). Now, by Lemma 2.13(ii), p-1 divides k(p-1)/2, so k must be an even number. Let k = 2j. Then (r^j)² º a (mod p), so r^j is a solution of x² º a (mod p). Hence result.

Definition : Let p be an odd prime and gcd(a, p) = 1. The Legendre symbol (a/p) is defined as follows : if a is a quadratic residue of p, then (a/p) = 1, otherwise (a/p) = -1.

The Legendre symbol is a very convenient and condensed notation, and allows us to develop mathematical relations involving it as a Boolean operator.

Lemma 3.2 :

if a º b (mod p), then (a/p) = (b/p)
(a²/p) = 1
(a/p) º a^(p^-1)/2 (mod p)
(ab/p) = (a/p).(b/p)

Proof : (i) if x² º a (mod p) and a º b (mod p) then x² º b (mod p) by the transitivity of congruence. (ii) Take x = a. (iii) This is a reformulation of Euler's Criterion.

Proof of (iv) : (ab/p) º (ab)^(p^-1)/2 º a^(p^-1)/2.b^(p^-1)/2 º (a/p).(b/p) (mod p). Each of (ab/p), (a/p) and (b/p) is either 1 or -1, so the difference (ab/p) - (a/p).(b/p) is either 0, 2, or -2. Since p is an odd prime, this difference must be 0. Hence result.

In (ii) above, we can take a = 1, so that (1/p) = 1. Also, by definition (-1/p) º (-1)^(p^-1)/2 (mod p), and since (-1/p) is either 1 or -1, we must have (-1/p) = (-1)^(p^-1)/2. Since (p-1)/2 is even if p is of the form 4n+1 and odd if p is of the form 4n+3, this can be rephrased as :

Lemma 3.3. (-1/p) = 1 if p º 1 (mod 4) and (-1/p) = - 1 if p º 3 (mod 4).

We can find such explicit expressions in other cases, for instance :

Lemma 3.4. (2/p) = , i.e. (2/p) = 1 if p º ±1 (mod 8) and (2/p) = -1 if p º ±3 (mod 8).

Proof : Consider the congruences :

p - 1 º 1.(-1)¹ (mod p) 2 º 2.(-1)² (mod p)

p - 3 º 3.(-1)³ (mod p) 4 º 4.(-1)⁴ (mod p)

etc. up to the halfway point r º ((p-1)/2).(-1)^(p^-1)/2 (mod p), where r is either (p-1)/2 or p-(p-1)/2, depending as p º 1 or 3 (mod p), respectively. The left hand side of each of these congruences is even and consists of every even number up to p-1. Hence, multiplying the congruences, we obtain

2^(p^-1)/2.((p-1)/2)! º ((p-1)/2)!(-1)^1+2+…+(p^-1)/2 (mod p).

Cancelling the common term we get 2^(p^-1)/2 º (mod p), as required, by Lemma 3.2(iii) and the fact that both sides are either 1 or -1.

We can now give our first improvement over the 2n+1 formula.

Lemma 3.5.

There is an infinity of primes of the form 4n+3.
There is an infinity of primes of the form 4n+1.

Proof of (i) : This involves only very basic theory. Assume the statement is false and let N be the number obtained by subtracting 1 from the product of all primes of the form 4n+3. Then N is also of the form 4n+3. Now, N is odd and so all its divisors are of the form 4n+1 or 4n+3. However, the product of two or more integers of the form 4n+1 must also be of the form 4n+1, so N must be divisible by a prime q of the form 4n+3. But q also divides N+1, so q divides 1, a contradiction. Hence result.

Proof of (ii) : Assume the statement is false and let N be the number obtained by adding 1 to the square of the product of all primes of the form 4n+1. Now, N is odd, so let p be an odd prime that divides N. Then by definition, -1 is a quadratic residue of p, i.e. (-1/p) = 1. By Lemma 3.3, p is therefore of the form 4n+1. But p also divides N-1, so p divides 1, a contradiction. Hence result.

Lemma 3.6. There is an infinity of primes of the form 8n-1.

Proof : Assume the statement is false. Let P be the product of all primes of the form 8n- 1, and let N = (4P)² - 2. Now N is odd, so let p be an odd prime that divides N. Then (4P)² º 2 (mod p) so 2 is a quadratic residue of p. By Lemma 3.4, we must have p º ±1 (mod 8). However, the product of two or more integers of the form 8n+1 must also be of the form 8n+1, so N must be divisible by a prime q of the form 8n-1. But q also divides N+2, so q divides 2, a contradiction. Hence result.

Such arguments as those just given can be repeated and extended for other forms, especially since we can use the following famous result.

Lemma 3.7 (Gauss' Quadratic Reciprocity Law). If p and q are distinct odd primes, then

(p/q).(q/p) = .

The proof of this result is not difficult conceptually, but does involve a lot of formatting of mathematical equations and so I shall leave it out. It can be found in the reference texts.

Consider the prime 3. We have 1² º 2² º 1 (mod 3), so (p/3) = 1 if p º 1 mod 3 and (p/3) = -1 if p º 2 mod 3. Using Lemma 3.7, we have (3/p) = (p/3) if p º 1 mod 4 and (3/p) = -(p/3) if p º 3 mod 4. Combining these, we obtain the following additional explicit result:

Lemma 3.8. (3/p) = 1 if p º ±1 (mod 12) and (3/p) = -1 if p º ±5 (mod 12).

This result will be of use when we consider Fermat numbers.

Let us now consider counting prime numbers.

Definition : Let p(x) be the number of primes less than or equal to x.

It has been known for over one hundred years that p(x) ~ x / log(x) (the Prime Number Theorem, proved independently by Hadamard and de la valle Poussin), with the expectation that a particular number p is prime being 1 / log(p). The proof of these facts is well beyond the scope of these pages, in the realms of analytic number theory, but we shall make use of the results from time to time.

Obviously, we can verify the value of p(x) for small x by brute force, and this has been done in the past at least up to x = 10¹². However, a method exists, originated by Meissel and refined in turn by Lehmer, then Lagarias, Miller and Odlyzko, then Deleglise and Rivat, that allows the calculation of p(x) using only the knowledge of explicitly calculated values up to Ö x, plus some additional calculations on the number of integers surviving trial division by the first p(Öx) primes. This has been used to calculate p(x) to much higher values. The current record is p(4*10²²) = 783,964,159,852,157,952,242. The pi-x project is an active collaboration taking place to calculate p(10²³). Check the records section for a table of prime counts to various powers of 10.

All odd primes are either of the form 4n+1 or 4n+3, and we have seen that both of these arithmetic progressions contain an infinity of primes. However, there is no indication as to the relative growth in numbers of primes for each of these. Let us expand the definition of the counting function, as follows.

Definition. Let p_d,a(x) be the number of primes less than or equal to x in the arithmetic progression dn + a, where gcd(a, d) = 1.

Consider d = 4. Then we can take a = 1 and a = 3. We observe that p_4,1(x) £ p_4,3(x) for all small x, and it is not until x = 26861 that p_4,1(x) > p_4,3(x).

Similarly, apart from p = 3, all odd primes are either of the form 3n+1 or 3n+2 and we can check that p_3,1(x) £ p_3,2(x) for all small x. This time, however, it is not until x = 608981813029 that p_3,1(x) > p_3,2(x).

Needless to say, it has been proved that primes in arithmetical progressions to the same modulus have the same density in the long run, namely that p_d,a(x) ~ x / [j(d)log(x)], noting that the right hand side of this formula does not depend on a.

Primes seem to be fairly commonly occurring, and the Prime Number Theorem gives an idea of how common. In fact, there is still a 1% chance of a random number near 10⁴³ being prime (a percentage which increases if we restrict our selections to those where the units digit is 1, 3, 7 or 9), so let us make some reasonable conjectures :

(Bertrand's Postulate). p(2n) > p(n) for all n. Another way of expressing this is that if p_n is the nth prime, then p_n+1 < 2p_n, or, for all n, there is a prime between n and 2n.
For all n, there is a prime between n² and (n+1)².

In fact, Conjecture A has the status of a theorem, whereas Conjecture B remains unproved. However, it is an entirely reasonable assumption, I'm sure you would agree. Now (n+1)² = n² + 2n + 1. The following conjecture is even tighter.

For all n, there is a prime between n² and n² + n.

Casual observation would tend to sustain the idea that wherever we are in the integers, we are never too far away from a prime. In fact, even though the above conjectures are reasonable, and may well be true, it is easy to produce arbitrarily long sequences of integers, none of which is a prime. For instance, the sequence starting at n!+2 and ending at n!+n has n-1 consecutive composites, and extending the sequence on both sides may continue to provide composites for some time, although ultimately we know that a prime will be found. On the other hand, for this sequence to produce 1000 composites in a row, we must consider n to be near 1000, and 1000! is a number of 2568 digits. Similarly, the sequence starting at p#+2 and ending at p#+p, where p is a prime, has p-1 consecutive composites, and 1000# has 416 digits. Can we find runs of composites of length n for numbers significantly smaller numbers? This problem is often viewed in terms of prime gaps, in the following manner.

Definition : Let d_n = p_n+1 - p_n be the gap between successive primes.

Obviously, since p₁ = 2 is the only even prime, we have d_n is even for all n > 1. Also, a gap of d_n provides a sequence of d_n-1 consecutive composite integers. The first gaps are 1, 2, 2, 4, 2, 4, 2, 4, 6, 2, 6, 4, 2, 4, 6, 6, 2, …, and in general there are two issues at hand :

For a particular gap value 2k, what is the smallest n such that d_n = 2k?
Finding ever larger gaps (not including the pathological cases mentioned above).

Point (i) presupposes that there is always a gap of size 2k for any k. The following conjecture goes even further :

D. For any k, there are infinitely many values of n such that d_n = 2k.

This conjecture has not even been proved for k = 1, though it is thought to be true.

While evaluating p(x) by explicitly calculating every prime less than or equal to x, both the first occurrences for various 2k, and the largest gap up to that point can be read off immediately. Occasionally, a first gap of size 2k will be found before a first gap of size 2m, where k > m. In this case, the first occurrence of 2m-1 consecutive composite numbers occurs within the earlier gap. The smallest example of this is the first occurrence of a gap of size 10, following the prime 139, which is larger than the first gap of size 14, which follows the prime 113.

There are two main methods used for finding large prime gaps: theory-based examination of specially constructed sequences with built-in divisibility properties, or brute force, the latter encouraged by the advent of efficiently implemented sieving and pseudoprime tests, by searching for large runs of composites flanked by probable primes. For convenience I refer to these as pseudoprime gaps. They may be converted to prime gaps once deterministic tests are performed on the endpoints. Check the records section for the current best.

Since we can always find a run of composites of any length, it is of interest to measure how much better these runs are than the pathological examples given above. If we consider that a run of n composites can be found just after n#, then a good measure is the ratio g : log(p) where p is the probable prime at the lower end of the run and g is the gap. The larger this ratio, the larger the gap in comparison to the expected gap size at that magnitude. Since log(n#) » n, we can consider the pathological case as giving a ratio of g : g = 1. Dubner has noted that if p#+1 and p#-1 are both composite, a common condition, then the range p#-p to p#+p has a run of 2p+1 composites at least, and therefore we should consider the pathological case as giving a ratio (or score) of 2g : g = 2. The constructed solutions tend to give the best scores. Paul Leyland keeps a record list of the gaps that score highest in this manner.

The special case d_n = 2 has come under a lot of scrutiny.

Definition. If p and p+2 are both prime, then they are called twin primes.

The first few examples of twin primes are (3,5);(5,7);(11,13);(17,19);(29,31);(41,43), etc.

As alluded to earlier, it has not been proved that the number of twin prime pairs in infinite, but there is no-one who actually doubts this to be true. On the other hand, while the sum of the inverses of all primes diverges, the sum of the inverses of all twin primes (primes in more than one pair counting twice) converges to Brun's constant (B » 1.902160577783278…). Additionally, it has been proved that there are arbitrarily long sequences of primes not containing a twin prime pair. The number of twin prime pairs with smallest number less than or equal to x is denoted p₂(x). The current record is p₂(3*10¹⁵) = 3310517800844. See the records section for the top-10 known twin prime pairs, and a table of twin prime counts for various powers of 10.

Comprehensive lists of prime gaps and twin prime counts are available on Thomas Nicely's website.

There is also competition to find the largest pair of twin primes. The form k*2ⁿ ± 1 is typical of the largest twin primes, again because of the availability of suitable primality tests.

There is even an interest in the largest known number of consecutive twin primes, that is, a sequence of 2n consecutive primes providing n pairs of twins. This currently stands at n = 8, the smallest sequence starting at the prime 1107819732821 and finishing at the prime 1107819733063 (deVries, 2001).

I have already mentioned Dirichlet's Theorem, which states that arithmetic progressions of the form a + kd contain an infinity of primes whenever gcd(a, d) = 1. A corollary of this is the following :

Lemma 3.9. If gcd(a, d) = 1 and 2 divides ad, then there exists an infinite sequence of pairwise relatively prime integers n₁, n₂, …, such that a + n_id is prime.

Proof : Let a₀ = a+d, d₀ = 2d.(a-d). Then gcd(a₀, 2) = gcd(a₀, d) = gcd(a₀, a-d) = 1. Hence gcd(a_o, d₀) = 1. By Dirichlet, there is an m₁ such that a₀ + m₁d₀ = p₁ is a prime. Now, a₀ + m₁d₀ = a + n₁d where n₁ = 1+2m₁.(a-d), so n₁ º 1 (mod (a-d)). Assume that k ³ 1 and n₁, n₂, …, n_k have already been found, and let N_k = n₁n₂…n_k. Then N_k º 1 (mod (a-d)). Let a_k = a - d - 2dN_k and d_k = 2dN_k.(a-d). Then gcd(a_k, 2) = gcd(a_k, d) = gcd(a_k, N_k) = gcd(a-d, N_k) = gcd(a_k, a-d) = gcd(a_k, 2dN_k) = 1. By Dirichlet, there exists m_k+1 such that a_k + m_k+1d_k = p_k+1 is prime. Now, a_k + m_k+1d_k = a + n_k+1d where n_k+1 = 2N_k - 1 + 2m_k+1N_k.(a-d), so n_k+1 º 1 (mod (a-d)), and n_k+1 º -1 (mod N_k), so gcd(n_k+1, n_j) = 1 for all j £ k. Hence result.

The primes themselves form an infinite sequence of pairwise relatively prime integers. This leads to the following conjecture.

E. If gcd(a, d) = 1 and 2 divides ad, then there exist infinitely many primes q₁, q₂, …, such that a + q_id is prime.

The connection with twin primes is obvious - just take a = 2 and d = 1. Taking a = 1 and d = 2, conjecture E provides the following :

F. There are infinitely many primes q such that 2q+1 is also prime.

Definition : Such primes q as in Conjecture F are called Sophie-Germain primes.

Are there infinitely many Sophie-Germain primes ? Probably, but no proof exists. However, whenever a very large prime is found, it is usually checked to see if it is one of a Sophie-Germain prime pair. It is also convenient to sieve numbers of the form k*2ⁿ -1 and k*2ⁿ⁺¹ -1 to search for large Sophie-Germain primes. Check the records section for the current top-10. Needless to say, primes of this form would not be given a name if they did not have some additional importance, one of which will become apparent in Section 4.

Concerning arithmetic progressions directly, there are several issues worth investigating :

Finding arithmetic progressions of primes.
Finding arithmetic progressions of consecutive primes.

In the first case, we try to find snapshots of consecutive values in the sequence a + kd which are all prime. In the second case, the primes must be consecutive. It seems reasonable to expect progressions of type (ii) to have small d, while those of type (i) can be more flexible. This is given some justification by the following.

Lemma 3.10. If gcd (a, d) = 1, d ³ 2 and a, a+d, a+2d, … , a+(n-1)d is a sequence of n primes in arithmetic progression, and q is the largest prime less than or equal to n, then either the product of the primes up to q divides d, or a = q and the product of the primes less than q divides d.

Proof : If p is a prime not dividing d, then the numbers a, a+d, a+2d, … , a+(p-1)d are pairwise incongruent modulo p, and one of them is divisible by p. Assume that the product of primes up to q does not divide d, and let p be the smallest such that p £ n and p does not divide d. Then p divides a+kd for some 0 £ k < p. But a+kd is prime, so p = a+kd. But a is a prime, and so if k ¹ 0 then a < p, so by definition of p, a divides d, which is false, so k = 0, i.e. p = a. If p < q, then p < n, so a+pd is prime, which is false since it is divisible by p. Hence p = q. By the definitions of p and q, either p does not exist, in which case all the primes up to q divide d, or p = a = q, and the primes smaller than p (and therefore q) divide d, as required.

Here's a simple example : a = 5, d = 6 gives a progression of length n = 4, so q = 3. Now, either a = q, which is not true in this case, or 2*3 divides d, which is true.

Simply writing out the first few primes in a grid, it is easy to see that there is no shortage of primes in arithmetic progression (PAPs). PAPs of length 2 have been considered already in the form of prime gaps. It has been proved that there is an infinity of PAPs of length 3. However, for longer PAPs, we are left with the following conjecture :

G. For any n ³ 4, are there infinitely many PAPs of length n ?

As usual, we are always interested in the longest and largest PAPs. Check the records section for the largest known 3-PAPs, 4-PAPs, 5-PAPs and 6-PAPs.

In contrast, the longest PAP currently known has length 22 :

11410337850553 + 4609098694200*k, for 0 £ k £ 21 (Pritchard, Moran, Thyssen, 1993)

In this case, the gap is the product of a lot of small primes, a common occurrence for long PAPs, and for which I will give an explanation soon.

The longest PAP currently known consisting of consecutive primes has length 10 (Toplic, Forbes, 1998), and is rather amazing :

100996972469714247637786655587969840329509324689190041803603417758904341703348882159067229719 + 210, 420, 630, 840, 1050, 1260, 1470, 1680, 1890

Check the records section for the largest known 3-CPAPs and 4-CPAPs.

I mentioned above that p(x) ~ x / log(x) and that the likelihood that x is prime is approximately 1 / log(x). But what if x (or p, since we are interested in primes) has already survived a sieve to a limit q ? Now, it is obvious that the proportion of integers surviving a sieve to q is s(q) =

If p has survived a sieve to q, then the likelihood that p is a prime therefore becomes 1/[s(q)log(p)], or t(q)/log(p), where t(q) = 1/s(q), and the expected number of survivors required in order to obtain a prime is the inverse of this. By a classical result of Mertens, the value of t(q) rises logarithmically as eg .log(q), where g is Euler's constant and eg » 1.781, so for large values of q, increasing the trial division limit by small percentages does not give us any real benefits. On the other hand, when q is low, adding a few extra primes to the sieve makes a big difference. Setting q = 257 reduces the chances of any number surviving trial division by over 90%.

We have considered gaps between primes. However, an interesting extension is to consider gaps between survivors of trial division to a particular limit. The following conjecture appeared in the primenumbers mailing list.

H. Let p be an odd prime and q be the largest prime less than p. Then the maximum gap possible between numbers surviving a trial division sieve to p is 2q. (The original statement of the conjecture was in terms of consecutive numbers each divisible by a prime less than or equal to p, but it amounts to the same thing).

As with many other conjectures, this seems reasonable when investigated for small p. It is obvious that the combined divisibility pattern will repeat modulo p#, so we can use this as a search limit. The conjecture is true for all primes p £ 19, with maximum allowable gap being reached in all cases. However, for p = 23, the conjecture implies that the biggest gap is 38, whereas in reality, the largest gap found is 40. The estimate then holds good for p = 29 and p = 31, but then fails again at p = 37. In fact, p = 41 is the only other value for which the statement holds, with actual results diverging more and more from the estimate. It is certainly possible that loose estimates may be found heuristically for upper bounds on the gap sizes in this case, but I shall leave that as an exercise.

Having established the fact that primes occur frequently but perhaps with hidden pattern, in Section 4, we will investigate some more advanced primality tests.