Section 7 - Advanced Primality Tests

There are numerous occasions when we would want to be able to prove the primality or otherwise of a number that does not conveniently fall into one of our favoured categories. Such numbers often appear as cofactors remaining after attempts at factorisation. Other situations include the generation of random primes for cryptography, searches for primes with peculiar properties or digit patterns, and searches for primes that occur in unusual places, such as decimal expansions of fundamental constants like p and e. We can easily apply trial division and then pseudoprimality tests so that we don't have to waste our time with numbers that are definitely composite, which leaves us with large probable primes that need primality to be confirmed by additional means.

We cannot reasonably expect to be able to prove the primality of such formless probable primes as easily or as quickly as the very particular forms previously mentioned. Since a generic test must be able to handle any number, regardless of how that number was originally produced, the mathematics is bound to be more complicated. Given these difficulties, it would be preferable if it were not necessary to repeat such a test in its entirety in order to check its accuracy or confirm its result. A desirable feature, therefore, of a general purpose primality test, is some sort of validation, or proof, in the form of verifiable output, that a tested number n is either prime or composite. If such a proof exists, it is known as a **certificate**. Verification of this certificate should ideally be very much faster than the original test. As an example, if a number n has been successfully factored into a prime p and a cofactor c, then (p*c) = n is a certificate of compositeness, which can be verified by performing the multiplication in a time much faster than it took to obtain the factor p in the first place. Certificates of primality are more difficult to obtain. For instance, consider that a number has been proved prime by trial division. There is no output, apart from the fact of its success, that can be used to verify this, other than by re-applying the trial division. The situation is slightly better for numbers that a proved prime using Lemma 4.3, the Brillhart-Lehmer-Selfridge method. In this case, the list of the primes dividing n- 1 and the bases a that satisfy the required condition comprise a primality certificate, although most of the hard work has already been done in factoring n- 1 in the first place.

Notwithstanding the above issues, algorithms have been devised that allow testing on numbers of several thousand digits to be performed in a practical time on a computer.

Before mentioning some truly deterministic tests, I will mention some "better-than-probable" prime tests.

The most obvious check is to perform repeat pseudoprime tests to various bases. Using the standard definition of a pseudoprime to the base a, i.e. a composite n such that a^{n-1} º 1 (mod n), is one possibility. If such a value a such that gcd(a,n) = 1 is provided, then we have another example of a certificate of compositeness. Of course, there are the Carmichael numbers to worry about. However, there are other types of pseudoprime that are less common.

Definition. Let P and Q and D = P^{2} - 4Q be non-zero integers and U_{n} and V_{n} the associated Lucas sequences. If n is a composite number such that gcd(n,D) = 1 and U_{n-(D/n)} º 0 (mod n) then n is a **Lucas pseudoprime** (with parameters P and Q).

Needless to say, primes satisfy this condition. In a repeat test, we can choose specific values of P and Q to ensure that (D/n) = - 1, making the calculations easier.

For the two types of pseudoprime already mentioned, there is no measure of the probability of composite n surviving repeated tests; we just accept that it decreases substantially at each iteration. We would prefer further theoretical justification. This is provided by the **Solovay-Strassen** test.

Definition. The Legendre symbol is defined so that, by Lemma 3.2(iii), (a/p) º a^{(p-1)/2} (mod p) when p is a prime and gcd(a,p) = 1. Any composite number n satisfying this congruence (with the Legendre symbol replaced by the Jacobi symbol) is called an **Euler pseudoprime** to the base a.

For each a, the probability of a composite n satisfying this test is at most 1/2, so the more we perform it, with different a, the higher the likelihood that a composite number will fail. After k successful tests, the probability that n is prime is higher than 1-(1/2)^{k}.

However, we can do even better, using the **Miller-Rabin** test, which uses yet another type of pseudoprimality.

Definition. Let n be an odd composite integer. Then we can express n uniquely in the form n = k*2^{s} +1 where k is odd and s ³ 1. Let a be an integer such that gcd(a, n) = 1. Then n is a **strong pseudoprime** to the base a if a^{k} º 1 (mod n) or a^{kx} º -1 (mod n) for some x = 2^{r}, with 0 £ r < s. If a does not satisfy any of these congruences, then n is immediately composite and a is said to be a **witness** for n.

Every strong pseudoprime to the base a is already an Euler pseudoprime to the base a. There are also infinitely many strong pseudoprimes to any base. However, if a witness can be found, we have yet another certificate of compositeness.

The extended Riemann hypothesis, which will hopefully be proved one day, can be used to show that if n is composite then n has a witness a such that a < 2*(log(n))^{2}, which is nice mathematically but not a great help when n has 100 digits. It is better to use the repeat pseudoprime method for a smaller range of test values, and using the fact that the probability of a composite n passing each iteration of the test is at most 1/4, so that after k successful tests, the probability that n is prime is greater than 1-(1/4)^{k}.

The above tests are adequate, depending on the requirements. It may be that it is not absolutely essential that n is known to be prime for whatever purposes n is being used. However, there are times when we really need to prove, categorically, that a number is prime.

The classical tests are based around full or partial factorisations of n ± 1, and it is possible to use partial factorisations of

n^{2} + 1, n^{2} ± n + 1, etc. to chip away until there are enough factors to satisfy a primality test. However, there is nothing to restrict us to quadratic polynomials in n. Take, for instance, n^{m} -1. By Fermat's Little Theorem, for any prime p such that p-1 divides m, then p divides n^{m} -1, that is, n^{m} º 1 (mod p). If we choose m carefully, then the product of all the primes p such that p-1 divides m is greater than the square root of n. We then need to find a test that acts in the same way as Pocklington's result (Lemma 4.4) in order to restrict the form of any possible divisors of n. This is the starting point for the **APRT-CL** test, designed by Adleman, Pomerance and Rumely, and improved by Cohen and Lenstra.

For any n, the integers modulo n form a finite group Z/nZ of order j(n). In particular, if n is prime, then Z/nZ is cyclic of order n-1, the group being generated by any element of order n-1, i.e. any primitive root. We construct numbers S and T such that if p divides S then p-1 divides T and then consider Gauss sums over the finite ring Z/nZ [z _{p},z _{q}], where q is a prime dividing T. Gauss sums involve multiplicative characters c acting on Z/nZ. The upshot is that if we choose c correctly, then we can obtain conditions involving Gauss sums which force restrictions on any divisor of n. The improvements incorporated by Cohen and Lenstra include replacing the Gauss sums by Jacobi sums, which are combinations of two multiplicative characters but over the simpler ring Z/nZ [z _{p}]. With the restrictions in place, we test each possible divisor, and if all fail, then n is prime.

Although it wanders quite deeply into the realms of algebraic number theory, APRT-CL is a practical, deterministic, primality test and there are several implementations of this algorithm available. Of particular note is VFYPR (VeriFY PRimes) program by Tony Forbes, which improves on the original by combining the standard APRT theory with the Pocklington and Morrison results of partial factorisations of N-1 and N+1 to enforce additional restrictions on possible divisors.

A negative feature of the APRT-CL algorithm is that is does not provide a certificate of primality, i.e. there is no way of verifying that n is prime using APRT-CL except by applying the complete test again. This is costly, and of course perpetuates a reasonable doubt that its outcome is totally reliable, for instance, it may very well be the case that the algorithm has been incorrectly implemented, and, with its source code unavailable for inspection, errors will pass undetected. This has naturally led to vigorous testing and checking of implementations, and such fears are generally unfounded. Nevertheless, it is preferable that a general primality test produces a certificate of primality that can be verified, ideally by a distinctly different and open procedure.

A standard generalisation of the use of groups of residues modulo our test input n is to consider replacing these by groups of points on elliptic curves, an approach which had previously been of outstanding benefit to the problem of factorisation, with the introduction of the Elliptic Curve Method of Lenstra. This idea has more recently been used to good effect to produce another practical deterministic primality test. As ever, we need some basic concepts.

Elliptic curves appear as solutions of the elliptic integrals used to calculate the arc length of ellipses, and are basically the set of points that satisfy the equation y^{2} = Ax^{3} + Bx^{2} + Cx + D, with additional restrictions on the values of the coefficients, and are generally considered over the set of complex numbers **C**, since this is the algebraic closure of the reals **R**, but can also be considered over the rationals **Q** or a finite field. When an elliptic curve is viewed as an algebraic function, we can construct a corresponding Riemann surface of genus 1 (i.e. a torus). The importance of elliptic curves lies in the fact that the converse is true, i.e. that every Riemann surface of genus 1 can be associated with a particular elliptic curve.

Simple substitutions of x by a linear function of x, e.g. 2x +1, changes the elliptic curve equation but preserves the set of points that satisfy it - all we are doing is changing the coordinate system. For any elliptic curve, a suitable change of coordinates can convert the equation to the form y^{2} = x^{3} + ax + b. This is known as the **Weierstrass normal form**. For a general equation of this form to be an elliptic curve, we require that there are no repeated roots. This is true if and only if the discriminant 4a^{3} + 27b^{2} is non-zero. By custom, this value is normally replaced by the D = -16.(4a^{3} + 27b^{2}). An elliptic curve over a field K normal form y^{2} = x^{3} + ax + b is usually represented by E_{K}(a,b).

Elliptic curves are isomorphic if they can be equated under a change of coordinates. If two elliptic curves have the same normal form then they are isomorphic. However, it is possible two have two isomorphic curves with different, but obviously related, normal forms. In other words, normal forms are not unique, so that isomorphic curves can have different discriminants. However, the **j-invariant **j = 12^{3}.4a^{3} / (4a^{3} + 27b^{2}) is unique, i.e. distinct normals forms of isomorphic curves have the same j-invariant.

In addition, elliptic curves turn out to have purely algebraic properties and take on the structure of a group when a binary operation and identity element are properly defined. First, let us consider the equation over the normal real x-y plane, where it takes the form of a cubic curve. Hence any straight line not parallel to the y-axis passes through either 1 or 3 points on the curve. Given any two distinct points a and b on the curve, the straight line intersecting these two passes through a third. We define multiplication a*b of points on the curve to be the point of reflection of this third, so that we can introduce the point at infinity I¥ as the identity element. The only difficulty is when b = a, in which case we take the tangent to the point. Although this seems rather contrived, it allows us to consider elliptic curves as abelian groups, which means that we can bring a huge mass of theory to bear. Also, since the algebraic theory can be viewed over an arbitrary field, we can shift back to the complex numbers or the finite fields when appropriate.

I can now give a couple of results that are finite field specific and help towards the construction of a primality test.

Lemma 7.1 (Hasse). If p is a prime, then the number of points on an elliptic curve over the finite field GF(p) is p+1- t, where |t| £ 2Öp.

Lemma 7.2. Let p be a prime and E_{p} an elliptic curve over GF(p). Let q be a prime such that q > p^{1/2} + 2p^{1/4} + 1. If there exists m Î E_{p} / I¥ such that qm = I¥ then p is prime.

Proof: Suppose p is composite, and r £ Öp such that r is a prime that divides p. Then m_{r} Î E_{r} is an element of order q. But q > p^{1/2} + 2p^{1/4} + 1 ³ r + 2Ör + 1 ³ |E_{r}| (this last by Lemma 7.1), a contradiction. Hence result.

The **Goldwasser-Kilian** primality test is based on this result, as follows : If p is a probable prime, we proceed by seeking a and b such that |E_{GF(p)}(a,b)| = 2q for some prime q. The primality of q implies the primality of p by Lemma 7.2. In practice, we only test q to probable-primeness and iterate the production of a sequence of probable primes q_{1} = q, q_{2}, q_{3}, … etc. until we reach a lower bound, q_{n} say, where we can apply a more straightforward primality test such as trial division. The primality of q_{n} causes a cascade that ultimately provides the primality of p. The difficulty is in finding suitable a and b. The Goldwasser-Kilian test selects random elements from Z/pZ and then applies the Schoof algorithm to calculate the number of points on the associated elliptic curve. Once a suitable pair of values is found, we then locate m in E_{p}(a,b) such that qm = I¥ . Each of these stages has a known probability of success, giving an estimate of the expected number of test values that are needed.

The Goldwasser-Kilian test produces a primality certificate consisting, at each iteration, of the particular a, b, q and m used. A validation algorithm using this certificate therefore does not have to perform any random searches and so is much quicker than re-running the original test.

The greatest overhead in the Goldwasser-Kilian test is the repeated calculation of the number of points on the elliptic curve E_{p}(a,b) for random a and b. This is problem bypassed, at the expense of even more complicated theory, in the **Atkin-Morain** test, often referred to as **ECPP** in the literature (Elliptic Curve Primality Proving). Atkin showed that it is possible to construct a specific elliptic curve such that the number of points is predetermined and of approximately the right magnitude. This is done by first finding a value d for which Q[Öd] is an imaginary number field in which p is the norm N_{d}(p) of a particular algebraic integer p . Then he constructs an elliptic curve over GF(p) with N_{d}(1- p) points. The test is again an iterative application of a result similar to Lemma 7.2.

Lemma 7.3. Let p be a prime and E_{p} an elliptic curve over GF(p). Let q be a prime such that q > p^{1/2} + 2p^{1/4} + 1 and r an integer such that q divides r. If there exists m Î E_{p} / I¥ such that rM = I¥ and (r/q)M ¹ I¥ then p is prime.

Unfortunately, it would take a small textbook to provide enough background algebraic number theory for us to go into details about how exactly to go about finding first a value for d and then the specific elliptic curve itself. This involves binary quadratic forms and Hilbert Class field theory, ideas of which I am only vaguely aware. We must content ourselves with knowing that better minds than us have managed to implement ECPP. These include Morain's ECPP, whose first practical implementation gave rise to a number of improvements that could be made. Another implementation of ECPP, called Primo, by Marcel Martin, is currently the most popular general primality test in use. Both of these are freely available via the Internet.

As with the Goldwasser-Kilian test, ECPP produces a primality certificate. In the case of Primo, a validation program Cert_Val, by Jim Fougeron, is available that will read in this certificate and verify its authenticity.