Doing Mod N Via Mod R
Gödel’s Lost Letter and P=NP 2018-03-12
Variations on Montgomery’s trick
Peter Montgomery is a cryptographer at Microsoft. Just recently, Joppe Bos and Arjen Lenstra have edited a book titled Topics in Computational Number Theory Inspired by Peter L. Montgomery. The chapters range over Montgomery’s work on elliptic curves, factoring, evaluating polynomials, computing null-spaces of matrices over finite fields, and FFT extensions for algebraic groups. Bos and Lenstra say in their intro that most of this was “inspired by integer factorization, Peter’s main research interest since high school.” Factoring, always factoring…
Today we discuss one of Montgomery’s simpler but most influential ideas going back to a 1985 paper: how to compute mod $N$ when you can only do operations mod $R = 2^k$.
We’ve previously thought in terms of the “make” and “break” sides of cryptography. Montgomery has certainly done a lot of work on the latter. He is even said to have demonstrated in the 1980s that the Data Encryption Standard as adopted in 1977 was vulnerable to attacks even on IBM PCs.
However, we realize that the “make” side has three parts, two of which the “breakers” often help directly. The first part is the design of new cryptosystems. The only help from breakers there is knowing what not to design. The second part, however, is their implementation. Any mathematical ideas used to make attacks faster might also help speed methods that are safer but were previously clunky. The third part is defense, specifically against timing attacks. Speedier execution is often smoother execution and hence less vulnerable to inference by timing.
The Montgomery Reduction
The simple function conceived by Montgomery solves a problem whose most common case is as follows:

Compute $a^x$ modulo an odd number $N$ using only addition, multiplication, and operations modulo a power of $2$, that is, modulo $R = 2^k$. Pre-computed quantities such as $R^2 \bmod N$ are also allowed.
Of course this is used for implementations of the RSA public-key cryptosystem. With binary notation, mod $R = 2^k$ means ignoring all but the bottom $k$ bits, and division by $R$ is just a right shift by $k$ bits. Montgomery multiplication has also been used to implement Peter Shor’s famous quantum factoring algorithm here and here. I and my student Chaowen Guan, who contributed some of the text and equations below, are using it to emulate Shor’s algorithm in a model where we translate quantum circuits into Boolean formulas in order to apply heuristic SAT solvers. We have a paper with Amlan Chakrabarti in the Springer LNCS journal Transactions on Computational Science, which became the destination for this refinement of my old “cookbook” draft with Amlan which I described some years ago.
The basic trick is to compute the following function. Given $N$ and $R$ relatively prime, we can find integers $R', N'$ such that

$$R R' - N N' = 1.$$

Then define the Montgomery reduction of any natural number $x$ by

$$MR(x) \;=\; \frac{x + N\,(x N' \bmod R)}{R}. \qquad (1)$$
We have omitted the last line of the routine which Montgomery called “REDC” in his paper. This last line takes the output of $MR(x)$ as we have defined it and tests $MR(x) < N$; upon failure it outputs $MR(x) - N$ rather than $MR(x)$. We, however, want to avoid even this dependence on $N$. From several papers on avoiding this test, all surveyed in the new book's chapter on “Montgomery Arithmetic From a Software Perspective” by Bos and Montgomery himself, we follow this 2009 paper by Qiong Pu and Xiuying Zhao and section 2.2 of this earlier paper by Colin Walter and Susan Thompson (note the latter’s attention to timing attacks).
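To make the definition concrete, here is a minimal Python sketch of $MR$ and of the full REDC with its conditional subtraction; the helper names and the small sample modulus are our illustrative choices, not from the post.

# A minimal sketch of Montgomery reduction, following the definition above.
# The sample parameters below are illustrative choices, not from the post.

def montgomery_setup(N, k):
    """Return R = 2^k together with R', N' satisfying R*R' - N*N' = 1."""
    R = 1 << k
    assert N % 2 == 1 and N < R
    R_prime = pow(R, -1, N)            # R' = R^{-1} mod N (Python 3.8+)
    N_prime = (R * R_prime - 1) // N   # then N' = (R*R' - 1)/N is an integer
    assert R * R_prime - N * N_prime == 1
    return R, R_prime, N_prime

def MR(x, N, R, N_prime):
    """Montgomery reduction: (x + N*(x*N' mod R)) / R, always an integer."""
    return (x + N * ((x * N_prime) % R)) // R

def REDC(x, N, R, N_prime):
    """Montgomery's REDC: MR plus the final conditional subtraction."""
    y = MR(x, N, R, N_prime)
    return y - N if y >= N else y

# Quick check with small illustrative numbers:
N = 97
R, R_prime, N_prime = montgomery_setup(N, k=8)          # R = 256
x = 12345
assert MR(x, N, R, N_prime) % N == (x * R_prime) % N    # MR(x) == x * R^{-1} (mod N)

Note that the only operations on $x$ here are a multiplication, a mask by $R - 1$ (the % R), an addition, and a shift by $k$ bits (the // R), which is the whole point.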
First we note that the value $MR(x)$ is always a natural number. Let $q = (x N')\,//\,R$, where $//$ means integer division as in Python. Then $x N' \bmod R = x N' - qR$. We get:

$$MR(x) \;=\; \frac{x + N(x N' - qR)}{R} \;=\; \frac{x(1 + N N') - qNR}{R} \;=\; \frac{x R R' - qNR}{R} \;=\; x R' - qN.$$

So $MR(x)$ is an integer, and by (1) it is positive except that $MR(0) = 0$. Hence we can freely compose and iterate $MR$. What kind of function is it?
Some Properties of MR
By our last calculation, $MR$ embodies integer division by $R$ both in the implicit use of $q = (x N')\,//\,R$ and the explicit product with $R'$, which acts as $R^{-1}$ modulo $N$. It does not, however, compute $x\,//\,R$. Its actions on numbers decomposed via $R$ and via $N$ go as follows:

$$MR(aR + b) \;=\; a + MR(b) \qquad (0 \le b < R),$$
$$MR(cN + d) \;\equiv\; MR(d) \;\equiv\; dR' \pmod{N} \qquad (0 \le d < N).$$
Thus $MR(x)$ increases by one with $a$, that is, only when subtracting $b$ from $x$ moves up to the next multiple of $R$, while the term $MR(b)$ stays below $N + 1$. In that sense $MR$ imitates division by $R$. Moreover, when $x$ is relatively prime to $N$ then so is $MR(x)$, because $R \cdot MR(x) = x + N(x N' \bmod R) \equiv x \pmod{N}$, so if $MR(x)$ shared a proper divisor with $N$ then so would $x$. Thus for such $x$, $MR(x)$ is relatively prime to $N$ and in particular cannot be a multiple of $N$.
It follows with $q = (x N')\,//\,R$ as above that $MR(x) = x R' - qN$ and so $MR(x) \equiv x R' \pmod{N}$. Now presume $x = yR$ and let $z = MR(x)$. Then

$$z \;\equiv\; x R' \;=\; y R R' \;\equiv\; y \pmod{N},$$

using $R R' = 1 + N N' \equiv 1 \pmod{N}$. Thus with general $x$, $MR(x)$ has the same congruence modulo $N$ as $x R'$, while with $x = yR$, $MR(yR) \equiv y \pmod{N}$.
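A short Python check of these properties on a small illustrative modulus (again our choice of parameters):

# A quick numeric check of the properties above, with illustrative parameters.
N, k = 97, 8                      # small odd modulus and R = 2^k (our choices)
R = 1 << k
R_prime = pow(R, -1, N)           # R' = R^{-1} mod N
N_prime = (R * R_prime - 1) // N  # so that R*R' - N*N' = 1

def MR(x):
    return (x + N * ((x * N_prime) % R)) // R

for x in range(5 * R):
    a, b = divmod(x, R)
    d = x % N
    assert MR(x) == a + MR(b)                 # MR(aR + b) = a + MR(b)
    assert MR(x) % N == (d * R_prime) % N     # MR(cN + d) == d*R' (mod N)
    assert MR(x * R) % N == x % N             # MR(xR) == x (mod N)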
Montgomery Multiplication and Inequalities
Most in particular, for any $x$ and $y$, if we define $\bar{x} = xR \bmod N$ and $\bar{y} = yR \bmod N$, then

$$MR(\bar{x}\bar{y}) \;\equiv\; \bar{x}\bar{y}R' \;\equiv\; xyR \pmod{N}.$$

Thus defining $u \circledast v = MR(uv)$ gives a commutative operator that when restricted to values of the form $\bar{x} = xR \bmod N$ acts like multiplication. Moreover, if we have $u \equiv \bar{x} \pmod{N}$ and $v \equiv \bar{y} \pmod{N}$, then with $z = \bar{x} \circledast \bar{y}$ and $w = u \circledast v$ we have:

$$w \;\equiv\; uvR' \;\equiv\; \bar{x}\bar{y}R' \;\equiv\; z \pmod{N}.$$

Thus the multiplicative behavior of $\circledast$ carries through mod $N$ on multiplying by $R$ modulo $N$. The $\circledast$ operator is Montgomery multiplication proper. The idea is that one can do whole iterated products $x_1 x_2 \cdots x_n$ with one division by $R$ and one reduction mod $R$ per $\circledast$ and have only the single reduction mod $N$ at the end.
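Here is a small Python sketch of the $\circledast$ operator together with the conversion into Montgomery form via the pre-computed $R^2 \bmod N$; the helper names and parameters are our illustrative assumptions.

# Sketch of Montgomery multiplication (*) and conversion to Montgomery form.
# Parameter choices and helper names are illustrative.
N, k = 97, 9                       # R = 2^k = 512 > 4N, for the later bounds
R = 1 << k
R_prime = pow(R, -1, N)
N_prime = (R * R_prime - 1) // N
r2 = (R * R) % N                   # pre-computed R^2 mod N

def MR(x):
    return (x + N * ((x * N_prime) % R)) // R

def mont_mul(u, v):                # u (*) v = MR(u*v), congruent to u*v*R^{-1} mod N
    return MR(u * v)

def to_mont(x):                    # x-bar = MR(x * (R^2 mod N)), congruent to x*R mod N
    return mont_mul(x, r2)

x, y = 29, 53
xbar, ybar = to_mont(x), to_mont(y)
zbar = mont_mul(xbar, ybar)        # congruent to (x*y)*R: the product stays in Montgomery form
assert MR(zbar) % N == (x * y) % N # one more MR strips the extra factor of R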
But again we want to avoid even that one use. But there are cases where $MR(x) < N$ fails; indeed, the ratio $MR(x)/N$ for $x$ and $N$ with $x < RN$ can come arbitrarily close to $2$. In search of a general result, we are helped by the following inequalities:

$$MR(x) \;=\; \frac{x + N(x N' \bmod R)}{R} \;\le\; \frac{x + N(R-1)}{R} \;=\; \frac{x}{R} + N - \frac{N}{R} \;<\; \frac{x}{R} + N.$$

The last line follows because $N \ge 1$ prevents the term $N/R$ from vanishing. Finally note that if $x < RN$ then

$$MR(x) \;<\; \frac{RN}{R} + N \;=\; 2N.$$

To use this iteratively, one wants the output to stay below $2N$, and if it does not, to return the output minus $N$ instead. When the threshold is a power of $2$ such as $R/2$, the comparison is simple to do.
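A brute-force Python check of the $2N$ bound, and of the fact that $MR(x) < N$ can fail, over one small illustrative setup:

# Exhaustive check that x < R*N implies MR(x) < 2N, for one small setup.
N, k = 97, 8
R = 1 << k
R_prime = pow(R, -1, N)
N_prime = (R * R_prime - 1) // N

def MR(x):
    return (x + N * ((x * N_prime) % R)) // R

assert all(MR(x) < 2 * N for x in range(R * N))
# The bound is nearly tight: some outputs land in [N, 2N), so MR(x) < N can fail.
assert any(MR(x) >= N for x in range(R * N))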
Iterated Multiplication
Suppose we want to multiply $x_1, x_2, \dots, x_n$ modulo $N$. We may suppose that each $x_i$ already satisfies $x_i < N$. The problem with transforming each $x_i$ to $x_i R$, besides the extra multiplication for each $i$, is going to magnitudes as high as $NR$. Transforming to $x_i R \bmod N$ would defeat our purpose because while we can efficiently pre-compute $r_2 = R^2 \bmod N$ giving $\bar{x}_i = MR(x_i r_2) \equiv x_i R \pmod{N}$, we cannot suppose the reductions mod $N$ for the $x_i R$ to be pre-computed.
So we plunge ahead and do:

$$P \;=\; x_1 \circledast x_2 \circledast \cdots \circledast x_n \;=\; MR(\,MR(\cdots MR(MR(x_1 x_2)\,x_3)\cdots)\,x_n\,).$$

First, how much can this grow? Put $P_2 = MR(x_1 x_2)$ and $P_{i+1} = MR(P_i x_{i+1})$. Since $P_i$ can be greater than $N$, the precondition $P_i x_{i+1} < RN$ for the $2N$ bound is not automatic. But if $P_i < 2N$ then $P_i x_{i+1} < 2N^2 < RN$ once $R > 2N$, so that we preserve the bound

$$P_{i+1} \;<\; \frac{P_i x_{i+1}}{R} + N \;<\; 2N.$$

Note that since $R$ is a power of 2 it may need to approach $4N$ to give $R > 2N$, but that does not affect the $2N$ upper bound. The issue now is that $P$ includes $n - 1$ extra factors of $R'$, that is,

$$P \;\equiv\; x_1 x_2 \cdots x_n\,(R')^{\,n-1} \pmod{N}.$$

Hence $x_1 x_2 \cdots x_n \equiv P\,R^{\,n-1} \pmod{N}$. If we can compute a number $c$ such that $c \equiv R^{\,n} \pmod{N}$ and $c < 2N$, then a final $MR(Pc)$ (legal since $Pc < 4N^2 < RN$ when $R > 4N$) will give the final targeted value $z$ such that

$$z \;\equiv\; P c R' \;\equiv\; x_1 x_2 \cdots x_n \pmod{N} \quad\text{and}\quad z < 2N.$$
Thus, modulo the computation of $c$, we have reduced iterated multiplication to exponentiation. The extra $O(\log n)$ multiplications—or rather, $O(\log n)$ more applications of $MR$—beat computing $x_i R \bmod N$ or $MR(x_i r_2)$ for each $i$.
Montgomery Exponentiation
When it comes to implementing fast modular exponentiation, we need to square a running product in the sense of computing $A \circledast A = MR(A^2)$ or something related. To maintain a constant bound $cN$ given $A < cN$, we need $MR(A^2) < cN$ and $A^2 < RN$ so that:

$$MR(A^2) \;<\; \frac{A^2}{R} + N \;<\; \frac{c^2 N^2}{R} + N,$$

so it suffices to have

$$\frac{c^2 N^2}{R} + N \;\le\; cN.$$

Presupposing $R = kN$ for some $k$, we can substitute this to require

$$\frac{c^2}{k} + 1 \;\le\; c.$$

Ignoring lower-order terms gives us $c^2 - kc + k \le 0$, so $c$ must lie between the roots of this quadratic. The discriminant is $k^2 - 4k$, so we need $k \ge 4$, which yields $c = 2$ to realize the same $2N$ upper bound. Since $R$ is a power of 2, $R$ may need to be chosen nearly $8N$ to use this guarantee. In hardware contexts this is annoying since $R$ may thus need one more machine word than $N$ (or else $N$ must be restricted to a quarter of the possible size), but when working with individual bits—or qubits—this extra overhead from squaring is no big deal.
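A quick numeric check of this guarantee for one illustrative modulus, taking $R$ as the least power of $2$ exceeding $4N$:

# Check that A < 2N implies MR(A*A) < 2N once R >= 4N (the case c = 2 above).
N = 97
R = 1 << (4 * N).bit_length()      # smallest power of 2 exceeding 4N (may approach 8N)
R_prime = pow(R, -1, N)
N_prime = (R * R_prime - 1) // N

def MR(x):
    return (x + N * ((x * N_prime) % R)) // R

assert all(MR(A * A) < 2 * N for A in range(2 * N))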
It remains to show how to imitate standard algorithms for fast exponentiation. Here is one, using (*) to mean the $\circledast$ operation $u \circledast v = MR(uv)$. Recall that $r = R \bmod N$ and $r_2 = R^2 \bmod N$ can be pre-computed.
fastexp(a,x):
    A = a
    X = x
    Y = 1
    while X > 0:
        if X is even:
            A = (A * A)
            X = X // 2
        else:
            Y = (A * Y)
            X = X - 1
    return Y

montyfastexp(a,x):       # needs R > 4N
    A = a (*) r_2        # cong. to ar
    X = x
    Y = r
    while X > 0:
        if X is even:
            A = A (*) A
            X = X // 2
        else:
            Y = A (*) Y
            X = X - 1
    return Y (*) 1
By induction, A and Y always have an “extra factor of $R$” modulo $N$ until the return Y (*) 1 strips it away. They always have magnitude below $2N$, so that the final Y (*) 1 returns a value at most $N$. This value equals $a^x \bmod N$ whenever $a$ is relatively prime to $N$, as in RSA.
An example of why requiring only $R > 2N$ can fail arises already for small $N$, $R$, $a$, and $x$: the desired value is $a^x \bmod N$, but the right-hand code can output a larger number, one which has the right congruence but is not below $N$; there is none with $R > 4N$. Tests as in the aforementioned papers, or returning $Y - N$ when $Y \ge N$ as the last line of our code, make it work with smaller $R$.
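To tie this together, here is a runnable Python version of the right-hand routine (our transcription), along with a sweep that confirms the $R > 4N$ guarantee on a small modulus and merely reports whether a smaller $R$ produces any unreduced outputs.

# Runnable transcription of montyfastexp, plus a small verification sweep.
# Parameters and the search range are illustrative choices.

def make_mr(N, R):
    R_prime = pow(R, -1, N)
    N_prime = (R * R_prime - 1) // N
    return lambda x: (x + N * ((x * N_prime) % R)) // R

def montyfastexp(a, x, N, R):
    MR = make_mr(N, R)
    r, r2 = R % N, (R * R) % N
    A, X, Y = MR(a * r2), x, r
    while X > 0:
        if X % 2 == 0:
            A, X = MR(A * A), X // 2
        else:
            Y, X = MR(A * Y), X - 1
    return MR(Y * 1)

N = 97
# With R > 4N the routine always returns the reduced residue a^x mod N:
R = 1 << 9                         # 512 > 4N = 388
assert all(montyfastexp(a, x, N, R) == pow(a, x, N)
           for a in range(1, N) for x in range(1, 60))
# With a smaller R (here 2N < R <= 4N) every output is still congruent to a^x,
# but the same sweep may turn up values that are not reduced below N.
R = 1 << 8                         # 256, between 2N and 4N
bad = sum(montyfastexp(a, x, N, R) != pow(a, x, N)
          for a in range(1, N) for x in range(1, 60))
print("unreduced outputs found with R = 256:", bad)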
Open Problems
Our little “scholium” has turned up some interesting congruence properties. Does the above suggest any further ideas and savings?
[removed erroneous “Lemma 1”]