Quadratic Sieve

From testwiki
Revision as of 10:00, 24 October 2022 by imported>ThaniosAkro (Review)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

The quadratic sieve is a general purpose factoring method invented by Carl Pomerance in 1981. Invention of quadratic sieve greatly advanced the science of factorization of integers, particularly those considered difficult, those large integers containing exactly two prime factors of approximately the same precision.

Running time of quadratic sieve is dependent on size of integer to be factored, and the quadratic sieve has accomplished some impressive records in the field of factorization of integers.

This page presents some simple examples of the factorization of small integers. Although using the quadratic sieve on small integers is like driving a thumb tack with a sledge hammer, the examples illustrate how the quadratic sieve is implemented.

Introduction

The quadratic sieve attempts to build quadratic congruences of form x2y2(modN) where N is number to be factored.


Many congruences are created:

x12y1(modN)

x22y2(modN)

x32y3(modN)

x100002y10000(modN)


Suitable congruences are combined to produce congruent squares:

X2Y2(modN) where

X2=x22x32x52x99992

Y2=y2y3y5y9999


Then N(X2Y2) and, with a little luck,

factor1=igcd(X+Y,N) and

factor2=igcd(XY,N) where igcd is function integer greatest common divisor.

Implementation

This exercise calculates prime factors of N=5137851827.

Build Factor base

Let the Factor Base contain 40 small primes, the first 40 for which N is a quadratic residue.

[2, 11, 13, 17, 19, 23, 37, 41, 43, 59, 73, 103, 107, 113, 127, 131, 137, 149, 163, 179, 181, 
191, 197, 199, 211, 223, 233, 241, 251, 271, 281, 293, 307, 311, 317, 347, 367, 373, 379, 383]

Dictionary with logarithms of primes

Create a table or dictionary containing primes and logarithms of primes. Logarithms to base 2 are calculated. The reason for choosing base 2 will become apparent later.

x log2(x)=ln(x)ln(2)
    2   1.0                              
  11   3.4594316186372973
  13   3.700439718141092  
  17   4.087462841250339  
  19   4.247927513443585  
  23   4.523561956057013  
  37   5.20945336562895    
  41   5.357552004618084  
  43   5.426264754702098  
  59   5.882643049361842  
  73   6.189824558880018  
103   6.6865005271832185
107   6.741466986401147  
113   6.820178962415188  
127   6.9886846867721655
131   7.03342300153745    
137   7.098032082960526  
149   7.219168520462161  
163   7.348728154231077  
179   7.483815777264256  
181   7.499845887083206  
191   7.577428828035749  
197   7.622051819456376  
199   7.636624620543649  
211   7.721099188707185  
223   7.800899899920305  
233   7.864186144654281  
241   7.912889336229962  
251   7.971543553950772  
271   8.08214904135387    
281   8.134426320220927  
293   8.194756854422248  
307   8.262094845370179  
311   8.280770770130603  
317   8.308339030139408  
347   8.43879185257826    
367   8.519636252843213  
373   8.543031820255237  
379   8.566054038171092  
383   8.581200581924957  

Initialize sieve

In a regular sieve fine material falls through sieve and coarse material is retained. In this sieve coarse material falls through sieve and fine material is retained.

The expression "coarse material" refers to integers containing a small number of large primes, the great majority of integers.

The expression "fine material" refers to integers containing a large number of small primes, primes that are members of this factor base, or "smooth" according to this factor base.

The base or root of operations is the smallest value of x for which y=x2N is a positive number.

root=x=71679, y=27214.

# python code:

root = 71679

sieve = dict()

sieve[root] = [2]

for prime in factorBase[1:] :
    count = 0
    for x in range (root, root+2*(prime+2)) :
        y = x**2 - N
        if (y % prime) == 0 :
            if x in sieve : sieve[x] += [prime]
            else :  sieve[x] = [prime]
            count += 1
        if count >= 2 : break
    if count != 2 : print ('Internal error 1.')

L1 = sorted([p for p in sieve])

print()
print ('Initialized sieve contains', len(sieve),
       'keys in range from', L1[0], 'through', L1[-1],
       'or', L1[-1]-L1[0]+1, 'possibilities.')
Initialized sieve contains 65 keys in range from 71679 through 71954 or 276 possibilities.
for key in L1[::-1] :
    print (key, '|', sieve[key])
x Factors of  y=x2N
  71954 [379]
  71946 [293]
  71939 [311]
  71917 [293]
  71915 [373]
  71906 [281]
  71876 [211,  307]
  71864 [317]
  71852 [191]
  71850 [241]
  71849 [223,  383]
  71847 [179]
  71841 [347]
  71839 [251]
  71830 [271]
  71824 [163]
  71820 [149]
  71817 [347]
  71816 [149]
  71815 [211]
  71813 [181]
  71806 [367]
  71804 [131]
  71800 [271,  307]
  71786 [241]
  71780 [191]
  71779 [163]
  71777 [113]
  71776 [383]
  71773 [199]
  71772 [131]
  71771 [233]
  71763 [223]
  71760 [107,  127]
  71758 [103]
  71757 [233]
  71751 [73]
  71750 [127]
  71743 [311]
  71737 [317]
  71733 [113,  137,  251]
  71727 [107]
  71721 [43,  103]
  71720 [181]
  71717 [59]
  71716 [41]
  71715 [37]
  71713 [197]
  71712 [59]
  71711 [179]
  71706 [137,  199]
  71703 [197]
  71702 [41]
  71697 [37]
  71694 [73]
  71692 [23]
  71691 [367]
  71690 [17,  23,  373]
  71688 [17,  19]
  71687 [379]
  71686 [19]
  71685 [281]
  71684 [11,  13,  43]
  71680 [13]
  71679 [2,  11]

Activate sieve

While information in table above is accurate, it is incomplete. Processing of sieve attempts to find values for which information is complete.

Processing locations in sequence

Step 1

The code examines location 71679. Product of factors 2,11 = 22, not enough to provide complete factorization of y=x2N.

Location 71679+2 or 71681 is initialized with [2] and location 71679+11 or 71690 is appended by [11]. Location 71679 is deleted.

x Factors of  y=x2N
  71697 [37]
  71694 [73]
  71692 [23]
  71691 [367]
  71690 [17,  23,  373,  11]  #  Factor  11  appended.
  71688 [17,  19]
  71687 [379]
  71686 [19]
  71685 [281]
  71684 [11,  13,  43]
  71681 [2]  #  Location  initialized.
  71680 [13]
Step 2

The code examines location 71680. Factor 13 is not enough to provide complete factorization of y=x2N.

Location 71680+13 or 71693 is initialized with [13]. Location 71680 is deleted.

x Factors of  y=x2N
  71697 [37]
  71694 [73]
  71693 [13]  #  Location  initialized.
  71692 [23]
  71691 [367]
  71690 [17,  23,  373,  11]
  71688 [17,  19]
  71687 [379]
  71686 [19]
  71685 [281]
  71684 [11,  13,  43]
  71681 [2]
Step 3

The code examines location 71681. Factor 2 is not enough to provide complete factorization of y=x2N.

Location 71681+2 or 71683 is initialized with [2]. Location 71681 is deleted.

x Factors of  y=x2N
  71697 [37]
  71694 [73]
  71693 [13]
  71692 [23]
  71691 [367]
  71690 [17,  23,  373,  11]
  71688 [17,  19]
  71687 [379]
  71686 [19]
  71685 [281]
  71684 [11,  13,  43]
  71683 [2]  #  Location initialized.
Step 4

Location 71682 does not exist in sieve. The code examines location 71683. Factor 2 is not enough to provide complete factorization of y=x2N.

Location 71683+2 or 71685 is appended by [2]. Location 71683 is deleted.

x Factors of  y=x2N
  71697 [37]
  71694 [73]
  71693 [13]
  71692 [23]
  71691 [367]
  71690 [17,  23,  373,  11]
  71688 [17,  19]
  71687 [379]
  71686 [19]
  71685 [281,  2]  #  Factor 2 appended.
  71684 [11,  13,  43]

Processing continues in this way until location 71690 is examined. Factors [17, 23, 373, 11] are complete factorization of y=x2N. Values [ 71690 , [17, 23, 373, 11] ] are appended to array "smooth."


Sieving continues until array "smooth" contains 45 members, at which time sieving is complete.

To improve speed

There are very few values of y that are smooth. Therefore code is designed to discard unsuitable values of x quickly, and to make other decisions quickly.

Decisions to discard unsuitable candidates are:

  • if x does not exist in sieve.
  • if there is only 1 prime for location x. Complete factorization of y requires at least 2 primes.

Code intended to reduce processing time: Template:RoundBoxTop y=x2N

y=(root+a)2N

y=root2+2aroot+a2N

y=root2N+2aroot+a2

y=rootSquaredMinusN+a(twoRoot + a)

Multiplication in last statement above is of smaller numbers than calculation of y=x2N. As size of N increases, last statement above becomes more efficient. Template:RoundBoxBottom

Template:RoundBoxTop Divisions and multiplications are time consuming. Instead of multiplying primes or dividing y=x2N to determine smoothness, logarithms of primes are added and compared to log(y). Calculation of ln(y) or log(y) to base 10 is time consuming. Because y is type int, a good approximation of log(y) to base 2 can be calculated quickly. As it happens, in this application "close enough" is good enough.


log2(37) for example:

As binary, 37=100101

Length of 100101 is 6. Therefore log2(37)=5.


log2(53) for example:

As binary, 53=110101

Length of 110101 is 6. Therefore log2(53)=5.

However, second most significant bit is set. Therefore log2(53)=5+1=6.

x log2(x)
1 0
2 1
6>x3 2
12>x6 3
24>x12 4
48>x24 5
96>x48 6
(3<<(k+1))>x(3<<k) k+2

Template:RoundBoxBottom

Sieving complete

Python code that accomplishes the sieving is:

# python code.

limit = 10000
rootSquaredMinusN = root**2 - N
twoRoot = root << 1
notin = count1 = count = 0
smooth = []
numberOfSmoothDesired = 45
for x in range (root, root+limit) :
    if x & 0xFF == 0 :
        '''
This code is added to provide information about sieve
during sieving.
'''
        L1 = sorted([ p for p in sieve ])
        range_ = L1[-1] - L1[0] + 1
        print ('For x =',x, 'Range of sieve =',range_, 'number of empty locations =', range_ - len(L1) )
    if x not in sieve :
        notin += 1
        continue

    primes = sieve[x]
    for prime in primes :
        v = x + prime
        if v in sieve : sieve[v] += [prime]
        else : sieve[v] = [prime]
    del (sieve[x])

    if len(primes) == 1 :
        count1 += 1 ; continue                 
    a = x - root
    y = rootSquaredMinusN + a*(twoRoot + a)

    sumOfLogs = log[primes[0]] + log[primes[1]]
    for prime in primes[2:] : sumOfLogs += log[prime]

    log_y = logBase2(y)
    if sumOfLogs >= 0.95*log_y :
        count += 1
        product = primes[0] * primes[1]
        for prime in primes[2:] : product *= prime
        if y == product :
            smooth += [[x,primes]]
            if len(smooth) >= numberOfSmoothDesired : break
            
print ('Range of values of x =', x-root+1)
print ('Number of empty locations in sieve =', notin)
print ('Number of locations with only 1 prime =', count1)
print ('Number of locations tested =', x-root+1-notin-count1)
print ('Number of locations chosen for close examination =', count)
print ('Number of values of x retained after close examination =', len(smooth))
x Range of sieve Number of empty locations
71680   275   210
71936   353   288
72192   350   287
72448   377   312
72704   331   266
72960   349   286
73216   366   305
73472   378   314
73728   347   276
73984   356   287
74240   368   306
74496   345   280
74752   358   292
75008   358   293
75264   342   278
75520   336   272

Thousands of values of x are tested. At any one time the range of values of x in sieve is <400 and sieve is mostly empty.

Range of values of x = 3922
Number of empty locations in sieve = 585
Number of locations with only 1 prime = 1331
Number of locations tested = 2006
Number of locations chosen for close examination = 45
Number of values of x retained after close examination = 45

Array of smooth values

Array "smooth" contains 45 members:

[71690, [17, 23, 373, 11]]
[71706, [137, 199, 13, 11]]
[71733, [113, 137, 251, 2]]
[71771, [233, 59, 37, 13, 2]]
[71800, [271, 307, 19, 11]]
[71824, [163, 103, 73, 17]]
[71849, [223, 383, 13, 11, 2]]
[71876, [211, 307, 23, 19]]
[72030, [103, 41, 37, 19, 17]]
[72066, [379, 131, 59, 19]]
[72071, [271, 59, 43, 41, 2]]
[72237, [233, 211, 43, 19, 2]]
[72268, [241, 163, 127, 17]]
[72372, [271, 43, 41, 19, 11]]
[72425, [367, 191, 59, 13, 2]]
[72509, [241, 211, 107, 11, 2]]
[72659, [211, 41, 37, 17, 13, 2]]
[72750, [241, 113, 23, 19, 13]]
[72809, [373, 281, 41, 19, 2]]
[72863, [149, 113, 23, 17, 13, 2]]
[73246, [163, 113, 59, 19, 11]]
[73279, [197, 179, 23, 13, 11, 2]]
[73322, [317, 179, 19, 17, 13]]
[73411, [307, 293, 127, 11, 2]]
[73472, [113, 107, 103, 19, 11]]
[73576, [347, 73, 43, 23, 11]]
[73868, [233, 131, 73, 13, 11]]
[73898, [163, 137, 37, 23, 17]]
[73900, [131, 107, 59, 23, 17]]
[73968, [293, 271, 19, 17, 13]]
[74176, [73, 41, 37, 23, 13, 11]]
[74427, [311, 127, 23, 17, 13, 2]]
[74435, [281, 181, 107, 37, 2]]
[74544, [127, 59, 23, 17, 13, 11]]
[74583, [293, 137, 37, 13, 11, 2]]
[74671, [127, 113, 73, 19, 11, 2]]
[74890, [199, 181, 179, 73]]
[75052, [271, 197, 127, 73]]
[75247, [251, 163, 149, 43, 2]]
[75252, [233, 211, 181, 59]]
[75365, [379, 163, 107, 41, 2]]
[75433, [293, 181, 127, 41, 2]]
[75462, [293, 113, 43, 23, 17]]
[75554, [223, 199, 43, 23, 13]]
[75600, [191, 37, 23, 19, 17, 11]]

If you look closely at contents of array "smooth," you notice that primes 383,367,317,347,311 are used exactly 1 time for each prime. Therefore, locations 71849,72425,73322,73576,74427 are not usable.

This section removes members containing these values of x from array "smooth." Unfortunately, the removal of these values of x causes other members to become unusable. The process continues until array "smooth" is ready, meaning that each prime in array "smooth" appears at least 2 times.

Removing unusable primes from smooth:
unusable = [383, 367, 317, 347, 311]
unusable = [223, 191]
len(smooth) = 38

The final version of array "smooth" contains 38 members:

[71690, [17, 23, 373, 11]]
[71706, [137, 199, 13, 11]]
[71733, [113, 137, 251, 2]]
[71771, [233, 59, 37, 13, 2]]
[71800, [271, 307, 19, 11]]
[71824, [163, 103, 73, 17]]
[71876, [211, 307, 23, 19]]
[72030, [103, 41, 37, 19, 17]]
[72066, [379, 131, 59, 19]]
[72071, [271, 59, 43, 41, 2]]
[72237, [233, 211, 43, 19, 2]]
[72268, [241, 163, 127, 17]]
[72372, [271, 43, 41, 19, 11]]
[72509, [241, 211, 107, 11, 2]]
[72659, [211, 41, 37, 17, 13, 2]]
[72750, [241, 113, 23, 19, 13]]
[72809, [373, 281, 41, 19, 2]]
[72863, [149, 113, 23, 17, 13, 2]]
[73246, [163, 113, 59, 19, 11]]
[73279, [197, 179, 23, 13, 11, 2]]
[73411, [307, 293, 127, 11, 2]]
[73472, [113, 107, 103, 19, 11]]
[73868, [233, 131, 73, 13, 11]]
[73898, [163, 137, 37, 23, 17]]
[73900, [131, 107, 59, 23, 17]]
[73968, [293, 271, 19, 17, 13]]
[74176, [73, 41, 37, 23, 13, 11]]
[74435, [281, 181, 107, 37, 2]]
[74544, [127, 59, 23, 17, 13, 11]]
[74583, [293, 137, 37, 13, 11, 2]]
[74671, [127, 113, 73, 19, 11, 2]]
[74890, [199, 181, 179, 73]]
[75052, [271, 197, 127, 73]]
[75247, [251, 163, 149, 43, 2]]
[75252, [233, 211, 181, 59]]
[75365, [379, 163, 107, 41, 2]]
[75433, [293, 181, 127, 41, 2]]
[75462, [293, 113, 43, 23, 17]]

The Matrix

Prime factors of y = x^2 - N

Assign a unique bit to each prime number in factor base:

Factor Unique bit assigned.
      2 0000000000000000000000000000000000000001
    11 0000000000000000000000000000000000000010
    13 0000000000000000000000000000000000000100
    17 0000000000000000000000000000000000001000
    19 0000000000000000000000000000000000010000
    23 0000000000000000000000000000000000100000
  317 0000010000000000000000000000000000000000
  347 0000100000000000000000000000000000000000
  367 0001000000000000000000000000000000000000
  373 0010000000000000000000000000000000000000
  379 0100000000000000000000000000000000000000
  383 1000000000000000000000000000000000000000

Values of x in smooth array

Assign a unique bit to each value of x in smooth array:

Value of  x Unique bit assigned.
  71690 00000000000000000000000000000000000001
  71706 00000000000000000000000000000000000010
  71733 00000000000000000000000000000000000100
  71771 00000000000000000000000000000000001000
  71800 00000000000000000000000000000000010000
  71824 00000000000000000000000000000000100000
  75052 00000100000000000000000000000000000000
  75247 00001000000000000000000000000000000000
  75252 00010000000000000000000000000000000000
  75365 00100000000000000000000000000000000000
  75433 01000000000000000000000000000000000000
  75462 10000000000000000000000000000000000000

Create Matrix

The matrix contains patterns representing values of x and patterns representing the corresponding prime factors of y=x2N.

Pattern representing values of  x Pattern representing factors of  y.
  00000000000000000000000000000000000001 0010000000000000000000000000000000101010
  00000000000000000000000000000000000010 0000000000000000100000010000000000000110
  00000000000000000000000000000000000100 0000000000010000000000010010000000000001
  00000000000000000000000000000000001000 0000000000000100000000000000001001000101
  00000000000000000000000000000000010000 0000000100100000000000000000000000010010
  00000000000000000000000000000000100000 0000000000000000000001000000110000001000
  00000000000000000000000000000001000000 0000000100000001000000000000000000110000
  00000000000000000000000000000010000000 0000000000000000000000000000100011011000
  00000000000000000000000000000100000000 0100000000000000000000001000001000010000
  00000000000000000000000000001000000000 0000000000100000000000000000001110000001
  00000000000000000000000000010000000000 0000000000000101000000000000000100010001
  00000000000000000000000000100000000000 0000000000001000000001000100000000001000
  00000000000000000000000001000000000000 0000000000100000000000000000000110010010
  00000000000000000000000010000000000000 0000000000001001000000000001000000000011
  00000000000000000000000100000000000000 0000000000000001000000000000000011001101
  00000000000000000000001000000000000000 0000000000001000000000000010000000110100
  00000000000000000000010000000000000000 0010000001000000000000000000000010010001
  00000000000000000000100000000000000000 0000000000000000000000100010000000101101
  00000000000000000001000000000000000000 0000000000000000000001000010001000010010
  00000000000000000010000000000000000000 0000000000000000010010000000000000100111
  00000000000000000100000000000000000000 0000000110000000000000000100000000000011
  00000000000000001000000000000000000000 0000000000000000000000000011100000010010
  00000000000000010000000000000000000000 0000000000000100000000001000010000000110
  00000000000000100000000000000000000000 0000000000000000000001010000000001101000
  00000000000001000000000000000000000000 0000000000000000000000001001001000101000
  00000000000010000000000000000000000000 0000000010100000000000000000000000011100
  00000000000100000000000000000000000000 0000000000000000000000000000010011100110
  00000000001000000000000000000000000000 0000000001000000000100000001000001000001
  00000000010000000000000000000000000000 0000000000000000000000000100001000101110
  00000000100000000000000000000000000000 0000000010000000000000010000000001000111
  00000001000000000000000000000000000000 0000000000000000000000000110010000010011
  00000010000000000000000000000000000000 0000000000000000100110000000010000000000
  00000100000000000000000000000000000000 0000000000100000010000000100010000000000
  00001000000000000000000000000000000000 0000000000010000000001100000000100000001
  00010000000000000000000000000000000000 0000000000000101000100000000001000000000
  00100000000000000000000000000000000000 0100000000000000000001000001000010000001
  01000000000000000000000000000000000000 0000000010000000000100000100000010000001
  10000000000000000000000000000000000000 0000000010000000000000000010000100101000

Process matrix

                   Pattern representing values of x   |           Pattern representing factors of y
------------------------------------------------------|--------------------------------------------------
Line 1:   x1 = 00000000000000000000000000000000000001 |  f1 = 001000000000000000000000000000000010 1 010
               00000000000000000000000000000000000010 |       000000000000000010000001000000000000 0 110
               00000000000000000000000000000000000100 |       000000000001000000000001001000000000 0 001
               00000000000000000000000000000000001000 |       000000000000010000000000000000100100 0 101
               00000000000000000000000000000000010000 |       000000010010000000000000000000000001 0 010
Line 2:   x2 = 00000000000000000000000000000000100000 |  f2 = 000000000000000000000100000011000000 1 000
               00000000000000000000000000000001000000 |       000000010000000100000000000000000011 0 000
Line 3:   x3 = 00000000000000000000000000000010000000 |  f3 = 000000000000000000000000000010001101 1 000
               00000000000000000000000000000100000000 |       010000000000000000000000100000100001 0 000
               00000000000000000000000000001000000000 |       000000000010000000000000000000111000 0 001
               00000000000000000000000000010000000000 |       000000000000010100000000000000010001 0 001
Line 4:   x4 = 00000000000000000000000000100000000000 |  f4 = 000000000000100000000100010000000000 1 000
               00000000000000000000000001000000000000 |       000000000010000000000000000000011001 0 010
               00000000000000000000000010000000000000 |       000000000000100100000000000100000000 0 011
Line 5:   x5 = 00000000000000000000000100000000000000 |  f5 = 000000000000000100000000000000001100 1 101
               00000000000000000000001000000000000000 |       000000000000100000000000001000000011 0 100
               00000000000000000000010000000000000000 |       001000000100000000000000000000001001 0 001
Line 6:   x6 = 00000000000000000000100000000000000000 |  f6 = 000000000000000000000010001000000010 1 101
               00000000000000000001000000000000000000 |       000000000000000000000100001000100001 0 010
               00000000000000000010000000000000000000 |       000000000000000001001000000000000010 0 111
               00000000000000000100000000000000000000 |       000000011000000000000000010000000000 0 011
               00000000000000001000000000000000000000 |       000000000000000000000000001110000001 0 010
               00000000000000010000000000000000000000 |       000000000000010000000000100001000000 0 110
Line 7:   x7 = 00000000000000100000000000000000000000 |  f7 = 000000000000000000000101000000000110 1 000
Line 8:   x8 = 00000000000001000000000000000000000000 |  f8 = 000000000000000000000000100100100010 1 000
Line 9:   x9 = 00000000000010000000000000000000000000 |  f9 = 000000001010000000000000000000000001 1 100
               00000000000100000000000000000000000000 |       000000000000000000000000000001001110 0 110
               00000000001000000000000000000000000000 |       000000000100000000010000000100000100 0 001
Line 10: x10 = 00000000010000000000000000000000000000 | f10 = 000000000000000000000000010000100010 1 110
               00000000100000000000000000000000000000 |       000000001000000000000001000000000100 0 111
               00000001000000000000000000000000000000 |       000000000000000000000000011001000001 0 011
               00000010000000000000000000000000000000 |       000000000000000010011000000001000000 0 000
               00000100000000000000000000000000000000 |       000000000010000001000000010001000000 0 000
               00001000000000000000000000000000000000 |       000000000001000000000110000000010000 0 001
               00010000000000000000000000000000000000 |       000000000000010100010000000000100000 0 000
               00100000000000000000000000000000000000 |       010000000000000000000100000100001000 0 001
               01000000000000000000000000000000000000 |       000000001000000000010000010000001000 0 001
Line 11: x11 = 10000000000000000000000000000000000000 | f11 = 000000001000000000000000001000010010 1 000
                                                                                                   ^
                                           Lines designated with a line number all have bit 3 set: ^

Last line of matrix Line 11 becomes the source of the next operation. Line  11 has bit 3 set. Targets of the operation are all other lines with bit 3 set, lines designated as Line 1 through Line 10.

Line 11 is combined in turn with Line 10, Line 9, Line 8,  Line 2, Line 1, by means of operation exclusive or, thus:

x10 = x10 ^ x11
f10 = f10 ^ f11

x9 = x9 ^ x11
f9 = f9 ^ f11

...............

x1 = x1 ^ x11
f1 = f1 ^ f11

Line 11 is deleted and bit 3 has been removed from all values on Right Hand Side of matrix.

The process is repeated. When a zero value is found on Right Hand Side of matrix, it means that this value is a perfect square.

Produce factors of N

Processing the matrix revealed 7 values of X that produce a perfect square on Right Hand Side of congruence.

Perfect square #1

00000000010000000001001000100000000000

The 4 bits set in this pattern represent values of x : [72268, 72750, 73246, 74544]

X = 72268 * 72750 * 73246 * 74544 = 28706195569530528000


These 4 values of x produce a value on RHS of congruence containing factors:

[11, 11, 13, 13, 17, 17, 19, 19, 23, 23, 59, 59, 113, 113, 127, 127, 163, 163, 241, 241]

Every factor in this list appears an even number of times.

Therefore, Y is product of factors: [11, 13, 17, 19, 23, 59, 113, 127, 163, 241]

Y = 35335010025681509


Using p1 = iGCD(X+Y, N) and p2 = iGCD(X-Y, N)

p1,p2 = 1, 5137851827


This congruence produced trivial factors of N.

Perfect square #2

01010000010010001000001111010010000000

The 11 bits set in this pattern represent values of x :

[72030, 72237, 72372, 72509, 72659, 72750, 73472, 73968, 74544, 75252, 75433]

X = product of values of x = 331906592471738688342554821695566634067059049758720000


From these 11 values of x:

Y = product of factors:

[2, 2, 11, 11, 13, 13, 17, 17, 19, 19, 19, 23, 37, 41, 41, 43, 
59, 103, 107, 113, 127, 181, 211, 211, 233, 241, 271, 293]

Y = 3343990074727707345948316157765706289327953268


Using p1 = iGCD(X+Y, N) and p2 = iGCD(X-Y, N)

p1,p2 = 89123, 57649


This congruence produced non-trivial factors of N.

Example #2

This example presents a more advanced implementation of the quadratic sieve because it includes the following features:

  • Use of powers of primes.
  • Use of primes not in factor base, "large" primes.
  • Participation in a distributed computing effort.

Past attempts to factor large integers were successful only because many computers contributed to the production of "smooth" values of y.

One way to organize a distributed computing effort is to assign different tasks to many different computers so that no computer duplicates the work of another. For example, different computers can be assigned the following tasks:

  • Smooth values for y=x2N with y positive.
  • Smooth values for y=x2N with y negative.
  • Smooth values for y=x22N with y positive.
  • Smooth values for y=x22N with y negative.


This example assumes the role of computer assigned:

Smooth values for y=x24N with y negative.

To ensure that this task does not duplicate other work, values of x are all odd.


Let n=147573952589676412927.

Then N=4n.

base=24296004001 where base is an odd integer very close to N.

Template:RoundBoxTop Because N is even and x is always odd, y=x2N is always odd.

Consequently, prime number 2 is not in factor base. Template:RoundBoxBottom

Factor base

Create the factor base, a list containing 230 small primes for which N is a quadratic residue:

[3, 7, 13, 23, 37, 41, 53, 61, 67,3121, 3137, 3167, 3203, 3217, 3251]

You may notice that primes 5,11,17,19, do not exist in this factor base.

Tables of logarithms

Create a dictionary of logarithms, called logs:

x log2(x)=ln(x)ln(2)
      3    1.58496
      7    2.80735
    13    3.70044
    23    4.52356
   
3167   11.6289
3203   11.64521
3217   11.6515
3251   11.66667

For example, log2(3251)=logs[3251]=11.66667.


Create a dictionary of antilogarithms, called aLogs:

log2(x) x
  1.58496         3
  2.80735         7
  3.70044       13
  4.52356       23
   
11.6289  3167
11.64521  3203
11.6515  3217
11.66667  3251

For example, desired prime =aLogs[11.6515]=3217.

Create sieve

Preparation

The following code adds to sieve information concerning powers of primes.

A closer look at this information will be presented later.

# python code

sieve = dict()

def powers_of_primes (p,power,L1) :
    '''
x1,x2 = powers_of_primes (p,power,[xa,xb])
This function may return None.

If power == 2 :
L1 contains 2 values of x that satisfy: x**2 /// N (mod p)
This function calculates the values of x for which:
x**2 /// N (mod (p**2))

If power == 3 :
L1 contains 2 values of x that satisfy: x**2 /// N (mod (p**2))
This function calculates the values of x for which:
x**2 /// N (mod (p**3))

and so on.

The method presented here, while simple and accurate, is
suitable for only small primes. As the size of p increases,
this method becomes impractical.
'''
    thisName = 'powers_of_primes() :'
    P_ = p**(power-1)
    P = P_ * p

    if P > 1_000_000 : return None

    L2 = []
    for x_ in L1 :
# Verify that x_ is valid.
        y = x_**2 - N
        if y % P_ :
            print (thisName,'Error 1 for x,p =',x_,p)
            return None
        for k in range(0,p) :
            x = x_ - k * P_
            y = x**2-N
            if y%P : continue
            if not (x&1) : x -= P # x must be odd.
            if not (x&1) :
                print ('error1a') ; continue
            y = x**2 - N
            if y%P :
                print ('error2') ; continue
# Every increment that goes into sieve must be even because:
# Odd x + even increment gives odd x.
            if x in sieve :
# Value of prime p is not included here. It will be recovered later.
# Hence the requirement for dictionary aLogs.
                sieve[x] += [(P<<1,logs[p])]
            else :
                sieve[x] = [(P<<1,logs[p])]
            L2 += [x]
            break
    if len(L2) != 2 :
        print (thisName, 'error2a, len(L2) =', len(L2))
        return None
    return L2

Initialize sieve

# python code

for p in factorBase :
    modulo = N%p
    for q in range (p>>1, 0, -1) :
        y = q**2 - modulo
        if y%p : continue

        base1 = base - (base%p)
        L1 = []
        for x in (base1 + q, base1 - q) :
            if not (x&1) : x -= p # Make x odd.
            if not (x&1) : print ('error3')
            y = x**2 - N
            if y%p :
                print ('error4') ; continue
# Every increment that goes into sieve must be even because:
# Odd x + even increment gives odd x.
            if x in sieve :
                sieve[x] += [(p<<1,logs[p])]
            else :
                sieve[x] = [(p<<1,logs[p])]
            L1 += [x]
        break
        
    last = L1
    for power in range (2,14) :
        this = powers_of_primes(p,power,last)
        if this == None : break
        last = this

Initialized sieve contains 624 valid locations.

Check sieve

It is wise to check entries in sieve because an error in initialized sieve can cause many problems later.


First entry in sieve is :

sieve[24296004819]=[(6166,11.59012)]

Value 11.59012=log2(3083) and 6166=2*3083.

Value 6166 is the decrement, always even.

Decrement must be exactly divisible by associated prime.

Value 24296004819=x and (x2N)%prime must be 0.


The other entry containing this prime is:

sieve[24296002923]=[(3902,10.93),(6166,11.59012)]

There must be exactly 2 instances of each decrement, and difference between associated values of x must not be exactly divisible by decrement.


Here is an entry for prime 3:

sieve[24294998077]=[(1062882,1.58496)]. Value 1.58496=log2(3)

Decrement =1062882=2*531441 and 531441=312.

This decrement appears in the sieve exactly 2 times, at 24294998077 and 24295715435, and (2429499807724295715435)%1062882 is non-zero, correct.

Check: (242949980772N)%(312). This calculation =0, which is correct.


In this way, all entries in sieve are checked.

Activate sieve

# python code.
print ('''
Activate sieve
''')
log_of_maximum_prime2 = 0.9*2*logs[factorBase[-1]]
smooth = []
last_log = 0

# (x_ + a)**2 = x_**2 + 2ax_ + a**2
# (x_ + a)**2 - N = x_**2-N + 2ax_ + a**2
# (x_ + a)**2 - N = x_**2-N + a(2x_ + a)
# x_ = base
base_squared_minus_N = base**2 - N
two_times_base = 2*base

count1 = count2 = count3 = 0

for x in range (max, max-(2*(10**5)), -2) :
    if (x & 0x7FFF) == 1 :
# This piece of code is not normally required.
# It is included here to provide some statistics about sieve during sieving.
        L1 = sorted([ p for p in sieve ])
        min_ = L1[0] ; max_ = L1[-1] ; used = len(L1)
        print ('''
max = {}
min = {}
range of sieve = {}
number of values used = {}
'''.format(max_, min_, 1+max_-min_, len(sieve))
)

    if x not in sieve :
        count1 += 1 ; continue

    values = list(sieve[x]) ; del (sieve[x])

    sum_of_logs = 0
    for v in values :
        increment, log2_ = v
        next = x-increment
        if next in sieve : sieve[next] += [v]
        else : sieve[next] = [v]
        sum_of_logs += log2_

    if sum_of_logs < last_log :
        count2 += 1 ; continue

    a = x - base ;
    y = base_squared_minus_N + a*(two_times_base + a)
    logy = logBase2 (abs(y))
# this_log is < logy to allow inclusion of "large" prime.
    last_log = this_log = logy-log_of_maximum_prime2
    if sum_of_logs < this_log :
        count3 += 1 ; continue
    smooth += [[x,values]]

The following data are representative of statistics collected during sieving.

max = 24295997441
min = 24294230499
range of sieve = 1766943
number of values used = 601

Range of values in sieve was about 1.75 million, and number of entries in sieve at any time was about 600. Template:RoundBoxTop Within this range of values of x only odd values are used. Template:RoundBoxBottom

# python code.

print ('''
Sieving complete:

not in sieve = count1 = {}
(sum_of_logs < last_log) = count2 = {}
(sum_of_logs >= last_log), but (sum_of_logs < this_log) =  count3 = {}
length of ( smooth ) = {}
total = {}
'''.format(count1,count2,count3,len(smooth), count1+count2+count3+len(smooth))
)
Sieving complete:

not in sieve = count1 = 8238
(sum_of_logs < last_log) = count2 = 89543
(sum_of_logs >= last_log), but (sum_of_logs < this_log) =  count3 = 2
length of ( smooth ) = 2217
total = 100000
  • 8,238 values of x were discarded because they did not exist in sieve.
  • 89,543 values of x were discarded because they were not "smooth." Note that decision to discard did not depend on calculation of y or log2(y).
  • 2,219 values of x were examined for smoothness. Of these, 2 were discarded.
  • Array "smooth" contains 2,217 members.

Process array "smooth."

Array "smooth" contains entries similar to this example:

[
24296003881, 
[(106, 5.72792), (386, 7.59246), (562, 8.13443), (54, 1.58496), (18, 1.58496), (14, 2.80735), (6, 1.58496)]
]

24296003881 is value of x.

Antilogs are replaced by the corresponding prime.

[(106, 53), (386, 193), (562, 281), (54, 3), (18, 3), (14, 7), (6, 3)]
  • decrement 106=253.
  • decrement 386=2193.
  • decrement 562=2281.
  • decrement 54=227=233.
  • decrement 18=29=232.
  • decrement 14=27.
  • decrement 6=23.
  • If there is entry for 33, there must be entry for 32.
  • If there is entry for 32, there must be entry for 3.
# python code.
>>> N,x
(590295810358705651708, 24296003881)
>>> y = abs(x**2 - N) ; y
5773138589547
>>> q,remainder = divmod (y, 53*193*281*3*3*7*3);q,remainder
(10627, 0)
>>>

Product 53*193*281*3*3*7*3 divides y exactly.

Quotient 10627 is a "large" prime. Without the powers of 3, this entry may have been missed.

This entry becomes :

[
24296003881, 
[53, 193, 281, 3, 7, 10627]
]

When a prime is included here, it was used an odd number of times above.

Template:RoundBoxTop Here is an example of a smooth value of y:

[24295805845, [3203, 641, 577, 547, 127, 13, 3, 3]]

There is no large prime. This entry becomes:

[24295805845, [3203, 641, 577, 547, 127, 13]]

Prime 3 is not used. It did not appear an odd number of times. Template:RoundBoxBottom The smooth array is processed as necessary to eliminate any entry containing a prime used only once. Any prime in smooth is used at least twice.

Here is an example deleted because the large prime was used only once:

[24295822145, [619, 181, 157, 83, 3], 2017529]

Here is an example retained because the large prime was used twice:

[24296002963, [797, 83, 3, 3, 23, 3, 3], 408803]

This entry becomes:

[24296002963, [797, 83, 23, 408803]]

Recall that the range of values of x examined by the sieve was 200,000.

Yet, here we see large prime 408,803 used twice. How is this possible?

Because prime 1732331 was found twice, the same question can be asked for these entries:

[24295960025, [863, 547, 67, 13, 3, 1732331]]
[24295924525, [1277, 857, 97, 7, 3, 1732331]]

Here is an example of a large prime used 3 times:

[24295844383, [1259, 983, 863, 67, 12043]]
[24295854167, [2179, 1093, 443, 191, 3, 12043]]
[24295964813, [2347, 661, 641, 53, 3, 12043]]

If you look closely, 2429596481324295844383=120430=1204310. Why only 3 entries for prime 12043?


Result is array smooth containing 286 entries. Of these 96 are smooth. 190 contain large primes.

The total number of primes is 281. Of these 91 are large primes. You can see that large primes are almost a third of all primes used.

Note that number of entries is more than number of primes. This indicates that the probability of finding perfect squares is good. Template:RoundBoxTop Although this array is called "smooth," the numbers show that it is not smooth. Perhaps a name such as "useful" would be better.


91*2=182, less than 190 containing large primes. This shows that some large primes must be used at least 3 times. Template:RoundBoxBottom

Produce results

Create and process matrix as in example above.

Remember that all values of y are negative. Use last entry of matrix as source and combine this source with all other lines in matrix. This eliminates factor 1 from matrix.

6 exact squares were produced:

0x441c794316262b96cba6611846d4d08f140a9112094634a0a4005ca0a55049f3242a9a
0x191d545b83a46c88439a5c3aca122ec85583100679c867ecb95c568398b33ae37c35f49
0x2108304080466a0a6b086014a1d2fb252586943d48ac34aa938041431a58210034283a4
0x421979c5e5689e36142c1b3ee2158332eab3a2f21ba1ba9bab1fff1ce2cabeec565245d
0x89e6c569cd412504035f785fb66f3cb43542eb37a09f8572451618e98153cc889520475
0x1093981134aa745002d0c560f80a3686d00c76b3d3e7e44aefbf81350998e106ad09d833

First exact square produced the trivial solution. All others produced the non-trivial solution.

Here are details about solution number 2:

0x191d545b83a46c88439a5c3aca122ec85583100679c867ecb95c568398b33ae37c35f49

Bits in this solution represent values of x:

24295805611  24295808157  24295809107  24295810993  24295813401  24295814777
24295815203  24295815539  24295817177  24295817523  24295818143  24295822355
24295822809  24295823345  24295823809  24295824387  24295826851  24295829685
24295834427  24295834433  24295834903  24295835917  24295836679  24295840049
24295840069  24295842793  24295845053  24295850555  24295851025  24295852037
24295856707  24295856797  24295858601  24295859223  24295860675  24295864007
24295868039  24295869383  24295870685  24295872787  24295874983  24295875361
24295875395  24295877125  24295882233  24295887633  24295888671  24295889123
24295889763  24295891739  24295891827  24295894117  24295896665  24295897715
24295897757  24295898563  24295898639  24295900473  24295900529  24295902379
24295903131  24295903735  24295903845  24295906899  24295907743  24295909587
24295909633  24295915013  24295915591  24295925549  24295928591  24295928827
24295933043  24295933489  24295935911  24295937075  24295938625  24295941777
24295944341  24295946505  24295947657  24295947869  24295948051  24295949051
24295950065  24295950821  24295955895  24295956829  24295958335  24295959525
24295960117  24295962247  24295963279  24295963477  24295965661  24295965791
24295965927  24295966189  24295967757  24295969127  24295970477  24295972089
24295972745  24295973099  24295976427  24295978303  24295981347  24295981883
24295982671  24295982999  24295983685  24295984517  24295985429  24295985575
24295987391  24295987487  24295992373  24295993279  24295993381  24295994227
24295994959  24295996271  24295997723  24295997941  24295998495  24295998895
24296000299  24296001081  24296001127  24296002045  24296002715  24296002963

x=776937957  4216461181640625, integer containing 1371 decimal digits.


y=128050773  5131604744727049, integer containing 1021 decimal digits.

# python code.
    gcd1 = iGCD (x+y,n)
    print ('gcd1 =',gcd1)
    gcd2 = iGCD (x-y,n)
    print ('gcd2 =',gcd2)
gcd1 = 193707721
gcd2 = 761838257287

Review

Example #2 above began with 100,000 simple operations on the sieve. Yes, the range of the sieve was about 1.75 million. This means that there were entries in bottom of sieve that were never used.

The 100,000 simple operations produced about 2,000 items of data worth investigating, This investigation produced a matrix of length 286 from which 5 non-trivial congruences were derived.

The original factor base contained 230 primes. Of these, 190 were used, leaving 40 unused.


If it seems that there was "too much data," this is not a problem. If you find yourself with not enough data, this really is a problem. Template:RoundBoxTop n is in fact 2671, called M67, a number made famous by Professor Frank Nelson Cole in 1903.

Before the era of computers he produced the factors of n with pencil on paper, a magnificent achievement. Template:RoundBoxBottom

Professor Cole's Paper

Mersenne prime

Carl Pomerance

Quadratic sieve

RSA-129