Confirmation of Results for Edges Only Cube, Face Turn Metric, Reduced by Symmetry

Even though it's only confirmation of some rather ancient results, I would like to report that I have succeeded in replicating Tom Rokicki's 2004 results concerning the edges only cube in the face turn metric reduced by symmetry.  My goal was not really to solve that particular problem.  Rather, it was to use that problem as a way to prototype some ideas I have had for improving my previous programs that enumerate cube space.

The base speed of my program is that I am now able to enumerate about 1,000,000 positions per second per processor core when not reducing the problem by symmetry, and I am now able to enumerate about 150,000 symmetry representatives per second per processor core when reducing the problem by symmetry.

I use the term "base speed" because my algorithm still runs into problems with an excessive number of duplicate positions when it gets into the tail of a search space, for example when searching out to 12, 13, and 14 moves from Start in face turn metric for the edges only cube.  There are 87,801,812 positions that are 7 moves from Start, there are 555,866 positions that are 5 moves from Start, and there are 304,786,076,626 positions that are 12 moves from Start in the face turn metric for the edges only cube.  My program can store out to 7 moves from Start, and to enumerate out to 12 moves from Start it must form the product of all the positions that are 7 moves from Start with all the positions that are 5 moves from Start, for a total of 48,806,042,029,192 products.  Therefore, there are about 160 times as many products as there are positions that are 12 moves from Start.  That's a lot of wasted time producing products that result in duplicate positions.  Many of the ideas I have used to improve my programs involve mitigating that particular problem.

The net of all the improvements is that my program ran for about 2.5 days on my laptop computer.  It's a Dell Inspiron N5010 with two dual core processors each running at 2.67 GHz, running 64 bit Windows 7 with 8GB memory.  Therefore, my machine has four processor cores, and my program is multithreaded.  The net speed of all four threads after dealing with all the duplicate positions was 24,000 symmetry representatives per second per processor core.

I plan to continue for a while with the existing program, running it for the quarter turn metric, adding support for anti-symmetry, etc.  But the real goal will be to include include the corners along with the edges to see how far from Start I can enumerate cube space.

I'm very doubtful that I will be able to get anywhere near as far from Start as others already have done with supercomputers and server farms just by using my home computer.  But I think it will be fun project anyway.  And I probably won't have to deal with the duplicate position problem in the tail of the search space because I won't be able to enumerate the edges plus corners out far enough from Start for tail of the search space to matter.

Comment viewing options

Select your preferred way to display the comments and click 'Save settings' to activate your changes.

Thanks, Jerry!

There's a lot to read and understand here, and I just wanted to thank you for writing it all up. I'm sure I'll have more to say once I've digested it all. You might consider how well the ideas apply to the 4x4x4; with its larger branching factor, your approach may win over existing approaches.

I can now report that I have

I can now report that I have mostly confirmed Tom's results for the edges only cube in the quarter turn metric reduced by symmetry. I am able to store out to 8q from Start, so I am able to enumerate out to 16q from Start. The diameter of the Cayley graph for the edges only group in the quarter turn metric is 18q. Hence, I can confirm Tom's results out to 16q but not all the way to 18q. Reduced by symmetry, there are only 3 positions at 17q and only 1 position at 18q, so I have confirmed Tom's results for all but the last 4 positions. There are ways I could have determined the length of these last 4 positions, but since the problem was already solved twelve years ago doing so hasn't seemed worth the effort.

My program ran 4 days (almost exactly) in the quarter turn metric and 2.5 days in the face turn metric. I need to analyze the details a little further, but there are two reasons for this difference. One is that my algorithm has a problem with duplicate positions in the tail of the search space, and the quarter turn metric has a longer tail than does the face turn metric. The other reason is that I worked harder on mitigating this problem in the face turn metric than I did in the quarter turn metric. There were some improvements I could have made in quarter turn metric, but that would have required a lot of coding and testing. So I decided just to go ahead and run the quarter turn metric and see what happened. The results were fine, but if I were to proceed further with this project I would go back and work as hard in the quarter turn metric to try to mitigate the problem of duplicate positions in the tail of the search space as I did for the face turn metric.

Jerry

Same screenshot for face turn

Same screenshot for face turn metric. The program ran for 11 hours.

Jerry

  0              1              1              1
  1             18              2              2
  2            243              9              8
  3           3240             75             48
  4          42807            925            505
  5         555866          11684           6018
  6        7070103         147680          74618
  7       87801812        1830601         918432
  8     1050559626       21890847       10960057
  9    11588911021      241449652      120788522
 10   110409721989     2300251615     1150428080
 11   552734197682    11515452614     5759027817
 12   304786076626     6349914756     3176487580
 13      330335518        6896891        3500434
 14            248             24             24

total positions    =    980995276800
total positions_p  =     20437847376
total positions_p2 =     10222192146

cpu: 624293 seconds, system: 159.67 seconds, wallclock: 39892.4 seconds      

positions per cpu second         =    1571369.6
representatives per cpu second   =      32737.6
representatives2 per cpu second  =      16374.0
positions per wall second        =   24591061.4
representatives per wall second  =     512325.0
representatives2 per wall second =     256244.4

I didn't have time to work fu

I didn't have time to work further on the program for several months, but I have recently added anti-symmetry to the results. The results continue to confirm Tom's results from 12 years ago. Also, I have come into possession of an ancient server (about 15 years old) with 64GB of memory and 4 quad-core processors (16 cores total). A friend of a friend purchased the machine on eBay for $25.00, and my friend is loaning it to me. The processors are slow, but there are a lot of them and the 64GB of memory has allowed me to run the quarter turn metric for the edges all the way to the end. Here follows a screen shot from my latest run. It took about 13 hours.

Jerry

  0              1              1              1
  1             12              1              1
  2            114              5              5
  3           1068             25             17
  4           9819            215            128
  5          89392           1886            986
  6         807000          16902           8652
  7        7209384         150442          75740
  8       63624107        1326326         665398
  9      552158812       11505339        5759523
 10     4643963023       96755918       48408203
 11    36003343336      750089528      375164394
 12   208075583101     4334978635     2167999621
 13   441790281226     9204132452     4603365303
 14   277713627518     5785844935     2894003596
 15    12144555140      253044012      126739897
 16          23716            750            677
 17             30              3              3
 18              1              1              1

total positions    =    980995276800
total positions_p  =     20437847376
total positions_p2 =     10222192146

cpu: 731960 seconds, system: 170.8 seconds, wallclock: 47056.4 seconds

positions per cpu second         =    1340230.1
representatives per cpu second   =      27922.1
representatives2 per cpu second  =      13965.5
positions per wall second        =   20847234.8
representatives per wall second  =     434326.9
representatives2 per wall second =     217232.9

I thought I would provide a f

I thought I would provide a few details about my new program. It does things in much the same way as most of my older programs have done. In particular, it represents cube positions as an ordered pair of words from an alphabet of 24 letters, one word for the edges and one word for the corners. Each letter represents one color tab. The words are permutations, so each letter from the alphabet appears exactly one time in each 24-letter word.

In my computer code, each letter is a number from 0 to 23, but outside of my computer code I like to think of the letters as the letters a to x from the standard English alphabet. As letters, I do not need to include punctuation to write the words. As numbers, some of the numbers are single digit numbers and some of the numbers are two digit numbers, so I have to write the numbers with punctuation.

In theory, the exact representation of the cube positions shouldn’t matter all that much with respect to the algorithm that is used. But in my case, I have always been oriented towards producing positions in lexicographic order and the easiest way for me to deal with a lexicographic ordering of cube positions has been to represent the positions as words from an alphabet.

For the edges, I need to store only 12 of the 24 letters. For the corners, I need to store only 8 of the 24 letters. That’s because if you know where one of the color tabs for a cubie is located, then you also know where the other color tabs for the same cubie are located. My new program stores both the edges and corners, but at the present time it processes only the edges while ignoring the corners. I wanted to use Tom’s previous results for the edges to prototype my new program, but I also wanted my cube model to include the corners even if I wasn’t yet processing the corners.

We let P[n] be the set of all positions that are n moves from Start. My new program continues with the familiar paradigm where for sets of positions S and T we form the product Z=ST, where Z=P[j+k], S=P[j], and T=P[k]. Many of my previous programs produced the products st in lexicographic order as the way of keeping track of which positions have already been visited. While my new program does take extensive advantage of lexicographic ordering, it does not depend on lexicographic ordering to keep track of which positions have already been visited. Rather, it has a large array of bits, one bit per position, and it turns on the bit for each position when that position has been visited. The number of bits for the entire problem is too large to fit in memory, so my new program has to decompose the problem into cosets.

My program uses stabilizer subgroups to define the cosets into which the problem is decomposed. In the scheme I use, a stabilizer subgroup could fix 0 through 12 of the letters. My program can handle stabilizer subgroups which fix anywhere from 2 to 10 letters. When 2 letters are fixed, the subgroup size is 1,857,945,600 positions and when 10 letters are fixed, the subgroup size is 4 positions. The coset sizes are of course the same as the subgroup sizes. So a wide range of coset sizes is available to the program when using a 3-letter stabilizer subgroup, when using a 4-letter stabilizer subgroup, etc. The performance of the program is the best when the coset sizes are as large as possible, but smaller coset sizes are very useful for program development and debugging.

1,857,945,600 sounds like a pretty large number of bits to store in memory, but it’s only 232,243,200 bytes which isn’t so bad on my 8GB machine. I do have to be cautious about this number because each thread has to have its own separate copy of this array of bits, but it’s manageable. If necessary, going from 2-letter stabilizer subgroups to 3-letter stabilizer subgroups is only a minor hit on program performance while reducing the size of the bit array by a factor of 20. With 2-letter stabilizer subgroups, each coset is defined by the first two letters of the words for the permutations in the coset. So there are 528=24*22 cosets with 2 letters– the ab coset through the xw coset. That is, there are 24 ways to choose the first letter of each word, 22 ways to choose the second letter of each word, 20 ways to choose the third letter of each word, etc.

My program builds up positions in Z one letter at a time. To use a 4-letter example rather than a 24-letter example, it builds the word abcd as a, then as ab, abc, and abcd. The next word after abcd is abdc, but it would first backtrack from abcd to ab, and then proceed to abd and to abdc, etc. As it is enumerating possible Z values in this manner, my program is also making lists of possible S and T values which when multiplied together would yield the appropriate Z value. Still using 4-letter examples, we can produce Z=a as a???×a???, as b???×?a??, as c???×??a?, or as d???×???a. So what my program spends most of time actually doing is building up these lists of S and T values which, if multiplied together, would produce the desired Z value. Curiously, the program never actually forms the products, nor does it have to. It is sufficient to know that if the products were formed that the appropriate Z values would be produced.

It might appear from the previous paragraph that my program is producing products in lexicographic order after all, and that perhaps that I don’t need the bit array which keeps track of which positions have been visited. However, there are several reasons why I do need the bit array of visited bits. First of all, we run through this process, for example, for P[7]P[1], then for P[7]P[2], then for P[7]P[3], etc. My programs from many years ago integrated these processes together with a tournament of losers algorithm, but the algorithm was just too slow and just require too much memory for my present needs. And even within one particular distance from Start, such as P[7]P[2] to calculate all positions that are 9 moves from Start, there are a lot of moving parts in keeping track of all the lists of S values and T values and Z values that match each other properly. Therefore, it has proven very useful to do the problem in a very piecewise fashion. So the positions may be visited in a way that may be lexicographically ordered in a piecewise fashion but which may not be lexicographically ordered over all. Visiting the positions in any way other than lexicographically ordered all at one go seems to require an array of visited bits.

An additional reason for the visited bits has to do with mitigating the problem of duplicate positions slowing the algorithm down. Consider for example the situation after building up the first the first 10 letters of a 12 letter word for an edge position. Remember that an edge is really represented as 12 letters from a 24 letter alphabet rather than as 24 letters from a 24 letter alphabet because we only have to store one color tab from each edge cubie. After 10 letters, 20 of the letters will have been used up and 4 letters will remain unused. So there are 4 ways to choose the 11th letter. After we have visited each of those 4 positions, there is no reason to visit any of them ever again. So we create another array of visited bits, this time not for complete cube positions but rather for the 10th letter. The bit for the 10th letter is turned on if the bits for all of its 11th letters have been turned on.

As the process continues, visiting the positions in a piecewise lexicographic order, we will eventually be able to identify positions as being duplicate after only building up 10 of its letters rather than having to build up 11 of its letters. Note that we never really have to build up the 12th letter because there is just one choice for the 12th letter. Being able to identify a duplicate position one letter early may not sound like very much, but what the program is really doing is walking a lexicographic tree of positions over and over again. When walking a tree, the most time is nearly always spent near the leaf nodes, and the process I’m describing where the program is keeping up with lists of S values, T values, and Z values spends most of its time at or near the leaf nodes of the lexicographic tree. So small trimmings of the tree do make a big difference in the performance of the tree.

And of course we do not stop with adding an array of visited bits for positions after the 10th letter, we add an array of visited bits for positions after the 9th letter, after the 8th letter, etc. – all the way back the 3rd letter. We can’t go back an further than that because the largest arrays of visited bits we can store are when we have 2-letter cosets.

The array of visited bits for the 10th letter is 1/4 the size of the array for the 11th letter. The array of visited bits for the 9th letter is 1/(6*4) the size of the array for the 11th letter. The array of visited bits for the 8th letter is 1/(8*6*4) the size of the array for the 11th letter, etc. So these additional arrays do not take up much additional space as compared to the original array of visited bits.

Strictly speaking, these new arrays should not be called visited bits. They are not turned on just because a particular pattern of the first k letters of the word has been visited. Rather, they are turned on only after all the words with the same pattern of the first k letters of the word have been visited. A useful interpretation of these bits is that each one of them represents a nested coset, and that the bit is turned on when all the positions in the respective coset has been visited. And for the 11 letter, we simply have the trivial coset consisting of only one position.

Finally, these arrays are the mechanism by which symmetry is managed. The program calculates a symmetry representative for each equivalence class xM where x is a position and M is the set of 48 symmetries of the cube. For the (up to) 48 positions in xM, the program chooses the first position in lexicographic order from xM as the symmetry representative. When taking symmetry into account and when building up Z positions one letter at a time, the program only visits those positions that are symmetry representatives.

Sometimes the program can determine that a position is not a symmetry representative after building its first letter, sometimes after building its second letter, etc. These nodes that are not symmetry representatives become cutoff nodes for the transversal of the lexicographic tree. So before the program even starts building up positions for a coset and making lists of S and T values, it goes through the visited arrays for the whole coset and marks each bit for non-representative positions as already visited. Well, such positions have not been visited, but because they are not symmetry representatives they are marked never to be visited.

Indeed, the program uses the same process to eliminate entire 1-letter and 2-letter cosets from consideration. For example, any word starting with the letter d can be mapped by symmetry into a word beginning with the letter b. Hence, no position starting with the letter d can ever be a symmetry representative. This process reduces the number of cosets to be processed from 528 down to 116, and many of the 116 cosets contain very few symmetry representatives.