# About 490 million positions are at distance 20

Even though we now know the diameter of Rubik's Cube group in the half-turn
metric, there is still much yet to be discovered. The diameter in the QTM
and the STM are unproved (although they are almost certainly 26 and 18,
respectively). The exact count of positions at distance 16, 17, 18, 19,
and 20 in the half-turn metric is unknown. This note reports some progress
on an estimate for the count of 20's in the half-turn metric.

It is fascinating to me how problems of distinctly different difficulty
exist around the 3x3x3 cube in the half-turn metric. Initially, back in
the early days, we could solve individual positions non-optimally.
With appropriate ideas and some moderate computer power (about 1/1000
of a modern cell phone), we got to the point where we could solve
arbitrary positions near-optimally, with Kociemba's algorithm.
Eventually ideas and computer speeds got to the point where we could
solve arbitrary positions optimally with Korf's solver (initially in
about a day, now in much less than a second). With some further ideas
and effort we got to the point where we could solve huge chunks of the
space optimally at a high rate of speed, and even, with a heroic
amount of CPU, calculate the actual diameter of the group. But even
today, finding a distance-20 position is difficult---they are rare,
and proving a candidate position to be at distance 20 still requires
a few minutes of computer time.

I have made some earlier approximations of the count of distance-20
positions in this forum, but the uncertainty in those numbers has
always been very high. About a year ago I decided to work towards a
better estimate. To obtain this estimate, I solved 2500 randomly
chosen cosets of Kociemba's group (the group generated by
{U,D,F2,R2,B2,L2}), each with 19,508,428,800 positions. This took
about five months of CPU time, distributed across a number of
machines, and found optimal solutions for 48,771,072,000,000
positions at a rate of about four million optimal solves per CPU second.
This effort netted a total of 647 distance-20 positions (not all of
which were new), which is an average of 0.2588 distance-20 positions
per coset. Since there are 2,217,093,120 total cosets, simple
multiplication predicts a total of 574 million distance-20 positions.

Unfortunately, the variance of the sample set was very high. More
than 90% of the cosets had no distance-20 positions (2286 of the 2500).
Most of the remainder (142) had only one. And the cosets with the
greatest count of distance-20 positions had 159 and 59 such positions,
respectively. The standard deviation of the sample was 3.5, more than
thirteen times the mean; thus, the estimate derived from this sample was
highly uncertain.

Luckily, there is a way to reduce this uncertainty without too much
additional work. There is a fair amount of variability in the count
of length-19 canonical sequences that lead to each coset, and a
simple dynamic programming program can calculate, for each coset all
at once, how many length-19 canonical sequences lead to that coset.
Intuitively, the more length-19 canonical sequences that lead to a
coset, the fewer distance-20 positions that coset is likely to have,
and this is supported by all of the empirical evidence as well. Thus,
distance-20 positions are concentrated in those cosets with few
length-19 canonical positions, so we should sample those cosets more
heavily.

I created a list of all 138,639,780 symmetry-distinct cosets, sorted
by a decreasing count of distance-19 canonical sequences, and assigned
each an index from 1..138,639,780 based on that ordering. I broke
this set into four subsets and, for the three subsets with higher than
average incidence of distance-20s, I calculated additional sample points.

It turns out that the cosets with a higher incidence of distance-20s
are, in some sense, faster to calculate. We compute the distance-20
positions by performing a full search out to distance 19; the greater
the number of length-19 canonical sequences, the longer this takes.
Thus, those cosets that yield more distance-20 positions are usually
faster to run to completion in addition. (This observation breaks
down when you get further down the list, because for the majority of
the cosets it's sufficient to search through depth 18, do a "prepass"
to depth 19, and solve the few remaining positions with an optimal
solver, but for the portion of the space that is rich in 20s, the
observation holds true.)

The first sample set was for the cosets numbered from 1 to 5,000. I
solved every such coset in this sample exactly (I have been doing this
for some time). From these 5,000 cosets I obtained a total of 884,425
distance-20 positions (not all unique, but we will handle that below).
The mean is 176.9 and the standard deviation is 147.5. The high
standard deviation does not matter because we explored the entire
set, so there is no sampling error.

The next sample set was for the cosets numbered 5,001 to 15,000. I
solved 200 of these cosets, found 16,239 distance-20 positions,
with a mean of 81.645 and a standard deviation of 50.0. Extrapolating
these statistics, we expect a total of 816,450 distance-20
positions from this range.

The next sample set was for cosets numbered 15,001 to 1,500,000.
I solved 297 of these cosets, found 2,129 distance-20 positions,
with a mean of 7.168 and a standard deviation of 8.429. Extrapolating
these statistics, we expect a total of 10,645,000 distance-20
positions from this range.

The last range is for cosets numbered 1,500,001 to 138,639,780.
I solved 2479 of these cosets (these were entirely from the
original 2500 cosets solved that were indexed at greater than 1,500,000).
From this subset, I found 334 distance-20 positions, with a mean
of 0.1347 and a standard deviation of 0.7233. Extrapolating, we
expect this range to yield a total of 18,477,082 distance-20
positions. The ratio of standard deviation to mean is still
extremely high, but it is much better than before; in addition, the
range only accounts for about 60% of the total so the impact of
uncertainty is reduced.

The total expected distance-20 positions from all four ranges added
up is thus 30,822,957. Since we only explored symmetrically-unique
cosets, we need to multiply this by the 16 symmetries of the Kociemba
subgroup to obtain a final total of 490,000,000 distance-20 positions.
I believe the true count is likely to be within 10 or 15 percent of
this estimate.

It is striking that there can be so many distance-20 positions (more
than one for every man, woman, and child in the United States), yet
they are so rare (fewer than one in 80 billion positions overall,
equivalent to finding a single grain of sand in a dump truck full
of sand). This is why I am fascinated by them; they are rare, yet
plentiful, and very hard to find.

I am continuing my search for distance-20 positions. I occasionally
get another idea for speeding up the program, and I am still mining
the richest portion of the cosets, so success is frequent; I am
currently finding about 200,000 new ones a day using multiple
machines. As I get further down the list of cosets, solving each
coset gets harder, and yields fewer new distance-20 positions.

Including all the cosets listed above, and additional cosets solved
in prior experiments over the years, and the early results of Radu
and Kociemba in solving all symmetrical positions, I presently have
a total of 63,222,090 distance-20 positions, or about one in eight.
My computers continue churning through cosets extending this set.
In a few decades it may be child's play to list all distance-20
positions, but in the meantime, I will continue to chip away until
I have them all.

## Comment viewing options

### Length of the neighbors of the 20f* positions

Each position in the face turn metric has 18 neighbors. Have you looked at the length of neighbors of the 20f* positions?

The reason I ask is as follows. Because all positions in the face turn metric can be solved in 20 moves or less, the 20f* positions are all local maxima. In the face turn metric, a local maximum can be a weak local maximum or a strong local maximum. The distinction is that for a strong local maximum, all 18 neighbors are closer to Start than the position itself. For a weak local maximum, one or more of the neighbors is the same distance from Start as the position itself.

In my experience, most local maxima in the face turn metric are weak local maxima. If that is true of the 20f* positions (and it may not be), then there will be a certain clustering of the 20f* positions within the Cayley graph of the face turn metric. However, I can't even remotely picture if there is any relationship between clustering in the Cayley graph and clustering in a coset of the Kociemba group. Which is to say, I can't picture if making one move tends switch to a different coset or to remain in the same coset.

Jerry

### Connected 20s

Wow, I did a quick analysis on the 735K known distance-20 positions I have (mod M+inv) and was surprised how "unconnected" they were.
```735099 1
652 2
77 3
13 4
4 5
3 6
1 8
```
Almost all were completely isolated. The biggest connected component (using both moves and premoves) is only of size 8. I really thought there was going to be a big one with 100 positions or something. What this tells me is almost all 20f*'s are strict local maxima. Even though we only have about 1 in 8 20f*'s in this set, the space we searched is highly connected because of the coset approach.

### the cluster of 8?

I've become interested in modeling the outer surface of cube space despite how computationally hard that is and my not having a great amount of mathematical training. If we took cube space and reduced it to a solid 3D projection with the solved cube in the middle and the harder solves outside what do you think the surface might look like with these mostly separate peaks of 20f on top of a somewhat discontinuous surface of 19fs top of???

I've found a cluster of five 20f positions. Can I ask for a scramble in the cluster of 8? Thank you so much.

### Cluster of 9 (6 unique mod M+inv)

Here is a cluster of 9 (by position):

There are only six unique mod-M+inv in this set of 9.

FL RU BL LU FR RD BR LD UF DF UB DB FDR BRD URB UFR DFL FUL BLU DLB C_4h A 8
FU RU BL LU FD RD BR LD RF LF UB DB FLD BRD URB RFD LFU FRU BLU DLB C_4 A 4
FU RU BD LU FD RD BU LD RF LF LB RB FLD BUR LUB RFD LFU FRU BDL RDB C_4 A 4
FR RU BD LU FL RD BU LD DF UF LB RB FUL BUR LUB DFL UFR FDR BDL RDB C_4 A 4
FR RU BR LU FL RD BL LD DF UF DB UB FUL BLU DLB DFL UFR FDR BRD URB C_4h A 8
FD RU BR LU FU RD BL LD LF RF DB UB FRU BLU DLB LFU RFD FLD BRD URB C_4 A 4
FD RU BU LU FU RD BD LD LF RF RB LB FRU BDL RDB LFU RFD FLD BUR LUB C_4 A 4
FL RU BU LU FR RD BD LD UF DF RB LB FDR BDL RDB UFR DFL FUL BUR LUB C_4 A 4
FU RU BU LU FD RD BD LD RF LF RB LB FLD BDL RDB RFD LFU FRU BUR LUB C_4h A 8

The scrambles are here:

B1L1B3L3F2D3R1B3U1D2B1L1B2R1D1R3B2R2B2R3
L2F3L2D1F2B2L1B3R2B1D3R3F1U1L1R2D2B2L3R2
F3U1B3U3F2L3D1B3L2R1B1U1B2D1L1D3B2D2B2D3
L1B3L3F2D3R1B3U1D2B1L1B2R1D1R3B2R2B2R3F2
L3B2F3L3B1D1L1U3B3R3F3D1L2F1D2F3B2L1D2L2
L3F1L1B2D1R3F1U3D2F3L3F2R3D3R1F2R2F2R1B2
B1U3F1U1B2L1D3F1L2R3F3U3F2D3L3D1F2D2F2D1
L2B1L2D3B2F2L3F1R2F3D1R1B3U3L3R2D2F2L1R2
U1L1U2B2R2F3D1L3F3B3D3F1R3F1B1R2B3D2R2D3

### I am wondering if the cluster

I am wondering if the cluster size counts are the "true size" of each cluster, or only the number of symmetrically/antisymmetrically distinct elements within each cluster.

### Number of symmetrically/antis

Number of symmetrically/antisymmetrically distinct elements.

I also have the other number, but I can't put my hands on it
right at the moment.

I'll report the actual size-8 cluster when I get home to my data.

-tom

### Tom, much thanks for checking

Tom, much thanks for checking on this. It's quite interesting how isolated the distance-20 positions are, and especially that for distance-20 positions the inverses of strong local maxima are also nearly always strong local maxima (i.e., your results from using both moves and pre-moves). These are very impressive results.

Jerry

### Howdy, Jerry! This is an i

Howdy, Jerry!

This is an interesting idea. I've pursued it earlier when I had fewer positions, but with 600K+ unique positions (mod M+inv) it's getting difficult. Even with my new super-fast cube solver, that's a lot of 20's to solve.

Since a coset is defined as aS*, where a is a representative position and S is the set of moves {U,F2,R2,D,B2,L2}, we know immediately that solving a coset also finds implicitly all neighbors connected by 10 of the 18 moves. So at least some data exists.

I'll do some analysis of the existing positions, in that I'll see how many clusters I have and how large each is based on the set I have; that should be relatively easy to do. It will be interesting to determine the largest connected component. I'll use both moves and "premoves" (that is, both left and right group multiplication by the 18 generators).

My intuition says most 20f*s are strong maxima.

-tom

### I wonder how you can compute

I wonder how you can compute fast the exact number of 19 move maneuvers, which "lead to a coset". Is this the same or something different than the number of 19 move maneuvers which "solve this coset", that is bring it into H? Maybe you can give some details here how you did this.

### Calculating the count of canonical sequences that "reach" a cose

The cosets of Kociemba's group partition the group G into cosets. Every canonical sequence takes the solved sequence into an element of one of those cosets. We wanted to sort the cosets according to how many canonical sequences took the identity into that coset, because we believed the count of distance-20 positions in a coset was strongly and inversely correlated with the canonical sequence count.

To compute this value, we used dynamic programming to compute the function

```   cs(d, c)
```
where d is the length of the canonical sequence, and c is the specific coset. This function cannot be decomposed recursively, but a related function can:
```   cs(d, c, f)
```
This counts canonical sequences ending in a turn of face f, that end in coset c, and have length d. The recursive formulation is simply
```   cs(d+1, c, f) = sum(m, p) cs(d, c . m, p)
```
where m is a move of face f, and p is a face such that the sequence of face turns (p, f) is permitted. (For instance, a move of face U followed by a move of face D is permitted, but not the reverse, and not two consecutive moves of any given face.)

Then, of course,

```   cs(d, c) = sum(f) cs(d, c, f)
```
where f is any face of the cube.

To compute cs(d, c, f), an enormous table can can be used; the total state space is 19 (lengths 1..19) times 6 (faces) times 2,217,093,120 for a total of 252,748,615,680 entries. This can be reduced by computing the values in order of d and by reducing the space of cosets by symmetry (since values between two cosets related by symmetry will be the same); this cuts the entry requirements down to 2 times 6 times 138,639,780 which is 1,663,677,360. Using single-precision floating point, this requires only 6.5GB of RAM.

(I originally performed this calculation using doubles on a 4GB machine, by using Hoey syllables and axis instead of faces, and by storing the table for each value of d into a disk file, and rereading those disk files when calculating a new value of d, but in these days of 16GB machines such heroics are no longer needed.)

Since I'm on the topic, I'll share some of the results. We know there are 3,292,256,598,848,819,251,200 canonical sequences of length 19; this means the average coset has about 1,484,942,860,158 canonical sequences that lead to it. The coset with the fewest canonical sequences to depth 19 is

```   F3U2R3D3B3L1F1U1R1D3F3
```
It has only 558,579,323,856 length-19 canonical sequences that lead to it (this is about 37.6% of the average.) Out of 19,508,428,800 distinct positions in the coset, these sequences (and the shorter canonical sequences that lead to it) manage to miss 2,767 distance-20 positions. This is almost certainly the coset with the greatest number of distance-20 positions; the first-place coset is at index 738 on my list, with 4367 distance-20 positions; that coset has 634,379,723,072 length-19 canonical sequences that lead to it:
```   L3U2B1R3F3U3L3D3R2F1U3R3
```
The coset with the greatest count of length-19 canonical sequences, with 14,979,482,126,237,464, or about 10,000 times the average, was the subgroup itself.

Almost certainly the only cosets with 1000 or more distance-20 positions are these:

```   index sym  20s representative
----- --- ---- ------------------------
738  8  4367 L3U2B1R3F3U3L3D3R2F1U3R3
1  8  2767 F3U2R3D3B3L1F1U1R1D3F3
92  8  2221 B3L3F3R1U1R2B3D2L1F3U3F3
4  4  2160 R3D2B1U1L1F3R3D3B3U1R3
24  4  2130 F3R3D2L3D1B1U3R1U2L1F3U3
8  2  1861 R3U3R2F1L3D1B3U1F2L3F3
26  2  1533 B3R2U1L3B1F1R1D3U3L3F3
14  2  1485 F3R2U3L3B3F3R1D1U1L3F3
23  2  1434 B3R2U1L3B1F1R1D3U3L3F3U3
6  1  1390 F3R3D1U1L1B3F3R3U3L2F3
71  1  1318 R3D1L2B1U1L3B2F3U1B1R3
7  2  1287 F3L1D2R1D3B3U1L3U2R3F1
2  4  1195 R3D2F1L1U3L3B1R1B2D1F3U3
227  1  1157 F3R1D1L3R2B3F2D1F1R3
829  4  1117 L3B2F1U1R1B1D1L1F1U2R3
90  2  1038 L3U2B1U3F1D3F3R3B2F3L3
28278  8  1036 R3F2D3B1R2U1L1D1B1R1F3U3
154  1  1031 R3D1L2B1U1L3B2F3U1B1R1
```
The first value is the index, the second is the symmetry of the coset, the third is the count of distance-20 positions, and the fourth value is a representative position. The coset at index 28,278 had 734,351,183,840 length-19 canonical sequences. Note that symmetrical cosets can have a higher incidence of 20s (although usually many of the 20s in symmetrical cosets are related to each other by the symmetry of the coset).

### Coset Symmetry

How is the correlation between the coset position in your list and the symmetry of the coset ? Have most of the first 5000 cosets some symmetry and most of the cosets with index > 1,500,000 no symmetry of the 16 cube symmetries which leave {U,D,F2,R2,B2,L2} unchanged?

### Symmetry

Of the 5000 "top" cosets, less than 5% exhibit symmetry; 213 have 2-fold symmetry, 31 have 4-fold symmetry, and 3 have 8-fold symmetry; the remaining 4,753 exhibit no symmetry. So while symmetry has some impact, it is not dramatic.

Most distance-20 positions exhibit no symmetry or antisymmetry. The work you and Radu did found all symmetrical distance-20 positions, and you present that information at http://kociemba.org/symmetric2.htm; you find a total of 1,091,994 distinct symmetrical positions, 32,625 mod M+inv. All remaining distance-20 positions must be, of course, not symmetrical (but they are sometimes antisymmetrical or self-inverse).

The antisymmetry classes of the distance-20 positions that exhibit no symmetry are as follows. Mod m, I have found 622,907 positions lacking antisymmetry, 47,282 positions that are antisymmetric but not self-inverse, and 1,281 that are self-inverse. Thus, the total number of known distance-20 positions is

622907 * 96 + 47282 * 48 + 1281 * 48 + 1091994 = 63,222,090

Very few elements of the Kociemba subgroup exhibit any symmetry (as a fraction of the total positions); but yes, they do have a higher incidence in the "20s-richer" portions of my sorted list, I do not believe this has a first-order effect on my estimate, however.