Sudoku Analyzer development story

Tue November 24 2020 1:53pmSudoku

Sudoku is easy to understand: a 9x9 grid, 81 cells, in which every row, column, and 3x3 box must contain digits 1 through 9. Simple enough, but with enormous possibilities. If one were to write a little program to generate all possible 9x9 grids with all possible combinations of 9 digits in each of the 81 cells, it would have to generate 9^81 (9 to the 81st power) grids (that's a number with 78 digits). I did in fact write such a simple program over 15 years ago. But to complete the task within my lifetime, even on a powerful 5 gHz machine generating, very generously, one grid per 10 processor ticks or 500 million grids per second, the program would have to have started running a 62-digit number of years ago. I think that's well before God created Earth.

Now, there are considerably fewer than 9^81 **valid** Sudoku grids. My program needn't try all 9 digits for all 81 cells. If I made the logic a bit more complex, it could generate only valid Sudokus without wasting processor time on invalid ones. Taking just the top row, for example: there are 9^9 or 387,420,489 possible combinations of digits 1 - 9 for those 9 cells. But there are only 9! (nine factorial), or 362,880 combinations of 9 digits in which each digit occurs exactly once (a valid Sudoku row). So, for example, for a given cell in the row, if there were already a 2, 3, 5 and 8 in other cells in the same row, my program need only try the remaining digits 1, 4, 6, 7, and 9. It could therefore solve the row in a fraction of processing repetitions, or loops.

Applying this to the whole Sudoku grid is a bit more complex, because every cell in the 9x9 grid is a member of exactly 3 groups of 9 cells: one row, one column, and one 3x3 box. The improved logic would have to take this into account. I did so by moving to assembly language and using bitwise operations AND, OR, and XOR to make valid digit selection more intelligent and efficient for each cell, in relation to each of the 3 groups to which that cell belongs. Also, the order in which my program looked at each cell needn't be a row-by-row scan. I settled on this order: [a1-a9, b1-b3, c1-c3, b4-b9, c4-c9, d1-i1, d2-i2, d3-i3, d4-d9, e4-e9, f4-f9, g4-g9, h4-h9, i4-i9] as a way to optimize the smart formation of valid groups with a minimum of looping.

Moreover, for any given valid Sudoku grid, there exist many derivations which may look different, but are logically the same Sudoku. For example, you could swap the first two rows, putting the second row on top, and the first row below it. Each row retains its own contents and validity. Each column in the Sudoku also retains its validity, since no digit has moved from one column to another. Every 3x3 box is likewise still valid. You could do the same thing with the 2nd and 3rd rows. There are in fact 6 distinct ways in which the first 3 rows can be arranged without disturbing the Sudoku's validity nor changing its uniqueness. Note that you could **not** swap one of the first 3 rows with any of the lower 6 rows, since that would involve moving cells between different 3x3 boxes, altering their contents and their validity. But you could swap rows 4 through 6 in six ways, and rows 7 through 9 as well, resulting in 6 x 6 x 6 = 216 ways the 9 rows could be swapped without changing the Sudoku's validity or uniqueness. In the same way, you could also swap any two tiers, or groups of 3 rows. Since there are six ways the 3 tiers can be arranged, we now have 216 x 6 = 1,296 arrangements of rows and tiers of rows that are all logically the same Sudoku. For any one of these 1,296 arrangements of rows and tiers, we can also swap columns and groups of 3 columns. Then we could rotate the entire 9x9 grid 90° for an entirely different arrangement which is still the same valid Sudoku. (Note that rotating the 9x9 grid 180° gives us the same grid as one of the row and column variations, and so would not count as a new arrangement. Similarly, rotating it 270° would be the same as rotating it 90° together with one of the row and column variations.) Finally, we can swap the nine digits with each other, for example changing each '1' in the Sudoku to a '4', each '8' to a '3', etc. There are 9! (nine factorial) or 362,880 ways that the nine digits 1 through 9 can be swapped. Putting all this together, there are 1,296 x 1,296 x 2 x 362,880 = 1,218,998,108,160 ways any given valid Sudoku can be arranged to look different while remaining the same logically. This is pretty much in concordance with the mathematicians at Technology Review, who have used their computers to determine that there are 6,670,903,752,021,072,936,960 possible valid sudoku grids but only 5,472,730,538 unique Sudokus.

Together with some other improvements, after a few weeks my program was a little more complex, but alot more efficient. It was generating almost one million valid Sudoku grids per second on my old 500 mHz machine. There were still duplicate morphs being generated, but this was now fast enough to generate all possible valid Sudokus in a couple weeks or so of continuous running. More to the point, this same improved logic was able to work on a partially filled grid, i.e, a Sudoku puzzle. As noted in the previous article, a Sudoku with 60 or fewer empty cells is exponentially faster to analyze.

But this introduces a new layer of complexity, in that for every complete Sudoku grid there exist an immense number of partially complete grids, composed of filled and empty cells. Some of these partially complete grids may be true single-solution Sudoku puzzles, that can only resolve to one unique complete Sudoku. Others may pertain to more than one complete grid, that is, they may have multiple solutions. Still other partial grids may be unsolvable; there exists no way to fill in the empty cells to obtain a valid complete Sudoku grid.

Limiting ourselves to the first category: for any given single-solution Sudoku, there may exist a huge number of paths to solve it. At any one step along the way, you may be able to solve any number of different cells, using a variety of strategies. Solve one of those cells, and now you have a different puzzle, one that is probably easier to solve. Solve a different cell, and you now have a slightly different Sudoku, perhaps harder to solve, not easier!

I don't know how to calculate this additional complexity: how many incomplete grids exist, how many of these are single-solution Sudokus, and how many different ways there may be to solve them. I suspect the total number may be of a similar or greater magnitude as in the first paragraph above, virtually impossible to compute all possibilities exhaustively.

Well, OK. You may well question why a retired geek would spend so much time chasing such an unattainable goal. What can I say? Am content to have developed my program to the point where it seems to be able to analyze, if not all, at least the vast majority of the aforementioned enormous possibilities. To repeat: if you disagree with my boast, if you find a problem, or if you ever come across another analyzer that performs like my Sudoku Analyzer, be sure to let me know.

rev. Nov 24 2020 8:11pm |

0 comments: