| The Sudoku Assistant uses several techniques to solve a Sudoku puzzle: cross-hatch scanning, row/column range checking, subset elimination, grid analysis,and what I'm calling 3D Medusa analysis, including almost-locked set analysis. When all that fails, the Sudoku Assistant resorts to hypothesis and proof and a sort of depth. All of these techniques are based on identifying all the possible "candidates" for a cell (indicated by marks) and then eliminating them one by one until only one possibility remains in a given cell. | |||||||||||||||||||||||||||||||||||||||||
Cross-Hatch Scanning (looking for singles) | |||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
The rule of singles requires:
When a candidate k is possible in only a single cell of a row, column, or block, then that cell must be k.This situation can arise for one of two reasons. A naked single arises when there is only one possible candidate for a cell; a hidden single arises when there is only one possible cell for a candidate. Despite the name, hidden singles are far easier to find than naked singles. You should always start a Sudoku by finding all the hidden singles. No marks required! | |||||||||||||||||||||||||||||||||||||||||
The dots in the cell in row 3, column 5, indicate that in that cell the numbers 4, 5, 6, and 8 are all possible. But in that top middle block only one cell can hold a 5. That's a hidden single. | This is the most basic technique. Since a number can only appear once in any given column or row and must appear exactly once in any given 3x3 block, the easiest place to start is to first check for cells that must hold a value because no other cell in a 3x3 block can hold that number. For example, in this case the number 5 is excluded from all but one cell in the top center 3x3 block. The 5 in this cell is called a "hidden single" because it can only be in this single location, and that fact is "hidden" by the presence of the other marks. This process, referred to as cross-hatching, is repeated for each row and each column. Cross-hatch scanning is generally all that is necessary for "easy" puzzles. Most people do this step without actually making any marks. | ||||||||||||||||||||||||||||||||||||||||
Row/Column Range Checking ("locked" candidates) | |||||||||||||||||||||||||||||||||||||||||
The locked candidate rule, form 1:
When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same row/column, then it is also not possible anywhere else in the same block.The locked candidate rule, form 2: When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same block, then it is also not possible anywhere else in the same row/column.Once all the singles have been found, I usually start marking. Then the job is to eliminate marks until only one remains in a cell, so then we know that cell's value. In this next technique, we use the fact that once a number can be assigned to a given row or column of a specific block (even if its exact location is still unknown), no other block may have that same number in the same row or column. | |||||||||||||||||||||||||||||||||||||||||
| In the example shown on the right, the location of the number 1 is already known in both the top row and the left-most column. In addition, the second row of the top left 3x3 block is already "filled." The only possible locations for a 1 in the top left 3x3 block, then, is the third row. But this means that, in the third row, the number 1 must appear in one of the crcled cells -- not only for this 3x3 block but at all. We can eliminate it as a possibility (remove its mark) from the other cells in this row. (Think about it -- only one 1 in the third row; it has to be in the third row of the first 3x3 block; so it has to be in the first 3x3 block; it has to be in those circled cells; it can't then be in any OTHER cells. Tricky, eh?) |
| ||||||||||||||||||||||||||||||||||||||||
|
The same idea eliminates the possibility of 9s in the center block's top row. (Because in this case the only possible place for a 9 in the left center block is the top row.)
In this case, the 9 in the center 3x3 block is also determined using
subset elimination. While there is a cell in the 6th row that could possibly be 4, 8, or 9,
there is already a subset of exactly five cells in the 6th row that must contain
the numbers 2, 3, 4, 5, and 8, thus forcing this cell to be a 9.
Range checking can be tricky to catch on to. Don't worry if you don't get it right away. Many puzzles can be solved without it. But if you do start seeing it, you will find it helpful. | |||||||||||||||||||||||||||||||||||||||||
Subset Elimination | |||||||||||||||||||||||||||||||||||||||||
As for locked cells, there are two subset rules. The naked subset rule states:
When n candidates are possible in a certain set of cells all in the same block, row, or column, and no other candidates are possible in those cells, then those n candidates are not possible elsewhere in that same block, row, or column.The hidden subset rule states: When n candidates are possible in a certain set of n cells all in the same block, row, or column, and those n candidates are not possible elsewhere in that same block, row, or column, then no other candidates are possible in those cells.Thus if a "subset" of three cells can be identified for which the only possibilities are exactly three numbers, then even though we don't know which of those cells has which number in it, we still know that no other cells in the row, column, or 3x3 block containing that "subset" of cells can have any of those numbers. | |||||||||||||||||||||||||||||||||||||||||
| In this example, since there are already exactly two locations in the top row that together must contain "3" and "7", the cell fourth from the left cannot be 7; It must be 6. Subset elimination along with cross-hatch scanning and block range checking can generally take care of "moderately difficult" puzzles. (This technique is also referred to as looking for naked or hidden pairs, triples, and quads.) | ||||||||||||||||||||||||||||||||||||||||
Grid Analysis | |||||||||||||||||||||||||||||||||||||||||
Other advanced techniques are required to solve more difficult Sudoku puzzles such as this one.
These techniques are based on one of the following two rules:
When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of rows, then they are also not possible elsewhere in that same set of columns.and its flipside, When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of columns, then they are also not possible elsewhere in that same set of rows.The grid technique involves picking a specifc number, such as 5, and checking for known allowed patterns of possibility such as X-wings and swordfish.
Note that grid analysis is essentially a form of 2-dimensional subset elimination, where now the subsets themselves are correlated over multiple rows or columns. So, for example, if you look at that blue pattern, there is a set of three columns (the third, the sixth, and the seventh) for which 5 can only be in rows 2, 5, and 7. Three columns...three rows. Just like "three numbers...three cells". In exact analogy to subset elimination, this means that for these three rows the five must be in one of these three columns. it's tricky! In fact, though, X-wings and swordfish are simply two simple varieties of a much broader category of beast.
The demonstration of this equivalence of all such "circuits" is the way in which the Sudoku Assistant handles all circuit analysis without distinction:
| |||||||||||||||||||||||||||||||||||||||||
A 4x4 grid based on the candidate number 4 | The point is simply that finding X-wings and swordfish and the like is not a daunting task. One simply needs to discern the n by n grid containing the set of candidate squares. You don't have to actually discover the circuit. The principle at work here is as simple as can be: Once a number can be assigned to a given row or column (even if its exact location is still unknown), no other cells in those same rows and columns may also contain that particular number. | ||||||||||||||||||||||||||||||||||||||||
| Since an n by n set of cells defines exactly where n numbers can go, the principle is the same regardless of the number n. So all we are doing in this sort of analysis is idenitifying a set of n rows and n columns that contain a given candidate's possibility at their points of intersection. All other possibilities, all those not at these grid points, can be eliminated safely. | |||||||||||||||||||||||||||||||||||||||||
Generalized Method Rule | |||||||||||||||||||||||||||||||||||||||||
If all these rules sound pretty much the same, it's because they are all just permutations of the same
idea in different dimensions. Using "X" and "Y" here for some number of rows, columns, cells, or blocks, then we have:
If a candidate k is possible in the intersection of X and Y but not possible elsewhere in X, then it is also not possible elsewhere in Y. | |||||||||||||||||||||||||||||||||||||||||
3D Medusa Analysis | |||||||||||||||||||||||||||||||||||||||||
| If a grid of cells can be made for which the value in one cell directly forces the value in another cell, we can sometimes follow that grid to see if it leads to a logical inconsistency. This can be done even though the exact contents of a cell are unknown. Some algorithms look for "strong cycles" with "weak edges," but there is another way to think about this. What is significant is that in a "binary grid" every other cell along the grid either has one value or another -- only two possible values. (The grid is made by specifically selecting only cells that have exactly two possibilities.) If any two such cells are in a single row, column, or 3x3 block, it is as if they were the ONLY possibilities for their cells. | |||||||||||||||||||||||||||||||||||||||||
a complicated X cycle | Thus, if we can say, because of a connecting chain of grid points, that two cells "either must both by 5 or neither is 5" then if they are in the same block, row, or column, neither can be 5. In the case on the left, we have a grid with two endpoints in the same 3x3 block. They are opposite "parity" -- we don't know which value is where, but regardless of that, whatever one is, the other is not at these two points. We can eliminate the circled possibility. That particular cell cannot be a 5, because it is in a block with two other cells, one of which has to be a 5. Since cells with only two possible solutions are relatively easy to spot, this technique lends itself reasonably well to human solving of Sudoku puzzles. | ||||||||||||||||||||||||||||||||||||||||
| A related condition is that if we know that two cells of a strong chain are positioned such that both must be X or both must not be X, and together they exclude X from a block, then neither can be X. On the left, for example, the two red Xs are linked because they are part of the same strong chain. But if they are both TRUE, then all the Xs in the top right block are disallowed. There must be some X in that box, so their being TRUE leads to a contradiction; the cell labeled XB must be X. This technique has been referred to as a strong block hinge. Medusa variants also target cells in the same row or column, or target values in the same cell. |
Cells XA and XC cannot be X, and the cell labeled "XB" must be X. |
||||||||||||||||||||||||||||||||||||||||
|
The chain connecting XA and XC need not be strong. If one of the ends is a weak node, then the result is that we have a weak hinge, and we can eliminate that weak node. (We cannot eliminate the X of XA, because a TRUE
there does not transmit a TRUE to the X of XC.)
The Sudoku Assistant will find these sorts of hinges if the "hinge" box is checked. |
The cell labeled "XC" must NOT be X. |
||||||||||||||||||||||||||||||||||||||||
|
In fact, we can generalize the idea of strong binary grids to three dimensions. If we allow the grid to not just involve a specific candidate number, a 3D grid is generated.
The Sudoku Assistant uses the Jmol applet
to help visualize this. (The different colors here represent different candidate numbers, with red, at the top being 1 and white, at the bottom, being 9.)
| ||||||||||||||||||||||||||||||||||||||||
|
A full 3D Medusa analysis involves extending these idea to weakly linked chains (encompassing X cycle Type 2 as well as Y cycles) and hypothesis and proof.
The idea here is that one may have a sequence of three "nodes" (row, column, and candidate value, in 3D)
A --- b --- C, for which A and C are members of strong chains (their cells have only two possible values). We call this a "weak link" or "weak edge"
if A, b, and C all are in the same row, column, or block -- or, in a 3D sense, the same cell.
In that case, the "strongness" of one chain can be transferred to the other chain, provided the right
condition is applied. (In other words, we can assert "if A is 5...then C is NOT 5", but we cannot assert that "if A is NOT 5, then C is 5.")
The Sudoku Assistant can follow these weak links to any number of associated chains in order to eliminate possibilities.
The classic example of weakly linked chains in the form of a Y-cycle is an XY-wing, shown below.
X... and Y... here are weak links. The first links the strong chains XZ and XY; the second links the strong chains XY and YZ. It doesn't really matter if those links are there or not. (If they are not there, then we just have one strong chain.) The effect is the same: "NOT Z" in the lower left cell gets transmitted to the top right cell through the chain:
as "Z" at the other end. Or, if the other Z is false, then we have:
So one or the other of the Zs on the ends must be TRUE, and the * cannot be Z. The term I use for * in the above situation is weak corner. Here we have again A --- b --- C, for which A and C are members of strong chains. To be a weak node, A and b must be in the same row, column, block, or cell, and the same is true for for b and C. But if these aren't the same row, column, block, or cell, then we don't have a weak link. Instead, we have a weak corner. The "strongness" of one chain cannot be transferred to the other chain via a weak corner, With the proper connecting of strong chains by weak links, one can infer valuable information about such a node. In particular, the elimination of the corner node, b, can be forced by the right sort of chain connection. | |||||||||||||||||||||||||||||||||||||||||
Almost-Locked Sets | |||||||||||||||||||||||||||||||||||||||||
|
All of this business about strong chains turns out to be a subset of a broader sort of
analysis introduced recently as almost-locked set analysis. Consider a simple cell with
only two possible values, say 1 and 2. Write this {12}. If something forces this cell NOT to be 1,
then we know it is 2; if it is forced NOT to be 2, then we know it is 1. Now consider two
cells with three overall possible candidates. Perhaps: {12 23}. We can say exactly the same about
this pair of cells as we said about the single {12} cell. If something force this pair of
cells NOT to be 1, then we know it is 2 and 3; if something forces it to NOT be 2, then we know
it is 1 and 3; and if something forces it NOT to be a 3, then we know it is 1 and 2. Unlike the
single cell, we won't know exactly where the two remaining "locked" candidates will be,
but we know where they will not be -- anywhere else in that row, column, and/or block that contains
these two cells. This idea can be generalized to any N cells containing N+1 candidates.
| |||||||||||||||||||||||||||||||||||||||||
For example, consider the puzzle on the right. The four cells outlined in green constitute
a 4-cell almost-locked set "A" = {37 19 178 789}. Here we have four cells with five candidates: 1, 3, 7, 8 and 9.
Similarly, the two cells highlighted in blue
consitute another almost-locked set, "B" = {237 23}. Two cells; three different possibilities.
We might write this as:
|
| ||||||||||||||||||||||||||||||||||||||||
|
Here's the cool thing: Figuratively, we could write this set of
cells like we might a couple of weakly linked chains:
The two weak links come from the fact that if Set A has a 3 (somewhere) then Set B
does NOT have a 3, and must instead
be 2 and 7. The same goes for the 7 -- only one of A and B, not both, can have the 7.
Any other threes that are weakly linked to BOTH A and B can be eliminated. Same goes
for sevens. The result is exactly the same as we would write for a naked 3,7 pair!
So the possibility of 7 in the two cells r3c8 and r3c9 can be eliminated. Note that the 3 in r3c8 cannot be eliminated, because there is a 3 in r6c7, and r6c7 is not "seen" by r3c8. In order to be eliminated, the possibility must be in a cell that is weakly linked to ALL occurances of that value in BOTH almost-locked sets. But it's better than that. If B has the 7, Set A cannot have it -- Set A must be 1, 3, 8, and 9. Likewise, if Set A has the 7, then Set B does not. It must have 2 and 3. And then A has 1, 3, 8, and 9. Ah, but these are the only two possibilities! Either the 7 is in Set A or Set B. In either case, A has a 1, 8, and 9; and B has a 2. then any other 8 or 9 weakly associated with A, and any other 2 weakly associated with B can be eliminated! This includes any 8s in the third block (in the cells highlighted in red) as well as the 8 in row 2, column 6. Neat, huh? | |||||||||||||||||||||||||||||||||||||||||
|
There are lots of things that can be done with almost-locked sets. Basically, they can plug right into sets of chains,
because they can transmit a weak link. That's because if one of the candidates in the set is forced FALSE, then
all the other candidates are forced TRUE. On the right, if the 3* is TRUE, then the weakly linked 2 and 7 must be FALSE.
And if either the 2 or the 7 is TRUE, then the 3 is FALSE. That's a weak link.
| |||||||||||||||||||||||||||||||||||||||||
|
Consider that "complicated X-cycle" on the right again. If we make a list of the
rows that the number 5 is "possible" in, we get:
{569 57 19 27 25 16} Now, one way to see that the 5 in "569" can be eliminated is that we have here a "naked triple subset" (also called a swordfish in this plane): {57 27 25}. But there is another way: The {27 25} constitutes an almost-locked set, as does the 57. We have:
|
| ||||||||||||||||||||||||||||||||||||||||
Sudoku Assistant currently does the following with almost-locked sets:
| |||||||||||||||||||||||||||||||||||||||||
Hypothesis and (Dis)Proof | |||||||||||||||||||||||||||||||||||||||||
|
The addition of weakly associated nodes to the mix of strong 3D chains allows for a method of solving Sudoku puzzles that transcends the idea of "forcing chains" or
"trial and error" to allow for the formation of hypotheses regarding possible candidate elimination and their direct proof.
Using 3D Medusa chains, we can follow the implications of eliminating or not eliminating a mark and see what comes of it. There are two strategies for this analysis, which I am calling proof or disproof.
The disproof strategy goes like this: Consider a certain chain, 3, that has two possible states, 3C and 3c.
An example of a pattern that is handled easily by hypothesis and proof is what has been referred to as an XYZ-wing. Both the * and Z here are weak corners between the two strong chains XZ and YZ. Let's hypothesize that the * is Z. Any Z shown, then, would be FALSE. But the X of XZ and the Y of YZ both must be TRUE, because they are parts of the strong chains associated with that * node. OK, so if they are TRUE then they each force the X and Y of XYZ to be FALSE. But then all of X, Y, and Z have to be FALSE. That leaves nothing at that position, and the hypothesis is disproved. Therefore, * is not Z. QED.
Note that in practice you doesn't really have to think about chains at all if you don't want to. Just follow the logic. One node TRUE means all those "related to it" are FALSE. From there, look for strong connections -- a cell with only two values in it, or a row, column, or block with only two of the TRUE node's value. Those are the strong chains that can be set TRUE. Go from there to find more FALSE nodes, and on and on. Periodically look to see if there is any row, column, block, or cell that satisfies any of those last three conditions. It's really no more than that. | |||||||||||||||||||||||||||||||||||||||||
Depth | |||||||||||||||||||||||||||||||||||||||||
Full trial-and-error analysis involves adding some element of depth.
The idea here is that if a hypothesis gets to a "dead end"
-- no more conclusions to make, but still no solution --
then all we really have to do is start the sequence over from that configuration.
We can look for singles, locked candidates, subsets, grids, chains, and even do more "subordinate"
hypothesizing. If this process is taken to the extreme, then you basically have a depth-first backtracking solver.
The Sudoku Assistant doesn't go that far. Instead, it takes a middle ground and considers two optional
added elements of depth:
| |||||||||||||||||||||||||||||||||||||||||
Beyond These Techniques | |||||||||||||||||||||||||||||||||||||||||
|
Obviously this can't solve all Sudoku puzzles. But so far no one has shown me a puzzle that can't be solved
using ++depth. Using the above techiniques together Sudoku Assistant has solved all of the puzzles I have thrown at it, including
all top95
and all "impossible" 520 puzzles.
This puzzle and also
what I consider the hardest known Sudoku both require ++depth.
--Bob Hanson | |||||||||||||||||||||||||||||||||||||||||