The Sudoku Assistant uses several techniques to solve a Sudoku puzzle: cross-hatch scanning, row/column range checking, subset elimination, grid analysis,and what I'm calling 3D Medusa analysis, including almost-locked set analysis. When all that fails, the Sudoku Assistant resorts to hypothesis and proof and a sort of depth. All of these techniques are based on identifying all the possible "candidates" for a cell (indicated by marks) and then eliminating them one by one until only one possibility remains in a given cell.

Cross-Hatch Scanning (looking for singles)

The rule of singles requires:
When a candidate k is possible in only a single cell of a row, column, or block, then that cell must be k.
This situation can arise for one of two reasons. A naked single arises when there is only one possible candidate for a cell; a hidden single arises when there is only one possible cell for a candidate. Despite the name, hidden singles are far easier to find than naked singles. You should always start a Sudoku by finding all the hidden singles. No marks required!

The dots in the cell in row 3, column 5, indicate that in that cell the numbers 4, 5, 6, and 8 are all possible. But in that top middle block only one cell can hold a 5. That's a hidden single.
This is the most basic technique. Since a number can only appear once in any given column or row and must appear exactly once in any given 3x3 block, the easiest place to start is to first check for cells that must hold a value because no other cell in a 3x3 block can hold that number. For example, in this case the number 5 is excluded from all but one cell in the top center 3x3 block. The 5 in this cell is called a "hidden single" because it can only be in this single location, and that fact is "hidden" by the presence of the other marks. This process, referred to as cross-hatching, is repeated for each row and each column. Cross-hatch scanning is generally all that is necessary for "easy" puzzles. Most people do this step without actually making any marks.

Row/Column Range Checking ("locked" candidates)

The locked candidate rule, form 1:
When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same row/column, then it is also not possible anywhere else in the same block.
The locked candidate rule, form 2:
When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same block, then it is also not possible anywhere else in the same row/column.
Once all the singles have been found, I usually start marking. Then the job is to eliminate marks until only one remains in a cell, so then we know that cell's value. In this next technique, we use the fact that once a number can be assigned to a given row or column of a specific block (even if its exact location is still unknown), no other block may have that same number in the same row or column.
In the example shown on the right, the location of the number 1 is already known in both the top row and the left-most column. In addition, the second row of the top left 3x3 block is already "filled." The only possible locations for a 1 in the top left 3x3 block, then, is the third row. But this means that, in the third row, the number 1 must appear in one of the crcled cells -- not only for this 3x3 block but at all. We can eliminate it as a possibility (remove its mark) from the other cells in this row. (Think about it -- only one 1 in the third row; it has to be in the third row of the first 3x3 block; so it has to be in the first 3x3 block; it has to be in those circled cells; it can't then be in any OTHER cells. Tricky, eh?)
The same idea eliminates the possibility of 9s in the center block's top row. (Because in this case the only possible place for a 9 in the left center block is the top row.) In this case, the 9 in the center 3x3 block is also determined using subset elimination. While there is a cell in the 6th row that could possibly be 4, 8, or 9, there is already a subset of exactly five cells in the 6th row that must contain the numbers 2, 3, 4, 5, and 8, thus forcing this cell to be a 9.

Range checking can be tricky to catch on to. Don't worry if you don't get it right away. Many puzzles can be solved without it. But if you do start seeing it, you will find it helpful.

Subset Elimination

As for locked cells, there are two subset rules. The naked subset rule states:
When n candidates are possible in a certain set of cells all in the same block, row, or column, and no other candidates are possible in those cells, then those n candidates are not possible elsewhere in that same block, row, or column.
The hidden subset rule states:
When n candidates are possible in a certain set of n cells all in the same block, row, or column, and those n candidates are not possible elsewhere in that same block, row, or column, then no other candidates are possible in those cells.
Thus if a "subset" of three cells can be identified for which the only possibilities are exactly three numbers, then even though we don't know which of those cells has which number in it, we still know that no other cells in the row, column, or 3x3 block containing that "subset" of cells can have any of those numbers.
In this example, since there are already exactly two locations in the top row that together must contain "3" and "7", the cell fourth from the left cannot be 7; It must be 6. Subset elimination along with cross-hatch scanning and block range checking can generally take care of "moderately difficult" puzzles. (This technique is also referred to as looking for naked or hidden pairs, triples, and quads.)

Grid Analysis

Other advanced techniques are required to solve more difficult Sudoku puzzles such as this one. These techniques are based on one of the following two rules:
When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of rows, then they are also not possible elsewhere in that same set of columns.
and its flipside,
When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of columns, then they are also not possible elsewhere in that same set of rows.
The grid technique involves picking a specifc number, such as 5, and checking for known allowed patterns of possibility such as X-wings and swordfish.

An X-wing pattern. Within the two highlighted columns, a 4 must be contained in each of the upper and the lower rows. The possibility of 4 in all five circled cells can be eliminated.

Two intertwined "swordfish" patterns. The possibility of 5 in the circled cell can be eliminated.


Note that grid analysis is essentially a form of 2-dimensional subset elimination, where now the subsets themselves are correlated over multiple rows or columns. So, for example, if you look at that blue pattern, there is a set of three columns (the third, the sixth, and the seventh) for which 5 can only be in rows 2, 5, and 7. Three columns...three rows. Just like "three numbers...three cells". In exact analogy to subset elimination, this means that for these three rows the five must be in one of these three columns. it's tricky! In fact, though, X-wings and swordfish are simply two simple varieties of a much broader category of beast.

A 5x5 circuit containing the candidate number "1" hiding in a simple Sudoku. There are five columns and five rows that make up the set. All possibilities for 1 in these five rows and columns that are not on the grid itself can be eliminated. Note that if we list the row numbers where candidate 1 appears, we get {379 469 14679 46 139 1(not shown)}. So in this case the two indicated marks can be eliminated either due to the 5x5 "hidden quintuplet" of 34679 or the "naked singlet" 1.

The demonstration of this equivalence of all such "circuits" is the way in which the Sudoku Assistant handles all circuit analysis without distinction:
  1. Look for regions of the board that have a given candidate's possibilities lying on a grid of n columns and n rows. Additional possibilities may exist outside of this set of gridlines in either rows or columns, but not both.
  2. Eliminate all possibilities of that number in any row or column defined by that grid but not including the grid points themselves.
If you look at the JavaScript code for the Sudoku Assistant, you will see that it uses the same function (analyzeX, just 16 lines) to find all possible X-Wings, Swordfish, hidden sets, and "naked" sets. The fact is, all of these are the same "beast" just seen from different perspectives. (More on that later!)

A 4x4 grid based on
the candidate number 4
The point is simply that finding X-wings and swordfish and the like is not a daunting task. One simply needs to discern the n by n grid containing the set of candidate squares. You don't have to actually discover the circuit. The principle at work here is as simple as can be: Once a number can be assigned to a given row or column (even if its exact location is still unknown), no other cells in those same rows and columns may also contain that particular number.
Since an n by n set of cells defines exactly where n numbers can go, the principle is the same regardless of the number n. So all we are doing in this sort of analysis is idenitifying a set of n rows and n columns that contain a given candidate's possibility at their points of intersection. All other possibilities, all those not at these grid points, can be eliminated safely.

Generalized Method Rule

If all these rules sound pretty much the same, it's because they are all just permutations of the same idea in different dimensions. Using "X" and "Y" here for some number of rows, columns, cells, or blocks, then we have:
If a candidate k is possible in the intersection of X and Y but not possible elsewhere in X, then it is also not possible elsewhere in Y.

3D Medusa Analysis

If a grid of cells can be made for which the value in one cell directly forces the value in another cell, we can sometimes follow that grid to see if it leads to a logical inconsistency. This can be done even though the exact contents of a cell are unknown. Some algorithms look for "strong cycles" with "weak edges," but there is another way to think about this. What is significant is that in a "binary grid" every other cell along the grid either has one value or another -- only two possible values. (The grid is made by specifically selecting only cells that have exactly two possibilities.) If any two such cells are in a single row, column, or 3x3 block, it is as if they were the ONLY possibilities for their cells.

a complicated X cycle
Thus, if we can say, because of a connecting chain of grid points, that two cells "either must both by 5 or neither is 5" then if they are in the same block, row, or column, neither can be 5. In the case on the left, we have a grid with two endpoints in the same 3x3 block. They are opposite "parity" -- we don't know which value is where, but regardless of that, whatever one is, the other is not at these two points. We can eliminate the circled possibility. That particular cell cannot be a 5, because it is in a block with two other cells, one of which has to be a 5. Since cells with only two possible solutions are relatively easy to spot, this technique lends itself reasonably well to human solving of Sudoku puzzles.
A related condition is that if we know that two cells of a strong chain are positioned such that both must be X or both must not be X, and together they exclude X from a block, then neither can be X. On the left, for example, the two red Xs are linked because they are part of the same strong chain. But if they are both TRUE, then all the Xs in the top right block are disallowed. There must be some X in that box, so their being TRUE leads to a contradiction; the cell labeled XB must be X. This technique has been referred to as a strong block hinge. Medusa variants also target cells in the same row or column, or target values in the same cell.
XA            
                  
                  
X...X...      
            X...
            X...
                  
XB            
      X...      
                  
            XC
      X...      
A strong block hinge.
Cells XA and XC cannot be X,
and the cell labeled "XB" must be X.
The chain connecting XA and XC need not be strong. If one of the ends is a weak node, then the result is that we have a weak hinge, and we can eliminate that weak node. (We cannot eliminate the X of XA, because a TRUE there does not transmit a TRUE to the X of XC.)

The Sudoku Assistant will find these sorts of hinges if the "hinge" box is checked.
XA            
      X...X...
                  
                  
            X...
      X...      
                  
XB            
      X...      
                  
X...      XC
      X...      
A weak hinge targeting row 2.
The cell labeled "XC" must NOT be X.
In fact, we can generalize the idea of strong binary grids to three dimensions. If we allow the grid to not just involve a specific candidate number, a 3D grid is generated. The Sudoku Assistant uses the Jmol applet to help visualize this. (The different colors here represent different candidate numbers, with red, at the top being 1 and white, at the bottom, being 9.)

A full 3D Medusa analysis involves extending these idea to weakly linked chains (encompassing X cycle Type 2 as well as Y cycles) and hypothesis and proof. The idea here is that one may have a sequence of three "nodes" (row, column, and candidate value, in 3D) A --- b --- C, for which A and C are members of strong chains (their cells have only two possible values). We call this a "weak link" or "weak edge" if A, b, and C all are in the same row, column, or block -- or, in a 3D sense, the same cell. In that case, the "strongness" of one chain can be transferred to the other chain, provided the right condition is applied. (In other words, we can assert "if A is 5...then C is NOT 5", but we cannot assert that "if A is NOT 5, then C is 5.") The Sudoku Assistant can follow these weak links to any number of associated chains in order to eliminate possibilities.

The classic example of weakly linked chains in the form of a Y-cycle is an XY-wing, shown below.

*            
                  
XZ      X...
YZ            
Y...            
XY            


X... and Y... here are weak links. The first links the strong chains XZ and XY; the second links the strong chains XY and YZ. It doesn't really matter if those links are there or not. (If they are not there, then we just have one strong chain.) The effect is the same: "NOT Z" in the lower left cell gets transmitted to the top right cell through the chain:

 Z--X--(weak X)--X--Y--(weak Y)--Y--Z
 -  +            -  +            -  +
 ------direction of transmission---->


as "Z" at the other end. Or, if the other Z is false, then we have:

 Z--X--(weak X)--X--Y--(weak Y)--Y--Z
 +  -            +  -            +  -
 <-----direction of transmission-----


So one or the other of the Zs on the ends must be TRUE, and the * cannot be Z.

The term I use for * in the above situation is weak corner. Here we have again A --- b --- C, for which A and C are members of strong chains. To be a weak node, A and b must be in the same row, column, block, or cell, and the same is true for for b and C. But if these aren't the same row, column, block, or cell, then we don't have a weak link. Instead, we have a weak corner. The "strongness" of one chain cannot be transferred to the other chain via a weak corner, With the proper connecting of strong chains by weak links, one can infer valuable information about such a node. In particular, the elimination of the corner node, b, can be forced by the right sort of chain connection.

Almost-Locked Sets

All of this business about strong chains turns out to be a subset of a broader sort of analysis introduced recently as almost-locked set analysis. Consider a simple cell with only two possible values, say 1 and 2. Write this {12}. If something forces this cell NOT to be 1, then we know it is 2; if it is forced NOT to be 2, then we know it is 1. Now consider two cells with three overall possible candidates. Perhaps: {12 23}. We can say exactly the same about this pair of cells as we said about the single {12} cell. If something force this pair of cells NOT to be 1, then we know it is 2 and 3; if something forces it to NOT be 2, then we know it is 1 and 3; and if something forces it NOT to be a 3, then we know it is 1 and 2. Unlike the single cell, we won't know exactly where the two remaining "locked" candidates will be, but we know where they will not be -- anywhere else in that row, column, and/or block that contains these two cells. This idea can be generalized to any N cells containing N+1 candidates.
For example, consider the puzzle on the right. The four cells outlined in green constitute a 4-cell almost-locked set "A" = {37 19 178 789}. Here we have four cells with five candidates: 1, 3, 7, 8 and 9. Similarly, the two cells highlighted in blue consitute another almost-locked set, "B" = {237 23}. Two cells; three different possibilities. We might write this as:

    8 1           2   
     \|          /
      A--3...3--B
     /|         |
    9 7.........7

Here's the cool thing: Figuratively, we could write this set of cells like we might a couple of weakly linked chains: The two weak links come from the fact that if Set A has a 3 (somewhere) then Set B does NOT have a 3, and must instead be 2 and 7. The same goes for the 7 -- only one of A and B, not both, can have the 7. Any other threes that are weakly linked to BOTH A and B can be eliminated. Same goes for sevens. The result is exactly the same as we would write for a naked 3,7 pair!

So the possibility of 7 in the two cells r3c8 and r3c9 can be eliminated. Note that the 3 in r3c8 cannot be eliminated, because there is a 3 in r6c7, and r6c7 is not "seen" by r3c8. In order to be eliminated, the possibility must be in a cell that is weakly linked to ALL occurances of that value in BOTH almost-locked sets.

But it's better than that. If B has the 7, Set A cannot have it -- Set A must be 1, 3, 8, and 9. Likewise, if Set A has the 7, then Set B does not. It must have 2 and 3. And then A has 1, 3, 8, and 9. Ah, but these are the only two possibilities! Either the 7 is in Set A or Set B. In either case, A has a 1, 8, and 9; and B has a 2. then any other 8 or 9 weakly associated with A, and any other 2 weakly associated with B can be eliminated! This includes any 8s in the third block (in the cells highlighted in red) as well as the 8 in row 2, column 6. Neat, huh?
There are lots of things that can be done with almost-locked sets. Basically, they can plug right into sets of chains, because they can transmit a weak link. That's because if one of the candidates in the set is forced FALSE, then all the other candidates are forced TRUE. On the right, if the 3* is TRUE, then the weakly linked 2 and 7 must be FALSE. And if either the 2 or the 7 is TRUE, then the 3 is FALSE. That's a weak link.
Consider that "complicated X-cycle" on the right again. If we make a list of the rows that the number 5 is "possible" in, we get:

{569 57 19 27 25 16}

Now, one way to see that the 5 in "569" can be eliminated is that we have here a "naked triple subset" (also called a swordfish in this plane): {57 27 25}. But there is another way: The {27 25} constitutes an almost-locked set, as does the 57. We have:

      2               
      |           
 5*...A--5...5--B  
      |         |
      7.........7

The 5* can be eliminated!
Sudoku Assistant currently does the following with almost-locked sets:
  1. Finds all possible almost-locked sets.
  2. Checks to see if there are any mutually-linked pairs of almost-locked sets. If so, it checks for weak links and eliminates them.
    
        8 1           2...2*   
         \|          /
          A--3...3--B
         /|         |
        9 7.........7...7*
    
    
  3. Looks for nodes that are weak links to two already-linked almost-locked sets and eliminates them. (7* in this case).
    
             2...2---B  
            /        |
        3--A         |
            \        |
             7..7*...7   
                           
    
    
  4. Looks for strong chain nodes that are themselves weakly associated with an almost-locked chain. If a chain or set of chains is connected in an impossible way, such that it would force two of the N candidates of the almost-locked set to be FALSE, that chain can be eliminated.
    
                  2...2*--2   
                 /         7
        3*...3--A          .
                 \         .
                  7...7*---7-   
                           
    
    
    Due to the chain, the 2* and 7* have to be such that either both are TRUE or both are FALSE. But they can't both be TRUE, because that would not leave enough candidates for almost-locked set A. They can be eliminated.

Hypothesis and (Dis)Proof

The addition of weakly associated nodes to the mix of strong 3D chains allows for a method of solving Sudoku puzzles that transcends the idea of "forcing chains" or "trial and error" to allow for the formation of hypotheses regarding possible candidate elimination and their direct proof. Using 3D Medusa chains, we can follow the implications of eliminating or not eliminating a mark and see what comes of it. There are two strategies for this analysis, which I am calling proof or disproof.

The disproof strategy goes like this:

Consider a certain chain, 3, that has two possible states, 3C and 3c.
  1. Hypothesize that a weakly associated node, say one of the 3c-associated nodes, is TRUE -- that it cannot be eliminated.

  2. Mark all Chain 3c nodes with the lowercase letter "a" meaning FALSE -- "If the hypothesis is true, then this node must be FALSE.

  3. Mark all Chain 3C nodes with an uppercase "A" meaning "if the hypothesis is TRUE, then these nodes must also be TRUE (not eliminated)." This works because Chain 3 is a strong chain.

  4. Mark all associated weak nodes of 3C with a capital letter "B" meaning "if the hypothesis is TRUE, then these nodes will be false (that is, they can be eliminated)."

  5. Recursively add any chains directly involving any of these associated weak nodes with alternating capital letters/lowercase letters.

  6. Check to see if we have either of the following two forcing conditions within any subset (all of a specific candidate in a row, column, or block, or all the candidates of a given cell):

    1. All members are FALSE. Then, since one HAS to be true, the hypothesis is disproved.

    2. Two members are TRUE. Then, since one MUST be false, the hypothesis is disproved.


    If either of these is the case, we are done. The hypothesis is disproved and the node can be eliminated.

The proof strategy is essentially the same, but the objective is to prove a hypothesis rather than disprove it, starting with the proposition that a certain mark can be eliminated, and seeing what would have to be true to have that happen. There used to be a section here about this strategy, but I've removed it, because it adds nothing new and isn't very easy for me (at least) to use.

An example of a pattern that is handled easily by hypothesis and proof is what has been referred to as an XYZ-wing. Both the * and Z here are weak corners between the two strong chains XZ and YZ. Let's hypothesize that the * is Z. Any Z shown, then, would be FALSE. But the X of XZ and the Y of YZ both must be TRUE, because they are parts of the strong chains associated with that * node. OK, so if they are TRUE then they each force the X and Y of XYZ to be FALSE. But then all of X, Y, and Z have to be FALSE. That leaves nothing at that position, and the hypothesis is disproved. Therefore, * is not Z. QED.

*      XYZ
                  
XZ            
YZ            
                  
                  

Note that in practice you doesn't really have to think about chains at all if you don't want to. Just follow the logic. One node TRUE means all those "related to it" are FALSE. From there, look for strong connections -- a cell with only two values in it, or a row, column, or block with only two of the TRUE node's value. Those are the strong chains that can be set TRUE. Go from there to find more FALSE nodes, and on and on. Periodically look to see if there is any row, column, block, or cell that satisfies any of those last three conditions. It's really no more than that.

Depth

Full trial-and-error analysis involves adding some element of depth. The idea here is that if a hypothesis gets to a "dead end" -- no more conclusions to make, but still no solution -- then all we really have to do is start the sequence over from that configuration. We can look for singles, locked candidates, subsets, grids, chains, and even do more "subordinate" hypothesizing. If this process is taken to the extreme, then you basically have a depth-first backtracking solver. The Sudoku Assistant doesn't go that far. Instead, it takes a middle ground and considers two optional added elements of depth:

  1. +depth just adding analysis involving single cells, rows, columns, and blocks (singles and locked candidates), and

  2. ++depth also adding analysis involving multiple cells, rows, and columns (n-tuples, grids). Note that using ++depth amounts to more than just one level of what people usually refer to as "depth" because these methods can't be replaced by simple "If X then Y" type logic. They require "If X and Y then Z" logic, and that is the logic of depth. ("OK, let's hypothesize that X is true, and then let's also hypothesize that Y is true, then....")

Further discussion of dead ends and depth can be found here.

Beyond These Techniques

Obviously this can't solve all Sudoku puzzles. But so far no one has shown me a puzzle that can't be solved using ++depth. Using the above techiniques together Sudoku Assistant has solved all of the puzzles I have thrown at it, including all top95 and all "impossible" 520 puzzles. This puzzle and also what I consider the hardest known Sudoku both require ++depth.

--Bob Hanson

Disclaimer