The Sudoku Assistant uses several techniques to solve a Sudoku puzzle: cross-hatch scanning, row/column range checking, subset elimination, grid analysis,and what I'm calling 3D Medusa analysis, including bent naked subsets, almost-locked set analysis. Almost-locked set analysis can be extended to grids, where it forms the basis for all finned fish and sashimi, and also to what I am calling almost-locked ranges. When all that fails, the Sudoku Assistant resorts to hypothesis and proof and a sort of depth. All of these techniques are based on identifying all the possible "candidates" for a cell (indicated by marks) and then eliminating them one by one until only one possibility remains in a given cell.

Cross-Hatch Scanning (looking for singles)

The rule of singles requires:
When a candidate k is possible in only a single cell of a row, column, or block, then that cell must be k.
This situation can arise for one of two reasons. A naked single arises when there is only one possible candidate for a cell; a hidden single arises when there is only one possible cell for a candidate. Despite the name, hidden singles are far easier to find than naked singles. You should always start a Sudoku by finding all the hidden singles. No marks required!

The dots in the cell in row 3, column 5, indicate that in that cell the numbers 4, 5, 6, and 8 are all possible. But in that top middle block only one cell can hold a 5. That's a hidden single.
This is the most basic technique. Since a number can only appear once in any given column or row and must appear exactly once in any given 3x3 block, the easiest place to start is to first check for cells that must hold a value because no other cell in a 3x3 block can hold that number. For example, in this case the number 5 is excluded from all but one cell in the top center 3x3 block. The 5 in this cell is called a "hidden single" because it can only be in this single location, and that fact is "hidden" by the presence of the other marks. This process, referred to as cross-hatching, is repeated for each row and each column. Cross-hatch scanning is generally all that is necessary for "easy" puzzles. Most people do this step without actually making any marks.

Generalized Method Rule

First of all, if the rules discussed below sound pretty much the same, it's because they are all just permutations of the same idea in different dimensions. Using "A" and "B" here for some number of rows, columns, cells, or blocks, then we have:
If a candidate k is possible in the intersection of A and B but not possible elsewhere in A, then it is also not possible elsewhere in B.

This idea is more fully discussed mathematically in The 12 Rules of Sudoku.

Row/Column Range Checking ("locked" candidates) examples

The locked candidate rule, form 1:
When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same row/column, then it is also not possible anywhere else in the same block.
The locked candidate rule, form 2:
When a candidate is possible in a certain block and row/column, and it is not possible anywhere else in the same block, then it is also not possible anywhere else in the same row/column.
Once all the singles have been found, I usually start marking. Then the job is to eliminate marks until only one remains in a cell, so then we know that cell's value. In this next technique, we use the fact that once a number can be assigned to a given row or column of a specific block (even if its exact location is still unknown), no other block may have that same number in the same row or column.
In the example shown on the right, the location of the number 1 is already known in both the top row and the left-most column. In addition, the second row of the top left 3x3 block is already filled. The only possible locations for a 1 in the top left 3x3 block, then, is the third row. But this means that, in the third row, the number 1 must appear in one of the red-circled cells -- not only for this 3x3 block but at all. We can eliminate it as a possibility (remove its mark) from the other cells in this row. Those values then lock that row and demand that in the top right block the 1 must be in one of the two positions indicated in blue.
The same idea eliminates the possibility of 9s in the center block's top row, because in this case the only possible place for a 9 in the left center block is the top row.

There are several more locked candidates on this board. They involve 1s, 4s, and 7s. Can you find them?

By the way, can you find the hidden single 5? It's here.

Subset Elimination examples

As for locked cells, there are two subset rules. The naked subset rule states:
When n candidates are possible in a certain set of n cells all in the same block, row, or column, and no other candidates are possible in those cells, then those n candidates are not possible elsewhere in that same block, row, or column.
The hidden subset rule states:
When n candidates are possible in a certain set of n cells all in the same block, row, or column, and those n candidates are not possible elsewhere in that same block, row, or column, then no other candidates are possible in those cells.
Thus if a "subset" of three cells can be identified for which the only possibilities are exactly three numbers, then even though we don't know which of those cells has which number in it, we still know that no other cells in the row, column, or 3x3 block containing that "subset" of cells can have any of those numbers.
In this example, since there are already exactly two locations in the top row that together must contain "3" and "7", the cell fourth from the left cannot be 7; It must be 6. Subset elimination along with cross-hatch scanning and block range checking can generally take care of "moderately difficult" puzzles. (This technique is also referred to as looking for naked or hidden pairs, triples, and quads.)

Grid Analysis examples

Other advanced techniques are required to solve more difficult Sudoku puzzles such as this one. These techniques are based on one of the following two rules:
When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of rows, then they are also not possible elsewhere in that same set of columns.
and its flipside,
When a candidate is possible in a certain set of cells that form the intersection of a set of n rows and n columns, but are not possible elsewhere in that same set of columns, then they are also not possible elsewhere in that same set of rows.
The grid technique involves picking a specifc number, such as 5, and checking for known allowed patterns of possibility such as X-wings and swordfish.

An X-wing pattern. Within the two highlighted columns, a 4 must be contained in each of the upper and the lower rows. The possibility of 4 in all five circled cells can be eliminated.

Two intertwined "swordfish" patterns. The possibility of 5 in the circled cell can be eliminated.


Note that grid analysis is essentially a form of 2-dimensional subset elimination, where now the subsets themselves are correlated over multiple rows or columns. So, for example, if you look at that blue pattern, there is a set of three columns (the third, the sixth, and the seventh) for which 5 can only be in rows 2, 5, and 7. Three columns...three rows. Just like "three numbers...three cells". In exact analogy to subset elimination, this means that for these three rows the five must be in one of these three columns. it's tricky! In fact, though, X-wings and swordfish are simply two simple varieties of a much broader category of beast.

A 5x5 circuit containing the candidate number "1" hiding in a simple Sudoku. There are five columns and five rows that make up the set. All possibilities for 1 in these five rows and columns that are not on the grid itself can be eliminated. Note that if we list the row numbers where candidate 1 appears, we get {379 469 14679 46 139 1(not shown)}. So in this case the two indicated marks can be eliminated either due to the 5x5 "hidden quintuplet" of 34679 or the "naked singlet" 1.

The demonstration of this equivalence of all such "circuits" is the way in which the Sudoku Assistant handles all circuit analysis without distinction:
  1. Look for regions of the board that have a given candidate's possibilities lying on a grid of n columns and n rows. Additional possibilities may exist outside of this set of gridlines in either rows or columns, but not both.
  2. Eliminate all possibilities of that number in any row or column defined by that grid but not including the grid points themselves.
If you look at the JavaScript code for the Sudoku Assistant, you will see that it uses the same function (analyzeX, just 16 lines) to find all possible X-Wings, Swordfish, hidden sets, and "naked" sets. The fact is, all of these are the same "beast" just seen from different perspectives. (More on that later!)

A 4x4 grid based on
the candidate number 4
The point is simply that finding X-wings and swordfish and the like is not a daunting task. One simply needs to discern the n by n grid containing the set of candidate squares. You don't have to actually discover the circuit. The principle at work here is as simple as can be: Once a number can be assigned to a given row or column (even if its exact location is still unknown), no other cells in those same rows and columns may also contain that particular number.
Since an n by n set of cells defines exactly where n numbers can go, the principle is the same regardless of the number n. So all we are doing in this sort of analysis is idenitifying a set of n rows and n columns that contain a given candidate's possibility at their points of intersection. All other possibilities, all those not at these grid points, can be eliminated safely.

Bent Naked Subsets examples

Consider a simple naked triple:

                    
                    
12 *       
                    
                    
13 23       
                    
                    
                    


What prevents the cell with * being a 2? Well, if a 2 were there, then we would be eliminating both possibilities of 2 in that same row and in particular in our naked triple, right? Those cells, in turn, would have to be 1 and 3, and that would remove the only possibilities for the cell having only 13 in it. This is how all naked sets work. You have N cells having among them N possible candidates. If an option removes one of those possibilities from the set, then there aren't enough candidates to go around, and that option can be eliminated.

Now consider what is referred to as an XY-Wing:

*            
                  
12            
23            
                  
13            
                    
                    
                    


Notice that the three cells constituting the XY-Wing form a naked triple. It's just that this triple is "bent" -- the three cells are not in the same row, column, or block. What are the implications of that? First, unlike a standard naked triple, it's possible that both the 12 and the 23 cells are 2. So that's no help. But here's the interesting thing: It is still not possible for NEITHER of them to be 2. Because they still have exactly the same relationship to the 13 cell. If neither is 2, then they are 1 and 3, and then the 13 cell is left with no possibility. The cell marked with * cannot be 2. It's as simple as that. XY-Wings are just bent naked triples.

You know how easy it is to find naked triples. Well, it's just about that easy to find XY-Wings. Just look for bent naked triples. Consider, for example, the board on the right. The 2 in row 8, column 1 can be excluded by the "bent triple" in rows 7 and 8.

Notice that of 1, 2, and 3, it is the 2 that can be excluded because (a) of 1, 2, and 3, only the 2 is common to both ends of the triple, and (b) the cell we are striking the 2 from "sees" all of the cells of our bent naked triple that contain 2.


But beware! If the middle cell were 123 instead of 13, this strategy would not work, because the 2 we struck would not be in a position to see all the cells of the subset containing 2, and if one of the ends were 123 instead of just 12 or 13, then the strategy would not work because two values would be common in the two ends, not just one. (And it would not be an XY-Wing.)

The idea of a "bent" naked subset is that the subset involves two domains -- a row and a column, a row and a block, a column and a block, or even two rows or two colunns. The key is the cells that are NOT in the intersection of those two domains. If there is a value k possible in both of these domains outside that intersection, then we have the simple rule:
If a bent naked subset contains one and only one candidate k that is present in both of its nonintersection subdomains, k can be eliminated as a candidate in any cell that sees all the possibilities for k in the subset.
An example: This one is a block and column involving 1, 4, and 5. Again, 4 can be excluded. It's what is referred to as an XYZ-Wing. If we choose 4 in the {14} cell, obviously the 4 in the {1245} cell in the 6th row of its column can be excluded. And if we choose "not 4" in that {14} cell, that is, if we choose 1, then that excludes the 1 in the {145} cell below it, the {45} {145} in that block becomes {45} {45}, and the 4 in the {1245} cell is still excluded. So no matter what the case in the {14} cell, the 4 in {1245} can be excluded. This is what happens with all naked subsets -- bent or not.


Why exactly does this work? Let's use our general intersection idea to prove that when there is just one common value in the nonintersecting region of the subset, that candidate must not be eliminated. The proof goes something like this:
Say you have a bent naked subset that involves exactly one common candidate k, as shown on the right. We don't care what's in the intersection. (This is what makes bent naked subsets so handy and makes them fundamentally different from almost-locked sets.) What we know is that there are N candidates for N cells. Now remove the common candidate k from all locations in A and B. We now have N-1 candidates in N cells. That would be OK if we could duplicate one of the other values, but we can't do that, because a, b, c, ... are all within A, so they can't be duplicated, and d, e, f, ..., are all in B, so they can't be duplicated, either. So none of the remaining candidate numbers can be duplicated, and it's not possible to eliminate the common candidate k. Any candidate k elsewhere that would do that may be eliminated.


One final example that illustrates another rule of bent naked subsets. Consider the configuration on the right. The five cells in green and blue constitute a bent naked set consisting of the five numbers {15789}. Interestingly, there is no common candidate outside the intersection region -- {589} for the block and {17} for the row. In this case the indicated 9 can be eliminated because it sees all the cells of the subset containing 9. (Check it out for yourself that this is true. When that cell is 9, then the top row of that block must be 7 and 1, and that rules out any value for the {17} cell.)


This points to another rule for bent naked subsets:
When there is no common value k in the two nonintersecting regions of a bent naked subset, the subset behaves as a standard naked subset. That is, candidate k can be eliminated from any cell that can "see" all cells of the subset containing k.
Let's use our general intersection idea again to prove this rule. Say you have a bent naked subset that involves no common candidates, as shown on the right. Again, we don't care what's in the intersection. What we know is that there are N candidates for N cells. Remove any one of the candidates -- say, "a". We now have N-1 candidates in N cells. Again, that would be OK if we could duplicate one of the other values, but we can't do that, for exactly the same reason as the previous proof. Candidates b, c, ..., are all within A, so they cannot be duplicated, and candidates d, e, f, ..., are all in B, so they cannot be duplicated. So none of the remaining candidates can be duplicated, and there is no way to fill the N cells. We cannot eliminate any of the candidates in the bent naked subset. Any possibility that would do that may be eliminated.


I suggest that bent naked subsets should be pretty easy to see in a puzzle. The Sudoku Assistant can find some, but not all, bent naked subsets.

Almost-Locked Sets examples

Closely related to bent-naked subsets is an idea that has been referred to as almost-locked sets. An "almost-locked" set is one that consists of n cells containing n+1 values. Removal of any one value from this set will "lock" the set as a naked subset. It turns out that a certain combination of almost-locked sets constitute one type of bent naked subsets, so there is some overlap in this analysis. Let's look again at that last example of a bent naked subset.
The cell outlined in green constitutes a single-cell almost-locked set "A" = {17}. Here we have one cell with two candidates. Similarly, the four cells highlighted in blue consitute another almost-locked set, "B" = {789 1789 59 58}, involving four cells and five candidates: 1, 5, 7, 8, and 9. Together they form the bent naked subset {17 789 178 59 58}.


We could represent the situation as shown below:

              5 8   
              |/
    A--7...7--B
    |         |\
    1.........1 9


Almost-locked set A involves two candidates {17}; almost-locked set B involves five candidates {15789}. Candidates 1 and 7 are special, because if one set has one of them, the other set doesn't and must have the other. So, for example, if we remove the 7 from both of these sets, we run into a problem, because then both of them need the 1. But only one of them can have the 1, because all the 1s are in Row 4. Same for the 7. So we can eliminate any 7 or 1 that "sees" all of the 7s and 1s in our pair of almost-locked sets. But it's better than that. We can also eliminate all the 5s, 8s, or 9s that see all their kind in the pair as well. That's because eliminating any of these candidates forces one of the sets A or B to require both the 7 and the 1, leaving the other without enough candidates. We can eliminated all the candidates marked in red. So we have a new rule:
If two almost-locked sets are mutually doubly linked, any candidate k elsewhere that "sees" all the candidates k in the two sets can be eliminated.
Similarly, in that next-to-last example of a bent naked subset, where we have the bent naked subset {14} {145} {45}, you can divy up the two almost-locked sets two different ways. If we use A={14 145} and B={45}, then we can write:

                    
              14
                    
              145
45              
              4...


Here we have

         5...5  
        /     \   
       B       A--1   
        \     /   
         4   4       
                       

Notice that these sets are not doubly linked. However, they have the common value 4. Removing 4 from both of these sets would cause both to require 5, but they can't both require 5, so we can eliminate 4 from any cell that can see all three of these cells. Same with the 1. Specifically, in this case we get to eliminate 4 from the cell colored in red. So we have a the rule:
If two almost-locked sets are linked by candidate i and have a different common candidate k, then any candidate k elsewhere that "sees" all the candidates k in the two sets can be eliminated.
This turns out to be a far more common situation than the doubly-linked pair business described above.

There are lots of things that can be done with almost-locked sets. Basically, they can plug right into sets of chains, because they can transmit a weak link. That's because if one of the candidates in the set is forced FALSE, then all the other candidates are forced TRUE (and the set is locked). That's the essence of a weak link. One problem, though, is that they can be plentiful. It's not uncommon for a Sudoku board to have 100 or more almost-locked sets. So finding the one or two that produce an elimination can be difficult and time-consuming. Still, you might just find one.

Grid-Based Almost-Locked Sets (Including Sashimi) examples

A little-heralded characteristic of almost-locked sets is that they also apply to single-value grids. The key here is that one value can knock out two possibilities of an almost-locked set. You can't do this with value-based almost-locked sets -- a 7 somewhere knocks out a 7, not a 7 and an 8 -- but that is possible for a grid-based almost-locked set. Let's see how that works.

Consider the set of 8s on the right. What now? Remember that a grid of 8s is really the same as a subset. (An X-wing is really a naked pair; a swordfish is a naked triple, etc.) All we need to do is consider the columns as a set and see what rows the 8s are in. In this case we have: { 146 126 8 237 16 1467 1235 25 9 }. Take a good look there. The {126 16} form an almost-locked set. Those 8s are circled in blue. The 8 in r1c1 is doubly linked to this set -- it would remove both the 1 and the 2 and reduce the set to {6 6} ("two 8s in row 6"), an impossibility. r1c1#8 can be removed. Note how EASY it is to see the almost locked set -- it's just an X-Wing with an extra cell.


Here's another. In this case we have {3 5 79 12489 24789 278 12789 279}. The key is the almost-locked set {79 278 279}. The 8 in Row 7 Column 5 is doubly linked to this set. Assigning 8 there removes all possibilities for 8 in Rows 7 and 8. But we can't reduce the number of possibilities in an almost-locked set by two. So r7c5#8 can be eliminated. Same goes for r9c5#8 and r9c4#8. The Sudoku Assistant analyzes this board in terms of almost-locked sets and classifies it as a Sashimi.


Many methods have been discovered that go by other special names but really just are looking for this sort of grid-based almost-locked sets. See, for example, A1s. It turns out, for instance, that the technique referred to as sashimi is easily described in terms of almost-locked sets. In the example on the right the green-highlighted cells are a column-based almost-locked set for 6s involving the following three sets of four rows: {16 17 367}. The red 6 is a weak link to this almost-locked set because it is in Row 6, which would reduce this set -- lock it -- as {1 17 37}. Normally that is no problem. But in the case of a Sashimi, it would have the additional consequence that Column 7 would now be just {3}. But the {1 17 37} lock disallows any other 6 in Row 3. So that 6 can be eliminated. Nothing more than that.
Four additional examples of Sashimi analyzed in terms of almost-locked sets are shown below.

impossible520 Number 273 Step 10
r4c3#4 is eliminated by an almost-locked set in 4s in Columns 6 and 9 which, when reduced, conspires to disallow all 4s in column 1.

impossible520 Puzzle 202 Step 22
r5c5 cannot be 4 because of the almost-locked set of 4s in Columns 7 and 9: {58 38}. If r5c5 were 4, then this set would be reduced to {8 38}, and all of the 4s in Column 4 would be excluded.

impossible520 Puzzle 121 Step 9
r2c3 cannot be 4. If it were, then almost-locked set {25 59} of 4s in Columns 4 and 7 would reduce to {5 59}, and all the 4s in Column 1 would be excluded.

top1465 Puzzle 89 Step 31
r1c1#6 is eliminated by a simple row-based almost-locked set in 6s in Row 9: {14}. If r1c1 were 6, then the set would be locked as {4}. But the 6s in Row 2 would be reduced to rows {4} as well, allowing no 6s in Column 4.


It's interesting that the Sudoku Assistant originally only found column-based Sashimi. This was not a bug. It was because Sudoku Assistant checked column-based subsets first and did not consider that large column-based Sashimi can be alternatively represented as smaller row-based Sashimi. That is, the same sort of row/column relationship with grids (that an n x n grid in rows is always also an m x m grid in columns) has an analogy with Sashimi. (If you want to play with this, add the ALSLARGE option when you list the A1s examples, looking specifically for row-type Sashimi. Then Sudoku Assistant will find large Sashimi as well as small.) Thus we have the following additional rule:
All Sashimi in rows(columns) have complementary Sashimi in columns(rows). These alternative eliminations always involve the complementary set of columns and rows of the original Sashimi and involve the same elimination.
Here is an example:


top1465 Puzzle 89 Step 31 (row-based Sashimi)

top1465 Puzzle 89 Step 31 (column-based Sashimi)

Almost-Locked Ranges

Subsets are not the only grouping that can be almost locked. Consider, for example, the board on the right. Here we have a simple X-type strong chain involving 6s. The red-highlighted 6 is a weak link to the yellow polarity of the chain. What's interesting here is the bottom right-hand block. Notice that the 6s are positioned in a row/column intersection. I'm going to refer to this as an almost-locked range. The idea is similar to an almost-locked set (and Sudoku Assistant treats it as such, and it is actually very close to the way I solve Sudoku puzzles by hand). The red-highlighted 6 locks the 6s in the lower right-hand block to the first row, which forces the opposite polarity in the chain. And any time a candidate forces both polarities of a strong chain we have a problem. r6c9 cannot be 6.


I got interested in this when I read about "Franken" Swordfish. I don't pretend to understand the analysis presented there, with all the Xs, defining and secondary sets, constraints, and such. To me it's a simple case of a conjugate pair working with two almost-locked ranges. Really what is up in the top right box is totally immaterial. Starting with the red- and blue-highlighted candidate options, you can follow the eliminations in red and blue lines. Where the two overlap, that's where you can eliminate any similar candidate. It works because of the pairing of the two almost-locked ranges. It's really a very clever little idea, one of hundreds, I'm sure, that could involve almost-locked ranges. The key is not where the candidates are, it's where they are not.

Basically there are 15 varieties of almost-locked ranges. They are shown on the right. In each case, at least one candidate must be in one of the red squares, and at least one must be in one of the blue squares. The presence of a candidate in the yellow squares is optional. (The Sudoku Assistant ignores them unless there are at least three candidates in the block; otherwise, with just two, they reduce to a simple strong "conjugate pair".) What's important is that there is no candidate in any of the gray cells.


In effect what these almost-locked ranges do is to shift the direction of a weak link. The upper nine types turn the direction of a weak link by 90 degrees, switching the link from a row-based link to a column-based one or vice-versa; the bottom six amplify the signal along a parallel row or column. When coupled with a strong chain, almost-locked ranges can have the interesting consequence noticed by the discoverer of the Franken Fish.

I don't know that it makes a whole lot of sense to name combinations of almost-locked ranges and conjugate pairs and such. These things are everywhere; when I solve Sudoku puzzles, I simply follow them and see where they lead. In combination with strong chains, they can easily lead to eliminations.

By the way, that yellow-highlighted cell is what makes almost-locked ranges a different sort of idea from almost-locked sets. Unlike almost-locked sets, here we have a cell where, if a candidate is present, both options disappear. That yellow cell is in effect a weak link to both the red and blue subsets, generating a FALSE in all four directions. This is really quite different from almost-locked sets, where N-1 candidates must ultimately be present somewhere within the set.

Four simple examples will suffice to illustrate almost-locked ranges. Browse the examples for over 100 more.


top1465 #250
Here's a simple one. The 1 in r1c3 is a weak link to both the {15 15} pair and the almost-locked range in block 2, which redirects the link back to the other polarity of the {15 15} pair. r1c3#1 can be eliminated.

top95 #20
Again, the redirection of the weak link by an almost-locked range causes a candidate to force both parities of a conjugate pair. r3c3#5 can be eliminated.

impossible520 #228
A column-type almost-locked range duplicates the weak link from the 9 in r3c9 in Column 8. But that then forces both parities of the strong XY chain involving 7 and 9 -- an impossibility.

impossible520 #517
Here's a great elimination -- a full 9 candidatates in this rather complex strong chain are eliminated all at once because the 8 in r8c2 forces "not 8" in r6c7, but those two 8s are of the same polarity. That's not allowed.


The possibilities are endless. It's safe to say that almost-locked sets are everywhere, and they can be powerful in generating eliminations. The table below summarizes the Sudoku Assistant's solving of the impossible520 set. "A" here is simple almost-locked set analysis; "a" is that in combination with weak links to strong chains. "Ms" and "Mw" are Medusa strong and weak chain analysis, respectively, where no almost-locked sets were involved. Clearly when almost-locked analysis is in place, Sudoku Assistant uses it extensively.



It's interesting to compare these results to when almost-locked set analysis is turned off (in the table below). The number of puzzles where Sudoku Assistant resorted to pure hypothesis goes up from 18 (about 4%) when using almost-locked set analysis to 95 (about 18%) when not using it.

The Sudoku Assistant and Almost-Locked Sets examples

The Sudoku Assistant by default does not look for almost-locked sets because it slows down the processing to do that, and they have not proven to be generally decisive in solving puzzles. But when those checkboxes are checked, it currently does the following:
  1. Finds all possible almost-locked sets.
  2. Checks to see if there are any mutually-linked pairs of almost-locked sets (bent naked subsets). If so, it checks for weak links and eliminates them.
    
        8 1           2...2*   
         \|          /
          A--3...3--B
         /|         |
        9 7.........7...7*
    
    
    Here almost-locked set A has five possibilities in four cells, and almost-locked set B has three possibilities in two cells. Together there are six possibilities in six cells -- a bent naked subset.
  3. Looks for nodes that are weak links to two already-linked almost-locked sets and eliminates them. (7* in this case).
    
             2...2---B  
            /        |
        3--A         |
            \        |
             7..7*...7   
                           
    
    
  4. Looks for strong chain nodes that are themselves weakly associated with an almost-locked chain. If a chain or set of chains is connected in an impossible way, such that it would force two of the N candidates of the almost-locked set to be FALSE, that chain can be eliminated.
    
                  2...2*--2   
                 /         7
        3*...3--A          .
                 \         .
                  7...7*---7-   
                           
    
    
    Due to the chain, the 2* and 7* have to be such that either both are TRUE or both are FALSE. But they can't both be TRUE, because that would not leave enough candidates for almost-locked set A. They can be eliminated.

3D Medusa Analysis examples

If a grid of cells can be made for which the value in one cell directly forces the value in another cell, we can sometimes follow that grid to see if it leads to a logical inconsistency. This can be done even though the exact contents of a cell are unknown. Some algorithms look for "strong cycles" with "weak edges," but there is another way to think about this. What is significant is that in a "binary grid" every other cell along the grid either has one value or another -- only two possible values. (The grid is made by specifically selecting only cells that have exactly two possibilities.) If any two such cells are in a single row, column, or 3x3 block, it is as if they were the ONLY possibilities for their cells.

a complicated X cycle. Only the values of 5 are shown. Lines connect mutually exclusive possibilities. Alternating nodes on the chain are sprayed red. The circled value is excluded by both of the chain options, so it can be eliminated.
Thus, if we can say, because of a connecting chain of grid points, that two cells "either must both by 5 or neither is 5" then if they are in the same block, row, or column, neither can be 5. In the case on the left, we have a grid with two endpoints in the same 3x3 block. They are opposite "parity" -- we don't know which value is where, but regardless of that, whenever one is not 5, the other is 5. We can eliminate the circled possibility. That particular cell cannot be a 5, because it is in a block with two other cells, one of which has to be a 5. Since cells with only two possible solutions are relatively easy to spot, this technique lends itself reasonably well to human solving of Sudoku puzzles.
In fact, we can generalize the idea of strong binary grids to three dimensions. If we allow the grid to not just involve a specific candidate number, a 3D grid is generated. The Sudoku Assistant uses the Jmol applet to help visualize this. (The different colors here represent different candidate numbers, with red, at the top being 1 and white, at the bottom, being 9.)

Weakly Linked Chains examples

A full 3D Medusa analysis involves extending these idea to weakly linked chains (encompassing X cycle Type 2 as well as Y cycles). The idea here is that one may have a sequence of three "nodes" (row, column, and candidate value, in 3D) A --- b --- C, for which A and C are members of strong chains (their cells have only two possible values). We call this a "weak link" or "weak edge" if A, b, and C all are in the same row, column, or block -- or, in a 3D sense, the same cell. In that case, the "strongness" of one chain can be transferred to the other chain, provided the right condition is applied. (In other words, we can assert "if A is 5...then C is NOT 5", but we cannot assert that "if A is NOT 5, then C is 5.") The Sudoku Assistant can follow these weak links to any number of associated chains in order to eliminate possibilities.

The classic example of weakly linked chains in the form of a Y-cycle is an XY-wing, shown below.

*            
                  
XZ      X...
YZ            
Y...            
XY            


X... and Y... here are weak links. The first links the strong chains XZ and XY; the second links the strong chains XY and YZ. It doesn't really matter if those links are there or not. (If they are not there, then we just have one strong chain.) The effect is the same: "NOT Z" in the lower left cell gets transmitted to the top right cell through the chain:

 Z--X--(weak X)--X--Y--(weak Y)--Y--Z
 -  +            -  +            -  +
 ------direction of transmission---->


as "Z" at the other end. Or, if the other Z is false, then we have:

 Z--X--(weak X)--X--Y--(weak Y)--Y--Z
 +  -            +  -            +  -
 <-----direction of transmission-----


So one or the other of the Zs on the ends must be TRUE, and the * cannot be Z.

The term I use for * in the above situation is weak corner. Here we have again A --- b --- C, for which A and C are members of strong chains. To be a weak node, A and b must be in the same row, column, block, or cell, and the same is true for for b and C. But if these aren't the same row, column, block, or cell, then we don't have a weak link. Instead, we have a weak corner. The "strongness" of one chain cannot be transferred to the other chain via a weak corner, With the proper connecting of strong chains by weak links, one can infer valuable information about such a node. In particular, the elimination of the corner node, b, can be forced by the right sort of chain connection.

3D Medusa Hinge

If we know that two cells of a strong chain are positioned such that both must be X or both must not be X, and together they exclude X from a block, then neither can be X. On the right, for example, the two red Xs are linked because they are part of the same strong chain. But if they are both TRUE, then all the Xs in the top right block are disallowed. There must be some X in that box, so their being TRUE leads to a contradiction. The XA cell must be A, the XC cell must be C, and the XB cell must be X. This technique has been referred to as a strong block hinge. Medusa variants also target cells in the same row or column, or target values in the same cell.
XA            
                  
                  
X...X...      
            X...
            X...
                  
XB            
      X...      
                  
            XC
      X...      
A strong block hinge.
Cells XA and XC cannot be X,
and the cell labeled "XB" must be X.
The chain connecting XA and XC need not be strong. If one of the ends is a weak node, then the result is that we have a weak hinge, and we can eliminate that weak node. In the example on the right, if XA is X, we cannot say anything about XC because the X in XC is only weakly linked to the X of XB. (Only when XB is X can we say anything about XC; when XB is not X, we can't conclude anything about XC.) But the weak link does work the other way. So if XC were X, then XB would have to be "not X" (B), and XA would have to be X. But then the two ends of the hinge would be X, and that would remove all possibilities for X in Row 2. So XC must be C, not X.

The Sudoku Assistant will find these sorts of hinges if the "hinge" box is checked.
XA            
      X...X...
                  
                  
            X...
      X...      
                  
XB            
      X...      
                  
X...      XC
      X...      
A weak hinge targeting Row 2.
The cell labeled "XC" must NOT be X.

Hypothesis and (Dis)Proof examples

The addition of weakly associated nodes to the mix of strong 3D chains allows for a method of solving Sudoku puzzles that transcends the idea of "forcing chains" or "trial and error" to allow for the formation of hypotheses regarding possible candidate elimination and their direct proof. Using 3D Medusa chains, we can follow the implications of eliminating or not eliminating a mark and see what comes of it. There are two strategies for this analysis, which I am calling proof or disproof.

The disproof strategy goes like this:

Consider a certain chain, 3, that has two possible states, 3C and 3c.
  1. Hypothesize that a weakly associated node, say one of the 3c-associated nodes, is TRUE -- that it cannot be eliminated.

  2. Mark all Chain 3c nodes with the lowercase letter "a" meaning FALSE -- "If the hypothesis is true, then this node must be FALSE.

  3. Mark all Chain 3C nodes with an uppercase "A" meaning "if the hypothesis is TRUE, then these nodes must also be TRUE (not eliminated)." This works because Chain 3 is a strong chain.

  4. Mark all associated weak nodes of 3C with a capital letter "B" meaning "if the hypothesis is TRUE, then these nodes will be false (that is, they can be eliminated)."

  5. Recursively add any chains directly involving any of these associated weak nodes with alternating capital letters/lowercase letters.

  6. Check to see if we have either of the following two forcing conditions within any subset (all of a specific candidate in a row, column, or block, or all the candidates of a given cell):

    1. All members are FALSE. Then, since one HAS to be true, the hypothesis is disproved.

    2. Two members are TRUE. Then, since one MUST be false, the hypothesis is disproved.


    If either of these is the case, we are done. The hypothesis is disproved and the node can be eliminated.

The proof strategy is essentially the same, but the objective is to prove a hypothesis rather than disprove it, starting with the proposition that a certain mark can be eliminated, and seeing what would have to be true to have that happen. There used to be a section here about this strategy, but I've removed it, because it adds nothing new and isn't very easy for me (at least) to use.

An example of a pattern that is handled easily by hypothesis and proof is what has been referred to as an XYZ-wing. Both the * and Z here are weak corners between the two strong chains XZ and YZ. Let's hypothesize that the * is Z. Any Z shown, then, would be FALSE. But the X of XZ and the Y of YZ both must be TRUE, because they are parts of the strong chains associated with that * node. OK, so if they are TRUE then they each force the X and Y of XYZ to be FALSE. But then all of X, Y, and Z have to be FALSE. That leaves nothing at that position, and the hypothesis is disproved. Therefore, * is not Z. QED.

*      XYZ
                  
XZ            
YZ            
                  
                  

Note that in practice you doesn't really have to think about chains at all if you don't want to. Just follow the logic. One node TRUE means all those "related to it" are FALSE. From there, look for strong connections -- a cell with only two values in it, or a row, column, or block with only two of the TRUE node's value. Those are the strong chains that can be set TRUE. Go from there to find more FALSE nodes, and on and on. Periodically look to see if there is any row, column, block, or cell that satisfies any of those last three conditions. It's really no more than that.

A good trick Gail Nelson taught me is that you can to this hypothesizing starting with a cell with only two values in it and go both ways by placing a circle around the implications for one option and a square around the implications for the other option. You often don't have to go far in either direction to come to a satisfactory conclusion:
  1. The two logical chains converge to a particular value in a specific cell. Then that cell is that value.
  2. The two logical chains converge to two different values in the same cell, row, column, or block. Then every other possibility in that cell, row, column, or block can be eliminated.

This, I find, is a very powerful technique. Except for the Easter Monster puzzle, this technique has solved every Sudoku puzzle I have thrown at it. It is definitely my choice of last resort when solving puzzles by hand. I should probably get Sudoku Assistant doing this, but I haven't done that yet.

Depth

Full trial-and-error analysis involves adding some element of depth. The idea here is that if a hypothesis gets to a "dead end" -- no more conclusions to make, but still no solution -- then all we really have to do is start the sequence over from that configuration. We can look for singles, locked candidates, subsets, grids, chains, and even do more "subordinate" hypothesizing. If this process is taken to the extreme, then you basically have a depth-first backtracking solver. The Sudoku Assistant doesn't go that far. Instead, it takes a middle ground and considers two optional added elements of depth:

  1. +depth just adding analysis involving single cells, rows, columns, and blocks (singles and locked candidates), and

  2. ++depth also adding analysis involving multiple cells, rows, and columns (n-tuples, grids). Note that using ++depth amounts to more than just one level of what people usually refer to as "depth" because these methods can't be replaced by simple "If X then Y" type logic. They require "If X and Y then Z" logic, and that is the logic of depth. ("OK, let's hypothesize that X is true, and then let's also hypothesize that Y is true, then....")

Further discussion of dead ends and depth can be found here.

Beyond These Techniques

Obviously this can't solve all Sudoku puzzles. But so far no one has shown me a puzzle that can't be solved using ++depth. Using the above techiniques together Sudoku Assistant has solved all of the puzzles I have thrown at it, including all top95 and all "impossible" 520 puzzles. This puzzle and also what I consider the hardest known Sudoku both require ++depth.

--Bob Hanson

Disclaimer