So, back to the topic that started this, solving a Sudoku in software while using heuristics that a human would apply. In this post I'll be looking at Sudoku as a Constraint Satisfaction Problem. In a later post I'd like to look at Sudoku as a Set Cover Problem and how this relates to Human Heuristics, but lets start viewing it as a constraint problem, which has much wider applications than cover problems.
The approach can be applied to a Sudoku of any size and with any shape for groups, but for explanation purposes, lets work with a 4 by 4 Sudoku with 2 by 2 groups. As a constraint problem, we have 16 variables, the 16 locations in the Sudoku. Each variable has four options for its value, 1, 2, 3 and 4.
Then there are constraints. No number can be repeated in each row, column or group. Last, but not least, the values that are already filled in are also constraints. Otherwise we'd be generating all Sudoku's instead of solving a specific one.
Now, a naïve approach would be to just try each possible value for each variable and then check whether all constraints are met. A computer could still solve the problem like this, but it is very inefficient and also far from the way a human would solve a Sudoku.
The human way is to try to use the solved variables and the constraints to determine the correct value for unsolved variables. This postpones the need for trying values as much as possible. A typical Sudoku that is not too difficult can be solved like this completely. For more difficult Sudoku's it's common to start keeping track of the possible values for each location using pencil marks.
This pencil marks notation is something we can easily convert into software and is the base for the representation. We keep track of all options that are still possible for each variable. Once a variable has only remaining option, it is solved and this option can be removed from other variables in the same column, row and group. Once all variables have only one reaming option, the Sudoku is solved.
For more difficult Sudoku's you might get to a point where all effects of the constraints on the initially filled in variables are spread, but you're not at a solution yet. At this point you'll have to pick a variable that still has multiple options and try each option. As a human, you would pick a variable with 2 options over a variable with 3 options. This is the most constrained variable heuristic, which also helps a computer to solve these problems more quickly.
For this post I've written a Sudoku solver treating the problem as a Constraint Satisfaction Problem. It reads 9 by 9 Sudoku's (with regular 3 by 3 groups) from a text file and writes the solved Sudoku's to another text file. The source is available at:
https://github.com/josvanouwerkerk/SudokuConstraint
No files with Sudoku puzzles are provided, but can be found for example at:
https://github.com/grantm/sudoku-exchange-puzzle-bank
For these specific files the offset to the actual Sudoku is 13 on each line, which is a parameter that needs to be passed to the solver. For example:
SudokuConstraint.exe medium.txt 13
For the medium set of puzzles from the source above, this solver solves about 200.000 Sudoku's per second on my machine. A bit more than I expected honestly.