Applications Based on a Novel Sudoku Solver Algorithm and Grid Based Models

Abstract: Numerous algorithms for solving Sudoku puzzles have been explored, most of which use a backtracking approach, and as a result their computational efficiency can sometimes be poor. We propose a probabilistic solver algorithm that iteratively fills and solves the Sudoku grid. The approach uses a dynamic random number set: we identify unassigned cells in a given puzzle where only one possible value can be filled in, and iteratively identify and assign the cells with the least number of possible values. We not only elaborate on the logic of our solver algorithm, but also explore application areas based on the algorithm, after reviewing relevant similar approaches described in the referenced articles. We believe that, by extension of this algorithm, many combinatorial problems in material characterization, cryptography and cybersecurity can be solved and advanced. We also envision that, with the application of neural networks and machine learning techniques, the algorithm will take a highly adaptive and robust form, useful for solving complex problems in the accurate estimation of missing data, discrete event analysis and prediction. The distinguishing feature of the approach is its use of high assignment probabilities for faster computation and lower execution time. With cyberattacks of varied vectors and types, it is important to devise a mechanism that creates a deliberate mismatch every time a possible attack is detected.


Introduction
Solving a Sudoku puzzle using a deterministic method is similar to making the right decisions after weighing the risks involved, instead of wasting resources trying and backtracking every time a wrong direction is taken. Even the best algorithms [1] are not ideal. Alternative approaches such as the backtracking algorithm [2], the exact cover approach [3], stochastic approaches [4,5] and a deterministic approach [6] each exhibit a distinct style and strengths under different computing metrics. We therefore explore an approach, based on shifting probabilities of assignment to unassigned Sudoku cells, that repeatedly seeks a naked single [6] until the only option left is to use a random number generator, as seen in [2]. This approach opens an avenue to understanding how to avoid exhausting all possible paths when solving a problem or completing a task.
To date, genetic algorithms [5] have been acknowledged as outperforming the other state-of-the-art approaches. While reviewing all the approaches, we realized that there was still no way to benchmark algorithms against a best achievable performance and to carry out all comparisons on that basis. This motivated a comprehensive algorithm that works by assigning the cells with the least number of possible values, that is, a dynamic probabilistic approach. Sudoku cells with p = 1, i.e. a single possible assignment value, act as accelerators for our approach. More generally, we look for the unassigned cells with the least number of possible values, based on a first-cut tear-down of the grid.
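The first-cut tear-down can be sketched in Python as below. This is a minimal illustration under our own assumptions, not the authors' R implementation: 0 marks an unassigned cell, the grid side n is a perfect square (4 or 9), and the helper names `candidates`, `candidate_map`, `naked_singles` and `most_constrained` are hypothetical.

```python
from math import isqrt

# Assumption: 0 marks an unassigned cell; the grid is n x n with n a perfect square (4 or 9).


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c)."""
    n = len(grid)
    b = isqrt(n)  # box side: 2 for 4x4, 3 for 9x9
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def candidate_map(grid):
    """First-cut tear-down: the candidate set of every unassigned cell."""
    n = len(grid)
    return {(r, c): candidates(grid, r, c)
            for r in range(n) for c in range(n) if grid[r][c] == 0}


def naked_singles(cmap):
    """Cells with exactly one possible value, i.e. assignment probability p = 1."""
    return {cell: next(iter(vals)) for cell, vals in cmap.items() if len(vals) == 1}


def most_constrained(cmap):
    """Unassigned cell with the fewest possible values; ties go to the first cell in row-major order."""
    return min(cmap, key=lambda cell: (len(cmap[cell]), cell))
```

On a 4x4 puzzle whose blanks are all forced, `naked_singles` returns every unassigned cell; on an empty grid it returns nothing and `most_constrained` falls back to the first cell.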
Our idea was to use this as a reference against which to compare human solvers, to study how often the human mind gets close to the solution and misses it, and where the human mind surpasses the algorithm.
To study these questions, we divided the problem into phases and addressed each of them in order.
The initial study consisted of the following steps:
1) Complete the working prototype solver in R and feed it diverse, unbiased samples of Sudoku puzzles from a wide variety of sources.
2) Collect and save the iteration statistics generated by the R software, together with 3D scatter plot visualizations showing the probabilistic shifts for unassigned cells.
3) Train logistic regression models based on unassigned versus assigned cell probabilities, to classify puzzles as unique or multi-solution.
4) Identify diverse application possibilities for isomorphous decision structures.
Algorithm proof of concept for 2x2 grids: We illustrate the approach in R on a 4x4 puzzle made of 2x2 boxes, showing graphically how the algorithm iteratively arrives at a solution. For 4x4 puzzles the algorithm turns out to be deterministic, whereas for 9x9 puzzles it becomes stochastic once it has to run the random number generator to guess unassigned cells. Despite this limitation for 9x9 grids, we are still able to identify and reject multi-solution grids and filter Sudoku puzzles. Within the scope of this study we have not found a way of characterizing 2x2-type grids, nor have we studied the effect of increasing grid size on algorithm performance.
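The deterministic behaviour on a small grid can be illustrated with the Python sketch below, which assigns one naked single per iteration and counts iterations. It is an illustration under our own assumptions (0 marks an empty cell; function names are hypothetical), not the authors' R code; when no single is left, it simply stops, which is the point where a 9x9 run would switch to the random number generator.

```python
from math import isqrt


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c); 0 marks an empty cell."""
    n = len(grid)
    b = isqrt(n)
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def find_single(grid):
    """First unassigned cell (row-major) with exactly one candidate, or None."""
    n = len(grid)
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                vals = candidates(grid, r, c)
                if len(vals) == 1:
                    return r, c, next(iter(vals))
    return None


def solve_by_singles(grid):
    """Assign one naked single per iteration until none is left.

    Returns (grid, iterations). If the puzzle needs a guess, the returned
    grid still contains zeros and a random draw would be needed next.
    """
    grid = [row[:] for row in grid]  # leave the caller's puzzle untouched
    iterations = 0
    while (single := find_single(grid)) is not None:
        r, c, v = single
        grid[r][c] = v
        iterations += 1
    return grid, iterations
```

For the 4x4 puzzle `[[1,0,0,4],[0,4,1,0],[2,0,0,3],[0,3,2,0]]`, every blank is forced, so eight iterations fill the grid without any guess.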
Algorithm run in RStudio for a Sudoku puzzle: Latin squares, the combinatorial forerunner of the modern-day Sudoku, drew the attention of mathematicians such as Euler [7], in one of the first attempts to construct a grid in which each symbol or number occurs exactly once in each row and each column. For the integers 1 through 9, an unsolved 9x9 grid of this kind (with the additional rule that each 3x3 box also contains every digit once) that yields exactly one solution became an intriguing class of puzzles, known today as Sudoku.
Our initial hypothesis was that our greedy algorithm would identify the cells in the unsolved Sudoku grid that have a unique possibility of assignment. More generally, this is also useful for completing an incomplete Latin square [7]. The next part of our study consisted of running the algorithm on various Sudoku puzzles and consolidating the findings.
We realized that the solver algorithm needed to continue without backtracking at points where, during the course of the iterations, a cell had multiple possible values of assignment.
We introduced explicit logic in the code that flags such points. At these points we use a dynamic random number generator set S that obeys the constraints of a Latin square, in this case specifically the Sudoku puzzle.
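One way to realize this step without backtracking is sketched below in Python, under stated assumptions: at every iteration the most constrained cell is chosen; if it has one candidate it is assigned, otherwise a value is drawn at random from its candidate set (playing the role of S) and the point is flagged. What to do at a dead end (a cell with no candidates) is not specified in the text; restarting from the original puzzle is our assumption, and all names are hypothetical.

```python
import random
from math import isqrt


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c); 0 marks an empty cell."""
    n = len(grid)
    b = isqrt(n)
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def probabilistic_fill(grid, rng):
    """One pass without backtracking.

    Picks the unassigned cell with the fewest candidates; assigns it directly if it
    has one candidate, otherwise draws one at random and flags the step.
    Returns (grid, flags, ok); ok is False if a cell was left with no candidates.
    """
    grid = [row[:] for row in grid]
    n = len(grid)
    flags = []
    while True:
        open_cells = [(r, c) for r in range(n) for c in range(n) if grid[r][c] == 0]
        if not open_cells:
            return grid, flags, True
        cmap = {cell: candidates(grid, *cell) for cell in open_cells}
        r, c = min(open_cells, key=lambda cell: len(cmap[cell]))
        options = sorted(cmap[(r, c)])
        if not options:
            return grid, flags, False  # dead end: no value fits this cell
        if len(options) > 1:
            flags.append((r, c, options))  # flagged point with several possible values
        grid[r][c] = rng.choice(options)


def solve_with_restarts(grid, seed=0, max_attempts=100):
    """Assumed policy: on a dead end, start a fresh pass instead of backtracking."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        result, flags, ok = probabilistic_fill(grid, rng)
        if ok:
            return result, flags, attempt
    return None, [], max_attempts
```

Every assigned value is consistent with the cells filled before it, so a pass that completes always yields a valid grid; a forced puzzle produces no flags, while an empty grid produces at least one flagged random draw.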
Excerpt from the R program showing random number generator runs: here the function call sample(1:9, 9), which returns a random permutation of the integers 1 to 9, is a row generator that would form the base of a Latin square solver. Used in this free-form manner, the function would likely have to be run many times to obtain and isolate Latin squares.
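The cost of this free-form approach can be seen in a small Python analogue, offered as a sketch rather than the authors' code: `random_row` plays the role of R's `sample(1:n, n)`, and `latin_square_by_rejection` (a hypothetical name) redraws a whole row whenever it clashes with a column, counting how many draws were needed.

```python
import random


def random_row(n, rng):
    """Analogue of R's sample(1:n, n): a random permutation of 1..n."""
    return rng.sample(range(1, n + 1), n)


def latin_square_by_rejection(n, rng):
    """Build an n x n Latin square row by row, redrawing any row that repeats
    a value already present in the same column. Returns (square, draws)."""
    square, draws = [], 0
    while len(square) < n:
        row = random_row(n, rng)
        draws += 1
        if all(row[j] != prev[j] for prev in square for j in range(n)):
            square.append(row)
    return square, draws
```

The loop always terminates, because every Latin rectangle can be extended by another row, but the number of draws grows very quickly: for n = 9 only one permutation completes the last row, so that row alone needs 9! = 362,880 draws on average.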
Given a Latin square puzzle, this approach would have to run several iterations and populate a set of unique Latin squares. We would then have to use genetic algorithms [5] and neural networks to arrive at a solution. In the evolutionary computing world this approach seemed ideal, but it would be computationally expensive.
We therefore needed a straightforward approach without backtracking. In our literature survey we found that some approaches were extremely efficient, but all of the methods [1][2][3][4][5][6] called for backtracking at some stage of iteration. Nevertheless, genetic algorithms would perhaps serve as the most efficient algorithms for puzzle generation [5], and they could likewise aid in solving complex challenges in pattern recognition and material characterization.

Results from the Run of Sudoku Solver Algorithm in R
The algorithm terminates when the probability matrix contains only zeros (assigned cells) and assignment probabilities that are all, or mostly, equal to 1.
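A minimal Python sketch of this stopping rule follows. It assumes, as our own reading, that an unassigned cell's assignment probability is p = 1/|candidates| and that assigned cells are recorded as 0; the `min_fraction` parameter is a hypothetical knob for the "all, or mostly" condition.

```python
from math import isqrt


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c); 0 marks an empty cell."""
    n = len(grid)
    b = isqrt(n)
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def probability_matrix(grid):
    """[p]: 0 for assigned cells, 1/|candidates| for unassigned cells (assumed definition)."""
    n = len(grid)
    p = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                vals = candidates(grid, r, c)
                if not vals:
                    raise ValueError(f"dead end at cell {(r, c)}")
                p[r][c] = 1.0 / len(vals)
    return p


def is_terminal(p, min_fraction=1.0):
    """Stop when at least `min_fraction` of the unassigned cells (p > 0) have p == 1."""
    open_ps = [x for row in p for x in row if x > 0]
    if not open_ps:
        return True  # every cell is assigned
    return sum(x == 1.0 for x in open_ps) / len(open_ps) >= min_fraction
```

With `min_fraction=1.0` the rule requires every remaining cell to be forced; a lower value implements the looser "most equal to 1" variant.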
Interpolation studies of the rate at which probabilities shift per iteration, computing $\Delta[p]_{9\times 9}, \Delta^{2}[p]_{9\times 9}, \ldots, \Delta^{n-1}[p]_{9\times 9}$, where $[p]_{9\times 9}$ is the matrix of assignment probabilities and $n$ is the number of iterations taken to solve the Sudoku grid, would reveal the complexity of the puzzles undertaken.
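We read $\Delta^{k}[p]$ as the $k$-th forward difference of the probability matrices recorded across iterations, i.e. $\Delta[p]^{(t)} = [p]^{(t+1)} - [p]^{(t)}$ and $\Delta^{k} = \Delta(\Delta^{k-1})$; with $n$ recorded matrices this gives orders up to $n-1$, matching the sequence above. This interpretation is our assumption, and the Python helper below (hypothetical name `forward_difference`) is only a sketch of it.

```python
def forward_difference(history, order=1):
    """k-th order forward difference of a sequence of equally sized matrices.

    history[t] is the probability matrix [p] recorded after iteration t.
    Returns len(history) - order matrices; order must be < len(history).
    """
    if not 0 <= order < len(history):
        raise ValueError("order must satisfy 0 <= order < len(history)")
    seq = [[row[:] for row in m] for m in history]
    for _ in range(order):
        # element-wise difference between consecutive matrices: next minus current
        seq = [
            [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(m_a, m_b)]
            for m_a, m_b in zip(seq, seq[1:])
        ]
    return seq
```

For three recorded 1x2 matrices `[[0.25, 0.0]]`, `[[0.5, 0.0]]`, `[[1.0, 0.0]]`, the first differences are `[[0.25, 0.0]]` and `[[0.5, 0.0]]`, and the single second difference is `[[0.25, 0.0]]`, showing the shift accelerating as the cell becomes forced.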

Regression Model Evolution to Classify Puzzles
We needed to identify the most accurate classifier model that could be used in conjunction with the results of the first iteration of the algorithm to identify grids that qualify as Sudoku puzzles, i.e., those with a unique solution.
Our initial premise was to evaluate a linear regression model to classify grids in this manner. We create a response vector y indicating unique versus multi-solution grids.
Here a unique-solution grid is coded as 0 and a multi-solution grid as 1.
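When the labels y are not known in advance, one conventional way to compute them is to count a grid's solutions with a depth-first search that stops at two. This is a swapped-in standard technique, not the authors' probabilistic solver, and unlike it this search does backtrack; it is sketched in Python only as a labelling aid, and the function name is hypothetical.

```python
from math import isqrt


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c); 0 marks an empty cell."""
    n = len(grid)
    b = isqrt(n)
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def count_solutions(grid, limit=2):
    """Number of completions of `grid`, capped at `limit` (backtracking search)."""
    grid = [row[:] for row in grid]
    n = len(grid)

    def search():
        best = None  # most constrained empty cell found so far
        for r in range(n):
            for c in range(n):
                if grid[r][c] == 0:
                    vals = candidates(grid, r, c)
                    if best is None or len(vals) < len(best[2]):
                        best = (r, c, vals)
        if best is None:
            return 1  # no empty cell left: one complete solution
        r, c, vals = best
        total = 0
        for v in sorted(vals):  # an empty candidate set contributes 0 solutions
            grid[r][c] = v
            total += search()
            if total >= limit:
                break
        grid[r][c] = 0  # undo the assignment before returning
        return total

    return min(search(), limit)


def label(grid):
    """0 for a unique-solution grid, 1 for a multi-solution grid (unsolvable grids return 0 solutions)."""
    return 0 if count_solutions(grid) == 1 else 1
```

Blanking the four cells of a swappable rectangle in a solved 4x4 grid, for example, yields a count of 2 and hence the label 1.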
For sampling we take 5 random puzzles, of which 3 are known to have a unique solution and 2 are multi-solution, and train a regression model to classify them. From the results summary we obtain the following classifier model.

Coefficients
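However, the classifier accuracy is low: puzzles 2 and 4 are classified as unique-solution instead of puzzles 2 and 3, giving an accuracy of 50%. Moreover, the output value y is Boolean, which a linear regression model is not designed to predict; this motivates the logistic (binomial) models below.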

First Iteration Number of Cells with Unique Assignment Possibility
We thus conclude that the number of cells with a unique assignment possibility obtained in the first iteration is the most accurate determining factor for classifying and isolating Sudokus from a grid generator.
Here the classification accuracy is about 77.1%.

3) Unassigned Cells and Number of Cells with a Unique Possibility at the End of the First Iteration:
Call: glm(formula=y ~ P1 + UA, family="binomial")
By using randomly selected sample Sudoku puzzles from various sources, we aim to keep our classifier models accurate and unbiased.
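The binomial model above can be mirrored in Python as a rough sketch. We assume, from the subsection title, that P1 is the number of cells with a unique assignment possibility after the first iteration and UA the number of unassigned cells. R's `glm` fits by iteratively reweighted least squares; here we use plain gradient descent on standardized features for transparency, so coefficients will not match R's output exactly. The toy data in the usage note are hypothetical, not the paper's samples.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(w0 + w . z) by batch gradient descent on the mean
    log-loss, where z is x standardized column by column. Returns (w, means, sds)."""
    m, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(k)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / m) or 1.0 for j in range(k)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(iterations):
        grad = [0.0] * (k + 1)
        for zi, yi in zip(Z, y):
            err = sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], zi))) - yi
            grad[0] += err
            for j, zj in enumerate(zi, start=1):
                grad[j] += err * zj
        w = [wj - lr * g / m for wj, g in zip(w, grad)]
    return w, means, sds


def predict_proba(model, x):
    """Estimated probability that the grid with features x is multi-solution (y = 1)."""
    w, means, sds = model
    z = [(xj - mj) / sj for xj, mj, sj in zip(x, means, sds)]
    return sigmoid(w[0] + sum(wj * zj for wj, zj in zip(w[1:], z)))
```

As a usage example with hypothetical rows `[P1, UA]` such as `[20, 50]` labelled 0 and `[5, 60]` labelled 1, a grid is classified as multi-solution when `predict_proba` is at least 0.5. With very few, perfectly separable samples the maximum-likelihood coefficients do not exist, so the iteration count effectively caps them.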

Identifying a Multi Solution Puzzle
We have also included logic that identifies multi-solution puzzles and flags them through the iterations, as seen in the snippet below.
Broadly, three categories or cases can be charted out in which grids yield multiple solutions.
Figure 13. Penultimate iteration with one locked cell and one cell with two assignment possibilities.
These cases are explicitly checked in the working code to flag such grids, which also highlights the potential for a puzzle generator. To define a puzzle generator logically, we first need to revisit the algorithm in terms of a flowchart.
Algorithm flowchart:

Difficulty Rating of Sudoku Puzzles
Based on the proposed algorithm and its runs on various puzzles, we arrive at a rounded, normalized difficulty score. The premise of the difficulty rating is the probability of assignment to unassigned cells before the puzzle is solved by the algorithm.
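The exact normalization is not given in the text, so the Python sketch below uses a hypothetical formula of our own: score = round(10 × (1 − mean p)), where p = 1/|candidates| is taken over the unassigned cells of the first iteration. A puzzle whose blanks are all forced scores 0, and lower average assignment probability means a harder puzzle. Note that Python's `round` rounds halves to the nearest even integer.

```python
from math import isqrt


def candidates(grid, r, c):
    """Values not yet used in the row, column or box of cell (r, c); 0 marks an empty cell."""
    n = len(grid)
    b = isqrt(n)
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = b * (r // b), b * (c // b)
    used |= {grid[i][j] for i in range(br, br + b) for j in range(bc, bc + b)}
    return set(range(1, n + 1)) - used


def difficulty_score(grid, scale=10):
    """Hypothetical rating: round(scale * (1 - mean assignment probability))."""
    n = len(grid)
    ps = []
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                vals = candidates(grid, r, c)
                if not vals:
                    raise ValueError(f"dead end at cell {(r, c)}")
                ps.append(1.0 / len(vals))
    if not ps:
        return 0  # already solved
    return round(scale * (1 - sum(ps) / len(ps)))
```

On an empty 4x4 grid every cell has p = 0.25, giving 10 × 0.75 = 7.5, which rounds to 8.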

Applications of the Solver Algorithm
Based on the study conducted and the insights from the algorithm runs, the following possible applications could be explored:

Cybersecurity Solutions
A dynamically generated incomplete Latin square or Sudoku can serve as a strong means of authentication for accessing highly confidential information. Traditional username and password-based authentication has vulnerabilities that this approach could help overcome.

Modelling Decision-making Problems
Real-life decision-making scenarios, especially in complex projects, involve multiple factors. Having explored the algorithm from various perspectives, we believe it is feasible to create algorithms [3,8] that model decision making and the factors governing it.

Constraint Modelling for Complex Scenarios
This applies to projects or problems prone to heavy backtracking or failure, where the constraints themselves are dynamic. We can use Latin squares or rectangles to model or approximate such constraints [8,7].

Cybersecurity
Firewalls, antivirus software and token-based authentication mechanisms use logic that could be manipulated and tampered with. We therefore need to integrate a mechanism that is feedback- and input-driven. The input to the mechanism is a threat pattern or signature, which is also used to feed a random number generator that supplies incomplete Latin squares to our algorithm. A logic block then deliberately introduces a mismatch and fails the authentication attempt made by the malicious software or virus program. The detailed logic of the system needs to be developed in phases, and each of the logical blocks shown below must be developed in-house.
Flowchart of the proposed cybersecurity solution:
Figure 17. Cybersecurity solution design based on the solver algorithm.
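The flow described above (signature in, seeded generator, incomplete Latin square challenge, forced mismatch on a detected threat) can be sketched in Python. This is an illustrative sketch under our own assumptions, not a secure design: all function names are hypothetical, Python's `random.Random` is not a cryptographic generator, the Latin squares it produces are not uniformly distributed, and the sketch does not check that the blanked square has a unique completion.

```python
import hashlib
import random


def latin_square(n, rng):
    """Random n x n Latin square: permute the rows, columns and symbols of the
    cyclic square L[i][j] = (i + j) mod n (not uniform over all Latin squares)."""
    base = [[(i + j) % n for j in range(n)] for i in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    symbols = rng.sample(range(1, n + 1), n)
    return [[symbols[base[r][c]] for c in cols] for r in rows]


def make_challenge(signature, n=4, holes=3):
    """Seed the generator from a threat signature and blank `holes` cells."""
    seed = int.from_bytes(hashlib.sha256(signature.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # illustrative only: not a cryptographically secure generator
    square = latin_square(n, rng)
    cells = rng.sample([(r, c) for r in range(n) for c in range(n)], holes)
    puzzle = [row[:] for row in square]
    for r, c in cells:
        puzzle[r][c] = 0
    answer = {(r, c): square[r][c] for r, c in cells}
    return puzzle, answer


def authenticate(response, answer, threat_detected):
    """Accept only the exact completion; force a mismatch whenever a threat is flagged."""
    if threat_detected:
        return False  # deliberate mismatch: the attempt fails regardless of the response
    return response == answer
```

The same signature always reproduces the same challenge, so a verifier can regenerate it; a correct response is accepted only while no threat is flagged.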

Characterizing Material Properties
The published article [9] discusses the use of IFEA and characterization methods based on conventional tests such as uniaxial tensile and compression tests. Looking more closely at the process, we can see how the concept of an incomplete Latin square could be useful here: an incomplete Latin square is analogous to an incompletely characterized material. Based on knowledge of the permissible limits of properties such as uniaxial tensile strength, compression strength or shear strength, we can model a random number generator that runs inside the solver algorithm to fill in the missing property values.
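Only the bounded-draw step of this idea is sketched below in Python; how the draws would interact with the solver's constraints is not specified in the text and is not modelled. The function name, material and property names and the limits in the usage note are hypothetical placeholders, not measured data.

```python
import random


def fill_missing_properties(table, limits, rng):
    """Replace missing entries (None) with draws inside the permissible limits.

    table:  {material: {property: measured value or None}}
    limits: {property: (lower, upper)} permissible range of each property
    Measured values are copied unchanged; the input table is not modified.
    """
    return {
        material: {
            prop: value if value is not None else rng.uniform(*limits[prop])
            for prop, value in props.items()
        }
        for material, props in table.items()
    }
```

For example, with placeholder limits `{"tensile": (200.0, 550.0), "shear": (100.0, 300.0)}`, a material with a measured tensile value and a missing shear value keeps its tensile value and receives a shear value between 100 and 300.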

Design of Experiments for Fractional Factorial Designs with Blocking
This approach uses Latin squares to model scenarios in which n distinct treatment types are applied to n different subjects. Balanced incomplete blocks [10] are used in situations where the number of factors is less than the number of treatment levels, and Latin squares form a basis for designing such incomplete blocks.
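The statistical model defines the response variable $y_{ijk}$ as the effect of the overall average $\mu$, the $i$th row effect $R_i$, the $j$th column effect $C_j$, the $k$th treatment effect $T_k$ and the error $\varepsilon_{ijk}$ [10]:
$$y_{ijk} = \mu + R_i + C_j + T_k + \varepsilon_{ijk}$$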
We can also use the above model to evaluate the incomplete blocks, in conjunction with the Latin square solver algorithm. This applies in situations where all treatments are distinct and the number of subjects is lower than the number of treatments.
Error modelling is cumbersome, but reduced factorials or blocking effects, when known, can be solved efficiently using our algorithm.
Thus, we reach a point where purely deterministic or purely statistical modelling approaches fail completely, calling for new approaches to tackle the challenges in all of the problem areas mentioned.
The model is fitted by minimizing the above function subject to constraints, one of which is the Latin square that forms the design block.
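Assuming the function being minimized is the usual sum of squared errors of the model $y_{ijk} = \mu + R_i + C_j + T_k + \varepsilon_{ijk}$ with effects constrained to sum to zero, a complete $n \times n$ Latin square design has a closed-form least-squares solution: $\hat\mu$ is the grand mean and each effect is the corresponding row, column or treatment mean minus $\hat\mu$. The Python sketch below (hypothetical function name) computes these estimates; it covers only the complete design, not the incomplete blocks discussed above.

```python
def latin_square_effects(y, treatment):
    """Least-squares estimates for a complete n x n Latin square design.

    y[i][j]:         response observed in row i, column j
    treatment[i][j]: index k (0..n-1) of the treatment applied in that cell
    Returns (mu, row_effects, col_effects, treatment_effects).
    """
    n = len(y)
    mu = sum(map(sum, y)) / (n * n)  # grand mean
    rows = [sum(y[i]) / n - mu for i in range(n)]
    cols = [sum(y[i][j] for i in range(n)) / n - mu for j in range(n)]
    totals = [0.0] * n
    for i in range(n):
        for j in range(n):
            totals[treatment[i][j]] += y[i][j]  # each treatment appears once per row and column
    treats = [t / n - mu for t in totals]
    return mu, rows, cols, treats
```

Because every treatment occurs once in each row and column, the row, column and treatment effects are orthogonal, which is why these simple marginal means coincide with the least-squares solution.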
One intriguing real-life problem whose model is similar to factorial design is identifying the factors that negatively impact the performance of an in-house application with multiple features. Here application features can be likened to elements or attributes in a grid, and treatments to the distinct combinations of test scenario settings to which the application is subjected.
For example, in an application where multiple workflows run concurrently, the number of factors causing variation is large, and in most cases the factors are nested. Very often the actual design block has missing elements, which would call for more tests covering those elements. To avoid running extra tests and still arrive at the expected projection estimates, we need to use fractional factorials.
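As a concrete illustration of a fractional factorial, the textbook $2^{3-1}$ half fraction is sketched in Python below. It is a standard construction, not the authors' code, and the three factors stand in for hypothetical test-scenario settings: A and B form a full $2^2$ design in coded levels −1/+1, and C is set by the generator C = AB (defining relation I = ABC), so four runs replace the eight of the full design at the price of aliasing C with the AB interaction.

```python
from itertools import product


def half_fraction(generator_sign=1):
    """2^(3-1) fractional factorial in coded levels -1/+1.

    A and B form a full 2^2 design; C = generator_sign * A * B.
    generator_sign=-1 gives the complementary half (I = -ABC).
    """
    return [(a, b, generator_sign * a * b) for a, b in product((-1, 1), repeat=2)]
```

`half_fraction()` returns the runs (−1, −1, 1), (−1, 1, −1), (1, −1, −1) and (1, 1, 1); together with `half_fraction(-1)` they recover all eight runs of the full $2^3$ design.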
Decision modelling using design spaces can make use of the solver to fill in missing parts of the decision model or outcome.
Even when non-parametric smoothing processes are used to improve model accuracy, we need to reduce the space of unknowns so that most of our experimental design is known [10].
We can use the following excerpt showing a nested random number generator run to generate rows of varying sizes:
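A Python analogue of such a nested run is sketched below under our own interpretation: an outer draw picks each row's size m, and an inner draw produces a random permutation of 1..m, the equivalent of R's `sample(1:m, m)`. The function name and size bounds are hypothetical.

```python
import random


def nested_rows(count, min_size, max_size, rng):
    """Generate `count` rows of varying sizes.

    Outer draw: the size m of each row, uniform in [min_size, max_size].
    Inner draw: a random permutation of 1..m (like R's sample(1:m, m)).
    """
    rows = []
    for _ in range(count):
        m = rng.randint(min_size, max_size)  # outer random draw: row size
        rows.append(rng.sample(range(1, m + 1), m))  # inner random draw: row contents
    return rows
```

Each generated row is a valid Latin-square row for its own size, so rows of equal length can later be grouped as candidates for Latin rectangles.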

Conclusions
Applications of the generic Latin square or of the solver algorithm could range from cybersecurity to the frontiers of material science and material characterization. The possibilities are broad, and with the advent of modern AI/ML the algorithms can be further expanded and enriched to include many more of them.