One Problem, Four Languages, Two Paradigms - Asher's Blog
Keyboard shortcuts
Press ← or → to navigate between chapters
Press S or / to search in the book
Press ? to show this help
Press Esc to hide this help
Auto
Light
Rust
Coal
Navy
Ayu
Asher's Blog
One Problem, Four Languages, Two Paradigms
2021-10-19
Solving a leetcode problem in four programming languages using two acceleration paradigms!
NOTE: This post is a transcript of the youtube video linked here.
In addition to that, we’ll be using various combinations of two programming paradigms common in distributed computing: using a GPU to perform some calculations and MPI to distribute our calculation among multiple processes, potentially on multiple machines.
We’ll be looking at this leetcode problem, which is to determine if a 9 x 9 Sudoku board is valid, but not necessarily solvable.<br>Each row, column, and subbox of the grid must have the digits 1-9.
Let’s jump right in to our BQN solution.
Content
One Problem, Four Languages, Two Paradigms
Content
BQN
Approach
Python
Python And MPI
C++
C++ And MPI
C++ And CUDA
C++ And CUDA And MPI
Fortran
Fortran And MPI
Conclusion
YouTube Description
BQN
This is what our sudoku boards will look like:
# two 8s in the first block<br>bad ← ⟨8, 3, 0, 0, 7, 0, 0, 0, 0<br>6, 0, 0, 1, 9, 5, 0, 0, 0<br>0, 9, 8, 0, 0, 0, 0, 6, 0<br>8, 0, 0, 0, 6, 0, 0, 0, 3<br>4, 0, 0, 8, 0, 3, 0, 0, 1<br>7, 0, 0, 0, 2, 0, 0, 0, 6<br>0, 6, 0, 0, 0, 0, 2, 8, 0<br>0, 0, 0, 4, 1, 9, 0, 0, 5<br>0, 0, 0, 0, 8, 0, 0, 7, 9⟩
# valid sudoku<br>good ← ⟨5, 3, 0, 0, 7, 0, 0, 0, 0<br>6, 0, 0, 1, 9, 5, 0, 0, 0<br>0, 9, 8, 0, 0, 0, 0, 6, 0<br>8, 0, 0, 0, 6, 0, 0, 0, 3<br>4, 0, 0, 8, 0, 3, 0, 0, 1<br>7, 0, 0, 0, 2, 0, 0, 0, 6<br>0, 6, 0, 0, 0, 0, 2, 8, 0<br>0, 0, 0, 4, 1, 9, 0, 0, 5<br>0, 0, 0, 0, 8, 0, 0, 7, 9⟩
And here is our full solution.<br>This solution will be the basis for all of our later solutions.
F ← {𝕊𝕩:<br>Fl ← 0⊸≠⊸/ # Filter 0s out<br>Dup ← (∨´∾´)¬∘∊¨ # Are there any duplicates?
rs ← Dup Fl¨(9/↕9)⊔𝕩 # Check rows<br>cs ← Dup Fl¨(81⥊↕9)⊔𝕩 # Check columns
bi ← 27⥊3/↕3<br>bs ← Dup Fl¨(bi∾(3+bi)∾(6+bi))⊔𝕩 # Check blocks
(bs ∨ rs ∨ cs)⊑"true"‿"false"
This first line is a function to filter out any 0s:
Fl ← 0⊸≠⊸/<br>Fl ⟨5, 3, 0, 0, 7, 0, 0, 0, 0⟩<br>⟨ 5 3 7 ⟩
Here we have another utility function to return an integer indicating whether any duplicates were found in any sublists:
Dup ← (∨´∾´)¬∘∊¨<br>Dup ⟨⟨5, 3, 7⟩, ⟨1, 2, 3⟩⟩<br>Dup ⟨⟨5, 3, 7⟩, ⟨1, 2, 2⟩⟩
Next we check for duplicates in all the filtered rows and columns:
rs ← Dup Fl¨(9/↕9)⊔𝕩<br>cs ← Dup Fl¨(81⥊↕9)⊔𝕩
These ranges are used to create indices for grouping the values in X.<br>I’ll show a trimmed down version of their output here to give you an idea:
3‿3⥊(3/↕3) # For the rows<br>┌─<br>╵ 0 0 0<br>1 1 1<br>2 2 2<br>3‿3⥊(9⥊↕3) # For the columns<br>┌─<br>╵ 0 1 2<br>0 1 2<br>0 1 2
Next I do something similar to get the indices for the boxes.
bi ← 27⥊3/↕3<br>3‿9⥊bi<br>┌─<br>╵ 0 0 0 1 1 1 2 2 2<br>0 0 0 1 1 1 2 2 2<br>0 0 0 1 1 1 2 2 2
This creats indices for the first three boxes, and you can probably imagine how to extend this to get the indices for all the boxes. I just add three to the previous indices, and then add six, and then append them all together. Here’s the second layer:
3‿9⥊bi+3<br>┌─<br>╵ 3 3 3 4 4 4 5 5 5<br>3 3 3 4 4 4 5 5 5<br>3 3 3 4 4 4 5 5 5
And the final layer:
3‿9⥊bi+6<br>┌─<br>╵ 6 6 6 7 7 7 8 8 8<br>6 6 6 7 7 7 8 8 8<br>6 6 6 7 7 7 8 8 8
And all three layers of indices stacked on top of each other:
9‿9⥊(bi∾(3+bi)∾(6+bi))<br>┌─<br>╵ 0 0 0 1 1 1 2 2 2<br>0 0 0 1 1 1 2 2 2<br>0 0 0 1 1 1 2 2 2<br>3 3 3 4 4 4 5 5 5<br>3 3 3 4 4 4 5 5 5<br>3 3 3 4 4 4 5 5 5<br>6 6 6 7 7 7 8 8 8<br>6 6 6 7 7 7 8 8 8<br>6 6 6 7 7 7 8 8 8
Using these indices, I group all the elements of the input, and then check all of them for duplicates:
bs ← Dup Fl¨(bi∾(3+bi)∾(6+bi))⊔𝕩 # Check blocks
And in the end I check if there were duplicates in the blocks, in the rows, or in the columns, and use that to index into our strings that indicate whether our sudoku board is valid or not.
(bs ∨ rs ∨ cs)⊑"true"‿"false"
Approach
Before we move on to the Python solution, I’d like to talk about our approach to this solution in the rest of the languages, because they will all be pretty similar.
Just like in the BQN solution, we have three collections which represent the validity of the rows, another for the columns, and a third for the blocks.
Here I have a subset of a sudoku board on the bottom.
In our procedural languages, we’ll create an array thrice the size of the grid to hold these values.
Note that this is not as space (or time) efficient as many of the solutions that you can find on the discussion page for the leetcode problem, but it is much easier to parallelize and that’s really the point of this video.
Let’s now walk through a few steps of our algorithm starting at the second row and first column of our sudoku board, which relates to the second row of our “row matrix.”
Because we’re looking at our row matrix, we’ll take the row index in our sudoku board as the row for our row matrix, and we’ll take the value in...