A Dialog on APL | Dyalog Blog
-->
Home
Business
How We Can Help
Products
Services
Prices and Licences
Resellers
Application Development Partners
Reference Customers
Case Studies
Learning
Getting Started
Mastering Dyalog APL
Training
Dyalog in Education
TryAPL
2016 Year Game
Student Competition
Community
Sharing Ideas
Dyalog Forum
GitHub
User Meetings
User Groups
Blog
APL Wiki
Resources
Resource Map
Documentation
Tools and Interfaces
Download Dyalog
Support
Videos
Tutorials and Samples
Fonts and Keyboards
External Publications
Dyalog Papers
News
News
Event Calendar
Newsletters
About Us
Our Mission
Our Values
Meet Team Dyalog
The Dyalog Duck
Careers
Corporate
History
Contact Us
Search
Products
Dyalog Services
Prices and Licences
Support (DSS)
Download Dyalog – Free
Dyalog '24
A discussion between Nicolas Delcros and Roger Hui
Nicolas, Prologue : From a language point of view, thanks to Ken Iverson, it is obvious that you want grade rather than sort as a primitive. Yet from a performance point of view, sort is currently faster than grade.
Can one be "more fundamental" than the other? If so, who’s wrong…APL or our CPUs? In any case, what does "fundamental" mean?
Roger : Sorting is faster than grading due to reasons presented in the J Wiki essay Sorting versus Grading, in the paragraph which begins "Is sorting necessarily faster than grading?" I can not prove it in the mathematical sense, but I believe that to be the case on any CPU when the items to be sorted/graded are machine units.
Nicolas, Formulation 1: Now, parsing ⍵[⍋⍵] (and not scanning the idiom) begs the question of how deeply an APL intepreter can "understand" what it’s doing to arrays.
How would an APL compiler resolve this conjunction in the parse tree? Do you simply have a bunch of state pointers such as "is the grade of" or "is sorted" or "is squozen" or "axis ordering" etc. walking along the tree? If so, do we have an idea of the number of such state pointers required to exhaustively describe what the APL language can do to arrays? If not, is there something more clever out there?
Roger: I don’t know of any general principles that can tell you what things can be faster. I do have two lists, one for J and another for Dyalog. A big part of the lists consists of compositions of functions, composition in the mathematical sense, that can be faster than doing the functions one after the other if you "recognize" what the composed function is doing and write a native implementation of it. Sort vs. grade is one example (sort is indexing composed with grade). Another one is (⍳∘1 >) or (1 ⍳⍨ >). The function is "find the first 1" composed with >. These compositions have native implementations and:
x←?1e6⍴1e6
cmpx '5e5(1 ⍳⍨ >)x' '(5e5>x)⍳1'<br>5e5(1 ⍳⍨ >)x → 0.00E0 | 0%<br>(5e5>x)⍳1 → 1.06E¯2 | +272100% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
cmpx '¯1(1 ⍳⍨ >)x' '(¯1>x)⍳1'<br>¯1(1 ⍳⍨ >)x → 2.41E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕<br>(¯1>x)⍳1 → 4.15E¯3 | +71% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
If you get a "hit" near the beginning, as would be the case with 5e5, you win big. Even if you have to go to the end (as with ¯1), you still save the cost of explicitly generating the Boolean vector and then scanning it to the end.
Another one, introduced in 14.1, is:
cmpx '(≢∪)x' '≢∪x'<br>(≢∪)x → 4.43E¯3 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕<br>≢∪x → 1.14E¯2 | +157% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
This is the tally nub composition, used often in a customer application. If you "know" that you just want the tally of the nub (uniques), you don’t actually have to materialise the array for the nub.
I am not conversant with compiler technology so I don’t know what all an APL compiler can do. I do know that there’s a thing call "loop fusion" where, for example, in a+b×c÷2, it doesn’t have to go through a,b,c in separate loops, but can instead do
:for i :in ⍳≢a ⋄ z[i]←a[i]+b[i]×c[i]÷2 ⋄ :endif
saving on the array temp on every step. You win some with this, but I think the function composition approach wins bigger. On the other hand, I don’t know that there is a general technique for function composition. I mean, what general statements can you make about what things can be faster (AKA algebraically simpler)?
Nicholas: I sort of see…so jot is a "direct" conjunction. An indirect conjunction could be ⍵[⍺⍴⍋⍵] where the intermediate grade is reshaped. We "know" that grade and shape are "orthogonal" and can rewrite the formula to ⍴⍵[⍋⍵].
So if we can establish a list of flags, and establish how primitives touch these flags, and how these flags affect each other, then we can extend ∘ to any path of the parse tree, provided that the intermediate nodes don’t destroy the relationship between the two operations (here grade + indexing provides sort, independently of the reshape).
Of course we can spend our lives finding such tricks. Or we could try and systemise it.
Roger: What you are saying above is that reshape and indexing commute (i.e. reshape ∘...