Digital Printing of Arabic: explaining the problem – The Digital Orientalist
Representing Arabic script on a computer has been problematic from the very beginning. In this series, we shall explore some of the solutions offered. First, we need to understand what is so problematic about Arabic script in a digital world.
Arabic script was developed a millennium and a half ago, and found its first expression in rock inscriptions and in pen and ink. Pen and ink became the de facto mode of expression, and the Arabic script developed in this medium. Most of the more technical aspects of Arabic script were developed in response to the need to record the Koran as precisely as possible, e.g. elongations, assimilations, and the points at which to stop or to keep reciting.
Printing Arabic
Fast forward to the printing era. Lithography went along well with Arabic script because its technology was based on the principle of preparing an entire page as a single stamp, and then stamping it on as many pieces of paper as one wished. But printing became much more popular using the technology called movable type. This technology constructed a page not as one stamp, but as hundreds or thousands of tiny stamps, arranged in rows, each stamp representing one letter or reading sign. It is called movable because you can move each stamp around, and reuse the stamps for every page instead of having one unique stamp for each page.
© Willi Heidelbach, CC-BY-SA-3.0, via Wikimedia Commons
Latin script lends itself quite easily to this type of printing. Movable type for Arabic was developed similarly, on a per letter basis. Arabic, however, is not written on a per letter basis, but on a per letter block basis. For example, the word المعروف is written not as seven separate letters but as four letter blocks: the first is ا, the second is لمعر, the third is و, and the fourth is ف. The second letter block in particular has special ligatures, with the lām on top of the mīm, and the mīm presented as a tick to the right.
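To make the idea of letter blocks concrete, here is a minimal sketch in Python that splits a word into blocks. It rests on an assumption of mine: a short, hand-written list of letters that never connect to the following letter (alif, dāl, rā', wāw and a few relatives). The name `letter_blocks` and that list are hypothetical and simplified; the sketch ignores vowel marks and is not the full joining data that real text layout software uses.

```python
# Letters that connect only to the preceding letter, so they always close a block.
# Simplified, hand-picked list (an assumption for this sketch).
NON_CONNECTING = {
    "\u0627",  # alif
    "\u0622",  # alif with madda
    "\u0623",  # alif with hamza above
    "\u0625",  # alif with hamza below
    "\u062F",  # dal
    "\u0630",  # dhal
    "\u0631",  # ra
    "\u0632",  # zay
    "\u0648",  # waw
    "\u0624",  # waw with hamza
    "\u0629",  # ta marbuta
}


def letter_blocks(word):
    """Split an unvowelled Arabic word into runs of connected letters."""
    blocks = []
    current = ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            # This letter cannot connect forward, so the block ends here.
            blocks.append(current)
            current = ""
    if current:
        blocks.append(current)
    return blocks


# The example word from the text, spelled out as code points:
# alif, lam, mim, ayn, ra, waw, fa
word = "\u0627\u0644\u0645\u0639\u0631\u0648\u0641"
print(letter_blocks(word))  # four blocks: alif | lam-mim-ayn-ra | waw | fa
```

Movable type, in effect, treats every letter as its own block, which is exactly where the trouble starts.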
Take this zoomed in excerpt from an early print from Brill Publishers, circa 1890:
We see that each letter has its own stamp. Notably, Brill actually accommodated the normal ligature for mīm and ḥa as one stamp. Nevertheless, the approach taken here is very inflexible. Moreover, the open spaces make for an uneven reading experience, in which it is sometimes hard to tell where a word ends. The latter problem has been solved; we do not see the spaces in between the letters in more recent printed works. The inflexibility persisted, and has arguably only become worse in the 20th century, with most publishers using fewer ligatures.
Arabic on computers
In short, as we will discuss throughout this series of posts, these flaws of movable type carried over to the digital representation of Arabic. One way to look at the persistence of this problem is that both technologies, the printing press and the computer, were developed in societies that use Latin script and thus took that use case for granted. The rest of the world, with its many and varied complex scripts, had to bend to these rules. It seems that even though Arabic is the sixth most spoken language in the world, there was little economic incentive to solve these issues fundamentally.
In fact, the digital environment aggravated the problem even further. Taking the movable type philosophy of dividing text into letters, computers have had a significant problem in representing Arabic as connected letters. Take the following example:
Some poor soul decided to translate "what doesn’t kill you makes you stronger" into Arabic, type it out on a computer, print it and give it to a tattoo artist to have it as a tattoo. Where the letters should have been connected to look like ما لا يقتلك يجعلك أقوى, every letter is rendered on its own, in isolation. In a horribly simple typeface as well, I might add. (This is Arabic tattoos done right.)
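Why can this go wrong at all? A rough way to see it: a computer stores each Arabic letter as one abstract character, while on the page that letter has up to four shapes depending on its neighbours. Choosing the right shape is left to the software that draws the text. The sketch below uses Python's standard `unicodedata` module and the letter bā' (my choice of example, not taken from the tattoo) to list the four shapes that Unicode keeps as compatibility characters, and to show that all of them fold back to the same stored letter.

```python
import unicodedata

base = "\u0628"  # the letter ba as it is normally stored
base_name = unicodedata.name(base)  # official Unicode name of that letter

forms = {}
for position in ("ISOLATED", "FINAL", "INITIAL", "MEDIAL"):
    # Unicode names the four shapes "<letter name> <POSITION> FORM".
    forms[position] = unicodedata.lookup(f"{base_name} {position} FORM")

for position, shape in forms.items():
    # NFKC normalization maps each shape back to the plain letter.
    folded = unicodedata.normalize("NFKC", shape)
    print(position, f"U+{ord(shape):04X}", "folds to base letter:", folded == base)
```

Software that skips the shape selection step, or a font that lacks the connected shapes, ends up showing only isolated forms, which is one plausible explanation for results like the tattoo above.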
Further, computers have had a terribly hard time figuring out that some cultures write from right to left instead of left to right. Thus, when typing, the letters could turn out in the exact opposite order. Here is an example from Baltimore-Washington Airport:
Arabic fail at BWI security lane (wrong direction and letters not connected) pic.twitter.com/OU3BboEvTW
— Pinboard (@Pinboard) April 11, 2017
Or this:
Absolute gibberish. Very shoddy from @BarbicanCentre for this Arabic poster. Disjointed, unreadable, left to right. pic.twitter.com/Ldel7fyUxt
— Joseph Willits (@josephwillits) June 1, 2015
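The direction problem can be sketched in a few lines as well. Text is stored in reading order, first letter first, and it is up to the display software to put that first letter on the right for a right-to-left script. The toy function below is my own illustration (the names `is_rtl` and `left_to_right_slots` are hypothetical) and only handles a run of purely right-to-left letters; real software follows the far more elaborate Unicode Bidirectional Algorithm.

```python
import unicodedata


def is_rtl(ch):
    # "R" and "AL" are the Unicode bidirectional classes of right-to-left letters.
    return unicodedata.bidirectional(ch) in ("R", "AL")


def left_to_right_slots(text):
    """Return the characters in the order they fill the screen from left to right.

    Toy rule: a run consisting only of right-to-left letters is laid out reversed,
    so that the first stored letter ends up rightmost.
    """
    if text and all(is_rtl(ch) for ch in text):
        return text[::-1]
    return text


word = "\u0645\u0627"  # mim, alif: stored in reading order

naive = word                         # direction-unaware: first letter lands leftmost
correct = left_to_right_slots(word)  # direction-aware: first letter lands rightmost

print([f"U+{ord(c):04X}" for c in naive])
print([f"U+{ord(c):04X}" for c in correct])
```

A sign or poster produced with the naive placement reads backwards, as in the examples above.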
Lastly, computers seemingly have trouble encoding Arabic. Encoding means that whatever is visible on the screen corresponds to a string of 0s and 1s that a computer can actually store, and that this string is stable, so that the text can be copied, reused somewhere else, and still be the same. Examples will make this clear.
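As a small illustration of what a stable encoding looks like when it works, the sketch below turns a word into bytes and back. Two things here are my assumptions: the spelling of hādhihi as the three letters hā', dhāl, hā', and the choice of UTF-8 as the encoding.

```python
# ha, dhal, ha: an assumed spelling of "hadhihi"
word = "\u0647\u0630\u0647"

# The stored form: a sequence of bytes, i.e. groups of eight 0s and 1s.
encoded = word.encode("utf-8")
print(" ".join(f"{b:02x}" for b in encoded))  # the bytes in hexadecimal
print(" ".join(f"{b:08b}" for b in encoded))  # the same bytes as 0s and 1s

# A stable encoding survives the round trip: decoding gives back the same text.
decoded = encoded.decode("utf-8")
print(decoded == word)
```

When copying and pasting yields something other than the original word, it is this round trip that has broken somewhere along the way.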
In the image above, we can see that if we select the word hādhihi, copy it, and then paste it into the...