Arabic on the Internet: History of Arabic on Computers

This page provides a historical overview of how computers handled Arabic in the past, and how they handle it at present.

Arabization in the computer world was initially a hardware vendor activity. Vendors either had in-house resources (systems engineers) or subcontracted others to do the job for them.

Some vendors established centers of expertise to develop Arabic support, such as NCR-ALIF (Arabic Language Integration Facility), first in Nicosia, Cyprus, and then in Cairo, Egypt. IBM and ICL had similar facilities.

Each major vendor had several standards as improvements were gradually made.

7-bit In Lower Case

Initially, Arabization was very rudimentary. The lowercase English characters were replaced by Arabic characters and stored in 7 bits. Lowercase English was not used much on computers in the 1960s and 1970s, until UNIX came into being. Some printers could not even print lowercase English letters, which made the replacement somewhat acceptable in those days. Texas Instruments minicomputers used this scheme for Arabization in Egypt in the early 1980s.

Limited Character Set

8-bit character sets were used next. These placed the Arabic characters in the upper half of the 8-bit table, above ASCII (128 decimal / 80 hexadecimal and above).

Initially, this was somewhat primitive, with just one representation for the several different shapes an Arabic character could have. For example, the second letter in the Arabic alphabet, BA ب, and the letters like it (TA ت, THA ث) were represented only by the beginning-of-word shape, which also served as the end-of-word shape.

An example from this era is NCR-64. It provided limited shaping: the three letters above had only one variant each, instead of separate shapes for the beginning, middle, and end of a word. The 'Ein ع and Ghein غ letters had only two variants instead of the usual four.

Moreover, this character set catered for Farsi (Persian) as well, and had the 4 extra letters for the Gaf, Pa, Ja, and Tch sounds. It also had a sorting problem: for some unknown reason, it placed the Waw و before the Ha' ه!

Enhanced Character Set

As improvements were made, richer character sets came into being. They had more shapes and were thus more visually acceptable.

An example is NCR-96. Storage was still 7-bit most of the time, and terminals used SI/SO (the Shift In and Shift Out control characters) to switch languages. Automatic shaping was done in the terminal's ROM, which was a step forward. For printing, applications had to call special system routines to do the automatic shaping (called Context Analysis at NCR).
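
The SI/SO switching described above can be illustrated with a small Python sketch. The NCR-96 code table is not given on this page, so the Arabic table used here is an assumption: each 7-bit code is mapped to the ISO 8859-6 byte with the high bit set, purely for illustration. The names arabic_from_7bit and decode_shifted are hypothetical. The sketch follows the usual convention that SO (0x0E) selects the alternate set and SI (0x0F) returns to the default one.

```python
SO = 0x0E  # Shift Out: switch to the alternate (here Arabic) set
SI = 0x0F  # Shift In: switch back to the default (Latin) set


def arabic_from_7bit(code: int) -> str:
    """Map a 7-bit code to an Arabic character.

    ASSUMPTION: the table is ISO 8859-6 with the high bit stripped,
    used only for illustration; it is not the real NCR-96 layout.
    """
    try:
        ch = bytes([code | 0x80]).decode("iso8859_6")
    except UnicodeDecodeError:
        return chr(code)  # no Arabic character assigned to this slot
    # Keep only results from the Arabic block; everything else stays ASCII.
    return ch if "\u0600" <= ch <= "\u06FF" else chr(code)


def decode_shifted(data: bytes) -> str:
    """Decode a 7-bit stream in which SO/SI toggle the active language."""
    out = []
    arabic = False
    for code in data:
        if code == SO:
            arabic = True
        elif code == SI:
            arabic = False
        elif arabic:
            out.append(arabic_from_7bit(code))
        else:
            out.append(chr(code))
    return "".join(out)


# The same bytes "HGH" read as Latin outside SO/SI and as Arabic inside.
print(decode_shifted(b"ID \x0eHGH\x0f OK"))
```

The point of the sketch is that the stored bytes never exceed 7 bits: the meaning of a byte depends entirely on the most recent shift character seen before it.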

Still, every vendor had their own character sets, and interchanging data was quite a chore, just like the pre-ASCII days in the 1950s and 1960s in the West.

IBM EBCDIC and its idiosyncrasies

IBM, since it used EBCDIC, had a peculiar character set in which Arabic and English were interspersed. Checking whether the 8th bit was on or off did not guarantee that a character was an Arabic letter. Sorting was problematic too.

Standardized Character Sets

Finally, the era of standards arrived. These were cross-vendor standards accepted by the industry and set by standards bodies.

ASMO 449

One of the first vendor-independent standards was ASMO 449 (Arabic Standards and Measurements Organization). However, it was still 7-bit and required escape characters (normally the plain-text braces { and }, which were rather odd to use as escape codes on a terminal).

ASMO 708

By the time ASMO 708 was introduced, things were getting better. This was a truly transparent, 8-bit, vendor-independent standard.

Terminals and terminal emulators were now sophisticated enough to do the shaping and work in 8 bits. They distinguished between Latin and Arabic automatically by whether the 8th bit was set, and obeyed certain escape sequences to shift the keyboard language and direction as well as the screen language and direction.

Printing was done in the firmware of the printers (e.g. from Alis). Where there were no Alis printers, the UNIX spooler did the work through a custom filter, written in C, that performed the context analysis.
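
As a rough illustration of what such a context-analysis filter has to do, here is a minimal Python sketch. It is not the C filter mentioned above, whose source is not shown on this page. It assumes Unicode text rather than a vendor character set, and it picks isolated, initial, medial, or final glyphs from the Unicode Arabic Presentation Forms-B block, using the decomposition data in Python's unicodedata module. The names build_forms and shape_word are hypothetical, and ligatures such as Lam-Alef and diacritics are deliberately ignored.

```python
import unicodedata


def build_forms():
    """Collect the positional glyphs available for each base letter.

    Scans Arabic Presentation Forms-B (U+FE70..U+FEFC). Entries such as
    '<initial> 0628' say which base letter a glyph is a form of.
    """
    forms = {}
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Skip unassigned slots and multi-character forms (e.g. Lam-Alef).
        if len(parts) != 2:
            continue
        tag, base = parts[0].strip("<>"), chr(int(parts[1], 16))
        forms.setdefault(base, {})[tag] = chr(cp)
    return forms


FORMS = build_forms()


def shape_word(word: str) -> str:
    """Replace each letter with the glyph that fits its neighbours."""
    shaped = []
    for i, ch in enumerate(word):
        own = FORMS.get(ch)
        if own is None:  # not a letter in the table: copy unchanged
            shaped.append(ch)
            continue
        prev = FORMS.get(word[i - 1], {}) if i > 0 else {}
        nxt = FORMS.get(word[i + 1], {}) if i + 1 < len(word) else {}
        # A join needs the earlier letter to connect forward (it has an
        # initial form) and the later letter to connect back (final form).
        joins_prev = "initial" in prev and "final" in own
        joins_next = "initial" in own and "final" in nxt
        if joins_prev and joins_next:
            tag = "medial"
        elif joins_prev:
            tag = "final"
        elif joins_next:
            tag = "initial"
        else:
            tag = "isolated"
        shaped.append(own.get(tag, ch))
    return "".join(shaped)


# BA + ALEF + BA: initial BA, final ALEF, then an isolated BA,
# because ALEF never connects to the letter that follows it.
print([hex(ord(c)) for c in shape_word("\u0628\u0627\u0628")])
```

The output stays in logical (typing) order. A real printer filter would also have to reverse the Arabic runs for a left-to-right print head, which this sketch does not attempt.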

ASMO 708 was adopted by the International Organization for Standardization (ISO) as the ISO 8859-6 standard.
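
Because ISO 8859-6 is available in Python as the built-in iso8859_6 codec, the 8th-bit test that terminals relied on can be tried directly. The following is only a sketch of the idea, not how any particular terminal implemented it, and the function name split_runs is hypothetical.

```python
def split_runs(data: bytes):
    """Split ISO 8859-6 bytes into Latin and Arabic runs by the 8th bit."""
    runs = []
    for b in data:
        kind = "arabic" if b & 0x80 else "latin"
        if runs and runs[-1][0] == kind:
            runs[-1][1].append(b)  # same language: extend the current run
        else:
            runs.append((kind, bytearray([b])))  # language changed: new run
    return [(kind, bytes(chunk).decode("iso8859_6")) for kind, chunk in runs]


encoded = "Cairo القاهرة".encode("iso8859_6")
for kind, text in split_runs(encoded):
    print(kind, repr(text))
```

Note that the space, like the digits and ASCII punctuation, has its 8th bit clear, so a space between two Arabic words would be reported as a short "latin" run. Real terminals needed extra rules for such neutral characters.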

Still, in some cases vendors did silly things: NCR and ICL each had their own ASMO 708 derivative. For example, customers in Egypt insisted that the Lam-Alef لا be one character, not stored as a separate Lam ل and Alef ا! Also, the Ya' ي at the end of a word, with two dots under it, was virtually unknown in Egypt, and the version without the dots (actually an Alef sound) was used instead. This led to NCR-ASMO and similar vendor variants.

Other approaches

ICL DRS-80 system

Even in the mid-1980s, some companies did get Arabization right, such as ICL (later bought by Fujitsu). Its DRS-80 system had totally seamless Arabization. One could enter Arabic right in the program code using the editor, be it COBOL or BASIC, a feat which was uncommon on other systems in those days, where development environments, editors, and other programmers' tools could not handle 8-bit input.

This system also had other features, such as calling programs written in one language from another (COBOL and BASIC, for example) and the ability to recover from power failures via a dump facility.

The PC Revolution

Microsoft Code Pages

The advent of the PC in the early 1980s also influenced how Arabization (and other internationalization) was done.

Several code pages were developed by Microsoft, including CP 720 for Arabic DOS. Later, when Windows arrived, another code page was developed: Windows CP 1256.
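
To see why moving data between these 8-bit encodings needed conversion, the sketch below asks Python's standard codecs (iso8859_6, cp720, and cp1256) for the byte each one assigns to a single Arabic letter. The helper name byte_values is hypothetical, and the letter chosen is just an example.

```python
LETTER = "\u0637"  # ط ARABIC LETTER TAH


def byte_values(ch, encodings=("iso8859_6", "cp720", "cp1256")):
    """Return the single byte each code page uses for one character."""
    return {name: ch.encode(name) for name in encodings}


for name, raw in byte_values(LETTER).items():
    print(f"{name:10} 0x{raw[0]:02X}")
```

In ISO 8859-6 this letter is byte 0xD7, while in Windows CP 1256 it is 0xD8, so a file written under one and read under the other shows the wrong letters unless it is converted first.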

Unicode

Unicode finally arrived. It was an international standard, and neither vendor specific nor developed by Arabic standards organizations.

As originally designed, the scheme used 2 bytes to represent every character on earth, from Arabic to Chinese.

UTF-8

Because of the limitations of 2-byte Unicode, some scientists developed a scheme called UTF-8, which is basically Unicode encoded so that existing ASCII still takes just one byte (8 bits).
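
The difference is easy to observe with Python's built-in codecs. The short sketch below compares how many bytes one Latin, one Arabic, and one Chinese character take in a fixed 2-byte encoding (UTF-16, used here as a stand-in for the original 2-byte scheme) and in UTF-8. The function name encoded_sizes is hypothetical.

```python
def encoded_sizes(text):
    """Return (character, bytes in UTF-16, bytes in UTF-8) for each character."""
    return [
        (ch, len(ch.encode("utf-16-be")), len(ch.encode("utf-8")))
        for ch in text
    ]


for ch, two_byte, utf8 in encoded_sizes("Aب中"):
    print(f"U+{ord(ch):04X}  2-byte scheme: {two_byte}  UTF-8: {utf8}")
```

ASCII text stays at one byte per character in UTF-8 and is byte-for-byte identical to plain ASCII, while an Arabic letter takes two bytes and a Chinese character three.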

You can read How UTF-8 was born, by Rob Pike.

Also, Joel Spolsky has an article on the use of Unicode in software from a developer's perspective. Remember that this is just a standard for encoding, not presentation. Therefore, it is not Arabic-specific, nor does it address the problems with direction, but it is a standard nonetheless, and hence very useful in general.

Still other developers have proposed other encoding systems.
