Saturday, March 06, 2004

Linux Your Way

One of the most important features of Free Software in general and Linux in particular is that anybody, including user groups, governments, educational institutions, commercial software companies, or NGOs, can make changes and distribute the new version. This is particularly important for language communities that commercial vendors ignore, on the theory that the market for users of the language is too small to give a good return on the investment needed. The Linux community looks at these things differently. Localizing Linux into a language is profitable for the language community, and doesn't have to be profitable for a vendor to be worthwhile and to attract the needed effort.

The most notorious case was Icelandic. Although the government of Iceland offered to pay the costs for localizing Windows 98 into Icelandic, Microsoft refused.

"Spokeswoman Erin Brewer notes that while the company has translated the popular program into 'at least 30 languages,' including such rarities as Slovenian and Catalan, it won't be doing Icelandic. 'We are not localizing Windows 98 into Icelandic due to the size of the market,' she said."

Of course, it was possible to use a non-Icelandic version of Windows and MS Office to create, view, and print documents in Icelandic and at least 100 other languages, most of them not supported by Microsoft. With specialized software from the linguists at SIL.org, it is possible to work in about 2,000 languages on English Windows. But that only works for those who have learned English or some other language supported in the user interface.

Since then, Microsoft has decided that Iceland, with only 230,000 inhabitants, is big enough, and recent versions of Windows are available in Icelandic. But not in several languages of India with tens of millions of speakers, not in Swahili, the lingua franca of much of Africa, and not in dozens of other languages vital to whole countries and regions.

On the other hand, Ankur, a Linux User Group in the state of West Bengal, India, has released Bengalinux, a Linux distribution entirely localized into Bangla (Bengali). A group in Rwanda is localizing Open Office into their language, Kinyarwanda, while university students in Tanzania are working on a Swahili version, and so on for a number of other languages.

In many cases, only a portion of Linux has been localized in the current release, and more remains to be done. This is the case for a Berber-language distribution based on Mandrake Linux, developed in France for Berber speakers, most of whom live in several North African countries.

Mandrake, itself headquartered in France, has had significantly better language support than other commercial Linux distributions, with versions in 63 languages at varying degrees of completeness. Some of the notable localizations of Mandrake Linux are in Albanian, Armenian, Azerbaijani, Basque, Esperanto, Estonian, Georgian, Icelandic, Irish Gaelic, Kurdish, Latvian, Lithuanian, Macedonian, Mongolian, Malay, Maltese, Romanian, Slovak, Slovenian, Tajik, Uzbek, Walloon, and Welsh.

Some projects turn the problem around. Instead of localizing all of Linux into one language, they work on localizing a particular function into a number of languages. A notable example is MailAfrica, which plans to make e-mail practical in 257 languages of Africa, out of about 2,000. It currently has Afrikaans, Dholuo, Hausa, Kalenjin, Kamba, Kikuyu, Kisii, Luhya, Oromoo, siSwati, Swahili, isiXhosa, Yoruba, and isiZulu, as well as English, French, and Portuguese.

It is not only small, poor countries and minority groups without Windows language support that are creating their own distributions of Linux. China, Korea, and Japan have gotten together on a plan to create a distribution with good support for all three languages, all of which use Chinese characters in their writing. This goes well beyond existing distributions, such as Chinese 2000 Linux (中文 2000), which supports only Chinese, and others for Korea or Japan only.

The first obstacle in localizing Linux to a new language is usually creating the computer vocabulary. Once this is done, a few hundred people can translate a major piece of software, such as Open Office, with more than 21,000 text strings, in a few days to produce a version suitable for beta testing. Some refinement of the translations is usually needed so that they are not only linguistically correct but culturally appropriate, and to deal with ambiguities in the original. Translating a complete Linux distribution takes proportionally longer, but is well within the scope of any university's Computer Science, Engineering, and Linguistics departments.

Complete language support goes far beyond localization. Besides the ability to create, view, and print data in a language within localized software, it should include OCR, text-to-speech conversion, handwriting recognition, speech recognition, a spelling checker, and a grammar checker. There is Free Software for most of these functions, adaptable to any language and writing system, and people are working on the rest. For example, the Dhvani text-to-speech project is hosted at SourceForge, where anybody can join the effort, either to work on the software or to apply it to a particular language. The handwriting recognition system used on the Simputer uses text files to define character geometry. The format is public; in fact, it is explained in the files themselves. Again, anybody can write a file for a particular writing system, or for a variant of a writing system used by a particular language, such as adding the extra Cyrillic letters for Ukrainian, the extra Arabic letters for Urdu, or the extra Latin letters for the Pan-African Alphabet. Speech recognition and grammar checking require specific linguistic expertise, of course, but there is probably adequate information on record to handle the thousand most-used languages in the world, and a good start on the other 5,000 or so.
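As a rough illustration of what a "variant of a writing system" amounts to, here is a small Python sketch that treats each alphabet as a set of letters and computes which letters Ukrainian adds to the Russian Cyrillic alphabet. It is a hypothetical example only: it does not reproduce the Simputer's character-geometry file format, and the names `RUSSIAN`, `UKRAINIAN`, and `extra_letters` are invented for this post. Only lowercase letters are listed, to keep it short.

```python
import unicodedata

# Hypothetical illustration only: this is NOT the Simputer file format.
# Each alphabet is modeled as a set of lowercase letters. NFC normalization
# keeps letters such as "й" and "ї" as single precomposed code points.
RUSSIAN = set(unicodedata.normalize("NFC", "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"))
UKRAINIAN = set(unicodedata.normalize("NFC", "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"))


def extra_letters(variant, base):
    """Return the letters of `variant` that are missing from `base`, sorted by code point."""
    return sorted(variant - base)


if __name__ == "__main__":
    # Print the code point, the letter itself, and its official Unicode name.
    for ch in extra_letters(UKRAINIAN, RUSSIAN):
        print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

Run as a script, it lists the four letters є, і, ї, and ґ. The same set subtraction works for any base script and language variant, for example an Arabic base set and the Urdu alphabet.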

Language support is essential for helping the poor in general, and in particular for recording the oral traditions of more than 6,000 cultures that have not had much access to printing and publishing in the past.

The Free Software Foundation and UNESCO have put together a directory of Free Software, including a page of language and localization tools.

As far as I can tell, nobody is specifically tracking all of the Linux localization projects, although many of them are listed at DistroWatch. Here are some more.

South Africa, the 11 official national languages Afrikaans, English, Ndebele (isiNdebele), Northern Sotho (Sepedi), Southern Sotho (Sesotho), Swati (siSwati), Tsonga (Xitsonga), Tswana (Setswana), Venda (Tshivenda), Xhosa (isiXhosa), Zulu (isiZulu): IMPI Linux
Nigeria, Yoruba: Paradigm Initiative Nigeria
North Africa and Middle East, Arabic: Arabbix
North Africa and Middle East, Arabic: Hancom Linux
North Africa and Middle East, Arabic: Haydar Linux

Turkey, Turkish: Gelecek Linux (site in Turkish)
Israel, Hebrew: GNU/Linux Kinneret
Iran, Farsi: Shabdix GNU/Linux (In Farsi)

India, Hindi, Marathi, Bengali, Gujarati, Punjabi, Oriya, Kannada, Malayalam, Tamil, Telugu: the Indian Linux Project
India, Hindi, Marathi, Sanskrit, Assamese, Bengali, Gujarati, Punjabi, Oriya, Kannada, Malayalam, Tamil, Telugu: IndiX II
India, various: Free Software i18n and l10n Projects
India, Punjabi: PunLinux
China, Chinese: Cosix Linux (site in Chinese), Lineox Enterprise Linux, Linpus Linux, Magic Linux (site in Chinese), Red Flag Linux, ThizLinux, Xteam Linux (site in Chinese)
Korea, Korean: Hancom Linux, NuxOne Linux (site in Korean), WOWLinux (site in Korean)
Japan, Japanese: Happy MacLinux for 68000 and PPC Macintosh, Holon Linux for X86 and PPC (site in Japanese), Linux Media Lab Distribution (site in Japanese), ARMA aka Omoikane GNU/Linux (site in Japanese), Vine Linux
Laos, Laotian: Laonux; Jhai IT
Vietnam, Vietnamese: VietKey Linux (site in Vietnamese), vnlinux (site in Vietnamese)
Thailand, Thai: Burapha Linux, Linux TLE (site in Thai), Phayoune Linux
Malaysia, Malay: PIKOM people's computer
Mongolia, Mongolian: Soyombo Mongolian Linux

Philippines, Tagalog: Bayanihan Linux 3.0
Indonesia, Bahasa Indonesia: WinBi, Bijax (Bina Nusantara Bluejackets Linux), Trustix Merdeka

Russian Federation, Russian: ASP Linux, Linux XP Professional Edition (site in Russian)
Ukraine, Ukrainian: ASP Linux
Bulgaria, Bulgarian: Tilix Linux (site in Bulgarian)
Languages written in Cyrillic: Blin Linux (site in Russian)
Spain, Aragonese: Augustux
Spain, Catalan: Biadix (site in Catalan)
Latvia, Latvian: LIIS Linux (site in Latvian)
Nordic and Baltic languages, Danish, Estonian, Finnish, Faroese, Icelandic, Latvian, Lithuanian, Northern Sami, Norwegian Bokmål, Norwegian Nynorsk, Swedish, and US English: NordisKnoppix
Slovenia, Slovenian: Slix (site in Slovenian)
Greece, Greek: Zeus Linux

Multilingual Braille and speech: BrlSpeak
Audio for the blind: Oralux

This list is certainly not complete. It leaves out large numbers of projects to enable text entry, viewing and printing in various languages without localizing the user interface, and many more that aim to localize some subset of Linux or a specific application, but not a complete distribution.

We obviously need a lot more such projects. This ties in with the idea of getting Linux into universities, and getting them and the Linux User Groups to do the work, thus providing their members with employment opportunities and experience.
