How do I program for other-than-English?
Introduction. Explain stuff.
Lots of description and references.
HTML+
[Is this just a glossary?]
- multi-byte
- UTF-8 (UCS Transformation Format, 8-bit)
- ISO 8859
- ISO 8859-1
- ISO-2022-JP
- wide
- ISO 10646. Unlikely ever to become common; apparently destined
for special purposes.
- Unicode
- Shift-JIS
- JIS X 0208
- localization (ANSI ...--refer to standards)
- ISO 639:1988 "Code for the representation of names
of languages" See also ISO CD 11639
- multi-lingual
- multilocalized
- [reflexive]
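To make several of the glossary terms concrete, here is a minimal sketch in present-day Python (which postdates this survey) that encodes one short Japanese string under some of the encodings listed above and compares byte counts. The helper name `encoded_sizes` and the sample string are illustrative assumptions, not part of any standard.

```python
# Sample text: three kanji, written with escapes so the source stays ASCII.
SAMPLE = "\u65e5\u672c\u8a9e"  # "nihongo", the Japanese word for "Japanese language"


def encoded_sizes(text, encodings):
    """Return {encoding name: byte length}, or None where the text cannot be encoded."""
    sizes = {}
    for name in encodings:
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # The character set behind this encoding has no such characters.
            sizes[name] = None
    return sizes


sizes = encoded_sizes(SAMPLE, ["utf-8", "shift_jis", "iso2022_jp", "latin-1"])
for name, size in sizes.items():
    print(name, size)
```

UTF-8 and Shift-JIS are both multi-byte encodings but spend a different number of bytes per kanji; ISO-2022-JP adds escape sequences to switch character sets, so it comes out longer than Shift-JIS; ISO 8859-1 ("latin-1") cannot represent the text at all.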
Desirable characteristics:
- high-level
- actively supported ("keep up with times"--different meaning
for Forthians and CLOSites vs. commercial ...)
- good for prototyping
Ada
With Ada95, wide characters have become part of the standard
for Ada. Ada does not have a good reputation for prototyping.
CLOS
We describe CLOS with other Lisps.
Dylan
Forth
The new ANSI Forth Standard does not specify the size of
a "character" precisely in order to allow future changes.
Forth is good for prototyping, and actively supported.
Functional languages
Typically allow user redefinition of character ... Haskell,
ML, Gofer, ...
There are people working to make a notion
of "Character Set Profile" part of the standard for ML.
Icon
Lisp
Good for prototyping. Different opinions about level of support.
Common Lisp
Common Lisp is monstrous. Lucid's Common Lisp supports
double-byte characters. Kyoto Common Lisp is public
domain; does it support kanji?
CLOS is reflexive.
Self?
NewtonScript
Unicode based.
Perl
Larry Wall et al. want to support all characters, but they want
even more not to implement such support incorrectly. They continue
to study the problem.
PL/I
PL/I is said to support double-wide characters.
Python
Eight-bit clean. Waiting for compiler support [explain].
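As an illustration of what "eight-bit clean" means, the sketch below contrasts a channel that preserves all eight bits of each byte with one that clears the top bit. It is written in present-day Python, not the Python of this survey; the function names `pass_through` and `strip_high_bit` are hypothetical and exist only for this example.

```python
def pass_through(data):
    """Model an eight-bit-clean channel: every byte value 0-255 survives."""
    return bytes(data)


def strip_high_bit(data):
    """Model a 7-bit channel: the top bit of every byte is cleared."""
    return bytes(b & 0x7F for b in data)


text = "caf\u00e9"                 # "cafe" with an acute accent on the e
latin1 = text.encode("latin-1")    # the accented e becomes the single byte 0xE9

clean = pass_through(latin1).decode("latin-1")
damaged = strip_high_bit(latin1).decode("latin-1")

print(clean == text)   # the eight-bit-clean path returns the original text
print(damaged)         # 0xE9 & 0x7F == 0x69, so the accented e turns into "i"
```

Eight-bit cleanliness only guarantees that bytes are not damaged in transit; it says nothing about whether the program knows which characters those bytes represent.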
PostScript
Who knows?
Scheme
We describe Scheme with other Lisps.
ScriptX
See Kaleida.
Smalltalk
Well-supported, industrial strength. Reflexive. Supports
international characters.
Quasar Knowledge Systems supplies SmalltalkAgents, which supports
Unicode. It ships now on Macintosh, with promises of Microsoft
Windows and Unix for the future.
Telescript
General Magic is secretive, from the perspective of us individual
developers.
TCL
Eight-bit clean.
Describe.
Linux
Oberon
Plan-9
Plan-9 is Unicode-only, with UTF-8 encoding in the filesystem
and 16-bit "runes" in the programs.
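The split between a byte-oriented external form and a fixed-width internal form can be sketched as follows. This is Python, not Plan 9's own C library; `to_runes`, `from_runes`, and `fits_16_bit` are hypothetical names chosen for this illustration.

```python
def to_runes(utf8_bytes):
    """Decode stored UTF-8 bytes into a list of integer code points ("runes")."""
    return [ord(ch) for ch in utf8_bytes.decode("utf-8")]


def from_runes(runes):
    """Encode a list of code points back into UTF-8 bytes for storage."""
    return "".join(chr(r) for r in runes).encode("utf-8")


def fits_16_bit(rune):
    """True if the code point can be held in a 16-bit rune."""
    return 0 <= rune <= 0xFFFF


# Mixed Latin and kanji text as it might sit in a file: variable-width bytes.
stored = "na\u00efve \u65e5\u672c".encode("utf-8")

runes = to_runes(stored)
print(len(stored), "bytes on disk,", len(runes), "runes in memory")
print(all(fits_16_bit(r) for r in runes))
```

On disk each character occupies one to three bytes here, while in memory every character is one fixed-width integer, which makes indexing and comparison simple at the cost of space.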
Taligent
Describe.
Windows NT
[Explain.]
KenD@apple.com (Ken Dickey)
Charles Fiterman
mfx@cs.tu-berlin.de (Markus Freericks)
patrick_d_logan@ccm.jf.intel.com (Patrick D. Logan)
mcdonald@kestrel.edu (Jim McDonald)
"Steven D. Majewski"
lmaturo@ix.netcom.com (Lawrence Maturo)
torbenm@diku.dk (Torben AEgidius Mogensen)
jvn@fermi.clas.Virginia.EDU (Julian V. Noble)
rommel@hermes.bc.edu (Martin Rommel)
cschell@world.std.com (Kate M Schell)
schwartz@galapagos.cse.psu.edu (Scott Schwartz)
sparre@meyer.fys.ku.dk (Jacob Sparre Andersen)
Cameron Laird's survey of computing environments for managing
other-than-English texts / claird@phaseit.net