Full Sail: Power User Tips
Browsing with Character.
Which browser has the most character(s)?
Which type of system can display the greater number of different characters, a text-based system or a graphical system? Which can handle the greater number of different characters, Lynx or the Netscape that you get with Sympatico? The answer to the first question is obviously "a graphical system". Surprisingly enough, the answer to the second question is "Lynx". Most Windows-based graphical browsers can handle only the single Windows character set installed as their default (and handle even that incorrectly). Usually that is an incomplete version of the "cp1252" or "iso-8859-1-windows-3.1-latin-1" character set, missing the 's', 'S', 'z', and 'Z' with caron accents and missing the new Euro symbol. The Unicode numeric entities for the Windows characters may or may not be recognised properly, but the entity names usually are not. The numeric HTML character entity "&#8230;" may display as a horizontal ellipsis, '…', but even the latest browsers at the MT&T/Sympatico display will not recognise the synonymous entity name "&hellip;". Lynx does recognise it: it uses the appropriate character if you have it available, or displays '...' if that character is not available on your machine (and lynx has been configured correctly). With graphical browsers, if a web page specifies an HTML numeric character entity that is not in the limited character set installed, the browser just displays '?' for the character, and HTML entity names are printed in full without rendering or translation. Even the characters that are in the cp1252 character set are not properly recognised if the standard HTML entity numbers or names are used for them instead of the invalid "&#128;" to "&#159;" range that Windows-based web-design software is prone to use. Lynx, on the other hand, can determine (if it is told) which characters your system can display and can substitute an approximation. If your PC does not have the appropriate character, Lynx can:
What do you mean when you write "if it is told"?
<OLD FOGEY> In the "old days" we didn't have all of those characters. We had to make do with sixty-four characters -- upper-case letters, the digits '0' to '9', and some punctuation characters. And we considered ourselves lucky to have them. </OLD FOGEY>
As bits in computers became cheaper, most computer character sets were expanded to include lower-case letters as well as upper-case. Each computer company used its own proprietary encoding scheme. IBM's EBCDIC encoding was developed to minimize the computing needed to convert punch-card data to characters and vice versa. Other companies preferred an encoding scheme that made sorting character data easier. Everyone went their own way, making the exchange of data difficult until the ASCII standard was developed. Then only IBM dared to be different, still using EBCDIC. The others used the ASCII standard, which specified the encoding of 128 printable and control characters using seven bits.
Then integrated circuits and microprocessors were invented, and personal computers came on the market that used eight-bit bytes. It was convenient to use all eight bits to encode characters. Most manufacturers defined the 95 codes from 32 to 126 to display the same printable characters as were defined by the ASCII character set. Each one, however, chose to display a completely different set of characters for the rest of the possible codes. Some displayed graphical symbols and some included accented letters to make their computers more useful to non-English users. Some, like IBM, chose a mixture of accented characters and characters suitable for producing charts and forms with boxes and bar charts. At the time Windows was being developed, an international standard for the use of the upper 128 characters out of the possible 256 had emerged.
Microsoft chose to incorporate this standard character set into Windows and then to add a few extra characters in the space reserved for high control codes. Different languages have different needs for special characters, so someone (I think at IBM, but I am not sure) came up with the idea of special "code pages" for the computer. While similar in the lower half of the character set, the code pages differed in which characters they provided in the upper half.
An international effort has also been made to catalog and define codes for all the characters needed for various languages. This standard is called "Unicode", and it has become the standard for the Internet. Not all languages have been included in the standard yet, but more are added with each new version. The characters for writing Cherokee are a recently approved addition, and proposals for other additions include Egyptian Hieroglyphics and the fictional Klingon Alphabet.
While different computers have different character sets depending on the manufacturer or the installed code pages, almost all of the commonly used characters have found their way into the Unicode standard. It is possible, then, to determine which characters your computer can display and to present you with those when they appear on a web page. Those that your computer does not support can be approximated. Lynx was written with all of this in mind -- at least for all of those computers that have defined code pages or fixed character sets that lynx has been informed about. As an example, "&#12363;&#12431;&#12373;&#12365;" ("かわさき", "Kawasaki" in Japanese hiragana text) would show up as Japanese text on any computer that supported it, but lynx would display it as "kawasaki" unless it was told that you were using a Japanese syllabic code page. Most graphical browsers, however, would default to displaying "????" if they were not specially configured to display Japanese.
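The idea of substituting an approximation (instead of the '?' that graphical browsers show) can be illustrated with a minimal Python sketch. It is not how lynx is implemented: render, APPROXIMATIONS and KAWASAKI are hypothetical names, the approximation table holds only the few characters discussed in this column, and Python's html.unescape and codec names stand in for lynx's own entity handling and character-set list. Each character is kept if the display character set can encode it; otherwise it becomes a known approximation, or '?' when approximation is switched off or none is known.

```python
import html

# Hypothetical approximation table: only the characters used as examples
# in this column. A real table would need far more entries.
APPROXIMATIONS = {
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u20ac": "EUR",  # EURO SIGN
    "\u0160": "S",    # LATIN CAPITAL LETTER S WITH CARON
    "\u0161": "s",    # LATIN SMALL LETTER S WITH CARON
    "\u017d": "Z",    # LATIN CAPITAL LETTER Z WITH CARON
    "\u017e": "z",    # LATIN SMALL LETTER Z WITH CARON
    "\u304b": "ka",   # HIRAGANA LETTER KA
    "\u308f": "wa",   # HIRAGANA LETTER WA
    "\u3055": "sa",   # HIRAGANA LETTER SA
    "\u304d": "ki",   # HIRAGANA LETTER KI
}


def render(markup: str, display_charset: str, approximate: bool = True) -> str:
    """Resolve HTML entities, then fit each character to the display charset.

    Characters the display charset can encode are kept as-is. Others are
    replaced from APPROXIMATIONS when approximate is True; otherwise, or
    when no approximation is known, they become '?'.
    """
    text = html.unescape(markup)  # "&#8230;" and "&hellip;" both become U+2026
    out = []
    for ch in text:
        try:
            ch.encode(display_charset)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(APPROXIMATIONS.get(ch, "?") if approximate else "?")
    return "".join(out)


KAWASAKI = "&#12363;&#12431;&#12373;&#12365;"  # hiragana for "Kawasaki"

print(render("Wait&hellip;", "ascii", approximate=False))  # Wait?
print(render("Wait&hellip;", "ascii"))                     # Wait...
print(render(KAWASAKI, "ascii"))                           # kawasaki
print(render(KAWASAKI, "shift_jis") == "\u304b\u308f\u3055\u304d")  # True
```

One difference from the browsers described above: html.unescape follows the later HTML5 rules, which give "&#128;" to "&#159;" their cp1252 meanings, so render("&#128;", "cp1252") keeps the euro sign.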
Some character sets your computer could have:
Code Page 437 (the standard IBM PC character set)
Code Page 850 (the IBM PC's multilingual character set)
The Windows character set.
The Macintosh character set.
You could be using a Cyrillic font.
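To see how character sets like these agree in the lower half and disagree in the upper half, a short Python sketch can decode the same byte with several of them. The codec names are Python's (cp437, cp850, cp1252, mac_roman, koi8_r), chosen here as stand-ins for the IBM PC, multilingual, Windows, Macintosh and Cyrillic sets; koi8_r is just one of several Cyrillic encodings a font might use, and the function names are hypothetical. The last check shows that ISO-8859-1 places codes 160 to 255 at the Unicode code points with the same numbers.

```python
# Python codec names standing in for the character sets listed above.
CODE_PAGES = ["cp437", "cp850", "cp1252", "mac_roman", "koi8_r"]


def lower_half_agrees() -> bool:
    """True if every code page decodes the 95 printable ASCII codes (32-126) as ASCII."""
    printable = bytes(range(32, 127))
    return all(printable.decode(cp) == printable.decode("ascii") for cp in CODE_PAGES)


def decode_everywhere(code: int) -> dict:
    """Decode one byte value with each code page; the upper half is where they differ."""
    return {cp: bytes([code]).decode(cp) for cp in CODE_PAGES}


def latin1_matches_unicode() -> bool:
    """ISO-8859-1 maps bytes 160-255 to the Unicode code points with the same numbers."""
    upper = bytes(range(160, 256))
    return upper.decode("iso-8859-1") == "".join(chr(c) for c in range(160, 256))


print(lower_half_agrees())  # True
for cp, ch in decode_everywhere(0x82).items():
    # ascii() keeps the output printable on any terminal.
    print(cp, ascii(ch))
print(latin1_matches_unicode())  # True
```

Byte 0x82, for instance, is 'é' in both IBM code pages, a low single quotation mark in cp1252, and 'Ç' in the Macintosh set.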
How do I configure lynx to get the right characters?
Configuring lynx to configure lynx.
<CONFESSION> I almost left this part out. I was so used to operating lynx in "advanced" mode that I forgot that not all of the configuration options are available to users operating in "novice" or "intermediate" mode. Thanks to a proofreader ("Hi, Barbara!"), the oversight was noticed before press time. </CONFESSION>
You have to have lynx set to "ADVANCED" user mode in order to have the options for changing your character set configuration available. If you are already operating in advanced mode, skip to "Actually changing the character set." below. If you are not (or are not sure), then:
1. Press 's' to get the Setup menu. If your User mode setting reads "User mode: [ADVANCED____]" then skip to the character set configuration.
2. If the setting reads "[NOVICE______]" or "[INTERMEDIATE]" then you have to select the User mode: field and press ENTER to get the pop-up menu for selecting your user mode. It will look something like this (but the selections may be in a different order):

      +--------------+
      | NOVICE       |
      | INTERMEDIATE |
      | ADVANCED     |
      +--------------+

3. Move to the "ADVANCED" choice in the pop-up window and press ENTER to select it.
4. Press '>' (shifted '.' on most PCs) to get to the bottom of the Setup form and select the "[Save Settings]" button. Press the ENTER key to save your setting. The next time you get into the Setup form you will have more options available, including the ones changed below.
Note: If you miss the menu and prompts at the bottom of the screen, you can always switch back to novice or intermediate mode once the character set changes below have been made. Just repeat the steps above, selecting the appropriate mode instead of "ADVANCED".
Actually changing the character set.
1. Press 's' to get the Setup menu. Part of it will look like this:

      Character set: [ISO Latin 1.........]
      Assume charset if unknown : [iso-8859-1....................]

(Your settings may vary, but these are usually the default settings.)
2. Use your down-arrow key to move to the "Character set:" field with the default [ISO Latin 1.........] setting and press your ENTER key to get a menu of the character sets supported. There are currently forty of them:
The most likely selections that users on the Chebucto Community Net will require are:
The rest of the possible selections are rather exotic, and I suspect that anyone using one of them will know it and select the appropriate one.
3. You can then scroll down to the "Assume charset if unknown :" field with its default setting of "[iso-8859-1....................]". Presumably, the selection here should match your "Character set:" setting, but I have found that some settings are changed to "iso-8859-1" when the configuration is saved, no matter what you select. The safest choice is probably the charset that corresponds to your "Character set:" setting; then let lynx change it if it wants to. The correspondence is:

      Character set:          Assume charset if unknown :
      ISO Latin 1             iso-8859-1
      ISO Latin 2             iso-8859-2
      Other ISO Latin         x-iso-8859-other
      WinLatin1 (cp1252)      iso-8859-1-windows-3.1-latin-1
      DEC Multinational       dec-mcs
      Macintosh (8 bit)       macintosh
      NeXT character set      x-next
      KOI8-R Cyrillic         koi8-r
      Chinese                 euc-cn
      Japanese (EUC)          euc-jp
      Japanese (SJIS)         shift_jis
      Korean                  euc-kr
      Taipei (Big5)           big5
      Vietnamese (VISCII)     viscii
      7 bit approximations    us-ascii
      Transparent             x-transparent
      IBM PC character set    cp437
      IBM PC codepage 850     cp850
      PC Latin2 CP 852        cp852
      DosCyrillic (cp866)     cp866
      DosArabic (cp864)       cp864
      DosGreek (cp737)        cp737
      DosGreek2 (cp869)       cp869
      DosHebrew (cp862)       cp862
      WinLatin2 (cp1250)      windows-1250
      WinCyrillic (cp1251)    windows-1251
      WinGreek (cp1253)       windows-1253
      WinHebrew (cp1255)      windows-1255
      WinArabic (cp1256)      windows-1256
      ISO Latin 3             iso-8859-3
      ISO Latin 4             iso-8859-4
      ISO 8859-5 Cyrillic     iso-8859-5
      ISO 8859-6 Arabic       iso-8859-6
      ISO 8859-7 Greek        iso-8859-7
      ISO 8859-8 Hebrew       iso-8859-8
      ISO 8859-9 (Latin 5)    iso-8859-9
      ISO 8859-10             iso-8859-10
      UNICODE UTF 8           unicode-1-1-utf-8
      RFC 1345 w/o Intro      mnemonic+ascii+0
      RFC 1345 Mnemonic       mnemonic

4. Once you have told lynx what character set you use and what to assume, press '>' (shifted '.' on most PCs) to get to the bottom of the Setup form and select the "[Save Settings]" button.
Press the ENTER key to save your settings.
If you have changed your settings while viewing a web page that might be affected by the change, you may find that the change is not visible yet: the page will still be displayed with the old settings, and pressing Control-R to reload the page may still use them. In that case, press the double-quote character (") twice to toggle lynx's double-quote parsing away from normal and back again. Lynx will then also use your new configuration settings when the page is re-rendered.
With your lynx configuration set, you may now wish to see how the upper characters 160 to 255 in the ISO-8859-1 character set are displayed on your computer.
That's all for now. Tune in to our next episodes,
You may direct comments or suggestions about this column to: Norman L. De Forest, af380@chebucto.ns.ca