On Sun, 18 May 1997, Edward Dyer wrote:

> Hello Norman!
[snip]
> The problem is this: There are two phases in retrieving a web document
> (i.e. anything that goes through http:)
>
> For a web document, the &#NNN; is supposed to be interpreted according to
> either ISO-8859-1 (ISO Latin-1) or in the proposed revision to use UniCode
> (which includes ISO 8859-1 as a subset in the first 256 characters) by the
> browser (Lynx if you dial in to CCN, your choice if you go by other
> routes.)
>
> But Lynx uses the setting "Display character set" to decide which 8-bit
> code to send to you to display the e-ague character.  In order for that to
> work two things are required: your communications software must be able to
> map that code to the appropriate glyph on your screen, which actually
> means the comm program must map the received code to a display font, and
> the display font must include the correct glyph at the correct code.
>
> On the other hand, I hypothesise that if Lynx is not viewing web pages,
                       ==================================================
> i.e. using file: access, the first stage of the interpretation does not
  =======================================================================
> occur, in the same way: Lynx displays the file according to your default
  ========================================================================
> character set, which is by default USASCII, and the character that is sent
  ==========================================================================
> to you has a different 8-bit value.  This is probably a Lynx BUG, the way
> we use it, but one might get some discussion on that.  Technically, it is
> probably an undefined behaviour.

Translation of a web page when viewing it or when printing it to a file
seems to be taking place whether it is a local file or a remote one.

> THE WORKAROUND: set the default character set to ISO 8859-1, and select
> fonts that will work for you.

I have tried a default document type of "NONE", "ISO-8859-1", and
"ISO-8859-2" with the display character set of ISO Latin 1 and there
is no difference in behaviour.  ISO Latin 1 seems to interpret characters
differently depending on whether a file is local or remote regardless of
document type selected.  Other display character sets don't have this
behaviour but are not fully usable with an ANSI font on my (borrowed) PC.

> THE DOWNSIDE: people receiving mail from you may see a note that you are
> using a different character set, unless they too are using 8859-1.
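
The two stages described in the quoted text (resolving &#NNN; to an ISO 8859-1/Unicode character, then choosing the 8-bit code for the selected display character set) can be sketched roughly as below. This is only an illustration in Python, not Lynx's actual code: the function name render_entity, the "?" fallback, and the choice of cp437 to stand in for a PC font are all assumptions made for the example.

```python
import html


def render_entity(entity: str, display_charset: str) -> bytes:
    """Hypothetical helper: map a numeric entity to the byte sent to the terminal."""
    # Stage 1: resolve the numeric character reference to a Unicode
    # character, e.g. "&#233;" -> U+00E9 (e with acute accent).
    char = html.unescape(entity)
    # Stage 2: encode that character in the display character set.
    # Falling back to "?" for unmappable characters is an assumption
    # of this sketch, not documented Lynx behaviour.
    return char.encode(display_charset, errors="replace")


if __name__ == "__main__":
    # The same entity yields a different 8-bit value per display charset.
    for charset in ("latin-1", "cp437", "ascii"):
        print(charset, render_entity("&#233;", charset).hex())
```

If the terminal font does not place the matching glyph at the byte produced in stage 2, the wrong character appears on screen even though stage 1 was done correctly.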
>
> THE FIX: make Lynx interpret entities in local HTML files using ISO
> 8859-1/Unicode, since we use local references as a shortcut to what are
> effectively web documents.  Does this have any downside? need to
> distinguish between HTML and other document types and views, especially
> binary, and source, but I think only HTML and equivalent (htm, html.fr)
> are interpreted anyway.  (Aside: some browsers now support up to 5 digit
> numbers in &#NNNNN; to do Unicode - does Lynx?)
>
> Discussion on the merits to CSuite-Dev@chebucto.ns.ca, please.
>
> Ed Dyer aa146@chebucto.ns.ca   (902) H 826-7496  CCN Assistant Postmaster
> http://www.chebucto.ns.ca/~aa146/    W 426-4894  CSuite Technical Workshop
> Religion Page Editor, Chebucto Community Network http://www.chebucto.ns.ca

[major lexographic defoliation here]

Below my signature block are samples of a test file saved to my CCN
directory with the "p" (Print) command of lynx with different display
character set selections, for both local access and http access.  Note
that ISO Latin 1 is the only character set that generates different
results depending on local or http access.  The test file HTML code is
also included.  Feel free to use it yourself.  Any attempt on my part to
enforce a copyright on it would be mean, nasty, and ridiculous.

		Norman De Forest
		af380@chebucto.ns.ca
		http://www.chebucto.ns.ca/~af380/Profile.html
		(A Speech Friendly Site)

.........................................................................
Q. Which is the greater problem in the world today, ignorance or apathy?
A. I don't know and I couldn't care less.
.........................................................................

High character set in compact form.

HTML Source Code:

<html>
<head>
<title>
High character set in compact form.
</title>
</head>
<body>
<h1>
High character set in compact form.
</h1>
  ¡ ¢ £ ¤ ¥ ¦ §
¨ © ª « ¬ ­ ® ¯
<BR>
<BR>
° ± ² ³ ´ µ ¶ ·
¸ ¹ º » ¼ ½ ¾ ¿
<BR>
<BR>
À Á Â Ã Ä Å Æ Ç
È É Ê Ë Ì Í Î Ï
<BR>
<BR>
Ð Ñ Ò Ó Ô Õ Ö ×
Ø Ù Ú Û Ü Ý Þ ß
<BR>
<BR>
à á â ã ä å æ ç
è é ê ë ì í î ï
<BR>
<BR>
ð ñ ò ó ô õ ö ÷
ø ù ú û ü ý þ ÿ
<BR>
<BR>
</body>
</html>

Below is the above file as printed to my Chebucto directory with the lynx
"p" command when I am viewing it with different display options.  It looks
like the display option affects the printed output whether the file is
local or remote.  Also, the display ISO Latin 1 character set is the only
one that displays and prints differently depending on whether the file is
local or remote.  With that display option, I also tried three different
document character set selections, with the same results.

Note: below, the headings, "High character set in compact form.", and
some trailing spaces removed for space reasons.  No other changes.

Addresses used are:

<ol>
<LI><a href="file://localhost/ccn/home/80/af380/public_html/charset.html">
charset.html -- local link</a>
<LI>&