Hello Norman!

Looks fine to me!
But you are right, in a way: it doesn't look right to you :-(

Here's the long explanation.

The problem is this: there are two phases in retrieving a web document (i.e. anything that goes through http:).

For a web document, the &#NNN; is supposed to be interpreted by the browser (Lynx if you dial in to CCN, your choice if you go by other routes) according to either ISO-8859-1 (ISO Latin-1) or, in the proposed revision, Unicode (which includes ISO 8859-1 as a subset in the first 256 characters).

But Lynx uses the setting "Display character set" to decide which 8-bit code to send to you to display the e-acute character. In order for that to work two things are required: your communications software must be able to map that code to the appropriate glyph on your screen, which actually means the comm program must map the received code to a display font, and the display font must include the correct glyph at the correct code.

On the other hand, I hypothesise that if Lynx is not viewing web pages, i.e. using file: access, the first stage of the interpretation does not occur in the same way: Lynx displays the file according to your default character set, which is by default US-ASCII, and the character that is sent to you has a different 8-bit value.

This is probably a Lynx BUG, the way we use it, but one might get some discussion on that. Technically, it is probably an undefined behaviour.

THE WORKAROUND: set the default character set to ISO 8859-1, and select fonts that will work for you.

THE DOWNSIDE: people receiving mail from you may see a note that you are using a different character set, unless they too are using 8859-1.

THE FIX: make Lynx interpret entities in local HTML files using ISO 8859-1/Unicode, since we use local references as a shortcut to what are effectively web documents. Does this have any downside?
Need to distinguish between HTML and other document types and views, especially binary and source, but I think only HTML and equivalent (htm, html.fr) are interpreted anyway.

(Aside: some browsers now support up to 5 digit numbers in &#NNNNN; to do Unicode - does Lynx?)

Discussion on the merits to CSuite-Dev@chebucto.ns.ca, please.

Ed Dyer aa146@chebucto.ns.ca (902) H 826-7496
CCN Assistant Postmaster http://www.chebucto.ns.ca/~aa146/ W 426-4894
CSuite Technical Workshop Religion Page Editor, Chebucto Community Network
http://www.chebucto.ns.ca

On Sat, 17 May 1997 af380@chebucto.ns.ca wrote:

> Hello.
>
> Has anyone found the cause of the annoying bug that displays some accented
> characters differently depending on the URL used?
>
> I can never be sure when I quote a web page with accented characters on it
> if I am getting them correctly.
>
> Below is a fragment of code from the top of my antivirus page and how it
> is displayed (and printed to a file) with a "file:" link and again with an
> "http://" link. Also below is a table of all of the character pairs that
> get swapped depending on the link used, shown as HTML and again as seen
> with both a "file:" and an "http://" link.
>
> My antivirus page:
>
> HTML source code:
>
> <LI><a href="http://www.eeb.Fr/avp/index.htm">
> Home page des Editions Gérard Mannig</a>.
>
> As displayed and printed with the URL:
> file://localhost/ccn/home/80/af380/public_html/antivirus.html
>
> * [1]Home page des Editions Gérard Mannig.
>
> As displayed and printed with the URL:
> http://www.chebucto.ns.ca/~af380/antivirus.html
>
> * [1]Home page des Editions Gèrard Mannig.
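
For anyone who wants to poke at the two stages described in the explanation, here is a rough sketch using the e-acute from the example just quoted. It is written in Python purely for illustration and has nothing to do with Lynx's own code. The function names are made up, and cp437 (the IBM PC code page) is only an assumed example of a display character set that is not ISO 8859-1. It shows how the same character turns into different 8-bit codes depending on the display character set, and how a code meant for one set picks the wrong glyph in another. It does not reproduce the particular grave/acute swap reported here, whose cause is still only a hypothesis.

```python
import re

# Matches decimal numeric character references such as &#233;
NUMERIC_REF = re.compile(r"&#(\d+);")


def resolve_numeric_refs(html_text: str) -> str:
    """Stage 1: replace each &#NNN; with the character at that code point.

    Code points 0-255 coincide with ISO 8859-1, so &#233; becomes e-acute.
    """
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), html_text)


def bytes_for_display(text: str, display_charset: str) -> bytes:
    """Stage 2: choose the 8-bit codes to send for a given display character set."""
    return text.encode(display_charset, errors="replace")


source = "G&#233;rard"
resolved = resolve_numeric_refs(source)

# The same character needs a different 8-bit code in each display set.
latin1 = bytes_for_display(resolved, "iso-8859-1")  # e-acute sent as 0xE9
cp437 = bytes_for_display(resolved, "cp437")        # e-acute sent as 0x82 (assumed example set)

# If the sender assumes ISO 8859-1 but the terminal font is cp437,
# the byte 0xE9 selects a different glyph altogether.
mismatch = latin1.decode("cp437")

print(resolved, latin1, cp437, mismatch)
```

If either stage is skipped or uses a different table for file: access than for http: access, the byte that reaches the terminal changes, which is the kind of difference hypothesised above.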
>
> My HTML Sampler ( http://www.chebucto.ns.ca/~af380/htmlchars.html ) and
> (until the bug is fixed) my Computer Hints, Tips, and Utilities page:
>
> HTML source code: (From my Tips.html file)
>
> &#192; = À = Capital A, grave accent<BR>
> &#193; = Á = Capital A, acute accent<BR>
> <BR>
> &#196; = Ä = Capital A, dieresis or umlaut mark<BR>
> &#197; = Å = Capital A, ring<BR>
> <BR>
> &#200; = È = Capital E, grave accent<BR>
> &#201; = É = Capital E, acute accent<BR>
> <BR>
> &#210; = Ò = Capital O, grave accent<BR>
> &#211; = Ó = Capital O, accute accent<BR>
> <BR>
> &#217; = Ù = Capital U, grave accent<BR>
> &#218; = Ú = Capital U, acute accent<BR>
> <BR>
> &#224; = à = Small a, grave accent<BR>
> &#225; = á = Small a, acute accent<BR>
> <BR>
> &#232; = è = Small e, grave accent<BR>
> &#233; = é = Small e, accute accent<BR>
> <BR>
> &#236; = ì = Small i, grave accent<BR>
> &#237; = í = Small i, acute accent<BR>
> <BR>
> &#242; = ò = Small o, grave accent<BR>
> &#243; = ó = Small o, acute accent<BR>
> <BR>
> &#249; = ù = Small u, grave accent<BR>
> &#250; = ú = Small u, acute accent<BR>
>
> As displayed and printed with the URL:
> file://localhost/ccn/home/80/af380/public_html/Tips.html
>
> À = À = Capital A, grave accent
> Á = Á = Capital A, acute accent
>
> Ä = Ä = Capital A, dieresis or umlaut mark
> Å = Å = Capital A, ring
>
> È = È = Capital E, grave accent
> É = É = Capital E, acute accent
>
> Ò = Ò = Capital O, grave accent
> Ó = Ó = Capital O, accute accent
>
> Ù = Ù = Capital U, grave accent
> Ú = Ú = Capital U, acute accent
>
> à = à = Small a, grave accent
> á = á = Small a, acute accent
>
> è = è = Small e, grave accent
> é = é = Small e, accute accent
>
> ì = ì = Small i, grave accent
> í = í = Small i, acute accent
>
> ò = ò = Small o, grave accent
> ó = ó = Small o, acute accent
>
> ù = ù = Small u, grave accent
> ú = ú = Small u, acute accent
>
> As displayed and printed with the URL:
> http://www.chebucto.ns.ca/~af380/Tips.html
>
> À = Á = Capital A, grave accent
> Á = À = Capital A, acute accent
>
> Ä = Å = Capital A, dieresis or umlaut mark
> Å = Ä = Capital A, ring
>
> È = É = Capital E, grave accent
> É = È = Capital E, acute accent
>
> Ò = Ó = Capital O, grave accent
> Ó = Ò = Capital O, accute accent
>
> &am