CCN: LYNX: &#NNN; in file: reference PROPOSED FIX

Date: Sun, 18 May 1997 04:00:25 -0300
From: Edward Dyer <aa146@chebucto.ns.ca>
To: "Norman L. DeForest" <af380@chebucto.ns.ca>
cc: techteam@chebucto.ns.ca,


Hello Norman!

	Looks fine to me!

	But you are right, in a way, it doesn't look right to you :-(
Here's the long explanation:

The problem is this: there are two phases in retrieving and displaying a
web document (i.e. anything that goes through http:).

For a web document, the browser (Lynx if you dial in to CCN, your choice
if you go by other routes) is supposed to interpret &#NNN; according to
either ISO 8859-1 (ISO Latin-1) or, under the proposed revision, Unicode
(which includes ISO 8859-1 as its first 256 characters).
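
A minimal Python sketch of that first phase, offered only as an
illustration: it treats the number in &#NNN; as a Unicode code point, which
for 0-255 coincides with ISO 8859-1. The function name decode_numeric_refs
is hypothetical, and this is not how Lynx itself is written.

```python
import re

def decode_numeric_refs(text):
    """Replace each decimal &#NNN; reference with the character whose
    Unicode code point is NNN (identical to ISO 8859-1 for 0-255)."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

# The line from the antivirus page quoted further down in this message.
line = "Home page des Editions G&#233;rard Mannig"
decoded = decode_numeric_refs(line)
print(decoded)

# In ISO 8859-1 the decoded e-acute is the single byte 0xE9 (decimal 233).
print(chr(233).encode("latin-1").hex())
```

The point is only that the number inside the reference names the character
directly, independent of whatever character set the reader's terminal uses.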

But Lynx uses the "Display character set" setting to decide which 8-bit
code to send to you to display the e-acute character.  In order for that to
work two things are required: your communications software must be able to
map that code to the appropriate glyph on your screen, which actually
means the comm program must map the received code to a display font, and
the display font must include the correct glyph at the correct code.
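
The second phase can be sketched the same way. This is an assumption-laden
illustration, not Lynx's code: Python codec names stand in for the "Display
character set" choices, the function name to_display_bytes is hypothetical,
and "?" is just Python's replacement marker for a character the set lacks.

```python
def to_display_bytes(char, display_charset):
    """Encode one character in the chosen display character set.
    Characters the set cannot represent come back as b'?'."""
    return char.encode(display_charset, errors="replace")

e_acute = chr(233)  # the character named by &#233;

# The same character becomes a different 8-bit code in each set.
for charset in ("latin-1", "cp437", "ascii"):
    print(charset, to_display_bytes(e_acute, charset).hex())
```

With these three sets the same e-acute is sent as E9, 82, and 3F ("?")
respectively, which is why the terminal font has to agree with the set
that the browser was told to use.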

On the other hand, I hypothesise that if Lynx is not viewing web pages,
i.e. it is using file: access, the first stage of the interpretation does
not occur in the same way: Lynx displays the file according to your default
character set, which is by default US-ASCII, and the character that is sent
to you has a different 8-bit value.  This is probably a Lynx BUG, given the
way we use it, but one might get some discussion on that.  Technically, it
is probably undefined behaviour.

THE WORKAROUND: set the default character set to ISO 8859-1, and select
fonts that will work for you.

THE DOWNSIDE: people receiving mail from you may see a note that you are
using a different character set, unless they too are using 8859-1.

THE FIX: make Lynx interpret entities in local HTML files using ISO
8859-1/Unicode, since we use local references as a shortcut to what are
effectively web documents.  Does this have any downside?  We would need to
distinguish between HTML and other document types and views, especially
binary and source, but I think only HTML and equivalent (htm, html.fr)
are interpreted anyway. (Aside: some browsers now support up to 5-digit
numbers in &#NNNNN; to do Unicode.  Does Lynx?)
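
As a rough Python sketch of what the proposed fix amounts to (hypothetical
names throughout, and deciding by file suffix is my assumption for the
illustration, not a description of Lynx internals): decode the references
for local files that look like HTML, and leave everything else alone.

```python
import re

# Suffixes taken from the types named above; the matching rule is an assumption.
HTML_SUFFIXES = (".html", ".htm", ".html.fr")

def looks_like_html(path):
    """True when the local file name ends in one of the HTML suffixes."""
    return path.lower().endswith(HTML_SUFFIXES)

def render_local_file(path, raw_bytes):
    """Decode &#NNN; (up to 5 digits) as Unicode code points for HTML files;
    other files are passed through with their text unchanged."""
    text = raw_bytes.decode("latin-1")
    if looks_like_html(path):
        text = re.sub(r"&#(\d{1,5});", lambda m: chr(int(m.group(1))), text)
    return text

print(render_local_file("antivirus.html", b"G&#233;rard"))
print(render_local_file("notes.txt", b"G&#233;rard"))
```

The non-HTML branch is there to show the distinction raised in the
question above: a source or plain-text view keeps the literal &#233;.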

Discussion on the merits to CSuite-Dev@chebucto.ns.ca, please.

Ed Dyer aa146@chebucto.ns.ca   (902) H 826-7496  CCN  Assistant Postmaster
http://www.chebucto.ns.ca/~aa146/    W 426-4894  CSuite Technical Workshop
Religion Page Editor, Chebucto Community Network http://www.chebucto.ns.ca

On Sat, 17 May 1997 af380@chebucto.ns.ca wrote:

> Hello.
>
> Has anyone found the cause of the annoying bug that displays some accented
> characters differently depending on the URL used?
>
> I can never be sure when I quote a web page with accented characters on it
> if I am getting them correctly.
>
> Below is a fragment of code from the top of my antivirus page and how it
> is displayed (and printed to a file) with a "file:" link and again with an
> "http://" link.  Also below is a table of all of the character pairs that
> get swapped depending on the link used, shown as HTML and again as seen
> with both a "file:" and an "http://" link.
>
> My antivirus page:
>
> HTML source code:
>
> <LI><a href="http://www.eeb.Fr/avp/index.htm">
> Home page des Editions G&#233;rard Mannig</a>.
>
> As displayed and printed with the URL:
> file://localhost/ccn/home/80/af380/public_html/antivirus.html
>
>      * [1]Home page des Editions Gérard Mannig.
>
> As displayed and printed with the URL:
> http://www.chebucto.ns.ca/~af380/antivirus.html
>
>      * [1]Home page des Editions Gèrard Mannig.
>
> My HTML Sampler ( http://www.chebucto.ns.ca/~af380/htmlchars.html ) and
> (until the bug is fixed) my Computer Hints, Tips, and Utilities page:
>
> HTML source code: (From my Tips.html file)
>
> &amp;#192; = &#192; = Capital A, grave accent<BR>
> &amp;#193; = &#193; = Capital A, acute accent<BR>
> <BR>
> &amp;#196; = &#196; = Capital A, dieresis or umlaut mark<BR>
> &amp;#197; = &#197; = Capital A, ring<BR>
> <BR>
> &amp;#200; = &#200; = Capital E, grave accent<BR>
> &amp;#201; = &#201; = Capital E, acute accent<BR>
> <BR>
> &amp;#210; = &#210; = Capital O, grave accent<BR>
> &amp;#211; = &#211; = Capital O, accute accent<BR>
> <BR>
> &amp;#217; = &#217; = Capital U, grave accent<BR>
> &amp;#218; = &#218; = Capital U, acute accent<BR>
> <BR>
> &amp;#224; = &#224; = Small a, grave accent<BR>
> &amp;#225; = &#225; = Small a, acute accent<BR>
> <BR>
> &amp;#232; = &#232; = Small e, grave accent<BR>
> &amp;#233; = &#233; = Small e, accute accent<BR>
> <BR>
> &amp;#236; = &#236; = Small i, grave accent<BR>
> &amp;#237; = &#237; = Small i, acute accent<BR>
> <BR>
> &amp;#242; = &#242; = Small o, grave accent<BR>
> &amp;#243; = &#243; = Small o, acute accent<BR>
> <BR>
> &amp;#249; = &#249; = Small u, grave accent<BR>
> &amp;#250; = &#250; = Small u, acute accent<BR>
>
> As displayed and printed with the URL:
> file://localhost/ccn/home/80/af380/public_html/Tips.html
>
>    &#192; = À = Capital A, grave accent
>    &#193; = Á = Capital A, acute accent
>
>    &#196; = Ä = Capital A, dieresis or umlaut mark
>    &#197; = Å = Capital A, ring
>
>    &#200; = È = Capital E, grave accent
>    &#201; = É = Capital E, acute accent
>
>    &#210; = Ò = Capital O, grave accent
>    &#211; = Ó = Capital O, accute accent
>
>    &#217; = Ù = Capital U, grave accent
>    &#218; = Ú = Capital U, acute accent
>
>    &#224; = à = Small a, grave accent
>    &#225; = á = Small a, acute accent
>
>    &#232; = è = Small e, grave accent
>    &#233; = é = Small e, accute accent
>
>    &#236; = ì = Small i, grave accent
>    &#237; = í = Small i, acute accent
>
>    &#242; = ò = Small o, grave accent
>    &#243; = ó = Small o, acute accent
>
>    &#249; = ù = Small u, grave accent
>    &#250; = ú = Small u, acute accent
>
> As displayed and printed with the URL:
> http://www.chebucto.ns.ca/~af380/Tips.html
>
>    &#192; = Á = Capital A, grave accent
>    &#193; = À = Capital A, acute accent
>
>    &#196; = Å = Capital A, dieresis or umlaut mark
>    &#197; = Ä = Capital A, ring
>
>    &#200; = É = Capital E, grave accent
>    &#201; = È = Capital E, acute accent
>
>    &#210; = Ó = Capital O, grave accent
>    &#211; = Ò = Capital O, accute accent
>
>    &am