CCN: LYNX: &#NNN; in file: reference PROPOSED FIX

Date: Sun, 18 May 1997 23:35:33 -0300
From: "Norman L. DeForest" <af380@chebucto.ns.ca>
To: Edward Dyer <aa146@chebucto.ns.ca>
cc: techteam@chebucto.ns.ca,

On Sun, 18 May 1997, Edward Dyer wrote:

> Hello Norman!
[snip]
> The problem is this: There are two phases in retrieving a web document
> (i.e. anything that goes through http:)
>
> For a web document, the &#NNN; is supposed to be interpreted according to
> either ISO-8859-1 (ISO Latin-1) or in the proposed revision to use Unicode
> (which includes ISO 8859-1 as a subset in the first 256 characters) by the
> browser (Lynx if you dial in to CCN, your choice if you go by other
> routes.)
>
> But Lynx uses the setting "Display character set" to decide which 8-bit
> code to send to you to display the e-acute character.  In order for that to
> work two things are required: your communications software must be able to
> map that code to the appropriate glyph on your screen, which actually
> means the comm program must map the received code to a display font, and
> the display font must include the correct glyph at the correct code.
>
> On the other hand, I hypothesise that if Lynx is not viewing web pages,
                     ===================================================
> i.e. using file: access, the first stage of the interpretation does not
  =======================================================================
> occur, in the same way: Lynx displays the file according to your default
  ========================================================================
> character set, which is by default USASCII, and the character that is sent
  ==========================================================================
> to you has a different 8-bit value.  This is probably a Lynx BUG, the way
> we use it, but one might get some discussion on that.  Technically, it is
> probably an undefined behaviour.

Translation of a web page when viewing it or when printing it to a file
seems to be taking place whether it is a local file or a remote one.

> THE WORKAROUND: set the default character set to ISO 8859-1, and select
> fonts that will work for you.

I have tried a default document type of "NONE", "ISO-8859-1", and
"ISO-8859-2" with the display character set of ISO Latin 1 and there
is no difference in behaviour.  With the ISO Latin 1 display character
set, Lynx seems to interpret characters differently depending on whether
a file is local or remote, regardless of the document type selected.
Other display character sets don't have this behaviour but are not fully
usable with an ANSI font on my (borrowed) PC.
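
For anyone following along, the two stages Ed describes can be sketched
in a few lines of Python.  This is only an illustration, not how Lynx is
actually written: the function name render_for_display is made up, and
the code page "cp437" is my assumption for what a PC "ANSI" font expects.

```python
import html


def render_for_display(text: str, display_charset: str) -> bytes:
    """Illustrative two-stage model (hypothetical helper, not Lynx code)."""
    # Stage 1: resolve &#NNN; references to characters.  For NNN in
    # 160-255 the Unicode code point equals the ISO 8859-1 code.
    resolved = html.unescape(text)
    # Stage 2: choose the 8-bit code to send for the selected display
    # character set; characters it lacks become "?".
    return resolved.encode(display_charset, errors="replace")


sample = "&#233;"  # e-acute
print(render_for_display(sample, "iso-8859-1"))  # byte 0xE9
print(render_for_display(sample, "cp437"))       # byte 0x82 (assumed PC code page)
print(render_for_display(sample, "ascii"))       # no such character: "?"
```

The same reference yields a different 8-bit value for each display
character set, so the font on the receiving end has to match the set
that was selected.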

> THE DOWNSIDE: people receiving mail from you may see a note that you are
> using a different character set, unless they too are using 8859-1.
>
> THE FIX: make Lynx interpret entities in local HTML files using ISO
> 8859-1/Unicode, since we use local references as a shortcut to what are
> effectively web documents.  Does this have any downside? need to
> distinguish between HTML and other document types and views, especially
> binary, and source, but I think only HTML and equivalent (htm, html.fr)
> are interpreted anyway. (Aside: some browsers now support up to 5 digit
> numbers in &#NNNNN; to do Unicode - does Lynx?)
>
> Discussion on the merits to CSuite-Dev@chebucto.ns.ca, please.
>
> Ed Dyer aa146@chebucto.ns.ca   (902) H 826-7496  CCN  Assistant Postmaster
> http://www.chebucto.ns.ca/~aa146/    W 426-4894  CSuite Technical Workshop
> Religion Page Editor, Chebucto Community Network http://www.chebucto.ns.ca
[major lexographic defoliation here]

Below my signature block are samples of a test file saved to my CCN
directory with the "p" (Print) command of lynx with different display
character set selections, for both local access and http access.  Note
that ISO Latin 1 is the only character set that generates different
results depending on local or http access.  The test file HTML code is
also included.  Feel free to use it yourself.  Any attempt on my part to
enforce a copyright on it would be mean, nasty, and ridiculous.


                Norman De Forest
                af380@chebucto.ns.ca
                http://www.chebucto.ns.ca/~af380/Profile.html
                (A Speech Friendly Site)

.........................................................................
Q.  Which is the greater problem in the world today, ignorance or apathy?
A.  I don't know and I couldn't care less.
.........................................................................


                      High character set in compact form.

HTML Source Code:

<html>
<head>
<title> High character set in compact form. </title>
</head>
<body>

<h1> High character set in compact form. </h1>

&#160; &#161; &#162; &#163; &#164; &#165; &#166; &#167;
&#168; &#169; &#170; &#171; &#172; &#173; &#174; &#175;
<BR>
<BR>
&#176; &#177; &#178; &#179; &#180; &#181; &#182; &#183;
&#184; &#185; &#186; &#187; &#188; &#189; &#190; &#191;
<BR>
<BR>
&#192; &#193; &#194; &#195; &#196; &#197; &#198; &#199;
&#200; &#201; &#202; &#203; &#204; &#205; &#206; &#207;
<BR>
<BR>
&#208; &#209; &#210; &#211; &#212; &#213; &#214; &#215;
&#216; &#217; &#218; &#219; &#220; &#221; &#222; &#223;
<BR>
<BR>
&#224; &#225; &#226; &#227; &#228; &#229; &#230; &#231;
&#232; &#233; &#234; &#235; &#236; &#237; &#238; &#239;
<BR>
<BR>
&#240; &#241; &#242; &#243; &#244; &#245; &#246; &#247;
&#248; &#249; &#250; &#251; &#252; &#253; &#254; &#255;
<BR>
<BR>
<BR>
<BR>

</body>
</html>
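
If retyping the file is a nuisance, a short Python script can regenerate
it.  This is just a sketch that mirrors the layout above (eight
references per line, two <BR> tags after every second line); the
function name build_test_page is my own invention.

```python
def build_test_page(first: int = 160, last: int = 255) -> str:
    """Rebuild the test page: one &#NNN; reference per code, 8 per line."""
    title = "High character set in compact form."
    lines = [
        "<html>", "<head>", f"<title> {title} </title>", "</head>",
        "<body>", "", f"<h1> {title} </h1>", "",
    ]
    for row_start in range(first, last + 1, 8):
        codes = range(row_start, min(row_start + 8, last + 1))
        lines.append(" ".join(f"&#{code};" for code in codes))
        # Two line breaks after every second row, as in the original file.
        if (row_start - first) % 16 == 8:
            lines += ["<BR>", "<BR>"]
    lines += ["", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


page = build_test_page()
print(page)
```

Redirect the output to a file in public_html to get a copy that can be
viewed both locally and over http.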

Below is the above file as printed to my Chebucto directory with the lynx
"p" command when I am viewing it with different display options.  It looks
like the display option affects the printed output whether the file is
local or remote.  Also, the ISO Latin 1 display character set is the only
one that displays and prints differently depending on whether the file is
local or remote.  With that display option, I also tried three different
document character set selections, with the same results.
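
To check two printed copies against each other without eyeballing them,
something like the following Python sketch would do.  The helper name
differing_bytes and the sample byte strings are invented for
illustration; they are not taken from my actual output files.

```python
from itertools import zip_longest


def differing_bytes(local: bytes, remote: bytes):
    """Return (offset, local_byte, remote_byte) for every mismatch."""
    diffs = []
    # zip_longest pads the shorter input with None, so a length
    # difference also shows up as a mismatch.
    for offset, (a, b) in enumerate(zip_longest(local, remote)):
        if a != b:
            diffs.append((offset, a, b))
    return diffs


# Invented sample data: the same word with e-acute sent as two
# different 8-bit codes.
local_print = b"caf\x82"
remote_print = b"caf\xe9"
for offset, a, b in differing_bytes(local_print, remote_print):
    print(f"offset {offset}: local={a!r} remote={b!r}")
```

With real files, read each one in binary mode (open(name, "rb").read())
and pass the two results to the function.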

Note: below, the headings, "High character set in compact form.", and some
trailing spaces have been removed for space reasons.  No other changes.

Addresses used are:

<ol>
<LI><a href="file://localhost/ccn/home/80/af380/public_html/charset.html">
  charset.html -- local link</a>
<LI>&