CCN: LYNX: &#NNN; in file: reference PROPOSED FIX

Date: Sun, 18 May 1997 04:53:25 -0300
From: "Norman L. DeForest" <af380@chebucto.ns.ca>
To: Edward Dyer <aa146@chebucto.ns.ca>
cc: techteam@chebucto.ns.ca,

On Sun, 18 May 1997, Edward Dyer wrote:

> Hello Norman!
> 
> 	Looks fine to me!
> 
> 	But you are right, in a way, it doesn't look right to you :-(
> Here's the long explanation:
> 
> The problem is this: There are two phases in retrieving a web document
> (i.e. anything that goes through http:)
> 
> For a web document, the &#NNN; is supposed to be interpreted by the browser
> (Lynx if you dial in to CCN, your choice if you go by other routes)
> according to either ISO 8859-1 (ISO Latin-1) or, under the proposed
> revision, Unicode (which includes ISO 8859-1 as a subset in the first 256
> characters).
> 
> But Lynx uses the setting "Display character set" to decide which 8-bit
> code to send to you to display the e-acute character.  In order for that to
> work two things are required: your communications software must be able to
                                ============================================
> map that code to the appropriate glyph on your screen, which actually
  =====================================================================
> means the comm program  must map the received code to a display font, and
  =========================================================================
> the display font must include the correct glyph at the correct code.
  ====================================================================
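
A minimal sketch of the two stages described in the quoted text, for anyone who wants to see the byte values involved.  It is not how Lynx is implemented.  It assumes Python's standard html module for resolving the &#NNN; reference, and it uses code page 437 as a stand-in for an IBM PC display font; both choices are illustrative assumptions, not something taken from the Lynx source.

```python
import html

# Stage 1: resolve the numeric character reference to a character.
# &#233; is e-acute in ISO 8859-1 and in Unicode.
text = html.unescape("caf&#233;")

# Stage 2: pick the 8-bit code to send for a given display character set.
# (cp437 is an assumed example of a PC font, not a Lynx setting name.)
latin1_bytes = text.encode("iso-8859-1")                 # e-acute -> 0xE9
cp437_bytes = text.encode("cp437")                       # e-acute -> 0x82
ascii_bytes = text.encode("ascii", errors="replace")     # no e-acute in US-ASCII

# What a mismatch looks like: bytes sent as ISO 8859-1, but shown
# with a font laid out as code page 437.
mismatch = latin1_bytes.decode("cp437")

for label, data in [("iso-8859-1", latin1_bytes),
                    ("cp437", cp437_bytes),
                    ("us-ascii", ascii_bytes)]:
    print(f"{label:12s} {data.hex(' ')}")
print("mismatch shows:", mismatch)
```

The same character ends up as a different byte under each character set, so the glyph is only right when the display font agrees with the set that was used to pick the byte.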

That doesn't apply to the samples I sent you as I used the "p" (Print) 
command to print to my Chebucto directory and then snipped away the 
irrelevant text.  Although I may have seen a *copy* of it on my screen, 
the actual text I sent you never passed through my communications 
software.  It went straight from the HTML file, through lynx (via "Print") 
to a local text file, got snipped and sent to you.

> On the other hand, I hypothesise that if Lynx is not viewing web pages,
> i.e. using file: access, the first stage of the interpretation does not
> occur in the same way: Lynx displays the file according to your default
> character set, which is by default US-ASCII, and the character that is sent
> to you has a different 8-bit value.  This is probably a Lynx BUG, the way
> we use it, but one might get some discussion on that.  Technically, it is
> probably an undefined behaviour.
> 
> THE WORKAROUND: set the default character set to ISO 8859-1, and select
> fonts that will work for you. 

I'll try that but am not too hopeful.  In fact, I'll try a number of 
settings to see if characters are shuffled around even more with others.

When I am on CCN, I have an ANSI font loaded into the VGA adapter.  No 
translation is selected at all -- presumably.

Note that at the library they have a 7-bit text terminal.  High ANSI 
characters are folded to the low characters:

         !"#$%&'()*+,-./01...89:;<=>?@AB...YZ[\]^_`ab...yz{|}~
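
The folding above can be reproduced with a short sketch.  It assumes the terminal simply drops the high bit of each byte, which is consistent with the sequence shown but is a guess about that particular terminal; the function name fold_to_7bit is made up for the illustration.

```python
def fold_to_7bit(data: bytes) -> bytes:
    """Clear the high bit of every byte, as an assumed 7-bit terminal would."""
    return bytes(b & 0x7F for b in data)

# The high characters 0xA1 through 0xFE.
high = bytes(range(0xA1, 0xFF))
folded = fold_to_7bit(high)

# Every folded byte now lies in the printable US-ASCII range 0x21..0x7E.
print(folded.decode("ascii"))
```

Under this assumption each high character lands exactly 128 positions lower, so the order of the folded characters shows the order of the 8-bit codes that were sent.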

With a local link, the folded "alphabetic" characters come out in the order:

         ABCDEFGHIJKLMNOPQRSTUVWXYZ ... abcdefghijklmnopqrstuvwxyz

when you view my htmlchars.html file.  With an "http://www.chebucto.etc." 
link, you see something like the sequence:   (I may have the wrong 
characters swapped but you get the idea)

         ABCEDFGIHJKMLNOPQSRTUVWYXZ ... abcedfgihjklmnpoqsrtuvwyxz

> THE DOWNSIDE: people receiving mail from you may see a note that you are
> using a different character set, unless they too are using 8859-1.
> 
> THE FIX: make Lynx interpret entities in local HTML files using ISO
> 8859-1/Unicode, since we use local references as a shortcut to what are
> effectively web documents.  Does this have any downside?  We would need to
> distinguish between HTML and other document types and views, especially
> binary and source, but I think only HTML and equivalent (htm, html.fr)
> are interpreted anyway. (Aside: some browsers now support up to 5 digit
> numbers in &#NNNNN; to do Unicode - does Lynx?)
> 
> Discussion on the merits to CSuite-Dev@chebucto.ns.ca, please.
> 
> Ed Dyer aa146@chebucto.ns.ca   (902) H 826-7496  CCN  Assistant Postmaster
> http://www.chebucto.ns.ca/~aa146/    W 426-4894  CSuite Technical Workshop
> Religion Page Editor, Chebucto Community Network http://www.chebucto.ns.ca
> 
> On Sat, 17 May 1997 af380@chebucto.ns.ca wrote:
> 
> > Hello.
> > 
> > Has anyone found the cause of the annoying bug that displays some accented
> > characters differently depending on the URL used?
> > 
> > I can never be sure when I quote a web page with accented characters on it
> > if I am getting them correctly.
[lexographic defoliation]

		Norman De Forest
		af380@chebucto.ns.ca
		http://www.chebucto.ns.ca/~af380/Profile.html
		(A Speech Friendly Site)

.........................................................................
Q.  Which is the greater problem in the world today, ignorance or apathy?
A.  I don't know and I couldn't care less.
.........................................................................

