Hello folks,
I've recently set up a Pi 2B and intend to play around with some stuff on it.
I was trying to run Mystic, but it seems that LXTerm is not very friendly to ANSI character codes. Is there a way to tweak it?
"Flavio Bessa" <nospam.Flavio.Bessa@f188.n801.z4.fidonet.org> wrote
|
| I was trying to run Mystic, but it seems that LXTerm is not very
| friendly to ANSI character codes. Is there a way to tweak it?
|
Is there a reason to think the Pi has codepages?
You'd need that, and you'd need to set the local
codepage, in order to use ANSI. I thought that
was only on Windows.
M$ Windows did not support ANSI codes until sometime in Win10
(primarily to allow the "Windows Subsystem for Linux" to handle common terminal controls).
On 2/11/21 9:44 PM, Flavio Bessa wrote:
Hello folks,
Hi,
I was trying to run Mystic, but it seems that LXTerm is not very
friendly to ANSI character codes. Is there a way to tweak it?
I know for a fact that XTerm, which it seems is the root of LXTerm, has
supported ANSI control codes for at least 20 years, as I've been using
them in it for at least that long.
Please provide more details about the problems that you're seeing.
Also, can you reproduce the problems in standard XTerm?
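One quick way to check whether a given terminal honours ANSI escape codes at all, independent of Mystic or any other program, is to emit a colour sequence by hand. A minimal Python sketch (nothing here is specific to LXTerm):

```python
# Minimal ANSI sanity check: on a terminal that honours ANSI/VT escape
# codes, the word "red" prints in red; a terminal that doesn't will
# show the raw bracket sequence instead.
ESC = "\x1b"          # the ANSI escape character (0x1B)
RED = ESC + "[31m"    # SGR: set foreground colour to red
RESET = ESC + "[0m"   # SGR: reset all attributes

print(RED + "red" + RESET + " normal")
```

If the raw sequence shows up literally, the problem is the terminal (or its settings), not the program emitting the codes.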
The 'modern' way to handle extended/extra characters is UTF
Ahem A Rivet's Shot wrote:
the *only* encoding capable of representing
everything unambiguously.
Yes, it is ambiguous when the codepage is not explicitly declared, but
the bigger advantage is using more than one codepage in a single text,
like quoting Hebrew and Greek in German. Personally I stick to TeX
syntax even for those. One character -- one byte has its advantages if
you like the command line and editor macros.
The downside is the malicious possibilities it opens. In my eyes it was
a big mistake to open domain names to more than ASCII. There are many
(near) lookalikes, and that fools even the careful user who makes a
point of checking the true destination before clicking.
"Ahem A Rivet's Shot" <steveo@eircom.net> wrote
| > The 'modern' way to handle extended/extra characters is UTF
|
| It is, perhaps, worth adding that the reason that this is the
| modern way is that it is the *only* encoding capable of representing
| everything unambiguously.
|
More to the point, it's backward compatible with HTML,
where the vast majority of webpages are still effectively
ASCII, aside from the odd curly quote or space character
inserted by editor software. Anything else would have
required multi-byte characters for the ASCII range and
thus would have broken editors and webpages.
This way we can espouse the value of multiculturalism
without changing very much. :)
More to the point, it's backward compatible with HTML
UTF-8 provided a smooth, easy, solution. It accommodates
the millions of pages and files that are still essentially ASCII.
Unlike with UTF-16 or UTF-32, we don't have to add null bytes
to every ASCII character in order to encode it.
UTF-8 allows ANSI character sets to still be used. But it also
provides a way to fully support multi-byte characters only
where necessary. It's the one solution to support all languages
without changing the default of 1 character to 1 byte.
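That 1-byte default is easy to verify directly; a small Python sketch of how the UTF-8 widths fall out (the sample characters are my own picks):

```python
# UTF-8 is variable-width: ASCII stays one byte, everything else takes
# two to four bytes, only where needed.
samples = {"A": 1, "é": 2, "€": 3, "😀": 4}
for ch, nbytes in samples.items():
    assert len(ch.encode("utf-8")) == nbytes

# The ASCII range is byte-for-byte identical to plain ASCII, which is
# why an existing ASCII file is already valid UTF-8 unchanged.
assert "plain text".encode("utf-8") == "plain text".encode("ascii")
```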
On 2/13/21 7:11 AM, The Natural Philosopher wrote:
Does any one else suspect that this post is utter bunk? UTF 8 is
multibyte character sequences and it's not necessarily compatible with
HTML which uses straight, not curly, brackets.
The HTML /markup/ is basic ASCII.
The HTML /page/, in its entirety, may contain UTF-* directly, or the
ASCII HTML codes therefor.
UTF8 is a layer above HTML.
Eh ... If you're talking about the HTML /file/ and not the HTML
/markup/, then it's entirely possible to have raw UTF-* content in the
text copy outside of the markup.
HTML is not there to specify odd characters - it can but its job is to format text.
On Sat, 13 Feb 2021 17:49:28 +0000 The Natural Philosopher <tnp@invalid.invalid> wrote:
HTML is not there to specify odd characters - it can but its job is to
format text.
Nope, that's CSS's job. HTML's job is to add semantic markup - OK
they dropped the ball with <b>, <i>, <br>, <blink> as well as some
pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
On 2/13/21 10:49 AM, The Natural Philosopher wrote:
HTML is not there to specify odd characters - it can but its job is to
format text.
All of the HTML codes for special characters tend to disagree with you.
&copy;
&euro;
&trade;
...
Do a web search for "html special characters" and you will find long
lists.
I don't know in which version of HTML these were introduced. But I do
know that many of the basic ones have been there for at least 20 years
(HTML 4?).
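Those entity codes map straight onto Unicode characters, which Python's standard html module can demonstrate (a sketch using only stdlib calls):

```python
import html

# HTML character references, named or numeric, both resolve to the
# same Unicode character.
assert html.unescape("&copy;") == "\u00a9"   # copyright sign
assert html.unescape("&#169;") == "\u00a9"   # numeric form of the same
assert html.unescape("&euro;") == "\u20ac"   # euro sign
assert html.unescape("&trade;") == "\u2122"  # trade mark sign
```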
Nope, that's CSS's job. HTML's job is to add semantic markup - OK
they dropped the ball with <b>, <i>, <br>, <blink> as well as some
pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
On Sat, 13 Feb 2021 11:09:57 -0700, Grant Taylor wrote:
On 2/13/21 10:49 AM, The Natural Philosopher wrote:
HTML is not there to specify odd characters - it can but its job is to
format text.
All of the HTML codes for special characters tend to disagree with you.
&copy;
&euro;
&trade;
...
Do a web search for "html special characters" and you will find long
lists.
I don't know in which version of HTML these were introduced. But I do
know that many of the basic ones have been there for at least 20 years
(HTML 4?).
... and are still there in HTML 5
"TimS" <timstreater@greenbee.net> wrote
| > UTF-8 allows ANSI character sets to still be used. But it also
| > provides a way to fully support multi-byte characters only
| > where necessary. It's the one solution to support all languages
| > without changing the default of 1 character to 1 byte.
|
| It's only a default for ASCII, and the characters that ASCII supports.
| And when you say it allows ANSI character sets to be used, I take it
| you mean the characters that different ANSI pages supported, which
| under UTF-8 will most likely be 2-byte chars, rather than 1-byte but
| 8-bit values.
|
Most ANSI character sets are also 1 byte to 1 character.
It's only the DBCS languages that can't fit that model.
So first we had ASCII. Then we had ANSI with codepages,
and most languages could be fully represented in HTML
using META content type. **All of that is 1 byte to 1
character.** Only the DBCS languages were an exception.
And they used a system similar to UTF-8.
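The contrast between a single-byte codepage and a DBCS one shows up directly in the byte counts; a Python sketch (cp1252 and Shift-JIS chosen here as representative codepages):

```python
# A single-byte codepage covers its language in one byte per character;
# a DBCS codepage such as Shift-JIS mixes one- and two-byte characters,
# much as UTF-8 later would.
assert len("ä".encode("cp1252")) == 1      # Western codepage: 1 byte
assert len("日".encode("shift_jis")) == 2  # DBCS: 2 bytes
assert len("A".encode("shift_jis")) == 1   # ASCII range stays 1 byte
assert len("日".encode("utf-8")) == 3      # same character in UTF-8
```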
Browsers properly display curly quotes, but I actually only have one
Unicode font on my system, which is Arial Unicode MS, weighing in at
24 MB. Nothing else will render most UTF-8 characters. The RichEdit
window in Windows, for example, has supported UTF-8 for some time, and
I can use that ability in my own software.
On Sat, 13 Feb 2021 14:42:37 -0500
"Mayayana" <mayayana@invalid.nospam> wrote:
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Anyone who has to support multiple languages tends to use Unicode
internally for the sake of sanity (I was for a while internationalisation
specialist (among other hats) on the Yahoo! front page team). We had loads
of fun with external feeds claiming to be ISO8859-1 and sending Win-1252 -
they're almost but not quite the same.
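The "almost but not quite" lives entirely in bytes 0x80-0x9F; a Python sketch of the difference:

```python
# ISO 8859-1 (latin-1) and Windows-1252 agree everywhere except bytes
# 0x80-0x9F: latin-1 maps them to C1 control characters, while
# Windows-1252 uses them for printable characters such as curly quotes.
raw = b"\x93quoted\x94"

assert raw.decode("cp1252") == "\u201cquoted\u201d"  # curly quotes
assert raw.decode("latin-1") == "\x93quoted\x94"     # C1 controls

# Everything below 0x80 decodes identically in both.
assert b"plain".decode("cp1252") == b"plain".decode("latin-1")
```

Which is exactly why a feed labelled ISO8859-1 but actually sent as Win-1252 looks fine until the first curly quote arrives.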
but mostly it's about semantic markup honest.
... plus all the (X)HTML symbols and characters - &aacute; &amp;
&pound; ... which AFAIK can't be rendered with CSS.
On 2/13/21 11:11 AM, Ahem A Rivet's Shot wrote:
Nope, that's CSS's job. HTML's job is to add semantic markup - OK
they dropped the ball with <b>, <i>, <br>, <blink> as well as some
pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
That's the /current/ interpretation. 20 years ago, there was a
different interpretation.
On 13 Feb 2021 at 21:05:47 GMT, Ahem A Rivet's Shot <steveo@eircom.net> wrote:
On Sat, 13 Feb 2021 14:42:37 -0500
"Mayayana" <mayayana@invalid.nospam> wrote:
So all I was saying was that UTF-8 was far easier than
any other approach, using "wide characters", when it came
time to fully support all languages under one system. Even now
I'm not sure how much it's really used.
Anyone who has to support multiple languages tends to use unicode
internally for the sake of sanity (I was for a while internationalisation
specialist (among other hats) on the Yahoo! front page team). We had loads
of fun with external feeds claiming to be ISO8859-1 and sending Win-1252 -
they're almost but not quite the same.
I convert everything to UTF-8. Windows tends to lie about which code-page it's using, anyway.
Not that it really matters. It's pretty much all ASCII.
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content, so by definition
characters are not HTML. But HTML still defines how to encode those not
part of 7-bit ASCII.
the question of which font doesn't enter
into character representation.
On 13/02/2021 18:11, Ahem A Rivet's Shot wrote:
Nope, that's CSS's job. HTML's job is to add semantic markup -
OK they dropped the ball with <b>, <i>, <br>, <blink> as well as some pre-css font and colour properties and ..., but mostly it's about
semantic markup honest.
CSS is part of HTML
On 13/02/2021 22:09, Axel Berger wrote:
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content. So by definition
characters are not HTML. But still HTML defined how to encode those not
part of 7-bit ASCII.
By saying 'content-type: UTF8' or whatever the exact magic spell is.
On 13/02/2021 21:48, TimS wrote:
the question of which font doesn't enter
into character representation.
It does if the font in use has no representation of the glyph you are
trying to display.
You won't get far trying to display Gujarati in Arial Narrow...
On 14 Feb 2021 at 03:36:19 GMT, The Natural Philosopher
<tnp@invalid.invalid>
wrote:
On 13/02/2021 22:09, Axel Berger wrote:
Ahem A Rivet's Shot wrote:
but mostly it's about semantic markup honest.
Yes and no. HTML is markup and characters are content. So by
definition characters are not HTML. But still HTML defined how to
encode those not part of 7-bit ASCII.
By saying 'content-type: UTF8' or whatever the exact magic spell is.
Just start your html page with:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
End of.
On Sun, 14 Feb 2021 08:32:36 -0500
"Mayayana" <mayayana@invalid.nospam> wrote:
So it's a good solution for webpages, but once you get into
entering, editing and storing multi-lingual text it gets very
complicated. Only for those of us who speak English is it
reasonable to say that UTF-8 makes everything easy.
Not so! Unicode is the enabler for anyone who needs to handle multiple
scripts and languages. Sure, if you just want CJK then you could use
SHIFT-JIS, but if you want to be able to hold text and not worry about
what script or language it belongs to, and to mix scripts and languages
freely, then Unicode is the only solution.
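A sketch of that "mix scripts freely" point in Python (the sample string is my own, not from the thread):

```python
# One Unicode string can mix scripts freely; UTF-8 round-trips it
# without any codepage bookkeeping.
mixed = "Grüße, Ελλάδα, עברית, 日本"
assert mixed.encode("utf-8").decode("utf-8") == mixed

# No single legacy codepage can hold all of it: latin-1 has the
# umlauts but none of the Greek, Hebrew or CJK characters.
try:
    mixed.encode("latin-1")
    raise AssertionError("latin-1 should not be able to encode this")
except UnicodeEncodeError:
    pass
```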
But in System Prefs -> Keyboard -> Keyboard-tab I have ticked
"Show Keyboard Viewer in menu bar".
"The Natural Philosopher" <tnp@invalid.invalid> wrote
| CSS is part of HTML
|
It's part of web design but it's an entirely different system
and syntax. Though I suppose that's splitting hairs.
So all those ANSI pages need to go in the bin, really speaking.
"Ahem A Rivet's Shot" <steveo@eircom.net> wrote
| > It's part of web design but it's an entirely different system
| > and syntax. Though I suppose that's splitting hairs.
|
| Not really, there's an important separation CSS applies to any XML
| or SGML not just HTML.
|
You snipped my example. The "cascading" part applies
there. First is the CSS file. Then that's overridden by
CSS in the STYLE tag of the page. Then that can be
overridden by a STYLE attribute in the HTML tag. If
you want pretty fonts for your XML that's up to you,
but CSS is still deeply entangled with HTML.
"The Natural Philosopher" <tnp@invalid.invalid> wrote
| > Not that it really matters. It's pretty much all ASCII.
| >
| >
| Schrödinger's cat would disagree - or ½ of him would.
|
:) I always wonder how people end up using these characters.
There are ways to do it. I can copy the character from existing
text. On Windows I think there's Charmap, though I've never
used it. Schrodinger will just have to get by without his umlaut.
Just as "naive" has survived without one.
Then there's the matter of the mechanical entry system. My
keyboard only has ASCII and a few extras.
Where this really helps is with things like Chinese. But it only
really helps them. For English speakers, we deal with pretty much all
ASCII. And that's not the 1/2 of it. As you noted, if you want
to write unicode you also need a unicode font. Browsers make
it look simple, but for general text files it's not so simple. For
example, I like to use Verdana for most text. But the font
is not unicode. Windows will display UTF-8 as ANSI.
If I visit xinhuanet.com I see Chinese characters. (Even though
it's all Greek to me.) If I check the source code I see Chinese. If
I download that and open it in my code editor as UTF-8 with
Verdana font, I see some of the languages. It looks like I'm
getting Russian and Arabic, for example. But the Chinese is all
little boxes. If I open it in Notepad, since it's plain text with no
file header, it shows as English ANSI with lots of little boxes.
So it's a good solution for webpages, but once you get into
entering, editing and storing multi-lingual text it gets very
complicated. Only for those of us who speak English is it
reasonable to say that UTF-8 makes everything easy. It does,
but only because it's usually exactly the same byte string as
ASCII. In fact, if I happen to come across
UTF-8 text or HTML code I'll generally convert it to ASCII/ANSI
for convenience. It's too much trouble trying to access it across
different programs and displays at UTF-8. On Linux, where that's
standard, it's fine. But we have to remember that this is
representational file encoding. UTF-8 by itself is no miracle.
Microsoft are one of the sites that have used UTF-8 for years.
It's all English on their English pages, but they spec it as
UTF-8, use curly quotes and UTF-8 space characters. Neither
is necessary and it complicates things. Both of these will work
with an English codepage. The first should work anywhere:
“curly &nbsp; quotes”
“curly   quotes”
And just to keep TNP happy here's some Gujarati: શ ણ ઊ ૐ . Hope it's not
rude.
Lucky I have gujarati capable fonts in use
"The Natural Philosopher" <tnp@invalid.invalid> wrote
| Lucky I have gujarati capable fonts in use
|
|
| ᚠᚲ ᛗᛖ
|
Indeed. I see "as", "as" squared, a-, a-.
But I'm sure that's a very funny joke in India.
"TimS" <timstreater@greenbee.net> wrote
| You obviously need to get a new Usenet client.
|
No. I can only read English. I only write English.
It's of no value to see characters in languages I
can't read, even if I have the font. The nice thing
about not displaying in UTF-8 is that I don't have
to see emojis. I can just write a cranky note back
to people saying that I only see boxes. That usually
cures them of emoji mania. :)
"TimS" <timstreater@greenbee.net> wrote
| > No. I can only read English. I only write English.
| > It's of no value to see characters in languages I
| > can't read, even if I have the font. The nice thing
| > about not displaying in UTF-8 is that I don't have
| > to see emojis. I can just write a cranky not back
| > to people saying that I only see boxes. That usually
| > cures them of emoji mania. :)
|
| You're obviously making life too easy for yourself. Why not just junk
| all this useful software nonsense, and read the bits directly off the
| disk. All you need is a bar magnet, a magnifying glass, and some fine
| iron filings. Simples!
|
:) This seems to really get your goat. If you'll recall,
all I said was that UTF-8 was a good choice because
for English-speaking people and most webpages it was
an invisible transition. That might not be politically
correct, but it's true.
If you're on Linux and text files default to UTF-8 then
that's handy. You'll never need to know that the encoding
is not ASCII/ANSI. Since all my text files and HTML files are
essentially ASCII, I convert any UTF-8 I get to that.
For me UTF-8 is only corrupted text data.
For many people, UTF-8 is a great solution. That's fine.
But if you're going to send me funky characters for no reason,
in English, in a text-based medium, I see no reason to figure
out how to decipher it... And imagine my dismay at going to the
trouble only to find that someone has sent me 4 crying faces,
3 piles of shit, and an umbrella... or is that a soccer ball?
Or that they're trying to show off by sending some ditty in
Turkish or Russian... I still don't know what it means. I can't
read Turkish and Russian.
And what the heck does 4 crying faces and 3 piles of
shit and an umbrella mean? The sender is having a tantrum?
They've eaten too many prunes? They hate shitting? Or maybe
it's an inside joke. Maybe that's Beyonce's famous signature?
Sort of a "proud to be cranky" gimmick? Maybe it's Taylor
Swift's official breakup note? Maybe the sender is signalling
their fondness for some foul tempered rock star?
Who knows? It's
hardly an articulate expression. I pretty much ignore
emojis, anyway, for that reason. I usually don't know
what they mean. I just figured out that what I thought
was a corncob is probably "anjali" -- praying hands. So...
what?... a hippie is writing to me and they've developed
that irritating behavioral tic of bowing to express false
humility? ... Or maybe it really is a corncob. Beats me.
I need UTF-8 so that I can see such crap? I don't think so.