• Re: Character non-equivalence, was Byte Addressability And Beyond

    From John Levine@21:1/5 to ThatWouldBeTelling@thevillage.com on Fri Jun 7 21:26:03 2024
    It appears that EricP <ThatWouldBeTelling@thevillage.com> said:
    > Eeewww... I didn't even think of that.
    > What does one do about them? You can't treat them as equivalent in a
    > string compare... the user might want the first B and not second B.

    People keep rediscovering that when you're using Unicode, nothing is
    simple. One of its normalization forms is NFKC, which uses composed
    versions of accented characters and applies compatibility equivalence
    rules to turn some kinds of characters that look similar into a single form.

    That solves some of the problems but not even close to all of them.
    The rules about whether two strings are upper/lower case equivalent
    depend on the language and sometimes even the local version of the
    language, e.g. French French and Quebec French have different
    conventions about accented capital letters.
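    A minimal Python sketch (not part of the original thread) of the points above, using only the standard library's `unicodedata.normalize` and `str.casefold`. It shows that NFKC merges composed and decomposed accents and folds compatibility characters such as ligatures and fullwidth letters, but leaves look-alike letters from different scripts (Latin B, Greek Beta, Cyrillic Ve) distinct, and that Python's built-in case mapping ignores language-specific conventions. The helper name `nfkc` is hypothetical, chosen for illustration.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Return the NFKC normalization of s (hypothetical helper)."""
    return unicodedata.normalize("NFKC", s)


# Composed vs. decomposed "é": different code points, same text.
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(composed == decomposed)              # prints: False
print(nfkc(composed) == nfkc(decomposed))  # prints: True

# Compatibility folding: a ligature and a fullwidth letter.
print(nfkc("\ufb01"))  # LATIN SMALL LIGATURE FI -> prints: fi
print(nfkc("\uff22"))  # FULLWIDTH LATIN CAPITAL LETTER B -> prints: B

# Look-alike capitals from three scripts are NOT folded together.
latin_b, greek_beta, cyrillic_ve = "B", "\u0392", "\u0412"
print(len({nfkc(c) for c in (latin_b, greek_beta, cyrillic_ve)}))  # prints: 3

# Case rules: Python's default mappings are locale-independent.
print("stra\u00dfe".casefold() == "STRASSE".casefold())  # prints: True
print("I".lower())  # prints: i (Turkish would expect dotless \u0131)
```

    Even this small example needs two different mechanisms, and neither one knows which language the text is in, which is why no single normalization or case-folding call settles string equality.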

    The only thing I can say with confidence is that any rule that starts
    with "You can just ..." is wrong.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to John Levine on Sun Jun 9 03:23:50 2024
    On Fri, 7 Jun 2024 21:26:03 -0000 (UTC), John Levine wrote:

    > People keep rediscovering that when you're using Unicode, nothing is
    > simple.

    Unicode is the first successful attempt at capturing the complexity of
    human writing in a computer code.

    Now, suddenly, those whose experience of “text processing” was only in
    ASCII, or Windows-1252, or something basic like that, are confronted
    with the full reality of a multilingual, multicultural, multinational
    world.
