• Musings about Usernames in adduser and Debian

    From Marc Haber@21:1/5 to All on Thu Nov 21 18:50:01 2024
    [writing this with my adduser hat on. I am also in touch with the
    maintainers of src:shadow and base-passwd]

    Hi,

    recently, I have "taken over" the wiki page about UserAccounts and have
    put in some history and general thoughts about what Debian thinks about
    user names and name restrictions.

    https://wiki.debian.org/UserAccounts

    I fear that I have opened an especially nasty can of worms by beginning
    to do sanity checks in adduser and being pointed towards user name
    encoding in that process. Can you help me to bring some sense into this
    mess?

    I would like to hear your comments. Feel free to directly apply
    corrections to the wiki page. I am especially interested in having clear
    terminology regarding Unicode code points, UTF-8, character strings and
    byte strings. It is vitally important to be consistent here to avoid
    making the mess even worse.

    For adduser's next release, I would like to discuss the following
    things:

    (1)
    Should Debian allow UTF-8 user names in the first place or should we
    restrict names for regular users to some near-US-ASCII set as well? (I
    think yes, we should)

    (2)
    If the answer to (1) is "allow UTF-8", should we also do that for system
    users? (I think no, we should not)

    (2a)
    Which UTF-8 subset / code point classes should we allow and which should
    we reject? (I don't have an opinion about that)

    (3)
    I think that 32 characters/bytes (it's the same if we don't allow UTF-8)
    is a good limitation for a system user name. But, should we increase
    that for regular user names? (I think yes)
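
    To make the character/byte distinction in (3) concrete, here is a minimal Python sketch (adduser itself is Perl). The limits are hypothetical: 32 bytes for system names and 64, an arbitrary illustrative value, for regular names; `name_lengths` and `length_ok` are invented names, not adduser code. It measures the UTF-8 encoding, on the assumption that the byte length is what matters for /etc/passwd.

```python
# Hypothetical limits for illustration only; not adduser's actual configuration.
SYSTEM_MAX_BYTES = 32
REGULAR_MAX_BYTES = 64


def name_lengths(name: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a user name."""
    return len(name), len(name.encode("utf-8"))


def length_ok(name: str, system: bool) -> bool:
    """Check the UTF-8 byte length against the limit for the account class."""
    limit = SYSTEM_MAX_BYTES if system else REGULAR_MAX_BYTES
    return name_lengths(name)[1] <= limit


if __name__ == "__main__":
    # ASCII, precomposed "é" (U+00E9), decomposed "e" + U+0301
    for n in ["emollier", "\u00e9mollier", "e\u0301mollier"]:
        print(repr(n), name_lengths(n))
```

    For ASCII-only names both counts agree, which is why "characters/bytes" is unambiguous today; for "émollier" they differ, and they differ again depending on whether "é" is stored precomposed or decomposed.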

    (4)
    If we decide to relax some of our current requirements, where are the
    borders between "normal" user name, one that requires --allow-bad-names
    and finally one that requires --allow-all-names? Wouldn't it be
    offensive to speakers of some languages to require --allow-bad-names
    for their special characters to be allowed in a user name? (I have no
    opinion here that would not break backwards compatibility)

    (5)
    Is it right to say "the user name in /etc/passwd is UTF-8 encoded" or
    should I better say "the user name in /etc/passwd can be UTF-8 encoded"?

    (6)
    Does it still make sense to give non-UTF-8-locales special handling
    (which one?), or can adduser safely assume that any non-ascii locale is
    UTF-8? Or must I check for locale and reject UTF-8 user names on
    non-UTF-8 locales? (I hope that we can safely assume UTF-8)
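
    If adduser were to check the locale instead of assuming UTF-8, the check could look roughly like this Python sketch. `locale_is_utf8` is a hypothetical helper; it reads the codeset of the environment's LC_CTYPE locale via `nl_langinfo(CODESET)`, which is only available on Unix-like systems, and treats an unusable locale as non-UTF-8 (a policy assumption).

```python
import locale


def locale_is_utf8() -> bool:
    """Return True if the LC_CTYPE locale taken from the environment uses UTF-8."""
    try:
        # Adopt the settings from LANG / LC_ALL / LC_CTYPE.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Invalid or missing locale: treat it as non-UTF-8.
        return False
    # nl_langinfo() is Unix-only; typical values are "UTF-8" or "ANSI_X3.4-1968".
    codeset = locale.nl_langinfo(locale.CODESET)
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    print("UTF-8 locale" if locale_is_utf8() else "non-UTF-8 locale")
```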

    (7)
    Do the general restrictions for both kinds of user names make sense?
    Going forward with this would mean rejecting user names that we used to
    accept before. (I think we should come close to systemd's ideas)

    (8)
    I think that our current way to restrict system account names is fine.
    Any objections/additions here?

    (9)
    Should some of this language be in Policy instead of some random wiki
    page? Policy is quite short about user names (chapter 9.2) (I think yes)

    (10)
    What should adduser do regarding subuids? Since I was ignorant about
    that concept until a few hours ago, all accounts created by adduser do
    have subuids, regardless of being system account or not, while useradd
    does not give system accounts subuids.

    Greetings
    Marc

    P.S.: The teams and individuals working on src:shadow, base-passwd and
    adduser would appreciate your help in coding and packaging. You can get
    in touch with all involved parties via
    pkg-shadow-devel@lists.alioth.debian.org

    -- 
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax:   *49 6224 1600421

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Lewis@21:1/5 to Marc Haber on Thu Nov 21 23:10:01 2024
    Marc Haber <mh+debian-devel@zugschlus.de> writes:


    For adduser's next release, I would like to discuss the following
    things:

    (1)
    Should Debian allow UTF-8 user names in the first place or should we
    restrict names for regular users to some us-ascii near set as well? (I
    think yes, we should)

    would allowing utf-8 enable some of the abuse described at https://lwn.net/Articles/874951/ ?

    as usernames appear in logs and other output (and are passed to all
    sorts of commands), it seems a bad idea to be too permissive or to
    change from historic practice by default, even though from a user pov it
    would be nice to have the option


    P.S.: The teams and individuals working on src:shadow, base-passwd and
    adduser would appreciate your help in coding and packaging.

    Is there a list of "things that need doing"?

  • From Iustin Pop@21:1/5 to Marc Haber on Thu Nov 21 23:30:01 2024
    On 2024-11-21 18:45:06, Marc Haber wrote:

    (1)
    Should Debian allow UTF-8 user names in the first place or should we
    restrict names for regular users to some us-ascii near set as well? (I
    think yes, we should)

    You weren't clear to which part you agreed. If by "we should" you meant
    the closest option, i.e. restrict, then I agree as well.

    As Richard also replied, full UTF-8 is tricky, and I think it's somewhat misplaced to focus on the username, as opposed to gecos. Aren't most
    other OSes using the "full name" as the "display name", and the username
    is mostly one part of the user/password combination, but not a display
    property most of the time?

    So I would suggest that maybe the better option is to standardise the
    gecos format/gecos parsing, and migrate UI tools to use that more often.
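
    The traditional convention, as used by chfn(1), stores full name, room number, work phone, home phone and other information as comma-separated subfields of the gecos/comment field. A minimal Python sketch of parsing it, assuming that convention; `Gecos` and `parse_gecos` are hypothetical names. A real tool could feed it from `pwd.getpwnam(name).pw_gecos`.

```python
from typing import NamedTuple


class Gecos(NamedTuple):
    full_name: str
    room: str
    work_phone: str
    home_phone: str
    other: str


def parse_gecos(field: str) -> Gecos:
    """Split a gecos/comment field using the traditional comma convention."""
    # At most five parts: everything after the fourth comma stays in "other".
    parts = field.split(",", 4)
    parts += [""] * (5 - len(parts))  # pad missing trailing subfields
    return Gecos(*parts)
```

    Because the full name is just one subfield, UI tools could display it in full UTF-8 while the login name stays a plain identifier.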

    On the other hand, as long as this is admin-controlled, it doesn't
    matter much. I could see that viewpoint, but I wonder how much latent
    breakage would be introduced that will take years to fix in all tooling
    and all packages.

    regards,
    iustin

  • From Marc Haber@21:1/5 to Iustin Pop on Fri Nov 22 10:50:01 2024
    [Reducing the list to debian-devel. I neglected to set Reply-To and apologize for that]

    On Thu, Nov 21, 2024 at 11:26:48PM +0100, Iustin Pop wrote:
    On 2024-11-21 18:45:06, Marc Haber wrote:
    Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

    You weren't clear to which part you agreed. If by "we should" you meant
    the closest option, i.e. restrict, then I agree as well.

    I am sorry. My personal opinions were among the last things I added to
    the article and I was not clear here. I think we should allow UTF-8 user
    names as a courtesy to those people who need non-ascii user names to
    write their name, since user names are frequently chosen from the real
    name of the person. In addition, this will enhance software quality,
    since we then get the chance to find bugs that are already present in
    a lot of software.

    This comes kind of late in the Trixie cycle, but as it is currently
    already possible to create user names with UTF-8 characters, I do not
    like the idea of tightening our restrictions in Trixie over what we have
    in Bookworm just to maybe revisit our decision in Trixie+1.

    As Richard also replied, full UTF-8 is tricky,

    My current code uses \p{Graph} as a least common denominator. I am not
    sure whether this is wise.
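
    Python's `re` module has no `\p{...}` support, so this sketch approximates `\p{Graph}` with the standard `unicodedata` module. It assumes Perl follows the UTS #18 definition of "graph": everything except whitespace, controls (Cc), surrogates (Cs) and unassigned code points (Cn). `is_graph` and `all_graph` are hypothetical names, `str.isspace()` only approximates the Unicode White_Space property, and Python's Unicode database version may differ from Perl's.

```python
import unicodedata

# General categories excluded from "graph" by the UTS #18 definition:
# Cc (control), Cs (surrogate), Cn (unassigned).
_NON_GRAPH_CATEGORIES = {"Cc", "Cs", "Cn"}


def is_graph(ch: str) -> bool:
    """Approximate \\p{Graph} for a single character."""
    if ch.isspace():  # approximation of the Unicode White_Space property
        return False
    return unicodedata.category(ch) not in _NON_GRAPH_CATEGORIES


def all_graph(name: str) -> bool:
    """True if name is non-empty and every character is 'graphic'."""
    return bool(name) and all(is_graph(ch) for ch in name)
```

    Under this definition, format characters (category Cf) such as U+202E RIGHT-TO-LEFT OVERRIDE still pass, which matters for the bidirectional-text abuse Richard pointed to; a stricter policy would have to exclude Cf explicitly.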

    and I think it's somewhat
    misplaced to focus on the username, as opposed to gecos. Aren't most
    other OSes using the "full name" as the "display name", and the username
    is mostly one part of the user/password combination, but not a display property most of the time?

    I think that we should allow full UTF-8 in the gecos¹ field, yes. People should be allowed to have their fully correct name in there. I also
    think that users of non-latin languages should have the possibility to
    have a login name that resembles their name.

    ¹ In 2024, no one remembers what gecos means any more. Adduser and
    src:shadow are using "comment" for that field nowadays.

    So I would suggest that maybe the better option is to standardise the
    gecos format/gecos parsing, so migrate UI tools to use that more often.

    That doesn't solve the issue I am having with adduser right now: That
    we're allowing things that we are not sure we should allow.

    On the other hand, as long as this is admin-controlled, it doesn't
    matter much. I could see that viewpoint, but I wonder how much latent breakage would be introduced that will take years to fix in all tooling
    and all packages.

    Yes. Fixing breakage makes software better, and by disallowing non-latin characters in user names we are hiding those issues away.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Richard Lewis on Fri Nov 22 10:40:01 2024
    On Thu, Nov 21, 2024 at 10:05:49PM +0000, Richard Lewis wrote:
    Marc Haber <mh+debian-devel@zugschlus.de> writes:


    For adduser's next release, I would like to discuss the following
    things:

    (1)
    Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

    would allowing utf-8 enable some of the abuse described at https://lwn.net/Articles/874951/ ?

    as usernames appear in logs and other output (and are passed to all
    sorts of commands), it seems a bad idea to be too permissive or to
    change from historic practice by default, even though from a user pov it would be nice to have the option

    I am not sure about that. Would typosquatting on a user name make sense?
    It might be possible to make logs ambiguous. Being passed to other
    commands SHOULD not be dangerous since we can expect other commands to gracefully handle a byte stream, can't we?

    I might be naive here, but I don't have much experience with non-ASCII
    names since I have the privilege of being fluent in two languages that
    use the latin alphabet.

    On the other side, wouldn't it be a courtesy to allow people having a
    name that needs transcription to be written in latin to use their name
    in the real alphabet that it is usually written in as a login name as
    well? To make things worse, transcriptions are often ambiguous.

    I would like to hear the opinion of people who would be affected by this change.

    Local Administrators are able today to use UTF-8 user names in useradd
    or configure adduser to allow their locally important subset of UTF-8,
    but at the moment with things being more restrictive, our software is
    untested in this regard. I think that Debian would get more robust if
    we'd allow things here.

    Vulnerabilities that could be exploited by having non-ascii user names
    are already here and present today, just not uncovered yet.

    P.S.: The teams and individuals working on src:shadow, base-passwd and adduser would appreciate your help in coding and packaging.

    Is there a list of "things that need doing"?

    The collaboration between src:shadow, base-passwd and adduser is a
    relatively fresh thing that came from the fact that src:shadow recently
    introduced changes that broke adduser's test suite. So we haven't
    found good paths yet. I suggested joining forces as a way to
    improve communication and also to reduce the bus factor of those quite
    important packages at least a bit. That was also the reason why
    I suggested that base-passwd join, and I am happy that Colin agreed.

    In adduser, nearly everything that needs doing has issues in the BTS,
    with the severity set to the urgency of the matter in my opinion. You'll
    see that adduser has quite a lot of bugs that were filed by myself. I
    consider it a feature to have a public to-do list. For the other two
    packages, I'd let their respective maintainers comment.

    Greetings
    Marc

  • From Timo Röhling@21:1/5 to All on Fri Nov 22 15:30:01 2024
    Hi,

    * Richard Lewis <richard.lewis.debian@googlemail.com> [2024-11-21 22:05]:
    would allowing utf-8 enable some of the abuse described at https://lwn.net/Articles/874951/ ?

    as usernames appear in logs and other output (and are passed to all
    sorts of commands), it seems a bad idea to be too permissive or to
    change from historic practice by default, even though from a user pov it would be nice to have the option

    I have no experience with bidirectional attacks, but browsers
    mitigate homograph attacks in IDNs by disallowing mixed alphabets
    such as cyrillic and latin letters in the same name. That seems to
    be a reasonable restriction for user names as well.
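
    Python's standard library does not expose the Unicode Script property, so this sketch uses a crude stand-in: the first word of each letter's Unicode character name ("LATIN SMALL LETTER A" vs. "CYRILLIC SMALL LETTER A"). It is a heuristic under that assumption, not what browsers do: real implementations (see UTS #39) use the Script property and allow legitimate combinations such as Japanese Hiragana plus Han, which this sketch would wrongly flag. `letter_scripts` and `mixes_scripts` are hypothetical names.

```python
import unicodedata


def letter_scripts(name: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode character name."""
    scripts = set()
    for ch in name:
        if ch.isalpha():  # digits and punctuation are ignored
            # e.g. "LATIN SMALL LETTER A" -> "LATIN",
            #      "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            char_name = unicodedata.name(ch, "")
            scripts.add(char_name.split()[0] if char_name else "UNKNOWN")
    return scripts


def mixes_scripts(name: str) -> bool:
    """True if the letters of name appear to come from more than one script."""
    return len(letter_scripts(name)) > 1
```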


    Cheers
    Timo


    --
    ⢀⣴⠾⠻⢶⣦⠀ ╭────────────────────────────────────────────────────╮
    ⣾⠁⢠⠒⠀⣿⡁ │ Timo Röhling │
    ⢿⡄⠘⠷⠚⠋⠀ │ 9B03 EBB9 8300 DF97 C2B1 23BF CC8C 6BDD 1403 F4CA │
    ⠈⠳⣄⠀⠀⠀⠀ ╰────────────────────────────────────────────────────╯

  • From Marc Haber@21:1/5 to All on Fri Nov 22 17:40:01 2024
    On Fri, Nov 22, 2024 at 03:29:24PM +0100, Timo Röhling wrote:
    I have no experience with bidirectional attacks, but browsers mitigate homograph attacks in IDNs by disallowing mixed alphabets such as cyrillic
    and latin letters in the same name. That seems to be a reasonable
    restriction for user names as well.

    I am not willing to implement that myself in adduser. I will accept code
    and test cases written by others, but this is a thing that goes beyond
    my resources. Additionally, it won't help since an attacker can directly
    write to /etc/passwd.

    Homograph attacks would be best mitigated in software reading
    /etc/passwd, alerting in their output or logs that the user name they
    just printed was composed of strange alphabets.
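
    On the consuming side, a first step could be to escape invisible characters before a user name reaches a log line. A minimal Python sketch; `log_safe` is a hypothetical helper that only escapes control (Cc) and format (Cf) characters, the latter including bidirectional overrides such as U+202E, in a Perl-like `\x{...}` notation. It does not detect homographs; that would need a separate mixed-script check.

```python
import unicodedata

# Cc = control characters, Cf = format characters (includes bidi overrides).
_ESCAPE_CATEGORIES = {"Cc", "Cf"}


def log_safe(name: str) -> str:
    """Render a user name for logs with control/format characters escaped."""
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")  # double literal backslashes so escapes stay unambiguous
        elif unicodedata.category(ch) in _ESCAPE_CATEGORIES:
            out.append(f"\\x{{{ord(ch):x}}}")  # e.g. U+202E -> \x{202e}
        else:
            out.append(ch)
    return "".join(out)
```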

    Greetings
    Marc

  • From Étienne Mollier@21:1/5 to All on Fri Nov 22 20:50:01 2024
    Hi Marc,

    Marc Haber, on 2024-11-22:
    I might be naive here, but I don't have much experience with non-ascii
    names since I have the privilege of being fluent in two languages that
    use the latin alphabet.

    I am not sure whether I am the intended audience here, because
    my name is almost entirely ASCII. That being said, I happen to
    have one weird enough latin based character as the first letter
    in my first name, that it gives interesting results when thrown
    toward random databases. Thus I do happen to have some thoughts
    about this topic.

    On the other side, wouldn't it be a courtesy to allow people having a
    name that needs transcription to be written in latin to use their name
    in the real alphabet that it is usually written in as a login name as
    well? To make things worse, transcriptions are often ambiguous.

    I would like to hear the opinion of people who would be affected by this change.

    I tried to consider what it would take to have an émollier or an
    Émollier login, and there is one little blocker: I may have to
    login from environments or keyboards lacking the necessary i18n
    and l10n capabilities to transcribe the 'e' acute, let alone the
    uppercase 'e' acute. For example, I hit this particular issue
    when populating the Gecos field from the Debian installer
    environment: if I choose a Qwerty US configuration but miss the
    step to choose which Qwerty US internationalized variant I want
    to use, then I don't get to type uppercase 'e' acute, but there
    are many other situations unrelated to d-i or even Debian where
    I run into that. For this practical reason, I tend to feel
    better about keeping a full ASCII login name. I wouldn't feel
    strongly if Unicode support for login never happens. I believe,
    however, that the Gecos field is the right place to store the properly
    typed-in person name, because it is a "presented" name that
    lacks the technical coupling that the login name has, and I
    would probably have stronger feelings if it were not to have
    Unicode support.

    You probably want to hear some more thoughts, especially from
    people whose names are written entirely in non-Latin characters.
    Having a Latin name, I perhaps accommodate myself too easily to a
    full ASCII login.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/1, please excuse my verbosity
    `-

  • From Gioele Barabucci@21:1/5 to All on Fri Nov 22 22:10:01 2024
    On 22/11/24 20:42, Étienne Mollier wrote:
    I tried to consider what it would take to have an émollier or an
    Émollier login, and there is one little blocker : I may have to
    login from environments or keyboards lacking the necessary i18n
    and l10n capabilities to transcribe the 'e' acute, let alone the
    uppercase 'e' acute.

    Dear Étienne,

    your case highlights another problem not mentioned in the original list
    posted by Marc: comparison (and normalization).

    Some characters can be encoded in more than one way. For instance, "é"
    in "émollier" could be stored as "e with acute" U+00E9 (and encoded in
    UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus
    U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the
    former sequence of bytes, but the username is stored in the login
    infrastructure using the latter sequence of bytes, then a naive
    comparison will not find the user "émollier" in the system. Unicode
    Standard Annex #15 defines a few normalization forms as a way to work around
    this problem. But a correct use of these normalization forms still
    requires coordination and standardization among all programs accessing
    the data.
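
    The comparison problem can be reproduced with Python's standard `unicodedata` module. Normalizing to NFC by default is an arbitrary choice for illustration; which form, if any, user names should use is exactly the open question. `same_user_name` is a hypothetical helper.

```python
import unicodedata

PRECOMPOSED = "\u00e9mollier"  # "é" as one code point, U+00E9
DECOMPOSED = "e\u0301mollier"  # "e" followed by U+0301 COMBINING ACUTE ACCENT


def same_user_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(PRECOMPOSED == DECOMPOSED)                # False: naive comparison
    print(PRECOMPOSED.encode("utf-8")[:2])          # b'\xc3\xa9'
    print(DECOMPOSED.encode("utf-8")[:3])           # b'e\xcc\x81'
    print(same_user_name(PRECOMPOSED, DECOMPOSED))  # True
```

    Normalizing only helps if every program that writes or looks up the name applies the same form, which is the coordination problem described above.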

    Does POSIX (or other de-facto standards) prescribe a normalization form
    for Unicode-/UTF-8-encoded usernames?

    Regards,

    --
    Gioele Barabucci

  • From Peter Pentchev@21:1/5 to Gioele Barabucci on Sat Nov 23 00:40:01 2024
    On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:

    Does POSIX (or other de-facto standards) prescribe a normalization form for Unicode-/UTF-8-encoded usernames?

    POSIX says "if you want your applications to be portable, do not use any
    funny characters in usernames":

    https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_409

    3.409 User Name

    A string that is used to identify a user; see also 3.407 User Database.
    To be portable across systems conforming to POSIX.1-2024, the value is
    composed of characters from the portable filename character set.
    The <hyphen-minus> character should not be used as the first character
    of a portable user name.

    For people unfamiliar with POSIX terms, the portable filename character
    set is defined as:

    https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265

    The set of characters from which portable filenames are constructed.

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 . _ -

    The last three characters are the <period>, <underscore>, and
    <hyphen-minus> characters, respectively.
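
    The POSIX rule quoted above translates into a one-line pattern. A minimal Python sketch; `is_portable_user_name` is a hypothetical name, and it treats the "should not start with <hyphen-minus>" recommendation as a hard rule.

```python
import re

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
# The first character may be anything from that set except <hyphen-minus>.
_PORTABLE_USER_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_portable_user_name(name: str) -> bool:
    """True if name uses only the POSIX portable filename character set."""
    return _PORTABLE_USER_NAME.fullmatch(name) is not None
```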

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

  • From Johannes Schauer Marin Rodrigues@21:1/5 to All on Sat Nov 23 09:40:01 2024
    Quoting nick black (2024-11-23 08:48:10)
    You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫i

    xfce-terminal renders this like this: https://mister-muffin.de/p/4o2v.png

    No idea if this is correct and I'll leave the details to those who know more about this topic than I. And maybe my email client completely messes this up in this response of mine.
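
    The point behind nick's question is that code points, bytes and terminal columns can all differ. A rough Python approximation of column width using only the standard library: wide and fullwidth characters count as 2, combining marks as 0, everything else as 1. `display_columns` is a hypothetical helper, much cruder than wcwidth(3), and terminals may still render some characters differently.

```python
import unicodedata


def display_columns(text: str) -> int:
    """Rough terminal width: wide/fullwidth = 2, combining marks = 0, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks attach to the preceding character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```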

    But my 2 cents on the topic are: Let's please allow more than ASCII in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters because that's all we have after half a century of computers being a thing...

    If having this work in Debian can put a bit of pressure on those software projects that do not support this, then please let that happen so that missing Unicode support becomes more annoying for those pieces of software that are missing it. For example, if my email client messed this up, then let's fix it. We cannot find this kind of bug if we accept transliterating everybody's given name into the American alphabet.

    Thanks!

    cheers, josch
  • From Gioele Barabucci@21:1/5 to Johannes Schauer Marin Rodrigues on Sat Nov 23 13:00:01 2024
    On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:
    But my 2 cents on the topic are: Lets please allow more than ascii in usernames.

    Yes please, but opt-in and behind a big red warning that says that it is
    not interoperable (outside POSIX), potentially insecure (homographs) and
    at high-risk of breaking existing applications (lack of standardized normalization form).

    Regards,

    --
    Gioele Barabucci

  • From Bjørn Mork@21:1/5 to Johannes Schauer Marin Rodrigues on Sun Nov 24 11:50:01 2024
    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    You are confusing usernames and names. Different concepts with
    different rules. Let's just hope you never get two students with the
    same name.


    Bjørn

  • From Bjørn Mork@21:1/5 to Marc Haber on Sun Nov 24 12:00:01 2024
    Marc Haber <mh+debian-devel@zugschlus.de> writes:

    On the other hand, as long as this is admin-controlled, it doesn't
    matter much. I could see that viewpoint, but I wonder how much latent
    breakage would be introduced that will take years to fix in all tooling
    and all packages.

    Yes. Fixing breakage makes software better, and by disallowing non-latin characters in user names we are hiding those issues away.

    This is arrogant. Assuming that a username can be displayed, sorted,
    compared and typed using strict us-ascii is not a bug today. It's not
    "hiding" any issue.

    The question is whether it makes sense to introduce a new class of bugs
    by changing the rules. And we can pretty much guarantee that some of
    those bugs are security-critical, since this is all about authentication
    and authorization.

    Knowingly introducing security bugs does not sound like a good idea.

    For what purpose?


    Bjørn

  • From Gioele Barabucci@21:1/5 to nick black on Sun Nov 24 12:30:01 2024
    On 24/11/24 10:43, nick black wrote:
    Gioele Barabucci left as an exercise for the reader:
    On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:
    But my 2 cents on the topic are: Lets please allow more than ascii in
    usernames.

    potentially insecure (homographs) and at
    high-risk of breaking existing applications (lack of standardized
    normalization form).

    i'm not sure why this is being repeated.

    https://unicode.org/reports/tr15/

    Dear Nick,

    You may have misunderstood that phrase. I was not referring to the fact
    that there are no standardized normalization forms for Unicode (I
    explicitly mention Annex 15 in [1]), but to the fact that there is no
    standard that specifies which of the possible normalization forms should
    be used for account names (and other fields in passwd).

    POSIX explicitly limits itself to a subset of ASCII, so it is not going
    to mandate any normalization form. Are there other standards (or
    initiatives) in this area that you know of?

    Regards,

    [1] https://lists.debian.org/debian-devel/2024/11/msg00305.html

    --
    Gioele Barabucci

  • From Iustin Pop@21:1/5 to All on Sun Nov 24 13:30:01 2024
    On 2024-11-24 11:44:45, Bjørn Mork wrote:
    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
    that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    You are confusing usernames and names. Different concepts with
    different rules. Let's just hope you never get two students with the
    same name.

    I wanted to reply to Johannes, but I didn't know exactly how to phrase
    it; you did it perfectly.

    I still don't understand the need for a username to be very
    representative of one's name. OTOH, my name can be fully written using
    ASCII, so maybe I miss something. But I've also had to use accounts like abc745, which didn't bother me much over the duration of a semester or
    year.

    regards,
    iustin

  • From Chris Hofstaedtler@21:1/5 to All on Sun Nov 24 14:40:01 2024
    * Bjørn Mork <bjorn@mork.no> [241124 11:45]:
    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
    that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    You are confusing usernames and names. Different concepts with
    different rules. Let's just hope you never get two students with the
    same name.

    I find your reply massively insulting, and I'm not even the original
    author.

    Usernames (not the "comment" field) are identifiers, and humans care
    about the identifiers used for them.

    Yes, some humans don't care if you assign them a random 32-byte
    string as their username. Enough humans, however, do have
    preferences. In some countries humans even have a right to choose
    how they are addressed.

    Chris

  • From Iustin Pop@21:1/5 to Chris Hofstaedtler on Sun Nov 24 14:50:02 2024
    On 2024-11-24 14:37:24, Chris Hofstaedtler wrote:
    * Bjørn Mork <bjorn@mork.no> [241124 11:45]:
    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
    that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    You are confusing usernames and names. Different concepts with
    different rules. Let's just hope you never get two students with the
    same name.

    I find your reply massively insulting, and I'm not even the original
    author.

    Massively?

    Usernames (not the "comment" field) are identifiers, and humans care
    about the identifiers used for them.

    Yes, some humans don't care if you assign them a random 32-byte
    string as their username. Enough humans, however, do have
    preferences. In some countries humans even have a right to choose
    how they are addressed.

    And what relation does the username used for logging in have to "being addressed"? Isn't it akin to a passport/ID card number?

    regards,
    iustin

  • From Simon McVittie@21:1/5 to Iustin Pop on Sun Nov 24 15:30:01 2024
    On Thu, 21 Nov 2024 at 23:26:48 +0100, Iustin Pop wrote:
    As Richard also replied, full UTF-8 is tricky, and I think it's somewhat misplaced to focus on the username, as opposed to gecos. Aren't most
    other OSes using the "full name" as the "display name", and the username
    is mostly one part of the user/password combination, but not a display property most of the time?

    So I would suggest that maybe the better option is to standardise the
    gecos format/gecos parsing, and migrate UI tools to use that more often.

    As a data point, in our default GNOME desktop, System Settings (gnome-control-center) prompts for a "Full Name" first (behind the
    scenes that's the full name part of the pw_gecos field), and a "Username" second (this is the pw_name); and the default display mode for the
    gdm3 login prompt is to show a list of full names from pw_gecos.

    My understanding is that the full name already allows arbitrary UTF-8,
    except for the characters that can't be represented in passwd(5) syntax
    (colon, comma, newline) and the ampersand.
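
Here is a minimal Python sketch of the rule described above. It goes one step beyond what is stated and assumes that the same forbidden set applies to every comma-separated GECOS subfield. The names is_valid_gecos_subfield and make_gecos are hypothetical; they are not taken from chfn(1) or adduser.

```python
# Characters that cannot appear in a GECOS subfield under this sketch's rule:
# ':' separates passwd(5) fields, ',' separates GECOS subfields,
# '\n' ends a passwd record, and '&' is conventionally expanded to the
# login name by tools such as finger.
FORBIDDEN_IN_GECOS_SUBFIELD = frozenset(":,\n&")


def is_valid_gecos_subfield(value: str) -> bool:
    """Arbitrary Unicode is fine, except for the forbidden characters."""
    return not any(ch in FORBIDDEN_IN_GECOS_SUBFIELD for ch in value)


def make_gecos(full_name: str, *other_subfields: str) -> str:
    """Join the full name and optional further subfields into pw_gecos."""
    fields = (full_name, *other_subfields)
    for field in fields:
        if not is_valid_gecos_subfield(field):
            raise ValueError(f"invalid character in GECOS subfield: {field!r}")
    return ",".join(fields)


print(make_gecos("Bjørn Mork"))                 # Bjørn Mork
print(make_gecos("Giuseppe Sacco", "Room 42"))  # Giuseppe Sacco,Room 42
print(is_valid_gecos_subfield("Smith & Sons"))  # False
```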

    Outside the Linux/GNU/freedesktop worlds, this is fairly similar to how
    macOS presents the distinction between the display name and the Unix
    username (pw_name). macOS is interesting here because it's an operating
    system with a lot of Unix ancestry, but has also had a lot of effort put
    into making it friendly for non-technical users.

    In the macOS world, it seems to be conventional and encouraged to set the username to a lower-case ASCII string with no punctuation, similar to the conventions in POSIX and <https://systemd.io/USER_NAMES/>.
    Unfortunately I haven't been able to find a reference for what characters
    macOS allows in pw_name. Perhaps a DD who has a macOS system (or a family member with a macOS system) could help here?

    I think one good idea that we should certainly adopt from <https://systemd.io/USER_NAMES/> is its separation between "strict mode"
    (the naming convention that it encourages for all uses, and enforces
    when a user is created via systemd tools) and "relaxed mode" (the much
    less strict naming convention that systemd requires for names created by non-systemd tools). Because of the differences between those two modes,
    systemd is quite conservative in what its own tools will emit but a
    lot more liberal in what it will accept, and that seems like a good
    principle here, even if the specific rules that Debian chooses end up
    differing from those that systemd has chosen.
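
To make the strict/relaxed split concrete, here is a sketch of a two-tier check in Python. The rules themselves are invented for illustration. They are loosely in the spirit of the systemd document but are not a transcription of it. Names that the tool creates itself must pass the strict check. Names that already exist, created by other tools, only need to pass the relaxed one.

```python
import re
import unicodedata

# Hypothetical "strict" rule for names this tool creates itself:
# a lower-case ASCII letter, then up to 31 lower-case ASCII letters,
# digits, '_' or '-'. Illustrative only; these are not systemd's exact rules.
STRICT_RE = re.compile(r"[a-z][a-z0-9_-]{0,31}")


def is_strict_name(name: str) -> bool:
    return STRICT_RE.fullmatch(name) is not None


def is_relaxed_name(name: str) -> bool:
    """Hypothetical "relaxed" rule for names created by other tools."""
    if not name or name.startswith("-") or name.isdigit():
        return False  # empty, option-like, or could be mistaken for a numeric UID
    if any(ch in ":/" for ch in name):
        return False  # ':' separates passwd(5) fields; '/' breaks home paths
    # Reject control, format, surrogate, private-use and unassigned code
    # points (General Category C*); '\n' is a control character and is caught here.
    return not any(unicodedata.category(ch).startswith("C") for ch in name)


def classify(name: str) -> str:
    if is_strict_name(name):
        return "strict"    # what we emit
    if is_relaxed_name(name):
        return "relaxed"   # what we still accept
    return "rejected"


for candidate in ["jdoe", "jürgen", "1000", "a:b"]:
    print(candidate, "->", classify(candidate))
```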

    smcv

  • From Chris Hofstaedtler@21:1/5 to All on Sun Nov 24 15:30:01 2024
    * Iustin Pop <iustin@debian.org> [241124 14:41]:
    On 2024-11-24 14:37:24, Chris Hofstaedtler wrote:
    * Bjørn Mork <bjorn@mork.no> [241124 11:45]:
    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
    that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    You are confusing usernames and names. Different concepts with
    different rules. Let's just hope you never get two students with the same name.

    I find your reply massively insulting, and I'm not even the original author.

    Massively?

    Yes.

    Usernames (not the "comment" field) are identifiers, and humans care
    about the identifiers used for them.

    Yes, some humans don't care if you assign them a random 32-byte
    string as their username. Enough humans, however, do have
    preferences. In some countries humans even have a right to choose
    how they are addressed.

    And what relation does the username used for logging in have to "being addressed"? Isn't it akin to a passport/ID card number?

    No. I see and type my username hundreds of times a day, and people use
    it to address me in written and spoken conversations, etc.

    If it were my uid, which I see maybe once a week and don't have to
    remember, I wouldn't care.

    Chris

  • From Giuseppe Sacco@21:1/5 to All on Sun Nov 24 15:40:02 2024
    Hi all,

    On Sun, 24/11/2024 at 13:20 +0100, Iustin Pop wrote:
    [...]
    I still don't understand the need for a username to be very
    representative of one's name. OTOH, my name can be fully written using
    ASCII, so maybe I miss something. But I've also had to use accounts like
    abc745, which didn't bother me much over the duration of a semester or
    year.

    It is true that user account name and user (display) name are
    different, of course. But still, when you log in, you use the user
    account name to access the system; this is the text shown in file
    ownership listing and almost everywhere in the system.
    I think that user (display) name, that may be put in gecos field, are
    not widely used. Moreover, adduser man page on Debian stable, states
    that gecos fields will be removed after bookworm.

    So, having a good account user name is an important thing. And we have
    to choose if it should be "good" for the computer (like in: unique,
    lowercase, short, US-ASCII, etc.) or if it should be "good" for the
    real user. In the latter case, I would accept a broader class of
    strings for the very simple reason that it should be left to user
    preference.

    I checked what other systems do:

    Windows[0] accept any characters, except " / \ [ ] : ; | = , + * ? < >,
    and allow for 64 characters (or bytes, I am unsure on this).

    SunOS has these restrictions[1] "a string of no more than thirty-two
    bytes consisting of characters from the set of alphabetic characters,
    numeric characters, period (.), underscore (_), and hyphen (-). The
    first character should be alphabetic and the field should contain at
    least one lowercase alphabetic character"

    In LDAP[2] the uid field is a "Directory String"[3], so any non-zero-length
    UTF-8 text. There is a note: Servers and clients MUST be prepared
    to receive arbitrary UCS code points, including code points outside the
    range of printable ASCII and code points not presently assigned to any character.

    FreeBSD[4] suggest to "use user names that consist of eight or fewer,
    all lower case characters in order to maintain backwards compatibility
    with applications." But the real syntax[5] is: login name must not
    begin with a hyphen (`-'), and cannot contain 8-bit characters, tabs or
    spaces, or any of these symbols: `,:+&#%^()!@~*?<>=|\/";'. The dollar
    symbol (`$') is allowed only as the last character for use with Samba.
    No field may contain a colon (`:') as this has been used historically
    to separate the fields in the user database.
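
The FreeBSD rules quoted above are mechanical enough to express directly. This Python sketch takes the forbidden symbol set literally from the quote. Treating an empty name as invalid is an added assumption, and the real FreeBSD tools may differ in detail.

```python
# Symbols forbidden anywhere in a login name, copied from the rules quoted above.
FREEBSD_FORBIDDEN = set(',:+&#%^()!@~*?<>=|\\/";')


def freebsd_login_ok(name: str) -> bool:
    """Check a login name against the FreeBSD rules as quoted (not FreeBSD's code)."""
    if not name:
        return False              # added assumption: empty names are invalid
    if name.startswith("-"):
        return False              # must not begin with a hyphen
    if not name.isascii():
        return False              # no 8-bit characters
    if any(ch in "\t " or ch in FREEBSD_FORBIDDEN for ch in name):
        return False              # no tabs, spaces, or forbidden symbols
    # '$' is allowed only as the last character (for Samba machine accounts).
    return "$" not in name[:-1]


for candidate in ["jdoe", "smbhost$", "ho$st", "-jdoe", "jürgen", "j doe"]:
    print(candidate, freebsd_login_ok(candidate))
```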

    IBM AIX has these rules[6]: must not begin with a hyphen (-), plus sign
    (+), at sign (@), or tilde (~). Additionally, do not use any of the
    following characters within a user-name string: :"#,=\/?'`
    Finally, the login parameter cannot contain any space, tab, or newline characters.

    On HP-UX, user names are restricted[8] to eight characters and group
    names to 16 characters, but you may raise the limits up to 254
    characters. Either way, a name must start with a letter.

    Kerberos syntax for principal[9] is GeneralString constrained to
    contain only characters in IA5String (so, basically US-ASCII 7 bits),
    with this note: US-ASCII control characters should not be used.

    So, I think any sequence of Unicode "printable" letters should be
    allowed. It may be encoded in UTF-8 or another encoding, but I think
    UTF-8 is the best encoding since it includes the US-ASCII 7-bit chars.
    About the meaning of "printable", probably this means a few unicode categories[7] should be included: lowercase letter, uppercase letter,
    decimal number, plus a few symbols (hyphen, period, plus, at sign, and underscore at minimum).
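
Taken literally, the proposed set can be checked through the General Category of each code point. Here is a sketch in Python; the function name is hypothetical.

```python
import unicodedata

# Proposed minimum: lower-case letters (Ll), upper-case letters (Lu) and
# decimal digits (Nd), plus a short whitelist of ASCII symbols.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Nd"}
ALLOWED_SYMBOLS = set("-.+@_")


def is_allowed_username(name: str) -> bool:
    return bool(name) and all(
        ch in ALLOWED_SYMBOLS or unicodedata.category(ch) in ALLOWED_CATEGORIES
        for ch in name
    )


print(is_allowed_username("jürgen"))        # True
print(is_allowed_username("ju\u0308rgen"))  # False: U+0308 is category Mn
print(is_allowed_username("张伟"))          # False: CJK ideographs are category Lo
```

Applied as written, this rule rejects letters from scripts without case, which fall in category Lo (for example CJK ideographs, or Arabic and Hebrew letters). It also rejects combining marks (Mn), so the decomposed spelling of "jürgen" fails while the precomposed one passes. Adding Lo and Mn would be the obvious extension.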

    Bye,
    Giuseppe

    [0] https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/bb726984(v=technet.10)?redirectedfrom=MSDN
    [1] https://docs.oracle.com/cd/E88353_01/html/E37852/passwd-5.html#REFMAN5passwd-5
    [2] https://www.rfc-editor.org/rfc/rfc4519#section-2.39
    [3] https://docs.ldap.com/specs/rfc4517.txt
    [4] https://docs.freebsd.org/en/books/handbook/basics/#users-synopsis
    [5] https://man.freebsd.org/cgi/man.cgi?query=passwd&sektion=5&format=html
    [6] https://www.ibm.com/docs/en/aix/7.2?topic=u-useradd-command
    [7] https://www.compart.com/en/unicode/category
    [8] https://support.hpe.com/hpesc/public/docDisplay?docId=c01922594&docLocale=en_US
    [9] https://www.rfc-editor.org/rfc/rfc4120#section-5.2.1

  • From Bálint Réczey@21:1/5 to All on Sun Nov 24 16:00:01 2024
    Hi Johannes,

    Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> wrote
    (on Sat, 23 Nov 2024, 9:32):

    Quoting nick black (2024-11-23 08:48:10)
    You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫i

    xfce-terminal renders this like this: https://mister-muffin.de/p/4o2v.png

    No idea if this is correct and I'll leave the details to those who know more about this topic than I. And maybe my email client completely messes this up in
    this response of mine.

    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    I have had students as well, many of them with accents in their names,
    like myself, and never had this kind of discomfort.

    Whenever it occurs to me, I'll remind myself that even deeply
    personal birthdays are written with Arabic numerals instead of Roman
    ones, which would look way cooler, and in base 10 instead of the
    base 60 that was widely used by the Sumerians.


    If having this work in Debian can put a bit of pressure on those software projects that do not support this, then please let that happen so that missing
    Unicode support becomes more annoying for those pieces of software that are missing it. For example, if my email client messed this up, then let's fix it. We cannot find this kind of bug if we accept translating everybody's given name to the American alphabet.

    Please don't open this can of worms and impose pointless work on
    upstreams. Keep what has worked reasonably well for decades.

    Cheers,
    Balint

    PS: The mandatory relevant Monty Python sketch: https://www.youtube.com/watch?v=6cKsBe3on5g


    Thanks!

    cheers, josch

  • From Bjørn Mork@21:1/5 to Chris Hofstaedtler on Sun Nov 24 16:40:02 2024
    Chris Hofstaedtler <zeha@debian.org> writes:

    No. I see and type my username hundreds of times a day, and people use
    it to address me in written and spoken conversations, etc.

    This is confusing the subject even more.

    Are you sure you are talking about usernames? Or is this email local
    parts, chat nicknames and spoken nicks? If so, then there is no reason
    you can't use utf8. Today. Without changing any username.

    It's also possible to modify $PS1 if seeing \u without utf8 is annoying.


    Bjørn

  • From Simon McVittie@21:1/5 to Giuseppe Sacco on Sun Nov 24 17:40:02 2024
    On Sun, 24 Nov 2024 at 15:37:36 +0100, Giuseppe Sacco wrote:
    Moreover, adduser man page on Debian stable, states
    that gecos fields will be removed after bookworm.

    No, it says the --gecos *option* will be removed after bookworm,
    replaced by --comment, which seems to be another name for the same thing: passwd(5) "user name or comment field" = struct passwd's pw_gecos,
    as can be edited by chfn(1).

    The field containing the user's full name, and a way to edit it, are
    definitely something that should stay.

    smcv

  • From Philipp Kern@21:1/5 to All on Sun Nov 24 18:30:01 2024
    On Sun Nov 24, 2024 at 4:03 PM CET, Bjørn Mork wrote:
    Chris Hofstaedtler <zeha@debian.org> writes:

    No. I see and type my username hundreds of times a day, and people use
    it to address me in written and spoken conversations, etc.

    This is confusing the subject even more.

    Are you sure you are talking about usernames? Or is this email local
    parts, chat nicknames and spoken nicks? If so, then there is no reason
    you can't use utf8. Today. Without changing any username.

    In many organizations the email local part matches the username[1] and it
    is also used in spoken conversations. To the point where I needed to
    make clear on internal yellow pages that I would prefer not to be called "pkern" in spoken conversation, thank you very much.

    So yes, usernames are pretty much used in spoken conversation. Many do
    not actually understand what a username is and think that it reflects
    how someone wants to be called - as their default assumption.

    Kind regards
    Philipp Kern

    PS: My personal, ignorant, Latin-world opinion is that it is probably
    too hard for most people to type each others' usernames if UTF-8 were to
    be allowed. And I would never ever use UTF-8 in a local part. And I
    suffered a bit too much recently looking at differences between byte
    count and character count.
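
The byte/character mismatch is easy to demonstrate. The Python sketch below uses fits_byte_limit, a hypothetical helper. Counting the limit in UTF-8 bytes is an assumption about how a C program with a fixed-size buffer would see it.

```python
def name_lengths(name: str):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(name), len(name.encode("utf-8"))


def fits_byte_limit(name: str, limit: int = 32) -> bool:
    """Check a length limit counted in UTF-8 bytes, as a C buffer would see it."""
    return len(name.encode("utf-8")) <= limit


print(name_lengths("jdoe"))          # (4, 4)
print(name_lengths("jürgen"))        # (6, 7): 'ü' takes 2 bytes in UTF-8
print(name_lengths("ju\u0308rgen"))  # (7, 8): decomposed 'u' + U+0308
print(name_lengths("张伟"))          # (2, 6): 3 bytes per character
print(fits_byte_limit("张" * 10), fits_byte_limit("张" * 11))  # True False
```

With a 32-byte limit, a name written in a CJK script gets at most ten characters.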

    [1] Referred to as "LDAP" in mine, which is both funny and sad.

  • From Marc Haber@21:1/5 to All on Sun Nov 24 21:20:02 2024
    On Sun, Nov 24, 2024 at 11:58:44AM +0100, Bjørn Mork wrote:
    Marc Haber <mh+debian-devel@zugschlus.de> writes:
    On the other hand, as long as this is admin-controlled, it doesn't
    matter much. I could see that viewpoint, but I wonder how much latent
    breakage would be introduced that will take years to fix in all tooling
    and all packages.

    Yes. Fixing breakage makes software better, and by disallowing non-Latin characters in user names we are hiding those issues away.

    This is arrogant.

    That was not my intention. I apologize for that.

    Assuming that a username can be displayed, sorted,
    compared and typed using strict us-ascii is not a bug today. It's not "hiding" any issue.

    I have to disagree. Our tools allow creating UTF-8 usernames today, and
    even if they didn't, it would be possible to just edit /etc/passwd.

    The question is whether it makes sense to introduce a new class of bugs
    by changing the rules. And we can pretty much guarantee that some of
    those bugs are security critical, since this is all about authentication
    and authorization.

    So we're having these bugs right now. If you can use adduser or useradd
    to create such accounts, then you have the privilege of putting them
    directly into /etc/passwd as well. /etc/passwd is a well-defined and
    documented interface.

    For what purpose?

    Being friendly to people who can't properly write their names in Latin script.

    Greetings
    Marc

    -- 
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Marc Haber@21:1/5 to Johannes Schauer Marin Rodrigues on Wed Nov 27 17:00:01 2024
    On Sat, Nov 23, 2024 at 09:32:32AM +0100, Johannes Schauer Marin Rodrigues wrote:
    But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters
    because that's all we have after half a century of Computers being a thing...

    In Debian stable, they can already try.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Gioele Barabucci on Wed Nov 27 17:00:01 2024
    On Sat, Nov 23, 2024 at 12:53:52PM +0100, Gioele Barabucci wrote:
    On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:
    But my 2 cents on the topic are: Lets please allow more than ascii in usernames.

    Yes please, but opt-in and behind a big red warning that says that it is not interoperable (outside POSIX),

    adduser requires an option to allow such user names. I think that some
    people might find it offensive to be required to pass an option to use their
    native names. You're arguing to not relax the requirement for plain
    adduser <username>, right?

    potentially insecure (homographs) and at
    high-risk of breaking existing applications (lack of standardized normalization form).

    Can you outline an attack/failure scenario?
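
Here is one scenario behind the homograph concern, sketched in Python with made-up names: an account whose name is visually indistinguishable from an existing one, such as "admin" spelled with a Cyrillic first letter. No normalization form merges the two, so ls -l or ps output would show two owners that read the same. The mixed-script test below is only a heuristic based on the first word of each character's Unicode name. It is not the confusable detection described in Unicode's UTS #39.

```python
import unicodedata

latin = "admin"
spoof = "\u0430dmin"  # first letter is U+0430 CYRILLIC SMALL LETTER A


def scripts(name: str) -> set:
    """Very rough script guess: the first word of each letter's Unicode name,
    e.g. 'LATIN', 'CYRILLIC', 'GREEK'. A heuristic, not UTS #39."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in name if ch.isalpha()}


def looks_mixed_script(name: str) -> bool:
    return len(scripts(name)) > 1


print(latin == spoof)                                 # False
print(unicodedata.normalize("NFKC", spoof) == latin)  # False: NFKC keeps them apart
print(sorted(scripts(spoof)))                         # ['CYRILLIC', 'LATIN']
print(looks_mixed_script(latin), looks_mixed_script(spoof))  # False True
```

This heuristic would not catch a name written entirely in look-alike Cyrillic letters.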

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Philipp Kern on Wed Nov 27 17:00:01 2024
    On Sun, Nov 24, 2024 at 06:06:23PM +0100, Philipp Kern wrote:
    PS: My personal, ignorant, Latin-world opinion is that it is probably
    too hard for most people to type each others' usernames if UTF-8 were to
    be allowed.

    Why would anybody need to type somebody else's user name except in
    "su"? I see it as the exception that local parts of mail addresses
    map 1:1 to a UNIX user name.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to nick black on Wed Nov 27 16:50:01 2024
    Hi nick,

    On Sat, Nov 23, 2024 at 02:48:10AM -0500, nick black wrote:
    Marc Haber left as an exercise for the reader:
    (1)
    Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

    I feel strongly yes, despite POSIX admonitions (quoted elsewhere
    in this thread) and sure breakage any number of places.

    Thank you, noticed.

    I think
    a test plan would be very desirable (off the top of my head,
    we'd want to check login, the DMs, PAM, OpenSSH, passwd, w,
    framebuffer console input, etc.). It would probably also be a good
    idea to loop in other distributions.

    Coordinating this test is way beyond what I have available in resources,
    most notably time. Our tools have been allowing UTF-8 user names at
    least since bookworm (I don't have any bullseye systems left, buster's
    adduser does not allow UTF-8). So we are already testing this in a
    stable release (albeit unplanned).

    Please note that allowing UTF-8 user names by default does not break compatibility in any place where only 7bit user names are being used.
    Debian is not using such user names in anything that we ship. We only
    allow them.

    Actually _doing_ this is still the local admin's decision. And should
    they decide to not want this, adduser can be configured to disallow.

    This thread is mainly about whether we should disallow things in next
    stable that are possible in current stable. I think we need good reasons
    for that, and I ain't seeing any right now.

    I recommend Chapter 7 of my free book, "Hacking the Planet with
    Notcurses: A Guide to TUIs and Character Semigraphics" for the
    full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).

    Noted for reading.

    * any upstream tool could say "bad idea" and refuse patches,
    requiring their long term management,

    Depending on how important this tool is, we could get away without
    patching and probably not even documenting this failure.

    * the Linux framebuffer console is pretty limited in what
    glyphs it has available, and the number of glyphs it can
    support,

    Probably, yes. But people working on the Linux framebuffer console are
    unlikely to actually use UTF-8 user names, so the only really bad
    situation would be a rescue situation. We could get away with
    documenting "please use 7bit only user names for accounts that are
    likely to be used in system rescue situations".

    * you want installer support if you intend to do this right,

    The installer currently allows me to type UTF-8 user names in the entry
    fields (and even displays them correctly when one goes through the
    dialogs a second time), but rejects them with a validation error message
    ("The username you entered is invalid. Note that usernames must start
    with a lower-case letter, which can be followed by any combination of
    numbers and more lower-case letters, and must be no more than 32
    characters long.") which is incorrect, it should be "lower-case us-ascii letters". From a German point of view "jürgen" conforms to the rules
    given in the error message.
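
The ambiguity is easy to show in Python. By Unicode's definition, "jürgen" consists only of lower-case letters, so only an explicitly ASCII rule rejects it. The regular expression below is an assumed reading of the installer's message, not debian-installer's actual code.

```python
import re


def installer_style_check(name: str) -> bool:
    # Assumed ASCII-only reading of the installer's rule: one lower-case
    # US-ASCII letter, then up to 31 lower-case US-ASCII letters or digits.
    return re.fullmatch(r"[a-z][a-z0-9]{0,31}", name) is not None


name = "jürgen"
print(name.islower())               # True: 'ü' is a lower-case letter in Unicode
print(name.isascii())               # False
print(installer_style_check(name))  # False
```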

    * ubiquitous input for UTF-8 is a pretty complicated story, and

    Sites using such letters in user names should know which of them can be
    typed.

    * broken localization (or failure to call setlocale()) could be
    a bigger problem, especially for root/system accounts.

    I don't think we should allow UTF-8 characters in the string "root" or in
    system account names. And if a local admin decides to do so, Debian
    packages should still restrict themselves to using US-ASCII in their
    system accounts.

    Other concerns:

    You'll likely now be linking libunistring into some
    binaries where it wasn't previously used.

    Probably, yes. I hope to get away in adduser without that, since I'd
    like to keep adduser's dependencies minimal (it's being used in the
    installer).

    Regarding the subset of Unicode characters you'd want to allow,
    this would be best decided using the General Category trait.
    Each codepoint is assigned one of a finite set of General
    Categories. We would probably want to allow Letters, Marks, and
    Numbers, and perhaps a whitelist from Punctuation and Symbols
    (Punctuation, connector and Punctuation, dash are probably all
    we'd want) extended from currently supported ispunct(3)
    characters. This data is available from libunistring (and
    probably other places). This eliminates a great swath of known
    security issues.

    Do you have a suggestion for a perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".
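
Python's built-in re module has no \p{...} classes, so this sketch checks each code point's General Category with unicodedata instead (the function name is hypothetical). In Perl, a roughly equivalent (untested) pattern would be qr/\A[\p{L}\p{M}\p{N}\p{Pc}\p{Pd}.]+\z/. Two things about the pattern in the development directory may be worth checking. First, unless it is anchored with \A and \z, it matches any name that merely contains one graphic character. Second, as far as I understand Perl's definition, \p{Graph} already includes the ASCII punctuation listed after it, as well as format characters such as U+202E RIGHT-TO-LEFT OVERRIDE.

```python
import unicodedata

# Categories suggested in the thread: every Letter (L*), Mark (M*) and
# Number (N*), plus connector punctuation (Pc, e.g. '_') and dash
# punctuation (Pd, e.g. '-').
ALLOWED_MAJOR_CATEGORIES = {"L", "M", "N"}
ALLOWED_MINOR_CATEGORIES = {"Pc", "Pd"}
# ASCII punctuation accepted today would be whitelisted explicitly;
# '.' is an assumed example, not adduser's actual list.
EXTRA_ASCII = set(".")


def allowed_by_category(name: str) -> bool:
    if not name:
        return False
    # Assumption added for this sketch: a name must not start with a
    # combining mark, which would visually attach to preceding text.
    if unicodedata.category(name[0]).startswith("M"):
        return False
    for ch in name:
        cat = unicodedata.category(ch)
        if ch in EXTRA_ASCII or cat[0] in ALLOWED_MAJOR_CATEGORIES or cat in ALLOWED_MINOR_CATEGORIES:
            continue
        return False  # e.g. spaces (Zs), ':' (Po), ZWJ or bidi overrides (Cf)
    return True


for candidate in ["jürgen", "张伟", "mary_o-neil", "a b", "a\u202eb"]:
    print(repr(candidate), allowed_by_category(candidate))
```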

    Names containing invalid UTF-8 sequences ought be rejected.

    Agreed. How do I check for this in perl?
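
Validity is easiest to check at the point where the raw bytes are decoded. In Perl, the usual route is a strict decode, for example with the core Encode module's decode('UTF-8', $bytes, Encode::FB_CROAK) inside an eval (untested here). The sketch below shows the same idea in Python, whose utf-8 codec is strict by default; the function names are hypothetical.

```python
def decode_username(raw: bytes) -> str:
    """Decode raw bytes as strict UTF-8, raising ValueError if invalid.

    Python's "utf-8" codec is strict by default: it rejects truncated
    sequences, stray continuation bytes, overlong encodings and encoded
    UTF-16 surrogates.
    """
    try:
        return raw.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        raise ValueError(f"user name is not valid UTF-8: {raw!r}") from exc


def is_valid_utf8(raw: bytes) -> bool:
    try:
        decode_username(raw)
    except ValueError:
        return False
    return True


print(is_valid_utf8(b"j\xc3\xbcrgen"))  # True: "jürgen" in UTF-8
print(is_valid_utf8(b"j\xfcrgen"))      # False: "jürgen" in ISO-8859-1
print(is_valid_utf8(b"\xc0\xaf"))       # False: overlong encoding of "/"
```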

    Characters 0-127 would presumably be allowed iff they are now;
    UTF-8 preserves US-ASCII.

    I'd rather allow 32-127 only.

    We ought support combining characters up through the Extended
    Grapheme Cluster (a single user-perceived character, roughly a
    glyph, made up of one or more encoded characters). Generally a
    single backspace ought map to an entire EGC.
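
Here is a rough Python sketch of "one backspace removes one user-perceived character". It is a simplification: a cluster here is just a base character followed by combining marks (General Category M*). Real Extended Grapheme Clusters, as defined in Unicode's UAX #29, also cover emoji ZWJ sequences, Hangul syllables built from jamo, regional-indicator flags and more. Perl's regex engine matches a full extended grapheme cluster with \X. The function names are hypothetical.

```python
import unicodedata


def rough_graphemes(text: str) -> list:
    """Split text into user-perceived characters, roughly: a base character
    plus any following combining marks. Not a full UAX #29 implementation."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


def backspace(text: str) -> str:
    """Delete the last user-perceived character, not just the last code point."""
    return "".join(rough_graphemes(text)[:-1])


name = "ju\u0308rgen"                         # decomposed "jürgen": 7 code points
print(len(name), len(rough_graphemes(name)))  # 7 6
print(backspace("ju\u0308"))                  # j
```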

    This is beyond my knowledge of Unicode.

    Regarding canonicalization/normalization, this is a complex
    question without a necessarily correct technical answer. I think
    you'd want to follow the Principle of Least Astonishment; as to what
    would astonish the least, I'd like to hear wider input. But
    Unicode definitely defines multiple normal forms and equivalency
    classes.
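
One way to think about "least astonishment" is a collision key. Two names that reduce to the same key would be refused as duplicates, while the name itself is stored exactly as typed. The key used here (NFKC followed by case folding) is an assumption for illustration. It is close to, but not identical to, Unicode's NFKC_Casefold, and it is deliberately lossy.

```python
import unicodedata


def collision_key(name: str) -> str:
    """Hypothetical duplicate-detection key: NFKC, then case folding.
    The stored user name itself would remain exactly as typed."""
    return unicodedata.normalize("NFKC", name).casefold()


pairs = [
    ("j\u00fcrgen", "ju\u0308rgen"),       # precomposed vs decomposed
    ("\uff4a\uff44\uff4f\uff45", "jdoe"),  # full-width letters vs ASCII
    ("\ufb01nn", "finn"),                  # U+FB01 LATIN SMALL LIGATURE FI
    ("Stra\u00dfe", "strasse"),            # case folding maps U+00DF to "ss"
]
for a, b in pairs:
    print(a == b, collision_key(a) == collision_key(b))  # False True, four times
```

NFKC also folds characters that some people would consider distinct (for instance, the superscript "²" becomes "2"). Using such a key for storage, rather than for collision detection, would probably astonish more than it helps.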

    I am not sure whether we need this. A local admin is likely to be
    consistent in how she creates user names.

    You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫
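
Column layout depends on display width, not on len(). Here is a rough Python approximation in the spirit of wcwidth(3), built only on the standard unicodedata module (the function names are hypothetical). Nonspacing and enclosing marks and format characters count as zero columns, East Asian Wide and Fullwidth characters as two, and everything else as one. Real terminals and fonts disagree, as the rendering reports in this thread show. For instance, ﷽ (U+FDFD) is not classified as wide, so this sketch counts it as a single column although it renders much wider.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate the number of terminal columns text occupies."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format characters take no column
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_column(text: str, columns: int) -> str:
    """Left-align text in a column, padding by display width rather than len()."""
    return text + " " * max(0, columns - display_width(text))


for name in ["jdoe", "ju\u0308rgen", "张伟"]:
    print(repr(pad_column(name, 8)), len(name), display_width(name))
```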

    Probably not. If that's important for a local admin, they can disallow
    such characters and maybe even file a patch against adduser.

    Quoting the character just out of curiosity.

    (2)
    If the answer to (1) is "allow UTF-8", should we also do that for system users? (I think no, we should not)

    I think you should, simply because otherwise you have two paths
    in more places.

    Adduser already has different code paths for normal and system accounts.

    (3)
    I think that 32 characters/bytes (it's the same if we don't allow UTF-8)
    is a good limitation for a system user name. But, should we increase
    that for regular user names? (I think yes)

    I hesitate to comment here because who really cares, but does 32
    save us something over 128? 128 seems the default "enough for
    everybody" these days, looking at IPv6 and ZFS.

    systemd argues that > 32 characters are rarely supported in "older and unmaintained" utilities.

    My printer is administered by i̸̒n̴͛e̵̎l̴͝u̷̾c̴̉t̵́å̵b̷͋l̷͐e̴̋m̸̆o̷̚d̴̐ä̸́l̶͝i̷̋t̷͗ẏ̷ȏ̵f̸̃t̶͘h̷͗e̴̿v̶͘i̷̛s̸̈́ì̵b̷̃l̶̎e̷͊.

    That really renders strangely here.

    (6)
    Does it still make sense to give non-UTF-8-locales special handling
    (which one?), or can adduser safely assume that any non-ascii locale is UTF-8? Or must I check for locale and reject UTF-8 user names on
    non-UTF-8 locales? (I hope that we can safely assume UTF-8)

    It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
    properly set LANG and programs calling setlocale(). This, as
    alluded to above, has the potential for a big mess.

    Our default is C.UTF-8 and has been like that for a while.
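
Whether a given locale name implies UTF-8 can be approximated from the name alone. The Python sketch below uses hypothetical function names. It follows the usual language[_territory][.codeset][@modifier] naming and the POSIX precedence of LC_ALL over LC_CTYPE over LANG. This is only a heuristic: a name without an explicit codeset (for example, plain de_DE) may still map to a non-UTF-8 charset. A running program would instead call setlocale() and then nl_langinfo(CODESET); in Python on Unix, that is locale.setlocale(locale.LC_ALL, "") followed by locale.nl_langinfo(locale.CODESET).

```python
import os


def effective_ctype_locale(env=os.environ) -> str:
    # POSIX precedence: a non-empty LC_ALL overrides LC_CTYPE, which
    # overrides LANG; with none of them set, the "C" (POSIX) locale applies.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def locale_codeset(locale_name: str) -> str:
    """Extract the codeset from language[_territory][.codeset][@modifier].
    Returns "" when the name carries no explicit codeset (e.g. "C")."""
    name = locale_name.split("@", 1)[0]
    return name.split(".", 1)[1] if "." in name else ""


def is_utf8_locale_name(locale_name: str) -> bool:
    # glibc accepts both the "UTF-8" and "utf8" spellings; compare loosely.
    return locale_codeset(locale_name).lower().replace("-", "") == "utf8"


print(is_utf8_locale_name("C.UTF-8"), is_utf8_locale_name("C"))  # True False
print(is_utf8_locale_name(effective_ctype_locale()))
```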

    Greetings
    Marc


  • From Marc Haber@21:1/5 to Simon McVittie on Wed Nov 27 17:30:02 2024
    On Sun, Nov 24, 2024 at 02:19:51PM +0000, Simon McVittie wrote:
    I think one good idea that we should certainly adopt from <https://systemd.io/USER_NAMES/> is its separation between "strict mode"
    (the naming convention that it encourages for all uses, and enforces
    when a user is created via systemd tools) and "relaxed mode" (the much
    less strict naming convention that systemd requires for names created by non-systemd tools). Because of the differences between those two modes, systemd is quite conservative in what its own tools will emit but a
    lot more liberal in what it will accept, and that seems like a good
    principle here, even if the specific rules that Debian chooses end up differing from those that systemd has chosen.

    Yes. Especially we need to note that systemd strict mode is even
    stricter than what we currently allow for system accounts. I also don't
    like that this is not configurable, especially regarding systemd-homed
    which affects the account names of regular users.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Giuseppe Sacco on Wed Nov 27 17:30:02 2024
    Hi,

    On Sun, Nov 24, 2024 at 03:37:36PM +0100, Giuseppe Sacco wrote:
    It is true that the user account name and the user (display) name are
    different, of course. But still, when you log in, you use the user
    account name to access the system; this is the text shown in file
    ownership listings and almost everywhere in the system.
    I think that user (display) names, which may be put in the gecos field,
    are not widely used.

    I think this differs between GUIs and DEs (which are more likely to use
    the display name) and the console (where the user name is used).

    Moreover, the adduser man page in Debian stable states
    that gecos fields will be removed after bookworm.

    That's a misunderstanding. We're just in the process of renaming the
    --gecos option to --comment as per passwd(5) documentation. Sadly,
    passwd(5) uses "login name" instead of "user name".

    So, having a good account user name is an important thing. And we have
    to choose whether it should be "good" for the computer (as in: unique,
    lowercase, short, US-ASCII, etc.) or if it should be "good" for the
    real user. In the latter case, I would accept a broader class of
    strings for the very simple reason that it should be left to user
    preference.

    I think that by now we should have reached a state where a properly UTF-8
    encoded string is a good compromise between "good for the
    computer" and "good for the person". In Debian, we have a rather tightly controlled ecosystem and can take care that things don't break too
    badly.

    I checked what other systems do:

    Thank you for this tedious work. I have incorporated that into https://wiki.debian.org/UserAccountsPhilosophy to preserve the
    information.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Étienne Mollier on Wed Nov 27 17:40:02 2024
    On Fri, Nov 22, 2024 at 08:42:10PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-11-22:
    I might be naive here, but I don't have much experience with non-ASCII names since I have the privilege of being fluent in two languages that
    use the Latin alphabet.

    I am not sure whether I am the intended audience here, because
    my name is almost Ascii based. That being said, I happen to
    have one weird enough latin based character as the first letter
    in my first name, that it gives interesting results when thrown
    toward random databases. Thus I do happen to have some thoughts
    about this topic.

    All opinions are important.

    On the other side, wouldn't it be a courtesy to allow people having a
    name that needs transcription to be written in Latin to use their name
    in the alphabet it is usually written in as a login name as
    well? To make things worse, transcriptions are often ambiguous.

    I would like to hear the opinion of people who would be affected by this change.

    I tried to consider what it would take to have an émollier or an
    Émollier login, and there is one little blocker: I may have to
    login from environments or keyboards lacking the necessary i18n
    and l10n capabilities to transcribe the 'e' acute, let alone the
    uppercase 'e' acute.

    Yes. Configuring all keyboards and input subsystems in the realm of this instance of the user database in a way that all users are able to log in
    is the responsibility of the local admin.

    For example, I hit this particular issue
    when populating the Gecos field from the Debian installer
    environment: if I choose a Qwerty US configuration but miss the
    step to choose which Qwerty US internationalized variant I want
    to use, then I don't get to type uppercase 'e' acute, but there
    are many other situations unrelated to d-i or even Debian where
    I run into that.

    That issue would only affect users created from the Installer, and even
    if you insist on having étienne as UID 1000, you could change to that
    after installation. I tend to classify the inability to type the
    intended user name on account creation as a user error ;-)

    I always create "zgadmin" in the installer, which is my user to ssh into
    before sudoing to root if my regular account (which has a higher UID
    for historical reasons) is unavailable. I wonder whether we should give
    this advice in the documentation we are bound to write once we have
    decided to officially allow UTF-8 login names.

    For this practical reason, I tend to feel
    better about keeping a full Ascii login name. I wouldn't feel
    strongly if unicode support for login never happens.

    It is already allowed. Only its support status is unclear.

    I believe
    however that the Gecos is the right place to store the properly
    typed-in person name, because it is a "presented" name that
    hasn't the technical coupling that the login name has, and I
    would probably have stronger feelings if it were to not have
    unicode support.

    Console tools tend to ignore the gecos/comment name.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Gioele Barabucci on Wed Nov 27 17:40:02 2024
    On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:
    your case highlights another problem not mentioned in the original list posted by Marc: comparison (and normalization).

    Some characters can be encoded in more than one way. For instance, "é" in "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301 (UTF-8: 0x65 0xcc 0x81).

    That would be two distinct user names. Unless we have a widely available Unicode library that can do this kind of normalization, it is unlikely
    that our system utilities can take care of that. I'd like to put that responsibility onto the person who / the system that actually creates
    those user names.

    If a keyboard input system provides the former
    sequence of bytes, but the username is stored in the login infrastructure using the latter sequence of bytes, then a naive comparison will not find
    the user "émollier" in the system.
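    A minimal Python sketch of the comparison problem, using only the standard library's unicodedata module. same_user_name is a hypothetical helper for illustration, not part of adduser or shadow:

```python
import unicodedata

precomposed = "\u00e9mollier"   # é as U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301mollier"   # e followed by U+0301 COMBINING ACUTE ACCENT

# Both strings render identically but differ at the byte level.
print(precomposed.encode("utf-8")[:2])   # b'\xc3\xa9'
print(decomposed.encode("utf-8")[:3])    # b'e\xcc\x81'
print(precomposed == decomposed)         # False: naive comparison fails


def same_user_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_user_name(precomposed, decomposed))  # True
```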

    Currently adduser just takes the characters that come from the command
    line and encodes them into the byte stream that goes to useradd and
    library calls.

    Greetings
    Marc

  • From Marc Haber@21:1/5 to Peter Pentchev on Wed Nov 27 17:50:01 2024
    On Sat, Nov 23, 2024 at 01:36:48AM +0200, Peter Pentchev wrote:
    POSIX says "if you want your applications to be portable, do not use any funny characters in usernames":

    But we are not writing applications, we are a distribution. Anything
    that works with the software we distribute is fine.

    A string that is used to identify a user; see also 3.407 User Database.
    To be portable across systems conforming to POSIX.1-2024, the value is
    composed of characters from the portable filename character set.
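    The portable filename character set is A-Z, a-z, 0-9, period, underscore and hyphen-minus. A minimal Python check against it; is_posix_portable_name is a hypothetical name, and the sample names are taken from this thread:

```python
import re

# POSIX portable filename character set: A-Z a-z 0-9 . _ -
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_posix_portable_name(name: str) -> bool:
    """True if every character of name is in the portable filename character set."""
    return _PORTABLE.fullmatch(name) is not None


for candidate in ["zgadmin", "h\u00e9llo", "f\u00f6r"]:
    print(candidate, is_posix_portable_name(candidate))
# zgadmin True
# héllo False
# för False
```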

    If a local admin wants their local user database (hence, /etc/passwd or
    an LDAP directory) to work with non-Debian OSes, they need to take care
    about which accounts they create. I don't think that we should restrict
    local admins who don't need that kind of portability.

    Greetings
    Marc

  • From Andy Smith@21:1/5 to Marc Haber on Wed Nov 27 21:10:02 2024
    Hi,

    On Wed, Nov 27, 2024 at 04:54:39PM +0100, Marc Haber wrote:
    Can you outline an attack/failure scenario?

    On the failure side, I did a few tests and noticed that on Debian 12 if
    I create a user with for example é in their username then I can log in
    by SSH as long as that é is encoded the same way: as utf-8 0xC3 0xA9.
    But if that é is made of the combining characters 0x65 0xCC 0x81 (as
    that one just was) then that's not the same user even if it looks the
    same.

    Upon login, the logs from sshd contain the escaped bytes but the logs from PAM and systemd-logind are in utf-8:

    2024-11-23T00:35:37.743827+00:00 arran sshd[1903006]: Accepted password for h\303\251llo from 200:d0e9:8d97:72fe:69af:eb63:7e9e:1f07 port 37396 ssh2
    2024-11-23T00:35:37.744825+00:00 arran sshd[1903006]: pam_unix(sshd:session): session opened for user héllo(uid=1001) by (uid=0)

    So, anything which parses usernames out of logs will need to be aware of
    that.
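    As an illustration of what such a parser has to do: the sshd line above writes each non-ASCII byte as a backslash plus three octal digits (\303\251 is 0xC3 0xA9, i.e. é). A minimal Python sketch that maps such a field back to the form PAM logs, assuming that escaping scheme (inferred only from this one log line) and ignoring how sshd might escape a literal backslash:

```python
import re

# A backslash followed by exactly three octal digits, e.g. \303
_OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def unescape_log_user(field: str) -> str:
    """Turn an sshd-style escaped user field such as h\\303\\251llo into 'héllo'.

    Assumption: the escaped field is pure ASCII and the underlying bytes are UTF-8.
    """
    raw = field.encode("ascii")
    # Replace each \ooo escape with the single byte it denotes.
    unescaped = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return unescaped.decode("utf-8")


print(unescape_log_user(r"h\303\251llo"))  # héllo
```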

    Thanks,
    Andy

  • From Michal Politowski@21:1/5 to All on Thu Nov 28 12:40:01 2024
    Dnia Sun, 24 Nov 2024 11:22:18 +0000, Gioele Barabucci napisał(a):
    On 24/11/24 10:43, nick black wrote:
    Gioele Barabucci left as an exercise for the reader:
    On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:
    But my 2 cents on the topic are: Lets please allow more than ascii in usernames.

    potentially insecure (homographs) and at
    high-risk of breaking existing applications (lack of standardized normalization form).

    i'm not sure why this is being repeated.

    https://unicode.org/reports/tr15/

    Dear Nick,

    You may have misunderstood that phrase. I was not referring to the fact that there are no standardized normalization forms for Unicode (I explicitly mention Annex 15 in [1]), but to the fact that there is no standard that specifies which of the possible normalization forms should be used for account names (and other fields in passwd).

    POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives)
    in this area that you know of?

    What about RFC 8265?
    "Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
    https://datatracker.ietf.org/doc/html/rfc8265

    Regards,

    [1] https://lists.debian.org/debian-devel/2024/11/msg00305.html

    --
    Michał Politowski

  • From Gioele Barabucci@21:1/5 to Michal Politowski on Sun Dec 1 23:30:01 2024
    On 28/11/24 11:28, Michal Politowski wrote:
    POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives) in this area that you know of?

    What about RFC 8265?
    "Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
    https://datatracker.ietf.org/doc/html/rfc8265

    Thank you Michal for the pointer.

    RFC 8265 (and the associated RFC 8264 "PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols") looks like exactly what all login-related programs should
    implement in order to avoid the kind of errors described in <https://lists.debian.org/debian-devel/2024/11/msg00491.html>.

    But a cursory search shows that none of the current upstreams support
    (or mention) PRECIS. (It also shows that src:precis is a Java library
    squatting a bit on that package name... :))

    Regards,

    --
    Gioele Barabucci

  • From Marc Haber@21:1/5 to G. Branden Robinson on Mon Dec 2 09:00:01 2024
    On Sun, Dec 01, 2024 at 09:16:03PM -0600, G. Branden Robinson wrote:
    These things are ugly, which is why I suppose they haven't caught on
    despite being around for decades, but I would guess that this problem
    space is such that there are no non-ugly solutions apart from "just
    stick to ASCII", which some people find ugly in a different way.

    The issue is that we didn't stick to ASCII. You CAN use UTF-8 in user
    names and it works.

    Apologies if I missed someone bringing up and rejecting Punycode in the previous ~41 messages in this thread.

    No one did. It doesn't make sense anyway (and I would not implement this
    in adduser), because we HAVE UTF-8 and it works. So the alternatives
    are really

    (1) Stick with the current way, having UTF-8 work but keeping it
    undocumented, hurling any breakage on the user
    (2) Document UTF-8 as working and consider breakage a bug
    (3) Forbid UTF-8

    Greetings
    Marc


  • From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:30:01 2024
    On Mon, Dec 02, 2024 at 01:35:05AM -0500, nick black wrote:
    WTF-8 extends UTF-8 to handle
    invalid UTF-16 input.

    WTF-8 is a seriously defined encoding? I have only experienced that name
    as a mocking name for a UTF-8 string that erroneously went through UTF-8 encoding a second time (double-UTF-8).

    Greetings
    Marc

  • From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:50:01 2024
    On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
    Marc Haber left as an exercise for the reader:
    * any upstream tool could say "bad idea" and refuse patches,
    requiring their long term management,

    Depending on how important this tool is, we could get away without
    patching and probably not even documenting this failure.

    This kind of attitude seems self-defeating. Despite being
    *strongly* in favor of this effort, I would oppose it if it were
    strictly a Debian thing. We can inspire the move, but going it
    alone seems a recipe for present and future pain (think SSHing
    from/to Debian and a non-Debian machine).

    I bet that other distributions will also allow me to useradd a UTF-8
    name today. I don't think that we have patched useradd to allow this.

    * the Linux framebuffer console is pretty limited in what
    glyphs it has available, and the number of glyphs it can
    support,

    Probably, yes. But people working on the Linux framebuffer console are unlikely to actually use UTF-8 user names, so the only really bad

    With all due respect, this seems totally unsupported by anything
    other than vibes =].

    So you think that we should be stricter than we are today?

    * broken localization (or failure to call setlocale()) could be
    a bigger problem, especially for root/system accounts.

    I don't think we should allow UTF-8 characters in the string "root" or in system account names. And if a local admin decides to do so, Debian packages should still restrict themselves to using US-ASCII in their
    system accounts.

    Why? This would require multiple code paths for what seems to me a
    very questionable objective. You point out later in your
    response that there already exist diverging codepaths, but isn't
    unifying such things always a goal?

    I think that the distinction between system users and regular users is a
    good thing and that we should continue treating them differently.
    Strictly, it's only adduser (and useradd, UID only) having different
    code paths, the treatment in other software is identical.

    Even if we unify things (either by allowing strange characters in system
    user names, or by restricting regular user names to the western
    character set), adduser will need to keep the distinction because we
    assign UIDs from different ranges.
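    For reference, Debian Policy (section 9.2.2) reserves UIDs 100-999 for dynamically allocated system users and 1000-59999 for dynamically allocated regular user accounts. A minimal Python sketch of choosing a UID from the applicable range; next_free_uid and its lowest-free-UID strategy are assumptions for illustration, not adduser's actual algorithm:

```python
SYSTEM_RANGE = range(100, 1000)      # 100-999: dynamically allocated system users
REGULAR_RANGE = range(1000, 60000)   # 1000-59999: dynamically allocated user accounts


def next_free_uid(used: set[int], system: bool) -> int:
    """Hypothetical helper: lowest UID in the applicable range not already in use."""
    candidates = SYSTEM_RANGE if system else REGULAR_RANGE
    for uid in candidates:
        if uid not in used:
            return uid
    raise RuntimeError("no free UID left in the requested range")


used_uids = {0, 100, 101, 1000}
print(next_free_uid(used_uids, system=True))   # 102
print(next_free_uid(used_uids, system=False))  # 1001
```

    The point of the sketch is that the range choice depends only on the system/regular distinction, independently of how strictly the name itself is validated.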

    Do you have a suggestion for a perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".

    I do not. This is not a regex problem in my mind and experience;
    you need full access to complicated libraries.

    Adduser will have to stick to regexes for dependency reasons.

    Any such effort
    should go through Annex 15 canonicalization before being
    inspected at all.

    I have always assumed that canonicalization would be used for sorting
    and equality, while in the databases it is important to keep the
    difference between the unit Angstrom and the capital letter A with
    circle. If we canonicalize everything, why do we have different
    codepoints for different semantics?
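    One data point for this question: U+212B ANGSTROM SIGN has a canonical (singleton) decomposition to U+00C5, so NFC, not only NFKC, already turns it into the letter Å. Compatibility-only characters such as U+FB01 LATIN SMALL LIGATURE FI are folded by NFKC alone. A short check with Python's standard unicodedata module:

```python
import unicodedata

angstrom_sign = "\u212b"   # ANGSTROM SIGN
ligature_fi = "\ufb01"     # LATIN SMALL LIGATURE FI

for form in ("NFC", "NFKC"):
    print(form,
          unicodedata.normalize(form, angstrom_sign) == "\u00c5",  # became Å?
          unicodedata.normalize(form, ligature_fi) == "fi")        # became "fi"?
# NFC True False
# NFKC True True
```

    So storing names verbatim and comparing NFC-normalized keys would still treat the unit sign and the letter as the same user name.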

    Yes, I need to read your book.

    At that point, you're well past regular
    languages so far as I can tell. I do not see this goal as
    possible with small surgeries on the adduser code base, but
    rather something that requires work across the chain.

    So, "not for Trixie". And what would we do in Trixie? I think we need
    something that a single person can implement in spare time before
    Christmas. This is a rather limited amount of time that we have.

    It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
    properly set LANG and programs calling setlocale(). This, as
    alluded to above, has the potential for a big mess.
    Our default is C.UTF-8 and has been like that for a while.

    Yes, but that can be changed.

    By the local admin? Yes. That's why we (Linux distributions) should
    stick to us-ascii user names for the accounts that are created by our
    packages. If a local admin creates UTF-8 user names but gives them a
    non-UTF-8 locale then it's their fault, and if a user with a UTF-8 user
    name selects a non-UTF-8 locale it's deliberate sabotage. I don't think
    we should care about that, and it's already possible today.
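    What a non-UTF-8 locale means for a program reading such a name can be sketched in Python: the stored bytes either fail to decode or turn into mojibake, and re-encoding the mojibake yields the "double UTF-8" mentioned earlier in the thread. ASCII stands in for the "C" locale and Latin-1 for a legacy 8-bit locale; both are illustrative choices:

```python
stored = "h\u00e9llo".encode("utf-8")   # bytes as written to /etc/passwd

print(stored.decode("utf-8"))           # héllo
print(stored.decode("latin-1"))         # hÃ©llo (mojibake)

try:
    stored.decode("ascii")
except UnicodeDecodeError as err:
    print("ASCII decode fails at byte", err.start)   # 1

# Encoding the mojibake as UTF-8 again produces "double UTF-8".
double = stored.decode("latin-1").encode("utf-8")
print(double)                           # b'h\xc3\x83\xc2\xa9llo'
```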

    With all due respect, I admire your gung-ho can-do spirit, but
    adduser alone is not IMHO the place. This is a major change
    requiring support from libraries, applications, and UI to do
    right, and thus wide buy-in. I love the idea, but it's not going
    to happen with a few Perl regexes. Please don't read this as
    commentary on you or your code.

    So your recommendation is to disallow things that we have allowed until recently, and maybe remove configurability to REALLY disallow it?

    Greetings
    Marc


  • From Michal Politowski@21:1/5 to All on Mon Dec 2 11:40:01 2024
    Dnia Sun, 1 Dec 2024 23:27:09 +0100, Gioele Barabucci napisał(a):
    [...]
    But a cursory search shows that none of the current upstreams support (or mention) PRECIS. (It also shows that src:precis is a Java library squatting
    a bit on that package name... :))

    But at least it is an implementation of this PRECIS :)
    There is also python3-precis-i18n in the archive.

    --
    Michał Politowski

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 2 16:30:02 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241202 09:43]:
    On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:
    Marc Haber left as an exercise for the reader:
    * any upstream tool could say "bad idea" and refuse patches,
    requiring their long term management,

    Depending on how important this tool is, we could get away without patching and probably not even documenting this failure.

    This kind of attitude seems self-defeating. Despite being
    *strongly* in favor of this effort, I would oppose it if it were
    strictly a Debian thing. We can inspire the move, but going it
    alone seems a recipe for present and future pain (think SSHing
    from/to Debian and a non-Debian machine).

    I bet that other distributions will also allow me to useradd a UTF-8
    name today. I don't think that we have patched useradd to allow this.

    We did. Debian carries (since "forever") a patch in useradd to turn
    off most name checking. (Trying to) remove this patch is what
    started this all.

    Observe:

    [root@cc65635fbf00 /]# cat /etc/os-release
    NAME="Fedora Linux"
    VERSION="40 (Container Image)"
    ...
    [root@cc65635fbf00 /]# useradd för
    useradd: invalid user name 'för': use --badname to ignore


    Not sure if mjt brought it up yet, but the sendmail interface will
    also need some solution for utf8 usernames (=email address local
    parts). However, it seems some sendmail implementations already
    cannot cope with utf8 gecos fields.

    Chris

  • From Marc Haber@21:1/5 to All on Tue Dec 3 17:30:01 2024
    Hi,

    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to try creating a UTF-8
    encoded user name and see that it correctly shows up in /etc/passwd to
    declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually; I have read everything, and if I didn't cater
    for your arguments in this message please feel free to remind me.

    https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly
    outlines that homograph characters (such as é (UTF-8 0xC3 0xA9) and the lookalike é (0x65 0xCC 0x81)) are not only a nuisance. At the least,
    adduser should reject creating étienne if étienne already exists - those
    are different user names but look the same, and if you don't
    cut-and-paste user names instead of typing them you're bound to hit the
    wrong user depending on HOW you type and what input medium you use. Not
    good.
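    A minimal Python sketch of such a check, comparing NFC-normalized forms. find_equivalent_user is a hypothetical helper, and the existing names would in practice come from the user database (for example Python's pwd.getpwall()). It only catches canonically equivalent spellings, not cross-script lookalikes such as Latin "a" and Cyrillic "а":

```python
import unicodedata
from typing import Iterable, Optional


def find_equivalent_user(new_name: str, existing: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return an existing name whose NFC form equals
    the NFC form of new_name, or None if there is no such name."""
    key = unicodedata.normalize("NFC", new_name)
    for name in existing:
        if unicodedata.normalize("NFC", name) == key:
            return name
    return None


existing_users = ["root", "\u00e9tienne"]                       # é as U+00E9
print(find_equivalent_user("e\u0301tienne", existing_users))    # étienne
print(find_equivalent_user("zgadmin", existing_users))          # None
```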

    https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

    After understanding this, I must admit that what's currently left active
    on the adduser team (me) doesn't have the capacity to implement this
    properly and in time for trixie. To make things worse, the
    Unicode::Precis module, which should be in Debian as
    libunicode-precis-perl (but isn't) hasn't seen an upstream release in
    more than five years.

    Additionally, I don't see myself in the situation of writing a proper
    checker for the RFC 8264 IdentifierClass (Section 4.2) at the moment
    since I don't have the time to check out which \p{Foo} character classes
    match the classes given in the RFC.
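    As a possible starting point for that mapping: RFC 8264 reuses the LetterDigits category from IDNA2008, which is defined by the Unicode General_Category values Ll, Lu, Lo, Nd, Lm, Mn and Mc (exposed in Perl as \p{Ll}, \p{Lu} and so on). The Python sketch below approximates a few IdentifierClass rules with the standard unicodedata module. It is deliberately incomplete (it omits, among others, the RFC's exception list and the contextual rules for join controls) and is not a conformant PRECIS implementation:

```python
import unicodedata

# General_Category values of the LetterDigits category
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def approx_identifier_char(ch: str) -> bool:
    """Rough approximation of IdentifierClass validity for one code point."""
    if 0x21 <= ord(ch) <= 0x7E:                   # printable ASCII, no space
        return True
    if unicodedata.category(ch) == "Cc":          # control characters
        return False
    if unicodedata.normalize("NFKC", ch) != ch:   # has a compatibility mapping
        return False
    return unicodedata.category(ch) in LETTER_DIGITS


def approx_identifier(name: str) -> bool:
    return name != "" and all(approx_identifier_char(c) for c in name)


for candidate in ["\u00e9tienne", "a b", "\u2126", "\ufb01le", "\u0430dmin"]:
    print(repr(candidate), approx_identifier(candidate))
# 'étienne' True
# 'a b' False
# 'Ω' False
# 'ﬁle' False
# 'аdmin' True
```

    The results show that such character-class checks reject spaces and compatibility characters but still admit the Cyrillic "а"; and because printable ASCII is always allowed, ":" would pass too and needs its own /etc/passwd-specific check.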

    I would appreciate volunteers to help here, but first I need to bring
    some sense in adduser's current state of affairs to make an unstable
    upload that can eventually migrate to testing.

    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have
    - adduser --allow-all-names will just verbatim pass all user names to
    useradd.
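    Read as a layered check, the plan could look like the following Python sketch. Everything in it is an assumption for illustration: Mode, username_acceptable and the ASCII pattern are hypothetical (the pattern is not the one adduser actually uses), the --system path is omitted because it does not change, and adduser itself is written in Perl:

```python
import re
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"                    # normal account, no extra option
    ALLOW_BAD_NAMES = "--allow-bad-names"
    ALLOW_ALL_NAMES = "--allow-all-names"


# Hypothetical conservative ASCII-only pattern, NOT adduser's real regex.
ASCII_NAME = re.compile(r"[a-z_][a-z0-9_-]*")


def username_acceptable(name: str, mode: Mode) -> bool:
    if mode is Mode.ALLOW_ALL_NAMES:
        return True    # passed verbatim; any rejection is left to useradd
    if mode is Mode.ALLOW_BAD_NAMES:
        # Any printable text including UTF-8, no normalization;
        # ':' is still refused because it separates /etc/passwd fields.
        return name != "" and name.isprintable() and ":" not in name
    return ASCII_NAME.fullmatch(name) is not None


for name in ["alice", "\u00e9tienne", "a:b"]:
    print(name, [username_acceptable(name, m) for m in Mode])
# alice [True, True, True]
# étienne [False, True, True]
# a:b [False, False, True]
```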

    All this will be documented in the man page, in README.Debian and/or the
    Wiki after the code passes the test suite again.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    I would love to hear your opinion. Silence is agreement ;-)

    Greetings
    Marc


    ¹ RFC 8264, RFC 8265, and Unicode TR 15 linked in this thread were
    educational for me

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 18:00:02 2024
    On Tue, Dec 03, 2024 at 05:46:00PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:20, Marc Haber wrote:
    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have

    Dear Marc,

    in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters
    are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    While this seems the right thing to do, I think this should be done in
    useradd (pkg:shadow), in the respective upstream project, so that all
    Linux distributions get the same behavior.

    I have filed https://github.com/shadow-maint/shadow/issues/1138 in the
    general regard. Feel free to add what I forgot to mention there.

    I'd rather not have this can of worms in adduser, but I'd consider a
    patch.

    Greetings
    Marc

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 17:50:01 2024
    On 03/12/24 17:20, Marc Haber wrote:
    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have

    Dear Marc,

    in preparation for a PRECIS future, couldn't adduser pass the usernames
    through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    [1] https://www.rfc-editor.org/rfc/rfc8264.html#section-5.2.4

    Regards,

    --
    Gioele Barabucci

  • From Étienne Mollier@21:1/5 to All on Tue Dec 3 20:50:01 2024
    Hi Marc,

    Marc Haber, on 2024-12-03:
    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to try creating a UTF-8
    encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually, I have read everything and if I didn't cater
    for your arguments in this message please feel free to remind me.

    Thank you for having taken the time to investigate this issue,
    as a person concerned, I much appreciated it. Let's see whether
    I can contribute one last useful item.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/5, please excuse my verbosity
    `- on air: DGM - Solitude

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 21:40:01 2024
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames
    through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are
    NFC-normalized to different codepoints.
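    The NFC behaviour described above can be checked with Python's standard unicodedata module. This is a minimal sketch for illustration only (Python is an assumption here; adduser itself is written in Perl). It shows the two spellings of "étienne" collapsing to one form, the Ohm sign (U+2126) becoming GREEK CAPITAL LETTER OMEGA (U+03A9), and Latin "a" and Cyrillic "а" staying distinct.

```python
import unicodedata


def nfc(s: str) -> str:
    """Return the NFC (canonical composition) form of s."""
    return unicodedata.normalize("NFC", s)


# Precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "\u00e9tienne"
decomposed = "e\u0301tienne"
print(precomposed == decomposed)            # False: different code points
print(nfc(precomposed) == nfc(decomposed))  # True: identical after NFC

# U+2126 OHM SIGN is canonically equivalent to U+03A9, so NFC replaces it.
print(nfc("\u2126") == "\u03a9")            # True

# Latin "a" (U+0061) and Cyrillic "а" (U+0430) are not equivalent at all.
print(nfc("\u0061") == nfc("\u0430"))       # False
```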

    But these are two different scenarios: the former problem may (and does)
    arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce
    a different é), the latter is an attack. The former is an
    interoperability issue, the latter is a security issue.

    While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
    Linux distributions get the same behavior.

    That's probably the best approach.

    Thanks for taking the time to delve into this issue,

    --
    Gioele Barabucci

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marc Haber@21:1/5 to All on Tue Dec 3 22:10:02 2024
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    Greetings
    Marc

    -- ----------------------------------------------------------------------------- Marc Haber | "I don't trust Computers. They | Mailadresse im Header Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:10:02 2024
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

    Converting Ohm into an Omega is losing intended information, isn't it?

    Thanks for taking the time to delve into this issue,

    I have learned many things.

    Greetings
    Marc

  • From =?utf-8?Q?=C3=89tienne?= Mollier@21:1/5 to All on Tue Dec 3 22:30:01 2024
    Marc Haber, on 2024-12-03:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I avoided "unicode" as it would include ASCII and the safe subset
    documented by POSIX, and I also considered the unlikely case
    where something were to replace Unicode. "international" would
    make the name technology-agnostic, but there is still the issue
    of also covering the POSIX-safe subset… Borrowing the idea
    from the other branch of the thread, --allow-unsafe-characters
    sounds fine and would carry the idea that certain characters
    could cause issues, if used in a login name.

    Have a nice day, :)
    --
    .''`. Étienne Mollier <emollier@debian.org>
    : :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
    `. `' sent from /dev/pts/1, please excuse my verbosity
    `- on air: Atlas - Hemifran

  • From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:50:02 2024
    On Tue, Dec 03, 2024 at 10:18:46PM +0100, Gioele Barabucci wrote:
    Normalization is always lossy, at least in principle.

    Applications that employ normalization accept that tradeoff in order to gain something valuable: in this case the ability to have an Ohm sign codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

    I don't know what's exactly in the standard, but my gut feeling says
    that I would probably store _exactly_ what was received, but normalize
    both sides before duplicate checking, sorting, comparing.

    If we'd normalize things away in storage, why do we have homographs in
    the first place? Why would I replace a Cyrillic a with a Latin a,
    destroying the idea of a "script"?
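    Marc's idea of storing the name exactly as received and normalizing only for comparison could look roughly like the sketch below. The helper names comparison_key() and is_duplicate() are hypothetical, and the choice of NFC follows the RFC 8264 recommendation quoted earlier in the thread; this is an illustration, not adduser code.

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Hypothetical helper: the NFC form, used only for comparisons."""
    return unicodedata.normalize("NFC", name)


def is_duplicate(candidate: str, existing: list[str]) -> bool:
    """True if candidate is canonically equivalent to an existing name.

    The stored names are never rewritten; only the comparison is normalized.
    """
    key = comparison_key(candidate)
    return any(comparison_key(name) == key for name in existing)


existing_users = ["\u00e9tienne"]  # stored exactly as entered (precomposed é)
print(is_duplicate("e\u0301tienne", existing_users))  # True: refuse to create
print(is_duplicate("etienne", existing_users))        # False: genuinely different
```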

    Greetings
    Marc

  • From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 22:20:01 2024
    On 03/12/24 22:02, Marc Haber wrote:
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    On 03/12/24 17:59, Marc Haber wrote:
    in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

    RFC 8264 5.2.4 Normalization Rule states:

    In accordance with [RFC5198], Normalization Form C (NFC) is
    RECOMMENDED.

    that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the
    Ohm-against-Omega issue, right?

    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
    (omega).

    Converting Ohm into an Omega is losing intended information, isn't it?

    Normalization is always lossy, at least in principle.

    Applications that employ normalization accept that tradeoff in order to
    gain something valuable: in this case the ability to have an Ohm sign
    codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

    Regards,

    --
    Gioele Barabucci

  • From Soren Stoutner@21:1/5 to All on Tue Dec 3 15:15:52 2024
    I appreciate your being careful and deliberate about this instead of rushing into a solution that brings unintended consequences. But I also appreciate your taking the time to engage with the issue instead of just ignoring it.

    On Tuesday, December 3, 2024 9:20:53 AM MST Marc Haber wrote:
    Hi,

    thank you all for your contributions to this discussion. I have now
    finally understood¹ that it is not enough to create a UTF-8
    encoded user name, see that it shows up correctly in /etc/passwd, and declare UTF-8 support. Please forgive me for not replying to all of you
    in this thread individually. I have read everything, and if I didn't address
    your arguments in this message, please feel free to remind me.

    https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly outlines that homograph characters (such as é, UTF-8 0xC3 0xA9, and the lookalike é, 0x65 0xCC 0x81) are not only a nuisance. At the least,
    adduser should reject creating étienne if étienne already exists - those are different user names but look the same, and if you don't
    cut-and-paste user names instead of typing them you're bound to hit the
    wrong user depending on HOW you type and what input medium you use. Not
    good.
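    The byte sequences quoted above are easy to reproduce. A minimal Python sketch, assuming UTF-8 encoding as in the message: because /etc/passwd stores bytes, a byte-wise comparison sees two unrelated names even though both render as "é".

```python
precomposed = "\u00e9".encode("utf-8")  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301".encode("utf-8")  # U+0065 + U+0301 COMBINING ACUTE ACCENT

print(precomposed.hex(" "))       # c3 a9
print(decomposed.hex(" "))        # 65 cc 81
print(precomposed == decomposed)  # False: different bytes, same rendering
```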

    https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

    After understanding this, I must admit that what's currently left active
    on the adduser team (me) doesn't have the capacity to implement this
    properly and in time for trixie. To make things worse, the
    Unicode::Precis module, which should be in Debian as
    libunicode-precis-perl (but isn't) hasn't seen an upstream release in
    more than five years.

    Additionally, I don't see myself in the situation of writing a proper
    checker for the RFC 8264 IdentifierClass (Chapter 4.2) at the moment
    since I don't have the time to check out which \p{Foo} character classes match the classes given in the RFC.

    I would appreciate volunteers to help here, but first I need to bring
    some sense into adduser's current state of affairs to make an unstable
    upload that can eventually migrate to testing.

    What I intend to do in adduser for the next unstable upload is:

    - adduser --system's user name validation will not change
    - I'll make sure that adduser <normal user account> doesn't accept
    UTF-8 user names, bringing it closer to systemd's notion of a valid
    user name
    - adduser --allow-bad-names will still allow UTF-8 usernames, not doing
    normalization. I will document this and make it clear that the local
    admin needs to make sure that they don't allow things they don't want
    to have
    - adduser --allow-all-names will just verbatim pass all user names to
    useradd.
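    The three tiers in this plan can be sketched as a single validation function. Everything below is an illustrative assumption rather than adduser's implementation: the ASCII pattern is hypothetical (it is not adduser's NAME_REGEX), and the extra checks in the --allow-bad-names branch (no empty name, no ":" and no control characters) are a guessed minimum, not part of the stated plan. The --system path, which the plan leaves unchanged, is omitted.

```python
import re
import unicodedata
from enum import Enum


class Mode(Enum):
    DEFAULT = "default"
    ALLOW_BAD_NAMES = "--allow-bad-names"
    ALLOW_ALL_NAMES = "--allow-all-names"


# Hypothetical ASCII-only pattern for illustration; NOT adduser's NAME_REGEX.
ASCII_NAME = re.compile(r"[a-z][a-z0-9_-]*")


def accept(name: str, mode: Mode) -> bool:
    """Sketch of the three tiers for regular (non --system) user names."""
    if mode is Mode.ALLOW_ALL_NAMES:
        return True  # passed verbatim to useradd, no checks at this layer
    if mode is Mode.ALLOW_BAD_NAMES:
        # UTF-8 allowed and not normalized. Assumed floor: no empty name,
        # no ":" (the passwd field separator), no control characters.
        return (bool(name) and ":" not in name
                and not any(unicodedata.category(c) == "Cc" for c in name))
    return ASCII_NAME.fullmatch(name) is not None  # default: ASCII only


print(accept("etienne", Mode.DEFAULT))               # True
print(accept("\u00e9tienne", Mode.DEFAULT))          # False: not ASCII
print(accept("\u00e9tienne", Mode.ALLOW_BAD_NAMES))  # True
print(accept("a:b", Mode.ALLOW_BAD_NAMES))           # False
```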

    All this will be documented in the man page, in README.Debian and/or the
    Wiki after the code passes the test suite again.

    I'll probably deprecate --allow-bad-names in favor of something that
    doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
    the Red Hat World uses --badname to allow such names as well.

    I would love to hear your opinion. Silence is agreement ;-)

    Greetings
    Marc


    ¹ RFC 8264, RFC 8265, and Unicode TR 15, linked in this thread, were
    educational for me


    --
    Soren Stoutner
    soren@debian.org
  • From Alejandro Colomar@21:1/5 to All on Thu Dec 5 14:34:21 2024
    Copy: debian-devel@lists.debian.org

    Hi Marc,

    Homograph attacks would be best mitigated in software reading
    /etc/passwd, alerting in their output or logs that the user name they
    just printed was composed of strange alphabets.
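    The proposal quoted above, flagging names "composed of strange alphabets" in software that reads /etc/passwd, could be approximated with a mixed-script check. The sketch below is a crude heuristic and an assumption throughout: it uses the first word of each letter's Unicode character name (LATIN, CYRILLIC, ...) as a stand-in for the Script property, which Python's standard library does not expose. It will misfire on legitimately mixed names (Japanese commonly mixes kanji, hiragana and katakana); UTS #39, referenced later in the thread, defines proper mixed-script detection.

```python
import unicodedata


def scripts_used(name: str) -> set[str]:
    """Crude stand-in for the Unicode Script property: the first word of
    each letter's character name, e.g. LATIN or CYRILLIC."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in name if ch.isalpha()}


def looks_mixed_script(name: str) -> bool:
    """True if the letters in name appear to come from more than one script."""
    return len(scripts_used(name)) > 1


print(looks_mixed_script("admin"))       # False: all LATIN
print(looks_mixed_script("\u0430dmin"))  # True: CYRILLIC а plus LATIN dmin
```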

    Software that reads /etc/passwd or /etc/shadow is quite sensitive, and
    should therefore be as simple as possible. More code, more bugs.

    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Have a lovely day!
    Alex

    --
    <https://www.alejandro-colomar.es/>

  • From Alejandro Colomar@21:1/5 to Marc on Thu Dec 5 15:53:36 2024
    Copy: debian-devel@lists.debian.org

    Marc wrote:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I prefer "bad". It gives the implicit message that it's bad to use that
    flag. If you find it offensive, then how about --allow-unsafe-names?

    I oppose "unicode", "extended", or "international", as all of them
    remove the connotation that you should not use that flag.

    Anyway, I vote for removing the possibility of using unsafe names, and
    not even exposing a flag.

    Have a lovely day!
    Alex

    --
    <https://www.alejandro-colomar.es/>

  • From Stephan Seitz@21:1/5 to All on Thu Dec 5 16:40:02 2024
    Am Do, Dez 05, 2024 at 14:34:21 +0100 schrieb Alejandro Colomar:
    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Today, not many people want to live in the past and accept plain ASCII
    if their name needs a bigger character set.

    Stephan

    --
    | If your life was a horse, you'd have to shoot it. |

  • From Marc Haber@21:1/5 to All on Thu Dec 5 17:10:01 2024
    On Thu, 5 Dec 2024 14:34:21 +0100, Alejandro Colomar <alx@kernel.org>
    wrote:
    The best mitigation for those attacks is to ban the names altogether.
    IMO, setuid programs should not accept Unicode.

    Oh, Bugs by Code. Dangerous. We should stop producing code completely.
    No code, no bugs.

    Neither adduser nor useradd are setuid.

    --
    ---------------------------------------------------------------------------- Marc Haber | " Questions are the | Mailadresse im Header Rhein-Neckar, DE | Beginning of Wisdom " |
    Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

  • From Stephan Seitz@21:1/5 to All on Thu Dec 5 17:20:01 2024
    Am Do, Dez 05, 2024 at 17:05:29 +0100 schrieb Marc Haber:
    Neither adduser nor useradd are setuid.

    To be fair, passwd is setuid. And I’m sure you are using it to set the password. So it has to survive a Unicode user name.

    Stephan

    --
    | If your life was a horse, you'd have to shoot it. |

  • From Marc Haber@21:1/5 to nick black on Thu Dec 5 18:10:01 2024
    On Sat, Nov 23, 2024 at 02:48:10AM -0500, nick black wrote:
    I recommend Chapter 7 of my free book, "Hacking the Planet with
    Notcurses: A Guide to TUIs and Character Semigraphics" for the
    full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).

    Thank you very much for providing this. The chapter has educated me.
    "The vast minimum of things you should know about Unicode."

    The time to read it was well spent.

    Greetings
    Marc

    P.S.: Sadly, this has gotten less than positive coverage on LWN. I
    apologize for the harm this discussion has done.

  • From Ben Kallus@21:1/5 to All on Sun Dec 8 21:40:01 2024
    Hi everyone!

    I second calling it "allow-unsafe-names" for the following reasons:

    1. Many programs assume that usernames are so inert that they can be
    used in shell strings without proper escaping. For example, a user
    named $(touch /tmp/pwn) will create /tmp/pwn upon the first launch of
    an interactive bash, because the default bash PS1 interpolates the
    username before doing command substitution. adduser doesn't allow
    whitespace or forward slashes in usernames, even with
    --allow-all-names, but you can still get the same behavior with the
    username $(>`printf$IFS"\x2ftmp\x2fpwn"`). How this works is left as
    an exercise for the reader. Once you figure it out, see if you can
    out-golf us :)
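    On the consuming side, the general defence is never to splice a user name into shell code unquoted. A minimal Python sketch using the standard shlex.quote(): it does not change how bash expands PS1, it only shows how a script that builds a command line can keep a hostile name literal. Avoiding the shell altogether, by passing an argument list to subprocess.run(), is better still.

```python
import shlex

username = "$(touch /tmp/pwn)"

# Unsafe: the name becomes part of the shell program text, so a shell
# running this string would perform the command substitution.
unsafe = "ls -ld /home/" + username

# Safer: shlex.quote() single-quotes the value, so the shell sees one
# literal word and performs no $(...) expansion inside it.
safe = "ls -ld /home/" + shlex.quote(username)

print(safe)  # ls -ld /home/'$(touch /tmp/pwn)'
```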

    2. There's a path traversal bug in useradd (but not adduser) that can
    be triggered by usernames beginning with "../". For example, for the
    username "../bin/brangal", useradd will create a home directory at /home/../bin/brangal (i.e. /bin/brangal). This can be used to place a
    directory owned by the new user nearly anywhere on the system.
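    The traversal works because "/home/" joined with "../bin/brangal" normalizes to /bin/brangal. Below is a sketch of a guard a tool could apply before creating the directory, assuming the home directory should be a direct child of the base directory; home_dir_for() is a hypothetical helper, not useradd's code.

```python
import os.path


def home_dir_for(base: str, username: str) -> str:
    """Hypothetical helper: the home directory for username under base.

    Raises ValueError unless the normalized path is a direct child of base.
    """
    base = os.path.normpath(base)
    candidate = os.path.normpath(os.path.join(base, username))
    if candidate == base or os.path.dirname(candidate) != base:
        raise ValueError(f"refusing home directory {candidate!r} for {username!r}")
    return candidate


print(os.path.normpath("/home/../bin/brangal"))  # /bin/brangal
print(home_dir_for("/home", "alice"))            # /home/alice
```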

    -Ben Kallus && Jonah Weinbaum

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:10:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241203 22:06]:
    On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:
    Marc Haber, on 2024-12-03:
    I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

    The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
    also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
    long winded, but also sounds more accurate than the rest. What
    do you think of these approaches?

    Extended sounds good, maybe even "unicode"? or "international"?

    I echo Alejandro's concerns. We should stop having the flag
    completely, not encourage using it.

    If the default restrictions are too tight, then we need to work on
    that. What we should not do is introduce a badly tested (because
    mostly unused) code path that will introduce bugs in all sorts of
    places.
    IOW: if we move towards better character support, we need to do that
    by allowing it always. Same for longer names.

    Chris

  • From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:20:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241205 18:06]:
    P.S.: Sadly, this has gotten less than positive coverage on LWN. I
    apologize for the harm this discussion has done.

    Marc, thank you for collecting the info on the wiki and starting
    this discussion. I'm sorry I was not able to participate more.

    However, I reject the idea that it is on you to apologize for LWN
    covering this discussion and the harm that might have come out of
    it. This is something we need to address at a wider level. Otherwise
    we lose our ability to discuss anything (and with it our ability to
    change anything, ever).

    Best,
    Chris

  • From Marc Haber@21:1/5 to zeha@debian.org on Mon Dec 9 21:30:01 2024
    On Mon, 9 Dec 2024 18:04:52 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    This was never on the table, and shadow upstream might even drop the
    entire "support" for having bad names.

    Just for the record, I consider this a kneejerk reaction that moves
    the world backwards. It's sad.

  • From Chris Hofstaedtler@21:1/5 to All on Tue Dec 10 12:20:01 2024
    * Marc Haber <mh+debian-devel@zugschlus.de> [241209 21:21]:
    On Mon, 9 Dec 2024 18:08:33 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    I echo Alejandro's concerns. We should stop having the flag
    completely, not encourage using it.

    I violently disagree. But I have to accept this.

    IOW: if we move towards better character support, we need to do that
    by allowing it always. Same for longer names.

    I think that our distinction between system users and "normal" users
    is fine. No one needs a package generating "weird" user names.

    I think we're speaking past each other here.

    Packages can already create absolutely broken usernames today, if
    they want.

    To me, the question is more, why do we have a flag that, if used,
    allows you to break /etc/{passwd,shadow,group,gshadow} completely?

    Chris

  • From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 13:50:01 2024
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to different codepoints.

    NFC also doesn't solve various invisible characters (e.g., zero-width
    spaces, bidirectional control characters). For more information about
    all of the various security land mines, see[1]. I also suggest that
    people do a Google search on "CVE" and "Unicode". There has been at
    least one instance where we needed to make a kernel(!) change to
    address a security vulnerability, although we decided it wasn't
    super-critical because "no sane distribution actually enables the
    casefold feature on users' file systems by default".
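    NFC leaves such characters in place, which a short Python check makes visible. This sketch flags code points by Unicode general category: controls (Cc), format characters (Cf, such as U+200B ZERO WIDTH SPACE and U+202E RIGHT-TO-LEFT OVERRIDE) and separators (Zs/Zl/Zp). The category set is an assumption and far from the full list of hazards in UTS #39.

```python
import unicodedata

# Assumed set of suspicious general categories: controls (Cc), format
# characters (Cf: zero-width space, bidi overrides, ...) and separators.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Zs", "Zl", "Zp"}


def invisible_chars(name: str) -> list[str]:
    """Return the code points in name whose category is suspicious."""
    return [f"U+{ord(c):04X}" for c in name
            if unicodedata.category(c) in SUSPICIOUS_CATEGORIES]


sneaky = "ali\u200bce"  # "alice" with a ZERO WIDTH SPACE inside
print(unicodedata.normalize("NFC", sneaky) == sneaky)  # True: NFC keeps it
print(invisible_chars(sneaky))                          # ['U+200B']
print(invisible_chars("al\u202eice"))                   # ['U+202E']
print(invisible_chars("\u00e9tienne"))                  # []
```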

    [1] https://www.unicode.org/reports/tr39/tr39-22.html

    The other security issue to consider is the vast amount of
    code that you need to link into security-critical / setuid programs if
    you are going to use libunicode. (And yes, we do include libunicode
    in the kernel in order to support casefold. If you are thinking
    about potentially enabling casefold by default on user file systems
    because Windows and macOS do it, and we need to appeal to Gen Z'ers
    in order for Debian to stay relevant(tm) --- please don't. :-)

    So if we really do want to support Unicode in usernames, may I suggest
    that it would be a Really Good Idea to have someone implement the
    smallest possible Unicode canonicalization library, which also handles
    getting rid of all of the *other* Unicode security traps like invisible
    characters, bidirectional control characters, etc., and then to have it
    subjected to rigorous security audits before we propose linking it
    into setuid programs.

    This would also reduce bloat in the minimal Debian install required
    for installer images, docker containers, etc., since we wouldn't need
    to support things like Unicode sorting rules, Unicode case folding,
    conversion between the many different Unicode encoding forms, etc.

    Cheers,

    - Ted

  • From Gioele Barabucci@21:1/5 to Theodore Ts'o on Tue Dec 10 15:00:01 2024
    On 10/12/24 13:47, Theodore Ts'o wrote:
    On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:
    NFC would solve both of these "problems":

    * Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
    * Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
    (omega).

    What NFC alone will not solve are homograph collisions: a (U+0061 Latin
    small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to
    different codepoints.

    NFC also doesn't solve various invisible characters (e.g., zero-width
    spaces, bidirectional control characters). For more information about
    all of the various security land mines, see[1].

    NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

    The IdentifierClass of RFC 8264 explicitly disallows all these "security
    land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

    The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

    In general, the PRECIS RFCs are more prescriptive than Unicode UTS #39,
    so, should Unicode usernames ever happen, the PRECIS RFCs are the
    reference all programs should follow.
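    To give a feel for what the IdentifierClass involves, here is a rough Python approximation of a few of the derivation rules in RFC 8264 (Unassigned, ASCII7, HasCompat, LetterDigits). It is deliberately incomplete: it ignores the Exceptions, BackwardCompatible, JoinControl, OldHangulJamo and ignorable-property rules, so it is stricter or looser than the RFC in places and must not be mistaken for a conforming PRECIS implementation. Note that "unassigned" depends on the Unicode version of the Python build, which is the versioning question raised in the next message.

```python
import unicodedata

# General categories of RFC 8264's LetterDigits rule.
LETTER_DIGITS = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_pvalid(ch: str) -> bool:
    """Very rough IdentifierClass check for one code point (incomplete)."""
    if unicodedata.category(ch) == "Cn":
        return False  # unassigned in this Python's Unicode database
    if 0x21 <= ord(ch) <= 0x7E:
        return True   # ASCII7: printable ASCII except space
    if unicodedata.normalize("NFKC", ch) != ch:
        return False  # HasCompat: disallowed in IdentifierClass
    return unicodedata.category(ch) in LETTER_DIGITS


def identifier_ok(name: str) -> bool:
    """True if every code point passes the rough check above."""
    return bool(name) and all(roughly_pvalid(ch) for ch in name)


print(identifier_ok("\u00e9tienne"))   # True
print(identifier_ok("\u2126"))         # False: OHM SIGN is not its own NFKC form
print(identifier_ok("ali\u200bce"))    # False: U+200B is category Cf
```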

    Regards,

    --
    Gioele Barabucci

  • From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 16:00:01 2024
    On Tue, Dec 10, 2024 at 02:52:05PM +0100, Gioele Barabucci wrote:
    NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

    The IdentifierClass of RFC 8264 explicitly disallows all these "security
    land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

    The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

    Oh, good. I was just getting worried when discussion on the list
    seemed to be treating NFC as a silver bullet, and people were
    suggesting that the canonicalization should be done both by readers
    and writers of /etc/passwd --- which would imply linking libunicode
    into setuid programs like sudo and login, with the (to my view)
    inevitable results of hilarity ensuing.

    As I look at RFC 8264, I note that it does not take a position about
    which version of Unicode should be considered canonical, and in fact
    talks about one of the features (tm) of RFC 8264 being that it is
    agile with respect to newer versions of Unicode.

    However, it should be noted that RFC 8264 also states that code points
    which are not defined in whatever version of Unicode is supported by
    "the application" shall be disallowed. From Debian's perspective,
    though, since "the application(s)" are the programs that read and
    write /etc/passwd, we *will* need to take a position on what version
    of Unicode they should support, and therefore on what set of characters
    will be disallowed.

    It also means that we need to be careful about what happens when we
    want to upgrade to newer versions of Unicode in future versions of
    Debian. If the system administrator wants to support more than one
    version of Debian, then it would be advisable if the Unicode version
    is something which is configurable, especially if the passwd entries
    are being supplied via some kind of network protocol such as LDAP or
    Hesiod (for those people who remember MIT Project Athena :-P).
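    The version dependence is easy to demonstrate in Python, whose unicodedata module carries, besides its current database, a frozen Unicode 3.2.0 database (unicodedata.ucd_3_2_0, kept for IDNA 2003). U+1E9E LATIN CAPITAL LETTER SHARP S, added in Unicode 5.1, is an assigned letter in one and unassigned in the other, so a validator that disallows unassigned code points gives different answers depending on which database it consults.

```python
import unicodedata

CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S, new in Unicode 5.1

print(unicodedata.unidata_version)                      # depends on the Python build
print(unicodedata.ucd_3_2_0.unidata_version)            # 3.2.0
print(unicodedata.category(CAPITAL_SHARP_S))            # Lu: an assigned letter
print(unicodedata.ucd_3_2_0.category(CAPITAL_SHARP_S))  # Cn: unassigned in 3.2.0
```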

    There is also (admittedly, only an edge case) the question of what to do if a newer
    version of Unicode disallows or removes characters. This rarely
    happens, but it has in the past (in particular in the case of various
    security disasters, or in the case of characters getting deprecated in
    favor of newer characters, many of which are mentioned in RFC 8264).
    So we can probably just ignore this case and hope that the Unicode
    consortium will be more careful in the future, but I thought I'd
    just mention it.

    The bottom line is that while I am sympathetic to the desire to
    support Unicode --- heck, I was one of the primary drivers of
    libunicode into the kernel so we could support case folding for more
    than just the ASCII character set --- the meme of "One does not simply
    walk into Mordor" also applies to "adopting Unicode".

    And I am reminded of one of my IETF mentors, an internationalization
    expert, telling me two decades ago that, late at night, in the bar
    after a standards meeting, one of the things that I18N folks would
    say, just amongst themselves, was, "It would be easier just to teach
    everyone English" --- and this was with I18N experts who understood
    everything that was involved in doing full I18N support. No doubt this
    was only half-joking, but I think the point is valid.

    So if we're going to do this, let's do it right. :-)

    - Ted

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marc Haber@21:1/5 to zeha@debian.org on Tue Dec 10 15:30:01 2024
    On Tue, 10 Dec 2024 12:10:14 +0100, Chris Hofstaedtler
    <zeha@debian.org> wrote:
    > To me, the question is more, why do we have a flag that, if used,
    > allows you to break /etc/{passwd,shadow,group,gshadow} completely?

    The user-oriented solution would be to identify the things that break
    /etc/passwd and to forbid these. Just forbidding everything is heading
    in the wrong direction.
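"Identify the things that break /etc/passwd and forbid these" can be read as a deny-list instead of an allow-list. The list below is an assumption for illustration. It covers field and record separators, NUL, a path separator, a leading `+`/`-` (special in NIS "compat" passwd files and easily mistaken for an option), and all-digit names that look like UIDs. It is neither adduser's nor shadow's actual policy, and it is probably incomplete.

```python
def passwd_breaking_problems(name: str) -> list[str]:
    """Return reasons why `name` would break passwd-style files or common
    tools. Hypothetical, non-exhaustive deny-list; anything else passes."""
    problems = []
    if not name:
        problems.append("empty name")
    if ":" in name:
        problems.append("contains ':' (passwd field separator)")
    if "\n" in name or "\r" in name:
        problems.append("contains a line break (record separator)")
    if "\0" in name:
        problems.append("contains NUL (terminates C strings)")
    if "/" in name:
        problems.append("contains '/' (breaks /home/<name> paths)")
    if name[:1] in ("+", "-"):
        problems.append("starts with '+' or '-' (NIS compat entries, option parsing)")
    if name.isascii() and name.isdigit():
        problems.append("all ASCII digits (ambiguous with a numeric UID)")
    return problems
```

Everything not on the list passes, including names in any script. Whether that is desirable is exactly what this thread is arguing about.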

    Greetings
    Marc
    --
    ----------------------------------------------------------------------------
    Marc Haber         | " Questions are the           | Mail address in header
    Rhein-Neckar, DE   |   Beginning of Wisdom "       |
    Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Phone: *49 6224 1600402

  • From Simon Josefsson@21:1/5 to Theodore Ts'o on Tue Dec 10 18:10:02 2024
    "Theodore Ts'o" <tytso@mit.edu> writes:

    > However, it should be noted that RFC 8264 also states that code points
    > which are not defined in whatever version of Unicode is supported by
    > "the application" shall be disallowed. From Debian's perspective,
    > though, if the application(s) that read and write /etc/passwd are to
    > support Unicode, we *will* need to take a position on which version of
    > Unicode they support, and therefore on which set of characters will be
    > disallowed.

    A possible position may be to treat code points that are subject to
    version mismatch as undefined. This is how IDNA resolved the same
    problem, and PRECIS inherited this. While I protested about that
    approach many years ago as libidn maintainer when IDNA2003 was
    hard-coded to use Unicode 3.2, I think today that the approach is
    reasonable since Unicode has maintained good stability. We've done a
    couple of Unicode version bumps in libidn2, and interop with other IDN
    implementations (which typically use some other Unicode version) is
    good enough to not cause serious breakage. I would expect the same to
    be true for PRECIS usernames too. Hostnames are hashed and are subject
    to string comparisons, just like usernames, so we have some experience
    to build on here.
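Below is a sketch of the IDNA-style position. It assumes "two versions" means the current table and one older table. Any code point that is unassigned in either version is treated as undefined and therefore rejected. Python ships a frozen Unicode 3.2.0 database as `unicodedata.ucd_3_2_0` for applications such as IDNA that need that specific version, so it serves as a convenient stand-in for "some other Unicode version". The helper name is hypothetical.

```python
import unicodedata

# Frozen Unicode 3.2.0 database shipped with Python (used for IDNA2003).
OLD_UCD = unicodedata.ucd_3_2_0


def undefined_in_either(name: str) -> list[str]:
    """Code points unassigned in either the current or the 3.2.0 database.
    Treating all of these as undefined sidesteps version mismatches."""
    return [
        f"U+{ord(ch):04X}"
        for ch in name
        if unicodedata.category(ch) == "Cn" or OLD_UCD.category(ch) == "Cn"
    ]
```

Under such a policy, U+1F4A9 (added in Unicode 6.0) is undefined whenever one of the parties still uses Unicode 3.2 tables.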

    I would involve cross-distribution discussion about this though.
    Perhaps the /etc/passwd APIs affect some POSIX specifications, and a
    non-ASCII extension could be proposed.

    /Simon

  • From Theodore Ts'o@21:1/5 to Simon Josefsson on Tue Dec 10 19:20:01 2024
    On Tue, Dec 10, 2024 at 06:08:40PM +0100, Simon Josefsson wrote:
    > I would involve cross-distribution discussion about this though.
    > Perhaps the /etc/passwd APIs affect some POSIX specifications, and a
    > non-ASCII extension could be proposed.

    Yeah, good point. If the scope is going to include passwd entries
    that are distributed via network protocols like LDAP, then we need to
    worry about sites that support other Linux distributions beyond just
    Debian --- or for that matter, sites that need to support Linux as
    well as legacy Unix systems like AIX or Solaris.

    Of course, we could just exclude them from the scope and say that if
    you are using LDAP, then you MUST only use ASCII characters in the
    username, given that POSIX has decided to run away from the I18N
    problems wrt usernames. That might be the simpler approach, unless
    we want to drive something that could eventually be adopted by POSIX.

    - Ted

  • From Marc Haber@21:1/5 to All on Tue Dec 10 21:30:01 2024
    On Tue, 10 Dec 2024 13:13:08 -0500, "Theodore Ts'o" <tytso@mit.edu>
    wrote:
    > Yeah, good point. If the scope is going to include passwd entries
    > that are distributed via network protocols like LDAP, then we need to
    > worry about sites that support other Linux distributions beyond just
    > Debian --- or for that matter, sites that need to support Linux as
    > well as legacy Unix systems like AIX or Solaris.

    Even if we had full Unicode support for anything using /etc/passwd, a
    site is always free to restrict itself to US-ASCII usernames. Same with
    POSIX: in my understanding we would still be POSIX compliant if we had
    full Unicode support for usernames, because POSIX defines the minimum
    of things a system MUST support, but a system is always free to
    support more. Or, at least I hope so.
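A site that does restrict itself can aim for what POSIX calls a portable user name. As we read the standard, that means characters from the portable filename character set (`A-Z`, `a-z`, `0-9`, `.`, `_`, `-`), with `-` not used as the first character. A minimal check follows; the function name is hypothetical.

```python
import string

# POSIX portable filename character set: A-Z, a-z, 0-9, '.', '_', '-'.
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def is_posix_portable_username(name: str) -> bool:
    """True if `name` uses only the portable filename character set and
    does not start with '-'."""
    return bool(name) and not name.startswith("-") and all(c in PORTABLE for c in name)
```

This describes what is portable, not what a system is forbidden to accept, which matches the reading above.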

    But things are moving by shadow upstream taking a user-hostile stance,
    willing to take away freedom. I must be fine with that because I
    cannot change it. But I don't need to like it.

    Greetings
    Marc
  • From Charles Plessy@21:1/5 to All on Wed Dec 11 02:10:01 2024
    Hello everybody,

    sorry if it is too naive, but is there an easy way to determine for a
    given Unicode string if it can be typed from a single keyboard layout or
    produced by a text-to-speech system? People who want a username because
    of SSH, email and su will want to be able to input it. At the other end
    of the range of use cases, people can use a computer for years without
    seeing their username.

    If we take one step back and look at the future: will usernames
    still be a thing in 10 years? If not, then a simple heuristic that
    satisfies more than half of the users may be enough...
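There is no direct way to ask "is this typeable on one keyboard layout". A crude proxy for the simple heuristic suggested above is "do all letters come from the same script". The sketch below guesses each letter's script from the first word of its Unicode character name (LATIN, CYRILLIC, CJK, HIRAGANA, ...). That is an approximation, not the real Unicode Script property, and the function names are hypothetical.

```python
import unicodedata


def letter_scripts(name: str) -> set[str]:
    """Guess the scripts used by the letters in `name` from the first word of
    each letter's Unicode name (LATIN, CYRILLIC, CJK, ...). A crude proxy,
    not the Unicode Script property."""
    scripts = set()
    for ch in name:
        if unicodedata.category(ch).startswith("L"):
            char_name = unicodedata.name(ch, "")
            if char_name:
                scripts.add(char_name.split()[0])
    return scripts


def plausibly_single_layout(name: str) -> bool:
    """Heuristic: all letters appear to come from a single script."""
    return len(letter_scripts(name)) <= 1
```

The heuristic catches mixed Latin/Cyrillic look-alikes. It would also wrongly flag ordinary Japanese names that mix kanji and kana, and it knows nothing about compose keys or input methods.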

    Have a nice day,

    Charles


    --
    Charles Plessy Nagahama, Yomitan, Okinawa, Japan
    Debian Med packaging team http://www.debian.org/devel/debian-med Tooting from home https://framapiaf.org/@charles_plessy
    - You do not have my permission to use this email to train an AI -

  • From Jeremy Stanley@21:1/5 to Charles Plessy on Wed Dec 11 02:50:01 2024
    On 2024-12-11 10:04:44 +0900 (+0900), Charles Plessy wrote:
    [...]
    > is there an easy way to determine for a given Unicode string if it
    > can be typed from a single keyboard layout
    [...]

    Do keyboards with a "compose" key count? There's plenty of glyphs I
    can type which aren't depicted directly on my keyboard's keycaps,
    after all.
    --
    Jeremy Stanley

  • From Marc Haber@21:1/5 to All on Wed Dec 11 09:20:01 2024
    On Wed, 11 Dec 2024 10:04:44 +0900, Charles Plessy <plessy@debian.org>
    wrote:
    > sorry if it is too naive, but is there an easy way to determine for a
    > given Unicode string if it can be typed from a single keyboard layout or
    > produced by a text-to-speech system? People who want a username because
    > of SSH, email and su will want to be able to input it.

    That's easy, just choose a user name for YOU that YOU can type on YOUR
    keyboard. Why would anybody choose a username that is impossible to use
    in their own locale?

    Greetings
    Marc
  • From Theodore Ts'o@21:1/5 to Marc Haber on Thu Dec 12 17:10:01 2024
    On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:

    > But things are moving by shadow upstream taking a user-hostile stance,
    > willing to take away freedom. I must be fine with that because I
    > cannot change it. But I don't need to like it.

    As a suggestion, we might make more forward progress if we assume good
    faith and accept that other people might have different priorities
    than we do. I could easily see how shadow, being a security-related
    package, would consider encouraging something that could lead to
    security bugs or just other random breakage as "user-hostile".

    I am reminded of Professor Jerome Saltzer, who was responsible for the
    overall technical architecture for MIT's Project Athena, insisting
    that he be assigned the username Saltzer. He theorized that while
    this *would* cause breakage (for a long time, usernames were assumed
    to always be lowercase ASCII, given that e-mail localparts were case
    insensitive and usernames were case sensitive), since he was (a) a
    Professor, and (b) responsible for the technical architecture for
    Project Athena, programmers would be incentivized to fix the problems
    when they inevitably showed up. As I recall, we didn't let students
    choose mixed-case usernames for a while, since breakage was presumed;
    Professor Saltzer's username was a special case.
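The Saltzer story is a case-sensitivity mismatch: usernames compared byte-for-byte, while mail localparts were compared case-insensitively. Below is a sketch of a collision check a tool could run before creating an account. It uses the Unicode canonical caseless match (NFD, full case folding, then NFD again). The function names are hypothetical.

```python
import unicodedata


def caseless_key(name: str) -> str:
    """Canonical caseless form: NFD(casefold(NFD(name)))."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", name).casefold())


def case_collisions(new: str, existing: list[str]) -> list[str]:
    """Existing names that are not identical to `new` but match it caselessly."""
    key = caseless_key(new)
    return [old for old in existing if old != new and caseless_key(old) == key]
```

Full case folding is locale-independent. It maps `ß` to `ss`, but it does not apply the Turkish dotted/dotless i rules.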

    If there are brave people who want to use Unicode characters (for
    bonus points, they could try using "unofficial" characters such as the
    Klingon script), they could be the first to find bugs, and report
    them. And if they suffer from security breaches, they would know what
    they were getting into. (And we salute them for their courage. :-)

    Perhaps at some future stable Debian release (not Trixie), we could
    enable it by default. But I really do think we need to do some
    technical work first, including not making libunicode a required
    package, and instead having a minimal security-focused Unicode library
    that can be used by privileged programs.
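Some checks need no Unicode tables at all, which hints at what a "minimal" library could cover: UTF-8 well-formedness and rejection of control characters need only byte and code point ranges. The Python below is a readable sketch of that idea; a privileged program would more likely do this in C. `minimal_name_check` is a hypothetical name, and normalization or category checks are deliberately out of scope.

```python
def minimal_name_check(raw: bytes) -> bool:
    """Table-free sanity check on a raw user name: well-formed UTF-8 and no
    C0 controls, DEL, or C1 controls."""
    try:
        text = raw.decode("utf-8")  # strict mode: overlongs and surrogates fail
    except UnicodeDecodeError:
        return False
    return not any(ord(c) < 0x20 or 0x7F <= ord(c) <= 0x9F for c in text)
```

Python's strict UTF-8 decoder already rejects overlong encodings, the classic trick for smuggling a `/` past a byte-level check. It also rejects encoded surrogates and truncated sequences.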

    Cheers,

    - Ted

  • From Henrik Ahlgren@21:1/5 to Marc Haber on Thu Dec 12 19:40:01 2024
    On Wed, 2024-12-11 at 09:11 +0100, Marc Haber wrote:
    > That's easy, just choose a user name for YOU that YOU can type on YOUR
    > keyboard. Why would anybody choose a username that is impossible to use
    > in their own locale?

    I don't see many problems with single-user machines, especially
    security-related ones. But think of multi-user environments. Imagine, as
    a non-Chinese-speaking Westerner, needing to chown a file to a colleague
    called 陈成. Even if you have Pinyin configured, you might not even know
    how to type it. (Of course, you have the same problem with filenames,
    which have essentially no limitations. I know from experience how hard
    it is to type names in Arabic, which I can't read.)

  • From Marc Haber@21:1/5 to Henrik Ahlgren on Fri Dec 13 12:30:01 2024
    On Thu, Dec 12, 2024 at 08:21:15PM +0200, Henrik Ahlgren wrote:
    > I don't see many problems with single-user machines, especially
    > security-related ones. But think of multi-user environments. Imagine, as
    > a non-Chinese-speaking Westerner, needing to chown a file to a colleague
    > called 陈成.

    I would type "chown 陈成 <filename>", pasting the user name from the
    written request or probably from /etc/passwd. Or I would ask the system administrator for a solution.

    I see your argument, but I'd also see that as an issue that the system
    administrator choosing the user names needs to solve. It's nothing that
    we as a distribution should solve.

    Greetings
    Marc

    --
    -----------------------------------------------------------------------------
    Marc Haber         | "I don't trust Computers. They | Mail address in header
    Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
    Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

  • From Marc Haber@21:1/5 to All on Fri Dec 13 12:30:01 2024
    On Thu, 12 Dec 2024 11:02:21 -0500, "Theodore Ts'o" <tytso@mit.edu>
    wrote:
    > On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:
    >> But things are moving by shadow upstream taking a user-hostile stance,
    >> willing to take away freedom. I must be fine with that because I
    >> cannot change it. But I don't need to like it.
    >
    > As a suggestion, we might make more forward progress if we assume good
    > faith and accept that other people might have different priorities
    > than we do. I could easily see how shadow, being a security-related
    > package, would consider encouraging something that could lead to
    > security bugs or just other random breakage as "user-hostile".

    They are planning to remove the --badname option from useradd, making
    it impossible to even try UTF-8 user names, without patching useradd.
    And if I was in Chris' shoes, I would probably refrain from doing so
    as well.

    And shadow would be the canonical place to do the PRECIS normalization
    at least for comparing usernames. That's something they wouldn't do.
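Here is a rough sketch of a comparison key for "PRECIS normalization for comparing usernames". It is loosely modeled on the UsernameCaseMapped profile of RFC 8265: map fullwidth/halfwidth characters to their decomposition, lowercase, then apply NFC. It is an approximation, not a conforming implementation, because it skips the IdentifierClass validation and the bidi rule. `username_key` is a hypothetical name, not a shadow API.

```python
import unicodedata


def _width_map(ch: str) -> str:
    """Map a fullwidth/halfwidth character to its decomposition mapping."""
    decomposition = unicodedata.decomposition(ch)  # e.g. "<wide> 004D"
    if decomposition.startswith(("<wide> ", "<narrow> ")):
        return "".join(chr(int(cp, 16)) for cp in decomposition.split()[1:])
    return ch


def username_key(name: str) -> str:
    """Rough comparison key: width mapping, lowercasing, then NFC.
    Omits PRECIS IdentifierClass checks and the bidi rule."""
    mapped = "".join(_width_map(ch) for ch in name)
    return unicodedata.normalize("NFC", mapped.lower())
```

In one possible design, two names refer to the same account when their keys are equal, while the stored name stays as the user typed it.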

    > Perhaps at some future stable Debian release (not Trixie), we could
    > enable it by default.

    There won't be such an option for us to enable.

    I need to be fine with that because I cannot change it. But I don't
    need to like it.

    Greetings
    Marc
  • From Stephan Seitz@21:1/5 to All on Fri Dec 13 13:10:01 2024
    On Thu, Dec 12, 2024 at 20:21:15 +0200, Henrik Ahlgren wrote:
    > I don't see many problems with single-user machines, especially
    > security-related ones. But think of multi-user environments. Imagine, as
    > a non-Chinese-speaking Westerner, needing to chown a file to a colleague
    > called 陈成. Even

    You are joking, aren’t you? You could use „getent passwd” and copy
    & paste the username. Or use the user id.
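Resolving either a name or a numeric UID is simple, and GNU chown likewise accepts a numeric owner such as `chown 1144 file`. The sketch below uses a hypothetical helper; `pwd.getpwnam` is the standard Python wrapper around the C library lookup.

```python
import pwd


def uid_for(owner: str) -> int:
    """Return the UID for `owner`, given either as a numeric UID or a name."""
    if owner.isascii() and owner.isdigit():
        return int(owner)  # no need to type the name at all
    return pwd.getpwnam(owner).pw_uid  # KeyError if no such user
```

This sketch treats an all-digit argument as a UID before trying a name lookup. Real tools may resolve that ambiguity differently, which is one reason all-digit user names are best avoided.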

    By that argument, passwd should refuse to set the password to „12345”.

    And no one in this thread has said that you *have* to use non-ASCII
    usernames. But some people don’t want to give you a chance to do it.

    I don’t need non-ASCII for my name, but I would never use a system that
    forces me to rewrite my name in ASCII; that is so utterly broken in
    2024. I bet there is no problem on Windows systems.

    Stephan

    --
    | Stephan Seitz E-Mail: stse@rootsland.net |
    | If your life was a horse, you'd have to shoot it. |

  • From IOhannes m zmölnig@21:1/5 to All on Fri Dec 13 14:00:02 2024
    On 13 December 2024 13:08:01 CET, Stephan Seitz <stse+debian@rootsland.net> wrote:
    > I don’t need non-ASCII for my name, but I would never use a system that
    > forces me to rewrite my name in ASCII; that is so utterly broken in
    > 2024. I bet there is no problem on Windows systems.
    >
    > Stephan


    Incidentally, my kid's school rolled out their school laptops this week, which of course come with Windows11 preinstalled (as a sidenote I am now looking forward to four years of "digital competence training" consisting entirely of Windows(basics),
    PowerPoint, Word and Excel; but that's another story), and *of course* all usernames have been normalized to lowercase ASCII.

    So my take is: "No. In Redmond, you would use ASCII for your username."

    Oh, and my name does have non-ASCII characters, and I have been using Unicode in my display name for the last 20 years.
    I do remember problems in the '90s.
    But those are long past.


    mfh.her.fsr
    IOhannes

  • From Stephan Seitz@21:1/5 to All on Fri Dec 13 14:40:01 2024
    On Fri, Dec 13, 2024 at 13:38:31 +0100, IOhannes m zmölnig wrote:
    > Incidentally, my kid's school rolled out their school laptops this week,
    > which of course come with Windows11 preinstalled (as a sidenote I am now
    > looking forward to four years of "digital competence training"
    > consisting entirely of Windows(basics), PowerPoint, Word and Excel; but
    > that's another story), and *of course* all usernames have been
    > normalized to lowercase ASCII.

    I’m quite sure I have never seen an Asian Windows where you had to use
    ASCII for your username.

    Stephan

  • From Michael Stone@21:1/5 to Marc Haber on Fri Dec 13 16:10:01 2024
    On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    > They are planning to remove the --badname option from useradd, making
    > it impossible to even try UTF-8 user names, without patching useradd.

    Or edit the passwd file (vipw), or use any non-passwd-file
    authentication mechanism, or use a different user management tool, etc.
    I think you're overemphasizing the importance of the useradd command
    here--it just acts as a convenience and sets some baseline policies;
    it's not actually essential for adding a user. If you don't like the
    policy that useradd sets...just don't use it.

  • From sre4ever@free.fr@21:1/5 to All on Fri Dec 13 15:30:01 2024
    Hi,

    On 2024-12-13 13:38, IOhannes m zmölnig wrote:

    > and *of course* all usernames have been normalized to lowercase ASCII.

    I just took a look at some reasonably recent government-issued IDs, and
    it turns out the French ones normalized my name to uppercase
    whatever-some-clerk-had-on-their-typewriter-keyboard-late-last-millennium,
    dropping the accent from the second word of my name. My father's birth
    certificate is handwritten and has the accent. My Canadian IDs are
    better, as they retained the name as I wrote it in the application
    form. I don't remember if the French online application forms for IDs
    allowed accents in names, but I would not be too surprised if they
    didn't. I might start a procedure to try to get that officially fixed in
    2025, as there is another issue with the way my name is registered with
    some administrations that occasionally complicates my life. I'm pretty
    confident the other issue will get fixed, much less so the accent one,
    even though the law should be on my side. Here that means I could well
    sue the government, win the lawsuit and the subsequent ones up to the
    ECJ and back, and still not get that fixed within my lifetime.

    I was going to write that on payment cards you can't have accents in
    your name. Wrong. I managed to get one that reproduced it. I don't use
    that one much online so I don't know if entering my name with the accent actually works somewhere when paying with that card.

    I would not try too hard to get non-ASCII characters into that convenient computer identifier, often called "login name" rather than "user name".
    You can't get them in the local part of an e-mail address and not many
    people complain. You can't get them in IRC nicknames. You can't get them
    in the machine readable part of your IATA-compliant government-issued
    IDs. It's still better than just numbers. I'm fine with that as long as
    my name is properly written in the places that actually matter.

    If you need a name for that option, --allow-non-ascii should be neutral
    enough.
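The borders from the start of the thread could be drawn as tiers, using the option name suggested above. In this sketch, the default pattern, the 32-character limit and the deny-list are all assumptions for illustration. `--allow-bad-names` is adduser's existing option; `--allow-non-ascii` is only the proposed name, not an existing flag.

```python
import re

# Assumed conservative default: lowercase ASCII letters, digits, '_' and '-',
# not starting with a digit or '-'. Not adduser's actual configuration.
DEFAULT_NAME = re.compile(r"[a-z_][a-z0-9_-]*")
MAX_LEN = 32  # assumed limit, matching the 32-character figure in this thread


def classify(name: str) -> str:
    """Return which (hypothetical) tier a proposed user name falls into."""
    if DEFAULT_NAME.fullmatch(name) and len(name) <= MAX_LEN:
        return "ok"
    # Deny-list: things that break passwd files regardless of any flag.
    if not name or any(c in name for c in ":\n\0/") or name[0] in "+-":
        return "reject"
    if name.isascii():
        return "needs --allow-bad-names"
    return "needs --allow-non-ascii"
```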

    --
    Julien Plissonneau Duquène

  • From Peter Pentchev@21:1/5 to Peter Pentchev on Fri Dec 13 18:10:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    > On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    >> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    >>> They are planning to remove the --badname option from useradd, making
    >>> it impossible to even try UTF-8 user names, without patching useradd.
    >>
    >> Or edit the passwd file (vipw), or use any non-passwd-file
    >> authentication mechanism, or use a different user management tool, etc.
    >> I think you're overemphasizing the importance of the useradd command
    >> here--it just acts as a convenience and sets some baseline policies;
    >> it's not actually essential for adding a user. If you don't like the
    >> policy that useradd sets...just don't use it.
    >
    > In the context of the whole thread, are you suggesting that adduser(1)
    > should be changed to use something other than useradd(8) under the hood?

    Sigh, that's adduser(8) too, of course.

    G'luck,
    Peter

    --
    Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
    PGP key: https://www.ringlet.net/roam/roam.key.asc
    Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13

  • From Peter Pentchev@21:1/5 to Michael Stone on Fri Dec 13 18:10:01 2024
    On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    > On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    >> They are planning to remove the --badname option from useradd, making
    >> it impossible to even try UTF-8 user names, without patching useradd.
    >
    > Or edit the passwd file (vipw), or use any non-passwd-file
    > authentication mechanism, or use a different user management tool, etc.
    > I think you're overemphasizing the importance of the useradd command
    > here--it just acts as a convenience and sets some baseline policies;
    > it's not actually essential for adding a user. If you don't like the
    > policy that useradd sets...just don't use it.

    In the context of the whole thread, are you suggesting that adduser(1)
    should be changed to use something other than useradd(8) under the hood?

    G'luck,
    Peter

  • From Marc Haber@21:1/5 to Peter Pentchev on Fri Dec 13 21:40:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    > In the context of the whole thread, are you suggesting that adduser(1)
    > should be changed to use something other than useradd(8) under the hood?

    adduser will not do that. Doing so is nonsense.

    Greetings
    Marc

  • From Michael Stone@21:1/5 to Peter Pentchev on Sat Dec 14 05:10:01 2024
    On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    > On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    >> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    >>> They are planning to remove the --badname option from useradd, making
    >>> it impossible to even try UTF-8 user names, without patching useradd.
    >>
    >> Or edit the passwd file (vipw), or use any non-passwd-file authentication
    >> mechanism, or use a different user management tool, etc.
    >> I think you're overemphasizing the importance of the useradd command
    >> here--it just acts as a convenience and sets some baseline policies;
    >> it's not actually essential for adding a user. If you don't like the
    >> policy that useradd sets...just don't use it.
    >
    > In the context of the whole thread, are you suggesting that adduser(1)
    > should be changed to use something other than useradd(8) under the hood?

    No, I'm suggesting that rhetoric asserting that any adduser/useradd
    policy could constrain people is overblown because users can be added to
    the system without using either of those tools. The tools' policies
    should reflect what is safest and most sensible for the majority of
    users, but if someone wants to do something different there is nothing
    stopping them from doing so.

    The claim at the top of this subthread is that some useradd change would prevent people from experimenting with UTF-8 usernames. As an exercise I
    just created UTF-8 users and groups entirely without useradd/adduser
    (using vipw and vigr):

    getent passwd 1144
    💩:*:1144:1144::/nowhere:/bin/false
    getent group 1144
    💩:*:1144:
    ls -l /tmp/samplefile
    -rw-r--r-- 1 💩 💩 0 Dec 13 22:42 /tmp/samplefile

    On an individual basis there aren't so many steps that creating a user
    manually is a big deal, or that a script dedicated to creating users
    according to the policies of a particular environment would be overly complicated. For a large organization I question the idea that user
    accounts would be managed by adduser/useradd at all.
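The core of such a dedicated script is small. The sketch below builds and parses the same kind of line as the vipw example above, using the standard seven-field passwd layout, and refuses field contents that would corrupt the file. The function names are hypothetical.

```python
# Characters that cannot appear inside any field of a passwd line.
FORBIDDEN_IN_FIELD = (":", "\n", "\0")


def format_passwd_entry(name: str, uid: int, gid: int, gecos: str = "",
                        home: str = "/nonexistent",
                        shell: str = "/usr/sbin/nologin",
                        password: str = "x") -> str:
    """Build one name:password:UID:GID:GECOS:home:shell line."""
    fields = [name, password, str(uid), str(gid), gecos, home, shell]
    for field in fields:
        if any(bad in field for bad in FORBIDDEN_IN_FIELD):
            raise ValueError(f"field {field!r} would corrupt the passwd file")
    return ":".join(fields)


def parse_passwd_entry(line: str) -> tuple[str, str, int, int, str, str, str]:
    """Split a passwd line back into its seven fields."""
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return name, password, int(uid), int(gid), gecos, home, shell
```

A real tool would also need to lock the files (as vipw does), update /etc/group and /etc/shadow, and write atomically. None of that is shown here.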

  • From Peter Pentchev@21:1/5 to Michael Stone on Sat Dec 14 11:00:02 2024
    On Fri, Dec 13, 2024 at 11:01:43PM -0500, Michael Stone wrote:
    > On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
    >> On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
    >>> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
    >>>> They are planning to remove the --badname option from useradd, making
    >>>> it impossible to even try UTF-8 user names, without patching useradd.
    >>>
    >>> Or edit the passwd file (vipw), or use any non-passwd-file
    >>> authentication mechanism, or use a different user management tool, etc.
    >>> I think you're overemphasizing the importance of the useradd command
    >>> here--it just acts as a convenience and sets some baseline policies;
    >>> it's not actually essential for adding a user. If you don't like the
    >>> policy that useradd sets...just don't use it.
    >>
    >> In the context of the whole thread, are you suggesting that adduser(1)
    >> should be changed to use something other than useradd(8) under the hood?
    >
    > No, I'm suggesting that rhetoric asserting that any adduser/useradd
    > policy could constrain people is overblown because users can be added to
    > the system without using either of those tools. The tools' policies
    > should reflect what is safest and most sensible for the majority of
    > users, but if someone wants to do something different there is nothing
    > stopping them from doing so.
    [snip more about adding accounts without useradd/adduser]

    Thanks, that makes sense. Apologies if my reply came through as snarky.

    G'luck,
    Peter
