Forum: >>> Magnum BBS <<<

Musings about Usernames in adduser and Debian

From Marc Haber@21:1/5 to All on Thu Nov 21 18:50:01 2024

[writing this with my adduser hat on. I am also in touch with the
maintainers of src:shadow and base-passwd]

Hi,

recently, I have "taken over" the wiki page about UserAccounts and have
put in some history and general thoughts about what Debian thinks about
user names and name restrictions.

https://wiki.debian.org/UserAccounts

I fear that I have opened an especially nasty can of worms by beginning
to do sanity checks in adduser and being pointed towards user name
encoding in that process. Can you help me to bring some sense into this
mess?

I would like to hear your comments. Feel free to directly apply
corrections to the wiki page. I am especially interested in having clear
terminology regarding Unicode code points, UTF-8, character strings and
byte strings. It is vitally important to be consistent here to avoid
making the mess even worse.

For adduser's next release, I would like to discuss the following
things:

(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)

(2)
If the answer to (1) is "allow UTF-8", should we also do that for system
users? (I think no, we should not)

(2a)
Which UTF-8 subset / code point classes should we allow and which should
we reject? (I don't have an opinion about that)

(3)
I think that 32 characters/bytes (it's the same if we don't allow UTF-8)
is a good limitation for a system user name. But, should we increase
that for regular user names? (I think yes)

(4)
If we decide to relax some of our current requirements, where are the
borders between "normal" user name, one that requires --allow-bad-names
and finally one that requires --allow-all-names? Wouldn't it be
offensive to speakers of some languages that require --allow-bad-names
for their special characters to be allowed on a user name? (no opinion
here that would not break backwards compatibility)

(5)
Is it right to say "the user name in /etc/passwd is UTF-8 encoded" or
should I better say "the user name in /etc/passwd can be UTF-8 encoded"?

(6)
Does it still make sense to give non-UTF-8-locales special handling
(which one?), or can adduser safely assume that any non-ascii locale is
UTF-8? Or must I check for locale and reject UTF-8 user names on
non-UTF-8 locales? (I hope that we can safely assume UTF-8)

(7)
Do the general restrictions for both kinds of user names make sense?
Going forward with this would mean to reject user names that we used to
accept before. (I think we should come close to systemd's ideas)

(8)
I think that our current way to restrict system account names is fine.
Any objections/additions here?

(9)
Should some of this language be in Policy instead of some random wiki
page? Policy is quite short about user names (chapter 9.2) (I think yes)

(10)
What should adduser do regarding subuids? Since I was ignorant about
that concept until a few hours ago, all accounts created by adduser do
have subuids, regardless of being system account or not, while useradd
does not give system accounts subuids.

Greetings
Marc

P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging. You can get
in touch with all involved parties via
pkg-shadow-devel@lists.alioth.debian.org

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Lewis@21:1/5 to Marc Haber on Thu Nov 21 23:10:01 2024

Marc Haber <mh+debian-devel@zugschlus.de> writes:

For adduser's next release, I would like to discuss the following
things:

(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)

would allowing utf-8 enable some of the abuse described at https://lwn.net/Articles/874951/ ?

as usernames appear in logs and other output (and are passed to all
sorts of commands), it seems a bad idea to be too permissive or to
change from historic practice by default, even though from a user pov it
would be nice to have the option

P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging.

Is there a list of "things that need doing"?


From Iustin Pop@21:1/5 to Marc Haber on Thu Nov 21 23:30:01 2024

On 2024-11-21 18:45:06, Marc Haber wrote:

[writing this with my adduser hat on. I am also in touch with the
maintainers of src:shadow and base-passwd]

Hi,

recently, I have "taken over" the wiki page about UserAccounts and have
put in some history and general thoughts about what Debian thinks about
user names and name restrictions.

https://wiki.debian.org/UserAccounts

I fear that I have opened an especially nasty can of worms by beginning
to do sanity checks in adduser and being pointed towards user name
encoding in that process. Can you help me to bring some sense into this
mess?

I would like to hear your comments. Feel free to directly apply
corrections to the wiki page. I am especially interested in having clear
terminology regarding Unicode code points, UTF-8, character strings and
byte strings. It is vitally important to be consistent here to avoid
making the mess even worse.

For adduser's next release, I would like to discuss the following
things:

(1)
Should Debian allow UTF-8 user names in the first place or should we
restrict names for regular users to some us-ascii near set as well? (I
think yes, we should)

You weren't clear to which part you agreed. If by "we should" you meant
the closest option, i.e. restrict, then I agree as well.

As Richard also replied, full UTF-8 is tricky, and I think it's somewhat misplaced to focus on the username, as opposed to gecos. Aren't most
other OSes using the "full name" as the "display name", and the username
is mostly one part of the user/password combination, but not a display
property most of the time?

So I would suggest that maybe the better option is to standardise the
gecos format/gecos parsing, so migrate UI tools to use that more often.

On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent
breakage would be introduced that will take years to fix in all tooling
and all packages.

regards,
iustin


From Marc Haber@21:1/5 to Iustin Pop on Fri Nov 22 10:50:01 2024

[Reducing the list to debian-devel. I have omitted to set Reply-To and apologize for that]

On Thu, Nov 21, 2024 at 11:26:48PM +0100, Iustin Pop wrote:

On 2024-11-21 18:45:06, Marc Haber wrote:

Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

You weren't clear to which part you agreed. If by "we should" you meant
the closest option, i.e. restrict, then I agree as well.

I am sorry. My personal opinions were among the last things I added to
the article and I was not clear here. I think we should allow UTF-8 user
names as a courtesy to those people who need non-ascii user names to
write their name, since user names are frequently chosen from the real
name of the person. In addition, this will enhance software quality
since we now get the chance of finding bugs that are already present in
many programs.

This comes kind of late in the Trixie cycle, but as it is currently
already possible to create user names with UTF-8 characters, I do not
like the idea of tightening our restrictions in Trixie over what we have
in Bookworm just to maybe revisit our decision in Trixie+1.

As Richard also replied, full UTF-8 is tricky,

My current code uses \p{Graph} as a least common denominator. I am not
sure whether this is wise.
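
To make concrete what a \p{Graph}-style check lets through, here is a rough Python approximation. It is only a sketch: it assumes the UTS #18 compatibility definition of "graph" (everything except whitespace, control, surrogate and unassigned code points), it is not adduser's actual Perl code, and the function names are made up for illustration.

```python
import unicodedata

def is_graph(ch: str) -> bool:
    """Approximate \\p{Graph}: not whitespace, control, surrogate or unassigned."""
    if ch.isspace():
        return False
    return unicodedata.category(ch) not in ("Cc", "Cs", "Cn")

def all_graph(name: str) -> bool:
    """True if the name is non-empty and every code point passes is_graph()."""
    return bool(name) and all(is_graph(ch) for ch in name)

print(all_graph("émollier"))    # accented Latin letters pass
print(all_graph("a b"))         # a space does not
print(all_graph("a\u202eb"))    # U+202E RIGHT-TO-LEFT OVERRIDE is category Cf
```

Under this approximation, format characters (category Cf) such as the bidirectional override count as "graphical", which may be one reason to doubt that the class is restrictive enough on its own.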

and I think it's somewhat
misplaced to focus on the username, as opposed to gecos. Aren't most
other OSes using the "full name" as the "display name", and the username
is mostly one part of the user/password combination, but not a display property most of the time?

I think that we should allow full UTF-8 in the gecos¹ field, yes. People should be allowed to have their fully correct name in there. I also
think that users of non-latin languages should have the possibility to
have a login name that resembles their name.

¹ in 2024 no one remembers what gecos means any more. Adduser and
src:shadow are using "comment" for that field nowadays.

So I would suggest that maybe the better option is to standardise the
gecos format/gecos parsing, so migrate UI tools to use that more often.

That doesn't solve the issue I am having with adduser right now: That
we're allowing things that we are not sure we should allow.

On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent breakage would be introduced that will take years to fix in all tooling
and all packages.

Yes. Fixing breakage makes software better, and by disallowing non-latin characters in user names we are hiding those issues away.

Greetings
Marc


From Marc Haber@21:1/5 to Richard Lewis on Fri Nov 22 10:40:01 2024

On Thu, Nov 21, 2024 at 10:05:49PM +0000, Richard Lewis wrote:

Marc Haber <mh+debian-devel@zugschlus.de> writes:

For adduser's next release, I would like to discuss the following
things:

(1)
Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

would allowing utf-8 enable some of the abuse described at https://lwn.net/Articles/874951/ ?

as usernames appear in logs and other output (and are passed to all
sorts of commands), it seems a bad idea to be too permissive or to
change from historic practice by default, even though from a user pov it would be nice to have the option

I am not sure about that. Would typosquatting on a user name make sense?
It might be possible to make logs ambiguous. Being passed to other
commands SHOULD not be dangerous since we can expect other commands to
gracefully handle a byte stream, can't we?

I might be naive here, but I don't have much experience with non-ascii
names since I have the privilege of being fluent in two languages that
use the latin alphabet.

On the other side, wouldn't it be a courtesy to allow people having a
name that needs transcription to be written in latin to use their name
in the real alphabet that it is usually written in as a login name as
well? To make things worse, transcriptions are often ambiguous.

I would like to hear the opinion of people who would be affected by this change.

Local Administrators are able today to use UTF-8 user names in useradd
or configure adduser to allow their locally important subset of UTF-8,
but at the moment with things being more restrictive, our software is
untested in this regard. I think that Debian would get more robust if
we'd allow things here.

Vulnerabilities that could be exploited by having non-ascii user names
are already here and present today, just not uncovered yet.

P.S.: The teams and individuals working on src:shadow, base-passwd and
adduser would appreciate your help in coding and packaging.

Is there a list of "things that need doing"?

The collaboration between src:shadow, base-passwd and adduser is a
relatively fresh thing that came from the fact that src:shadow recently
introduced changes that made adduser's test suite break. So we haven't
found good paths yet. I suggested moving together as a method to
improve communication and also to reduce, at least a bit, the bus
factors of those quite important packages. That was also the reason why
I suggested base-passwd to join and I am happy that Colin agreed.

In adduser, nearly everything that needs doing has issues in the BTS,
with the severity set to the urgency of the matter in my opinion. You'll
see that adduser has quite a lot of bugs that were filed by myself. I
consider it a feature to have a public to-do list. For the other two
packages, I'd let their respective maintainers comment.

Greetings
Marc


From Timo Röhling@21:1/5 to All on Fri Nov 22 15:30:01 2024

Hi,

* Richard Lewis <richard.lewis.debian@googlemail.com> [2024-11-21 22:05]:

would allowing utf-8 enable some of the abuse described at
https://lwn.net/Articles/874951/ ?

as usernames appear in logs and other output (and are passed to all
sorts of commands), it seems a bad idea to be too permissive or to
change from historic practice by default, even though from a user pov it
would be nice to have the option

I have no experience with bidirectional attacks, but browsers
mitigate homograph attacks in IDNs by disallowing mixed alphabets
such as cyrillic and latin letters in the same name. That seems to
be a reasonable restriction for user names as well.
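
As a rough illustration of such a mixed-alphabet check, the sketch below guesses the script of each letter from the first word of its Unicode character name. This is a heuristic only: Python's standard library does not expose the Unicode Script property, the function names are hypothetical, and this is not how browsers implement their IDN policies.

```python
import unicodedata

def scripts_used(name: str) -> set[str]:
    """Collect a coarse script label for every alphabetic code point."""
    scripts = set()
    for ch in name:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            char_name = unicodedata.name(ch, "")
            scripts.add(char_name.split(" ")[0] if char_name else "UNKNOWN")
    return scripts

def is_mixed_script(name: str) -> bool:
    """True if letters from more than one (heuristic) script appear."""
    return len(scripts_used(name)) > 1

print(is_mixed_script("paypal"))        # all Latin
print(is_mixed_script("p\u0430ypal"))   # second letter is CYRILLIC SMALL LETTER A
```

A blanket "one script per name" rule would also reject legitimate combinations, for example Japanese names mixing kana and kanji, so a real policy would need a more careful definition than this.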

Cheers
Timo

--
⢀⣴⠾⠻⢶⣦⠀ ╭────────────────────────────────────────────────────╮
⣾⠁⢠⠒⠀⣿⡁ │ Timo Röhling │
⢿⡄⠘⠷⠚⠋⠀ │ 9B03 EBB9 8300 DF97 C2B1 23BF CC8C 6BDD 1403 F4CA │
⠈⠳⣄⠀⠀⠀⠀ ╰────────────────────────────────────────────────────╯


From Marc Haber@21:1/5 to All on Fri Nov 22 17:40:01 2024

On Fri, Nov 22, 2024 at 03:29:24PM +0100, Timo Röhling wrote:

I have no experience with bidirectional attacks, but browsers mitigate homograph attacks in IDNs by disallowing mixed alphabets such as cyrillic
and latin letters in the same name. That seems to be a reasonable
restriction for user names as well.

I am not willing to implement that myself in adduser. I will accept code
and test cases written by others, but this is a thing that goes beyond
my resources. Additionally, it won't help since an attacker can directly
write to /etc/passwd.

Homograph attacks would be best mitigated in software reading
/etc/passwd, alerting in their output or logs that the user name they
just printed was composed of strange alphabets.

Greetings
Marc


From Étienne Mollier@21:1/5 to All on Fri Nov 22 20:50:01 2024

Hi Marc,

Marc Haber, on 2024-11-22:

I might be naive here, but I don't have much experience with non-ascii
names since I have the privilege of being fluent in two languages that
use the latin alphabet.

I am not sure whether I am the intended audience here, because
my name is almost Ascii based. That being said, I happen to
have one weird enough latin based character as the first letter
in my first name, that it gives interesting results when thrown
toward random databases. Thus I do happen to have some thoughts
about this topic.

On the other side, wouldn't it be a courtesy to allow people having a
name that needs transcription to be written in latin to use their name
in the real alphabet that it is usually written in as a login name as
well? To make things worse, transcriptions are often ambiguous.

I would like to hear the opinion of people who would be affected by this change.

I tried to consider what it would take to have an émollier or an
Émollier login, and there is one little blocker : I may have to
login from environments or keyboards lacking the necessary i18n
and l10n capabilities to transcribe the 'e' acute, let alone the
uppercase 'e' acute. For example, I hit this particular issue
when populating the Gecos field from the Debian installer
environment: if I choose a Qwerty US configuration but miss the
step to choose which Qwerty US internationalized variant I want
to use, then I don't get to type uppercase 'e' acute, but there
are many other situations unrelated to d-i or even Debian where
I run into that. For this practical reason, I tend to feel
better about keeping a full Ascii login name. I wouldn't feel
strongly if unicode support for login never happens. I believe
however that the Gecos is the right place to store the properly
typed-in person name, because it is a "presented" name that
hasn't the technical coupling that the login name has, and I
would probably have stronger feelings if it were to not have
unicode support.

You probably want to have some more thoughts, especially from
people with entirely non latin character names. Having a latin
name, I accommodate perhaps too well to a full Ascii login.

Have a nice day, :)
--
.''`. Étienne Mollier <emollier@debian.org>
: :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
`. `' sent from /dev/pts/1, please excuse my verbosity
`-


From Gioele Barabucci@21:1/5 to All on Fri Nov 22 22:10:01 2024

On 22/11/24 20:42, Étienne Mollier wrote:

I tried to consider what it would take to have an émollier or an
Émollier login, and there is one little blocker : I may have to
login from environments or keyboards lacking the necessary i18n
and l10n capabilities to transcribe the 'e' acute, let alone the
uppercase 'e' acute.

Dear Étienne,

your case highlights another problem not mentioned in the original list
posted by Marc: comparison (and normalization).

Some characters can be encoded in more than one way. For instance, "é"
in "émollier" could be stored as "e with acute" U+00E9 (and encoded in
UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus
U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the
former sequence of bytes, but the username is stored in the login
infrastructure using the latter sequence of bytes, then a naive
comparison will not find the user "émollier" in the system. Unicode
defines in Annex 15 a few normalization forms as a way to work around
this problem. But a correct use of these normalization forms still
requires coordination and standardization among all programs accessing
the data.
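
The two encodings can be reproduced with a few lines of Python. This is only a sketch of the comparison problem; the helper name same_user and the choice of NFC as the default form are assumptions made for illustration, not something any standard prescribes for user names.

```python
import unicodedata

composed = "\u00e9mollier"      # "é" as a single code point, U+00E9
decomposed = "e\u0301mollier"   # "e" followed by combining acute accent, U+0301

# Both strings render the same, but they differ as code point and byte sequences.
print(composed == decomposed)                  # False
print(composed.encode("utf-8").hex(" "))       # starts with c3 a9
print(decomposed.encode("utf-8").hex(" "))     # starts with 65 cc 81

def same_user(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(same_user(composed, decomposed))         # True

# Length limits also depend on what is counted: code points or bytes.
print(len(composed), len(composed.encode("utf-8")))       # 8 9
print(len(decomposed), len(decomposed.encode("utf-8")))   # 9 10
```

The comparison only works if every program that reads or writes the name normalizes to the same form, which is exactly the coordination problem described above.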

Does POSIX (or other de-facto standards) prescribe a normalization form
for Unicode-/UTF-8-encoded usernames?

Regards,

--
Gioele Barabucci


From Peter Pentchev@21:1/5 to Gioele Barabucci on Sat Nov 23 00:40:01 2024

On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:

On 22/11/24 20:42, Étienne Mollier wrote:

I tried to consider what it would take to have an émollier or an
Émollier login, and there is one little blocker : I may have to
login from environments or keyboards lacking the necessary i18n
and l10n capabilities to transcribe the 'e' acute, let alone the
uppercase 'e' acute.

Dear Étienne,

your case highlights another problem not mentioned in the original list
posted by Marc: comparison (and normalization).

Some characters can be encoded in more than one way. For instance, "é"
in "émollier" could be stored as "e with acute" U+00E9 (and encoded in
UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus
U+0301 (UTF-8: 0x65 0xcc 0x81). If a keyboard input system provides the
former sequence of bytes, but the username is stored in the login
infrastructure using the latter sequence of bytes, then a naive
comparison will not find the user "émollier" in the system. Unicode
defines in Annex 15 a few normalization forms as a way to work around
this problem. But a correct use of these normalization forms still
requires coordination and standardization among all programs accessing
the data.

Does POSIX (or other de-facto standards) prescribe a normalization form for Unicode-/UTF-8-encoded usernames?

POSIX says "if you want your applications to be portable, do not use any
funny characters in usernames":

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_409

3.409 User Name

A string that is used to identify a user; see also 3.407 User Database.
To be portable across systems conforming to POSIX.1-2024, the value is
composed of characters from the portable filename character set.
The <hyphen-minus> character should not be used as the first character
of a portable user name.

For people unfamiliar with POSIX terms, the portable filename character
set is defined as:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap03.html#tag_03_265

The set of characters from which portable filenames are constructed.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 . _ -

The last three characters are the <period>, <underscore>, and
<hyphen-minus> characters, respectively.
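
The quoted definition translates into a short check. The following Python sketch uses a hypothetical function name and treats the "should not start with a hyphen-minus" recommendation as a hard rule, which is stricter than the POSIX wording.

```python
import re

# Portable filename character set per POSIX: A-Z a-z 0-9 . _ -
# The first character class leaves out "-" so that a name cannot start with it.
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")

def is_portable_user_name(name: str) -> bool:
    """True if the name uses only the portable set and has no leading hyphen."""
    return _PORTABLE_NAME.fullmatch(name) is not None

print(is_portable_user_name("emollier"))    # True
print(is_portable_user_name("émollier"))    # False
print(is_portable_user_name("-emollier"))   # False
```

Note that this only describes portability in the POSIX sense; it says nothing about length limits or about the additional restrictions individual tools apply.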

G'luck,
Peter

--
Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13


From Johannes Schauer Marin Rodrigues@21:1/5 to All on Sat Nov 23 09:40:01 2024

Quoting nick black (2024-11-23 08:48:10)

You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫i

xfce-terminal renders this like this: https://mister-muffin.de/p/4o2v.png

No idea if this is correct and I'll leave the details to those who know more about this topic than I. And maybe my email client completely messes this up in this response of mine.
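
For readers wondering what "more than one column" means for tabular output, here is a minimal Python sketch of a column-width estimate. It assumes that East Asian wide and fullwidth characters take two cells and that combining marks take none; the function name is hypothetical and real terminals and fonts may disagree with this estimate.

```python
import unicodedata

def display_width(text: str) -> int:
    """Estimate how many terminal cells a string occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        # "W" (wide) and "F" (fullwidth) characters usually take two cells
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

print(display_width("abc"))        # 3
print(display_width("日本"))        # 4
print(display_width("e\u0301"))    # 1
```

The sketch deliberately ignores many cases (zero-width format characters, emoji sequences, glyphs that fonts draw wider than their nominal cell), so it shows the shape of the problem more than a solution to it.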

But my 2 cents on the topic are: Let's please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters because that's all we have after half a century of Computers being a thing...

If having this work in Debian can put a bit of pressure on those software projects that do not support this, then please let that happen so that missing unicode support becomes more annoying for those pieces of software that are missing it. For example, if my email client messed this up, then let's fix it. We cannot find these kinds of bugs if we accept translating everybody's given name to the American alphabet.

Thanks!

cheers, josch

From Gioele Barabucci@21:1/5 to Johannes Schauer Marin Rodrigues on Sat Nov 23 13:00:01 2024

On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:

But my 2 cents on the topic are: Let's please allow more than ascii in usernames.

Yes please, but opt-in and behind a big red warning that says that it is
not interoperable (outside POSIX), potentially insecure (homographs) and
at high-risk of breaking existing applications (lack of standardized normalization form).

Regards,

--
Gioele Barabucci


From Bjørn Mork@21:1/5 to Johannes Schauer Marin Rodrigues on Sun Nov 24 11:50:01 2024

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

But my 2 cents on the topic are: Let's please allow more than ascii in
usernames. I find it very uncomfortable every time I have to tell my
students that sorry, you somehow have to manage writing your name using
American letters because that's all we have after half a century of
Computers being a thing...

You are confusing usernames and names. Different concepts with
different rules. Let's just hope you never get two students with the
same name.

Bjørn


From Bjørn Mork@21:1/5 to Marc Haber on Sun Nov 24 12:00:01 2024

Marc Haber <mh+debian-devel@zugschlus.de> writes:

On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent
breakage would be introduced that will take years to fix in all tooling
and all packages.

Yes. Fixing breakage makes software better, and by disallowing non-latin characters in user names we are hiding those issues away.

This is arrogant. Assuming that a username can be displayed, sorted,
compared and typed using strict us-ascii is not a bug today. It's not
"hiding" any issue.

The question is whether it makes sense to introduce a new class of bugs
by changing the rules. And we can pretty much guarantee that some of
those bugs are security critical, since this is all about authentication
and authorization.

Knowingly introducing security bugs does not sound like a good idea.

For what purpose?

Bjørn


From Gioele Barabucci@21:1/5 to nick black on Sun Nov 24 12:30:01 2024

On 24/11/24 10:43, nick black wrote:

Gioele Barabucci left as an exercise for the reader:

On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:

But my 2 cents on the topic are: Let's please allow more than ascii in
usernames.

potentially insecure (homographs) and at
high-risk of breaking existing applications (lack of standardized
normalization form).

i'm not sure why this is being repeated.

https://unicode.org/reports/tr15/

Dear Nick,

You may have misunderstood that phrase. I was not referring to the fact
that there are no standardized normalization forms for Unicode (I
explicitly mention Annex 15 in [1]), but to the fact that there is no
standard that specifies which of the possible normalization forms should
be used for account names (and other fields in passwd).

POSIX explicitly limits itself to a subset of ASCII, so it is not going
to mandate any normalization form. Are there other standards (or
initiatives) in this area that you know of?

Regards,

[1] https://lists.debian.org/debian-devel/2024/11/msg00305.html

--
Gioele Barabucci


From Iustin Pop@21:1/5 to All on Sun Nov 24 13:30:01 2024

On 2024-11-24 11:44:45, Bjørn Mork wrote:

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

But my 2 cents on the topic are: Let's please allow more than ascii in
usernames. I find it very uncomfortable every time I have to tell my
students that sorry, you somehow have to manage writing your name using
American letters because that's all we have after half a century of
Computers being a thing...

You are confusing usernames and names. Different concepts with
different rules. Let's just hope you never get two students with the
same name.

I wanted to reply to Johannes, but I didn't know exactly how to phrase
it - you did it perfectly.

I still don't understand the need for username to be very
representative of one's name. OTOH, my name can be fully written using
ASCII, so maybe I miss something. But I've also had to use accounts like abc745, which didn't bother me much over the duration of a semester or
year.

regards,
iustin


From Chris Hofstaedtler@21:1/5 to All on Sun Nov 24 14:40:01 2024

* Bjørn Mork <bjorn@mork.no> [241124 11:45]:

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

But my 2 cents on the topic are: Let's please allow more than ascii in
usernames. I find it very uncomfortable every time I have to tell my
students that sorry, you somehow have to manage writing your name using
American letters because that's all we have after half a century of
Computers being a thing...

You are confusing usernames and names. Different concepts with
different rules. Let's just hope you never get two students with the
same name.

I find your reply massively insulting, and I'm not even the original
author.

Usernames (not the "comment" field) are identifiers, and humans care
about the identifiers used for them.

Yes, some humans don't care if you assign them a random 32-byte
string as their username. Enough humans, however, do have
preferences. In some countries humans even have a right to choose
how they are being addressed.

Chris


From Iustin Pop@21:1/5 to Chris Hofstaedtler on Sun Nov 24 14:50:02 2024

On 2024-11-24 14:37:24, Chris Hofstaedtler wrote:

* Bjørn Mork <bjorn@mork.no> [241124 11:45]:

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
that sorry, you somehow have to manage writing your name using American letters
because that's all we have after half a century of Computers being a thing...

You are confusing usernames and names. Different concepts with
different rules. Let's just hope you never get two students with the
same name.

I find your reply massively insulting, and I'm not even the original
author.

Massively?

Usernames (not the "comment" field) are identifiers, and humans care
about the identifiers used for them.

Yes, some humans don't care if you assign them a random 32byte
string as their username. Enough humans however, do have
preferences. In some countries humans even have a right to choose
how they are being addressed.

And what relation does the username used for logging in have to "being addressed"? Isn't it akin to a passport/ID card number?

regards,
iustin


From Simon McVittie@21:1/5 to Iustin Pop on Sun Nov 24 15:30:01 2024

On Thu, 21 Nov 2024 at 23:26:48 +0100, Iustin Pop wrote:

As Richard also replied, full UTF-8 is tricky, and I think it's somewhat misplaced to focus on the username, as opposed to gecos. Aren't most
other OSes using the "full name" as the "display name", and the username
is mostly one part of the user/password combination, but not a display property most of the time?

So I would suggest that maybe the better option is to standardise the
gecos format/gecos parsing, so migrate UI tools to use that more often.

As a data point, in our default GNOME desktop, System Settings (gnome-control-center) prompts for a "Full Name" first (behind the
scenes that's the full name part of the pw_gecos field), and a "Username" second (this is the pw_name); and the default display mode for the
gdm3 login prompt is to show a list of full names from pw_gecos.

My understanding is that the full name already allows arbitrary UTF-8,
except for the characters that can't be represented in passwd(5) syntax
(colon, comma, newline) and the ampersand.

Outside the Linux/GNU/freedesktop worlds, this is fairly similar to how
macOS presents the distinction between the display name and the Unix
username (pw_name). macOS is interesting here because it's an operating
system with a lot of Unix ancestry, but has also had a lot of effort put
into making it friendly for non-technical users.

In the macOS world, it seems to be conventional and encouraged to set the username to a lower-case ASCII string with no punctuation, similar to the conventions in POSIX and <https://systemd.io/USER_NAMES/>.
Unfortunately I haven't been able to find a reference for what characters
macOS allows in pw_name. Perhaps a DD who has a macOS system (or a family member with a macOS system) could help here?

I think one good idea that we should certainly adopt from <https://systemd.io/USER_NAMES/> is its separation between "strict mode"
(the naming convention that it encourages for all uses, and enforces
when a user is created via systemd tools) and "relaxed mode" (the much
less strict naming convention that systemd requires for names created by non-systemd tools). Because of the differences between those two modes,
systemd is quite conservative in what its own tools will emit but a
lot more liberal in what it will accept, and that seems like a good
principle here, even if the specific rules that Debian chooses end up
differing from those that systemd has chosen.

smcv


From Chris Hofstaedtler@21:1/5 to All on Sun Nov 24 15:30:01 2024

* Iustin Pop <iustin@debian.org> [241124 14:41]:

On 2024-11-24 14:37:24, Chris Hofstaedtler wrote:

* Bjørn Mork <bjorn@mork.no> [241124 11:45]:

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> writes:

But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students
that sorry, you somehow have to manage writing your name using American letters
because that's all we have after half a century of Computers being a thing...

You are confusing usernames and names. Different concepts with
different rules. Let's just hope you never get two students with the same name.

I find your reply massively insulting, and I'm not even the original author.

Massively?

Yes.

Usernames (not the "comment" field) are identifiers, and humans care
about the identifiers used for them.

Yes, some humans don't care if you assign them a random 32byte
string as their username. Enough humans however, do have
preferences. In some countries humans even have a right to choose
how they are being addressed.

And what relation does the username used for logging in have to "being addressed"? Isn't it akin to a passport/ID card number?

No. I see and type my username hundreds of times a day, people use it
to address me in written and spoken conversations, etc.

If it were my uid, which I see maybe once a week and don't have to
remember, I wouldn't care.

Chris


From Giuseppe Sacco@21:1/5 to All on Sun Nov 24 15:40:02 2024

Hi all,

On Sun, 24/11/2024 at 13:20 +0100, Iustin Pop wrote:

[...]
I still don't understand the need for username to be very
representative of one's name. OTOH, my name can be fully written using
ASCII, so maybe I miss something. But I've also had to use accounts like
abc745, which didn't bother me much over the duration of a semester or
year.

It is true that user account name and user (display) name are
different, of course. But still, when you log in, you use the user
account name to access the system; this is the text shown in file
ownership listings and almost everywhere in the system.
I think that the user (display) name, which may be put in the gecos
field, is not widely used. Moreover, the adduser man page on Debian
stable states that gecos fields will be removed after bookworm.

So, having a good account user name is an important thing. And we have
to choose if it should be "good" for the computer (like in: unique,
lowercase, short, US-ASCII, etc.) or if it should be "good" for the
real user. In the latter case, I would accept a broader class of
strings for the very simple reason that it should be left to user
preference.

I checked what other systems do:

Windows[0] accepts any characters except " / \ [ ] : ; | = , + * ? < >,
and allows 64 characters (or bytes, I am unsure about this).

SunOS has these restrictions[1] "a string of no more than thirty-two
bytes consisting of characters from the set of alphabetic characters,
numeric characters, period (.), underscore (_), and hyphen (-). The
first character should be alphabetic and the field should contain at
least one lowercase alphabetic character"

In LDAP[2] the uid field is a "Directory String"[3], so any non zero
length UTF8 text. There is a note: Servers and clients MUST be prepared
to receive arbitrary UCS code points, including code points outside the
range of printable ASCII and code points not presently assigned to any character.

FreeBSD[4] suggests to "use user names that consist of eight or fewer,
all lower case characters in order to maintain backwards compatibility
with applications." But the real syntax[5] is: login name must not
begin with a hyphen (`-'), and cannot contain 8-bit characters, tabs or
spaces, or any of these symbols: `,:+&#%^()!@~*?<>=|\/";'. The dollar
symbol (`$') is allowed only as the last character for use with Samba.
No field may contain a colon (`:') as this has been used historically
to separate the fields in the user database.

IBM AIX has these rules[6]: must not begin with a hyphen (-), plus sign
(+), at sign (@), or tilde (~). Additionally, do not use any of the
following characters within a user-name string: :"#,=\/?'`
Finally, the login parameter cannot contain any space, tab, or newline characters.

On HP-UX, user names are restricted[8] to eight characters and group
names to 16 characters, but you may change the limits up to 254 characters.
In any case, a name must start with a letter.

Kerberos syntax for principal[9] is GeneralString constrained to
contain only characters in IA5String (so, basically US-ASCII 7 bits),
with this note: US-ASCII control characters should not be used.

So, I think any sequence of Unicode "printable" letters should be
allowed. It may be encoded in UTF-8 or another encoding, but I think
UTF-8 is the best encoding since it includes the US-ASCII 7-bit chars.
About the meaning of "printable": probably this means a few Unicode
categories[7] should be included: lowercase letter, uppercase letter,
decimal number, plus a few symbols (hyphen, period, plus, at sign, and
underscore at minimum).

Bye,
Giuseppe

[0]https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-2000-server/bb726984(v=technet.10)?redirectedfrom=MSDN
[1]https://docs.oracle.com/cd/E88353_01/html/E37852/passwd-5.html#REFMAN5passwd-5
[2]https://www.rfc-editor.org/rfc/rfc4519#section-2.39
[3]https://docs.ldap.com/specs/rfc4517.txt
[4]https://docs.freebsd.org/en/books/handbook/basics/#users-synopsis
[5]https://man.freebsd.org/cgi/man.cgi?query=passwd&sektion=5&format=html
[6]https://www.ibm.com/docs/en/aix/7.2?topic=u-useradd-command
[7]https://www.compart.com/en/unicode/category
[8]https://support.hpe.com/hpesc/public/docDisplay?docId=c01922594&docLocale=en_US
[9]https://www.rfc-editor.org/rfc/rfc4120#section-5.2.1


From Bálint Réczey@21:1/5 to All on Sun Nov 24 16:00:01 2024

Hi Johannes,

Johannes Schauer Marin Rodrigues <josch@mister-muffin.de> wrote
(on Sat, 23 Nov 2024, 9:32):

Quoting nick black (2024-11-23 08:48:10)

You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫i

xfce-terminal renders this like this: https://mister-muffin.de/p/4o2v.png

No idea if this is correct and I'll leave the details to those who know more about this topic than I. And maybe my email client completely messes this up in
this response of mine.

But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters
because that's all we have after half a century of Computers being a thing...

I have had students as well, many of them with accents in their names,
like myself, and I never had this kind of discomfort before.

If it ever occurs to me, I'll remind myself that even deeply
personal birthdays are shown in Arabic numerals instead of Roman ones,
which would look way cooler, and in base 10 instead of the base 60
that was widely used by the Sumerians.

If having this work in Debian can put a bit on pressure on those software projects that do not support this, then please let that happen so that missing
unicode support becomes more annoying for those pieces of software that are missing it. For example, if my email client messed this up, then lets fix it. We cannot find these kind of bugs if we accept translating everybody's given name to the American alphabet.

Please don't open this can of worms and impose pointless work on
upstreams. Keep what has worked reasonably well for decades.

Cheers,
Balint

PS: The mandatory relevant Monty Python sketch: https://www.youtube.com/watch?v=6cKsBe3on5g

Thanks!

cheers, josch


From Bjørn Mork@21:1/5 to Chris Hofstaedtler on Sun Nov 24 16:40:02 2024

Chris Hofstaedtler <zeha@debian.org> writes:

No. I see and type my username hundreds of times a day, people use it
to address me in written and spoken conversations, etc.

This is confusing the subject even more.

Are you sure you are talking about usernames? Or is this email local
parts, chat nicknames and spoken nicks? If so, then there is no reason
you can't use utf8. Today. Without changing any username.

It's also possible to modify $PS1 if seeing \u without utf8 is annoying.

Bjørn


From Simon McVittie@21:1/5 to Giuseppe Sacco on Sun Nov 24 17:40:02 2024

On Sun, 24 Nov 2024 at 15:37:36 +0100, Giuseppe Sacco wrote:

Moreover, adduser man page on Debian stable, states
that gecos fields will be removed after bookworm.

No, it says the --gecos *option* will be removed after bookworm,
replaced by --comment, which seems to be another name for the same thing: passwd(5) "user name or comment field" = struct passwd's pw_gecos,
as can be edited by chfn(1).

The field containing the user's full name, and a way to edit it, are
definitely something that should stay.

smcv


From Philipp Kern@21:1/5 to All on Sun Nov 24 18:30:01 2024

On Sun Nov 24, 2024 at 4:03 PM CET, Bjørn Mork wrote:

Chris Hofstaedtler <zeha@debian.org> writes:

No. I see and type my username hundreds of times a day, people use it
to address me in written and spoken conversations, etc.

This is confusing the subject even more.

Are you sure you are talking about usernames? Or is this email local
parts, chat nicknames and spoken nicks? If so, then there is no reason
you can't use utf8. Today. Without changing any username.

In many organizations the email local part matches the username[1] and it
is also used in spoken conversations. To the point where I needed to
make clear on internal yellow pages that I would prefer not to be called "pkern" in spoken conversation, thank you very much.

So yes, usernames are pretty much used in spoken conversation. Many do
not actually understand what a username is and think that it reflects
how someone wants to be called - as their default assumption.

Kind regards
Philipp Kern

PS: My personal, ignorant, Latin-world opinion is that it is probably
too hard for most people to type each others' usernames if UTF-8 were to
be allowed. And I would never ever use UTF-8 in a local part. And I
suffered a bit too much recently looking at differences between byte
count and character count.

[1] Referred to as "LDAP" in mine, which is both funny and sad.
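
The byte-count versus character-count difference mentioned in the PS is easy to reproduce. A minimal Python sketch, not taken from the thread; the helper name and the sample names are made up for illustration:

```python
# Compare the number of code points in a name with the number of bytes
# its UTF-8 encoding occupies. A 32-byte limit and a 32-character limit
# only coincide for pure US-ASCII names.

def name_lengths(name: str) -> tuple:
    """Return (code points, UTF-8 bytes) for a user name."""
    return len(name), len(name.encode("utf-8"))

# "\u00fc" is LATIN SMALL LETTER U WITH DIAERESIS,
# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE.
for sample in ("pkern", "j\u00fcrgen", "\u00e9mollier"):
    chars, octets = name_lengths(sample)
    print(f"{sample}: {chars} code points, {octets} bytes")
```

For the second sample the two numbers already differ (6 versus 7), which is why any length limit has to say which unit it counts.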


From Marc Haber@21:1/5 to All on Sun Nov 24 21:20:02 2024

On Sun, Nov 24, 2024 at 11:58:44AM +0100, Bjørn Mork wrote:

Marc Haber <mh+debian-devel@zugschlus.de> writes:

On the other hand, as long as this is admin-controlled, it doesn't
matter much. I could see that viewpoint, but I wonder how much latent
breakage would be introduced that will take years to fix in all tooling
and all packages.

Yes. Fixing breakage makes software better, and by disallowing non-latin characters in user names we are hiding those issues away.

This is arrogant.

That was not my intention. I apologize for that.

Assuming that a username can be displayed, sorted,
compared and typed using strict us-ascii is not a bug today. It's not "hiding" any issue.

I have to disagree. Our tools allow creating UTF-8 usernames today, and
even if they didn't, it would be possible to just edit /etc/passwd.

The question is whether it makes sense to introduce a new class of bugs
by changing the rules. And we can pretty much guarantee that some of
those bugs are security critical, since this is all about authentication
and authorization.

So we're having these bugs right now. If you can use adduser or useradd
to create such accounts, then you have the privilege of putting them
directly into /etc/passwd as well. /etc/passwd is a well-defined and
documented interface.

For what purpose?

Being friendly to people who can't properly write their names in latin.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


From Marc Haber@21:1/5 to Johannes Schauer Marin Rodrigues on Wed Nov 27 17:00:01 2024

On Sat, Nov 23, 2024 at 09:32:32AM +0100, Johannes Schauer Marin Rodrigues wrote:

But my 2 cents on the topic are: Lets please allow more than ascii in usernames. I find it very uncomfortable every time I have to tell my students that sorry, you somehow have to manage writing your name using American letters
because that's all we have after half a century of Computers being a thing...

In Debian stable, they can already try.

Greetings
Marc


From Marc Haber@21:1/5 to Gioele Barabucci on Wed Nov 27 17:00:01 2024

On Sat, Nov 23, 2024 at 12:53:52PM +0100, Gioele Barabucci wrote:

On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:

But my 2 cents on the topic are: Lets please allow more than ascii in usernames.

Yes please, but opt-in and behind a big red warning that says that it is not interoperable (outside POSIX),

adduser requires an option to allow such user names. I think that some
people might find it offensive that an option is required before their
native names are allowed. You're arguing not to relax the requirement for
plain adduser <username>, right?

potentially insecure (homographs) and at
high-risk of breaking existing applications (lack of standardized normalization form).

Can you outline an attack/failure scenario?
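
One scenario of the kind the quoted text alludes to with "homographs", sketched in Python purely as an illustration (it was not posted in the thread, and both names are hypothetical): two user names that look identical in most fonts but are different strings, so anything that compares them byte by byte treats them as different accounts. The script check is a crude heuristic based on character names, since the Python standard library does not expose the Unicode Script property.

```python
import unicodedata

latin = "admin"          # all Latin letters
mixed = "\u0430dmin"     # first letter is U+0430 CYRILLIC SMALL LETTER A

def scripts_hint(name: str) -> set:
    """Rough script hint: the first word of each character's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in name}

# The two names are distinct strings even though they render alike.
print(latin == mixed)

# Listing the code points makes the difference visible.
for ch in mixed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}")

# A name that mixes scripts can at least be flagged for a closer look.
print(scripts_hint(latin), scripts_hint(mixed))
```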

Greetings
Marc


From Marc Haber@21:1/5 to Philipp Kern on Wed Nov 27 17:00:01 2024

On Sun, Nov 24, 2024 at 06:06:23PM +0100, Philipp Kern wrote:

PS: My personal, ignorant, Latin-world opinion is that it is probably
too hard for most people to type each others' usernames if UTF-8 were to
be allowed.

Why would anybody need to type somebody else's user name except in
"su"? I see it as the exception that local parts of mail addresses
map 1:1 to a UNIX user name.

Greetings
Marc


From Marc Haber@21:1/5 to nick black on Wed Nov 27 16:50:01 2024

Hi nick,

On Sat, Nov 23, 2024 at 02:48:10AM -0500, nick black wrote:

Marc Haber left as an exercise for the reader:

(1)
Should Debian allow UTF-8 user names in the first place or should we restrict names for regular users to some us-ascii near set as well? (I think yes, we should)

I feel strongly yes, despite POSIX admonitions (quoted elsewhere
in this thread) and sure breakage any number of places.

Thank you, noticed.

I think
a test plan would be very desirable (off the top of my head,
we'd want to check login, the DMs, PAM, OpenSSH, passwd, w,
framebuffer console input, etc.). It would probably also be a good
idea to loop in other distributions.

Coordinating this test is way beyond what I have available in resources,
most notably time. Our tools have been allowing UTF-8 user names at
least since bookworm (I don't have any bullseye systems left, buster's
adduser does not allow UTF-8). So we are already testing this in a
stable release (albeit unplanned).

Please note that allowing UTF-8 user names by default does not break compatibility in any place where only 7bit user names are being used.
Debian is not using such user names in anything that we ship. We only
allow them.

Actually _doing_ this is still the local admin's decision. And should
they decide to not want this, adduser can be configured to disallow.

This thread is mainly about whether we should disallow things in next
stable that are possible in current stable. I think we need good reasons
for that, and I ain't seeing any right now.

I recommend Chapter 7 of my free book, "Hacking the Planet with
Notcurses: A Guide to TUIs and Character Semigraphics" for the
full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).

Noted for reading.

* any upstream tool could say "bad idea" and refuse patches,
requiring their long term management,

Depending on how important this tool is, we could get away without
patching and probably not even documenting this failure.

* the Linux framebuffer console is pretty limited in what
glyphs it has available, and the number of glyphs it can
support,

Probably, yes. But people working on the Linux framebuffer console are
unlikely to actually use UTF-8 user names, so the only really bad
situation would be a rescue situation. We could get away with
documenting "please use 7bit only user names for accounts that are
likely to be used in system rescue situations".

* you want installer support if you intend to do this right,

The installer currently allows me to type UTF-8 user names in the entry
fields (and even displays them correctly when one goes through the
dialogs a second time), but rejects them with a sanity-check error message
("The username you entered is invalid. Note that usernames must start
with a lower-case letter, which can be followed by any combination of
numbers and more lower-case letters, and must be no more than 32
characters long."). That message is incorrect: it should say "lower-case
us-ascii letters". From a German point of view, "jürgen" conforms to the
rules given in the error message.

* ubiquitous input for UTF-8 is a pretty complicated story, and

Sites using such letters in user names should know which of them can be
typed.

* broken localization (or failure to call setlocale()) could be
a bigger problem, especially for root/system accounts.

I don't think we should allow UTF-8 characters in the string "root" or in
system account names. And if a local admin decides to do so, Debian
packages should still restrict themselves to using US-ASCII in their
system accounts.

Other concerns:

You'll likely now be linking libunistring into some
binaries where it wasn't previously used.

Probably, yes. I hope to get away in adduser without that, since I'd
like to keep adduser's dependencies minimal (it's being used in the
installer).

Regarding the subset of Unicode characters you'd want to allow,
this would be best decided using the General Category trait.
Each codepoint is assigned one of a finite set of General
Categories. We would probably want to allow Letters, Marks, and
Numbers, and perhaps a whitelist from Punctuation and Symbols
(Punctuation, connector and Punctuation, dash are probably all
we'd want) extended from currently supported ispunct(3)
characters. This data is available from libunistring (and
probably other places). This eliminates a great swatch of known
security issues.

Do you have a suggestion for a perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".
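
This is not a Perl answer, but the General Category idea quoted above can be sketched in Python with the standard unicodedata module. The allowed categories (Letters, Marks, Numbers, plus connector and dash punctuation) follow the quoted suggestion; the function name and the rule that an empty name is rejected are assumptions made for this example, not adduser behaviour.

```python
import unicodedata

ALLOWED_MAJOR = {"L", "M", "N"}   # Letter, Mark, Number (any subcategory)
ALLOWED_EXACT = {"Pc", "Pd"}      # Punctuation, connector / Punctuation, dash

def category_ok(name: str) -> bool:
    """True if every code point falls into one of the allowed categories."""
    if not name:
        return False
    for ch in name:
        cat = unicodedata.category(ch)   # two-letter code such as "Ll" or "Pd"
        if cat[0] not in ALLOWED_MAJOR and cat not in ALLOWED_EXACT:
            return False
    return True

for candidate in ("j\u00fcrgen", "foo_bar-1", "foo bar", "a:b"):
    print(candidate, category_ok(candidate))
```

Note that a period is "Punctuation, other" and is therefore rejected by this sketch; characters like that would need the explicit whitelist the quoted text mentions.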

Names containing invalid UTF-8 sequences ought be rejected.

Agreed. How do I check for this in perl?
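
The Perl side is left open here. As an illustration of the check itself, a strict decode does the job in Python: decoding fails on malformed sequences, overlong encodings and encoded surrogates. The helper name is made up for the example.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if the byte string is well-formed UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

samples = {
    "UTF-8 encoded name": "j\u00fcrgen".encode("utf-8"),
    "Latin-1 encoded name": "j\u00fcrgen".encode("latin-1"),
    "overlong encoding": b"\xc0\xaf",
}
for label, raw in samples.items():
    print(label, is_valid_utf8(raw))
```

The important detail is that the check runs on the raw bytes as read from the command line or from /etc/passwd, before anything has decoded them leniently.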

Characters 0-127 would presumably be allowed iff they are now;
UTF-8 preserves US-ASCII.

I'd rather allow 32-127 only.

We ought support combining characters up through the Extended
Grapheme Cluster (a single user-perceived character, roughly a
glyph, made up of one or more encoded characters). Generally a
single backspace ought map to an entire EGC.

This is beyond my knowledge of Unicode.

Regarding canonicalization/normalization, this is a complex
question without a necessarily correct technical answer. I think
you'd want to follow the Principle of Least Astonishment; as to what
would astonish the least, I'd like to hear wider input. But
Unicode definitely defines multiple normal forms and equivalency
classes.

I am not sure whether we need this. A local admin is likely to be
consistent with herself in creating user names.
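
Whether it is needed is a judgment call; the underlying effect is simple to show. In this Python sketch (an illustration, not adduser code) the same visible name is built once with a precomposed character and once with a combining mark. The two strings compare unequal until both are normalized to the same form; NFC is used here only as an example choice, and the helper name is hypothetical.

```python
import unicodedata

composed = "\u00e9tienne"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301tienne"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(composed == decomposed)              # False: different code point sequences
print(len(composed), len(decomposed))      # the lengths differ as well
print(same_name(composed, decomposed))     # True once both are normalized
```

Which of the two sequences ends up in /etc/passwd depends on the input method that was used when the account was created, so the admin being consistent does not by itself guarantee identical bytes.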

You now have glyphs which occupy more than one column. Are your columnar/tabular programs prepared for that? ﷽𒁭𒐫

Probably not. If that's important for a local admin, they can disallow
such characters and maybe even file a patch against adduser.

Quoting the character just out of curiosity.

(2)
If the answer to (1) is "allow UTF-8", should we also do that for system users? (I think no, we should not)

I think you should, simply because otherwise you have two paths
in more places.

Adduser already has different code paths for normal and system accounts.

(3)
I think that 32 characters/bytes (it's the same if we don't allow UTF-8)
is a good limitation for a system user name. But, should we increase
that for regular user names? (I think yes)

I hesitate to comment here because who really cares, but does 32
save us something over 128? 128 seems the default "enough for
everybody" these days, looking at IPv6 and ZFS.

systemd argues that > 32 characters are rarely supported in "older and unmaintained" utilities.

My printer is administered by i̸̒n̴͛e̵̎l̴͝u̷̾c̴̉t̵́å̵b̷͋l̷͐e̴̋m̸̆o̷̚d̴̐ä̸́l̶͝i̷̋t̷͗ẏ̷ȏ̵f̸̃t̶͘h̷͗e̴̿v̶͘i̷̛s̸̈́ì̵b̷̃l̶̎e̷͊.

That really renders strangely here.

(6)
Does it still make sense to give non-UTF-8-locales special handling
(which one?), or can adduser safely assume that any non-ascii locale is UTF-8? Or must I check for locale and reject UTF-8 user names on
non-UTF-8 locales? (I hope that we can safely assume UTF-8)

It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
properly set LANG and programs calling setlocale(). This, as
alluded to above, has the potential for a big mess.

Our default is C.UTF-8 and has been like that for a while.
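
Whatever the default is, the codeset of the locale a tool actually runs under can be queried rather than assumed. A Python sketch of such a check, not taken from adduser: both function names are hypothetical, and locale.nl_langinfo is only available on Unix-like systems.

```python
import codecs
import locale

def codeset_is_utf8(codeset: str) -> bool:
    """True if a codeset name such as "UTF-8" or "utf8" denotes UTF-8."""
    try:
        return codecs.lookup(codeset).name == "utf-8"
    except LookupError:
        return False

def current_locale_is_utf8() -> bool:
    """Adopt the environment's LC_CTYPE and report whether it is UTF-8."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed: fall back to "C".
        locale.setlocale(locale.LC_CTYPE, "C")
    return codeset_is_utf8(locale.nl_langinfo(locale.CODESET))

if __name__ == "__main__":
    print(current_locale_is_utf8())
```

Without the setlocale() call a C program stays in the plain "C" locale regardless of LANG, which is the situation the quoted text warns about.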

Greetings
Marc


From Marc Haber@21:1/5 to Simon McVittie on Wed Nov 27 17:30:02 2024

On Sun, Nov 24, 2024 at 02:19:51PM +0000, Simon McVittie wrote:

I think one good idea that we should certainly adopt from <https://systemd.io/USER_NAMES/> is its separation between "strict mode"
(the naming convention that it encourages for all uses, and enforces
when a user is created via systemd tools) and "relaxed mode" (the much
less strict naming convention that systemd requires for names created by non-systemd tools). Because of the differences between those two modes, systemd is quite conservative in what its own tools will emit but a
lot more liberal in what it will accept, and that seems like a good
principle here, even if the specific rules that Debian chooses end up differing from those that systemd has chosen.

Yes. Especially we need to note that systemd strict mode is even
stricter than what we currently allow for system accounts. I also don't
like that this is not configurable, especially regarding systemd-homed
which affects the account names of regular users.

Greetings
Marc


From Marc Haber@21:1/5 to Giuseppe Sacco on Wed Nov 27 17:30:02 2024

Hi,

On Sun, Nov 24, 2024 at 03:37:36PM +0100, Giuseppe Sacco wrote:

It is true that user account name and user (display) name are
different, of course. But still, when you log in, you use the user
account name to access the system; this is the text shown in file
ownership listing and almost everywhere in the system.
I think that user (display) name, that may be put in gecos field, are
not widely used.

I think this differs between GUIs and DEs (which are more likely to use
the display name) and the console (where the user name is used).

Moreover, adduser man page on Debian stable, states
that gecos fields will be removed after bookworm.

That's a misunderstanding. We're just in the process of renaming the
--gecos option to --comment as per passwd(5) documentation. Sadly,
passwd(5) uses "login name" instead of "user name".

So, having a good account user name is an important thing. And we have
to chose if it should be "good" for the computer (like in: unique,
lowercase, short, US-ASCII, etc.) or if it should be "good" for the
real user. In the latter case, I would accept a broader class of
strings for the very simple reason that it should be left to user
preference.

I think that we should have reached a state where a properly UTF-8
encoded string should be a good compromise between "good for the
computer" and "good for the person". In Debian, we have a rather tightly controlled ecosystem and can take care that things don't break too
badly.

I checked what other systems do:

Thank you for this tedious work. I have incorporated that into https://wiki.debian.org/UserAccountsPhilosophy to preserve the
information.

Greetings
Marc


From Marc Haber@21:1/5 to Étienne Mollier on Wed Nov 27 17:40:02 2024

On Fri, Nov 22, 2024 at 08:42:10PM +0100, Étienne Mollier wrote:

Marc Haber, on 2024-11-22:

I might be naive here, but I don't have much experience with non-ascii
names since I have the privilege of being fluent in two languages that
use the latin alphabet.

I am not sure whether I am the intended audience here, because
my name is almost Ascii based. That being said, I happen to
have one weird enough latin based character as the first letter
in my first name, that it gives interesting results when thrown
toward random databases. Thus I do happen to have some thoughts
about this topic.

All opinions are important.

On the other side, wouldn't it be a courtesy to allow people having a
name that needs transcription to be written in latin to use their name
in the real alphabet that it is usually written in as a login name as
well? To make things worse, transcriptions are often ambiguous.

I would like to hear the opinion of people who would be affected by this change.

I tried to consider what it would take to have an émollier or an
Émollier login, and there is one little blocker : I may have to
login from environments or keyboards lacking the necessary i18n
and l10n capabilities to transcribe the 'e' acute, let alone the
uppercase 'e' acute.

Yes. Configuring all keyboards and input subsystems in the realm of this instance of the user database in a way that all users are able to log in
is the responsibility of the local admin.

For example, I hit this particular issue
when populating the Gecos field from the Debian installer
environment: if I choose a Qwerty US configuration but miss the
step to choose which Qwerty US internationalized variant I want
to use, then I don't get to type uppercase 'e' acute, but there
are many other situations unrelated to d-i or even Debian where
I run into that.

That issue would only affect users created from the Installer, and even
if you insist on having étienne as UID 1000, you could change to that
after installation. I tend to classify the inability to type the
intended user name on account creation a user error ;-)

I always create "zgadmin" in the installer, which is my user to ssh into
before sudoing to root if my regular account (which has a higher UID
for historical reasons) is unavailable. I wonder whether we should give
this advice in the documentation we are bound to write once we have
decided to officially allow UTF-8 login names.

For this practical reason, I tend to feel
better about keeping a full Ascii login name. I wouldn't feel
strongly if unicode support for login never happens.

It is already allowed. Only its support status is unclear.

I believe
however that the Gecos is the right place to store the properly
typed-in person name, because it is a "presented" name that
hasn't the technical coupling that the login name has, and I
would probably have stronger feelings if it were to not have
unicode support.

Console tools tend to ignore the gecos/comment name.

Greetings
Marc

From Marc Haber@21:1/5 to Gioele Barabucci on Wed Nov 27 17:40:02 2024

On Fri, Nov 22, 2024 at 10:01:24PM +0100, Gioele Barabucci wrote:

your case highlights another problem not mentioned in the original list posted by Marc: comparison (and normalization).

Some characters can be encoded in more than one way. For instance, "é" in "émollier" could be stored as "e with acute" U+00E9 (and encoded in UTF-8 as 0xc3 0xa9) or as "e, combined with an acute accent" U+0065 plus U+0301 (UTF-8: 0x65 0xcc 0x81).

That would be two distinct user names. Unless we have a widely available unicode library that can do this kind of normalization it is unlikely
that our system utilities can take care of that. I'd like to put that responsibility on to the person who / the system that actually creates
those user names.

If a keyboard input system provides the former
sequence of bytes, but the username is stored in the login infrastructure using the latter sequence of bytes, then a naive comparison will not find
the user "émollier" in the system.
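
A minimal Python sketch of the mismatch described above, using only the standard unicodedata module. The variable names are illustrative and are not taken from any login tool; the point is only that the two spellings differ byte for byte until both are normalized to the same form.

```python
import unicodedata

# "émollier" spelled with a precomposed é (U+00E9) and with
# e + combining acute accent (U+0065 U+0301).
precomposed = "\u00e9mollier"
decomposed = "e\u0301mollier"

# Both render alike but encode to different UTF-8 byte sequences.
print(precomposed.encode("utf-8"))  # begins with 0xc3 0xa9
print(decomposed.encode("utf-8"))   # begins with 0x65 0xcc 0x81

# A naive comparison (which is what a byte-wise lookup amounts to)
# treats them as two different names.
naive_match = precomposed == decomposed

# Normalizing both sides to the same form (NFC here) makes them equal.
normalized_match = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
print(naive_match, normalized_match)
```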

Currently adduser just takes the characters that come from the command
line and encodes them into the byte stream that goes to useradd and
library calls.

Greetings
Marc

From Marc Haber@21:1/5 to Peter Pentchev on Wed Nov 27 17:50:01 2024

On Sat, Nov 23, 2024 at 01:36:48AM +0200, Peter Pentchev wrote:

POSIX says "if you want your applications to be portable, do not use any funny characters in usernames":

But we are not writing applications, we are a distribution. Anything
that works with the software we distribute is fine.

A string that is used to identify a user; see also 3.407 User Database.
To be portable across systems conforming to POSIX.1-2024, the value is
composed of characters from the portable filename character set.

If a local admin wants their local user database (hence, /etc/passwd or
an LDAP directory) to work with non-Debian OSes, they need to take care
about which accounts they create. I don't think that we should restrict
local admins who don't need that kind of portability.

Greetings
Marc

From Andy Smith@21:1/5 to Marc Haber on Wed Nov 27 21:10:02 2024

Hi,

On Wed, Nov 27, 2024 at 04:54:39PM +0100, Marc Haber wrote:

Can you outline an attack/failure scenario?

On the failure side, I did a few tests and noticed that on Debian 12 if
I create a user with for example é in their username then I can log in
by SSH as long as that é is encoded the same way: as utf-8 0xC3 0xA9.
But if that é is made of the combining characters 0x65 0xCC 0x81 (as
that one just was) then that's not the same user even if it looks the
same.

Upon login, the logs from sshd contain the escaped bytes but the logs from PAM and systemd-logind are in utf-8:

2024-11-23T00:35:37.743827+00:00 arran sshd[1903006]: Accepted password for h\303\251llo from 200:d0e9:8d97:72fe:69af:eb63:7e9e:1f07 port 37396 ssh2
2024-11-23T00:35:37.744825+00:00 arran sshd[1903006]: pam_unix(sshd:session): session opened for user héllo(uid=1001) by (uid=0)

So, anything which parses usernames out of logs will need to be aware of
that.
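
As an illustration of what such a log parser has to do, here is a small Python sketch that turns the backslash-octal form seen in the sshd line back into text. It assumes only the `\ooo` escape form shown in the log excerpt above; the function name is hypothetical, and how sshd escapes other bytes (for example a literal backslash) is not covered here.

```python
import re

def decode_octal_escapes(field: str) -> str:
    """Turn backslash-octal escapes such as \\303\\251 back into UTF-8 text."""
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # one escaped byte, \000 to \377
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> single byte
        field.encode("utf-8"),
    )
    # Bytes that do not form valid UTF-8 are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")

sshd_name = r"h\303\251llo"   # as logged by sshd
pam_name = "h\u00e9llo"       # as logged by pam_unix
print(decode_octal_escapes(sshd_name) == pam_name)
```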

Thanks,
Andy

From Michal Politowski@21:1/5 to All on Thu Nov 28 12:40:01 2024

Dnia Sun, 24 Nov 2024 11:22:18 +0000, Gioele Barabucci napisał(a):

On 24/11/24 10:43, nick black wrote:

Gioele Barabucci left as an exercise for the reader:

On 23/11/24 09:32, Johannes Schauer Marin Rodrigues wrote:

But my 2 cents on the topic are: Lets please allow more than ascii in usernames.

potentially insecure (homographs) and at
high-risk of breaking existing applications (lack of standardized normalization form).

i'm not sure why this is being repeated.

https://unicode.org/reports/tr15/

Dear Nick,

You may have misunderstood that phrase. I was not referring to the fact that there are no standardized normalization forms for Unicode (I explicitly mention Annex 15 in [1]), but to the fact that there is no standard that specifies which of the possible normalization forms should be used for account names (and other fields in passwd).

POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives)
in this area that you know of?

What about RFC 8265?
"Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
https://datatracker.ietf.org/doc/html/rfc8265

Regards,

[1] https://lists.debian.org/debian-devel/2024/11/msg00305.html

--
Michał Politowski

From Gioele Barabucci@21:1/5 to Michal Politowski on Sun Dec 1 23:30:01 2024

On 28/11/24 11:28, Michal Politowski wrote:

POSIX explicitly limits itself to a subset of ASCII, so it is not going to mandate any normalization form. Are there other standards (or initiatives) in this area that you know of?

What about RFC 8265?
"Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords"
https://datatracker.ietf.org/doc/html/rfc8265

Thank you Michal for the pointer.

RFC 8265 (and the associated RFC 8264 "PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols") looks like exactly what all login-related programs should
implement in order to avoid the kind of errors described in <https://lists.debian.org/debian-devel/2024/11/msg00491.html>.

But a cursory search shows that none of the current upstreams support
(or mention) PRECIS. (It also shows that src:precis is a Java library
squatting a bit on that package name... :))

Regards,

--
Gioele Barabucci

From Marc Haber@21:1/5 to G. Branden Robinson on Mon Dec 2 09:00:01 2024

On Sun, Dec 01, 2024 at 09:16:03PM -0600, G. Branden Robinson wrote:

These things are ugly, which is why I suppose they haven't caught on
despite being around for decades, but I would guess that this problem
space is such that there are no non-ugly solutions apart from "just
stick to ASCII", which some people find ugly in a different way.

The issue is that we didn't stick to ASCII. You CAN use UTF-8 in user
names and it works.

Apologies if I missed someone bringing up and rejecting Punycode in the previous ~41 messages in this thread.

No one did. It doesn't make sense anyway (and I would not implement this
in adduser), because we HAVE UTF-8 and it works. So the alternatives
are really

(1) Stick with the current way, having UTF-8 work but keeping it
undocumented, hurling any breakage on the user
(2) Document UTF-8 as working and consider breakage a bug
(3) Forbid UTF-8

Greetings
Marc

From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:30:01 2024

On Mon, Dec 02, 2024 at 01:35:05AM -0500, nick black wrote:

WTF-8 extends UTF-8 to handle
invalid UTF-16 input.

WTF-8 is a seriously defined encoding? I have only experienced that name
as a mocking name for a UTF-8 string that erroneously went through UTF-8 encoding a second time (double-UTF-8).
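
For readers unfamiliar with the informal "double UTF-8" meaning mentioned here (as opposed to the WTF-8 encoding nick refers to), a short Python sketch of how it typically arises. The assumption in this sketch is that the already encoded bytes are misread as Latin-1 before being encoded again; other legacy encodings produce similar garbage.

```python
original = "\u00e9"  # é

# Correct, single UTF-8 encoding: two bytes, 0xc3 0xa9.
once = original.encode("utf-8")

# The mistake: treat those bytes as Latin-1 text and encode to UTF-8 again.
twice = once.decode("latin-1").encode("utf-8")

print(once, twice)
print(twice.decode("utf-8"))  # what a UTF-8 terminal then displays

# Undoing the mistake step by step recovers the original text.
repaired = twice.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == original)
```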

Greetings
Marc

From Marc Haber@21:1/5 to nick black on Mon Dec 2 09:50:01 2024

On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:

Marc Haber left as an exercise for the reader:

* any upstream tool could say "bad idea" and refuse patches,
requiring their long term management,

Depending on how important this tool is, we could get away without
patching and probably not even documenting this failure.

This kind of attitude seems self-defeating. Despite being
*strongly* in favor of this effort, I would oppose it if it were
strictly a Debian thing. We can inspire the move, but going it
alone seems a recipe for present and future pain (think SSHing
from/to Debian and a non-Debian machine).

I bet that other distributions will also allow me to useradd a UTF-8
name today. I don't think that we have patched useradd to allow this.

* the Linux framebuffer console is pretty limited in what
glyphs it has available, and the number of glyphs it can
support,

Probably, yes. But people working on the Linux framebuffer console are unlikely to actually use UTF-8 user names, so the only really bad

With all due respect, this seems totally unsupported by anything
other than vibes =].

So you think that we should be stricter than we are today?

* broken localization (or failure to call setlocale()) could be
a bigger problem, especially for root/system accounts.

I don't think we should allow UTF-8 characters in the string "root" or in system account names. And if a local admin decides to do so, Debian packages should still restrict themselves to using US-ASCII in their
system accounts.

Why? This would require multiple code paths for what seems to me a
very questionable objective. You point out later in your
response that there already exist diverging codepaths, but isn't
unifying such things always a goal?

I think that the distinction between system users and regular users is a
good thing and that we should continue treating them differently.
Strictly, it's only adduser (and useradd, UID only) having different
code paths, the treatment in other software is identical.

Even if we unify things (either by allowing strange characters in system
user names, or by restricting regular user names to the western
character set), adduser will need to keep the distinction because we
assign UIDs from different ranges.

Do you have a suggestion for a perl regexp that allows this? My current development directory has "qr/[\p{Graph}*\.\${}><%'@]+/".

I do not. This is not a regex problem in my mind and experience;
you need full access to complicated libraries.

Adduser will have to stick to regexes for dependency reasons.

Any such effort
should go through Annex 15 canonicalization before being
inspected at all.

I have always assumed that canonicalization would be used for sorting
and equality, while in the databases it is important to keep the
difference between the unit Angstrom and the capital letter A with
circle. If we canonicalize everything, why do we have different
codepoints for different semantics?

Yes, I need to read your book.

At that point, you're well past regular
languages so far as I can tell. I do not see this goal as
possible with small surgeries on the adduser code base, but
rather something that requires work across the chain.

So, "not for Trixie". And what would we do in Trixie? I think we need
something that a single person can implement in spare time before
Christmas. This is a rather limited amount of time that we have.

It cannot. "C" is not UTF-8. Assumption of UTF-8 requires a
properly set LANG and programs calling setlocale(). This, as
alluded to above, has the potential for a big mess.

Our default is C.UTF-8 and has been like that for a while.

Yes, but that can be changed.

By the local admin? Yes. That's why we (Linux distributions) should
stick to us-ascii user names for the accounts that are created by our
packages. If a local admin creates UTF-8 user names but gives them a
non-UTF-8 locale then it's their fault, and if a user with a UTF-8 user
name selects a non-UTF-8 locale it's deliberate sabotage. I don't think
we should care about that, and it's already possible today.

With all due respect, I admire your gung ho candoit spirit, but
adduser alone is not IMHO the place. This is a major change
requiring support from libraries, applications, and UI to do
right, and thus wide buyin. I love the idea, but it's not going
to happen with a few Perl regexes. Please don't read this as
commentary on you or your code.

So your recommendation is to disallow things that we have allowed until recently, and maybe remove configurability to REALLY disallow it?

Greetings
Marc

From Michal Politowski@21:1/5 to All on Mon Dec 2 11:40:01 2024

Dnia Sun, 1 Dec 2024 23:27:09 +0100, Gioele Barabucci napisał(a):
[...]

But a cursory search shows that none of the current upstreams support (or mention) PRECIS. (It also shows that src:precis is a Java library squatting
a bit on that package name... :))

But at least it is an implementation of this PRECIS :)
There is also python3-precis-i18n in the archive.

--
Michał Politowski

From Chris Hofstaedtler@21:1/5 to All on Mon Dec 2 16:30:02 2024

* Marc Haber <mh+debian-devel@zugschlus.de> [241202 09:43]:

On Sun, Dec 01, 2024 at 06:55:09PM -0500, nick black wrote:

Marc Haber left as an exercise for the reader:

* any upstream tool could say "bad idea" and refuse patches,
requiring their long term management,

Depending on how important this tool is, we could get away without patching and probably not even documenting this failure.

This kind of attitude seems self-defeating. Despite being
*strongly* in favor of this effort, I would oppose it if it were
strictly a Debian thing. We can inspire the move, but going it
alone seems a recipe for present and future pain (think SSHing
from/to Debian and a non-Debian machine).

I bet that other distributions will also allow me to useradd a UTF-8
name today. I don't think that we have patched useradd to allow this.

We did. Debian carries (since "forever") a patch in useradd to turn
off most name checking. (Trying to) remove this patch is what
started this all.

Observe:

[root@cc65635fbf00 /]# cat /etc/os-release
NAME="Fedora Linux"
VERSION="40 (Container Image)"
...
[root@cc65635fbf00 /]# useradd för
useradd: invalid user name 'för': use --badname to ignore

Not sure if mjt brought it up yet, but the sendmail interface will
also need some solution for utf8 usernames (=email address local
parts). However, it seems some sendmail implementations already
cannot cope with utf8 gecos fields.

Chris

From Marc Haber@21:1/5 to All on Tue Dec 3 17:30:01 2024

Hi,

thank you all for your contributions to this discussion. I have now
finally understood¹ that it is not enough to try creating a UTF-8
encoded user name and see that it correctly shows up in /etc/passwd to
declare UTF-8 support. Please forgive me for not replying to all of you
in this thread individually, I have read everything and if I didn't cater
for your arguments in this message please feel free to remind me.

https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly
outlines that homograph characters (such as é, UTF-8 0xC3 0xA9, and the lookalike é, UTF-8 0x65 0xCC 0x81) are not only a nuisance. At the least,
adduser should reject creating étienne if étienne already exists - those
are different user names but look the same, and unless you
cut-and-paste user names instead of typing them you're bound to hit the
wrong user depending on HOW you type and what input medium you use. Not
good.

https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

After understanding this, I must admit that what's currently left active
on the adduser team (me) doesn't have the capacity to implement this
properly and in time for trixie. To make things worse, the
Unicode::Precis module, which should be in Debian as
libunicode-precis-perl (but isn't) hasn't seen an upstream release in
more than five years.

Additionally, I don't see myself in the situation of writing a proper
checker for the RFC 8264 IdentifierClass (Chapter 4.2) at the moment
since I don't have the time to check out which \p{Foo} character classes
match the classes given in the RFC.

I would appreciate volunteers to help here, but first I need to bring
some sense in adduser's current state of affairs to make an unstable
upload that can eventually migrate to testing.

What I intend to do in adduser for the next unstable upload is:

- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have
- adduser --allow-all-names will just verbatim pass all user names to
useradd.

All this will be documented in the man page, in README.Debian and/or the
Wiki after the code passes the test suite again.

I'll probably deprecate --allow-bad-names in favor of something that
doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.

I would love to hear your opinion. Silence is agreement ;-)

Greetings
Marc

¹ RFC 8264, RFC 8265, and Unicode TR 15 linked in this thread were
educational for me

From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 18:00:02 2024

On Tue, Dec 03, 2024 at 05:46:00PM +0100, Gioele Barabucci wrote:

On 03/12/24 17:20, Marc Haber wrote:

What I intend to do in adduser for the next unstable upload is:

- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have

Dear Marc,

in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

RFC 8264 5.2.4 Normalization Rule states:

In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.

that would solve the étienne and étienne issue (where the two characters
are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

While this seems the right thing to do, I think this should be done in
useradd (pkg:shadow), in the respective upstream project, so that all
Linux distributions get the same behavior.

I have filed https://github.com/shadow-maint/shadow/issues/1138 in the
general regard. Feel free to add what I forgot to mention there.

I'd rather not have this can of worms in adduser, but I'd consider a
patch.

Greetings
Marc

From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 17:50:01 2024

On 03/12/24 17:20, Marc Haber wrote:

What I intend to do in adduser for the next unstable upload is:

- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have

Dear Marc,

in preparation for a PRECIS future, couldn't adduser pass the usernames
through NFC instead of doing no normalization?

RFC 8264 5.2.4 Normalization Rule states:

In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.

[1] https://www.rfc-editor.org/rfc/rfc8264.html#section-5.2.4

Regards,

--
Gioele Barabucci

From =?utf-8?Q?=C3=89tienne?= Mollier@21:1/5 to All on Tue Dec 3 20:50:01 2024

Hi Marc,

Marc Haber, on 2024-12-03:

thank you all for your contributions to this discussion. I have now
finally understood¹ that it is not enough to try creating a UTF-8
encoded user name and see that it correctly shows up in /etc/passwd to declare UTF-8 support. Please forgive me for not replying to all of you
in this thread individually, I have read everything and if I didn't cater
for your arguments in this message please feel free to remind me.

Thank you for having taken the time to investigate this issue,
as a person concerned, I much appreciated it. Let's see whether
I can contribute one last useful item.

I'll probably deprecate --allow-bad-names in favor of something that
doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.

The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?

Have a nice day, :)
--
.''`. Étienne Mollier <emollier@debian.org>
: :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
`. `' sent from /dev/pts/5, please excuse my verbosity
`- on air: DGM - Solitude

From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 21:40:01 2024

On 03/12/24 17:59, Marc Haber wrote:

in preparation for a PRECIS future, couldn't adduser pass the usernames
through NFC instead of doing no normalization?

RFC 8264 5.2.4 Normalization Rule states:

In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.

that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

NFC would solve both of these "problems":

* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

What NFC alone will not solve are homograph collisions: a (U+0061 Latin
small letter a) and а (U+0430 Cyrillic small letter a) are
NFC-normalized to different codepoints.

But these are two different scenarios: the former problem may (and does)
arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce
a different é), the latter is an attack. The former is an
interoperability issue, the latter is a security issue.
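
The distinction between the two scenarios can be checked with a short Python sketch based on the standard unicodedata module. The helper name nfc is an assumption of this sketch. Note that Python reports the NFC form of U+2126 (Ohm sign) as U+03A9, the Greek capital letter omega.

```python
import unicodedata

def nfc(s: str) -> str:
    """Return s in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)

# Interoperability case: canonically equivalent input collapses to one form.
e_acute_equal = nfc("e\u0301") == nfc("\u00e9")   # e + combining acute vs é
ohm_equal = nfc("\u2126") == nfc("\u03a9")        # Ohm sign vs Greek capital omega

# Security case: homographs from different scripts stay distinct under NFC.
latin_a = "\u0061"      # Latin small letter a
cyrillic_a = "\u0430"   # Cyrillic small letter a
homograph_equal = nfc(latin_a) == nfc(cyrillic_a)

print(e_acute_equal, ohm_equal, homograph_equal)
```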

While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
Linux distributions get the same behavior.

That's probably the best approach.

Thanks for taking the time to delve into this issue,

--
Gioele Barabucci

From Marc Haber@21:1/5 to All on Tue Dec 3 22:10:02 2024

On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:

Marc Haber, on 2024-12-03:

I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.

The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?

Extended sounds good, maybe even "unicode"? or "international"?

Greetings
Marc

From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:10:02 2024

On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:

On 03/12/24 17:59, Marc Haber wrote:

in preparation for a PRECIS future, couldn't adduser pass the usernames through NFC instead of doing no normalization?

RFC 8264 5.2.4 Normalization Rule states:

In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.

that would solve the étienne and étienne issue (where the two characters are just different renderings of the same character), but not the Ohm-against-Omega issue, right?

NFC would solve both of these "problems":

* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

Converting Ohm into an Omega is losing intended information, isn't it?

Thanks for taking the time to delve into this issue,

I have learned many things.

Greetings
Marc

From =?utf-8?Q?=C3=89tienne?= Mollier@21:1/5 to All on Tue Dec 3 22:30:01 2024

Marc Haber, on 2024-12-03:

On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:

The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?

Extended sounds good, maybe even "unicode"? or "international"?

I avoided unicode as it would include ascii and the safe subset
documented by posix, and I also considered the unlikely case
where something were to replace unicode. "international" would
make the name technology agnostic, but there is still the case
about also covering the posix-safe subset… Borrowing the idea
from the other branch of the thread, --allow-unsafe-characters
sounds fine and would carry the idea that certain characters
could cause issues, if used in a login name.

Have a nice day, :)
--
.''`. Étienne Mollier <emollier@debian.org>
: :' : pgp: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
`. `' sent from /dev/pts/1, please excuse my verbosity
`- on air: Atlas - Hemifran

From Marc Haber@21:1/5 to Gioele Barabucci on Tue Dec 3 22:50:02 2024

On Tue, Dec 03, 2024 at 10:18:46PM +0100, Gioele Barabucci wrote:

Normalization is always lossy, at least in principle.

Applications that employ normalization accept that tradeoff in order to gain something valuable: in this case the ability to have a Ohm sign codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

I don't know what's exactly in the standard, but my gut feeling says
that I would probably store _exactly_ what was received, but normalize
both sides before duplicate checking, sorting, comparing.

If we'd normalize things away in storage, why do we have homographs in
the first place? Why would I replace a Cyrillic a with a Latin a,
destroying the idea of a "script"?
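
The "store exactly what was received, normalize only for comparison" idea above can be sketched roughly as follows. This is a minimal illustration, not adduser code: the `UserRegistry` class and the `comparison_key` helper are hypothetical names, and NFC is assumed as the comparison form.

```python
import unicodedata


def comparison_key(name):
    """Return the NFC form used only for comparing names, never for storage."""
    return unicodedata.normalize("NFC", name)


class UserRegistry:
    """Hypothetical in-memory stand-in for the account database."""

    def __init__(self):
        # Maps comparison key -> name exactly as it was received.
        self._by_key = {}

    def add(self, name):
        key = comparison_key(name)
        if key in self._by_key:
            raise ValueError(
                f"{name!r} collides with existing {self._by_key[key]!r}"
            )
        self._by_key[key] = name  # stored verbatim, not normalized

    def stored_names(self):
        # Sorting also goes through the normalized key.
        return sorted(self._by_key.values(), key=comparison_key)


registry = UserRegistry()
registry.add("e\u0301tienne")        # decomposed: e + combining acute accent
try:
    registry.add("\u00e9tienne")     # precomposed é: the same name after NFC
except ValueError as err:
    print("rejected:", err)
```

The stored entry keeps its original code points, so nothing is lost in storage, while the duplicate check still catches the second spelling.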

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gioele Barabucci@21:1/5 to Marc Haber on Tue Dec 3 22:20:01 2024

On 03/12/24 22:02, Marc Haber wrote:

On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:

On 03/12/24 17:59, Marc Haber wrote:

in preparation for a PRECIS future, couldn't adduser pass the usernames
through NFC instead of doing no normalization?

RFC 8264 5.2.4 Normalization Rule states:

In accordance with [RFC5198], Normalization Form C (NFC) is
RECOMMENDED.

that would solve the étienne and étienne issue (where the two characters
are just different renderings of the same character), but not the
Ohm-against-Omega issue, right?

NFC would solve both of these "problems":

* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
(omega).
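
The two NFC claims above can be checked with Python's standard `unicodedata` module. This is only a small verification sketch; the helper name `nfc_codepoints` is made up for the illustration. Note that the Greek capital letter omega is U+03A9.

```python
import unicodedata

pairs = [
    ("\u00e9", "e\u0301"),   # precomposed é vs. e + combining acute accent
    ("\u2126", "\u03a9"),    # Ohm sign vs. Greek capital letter omega
]


def nfc_codepoints(text):
    """Return the code points of the NFC form of text as U+XXXX strings."""
    normalized = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in normalized]


for left, right in pairs:
    # Both members of each pair end up as the same single code point.
    print(nfc_codepoints(left), nfc_codepoints(right))
```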

Converting Ohm into an Omega is losing intended information, isn't it?

Normalization is always lossy, at least in principle.

Applications that employ normalization accept that tradeoff in order to
gain something valuable: in this case the ability to have a Ohm sign
codepoint as part of your username is traded for the ability to compare usernames across different OSes and applications.

Regards,

--
Gioele Barabucci

From Soren Stoutner@21:1/5 to All on Tue Dec 3 15:15:52 2024

I appreciate your being careful and deliberate about this instead of rushing into a solution that brings unintended consequences. But I also appreciate your taking the time to engage with the issue instead of just ignoring it.

On Tuesday, December 3, 2024 9:20:53 AM MST Marc Haber wrote:

Hi,

thank you all for your contributions to this discussion. I have now
finally understood¹ that it is not enough to try creating a UTF-8
encoded user name and see that it correctly shows up in /etc/passwd to
declare UTF-8 support. Please forgive me for not replying to all of you
in this thread individually, I have read everything and if I didn't cater
for your arguments in this message please feel free to remind me.

https://lists.debian.org/debian-devel/2024/11/msg00491.html correctly
outlines that homograph characters (such as é, UTF-8 0xC3 0xA9, and the
lookalike é, 0x65 0xCC 0x81) are not only a nuisance. At the least,
adduser should reject creating étienne if étienne already exists - those
are different user names but look the same, and unless you
cut-and-paste user names instead of typing them you're bound to hit the
wrong user depending on HOW you type and what input medium you use. Not
good.
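
The byte sequences quoted above can be reproduced in a few lines of Python. This is just an illustration of why the two spellings are distinct entries as far as /etc/passwd is concerned; the helper name `utf8_hex` is invented for the example.

```python
precomposed = "\u00e9tienne"   # é as the single code point U+00E9
decomposed = "e\u0301tienne"   # e followed by U+0301 combining acute accent


def utf8_hex(text):
    """Format the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"0x{b:02X}" for b in text.encode("utf-8"))


print(utf8_hex(precomposed[:1]))    # 0xC3 0xA9
print(utf8_hex(decomposed[:2]))     # 0x65 0xCC 0x81
print(precomposed == decomposed)    # False: a plain string lookup misses it
```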

https://wiki.debian.org/UserAccounts and https://wiki.debian.org/UserAccountsPhilosophy are updated accordingly.

After understanding this, I must admit that what's currently left active
on the adduser team (me) doesn't have the capacity to implement this
properly and in time for trixie. To make things worse, the
Unicode::Precis module, which should be in Debian as
libunicode-precis-perl (but isn't) hasn't seen an upstream release in
more than five years.

Additionally, I don't see myself in the situation of writing a proper
checker for the RFC 8264 IdentifierClass (Chapter 4.2) at the moment
since I don't have the time to check out which \p{Foo} character classes match the classes given in the RFC.
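
To give a feel for what such a checker involves, here is a deliberately incomplete Python sketch that uses general categories in place of Perl's \p{...} classes. It assumes that the RFC's LetterDigits rule covers the categories Ll, Lu, Lo, Nd, Lm, Mn and Mc, that printable ASCII is valid, and that the HasCompat rule means "NFKC changes the code point". The RFC's Exceptions, BackwardCompatible, JoinControl and OldHangulJamo rules are left out entirely, so this is not a conformant IdentifierClass implementation. The function name is hypothetical.

```python
import unicodedata

# Assumed to match the LetterDigits category list of RFC 8264.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def roughly_identifier_class(name):
    """Approximate IdentifierClass check; see the caveats in the lead-in."""
    if not name:
        return False
    for ch in name:
        if "\u0021" <= ch <= "\u007e":
            continue  # printable ASCII is accepted as-is
        if unicodedata.category(ch) not in LETTER_DIGIT_CATEGORIES:
            return False  # spaces, controls, format characters, symbols...
        if unicodedata.normalize("NFKC", ch) != ch:
            return False  # code point has a compatibility equivalent
    return True


print(roughly_identifier_class("\u00e9tienne"))   # letters only
print(roughly_identifier_class("a\u200bb"))       # contains a zero-width space
print(roughly_identifier_class("p\u0430ul"))      # Latin mixed with Cyrillic а
```

The last call is accepted by this filter, which shows that a per-code-point class check alone does nothing about cross-script homographs.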

I would appreciate volunteers to help here, but first I need to bring
some sense into adduser's current state of affairs to make an unstable
upload that can eventually migrate to testing.

What I intend to do in adduser for the next unstable upload is:

- adduser --system's user name validation will not change
- I'll make sure that adduser <normal user account> doesn't accept
UTF-8 user names, bringing it closer to systemd's notion of a valid
user name
- adduser --allow-bad-names will still allow UTF-8 usernames, not doing
normalization. I will document this and make it clear that the local
admin needs to make sure that they don't allow things they don't want
to have
- adduser --allow-all-names will just verbatim pass all user names to
useradd.

All this will be documented in the man page, in README.Debian and/or the
Wiki after the code passes the test suite again.

I'll probably deprecate --allow-bad-names in favor of something that
doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in
the Red Hat World uses --badname to allow such names as well.

I would love to hear your opinion. Silence is agreement ;-)

Greetings
Marc

¹ RFC 8264, RFC 8265, and Unicode TR 15 linked in this thread were
educational for me

--
Soren Stoutner
soren@debian.org

From Alejandro Colomar@21:1/5 to All on Thu Dec 5 14:34:21 2024

Copy: debian-devel@lists.debian.org

Hi Marc,

Homograph attacks would be best mitigated in software reading
/etc/passwd, alerting in their output or logs that the user name they
just printed was composed of strange alphabets.

Software that reads /etc/passwd or /etc/shadow is quite sensitive, and
should therefore be as simple as possible. More code, more bugs.

The best mitigation for those attacks is to ban the names altogether.
IMO, setuid programs should not accept Unicode.

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es/>

From Alejandro Colomar@21:1/5 to Marc on Thu Dec 5 15:53:36 2024

Copy: debian-devel@lists.debian.org

Marc wrote:

On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:

Marc Haber, on 2024-12-03:

I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?

Extended sounds good, maybe even "unicode"? or "international"?

I prefer "bad". It gives the implicit message that it's bad to use that
flag. If you find it offensive, then how about --allow-unsafe-names?

I oppose "unicode", "extended", or "international", as all of them
remove the connotation that you should not use that flag.

Anyway, I vote for removing the possibility of using unsafe names, and
not even exposing a flag.

Have a lovely day!
Alex

--
<https://www.alejandro-colomar.es/>

From Stephan Seitz@21:1/5 to All on Thu Dec 5 16:40:02 2024

On Thu, Dec 05, 2024 at 14:34:21 +0100, Alejandro Colomar wrote:

The best mitigation for those attacks is to ban the names altogether.
IMO, setuid programs should not accept Unicode.

Today, not many people want to live in the past and accept plain ASCII
if their name needs a bigger character set.

Stephan

--
| If your life was a horse, you'd have to shoot it. |

From Marc Haber@21:1/5 to All on Thu Dec 5 17:10:01 2024

On Thu, 5 Dec 2024 14:34:21 +0100, Alejandro Colomar <alx@kernel.org>
wrote:

The best mitigation for those attacks is to ban the names altogether.
IMO, setuid programs should not accept Unicode.

Oh, Bugs by Code. Dangerous. We should stop producing code completely.
No code, no bugs.

Neither adduser nor useradd are setuid.

--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mail address in header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

From Stephan Seitz@21:1/5 to All on Thu Dec 5 17:20:01 2024

On Thu, Dec 05, 2024 at 17:05:29 +0100, Marc Haber wrote:

Neither adduser nor useradd are setuid.

To be fair, passwd is setuid. And I'm sure you are using it to set the
password. So it has to survive a Unicode user name.

Stephan

--
| If your life was a horse, you'd have to shoot it. |

From Marc Haber@21:1/5 to nick black on Thu Dec 5 18:10:01 2024

On Sat, Nov 23, 2024 at 02:48:10AM -0500, nick black wrote:

I recommend Chapter 7 of my free book, "Hacking the Planet with
Notcurses: A Guide to TUIs and Character Semigraphics" for the
full story (as I understand it) regarding Unicode presentation: https://nick-black.com/htp-notcurses.pdf (starts on page 41).

Thank you very much for providing this. The chapter has educated me.
"The vast minimum of things you should know about Unicode."

The time to read it was well spent.

Greetings
Marc

P.S.: Sadly, this has gotten less than positive coverage on LWN. I
apologize for the harm this discussion has done.

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mail address in header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421

From Ben Kallus@21:1/5 to All on Sun Dec 8 21:40:01 2024

Hi everyone!

I second calling it "allow-unsafe-names" for the following reasons:

1. Many programs assume that usernames are so inert that they can be
used in shell strings without proper escaping. For example, a user
named $(touch /tmp/pwn) will create /tmp/pwn upon the first launch of
an interactive bash, because the default bash PS1 interpolates the
username before doing command substitution. adduser doesn't allow
whitespace or forward slashes in usernames, even with
--allow-all-names, but you can still get the same behavior with the
username $(>`printf$IFS"\x2ftmp\x2fpwn"`). How this works is left as
an exercise for the reader. Once you figure it out, see if you can
out-golf us :)

2. There's a path traversal bug in useradd (but not adduser) that can
be triggered by usernames beginning with "../". For example, for the
username "../bin/brangal", useradd will create a home directory at /home/../bin/brangal (i.e. /bin/brangal). This can be used to place a
directory owned by the new user nearly anywhere on the system.
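
Both points suggest defensive handling on the consumer side. The following Python sketch is one possible illustration, not what bash or useradd actually do: `shell_safe` relies on the standard `shlex.quote`, and `home_stays_under` is a hypothetical, purely lexical check (it never touches the filesystem) that a derived home directory stays below its base.

```python
import posixpath
import shlex


def shell_safe(username):
    """Quote a user name so a POSIX shell treats it as one literal word."""
    return shlex.quote(username)


def home_stays_under(base, username):
    """Return True if base/username resolves to a path strictly below base."""
    candidate = posixpath.normpath(posixpath.join(base, username))
    return (
        candidate != base
        and posixpath.commonpath([base, candidate]) == base
    )


print(shell_safe("$(touch /tmp/pwn)"))               # '$(touch /tmp/pwn)'
print(home_stays_under("/home", "../bin/brangal"))   # False
print(home_stays_under("/home", "alice"))            # True
```

Quoting only helps programs that remember to do it, which is the argument for rejecting such names at creation time.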

-Ben Kallus && Jonah Weinbaum

From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:10:01 2024

* Marc Haber <mh+debian-devel@zugschlus.de> [241203 22:06]:

On Tue, Dec 03, 2024 at 08:41:06PM +0100, Étienne Mollier wrote:

Marc Haber, on 2024-12-03:

I'll probably deprecate --allow-bad-names in favor of something that doesn't use the word "bad" (suggestions appreciated). Otoh, adduser in the Red Hat World uses --badname to allow such names as well.

The problem is not the name, but the character set, so perhaps --allow-bad-characters will be better perceived. If you want to
also avoid "bad", maybe try --allow-ambiguous-characters, or --allow-extended-character-set? The last one is perhaps a bit
long winded, but also sounds more accurate than the rest. What
do you think of these approaches?

Extended sounds good, maybe even "unicode"? or "international"?

I echo Alejandro's concerns. We should stop having the flag
completely, not encourage using it.

If the default restrictions are too tight, then we need to work on
that. What we should not do is to introduce a badly tested, because
mostly unused, code path that will introduce bugs in all sorts of
places.
IOW: if we move towards better character support, we need to do that
by allowing it always. Same for longer names.

Chris

From Chris Hofstaedtler@21:1/5 to All on Mon Dec 9 18:20:01 2024

* Marc Haber <mh+debian-devel@zugschlus.de> [241205 18:06]:

P.S.: Sadly, this has gotten less than positive coverage on LWN. I
apologize for the harm this discussion has done.

Marc, my thanks for collecting the info on the wiki, and starting
this discussion. I'm sorry I was not able to participate more.

However, I reject the idea that it is on you to apologize for LWN
covering this discussion and the harm that might have come out of
it. This is something we need to address on a wider floor. Otherwise
we lose our ability to discuss anything (and then changing anything
ever).

Best,
Chris

From Marc Haber@21:1/5 to zeha@debian.org on Mon Dec 9 21:30:01 2024

On Mon, 9 Dec 2024 18:04:52 +0100, Chris Hofstaedtler
<zeha@debian.org> wrote:

This was never on the table, and shadow upstream might even drop the
entire "support" for having bad names.

Just for the record, I consider this a kneejerk reaction that moves
the world backwards. It's sad.

--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mail address in header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

From Chris Hofstaedtler@21:1/5 to All on Tue Dec 10 12:20:01 2024

* Marc Haber <mh+debian-devel@zugschlus.de> [241209 21:21]:

On Mon, 9 Dec 2024 18:08:33 +0100, Chris Hofstaedtler
<zeha@debian.org> wrote:

I echo Alejandro's concerns. We should stop having the flag
completely, not encourage using it.

I violently disagree. But I have to accept this.

IOW: if we move towards better character support, we need to do that
by allowing it always. Same for longer names.

I think that our distinction between system users and "normal" users
is fine. No one needs a package generating "weird" user names.

I think we're speaking past each other here.

Packages can already create absolutely broken usernames today, if
they want.

To me, the question is more, why do we have a flag that, if used,
allows you to break /etc/{passwd,shadow,group,gshadow} completely?

Chris

From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 13:50:01 2024

On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:

NFC would solve both of these "problems":

* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9 (omega).

What NFC alone will not solve are homograph collisions: a (U+0061 Latin
small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to different codepoints.

NFC also doesn't solve various invisible characters (e.g., zero-width
spaces, bidirectional control characters). For more information about
all of the various security land mines, see[1]. I also suggest that
people do a google search on "CVE" and "Unicode". There has been at
least one interaction where we needed to make a kernel(!) change to
address a security vulnerability, although we decided it wasn't
super-critical because "no sane distribution actually enables the
casefold feature on users' file systems by default".

[1] https://www.unicode.org/reports/tr39/tr39-22.html
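
As a rough illustration of the land mines mentioned above, the Python sketch below flags format characters (general category Cf, which includes zero-width spaces and bidirectional controls) and names that mix letters from more than one script. Python's standard library has no script property, so the first word of each character's Unicode name is used as a crude stand-in; that heuristic and the function name are assumptions of this example, and real confusable detection as described in [1] is considerably more involved.

```python
import unicodedata


def suspicious_features(name):
    """Return human-readable findings for a candidate user name."""
    findings = []
    scripts = set()
    for ch in name:
        category = unicodedata.category(ch)
        if category == "Cf":
            findings.append(f"format/invisible character U+{ord(ch):04X}")
        elif category.startswith("L"):
            # Crude script label: first word of the character name,
            # e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    if len(scripts) > 1:
        findings.append("mixed scripts: " + ", ".join(sorted(scripts)))
    return findings


print(suspicious_features("p\u0430ul"))    # Latin p, u, l with Cyrillic а
print(suspicious_features("ab\u200bc"))    # zero-width space inside the name
print(suspicious_features("alice"))        # nothing to report
```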

The other security consideration is the vast amount of
code that you need to link into security critical / setuid programs if
you are going to use libunicode. (And yes, we do include libunicode
into the kernel in order to support casefold. If you are thinking
about potentially enabling casefold by default on user file systems
because Windows and MacOS do it, and we need to appeal to Gen Z'ers
in order for Debian to stay relevant(tm) --- please don't. :-)

So if we really do want to support unicode in usernames, may I suggest
that having someone implement the smallest possible Unicode
canonicalization library, which also handles getting rid of all of the
*other* Unicode security traps like invisible characters,
bidirectional control characters, etc., and then asking it to get
subjected to rigorous security audits before we propose linking it
into setuid programs, that would be a Really Good Idea.

This would also reduce bloat in the minimal Debian install required
for installer images, docker containers, etc., since we wouldn't need
to support things like Unicode sorting rules, Unicode case folding,
conversion between the many different Unicode encoding forms, etc.

Cheers,

- Ted

But these are two different scenarios: the former problem may (and does) arise without any wrongdoing from the user's side (a different OS, or a different string manipulation library, or a screen keyboard may produce a different é), the latter is an attack. The former is an interoperability issue, the latter is a security issue.

While this seems the right thing to do, I think this should be done in useradd (pkg:shadow), in the respective upstream project, so that all
Linux distributions get the same behavior.

That's probably the best approach.

Thanks for taking the time to delve into this issue,

--
Gioele Barabucci

From Gioele Barabucci@21:1/5 to Theodore Ts'o on Tue Dec 10 15:00:01 2024

On 10/12/24 13:47, Theodore Ts'o wrote:

On Tue, Dec 03, 2024 at 09:39:03PM +0100, Gioele Barabucci wrote:

NFC would solve both of these "problems":

* Both U+00E9 (é) and U+0065, U+0301 are NFC-normalized to U+00E9,
* Both U+2126 (Ohm sign) and U+03A9 (omega) are NFC-normalized to U+03A9
(omega).

What NFC alone will not solve are homograph collisions: a (U+0061 Latin
small letter a) and а (U+0430 Cyrillic small letter a) are NFC-normalized to
different codepoints.

NFC also doesn't solve various invisible characters (e.g., zero-width
spaces, bidirectional control characters). For more information about
all of the various security land mines, see[1].

NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

The IdentifierClass of RFC 8264 explicitly disallows all these "security
land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

In general, the PRECIS RFCs are more prescriptive than Unicode UTS #39,
so, should Unicode usernames ever happen, the PRECIS RFCs are the
reference all programs should follow.

Regards,

--
Gioele Barabucci

From Theodore Ts'o@21:1/5 to Gioele Barabucci on Tue Dec 10 16:00:01 2024

On Tue, Dec 10, 2024 at 02:52:05PM +0100, Gioele Barabucci wrote:

NFC has been mentioned in a broader discussion on PRECIS/RFC8264/RFC8265.

The IdentifierClass of RFC 8264 explicitly disallows all these "security
land mines": https://www.rfc-editor.org/rfc/rfc8264.html#section-4.2.3

The "Security considerations" section is quite extensive (5 pages long): https://www.rfc-editor.org/rfc/rfc8264.html#section-12

Oh, good. I was just getting worried when discussion on the list
seemed to be treating NFC as a silver bullet, and people were
suggesting that the canonicalization should be done both by readers
and writers of /etc/passwd --- which would imply linking libunicode
into setuid programs like sudo and login, with the (to my view)
inevitable results of hilarity ensuing.

As I look at RFC 8264, I note that it does not take a position about
which version of Unicode should be considered canonical, and in fact
talks about one of the features (tm) of RFC 8264 being that it is
agile with respect to newer versions of Unicode.

However, it should be noted that RFC 8264 also states that code points
which are not defined in whatever version of Unicode is supported by
"the application" shall be disallowed. From Debian's perspective,
though, if we are going to take a position about what version of
Unicode should be supported by "the application(s)" that read and
write /etc/passwd, we *will* need to take a position on what version
of Unicode should be supported, and therefore, what set of characters
will be disallowed.
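
The version dependence is easy to see in practice. In the Python sketch below, the result depends on the Unicode database bundled with the interpreter (`unicodedata.unidata_version`): a code point that is unassigned (general category Cn) in one release may be assigned in a later one. The function name is hypothetical, and U+0378 is used only as an example of a code point that is unassigned in current Unicode versions.

```python
import unicodedata


def unassigned_codepoints(name):
    """List code points the bundled Unicode database reports as category Cn."""
    return [
        f"U+{ord(ch):04X}"
        for ch in name
        if unicodedata.category(ch) == "Cn"
    ]


print("Unicode database version:", unicodedata.unidata_version)
print(unassigned_codepoints("abc"))         # all assigned
print(unassigned_codepoints("abc\u0378"))   # U+0378 is currently unassigned
```

Two hosts with different bundled Unicode versions could therefore disagree about whether the same name is acceptable, which is the interoperability concern raised above.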

It also means that we need to be careful about what happens when we
want to upgrade to newer versions of Unicode in future versions of
Debian. If the system administrator wants to support more than one
version of Debian, then it would be advisable if the Unicode version
is something which is configurable, especially if the passwd entries
are being supplied via some kind of network protocol such as LDAP or
Hesiod (for those people who remember MIT Project Athena :-P).

There is also the (admittedly edge) case of what to do if a newer
version of Unicode disallows or removes characters. This rarely
happens, but it has in the past (in particular in the case of various
security disasters, or in the case of characters getting deprecated in
favor of newer characters, many of which are mentioned in RFC 8264).
So we can probably just ignore this case and hope that the Unicode
consortium will be more careful in the future, but I thought I'd
just mention it.

The bottom line is that while I am sympathetic to the desire to
support Unicode --- heck, I was one of the primary drivers of
libunicode into the kernel so we could support case folding for more
than just the ASCII character set --- the meme of "One does not simply
walk into Mordor" also applies for "adopting Unicode".

And I am reminded of one of my IETF mentors, who was an
Internationalization expert, telling me two decades ago that, late at
night, in the bar after a standards meeting, one of the things that
I18N folks would say, just amongst themselves, was, "It would be
easier just to teach everyone English" --- and this was with I18N
experts who understood everything that was involved in doing full I18N
support. No doubt this was only half-joking, but I think the point is
valid.

So if we're going to do this, let's do it right. :-)

- Ted

From Marc Haber@21:1/5 to zeha@debian.org on Tue Dec 10 15:30:01 2024

On Tue, 10 Dec 2024 12:10:14 +0100, Chris Hofstaedtler
<zeha@debian.org> wrote:

To me, the question is more, why do we have a flag that, if used,
allows you to break /etc/{passwd,shadow,group,gshadow} completely?

The user-oriented solution would be to identify the things that break /etc/passwd and to forbid these. Just forbidding everything is heading
the wrong direction.

Greetings
Marc
--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mail address in header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

From Simon Josefsson@21:1/5 to Theodore Ts'o on Tue Dec 10 18:10:02 2024

"Theodore Ts'o" <tytso@mit.edu> writes:

However, it should be noted that RFC 8264 also states that code points
which are not defined in whatever version of Unicode is supported by
"the application" shall be disallowed. From Debian's perspective,
though, if we are going to take a position about what version of
Unicode should be supported by "the application(s)" that read and
write /etc/passwd, we *will* need to take a position on what version
of Unicode should be supported, and therefore, what set of characters
will be disallowed.

A possible position may be to treat code points that are the subject of
version mismatching to be undefined. This is how IDNA resolved the same problem, and PRECIS inherited this. While I protested about that
approach many years ago as libidn maintainer when IDNA2003 was
hard-coded to use Unicode 3.2, I think today that the approach is
reasonable since Unicode has maintained good stability. We've done a
couple of Unicode version bumps in libidn2 and interop with other IDN implementations -- that typically always use some other Unicode version
-- is good enough to not cause serious breakage. I would expect the
same to be true for PRECIS usernames too. Hostnames are hashed and are
subject to string comparisons, just like usernames, so we have some
experience to build on here.

I would involve cross-distribution discussion about this though.
Perhaps the /etc/passwd APIs affect some POSIX specifications, and a
non-ASCII extension could be proposed.

/Simon

From Theodore Ts'o@21:1/5 to Simon Josefsson on Tue Dec 10 19:20:01 2024

On Tue, Dec 10, 2024 at 06:08:40PM +0100, Simon Josefsson wrote:

I would involve cross-distribution discussion about this though.
Perhaps the /etc/passwd APIs affect some POSIX specifications, and a non-ASCII extension could be proposed.

Yeah, good point. If the scope is going to include passwd entries
that are distributed via network protocols like LDAP, then we need to
worry about sites that support other Linux distributions beyond just
Debian --- or for that matter, sites that need to support Linux as
well as legacy Unix systems like AIX or Solaris.

Of course, we could just exclude them from the scope and say that if
you are using LDAP, then you MUST only use ASCII characters in the
username, given that POSIX has decided to run away from the I18N
problems wrt to usernames. That might be the simpler approach, unless
we want to drive something that could eventually be adopted by POSIX.

- Ted

From Marc Haber@21:1/5 to All on Tue Dec 10 21:30:01 2024

On Tue, 10 Dec 2024 13:13:08 -0500, "Theodore Ts'o" <tytso@mit.edu>
wrote:

Yeah, good point. If the scope is going to include passwd entries
that are distributed via network protocols like LDAP, then we need to
worry about sites that support other Linux distributions beyond just
Debian --- or for that matter, sites that need to support Linux as
well as legacy Unix systems like AIX or Solaris.

Even if we had full Unicode support for anything using /etc/passwd, a
site is always free to restrict itself to us-ascii usernames. Same with
POSIX, in my understanding we would still be POSIX compliant if we had
full Unicode support for usernames, because POSIX defines the minimum
of things a system MUST support, but it is always free to support
more. Or, at least I hope so.

But things are moving by shadow upstream taking a user-hostile stance,
willing to take away freedom. I must be fine with that because I
cannot change it. But I don't need to like it.

Greetings
Marc
--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mail address in header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402

From Charles Plessy@21:1/5 to All on Wed Dec 11 02:10:01 2024

Hello everybody,

sorry if it is too naive, but is there an easy way to determine for a
given Unicode string if it can be typed from a single keyboard layout or
produced by a text-to-speech system? People who want a username because
of SSH, email and su will want to be able to input it. At the other
end of the range of use cases, they can use a computer for years without
seeing their username.

If we take one step back and look at the future: will usernames
still be a thing in 10 years? If not, then a simple heuristic that
satisfies more than half of the users may be enough...

Have a nice day,

Charles

--
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team         http://www.debian.org/devel/debian-med
Tooting from home                 https://framapiaf.org/@charles_plessy
- You do not have my permission to use this email to train an AI -

From Jeremy Stanley@21:1/5 to Charles Plessy on Wed Dec 11 02:50:01 2024

On 2024-12-11 10:04:44 +0900 (+0900), Charles Plessy wrote:
[...]

is there an easy way to determine for a given Unicode string if it
can be typed from a single keyboard layout

[...]

Do keyboards with a "compose" key count? There are plenty of glyphs I
can type which aren't depicted directly on my keyboard's keycaps,
after all.
--
Jeremy Stanley


From Marc Haber@21:1/5 to All on Wed Dec 11 09:20:01 2024

On Wed, 11 Dec 2024 10:04:44 +0900, Charles Plessy <plessy@debian.org>
wrote:

> sorry if it is too naive, but is there an easy way to determine for a
> given Unicode string if it can be typed from a single keyboard layout or
> produced by a text-to-speech system? People who want a username because
> of SSH, email and su will want to be able to input it.

That's easy, just choose a user name for YOU that YOU can type on YOUR
keyboard. Why would anybody choose a username that is impossible to use
in their own locale?

Greetings
Marc
--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mailadresse im Header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402


From Theodore Ts'o@21:1/5 to Marc Haber on Thu Dec 12 17:10:01 2024

On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:

> But things are moving by shadow upstream taking a user-hostile stance,
> willing to take away freedom. I must be fine with that because I
> cannot change it. But I don't need to like it.

As a suggestion, we might make more forward progress if we assume good
faith and accept that other people might have different priorities
than others. I could easily see shadow, being a security-related
package, would consider encouraging something that could lead to
security bugs or just other random breakage, as "user-hostile".

I am reminded of Professor Jerome Saltzer, who was responsible for the
overall technical architecture for MIT's Project Athena, insisting
that he be assigned the username Saltzer. He theorized that while
this *would* cause breakage (for a long time, usernames were assumed
to be always lowercase ASCII, e-mail localparts were case insensitive,
and usernames were case sensitive), since he was (a) a Professor, and
(b) responsible for the technical architecture for Project Athena,
programmers would be incentivized to fix the problems that inevitably
showed up. As I recall, we didn't let students choose mixed-case
usernames for a while, since there was presumed to be breakage;
Professor Saltzer's username was a special case.

If there are brave people who want to use Unicode characters (for
bonus points, they could try using "unofficial" characters such as the
Klingon script), they could be the first to find bugs, and report
them. And if they suffer from security breaches, they would know what
they were getting into. (And we salute them for their courage. :-)

Perhaps at some future stable Debian release (not Trixie), we could
enable it by default. But I really do think we need to do some
technical work first, including not requiring libunicode as a required
package, but instead having a minimal, security-focused Unicode library
that can be used by privileged programs.

Cheers,

- Ted


From Henrik Ahlgren@21:1/5 to Marc Haber on Thu Dec 12 19:40:01 2024

On Wed, 2024-12-11 at 09:11 +0100, Marc Haber wrote:

> That's easy, just choose a user name for YOU that YOU can type on YOUR
> keyboard. Why would anybody choose a username that is impossible to use
> in their own locale?

I don't see many problems with single-user machines, especially
security-related ones. But think of multi-user environments: imagine, as a
non-Chinese-speaking Westerner, needing to chown a file to a colleague
called 陈成. Even if you have Pinyin configured, you might not even know
how to type it. (Of course, you have the same problem with filenames,
which have essentially no limitations. I know from experience how hard it
is to type names in Arabic, which I can't read.)


From Marc Haber@21:1/5 to Henrik Ahlgren on Fri Dec 13 12:30:01 2024

On Thu, Dec 12, 2024 at 08:21:15PM +0200, Henrik Ahlgren wrote:

> I don't see much problems with single-user machines, especially security
> related. But, think multi-user environments? Imagine, as a non-Chinese
> speaking Westerner, needing to chown a file to a colleague called 陈成.

I would type "chown 陈成 <filename>", pasting the user name from the
written request or probably from /etc/passwd. Or I would ask the system
administrator for a solution.

I see your argument, but I'd also see that as an issue that the system
administrator choosing the user names needs to solve. It's nothing that
we as a distribution should solve.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


From Marc Haber@21:1/5 to All on Fri Dec 13 12:30:01 2024

On Thu, 12 Dec 2024 11:02:21 -0500, "Theodore Ts'o" <tytso@mit.edu>
wrote:

> On Tue, Dec 10, 2024 at 09:24:15PM +0100, Marc Haber wrote:
>> But things are moving by shadow upstream taking a user-hostile stance,
>> willing to take away freedom. I must be fine with that because I
>> cannot change it. But I don't need to like it.
>
> As a suggestion, we might make more forward progress if we assume good
> faith and accept that other people might have different priorities
> than others. I could easily see shadow, being a security-related
> package, would consider encouraging something that could lead to
> security bugs or just other random breakage, as "user-hostile".

They are planning to remove the --badname option from useradd, making
it impossible to even try UTF-8 user names, without patching useradd.
And if I was in Chris' shoes, I would probably refrain from doing so
as well.

And shadow would be the canonical place to do the PRECIS normalization
at least for comparing usernames. That's something they wouldn't do.
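
To make concrete what "normalization for comparing usernames" buys, here is a minimal sketch. It is not a PRECIS implementation: it only lowercases and applies NFC, and it skips the width mapping, the allowed-character checks and the directionality rule that a real profile would need. The function names are made up for this example.

```python
import unicodedata


def comparison_key(username):
    """Simplified comparison key: lowercase, then NFC-normalize.

    Assumption of this sketch: this is only loosely modelled on a
    case-mapped username profile and does not reject any characters.
    """
    return unicodedata.normalize("NFC", username.lower())


def same_user(a, b):
    """True if two spellings map to the same comparison key."""
    return comparison_key(a) == comparison_key(b)
```

With this, "zmölnig" typed with a precomposed ö and the same name typed as o plus a combining diaeresis compare equal, although their byte strings in /etc/passwd would differ. Without such a step in the tool that writes the account database, two visually identical names can coexist as distinct accounts.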

> Perhaps at some future stable Debian release (not Trixie), we could
> enable it by default.

There won't be such an option for us to enable.

I need to be fine with that because I cannot change it. But I don't
need to like it.

Greetings
Marc
--
----------------------------------------------------------------------------
Marc Haber         |   " Questions are the         | Mailadresse im Header
Rhein-Neckar, DE   |     Beginning of Wisdom "     |
Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fon: *49 6224 1600402


From Stephan Seitz@21:1/5 to All on Fri Dec 13 13:10:01 2024

On Thu, Dec 12, 2024 at 20:21:15 +0200, Henrik Ahlgren wrote:
> I don't see much problems with single-user machines, especially security
> related. But, think multi-user environments? Imagine, as a non-Chinese
> speaking Westerner, needing to chown a file to a colleague called 陈成. Even

You are joking, aren’t you? You could use „getent passwd” and copy
& paste the username. Or use the user id.
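
A small sketch of the "use the user id" route: given text in passwd format (for example, what `getent passwd` prints), build a map from numeric UID to login name, so the number can be passed to chown instead of the name. The sample entries, including the UID 1001 and the home directory for 陈成, are invented for illustration.

```python
def accounts_by_uid(passwd_text):
    """Map numeric UID -> login name from passwd-format text."""
    accounts = {}
    for line in passwd_text.splitlines():
        fields = line.split(":")
        # A passwd entry has seven colon-separated fields:
        # name:password:UID:GID:GECOS:home:shell
        if len(fields) == 7:
            accounts[int(fields[2])] = fields[0]
    return accounts


# Hypothetical sample data, not taken from any real system.
SAMPLE = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "陈成:x:1001:1001::/home/chen:/bin/bash\n"
)
```

Here `accounts_by_uid(SAMPLE)[1001]` is the colleague's login name, and `chown 1001 <filename>` has the same effect as typing the name.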

With this argument passwd should refuse to set the password to „12345”.

And no one in this thread has said that you *have* to use non-ASCII
usernames. But some people don’t want to give you a chance to do it.

I don’t need non-ASCII for my name but I would never use a system that
would force me to rewrite my name in ASCII because it is so utterly
broken in 2024. I bet there is no problem on Windows systems.

Stephan

--
| Stephan Seitz E-Mail: stse@rootsland.net |
| If your life was a horse, you'd have to shoot it. |


From IOhannes m zmölnig@21:1/5 to All on Fri Dec 13 14:00:02 2024

On 13 December 2024 13:08:01 CET, Stephan Seitz <stse+debian@rootsland.net> wrote:
> I don’t need non-ASCII for my name but I would never use a system that
> would force me to rewrite my name in ASCII because it is so utterly
> broken in 2024. I bet there is no problem on Windows systems.
>
> Stephan

Incidentally, my kid's school rolled out their school laptops this week,
which of course come with Windows 11 preinstalled (as a side note, I am now
looking forward to four years of "digital competence training" consisting
entirely of Windows (basics), PowerPoint, Word and Excel; but that's
another story), and *of course* all usernames have been normalized to
lowercase ASCII.

So my take is: no, in Redmond you would use ASCII for usernames.

Oh, and my name does have non-ASCII characters, and I have been using
Unicode in my display name for the last 20 years.
I do remember problems in the '90s.
But those are long past.

mfh.her.fsr
IOhannes


From Stephan Seitz@21:1/5 to All on Fri Dec 13 14:40:01 2024

On Fri, Dec 13, 2024 at 13:38:31 +0100, IOhannes m zmölnig wrote:
> Incidentally, my kid's school rolled out their school laptops this week,
> which of course come with Windows11 preinstalled (as a sidenote I am now
> looking forward to four years of "digital competence training"
> consisting entirely of Windows(basics), PowerPoint, Word and Excel; but
> that's another story), and *of course* all usernames have been
> normalized to lowercase ASCII.

I’m quite sure I have never seen an Asian Windows where you had to use
ASCII for your username.

Stephan

--
| Stephan Seitz E-Mail: stse@rootsland.net |
| If your life was a horse, you'd have to shoot it. |


From Michael Stone@21:1/5 to Marc Haber on Fri Dec 13 16:10:01 2024

On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:

> They are planning to remove the --badname option from useradd, making
> it impossible to even try UTF-8 user names, without patching useradd.

Or edit the passwd file (vipw), or use any non-passwd-file
authentication mechanism, or use a different user management tool, etc.
I think you're overemphasizing the importance of the useradd command
here--it just acts as a convenience and sets some baseline policies;
it's not actually essential for adding a user. If you don't like the
policy that useradd sets...just don't use it.


From sre4ever@free.fr@21:1/5 to All on Fri Dec 13 15:30:01 2024

Hi,

On 2024-12-13 13:38, IOhannes m zmölnig wrote:
> and *of course* all usernames have been normalized to lowercase ASCII.

I just took a look at some reasonably recent government-issued IDs and
it turns out the French ones normalized my name to uppercase
whatever-some-clerk-had-on-their-typewriter-keyboard-late-last-millennium,
dropping the accent from the second word of my name. My father's birth
certificate is handwritten and has the accent. My Canadian IDs are
better, as they retained the name as I wrote it in the application
form. I don't remember if the French online application forms for IDs
allowed accents in names, but I would not be too surprised if they
didn't. I might start a procedure to try to get that officially fixed in
2025, as there is another issue with the way my name is registered with
some administrations that occasionally complicates my life. I'm pretty
confident the other issue will get fixed, but much less so about the
accent: the law should be on my side, which here means that I could well
sue the government, win the lawsuit and the subsequent ones up to the
ECJ and back, and still not get that fixed within my lifetime.

I was going to write that on payment cards you can't have accents in
your name. Wrong. I managed to get one that reproduced it. I don't use
that one much online so I don't know if entering my name with the accent
actually works somewhere when paying with that card.

I would not try too hard to get non-ascii characters in that convenient
computer identifier often named "login name" rather than "user name".
You can't get them in the local part of an e-mail address and not many
people complain. You can't get them in IRC nicknames. You can't get them
in the machine readable part of your ICAO-compliant government-issued
IDs. It's still better than just numbers. I'm fine with that as long as
my name is properly written in the places that actually matter.

If you need a name for that option, --allow-non-ascii should be neutral
enough.

--
Julien Plissonneau Duquène


From Peter Pentchev@21:1/5 to Peter Pentchev on Fri Dec 13 18:10:01 2024

On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:

> On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
>> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
>>> They are planning to remove the --badname option from useradd, making
>>> it impossible to even try UTF-8 user names, without patching useradd.
>>
>> Or edit the passwd file (vipw), or use any non-passwd-file authentication
>> mechanism, or use a different user management tool, etc.
>> I think you're overemphasizing the importance of the useradd command
>> here--it just acts as a convenience and sets some baseline policies;
>> it's not actually essential for adding a user. If you don't like the policy
>> that useradd sets...just don't use it.
>
> In the context of the whole thread, are you suggesting that adduser(1)
> should be changed to use something other than useradd(8) under the hood?

Sigh, that's adduser(8) too, of course.

G'luck,
Peter

--
Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13


From Peter Pentchev@21:1/5 to Michael Stone on Fri Dec 13 18:10:01 2024

On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:

> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
>> They are planning to remove the --badname option from useradd, making
>> it impossible to even try UTF-8 user names, without patching useradd.
>
> Or edit the passwd file (vipw), or use any non-passwd-file authentication
> mechanism, or use a different user management tool, etc.
> I think you're overemphasizing the importance of the useradd command
> here--it just acts as a convenience and sets some baseline policies;
> it's not actually essential for adding a user. If you don't like the policy
> that useradd sets...just don't use it.

In the context of the whole thread, are you suggesting that adduser(1)
should be changed to use something other than useradd(8) under the hood?

G'luck,
Peter

--
Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13


From Marc Haber@21:1/5 to Peter Pentchev on Fri Dec 13 21:40:01 2024

On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:

> In the context of the whole thread, are you suggesting that adduser(1)
> should be changed to use something other than useradd(8) under the hood?

adduser will not do that. Doing so is nonsense.

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421


From Michael Stone@21:1/5 to Peter Pentchev on Sat Dec 14 05:10:01 2024

On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:

> On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
>> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
>>> They are planning to remove the --badname option from useradd, making
>>> it impossible to even try UTF-8 user names, without patching useradd.
>>
>> Or edit the passwd file (vipw), or use any non-passwd-file authentication
>> mechanism, or use a different user management tool, etc.
>> I think you're overemphasizing the importance of the useradd command
>> here--it just acts as a convenience and sets some baseline policies;
>> it's not actually essential for adding a user. If you don't like the policy
>> that useradd sets...just don't use it.
>
> In the context of the whole thread, are you suggesting that adduser(1)
> should be changed to use something other than useradd(8) under the hood?

No, I'm suggesting that rhetoric asserting that any adduser/useradd
policy could constrain people is overblown because users can be added to
the system without using either of those tools. The tools' policies
should reflect what is safest and most sensible for the majority of
users, but if someone wants to do something different there is nothing
stopping them from doing so.

The claim at the top of this subthread is that some useradd change would
prevent people from experimenting with UTF-8 usernames. As an exercise I
just created UTF-8 users and groups entirely without useradd/adduser
(using vipw and vigr):

getent passwd 1144

💩:*:1144:1144::/nowhere:/bin/false

getent group 1144

💩:*:1144:

ls -l /tmp/samplefile

-rw-r--r-- 1 💩 💩 0 Dec 13 22:42 /tmp/samplefile
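
One detail the entry above illustrates is the gap between characters and bytes, which matters for any length limit that is counted in bytes. A small sketch (the helper name `name_lengths` is made up for this example) that takes a passwd-format line and reports the login name's length both ways:

```python
def name_lengths(passwd_line):
    """Return (code points, UTF-8 bytes) for the login name of a passwd line."""
    login = passwd_line.split(":", 1)[0]
    return len(login), len(login.encode("utf-8"))


# The entry shown in the getent output above.
entry = "💩:*:1144:1144::/nowhere:/bin/false"
chars, octets = name_lengths(entry)
```

For this entry the name is a single code point but occupies four bytes in UTF-8, so a limit of "32" means something quite different depending on which unit a tool counts.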

On an individual basis there aren't so many steps that creating a user
manually is a big deal, or that a script dedicated to creating users
according to the policies of a particular environment would be overly
complicated. For a large organization I question the idea that user
accounts would be managed by adduser/useradd at all.


From Peter Pentchev@21:1/5 to Michael Stone on Sat Dec 14 11:00:02 2024

On Fri, Dec 13, 2024 at 11:01:43PM -0500, Michael Stone wrote:

> On Fri, Dec 13, 2024 at 07:00:36PM +0200, Peter Pentchev wrote:
>> On Fri, Dec 13, 2024 at 10:08:19AM -0500, Michael Stone wrote:
>>> On Fri, Dec 13, 2024 at 12:22:38PM +0100, Marc Haber wrote:
>>>> They are planning to remove the --badname option from useradd, making
>>>> it impossible to even try UTF-8 user names, without patching useradd.
>>>
>>> Or edit the passwd file (vipw), or use any non-passwd-file authentication
>>> mechanism, or use a different user management tool, etc.
>>> I think you're overemphasizing the importance of the useradd command
>>> here--it just acts as a convenience and sets some baseline policies;
>>> it's not actually essential for adding a user. If you don't like the policy
>>> that useradd sets...just don't use it.
>>
>> In the context of the whole thread, are you suggesting that adduser(1)
>> should be changed to use something other than useradd(8) under the hood?
>
> No, I'm suggesting that rhetoric asserting that any adduser/useradd
> policy could constrain people is overblown because users can be added to
> the system without using either of those tools. The tools' policies
> should reflect what is safest and most sensible for the majority of
> users, but if someone wants to do something different there is nothing
> stopping them from doing so.

[snip more about adding accounts without useradd/adduser]

Thanks, that makes sense. Apologies if my reply came through as snarky.

G'luck,
Peter

--
Peter Pentchev roam@ringlet.net roam@debian.org peter@morpheusly.com
PGP key: https://www.ringlet.net/roam/roam.key.asc
Key fingerprint 2EE7 A7A5 17FC 124C F115 C354 651E EFB0 2527 DF13
