• About dash as sh

    From Ilya Kazakevich@21:1/5 to All on Fri Jun 21 00:30:01 2024
    Hello,

    I've recently come across a bug in dash.

    https://lore.kernel.org/dash/CAMQsgbSZnEac=ETYnR6a_ysnAysaHThwY03pnoDxC=p5FqtAag@mail.gmail.com/T

    This issue has been known for 7 years: https://groups.google.com/g/linux.debian.bugs.dist/c/c6kRE-fhyuM

    The fix is 18 months old but, unfortunately, has not been released yet.
    Hence, as far as I understand, we have this issue even in sid.


    As this bug doesn't exist in bash, I started thinking: why does Debian
    use dash at all (unlike RH, for example, which uses `bash` for `sh`)?

    It turned out that 27 years ago there were 2 arguments:
    1) Speed: bash is much larger and slower, and boot time was affected.
    2) Posix compatibility.

    The former argument is probably not so important now, since Debian uses
    `systemd` (no more sh scripts at boot), and, honestly, I can't imagine
    how bash could be a bottleneck for anything in 2024 (if you have such
    scenarios, please share).

    The latter is also a little strange, as the aforementioned bug breaks
    POSIX compatibility (yes, stable Debian has a bug that breaks POSIX).

    Having two shells (one for scripting and the other for interactive use)
    might lead to other inconsistencies (one code base is usually more
    consistent than two).

    With all of that I am pretty sure there should be some reason why dash
    is still `sh` in Debian, and I must be missing something.

    So, what is the reason?

    Thank you,

    Ilya.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Kjörling@21:1/5 to All on Fri Jun 21 12:20:01 2024
    On 21 Jun 2024 00:28 +0200, from ilya.kazakevich@jetbrains.com (Ilya Kazakevich):
    > [...] honestly, I can't imagine how bash
    > could be a bottleneck for anything in 2024 (if you have such
    > scenarios, please share).

    Debian doesn't target only desktops and servers, where your assertion
    is quite possibly correct. It's equally supported on comparatively
    very low-powered systems; consider for example a low-RAM, perhaps also slow-storage armel system.

    Also, Debian doesn't target only this year's systems. My own desktop
    system uses a CPU which wasn't even top of the line when I put the
    computer together over a decade ago now, and I like that Debian runs
    well on it without requiring me to buy a new computer every few years.
    (The one I had before this one reached a similar age before it broke
    beyond the point of reasonably fixing it by replacing individual
    parts.) Not only does it save money, it also saves on limited physical resources and results in significantly less e-waste. Yes, _one_
    computer may be relatively inconsequential, but in aggregate it does
    add up.

    --
    Michael Kjörling 🔗 https://michael.kjorling.se “Remember when, on the Internet, nobody cared that you were a dog?”

  • From Greg Wooledge@21:1/5 to All on Fri Jun 21 13:30:01 2024
    The original message began with the assertion that the OP had run
    across a bug in dash, and gave two URLs, with no description of the bug
    or the impact it was having on their life.

    I read one of the URLs, and the bug is rather obscure. It involves a
    second script embedded inside a here document inside the first script,
    with the second script being passed to an interpreter process on stdin.
    I'm not surprised that nobody knew about the bug for many years.

    So, having found one obscure bug in dash, the OP decided that the
    best solution is to change Debian's years-old /bin/sh policy.

    This ignores the fact that all shells, including bash, have *lots*
    of bugs in them. Switching /bin/sh to another shell would simply be
    trading one set of bugs for a different set.

    Given that Debian *originally* used bash as /bin/sh, and made a
    conscious decision to switch that default to dash several years ago,
    it would take an overwhelmingly strong reason to revert that change.
    "I found an obscure bug in dash that affects me and one other person"
    is probably not strong enough, especially when the bug has been fixed
    upstream (albeit not in a released version yet??).

    A more productive course of action would be to open a Debian bug report
    against dash, describe the issue and how it affects you, point to the
    upstream patch, and hope that a patched version of dash makes it into
    trixie.

  • From Nicolas George@21:1/5 to All on Fri Jun 21 13:50:02 2024
    Greg Wooledge (12024-06-21):
    > The original message began with the assertion that the OP had run
    > across a bug in dash, and gave two URLs, with no description of the bug
    > or the impact it was having on their life.
    >
    > I read one of the URLs, and the bug is rather obscure. It involves a
    > second script embedded inside a here document inside the first script,
    > with the second script being passed to an interpreter process on stdin.
    > I'm not surprised that nobody knew about the bug for many years.

    The purported bug boils down to this: if you pipe to a non-interactive
    shell a command and data for that command, then the non-interactive
    shell might read more than just the command as part of its input
    buffering and leave less or nothing as data to the command itself.

    It is indeed a bug, since the standard says:

    When the shell is using standard input and it invokes a command that
    also uses standard input, the shell shall ensure that the standard
    input file pointer points directly after the command it has read when
    the command begins execution.

    But I consider this clause misguided: it should apply only when the
    input is a tty. Relying on it is a terrible idea.
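    The clause quoted above can be illustrated with a minimal sketch, written in Python purely for readability; real shells such as dash are written in C, and this is neither dash's code nor its fix. A pipe cannot be rewound, so one conforming strategy is to read the command one byte at a time and stop at the newline, leaving everything after it in the pipe for the command the shell starts. The helper name read_command_unbuffered and the sample command name are hypothetical.

```python
import os


def read_command_unbuffered(fd: int) -> bytes:
    """Read one newline-terminated command from fd, one byte per read(2).

    Hypothetical helper: a one-byte read can never consume input past the
    newline, so whatever follows the command stays in the pipe for the
    process the shell is about to start.
    """
    line = bytearray()
    while True:
        byte = os.read(fd, 1)
        if not byte:  # EOF before any newline
            break
        line += byte
        if byte == b"\n":
            break
    return bytes(line)


if __name__ == "__main__":
    r, w = os.pipe()
    # Script on stdin: a made-up command followed by the data it should read.
    os.write(w, b"some_command\npayload for the command\n")
    os.close(w)

    command = read_command_unbuffered(r)
    leftover = os.read(r, 1024)  # what a child inheriting fd r would see
    os.close(r)
    print(command)   # b'some_command\n'
    print(leftover)  # b'payload for the command\n'
```

    The price is one system call per byte of script text, which is the usual argument against reading this way when it is not strictly needed.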

    Regards,

    --
    Nicolas George

  • From Greg Wooledge@21:1/5 to Nicolas George on Fri Jun 21 14:00:01 2024
    On Fri, Jun 21, 2024 at 13:44:35 +0200, Nicolas George wrote:
    > Greg Wooledge (12024-06-21):
    >> The original message began with the assertion that the OP had run
    >> across a bug in dash, and gave two URLs, with no description of the bug
    >> or the impact it was having on their life.
    >>
    >> I read one of the URLs, and the bug is rather obscure. It involves a
    >> second script embedded inside a here document inside the first script,
    >> with the second script being passed to an interpreter process on stdin.
    >> I'm not surprised that nobody knew about the bug for many years.
    >
    > The purported bug boils down to this: if you pipe to a non-interactive
    > shell a command and data for that command, then the non-interactive
    > shell might read more than just the command as part of its input
    > buffering and leave less or nothing as data to the command itself.
    >
    > It is indeed a bug, since the standard says:
    >
    > When the shell is using standard input and it invokes a command that
    > also uses standard input, the shell shall ensure that the standard
    > input file pointer points directly after the command it has read when
    > the command begins execution.

    I understood the bug as described. I've been doing shell stuff for
    a while now, and I've picked up a few bits of knowledge here and there.
    I still claim that it's an obscure corner case that most script writers
    will never encounter.

    That's why I find it frustrating when someone claims that this bug is
    so severe that Debian has to *change their policy* without even describing
    how this bug is affecting them in real life.

    > But I consider this clause misguided: it should apply only when the
    > input is a tty. Relying on it is a terrible idea.

    I think the POSIX wording is there primarily because of here documents,
    and people doing what I can only *guess* is similar to what the OP of
    this thread wanted to do -- embedding layers of scripts inside scripts
    using here documents.

    I doubt that the wording was intended only for input coming from
    terminals.

  • From Greg Wooledge@21:1/5 to Mike Castle on Fri Jun 21 19:20:01 2024
    On Fri, Jun 21, 2024 at 09:43:52 -0700, Mike Castle wrote:
    > On Fri, Jun 21, 2024 at 4:57 AM Greg Wooledge <greg@wooledge.org> wrote:
    >> That's why I find it frustrating when someone claims that this bug is
    >> so severe that Debian has to *change their policy* without even
    >> describing how this bug is affecting them in real life.
    >
    > I did not feel like the OP was saying the bug was that bad and the
    > policy needed to change, but as a starting point to ask why it is
    > still the policy after 27 years.

    Are you unaware that it *changed*?

    Here's a quote from <https://wiki.debian.org/Shell> which was the
    first place I could find it:

    In all releases up to and including DebianLenny, Bash was the default
    non-interactive shell. Beginning with DebianSqueeze, Debian uses Dash
    (the Debian Almquist shell) as the target of the /bin/sh symlink.

    Debian made a *conscious choice* to switch /bin/sh from bash to dash.
    The OP of this thread is requesting that Debian should reverse this
    and change *back* to bash, because of one bug, which affects a very
    small number of scripts.

    Furthermore:

    From DebianSqueeze to DebianBullseye, it was possible to select Bash
    as the target of the /bin/sh symlink by running dpkg-reconfigure
    dash. However, as of DebianBookworm, this is no longer supported.

    So, the OP is not only asking for a reversion of the policy decision
    that was made, but for a reinvestment of the time and resources that
    would be required to support this new-but-really-old policy. The
    resources to support /bin/sh -> bash have already been discontinued.

  • From Stefan Monnier@21:1/5 to All on Sat Jun 22 01:30:01 2024
    > When the shell is using standard input and it invokes a command that
    > also uses standard input, the shell shall ensure that the standard
    > input file pointer points directly after the command it has read when
    > the command begins execution.
    >
    > But I consider this clause misguided: it should apply only when the
    > input is a tty.

    And if it's not a tty, you get some kind of Undefined Behavior?
    I don't think I'd like that because I don't think the benefit would be
    worth the UB troubles.

    > Relying on it is a terrible idea.

    I'd tend to agree.


    Stefan

  • From Nicolas George@21:1/5 to All on Mon Jun 24 08:20:01 2024
    Stefan Monnier (12024-06-21):
    > And if it's not a tty, you get some kind of Undefined Behavior?

    Knowing that “undefined behavior” is just an expression invented by C
    standards authors to make “we make no guarantee about it, use it at your
    own risk” sound scarier, I do not think it is a severe problem.

    > I don't think I'd like that because I don't think the benefit would be
    > worth the UB troubles.

    The reasoning is the other way around: this feature should not be used,
    and therefore the trouble of standardizing it is a waste of time.

    The reason this feature should not be used is that it is exceptional. It
    does not work with scripting languages that read their whole script
    before running. It does not work with chained commands: “(head -n 2 >
    /tmp/1 ; head -n 4 > /tmp/2)” will not put the next four lines in the
    /tmp/2 file. None of the standard shell utilities makes any effort to
    control buffering, and doing so would either change their semantics or
    ruin their performance. Only the shell itself has this constraint.
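    Nicolas's head example can be imitated without running head itself. In the Python sketch below (illustration only), a buffered file object stands in for the stdio-style buffering a utility might use: reading two lines through it drains the pipe, so a second reader gets nothing, while a byte-at-a-time reader leaves the remaining lines behind. The function names are hypothetical, and how much a real utility over-reads depends on its implementation and on how much data is sitting in the pipe at that moment.

```python
import os


def _pipe_with(data: bytes) -> int:
    """Return the read end of a pipe that already holds data (write end closed)."""
    r, w = os.pipe()
    os.write(w, data)
    os.close(w)
    return r


def leftover_after_buffered(data: bytes, n_lines: int) -> bytes:
    """Read n_lines through a buffered reader, then return what a second
    reader of the same pipe would still get."""
    r = _pipe_with(data)
    # closefd=False keeps r open after the buffered object is closed.
    with os.fdopen(r, "rb", buffering=8192, closefd=False) as f:
        for _ in range(n_lines):
            f.readline()  # the first call pulls everything available into the buffer
    rest = os.read(r, 8192)
    os.close(r)
    return rest


def leftover_after_unbuffered(data: bytes, n_lines: int) -> bytes:
    """Same, but consume exactly n_lines using one-byte reads."""
    r = _pipe_with(data)
    for _ in range(n_lines):
        while os.read(r, 1) not in (b"\n", b""):
            pass
    rest = os.read(r, 8192)
    os.close(r)
    return rest


if __name__ == "__main__":
    six_lines = b"1\n2\n3\n4\n5\n6\n"
    print(leftover_after_buffered(six_lines, 2))    # b''
    print(leftover_after_unbuffered(six_lines, 2))  # b'3\n4\n5\n6\n'
```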

    So in order to use this guarantee properly, you need to be absolutely
    sure you are in the exact case it covers. And odds are you will realize
    it does not work because you added something in between that does
    buffering.

    Better to just design your script in a way that does not rely on
    buffering subtleties.

    Regards,

    --
    Nicolas George

  • From Ilya Kazakevich@21:1/5 to All on Wed Jun 26 20:20:01 2024
    Hello,

    Thank you for your answers.

    My intention was to understand whether the decisions about dash are
    still valid, not, of course, to tell Debian to switch back to bash.

    As for "bug or not":

    This bug was confirmed by the author: https://lore.kernel.org/dash/Zm5y3du0C2MHdhSR@gondor.apana.org.au/

    And here is the commit that fixes it: https://git.kernel.org/pub/scm/utils/dash/dash.git/commit/?id=5f094d08c5bcee876191404a4f3dd2d075571215

    As one can see, its commit message quotes POSIX, so this behaviour is
    part of POSIX.


    When the shell is using standard input and it invokes a command that
    also uses standard input, the shell shall ensure that the standard
    input file pointer points directly after the command it has read when
    the command begins execution. It shall not read ahead in such a manner
    that any characters intended to be read by the invoked command are
    consumed by the shell (whether interpreted by the shell or not) or
    that characters that are not read by the invoked command are not seen
    by the shell.
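    When standard input is a regular file rather than a pipe, a shell can meet the "shall not read ahead" requirement more cheaply: read a whole block, then seek back to just after the command. Below is a Python sketch of that idea, offered as an illustration only; the helper name is hypothetical and nothing here is taken from dash or from the commit linked above. On a pipe the lseek() call fails (ESPIPE), which is why the pipe case needs byte-at-a-time reads instead.

```python
import os
import tempfile


def read_command_seekable(fd: int, block_size: int = 4096) -> bytes:
    """Read one command from a seekable fd with a single block read, then
    seek back so the file offset points directly after the command.

    Simplification: if the block holds no newline, the whole block is
    returned and no seek is done.
    """
    chunk = os.read(fd, block_size)
    end = chunk.find(b"\n")
    if end == -1:
        return chunk
    command = chunk[: end + 1]
    overshoot = len(chunk) - len(command)
    # Hand back the bytes read past the command; raises OSError on a pipe.
    os.lseek(fd, -overshoot, os.SEEK_CUR)
    return command


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, b"some_command\ndata for the command\n")
        os.lseek(fd, 0, os.SEEK_SET)
        print(read_command_seekable(fd))  # b'some_command\n'
        print(os.read(fd, 1024))          # b'data for the command\n'
```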


    I am not the only one who found it: https://michael-prokop.at/blog/2017/05/18/debugging-a-mystery-ssh-causing-strange-exit-codes/

    There is even a Debian bug that clearly reports the POSIX conformance issue:

    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=862907


    So the bug, indeed, exists, but this is not a reason to switch to bash
    as any software has bugs.

    Even if it weren't covered by POSIX, I wouldn't call it "undefined
    behaviour", but "unspecified".
    I am not sure the term "undefined" even applies to shells (I guess
    not), but in the C world it means the compiler may do anything (think
    "0 != 0" or the whole program optimized to "return;") anywhere, since
    the program as a whole no longer has defined behaviour.

    https://en.cppreference.com/w/c/language/behavior

    Many modern languages do not have this concept (although they have
    race conditions, unspecified behaviour, etc.).


    As for using different shells for interactive and scripting use, with
    csh as an example:

    AFAIR, csh was originally created to replace the Bourne shell, and Bill
    Joy tried to make it more C-like for both interactive and programming
    use (why learn a new syntax if everyone in the UNIX world speaks C?).

    The Bourne shell was pretty weak back then: it didn't even have job
    control, so csh looked like the shiny future.
    Later, SysV gained the Korn shell, which was as powerful as csh but
    backward compatible with the Bourne shell.

    So, BSD had csh and SysV had ksh (which was not portable to BSD due to
    its license).

    POSIX required a Bourne-style shell named `sh`, so BSD had to have
    both csh and a Bourne shell to be POSIX-compliant.

    People started to write scripts in csh, but later on it was considered
    bad practice.
    There is a well-known document called "Csh Programming Considered
    Harmful" from 1996
    https://harmful.cat-v.org/software/csh

    Csh still exists on some BSDs because many people learned it long ago
    and do not want to change their habits.

    Do they regret this decision? I do not know, but OpenBSD, for example,
    adopted pdksh (a public-domain ksh) as it is free, light (lighter than
    bash), non-GPL and compatible with the Bourne shell. They do not use
    csh.

    https://misc.openbsd.narkive.com/Ekowx1ya/why-ksh

    I doubt that any shell was created to be "scripting only"; even ash
    wasn't. The reason it didn't have history was that its author believed
    history should be implemented by a tty driver called atty:
    https://groups.google.com/g/comp.sources.unix/c/A6cnyKX-Gq4/m/dGKOOmXndCcJ


    > . History. It seems to me that the csh history mechanism is
    > mostly a response to the deficiencies of UNIX terminal I/O.
    > Those of you running 4.2 BSD should try out atty (which I am
    > posting to the net at the same time as ash) and see if you
    > still want history.

    But he changed his mind soon after: https://www.in-ulm.de/~mascheck/various/ash/#44bsdalpha


    Having two implementations of a standard is good for the standard (if a
    standard has only a single implementation, I wouldn't call it a real
    standard), but it might cause compatibility issues between
    implementations and might be impractical.

    Anyway, I now see that dash is here for a good reason, as Debian can
    still run on 15-year-old hardware, tiny computers, etc.

    Ilya.
