• sort by multiple columns

    From Martin Τrautmann@21:1/5 to All on Wed Apr 19 09:27:12 2023
    Hi all,

    how do I sort by multiple columns?

    Example:
    +++
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
    Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
    Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
    Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
    Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
    Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
    Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
    Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
    Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
    Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
    Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
    Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
    Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
    Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
    Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
    Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
    Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
    Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
    Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
    Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
    Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
    Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
    +++

    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    So the result should be
    +++
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
    Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
    Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
    Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
    Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
    Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
    Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
    Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
    Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
    Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
    Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
    Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
    Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
    Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
    Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
    Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
    Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
    Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
    Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
    Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
    Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
    Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
    +++

    I tried both
    sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
    and
    sort -k4 -t";" -n -k2,2 -k3,3
    and some permutations and reversed orders, without success.
    The sort by column 4 just gets lost or resorted.

    I'm not sure about the man page
    -k, --key=POS1[,POS2]
    start a key at POS1, end it at POS2 (origin 1)

    So I tried relative positions with
    -k3,1
    as well, without success.

    How do I apply the sort syntax properly?

    Thanks
    Martin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Spiros Bousbouras@21:1/5 to Martin Trautmann on Wed Apr 19 08:43:15 2023
    On Wed, 19 Apr 2023 09:27:12 +0200
    Martin Trautmann <t-usenet@gmx.net> wrote:

    Hi all,

    how do I sort by multiple columns?

    Example:
    +++
    [...]
    +++

    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    So the result should be
    +++
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    [...]
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354

    Why are these 2 lines sorted this way ? Column 4 is the same ("1" in
    both) so it boils down to how "D9444" and "D9386" get sorted. What
    comes first and why ? It seems to me that "D9386" comes earlier than
    "D9444" .

    Your locale may also turn out to be relevant so you should mention
    that.

    Unrelated but the first letter of your last name is unicode codepoint
    3A4 which is the Greek upper case tau. Was this intentional or an
    accident ?

  • From Janis Papanagnou@21:1/5 to All on Wed Apr 19 10:44:05 2023
    On 19.04.2023 09:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?

    Example:
    +++
    [...]
    +++

    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    From that specification I'd write

    sort -t\; -k4n -k2 -k3

    but your expected data below doesn't follow your own spec. So the
    specification probably needs a correction.

    (Option -s for a "stable sort" may also be part of your solution.)

    Janis


    So the result should be
    +++
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
    Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
    Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
    Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
    Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
    Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
    Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
    Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
    Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
    Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
    Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
    Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
    Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
    Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
    Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
    Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
    Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
    Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
    Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
    Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
    Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
    Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
    +++

    I tried both
    sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
    and
    sort -k4 -t";" -n -k2,2 -k3,3
    and some permutations and reverted orders, without success.
    The sort by column 4 just gets lost or resorted.

    I'm not sure about the man page
    -k, --key=POS1[,POS2]
    start a key at POS1, end it at POS2 (origin 1)

    So I tried relative positions with
    -k3,1
    as well, without success.

    How do I apply the sort syntax properly?

    Thanks
    Martin


  • From Janis Papanagnou@21:1/5 to Janis Papanagnou on Wed Apr 19 11:51:41 2023
    On 19.04.2023 10:44, Janis Papanagnou wrote:
    On 19.04.2023 09:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?
    [...]

    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    From that specification I'd write

    sort -t\; -k4n -k2 -k3

    Oops... - make that

    sort -t\; -k4,4n -k2,2 -k3,3



    but your expected data below doesn't follow your own spec. So the specification probably needs a correction.

    You probably meant something like

    sort -t\; -k3,3 -k4,4n -k2,2


    Janis
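
A minimal sketch of why attaching the n to a single key matters, assuming a sort that follows the POSIX key rules and using two made-up records (not the poster's file). A global -n is inherited by every key that has no modifier of its own, so a text column is then compared as a number; LC_ALL=C is set only to make the result independent of the locale.

```shell
#!/bin/sh
# Hypothetical two-line sample (town;code;street;number), not the poster's file.
data='X;A;zeta;1
X;B;alpha;1'

# Global -n: every key without its own modifier is compared numerically.
# "zeta" and "alpha" both count as 0, so only the whole-line fallback decides.
global_n=$(printf '%s\n' "$data" | LC_ALL=C sort -t ';' -n -k 4,4 -k 3,3)

# Per-key n: field 4 is numeric, field 3 is compared as text.
# The ",4" and ",3" end each key at its own field; a bare -k4 would run to end of line.
per_key_n=$(printf '%s\n' "$data" | LC_ALL=C sort -t ';' -k 4,4n -k 3,3)

printf 'global -n:\n%s\n\nper-key n:\n%s\n' "$global_n" "$per_key_n"
```

With the global option the "zeta" record stays first; with the per-key modifier the "alpha" record moves to the front, which is the behaviour asked for in the question.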

  • From Martin Τrautmann@21:1/5 to Spiros Bousbouras on Wed Apr 19 12:39:22 2023
    On Wed, 19 Apr 2023 08:43:15 -0000 (UTC), Spiros Bousbouras wrote:
    So the result should be
    +++
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    [...]
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354

    Why are these 2 lines sorted this way ? Column 4 is the same ("1" in
    both) so it boils down to how "D9444" and "D9386" get sorted. What
    comes first and why ? It seems to me that "D9386" comes earlier than
    "D9444" .

    You took an example where both D9444 / D9386 and "Auf der Lindenstätte"
    / "Lindenstätte" differ.

    Your locale may also turn out to be relevant so you should mention
    that.

    LANG=en_US.UTF-8
    LC_ALL=en_US.UTF-8
    LC_CTYPE=UTF-8

    Unrelated but the first letter of your last name is unicode codepoint
    3A4 which is the Greek upper case tau. Was this intentional or an
    accident ?

    Unrelated: it's a check for proper UTF-8 handling within headers.

  • From Martin Τrautmann@21:1/5 to Janis Papanagnou on Wed Apr 19 12:42:01 2023
    On Wed, 19 Apr 2023 11:51:41 +0200, Janis Papanagnou wrote:
    On 19.04.2023 10:44, Janis Papanagnou wrote:
    On 19.04.2023 09:27, Martin Τrautmann wrote:
    You probably meant something like

    sort -t\; -k3,3 -k4,4n -k2,2

    Wow, that's just perfect. I did not know I could attach the n directly to
    the key option.

    Thanks!
    Martin

  • From Helmut Waitzmann@21:1/5 to All on Sat Apr 22 03:33:43 2023
    Martin Τραωτμανν <t-usenet@gmx.net>:

    how do I sort by multiple columns?


    [An example text…]


    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    […with sorted result]


    The sorted result of your example has apparently been sorted
    according to the following description:

    First, group the lines sorted by column 3, that is, sort the
    lines in a manner that results in alphabetically ascending values
    in column 3.

    Then, in each group of lines, that have got a common value in
    column 3, sort the lines independently in a manner that results
    in alphabetically ascending values in column 2.

    Then, in each group of lines that have got common values in
    columns 3 and 2 respectively, sort the lines independently in a
    manner that results in numerically ascending values in column 4.

    Finally, each group of lines that has got equal values in columns
    3, 2, and 4 according to the sort criteria as specified above, is
    sorted according to a default sorting criterion which comprises
    the whole line.

    This can be achieved using the following command line:


    sort -t ';' -k 3,3 -k 2,2 -k 4,4n


    You might read the description of the "sort" utility in the POSIX
    standard
    (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#top>),
    especially the last paragraph in the "OPTIONS" section
    (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>): 
    "When there are multiple key fields, later keys shall be compared
    only after all earlier keys compare equal.  Except when the -u
    option is specified, lines that otherwise compare equal shall be
    ordered as if none of the options -d, -f, -i, -n, or -k were
    present (but with -r still in effect, if it was specified) and
    with all bytes in the lines significant to the comparison.  The
    order in which lines that still compare equal are written is
    unspecified."
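
A small sketch of the fallback rule quoted above, using two made-up records whose key compares equal. It assumes a sort that accepts -s (a GNU and BSD extension mentioned earlier in the thread, not part of POSIX); LC_ALL=C only pins the collation.

```shell
#!/bin/sh
# Hypothetical records: both have the same value in field 2.
data='b;1
a;1'

# Default: the key compares equal, so the whole line decides ("a;1" comes first).
default_order=$(printf '%s\n' "$data" | LC_ALL=C sort -t ';' -k 2,2n)

# With -s the fallback comparison is skipped and ties keep their input order.
stable_order=$(printf '%s\n' "$data" | LC_ALL=C sort -s -t ';' -k 2,2n)

printf 'default:\n%s\nstable:\n%s\n' "$default_order" "$stable_order"
```

The default run reorders the two lines even though the only key is equal; the stable run leaves them as they arrived.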

  • From Martin Τrautmann@21:1/5 to Helmut Waitzmann on Sat Apr 22 08:57:55 2023
    On Sat, 22 Apr 2023 03:33:43 +0200, Helmut Waitzmann wrote:
    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    […with sorted result]


    The sorted result of your example has apparently been sorted
    according to the following description:

    First, group the lines sorted by column 3, that is, sort the
    lines in a manner that results in alphabetically ascending values
    in column 3.

    That concerns how the sort works.

    If I want to pre-sort by column 3 first, then sub-sort by column 2, that's
    fine. But when I pipe one sort to the other, the second sort will
    destroy the sort before. That's why I had my sort order in reversed
    order, using a pipe example.

    If all sorts can be done within a single command, the direct order works
    better. I had not been aware of the direct -k4,4n option, and I could not
    get the -n option to apply as desired.

    You might read the description of the "sort" utility in the POSIX
    standard
    (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#top>),
    especially the last paragraph in the "OPTIONS" section
    (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>): 
    "When there are multiple key fields, later keys shall be compared
    only after all earlier keys compare equal.  Except when the -u
    option is specified, lines that otherwise compare equal shall be
    ordered as if none of the options -d, -f, -i, -n, or -k were
    present (but with -r still in effect, if it was specified) and
    with all bytes in the lines significant to the comparison.  The
    order in which lines that still compare equal are written is
    unspecified."

    This description is much better than my man and info pages for sort, but
    unfortunately I can't be sure that the POSIX description actually applies
    to my local sort implementation:
    sort 5.93 November 2005

    AUTHOR
    Written by Mike Haertel and Paul Eggert.

    REPORTING BUGS
    Report bugs to <bug-coreutils@gnu.org>.

    COPYRIGHT
    Copyright (C) 2005 Free Software Foundation, Inc.

  • From Helmut Waitzmann@21:1/5 to All on Sun Apr 23 03:33:47 2023
    Martin Τrautmann <t-usenet@gmx.net>:
    On Sat, 22 Apr 2023 03:33:43 +0200, Helmut Waitzmann wrote:
    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    […with sorted result]


    The sorted result of your example has apparently been sorted
    according to the following description:

    First, group the lines sorted by column 3, that is, sort the
    lines in a manner that results in alphabetically ascending
    values in column 3.

    That's a matter of concern how the sort works.


    If I want to pre-sort by column 3 first, then sub-sort by column 2,
    that's fine. But when I pipe one sort to the other, the second
    sort will destroy the sort before. That's why I had my sort
    order in reversed order, using a pipe example.

    That won't help, either:  A sorting pipe using (a standard)
    "sort" won't solve the problem, because one cannot tell (a
    standard) "sort" to do a sort on the given key option only.  Each
    sort in the pipe will be total (according to its sort criteria)
    of its own.

    With GNU‐"sort", though, a sorting pipe can solve the problem, if
    one applies the "--stable" option to each (except the first) of
    the "sort" invocations.  Then the command

    sort --stable -t ';' -n -k 4,4 |
    sort --stable -t ';' -k 2,2 |
    sort --stable -t ';' -k 3,3

    will do the job.  (Unfortunately the "--stable" option is not
    part of the POSIX standard.)
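
As a rough check of the stable pipeline above, the sketch below compares it with a single multi-key invocation on a few made-up rows. It assumes a sort that accepts -s (the short form of --stable in GNU sort); the rows and the variable names are illustrative only.

```shell
#!/bin/sh
# Hypothetical sample rows (town;code;street;number), not the poster's file.
data='T;T2960;Lindenweg;12
T;D9444;Auf dem Lindenweg;2
T;T2960;Lindenweg;4
T;D9386;Lindenweg;1
T;D9444;Auf dem Lindenweg;1'

# Stable pipeline: least significant key first, most significant key last.
piped=$(printf '%s\n' "$data" |
  LC_ALL=C sort -s -t ';' -k 4,4n |
  LC_ALL=C sort -s -t ';' -k 2,2 |
  LC_ALL=C sort -s -t ';' -k 3,3)

# Single invocation: most significant key first.
single=$(printf '%s\n' "$data" | LC_ALL=C sort -t ';' -k 3,3 -k 2,2 -k 4,4n)

if [ "$piped" = "$single" ]; then verdict="same order"; else verdict="different order"; fi
echo "$verdict"
```

Note the reversed key order between the two forms: in the pipeline the last sort is the most significant one, in the single command the first -k is.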


    [A quote from the "sort" description in the POSIX standard]


    This description is much better than my man and info sort


    Yes, that's my experience, too.  I tend to read not only the
    manual page or info documentation but also look into the
    corresponding POSIX description (if the utility is part of the
    POSIX standard), and then check, whether the manual page or info
    documentation conflicts with the POSIX description.

    - but unfortunately I can't be sure that the POSIX info actually
    does work on my local sort implementation: sort 5.93 November
    2005

    Yes, that might happen.  In practice, GNU tries to follow the
    POSIX standard.

  • From Janis Papanagnou@21:1/5 to All on Sun Apr 23 14:02:49 2023
    On 23.04.2023 13:28, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
    If I want to pre-sort by column 3 first, then sub-sort by column 2,
    that's fine. But when I pipe one sort to the other, the second
    sort will destroy the sort before. That's why I had my sort
    order in reversed order, using a pipe example.

    That won't help, either: A sorting pipe using (a standard)
    "sort" won't solve the problem, because one cannot tell (a
    standard) "sort" to do a sort on the given key option only. Each
    sort in the pipe will be total (according to its sort criteria)
    of its own.

    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

    Because sorting on one criterion generally doesn't impose any
    restrictions on other criteria. That freedom allows sorting to be
    implemented very efficiently. But that's what stable sorting
    is for: to make provisions for specific ordering cases, namely
    how to handle the set of records with equal keys. With Unix's
    'sort' implementation being able to specify multiple keys,
    there's of course less need to split sorting across pipes into
    several distinct processes.

    Janis

  • From Martin Τrautmann@21:1/5 to Helmut Waitzmann on Sun Apr 23 13:28:22 2023
    On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
    If I want to pre-sort by column 3 first, then sub-sort by column 2,
    that's fine. But when I pipe one sort to the other, the second
    sort will destroy the sort before. That's why I had my sort
    order in reversed order, using a pipe example.

    That won't help, either: A sorting pipe using (a standard)
    "sort" won't solve the problem, because one cannot tell (a
    standard) "sort" to do a sort on the given key option only. Each
    sort in the pipe will be total (according to its sort criteria)
    of its own.

    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

  • From David W. Hodgins@21:1/5 to t-usenet@gmx.net on Sun Apr 23 09:43:06 2023
    On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

    It may be easier to understand if you use temporary files instead of pipes.

    Sort the input file by column 4, numerically, creating a first temporary file. Sort the first temporary file by column 2, creating a second temporary file. Sort the second temporary file by column 3, creating the output.

    The last sort doesn't know that the prior two sorts have been done. It just looks at the file it's given and sorts it by column 3.

    Using a pipe just takes the output of the first and second sort and uses it directly as input for the next sort. All the pipe does is eliminate the
    need for a temporary file.

    Keep in mind that when sorting a file, the last line in the input may end up becoming
    the first line in the output. The sort cannot write anything to the pipe or output file until it's sorted the entire input. With a pipe, the temporary
    file is in RAM rather than being a named file on disk.

    Regards, Dave Hodgins
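
The temporary-file view described above can be tried out roughly as follows. This is only a sketch with three made-up rows; it assumes mktemp -d is available (it is common but not required by POSIX), and the file names step1, step2 and so on are arbitrary.

```shell
#!/bin/sh
# Hypothetical sample rows (town;code;street;number), not the poster's file.
data='T;T2960;Lindenweg;12
T;D9444;Auf dem Lindenweg;2
T;T2960;Lindenweg;4'

tmpdir=$(mktemp -d)
printf '%s\n' "$data" > "$tmpdir/input"

# Three independent (non-stable) sorts, each reading the previous result.
LC_ALL=C sort -t ';' -k 4,4n "$tmpdir/input" > "$tmpdir/step1"
LC_ALL=C sort -t ';' -k 2,2 "$tmpdir/step1" > "$tmpdir/step2"
LC_ALL=C sort -t ';' -k 3,3 "$tmpdir/step2" > "$tmpdir/output"

# For comparison: sort the untouched input by column 3 only.
LC_ALL=C sort -t ';' -k 3,3 "$tmpdir/input" > "$tmpdir/direct"

if cmp -s "$tmpdir/output" "$tmpdir/direct"; then verdict="identical"; else verdict="different"; fi
result=$(cat "$tmpdir/output")
rm -rf "$tmpdir"

printf '%s\n%s\n' "$verdict" "$result"
```

The two files come out identical: the last sort decides everything, and within the "Lindenweg" group the whole-line fallback puts 12 before 4, so the numeric order from the first step is gone.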

  • From Kenny McCormack@21:1/5 to David W. Hodgins on Sun Apr 23 14:36:30 2023
    In article <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>,
    David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
    ...
    Keep in mind. When sorting a file, the last line in the input may end up
    becoming the first line in the output. The sort can not write anything to
    the pipe or output file until it's sorted the entire input. With a pipe,
    the temporary file is in ram rather than being a named file on disk.

    This actually raises an interesting point. Pipes are not infinite in size,
    and they could, theoretically block if enough is written on the write end without anything being read from the read end. Though the limits are
    likely very large nowadays on modern systems, I think the original implementation was only 4096 bytes and the standards today (POSIX) may not guarantee anything more than that (haven't checked).

    For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for
    the reason you explain above.

    Something to keep in mind if you ever decide to sort very large files in a pipeline. And it is probably a better idea not to do so; to sort it all at once, using multiple key specifications on the command line.


  • From Janis Papanagnou@21:1/5 to Kenny McCormack on Sun Apr 23 16:54:12 2023
    On 23.04.2023 16:36, Kenny McCormack wrote:
    [...]

    For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for the reason you explain above.

    Note also that quite a few sorting operations are performed implicitly
    (e.g. in 'ls', in the shell's '*' glob/pattern expansion, etc.).
    For example, don't expect find | xargs ls to provide a sorted
    output.


    Something to keep in mind if you ever decide to sort very large files in a pipeline. [...]

    In whatever way some instance of sort is implemented (memory, or
    temporary files, or whatever), my expectation is that
    whatever | sort
    will have to produce sorted output. Isn't that guaranteed?

    Janis
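
A quick way to probe that expectation is sketched below. It assumes seq is installed (it is not a POSIX utility) and pushes well over a megabyte of descending numbers through two pipes, far more than the 4096 bytes mentioned earlier; sort -c then only checks the order and prints nothing when the input is sorted.

```shell
#!/bin/sh
# 200000 lines in descending order go through a pipe into sort,
# and the sorted result goes through a second pipe into the checker.
if seq 200000 -1 1 | sort -n | sort -c -n; then
  status="sorted"
else
  status="not sorted"
fi
echo "$status"
```

The writer simply blocks whenever the pipe is full and continues once the reader has drained it, so the size of the pipe affects timing, not the result.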

  • From Kaz Kylheku@21:1/5 to Kenny McCormack on Sun Apr 23 15:51:53 2023
    On 2023-04-23, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    The bad case would be if a program produced a ton of output, but the reader didn't read any of it. I'll have to think some more as to whether or not that applies here.

    Limited pipe sizes cause two potential problems:

    - deadlock: programs that both read and write a pipe may work when
    tested with small messages, but lock up on larger ones.

    - atomicity of writes: a write of a number of bytes smaller
    than the pipe size can be read all together on the other end,
    so the reading end will work correctly without checking for
    a short read. When the message size exceeds the pipe size,
    that breaks.


  • From Kenny McCormack@21:1/5 to janis_papanagnou+ng@hotmail.com on Sun Apr 23 15:30:00 2023
    In article <u23gql$3rkl5$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 23.04.2023 16:36, Kenny McCormack wrote:
    [...]

    For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for the reason you explain above.

    Note also that there are quite some sorting operations inherently
    used (e.g. in 'ls', in shells '*' glob/pattern expansion, etc.).
    For example, don't expect find | xargs ls to provide a sorted
    output.


    Something to keep in mind if you ever decide to sort very large files in a pipeline. [...]

    In whatever way some instance of sort is implemented (memory, or
    temporary files, or whatever), my expectation is that
    whatever | sort
    will have to produce sorted output. Isn't that guaranteed?

    Actually, I may be wrong about this. May have posted too quickly.

    The bad case would be if a program produced a ton of output, but the reader didn't read any of it. I'll have to think some more as to whether or not
    that applies here.


  • From Spiros Bousbouras@21:1/5 to Spiros Bousbouras on Sun Apr 23 16:05:37 2023
    On Sun, 23 Apr 2023 15:51:42 -0000 (UTC)
    Spiros Bousbouras <spibou@gmail.com> wrote:
    I don't see the problem. If sort is on the left of a pipe then it will
    sort its whole input and then all it will do is write to the pipe. If sort is on the right of a pipe then in the beginning it will only do reading
    until it has read everything and then do the sorting. Obviously if you
    have process1 | process2 and one side does reading or writing (whatever applies) much slower than the other side then the fast side will block

    To be precise , *may* block if the amount of data going through the pipe is large enough.

    but there's nothing special with sort about that.

  • From Spiros Bousbouras@21:1/5 to Kenny McCormack on Sun Apr 23 15:51:42 2023
    On Sun, 23 Apr 2023 14:36:30 -0000 (UTC)
    gazelle@shell.xmission.com (Kenny McCormack) wrote:
    In article <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>,
    David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
    ...
    Keep in mind. When sorting a file, the last line in the input may end up becoming the first line in the output. The sort can not write anything to the pipe or output file until it's sorted the entire input. With a pipe, the temporary file is in ram rather than being a named file on disk.

    This actually raises an interesting point. Pipes are not infinite in size, and they could, theoretically block if enough is written on the write end without anything being read from the read end. Though the limits are
    likely very large nowadays on modern systems, I think the original implementation was only 4096 bytes and the standards today (POSIX) may not guarantee anything more than that (haven't checked).

    I tried to find an argument which you can give to getconf to get the
    answer to that but I didn't see anything. I don't think POSIX gives a constant (in some C header) to get the answer to that. There is PIPE_BUF but this is
    for atomic writes rather than total pipe capacity.

    For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for the reason you explain above.

    Something to keep in mind if you ever decide to sort very large files in a pipeline. And it is probably a better idea not to do so; to sort it all at once, using multiple key specifications on the command line.

    I don't see the problem. If sort is on the left of a pipe then it will
    sort its whole input and then all it will do is write to the pipe. If sort
    is on the right of a pipe then in the beginning it will only do reading
    until it has read everything and then do the sorting. Obviously if you
    have process1 | process2 and one side does reading or writing (whatever applies) much slower than the other side then the fast side will block but there's nothing special with sort about that. On the contrary , by the
    nature of what it does , sort will only do reading or writing during part
    of its operation.
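
For reference, the PIPE_BUF value mentioned above can be queried with getconf. A minimal sketch, assuming a getconf that accepts PIPE_BUF together with a path argument; the path / is just a convenient existing directory. The number reported is the atomic-write limit, not the total capacity of a pipe.

```shell
#!/bin/sh
# PIPE_BUF: largest write to a pipe that is guaranteed to be atomic.
# It is a per-path value, so getconf needs a pathname; "/" is an arbitrary choice.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF for /: $pipe_buf bytes"
```

POSIX requires this value to be at least 512 bytes; many systems report more.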


  • From David W. Hodgins@21:1/5 to Kenny McCormack on Sun Apr 23 12:26:48 2023
    On Sun, 23 Apr 2023 10:36:30 -0400, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    This actually raises an interesting point. Pipes are not infinite in size, and they could, theoretically block if enough is written on the write end without anything being read from the read end. Though the limits are
    likely very large nowadays on modern systems, I think the original implementation was only 4096 bytes and the standards today (POSIX) may not guarantee anything more than that (haven't checked).

    Just tested "sort bigfile|hexdump|less". htop shows it's using 917M of ram
    and 2.5GB of virtual storage (reserved, not all used) to sort a 730M input file.

    After ending the less output ...
    $ free -m
    total used free shared buff/cache available
    Mem: 15955 5715 2376 361 7863 9548
    Swap: 32761 2 32758

    There may be versions of sort that still limit how much RAM they can use, but the version from the coreutils package is not one of them. Its only limit is based on the amount of RAM and swap space available, and what the OOM killer can make available if you do start to run out.

    Also note it has options such as "--temporary-directory=DIR" to use disk files for temporary storage instead of ram.
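    As a rough sketch of how such options can be combined (assuming GNU coreutils sort; the scratch directory, the file names and the 1M buffer size are made up for illustration): -S caps the in-memory buffer and -T names the directory used for temporary files if the input does not fit into that buffer.

```shell
# Assumes GNU coreutils (sort, seq, mktemp); names and sizes are illustrative.
workdir=$(mktemp -d)                  # scratch directory for this demo
# Generate unsorted sample input: the numbers 20000 down to 1.
seq 20000 -1 1 > "$workdir/bigfile"
# -S limits sort's main-memory buffer; -T says where temporary
# files would be created if the data does not fit into it.
sort -n -S 1M -T "$workdir" "$workdir/bigfile" > "$workdir/sorted"
first=$(head -n 1 "$workdir/sorted")
last=$(tail -n 1 "$workdir/sorted")
echo "first=$first last=$last"
rm -rf "$workdir"                     # clean up the scratch directory
```

    Whether sort actually spills to disk here depends on the implementation and the input size; the point is only where the two knobs go on the command line.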

    Regards, Dave Hodgins

  • From Janis Papanagnou@21:1/5 to Spiros Bousbouras on Sun Apr 23 18:19:25 2023
    On 23.04.2023 17:51, Spiros Bousbouras wrote:

    [...] If sort
    is on the right of a pipe then in the beginning it will only do reading
    until it has read everything and then do the sorting. [...]

    This is [in principle] not necessarily the case. The sort algorithm
    can start to sort subsets of the stream to create runs of already
    sorted sequences. Mergesort, for example, is a good candidate for
    such a process; it can use (e.g.) Heapsort to create larger runs in
    memory and then needs fewer merge runs (which are typically costly
    if that's done over files). How much data the Heapsort will process
    may vary, but a size on the order of the pipe buffer is reasonable.
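    The run-then-merge idea can be sketched with standard tools. This is only an illustration, not a claim about how any particular sort is implemented: the runs are produced by calling sort on small chunks (rather than by a hand-written Heapsort), the chunk size of 3 lines and all file names are hypothetical, and the merge step uses sort -m, which merges already-sorted inputs.

```shell
# Sketch of an external merge sort built from standard utilities.
workdir=$(mktemp -d)
printf '%s\n' 7 3 9 1 8 2 6 4 5 > "$workdir/input"
# Step 1: cut the input into chunks of 3 lines and sort each chunk
# on its own, producing sorted "runs".
split -l 3 "$workdir/input" "$workdir/chunk."
for chunk in "$workdir"/chunk.*; do
    sort -n "$chunk" > "$workdir/run.${chunk##*.}"
done
# Step 2: merge the sorted runs; -m merges its inputs and relies on
# each of them being sorted already.
merged=$(sort -n -m "$workdir"/run.*)
printf '%s\n' "$merged"
rm -rf "$workdir"
```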

    Disclaimer: I don't know how Unix'es 'sort' is typically implemented,
    but I expect some sophisticated implementation, since what I wrote
    above is decades old knowledge (at least since the 1980's - when I
    implemented some hybrid sorting algorithms -, or maybe even back to
    Donald Knuth's work; but I don't recall whether it's covered in his
    "Sorting and Searching" book).

    Janis

  • From Richard Harnden@21:1/5 to Janis Papanagnou on Sun Apr 23 17:36:47 2023
    On 23/04/2023 17:19, Janis Papanagnou wrote:
    On 23.04.2023 17:51, Spiros Bousbouras wrote:

    [...] If sort
    is on the right of a pipe then in the beginning it will only do reading
    until it has read everything and then do the sorting. [...]

    This is [in principle] not necessarily the case. The sort algorithm
    can start to sort subsets of the stream to create runs of already
    sorted sequences. Mergesort, for example, is a good candidate for
    such a process; it can use (e.g.) Heapsort to create larger runs in
    memory and then needs fewer merge runs (which are typically costly
    if that's done over files). How much data the Heapsort will process
    may vary, but a size on the order of the pipe buffer is reasonable.

    Disclaimer: I don't know how Unix'es 'sort' is typically implemented,
    but I expect some sophisticated implementation, since what I wrote
    above is decades old knowledge (at least since the 1980's - when I implemented some hybrid sorting algorithms -, or maybe even back to
    Donald Knuth's work; but I don't recall whether it's covered in his "Sorting and Searching" book).

    My man page says:

    --radixsort
    Try to use radix sort, if the sort specifications allow.
    The radix sort can only be used for trivial locales (C and
    POSIX), and it cannot be used for numeric or month sort.
    Radix sort is very fast and stable.

    --mergesort
    Use mergesort. This is a universal algorithm that can
    always be used, but it is not always the fastest.

    --qsort
    Try to use quick sort, if the sort specifications allow.
    This sort algorithm cannot be used with -u and -s.

    --heapsort
    Try to use heap sort, if the sort specifications allow.
    This sort algorithm cannot be used with -u and -s.

  • From Janis Papanagnou@21:1/5 to Richard Harnden on Sun Apr 23 18:45:57 2023
    On 23.04.2023 18:36, Richard Harnden wrote:

    My man page says:

    Thanks for that, since my man page doesn't say anything about the
    algorithms. Now we have some clue what 'sort' on Unix does; and it
    seems that hybrid sorting algorithms aren't implemented; which is
    really strange since Quicksort implementations usually use Linear
    Sort for small partitions, and upthread I already spoke about the Mergesort/Heapsort hybrid. (Room for improvement? Or are they just
    presuming that everything is doable with an arbitrarily large virtual
    memory? Who knows.)


    --radixsort
    Try to use radix sort, if the sort specifications allow.
    The radix sort can only be used for trivial locales (C and
    POSIX), and it cannot be used for numeric or month sort.
    Radix sort is very fast and stable.

    --mergesort
    Use mergesort. This is a universal algorithm that can
    always be used, but it is not always the fastest.

    --qsort
    Try to use quick sort, if the sort specifications allow.
    This sort algorithm cannot be used with -u and -s.

    --heapsort
    Try to use heap sort, if the sort specifications allow.
    This sort algorithm cannot be used with -u and -s.


    Janis

  • From Helmut Waitzmann@21:1/5 to All on Sun Apr 23 21:52:22 2023
    Martin Τrautmann <t-usenet@gmx.net>:
    On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
    If I want to pre-sort by 3 first, then sub-sort by column 2,
    that's fine. But when I pipe one sort to the other, the second
    sort will destroy the sort before. That's why i had my sort
    order in reverted order, using a pipe example.

    That won't help, either:  A sorting pipe using (a standard)
    "sort" won't solve the problem, because one cannot tell (a
    standard) "sort" to do a sort on the given key option only. 
    Each sort in the pipe will be total (according to its sort
    criteria) of its own.

    That was my problem - I expected that a pipe through several
    sorts would keep the order. I don't know why it doesn't.


    Look at these sample lines:


    1;0
    1;1
    1;2
    0;0
    0;1
    0;2
    2;0
    2;1
    2;2


    To have this sequence of lines sorted in such a way that the
    first field is sorted in ascending numeric order while the second
    is sorted in descending numeric order, one could specify the two
    sort criteria at once:

    sort -t ';' -k 1nb,1 -k 2nr,2


    How would the command line be if one would use two "sort"
    invocations with each of them getting only one "-k" option
    (replacing the "???" by the appropriate sort key specifications)?

    first=??? ; second=???
    sort -t ';' -k "$first" |
    sort -t ';' -k "$second"

    Or (if it's easier to understand, but it's equivalent) use an
    intermediate file rather than a pipe:

    first=??? ; second=???
    sort -t ';' -k "$first" > file &&
    sort -t ';' -k "$second" -- file


    Try to answer the following questions:


    Would the variable assignments


    first=2nr,2
    second=1nb,1

    yield the correct result?  Why or why not?  Would they work if
    one adds the GNU‐"sort" "--stable" option to the second "sort"
    invocation?  Why or why not?

    When using the variant with the intermediate file, after having
    run the first "sort" invocation, you might examine the
    intermediate file and try to predict what would be the outcome of
    the second "sort" invocation.

  • From David W. Hodgins@21:1/5 to John-Paul Stewart on Sun Apr 23 16:06:47 2023
    On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart <jpstewart@personalprojects.net> wrote:
    So pipes on Linux aren't very large at all. I don't know how other Unix systems compare.

    The pipe only has to store a minimum of one buffer of data. If the process writing data to the pipe is faster than the one reading it, then the write process will block while it waits for the reading process to catch up.
    Likewise if the reading process is faster. It will just block while it waits for the data to be ready.

    Having more buffers will speed it up only if the processes run at different
    speeds with the slower one being inconsistent in its speed.

    A good example of that is sort somefile | less.

    Suppose the user presses page down repeatedly. Each time the faster sort process has written enough data to fill the buffers, it gets blocked from writing until the page down key is pressed and the less command reads the data for the next screenful, freeing up some of the buffer space.

    Note that when I write that the sort command is faster, by the time the first screenful shows up in less, all of the data has been sorted; it just needs
    to be written to the output. Until the data is sorted, the less command is blocked, waiting for input.

    Regards, Dave Hodgins

  • From John-Paul Stewart@21:1/5 to Kenny McCormack on Sun Apr 23 15:42:00 2023
    On 4/23/23 10:36, Kenny McCormack wrote:
    This actually raises an interesting point. Pipes are not infinite in size, and they could, theoretically block if enough is written on the write end without anything being read from the read end. Though the limits are
    likely very large nowadays on modern systems, I think the original implementation was only 4096 bytes and the standards today (POSIX) may not guarantee anything more than that (haven't checked).

    FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
    section that says in part:

    Before Linux 2.6.11, the capacity of a pipe was the same as the
    system page size (e.g., 4096 bytes on i386). Since Linux
    2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
    system with a page size of 4096 bytes). Since Linux 2.6.35,
    the default pipe capacity is 16 pages, but the capacity can be
    queried and set using the fcntl(2) F_GETPIPE_SZ and
    F_SETPIPE_SZ operations. See fcntl(2) for more information.

    So pipes on Linux aren't very large at all. I don't know how other Unix systems compare.

  • From Lew Pitcher@21:1/5 to John-Paul Stewart on Sun Apr 23 20:41:37 2023
    On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart wrote:

    On 4/23/23 10:36, Kenny McCormack wrote:
    This actually raises an interesting point. Pipes are not infinite in size,
    and they could, theoretically block if enough is written on the write end
    without anything being read from the read end. Though the limits are
    likely very large nowadays on modern systems, I think the original
    implementation was only 4096 bytes and the standards today (POSIX) may not
    guarantee anything more than that (haven't checked).

    FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
    section that says in part:

    Before Linux 2.6.11, the capacity of a pipe was the same as the
    system page size (e.g., 4096 bytes on i386). Since Linux
    2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
    system with a page size of 4096 bytes). Since Linux 2.6.35,
    the default pipe capacity is 16 pages, but the capacity can be
    queried and set using the fcntl(2) F_GETPIPE_SZ and
    F_SETPIPE_SZ operations. See fcntl(2) for more information.

    And fcntl(2) says
    F_SETPIPE_SZ (int; since Linux 2.6.35)
    Change the capacity of the pipe referred to by fd to be at least
    arg bytes. An unprivileged process can adjust the pipe capacity
    to any value between the system page size and the limit defined
    in /proc/sys/fs/pipe-max-size (see proc(5)).

    On my Linux (untuned 4.4.301 kernel), /proc/sys/fs/pipe-max-size
    is set to 1 MiB:
    16:35 $ cat /proc/sys/fs/pipe-max-size
    1048576

    So pipes on Linux aren't very large at all.

    ... unless you tune them upward.

    I don't know how other Unix systems compare.

    I've seen some studies; Linux pipe buffer sizes seem comparable to
    other systems, which range in the 20K to 64K default size range, and
    top out at about 1 MB.

    HTH
    --
    Lew Pitcher
    "In Skills We Trust"

  • From Helmut Waitzmann@21:1/5 to All on Sun Apr 23 22:30:24 2023
    Helmut Waitzmann <nn.throttle@xoxy.net>:
    Look at these sample lines:


    1;0
    1;1
    1;2
    0;0
    0;1
    0;2
    2;0
    2;1
    2;2


    To have this sequence of lines sorted in such a way that the
    first field is sorted in ascending numeric order while the
    second is sorted in descending numeric order,

    I'm sorry, that is a quite misleading description.  What I wanted
    to say is that the sequence of lines should be sorted to look
    like

    0;2
    0;1
    0;0
    1;2
    1;1
    1;0
    2;2
    2;1
    2;0

    and to achieve this…

    one could specify the two sort criteria at once:

    sort -t ';' -k 1nb,1 -k 2nr,2
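
    For anyone who wants to try it, here is the command applied to the nine sample lines. The variable names are only for illustration, and LC_ALL=C is set just to keep the run independent of the local collation settings.

```shell
# The nine sample lines in their original order.
sample='1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2'
# Field 1 ascending numeric, then field 2 descending numeric,
# both keys given to a single sort invocation.
result=$(printf '%s\n' "$sample" | LC_ALL=C sort -t ';' -k 1nb,1 -k 2nr,2)
printf '%s\n' "$result"
```

    This prints the nine lines in the target order listed above, starting with 0;2 and ending with 2;0.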

  • From vallor@21:1/5 to John-Paul Stewart on Mon Apr 24 14:05:39 2023
    On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart wrote:

    On 4/23/23 10:36, Kenny McCormack wrote:
    This actually raises an interesting point. Pipes are not infinite in
    size,
    and they could, theoretically block if enough is written on the write
    end without anything being read from the read end. Though the limits
    are likely very large nowadays on modern systems, I think the original
    implementation was only 4096 bytes and the standards today (POSIX) may
    not guarantee anything more than that (haven't checked).

    FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
    section that says in part:

    Before Linux 2.6.11, the capacity of a pipe was the same as the
    system page size (e.g., 4096 bytes on i386). Since Linux
    2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
    system with a page size of 4096 bytes). Since Linux 2.6.35, the
    default pipe capacity is 16 pages, but the capacity can be queried
    and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ
    operations. See fcntl(2) for more information.

    So pipes on Linux aren't very large at all. I don't know how other Unix systems compare.

    Could the actual pipe size perhaps be queried
    and set with "ulimit"?

    $ ulimit -a
    [...]
    pipe size (512 bytes, -p) 8
    [...]

    With: GNU bash, version 5.1.16
    ("help ulimit" for docs on the shell built-in...)

    --
    -v (Scott)

  • From Janis Papanagnou@21:1/5 to vallor on Mon Apr 24 17:03:57 2023
    On 24.04.2023 16:05, vallor wrote:

    Could the actual pipe size perhaps be queried
    and set with "ulimit"?

    $ ulimit -a
    [...]
    pipe size (512 bytes, -p) 8
    [...]

    With: GNU bash, version 5.1.16
    ("help ulimit" for docs on the shell built-in...)

    It's quite funny that every shell has its own formats; in bash you
    have to do the math (8x512) while in ksh it's 4096. Other quantities
    have different scaling, e.g. bytes vs. Kibytes. And some have units
    not defined (in ulimit or ulimit --man), like "blocks".

    # bash
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    file size (blocks, -f) unlimited

    # ksh
    pipe buffer size (bytes) (-p) 4096
    message queue size (Kibytes) (-q) 800
    file size (blocks) (-f) unlimited

    And zsh's ulimit "doesn't know" pipe size?

    Janis

  • From Kaz Kylheku@21:1/5 to David W. Hodgins on Mon Apr 24 16:50:42 2023
    On 2023-04-23, David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
    On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart <jpstewart@personalprojects.net> wrote:
    So pipes on Linux aren't very large at all. I don't know how other Unix
    systems compare.

    The pipe only has to store a minimum of one buffer of data. If the process

    In fact, I suspect, a pipe doesn't have to store anything. It can be a
    pure rendezvous. The write() call can block until the reader performs a read(), or vice versa, at which time MIN(read_size, write_size) bytes
    can be transferred directly between their respective buffers, that value
    then being returned from the read and write.

  • From Geoff Clare@21:1/5 to Janis Papanagnou on Tue Apr 25 13:50:14 2023
    Janis Papanagnou wrote:

    On 24.04.2023 16:05, vallor wrote:

    Could the actual pipe size perhaps be queried
    and set with "ulimit"?

    $ ulimit -a
    [...]
    pipe size (512 bytes, -p) 8
    [...]

    With: GNU bash, version 5.1.16
    ("help ulimit" for docs on the shell built-in...)

    It's quite funny that every shell has its own formats; in bash you
    have to do the math (8x512) while in ksh it's 4096.

    I believe the value ulimit is giving here is PIPE_BUF, not the
    capacity of the pipe.

    On my Linux system, much more than 4096 bytes can be written to
    a pipe without anything being read from it:

    $ dd if=/dev/zero | sleep 10
    ^C129+0 records in
    128+0 records out
    65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s

    (I used Ctrl-C to send dd a SIGINT.)

    $ ulimit -a | grep pipe
    pipe size (512 bytes, -p) 8
    $ getconf PIPE_BUF .
    4096
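
    A small sketch of that query in script form. PIPE_BUF is the largest write that is guaranteed to be atomic, not the capacity of the pipe, and as far as I know POSIX only requires it to be at least 512 bytes (_POSIX_PIPE_BUF). The variable name and the choice of / as the path are just for illustration.

```shell
# Ask for PIPE_BUF as it applies to pipes/FIFOs under the given path.
pipe_buf=$(getconf PIPE_BUF /)
echo "PIPE_BUF for / is $pipe_buf bytes"
# Compare against the POSIX minimum of 512 bytes.
if [ "$pipe_buf" -ge 512 ]; then
    echo "meets the POSIX minimum of 512 bytes"
fi
```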

    In any case, on some systems "pipe capacity" is not a simple concept.
    SVR4's STREAMS-based pipes have separate high-water and low-water
    thresholds. (The writer blocks when high-water is reached but
    doesn't unblock until enough has been read to take the level below
    low-water.)

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Kenny McCormack@21:1/5 to netnews@gclare.org.uk on Tue Apr 25 13:29:48 2023
    In article <6tukhj-bl1.ln1@ID-313840.user.individual.net>,
    Geoff Clare <netnews@gclare.org.uk> wrote:
    ...
    On my Linux system, much more than 4096 bytes can be written to
    a pipe without anything being read from it:

    $ dd if=/dev/zero | sleep 10
    ^C129+0 records in
    128+0 records out
    65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s

    (I used Ctrl-C to send dd a SIGINT.)

    Didn't somebody say upthread that the default limit on Linux is 64K?
    So, kinda funny that you chose exactly 64K for your demonstration.

    Anyway, you can (according to those same people) bump it up to 1M if
    needed.

    --
    People who want to share their religious views with you
    almost never want you to share yours with them. -- Dave Barry

  • From David W. Hodgins@21:1/5 to Kenny McCormack on Tue Apr 25 11:01:06 2023
    On Tue, 25 Apr 2023 09:29:48 -0400, Kenny McCormack <gazelle@shell.xmission.com> wrote:

    In article <6tukhj-bl1.ln1@ID-313840.user.individual.net>,
    Geoff Clare <netnews@gclare.org.uk> wrote:
    ...
    On my Linux system, much more than 4096 bytes can be written to
    a pipe without anything being read from it:

    $ dd if=/dev/zero | sleep 10
    ^C129+0 records in
    128+0 records out
    65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s

    (I used Ctrl-C to send dd a SIGINT.)

    Didn't somebody say upthread that the default limit on Linux is 64K?
    So, kinda funny that you chose exactly 64K for your demonstration.

    Anyway, you can (according to those same people) bump it up to 1M if
    needed.

    It stopped after filling the output buffer, not the pipe. That data was still waiting to be written to the pipe when the dd command was terminated.

    Regards, Dave Hodgins

  • From Geoff Clare@21:1/5 to David W. Hodgins on Wed Apr 26 13:20:15 2023
    David W. Hodgins wrote:

    On Tue, 25 Apr 2023 09:29:48 -0400, Kenny McCormack <gazelle@shell.xmission.com> wrote:

    In article <6tukhj-bl1.ln1@ID-313840.user.individual.net>,
    Geoff Clare <netnews@gclare.org.uk> wrote:
    ...
    On my Linux system, much more than 4096 bytes can be written to
    a pipe without anything being read from it:

    $ dd if=/dev/zero | sleep 10
    ^C129+0 records in
    128+0 records out
    65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s

    (I used Ctrl-C to send dd a SIGINT.)

    Didn't somebody say upthread that the default limit on Linux is 64K?
    So, kinda funny that you chose exactly 64K for your demonstration.

    I didn't actively choose 64K. I haven't ever changed the pipe size on
    a Linux system, so the size used was whatever is the default.

    Anyway, you can (according to those same people) bump it up to 1M if
    needed.

    It stopped after filling the output buffer, not the pipe. That data was still waiting to be written to the pipe when the dd command was terminated.

    Only one block (of 512 bytes) was waiting to be written. A feature
    of dd is that it reads and writes exactly the block sizes you tell it
    to (or 512 bytes by default). The dd output:

    128+0 records out

    means it had successfully written 128 blocks (of 512 bytes) to the pipe
    when it exited. The "129+0 records in" is what shows it had read one
    extra block that was waiting to be written.

    If I tell dd to read and write one byte at a time, it does exactly that:

    $ dd bs=1 if=/dev/zero | sleep 10
    ^C65537+0 records in
    65536+0 records out
    65536 bytes (66 kB, 64 KiB) copied, 4.88295 s, 13.4 kB/s

    --
    Geoff Clare <netnews@gclare.org.uk>

  • From Eric Pozharski@21:1/5 to Janis Papanagnou on Wed Apr 26 14:56:44 2023
    with <u265ou$csqa$1@dont-email.me> Janis Papanagnou wrote:
    On 24.04.2023 16:05, vallor wrote:

    Could the actual pipe size perhaps be queried and set with "ulimit"?
    *SKIP*
    And zsh's ulimit "doesn't know" pipe size?

    Funny thing, looking through /usr/include/**/resource.h suggests that size
    of pipe has nothing to do with setrlimit(2) or ulimit(3). Weird.

    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom

  • From Martin Τrautmann@21:1/5 to David W. Hodgins on Sat Apr 29 12:01:55 2023
    On Sun, 23 Apr 2023 09:43:06 -0400, David W. Hodgins wrote:
    On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

    It may be easier to understand if you use temporary files instead of pipes.

    Sort the input file by column 4, numerically, creating a first temporary file.
    Sort the first temporary file by column 2 creating a second temporary file. Sort the second temporary file by column 3 creating the output.

    The last sort doesn't know that the prior two sorts have been done. It just looks at the file it's given and sorts it by column 3.

    Using a pipe just takes the output of the first and second sort and uses it directly as input for the next sort. All the pipe does is eliminate the
    need for a temporary file.

    But if I sort by one column only, then through the pipe by another
    column only, the second sort SHOULD respect the previous sort.
    Unfortunately, I feel it doesn't.

    Keep in mind: when sorting a file, the last line in the input may end up becoming
    the first line in the output. The sort cannot write anything to the pipe or output file until it's sorted the entire input. With a pipe, the temporary file is in RAM rather than being a named file on disk.

    So the sort via a file actually should work the same as via the pipe?
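
    One way to check this for oneself is to run the same two sorts both ways and compare the results. This is a sketch only; the scratch directory and file names are hypothetical, and the keys are the ones from the small ";"-separated sample used elsewhere in the thread.

```shell
workdir=$(mktemp -d)
printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2' > "$workdir/input"
# Variant 1: two sorts connected by a pipe.
via_pipe=$(LC_ALL=C sort -t ';' -k 2nr,2 "$workdir/input" |
           LC_ALL=C sort -t ';' -k 1n,1)
# Variant 2: the same two sorts with a named temporary file in between.
LC_ALL=C sort -t ';' -k 2nr,2 "$workdir/input" > "$workdir/tmp"
via_file=$(LC_ALL=C sort -t ';' -k 1n,1 "$workdir/tmp")
# The second sort sees the same bytes either way, so the results match.
if [ "$via_pipe" = "$via_file" ]; then
    echo "pipe and temporary file give the same result"
else
    echo "results differ"
fi
rm -rf "$workdir"
```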

  • From Martin Τrautmann@21:1/5 to Helmut Waitzmann on Sat Apr 29 12:05:17 2023
    On Sun, 23 Apr 2023 22:30:24 +0200, Helmut Waitzmann wrote:
    Helmut Waitzmann <nn.throttle@xoxy.net>:
    Look at these sample lines:


    1;0
    1;1
    1;2
    0;0
    0;1
    0;2
    2;0
    2;1
    2;2


    To have this sequence of lines sorted in such a way that the
    first field is sorted in ascending numeric order while the
    second is sorted in descending numeric order,

    I'm sorry, that is a quite misleading description.  What I wanted
    to say is that the sequence of lines should be sorted to look
    like

    0;2
    0;1
    0;0
    1;2
    1;1
    1;0
    2;2
    2;1
    2;0

    and to achieve this…

    one could specify the two sort criteria at once:

    sort -t ';' -k 1nb,1 -k 2nr,2

    Would you achieve this via a pipe as well?

  • From Chris Elvidge@21:1/5 to All on Sat Apr 29 12:01:14 2023
    On 29/04/2023 11:01, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 09:43:06 -0400, David W. Hodgins wrote:
    On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

    It may be easier to understand if you use temporary files instead of pipes.

    Sort the input file by column 4, numerically, creating a first temporary file.
    Sort the first temporary file by column 2 creating a second temporary file.
    Sort the second temporary file by column 3 creating the output.

    The last sort doesn't know that the prior two sorts have been done. It just
    looks at the file it's given and sorts it by column 3.

    Using a pipe just takes the output of the first and second sort and uses it
    directly as input for the next sort. All the pipe does is eliminate the
    need for a temporary file.

    But if I sort by one column only, then through the pipe by another
    column only, the second sort SHOULD respect the previous sort.
    Unfortunately, I feel it doesn't.

    Of course it doesn't. How does the second sort know that the first sort
    even happened?


    Keep in mind: when sorting a file, the last line in the input may end up becoming
    the first line in the output. The sort cannot write anything to the pipe or
    output file until it's sorted the entire input. With a pipe, the temporary
    file is in RAM rather than being a named file on disk.

    So the sort via a file actually should work the same as via the pipe?



    --
    Chris Elvidge
    England

  • From Martin Τrautmann@21:1/5 to All on Sat Apr 29 12:17:24 2023
    On Sat, 29 Apr 2023 12:05:17 +0200, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 22:30:24 +0200, Helmut Waitzmann wrote:
    Helmut Waitzmann <nn.throttle@xoxy.net>:
    Look at these sample lines:


    1;0
    1;1
    1;2
    0;0
    0;1
    0;2
    2;0
    2;1
    2;2


    To have this sequence of lines sorted in such a way that the
    first field is sorted in ascending numeric order while the
    second is sorted in descending numeric order,

    I'm sorry, that is a quite misleading description.  What I wanted
    to say is that the sequence of lines should be sorted to look
    like

    0;2
    0;1
    0;0
    1;2
    1;1
    1;0
    2;2
    2;1
    2;0

    and to achieve this…

    one could specify the two sort criteria at once:

    sort -t ';' -k 1nb,1 -k 2nr,2

    Would you achieve this via a pipe as well?

    When I sort by column 2 first and only, I end up with
    0;2
    1;2
    2;2
    0;1
    1;1
    2;1
    0;0
    1;0
    2;0

    Why that? I would expect
    1;2
    0;2
    2;2
    1;1
    0;1
    2;1
    1;0
    0;0
    2;0

    So why does it resort by first column as well? Since it does that, both
    a pipe and a second sort from a temporary file still fail, since they
    also ignore the temporary sort of the other column.

  • From Martin Τrautmann@21:1/5 to Chris Elvidge on Sat Apr 29 14:12:28 2023
    On Sat, 29 Apr 2023 12:01:14 +0100, Chris Elvidge wrote:
    On 29/04/2023 11:01, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 09:43:06 -0400, David W. Hodgins wrote:
    On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
    That was my problem - I expected that a pipe through several sorts would
    keep the order. I don't know why it doesn't.

    It may be easier to understand if you use temporary files instead of pipes.

    Sort the input file by column 4, numerically, creating a first temporary file.
    Sort the first temporary file by column 2 creating a second temporary file.
    Sort the second temporary file by column 3 creating the output.

    The last sort doesn't know that the prior two sorts have been done. It just
    looks at the file it's given and sorts it by column 3.

    Using a pipe just takes the output of the first and second sort and uses it
    directly as input for the next sort. All the pipe does is eliminate the
    need for a temporary file.

    But if I sort by one column only, then through the pipe by another
    column only, the second sort SHOULD respect the previous sort.
    Unfortunately, I feel it doesn't.

    Of course it doesn't. How does the second sort know that the first sort
    even happened?

    It should sort on the given column only, but keep anything else as it
    was. I guess that's my misconception - however, sort seems to be allowed
    to re-sort anything else however it likes. That's the difference e.g. to
    an Excel spreadsheet, which does keep the former sort.

  • From Richard Harnden@21:1/5 to All on Sat Apr 29 13:38:58 2023
    On 29/04/2023 13:12, Martin Τrautmann wrote:
    On Sat, 29 Apr 2023 12:01:14 +0100, Chris Elvidge wrote:
    On 29/04/2023 11:01, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 09:43:06 -0400, David W. Hodgins wrote:
    On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
    That was my problem - I expected that a pipe through several sorts would >>>>> keep the order. I don't know why it doesn't.

    It may be easier to understand if you use a temporary files instead of pipes.

    Sorting the input file by column 4, numerical creating a first temporary file.
    Sort the first temporary file by column 2 creating a second temporary file.
    Sort the second temporary file by column 3 creating the output.

    The last sort doesn't know that the prior two sorts have been done. It just
    looks at the file it's giving and sorts it by column 3.

    Using a pipe just takes the output of the first and second sort and uses it
    directly as input for the next sort. All the pipe does is eliminate the
    need for a temporary file.

    But if I sort by one column only, then through the pipe by another
    column only, the second sort SHOULD respect the previous sort.
    Unfortunately, I feel it doesn't.

    Of course it doesn't. How does the second sort know that the first sort
    even happened?

    It should sort on the given column only, but keep anything else as it
    was. I guess that's my misconception - however, sort seems to be allowed
    to resort anything else however it likes. That's the difference e.g. to
    an excel spreadsheet, which does keep the former sort.

    You want a stable sort, then. Check if you have a '-s' option.
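
One way to act on that advice is to probe the local sort directly. The following is only a sketch: the function name have_stable_sort is made up for illustration, and it assumes that an implementation without -s rejects the option with a non-zero exit status.

```shell
#!/bin/sh
# Probe whether the local sort accepts -s and really keeps tied lines in
# input order. The two probe lines tie on field 2, so a stable sort must
# leave "b;1" ahead of "a;1".
# have_stable_sort is a hypothetical helper name, not a standard tool.
have_stable_sort() {
    probe=$(printf 'b;1\na;1\n' | sort -s -t ';' -k 2,2 2>/dev/null) || return 1
    [ "$probe" = "$(printf 'b;1\na;1\n')" ]
}

if have_stable_sort; then
    echo "sort -s is available"
else
    echo "no stable sort here; use one invocation with several -k options"
fi
```

If the probe fails, a single sort call with all keys listed in priority order does not need -s at all.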

  • From Martin Τrautmann@21:1/5 to Richard Harnden on Sat Apr 29 20:23:18 2023
    On Sat, 29 Apr 2023 13:38:58 +0100, Richard Harnden wrote:
    It should sort on the given column only, but keep anything else as it
    was. I guess that's my misconception - however, sort seems to be allowed
    to resort anything else however it likes. That's the difference e.g. to
    an excel spreadsheet, which does keep the former sort.

    You want a stable sort, then. Check if you have a '-s' option.

    wow, cool

    -s, --stable
    stabilize sort by disabling last-resort comparison

    I do not understand what that means. But it worked

    1;2
    0;2
    2;2
    1;1
    0;1
    2;1
    1;0
    0;0
    2;0
    ...
    0;2
    0;1
    0;0
    1;2
    1;1
    1;0
    2;2
    2;1
    2;0

  • From David W. Hodgins@21:1/5 to t-usenet@gmx.net on Sat Apr 29 16:45:15 2023
    On Sat, 29 Apr 2023 14:23:18 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:

    On Sat, 29 Apr 2023 13:38:58 +0100, Richard Harnden wrote:
    It should sort on the given column only, but keep anything else as it
    was. I guess that's my misconception - however, sort seems to be allowed
    to resort anything else however it likes. That's the difference e.g. to
    an excel spreadsheet, which does keep the former sort.

    You want a stable sort, then. Check if you have a '-s' option.

    wow, cool

    -s, --stable
    stabilize sort by disabling last-resort comparison

    I do not understand what that means. But it worked

    See https://unix.stackexchange.com/questions/64102/why-is-sort-changing-the-order-of-lines-with-identical-sort-keys

    Regards, Dave Hodgins

  • From Lew Pitcher@21:1/5 to All on Sat Apr 29 20:33:24 2023
    On Sat, 29 Apr 2023 20:23:18 +0200, Martin Τrautmann wrote:

    On Sat, 29 Apr 2023 13:38:58 +0100, Richard Harnden wrote:
    It should sort on the given column only, but keep anything else as it
    was. I guess that's my misconception - however, sort seems to be allowed
    to resort anything else however it likes. That's the difference e.g. to
    an excel spreadsheet, which does keep the former sort.

    You want a stable sort, then. Check if you have a '-s' option.

    wow, cool

    -s, --stable
    stabilize sort by disabling last-resort comparison

    I do not understand what that means. But it worked

    From the option summary, the meaning is less than obvious. However
    some versions of the manpage include an explanation:

    "A pair of lines is compared as follows: if any key fields have
    been specified, 'sort' compares each pair of fields, in the
    order specified on the command line, according to the associated
    ordering options, until a difference is found or no fields are
    left.
    ...
    Finally, as a last resort when all keys compare equal (or if no
    ordering options were specified at all), 'sort' compares the
    entire lines. ... The '-s' (stable) option disables this
    last-resort comparison so that lines in which all fields
    compare equal are left in their original relative order.
    ..."

    In the case of a file that has already been sorted, either on a
    key occurring before the key-to-be-sorted, or on a key that follows
    (but is not adjacent to) the key-to-be sorted, this "last resort
    comparison" may result in a record that sorts out-of-sequence
    with respect to the prior sort order. To ensure that the order
    from a prior sort is not lost, you have to disable this "last
    resort comparison".
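
The effect of the last-resort comparison can be seen on the nine-line sample used elsewhere in this thread. This is a sketch assuming GNU sort (for -s); LC_ALL=C is set only so that the tie-breaking is byte-wise and does not depend on the locale.

```shell
#!/bin/sh
# Nine sample lines, two numeric fields separated by ';'.
sample='1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2'

# Default behaviour: lines that tie on field 2 are re-ordered by
# comparing the whole line (the "last-resort comparison").
with_last_resort=$(printf '%s\n' "$sample" | LC_ALL=C sort -t ';' -k 2nr,2)

# With -s: lines that tie on field 2 stay in their input order.
stable=$(printf '%s\n' "$sample" | LC_ALL=C sort -s -t ';' -k 2nr,2)

printf 'default:\n%s\n\nwith -s:\n%s\n' "$with_last_resort" "$stable"
```

In the default run the group with a 2 in the second field comes out as 0;2, 1;2, 2;2, while with -s it keeps the input order 1;2, 0;2, 2;2.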

    [snip]

    HTH
    --
    Lew Pitcher
    "In Skills We Trust"

  • From Martin Τrautmann@21:1/5 to Lew Pitcher on Sat Apr 29 22:53:51 2023
    On Sat, 29 Apr 2023 20:33:24 -0000 (UTC), Lew Pitcher wrote:
    From the option summary, the meaning is less than obvious. However
    some versions of the manpage include an explanation:

    "A pair of lines is compared as follows: if any key fields have
    been specified, 'sort' compares each pair of fields, in the
    order specified on the command line, according to the associated
    ordering options, until a difference is found or no fields are
    left.
    ...

    Finally, as a last resort when all keys compare equal (or if no
    ordering options were specified at all), 'sort' compares the
    entire lines. ...

    That is unexpected, but does explain why it behaves that way.

    In the case of a file that has already been sorted, either on a
    key occurring before the key-to-be-sorted, or on a key that follows
    (but is not adjacent to) the key-to-be sorted, this "last resort
    comparison" may result in a record that sorts out-of-sequence
    with respect to the prior sort order. To ensure that the order
    from a prior sort is not lost, you have to disable this "last
    resort comparison".

    Again a lesson learned

  • From Helmut Waitzmann@21:1/5 to All on Mon May 1 13:27:42 2023
    Martin Τrautmann <t-usenet@gmx.net>:

    So the sort via a file actually should work the same as via the
    pipe?

    Yes.  At least the result will be the same.  When using a pipe,
    the first sort still has to hold all of its input before it can
    write anything: in (virtual) memory if the data fit, otherwise
    in a temporary file of its own.

  • From Helmut Waitzmann@21:1/5 to All on Mon May 1 13:19:24 2023
    Martin Τrautmann <t-usenet@gmx.net>:
    On Sat, 29 Apr 2023 12:05:17 +0200, Martin Τrautmann wrote:
    On Sun, 23 Apr 2023 22:30:24 +0200, Helmut Waitzmann wrote:
    Helmut Waitzmann <nn.throttle@xoxy.net>:
    Look at these sample lines:


    1;0
    1;1
    1;2
    0;0
    0;1
    0;2
    2;0
    2;1
    2;2


    To have this sequence of lines sorted in such a way that the
    first field is sorted in ascending numeric order while the
    second is sorted in descending numeric order,

    I'm sorry, that is a quite misleading description.  What I wanted
    to say is that the sequence of lines should be sorted to look
    like

    0;2
    0;1
    0;0
    1;2
    1;1
    1;0
    2;2
    2;1
    2;0

    and to achieve this…


    one could specify the two sort criteria at once:


    sort -t ';' -k 1nb,1 -k 2nr,2

    Would you achieve this via a pipe as well?


    When I sort by column 2 first and only, I end up with

    0;2
    1;2
    2;2
    0;1
    1;1
    2;1
    0;0
    1;0
    2;0

    Why that? I would expect

    1;2
    0;2
    2;2
    1;1
    0;1
    2;1
    1;0
    0;0
    2;0


    [Sorted by using the


    sort -t ';' -k 2nr,2

    command]

    So why does it resort by first column as well?


    Because that is the way "sort" is supposed to work.  The POSIX
    standard, especially the last paragraph in the "OPTIONS" section
    (<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>)
    says:  "Except when the -u option is specified, lines that
    otherwise compare equal shall be ordered as if none of the
    options -d, -f, -i, -n, or -k were present (but with -r still in
    effect, if it was specified) and with all bytes in the lines
    significant to the comparison.  The order in which lines that
    still compare equal are written is unspecified."

    In your case, the lines "1;2" and "0;2" for example compare equal
    when compared according to the "-k 2nr,2" key specification. 
    Because of that equality, these two equal comparing lines are
    ordered as a last resort, as if by the

    sort

    command line, and that of course will sort "0;2" before "1;2".


    With GNU sort, you may specify the "--stable" option (which
    unfortunately is not part of the POSIX standard) to suppress that
    last resort ordering.

    Since it does that, both a pipe and a second sort from a
    temporary file still fail, since they also ignore the temporary
    sort of the other column.

    Yes, "sort" without the GNU "sort" "--stable" option will always
    do a total ordering, ignoring and destroying any order that has
    been done to its input before.  That's what we've been discussing
    the whole thread and that's what makes the GNU "sort" "--stable"
    option a nice thing to have.
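
As a sketch of what the option buys (assuming GNU sort for -s, and LC_ALL=C to keep the run locale-independent): a chain of stable sorts gives the same order as one invocation with several keys, provided the least significant key is sorted first.

```shell
#!/bin/sh
# Nine sample lines, two numeric fields separated by ';'.
sample='1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2'

# One invocation, two keys: field 1 ascending, then field 2 descending.
one_pass=$(printf '%s\n' "$sample" | LC_ALL=C sort -t ';' -k 1n,1 -k 2nr,2)

# Two stable passes. The least significant key (field 2) is sorted
# first, the most significant key (field 1) last.
two_pass=$(printf '%s\n' "$sample" |
    LC_ALL=C sort -s -t ';' -k 2nr,2 |
    LC_ALL=C sort -s -t ';' -k 1n,1)

if [ "$one_pass" = "$two_pass" ]; then
    echo "both approaches give the same order"
else
    echo "the orders differ"
fi
```

Dropping -s from the last pass would let the whole-line comparison reorder the ties and undo the first pass.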

  • From Ben Bacarisse@21:1/5 to Helmut Waitzmann on Mon May 1 15:02:25 2023
    Helmut Waitzmann <nn.throttle@xoxy.net> writes:

    Sorry, piggybacking...

    Yes, "sort" without the GNU "sort" "--stable" option will always do a
    total ordering, ignoring and destroying any order that has been done to
    its input before. That's what we've been discussing the whole thread and
    that's what makes the GNU "sort" "--stable" option a nice thing to
    have.

    There's an old trick that was common back in the day of adding a line
    number (or similar) and then removing it. You could then either
    explicitly sort on that number or make sure that the number has leading
    zeros so the default sort restores the original order:

    nl -n rz data | sort -t ';' -k 2nr,2 | cut -f2-
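
The other variant mentioned above, sorting explicitly on the added number, might look like the sketch below. It assumes the nine-line sample from this thread in place of the file "data", and a POSIX nl; with -s ';' the line number becomes an extra first field, so the original field 2 is addressed as field 3.

```shell
#!/bin/sh
# Nine sample lines, two numeric fields separated by ';'.
sample='1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2'

# nl -b a numbers every line (empty ones too); -s ';' puts a ';'
# between the number and the text. The number is right-justified with
# leading blanks, which a numeric key skips.
# Primary key: old field 2 (now field 3), numeric descending.
# Tie-breaker: the line number in field 1, numeric ascending, which
# restores the original order among ties.
kept_order=$(printf '%s\n' "$sample" |
    nl -b a -s ';' |
    LC_ALL=C sort -t ';' -k 3nr,3 -k 1n,1 |
    cut -d ';' -f 2-)

printf '%s\n' "$kept_order"
```

Because the tie-breaker is an explicit key, this does not rely on either the last-resort comparison or on -s.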

    --
    Ben.

  • From Lew Pitcher@21:1/5 to All on Mon May 1 18:57:13 2023
    On Mon, 01 May 2023 20:27:57 +0200, Martin Τrautmann wrote:

    On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
    So why does it resort by first column as well?


    Because that is the way "sort" is supposed to work. 

    How should I know that this is supposed that way? If I tell "sort" to
    sort by a certain column only, why would I have to expect that it will
    sort by something else as well?

    As Helmut said, "because that is the way 'sort' is supposed to work".
    The Open Group defines the interface and results for each of the common
    'Unix' utilities, "sort[1]" included, and their definition of sort says
    that
    "When there are multiple key fields, later keys shall be compared
    only after all earlier keys compare equal. ... [L]ines that otherwise
    compare equal shall be ordered as if none of the options -d, -f, -i,
    -n, or -k were present ... and with all bytes in the lines
    significant to the comparison."

    The "Rationale" section /does/ seem to give implementations some leeway:
    "Implementations are encouraged to perform the recommended further
    byte-by-byte comparison of lines that collate equally, even though
    this may affect efficiency."
    The key phrase here is "are encouraged", implying that this behaviour,
    while specified, is not absolutely required.

    [1] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html


    HTH
    --
    Lew Pitcher
    "In Skills We Trust"

  • From Kaz Kylheku@21:1/5 to t-usenet@gmx.net on Mon May 1 19:13:05 2023
    On 2023-05-01, Martin Τrautmann <t-usenet@gmx.net> wrote:
    On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
    So why does it resort by first column as well?


    Because that is the way "sort" is supposed to work. 

    How should I know that this is supposed that way? If I tell "sort" to
    sort by a certain column only, why would I have to expect that it will
    sort by something else as well?

    Sorting, in computer science, may be stable or unstable. If you've
    not read the documentation of a sorting system thoroughly,
    you have no basis for expecting it to be one way or the other.

    When you expect stable sort, you're still expecting "sorting by
    something else".

    Under stable sort, all records are imagined to be put into
    correspondence with the natural numbers, in their original sorted order.
    When two records are considered equal by the sorting comparison
    function, they are in fact not considered equal but further
    compared by their original order number: the record with the lower
    number is considered lesser.

    Some sorting algorithms achieve that behavior implicitly, by never
    exchanging the relative position of items that are equal by the sorting
    comparison function. Algorithms which work by comparing neighbouring
    elements and moving them only past unequal ones will be like this:
    merge sort, insertion sort, bubble sort.

    Some sorting algorithms, like quicksort, will wreck the original order
    for equal keys. Quicksort has a partitioning step whereby it chooses
    some middle key value, and then separates records into two groups: those
    higher and those lower.

    If all you know is that some program sorts, you have no idea
    which kind of algorithm it is using.

  • From Martin Τrautmann@21:1/5 to Helmut Waitzmann on Mon May 1 20:27:57 2023
    On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
    So why does it resort by first column as well?


    Because that is the way "sort" is supposed to work.

    How should I know that this is supposed that way? If I tell "sort" to
    sort by a certain column only, why would I have to expect that it will
    sort by something else as well?

    Since it does that, both a pipe and a second sort from a
    temporary file still fail, since they also ignore the temporary
    sort of the other column.

    Yes, "sort" without the GNU "sort" "--stable" option will always
    do a total ordering, ignoring and destroying any order that has
    been done to its input before. That's what we've been discussing
    the whole thread and that's what makes the GNU "sort" "--stable"
    option a nice thing to have.

    Fortunately, someone else pointed out the stable and last-resort options
    to me. Thanks for that.

  • From Helmut Waitzmann@21:1/5 to All on Mon May 1 21:16:09 2023
    Ben Bacarisse <ben.usenet@bsb.me.uk>:
    Helmut Waitzmann <nn.throttle@xoxy.net> writes:

    Yes, "sort" without the GNU "sort" "--stable" option will
    always do a total ordering, ignoring and destroying any order
    that has been done to its input before.  That's what we've been
    discussing the whole thread and that's what makes the GNU
    "sort" "--stable" option a nice thing to have.

    There's an old trick that was common back in the day of adding a
    line number (or similar) and then removing it. You could then
    either explicitly sort on that number or make sure that the
    number has leading zeros so the default sort restores the
    original order:

    nl -n rz data | sort -t ';' -k 2nr,2 | cut -f2-


    I'm stunned.  Thank you for presenting this solution!  And thank
    you, Martin, for initiating this interesting topic!

    I prefer


    grep -F -n -- ''

    over

    nl -n rz

    though, because it doesn't get confused by header, body, and
    footer lines (see the "nl(1)" manual):

    # Sort according to the second numerical field, descending:
    #
    sort -t ';' -k 2nr,2 |

    # To each line, prepend an additional numeric field, ascending,
    # separated by a ";", in order to "save" the sorted result for
    # later retrieval thus making the second sort below a "stable"
    # one:
    #
    grep -F -n -- '' | sed -e 's/:/;/' |

    # Sort according to the original first - now second - numerical
    # field, ascending, and the "saved" sort in the first numerical
    # field, ascending, thus getting a stable sort:
    #
    sort -t ';' -k 2nb,2 -k 1nb,1 |

    # Finally remove the leading field of the saved sort result:
    #
    cut -d ';' -f 2-


    Of course this is no better than just doing


    sort -t ';' -k 1n,1 -k 2nr,2

    but it might be helpful when there are either more than 10 sort
    keys (POSIX only requires that "sort" shall at least allow 10
    sort keys) or different delimiters ("-t" option), which can't be
    specified in one sort invocation.
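
The "different delimiters" case might look like the following sketch. The records (user@host;count) are invented for illustration, and GNU sort -s is used for brevity; where -s is missing, the numbering trick shown above would take its place.

```shell
#!/bin/sh
# Hypothetical records of the form user@host;count.
records='carol@beta.example;2
alice@beta.example;1
bob@alpha.example;2
dave@alpha.example;1'

# Goal: order by count (ascending) and by host within the same count.
# The count is field 2 when splitting on ';', but the host can only be
# reached by splitting on '@', so two invocations are needed.
# Pass 1 sorts on everything after '@' (host plus the ";count" tail).
# Pass 2 sorts on the count and, being stable, keeps pass 1's order
# among lines with the same count.
by_count_then_host=$(printf '%s\n' "$records" |
    LC_ALL=C sort -s -t '@' -k 2,2 |
    LC_ALL=C sort -s -t ';' -k 2n,2)

printf '%s\n' "$by_count_then_host"
```

As with any chain of stable sorts, the less significant key (the host) is handled first and the more significant one (the count) last.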

  • From Martin Τrautmann@21:1/5 to Lew Pitcher on Mon May 1 22:35:00 2023
    On Mon, 1 May 2023 18:57:13 -0000 (UTC), Lew Pitcher wrote:
    On Mon, 01 May 2023 20:27:57 +0200, Martin Τrautmann wrote:

    On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
    So why does it resort by first column as well?


    Because that is the way "sort" is supposed to work. 

    How should I know that this is supposed that way? If I tell "sort" to
    sort by a certain column only, why would I have to expect that it will
    sort by something else as well?

    As Helmut said, "because that is the way 'sort' is supposed to work".
    The Open Group defines the interface and results for each of the common
    'Unix' utilities, "sort[1]" included, and their definition of sort says
    that
    "When there are multiple key fields, later keys shall be compared
    only after all earlier keys compare equal. ... [L]ines that otherwise
    compare equal shall be ordered as if none of the options -d, -f, -i,
    -n, or -k were present ... and with all bytes in the lines
    significant to the comparison."

    So where is that information available on my computer? Sorry, but I
    really did not think about doing a genealogy search first to find out
    how someone thought something should behave. No, it was not obvious to
    me. When -k tells me about the first and last key to sort by, I just
    did not expect a bonus sort.

  • From Keith Thompson@21:1/5 to t-usenet@gmx.net on Mon May 1 17:49:24 2023
    Martin Τrautmann <t-usenet@gmx.net> writes:
    On Mon, 1 May 2023 18:57:13 -0000 (UTC), Lew Pitcher wrote:
    On Mon, 01 May 2023 20:27:57 +0200, Martin Τrautmann wrote:
    On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
    So why does it resort by first column as well?

    Because that is the way "sort" is supposed to work. 

    How should I know that this is supposed that way? If I tell "sort" to
    sort by a certain column only, why would I have to expect that it will
    sort by something else as well?

    As Helmut said, "because that is the way 'sort' is supposed to work".
    The Open Group defines the interface and results for each of the common
    'Unix' utilities, "sort[1]" included, and their definition of sort says
    that
    "When there are multiple key fields, later keys shall be compared
    only after all earlier keys compare equal. ... [L]ines that otherwise
    compare equal shall be ordered as if none of the options -d, -f, -i,
    -n, or -k were present ... and with all bytes in the lines
    significant to the comparison."

    So where is that information available on my computer? Sorry, but I
    really did not think about doing a genealogy search first to find out
    how someone thought something should behave. No, it was not obvious to
    me. When -k tells me about the first and last key to sort by, I just
    did not expect a bonus sort.

    Nobody is expecting you to know this inherently. Helmut told you
    'Because that is the way "sort" is supposed to work.' I don't think he
    meant to imply that there was anything wrong with you for not already
    knowing it. You asked; he answered.

    You *should* be able to get this information with `man sort`. If you
    have the GNU coreutils implementation of sort, the man page doesn't
    mention re-sorting by the whole line (which is IMHO unfortunate), but at
    the bottom of the man page there is a reference to the full
    documentation:

    Full documentation <https://www.gnu.org/software/coreutils/sort>
    or available locally via: info '(coreutils) sort invocation'

    If you have an implementation other than GNU coreutils, `man sort` is
    likely to describe it in more detail. `sort --help` is also a good
    thing to try.

    It's also good to know about the POSIX standard:
    <https://pubs.opengroup.org/onlinepubs/9699919799/toc.htm>
    This is the standard for the behavior of Unix tools, but not all
    implementations follow it completely, and most provide extra
    functionality.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Martin Τrautmann@21:1/5 to Keith Thompson on Tue May 2 12:43:00 2023
    On Mon, 01 May 2023 17:49:24 -0700, Keith Thompson wrote:
    You *should* be able to get this information with `man sort`. If you
    have the GNU coreutils implementation of sort, the man page doesn't
    mention re-sorting by the whole line (which is IMHO unfortunate), but at
    the bottom of the man page there is a reference to the full
    documentation:

    Full documentation <https://www.gnu.org/software/coreutils/sort>
    or available locally via: info '(coreutils) sort invocation'

    If you have an implementation other than GNU coreutils, `man sort` is
    likely to describe it in more detail. `sort --help` is also a good
    thing to try.

    No, mine says

    SEE ALSO
    The full documentation for sort is maintained as a Texinfo
    manual. If the info and sort programs are properly installed at your
    site, the command

    info sort

    should give you access to the complete manual.

    sort 5.93 November 2005
    SORT(1)

    And info sort does not provide more details here.

    It's also good to know about the POSIX standard:
    <https://pubs.opengroup.org/onlinepubs/9699919799/toc.htm>
    This is the standard for the behavior of Unix tools, but not all
    implementations follow it completely, and most provide extra
    functionality.

    It does indicate
    +++
    Implementations are encouraged to perform the recommended further
    byte-by-byte comparison of lines that collate equally, even though this
    may affect efficiency. The impact on efficiency can be mitigated by only
    performing the additional comparison if the current locale's collating
    sequence does not have a total ordering of all characters (if the
    implementation provides a way to query this) or by only performing the
    additional comparison if the locale name associated with the LC_COLLATE
    category has an '@' modifier in the name (since locales without an '@'
    modifier should have a total ordering of all characters - see XBD
    LC_COLLATE). Note that if the implementation provides a stable sort
    option as an extension (usually -s), the additional comparison should
    not be performed when this option has been specified.
    +++

    So the -s stable is named within further notes.

  • From Spiros Bousbouras@21:1/5 to t-usenet@gmx.net on Tue May 2 11:24:46 2023
    On Tue, 2 May 2023 12:43:00 +0200
    Martin Τrautmann <t-usenet@gmx.net> wrote:
    On Mon, 01 May 2023 17:49:24 -0700, Keith Thompson wrote:
    You *should* be able to get this information with `man sort`. If you
    have the GNU coreutils implementation of sort, the man page doesn't
    mention re-sorting by the whole line (which is IMHO unfortunate), but at
    the bottom of the man page there is a reference to the full
    documentation:

    Full documentation <https://www.gnu.org/software/coreutils/sort>
    or available locally via: info '(coreutils) sort invocation'

    If you have an implementation other than GNU coreutils, `man sort` is
    likely to describe it in more detail. `sort --help` is also a good
    thing to try.

    No, mine says

    SEE ALSO
    The full documentation for sort is maintained as a Texinfo
    manual. If the info and sort programs are properly installed at your
    site, the command

    info sort

    should give you access to the complete manual.

    sort 5.93 November 2005
    SORT(1)

    And info sort does not provide more details here.

    On the other hand https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html says

    A pair of lines is compared as follows: sort compares each pair of
    fields (see --key), in the order specified on the command line,
    according to the associated ordering options, until a difference is
    found or no fields are left. If no key fields are specified, sort uses a
    default key of the entire line. Finally, as a last resort when all keys
    compare equal, sort compares entire lines as if no ordering options
    other than --reverse (-r) were specified. The --stable (-s) option
    disables this last-resort comparison so that lines in which all fields
    compare equal are left in their original relative order. The --unique
    (-u) option also disables the last-resort comparison.

    With GNU software it is worth checking (and perhaps also downloading)
    the online documentation, which usually has a lot more detail than what
    is automatically installed, and that includes both info and man pages
    (GNU tar is a prominent example).

  • From Keith Thompson@21:1/5 to t-usenet@gmx.net on Tue May 2 13:41:28 2023
    Martin Τrautmann <t-usenet@gmx.net> writes:
    On Mon, 01 May 2023 17:49:24 -0700, Keith Thompson wrote:
    You *should* be able to get this information with `man sort`. If you
    have the GNU coreutils implementation of sort, the man page doesn't
    mention re-sorting by the whole line (which is IMHO unfortunate), but at
    the bottom of the man page there is a reference to the full
    documentation:

    Full documentation <https://www.gnu.org/software/coreutils/sort>
    or available locally via: info '(coreutils) sort invocation'

    If you have an implementation other than GNU coreutils, `man sort` is
    likely to describe it in more detail. `sort --help` is also a good
    thing to try.

    No, mine says

    SEE ALSO
    The full documentation for sort is maintained as a Texinfo
    manual. If the info and sort programs are properly installed at your
    site, the command

    info sort

    should give you access to the complete manual.

    sort 5.93 November 2005
    SORT(1)

    And info sort does not provide more details here.

    Yours is quite old. If you don't have the "info" documentation
    installed, "info sort" falls back to showing you the man page. Is that
    what you're seeing? The info documentation for COREUTILS-5_92 (which I
    presume is very close to the version you have) says:

    If no key fields are specified, @command{sort} uses a default key of
    the entire line. Finally, as a last resort when all keys compare
    equal, @command{sort} compares entire lines as if no ordering options
    other than @option{--reverse} (@option{-r}) were specified. The
    @option{--stable} (@option{-s}) option disables this @dfn{last-resort
    comparison} so that lines in which all fields compare equal are left
    in their original relative order. The @option{--unique}
    (@option{-u}) option also disables the last-resort comparison.

    (That's from the raw coreutils.texi file; the info documentation is
    generated from it.)

    At least on modern Ubuntu, the "coreutils" package installs all the
    documentation. Perhaps on your system the tools and the documentation
    are in separate packages, for example "coreutils" and "coreutils-doc".

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Martin Τrautmann@21:1/5 to Keith Thompson on Wed May 3 08:47:28 2023
    On Tue, 02 May 2023 13:41:28 -0700, Keith Thompson wrote:
    And info sort does not provide more details here.

    Yours is quite old. If you don't have the "info" documentation
    installed, "info sort" falls back to showing you the man page.

    That's what it does. It's the older MacBook where I do that stuff, and
    its SSD is too small to hold all the available coreutils material.

    Good to know that the actual info files would have known the details.

  • From Dr Eberhard W Lisse@21:1/5 to All on Fri May 5 10:35:01 2023
    mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3

    mfg, el

    On 19/04/2023 09:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?

    Example:
    +++
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
    Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
    Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
    Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
    Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
    Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
    Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
    Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
    Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
    Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
    Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
    Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
    Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
    Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
    Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
    Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
    Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
    Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
    Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
    Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
    Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
    Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
    +++

    I want to sort
    * first by column 4, numerical,
    * second by column 2
    * third by column 3

    So the result should be
    +++
    Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
    Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
    Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
    Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
    Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
    Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
    Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
    Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
    Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
    Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
    Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
    Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
    Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
    Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
    Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
    Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
    Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
    Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
    Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
    Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
    Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
    Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
    Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
    Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
    +++

    I tried both
    sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
    and
    sort -k4 -t";" -n -k2,2 -k3,3
    and some permutations and reverted orders, without success.
    The sort by column 4 just gets lost or resorted.
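
    A minimal sketch of one way to get the ordering shown in the expected result above. It assumes that the expected result is the real goal: there, the street name (column 3) is the most significant key, the code (column 2) comes next, and the house number (column 4) only breaks ties, so the keys are listed in that order in a single sort call. The rows are a small subset of the example data; the helper name `sample_rows` and the use of `LC_ALL=C` (plain byte-order comparison, for reproducible output) are choices made for this sketch, not something from the thread.

```shell
# Hypothetical helper: prints a few unsorted rows taken from the example data.
sample_rows() {
    cat <<'EOF'
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
EOF
}

# One sort call; keys are given from most significant to least significant.
#   -k3,3   street name only (field 3 through field 3)
#   -k2,2   code only
#   -k4,4n  house number only, compared numerically
sorted=$(sample_rows | LC_ALL=C sort -t';' -k3,3 -k2,2 -k4,4n)
printf '%s\n' "$sorted"

# Order of (column 2, column 4) in the output:
#   D9444 1, D9444 2, D9386 1, T2960 1, T2960 4, T2960 12
```

    Each -kN,N limits a key to a single field, and the n suffix makes only that key numeric. Chaining separate sort calls in a pipeline generally does not give the same result, because every call re-sorts all lines on its own key.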

    I'm not sure about the man page
    -k, --key=POS1[,POS2]
    start a key at POS1, end it at POS2 (origin 1)

    So I tried relative positions with
    -k3,1
    as well, without success.
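
    A short sketch of how the POS1[,POS2] notation behaves, using three made-up rows (not taken from the data above). Both POS1 and POS2 are absolute field numbers, so POS2 is not an offset from POS1. When POS2 is omitted, the key runs from POS1 to the end of the line, and the fields after POS1 leak into the comparison.

```shell
# Made-up rows for illustration: field 2 is text, field 3 is a number.
rows='x;b;1
x;a;9
x;a;10'

# -k2 : the key starts at field 2 and runs to the end of the line,
#       so "a;10" is compared with "a;9" as text and sorts first.
open_ended=$(printf '%s\n' "$rows" | LC_ALL=C sort -t';' -k2 -k3,3n)

# -k2,2 : the key is field 2 only; ties fall through to field 3,
#         which is then compared numerically.
bounded=$(printf '%s\n' "$rows" | LC_ALL=C sort -t';' -k2,2 -k3,3n)

printf 'open-ended:\n%s\n\nbounded:\n%s\n' "$open_ended" "$bounded"
# open-ended: x;a;10  x;a;9   x;b;1
# bounded:    x;a;9   x;a;10  x;b;1
```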

    How do I apply the sort syntax properly?

    Thanks
    Martin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin Τrautmann@21:1/5 to Dr Eberhard W Lisse on Fri May 5 12:28:16 2023
    On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
    mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3

    miller looks very powerful to me, but unfortunately it's not available here.

  • From Kenny McCormack@21:1/5 to traut@gmx.de on Fri May 5 12:26:56 2023
    In article <slrnu59tbj.2sg.t-usenet@ID-685.user.individual.de>,
    Martin Τrautmann <traut@gmx.de> wrote:
    On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
    mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3

    miller looks very powerful to me, but unfortunately it's not available here.

    Some sort of import/export restriction in your country?

    --
    If you ask a Trumper who is to blame for the debacle of Jan 6, they will almost certainly say
    something about Antifa/BLM/something/whatever. This shows just how screwed up they are; they can't
    even get their narrative straight. What they *should* say is "Eugene Goodman". If not for him, the plot
    would probably have succeeded, so he (Eugene) is clearly to blame for the failure.

  • From Martin Τrautmann@21:1/5 to Kenny McCormack on Fri May 5 16:35:02 2023
    On Fri, 5 May 2023 12:26:56 -0000 (UTC), Kenny McCormack wrote:
    In article <slrnu59tbj.2sg.t-usenet@ID-685.user.individual.de>,
    Martin Τrautmann <traut@gmx.de> wrote:
    On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
    mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3

    miller looks very powerful to me, but unfortunately it's not available here.

    Some sort of import/export restriction in your country?

    Error: Port miller requires a full Xcode installation, which was not
    found on your system.

    ...and I've not enough space for that, 256 GB SSD only.

  • From gerg@21:1/5 to traut@gmx.de on Fri May 5 23:45:59 2023
    In article <slrnu5a50m.2sg.t-usenet@ID-685.user.individual.de>,
    Martin Τrautmann <traut@gmx.de> wrote:
    On Fri, 5 May 2023 12:26:56 -0000 (UTC), Kenny McCormack wrote:
    In article <slrnu59tbj.2sg.t-usenet@ID-685.user.individual.de>,
    Martin Τrautmann <traut@gmx.de> wrote:
    On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
    mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3

    miller looks very powerful to me, but unfortunately it's not available here.

    Some sort of import/export restriction in your country?

    Error: Port miller requires a full Xcode installation, which was not
    found on your system.

    ...and I've not enough space for that, 256 GB SSD only.


    Homebrew is a thing on MacOS. A thing that seems to include miller v6.7.0. <https://formulae.brew.sh/formula/miller#default>

    (Homebrew only needs the Xcode runtime, not the full install)

    --
    ::::::::::::: Greg Andrews ::::: gerg@panix.com :::::::::::::
    I have a map of the United States that's actual size.
    -- Steven Wright

  • From Popping Mad@21:1/5 to All on Sat May 6 02:03:24 2023
    On 4/19/23 03:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?


    awk

  • From Martin Τrautmann@21:1/5 to Popping Mad on Sat May 6 09:48:21 2023
    On Sat, 6 May 2023 02:03:24 -0400, Popping Mad wrote:
    On 4/19/23 03:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?


    awk

    Nope. "awk" alone does not do the job.

  • From Kaz Kylheku@21:1/5 to t-usenet@gmx.net on Sat May 6 09:47:05 2023
    On 2023-05-06, Martin Τrautmann <t-usenet@gmx.net> wrote:
    On Sat, 6 May 2023 02:03:24 -0400, Popping Mad wrote:
    On 4/19/23 03:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?


    awk

    Nope. "awk" alone does not do the job.

    You may be able to cobble something together with GNU Awk, which provides:

    - control over the traversal of an associative array, so that it proceeds in sorted order.

    - the asort function for sorting an associative array
    (the indices are clobbered to a 1..N enumeration).

    - the asorti function which sorts the indices instead: they
    become the values, and indices go to 1..N.

    A user-defined comparison can be used in asort and asorti,
    which receives all four relevant values: left key and value,
    right key and value.
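
    A rough sketch of the asort approach described above, assuming GNU Awk is installed as `gawk` (the block skips itself otherwise). The function name `by_street_code_number`, the shortened four-column rows, and the key order (column 3, then column 2, then column 4 numerically, matching the expected result in the original post) are assumptions made for illustration.

```shell
# Shortened, hypothetical rows: only the first four columns of the example data.
rows='Borgentreich;T2960;Lindenstätte;12
Borgentreich;D9386;Lindenstätte;1
Borgentreich;D9444;Auf der Lindenstätte;2
Borgentreich;T2960;Lindenstätte;4
Borgentreich;D9444;Auf der Lindenstätte;1'

if command -v gawk >/dev/null 2>&1; then
    awk_sorted=$(printf '%s\n' "$rows" | LC_ALL=C gawk '
        # Comparison callback: gawk passes the index and value of both elements.
        function by_street_code_number(i1, v1, i2, v2,    a, b) {
            split(v1, a, ";"); split(v2, b, ";")
            if (a[3] != b[3]) return (a[3] < b[3]) ? -1 : 1   # street name
            if (a[2] != b[2]) return (a[2] < b[2]) ? -1 : 1   # code
            return a[4] - b[4]                                # house number
        }
        { rows[NR] = $0 }                                     # collect all lines
        END {
            # Sort the values into a second array using the callback above.
            n = asort(rows, sorted, "by_street_code_number")
            for (i = 1; i <= n; i++) print sorted[i]
        }')
    printf '%s\n' "$awk_sorted"
else
    echo "gawk not found; skipping the asort sketch" >&2
fi
```

    Unlike sort, this reads all lines into memory before printing anything, which is fine for a file of this size.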

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From Kenny McCormack@21:1/5 to traut@gmx.de on Sat May 6 11:48:49 2023
    In article <slrnu5c1i5.3rj.t-usenet@ID-685.user.individual.de>,
    Martin Τrautmann <traut@gmx.de> wrote:
    On Sat, 6 May 2023 02:03:24 -0400, Popping Mad wrote:
    On 4/19/23 03:27, Martin Τrautmann wrote:

    Hi all,

    how do I sort by multiple columns?


    awk

    Nope. "awk" alone does not do the job.

    Yes. As Kaz explains.

    --
    "Women should not be enlightened or educated in any way. They should be segregated because they are the cause of unholy erections in holy men.

    -- Saint Augustine (354-430) --

  • From Dr Eberhard W Lisse@21:1/5 to gerg on Sat May 6 16:04:30 2023
    Homebrew only needs the Xcode Command Line Tools, but only if you
    wish to or have to compile sources. It's mainly binaries only and
    works perfectly well without it if you can tolerate the squealing
    :-)-O

    I don't want to start a war on that but I find Homebrew more complete.

    el

    On 06/05/2023 01:45, gerg wrote:
    [...]
    Homebrew is a thing on MacOS. A thing that seems to include
    miller v6.7.0.
    <https://formulae.brew.sh/formula/miller#default>

    (Homebrew only needs the Xcode runtime, not the full install)


  • From Dr Eberhard W Lisse@21:1/5 to All on Sat May 6 16:01:14 2023
    I use Homebrew on my Macs, which has it but you can also pull the
    tar.gz from

    https://github.com/johnkerl/miller/releases/tag/v6.7.0

    and after extracting cd to the directory and run

    xattr -cr mlr

    greetings, el

    On 05/05/2023 16:35, Martin Τrautmann wrote:
    [...]
    Error: Port miller requires a full Xcode installation, which was not
    found on your system.

    ...and I've not enough space for that, 256 GB SSD only.
