Hi all,
how do I sort by multiple columns?
Example:
+++
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
+++
I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3
So the result should be
+++
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
+++
I tried both
sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
and
sort -k4 -t";" -n -k2,2 -k3,3
and some permutations and reversed orders, without success.
The sort by column 4 just gets lost or resorted.
I'm not sure about the man page
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
So I tried relative positions with
-k3,1
as well, without success.
How do I apply the sort syntax properly?
Thanks
Martin
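On the -k POS1[,POS2] question: in POSIX sort, POS2 is an absolute field number at which the key ends, not an offset from POS1, and a key given without POS2 runs to the end of the line. The following is only a sketch with three made-up records (they are not part of the data above); it shows how an open-ended -k2 swallows the later fields, so that a following key is never consulted.

```shell
# Hypothetical three-line sample; LC_ALL=C keeps the comparison byte-wise.
export LC_ALL=C
sample=$(printf '%s\n' 'r1;b;1' 'r2;a;2' 'r3;a;1')

# -k2 runs from field 2 to the end of the line, so "a;1" sorts before "a;2"
# and the second key (-k3,3nr) never gets a say.
open_ended=$(printf '%s\n' "$sample" | sort -t ';' -k2 -k3,3nr)

# -k2,2 stops at the end of field 2; ties on "a" are then broken by
# field 3 in descending numeric order.
bounded=$(printf '%s\n' "$sample" | sort -t ';' -k2,2 -k3,3nr)

printf 'open-ended key:\n%s\n\nbounded key:\n%s\n' "$open_ended" "$bounded"
```

With the bounded form, r2;a;2 comes before r3;a;1; with the open-ended form it is the other way round.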
On 19.04.2023 09:27, Martin Τrautmann wrote:
Hi all,
how do I sort by multiple columns?
[...]
I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3
From that specification I'd write
sort -t\; -k4n -k2 -k3
but your expected data below doesn't follow your own spec. So the specification probably needs a correction.
So the result should be[...]
+++
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Why are these 2 lines sorted this way? Column 4 is the same ("1" in
both) so it boils down to how "D9444" and "D9386" get sorted. Which
comes first and why? It seems to me that "D9386" comes earlier than
"D9444".
Your locale may also turn out to be relevant so you should mention
that.
Unrelated, but the first letter of your last name is Unicode code point
U+03A4, which is the Greek upper-case tau. Was this intentional or an
accident?
On 19.04.2023 10:44, Janis Papanagnou wrote:
On 19.04.2023 09:27, Martin Τrautmann wrote:
You probably meant something like
sort -t\; -k3,3 -k4,4n -k2,2
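As a quick check of the command above, here is a sketch that runs it on six records from the example, trimmed to their first four columns for brevity (the trimming is my assumption; the full records have further fields after column 4). LC_ALL=C is set only to make the comparison independent of the local collation rules.

```shell
# Six records from the thread's example, cut down to columns 1-4.
export LC_ALL=C
sample=$(printf '%s\n' \
    'Borgentreich;D9386;Lindenstätte;1' \
    'Borgentreich;D9444;Auf der Lindenstätte;2' \
    'Borgentreich;D9444;Auf der Lindenstätte;1' \
    'Borgentreich;T2960;Lindenstätte;12' \
    'Borgentreich;T2960;Lindenstätte;2' \
    'Borgentreich;T2960;Lindenstätte;1')

# Column 3 first, then column 4 numerically, then column 2.
sorted=$(printf '%s\n' "$sample" | sort -t ';' -k3,3 -k4,4n -k2,2)
printf '%s\n' "$sorted"
```

The "Auf der Lindenstätte" records come out first (1, then 2), followed by the "Lindenstätte" records in the order D9386;1, T2960;1, T2960;2, T2960;12, which matches the ordering of the expected result in the original post.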
Martin Τραωτμανν <t-usenet@gmx.net>:
how do I sort by multiple columns?
I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3
[…with sorted result]
The sorted result of your example has apparently been sorted
according to the following description:
First, group the lines sorted by column 3, that is, sort the
lines in a manner that results in alphabetically ascending values
in column 3.
You might read the description of the "sort" utility in the POSIX
standard
(<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#top>),
especially the last paragraph in the "OPTIONS" section
(<https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html#tag_20_119_04>):
"When there are multiple key fields, later keys shall be compared
only after all earlier keys compare equal. Except when the -u
option is specified, lines that otherwise compare equal shall be
ordered as if none of the options -d, -f, -i, -n, or -k were
present (but with -r still in effect, if it was specified) and
with all bytes in the lines significant to the comparison. The
order in which lines that still compare equal are written is
unspecified."
On Sat, 22 Apr 2023 03:33:43 +0200, Helmut Waitzmann wrote:
I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3
[…with sorted result]
The sorted result of your example has apparently been sorted
according to the following description:
First, group the lines sorted by column 3, that is, sort the
lines in a manner that results in alphabetically ascending
values in column 3.
That's a question of how sort works.
If I want to pre-sort by column 3 first, then sub-sort by column 2,
that's fine. But when I pipe one sort into another, the second
sort destroys the order from the first. That's why I listed my
sorts in reversed order in the pipe example.
This description is much better than my man and info pages for sort,
but unfortunately I can't be sure that the POSIX description actually
applies to my local sort implementation: sort 5.93, November
2005.
On Sun, 23 Apr 2023 03:33:47 +0200, Helmut Waitzmann wrote:
If I want to pre-sort by 3 first, then sub-sort by column 2,
that's fine. But when I pipe one sort to the other, the second
sort will destroy the sort before. That's why i had my sort
order in reverted order, using a pipe example.
That won't help, either: A sorting pipe using (a standard)
"sort" won't solve the problem, because one cannot tell (a
standard) "sort" to do a sort on the given key option only. Each
sort in the pipe will be total (according to its sort criteria)
of its own.
That was my problem - I expected that a pipe through several sorts would
keep the order. I don't know why it doesn't.
Keep in mind: when sorting a file, the last line in the input may end up
becoming the first line in the output. Sort cannot write anything to
the pipe or output file until it has sorted the entire input. With a pipe,
the temporary file is in RAM rather than being a named file on disk.
[...]
For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for the reason you explain above.
Something to keep in mind if you ever decide to sort very large files in a pipeline. [...]
The bad case would be if a program produced a ton of output, but the reader didn't read any of it. I'll have to think some more as to whether or not that applies here.
On 23.04.2023 16:36, Kenny McCormack wrote:
[...]
For most programs, this is rarely a concern, since most pipelines write and
read more or less simultaneously in real time, but sort is an edge case for
the reason you explain above.
Note also that there are quite some sorting operations inherently
used (e.g. in 'ls', in shells '*' glob/pattern expansion, etc.).
For example, don't expect find | xargs ls to provide a sorted
output.
Something to keep in mind if you ever decide to sort very large files in a
pipeline. [...]
In whatever way some instance of sort is implemented (memory, or
temporary files, or whatever), my expectation is that
whatever | sort
will have to produce sorted output. Isn't that guaranteed?
I don't see the problem. If sort is on the left of a pipe, it will
sort its whole input and then all it will do is write to the pipe. If
sort is on the right of a pipe, it will at first only read until it
has read everything, and then do the sorting. Obviously, if you have
process1 | process2 and one side reads or writes (whichever applies)
much more slowly than the other, then the fast side will block, but
there's nothing special about sort in that respect.
In article <op.13uwd4i8a3w0dxdave@hodgins.homeip.net>,
David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
...
Keep in mind: when sorting a file, the last line in the input may end up
becoming the first line in the output. Sort cannot write anything to
the pipe or output file until it has sorted the entire input. With a pipe,
the temporary file is in RAM rather than being a named file on disk.
This actually raises an interesting point. Pipes are not infinite in size,
and they could, theoretically, block if enough is written on the write end
without anything being read from the read end. Though the limits are
likely very large nowadays on modern systems, I think the original
implementation was only 4096 bytes and the standards today (POSIX) may not
guarantee anything more than that (haven't checked).
For most programs, this is rarely a concern, since most pipelines write and read more or less simultaneously in real time, but sort is an edge case for the reason you explain above.
Something to keep in mind if you ever decide to sort very large files in a pipeline. And it is probably a better idea not to do so; to sort it all at once, using multiple key specifications on the command line.
On 23.04.2023 17:51, Spiros Bousbouras wrote:
[...] If sort
is on the right of a pipe then in the beginning it will only do reading
until it has read everything and then do the sorting. [...]
This is [in principle] not necessarily the case. The sort algorithm
can start to sort subsets of the stream to create runs of already
sorted sequences. Mergesort, for example, is a good candidate for
such a process; it can use (e.g.) Heapsort to create larger runs in
memory and then needs fewer merge passes (which are typically costly
if they are done over files). How much data the Heapsort will process
may vary, but something on the order of the pipe buffer size is
reasonable.
Disclaimer: I don't know how Unix's 'sort' is typically implemented,
but I expect some sophisticated implementation, since what I wrote
above is decades-old knowledge (at least since the 1980s, when I
implemented some hybrid sorting algorithms, or maybe even back to
Donald Knuth's work; but I don't recall whether it's covered in his
"Sorting and Searching" volume).
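The run-then-merge idea can be imitated from the command line. This is only an illustration of the principle and says nothing about how any particular sort is implemented internally: the input is cut into small chunks with split, each chunk is sorted on its own to form a run, and the runs are combined with the POSIX -m (merge only) option. The chunk size of 3 lines and the file names are arbitrary choices; mktemp -d is assumed to be available.

```shell
export LC_ALL=C
workdir=$(mktemp -d)

# Unsorted numeric input, one value per line.
printf '%s\n' 7 3 9 1 8 2 6 4 > "$workdir/input"

# Cut the input into chunks of 3 lines each: run.aa, run.ab, run.ac
split -l 3 "$workdir/input" "$workdir/run."

# Sort every chunk in place, turning it into a sorted run.
for run in "$workdir"/run.*; do
    sort -n -o "$run" "$run"
done

# Merge the already sorted runs; -m merges without re-sorting each input.
merged=$(sort -n -m "$workdir"/run.*)
rm -rf "$workdir"

printf '%s\n' "$merged"
```

The merged output is the fully sorted sequence 1 2 3 4 6 7 8 9, even though no single step ever sorted the whole input at once.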
My man page says:
--radixsort
Try to use radix sort, if the sort specifications allow.
The radix sort can only be used for trivial locales (C and
POSIX), and it cannot be used for numeric or month sort.
Radix sort is very fast and stable.
--mergesort
Use mergesort. This is a universal algorithm that can
always be used, but it is not always the fastest.
--qsort
Try to use quick sort, if the sort specifications allow.
This sort algorithm cannot be used with -u and -s.
--heapsort
Try to use heap sort, if the sort specifications allow.
This sort algorithm cannot be used with -u and -s.
On 4/23/23 10:36, Kenny McCormack wrote:
This actually raises an interesting point. Pipes are not infinite in size,
and they could, theoretically block if enough is written on the write end
without anything being read from the read end. Though the limits are
likely very large nowadays on modern systems, I think the original
implementation was only 4096 bytes and the standards today (POSIX) may not
guarantee anything more than that (haven't checked).
FWIW, the pipe(7) manpage from Debian GNU/Linux has a "Pipe capacity"
section that says in part:
Before Linux 2.6.11, the capacity of a pipe was the same as the
system page size (e.g., 4096 bytes on i386). Since Linux
2.6.11, the pipe capacity is 16 pages (i.e., 65,536 bytes in a
system with a page size of 4096 bytes). Since Linux 2.6.35,
the default pipe capacity is 16 pages, but the capacity can be
queried and set using the fcntl(2) F_GETPIPE_SZ and
F_SETPIPE_SZ operations. See fcntl(2) for more information.
So pipes on Linux aren't very large at all.
I don't know how other Unix systems compare.
Look at these sample lines:
1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2
To have this sequence of lines sorted in such a way that the
first field is sorted in ascending numeric order while the
second is sorted in descending numeric order,
one could specify the two sort criteria at once:
sort -t ';' -k 1nb,1 -k 2nr,2
Could the actual pipe size perhaps be queried
and set with "ulimit"?
$ ulimit -a
[...]
pipe size (512 bytes, -p) 8
[...]
With: GNU bash, version 5.1.16
("help ulimit" for docs on the shell built-in...)
On Sun, 23 Apr 2023 15:42:00 -0400, John-Paul Stewart <jpstewart@personalprojects.net> wrote:
So pipes on Linux aren't very large at all. I don't know how other Unix
systems compare.
The pipe only has to store a minimum of one buffer of data. If the process
On 24.04.2023 16:05, vallor wrote:
Could the actual pipe size perhaps be queried
and set with "ulimit"?
$ ulimit -a
[...]
pipe size (512 bytes, -p) 8
[...]
With: GNU bash, version 5.1.16
("help ulimit" for docs on the shell built-in...)
It's quite funny that every shell has its own formats; in bash you
have to do the math (8x512) while in ksh it's 4096.
On my Linux system, much more than 4096 bytes can be written to
a pipe without anything being read from it:
$ dd if=/dev/zero | sleep 10
^C129+0 records in
128+0 records out
65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s
(I used Ctrl-C to send dd a SIGINT.)
In article <6tukhj-bl1.ln1@ID-313840.user.individual.net>,
Geoff Clare <netnews@gclare.org.uk> wrote:
...
On my Linux system, much more than 4096 bytes can be written to
a pipe without anything being read from it:
$ dd if=/dev/zero | sleep 10
^C129+0 records in
128+0 records out
65536 bytes (66 kB, 64 KiB) copied, 2.04325 s, 32.1 kB/s
(I used Ctrl-C to send dd a SIGINT.)
Didn't somebody say upthread that the default limit on Linux is 64K?
So, kinda funny that you chose exactly 64K for your demonstration.
Anyway, you can (according to those same people) bump it up to 1M if
needed.
On Tue, 25 Apr 2023 09:29:48 -0400, Kenny McCormack <gazelle@shell.xmission.com> wrote:
Didn't somebody say upthread that the default limit on Linux is 64K?
So, kinda funny that you chose exactly 64K for your demonstration.
Anyway, you can (according to those same people) bump it up to 1M if
needed.
It stopped after filling the output buffer, not the pipe. That data was still waiting to be written to the pipe when the dd command was terminated.
On 24.04.2023 16:05, vallor wrote:
*SKIP*
Could the actual pipe size perhaps be queried and set with "ulimit"?
And zsh's ulimit "doesn't know" pipe size?
On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
That was my problem - I expected that a pipe through several sorts would
keep the order. I don't know why it doesn't.
It may be easier to understand if you use temporary files instead of pipes.
Sort the input file by column 4, numerically, creating a first temporary file.
Sort the first temporary file by column 2, creating a second temporary file.
Sort the second temporary file by column 3, creating the output.
The last sort doesn't know that the prior two sorts have been done. It just
looks at the file it's given and sorts it by column 3.
Using a pipe just takes the output of the first and second sort and uses it
directly as input for the next sort. All the pipe does is eliminate the
need for a temporary file.
Keep in mind: when sorting a file, the last line in the input may end up
becoming the first line in the output. Sort cannot write anything to the
pipe or output file until it has sorted the entire input. With a pipe, the
temporary file is in RAM rather than being a named file on disk.
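The equivalence described above can be tried out directly. This is a small sketch on a made-up four-line input (not the data from the original post), with LC_ALL=C to keep the comparison byte-wise and mktemp -d assumed to be available: the same two sorts are run once through a temporary file and once through a pipe, and both are compared with running the last sort alone.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf '%s\n' 'b;2' 'a;2' 'b;1' 'a;1' > "$tmpdir/input"

# Step by step through a temporary file: column 1 descending, then column 2.
sort -t ';' -k 1,1r "$tmpdir/input" > "$tmpdir/step1"
via_files=$(sort -t ';' -k 2,2n "$tmpdir/step1")

# The same two sorts connected by a pipe.
via_pipe=$(sort -t ';' -k 1,1r "$tmpdir/input" | sort -t ';' -k 2,2n)

# Only the last sort, applied to the untouched input.
last_only=$(sort -t ';' -k 2,2n "$tmpdir/input")

rm -rf "$tmpdir"
printf 'files:\n%s\n\npipe:\n%s\n\nlast sort only:\n%s\n' \
    "$via_files" "$via_pipe" "$last_only"
```

All three results are the same (a;1, b;1, a;2, b;2): the descending order of column 1 produced by the first sort leaves no trace, because the second sort orders its whole input again on its own terms.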
Helmut Waitzmann <nn.throttle@xoxy.net>:
Look at these sample lines:
1;0
1;1
1;2
0;0
0;1
0;2
2;0
2;1
2;2
To have this sequence of lines sorted in such a way that the
first field is sorted in ascending numeric order while the
second is sorted in descending numeric order,
I'm sorry, that is a quite misleading description. What I wanted
to say is that the sequence of lines should be sorted to look
like
0;2
0;1
0;0
1;2
1;1
1;0
2;2
2;1
2;0
and to achieve this…
one could specify the two sort criteria at once:
sort -t ';' -k 1nb,1 -k 2nr,2
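For anyone who wants to reproduce this, here is the command applied to the nine sample lines, fed in through printf instead of a file. LC_ALL=C is my addition to keep the run independent of the local locale; it is not part of the command above.

```shell
export LC_ALL=C
# The nine sample lines in their original order.
sample=$(printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2')

# Field 1 ascending (numeric, leading blanks ignored),
# field 2 descending (numeric), both in a single invocation.
sorted=$(printf '%s\n' "$sample" | sort -t ';' -k 1nb,1 -k 2nr,2)
printf '%s\n' "$sorted"
```

The output is the sequence listed above, from 0;2 down to 2;0.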
On Sun, 23 Apr 2023 09:43:06 -0400, David W. Hodgins wrote:
On Sun, 23 Apr 2023 07:28:22 -0400, Martin Τrautmann <t-usenet@gmx.net> wrote:
That was my problem - I expected that a pipe through several sorts would
keep the order. I don't know why it doesn't.
It may be easier to understand if you use temporary files instead of pipes.
Sort the input file by column 4, numerically, creating a first temporary file.
Sort the first temporary file by column 2, creating a second temporary file.
Sort the second temporary file by column 3, creating the output.
The last sort doesn't know that the prior two sorts have been done. It just
looks at the file it's given and sorts it by column 3.
Using a pipe just takes the output of the first and second sort and uses it
directly as input for the next sort. All the pipe does is eliminate the
need for a temporary file.
But if I sort by one column only, then through the pipe by another
column only, the second sort SHOULD respect the previous sort.
Unfortunately, I feel it doesn't.
Keep in mind: when sorting a file, the last line in the input may end up
becoming the first line in the output. Sort cannot write anything to the
pipe or output file until it has sorted the entire input. With a pipe, the
temporary file is in RAM rather than being a named file on disk.
So the sort via a file actually should work the same as via the pipe?
On Sun, 23 Apr 2023 22:30:24 +0200, Helmut Waitzmann wrote:
and to achieve this…
one could specify the two sort criteria at once:
sort -t ';' -k 1nb,1 -k 2nr,2
Would you achieve this via a pipe as well?
On 29/04/2023 11:01, Martin Τrautmann wrote:
But if I sort by one column only, then through the pipe by another
column only, the second sort SHOULD respect the previous sort.
Unfortunately, I feel it doesn't.
Of course it doesn't. How does the second sort know that the first sort
even happened?
On Sat, 29 Apr 2023 12:01:14 +0100, Chris Elvidge wrote:
On 29/04/2023 11:01, Martin Τrautmann wrote:
But if I sort by one column only, then through the pipe by another
column only, the second sort SHOULD respect the previous sort.
Unfortunately, I feel it doesn't.
Of course it doesn't. How does the second sort know that the first sort
even happened?
It should sort on the given column only, but keep everything else as it
was. I guess that's my misconception - however, sort seems to be allowed
to re-sort everything else however it likes. That's the difference to, e.g.,
an Excel spreadsheet, which does keep the former sort.
You want a stable sort, then. Check if you have a '-s' option.
On Sat, 29 Apr 2023 13:38:58 +0100, Richard Harnden wrote:
It should sort on the given column only, but keep anything else as it
was. I guess that's my misconception - however, sort seems to be allowed
to resort anything else however it likes. That's the difference e.g. to
an excel spreadsheet, which does keep the former sort.
You want a stable sort, then. Check if you have a '-s' option.
wow, cool
-s, --stable
stabilize sort by disabling last-resort comparison
I do not understand what that means. But it worked
From the option summary, the meaning is less than obvious. However
some versions of the manpage include an explanation:
"A pair of lines is compared as follows: if any key fields have
been specified, 'sort' compares each pair of fields, in the
order specified on the command line, according to the associated
ordering options, until a difference is found or no fields are
left.
...
Finally, as a last resort when all keys compare equal (or if no
ordering options were specified at all), 'sort' compares the
entire lines. ..."
In the case of a file that has already been sorted, either on a
key occurring before the key-to-be-sorted, or on a key that follows
(but is not adjacent to) the key-to-be sorted, this "last resort
comparison" may result in a record that sorts out-of-sequence
with respect to the prior sort order. To ensure that the order
from a prior sort is not lost, you have to disable this "last
resort comparison".
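The effect of the last-resort comparison can be seen on a small input. This sketch uses nine two-field lines as a made-up sample and sorts them on field 2 only, once without and once with -s. Note that -s is an extension (GNU sort has it, as do some other implementations) and is not part of POSIX; LC_ALL=C is set to keep the whole-line comparison byte-wise.

```shell
export LC_ALL=C
# Sample input: field 1 is deliberately not in ascending order.
sample=$(printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2')

# Without -s: lines with equal field 2 are ordered by comparing whole lines,
# so field 1 ends up ascending inside each group.
plain=$(printf '%s\n' "$sample" | sort -t ';' -k 2nr,2)

# With -s: the last-resort comparison is disabled, so lines with equal
# field 2 keep the relative order they had in the input (1, 0, 2).
stable=$(printf '%s\n' "$sample" | sort -s -t ';' -k 2nr,2)

printf 'without -s:\n%s\n\nwith -s:\n%s\n' "$plain" "$stable"
```

Without -s the first group comes out as 0;2, 1;2, 2;2; with -s it is 1;2, 0;2, 2;2, which is the input order of those three lines.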
On Sat, 29 Apr 2023 12:05:17 +0200, Martin Τrautmann wrote:
On Sun, 23 Apr 2023 22:30:24 +0200, Helmut Waitzmann wrote:
and to achieve this…
one could specify the two sort criteria at once:
sort -t ';' -k 1nb,1 -k 2nr,2
Would you achieve this via a pipe as well?
When I sort by column 2 first and only, I end up with
0;2
1;2
2;2
0;1
1;1
2;1
0;0
1;0
2;0
Why that? I would expect
1;2
0;2
2;2
1;1
0;1
2;1
1;0
0;0
2;0
So why does it re-sort by the first column as well?
Since it does that, both a pipe and a second sort from a
temporary file still fail, since they also ignore the temporary
sort of the other column.
Yes, "sort" without the GNU "sort" "--stable" option will always do a
total ordering, ignoring and destroying any order that has been done to
its input before. That's what we've been discussing the whole thread
and that's what makes the GNU "sort" "--stable" option a nice thing to
have.
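With a stable sort, a pipe of single-key sorts does become possible, provided the least significant key is sorted first and the most significant key last. A sketch, assuming a sort that supports the non-POSIX -s option (GNU sort does), compares such a pipe with the single invocation that names both keys:

```shell
export LC_ALL=C
sample=$(printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2')

# Staged: secondary key first (field 2 descending), primary key last
# (field 1 ascending). -s makes each stage keep the order of equal keys.
staged=$(printf '%s\n' "$sample" \
    | sort -s -t ';' -k 2nr,2 \
    | sort -s -t ';' -k 1n,1)

# Single invocation with both keys, primary key named first.
single=$(printf '%s\n' "$sample" | sort -t ';' -k 1n,1 -k 2nr,2)

printf 'staged:\n%s\n\nsingle:\n%s\n' "$staged" "$single"
```

Both variants give the same result here, 0;2 0;1 0;0 1;2 1;1 1;0 2;2 2;1 2;0. The single invocation is the portable one, since it does not depend on -s.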
On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
So why does it resort by first column as well?
Because that is the way "sort" is supposed to work.
How should I know that this is supposed that way? If I tell "sort" to
sort by a certain column only, why would I have to expect that it will
sort by something else as well?
Helmut Waitzmann <nn.throttle@xoxy.net> writes:
Yes, "sort" without the GNU "sort" "--stable" option will
always do a total ordering, ignoring and destroying any order
that has been done to its input before. That's what we've been
discussing the whole thread and that's what makes the GNU
"sort" "--stable" option a nice thing to have.
There's an old trick that was common back in the day of adding a
line number (or similar) and then removing it. You could then
either explicitly sort on that number or make sure that the
number has leading zeros so the default sort restores the
original order:
nl -n rz data | sort -t ';' -k 2nr,2 | cut -f2-
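A runnable sketch of that trick, applied to the nine sample lines from earlier in the thread. The function names are made up, and "-b a" is added here so that empty lines would be numbered as well. It relies on the default "nl" number width of six digits, so it assumes fewer than a million input lines.

```shell
# The nine sample lines from the thread, in their original order.
sample() {
    printf '%s\n' '1;0' '1;1' '1;2' '0;0' '0;1' '0;2' '2;0' '2;1' '2;2'
}

# Stand-in for "sort -s" that needs no stable-sort option:
# decorate with a line number, sort, remove the decoration.
stable_by_field2_desc() {
    # nl -n rz prefixes each line with "000001<TAB>". The prefix contains
    # no ';', so it simply becomes part of field 1. Lines that tie on
    # field 2 fall back to the whole-line comparison, which is decided by
    # the zero-padded number, i.e. by the original input order.
    nl -b a -n rz | sort -t ';' -k 2,2nr | cut -f 2-
}

sample | stable_by_field2_desc
```

On a "sort" that has -s, the output should match "sort -s -t ';' -k 2,2nr".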
On Mon, 01 May 2023 20:27:57 +0200, Martin Τrautmann wrote:
On Mon, 01 May 2023 13:19:24 +0200, Helmut Waitzmann wrote:
So why does it resort by first column as well?
Because that is the way "sort" is supposed to work.
How should I know that this is supposed that way? If I tell "sort" to
sort by a certain column only, why would I have to expect that it will
sort by something else as well?
As Helmut said, "because that is the way 'sort' is supposed to work".
The Open Group defines the interface and results for each of the common
'Unix' utilities, "sort[1]" included, and their definition of sort says
that
"When there are multiple key fields, later keys shall be compared
only after all earlier keys compare equal. ... [L]ines that otherwise
compare equal shall be ordered as if none of the options -d, -f, -i,
-n, or -k were present ... and with all bytes in the lines
significant to the comparison."
On Mon, 1 May 2023 18:57:13 -0000 (UTC), Lew Pitcher wrote:
So where is that information available on my computer? Sorry, but I
really did not think about using a genealogy search first to find out
how someone thought something should behave. No, it was not obvious to
me. When -k tells me about first and last key to sort by, I just did not
expect a bonus sort.
You *should* be able to get this information with `man sort`. If you
have the GNU coreutils implementation of sort, the man page doesn't
mention re-sorting by the whole line (which is IMHO unfortunate), but at
the bottom of the man page there is a reference to the full
documentation:
Full documentation <https://www.gnu.org/software/coreutils/sort>
or available locally via: info '(coreutils) sort invocation'
If you have an implementation other than GNU coreutils, `man sort` is
likely to describe it in more detail. `sort --help` is also a good
thing to try.
It's also good to know about the POSIX standard:
<https://pubs.opengroup.org/onlinepubs/9699919799/toc.htm>
This is the standard for the behavior of Unix tools, but not all
implementations follow it completely, and most provide extra
functionality.
On Mon, 01 May 2023 17:49:24 -0700, Keith Thompson wrote:
No, mine says
SEE ALSO
The full documentation for sort is maintained as a Texinfo
manual. If the info and sort programs are properly installed at your
site, the command
info sort
should give you access to the complete manual.
sort 5.93 November 2005
SORT(1)
And info sort does not provide more details here.
Yours is quite old. If you don't have the "info" documentation
installed, "info sort" falls back to showing you the man page.
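When the local documentation is this thin, one option is to probe the installed "sort" directly. The following is only a sketch: the helper name supports_stable is made up, and it checks behaviour on two lines with equal keys rather than trusting any version string.

```shell
# GNU sort prints a version line here; other implementations may not.
sort --version 2>/dev/null | head -n 1

# Succeeds if "sort -s" is accepted and leaves lines with equal keys
# in their input order.
supports_stable() {
    out=$(printf '1;b\n1;a\n' | sort -s -t ';' -k 1,1 2>/dev/null) || return 1
    [ "$out" = "$(printf '1;b\n1;a')" ]
}

if supports_stable; then
    echo 'sort -s keeps the input order of equal keys'
else
    echo 'no usable -s here; the nl line-number trick is the fallback'
fi
```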
Hi all,
how do I sort by multiple columns?
Example:
+++
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
+++
I want to sort
* first by column 4, numerical,
* second by column 2
* third by column 3
So the result should be
+++
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
Borgentreich;D9444;Auf der Lindenstätte;3;;32;520008.619;5709083.464
Borgentreich;D9444;Auf der Lindenstätte;4;;32;519860.278;5709041.468
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;T2960;Lindenstätte;2;;32;519758.267;5709057.635
Borgentreich;T2960;Lindenstätte;3;;32;519732.686;5709020.833
Borgentreich;T2960;Lindenstätte;4;;32;519722.956;5709043.915
Borgentreich;T2960;Lindenstätte;5;;32;519708.841;5709015.137
Borgentreich;T2960;Lindenstätte;6;;32;519696.745;5709038.833
Borgentreich;T2960;Lindenstätte;7;;32;519678.983;5709007.380
Borgentreich;T2960;Lindenstätte;8;;32;519673.036;5709040.372
Borgentreich;T2960;Lindenstätte;9;;32;519651.859;5709000.462
Borgentreich;T2960;Lindenstätte;10;;32;519648.417;5709028.865
Borgentreich;T2960;Lindenstätte;11;;32;519607.438;5708989.545
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;T2960;Lindenstätte;14;;32;519596.623;5709013.684
Borgentreich;T2960;Lindenstätte;15;;32;519489.638;5709077.693
Borgentreich;T2960;Lindenstätte;16;;32;519569.141;5709017.854
Borgentreich;T2960;Lindenstätte;18;;32;519559.108;5709037.356
Borgentreich;T2960;Lindenstätte;22;;32;519540.257;5709072.032
Borgentreich;T2960;Lindenstätte;24;;32;519518.763;5709090.026
Borgentreich;T2960;Lindenstätte;26;;32;519503.270;5709103.321
+++
I tried both
sort -k4 -t";" -n | sort -k2,2 -t";" | sort -k3,3 -t";"
and
sort -k4 -t";" -n -k2,2 -k3,3
and some permutations and reverted orders, without success.
The sort by column 4 just gets lost or resorted.
I'm not sure about the man page
-k, --key=POS1[,POS2]
start a key at POS1, end it at POS2 (origin 1)
So I tried relative positions with
-k3,1
as well, without success.
How do I apply the sort syntax properly?
Thanks
Martin
mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3
On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3
miller looks very powerful to me, but unfortunately it's not available here.
In article <slrnu59tbj.2sg.t-usenet@ID-685.user.individual.de>,
Martin Τrautmann <traut@gmx.de> wrote:
On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3
miller looks very powerful to me, but unfortunately it's not available here.
Some sort of import/export restriction in your country?
On Fri, 5 May 2023 12:26:56 -0000 (UTC), Kenny McCormack wrote:
In article <slrnu59tbj.2sg.t-usenet@ID-685.user.individual.de>,
Martin Τrautmann <traut@gmx.de> wrote:
On Fri, 5 May 2023 10:35:01 +0200, Dr Eberhard W Lisse wrote:
mlr --fs 'semicolon' --ocsv --hi --ho --from t.ssv sort -n 4 -f 2,3
miller looks very powerful to me, but unfortunately it's not available here.
Some sort of import/export restriction in your country?
Error: Port miller requires a full Xcode installation, which was not
found on your system.
...and I've not enough space for that, 256 GB SSD only.
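Without Miller, plain "sort" can apply the three criteria as stated in the original post (column 4 numeric, then column 2, then column 3). The sketch below uses five records taken from the example data; the function names records and sort_addresses are made up for illustration.

```shell
# Five of the records from the original post, deliberately out of order.
records() {
    cat <<'EOF'
Borgentreich;T2960;Lindenstätte;12;;32;519622.835;5709023.590
Borgentreich;D9444;Auf der Lindenstätte;1;;32;519950.850;5708982.109
Borgentreich;T2960;Lindenstätte;1;;32;519778.725;5709026.584
Borgentreich;D9386;Lindenstätte;1;;32;520150.696;5709236.354
Borgentreich;D9444;Auf der Lindenstätte;2;;32;519926.937;5708966.116
EOF
}

# One sort call, three keys. The order of the -k options sets the priority.
# Each key is limited to a single field (N,N), and only the first key
# carries the numeric modifier.
sort_addresses() {
    sort -t ';' -k 4,4n -k 2,2 -k 3,3
}

records | sort_addresses
```

In the attempts quoted from the original post, "-k4" without an end position makes the key run from field 4 to the end of the line, and a standalone "-n" applies to every key that has no modifiers of its own. Note also that the desired result shown in the original post looks as if column 3 were the primary key; if that is the goal, the same idea works with the keys reordered, e.g. "-k 3,3 -k 2,2 -k 4,4n".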
On 4/19/23 03:27, Martin Τrautmann wrote:
Hi all,
how do I sort by multiple columns?
awk
On Sat, 6 May 2023 02:03:24 -0400, Popping Mad wrote:
On 4/19/23 03:27, Martin Τrautmann wrote:
Hi all,
how do I sort by multiple columns?
awk
Nope. "awk" alone does not do the job.
Homebrew is a thing on MacOS. A thing that seems to include
miller v6.7.0.
<https://formulae.brew.sh/formula/miller#default>
(Homebrew only needs the Xcode runtime, not the full install)