Forum: >>> Magnum BBS <<<

Re: Why to sort an array of numbers

From none) (albert@21:1/5 to siarczek83@gmail.com on Thu Aug 18 13:07:52 2022

In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
Micha� Kasprzak <siarczek83@gmail.com> wrote:
<SNIP>

Mouse teaches us that we can append additional information to a numeric key to create a new larger number.
We do this by shifting the key as many bits to the left as the tied information has. In the obtained place we insert the
related information.
This additional information could be a pointer to related data. This pointer may be an index instead of an address.

Note that the locality property of quicksort goes down the drain with pointers. This is why my QSORT accept execution tokes for comparison and exchange words.

The above solution has become more possible in today's 64-bit era. 32 bits used to have fewer opportunities to pack
essential information into our numerical mini-records.
And this idea has a great future ahead of it. Have you heard about AVX-512 and 512 bit ZMM0-31 registers?

A more generic QSORT (see above) has no problem with 2 CELL records in a
32 bits Forth. Of course packing using a 64 bits Forth may be slightly faster.

Groetjes Albert.
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From =?UTF-8?Q?Micha=C5=82_Kasprzak?=@21:1/5 to All on Thu Aug 18 03:39:16 2022

I have to start a new thread because important things are happening that you don't notice. After understanding the importance of this topic, an avalanche of comments may be flooded, obscuring the thread with the ugly Heapsort algorithm.

The following dates are important because they show that no one but Zombie saw the big idea and even Zombie needed a week.

On August 7, 2022 Waldek Hebisch wrote: "Note 1: I pack one number and sum of cubes into a word. From then one can reconstruct the second number."

On August 14, 2022 Anton Ertl commented on the above as: "Clever!"

Professor Waldek Hebisch is like a clever little gray mouse trying to stay invisible that goes out at night to eat something, so I suggest a nickname for Waldek: Mouse ;)

We are happy that thanks to Mouse we have the heapsort word which implements the Heap Sort Algorithm called Ugly which sorts an array of numbers.
We tested the heapsort word (even Marcel did!) sorting out the example. The example was an array of numbers. An array of numbers is a very simple data structure.
And now the question arises: what else can we sort with the heapsort word?
Due to the difficulty of answering, this question quickly turns into another question: why to sort an array of numbers at all?

It is not a question of why to sort at all. It is very easy to give many examples of when sorting is useful. Probably all the examples that come to your mind are record sorting, in which the sorted numbers appear as the record keys.
That is, there are keys associated with additional information that must be rearranged simultaneously during sorting.
If the data structures take up a lot of memory then it is better to leave them untouched and bind numeric keys and pointers to their addresses in memory.
But such a data structure is no longer an array of numbers for which our heapsort word could apply.

So do you know any reason why you should sort an array of numbers with no information tied to it?
Maybe ideas like SQL Aggregate Functions come to your mind: Max, Min, Sum, Avg, Count. But you implement these functions faster without sorting. After further reflection, you may come to the conclusion that if you take the numbers away from the
information related to them, their order will probably not matter to you either.
I recall that in some of the IT Olympics tasks there was a need to sort an array of numbers, but I can't recall the details.
Are these really the only training reasons for sorting an array of numbers, such as testing the written heapsort word or benchmarks?
We had a taxi which our clever Mouse got into.

We conclude that sorting numbers makes sense almost only when there is data associated with the numbers.
Does that mean our heapsort word doesn't make sense because it won't have anything to sort? It would be a pity because this word sorts very quickly, because the data structure is simple and fits well with the CPU. When the data structures are more
complicated, the sort speed drops and Pony knows something about it because he wrote his MERGE-SORT ;)

Mouse teaches us that we can append additional information to a numeric key to create a new larger number.
We do this by shifting the key as many bits to the left as the tied information has. In the obtained place we insert the related information.
This additional information could be a pointer to related data. This pointer may be an index instead of an address.
We can now use our very quick heapsort word.

The above solution has become more possible in today's 64-bit era. 32 bits used to have fewer opportunities to pack essential information into our numerical mini-records.
And this idea has a great future ahead of it. Have you heard about AVX-512 and 512 bit ZMM0-31 registers?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Rubin@21:1/5 to siarczek83@gmail.com on Thu Aug 18 11:00:05 2022

Michał Kasprzak <siarczek83@gmail.com> writes:

Due to the difficulty of answering, this question quickly turns into
another question: why to sort an array of numbers at all?

One obvious reason is to check the array for duplicates. The C++ code
that I posted for the taxicab problem did that. Another might be to
find the median. Sorting is a convenient though not optimal way to do
that. Or, you might want to know what the largest or smallest N numbers
are. Again, sorting is a straightforward method.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Marcel Hendrix@21:1/5 to Paul Rubin on Thu Aug 18 12:36:59 2022

On Thursday, August 18, 2022 at 8:00:26 PM UTC+2, Paul Rubin wrote:

Michał Kasprzak <siarc...@gmail.com> writes:

Due to the difficulty of answering, this question quickly turns into another question: why to sort an array of numbers at all?

One obvious reason is to check the array for duplicates. The C++ code
that I posted for the taxicab problem did that. Another might be to
find the median. Sorting is a convenient though not optimal way to do
that. Or, you might want to know what the largest or smallest N numbers
are. Again, sorting is a straightforward method.

Finding duplicates and largest/smaller need only a single pass through
all elements. The median can be done in O(n) in a few lines?

I found 1863 mentions of SORT in iForth, but their only use seems
to be benchmarks. What may occur more frequently is inserting
elements in an ordered list, but I don't know how to search for that :-)

-marcel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Rubin@21:1/5 to Marcel Hendrix on Thu Aug 18 21:10:34 2022

Marcel Hendrix <mhx@iae.nl> writes:

Finding duplicates and largest/smaller need only a single pass through
all elements.

Finding duplicates in a single pass needs some kind of lookup table,
right? So there is more storage overhead than just sorting in place.

Similarly, the "right" way to find the N largest (say N=100) is to use a
heap as a priority queue and pull off N elements, but calling a general
purpose sorting routine is simpler.

The median can be done in O(n) in a few lines?

I wasn't aware of that and I thought it was more complicated. I know of
a guaranteed O(n) method that is fairly intricate, and an O(n)
average-case method that amounts to doing quicksort but figuring out
after each pivot which partition contains the median, and only recursing
on that partition instead of on both. Again though, sorting takes 1
line instead of "a few".

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Paul Rubin on Fri Aug 19 11:11:07 2022

Paul Rubin <no.email@nospam.invalid> writes:

Marcel Hendrix <mhx@iae.nl> writes:

Finding duplicates and largest/smaller need only a single pass through
all elements.

Finding duplicates in a single pass needs some kind of lookup table,
right? So there is more storage overhead than just sorting in place.

Not necessarily. In the lookup table, you need only memory for the
unique elements, while with sorting, you need memory for all elements.

Similarly, the "right" way to find the N largest (say N=100) is to use a
heap as a priority queue and pull off N elements,

That's slow, because creating a heap for all the elements is slow (bad locality). I would use a variant of quicksort that only recurses into
the partitions that include the requested ranks. Another alternative
would be to use mergesort and stop merging after N elements in each
step.

The median can be done in O(n) in a few lines?

I wasn't aware of that and I thought it was more complicated. I know of
a guaranteed O(n) method that is fairly intricate, and an O(n)
average-case method that amounts to doing quicksort but figuring out
after each pivot which partition contains the median and only recursing
on that partition instead of on both.

It's trivial to "figure out" which partition contains a rank: After partitioning the elements equal to the pivot are at their final ranks,
so you just need to compare these ranks (indexes) with the desired rank.

Again though, sorting takes 1
line instead of "a few".

Only if you have a sort library call and no
search-for-element-with-rank library call.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From none) (albert@21:1/5 to siarczek83@gmail.com on Tue Aug 23 12:30:50 2022

In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
Micha� Kasprzak <siarczek83@gmail.com> wrote:
<SNIP>

This additional information could be a pointer to related data.
This pointer may be an index instead of an address.
We can now use our very quick heapsort word.

An important note is that pointers have a detrimental effect on the
locality of memory access (caching issues).
I'd say that quicksort using pointers is about as efficient as heapsort
without pointers.

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Bernd Linsel@21:1/5 to albert on Tue Aug 23 17:34:08 2022

On 23.08.2022 12:30, albert wrote:

In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
Micha? Kasprzak <siarczek83@gmail.com> wrote: <SNIP>

This additional information could be a pointer to related data.
This pointer may be an index instead of an address. We can now use
our very quick heapsort word.

An important note is that pointers have a detrimental effect on the
locality of memory access (caching issues). I'd say that quicksort
using pointers is about as efficient as heapsort without pointers.

Groetjes Albert

That's firstly a question of the locality of the pointed-to data. If
they are organized in an array, and you have another array of pointers
into that array, the data to be sorted will be cached alike.
There is even an advantage, as swap operations only have to swap
pointers, not whole array elements.

Your point is only valid if the pointed-to data is scattered over the
whole memory, so that nearly every access (eg for getting the keys to
compare) misses the cache.

Regards
Bernd

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to Bernd Linsel on Tue Aug 23 17:33:23 2022

Bernd Linsel <bl1-removethis@gmx.com> writes:

On 23.08.2022 12:30, albert wrote:

In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
Micha? Kasprzak <siarczek83@gmail.com> wrote: <SNIP>

This additional information could be a pointer to related data.
This pointer may be an index instead of an address. We can now use
our very quick heapsort word.

An important note is that pointers have a detrimental effect on the
locality of memory access (caching issues). I'd say that quicksort
using pointers is about as efficient as heapsort without pointers.

...

That's firstly a question of the locality of the pointed-to data. If
they are organized in an array, and you have another array of pointers
into that array, the data to be sorted will be cached alike.

Only at the start. After a few steps of a sorting algorithm with
spatial locality the pointers are rearranged and walking through them
linearly will access the elements in a way that has little spatial
locality.

But still, I don't see any validity in his point. Sure we can throw
away the spatial locality of quicksort and mergesort, and then it
might do as badly as heapsort (which does not have to take such
measures to have bad spatial locality), but why would one do that?

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Bob Worm
  Sun Jul 13 11:30:55 2025
  from Wales, Uk via Telnet
- Michal Wronka
  Sun Jul 13 09:16:55 2025
  from Wroclaw, Poland via SSH
- Gwylbert
  Sun Jul 13 04:08:48 2025
  from Sydney, Nsw via Telnet
- Plume
  Sun Jul 13 02:17:39 2025
  from Uk via Raw
- Michal Wronka
  Sat Jul 12 23:15:13 2025
  from Wroclaw, Poland via SSH
- Bob Worm
  Sat Jul 12 23:04:09 2025
  from Wales, Uk via Telnet
- Michal Wronka
  Sat Jul 12 22:43:44 2025
  from Wroclaw, Poland via SSH
- Bagwaa
  Sat Jul 12 21:37:02 2025
  from Nottingham via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	508
Nodes:	16 (0 / 16)
Uptime:	242:25:32
Calls:	9,986
Calls today:	4
Files:	13,836
Messages:	6,358,541

Re: Why to sort an array of numbers

Who's Online

Recent Visitors

System Info