I retired last year, and I haven't really written any code since. This
has turned out to be quite a fun little thing of a type I haven't had
time for for YEARS. And oddly I still don't seem to have enough time
for it... It's the garden, and the kid's gardens, and my mum's garden,
and all those holidays :)
But some optimisations. You'll remember in Bonita's first version the
bitmap was initialised to 0xaaaa, because it's a waste of time doing
the sieve for 2.
I pointed out that we don't need to even store the even numbers.
But there's more.
If you look at the bitmap when you've sieved for 2 you see
12 34 56 78
11 10 10 10...
which is a repeat of 2 after an initialisation word. That's the aaaa.
You can do the same with 3
123 456 789
111 110 110 110
except this time the repeat is 3. And annoyingly that doesn't map well
down onto a byte-based architecture. You end up with an initial word,
then a 3-word repeat. (If your word was 24 or 36 bytes it would only
be 1 word, but I haven't seen that since the 1970s)
In hex, with lowest byte first, that is
fd b6 6d db b6 6d db
(That BTW is the same if you only store the odd numbers - a 3-word repeat.)
So rather than start off with your bitmap all set to 1s you can set it
to this repeating pattern. That replaces all the ANDs for all the
values of three with a memory fill with far fewer accesses.
You can do the same with 5:
12345 6789a bcdef
11111 11110 11110 11110
You can then AND your pattern for 3 with the one for 5, and get one
with a repeat length of 15, and set that into your bitmap. You've now replaced about a fifth of all your AND operations with a flood fill.
This can carry on - for a while. Only a short while. You _can_ make a sequence for lots of primes. But it gets quite long, quite
quickly. For up to 23, and not storing the evens, it's over 1E8 words
long!
I was implementing a version of that when something else occurred to
me. You can sacrifice speed for store size if you're prepared to do an integer divide for every prime lookup.
[...]
But a note for the group of course - optimising this to the max has
nothing to do whatever with C++. The only C++ optimising I've found
myself doing is using raw pointers, not vector's
operator[]. (certainly not the at function). And also I found myself
using emplace_back a lot. It's a PITA because you can only emplace
back a single item, and it is slow.
Am 16.02.2024 um 17:06 schrieb Tim Rentsch:
I have an implementation (written in C) based on this approach that
determines all primes less than roughly 51.75 billion, taking just
under 56 seconds to complete. (No threading is used - code is all
single threaded.)
On my 16 core PC this takes 1.73 seconds and 43 seconds
overall CPU time without printing the numbers to a file.
C:\Users\Boni\Documents\Source\bitmapSieve>timep "x64\Release-clang++\bitmapSieve.exe" 51750000000 ""
real 1.729s
user 43.094s
sys 0.094s
cylces 194.738.953.589
Am 23.02.2024 um 14:51 schrieb Tim Rentsch:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 16.02.2024 um 17:06 schrieb Tim Rentsch:
I have an implementation (written in C) based on this approach that
determines all primes less than roughly 51.75 billion, taking just
under 56 seconds to complete. (No threading is used - code is all
single threaded.)
On my 16 core PC this takes 1.73 seconds and 43 seconds
overall CPU time without printing the numbers to a file.
C:\Users\Boni\Documents\Source\bitmapSieve>timep
"x64\Release-clang++\bitmapSieve.exe" 51750000000 ""
real 1.729s
user 43.094s
sys 0.094s
cylces 194.738.953.589
If you run the program as a single thread, what is the
elapsed time?
C:\Users\Boni\Documents\Source\bitmapSieve>timep "x64\Release-clang++\bitmapSieve.exe" 51750000000 "" 1
real 22.945s
user 22.672s
sys 0.234s
cylces 102.917.207.295
Also how long does the program take to determine all
primes up to 3 trillion? Here again I am interested
in the single-thread version.
C:\Users\Boni\Documents\Source\bitmapSieve>timep "x64\Release-clang++\bitmapSieve.exe" 3000000000 "" 1
real 1.234s
user 1.203s
sys 0.031s
cylces 5.523.677.002
Am 25.02.2024 um 09:48 schrieb Tim Rentsch:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 23.02.2024 um 14:51 schrieb Tim Rentsch:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 16.02.2024 um 17:06 schrieb Tim Rentsch:
Also how long does the program take to determine all
primes up to 3 trillion? Here again I am interested
in the single-thread version.
C:\Users\Boni\Documents\Source\bitmapSieve>timep
"x64\Release-clang++\bitmapSieve.exe" 3000000000 "" 1
real 1.234s
user 1.203s
sys 0.031s
cylces 5.523.677.002
What I asked for was 3 trillion with a T, not 3 billion with
a B. Not 3000000000 but 3000000000000.
That would in a bitmap of 375GiB, which won't fit into my memory.
Am 11.03.2024 um 18:10 schrieb Tim Rentsch:
Sounds like you're using 1 bit per number, most of which are
wasted. If you use a different encoding the memory requirements
can be much smaller. How much memory do you have on the box?
If you have 64G you should be able to determine all primes
less than 1.5 trillion, using a simple encoding.
I'm omitting even numbers and I handle the number two as a
special case; that's the fastest solution.
I've mentioned this encoding before but let me give it again.
If numbers are considered mod 30, there are only 8 residues
that are not divisible by 2, 3, or 5. The 8 residues are
1, 7, 11, 13, 17, 19, 23, and 29. So a single byte can
hold all the information needed for 30 numbers, which means
1500000000000 / 30 = 50000000000
which is to say 50 gigabytes should suffice.
Show me the code.
Am 20.04.2024 um 17:35 schrieb Tim Rentsch:[...]
Bonita Montero <Bonita.Montero@gmail.com> writes:
Show me the code.
Apparently you have missed the point.
I want to see the code for your idea.
Am 24.04.2024 um 21:28 schrieb Tim Rentsch:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 20.04.2024 um 17:35 schrieb Tim Rentsch:
Bonita Montero <Bonita.Montero@gmail.com> writes:
[...]
Show me the code.
Apparently you have missed the point.
I want to see the code for your idea.
Yes I already understood what you want. That is what
led me to conclude that you have missed the point.
I don't have "missed the point"; [...]
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 546 |
Nodes: | 16 (2 / 14) |
Uptime: | 57:00:45 |
Calls: | 10,397 |
Calls today: | 5 |
Files: | 14,067 |
Messages: | 6,417,447 |
Posted today: | 1 |