Forum: >>> Magnum BBS <<<

OT: Re: A very slow program

From Vir Campestris@21:1/5 to Bonita Montero on Thu Oct 3 17:06:13 2024

XPost: alt.comp.lang.c

On 02/10/2024 13:44, Bonita Montero wrote:

Ive developed a Sieve of Erastosthenes in C++20 with an unbeaten
performance. With this code I get all primes <= 2 ^ 32 in 140ms
on my AMD 7950X 16-core Zen4-system.
It calculates first only the primes up to the square root of the
maximum. The remaining bits are partitioned according to the num-
ber of CPU-cores. Within each thread the bitmap is partitioned in
parts that fit into the L2-cache (queried via CPUID).
All operations are aligned on a cacheline-boundary to avoid false
sharing and locking of shared words.

Unbeaten?

This isn't the same program you posted before over on comp.lang.c++, but
it builds and runs fine on my machine.

for 4e9 primes my machine take nearly 10 seconds for 4294967295 primes.
The one I posted before over there takes 1.5 seconds.

This version is if anything slightly slower than your other one.

You might like to have a look.

(And Keith, yes, I realise there is no point whatsoever in doing this...
but it's been fun. First time in years I've written anything I'm not
getting paid for!)

Andy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Vir Campestris@21:1/5 to Bonita Montero on Sun Oct 20 11:49:22 2024

XPost: alt.comp.lang.c

On 04/10/2024 05:53, Bonita Montero wrote:

Am 03.10.2024 um 18:06 schrieb Vir Campestris:

for 4e9 primes my machine take nearly 10 seconds for 4294967295
primes. The one I posted before over there takes 1.5 seconds.

Show me your code. I'm pretty sure you never posted faster code.

Try this.

--
/*
Programme to calculate primes using the Sieve or Eratosthenes
Copyright (C) 2024 the person known as Vir Campestris

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
*/

#include <algorithm>
#include <cassert>
#include <chrono>
#include <climits>
#include <cmath>
#include <csignal>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <list>
#include <memory>
#include <new>
#include <thread>
#include <vector>

// If these trigger you have a really odd architecture. static_assert(sizeof(uint8_t) == 1);
static_assert(CHAR_BIT == 8);

// The type of a prime number.
// size_t isn't big enough on a 32-bit machine with 2GB memory allocated
and 16 primes per byte.
typedef uint64_t PRIME;

// This is the type of the word I operate in. The code will run OK with
uin8_t or uint32_t,
// but *on my computer* it's slower.
typedef uint64_t StoreType;

// tuneable constants
constexpr size_t tinyCount = 5;
constexpr size_t smallCount = 10;
constexpr size_t bigCache = 16384;

// this is a convenience class for timing.
class Stopwatch {
typedef std::chrono::steady_clock CLOCK;
typedef std::pair<unsigned, std::chrono::time_point<CLOCK> > LAP;
typedef std::vector<LAP> LAPS;
LAPS mLaps;
public:
typedef std::vector<std::pair<unsigned, double>> LAPTIMES;
Stopwatch() {}

// record the start of an interval
void start()
{
mLaps.clear();
mLaps.emplace_back(std::make_pair(0, CLOCK::now()));
}

void lap(unsigned label)
{
mLaps.emplace_back(std::make_pair(label, CLOCK::now()));
}

// record the end of an interval
void stop()
{
mLaps.emplace_back(std::make_pair(-1, CLOCK::now()));
}

// report the difference between the last start and stop
double delta() const
{
assert(mLaps.size() >= 2);
assert(mLaps.front().first == 0);
assert(mLaps.back().first == -1);
return std::chrono::duration_cast<std::chrono::nanoseconds>(mLaps.back().second
- mLaps.front().second).count() / (double)1e9;
}

LAPTIMES const laps() const
{
LAPTIMES ret;
assert(mLaps.size() >= 2);
ret.reserve(mLaps.size() - 1);
auto lap = mLaps.begin();
auto start = lap->second;

for(++lap; lap != mLaps.end(); ++lap)
{
ret.emplace_back(
std::make_pair(
lap->first,

std::chrono::duration_cast<std::chrono::nanoseconds>(lap->second - start).count() / (double)1e9));

}
return ret;
}
};

// Internal class used to store the mask data for one or more
// primes rather than all primes. This has an initial block,
// followed by a block which contains data which is repeated
// for all higher values.
// Example: StoreType== uint8_t, prime == 3, the mask should be
// 90 meaning 1,3,5,7,11,13 are prime but not 9, 15
// 24,49,92 covers odds 17-63
// 24,49,92 covers 65-111, with an identical mask
// As the mask is identical we only need to store the
// non-repeat, plus one copy of the repeated block and data to
// show where it starts and ends. Note that for big primes the
// initial block may be more than one StoreType.
// As there are no common factors between any prime and the
// number of bits in StoreType the repeat is the size of the
// prime.
// Note also that a masker for two primes will have an
// initial block which is as big as the larger of the sources,
// and a repeat which is the product of the sources.
class Masker final
{
// The length of the initial block.
size_t initSize;

// The size of the repeated part. The entire store size is the
sum of these two.
size_t repSize;

// Buffer management local class. I started using std::vector
for the store,
// but I found it was slow for large sizes as it insists on
calling the constructor
// for each item in turn. And there may be billions of them.
class Buffer final
{
StoreType* mpBuffer; // base of the data referred to
size_t mSize; // size of the buffer in
StoreType-s.

public:
// Normal constructor
Buffer(size_t size = 0) : mpBuffer(nullptr), mSize(size)
{
if (size > 0)
{
mpBuffer = static_cast<StoreType*>(std::malloc(size * sizeof(StoreType)));

if (!mpBuffer)
throw std::bad_alloc();
}
}

Buffer(Buffer&) = delete;

// Move constructor
Buffer(Buffer&& source)
: mpBuffer(source.mpBuffer), mSize(source.mSize)
{
source.mSize = 0;
source.mpBuffer = nullptr;
};

Buffer& operator=(Buffer&) = delete;

// Move assignment
Buffer& operator=(Buffer&& source)
{
if (mpBuffer)
std::free(mpBuffer);
mpBuffer = source.mpBuffer;
mSize = source.mSize;
source.mSize = 0;
source.mpBuffer = nullptr;
return *this;
}

void resize(size_t newSize)
{
mpBuffer = static_cast<StoreType*>(std::realloc(mpBuffer, newSize *
sizeof(StoreType)));
if (!mpBuffer)
throw std::bad_alloc();
mSize = newSize;
}

// Get the data start
inline StoreType* operator*() const
{
return mpBuffer;
}

// Get the owned size
inline size_t const & size() const
{
return mSize;
}

// clean up.
~Buffer()
{
if (mpBuffer)
std::free(mpBuffer);
}
} mBuffer;

// Raw pointer to the store for speed.
StoreType * mStorePtr;

// shift from the prime value to the index in mStore
static constexpr size_t wordshift = sizeof(StoreType) == 1 ? 4
: sizeof(StoreType) == 4 ? 6 : sizeof(StoreType) == 8 ? 7 : 0;
static_assert(wordshift);

// Mask for the bits in the store that select a prime.
static constexpr size_t wordmask = (StoreType)-1 >>
(sizeof(StoreType) * CHAR_BIT + 1 - wordshift);

// convert a prime to the index of a word in the store
static inline size_t index(PRIME const & prime)
{
return prime >> wordshift;
}

// get a mask representing the bit for a prime in a word
static inline StoreType mask(PRIME const & prime)
{
return (StoreType)1 << ((prime >> 1) & wordmask);
}

public:
// This class allows us to iterate through a Masker
indefinitely, retrieving words,
// automatically wrapping back to the beginning of the repat
part when the end is met.
class MaskIterator
{
StoreType const * index, *repStart, *repEnd;
public:
MaskIterator(Masker const * origin)
: index(origin->mStorePtr)
, repStart(index + origin->initSize)
, repEnd(repStart + origin->repSize)
{}

inline StoreType const & operator*() const
{
return *index;
}

inline StoreType next() // returns value with iterator post-increment
{
auto & retval = *index;
if (++index >= repEnd)
index = repStart;
return retval;
}
};

// Default constructor. Makes a dummy mask for overwriting.
Masker()
: initSize(0)
, repSize(1)
, mBuffer(1)
, mStorePtr(*mBuffer)
{
*mStorePtr = 0; // nothing marked unprime
}

// Move constructor (destroys source)
Masker(Masker&& source)
: initSize(source.initSize)
, repSize(source.repSize)
, mBuffer(std::move(source.mBuffer))
, mStorePtr(*mBuffer)
{
}

// Copy constructor (preserves source)
Masker(Masker& source)
: initSize(source.initSize)
, repSize(source.repSize)
, mBuffer(initSize + repSize)
, mStorePtr(*mBuffer)
{
memcpy(*mBuffer, *source.mBuffer, mBuffer.size() * sizeof(StoreType));
}

// move assignment (destroys source)
Masker& operator=(Masker&& source)
{
initSize = source.initSize;
repSize = source.repSize;
mStorePtr = source.mStorePtr;
mBuffer = std::move(source.mBuffer);
return *this;
}

// Construct for a single prime
Masker(PRIME prime)
: initSize(this->index(prime)+1)
, repSize(prime)
, mBuffer(initSize + prime)
, mStorePtr(*mBuffer)
{
// Pre-zero the buffer, because for bigger primes we
don't write every word.
// This is fast enough not to care about excess writes.
memset(mStorePtr, 0, mBuffer.size() * sizeof(StoreType));

// The max prime we'll actually store.
// *2 because we don't store evens.
PRIME const maxPrime = (initSize + repSize) * CHAR_BIT
* sizeof(StoreType) * 2;

// Filter all the bits. Note that although we
// don't strictly need to filter from prime*3,
// but only from prime**2, doing from *3 makes
// the init block smaller.
PRIME const prime2 = prime * 2;

for (PRIME value = prime * 3; value < maxPrime; value
+= prime2)
{
mStorePtr[this->index(value)] |= this->mask(value);
}
}

// Construct from two others, making a mask that excludes all
numbers
// marked not prime in either of them. The output init block
// will be the largest from either of the inputs; the repeat
block size will be the
// product of the input repeat sizes.
// The output could get quite large - 3.7E12 for all primes up
to 37
Masker(const Masker& left, const Masker& right, size_t
storeLimit = 0)
: initSize(std::max(left.initSize, right.initSize))
, repSize (left.repSize * right.repSize)
, mBuffer()
{
if (storeLimit)
repSize = std::min(initSize + repSize,
storeLimit) - initSize;

auto storeSize = initSize + repSize;

// Actually construct the store with the desired size
mBuffer = storeSize;
mStorePtr = *mBuffer;

// get iterators to the two inputs. These automatically
wrap
// when their repsize is reached.
auto li = left.begin();
auto ri = right.begin();

for (size_t word = 0; word < storeSize; ++word)
{
mStorePtr[word] = li.next() | ri.next();
}
}

// Construct from several others, making a mask that excludes
all numbers
// marked not prime in any of them. The output init block
// will be the largest from any of the inputs; the repeat size
will be the
// product of all the input repeat sizes.
// The output could get quite large - 3.7E12 for all primes up
to 37
// the type iterator should be an iterator into a collection of Masker-s.
template <typename iterator> Masker(const iterator& begin,
const iterator& end, size_t storeLimit = 0)
: initSize(0)
, repSize(1)
, mBuffer() // empty for now
{
// Iterate over the inputs. We will
// * Determine the maximum init size
// * Determine the product of all the repeat sizes.
// * Record the number of primes we represent.
size_t nInputs = std::distance(begin, end);
std::vector<MaskIterator> iterators;
iterators.reserve(nInputs);

for (auto i = begin; i != end; ++i)
{
initSize = std::max(initSize, i->initSize);
repSize *= i->repSize;
iterators.push_back(i->begin());
}
if (storeLimit)
repSize = std::min(initSize + repSize,
storeLimit) - initSize;
auto storeSize = initSize + repSize;
// Actually construct the store with the desired size
mBuffer = storeSize;
mStorePtr = *mBuffer;

// take the last one off (most efficient to remove)
// and use it as the initial mask value
auto last = iterators.back();
iterators.pop_back();
for (auto word = 0; word < storeSize; ++word)
{
StoreType mask = last.next();
for(auto &i: iterators)
{
mask |= i.next();
}
mStorePtr[word] = mask;
}
}

template <typename iterator> Masker& andequals(size_t
cachesize, iterator begin, iterator end)
{
static constexpr size_t wordshift = sizeof(StoreType)
== 1 ? 4 : sizeof(StoreType) == 4 ? 6 : sizeof(StoreType) == 8 ? 7 : 0;
static constexpr size_t wordmask = (StoreType)-1 >> (sizeof(StoreType) * CHAR_BIT + 1 - wordshift);

struct value
{
PRIME step; // this is the
prime. The step should be 2P, but only half the bits are stored
PRIME halfMultiple; // this is the
current multiple, _without_ the last bit
value(PRIME prime): step(prime), halfMultiple((prime*prime) >> 1) {}
bool operator<(PRIME halfLimit) const
{
return halfMultiple < halfLimit;
}
void next(StoreType * store)
{
static constexpr size_t wsm1 =
wordshift - 1;
auto index = halfMultiple >> wsm1;
auto mask = (StoreType)1 <<
(halfMultiple & wordmask);
*(store + index) |= mask;
halfMultiple += step;
}
};
std::vector<value> values;
values.reserve(std::distance(begin, end));
for (auto prime = begin; prime != end; ++prime)
{
auto p = *prime;
values.emplace_back(p);
}

size_t limit = 0;
do
{
limit = std::min(limit + cachesize, initSize +
repSize + 1);
PRIME halfMaxPrime = (limit * 16 *
sizeof(StoreType)) / 2 + 1;
for (auto & value: values)
{
while (value < halfMaxPrime)
value.next(mStorePtr);
}
}
while (limit < initSize + repSize);

return *this;
}

// duplicates the data up to the amount needed for a prime newSize
void resize(PRIME newSize)
{
size_t sizeWords = this->index(newSize) + 1;
assert(sizeWords > (initSize + repSize)); // not
allowed to shrink!
mBuffer.resize(sizeWords);
mStorePtr = *mBuffer;

auto copySource = mStorePtr + initSize;
do
{
auto copySize = std::min(sizeWords - repSize - initSize, repSize);
auto dest = copySource + repSize;
memcpy(dest, copySource, copySize * sizeof(StoreType));
repSize += copySize;
} while ((initSize + repSize) < sizeWords);
}

// returns true if this mask thinks value is prime.
bool get(PRIME value) const
{
assert(value < sizeof(StoreType) * CHAR_BIT * 2 * mBuffer.size());
auto ret =
(value <= 3)
|| ((value & 1) &&
(mStorePtr[this->index(value)] & this->mask(value)) == 0);

return ret;
}

// Get the beginning of the bitmap.
// Incrementing this iterator can continue indefinitely,
regardless of the bitmap size.
MaskIterator begin() const
{
return MaskIterator(this);
}

size_t repsize() const
{
return repSize;
}

size_t size() const
{
return initSize + repSize;
}

// prints a collection to stderr
void dump(size_t limit = 0) const
{
std::cerr << std::dec << repSize;
std::cerr << std::hex;

if (limit == 0) limit = initSize + repSize;

auto iter = begin();
for (auto i = 0; i < limit; ++i)
{
// cast prevents attempting to print uint8_t as
a char
std::cerr << ',' << std::setfill('0') << std::setw(sizeof(StoreType) * 2) << uint64_t(mStorePtr[i]);
}
std::cerr << std::endl;
std::cerr << std::dec << repSize;

for (auto p = 1; p < limit * 16 * sizeof(StoreType); ++p)
{
if (get(p))
{
std::cerr << ',' << p;
}
}
std::cerr << std::endl;
}
};

int main(int argc, char**argv)
{
// find out if the user asked for a different number of primes
PRIME PrimeCount = 1e9;
if (argc >= 2)
{
std::istringstream arg1(argv[1]);
std::string crud;
arg1 >> PrimeCount >> crud;
if (!crud.empty() || PrimeCount < 3)
{
double isitfloat;
arg1.seekg(0, arg1.beg);
crud.clear();
arg1 >> isitfloat >> crud;
if (!crud.empty() || isitfloat < 3)
{
std::cerr << "Invalid first argument
\"" << argv[1] << "\" Should be decimal number of primes required\n";
exit(42);
}
else PrimeCount = isitfloat;
}
}

std::string outName;
if (argc >= 3)
{
outName = argv[2];
}

Stopwatch s; s.start();

Masker primes;
PRIME nextPrime;
{
nextPrime = 3;
Masker tiny(nextPrime);
std::vector<Masker> smallMasks;
smallMasks.emplace_back(tiny);

// Make a mask for the really small primes, 3 5 7 11
size_t count = 1; // allows for the 3
for ( ; count < tinyCount; ++count)
{
do
{
nextPrime += 2;
} while (!tiny.get(nextPrime));
tiny = Masker(tiny, nextPrime);
}

// that masker for three will be correct up until 5
squared,
// the first non-prime whose smallest root is not 2 or 3
assert (nextPrime < 25 && "If this triggers tinyCount
is too big and the code will give wrong results");

// this limit is too small, but is easy to calculate
and big enough
auto limit = nextPrime*nextPrime;

smallMasks.clear();
for (; count < smallCount; ++count)
{
do
{
nextPrime += 2;
} while (!tiny.get(nextPrime));
smallMasks.emplace_back(nextPrime);
}
assert(nextPrime <= limit && "If this triggers
smallCount is too big and the code will give wrong results");

// Make a single masker for all the small primes. Don't
make it bigger than needed.
auto small = Masker(smallMasks.begin(),
smallMasks.end(), PrimeCount / (2 * CHAR_BIT * sizeof(StoreType)) + 1);
s.lap(__LINE__);

// This is the first step that takes an appreciable
time. 2.5s for primes up to 1e11 on my machine.
primes = Masker(tiny, small, PrimeCount / (2 * CHAR_BIT
* sizeof(StoreType)) + 1);
s.lap(__LINE__);
}

// if the buffer isn't big enough yet expand it by duplicating
early entries.
if (primes.size() < PrimeCount / (2 * CHAR_BIT *
sizeof(StoreType)) + 1)
{
primes.resize(PrimeCount);
s.lap(__LINE__);
}

do
{
std::vector<PRIME> quickies;
PRIME quickMaxPrime = std::min(nextPrime*nextPrime, PRIME(sqrt(PrimeCount) + 1));
do
{
do
{
nextPrime += 2;
} while (!primes.get(nextPrime));
quickies.emplace_back(nextPrime);
} while (nextPrime < quickMaxPrime);
s.lap(__LINE__);

// this function adds all the quickies into the mask,
but does all of them in
// one area of memory before moving onto the next. The
area of memory,
// and the list of primes, both fit into the L1 cache.
// You may need to adjust bigCache for best performance.
// This step takes a while - two lots of 20s for qe11
primes.
primes.andequals(bigCache, quickies.begin(),
quickies.end());
s.lap(__LINE__);
} while (nextPrime*nextPrime < PrimeCount);

s.stop();
std::cerr << primes.size() << "," << s.laps().back().second << std::endl;
for (auto l: s.laps()) std::cerr << l.first << "," << l.second
<< std::endl;

if (outName.length())
{
std::ofstream outfile(outName);
outfile << "2\n";
for (PRIME prime = 3; prime < PrimeCount; prime += 2)
{
if (primes.get(prime))
{
outfile << prime << '\n';
}
}
}

return 0;
}

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Kenny McCormack@21:1/5 to vir.campestris@invalid.invalid on Sun Oct 20 12:14:42 2024

XPost: alt.comp.lang.c

In article <vf2n7i$cko5$1@dont-email.me>,
Vir Campestris <vir.campestris@invalid.invalid> wrote:
...

#include <algorithm>

What newsgroup(s) is/are we in?

Let's check:

Newsgroups: alt.comp.lang.c,comp.lang.c

Hmmm.

--
A pervert, a racist, and a con man walk into a bar...

Bartender says, "What will you have, Donald!"

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Harnden@21:1/5 to Vir Campestris on Tue Oct 22 17:13:08 2024

On 20/10/2024 11:49, Vir Campestris wrote:

On 04/10/2024 05:53, Bonita Montero wrote:

Am 03.10.2024 um 18:06 schrieb Vir Campestris:

for 4e9 primes my machine take nearly 10 seconds for 4294967295
primes. The one I posted before over there takes 1.5 seconds.

Show me your code. I'm pretty sure you never posted faster code.

Try this.

I just want to say that those are very good comments.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Vir Campestris@21:1/5 to Kenny McCormack on Sat Oct 26 15:30:15 2024

XPost: alt.comp.lang.c

On 20/10/2024 13:14, Kenny McCormack wrote:

In article <vf2n7i$cko5$1@dont-email.me>,
Vir Campestris <vir.campestris@invalid.invalid> wrote:
...

#include <algorithm>

What newsgroup(s) is/are we in?

Let's check:

Newsgroups: alt.comp.lang.c,comp.lang.c

Hmmm.

And the thread title is... ?

Just kill the thread if you don't care. I won't start another one.

Andy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Vir Campestris@21:1/5 to Bonita Montero on Sat Oct 26 15:28:24 2024

XPost: alt.comp.lang.c

On 21/10/2024 16:36, Bonita Montero wrote:

Am 20.10.2024 um 12:49 schrieb Vir Campestris:

On 04/10/2024 05:53, Bonita Montero wrote:

Am 03.10.2024 um 18:06 schrieb Vir Campestris:

for 4e9 primes my machine take nearly 10 seconds for 4294967295
primes. The one I posted before over there takes 1.5 seconds.

Show me your code. I'm pretty sure you never posted faster code.

Try this.

Still about half the speed of my code for 2 ^ 32.

Fascinating.

But after my latest experience perhaps not surprising.

You're on Windows aren't you? I do know you have a far more powerful
machine than I do. I'm running Ubuntu with Gnu compiler. (I tried Clang,
but it's slower)

When I try your code on my system it takes 10.4s eleapsed, and 12.1s of
CPU to calculate 4294967296. That code I supplied up there takes only
1.47s elapsed, and a little less CPU - it's single threaded.

So on _my_ system my code is about 7 times faster!

But my recent experience - I've been playing with using a modulo30
store, not just odds.

My code makes bitmasks for 2 sets of small primes (these can be small,
there's a repeat at the product of all the primes) then combines the 2
to make the initial mask for the desired number of primes. It then masks
out the bigger primes.

With a mask of odds-only the mask words can be 64 bit, and combining
them is way faster than Mod30 - where the mask is a byte. So I put
together some code so that it would put the two small masks together
with 64 bit operations. That bit of the code got twice as fast. Then I
realised that the next part, for reasons I have been unable to
establish, runs three times more slowly.

But only when I'm using GCC (not Clang) and only when -Ofast
optimisation is turned on. That loop is the same speed in all 0other cases.

I then tried putting some debug timing statements in, and I could get
the same effect by inserting some in in various places.

I looked at the output assembly, and there seem to be random changes (instructions that are not dependent on each other being re-ordered) so
it's not obvious why this is happening.
I think I'm bored enough with this project to just leave it. I've been
more interested in correctness, not speed, in recent years.

Andy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Vir Campestris@21:1/5 to Bonita Montero on Sat Oct 26 21:02:10 2024

XPost: alt.comp.lang.c

On 26/10/2024 15:33, Bonita Montero wrote:

Am 26.10.2024 um 16:28 schrieb Vir Campestris:

You're on Windows aren't you? I do know you have a far more powerful
machine than I do. I'm running Ubuntu with Gnu compiler. (I tried
Clang, but it's slower)

I compared your code against mine on a AMD 7950X.

Which operating system, and which compiler?

It's probably also worth trying for a much larger number of primes. I
wrote my command line parser so it will try to read the number as a
double if it isn't an integer.

For 100000000000 primes my code on my machine is 39s, and yours is 3:46.

(I've got the mod30 version down to 28s now, but met an intractable
optimiser funny)

Andy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Vir Campestris@21:1/5 to Bonita Montero on Mon Oct 28 16:21:24 2024

XPost: alt.comp.lang.c

On 27/10/2024 05:02, Bonita Montero wrote:

Am 26.10.2024 um 22:02 schrieb Vir Campestris:

Which operating system, and which compiler?

Win11, clang-cl 17.0.3. MSVC is about the same performance.

The plot thickens.

I'm in Ubuntu 24.01 LTS.

I normally use Gnu C++, g++ (Ubuntu 13.2.0-23ubuntu4) 13.2.0.

Using g++ for 10000000000 primes my code took 3.3s, and yours 22.7.

If I rebuild them using clang (Ubuntu clang version 18.1.3) then
my code takes 14.5s (far slower) and yours takes 11.6 (much quicker than
with Gnu and a bit quicker than mine).

Andy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	546
Nodes:	16 (2 / 14)
Uptime:	09:08:33
Calls:	10,388
Calls today:	3
Files:	14,061
Messages:	6,416,843
Posted today:	1

OT: Re: A very slow program

Who's Online

Recent Visitors

System Info