On Sun, 26 May 2024 13:09:36 +0200
David Brown <david.brown@hesbynett.no> wrote:
No, it does /not/. That's the /whole/ point of #embed, and the main
motivation for its existence. People have always managed to embed
binary source files into their binary output files - using linker
tricks, or using xxd or other tools (common or specialised) to turn
binary files into initialisers for constant arrays (or structs).
I've done so myself on many projects, all integrated together in
makefiles.
Let's start another round of the private parts' measurement tournament!
'xxd -i' vs DIY
/c/altera/13.0sp1/quartus/bin64/db_wys.dll is a 52 MB file
$ time xxd -i < /c/altera/13.0sp1/quartus/bin64/db_wys.dll > xxd.txt
real 0m15.288s
user 0m15.054s
sys 0m0.187s
$ time ../quick_xxd/bin_to_list1
/c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list1.txt
real 0m8.502s
user 0m0.000s
sys 0m0.000s
$ time ../quick_xxd/bin_to_list
/c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list.txt
real 0m1.326s
user 0m0.000s
sys 0m0.000s
bin_to_list is probably limited by the write speed of the SSD, which in this particular case is ~9 years old and was used rather intensively during those years.
bin_to_list1 is DIY code written in ~5 minutes.
bin_to_list is DIY code written in ~55 minutes.
In the post above David Brown mentioned 'other tools (common or specialised)'. I'd like to know what they are and how fast they are.
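For concreteness, the "5-minute DIY" version of such a converter is essentially the following (my own sketch, not the actual bin_to_list1 - the names and the entries-per-line count are arbitrary): read bytes one at a time with fgetc() and print each as a decimal initialiser entry.
------------------------------------------
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *in = (argc > 1) ? fopen(argv[1], "rb") : NULL;
    if (!in) return 1;

    int c, col = 0;
    while ((c = fgetc(in)) != EOF) {
        printf("%d,", c);              /* decimal is denser than 0xNN hex */
        if (++col == 20) {             /* arbitrary number of entries per line */
            putchar('\n');
            col = 0;
        }
    }
    if (col) putchar('\n');
    fclose(in);
    return 0;
}
------------------------------------------
Redirecting stdout to a .txt file reproduces the kind of output being timed above; the faster bin_to_list presumably replaces the per-byte printf and fgetc with table lookups and block I/O, as discussed later in the thread.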
On 28/05/2024 12:41, Michael S wrote:
On Sun, 26 May 2024 13:09:36 +0200
David Brown <david.brown@hesbynett.no> wrote:
I think you might be missing the point here.
The start point is a possibly large binary data file.
The end point is an application whose binary has that data file embedded in it (and which makes that data available inside the C program as a C data structure).
Without #embed, one technique (which I've only learnt about this week)
is to use a tool called 'xxd' to turn that binary file into C source
code which contains an initialised array or whatever.
But that isn't the bottleneck. You run that conversion once (or whenever the binary changes), and use the same resulting C code each time you build the application. And quite likely the makefile recognises that you don't need to recompile it anyway.
It is that building process that can be slow if that C source describing
the data is large.
That is what #embed helps to address. At least, if it takes the fast path that has been discussed. But if it is implemented naively, or the fast path is not viable, then it can be just as slow as compiling that xxd-generated C. It will, however, at least have eliminated the xxd step.
The only translation going on here might be:
* Expanding a binary file to text, or tokens (if #embed is done poorly)
* Parsing that text or tokens into the compiler's internal rep
But all that is happening inside the compiler.
It might be that when xxd /is/ used, there might be a faster program to
do the same thing, but I've not heard anyone say xxd's speed is a
problem, only that it's a nuisance to do.
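For readers who have not met it, the difference being described is roughly this (a minimal sketch; the file name is illustrative):
------------------------------------------
/* Pre-C23 approach: run a generator once, e.g. "xxd -i payload.bin > payload_data.h",
   producing an ordinary C initialiser that the compiler must tokenise and parse:

       unsigned char payload_bin[] = { 0x7f, 0x45, 0x4c, 0x46, ... };
       unsigned int payload_bin_len = ...;
*/

/* C23 approach: the preprocessor pulls the file contents in directly, and a
   compiler taking the "fast path" can avoid materialising millions of tokens. */
static const unsigned char payload[] = {
#embed "payload.bin"
};
------------------------------------------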
On Tue, 28 May 2024 15:06:40 +0100
bart <bc@freeuk.com> wrote:
On 28/05/2024 12:41, Michael S wrote:
On Sun, 26 May 2024 13:09:36 +0200
David Brown <david.brown@hesbynett.no> wrote:
I think you might be missing the point here.
I don't think so.
I understand your points and agree with just about everything. My post
was off topic, intentionally so.
If we talk about practicalities, the problems with xxd, if there are problems at all, are not its speed, but the size of the text file it produces (~6x the size of the original binary) and its availability. I don't know which package it belongs to in typical Linux or BSD distributions, but at least on Windows/msys2 it is part of Vim - a rather big package for which, apart from xxd, I have no use at all.
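For reference, xxd -i output has roughly this shape (the names and byte values here are illustrative), which is where the ~6x blow-up comes from: every byte becomes about six characters of text ("0xNN, ") plus line breaks.
------------------------------------------
unsigned char db_wys_dll[] = {
  0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
  0xff, 0xff, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
  /* ... 12 bytes per line for the rest of the file ... */
};
unsigned int db_wys_dll_len = sizeof db_wys_dll;  /* real xxd emits the literal byte count */
------------------------------------------
A decimal converter with no '0x' prefixes and variable-width numbers keeps the output closer to ~3.7x the input size, which is roughly what the later 616 MB vs 366 MB comparison in this thread reflects.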
On 28/05/2024 12:41, Michael S wrote:
On Sun, 26 May 2024 13:09:36 +0200
David Brown <david.brown@hesbynett.no> wrote:
I think you might be missing the point here.
On Debian, xxd is in a package called "xxd" which contains just xxd
and directly associated files (like man pages).
On 28/05/2024 16:56, Michael S wrote:
OK, I had a go with your program. I used a random data file of exactly 100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
I then tried a simple program in my language, which took 10 seconds.
I looked more closely at yours, and saw that you used a clever method: a table of precalculated stringified numbers.
Using a similar table, plus more direct string handling, the fastest
timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was supposed to match your layout, but that turned out to be variable.)
Both programs have a trailing comma on the last number, which may be problematical, but also not hard to fix.
I then tried xxd under WSL, and that took 28 seconds, real time, with
a much larger output (616KB instead of 366KB).
But it's using fixed
width columns of hex, complete with a '0x' prefix.
Below is that program but in my language. I tried transpiling to C,
hoping it might be even faster, but it got slower (4.5 seconds with
gcc-O3). I don't know why. It would need manual porting to C.
This hardcodes the input filename. 'readfile' is a function in my
library.
--------------------------------
[0:256]ichar numtable
[0:256]int numlengths

proc main=
    ref byte data
    [256]char str
    const perline=21
    int m, n, slen
    byte bb
    ichar s, p

    for i in 0..255 do
        numtable[i] := strdup(strint(i))
        numlengths[i] := strlen(numtable[i])
    od

    data := readfile("/c/data100")
    n := rfsize

    while n do
        m := min(n, perline)
        n- := m
        p := &str[1]
        to m do
            bb := data++^
            s := numtable[bb]
            slen := numlengths[bb]
            to slen do
                p++^ := s++^
            od
            p++^ := ','
        od
        p^ := 0
        println str
    od
end
On Tue, 28 May 2024 19:57:38 +0100
bart <bc@freeuk.com> wrote:
OK, I had go with your program. I used a random data file of exactly
100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than the aging SSD on my test machine, and a lot slower than David Brown's SSD.
I then tried a simple program in my language, which took 10 seconds.
I looked more closely at yours, and saw you used a clever method of a
table of precalculated stringified numbers.
Using a similar table, plus more direct string handling, the fastest
timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was
supposed to match your layout, but that turned out to be variable.)
Yes, I try to keep the line length almost fixed (77 to 80 characters) and make no attempt to control the number of entries per line.
Since you used a random generator, the density advantage of my approach is smaller than in more typical situations, where 2-digit numbers are more common than 3-digit numbers.
Also, I think that random numbers are close to the worst case for the branch predictor / loop length predictor in my inner loop.
Had I been thinking about the random case upfront, I'd have coded the inner loop differently: I'd always copy 4 octets (the comma would be stored in the same table), and after that I would advance outptr by a length taken from an additional table - similarly, but not identically, to your method.
There exist files that have near-random distribution, e.g. anything
zipped or anything encrypted, but I would think that we rarely want
them embedded.
This hardcodes the input filename. 'readfile' is a function in my
library.
data := readfile("/c/data100")
Reading the whole file upfront is undoubtedly faster than interleaving reads and writes. But by the set of unwritten rules that I imposed on myself, it is cheating.
On 28/05/2024 21:23, Michael S wrote:
On Tue, 28 May 2024 19:57:38 +0100
bart <bc@freeuk.com> wrote:
OK, I had go with your program. I used a random data file of
exactly 100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than aging
SSD on my test machine and ALOT slower than SSD of David Brown.
My machine uses an SSD.
However the tests were run on Windows, so I ran your program again
under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).
On Tue, 28 May 2024 19:57:38 +0100
OK, I had go with your program. I used a random data file of exactly
100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than aging SSD
on my test machine and ALOT slower than SSD of David Brown.
On Tue, 28 May 2024 23:23:15 +0300
Michael S <already5chosen@yahoo.com> wrote:
Also, I think that random numbers are close to worst case for branch
predictor / loop length predictor in my inner loop.
Were I thinking about random case upfront, I'd code an inner loop
differently. I'd always copy 4 octets (comma would be stored in the
same table). After that I would update outptr by length taken from
additional table, similarly, but not identically to your method below.
That's what I had in mind:
unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1]; // bin2dec[MAX_CHAR_PER_NUM] => length
for (int i = 0; i < 256; ++i) {
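A sketch of that idea as I read it (my own reconstruction, not the actual code): every table entry stores the decimal digits plus the comma padded to a fixed width, with the true length in the final byte, so the inner loop always copies a fixed four octets and then advances the output pointer by the stored length, avoiding a data-dependent branch.
------------------------------------------
#include <stdio.h>
#include <string.h>

#define MAX_CHAR_PER_NUM 4    /* "255," is the widest entry */

static unsigned char bin2dec[256][MAX_CHAR_PER_NUM + 1];  /* last byte holds the length */

static void init_bin2dec(void)
{
    for (int i = 0; i < 256; ++i) {
        int len = sprintf((char *)bin2dec[i], "%d,", i);   /* digits plus trailing comma */
        bin2dec[i][MAX_CHAR_PER_NUM] = (unsigned char)len;
    }
}

/* Always copy the full fixed width; only advance by the real length. */
static inline unsigned char *emit_byte(unsigned char *outptr, unsigned b)
{
    const unsigned char *dec = bin2dec[b];
    memcpy(outptr, dec, MAX_CHAR_PER_NUM);
    return outptr + dec[MAX_CHAR_PER_NUM];
}
------------------------------------------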
On 28/05/2024 22:45, Michael S wrote:
On Tue, 28 May 2024 23:23:15 +0300
Michael S <already5chosen@yahoo.com> wrote:
Also, I think that random numbers are close to worst case for
branch predictor / loop length predictor in my inner loop.
Were I thinking about random case upfront, I'd code an inner loop
differently. I'd always copy 4 octets (comma would be stored in the
same table). After that I would update outptr by length taken from
additional table, similarly, but not identically to your method
below.
That's what I had in mind:
unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1]; //
bin2dec[MAX_CHAR_PER_NUM] => length for (int i = 0; i < 256;++i)
{
Is this a comment that has wrapped?
After fixing a few such line breaks, this runs at 3.6 seconds
compared with 4.1 seconds for the original.
Although I don't quite understand the comments about branch
prediction.
I think runtime is still primarily spent in I/O.
If I take the 1.9 second version, and remove the fwrite, then it runs
in 0.8 seconds. 0.7 of that is generating the text (366MB's worth, a
line at a time).
In my language that part takes 0.9 seconds, which is a more typical difference due to gcc's superior optimiser.
On 28/05/2024 21:23, Michael S wrote:
On Tue, 28 May 2024 19:57:38 +0100
OK, I had go with your program. I used a random data file of exactly
100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than aging SSD
on my test machine and ALOT slower than SSD of David Brown.
David Brown's machines are always faster than anyone else's.
On Tue, 28 May 2024 19:57:38 +0100
bart <bc@freeuk.com> wrote:
Both programs have a trailing comma on the last number, which may be
problematical, but also not hard to fix.
I don't see where (in C) it could be a problem. On the other hand, I can imagine situations where the absence of a trailing comma is inconvenient.
Now, if your language borrows its array initialization syntax from Pascal, then trailing commas are indeed undesirable.
I then tried xxd under WSL, and that took 28 seconds, real time, with
a much larger output (616KB instead of 366KB).
616 MB, I suppose.
The timing is very similar to my measurements. It is obvious that in the case of xxd, unlike in the rest of our cases, the bottleneck is in the CPU rather than in the HD.
But it's using fixed
width columns of hex, complete with a '0x' prefix.
Below is that program but in my language. I tried transpiling to C,
hoping it might be even faster, but it got slower (4.5 seconds with
gcc-O3). I don't know why. It would need manual porting to C.
Why do you measure with gcc -O3 instead of the more robust and more popular -O2? Not that it matters in this particular case, but in general I don't think it is a good idea.
On Wed, 29 May 2024 01:29:00 +0100
bart <bc@freeuk.com> wrote:
I think runtime is still primarily spent in I/O.
That's undoubtedly correct.
On 29/05/2024 01:54, bart wrote:
On 28/05/2024 21:23, Michael S wrote:
On Tue, 28 May 2024 19:57:38 +0100
OK, I had go with your program. I used a random data file of
exactly 100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than aging
SSD on my test machine and ALOT slower than SSD of David Brown.
David Brown's machines are always faster than anyone else's.
That seems /highly/ unlikely. Admittedly the machine I tested on is
fairly new - less than a year old. But it's a little NUC-style
machine at around the $1000 price range, with a laptop processor.
The only thing exciting about it is 64 GB ram (I like to run a lot of
things at the same time in different workspaces).
But I am better than some people at getting my machines to run
programs efficiently. I don't use Windows for such things (I happily
run Windows on a different machine for other purposes), and I
certainly don't use layers of OS or filesystem emulation such as WSL
and expect code to run at maximal speed.
And as I said in an earlier post, I didn't have the files on any kind
of disk or SSD at all - they were all in a tmpfs filesystem to
eliminate that bottleneck.
I suspect that your system just has a much faster fgetc
implementation. How long does an fgetc() loop over a 100MB input take
on your machine?
On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
Using DMC, it's 0.65 seconds.
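A plausible shape for such a test (my own sketch of an fgetc() timing loop, not anyone's actual getc_test): read the file one byte at a time and fold the bytes into a checksum so the loop cannot be optimised away.
------------------------------------------
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f = (argc > 1) ? fopen(argv[1], "rb") : NULL;
    if (!f) return 1;

    unsigned long count = 0;
    unsigned xorsum = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {   /* one library call per byte */
        xorsum ^= (unsigned)c;
        ++count;
    }
    fclose(f);
    printf("%lu byte. xor sum %u.\n", count, xorsum);
    return 0;
}
------------------------------------------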
On Wed, 29 May 2024 10:32:29 +0200
David Brown <david.brown@hesbynett.no> wrote:
That seems /highly/ unlikely. Admittedly the machine I tested on is
fairly new - less than a year old. But it's a little NUC-style
machine at around the $1000 price range, with a laptop processor.
The only thing exciting about it is 64 GB ram (I like to run a lot of
things at the same time in different workspaces).
Modern laptop processors with adequate cooling can be as fast as desktop processors (and faster than server processors) for a task that uses only 1 or 2 cores, especially when no heavy vector math is involved. If the task runs for only a few seconds, as in our tests, then the CPU can be fast even without good cooling.
And $1000 is not exactly a low price for a mini-PC without a display. The last time I bought one for my mother, it cost ~$650 including Win11 Home Edition.
But I am better than some people at getting my machines to run
programs efficiently. I don't use Windows for such things (I happily
run Windows on a different machine for other purposes), and I
certainly don't use layers of OS or filesystem emulation such as WSL
and expect code to run at maximal speed.
WSL should not affect the user-level CPU-bound part, or even the majority of the kernel-level CPU-bound parts. It can slow down I/O, yes. But it turned out (see my post above) that the bottleneck was in the CPU.
And as I said in an earlier post, I didn't have the files on any kind
of disk or SSD at all - they were all in a tmpfs filesystem to
eliminate that bottleneck.
You should have said it yesterday.
On Wed, 29 May 2024 00:54:23 +0100
bart <bc@freeuk.com> wrote:
I suspect that your system just has a much faster fgetc
implementation. How long does an fgetc() loop over a 100MB input take
on your machine?
On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
Using DMC, it's 0.65 seconds.
Your suspicion proved incorrect, but it turned out to be a pretty good question!
$ time ../quick_xxd/getc_test.exe uu.txt
193426754 byte. xor sum 1.
real 0m3.604s
user 0m0.000s
sys 0m0.000s
52 MB/s. Very, very slow!
So maybe fgetc() is not at fault? Maybe it's the OS and the crap that corporate IT adds on top of the OS?
Let's test this hypothesis.
$ time ../quick_xxd/fread_test.exe uu.txt
193426754 byte. xor sum 1.
real 0m0.094s
user 0m0.000s
sys 0m0.000s
So, let's rewrite our tiny app with fread().
real 0m0.577s
user 0m0.000s
sys 0m0.000s
152.8 MB/s. That's much better. Some people would even say that it is
good enough.
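The fread() variant of the same test needs only a block loop around a plain inner loop (again a sketch under the same assumptions), which is why the per-byte overhead of the stdio call disappears:
------------------------------------------
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f = (argc > 1) ? fopen(argv[1], "rb") : NULL;
    if (!f) return 1;

    static unsigned char buf[128 * 1024];   /* one fread per 128 KB block */
    unsigned long count = 0;
    unsigned xorsum = 0;
    size_t len;
    while ((len = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < len; ++i)
            xorsum ^= buf[i];
        count += len;
    }
    fclose(f);
    printf("%lu byte. xor sum %u.\n", count, xorsum);
    return 0;
}
------------------------------------------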
I think runtime is still primarily spent in I/O.
Two hours later it turned out to be completely incorrect. That is, the time was spent in routines related to I/O, but in the 'soft' part of it rather than in the I/O itself.
I did. I mentioned it in my post comparing the timings of xxd, your
program, and some extremely simple Python code giving the same
outputs.
but I have multiple times struggled with ifstream and ofstream in terms of performance.
On 29/05/2024 10:38, Michael S wrote:
On Wed, 29 May 2024 00:54:23 +0100
bart <bc@freeuk.com> wrote:
I suspect that your system just has a much faster fgetc
implementation. How long does an fgetc() loop over a 100MB input
take on your machine?
On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
Using DMC, it's 0.65 seconds.
Your suspicion proved incorrect, but it turned out to be pretty good question!
$ time ../quick_xxd/getc_test.exe uu.txt
193426754 byte. xor sum 1.
real 0m3.604s
user 0m0.000s
sys 0m0.000s
52 MB/s. Very very slow!
I got these results for a 100MB input. All are optimised where
possible:
mcc 1.9 seconds
gcc 1.9
tcc 1.95
lccwin32 0.7
DMC 0.7
The first three likely just use fgetc from msvcrt.dll. The other two
probably use their own libraries.
So, may be, fgetc() is not at fault? May be, its OS and the crap
that the corporate IT adds on top of the OS?
Let's test this hipothesys.
$ time ../quick_xxd/fread_test.exe uu.txt
193426754 byte. xor sum 1.
real 0m0.094s
user 0m0.000s
sys 0m0.000s
I get these results:
mcc 0.25 seconds
gcc 0.25
tcc 0.35
lccwin32 0.35
DMC 0.3
All are repeated runs on the same file, so all timings likely used a cached version of the data file.
Most of my tests assume cached data, since (1) I don't know how to do a 'cold' load without restarting my machines; and (2) in real applications such as compilers the same files are repeatedly processed anyway, eg. you're compiling the file you've just edited, or just downloaded, or just copied...
So, let's rewrite our tiny app with fread().
real 0m0.577s
user 0m0.000s
sys 0m0.000s
152.8 MB/s. That's much better. Some people would even say that it
is good enough.
I now get:
mcc 2.3 seconds
gcc 1.6
tcc 2.3
lccwin32 2.9
DMC 2.9
You might remember that the last revised version of your test, compiled with gcc, took 3.6 seconds, of which 2 seconds was reading the file a byte at a time.
By using a 128KB buffer, you get most of the benefits of reading the
whole file at once
(it just lacks the simplicity).
So nearly all of that 2 seconds is saved.
3.6 - 2.0 is 1.6, pretty much the timing here.
Two hours later it turned out to be completely incorrect. That is, the time was spent in routines related to I/O, but in the 'soft' part of it rather than in the I/O itself.
You don't count time spent within file-functions as I/O? To me 'I/O'
is whatever happens the other side of those f* functions, including
whatever poor buffering strategies they could be using.
Because 'fgetc' could also have been implemented using a 128KB buffer
instead of 512 bytes or whatever it uses.
I discovered the poor qualities of fgetc many years ago and generally
avoid it; it seems you've only just realised its problems.
BTW I also tweaked the code in my own-language version of the
benchmark. (I also ported it to C, but that version got accidentally deleted). The fastest timing of this test is now 1.65 seconds.
If I comment out the 'fwrite' call, the timing becomes 0.7 seconds,
of which 50ms is reading in the file, leaving 0.65 seconds.
So the I/O in this case accounts for 1.0 seconds of the 1.65 seconds
runtime, so when I said:
I think runtime is still primarily spent in I/O.
That was actually correct.
If I comment out the 'fwrite' calls in your program, the runtime
reduces to 0.2 seconds, so it is even more correct in that case. Or
is 'fwrite' a 'soft' I/O call too?
On Wed, 29 May 2024 14:10:14 +0200
David Brown <david.brown@hesbynett.no> wrote:
I did. I mentioned it in my post comparing the timings of xxd, your
program, and some extremely simple Python code giving the same
outputs.
Then neither Bart nor I paid attention to that part of your post yesterday. Sorry.
On Wed, 29 May 2024 12:23:51 +0100
bart <bc@freeuk.com> wrote:
So, let's rewrite our tiny app with fread().
real 0m0.577s
user 0m0.000s
sys 0m0.000s
152.8 MB/s. That's much better. Some people would even say that it
is good enough.
I now get:
mcc 2.3 seconds
gcc 1.6
tcc 2.3
lccwin32 2.9
DMC 2.9
Mine was with MSVC from VS2019. gcc on msys2 (ucrt64 variant) should be identical.
I wonder why your results are so much slower than mine.
Slow write speed of SSD or slow CPU?
You might remember that the last revised version of your test,
compiled with gcc, took 3.6 seconds, of which 2 seconds was reading
the file a byte at a time took 2 seconds.
By using a 128KB buffer, you get most of the benefits of reading the
whole file at once
I hope so.
(it just lacks the simplicity).
The simplicity in your case is due to the complexity of figuring out the size of the file, of the memory allocation, and of handling potential failure of the memory allocation all being hidden within the run-time library of your language.
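For comparison, the C equivalent of such a hidden "read the whole file" helper might look something like this (a sketch only; read_whole_file is my own stand-in name, not the actual library routine):
------------------------------------------
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a malloc'd buffer; returns NULL on any failure
   and reports the size through *size_out. */
static unsigned char *read_whole_file(const char *name, long *size_out)
{
    FILE *f = fopen(name, "rb");
    if (!f) return NULL;

    unsigned char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0) {
            rewind(f);
            buf = malloc(size > 0 ? (size_t)size : 1);
            if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
                free(buf);
                buf = NULL;
            }
            if (buf) *size_out = size;
        }
    }
    fclose(f);
    return buf;
}
------------------------------------------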
On 29/05/2024 13:23, Michael S wrote:
Mine was with MSVC from VS2019. gcc on msys2 (ucrt64 variant)
should be identical.
I wonder why your results are so much slower than mine.
Slow write speed of SSD or slow CPU?
You'd need to isolate I/O from the data processing to determine that.
However, the fastest timing on my machine is 1.4 seconds to read
100MB and write 360MB.
Your timing is 0.6 seconds to read 88MB and write, what, 300MB of
text?
The difference is about 2:1, which is not that unusual given two
different processors, two kinds of storage device, two kinds of OS
(?) and two different compilers.
But remember that a day or two ago, your original program took over 4 seconds, and it now takes 1.6 seconds (some timings are 1.4 seconds,
but I think that's the C port of my code).
(BTW I guess that superimposing your own faster buffer is not considered cheating any more!)
The simplicity in your case is due to the complexity of figuring out the size of the file, of the memory allocation, and of handling potential failure of the memory allocation all being hidden within the run-time library of your language.
Yes, that moves those details out of the way to keep the main body of
the code clean.
Your C code looks chaotic (sorry), and I had quite a few problems in understanding and trying to modify or refactor parts of it.
Below is the main body of my C code. Below that is the main body of
your latest program, not including the special handling for the last
line, that mine doesn't need.
------------------------------------------
while (n) {
    m = n;
    if (m > perline) m = perline;
    n -= m;
    p = str;
    for (int i = 0; i < m; ++i) {
        bb = *data++;
        s = numtable[bb];
        slen = numlengths[bb];
        *p++ = *s;
        if (slen > 1)
            *p++ = *(s+1);
        if (slen > 2)
            *p++ = *(s+2);
        *p++ = ',';
    }
    *p++ = '\n';
    fwrite(str, 1, p-str, f);
}
------------------------------------------
for (;;) {
    enum { BUF_SZ = 128*1024 };
    unsigned char inpbuf[BUF_SZ];
    size_t len = fread(inpbuf, 1, BUF_SZ, fpin);
    for (int i = 0; i < (int)len; ++i) {
        unsigned char* dec = bin2dec[inpbuf[i] & 255];
        memcpy(outptr, dec, MAX_CHAR_PER_NUM);
        outptr += dec[MAX_CHAR_PER_NUM];
        if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
            *outptr++ = '\n';
            ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
            if (wrlen != outptr-outbuf) {
                err = 2;
                break;
            }
            outptr = outbuf;
        }
    }
    if (err || len != BUF_SZ)
        break;
}
------------------------------------------
On 29/05/2024 13:10, David Brown wrote:
It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD) don't come free. (And I buy these bare-bones. Machines with Windows "pre-installed" are often cheaper because they are sponsored by the junk-ware and ad-ware forced on unsuspecting users.)
Yes, I got a job at Cambridge which didn't work out (Cantab dons, much less tolerant people than their counterparts at another university, but that's another story). And I was given a brand new Windows machine, and told that we had to use Linux. So I installed a Linux version which ran on top of Windows. No good, I was told. Might cause problems with that "interesting" set up. So I had to scrub a brand new version of Windows.
It felt like the most extravagant waste.
On Wed, 29 May 2024 15:16:06 +0100
bart <bc@freeuk.com> wrote:
Your timing is 0.6 seconds to read 88MB and write, what, 300MB of
text?
Much less. Only 193 MB. It seems this DLL I was textualizing is stuffed with small numbers. That explains a big part of the difference.
I did another test with big 7z archive as an input:
Input size: 116255887
Output size: 425944020
$ time ../quick_xxd/bin_to_listmb /d/bin/tmp.7z uu.txt
real 0m1.170s
user 0m0.000s
sys 0m0.000s
Almost exactly 100 MB/s which is only 1.4-1.6 times faster than your measurements.
Each to his own.
For me your code is unreadable, mostly due to very short variable names that give no hint of their usage, the absence of declarations (I'd guess you have them at the top of the function; for me that's no better than not having them at all) and zero comments.
Besides, our snippets are not functionally identical. Yours doesn't handle write failures. Practically, on a "big" computer that's a reasonable choice, because real I/O problems are unlikely to be detected at fwrite; they tend to manifest themselves much later. But on comp.lang.c we like to pretend that life is simpler and more black&white than it is in reality.
On 29/05/2024 18:27, Malcolm McLean wrote:
Yes, I got a job at Cambridge which didn't work out (Cantab dons, much less tolerant people than their counterparts at another university, but that's another story). And I was given a brand new Windows machine, and told that we had to use Linux. So I installed a Linux version which ran on top of Windows. No good, I was told. Might cause problems with that "interesting" set up. ...
They're quite right in that regard, as I can testify from personal experience.
... So I had to scrub a brand new version of Windows.
It felt like the most extravagant waste.
Keep in mind that, as David pointed out, the "waste" was probably
negative. You got a better price on the machine than you would have otherwise, and erasing that malware gave you more space to put useful
stuff on your machine.
Below is a version with no declarations at all. It is in a dynamic
scripting language.
It runs in 7.3 seconds (or 6.4 seconds if newlines are dispensed with).
On 28/05/2024 13:41, Michael S wrote:
Let's start another round of private parts' measurements turnament!
'xxd -i' vs DIY
I used 100 MB of random data:
dd if=/dev/urandom bs=1M count=100 of=100MB
I compiled your code with "gcc-11 -O2 -march=native".
I ran everything in a tmpfs filesystem, completely in ram.
xxd took 5.4 seconds - that's the baseline.
Your simple C code took 4.35 seconds. Your second program took 0.9
seconds - a big improvement.
One line of Python code took 8 seconds :
print(", ".join([hex(b) for b in open("100MB", "rb").read()]))
A slightly nicer Python program took 14.3 seconds :
import sys
bs = open(sys.argv[1], "rb").read()
xs = "".join([" 0x%02x," % b for b in bs])
ln = len(xs)
print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))
(I have had reason to include a 0.5 MB file in a statically linked
single binary - I'm not sure when you'd need very fast handling of multi-megabyte embeds.)
On 29/05/2024 20:59, Michael S wrote:
May be, for laptps that is true. But for mini-PCs it is very different.
Windows is surprisingly expensive in this case. OEM license is sold for
~75% of retail license price.
Exactly. Windows costs a fortune. And Microsoft spend billions developing it.
Baby X can't compete.
On 29/05/2024 22:46, Malcolm McLean wrote:
Exactly. Windows costs a fortune.
Actually I've no idea how much it costs.
But whatever it is, I'm not averse to the idea of having to pay for software. After all you have to pay for hardware, and for computers, I would happily pay extra to have something that works out of the box.
And Microsoft spend billions developing it.
Baby X can't compete.
Huh? I didn't know Baby X was an OS!
On Tue, 28 May 2024 23:08:22 +0100
bart <bc@freeuk.com> wrote:
On 28/05/2024 21:23, Michael S wrote:
On Tue, 28 May 2024 19:57:38 +0100
bart <bc@freeuk.com> wrote:
OK, I had go with your program. I used a random data file of
exactly 100M bytes.
Runtimes varied from 4.1 to 5 seconds depending on compiler. The
fastest time was with gcc -O3.
It sounds like your mass storage device is much slower than aging
SSD on my test machine and ALOT slower than SSD of David Brown.
My machine uses an SSD.
SSDs are not created equal. Especially for writes.
However the tests were run on Windows, so I ran your program again
under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).
3 times slower ?!
I never tested it myself, but I heard that there is a significant
difference in file access speed between WSL's own file system and
mounted Windows directories. The difference under WSL is not as big
as under WSL2 where they say that access of mounted Windows filesystem
is very slow, but still significant.
I don't know if it applies to all file sizes or only to accessing many
small files.
On Wed, 29 May 2024 14:07:00 -0400
James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
Keep in mind that, as David pointed out, the "waste" was probably
negative. You got a better price on the machine than you would have
otherwise, and erasing that malware gave you more space to put useful
stuff on your machine.
Maybe for laptops that is true. But for mini-PCs it is very different. Windows is surprisingly expensive in this case: the OEM license is sold for ~75% of the retail license price.
On 5/28/2024 6:24 PM, Michael S wrote:
I never tested it myself, but I heard that there is a significant difference in file access speed between WSL's own file system and
mounted Windows directories. The difference under WSL is not as big
as under WSL2 where they say that access of mounted Windows
filesystem is very slow, but still significant.
I don't know if it applies to all file sizes or only to accessing
many small files.
WSL uses containers, <snip>
And no, companies like Intel or ASUS don't pay anything close to 75%
of the retail price for the Windows license they install.
On Thu, 30 May 2024 10:01:42 +0200
David Brown <david.brown@hesbynett.no> wrote:
And no, companies like Intel or ASUS don't pay anything close to 75%
of the retail price for the Windows license they install.
I don't know how much Intel or ASUS pays. I don't care about it.
What I do know and care about is that for me, as a buyer, an Intel or ASUS mini-PC (I actually like the Gigabyte Brix better, but recently they have become too expensive) with Win11 Home will cost $140 more than exactly the same box without Windows. That's if I buy it in a big or medium-sized store.
In a little one-or-two-man shop I can get a legal Windows license on a similar box for maybe $50. But I don't know if it will still be around 11 months later if something breaks.
Note that even in a little shop, a mini-PC with Windows on it will cost me more than the same box without an OS. I didn't try it, but I would guess that [in a little shop] a box with Linux preinstalled would cost me ~$25 above a box without an OS, i.e. still cheaper than with Windows.
On 30/05/2024 09:33, Michael S wrote:
On Thu, 30 May 2024 10:01:42 +0200
David Brown <david.brown@hesbynett.no> wrote:
And no, companies like Intel or ASUS don't pay anything close to 75%
of the retail price for the Windows license they install.
I don't know how much Intel or ASUS pays. I don't care about it. What I do know and care about is that for me, as a buyer, a mini-PC with Win11 Home will cost $140 more than exactly the same box without Windows.
40 years ago, my company made 8-bit business computers (my job was
designing the boards that went into them).
Adjusted for inflation, a floppy-based machine cost £4000, and one with
a 10MB HDD cost £9400.
They came with our own clone of CP/M, to avoid paying licence fees for it.
Compared to that, the cost of hardware now, with a spec 4-6 orders of magnitude higher, is peanuts, even with a premium for a pre-installed OS.
But suppose a high-spec machine now cost £1000; for someone using it
daily in their job, who might be paid a salary of £50-£100K or more, it
is again peanuts by comparison. Just their car to drive to work could
cost 20 times as much.
One tankful of fuel might cost the same as one Windows licence!
I'm astonished that professionals here are quibbling over the minor
extra margins needed to cover the cost of an important piece of software.
I guess the demand for a machine+Windows is high enough to get lower
volume pricing, while machine-only or machine+Linux is more niche?
On 30/05/2024 01:18, bart wrote:
On 29/05/2024 22:46, Malcolm McLean wrote:
Baby X can't compete.
Huh? I didn't know Baby X was an OS!
It's an API. You call the Baby X API to get buttons and menus and other graphical elements, instead of Windows APIs. And it has just got its own file system.
On 28/05/2024 16:34, David Brown wrote:
On 28/05/2024 13:41, Michael S wrote:
Let's start another round of private parts' measurements turnament!
'xxd -i' vs DIY
I used 100 MB of random data:
dd if=/dev/urandom bs=1M count=100 of=100MB
I compiled your code with "gcc-11 -O2 -march=native".
I ran everything in a tmpfs filesystem, completely in ram.
xxd took 5.4 seconds - that's the baseline.
Your simple C code took 4.35 seconds. Your second program took 0.9
seconds - a big improvement.
One line of Python code took 8 seconds :
print(", ".join([hex(b) for b in open("100MB", "rb").read()]))
That one took 90 seconds on my machine (CPython 3.11).
A slightly nicer Python program took 14.3 seconds :
import sys
bs = open(sys.argv[1], "rb").read()
xs = "".join([" 0x%02x," % b for b in bs])
ln = len(xs)
print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))
This one was 104 seconds (128 seconds with PyPy).
This can't be blamed on the slowness of my storage devices, or on moans about Windows, because I know that amount of data (the output is 65% bigger because of using hex format) could be processed in a couple of seconds using a fast native code program.
It's just Python being Python.
(I have had reason to include a 0.5 MB file in a statically linked
single binary - I'm not sure when you'd need very fast handling of
multi-megabyte embeds.)
I have played with generating custom executable formats (they can be
portable between OSes, and I believe less visible to AV software), but
they require a normal small executable to launch them and fix them up.
To give the illusion of a conventional single executable, the program
needs to be part of that stub file.
There are a few ways of doing it, like simply concatenating the files,
but extracting is slightly awkward. Embedding as data is one way.
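One common shape of the "concatenate and extract" approach (a sketch under my own assumptions about the layout, not bart's actual scheme): append the payload to the stub executable followed by a small footer recording its length, and have the stub open its own file and seek back from the end.
------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Footer written after the appended payload: length plus a magic tag. */
struct footer { unsigned long long payload_len; char magic[8]; };

/* Called by the stub with the path to its own executable.
   Returns a malloc'd copy of the payload, or NULL if none is found. */
static unsigned char *extract_payload(const char *self_path, size_t *len_out)
{
    FILE *f = fopen(self_path, "rb");
    if (!f) return NULL;

    struct footer ft;
    unsigned char *buf = NULL;
    if (fseek(f, -(long)sizeof ft, SEEK_END) == 0 &&
        fread(&ft, sizeof ft, 1, f) == 1 &&
        memcmp(ft.magic, "PAYLOAD1", 8) == 0 &&
        fseek(f, -(long)(sizeof ft + ft.payload_len), SEEK_END) == 0) {
        buf = malloc((size_t)ft.payload_len);
        if (buf && fread(buf, 1, (size_t)ft.payload_len, f) == ft.payload_len) {
            *len_out = (size_t)ft.payload_len;
        } else {
            free(buf);
            buf = NULL;
        }
    }
    fclose(f);
    return buf;
}
------------------------------------------
The awkward part is presumably that the stub has to locate and open its own executable reliably, which is OS-specific; embedding the payload as data sidesteps that.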
On 30/05/2024 02:18, bart wrote:
On 29/05/2024 22:46, Malcolm McLean wrote:
Exactly. Windows costs a fortune.
Actually I've no idea how much it costs.
The retail version is too much for a cheap machine, but a minor part
of the cost of a more serious computer. The server versions and
things like MSSQL server are ridiculous prices - for many setups,
they cost more than the hardware, and that's before you consider the
client access licenses.
But whatever it is, I'm not adverse to the idea of having to pay
for software. After all you have to pay for hardware, and for
computers, I would happily pay extra to have something that works
out of the box.
I have nothing against paying for software either. I mainly use
Linux because it is better, not because it is free - that's just an
added convenience. I have bought a number of Windows retail licenses
over the decades, to use with machines I put together myself rather
than OEM installations.
I'm not so sure about "works out of the box", however. On most
systems with so-called "pre-installed" Windows, it takes hours for
the installation to complete, and you need to answer questions or
click things along the way so you can't just leave it to itself. And
if the manufacturer has taken sponsorship from ad-ware and crap-ware
vendors, it takes more hours to install, and then you have hours of
work to uninstall the junk.
On Thu, 30 May 2024 13:31:18 +0200
David Brown <david.brown@hesbynett.no> wrote:
On 30/05/2024 02:18, bart wrote:
On 29/05/2024 22:46, Malcolm McLean wrote:
Exactly. Windows costs a fortune.
Actually I've no idea how much it costs.
The retail version is too much for a cheap machine, but a minor part
of the cost of a more serious computer. The server versions and
things like MSSQL server are ridiculous prices - for many setups,
they cost more than the hardware, and that's before you consider the
client access licenses.
It depends.
If you need Windows Server just to run your own applications or certain 3rd-party applications, without it being a file server and without it being a terminal server (i.e. at most 2 interactive users logged on simultaneously), then you can get away with Windows Server Essentials. It costs less than typical low-end server hardware.
MS-SQL also has many editions with very different pricing. I think nowadays even Oracle has editions that are not ridiculously expensive. Not sure about IBM DB2.
But whatever it is, I'm not adverse to the idea of having to pay
for software. After all you have to pay for hardware, and for
computers, I would happily pay extra to have something that works
out of the box.
I have nothing against paying for software either. I mainly use
Linux because it is better, not because it is free - that's just an
added convenience. I have bought a number of Windows retail licenses
over the decades, to use with machines I put together myself rather
than OEM installations.
I'm not so sure about "works out of the box", however. On most
systems with so-called "pre-installed" Windows, it takes hours for
the installation to complete, and you need to answer questions or
click things along the way so you can't just leave it to itself. And
if the manufacturer has taken sponsorship from ad-ware and crap-ware
vendors, it takes more hours to install, and then you have hours of
work to uninstall the junk.
I don't remember anything like that in the case of the cheap mini-PC from my previous post. It took a little longer than for the previous mini-PC with Win10 that it replaced, and longer than the desktop with Win7, but we are still talking about 10-15 minutes, not hours.
Maybe a quick Internet connection helps (but I heard that in Norway it is quicker). Or maybe the people that sold me the box did some preliminary work. Or maybe your case of installation was very unusual.
On the other hand, I routinely see IT personnel at work spending several hours installing non-OEM Windows, especially on laptops and servers. On desktops it tends to be less bad.
On 30/05/2024 12:31, David Brown wrote:
On 30/05/2024 02:18, bart wrote:
IME installing Linux is faster and simpler than installing Windows on almost any hardware. The only drivers that have been an issue for me for decades are those for very new wireless interfaces.
So I wanted to add audio to Baby X. And I stole an MP3 decoder from Fabrice Bellard of tcc fame, and it took an afternoon to get audio up and running under Baby X on Windows. Then I did the same for Linux. And it was a complete nightmare, and it still isn't fit to push.
On Thu, 30 May 2024 00:40:07 -0400
Paul <nospam@needed.invalid> wrote:
WSL uses containers, <snip>
It seems you are discussing the speed and methods of access from the host side. My question is the opposite: does access from the Linux guest to Windows host files run at the same speed as access from the Linux (WSL, not WSL2) guest to its own file system?
I heard that it doesn't, but the reports were inconclusive and short on details. I am going to test our specific case of big files. Now.
On 29/05/2024 23:08, bart wrote:
On 28/05/2024 16:34, David Brown wrote:
On 28/05/2024 13:41, Michael S wrote:
Let's start another round of private parts' measurements turnament!
'xxd -i' vs DIY
I used 100 MB of random data:
dd if=/dev/urandom bs=1M count=100 of=100MB
I compiled your code with "gcc-11 -O2 -march=native".
I ran everything in a tmpfs filesystem, completely in ram.
xxd took 5.4 seconds - that's the baseline.
Your simple C code took 4.35 seconds. Your second program took 0.9 seconds - a big improvement.
One line of Python code took 8 seconds :
print(", ".join([hex(b) for b in open("100MB", "rb").read()]))
That one took 90 seconds on my machine (CPython 3.11).
A slightly nicer Python program took 14.3 seconds :
import sys
bs = open(sys.argv[1], "rb").read()
xs = "".join([" 0x%02x," % b for b in bs])
ln = len(xs)
print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))
This one was 104 seconds (128 seconds with PyPy).
This can't be blamed on the slowness of my storage devices, or moans about Windows, because I know that amount of data (the output is 65% bigger because of using hex format) could be processed in a couple of seconds using a fast native code program.
It's just Python being Python.
I have two systems at work with close to identical hardware, both about 10 years old. The Windows one has a slightly faster disk, the Linux one has more memory, but the processor is the same. The Windows system is Win7 and as old as the machine, while the Linux system was installed about 6 years ago. Both machines have a number of other programs open (the Linux machine has vastly more), but none of these are particularly demanding when not in direct use.
On the Linux machine, that program took 25 seconds (with Python 3.7). On the Windows machine, it took 48 seconds (with Python 3.8). In both cases, the source binary file was recently written and therefore should be in cache, and both the source and destination were on the disk (SSD for Windows, HD for Linux).
Python throws all this kind of stuff over to the C code - it is pretty good at optimising such list comprehensions. (But they are obviously still slower than carefully written native C code.) If it were running through these loops with the interpreter, it would be orders of magnitude slower.
So what I see from this is that my new Linux PC took 14 seconds while my old Linux PC took 25 seconds - it makes sense that the new processor is something like 80% faster than the old one for a single-threaded calculation. And Windows (noting that this is Windows 7, not a recent version of Windows) doubles that time for some reason.
I have played with generating custom executable formats (they can be portable between OSes, and I believe less visible to AV software), but they require a normal small executable to launch them and fix them up.
To give the illusion of a conventional single executable, the program needs to be part of that stub file.
There are a few ways of doing it, like simply concatenating the files, but extracting is slightly awkward. Embedding as data is one way.
Sure.
The typical use I have is for embedded systems where there is a network with a master card and a collection of slave devices (or perhaps multiple microcontrollers on the same board). A software update will typically involve updating the master board and have that pass on updates to the other devices. So the firmware for the other devices will be built into the executable for the master board.
Another use-case is small web servers built into a program, often for installation, monitoring or fault-finding. There are fixed files such as index.html, perhaps a logo, and maybe jquery or another javascript library file.
(I have had reason to include a 0.5 MB file in a statically linked single binary - I'm not sure when you'd need very fast handling of multi-megabyte embeds.)
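For reference, with a compiler that implements C23's #embed the same embedding can be written directly in the source. The file name and identifiers below are invented for illustration:
/* Hypothetical sketch of C23 #embed for the firmware/web-asset case.
 * "slave_firmware.bin" and the identifiers are made up. */
#include <stddef.h>
static const unsigned char slave_fw[] = {
#embed "slave_firmware.bin"
};
static const size_t slave_fw_len = sizeof slave_fw;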
On 5/30/2024 3:40 AM, Michael S wrote:
On Thu, 30 May 2024 00:40:07 -0400
Notice I dropped my caches (which on modern Linux is mostly a waste
of time, as there still seem to be caches in there -- benching and
eliminating caches is tough to do now).
WSL Ubuntu 20.04, version 2
On 5/30/2024 9:05 AM, David Brown wrote:
On 29/05/2024 23:08, bart wrote:
On 28/05/2024 16:34, David Brown wrote:
On 28/05/2024 13:41, Michael S wrote:
So what I see from this is that my new Linux PC took 14 seconds
while my old Linux PC took 25 seconds - it makes sense that the new
processor is something like 80% faster than the old one for a
single-threaded calculation. And Windows (noting that this is
Windows 7, not a recent version of Windows) doubles that time for
some reason.
Did you turn off Windows Defender while benching ?
[Picture]
https://i.postimg.cc/QCgLJLHQ/windows11-AV-off-control.gif
Benching on Windows is an art, because of all the crap going
on under the hood.
I've had programs slowed to between 1/8th and 1/20th of normal
speed by forgetting to turn off a series of things. Once all
that is done, you're getting into the same ballpark as Linux.
I also have to turn off the crap salad in Windows, when Windows Update
is running!!!
The OS is too stupid to optimize conditions for its
own activity. My laptop, for example, ran out of RAM because "SearchApp"
was eating a three-course meal while I was working. Attempting to kill
that mother caused the incoming Update to install at closer to normal
speed.
It takes practice to get good at benching modern Windows. On
an OS like Windows 2000, it was always ready to bench. It came
with no AV. It had no secret agenda. It just worked. Each succeeding
version is more of a nightmare.
Imagine when the local AI is running on the machine, and the power consumption is 200W while it "listens to your voice". At least they're staying true to their design principles.
Imagine turning off (or never enabling) the services that you don't
find useful and can be a significant drain. I always disable Windows updates, indexing services, and would never have a "voice AI" on a
computer. Linux does not have anything like as much of this kind of
nonsense on normal desktops (though I believe Ubuntu had some nasty
automatic search systems for a while). The only one I can think of
is "updatedb" for the "locate" command. While "locate" can sometimes
be useful, trawling the filesystem can be very time-consuming if it
is large. But it's easy to tune updatedb to cover only the bits you
need.
On Fri, 31 May 2024 09:55:49 +0200
David Brown <david.brown@hesbynett.no> wrote:
Imagine turning off (or never enabling) the services that you don't
find useful and can be a significant drain. I always disable Windows
updates, indexing services, and would never have a "voice AI" on a
computer. Linux does not have anything like as much of this kind of
nonsense on normal desktops (though I believe Ubuntu had some nasty
automatic search systems for a while). The only one I can think of
is "updatedb" for the "locate" command. While "locate" can sometimes
be useful, trawling the filesystem can be very time-consuming if it
is large. But it's easy to tune updatedb to cover only the bits you
need.
Most of the things that you mentioned above are not easy to achieve on
Home Editions of Windows beyond 7.
Some of them are not easy to achieve even on Pro edition.
That's a major reason for me to remain on 7 for as long as I can.
On Thu, 30 May 2024 14:04:39 -0400
Paul <nospam@needed.invalid> wrote:
WSL Ubuntu20.04 version 2
Are you sure that you tested WSL, not WSL2?
Your results look very much like WSL2.
Your explanations sound very much as if you are talking about WSL2.
My WSL testing results are the opposite of yours - read speed identical,
write speed consistently faster when writing to /mnt/d/... than when
writing to WSL's native FS.
Part of the reason could be that SSD D: is physically faster than SSD
C:, which hosts WSL. I should have tested with /mnt/c as well, but
forgot to do it.
On Fri, 31 May 2024 09:55:49 +0200, David Brown wrote:
The only one I can think of is
"updatedb" for the "locate" command. While "locate" can sometimes be
useful, trawling the filesystem can be very time-consuming if it is
large. But it's easy to tune updatedb to cover only the bits you need.
On Linux, there is the concept of “ionice”, which is to I/O what “nice” is
to CPU usage. So, for example, if updatedb had its I/O priority dropped to
“idle”, it would be pushed to the back of the queue whenever regular apps
need to do any I/O. The result is much less system impact from such
background update tasks.
Maybe there’s an option to set this somewhere?
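For what it's worth, the shell-level form is "ionice -c3 <command>" (class 3 = idle); underneath it is the ioprio_set() syscall, as in this rough sketch (an illustration only, with the kernel ABI constants defined locally rather than taken from a header):
/* Rough sketch: drop the calling process's I/O priority to "idle" via the
 * Linux ioprio_set() syscall - the same thing "ionice -c3" does. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#define IOPRIO_CLASS_IDLE   3
#define IOPRIO_WHO_PROCESS  1
#define IOPRIO_CLASS_SHIFT  13
#define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))
int main(void)
{
    /* who = 0 means "the calling process" */
    if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)) != 0) {
        perror("ioprio_set");
        return 1;
    }
    puts("I/O priority set to idle; heavy disk work now yields to other I/O.");
    /* ... from here on, do the updatedb-style filesystem trawl ... */
    return 0;
}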
In fact, I think /all/ the pre-built ones have Windows.
Windows costs a fortune. And Microsoft spend billions
developing it.
WSL uses containers, so of course it is slow.
On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:
WSL uses containers, so of course it is slow.
WSL1 had a Linux “personality” on top of the NT kernel. So this was emulation, not containers.
WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.
Linux has containers, which are based entirely on namespace isolation (and cgroups for process management). These are all standard kernel mechanisms,
so there should be very little overhead in using them.
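To make "standard kernel mechanisms" concrete, here is a minimal sketch (an illustration, nothing to do with WSL itself) that puts the calling process into a new UTS namespace with unshare() - the kind of primitive containers are built from. It needs root or CAP_SYS_ADMIN:
/* Illustration: isolate just the hostname in a new UTS namespace.
 * No VM, no emulation layer; the change is invisible to the host. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void)
{
    if (unshare(CLONE_NEWUTS) != 0) {        /* new UTS (hostname) namespace */
        perror("unshare");
        return 1;
    }
    const char *name = "sandboxed-host";
    if (sethostname(name, strlen(name)) != 0) {
        perror("sethostname");
        return 1;
    }
    char buf[64];
    gethostname(buf, sizeof buf);
    printf("hostname inside the namespace: %s\n", buf);  /* host is unaffected */
    return 0;
}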
On 29.05.2024 at 14:10, David Brown wrote:
I've seen odd things with timings due to Windows' relatively poor IO,
file and disk handling. Many years ago when I had need of
speed-testing some large windows-based build system, I found it was
faster running in a virtual windows machine on VirtualBox on a Linux
host, than in native Windows on the same hardware.
Windows kernel I/O is rather efficient ...
Conclusion: beating xxd is apparently not hard if even a scripting
language can do so. I wonder what slows it down?
Standard PC's and laptops are very low-margin products.
If WSL were to work by containers, it would need to run the Linux
processes as processes under the NT kernel. I suppose that might be possible, with a translation layer for all system API calls. After all,
you can run Windows processes on Linux with Wine - perhaps a similar principle can work for Windows?
On Thu, 30 May 2024 10:01:42 +0200, David Brown wrote:
In fact, I think /all/ the pre-built ones have Windows.
I have set up two MSI Cubi 5 machines for friends, both with Linux
Mint. Neither came with Windows, either in the box or preinstalled.
For new systems that don't have the
legacy requirements, customers will wonder why they should buy one of
these when something like PostgreSQL is free, has most of the features (including its own unique ones), and will happily scale to the huge
majority of database needs.
Hardly anybody uses the WinAPI directly.
On Wed, 29 May 2024 21:31:54 +0100, bart wrote:
Conclusion: beating xxd is apparently not hard if even a scripting
language can do so. I wonder what slows it down?
It’s written in a very stdio-dependent vanilla C style.
Have a look at the source for yourself. It’s part of the “vim”
package on Debian and no doubt other distros. The xxd binary itself
is built from a single source file of just some 1200 lines, and the hex-to-binary conversion is done in a function called “huntype” of
just 133 lines.
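For comparison, the usual trick for beating xxd -i is to precompute the text for every byte value and push the output through one large buffer written with fwrite(), instead of formatting each byte through stdio. A minimal sketch (an illustration only, not Michael S's bin_to_list and not derived from the xxd source):
/* Illustration: table-driven binary-to-"0xNN," converter writing to stdout. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return EXIT_FAILURE;
    }
    FILE *in = fopen(argv[1], "rb");
    if (!in) {
        perror("fopen");
        return EXIT_FAILURE;
    }
    char lut[256][6];                        /* "0xff," plus terminator */
    for (int i = 0; i < 256; i++)
        sprintf(lut[i], "0x%02x,", i);
    static unsigned char inbuf[1 << 16];
    static char outbuf[1 << 20];
    size_t n, used = 0, col = 0;
    while ((n = fread(inbuf, 1, sizeof inbuf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            memcpy(outbuf + used, lut[inbuf[i]], 5);
            used += 5;
            outbuf[used++] = (++col == 12) ? '\n' : ' ';
            if (col == 12)
                col = 0;
            if (used > sizeof outbuf - 8) {  /* keep room for one more entry */
                fwrite(outbuf, 1, used, stdout);
                used = 0;
            }
        }
    }
    if (used)
        fwrite(outbuf, 1, used, stdout);
    fclose(in);
    return 0;
}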
On Mon, 3 Jun 2024 03:15:27 -0000 (UTC)
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:
WSL uses containers, so of course it is slow.
WSL1 had a Linux “personality” on top of the NT kernel. So this was
emulation, not containers.
WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.
Linux has containers, which are based entirely on namespace isolation
(and cgroups for process management). These are all standard kernel
mechanisms, so there should be very little overhead in using them.
The word "container" has many meanings.
As far as the host FS is concerned, the guest FS is one huge file. Despite
the very different tech under the hood, this applies equally to WSL and
to WSL2. Calling this file a 'container' sounds like proper use of the
term.
There's a good reason for something like #embed: I just wrote a very quick
xxd alternative and generated a C file with a char array, and the size
of this file is 1.2 GB. MSVC quitted compilation after allocating 50GB
of memory, gcc and clang compiled for minutes. This would be better
with an #embed directive. So there's really a good reason for it.
This is my xxd substitute. Compared to xxd it can only dump C files.
On my PC it's about 15 times faster than xxd because it does its own I/O buffering.
#include <iostream>
#include <fstream>
#include <charconv>
#include <span>
#include <vector>
#include <memory>   // std::to_address
#include <cstdlib>  // EXIT_FAILURE
using namespace std;
int main( int argc, char **argv )
{
    if( argc < 4 )
        return EXIT_FAILURE;
    char const
        *inFile = argv[1],
        *symbol = argv[2],
        *outFile = argv[3];
    ifstream ifs;
    ifs.exceptions( ifstream::failbit | ifstream::badbit );
    ifs.open( inFile, ifstream::binary | ifstream::ate );
    streampos size( ifs.tellg() );
    if( size > (size_t)-1 )
        return EXIT_FAILURE;
    ifs.seekg( ifstream::beg );
    // union with an empty constructor: the vector's elements stay uninitialised
    union ndi { char c; ndi() {} };
    vector<ndi> rawBytes( size );
    span<char> bytes( &rawBytes.data()->c, rawBytes.size() );
    ifs.read( bytes.data(), bytes.size() );
    ofstream ofs;
    ofs.exceptions( ofstream::failbit | ofstream::badbit );
    ofs.open( outFile );
    // 1 MB output buffer, flushed manually instead of per-byte stream output
    vector<ndi> rawBuf( 0x100000 );
    span<char> buf( &rawBuf.begin()->c, rawBuf.size() );
    ofs << "unsigned char " << symbol << "[" << (size_t)size << "] = \n{\n";
    auto rd = bytes.begin();
    auto wrt = buf.begin();
    auto flush = [&]
    {
        ofs.write( buf.data(), wrt - buf.begin() );
        wrt = buf.begin();
    };
    while( rd != bytes.end() )
    {
        size_t remaining = bytes.end() - rd;
        // worst case for one row: tab + 12 * "0xNN, " minus the last space, plus '\n'
        constexpr size_t N_LINE = 1 + 12 * 6 - 1 + 1;
        size_t n = remaining > 12 ? 12 : remaining;
        auto rowEnd = rd + n;
        *wrt++ = '\t';
        do
        {
            *wrt++ = '0';
            *wrt++ = 'x';
            char *wb = to_address( wrt );
            auto tcr = to_chars( wb, wb + 2, (unsigned char)*rd++, 16 );
            if( tcr.ptr == wb + 1 )
            {
                // single hex digit: shift it right and pad with a leading zero
                wb[1] = wb[0];
                wb[0] = '0';
            }
            wrt += 2;
            if( rd != bytes.end() )
            {
                *wrt++ = ',';
                if( rd != rowEnd )
                    *wrt++ = ' ';
            }
        } while( rd != rowEnd );
        *wrt++ = '\n';
        if( buf.end() - wrt < N_LINE )
            flush();
    }
    flush();
    ofs << "};\n" << endl;
}
On 04.06.2024 at 04:07, Lawrence D'Oliveiro wrote:
On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:
MSVC quitted compilation after allocating 50GB of
memory, gcc and clang compiled for minutes.
Next time, don’t even bother with MSVC.
MSVC has the most conforming C++20-frontend.
On Tue, 4 Jun 2024 04:46:59 +0200, Bonita Montero wrote:
On 04.06.2024 at 04:07, Lawrence D'Oliveiro wrote:
On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:
MSVC quitted compilation after allocating 50GB of
memory, gcc and clang compiled for minutes.
Next time, don’t even bother with MSVC.
MSVC has the most conforming C++20-frontend.
Somehow I find that hard to believe <https://gcc.gnu.org/gcc-14/changes.html#cxx>.