• xxd -i vs DIY (Was: C23 thoughts and opinions)

    From Michael S@21:1/5 to David Brown on Tue May 28 14:41:18 2024
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    No, it does /not/. That's the /whole/ point of #embed, and the main motivation for its existence. People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).
    I've done so myself on many projects, all integrated together in
    makefiles.


    Let's start another round of private parts' measurements tournament!
    'xxd -i' vs DIY

    /c/altera/13.0sp1/quartus/bin64/db_wys.dll is a 52 MB file

    $ time xxd -i < /c/altera/13.0sp1/quartus/bin64/db_wys.dll > xxd.txt

    real 0m15.288s
    user 0m15.054s
    sys 0m0.187s

    $ time ../quick_xxd/bin_to_list1
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list1.txt

    real 0m8.502s
    user 0m0.000s
    sys 0m0.000s

    $ time ../quick_xxd/bin_to_list
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list.txt

    real 0m1.326s
    user 0m0.000s
    sys 0m0.000s

    bin_to_list is probably limited by the write speed of the SSD, which in
    this particular case is ~9 years old and has been used rather intensively
    during those years.

    bin_to_list1 is DIY, written in ~5 min.
    bin_to_list is DIY, written in ~55 min.
    In the post above David Brown mentioned 'other tools (common or
    specialised)'. I'd like to know what they are and how fast they are.


    Appendix A.
    // bin_to_list1.c
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argz, char** argv)
    {
        if (argz > 1) {
            FILE* fp = fopen(argv[1], "rb");
            if (fp) {
                int c;
                while ((c = fgetc(fp)) >= 0)
                    printf("%d,\n", c);
                fclose(fp);
            } else {
                perror(argv[1]);
                return 1;
            }
        }
        return 0;
    }
    // end of bin_to_list1.c


    Appendix B.
    // bin_to_list.c
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char usage[] =
        "bin_to_list - convert binary file to comma-delimited list of decimal numbers\n"
        "Usage:\n"
        "  bin_to_list infile [outfile]\n"
        "When output file is not specified, the result is written to standard output.\n";

    int main(int argz, char** argv)
    {
        // process command line
        if (argz < 2) {
            fprintf(stderr, "%s", usage);
            return 1;
        }

        char* infilename = argv[1];
        static const char *help_aliases[] = { "-h", "-H", "-?", "--help", "--?" };
        const int n_help_aliases = sizeof(help_aliases)/sizeof(help_aliases[0]);
        for (int i = 0; i < n_help_aliases; ++i) {
            if (strcmp(infilename, help_aliases[i]) == 0) {
                fprintf(stderr, "%s", usage);
                return 0;
            }
        }

        // open files
        FILE* fpin = fopen(infilename, "rb");
        if (!fpin) {
            perror(infilename);
            return 1;
        }

        FILE* fpout = stdout;
        char* outfilename = NULL;
        if (argz > 2) {
            outfilename = argv[2];
            fpout = fopen(outfilename, "w");
            if (!fpout) {
                perror(outfilename);
                fclose(fpin);
                return 1;
            }
        }

        // Initialize table of pre-formatted decimal strings
        char bin2dec[256][4];
        for (int i = 0; i < 256; ++i)
            sprintf(bin2dec[i], "%d", i);

        // main loop
        int err = 0;
        int c;
        enum { MAX_CHAR_PER_LINE = 80, MAX_CHAR_PER_NUM = 4,
               ALMOST_FULL_THR = MAX_CHAR_PER_LINE - MAX_CHAR_PER_NUM };
        char outbuf[MAX_CHAR_PER_LINE+1]; // provide space for EOL
        char* outptr = outbuf;
        while ((c = fgetc(fpin)) >= 0) {
            char* dec = bin2dec[c & 255];
            do
                *outptr++ = *dec++;
            while (*dec);
            *outptr++ = ',';
            if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
                *outptr++ = '\n';
                ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
                if (wrlen != outptr-outbuf) {
                    err = 2;
                    break;
                }
                outptr = outbuf;
            }
        }
        if (ferror(fpin)) {
            perror(infilename);
            err = 1;
        }
        // last line
        if (outptr != outbuf && err == 0) {
            *outptr++ = '\n';
            ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
            if (wrlen != outptr-outbuf)
                err = 2;
        }

        // completion and cleanup
        if (err == 2 && outfilename)
            perror(outfilename);

        fclose(fpin);
        if (outfilename) {
            fclose(fpout);
            if (err)
                remove(outfilename);
        }
        return err;
    }
    // end of bin_to_list.c

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Tue May 28 15:06:40 2024
    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    No, it does /not/. That's the /whole/ point of #embed, and the main
    motivation for its existence. People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).
    I've done so myself on many projects, all integrated together in
    makefiles.


    Let's start another round of private parts' measurements tournament!
    'xxd -i' vs DIY

    /c/altera/13.0sp1/quartus/bin64/db_wys.dll is 52 MB file

    $ time xxd -i < /c/altera/13.0sp1/quartus/bin64/db_wys.dll > xxd.txt

    real 0m15.288s
    user 0m15.054s
    sys 0m0.187s

    $ time ../quick_xxd/bin_to_list1
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list1.txt

    real 0m8.502s
    user 0m0.000s
    sys 0m0.000s

    $ time ../quick_xxd/bin_to_list
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list.txt

    real 0m1.326s
    user 0m0.000s
    sys 0m0.000s

    bin_to_list is probably limited by the write speed of the SSD, which in
    this particular case is ~9 years old and has been used rather intensively
    during those years.

    bin_to_list1 is DIY, written in ~5 min.
    bin_to_list is DIY, written in ~55 min.
    In the post above David Brown mentioned 'other tools (common or
    specialised)'. I'd like to know what they are and how fast they are.


    I think you might be missing the point here.

    The start point is a possibly large binary data file.

    The end point is to end up with an application whose binary code has
    embedded that data file. (And which makes that data available inside the
    C program as a C data structure.)

    Without #embed, one technique (which I've only learnt about this week)
    is to use a tool called 'xxd' to turn that binary file into C source
    code which contains an initialised array or whatever.
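
    For reference, when given a file name (rather than stdin), 'xxd -i' emits
    something roughly like this, with the identifiers derived from the file
    name (bytes and length here are only illustrative):

        unsigned char data_bin[] = {
          0x12, 0x34, 0x56, 0x78, 0x00, 0x01, 0x02, 0x03, 0xff, 0xfe, 0xfd, 0xfc,
          ...
        };
        unsigned int data_bin_len = 1234;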

    But, that isn't the bottleneck. You run that conversion once (or
    whenever the binary changes), and use the same resulting C code every time
    you build the application. And quite likely, the makefile recognises you
    don't need to compile it anyway.

    It is that building process that can be slow if that C source describing
    the data is large.

    That is what #embed helps to address. At least, if it takes the fast
    path that has been discussed. But if it is implemented naively, or the
    fast path is not viable, then it can be just as slow as compiling that
    xxd-generated C.

    It will at least however have eliminated that xxd step.
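
    For reference, usage of the directive itself looks roughly like this (the
    file name is just for illustration):

        const unsigned char data[] = {
        #embed "data.bin"
        };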

    The only translation going on here might be:

    * Expanding a binary file to text, or tokens (if #embed is done poorly)
    * Parsing that text or tokens into the compiler's internal rep

    But all that is happening inside the compiler.

    It might be that when xxd /is/ used, there might be a faster program to
    do the same thing, but I've not heard anyone say xxd's speed is a
    problem, only that it's a nuisance to do.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Tue May 28 17:34:19 2024
    On 28/05/2024 13:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    No, it does /not/. That's the /whole/ point of #embed, and the main
    motivation for its existence. People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).
    I've done so myself on many projects, all integrated together in
    makefiles.


    Let's start another round of private parts' measurements tournament!
    'xxd -i' vs DIY


    I used 100 MB of random data:

    dd if=/dev/urandom bs=1M count=100 of=100MB

    I compiled your code with "gcc-11 -O2 -march=native".

    I ran everything in a tmpfs filesystem, completely in ram.
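
    (For anyone wanting to reproduce that: /dev/shm is usually already a tmpfs
    mount on Linux, or a dedicated one can be created with something like
    "mount -t tmpfs -o size=1g tmpfs /mnt/ram".)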


    xxd took 5.4 seconds - that's the baseline.

    Your simple C code took 4.35 seconds. Your second program took 0.9
    seconds - a big improvement.

    One line of Python code took 8 seconds :

    print(", ".join([hex(b) for b in open("100MB", "rb").read()]))


    A slightly nicer Python program took 14.3 seconds :

    import sys
    bs = open(sys.argv[1], "rb").read()
    xs = "".join([" 0x%02x," % b for b in bs])
    ln = len(xs)
    print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))


    Like "xxd -i", that one split the output into lines of 12 bytes. Some compilers might not like a single 300-600 MB line !


    I didn't try compiling a test file from the 100 MB source data, but gcc
    took about 16 seconds for an include file generated from 20 MB of random
    data. It didn't make a significant difference if the data was in
    decimal or hex, one line or multiple lines.
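
    For reference, such an include file is typically just the bare list of
    numbers, pulled into an array definition roughly like this (names are only
    illustrative):

        const unsigned char blob[] = {
        #include "data.inc"
        };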

    But since compilation took about ten times as long as the single line of
    Python code, my conclusion is that the speed of generating the include
    file is pretty much irrelevant. Compared to the one-line Python code
    and considering the generation and compilation combined, using xxd saves
    5% of the time and your best code saves 9% - out of a possible 10% cost
    saving.

    Thus if you want to save build time when including large arrays of data
    in the generated executable, time spent on beating xxd is wasted -
    implementing optimised #embed is the only way to make an impact.


    (I have had reason to include a 0.5 MB file in a statically linked
    single binary - I'm not sure when you'd need very fast handling of multi-megabyte embeds.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to bart on Tue May 28 17:42:15 2024
    On 28/05/2024 16:06, bart wrote:
    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    No, it does /not/.  That's the /whole/ point of #embed, and the main
    motivation for its existence.  People have always managed to embed
    binary source files into their binary output files - using linker
    tricks, or using xxd or other tools (common or specialised) to turn
    binary files into initialisers for constant arrays (or structs).
    I've done so myself on many projects, all integrated together in
    makefiles.


    Let's start another round of private parts' measurements tournament!
    'xxd -i' vs DIY

    /c/altera/13.0sp1/quartus/bin64/db_wys.dll is 52 MB file

    $ time xxd -i < /c/altera/13.0sp1/quartus/bin64/db_wys.dll > xxd.txt

    real    0m15.288s
    user    0m15.054s
    sys     0m0.187s

    $ time ../quick_xxd/bin_to_list1
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list1.txt

    real    0m8.502s
    user    0m0.000s
    sys     0m0.000s

    $ time ../quick_xxd/bin_to_list
    /c/altera/13.0sp1/quartus/bin64/db_wys.dll > bin_to_list.txt

    real    0m1.326s
    user    0m0.000s
    sys     0m0.000s

    bin_to_list is probably limited by the write speed of the SSD, which in
    this particular case is ~9 years old and has been used rather intensively
    during those years.

    bin_to_list1 is DIY, written in ~5 min.
    bin_to_list is DIY, written in ~55 min.
    In the post above David Brown mentioned 'other tools (common or
    specialised)'. I'd like to know what they are and how fast they are.


    I think you might be missing the point here.

    The start point is a possibly large binary data file.

    The end point is to end up with an application whose binary code has
    embedded that data file. (And which makes that data available inside the
    C program as a C data structure.)

    Without #embed, one technique (which I've only learnt about this week)
    is to use a tool called 'xxd' to turn that binary file into C source
    code which contains an initialised array or whatever.

    But, that isn't the bottleneck. You run that conversion once (or
    whenever the binary changes), and use the same resulting C code every time
    you build the application. And quite likely, the makefile recognises you
    don't need to compile it anyway.

    Exactly, yes.

    (Still, speed tests can be fun as long as you don't pretend they
    actually matter!)


    It is that building process that can be slow if that C source describing
    the data is large.

    It is actually the link step that is typically the bottleneck, as that
    does not easily run in parallel.

    When the "data.bin" file changes, your make (or other build system) will
    see the change and use xxd (or whatever) to generate data.c, and then
    compile that to data.o. It's done once, whenever data.bin changes.
    It's the linking that has to be done at every build of the executable,
    whether "data.bin" changes or not.


    That is what #embed helps to address. At least, if it takes the fast
    path that has been discussed. But if it is implemented naively, or the
    fast path is not viable, then it can be just as slow as compiling that
    xxd-generated C.

    It will at least however have eliminated that xxd step.

    Yes - it makes things a little neater and self-contained. I don't see
    it as a game-changer, but if you want to avoid make, you might like it.


    The only translation going on here might be:

    * Expanding a binary file to text, or tokens (if #embed is done poorly)
    * Parsing that text or tokens into the compiler's internal rep

    But all that is happening inside the compiler.

    It might be that when xxd /is/ used, there might be a faster program to
    do the same thing, but I've not heard anyone say xxd's speed is a
    problem, only that it's a nuisance to do.



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Tue May 28 18:14:02 2024
    On 28/05/2024 17:56, Michael S wrote:
    On Tue, 28 May 2024 15:06:40 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    I think you might be missing the point here.


    I don't think so.
    I understand your points and agree with just about everything. My post
    was off topic, intentionally so.

    If we talk about practicalities, the problems with xxd, if there are
    problems at all, are not its speed, but the size of the text file
    it produces (~6x the size of original binary) and its availability.
    I don't know to which package it belongs in typical Linux or BSD distributions, but at least on Windows/msys2 it is part of Vim - rather
    big package for which, apart from xxd, I have no use at all.


    On Debian, xxd is in a package called "xxd" which contains just xxd and directly associated files (like man pages). I see that "vim-common"
    depends on "xxd", so it will get pulled in if you install vim, but not vice-versa. (It's pretty common to have vim installed on *nix systems
    anyway, as part of the base set of general packages.)
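
    (On a Debian-based system it is easy to check which package owns the
    binary, e.g. with "dpkg -S /usr/bin/xxd" or "apt-cache show xxd".)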

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Tue May 28 18:56:24 2024
    On Tue, 28 May 2024 15:06:40 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    I think you might be missing the point here.


    I don't think so.
    I understand your points and agree with just about everything. My post
    was off topic, intentionally so.

    If we talk about practicalities, the problems with xxd, if there are
    problems at all, are not its speed, but the size of the text file
    it produces (~6x the size of original binary) and its availability.
    I don't know to which package it belongs in typical Linux or BSD
    distributions, but at least on Windows/msys2 it is part of Vim - rather
    big package for which, apart from xxd, I have no use at all.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Tue May 28 19:20:51 2024
    On Tue, 28 May 2024 18:14:02 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On Debian, xxd is in a package called "xxd" which contains just xxd
    and directly associated files (like man pages).


    Good.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Tue May 28 19:57:38 2024
    On 28/05/2024 16:56, Michael S wrote:
    On Tue, 28 May 2024 15:06:40 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    I think you might be missing the point here.


    I don't think so.
    I understand your points and agree with just about everything. My post
    was off topic, intentionally so.

    If we talk about practicalities, the problems with xxd, if there are
    problems at all, are not its speed, but the size of the text file
    it produces (~6x the size of original binary) and its availability.
    I don't know to which package it belongs in typical Linux or BSD distributions, but at least on Windows/msys2 it is part of Vim - rather
    big package for which, apart from xxd, I have no use at all.




    OK, I had a go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The fastest
    time was with gcc -O3.

    I then tried a simple program in my language, which took 10 seconds.

    I looked more closely at yours, and saw you used a clever method of a
    table of precalculated stringified numbers.

    Using a similar table, plus more direct string handling, the fastest
    timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was
    supposed to match your layout, but that turned out to be variable.)

    Both programs have a trailing comma on the last number, which may be problematical, but also not hard to fix.

    I then tried xxd under WSL, and that took 28 seconds, real time, with a
    much larger output (616KB instead of 366KB). But it's using fixed width
    columns of hex, complete with a '0x' prefix.

    Below is that program but in my language. I tried transpiling to C,
    hoping it might be even faster, but it got slower (4.5 seconds with
    gcc-O3). I don't know why. It would need manual porting to C.

    This hardcodes the input filename. 'readfile' is a function in my library.

    --------------------------------

    [0:256]ichar numtable
    [0:256]int numlengths

    proc main=
        ref byte data
        [256]char str
        const perline=21
        int m, n, slen
        byte bb
        ichar s, p

        for i in 0..255 do
            numtable[i] := strdup(strint(i))
            numlengths[i] := strlen(numtable[i])
        od

        data := readfile("/c/data100")
        n := rfsize

        while n do
            m := min(n, perline)
            n -:= m
            p := &str[1]

            to m do
                bb := data++^
                s := numtable[bb]
                slen := numlengths[bb]

                to slen do
                    p++^ := s++^
                od
                p++^ := ','
            od
            p^ := 0

            println str
        od
    end

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Tue May 28 23:23:15 2024
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 16:56, Michael S wrote:
    On Tue, 28 May 2024 15:06:40 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    I think you might be missing the point here.


    I don't think so.
    I understand your points and agree with just about everything. My
    post was off topic, intentionally so.

    If we talk about practicalities, the problems with xxd, if there are problems at all, are not its speed, but the size of the text file
    it produces (~6x the size of original binary) and its availability.
    I don't know to which package it belongs in typical Linux or BSD distributions, but at least on Windows/msys2 it is part of Vim -
    rather big package for which, apart from xxd, I have no use at all.




    OK, I had go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than the aging SSD
    on my test machine and a lot slower than David Brown's SSD.

    I then tried a simple program in my language, which took 10 seconds.

    I looked more closely at yours, and saw you used a clever method of a
    table of precalculated stringified numbers.

    Using a similar table, plus more direct string handling, the fastest
    timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was supposed to match your layout, but that turned out to be variable.)


    Yes, I try to keep the line length almost fixed (77 to 80 characters) and
    make no attempt to control the number of entries per line.
    Since you used a random generator, the density advantage of my approach is
    smaller than in more typical situations, where 2-digit numbers are more
    common than 3-digit numbers.
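
    (Back-of-the-envelope, for uniformly random bytes: 10 values need 1 digit,
    90 need 2 and 156 need 3, so the average is (10*1 + 90*2 + 156*3)/256 ~=
    2.57 digits, i.e. about 3.57 characters per byte including the comma -
    which is why ~80-character lines come out at roughly 21-22 numbers each.)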

    Also, I think that random numbers are close to the worst case for the
    branch predictor / loop length predictor in my inner loop.
    Were I thinking about the random case upfront, I'd have coded the inner
    loop differently. I'd always copy 4 octets (the comma would be stored in
    the same table). After that I would advance outptr by a length taken from
    an additional table - similar, but not identical, to your method below.

    There exist files that have near-random distribution, e.g. anything
    zipped or anything encrypted, but I would think that we rarely want
    them embedded.

    Both programs have a trailing comma on the last number, which may be problematical, but also not hard to fix.


    I don't see where (in C) it could be a problem. On the other hand, I can
    imagine situations where the absence of a trailing comma is inconvenient.
    Now, if your language borrows its array initialization syntax from
    Pascal, then trailing commas are indeed undesirable.

    I then tried xxd under WSL, and that took 28 seconds, real time, with
    a much larger output (616KB instead of 366KB).

    616 MB, I suppose.
    The timing is very similar to my measurements. It is obvious that in the
    case of xxd, unlike in the rest of our cases, the bottleneck is the CPU
    rather than the disk.

    But it's using fixed
    width columns of hex, complete with a '0x' prefix.

    Below is that program but in my language. I tried transpiling to C,
    hoping it might be even faster, but it got slower (4.5 seconds with
    gcc-O3). I don't know why. It would need manual porting to C.


    Why do you measure with gcc -O3 instead of the more robust and more
    popular -O2? Not that it matters in this particular case, but in general I
    don't think it is a good idea.

    This hardcodes the input filename. 'readfile' is a function in my
    library.

    --------------------------------

    [0:256]ichar numtable
    [0:256]int numlengths

    proc main=
    ref byte data
    [256]char str
    const perline=21
    int m, n, slen
    byte bb
    ichar s, p

    for i in 0..255 do
    numtable[i] := strdup(strint(i))
    numlengths[i] := strlen(numtable[i])
    od

    data := readfile("/c/data100")
    n := rfsize

    while n do
    m := min(n, perline)
    n- := m
    p := &str[1]

    to m do
    bb := data++^
    s := numtable[bb]
    slen := numlengths[bb]

    to slen do
    p++^ := s++^
    od
    p++^ := ','

    od
    p^ := 0

    println str
    od
    end


    Reading the whole file upfront is undoubtedly faster than interleaving
    reads and writes. But by the set of unwritten rules that I imposed on
    myself, it is cheating.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Michael S on Wed May 29 00:45:30 2024
    On Tue, 28 May 2024 23:23:15 +0300
    Michael S <already5chosen@yahoo.com> wrote:


    Also, I think that random numbers are close to worst case for branch predictor / loop length predictor in my inner loop.
    Were I thinking about random case upfront, I'd code an inner loop differently. I'd always copy 4 octets (comma would be stored in the
    same table). After that I would update outptr by length taken from
    additional table, similarly, but not identically to your method below.


    That's what I had in mind:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char usage[] =
        "bin_to_list - convert binary file to comma-delimited list of decimal numbers\n"
        "Usage:\n"
        "  bin_to_list infile [outfile]\n"
        "When output file is not specified, the result is written to standard output.\n";

    int main(int argz, char** argv)
    {
        // process command line
        if (argz < 2) {
            fprintf(stderr, "%s", usage);
            return 1;
        }

        char* infilename = argv[1];
        static const char *help_aliases[] = { "-h", "-H", "-?", "--help", "--?" };
        const int n_help_aliases = sizeof(help_aliases)/sizeof(help_aliases[0]);
        for (int i = 0; i < n_help_aliases; ++i) {
            if (strcmp(infilename, help_aliases[i]) == 0) {
                fprintf(stderr, "%s", usage);
                return 0;
            }
        }

        // open files
        FILE* fpin = fopen(infilename, "rb");
        if (!fpin) {
            perror(infilename);
            return 1;
        }

        FILE* fpout = stdout;
        char* outfilename = NULL;
        if (argz > 2) {
            outfilename = argv[2];
            fpout = fopen(outfilename, "w");
            if (!fpout) {
                perror(outfilename);
                fclose(fpin);
                return 1;
            }
        }

        enum { MAX_CHAR_PER_LINE = 80, MAX_CHAR_PER_NUM = 4,
               ALMOST_FULL_THR = MAX_CHAR_PER_LINE - MAX_CHAR_PER_NUM };
        // Initialize table; bin2dec[i][MAX_CHAR_PER_NUM] holds the length
        unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1];
        for (int i = 0; i < 256; ++i) {
            char tmp[8];
            int len = sprintf(tmp, "%d,", i);
            memcpy(bin2dec[i], tmp, MAX_CHAR_PER_NUM);
            bin2dec[i][MAX_CHAR_PER_NUM] = (unsigned char)len;
        }

        // main loop
        int err = 0;
        int c;
        unsigned char outbuf[MAX_CHAR_PER_LINE+MAX_CHAR_PER_NUM]; // provide space for EOL
        unsigned char* outptr = outbuf;
        while ((c = fgetc(fpin)) >= 0) {
            unsigned char* dec = bin2dec[c & 255];
            memcpy(outptr, dec, MAX_CHAR_PER_NUM);
            outptr += dec[MAX_CHAR_PER_NUM];
            if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
                *outptr++ = '\n';
                ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
                if (wrlen != outptr-outbuf) {
                    err = 2;
                    break;
                }
                outptr = outbuf;
            }
        }
        if (ferror(fpin)) {
            perror(infilename);
            err = 1;
        }
        // last line
        if (outptr != outbuf && err == 0) {
            *outptr++ = '\n';
            ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
            if (wrlen != outptr-outbuf)
                err = 2;
        }

        // completion and cleanup
        if (err == 2 && outfilename)
            perror(outfilename);

        fclose(fpin);
        if (outfilename) {
            fclose(fpout);
            if (err)
                remove(outfilename);
        }
        return err;
    }
    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Tue May 28 23:08:22 2024
    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging SSD
    on my test machine and ALOT slower than SSD of David Brown.


    My machine uses an SSD.

    However the tests were run on Windows, so I ran your program again under
    WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).


    I then tried a simple program in my language, which took 10 seconds.

    I looked more closely at yours, and saw you used a clever method of a
    table of precalculated stringified numbers.

    Using a similar table, plus more direct string handling, the fastest
    timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was
    supposed to match your layout, but that turned out to be variable.)


    Yes, I try to get line length almost fixed (77 to 80 characters) and
    make no attempts to control number of entries per line.
    Since you used random generator, a density advantage of my approach is smaller than in more typical situations, where 2-digit numbers are more common than 3-digit numbers.

    Also, I think that random numbers are close to worst case for branch predictor / loop length predictor in my inner loop.
    Were I thinking about random case upfront, I'd code an inner loop differently. I'd always copy 4 octets (comma would be stored in the same table). After that I would update outptr by length taken from
    additional table, similarly, but not identically to your method below.

    The difference in file sizes for N bytes will be a factor of 2:1 maximum
    (all "1," or all "123," for example).

    There exist files that have near-random distribution, e.g. anything
    zipped or anything encrypted, but I would think that we rarely want
    them embedded.

    This hardcodes the input filename. 'readfile' is a function in my
    library.

    --------------------------------

    [0:256]ichar numtable
    [0:256]int numlengths

    proc main=
    ref byte data
    [256]char str
    const perline=21
    int m, n, slen
    byte bb
    ichar s, p

    for i in 0..255 do
    numtable[i] := strdup(strint(i))
    numlengths[i] := strlen(numtable[i])
    od

    data := readfile("/c/data100")

    Reading the whole file upfront is undoubtedly faster than interleaving
    reads and writes. But by the set of unwritten rules that I imposed on
    myself, it is cheating.

    Why not? Isn't the whole point to have a practical tool which is faster
    than xxd?

    I never use buffered file input or request file data a character at a
    time from a file system API; who knows how inefficient it might be.

    I looked at your code again, and saw you're using fwrite to output each
    line. If I adapt my 'readln' (which ends up calling 'printf') to use
    fwrite too, then my timing reduces from 3.1 to 2.1 seconds.

    Of that 2.1 seconds, the file-loading time is 0.03 seconds. If I switch
    to using a fgetc loop (after determining the file size; it still loads
    the whole file), the file-loading takes nearly 2 seconds. Overall the
    timing becomes only a little faster than the gcc-compiled C code.

    (My compiler doesn't have an equivalent optimiser, but this is mostly
    about I/O and algorithm.)

    My view is that my approach leads to a simpler program.

    Maybe processing very large files that won't fit into memory might be a
    problem, e.g. a 2GB binary. But remember that the output will be a text
    file of 4-8GB, which for now has to be processed by a compiler that has
    to build data structures in memory to represent it, taking even more
    space.

    So they would be unviable anyway.

    A proper 'embed' feature would most likely have to load the entire
    binary file into memory too.
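
    For reference, the C version of that load-it-all-first approach is only a
    few lines (a minimal sketch; the function name is made up and error
    handling is mostly omitted):

        #include <stdio.h>
        #include <stdlib.h>

        /* read a whole file into a malloc'd buffer; NULL on failure */
        static unsigned char *load_file(const char *path, long *size_out)
        {
            FILE *f = fopen(path, "rb");
            if (!f) return NULL;
            fseek(f, 0, SEEK_END);      /* find the size ...           */
            long size = ftell(f);
            fseek(f, 0, SEEK_SET);      /* ... then rewind and read it */
            unsigned char *buf = malloc(size);
            if (buf && fread(buf, 1, size, f) != (size_t)size) {
                free(buf);
                buf = NULL;
            }
            fclose(f);
            *size_out = size;
            return buf;
        }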

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Wed May 29 01:24:56 2024
    On Tue, 28 May 2024 23:08:22 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and ALOT slower than SSD of David Brown.


    My machine uses an SSD.

    SSDs are not created equal. Especially for writes.


    However the tests were run on Windows, so I ran your program again
    under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).



    3 times slower?!
    I never tested it myself, but I have heard that there is a significant
    difference in file access speed between WSL's own file system and
    mounted Windows directories. The difference under WSL is not as big
    as under WSL2, where they say that access to a mounted Windows filesystem
    is very slow, but it is still significant.
    I don't know if it applies to all file sizes or only to accessing many
    small files.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed May 29 00:54:23 2024
    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100

    OK, I had go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging SSD
    on my test machine and ALOT slower than SSD of David Brown.

    David Brown's machines are always faster than anyone else's.

    Your machine showed 1.3 seconds for 50MB, or 2.6 seconds for 100MB.

    Unchanged, the fastest on my machine was 4.1 seconds for 100MB.

    I've since tweaked your program to isolate the reading and writing
    parts, also to make it load the whole file in one fread call, so that I
    can compare overall times better.

    One more thing I did was to write to a specific named file, not to stdout.

    Overall, gcc-O2/O3 (they're the same) now has a best timing of 1.9
    seconds. My language's version with my compiler has a best timing of 2.0 seconds.

    (Unoptimised gcc, mcc, tcc give timings of around 2.7 seconds.

    DMC (an old 32-bit compiler) is unoptimised, and 1.5 optimised.

    lccwin32 takes over 5 seconds in either case.)

    I suspect that your system just has a much faster fgetc implementation.
    How long does an fgetc() loop over a 100MB input take on your machine?

    On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL. Using
    DMC, it's 0.65 seconds.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed May 29 01:29:00 2024
    On 28/05/2024 22:45, Michael S wrote:
    On Tue, 28 May 2024 23:23:15 +0300
    Michael S <already5chosen@yahoo.com> wrote:


    Also, I think that random numbers are close to worst case for branch
    predictor / loop length predictor in my inner loop.
    Were I thinking about random case upfront, I'd code an inner loop
    differently. I'd always copy 4 octets (comma would be stored in the
    same table). After that I would update outptr by length taken from
    additional table, similarly, but not identically to your method below.


    That's what I had in mind:


    unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1]; //
    bin2dec[MAX_CHAR_PER_NUM] => length for (int i = 0; i < 256;++i) {

    Is this a comment that has wrapped?

    After fixing a few such line breaks, this runs at 3.6 seconds compared
    with 4.1 seconds for the original.

    Although I don't quite understand the comments about branch prediction.

    I think runtime is still primarily spent in I/O.

    If I take the 1.9 second version, and remove the fwrite, then it runs in
    0.8 seconds. 0.7 of that is generating the text (366MB's worth, a line
    at a time).

    In my language that part takes 0.9 seconds, which is a more typical
    difference due to gcc's superior optimiser.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Wed May 29 09:21:09 2024
    On Wed, 29 May 2024 01:29:00 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 22:45, Michael S wrote:
    On Tue, 28 May 2024 23:23:15 +0300
    Michael S <already5chosen@yahoo.com> wrote:


    Also, I think that random numbers are close to worst case for
    branch predictor / loop length predictor in my inner loop.
    Were I thinking about random case upfront, I'd code an inner loop
    differently. I'd always copy 4 octets (comma would be stored in the
    same table). After that I would update outptr by length taken from
    additional table, similarly, but not identically to your method
    below.

    That's what I had in mind:


    unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1]; //
    bin2dec[MAX_CHAR_PER_NUM] => length for (int i = 0; i < 256;++i)
    {

    Is this a comment that has wrapped?

    After fixing a few such line breaks, this runs at 3.6 seconds
    compared with 4.1 seconds for the original.

    Although I don't quite understand the comments about branch
    prediction.

    I think runtime is still primarily spent in I/O.


    That's undoubtedly correct.
    But a high branch mispredict rate can still add to the total time.
    Suppose we get a branch misprediction at the end of the inner loop for 40%
    of the input bytes. On the processor that ran my original test (Intel
    Haswell at 4 GHz) each mispredict costs ~15 clocks = 3.75 ns.
    3.75 ns * 0.4 * 100M = 150 ms.
    I don't know how much it costs on your hardware, since you didn't tell
    me what it is.

    But I am more intrigued by the slowness on WSL.
    Did you compare native vs. mounted file systems?

    If I take the 1.9 second version, and remove the fwrite, then it runs
    in 0.8 seconds. 0.7 of that is generating the text (366MB's worth, a
    line at a time).

    In my language that part takes 0.9 seconds, which is a more typical difference due to gcc's superior optimiser.



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to bart on Wed May 29 10:32:29 2024
    On 29/05/2024 01:54, bart wrote:
    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100

    OK, I had go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging SSD
    on my test machine and ALOT slower than SSD of David Brown.

    David Brown's machines are always faster than anyone else's.

    That seems /highly/ unlikely. Admittedly the machine I tested on is
    fairly new - less than a year old. But it's a little NUC-style machine
    at around the $1000 price range, with a laptop processor. The only
    thing exciting about it is 64 GB ram (I like to run a lot of things at
    the same time in different workspaces).

    But I am better than some people at getting my machines to run programs efficiently. I don't use Windows for such things (I happily run Windows
    on a different machine for other purposes), and I certainly don't use
    layers of OS or filesystem emulation such as WSL and expect code to run
    at maximal speed.

    And as I said in an earlier post, I didn't have the files on any kind of
    disk or SSD at all - they were all in a tmpfs filesystem to eliminate
    that bottleneck.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Wed May 29 10:18:45 2024
    On 28/05/2024 22:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 16:56, Michael S wrote:
    On Tue, 28 May 2024 15:06:40 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 12:41, Michael S wrote:
    On Sun, 26 May 2024 13:09:36 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    I think you might be missing the point here.


    I don't think so.
    I understand your points and agree with just about everything. My
    post was off topic, intentionally so.

    If we talk about practicalities, the problems with xxd, if there are
    problems at all, are not its speed, but the size of the text file
    it produces (~6x the size of original binary) and its availability.
    I don't know to which package it belongs in typical Linux or BSD
    distributions, but at least on Windows/msys2 it is part of Vim -
    rather big package for which, apart from xxd, I have no use at all.




    OK, I had go with your program. I used a random data file of exactly
    100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging SSD
    on my test machine and ALOT slower than SSD of David Brown.

    I didn't use an SSD - I used a tmpfs filesystem, so no disk at all.
    There are still limits from memory bandwidth, of course.


    I then tried a simple program in my language, which took 10 seconds.

    I looked more closely at yours, and saw you used a clever method of a
    table of precalculated stringified numbers.

    Using a similar table, plus more direct string handling, the fastest
    timing on mine was 3.1 seconds, with 21 numbers per line. (The 21 was
    supposed to match your layout, but that turned out to be variable.)


    Yes, I try to get line length almost fixed (77 to 80 characters) and
    make no attempts to control number of entries per line.
    Since you used random generator, a density advantage of my approach is smaller than in more typical situations, where 2-digit numbers are more common than 3-digit numbers.

    Also, I think that random numbers are close to worst case for branch predictor / loop length predictor in my inner loop.

    That makes them a good test here.

    Were I thinking about random case upfront, I'd code an inner loop differently. I'd always copy 4 octets (comma would be stored in the same table). After that I would update outptr by length taken from
    additional table, similarly, but not identically to your method below.

    There exist files that have near-random distribution, e.g. anything
    zipped or anything encrypted, but I would think that we rarely want
    them embedded.


    I'd say these are actually quite common cases.

    Both programs have a trailing comma on the last number, which may be
    problematical, but also not hard to fix.


    I don't see where (in C) it could be a problem. On the other hand, I can
    imagine situations where the absence of a trailing comma is inconvenient.
    Now, if your language borrows its array initialization syntax from
    Pascal, then trailing commas are indeed undesirable.


    For initialising arrays, an extra comma at the end is no issue for C.
    But for more general use, #embed can also be used for things like
    function call or macro parameters, and the extra comma is then a
    problem. So your program is not a direct alternative to "xxd -i" or
    #embed in those cases. (I don't think such uses would be common in
    practice.)
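
    For illustration ('f' here is just a placeholder function name):

        int a[] = { 1, 2, 3, };   /* trailing comma: fine in an initialiser */
        f(1, 2, 3,);              /* trailing comma: a syntax error */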

    I then tried xxd under WSL, and that took 28 seconds, real time, with
    a much larger output (616KB instead of 366KB).

    616 MB, I suppose.
    Timing is very similar to my measurements. It is obvious that in case
    of xxd, unlike in the rest of our cases, the bottleneck is in CPU rather
    than in HD.

    But it's using fixed
    width columns of hex, complete with a '0x' prefix.

    Below is that program but in my language. I tried transpiling to C,
    hoping it might be even faster, but it got slower (4.5 seconds with
    gcc-O3). I don't know why. It would need manual porting to C.


    Why do you measure with gcc -O3 instead of more robust and more popular
    -O2 ? Not that it matters in this particular case, but in general I
    don't think that it is a good idea.


    Bart always likes to use "gcc -O3", no matter how often people tell him
    that it should not be assumed to give faster results than "gcc -O2". It
    often results in a few extra percent of speed, but sometimes works
    against speed (depending on things like cache sizes, branch prediction,
    and other hard to predict factors). Alternatively, he uses gcc without
    any optimisation and still thinks the speed results are relevant.


    Reading the whole file upfront is undoubtedly faster than interleaving
    reads and writes. But by the set of unwritten rules that I imposed on
    myself, it is cheating.


    I would expect that for big files, mmap would be the most efficient
    method. You might also want to use cache pre-loads of later parts of
    the file as you work through it. But I don't know how portable that would
    be, and it would be more effort to write.
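
    Something like this, for POSIX systems (a minimal sketch; error checks are
    mostly omitted):

        #include <stddef.h>
        #include <fcntl.h>
        #include <sys/mman.h>
        #include <sys/stat.h>
        #include <unistd.h>

        /* map a whole file read-only; returns NULL on failure */
        static const unsigned char *map_file(const char *path, size_t *size_out)
        {
            int fd = open(path, O_RDONLY);
            if (fd < 0) return NULL;
            struct stat st;
            if (fstat(fd, &st) != 0) { close(fd); return NULL; }
            void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            close(fd);                /* the mapping stays valid after close */
            if (p == MAP_FAILED) return NULL;
            *size_out = (size_t)st.st_size;
            return p;                 /* munmap() it when done */
        }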

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Michael S on Wed May 29 12:44:43 2024
    On Wed, 29 May 2024 09:21:09 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Wed, 29 May 2024 01:29:00 +0100
    bart <bc@freeuk.com> wrote:


    I think runtime is still primarily spent in I/O.


    That's undoubtedly correct.

    :(
    Two hours later it turned out to be completely incorrect. That is, the
    time was spent in routines related to I/O, but in the 'soft' part of them
    rather than in the I/O itself.
    Hopefully, next time I'll remember to avoid the word 'undoubtedly'.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Wed May 29 13:08:18 2024
    On Wed, 29 May 2024 10:32:29 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 29/05/2024 01:54, bart wrote:
    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100

    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and ALOT slower than SSD of David Brown.

    David Brown's machines are always faster than anyone else's.

    That seems /highly/ unlikely. Admittedly the machine I tested on is
    fairly new - less than a year old. But it's a little NUC-style
    machine at around the $1000 price range, with a laptop processor.
    The only thing exciting about it is 64 GB ram (I like to run a lot of
    things at the same time in different workspaces).


    Modern laptop processors with adequate cooling can be as fast as a
    desktop (and faster than a server) for a task that uses only 1 or 2
    cores, especially when no heavy vector math is involved. If the task runs
    only for a few seconds, like in our tests, the CPU can be fast even
    without good cooling.
    And $1000 is not exactly a low price for a mini-PC without a display. Last
    time I bought one for my mother, it cost ~$650 including Win11 Home
    Ed.

    But I am better than some people at getting my machines to run
    programs efficiently. I don't use Windows for such things (I happily
    run Windows on a different machine for other purposes), and I
    certainly don't use layers of OS or filesystem emulation such as WSL
    and expect code to run at maximal speed.


    WSL should not affect the user-level CPU-bound part, nor even the majority
    of kernel-level CPU-bound parts. It can slow down I/O, yes. But it turned
    out (see my post above) that the bottleneck was the CPU.

    And as I said in an earlier post, I didn't have the files on any kind
    of disk or SSD at all - they were all in a tmpfs filesystem to
    eliminate that bottleneck.


    You should have said it yesterday.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Wed May 29 12:38:52 2024
    On Wed, 29 May 2024 00:54:23 +0100
    bart <bc@freeuk.com> wrote:


    I suspect that your system just has a much faster fgetc
    implementation. How long does an fgetc() loop over a 100MB input take
    on your machine?

    On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
    Using DMC, it's 0.65 seconds.


    Your suspicion proved incorrect, but it turned out to be a pretty good
    question!


    #include <stdio.h>
    #include <string.h>

    static const char usage[] =
        "fgetc_test - read file with fgetc() and calculate xor checksum\n"
        "Usage:\n"
        "  fgetc_test infile\n";

    int main(int argz, char** argv)
    {
        // process command line
        if (argz < 2) {
            fprintf(stderr, "%s", usage);
            return 1;
        }

        char* infilename = argv[1];
        static const char *help_aliases[] = { "-h", "-H", "-?", "--help", "--?" };
        const int n_help_aliases = sizeof(help_aliases)/sizeof(help_aliases[0]);
        for (int i = 0; i < n_help_aliases; ++i) {
            if (strcmp(infilename, help_aliases[i]) == 0) {
                fprintf(stderr, "%s", usage);
                return 0;
            }
        }

        // open files
        FILE* fpin = fopen(infilename, "rb");
        if (!fpin) {
            perror(infilename);
            return 1;
        }

        size_t n = 0;
        unsigned char cs = 0;
        int c;
        while ((c = fgetc(fpin)) >= 0) {
            cs ^= (unsigned char)c;
            ++n;
        }
        if (ferror(fpin)) {
            perror(infilename);
            return 1;
        }

        printf("%zd byte. xor sum %d.\n", n, cs);
        return 0;
    }

    $ time ../quick_xxd/getc_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m3.604s
    user 0m0.000s
    sys 0m0.000s

    52 MB/s. Very very slow!

    The same test with getc() instead of fgetc().

    $ time ../quick_xxd/getc_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m3.588s
    user 0m0.000s
    sys 0m0.000s


    54 MB/s. Almost the same as above.


    So, maybe fgetc() is not at fault? Maybe it's the OS, and the crap that
    the corporate IT adds on top of the OS?
    Let's test this hypothesis.

    #include <stdio.h>
    #include <string.h>

    static const char usage[] =
        "fread_test - read file with fread() and calculate xor checksum\n"
        "Usage:\n"
        "  fread_test infile\n";

    int main(int argz, char** argv)
    {
        // process command line
        if (argz < 2) {
            fprintf(stderr, "%s", usage);
            return 1;
        }

        char* infilename = argv[1];
        static const char *help_aliases[] = { "-h", "-H", "-?", "--help", "--?" };
        const int n_help_aliases = sizeof(help_aliases)/sizeof(help_aliases[0]);
        for (int i = 0; i < n_help_aliases; ++i) {
            if (strcmp(infilename, help_aliases[i]) == 0) {
                fprintf(stderr, "%s", usage);
                return 0;
            }
        }

        // open files
        FILE* fpin = fopen(infilename, "rb");
        if (!fpin) {
            perror(infilename);
            return 1;
        }

        size_t n = 0;
        unsigned char cs = 0;
        for (;;) {
            enum { BUF_SZ = 128*1024 };
            unsigned char inpbuf[BUF_SZ];
            size_t len = fread(inpbuf, 1, BUF_SZ, fpin);
            n += len;
            for (int i = 0; i < (int)len; ++i)
                cs ^= inpbuf[i];
            if (len != BUF_SZ)
                break;
        }
        if (ferror(fpin)) {
            perror(infilename);
            return 1;
        }

        printf("%zd byte. xor sum %d.\n", n, cs);
        return 0;
    }

    $ time ../quick_xxd/fread_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m0.312s
    user 0m0.000s
    sys 0m0.000s

    $ time ../quick_xxd/fread_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m0.109s
    user 0m0.000s
    sys 0m0.000s

    $ time ../quick_xxd/fread_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m0.094s
    user 0m0.000s
    sys 0m0.000s

    So, at least for reading a multi-megabyte file, the OS and the corporate
    crap are not holding me back. The first read is 620 MB/s - as expected
    for a SATA-3 SSD. Repeated reads are served from the OS cache - not as
    fast as on Linux, but fast enough not to be a bottleneck in our xxd
    replacement gear.

    So, let's rewrite our tiny app with fread().

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char usage[] =
        "bin_to_list - convert binary file to comma-delimited list of decimal numbers\n"
        "Usage:\n"
        "  bin_to_list infile [outfile]\n"
        "When output file is not specified, the result is written to standard output.\n";

    int main(int argz, char** argv)
    {
        // process command line
        if (argz < 2) {
            fprintf(stderr, "%s", usage);
            return 1;
        }

        char* infilename = argv[1];
        static const char *help_aliases[] = { "-h", "-H", "-?", "--help", "--?" };
        const int n_help_aliases = sizeof(help_aliases)/sizeof(help_aliases[0]);
        for (int i = 0; i < n_help_aliases; ++i) {
            if (strcmp(infilename, help_aliases[i]) == 0) {
                fprintf(stderr, "%s", usage);
                return 0;
            }
        }

        // open files
        FILE* fpin = fopen(infilename, "rb");
        if (!fpin) {
            perror(infilename);
            return 1;
        }

        FILE* fpout = stdout;
        char* outfilename = NULL;
        if (argz > 2) {
            outfilename = argv[2];
            fpout = fopen(outfilename, "w");
            if (!fpout) {
                perror(outfilename);
                fclose(fpin);
                return 1;
            }
        }

        enum { MAX_CHAR_PER_LINE = 80, MAX_CHAR_PER_NUM = 4,
               ALMOST_FULL_THR = MAX_CHAR_PER_LINE - MAX_CHAR_PER_NUM };
        // Initialize table; bin2dec[i][MAX_CHAR_PER_NUM] => length
        unsigned char bin2dec[256][MAX_CHAR_PER_NUM+1];
        for (int i = 0; i < 256; ++i) {
            char tmp[8];
            int len = sprintf(tmp, "%d,", i);
            memcpy(bin2dec[i], tmp, MAX_CHAR_PER_NUM);
            bin2dec[i][MAX_CHAR_PER_NUM] = (unsigned char)len;
        }

        // main loop
        int err = 0;
        unsigned char outbuf[MAX_CHAR_PER_LINE+MAX_CHAR_PER_NUM]; // provide space for EOL
        unsigned char* outptr = outbuf;
        for (;;) {
            enum { BUF_SZ = 128*1024 };
            unsigned char inpbuf[BUF_SZ];
            size_t len = fread(inpbuf, 1, BUF_SZ, fpin);
            for (int i = 0; i < (int)len; ++i) {
                unsigned char* dec = bin2dec[inpbuf[i] & 255];
                memcpy(outptr, dec, MAX_CHAR_PER_NUM);
                outptr += dec[MAX_CHAR_PER_NUM];
                if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
                    *outptr++ = '\n';
                    ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
                    if (wrlen != outptr-outbuf) {
                        err = 2;
                        break;
                    }
                    outptr = outbuf;
                }
            }
            if (err || len != BUF_SZ)
                break;
        }
        if (ferror(fpin)) {
            perror(infilename);
            err = 1;
        }
        // last line
        if (outptr != outbuf && err == 0) {
            *outptr++ = '\n';
            ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
            if (wrlen != outptr-outbuf)
                err = 2;
        }

        // completion and cleanup
        if (err == 2 && outfilename)
            perror(outfilename);

        fclose(fpin);
        if (outfilename) {
            fclose(fpout);
            if (err)
                remove(outfilename);
        }
        return err;
    }

    Now the test. Input file size: 88,200,192 bytes
    $ time ../quick_xxd/bin_to_listmb
    /d/intelFPGA/18.1/quartus/bin64/db_wys.dll uu.txt

    real 0m0.577s
    user 0m0.000s
    sys 0m0.000s

    152.8 MB/s. That's much better. Some people would even say that it is
    good enough.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Wed May 29 14:10:14 2024
    On 29/05/2024 12:08, Michael S wrote:
    On Wed, 29 May 2024 10:32:29 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 29/05/2024 01:54, bart wrote:
    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100

    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and ALOT slower than SSD of David Brown.

    David Brown's machines are always faster than anyone else's.

    That seems /highly/ unlikely. Admittedly the machine I tested on is
    fairly new - less than a year old. But it's a little NUC-style
    machine at around the $1000 price range, with a laptop processor.
    The only thing exciting about it is 64 GB ram (I like to run a lot of
    things at the same time in different workspaces).


    Modern laptop processors with adequate cooling can be as fast as
    desktop parts (and faster than server parts) for a task that uses only
    1 or 2 cores, especially when no heavy vector math is involved. If the
    task runs only for a few seconds, as in our tests, then the CPU can be
    fast even without good cooling.

    Sure, it is not a slow processor - but it is nothing extreme. Bart has regularly accused me of using top-range super fast computers when I've
    given speed tests that are faster than he gets, but generally it's just
    more efficient use of the computer.

    And $1000 is not exactly a low price for a mini-PC without a display.
    Last time I bought one for my mother, it cost ~$650 including Win11
    Home Ed.

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD) don't
    come free. (And I buy these bare-bones. Machines with Windows
    "pre-installed" are often cheaper because they are sponsored by the
    junk-ware and ad-ware forced on unsuspecting users.)

    But it is not a dramatic speed-demon PC. I get faster results by using
    Linux instead of Windows (or worse, WSL on Windows), and by using tmpfs
    rather than a filesystem on a disk.


    But I am better than some people at getting my machines to run
    programs efficiently. I don't use Windows for such things (I happily
    run Windows on a different machine for other purposes), and I
    certainly don't use layers of OS or filesystem emulation such as WSL
    and expect code to run at maximal speed.


    WSL would not affect the user-level CPU-bound part, or even the
    majority of the kernel-level CPU-bound parts. It can slow down I/O,
    yes. But it turned out (see my post above) that the bottleneck was the
    CPU.

    OK.

    I've seen odd things with timings due to Windows' relatively poor IO,
    file and disk handling. Many years ago when I had need of speed-testing
    some large windows-based build system, I found it was faster running in
    a virtual windows machine on VirtualBox on a Linux host, than in native
    Windows on the same hardware.

    But usually the OS will not much affect code that is cpu or memory bound.


    And as I said in an earlier post, I didn't have the files on any kind
    of disk or SSD at all - they were all in a tmpfs filesystem to
    eliminate that bottleneck.


    You should have said it yesterday.


    I did. I mentioned it in my post comparing the timings of xxd, your
    program, and some extremely simple Python code giving the same outputs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed May 29 12:23:51 2024
    On 29/05/2024 10:38, Michael S wrote:
    On Wed, 29 May 2024 00:54:23 +0100
    bart <bc@freeuk.com> wrote:


    I suspect that your system just has a much faster fgetc
    implementation. How long does an fgetc() loop over a 100MB input take
    on your machine?

    On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
    Using DMC, it's 0.65 seconds.


    Your suspicion proved incorrect, but it turned out to be a pretty good question!



    $ time ../quick_xxd/getc_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m3.604s
    user 0m0.000s
    sys 0m0.000s

    52 MB/s. Very very slow!

    I got these results for a 100MB input. All are optimised where possible:

    mcc 1.9 seconds
    gcc 1.9
    tcc 1.95
    lccwin32 0.7
    DMC 0.7

    The first three likely just use fgetc from msvcrt.dll. The other two
    probably use their own libraries.
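
    (The getc_test program itself isn't shown in the thread; a minimal
    sketch of an fgetc()-based reader that would produce the quoted
    'byte count / xor sum' output might look like the following - the
    name and exact output format are assumptions.)

    // getc_test (sketch): read the whole file one byte at a time
    #include <stdio.h>
    int main(int argc, char** argv)
    {
      if (argc < 2) return 1;
      FILE* fp = fopen(argv[1], "rb");
      if (!fp) { perror(argv[1]); return 1; }
      unsigned long n = 0;   // byte count
      unsigned xorsum = 0;   // running xor of all bytes
      int c;
      while ((c = fgetc(fp)) != EOF) {
        xorsum ^= (unsigned)c;
        ++n;
      }
      fclose(fp);
      printf("%lu byte. xor sum %u.\n", n, xorsum);
      return 0;
    }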

    So, maybe fgetc() is not at fault? Maybe it's the OS and the crap that
    the corporate IT adds on top of the OS?



    Let's test this hypothesis.

    $ time ../quick_xxd/fread_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m0.094s
    user 0m0.000s
    sys 0m0.000s

    I get these results:

    mcc 0.25 seconds
    gcc 0.25
    tcc 0.35
    lccwin32 0.35
    DMC 0.3

    All are repeated runs of the same file, so all timings likely used
    cached version of the data file.
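
    (Likewise, the fread_test program isn't shown; presumably it computes
    the same xor sum over large blocks instead, roughly as in this sketch -
    the 128 KB buffer size is an assumption.)

    // fread_test (sketch): same xor sum, reading in 128 KB blocks
    #include <stdio.h>
    int main(int argc, char** argv)
    {
      if (argc < 2) return 1;
      FILE* fp = fopen(argv[1], "rb");
      if (!fp) { perror(argv[1]); return 1; }
      static unsigned char buf[128*1024];
      unsigned long n = 0;
      unsigned xorsum = 0;
      size_t len;
      while ((len = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < len; ++i)
          xorsum ^= buf[i];
        n += (unsigned long)len;
      }
      fclose(fp);
      printf("%lu byte. xor sum %u.\n", n, xorsum);
      return 0;
    }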

    Most of my tests assume cached data, since (1) I don't know how to do a
    'cold' load without restarting my machines; (2) in real applications
    such as compilers the same files are repeatedly processed anyway, e.g.
    you're compiling the file you've just edited, or just downloaded, or
    just copied...

    So, let's rewrite our tiny app with fread().

    real 0m0.577s
    user 0m0.000s
    sys 0m0.000s

    152.8 MB/s. That's much better. Some people would even say that it is
    good enough.

    I now get:

    mcc 2.3 seconds
    gcc 1.6
    tcc 2.3
    lccwin32 2.9
    DMC 2.9

    You might remember that the last revised version of your test, compiled
    with gcc, took 3.6 seconds, of which 2 seconds was spent reading the
    file a byte at a time.

    By using a 128KB buffer, you get most of the benefits of reading the
    whole file at once (it just lacks the simplicity). So nearly all of that
    2 seconds is saved.

    3.6 - 2.0 is 1.6, pretty much the timing here.


    Two hours later it turned out to be completely incorrect. That is, the
    time was spent in routines related to I/O, but in the 'soft' part of it
    rather than in the I/O itself.

    You don't count time spent within file functions as I/O? To me 'I/O' is
    whatever happens on the other side of those f* functions, including
    whatever poor buffering strategies they may be using.

    Because 'fgetc' could also have been implemented using a 128KB buffer
    instead of 512 bytes or whatever it uses.

    I discovered the poor qualities of fgetc many years ago and generally
    avoid it; it seems you've only just realised its problems.

    BTW I also tweaked the code in my own-language version of the benchmark.
    (I also ported it to C, but that version got accidentally deleted). The
    fastest timing of this test is now 1.65 seconds.

    If I comment out the 'fwrite' call, the timing becomes 0.7 seconds, of
    which 50ms is reading in the file, leaving 0.65 seconds.

    So the I/O in this case accounts for 1.0 seconds of the 1.65 seconds
    runtime, so when I said:

    I think runtime is still primarily spent in I/O.

    That was actually correct.

    If I comment out the 'fwrite' calls in your program, the runtime reduces
    to 0.2 seconds, so it is even more correct in that case. Or is 'fwrite'
    a 'soft' I/O call too?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Wed May 29 15:27:54 2024
    On Wed, 29 May 2024 14:10:14 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    I did. I mentioned it in my post comparing the timings of xxd, your
    program, and some extremely simple Python code giving the same
    outputs.


    Then neither Bart nor I paid attention to that part of your post
    yesterday. Sorry.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Bonita Montero on Wed May 29 15:43:08 2024
    On Wed, 29 May 2024 14:38:14 +0200
    Bonita Montero <Bonita.Montero@gmail.com> wrote:


    but I multiple times
    struggled with ifstream and ofstream in terms of performance.


    You earned it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Wed May 29 15:23:25 2024
    On Wed, 29 May 2024 12:23:51 +0100
    bart <bc@freeuk.com> wrote:

    On 29/05/2024 10:38, Michael S wrote:
    On Wed, 29 May 2024 00:54:23 +0100
    bart <bc@freeuk.com> wrote:


    I suspect that your system just has a much faster fgetc
    implementation. How long does an fgetc() loop over a 100MB input
    take on your machine?

    On mine it's about 2 seconds on Windows, and 3.7 seconds on WSL.
    Using DMC, it's 0.65 seconds.


    Your suspicion proved incorrect, but it turned out to be pretty good question!



    $ time ../quick_xxd/getc_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m3.604s
    user 0m0.000s
    sys 0m0.000s

    52 MB/s. Very very slow!

    I got these results for a 100MB input. All are optimised where
    possible:

    mcc 1.9 seconds
    gcc 1.9
    tcc 1.95
    lccwin32 0.7
    DMC 0.7

    The first three likely just use fgetc from msvcrt.dll. The other two
    probably use their own libraries.

    So, maybe fgetc() is not at fault? Maybe it's the OS and the crap
    that the corporate IT adds on top of the OS?



    Let's test this hypothesis.

    $ time ../quick_xxd/fread_test.exe uu.txt
    193426754 byte. xor sum 1.

    real 0m0.094s
    user 0m0.000s
    sys 0m0.000s

    I get these results:

    mcc 0.25 seconds
    gcc 0.25
    tcc 0.35
    lccwin32 0.35
    DMC 0.3

    All are repeated runs of the same file, so all timings likely used
    cached version of the data file.

    Most of my tests assume that since (1) I don't know how to do a
    'cold' load without restarting my machines; (2) in real applications
    such as compilers the same files are repeatedly processed anyway, eg.
    you're compiling the file you've just edited, or just downloaded, or
    just copied...

    So, let's rewrite our tiny app with fread().

    real 0m0.577s
    user 0m0.000s
    sys 0m0.000s

    152.8 MB/s. That's much better. Some people would even say that it
    is good enough.

    I now get:

    mcc 2.3 seconds
    gcc 1.6
    tcc 2.3
    lccwin32 2.9
    DMC 2.9


    Mine was with MSVC from VS2019. gcc on msys2 (ucrt64 variant) should be identical.
    I wonder why your results are so much slower than mine.
    Slow write speed of SSD or slow CPU?

    You might remember that the last revised version of your test,
    compiled with gcc, took 3.6 seconds, of which 2 seconds was reading
    the file a byte at a time took 2 seconds.

    By using a 128KB buffer, you get most of the benefits of reading the
    whole file at once

    I hope so.

    (it just lacks the simplicity).

    The simplicity in your case comes from the complexity - figuring out
    the size of the file, allocating memory, and handling potential failure
    of the memory allocation - all being hidden within the run-time library
    of your language.
    And even despite all the upfront work that went into your
    infrastructure, it probably would not be able to deal with big files on
    a 32-bit system.
    Yes, I know, a 32-bit compiler would not be able to compile the
    resulting big include anyway. But still...

    So nearly all of that 2 seconds is saved.

    3.6 - 2.0 is 1.6, pretty much the timing here.


    Two hours later it turned out to be completely incorrect. That is,
    the
    time was spent in routine related to I/O, but in the 'soft' part of it
    rather than in the I/O itself.

    You don't count time spent within file-functions as I/O? To me 'I/O'
    is whatever happens the other side of those f* functions, including
    whatever poor buffering strategies they could be using.

    Because 'fgetc' could also have been implemented using a 128KB buffer
    instead of 512 bytes or whatever it uses.

    I discovered the poor qualities of fgetc many years ago and generally
    avoid it; it seems you've only just realised its problems.


    Octet-by-octet processing of big files is not how I earn the butter to
    put on my bread.
    When I write this sort of utility for real work, the size of the
    input is not arbitrary. Typically I deal with small files (small
    relative to the memory capacity of the target machines, so 100 MB is
    still considered small). So for real work, more often than not, I use
    the same strategy as you did in the program in your language.

    BTW I also tweaked the code in my own-language version of the
    benchmark. (I also ported it to C, but that version got accidentally deleted). The fastest timing of this test is now 1.65 seconds.

    If I comment out the 'fwrite' call, the timing becomes 0.7 seconds,
    of which 50ms is reading in the file, leaving 0.65 seconds.

    So the I/O in this case accounts for 1.0 seconds of the 1.65 seconds
    runtime, so when I said:

    I think runtime is still primarily spent in I/O.

    That was actually correct.


    That was incorrect for the previous variant of my program.
    Quite likely, it was correct for the program in your language that
    loaded the full file before processing.
    Quite likely it is correct for my latest variant.
    Note how I remember to avoid categorical statements ;-)

    If I comment out the 'fwrite' calls in your program, the runtime
    reduces to 0.2 seconds, so it is even more correct in that case. Or
    is 'fwrite' a 'soft' I/O call too?


    I think not (on my test gear), but I can no longer say for sure.
    However, I suspect that on Linux with plenty of memory the program could
    return before even a single byte of output data was sent to the SSD.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Wed May 29 15:19:14 2024
    On 29/05/2024 14:27, Michael S wrote:
    On Wed, 29 May 2024 14:10:14 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    I did. I mentioned it in my post comparing the timings of xxd, your
    program, and some extremely simple Python code giving the same
    outputs.


    Then both me and Bart didn't pay attention to this part of your
    yesterday's post. Sorry.


    No problem. It was just a brief comment, along with things like the gcc version used.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed May 29 15:16:06 2024
    On 29/05/2024 13:23, Michael S wrote:
    On Wed, 29 May 2024 12:23:51 +0100
    bart <bc@freeuk.com> wrote:

    So, let's rewrite our tiny app with fread().

    real 0m0.577s
    user 0m0.000s
    sys 0m0.000s

    152.8 MB/s. That's much better. Some people would even say that it
    is good enough.

    I now get:

    mcc 2.3 seconds
    gcc 1.6
    tcc 2.3
    lccwin32 2.9
    DMC 2.9


    Mine was with MSVC from VS2019. gcc on msys2 (ucrt64 variant) should be identical.
    I wonder why your results are so much slower than mine.
    Slow write speed of SSD or slow CPU?

    You'd need to isolate i/o from the data processing to determine that.

    However, the fastest timing on my machine is 1.4 seconds to read 100MB
    and write 360MB.

    Your timing is 0.6 seconds to read 88MB and write, what, 300MB of text?

    The difference is about 2:1, which is not that unusual given two
    different processors, two kinds of storage device, two kinds of OS (?)
    and two different compilers.

    But remember that a day or two ago, your original program took over 4
    seconds, and it now takes 1.6 seconds (some timings are 1.4 seconds, but
    I think that's the C port of my code).

    (BTW I guess that superimposing your own faster buffer is not considered cheating any more!)

    You might remember that the last revised version of your test,
    compiled with gcc, took 3.6 seconds, of which 2 seconds was reading
    the file a byte at a time took 2 seconds.

    By using a 128KB buffer, you get most of the benefits of reading the
    whole file at once

    I hope so.

    (it just lacks the simplicity).

    The simplicity in your case is due to complexity of figuring out the
    size of the file and of memory allocation and of handling potential
    failure of memory allocation all hidden within the run-time library of your language.

    Yes, that moves those details out of the way to keep the main body of
    the code clean.

    Your C code looks chaotic (sorry), and I had quite a few problems in understanding and trying to modify or refactor parts of it.

    Below is the main body of my C code. Below that is the main body of your
    latest program, not including the special handling for the last line,
    that mine doesn't need.

    ------------------------------------------
    while (n) {
    m = n;
    if (m > perline) m = perline;
    n -= m;
    p = str;

    for (int i = 0; i < m; ++i) {
    bb = *data++;
    s = numtable[bb];
    slen = numlengths[bb];

    *p++ = *s;
    if (slen > 1)
    *p++ = *(s+1);
    if (slen > 2)
    *p++ = *(s+2);

    *p++ = ',';
    }
    *p++ = '\n';

    fwrite(str, 1, p-str, f);
    }

    ------------------------------------------

    for (;;) {
    enum { BUF_SZ = 128*1024 };
    unsigned char inpbuf[BUF_SZ];
    size_t len = fread(inpbuf, 1, BUF_SZ, fpin);
    for (int i = 0; i < (int)len; ++i) {
    unsigned char* dec = bin2dec[inpbuf[i] & 255];
    memcpy(outptr, dec, MAX_CHAR_PER_NUM);
    outptr += dec[MAX_CHAR_PER_NUM];
    if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
    *outptr++ = '\n';
    ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
    if (wrlen != outptr-outbuf) {
    err = 2;
    break;
    }
    outptr = outbuf;
    }
    }
    if (err || len != BUF_SZ)
    break;
    }
    ------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to bart on Wed May 29 18:32:18 2024
    On Wed, 29 May 2024 15:16:06 +0100
    bart <bc@freeuk.com> wrote:

    On 29/05/2024 13:23, Michael S wrote:
    On Wed, 29 May 2024 12:23:51 +0100
    bart <bc@freeuk.com> wrote:

    So, let's rewrite our tiny app with fread().

    real 0m0.577s
    user 0m0.000s
    sys 0m0.000s

    152.8 MB/s. That's much better. Some people would even say that it
    is good enough.

    I now get:

    mcc 2.3 seconds
    gcc 1.6
    tcc 2.3
    lccwin32 2.9
    DMC 2.9


    Mine was with MSVC from VS2019. gcc on msys2 (ucrt64 variant)
    should be identical.
    I wonder why your results are so much slower than mine.
    Slow write speed of SSD or slow CPU?

    You'd need to isolate i/o from the data processing to determine that.

    However, the fastest timing on my machine is 1.4 seconds to read
    100MB and write 360MB.

    Your timing is 0.6 seconds to read 88MB and write, what, 300MB of
    text?


    Much less. Only 193 MB. It seems this DLL I was textualizing is stuffed
    with small numbers. That explains a big part of the difference.

    I did another test with big 7z archive as an input:
    Input size: 116255887
    Output size: 425944020
    $ time ../quick_xxd/bin_to_listmb /d/bin/tmp.7z uu.txt

    real 0m1.170s
    user 0m0.000s
    sys 0m0.000s

    Almost exactly 100 MB/s which is only 1.4-1.6 times faster than your measurements.

    The difference is about 2:1, which is not that unusual given two
    different processors, two kinds of storage device, two kinds of OS
    (?) and two different compilers.

    But remember that a day or two ago, your original program took over 4 seconds, and it now takes 1.6 seconds (some timings are 1.4 seconds,
    but I think that's the C port of my code).

    (BTW I guess that superimposing your own faster buffer is not
    considered cheating any more!)


    No, a moderately sized buffer, with a size that does not depend on the
    size of the input or output, is not cheating. It's just additional work
    that I wanted to avoid but did not succeed in avoiding.

    I likely could have got the majority of the benefit with setvbuf() +
    replacement of fgetc() by getc(). But by now it does not make sense to
    go back to the original variants.
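
    (For reference, the setvbuf() route mentioned here would have been a
    small change to the simple byte-at-a-time version - a sketch, with the
    128 KB buffer size as an assumption:)

    // byte-at-a-time converter with a bigger stdio buffer (sketch)
    #include <stdio.h>
    int main(int argc, char** argv)
    {
      if (argc < 2) return 1;
      FILE* fp = fopen(argv[1], "rb");
      if (!fp) { perror(argv[1]); return 1; }
      // must be called after fopen() and before the first read
      setvbuf(fp, NULL, _IOFBF, 128*1024);
      int c;
      while ((c = getc(fp)) != EOF)  // getc() may be a macro, unlike fgetc()
        printf("%d,\n", c);
      fclose(fp);
      return 0;
    }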

    You might remember that the last revised version of your test,
    compiled with gcc, took 3.6 seconds, of which 2 seconds was reading
    the file a byte at a time took 2 seconds.

    By using a 128KB buffer, you get most of the benefits of reading
    the whole file at once

    I hope so.

    (it just lacks the simplicity).

    The simplicity in your case is due to complexity of figuring out the
    size of the file and of memory allocation and of handling potential
    failure of memory allocation all hidden within run-time library of
    your language.

    Yes, that moves those details out of the way to keep the main body of
    the code clean.

    Your C code looks chaotic (sorry), and I had quite a few problems in understanding and trying to modify or refactor parts of it.

    Below is the main body of my C code. Below that is the main body of
    your latest program, not including the special handling for the last
    line, that mine doesn't need.

    ------------------------------------------
    while (n) {
    m = n;
    if (m > perline) m = perline;
    n -= m;
    p = str;

    for (int i = 0; i < m; ++i) {
    bb = *data++;
    s = numtable[bb];
    slen = numlengths[bb];

    *p++ = *s;
    if (slen > 1)
    *p++ = *(s+1);
    if (slen > 2)
    *p++ = *(s+2);

    *p++ = ',';
    }
    *p++ = '\n';

    fwrite(str, 1, p-str, f);
    }

    ------------------------------------------

    for (;;) {
    enum { BUF_SZ = 128*1024 };
    unsigned char inpbuf[BUF_SZ];
    size_t len = fread(inpbuf, 1, BUF_SZ, fpin);
    for (int i = 0; i < (int)len; ++i) {
    unsigned char* dec = bin2dec[inpbuf[i] & 255];
    memcpy(outptr, dec, MAX_CHAR_PER_NUM);
    outptr += dec[MAX_CHAR_PER_NUM];
    if (outptr > &outbuf[ALMOST_FULL_THR]) { // spill output buffer
    *outptr++ = '\n';
    ptrdiff_t wrlen = fwrite(outbuf, 1, outptr-outbuf, fpout);
    if (wrlen != outptr-outbuf) {
    err = 2;
    break;
    }
    outptr = outbuf;
    }
    }
    if (err || len != BUF_SZ)
    break;
    }
    ------------------------------------------




    Each to his own.
    For me your code is unreadable, mostly due to the very short variable
    names that give no hint of usage, the absence of declarations (I'd
    guess you have them at the top of the function; for me that's no better
    than not having them at all), and zero comments.

    Besides, our snippets are not functionally identical. Yours doesn't
    handle write failures. Practically, on a "big" computer that's a
    reasonable choice, because real I/O problems are unlikely to be detected
    at fwrite. They tend to manifest themselves much later. But on
    comp.lang.c we like to pretend that life is simpler and more
    black&white than it is in reality.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Malcolm McLean on Wed May 29 19:27:51 2024
    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free.  (And I buy these bare-bones.  Machines with Windows
    "pre-installed" are often cheaper because they are sponsored by the
    junk-ware and ad-ware forced on unsuspecting users.)

    Yes, I got a job at Cambridge which didn't work out (Cantab dons, much
    less tolerant people than their counterparts at another university, but that's another story). And I was given a brand new Windows machine, and
    told that we had to use Linux. So I installed a Linux version which ran
    on top of Windows. No good, I was told. Might cause problems with that "interesting" set up. So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.


    While I prefer to get machines without an OS, I can't see any issue with
    wiping existing Windows - I've done that countless times. When you buy
    a machine with Windows "pre-installed", no one has paid more than a
    handful of dollars/pounds/kroner for it. Indeed, the crap-ware vendors
    might have sponsored it, and I don't mind if their money was wasted.

    Even for a Windows machine, I often prefer to install Windows myself
    from scratch. The exception is for laptops where it is often a pain to
    get all the drivers working.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed May 29 18:41:25 2024
    On 29/05/2024 16:32, Michael S wrote:
    On Wed, 29 May 2024 15:16:06 +0100
    bart <bc@freeuk.com> wrote:

    Your timing is 0.6 seconds to read 88MB and write, what, 300MB of
    text?


    Much less. Only 193 MB. It seems, this DLL I was textualizing is stuffed
    with small numbers. That explains big part of the difference.

    I did another test with big 7z archive as an input:
    Input size: 116255887
    Output size: 425944020
    $ time ../quick_xxd/bin_to_listmb /d/bin/tmp.7z uu.txt

    real 0m1.170s
    user 0m0.000s
    sys 0m0.000s

    Almost exactly 100 MB/s which is only 1.4-1.6 times faster than your measurements.

    Actually, the fastest timing I've got was 1.25 seconds (100MB input,
    360MB output), but that was from my C version compiled with DMC (a
    32-bit compiler). gcc was a bit slower.

    Each to his own.
    For me your code is unreadable, mostly due to very short names of
    variables that give no hint of usage, absence of declarations (I'd
    guess, you have them at the top of the function, for me it's no better
    than not having them at all) and zero comments.

    Besides, our snippets are not functionally identical. Yours don't handle write failures. Practically, on "big" computer it's a reasonable choice, because real I/O problems are unlikely to be detected at fwrite. They
    tend to manifest themselves much later. But on comp.lang.c we like to
    pretend that life is simpler and more black&white than it is in reality.

    Below is a version with no declarations at all. It is in a dynamic
    scripting language.

    It runs in 7.3 seconds (or 6.4 seconds if newlines are dispensed with).

    It reads the input as a byte array, and assembles the output as a
    single string. The 'readbinarray' function could conceivably be
    replaced by one based around 'mmap'.

    ------------------------------------
    numtable ::= (0:)
    for i in 0..255 do
    numtable &:= tostr(i)+","
    od

    s::=""
    k:=0
    for bb in readbinfile("data100") do
    s +:= numtable[bb]

    if ++k = 21 then
    s +:= '\n'
    k := 0
    fi
    od

    writestrfile("data.txt", s)
    ------------------------------------

    An advantage of higher-level code is being able to trivially do stuff
    like this (output data in reverse order):

    for bb in reverse(readbinfile("data100")) do

    While functions like 'writestrfile' could also check that the resulting
    file size is the same length as the string being written.

    Still looks complicated? Here's a one-line version:

    writestrfile("data.txt", tostr(readbinfile("data100")))

    However, the output looks like this:

    (38, 111, ... 197)

    This is acceptable syntax for my languages, but C would require braces:

    writestrfile("data.txt", "{" +
    tostr(readbinfile("/c/data100"))[2..$-1] + "}")

    It is convenient to deal with data in whole-file blobs. There are huge
    memory capacities available now; why not take advantage?
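
    (In C the whole-file-blob approach is also only a handful of lines,
    though the size query, the allocation and the failure handling are
    explicit - a minimal sketch, ignoring files that don't fit in memory
    and the portability limits of ftell() on huge files:)

    // read an entire file into one malloc'd buffer (sketch)
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned char* read_whole_file(const char* name, long* size)
    {
      FILE* fp = fopen(name, "rb");
      if (!fp) return NULL;
      fseek(fp, 0, SEEK_END);
      *size = ftell(fp);               // fine below 2 GB on most systems
      fseek(fp, 0, SEEK_SET);
      unsigned char* buf = (*size > 0) ? malloc((size_t)*size) : NULL;
      if (buf && fread(buf, 1, (size_t)*size, fp) != (size_t)*size) {
        free(buf);                     // short read: treat as failure
        buf = NULL;
      }
      fclose(fp);
      return buf;
    }

    The converter then becomes a single loop over the returned buffer,
    with one fwrite at the end.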

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Malcolm McLean on Wed May 29 14:07:00 2024
    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free.  (And I buy these bare-bones.  Machines with Windows
    "pre-installed" are often cheaper because they are sponsored by the
    junk-ware and ad-ware forced on unsuspecting users.)

    Yes, I got a job at Cambridge which didn't work out (Cantab dons, much
    less tolerant people than their counterparts at another university, but that's another story). And I was given a brand new Windows machine, and
    told that we had to use Linux. So I installed a Linux version which ran
    on top of Windows. No good, I was told. Might cause problems with that "interesting" set up. ...

    They're quite right in that regard, as I can testify from personal
    experience.

    ... So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.

    Keep in mind that, as David pointed out, the "waste" was probably
    negative. You got a better price on the machine than you would have
    otherwise, and erasing that malware gave you more space to put useful
    stuff on your machine.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to James Kuyper on Wed May 29 22:59:05 2024
    On Wed, 29 May 2024 14:07:00 -0400
    James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free. (And I buy these bare-bones. Machines with
    Windows "pre-installed" are often cheaper because they are
    sponsored by the junk-ware and ad-ware forced on unsuspecting
    users.)
    Yes, I got a job at Cambridge which didn't work out (Cantab dons,
    much less tolerant people than their counterparts at another
    university, but that's another story). And I was given a brand new
    Windows machine, and told that we had to use Linux. So I installed
    a Linux version which ran on top of Windows. No good, I was told.
    Might cause problems with that "interesting" set up. ...

    They're quite right in that regard, as I can testify from personal experience.

    ... So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.

    Keep in mind that, as David pointed out, the "waste" was probably
    negative. You got a better price on the machine than you would have otherwise, and erasing that malware gave you more space to put useful
    stuff on your machine.

    Maybe for laptops that is true. But for mini-PCs it is very different.
    Windows is surprisingly expensive in this case. An OEM license is sold
    for ~75% of the retail license price.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to bart on Wed May 29 21:31:54 2024
    On 29/05/2024 18:41, bart wrote:

    Below is a version with no declarations at all. It is in a dynamic
    scripting language.

    It runs in 7.3 seconds (or 6.4 seconds if newlines are dispensed with).

    That is slower than native code solutions, but it is still faster than xxd!

    My scripting language doesn't normally run under Linux (to make the
    comparison with xxd fairer), but I managed to make it work for this test:

    C:\qx52>mc -c -linux qc
    M6 Compiling qc.m to qc.c

    C:\qx52>wsl
    root@DESKTOP-11:/mnt/c/qx52# gcc -O3 qc.c -o qc -lm -ldl -fno-builtin
    root@DESKTOP-11:/mnt/c/qx52# time ./qc -nosys fred

    real 0m10.562s
    user 0m9.198s
    sys 0m0.681s
    root@DESKTOP-11:/mnt/c/qx52#

    I start off under Windows, run a transpiler on a special version of it
    that is transpilable, then compile that C file under WSL. (The -nosys
    option leaves out standard libs that use WinAPI.)

    It takes 10.5 seconds; not as fast as on Windows, but there I'm using an accelerator which needs inline assembly, which I can't turn into C.

    Here is how xxd fared on a slightly faster second run:

    root@DESKTOP-11:/mnt/c/qx52# time xxd -i <data100 >xxd.txt
    real 0m33.066s

    Conclusion: beating xxd is apparently not hard if even a scripting
    language can do so. I wonder what slows it down? The output format is
    more bloated, but that can only be part of it.
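
    (For what it's worth, 'xxd -i' writes hex initialisers, twelve bytes
    per line, so a typical output line looks roughly like this:

      0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,

    i.e. about six characters per input byte versus roughly 3.5-4 for the
    decimal list, so the extra volume is well under a factor of two and
    indeed can only be part of the slowdown.)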

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to David Brown on Wed May 29 22:08:31 2024
    On 28/05/2024 16:34, David Brown wrote:
    On 28/05/2024 13:41, Michael S wrote:

    Let's start another round of private parts' measurements turnament!
    'xxd -i' vs DIY


    I used 100 MB of random data:

    dd if=/dev/urandom bs=1M count=100 of=100MB

    I compiled your code with "gcc-11 -O2 -march=native".

    I ran everything in a tmpfs filesystem, completely in ram.


    xxd took 5.4 seconds - that's the baseline.

    Your simple C code took 4.35 seconds.  Your second program took 0.9
    seconds - a big improvement.

    One line of Python code took 8 seconds :

    print(", ".join([hex(b) for b in open("100MB", "rb").read()]))

    That one took 90 seconds on my machine (CPython 3.11).

    A slightly nicer Python program took 14.3 seconds :

    import sys
    bs = open(sys.argv[1], "rb").read()
    xs = "".join([" 0x%02x," % b for b in bs])
    ln = len(xs)
    print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))

    This one was 104 seconds (128 seconds with PyPy).

    This can't be blamed on the slowness of my storage devices, or moans
    about Windows, because I know that amount of data (the output is 65%
    bigger because of using hex format) could be processed in a couple of
    seconds using a fast native code program.

    It's just Python being Python.

    (I have had reason to include a 0.5 MB file in a statically linked
    single binary - I'm not sure when you'd need very fast handling of multi-megabyte embeds.)

    I have played with generating custom executable formats (they can be
    portable between OSes, and I believe less visible to AV software), but
    they require a normal small executable to launch them and fix them up.

    To give the illusion of a conventional single executable, the program
    needs to be part of that stub file.

    There are a few ways of doing it, like simply concatenating the files,
    but extracting is slightly awkward. Embedding as data is one way.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Malcolm McLean on Thu May 30 01:18:01 2024
    On 29/05/2024 22:46, Malcolm McLean wrote:
    On 29/05/2024 20:59, Michael S wrote:
    On Wed, 29 May 2024 14:07:00 -0400
    James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free.  (And I buy these bare-bones.  Machines with
    Windows "pre-installed" are often cheaper because they are
    sponsored by the junk-ware and ad-ware forced on unsuspecting
    users.)
    Yes, I got a job at Cambridge which didn't work out (Cantab dons,
    much less tolerant people than their counterparts at another
    university, but that's another story). And I was given a brand new
    Windows machine, and told that we had to use Linux. So I installed
    a Linux version which ran on top of Windows. No good, I was told.
    Might cause problems with that "interesting" set up. ...

    They're quite right in that regard, as I can testify from personal
    experience.

    ... So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.

    Keep in mind that, as David pointed out, the "waste" was probably
    negative. You got a better price on the machine than you would have
    otherwise, and erasing that malware gave you more space to put useful
    stuff on your machine.

    Maybe for laptops that is true. But for mini-PCs it is very different.
    Windows is surprisingly expensive in this case. OEM license is sold for
    ~75% of retail license price.

    Exactly. Windows costs a fortune.

    Actually I've no idea how much it costs.

    But whatever it is, I'm not averse to the idea of having to pay for
    software. After all, you have to pay for hardware, and for computers I
    would happily pay extra to have something that works out of the box.

    And Microsoft spend billions
    developing it.

    Baby X can't compete.

    Huh? I didn't know Baby X was an OS!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Thu May 30 02:35:06 2024
    On Wed, 29 May 2024 01:24:56 +0300, Michael S wrote:

    I never tested it myself, but I heard that there is a significant
    difference in file access speed between WSL's own file system and
    mounted Windows directories.

    WSL (both 1 and 2) is just a band-aid. Give up and use native Linux
    already.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to bart on Thu May 30 00:06:40 2024
    On 5/29/2024 8:18 PM, bart wrote:
    On 29/05/2024 22:46, Malcolm McLean wrote:
    On 29/05/2024 20:59, Michael S wrote:
    On Wed, 29 May 2024 14:07:00 -0400
    James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free.  (And I buy these bare-bones.  Machines with
    Windows "pre-installed" are often cheaper because they are
    sponsored by the junk-ware and ad-ware forced on unsuspecting
    users.)
    Yes, I got a job at Cambridge which didn't work out (Cantab dons,
    much less tolerant people than their counterparts at another
    university, but that's another story). And I was given a brand new
    Windows machine, and told that we had to use Linux. So I installed
    a Linux version which ran on top of Windows. No good, I was told.
    Might cause problems with that "interesting" set up. ...

    They're quite right in that regard, as I can testify from personal
    experience.

    ... So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.

    Keep in mind that, as David pointed out, the "waste" was probably
    negative. You got a better price on the machine than you would have
    otherwise, and erasing that malware gave you more space to put useful
    stuff on your machine.

    Maybe for laptops that is true. But for mini-PCs it is very different.
    Windows is surprisingly expensive in this case. OEM license is sold for
    ~75% of retail license price.

    Exactly. Windows costs a fortune.

    Actually I've no idea how much it costs.

    But whatever it is, I'm not averse to the idea of having to pay for software. After all you have to pay for hardware, and for computers, I would happily pay extra to have something that works out of the box.

    And Microsoft spend billions developing it.

    Baby X can't compete.

    Huh? I didn't know Baby X was an OS!

    Windows costs anywhere from say $20 to $150.

    When the price gets down to $5, we get "suspicious".

    People on USENET who have bought the cheap licenses,
    for the most part are not reporting problems, so somehow,
    these keys are legit. And they're not "manufactured" keys,
    they seem to be normal keys. But with no "readout mechanism"
    for users to use, we don't really know where they came from.
    They could be Enterprise machine COA keys (Enterprise setups
    don't tend to use the key the machine came with), they could
    be MSDN subscription keys chopped up, and so on.
    Some sort of weird source. They are likely to be OEM and
    not transferable to another machine.

    Retail keys can be moved, such as if a machine dies, you
    can move the key to a replacement machine. But then you'll pay
    a high price for that.

    The two copies of Windows 8 I bought, were $40 each direct
    from Microsoft (introductory offer). I don't think Win10 and
    Win11 had an intro offer, nor the 3-pak of the family pack edition.

    If you want the machine to have a key, you can do it, but
    you don't need to pay full price for it.

    The $20 keys are not located at your local computer store,
    they come from sketchy online sellers. Part of the fun is
    not knowing what is going to happen.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Michael S on Thu May 30 00:40:07 2024
    On 5/28/2024 6:24 PM, Michael S wrote:
    On Tue, 28 May 2024 23:08:22 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and a lot slower than SSD of David Brown.


    My machine uses an SSD.

    SSDs are not created equal. Especially for writes.


    However the tests were run on Windows, so I ran your program again
    under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).



    3 times slower ?!
    I never tested it myself, but I heard that there is a significant
    difference in file access speed between WSL's own file system and
    mounted Windows directories. The difference under WSL is not as big
    as under WSL2 where they say that access of mounted Windows filesystem
    is very slow, but still significant.
    I don't know if it applies to all file sizes or only to accessing many
    small files.

    WSL uses containers, so of course it is slow. Even if this was sitting
    on an NVMe SSD, the software stack is going to exact a penalty. If
    WSL was talking to /mnt/c then that will have a different speed than /home/username within WSL.

    C:\Users\username\AppData\Local\Packages\
    CanonicalGroupLimited.Ubuntu20.04LTS_79rhkp1fndgsc\localstate\ext4.vhdx 6,631,194,624 bytes

    If you right-click the file, the "Mount" option in the context menu
    should not work, because... it is ext4. You'd need a Dokan or equivalent (FUSE-like) file system support in Windows, to access it that way.

    And you can mount the container and have it viewable within File Explorer, preferably with the "wsl --shutdown" first. If you want to look around in
    your slash, you should be able to. The WSL team figured out some way
    to do this anyway. This is something you can test when you're bored
    and not in the middle of something. Yes, I tried it. But I don't
    mess with stuff like this when I'm busy. Enter this in the File Explorer box.

    \\wsl$ # Access WSL from Windows

    To find that file, rather than just search for it, I used SequoiaView,
    and that happens to be the biggest file on my C: drive. I keep my
    virtual machines on a separate partition. It's easy to spot that
    container from 60,000 feet.

    On one OS, VirtualBox I/O is 150MB/sec, on another (presumably via paravirtualization), the rate is 600MB/sec and a bit "wobbly". And
    really, for anyone complaining about this, you did not really enjoy
    VMs in the old days. VMs were so slow, that on graphics, you could
    see (and count) individual pixels being updated in a horizontal
    line across the screen. It used to be pure molasses. Today, it's
    actually usable.

    Virtual machines can have passthru storage, and that is a form
    of direct access. The very first time I tried that (Connectix VirtualPC),
    I was unaware it had a 137GB limit and I connected a 200GB partition,
    and... the file system got corrupted and destroyed. Such was my
    introduction to passthru. I was an instant fan of the idea. I don't
    happen to know if any Hosting software offers that now or not.

    Containers support relatively large disks. In the testing I did
    recently, I got jammed up at around 500TB for a virtual disk. It did
    not go as far as Wikipedia claimed. Now, you don't actually "use" the
    space, it's purely used for testing that software does not explode
    when it sees weird devices. For example, if you put NTFS on that,
    it uses extra-large clusters and does not happen to mention it to you.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Thu May 30 10:01:42 2024
    On 29/05/2024 21:59, Michael S wrote:
    On Wed, 29 May 2024 14:07:00 -0400
    James Kuyper <jameskuyper@alumni.caltech.edu> wrote:

    On 29/05/2024 18:27, Malcolm McLean wrote:
    On 29/05/2024 13:10, David Brown wrote:

    It wasn't the cheapest available, and 64 GB memory (and 4 TB SSD)
    don't come free.  (And I buy these bare-bones.  Machines with
    Windows "pre-installed" are often cheaper because they are
    sponsored by the junk-ware and ad-ware forced on unsuspecting
    users.)
    Yes, I got a job at Cambridge which didn't work out (Cantab dons,
    much less tolerant people than their counterparts at another
    university, but that's another story). And I was given a brand new
    Windows machine, and told that we had to use Linux. So I installed
    a Linux version which ran on top of Windows. No good, I was told.
    Might cause problems with that "interesting" set up. ...

    They're quite right in that regard, as I can testify from personal
    experience.

    ... So I had to scrub a brand new version of Windows.
    It felt like the most extravagant waste.

    Keep in mind that, as David pointed out, the "waste" was probably
    negative. You got a better price on the machine than you would have
    otherwise, and erasing that malware gave you more space to put useful
    stuff on your machine.

    Maybe for laptops that is true. But for mini-PCs it is very different. Windows is surprisingly expensive in this case. OEM license is sold for
    ~75% of retail license price.



    The cheapest mini PCs from our main supplier all come with Windows
    (excluding a few "thin client" type systems - I am thinking of the Intel NUC
    class of systems). In fact, I think /all/ the pre-built ones have
    Windows. But if you buy bare-bones - no RAM or SSD - there is no OS.
    This seems to be extremely common, right from the manufacturers.

    And no, companies like Intel or ASUS don't pay anything close to 75% of
    the retail price for the Windows license they install. I took a quick
    check on our supplier's site - the cheapest ASUS Mini PC with Windows 11
    Pro is only 20% more than a stand-alone Windows 11 Pro license. And
    these ASUS machines don't actually come with much sponsored nonsense
    software, IME.


    I prefer to buy these machines bare-bones because they never have the
    memory or SSD sizes that I want, so I have to replace these anyway.

    I'd buy laptops bare-bones too (unless I needed Windows on it), but few
    places sell bare-bones laptops. Actually, it's probably 15-20 years
    since I last bought a laptop, so maybe my opinions there are not very
    relevant! (I typically use hand-me-downs from sales folk - when their
    machines get too slow from clogged up Windows, they buy new ones. I
    upgrade the memory, wipe the disk and install Linux, and the machine is
    better than it was when new.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Paul on Thu May 30 10:40:09 2024
    On Thu, 30 May 2024 00:40:07 -0400
    Paul <nospam@needed.invalid> wrote:

    On 5/28/2024 6:24 PM, Michael S wrote:
    On Tue, 28 May 2024 23:08:22 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and a lot slower than SSD of David Brown.



    My machine uses an SSD.

    SSDs are not created equal. Especially for writes.


    However the tests were run on Windows, so I ran your program again
    under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).



    3 times slower ?!
    I never tested it myself, but I heard that there is a significant difference in file access speed between WSL's own file system and
    mounted Windows directories. The difference under WSL is not as big
    as under WSL2 where they say that access of mounted Windows
    filesystem is very slow, but still significant.
    I don't know if it applies to all file sizes or only to accessing
    many small files.

    WSL uses containers, <snip>

    It seems you are discussing the speed and methods of access from the
    host side. My question is the opposite - does access from the Linux
    guest to Windows host files run at the same speed as access from the
    Linux (WSL, not WSL2) guest to its own file system?
    I heard that it doesn't, but what I heard was not conclusive and lacked
    details. I am going to test our specific case of big files. Now.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Thu May 30 11:33:30 2024
    On Thu, 30 May 2024 10:01:42 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    And no, companies like Intel or ASUS don't pay anything close to 75%
    of the retail price for the Windows license they install.

    I don't know how much Intel or ASUS pays. I don't care about it.
    What I do know and care about is that for me, as a buyer, an Intel or
    ASUS (I actually like Gigabyte Brix better, but recently they have
    become too expensive) mini-PC with Win11 Home will cost $140 more than
    exactly the same box without Windows.
    That's if I buy it in a big or medium store.
    In a little 1-2-man shop I can get a legal Windows license on a similar
    box for maybe $50. But I don't know if the shop will still be around 11
    months later if something breaks.
    Note that even in a little shop, a mini-PC with Windows on it will
    cost me more than the same box without an OS. I didn't try it, but I
    would guess that [in a little shop] a box with Linux preinstalled would
    cost me ~$25 above a box without an OS, i.e. still cheaper than with
    Windows.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Thu May 30 12:13:59 2024
    On 30/05/2024 09:33, Michael S wrote:
    On Thu, 30 May 2024 10:01:42 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    And no, companies like Intel or ASUS don't pay anything close to 75%
    of the retail price for the Windows license they install.

    I don't know how much Intel or ASUS pays. I don't care about it.
    What I do know and care about that for me, as a buyer, Intel or ASUS (I actually like Gigabyte Brix better, but recently they become too
    expensive) mini-PC with Win11 Home will cost $140 more than exactly the
    same box without Windows.
    That's if bought it in big or medium store.
    In little 1-2-men shop I can get legal Windows license on similar box
    for, may be, $50. But I don't know if it will be a round 11 months
    later if something breaks.
    Pay attention that even in little shop mini-PC with Windows on it will
    cost me more than the same box without OS. I didn't try it, but would
    guess that [in a little shop] box with Linux preinstalled would cost me
    ~$25 above box without OS, i.e. still cheaper than with Windows.


    40 years ago, my company made 8-bit business computers (my job was
    designing the boards that went into them).

    Adjusted for inflation, a floppy-based machine cost £4000, and one with
    a 10MB HDD cost £9400.

    They came with our own clone of CP/M, to avoid paying licence fees for it.

    Compared to that, the cost of hardware now, with a spec 4-6 orders of
    magnitude higher, is peanuts, even with a premium for a pre-installed OS.

    But suppose a high-spec machine now cost £1000; for someone using it
    daily in their job, who might be paid a salary of £50-£100K or more, it
    is again peanuts by comparison. Just their car to drive to work could
    cost 20 times as much.

    One tankful of fuel might cost the same as one Windows licence!

    I'm astonished that professionals here are quibbling over the minor
    extra margins needed to cover the cost of an important piece of software.

    I guess the demand for a machine+Windows is high enough to get lower
    volume pricing, while machine-only or machine+Linux is more niche?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Michael S on Thu May 30 14:41:54 2024
    On Wed, 29 May 2024 01:24:56 +0300
    Michael S <already5chosen@yahoo.com> wrote:

    On Tue, 28 May 2024 23:08:22 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and a lot slower than SSD of David Brown.



    My machine uses an SSD.

    SSDs are not created equal. Especially for writes.


    However the tests were run on Windows, so I ran your program again
    under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).



    3 times slower ?!
    I never tested it myself, but I heard that there is a significant
    difference in file access speed between WSL's own file system and
    mounted Windows directories. The difference under WSL is not as big
    as under WSL2 where they say that access of mounted Windows filesystem
    is very slow, but still significant.
    I don't know if it applies to all file sizes or only to accessing many
    small files.




    I tested it under WSL (not WSL2 !).
    Host: Windows Server 2019
    Guest: Debian bookworm.
    uname -r
    4.4.0-17763-Microsoft

    I see no slowness at all. In fact getc/fgetc are the same speed and 3
    times faster than on Windows with MSVC compiler on the same computer.
    fread test is ~30% faster.
    Full bin_to_list (latest variant) is 10-25% faster than Windows, but
    both are very fast and results are not very stable so precise comparison
    is hard.
    Access to mounted Windows files via /mnt/d/... is very fast - read
    speed is approximately the same as "guest's native" ext4-in-container;
    write speed is up to 20% faster than ext4.
    All that, of course, bulk read and write speed on huge files. It is
    possible that for small files the table is turned.

    For the record: On WSL 'xxd -i' took 13.6 seconds to process a 151 MB
    input file (7z archive) and produce 992 MB of output text (both input and
    output on /mnt/d). That's much faster than 39 seconds under msys2 on the
    same computer, but ~10 times slower than my latest variant.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to bart on Thu May 30 14:14:21 2024
    On 30/05/2024 13:13, bart wrote:
    On 30/05/2024 09:33, Michael S wrote:
    On Thu, 30 May 2024 10:01:42 +0200
    David Brown <david.brown@hesbynett.no> wrote:


    And no, companies like Intel or ASUS don't pay anything close to 75%
    of the retail price for the Windows license they install.

    I don't know how much Intel or ASUS pays. I don't care about it.
    What I do know and care about that for me, as a buyer, Intel or ASUS (I
    actually like Gigabyte Brix better, but recently they become too
    expensive) mini-PC with Win11 Home will cost $140 more than exactly the
    same box without Windows.

    That runs contrary to everything I have ever seen.

    Most retailers here (in Norway, and I believe most of Europe) don't
    offer a choice of systems without an OS, unless it is a custom build.
    Since the majority of purchasers want Windows, and big manufacturers pay peanuts for it, no-OS versions are an extra inventory cost.

    If you are getting a custom-built machine, or from a small manufacturer,
    then getting one with an OS will cost you more - but that is primarily
    for the service of installing and configuring it, or at least putting
    together the driver packages, not the license cost for Windows. And
    smaller houses will pay more per license than big ones.
    For big manufacturers, that cost is amortized over a very large number
    of systems and "installation" is done by massive disk duplication
    systems before the drives are installed.

    That's if bought it in big or medium store.
    In little 1-2-men shop I can get legal Windows license on similar box
    for, may be, $50. But I don't know if it will be a round 11 months
    later if something breaks.
    Pay attention that even in little shop mini-PC with Windows on it will
    cost me more than the same box without OS. I didn't try it, but would
    guess that [in a little shop] box with Linux preinstalled would cost me
    ~$25 above box without OS, i.e. still cheaper than with Windows.


    40 years ago, my company made 8-bit business computers (my job was
    designing the boards that went into them).

    Adjusted for inflation, a floppy-based machine cost £4000, and one with
    a 10MB HDD cost £9400.

    They came with our own clone of CP/M, to avoid paying licence fees for it.

    Compared to that, the cost of hardware now with a spec 4-6 orders of
    magnitude higher is peanuts, even with a premium for a pre-installed OS.

    But suppose a high-spec machine now cost £1000; for someone using it
    daily in their job, who might be paid a salary of £50-£100K or more, it
    is again peanuts by comparison. Just their car to drive to work could
    cost 20 times as much.

    One tankful of fuel might cost the same as one Windows licence!

    I'm astonished that professionals here are quibbling over the minor
    extra margins needed to cover the cost of an important piece of software.


    For my part, I am not complaining about it - I am just discussing it.
    However, I am against paying for a Windows license that I don't use as a
    matter of principle, the cost involved is irrelevant. (And I'm fine
    with paying for a Windows license that I /do/ use.)

    I guess the demand for a machine+Windows is high enough to get lower
    volume pricing, while machine-only or machine+Linux is more niche?


    Standard PC's and laptops are very low-margin products. Neither the manufacturer nor the shop makes a significant profit from selling you
    the machine itself. They make the profit from selling extras - a new
    cable for your monitor, a carry-case for the laptop, or an "extended
    warranty". And the main goal is to persuade you to buy software. The
    profit margin on a PC is a few percent at most, while the profit margin
    for a license of MS Office or Norton Security is perhaps 90%. No one
    wants to sell computers with Linux unless they are doing so as part of a service agreement to a company - there's no scope for profit for a shop
    selling a Linux machine with no extra software, and where you won't even
    come back and pay them to clear out malware.

    (This is also not a complaint or condemnation - you can't really expect
    people to make or sell things that have low expected profit returns.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Malcolm McLean on Thu May 30 12:23:38 2024
    On 30/05/2024 02:31, Malcolm McLean wrote:
    On 30/05/2024 01:18, bart wrote:
    On 29/05/2024 22:46, Malcolm McLean wrote:

    Baby X can't compete.

    Huh? I didn't know Baby X was an OS!


    It's an API. You call the Baby X API to get buttons and menus and other graphical elements, instead of Windows APIs. And it has just got its own
    file system.

    Hardly anybody uses the WinAPI directly.

    Everyone uses wrapper libraries, usually cross-platform. Ones like GTK,
    SDL, Raylib, maybe even OpenGL.

    Those are your competitors, although I'm not sure which one corresponds
    more closely with what BBX does (which I thought was also cross-platform?)

    I'm not sure what you mean by having its own file system. But I don't
    use the file system calls of WinAPI either; I use the C standard library.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to bart on Thu May 30 13:31:18 2024
    On 30/05/2024 02:18, bart wrote:
    On 29/05/2024 22:46, Malcolm McLean wrote:

    Exactly. Windows costs a fortune.

    Actually I've no idea how much it costs.

    The retail version is too much for a cheap machine, but a minor part of
    the cost of a more serious computer. The server versions and things
    like MSSQL server are ridiculous prices - for many setups, they cost
    more than the hardware, and that's before you consider the client access licenses.


    But whatever it is, I'm not averse to the idea of having to pay for software. After all you have to pay for hardware, and for computers, I
    would happily pay extra to have something that works out of the box.


    I have nothing against paying for software either. I mainly use Linux
    because it is better, not because it is free - that's just an added convenience. I have bought a number of Windows retail licenses over the decades, to use with machines I put together myself rather than OEM installations.

    I'm not so sure about "works out of the box", however. On most systems
    with so-called "pre-installed" Windows, it takes hours for the
    installation to complete, and you need to answer questions or click
    things along the way so you can't just leave it to itself. And if the manufacturer has taken sponsorship from ad-ware and crap-ware vendors,
    it takes more hours to install, and then you have hours of work to
    uninstall the junk.

    Installing Windows from a retail version DVD or USB stick is usually
    faster than getting "pre-installed" Windows up and running. The only
    problem is drivers, though Windows is a bit better than it used to be.
    As long as you have a fairly common Ethernet interface then there is a reasonable chance that Windows has a driver that will work, and then you can
    get drivers for the other bits and pieces. Laptops, OTOH, can be a real
    PITA for installation of retail Windows.

    IME installing Linux is faster and simpler than installing Windows on
    almost any hardware. The only drivers that have been an issue for me
    for decades are those for very new wireless interfaces.

    So - agreeing with your logic - I'd be willing to pay for Linux rather
    than using free Windows. (But I'm even happier that, for most of my
    use, I don't have to pay for Linux.)




    And Microsoft spend billions developing it.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to bart on Thu May 30 15:05:17 2024
    On 29/05/2024 23:08, bart wrote:
    On 28/05/2024 16:34, David Brown wrote:
    On 28/05/2024 13:41, Michael S wrote:

    Let's start another round of private parts' measurements turnament!
    'xxd -i' vs DIY


    I used 100 MB of random data:

    dd if=/dev/urandom bs=1M count=100 of=100MB

    I compiled your code with "gcc-11 -O2 -march=native".

    I ran everything in a tmpfs filesystem, completely in ram.


    xxd took 5.4 seconds - that's the baseline.

    Your simple C code took 4.35 seconds.  Your second program took 0.9
    seconds - a big improvement.

    One line of Python code took 8 seconds :

    print(", ".join([hex(b) for b in open("100MB", "rb").read()]))

    That one took 90 seconds on my machine (CPython 3.11).

    A slightly nicer Python program took 14.3 seconds :

    import sys
    bs = open(sys.argv[1], "rb").read()
    xs = "".join([" 0x%02x," % b for b in bs])
    ln = len(xs)
    print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))

    This one was 104 seconds (128 seconds with PyPy).

    This can't be blamed on the slowness of my storage devices, or moans
    about Windows, because I know that amount of data (the output is 65%
    bigger because of using hex format) could be processed in a couple of seconds using a fast native code program.

    It's just Python being Python.

    I have two systems at work with close to identical hardware, both about
    10 years old. The Windows one has a little bit faster disk, the Linux
    one has more memory, but the processor is the same. The Windows system
    is Win7 and as old as the machine, while the Linux system was installed
    about 6 years ago. Both machines have a number of other programs open
    (the Linux machine has vastly more), but none of these are particularly demanding when not in direct use.

    On the Linux machine, that program took 25 seconds (with python 3.7).
    On the Windows machine, it took 48 seconds (with python 3.8). In both
    cases, the source binary file was recently written and therefore should
    be in cache, and both the source and destination were on the disk (ssd
    for Windows, hd for Linux).

    Python throws all this kind of stuff over to the C code - it is pretty
    good at optimising such list comprehensions. (But they are obviously
    still slower than carefully written native C code.) If it were running
    through these loops with the interpreter, it would be orders of
    magnitude slower.

    So what I see from this is that my new Linux PC took 14 seconds while my
    old Linux PC took 25 seconds - it makes sense that the new processor is something like 80% faster than the old one for a single-threaded calculation. And Windows (noting that this is Windows 7, not a recent
    version of Windows) doubles that time for some reason.


    (I have had reason to include a 0.5 MB file in a statically linked
    single binary - I'm not sure when you'd need very fast handling of
    multi-megabyte embeds.)

    I have played with generating custom executable formats (they can be
    portable between OSes, and I believe less visible to AV software), but
    they require a normal small executable to launch them and fix them up.

    To give the illusion of a conventional single executable,  the program
    needs to be part of that stub file.

    There are a few ways of doing it, like simply concatenating the files,
    but extracting is slightly awkward. Embedding as data is one way.
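
    (As an aside, here is a minimal sketch of the concatenation approach:
    append the payload plus a small fixed-size trailer, and let the stub
    find it by seeking back from the end of its own file. The trailer
    layout and magic number are invented for illustration and are not
    taken from any of the tools discussed here.)

    // stub_payload.c - sketch of the "concatenate and find via a trailer"
    // approach. The trailer layout and PAYLOAD_MAGIC value are invented for
    // illustration; payloads larger than LONG_MAX are not handled.
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define PAYLOAD_MAGIC 0x4C59504Bu   /* arbitrary marker */

    struct trailer {
        uint64_t payload_size;          /* size of the appended data         */
        uint32_t magic;                 /* lets the stub detect "no payload" */
    };

    /* Look for a payload appended to the executable at 'self_path'. */
    static unsigned char *load_payload(const char *self_path, uint64_t *size_out)
    {
        FILE *f = fopen(self_path, "rb");
        if (!f)
            return NULL;

        struct trailer t;
        if (fseek(f, -(long)sizeof t, SEEK_END) != 0 ||
            fread(&t, sizeof t, 1, f) != 1 ||
            t.magic != PAYLOAD_MAGIC) {
            fclose(f);                  /* nothing appended (the normal case) */
            return NULL;
        }

        unsigned char *buf = malloc((size_t)t.payload_size);
        if (buf &&
            fseek(f, -(long)(sizeof t + t.payload_size), SEEK_END) == 0 &&
            fread(buf, 1, (size_t)t.payload_size, f) == (size_t)t.payload_size) {
            fclose(f);
            *size_out = t.payload_size;
            return buf;
        }
        free(buf);
        fclose(f);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        (void)argc;
        uint64_t size;
        /* Using argv[0] as the path to the executable is itself a simplification. */
        unsigned char *data = load_payload(argv[0], &size);
        if (data) {
            fprintf(stderr, "found %llu appended bytes\n", (unsigned long long)size);
            free(data);
        } else {
            fprintf(stderr, "no payload appended\n");
        }
        return 0;
    }

    Producing the combined file is then just writing stub, payload and
    trailer back to back; the awkward part is mostly keeping that trailer
    bookkeeping straight.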


    Sure.

    The typical use I have is for embedded systems where there is a network
    with a master card and a collection of slave devices (or perhaps
    multiple microcontrollers on the same board). A software update will
    typically involve updating the master board and having it pass on
    updates to the other devices. So the firmware for the other devices
    will be built into the executable for the master board.

    Another use-case is small web servers built into a program, often for
    installation, monitoring or fault-finding. There are fixed files such
    as index.html, perhaps a logo, and maybe jQuery or another JavaScript
    library file.
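
    For the web-server case, the embedded files typically end up behind a
    small lookup table along these lines - a sketch only, with tiny
    stand-in arrays where the real data would come from xxd -i, one of the
    DIY converters in this thread, or #embed:

    // embedded_fs.c - sketch of a lookup table for files baked into the binary.
    // The data arrays here are tiny stand-ins; in practice they would be
    // generated by xxd -i, a DIY converter, or filled with #embed.
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    static const unsigned char index_html[] = "<html><body>hello</body></html>";
    static const unsigned char logo_png[]   = { 0x89, 'P', 'N', 'G' };  /* stand-in bytes */

    struct embedded_file {
        const char          *path;
        const char          *mime_type;
        const unsigned char *data;
        size_t               len;
    };

    static const struct embedded_file embedded_files[] = {
        { "/index.html", "text/html", index_html, sizeof index_html - 1 },
        { "/logo.png",   "image/png", logo_png,   sizeof logo_png       },
    };

    /* Lookup used by the (not shown) HTTP request handler. */
    static const struct embedded_file *find_embedded(const char *path)
    {
        size_t n = sizeof embedded_files / sizeof embedded_files[0];
        for (size_t i = 0; i < n; ++i)
            if (strcmp(embedded_files[i].path, path) == 0)
                return &embedded_files[i];
        return NULL;
    }

    int main(void)
    {
        const struct embedded_file *f = find_embedded("/index.html");
        if (f)
            printf("%s: %zu bytes of %s\n", f->path, f->len, f->mime_type);
        return 0;
    }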

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Thu May 30 15:15:57 2024
    On Thu, 30 May 2024 13:31:18 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 30/05/2024 02:18, bart wrote:
    On 29/05/2024 22:46, Malcolm McLean wrote:

    Exactly. Windows costs a fortune.

    Actually I've no idea how much it costs.

    The retail version is too much for a cheap machine, but a minor part
    of the cost of a more serious computer. The server versions and
    things like MSSQL server are ridiculous prices - for many setups,
    they cost more than the hardware, and that's before you consider the
    client access licenses.


    It depends.
    If you need Windows server just to run your own applications or
    certain 3rd-party applications without being file server and without
    being a terminal server (i.e. at most 2 interactive users logged on simultaneously), then you can get away with Windows Server Essentials.
    It costs less than typical low end server hardware.
    MS-SQL also has many editions with very different pricing.
    I think nowadays even Oracle has editions that are not ridiculously
    expensive. Not sure about IBM DB2.


    But whatever it is, I'm not averse to the idea of having to pay
    for software. After all you have to pay for hardware, and for
    computers, I would happily pay extra to have something that works
    out of the box.

    I have nothing against paying for software either. I mainly use
    Linux because it is better, not because it is free - that's just an
    added convenience. I have bought a number of Windows retail licenses
    over the decades, to use with machines I put together myself rather
    than OEM installations.

    I'm not so sure about "works out of the box", however. On most
    systems with so-called "pre-installed" Windows, it takes hours for
    the installation to complete, and you need to answer questions or
    click things along the way so you can't just leave it to itself. And
    if the manufacturer has taken sponsorship from ad-ware and crap-ware
    vendors, it takes more hours to install, and then you have hours of
    work to uninstall the junk.


    I don't remember anything like that in the case of the cheap mini-PC
    from my previous post. It took a little longer than for the previous
    mini-PC with Win10 that it replaced, and longer than the desktop with
    Win7, but we are still talking about 10-15 minutes, not hours.
    Maybe a quick Internet connection helps (but I heard that in Norway it
    is quicker).
    Or maybe the people that sold me the box did some preliminary work.
    Or maybe your installation case was very unusual.

    On the other hand, I routinely see IT personnel at work spending several
    hours installing non-OEM Windows, esp. on laptops and servers. On
    desktops it tends to be less bad.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Thu May 30 16:09:00 2024
    On 30/05/2024 14:15, Michael S wrote:
    On Thu, 30 May 2024 13:31:18 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 30/05/2024 02:18, bart wrote:
    On 29/05/2024 22:46, Malcolm McLean wrote:

    Exactly. Windows costs a fortune.

    Actually I've no idea how much it costs.

    The retail version is too much for a cheap machine, but a minor part
    of the cost of a more serious computer. The server versions and
    things like MSSQL server are ridiculous prices - for many setups,
    they cost more than the hardware, and that's before you consider the
    client access licenses.


    It depends.
    If you need Windows server just to run your own applications or
    certain 3rd-party applications without being file server and without
    being a terminal server (i.e. at most 2 interactive users logged on simultaneously), then you can get away with Windows Server Essentials.
    It costs less than typical low end server hardware.

    Yes - Windows Server Essentials was a good choice, and a lot more price-efficient for small usage. You also don't need CALs, saving a lot
    of cost and a huge amount of effort and bureaucracy. That's why MS
    stopped selling it retail after server 2019 - it was too popular.

    The last Windows server I set up was for a third-party application that required Windows Server 2022 (not 2019) and MSSQL server. I have no
    idea why - the task could have been written to run on a Raspberry Pi
    with extra storage.

    To give Windows server its fair due, you get a nice 180-day evaluation
    period and installation was quite straightforward on a VM on a Proxmox
    mini PC. But towards the end of that trial period we will have to
    decide if we want to pay the full server licence cost, or buy a monster
    rack server from someone like Dell or HP that can sell the Essentials
    version. (Dell and HP make reasonable enough systems, but I'd rather
    use 5% of the processing capacity of a little mini PC than 2% capacity
    of a rack monster that sounds like a jet engine.)

    Ultimately, the cost of even the standard version of Windows server is a
    small part of the cost of this rather specialised third-party software,
    and the whole thing will (if it works like we hope) save our company a
    good deal more than it costs. So we'll pay the Windows server license.
    It is simply annoying that we have to pay a high price for the full
    server licence, when we are doing so little with it.


    MS-SQL also has many editions with very different pricing.

    Last I looked, they have the free version that covers a lot of basic
    usage (and I think that's what we are using at the moment), an expensive standard version with absurdly complicated CALs, and then /really/
    expensive versions beyond that.

    I think nowadays even Oracle has editions that are not ridiculously expensive. Not sure about IBM DB2.

    They have to, to stay relevant for new users. The main reason anyone
    ever chooses to buy MS SQL, DB2 or Oracle is because they have always
    bought those servers and are locked into them due to proprietary
    extensions, additional software (their own or third-party), training and familiarity, and support contracts. For new systems that don't have the
    legacy requirements, customers will wonder why they should buy one of
    these when something like PostgreSQL is free, has most of the features (including its own unique ones), and will happily scale to the huge
    majority of database needs. Sure, it does not have the management tools
    of the big commercial database servers, but you can get a lot of
    commercial support for the money you save on licensing.



    But whatever it is, I'm not averse to the idea of having to pay
    for software. After all you have to pay for hardware, and for
    computers, I would happily pay extra to have something that works
    out of the box.

    I have nothing against paying for software either. I mainly use
    Linux because it is better, not because it is free - that's just an
    added convenience. I have bought a number of Windows retail licenses
    over the decades, to use with machines I put together myself rather
    than OEM installations.

    I'm not so sure about "works out of the box", however. On most
    systems with so-called "pre-installed" Windows, it takes hours for
    the installation to complete, and you need to answer questions or
    click things along the way so you can't just leave it to itself. And
    if the manufacturer has taken sponsorship from ad-ware and crap-ware
    vendors, it takes more hours to install, and then you have hours of
    work to uninstall the junk.


    I don't remember anything like that in the case of the cheap mini-PC
    from my previous post. It took a little longer than for the previous
    mini-PC with Win10 that it replaced, and longer than the desktop with
    Win7, but we are still talking about 10-15 minutes, not hours.

    This can vary significantly from manufacturer to manufacturer. And
    perhaps it is not as bad as it used to be - most systems I have set up
    in recent years have been bare-bones.

    Maybe a quick Internet connection helps (but I heard that in Norway it
    is quicker).

    There's no issue there. It's the unpacking of overweight programs from
    one "hidden" part of the disk and installation in the main partition,
    along with the endless reboots, that takes time. And every so often the
    whole process stops to ask you a question.

    Or maybe the people that sold me the box did some preliminary work.

    That is certainly a service IT suppliers can offer.

    Or maybe your installation case was very unusual.

    Or maybe yours was unusual :-)

    Or maybe I have been less lucky in the manufacturers, or the type of
    machine. Or, as I suggested above, maybe it's more a thing of the past
    and not so relevant now.


    On the other hand, I routinely see IT personnel at work spending several
    hours installing non-OEM Windows, esp. on laptops and servers. On
    desktops it tends to be less bad.


    It's all a matter of the hardware, and what is supported out of the box
    and what needs external drivers. Windows is definitely improving in
    that area, but has a very long way to go to reach the convenience of
    Linux. (But if your favourite Linux distribution doesn't have a driver
    for the hardware in question, you generally have a lot more effort than
    you do when Windows is missing the driver. I've yet to meet the perfect OS.)

    But I think one of the most "entertaining" cases I had was installing
    Windows Server on a Dell server some years back. The Dell machine had
    only USB 3 slots, while Windows Server did not have native support for
    anything beyond USB 2. So you could get through the first part of the installation fine, as the installer uses the BIOS for keyboard, mouse
    and USB disk services. Then it switches to its own drivers and has no
    access to the keyboard, mouse, or USB drives. (Dell had a solution, of
    course, but it was a fair bit of extra fuss.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lew Pitcher@21:1/5 to Malcolm McLean on Thu May 30 16:00:07 2024
    On Thu, 30 May 2024 16:50:27 +0100, Malcolm McLean wrote:

    On 30/05/2024 12:31, David Brown wrote:
    On 30/05/2024 02:18, bart wrote:

    IME installing Linux is faster and simpler than installing Windows on
    almost any hardware.  The only drivers that have been an issue for me
    for decades are those for very new wireless interfaces.


    So I wanted to add audio to Baby X.

    IIRC, the X11 protocol does not support an audio stream, and networked
    audio is handled by other servers and protocols.

    How did you tie audio into Baby X?

    And I stole an MP3 decoder from
    Fabrice Bellard of tcc fame, and it took an afternoon to get audio up
    and running under Baby X on Windows. Then do the same for Linux. And it
    was a complete nightmare, and it still isn't fit to push.




    --
    Lew Pitcher
    "In Skills We Trust"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Michael S on Thu May 30 14:04:39 2024
    On 5/30/2024 3:40 AM, Michael S wrote:
    On Thu, 30 May 2024 00:40:07 -0400
    Paul <nospam@needed.invalid> wrote:

    On 5/28/2024 6:24 PM, Michael S wrote:
    On Tue, 28 May 2024 23:08:22 +0100
    bart <bc@freeuk.com> wrote:

    On 28/05/2024 21:23, Michael S wrote:
    On Tue, 28 May 2024 19:57:38 +0100
    bart <bc@freeuk.com> wrote:


    OK, I had go with your program. I used a random data file of
    exactly 100M bytes.

    Runtimes varied from 4.1 to 5 seconds depending on compiler. The
    fastest time was with gcc -O3.


    It sounds like your mass storage device is much slower than aging
    SSD on my test machine and ALOT slower than SSD of David Brown.



    My machine uses an SSD.

    SSDs are not created equal. Especially for writes.


    However the tests were run on Windows, so I ran your program again
    under WSL; now it took 14 seconds (using both gcc-O3 and gcc-O2).



    3 times slower ?!
    I never tested it myself, but I heard that there is a significant
    difference in file access speed between WSL's own file system and
    mounted Windows directories. The difference under WSL is not as big
    as under WSL2 where they say that access of mounted Windows
    filesystem is very slow, but still significant.
    I don't know if it applies to all file sizes or only to accessing
    many small files.

    WSL uses containers, <snip>

    It seems you are discussing the speed and methods of access from the
    host side. My question is the opposite - is access from the Linux
    guest to Windows host files running at the same speed as access from
    the Linux (WSL, not WSL2) guest to its own file system?
    I heard that it isn't, but what I heard was not conclusive and lacked
    details. I am going to test our specific case of big files. Now.

    WSL Ubuntu20.04 version 2

    This is a test of reaching Windows mounts from Linux, the C: drive in particular.
    C: is stored on a 3.5GB/sec NVMe in this case. At the very least, this
    path is a non-paravirtualized path, just an ordinary hypervisor passing
    sectors through.

    user@MACHINE:/mnt/c/Users/user/AppData/Local/Packages/
    CanonicalGroupLimited.Ubuntu20.04onWindows_79rhkp1fndgsc/LocalState$
    time dd if=ext4.vhdx of=/dev/null bs=1048576
    4769+0 records in
    4769+0 records out
    5000658944 bytes (5.0 GB, 4.7 GiB) copied, 20.2495 s, 247 MB/s

    real 0m20.268s
    user 0m0.000s
    sys 0m0.395s

    *******

    Whereas this is a test of the container ext4.vhdx and materials inside it.
    If this is paravirtualization, it's damn fast. The NVMe only does 3.5GB/sec (and of course that depends on the PCIe buffer size in the hub, where early PCIe hardwares only ran at 50% link rate due to "tiny buffers"). The test
    file is random numbers, in order that "real space" be taken up in the
    container ext4.vhdx .

    user@MACHINE:/home/user/Downloads$

    219 echo 3 | sudo tee /proc/sys/vm/drop_caches
    220 top
    221 history
    user@MACHINE:/home/user/Downloads$ time dd if=random.bin of=/dev/null bs=1048576
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.548747 s, 2.0 GB/s

    real 0m0.560s
    user 0m0.000s
    sys 0m0.301s
    user@MACHINE:/home/user/Downloads$

    Notice I dropped my caches (which on modern Linux is mostly a waste
    of time, as there still seem to be caches in there -- benching and
    eliminating caches is tough to do now). I verified in "top", that
    the command took, and some cached material dropped. But it made
    no difference at all to the test results.

    Conclusion: If you're working your 100MB file in ~/Downloads in WSL, things
    should be damn fast, as fast as the media can manage in the
    case of hard drives. Whereas accessing /mnt/c is not nearly as fast.

    Note: In the above, the container is being used two ways. In the first test, the
    "outside" of the container is being sampled via a /mnt/c mount. The second test
    is materials inside the container, coming through a different software stack
    in the hypervisor supporting this stuff. WSL is HyperV just as VirtualBox on
    Windows is now forced to be a HyperV client. So whatever properties are exposed
    above, some of them are traceable to HyperV, and might show up if you used
    HyperV for hosting a Linux guest instead.

    There are no block diagrams of the WSL era, the built-in Linux kernel or whatever,
    that I can find on learn.microsoft.com . Only an earlier diagram, before virtualization
    was used in earnest, is available. The "main OS" on your computer, is virtualized.
    It's not actually physical. That's because HyperV is an inverted hypervisor and not
    a conventional one. Can I prove that ? Of course not. Just the single diagram
    I've seen, hints at it. The Windows 11 Task Manager is "trash", to put it lightly,
    and is now an abomination. Worthless. It makes you wonder what the staff at MS use
    for visualizing what is going on. I use the Kill-O-Watt power meter connected to this
    PC, to tell me whether anything is going on under the hood, or whether it is truly idle.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to David Brown on Thu May 30 14:20:43 2024
    On 5/30/2024 9:05 AM, David Brown wrote:
    On 29/05/2024 23:08, bart wrote:
    On 28/05/2024 16:34, David Brown wrote:
    On 28/05/2024 13:41, Michael S wrote:

    Let's start another round of private parts' measurements turnament!
    'xxd -i' vs DIY


    I used 100 MB of random data:

    dd if=/dev/urandom bs=1M count=100 of=100MB

    I compiled your code with "gcc-11 -O2 -march=native".

    I ran everything in a tmpfs filesystem, completely in ram.


    xxd took 5.4 seconds - that's the baseline.

    Your simple C code took 4.35 seconds.  Your second program took 0.9 seconds - a big improvement.

    One line of Python code took 8 seconds :

    print(", ".join([hex(b) for b in open("100MB", "rb").read()]))

    That one took 90 seconds on my machine (CPython 3.11).

    A slightly nicer Python program took 14.3 seconds :

    import sys
    bs = open(sys.argv[1], "rb").read()
    xs = "".join([" 0x%02x," % b for b in bs])
    ln = len(xs)
    print("\n".join([xs[i : i + 72] for i in range(0, ln, 72)]))

    This one was 104 seconds (128 seconds with PyPy).

    This can't be blamed on the slowness of my storage devices, or moans about Windows, because I know that amount of data (the output is 65% bigger because of using hex format) could be processed in a couple of a seconds using a fast native code program.

    It's just Python being Python.

    I have two systems at work with close to identical hardware, both about 10 years old.  The Windows one has a little bit faster disk, the Linux one has more memory, but the processor is the same.  The Windows system is Win7 and as old as the machine,
    while the Linux system was installed about 6 years ago.  Both machines have a number of other programs open (the Linux machine has vastly more), but none of these are particularly demanding when not in direct use.

    On the Linux machine, that program took 25 seconds (with python 3.7). On the Windows machine, it took 48 seconds (with python 3.8).  In both cases, the source binary file was recently written and therefore should be in cache, and both the source and
    destination were on the disk (ssd for Windows, hd for Linux).

    Python throws all this kind of stuff over to the C code - it is pretty good at optimising such list comprehensions.  (But they are obviously still slower than carefully written native C code.)  If it were running through these loops with the
    interpreter, it would be orders of magnitude slower.

    So what I see from this is that my new Linux PC took 14 seconds while my old Linux PC took 25 seconds - it makes sense that the new processor is something like to 80% faster than the old one for a single-threaded calculation.  And Windows (noting that
    this is Windows 7, not a recent version of Windows) doubles that time for some reason.


    (I have had reason to include a 0.5 MB file in a statically linked single binary - I'm not sure when you'd need very fast handling of multi-megabyte embeds.)

    I have played with generating custom executable formats (they can be portable between OSes, and I believe less visible to AV software), but they require a normal small executable to launch them and fix them up.

    To give the illusion of a conventional single executable,  the program needs to be part of that stub file.

    There are a few ways of doing it, like simply concatenating the files, but extracting is slightly awkward. Embedding as data is one way.


    Sure.

    The typical use I have is for embedded systems where there is a network with a master card and a collection of slave devices (or perhaps multiple microcontrollers on the same board).  A software update will typically involve updating the master board
    and have that pass on updates to the other devices.  So the firmware for the other devices will be built into the executable for the master board.

    Another use-case is small web servers in program, often for installation, monitoring or fault-finding.  There are fixed files such as index.html, perhaps a logo, and maybe jquery or other javascript library file.

    Did you turn off Windows Defender while benching ?

    [Picture]

    https://i.postimg.cc/QCgLJLHQ/windows11-AV-off-control.gif

    Benching on Windows is an art, because of all the crap going
    on under the hood.

    I've had programs slowed to 1/8th normal speed to 1/20th normal
    speed, by forgetting to turn off a series of things. Once all
    that is done, now you're getting into the same ballpark as Linux.

    I also have to turn off the crap salad in Windows, when Windows Update
    is running!!! The OS is too stupid to optimize conditions for its
    own activity. My laptop, for example, ran out of RAM because "SearchApp"
    was eating a three-course meal while I was working. Attempting to kill
    that mother caused the incoming Update to install at closer to normal
    speed.

    It takes practice to get good at benching modern Windows. On
    an OS like Windows 2000, it was always ready to bench. It came
    with no AV. It had no secret agenda. It just worked. Each succeeding
    version is more of a nightmare.

    Imagine when the local AI is running on the machine, and the power
    consumption is 200W while it "listens to your voice". At least they're
    staying true to their design principles.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Paul on Thu May 30 19:47:09 2024
    Paul <nospam@needed.invalid> writes:
    On 5/30/2024 3:40 AM, Michael S wrote:
    On Thu, 30 May 2024 00:40:07 -0400

    Notice I dropped my caches (which on modern Linux is mostly a waste
    of time, as there still seem to be caches in there -- benching and eliminating caches is tough to do now).

    Use

    $ dd if=random.bin of=/dev/null bs=1M iflag=direct

    This will bypass all kernel and libc buffering. That
    said, O_DIRECT isn't quite as good as opening a traditional
    unix raw device (where the I/O bypasses kernel buffering
    completely) as O_DIRECT on linux may still do a copy to avoid having to
    lock down the user buffer during the I/O.
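
    For a C program rather than dd, the same idea looks roughly like the
    sketch below. This is illustrative only; O_DIRECT needs an aligned
    buffer, and the 4096-byte alignment used here is a common value, not a
    guarantee for every device or filesystem.

    // direct_read.c - sketch of reading a file with O_DIRECT on Linux.
    // O_DIRECT requires the buffer, offset and transfer size to be aligned;
    // 4096 is a common alignment but the real requirement depends on the device.
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: direct_read file\n"); return 1; }

        int fd = open(argv[1], O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        enum { ALIGN = 4096, BUFSIZE = 1 << 20 };   /* 1 MiB, multiple of ALIGN */
        void *buf;
        if (posix_memalign(&buf, ALIGN, BUFSIZE) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }

        long long total = 0;
        ssize_t n;
        while ((n = read(fd, buf, BUFSIZE)) > 0)    /* bypasses the page cache */
            total += n;
        if (n < 0)
            perror("read");

        fprintf(stderr, "read %lld bytes\n", total);
        free(buf);
        close(fd);
        return 0;
    }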

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Paul on Thu May 30 22:31:40 2024
    On Thu, 30 May 2024 14:04:39 -0400
    Paul <nospam@needed.invalid> wrote:


    WSL Ubuntu20.04 version 2


    Are you sure that you tested WSL, not WSL2?
    Your results look very much like WSL2.
    Your explanations sound very much as if you are talking about WSL2.

    My WSL testing results are the opposite of yours - read speed identical,
    write speed consistently faster when writing to /mnt/d/... than when
    writing to WSL's native FS.
    Part of the reason could be that SSD D: is physically faster than SSD
    C: that hosts WSL. I should have tested with /mnt/c as well, but
    forgot to do it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Paul on Fri May 31 09:55:49 2024
    (Your Usenet client is really messing up linebreaks. I don't know what
    you are using, but I haven't seen such problems since google groups
    postings.)

    On 30/05/2024 20:20, Paul wrote:
    On 5/30/2024 9:05 AM, David Brown wrote:
    On 29/05/2024 23:08, bart wrote:
    On 28/05/2024 16:34, David Brown wrote:
    On 28/05/2024 13:41, Michael S wrote:

    So what I see from this is that my new Linux PC took 14 seconds
    while my old Linux PC took 25 seconds - it makes sense that the new
    processor is something like to 80% faster than the old one for a
    single-threaded calculation. And Windows (noting that this is
    Windows 7, not a recent version of Windows) doubles that time for
    some reason.



    Did you turn off Windows Defender while benching ?

    [Picture]

    https://i.postimg.cc/QCgLJLHQ/windows11-AV-off-control.gif


    I don't have that kind of stuff turned on - so no need to turn it off.
    The trick to keeping Windows malware-free is not to run malware on it.

    Benching on Windows is an art, because of all the crap going
    on under the hood.

    Yes, it can be. But I've done it before :-) And there's less of that
    in Windows 7 than Windows 11. (And for balance, there are plenty of
    background processes on typical Linux desktops too.)

    None of my testing here was accurate benchmarking; it was just ballpark figures. There was lots of other stuff running on all the machines, but
    with very low average CPU usage on a 4-core CPU, it doesn't make a big difference.


    I've had programs slowed to 1/8th normal speed to 1/20th normal
    speed, by forgetting to turn off a series of things. Once all
    that is done, now you're getting into the same ballpark as Linux.

    Does that mean you are happy running normal programs at these speeds?
    If you have all this background stuff running when you are not
    benchmarking, but simply working on the computer, then surely it has a
    similar effect for your compiler, IDE, browser, and whatever else you
    are doing? I know some people run multiple anti-virus and other
    "security" programs that slow down some tasks on Windows, but not /that/
    much.


    I also have to turn off the crap salad in Windows, when Windows Update
    is running!!!

    Why would Windows Update be running? In particular, why would it be
    running when you are using the machine for other purposes?

    The OS is too stupid to optimize conditions for its
    own activity. My laptop for example, ran out of RAM, because "SearchApp"
    was eating a three-course meal while I was working. Attempting to kill
    that mother, caused the incoming Update to install at closer to normal
    speed.

    It takes practice to get good at benching modern Windows. On
    an OS like Windows 2000, it was always ready to bench. It came
    with no AV. It had no secret agenda. It just worked. Each succeeding
    version is more of a nightmare.

    Imagine when the local AI is running on the machine, and the power consumption is 200W while it "listens to your voice". At least they're staying true to their design principles.


    Imagine turning off (or never enabling) the services that you don't find
    useful and can be a significant drain. I always disable Windows
    updates, indexing services, and would never have a "voice AI" on a
    computer. Linux does not have anything like as much of this kind of
    nonsense on normal desktops (though I believe Ubuntu had some nasty
    automatic search systems for a while). The only one I can think of is "updatedb" for the "locate" command. While "locate" can sometimes be
    useful, trawling the filesystem can be very time-consuming if it is
    large. But it's easy to tune updatedb to cover only the bits you need.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Fri May 31 13:45:07 2024
    On Fri, 31 May 2024 09:55:49 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    Imagine turning off (or never enabling) the services that you don't
    find useful and can be a significant drain. I always disable Windows updates, indexing services, and would never have a "voice AI" on a
    computer. Linux does not have anything like as much of this kind of
    nonsense on normal desktops (though I believe Ubuntu had some nasty
    automatic search systems for a while). The only one I can think of
    is "updatedb" for the "locate" command. While "locate" can sometimes
    be useful, trawling the filesystem can be very time-consuming if it
    is large. But it's easy to tune updatedb to cover only the bits you
    need.


    Most of the things that you mentioned above are not easy to achieve on
    Home Editions of Windows beyond 7.
    Some of them are not easy to achieve even on Pro edition.
    That's a major reason for me to remain on 7 for as long as I can.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Fri May 31 13:33:15 2024
    On 31/05/2024 12:45, Michael S wrote:
    On Fri, 31 May 2024 09:55:49 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    Imagine turning off (or never enabling) the services that you don't
    find useful and can be a significant drain. I always disable Windows
    updates, indexing services, and would never have a "voice AI" on a
    computer. Linux does not have anything like as much of this kind of
    nonsense on normal desktops (though I believe Ubuntu had some nasty
    automatic search systems for a while). The only one I can think of
    is "updatedb" for the "locate" command. While "locate" can sometimes
    be useful, trawling the filesystem can be very time-consuming if it
    is large. But it's easy to tune updatedb to cover only the bits you
    need.


    Most of the things that you mentioned above are not easy to achieve on
    Home Editions of Windows beyond 7.

    I've never used any home edition of Windows, so I will take your word
    for it.

    Some of them are not easy to achieve even on Pro edition.
    That's a major reason for me to remain on 7 for as long as I can.


    I have many reasons for staying with Windows 7, but I'll add that one to
    the list. (There are some things in Windows 11 that I would like to
    have, but not enough to change over.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Michael S on Fri May 31 15:20:02 2024
    On 5/30/2024 3:31 PM, Michael S wrote:
    On Thu, 30 May 2024 14:04:39 -0400
    Paul <nospam@needed.invalid> wrote:


    WSL Ubuntu20.04 version 2


    Are you sure that you tested WSL, not WSL2?
    Your results look very much like WSL2.
    Your explanations sound very much as if you are talking about WSL2.

    My WSL testing results are the opposite of yours - read speed identical,
    write speed consistently faster when writing to /mnt/d/... than when
    writing to WSL's native FS.
    Part of the reason could be that SSD D: is physically faster than SSD
    C: that hosts WSL. I should have tested with /mnt/c as well, but
    forgot to do it.


    I can't test WSL, because it won't start. It throws an error.

    I used what I had.

    I am specifically trying to test on the
    box with the NVMe in it (to eliminate slower devices from the
    picture). I only own one NVMe and one slot to load it.

    *******

    As for your general problem, you can easily malloc a buffer
    for the entire file, and process the table as stored in RAM.
    That should help eliminate your variable file system overhead
    when benching.

    That's not scalable for general usage, but during the benchmarking
    and fast-prototyping stage you might test with it. That way, wherever
    you move the executable, the filesystem component is removed.
    Or, the filesystem component can be timestamped if you want.
    I just send timestamps to stderr so they won't interfere with stdout.
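
    A sketch of that pattern - slurp the whole file with malloc, time only
    the in-memory pass, and report to stderr. The "processing" here is a
    trivial byte sum standing in for the real conversion, not anyone's
    actual benchmark code:

    // bench_in_ram.c - sketch: read the whole input into RAM first, then time
    // only the in-memory processing, reporting to stderr so stdout stays clean.
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) { fprintf(stderr, "usage: bench_in_ram file\n"); return 1; }

        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror(argv[1]); return 1; }

        /* Slurp the entire file into one malloc'd buffer. */
        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        if (size < 0) { perror("ftell"); return 1; }
        fseek(f, 0, SEEK_SET);
        unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
        if (!buf || fread(buf, 1, (size_t)size, f) != (size_t)size) {
            fprintf(stderr, "read failed\n");
            return 1;
        }
        fclose(f);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        unsigned long long sum = 0;                 /* placeholder "conversion" */
        for (long i = 0; i < size; ++i)
            sum += buf[i];

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

        /* Timestamps/results go to stderr so they don't mix with stdout output. */
        fprintf(stderr, "%ld bytes processed in %.3f s (sum %llu)\n",
                size, secs, sum);
        free(buf);
        return 0;
    }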

    *******

    I just had a thought. If I use "df" in WSL2, the slash almost
    looks like it is on a TMPFS (Ram). That could be why I got 2GB/sec.
    Check in wsl environment, and using "df", check for evidence of
    how the file systems were set up there.

    $ df
    Filesystem 1K-blocks Used Available Use% Mounted on
    none 32904160 960 32903200 1% /run
    none 32904160 0 32904160 0% /run/lock
    none 32904160 0 32904160 0% /run/shm
    tmpfs 32904160 0 32904160 0% /sys/fs/cgroup
    ...
    C:\ 124493820 60595608 63898212 49% /mnt/c

    $ top

    top - 15:15:45 up 5 min, 1 user, load average: 0.00, 0.00, 0.00
    Tasks: 45 total, 1 running, 44 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    MiB Mem : 64265.9 total, 63170.6 free, 631.5 used, 463.8 buff/cache
    MiB Swap: 16384.0 total, 16384.0 free, 0.0 used. 63035.9 avail Mem

    Like a LiveDVD, the TMPFS is using up to a half of available RAM.
    It behaves the same way when you boot a LiveDVD.

    Your WSL instance, could have quite a different look to the mounts in "df".

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Paul on Sun Jun 2 04:16:37 2024
    On Thu, 30 May 2024 14:20:43 -0400, Paul wrote:

    Did you turn off Windows Defender while benching ?

    Isn’t that trusting that your benchmark isn’t virus-infected?

    Seems to defeat the point of real-world benchmarks, doesn’t it?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Sun Jun 2 04:19:03 2024
    On Fri, 31 May 2024 09:55:49 +0200, David Brown wrote:

    The only one I can think of is
    "updatedb" for the "locate" command. While "locate" can sometimes be
    useful, trawling the filesystem can be very time-consuming if it is
    large. But it's easy to tune updatedb to cover only the bits you need.

    On Linux, there is the concept of “ionice”, which is to I/O what “nice” is
    to CPU usage. So for example if updatedb had its ionice dropped to “idle” priority, that allows it to be pushed to the back of the queue when
    regular apps need to do any I/O. Result is much less system impact from
    such background update tasks.

    Maybe there’s an option to set this somewhere?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Sun Jun 2 13:40:35 2024
    On 02/06/2024 06:19, Lawrence D'Oliveiro wrote:
    On Fri, 31 May 2024 09:55:49 +0200, David Brown wrote:

    The only one I can think of is
    "updatedb" for the "locate" command. While "locate" can sometimes be
    useful, trawling the filesystem can be very time-consuming if it is
    large. But it's easy to tune updatedb to cover only the bits you need.

    On Linux, there is the concept of “ionice”, which is to I/O what “nice” is
    to CPU usage. So for example if updatedb had its ionice dropped to “idle” priority, that allows it to be pushed to the back of the queue when
    regular apps need to do any I/O. Result is much less system impact from
    such background update tasks.

    Maybe there’s an option to set this somewhere?

    Since the updatedb task is (IME at least) started from a cron job, it's
    not hard to use ionice with it if that helps. For many systems,
    updatedb is run in the middle of the night anyway, and is not a problem.

    The worst trouble I've had with updatedb was on a file server that was
    used to hold archives - it was a small machine with little ram (but
    several big spinning rust disks), holding a great many files that were
    rarely accessed. It was mysteriously slow, until I discovered that
    every night it started an updatedb job that took so long to crawl
    through everything that it didn't always finish before the next run.
    ionice would not have helped at all - disabling updatedb was the answer.
    (Pruning the trees scanned by updatedb could also have helped, but I
    didn't need "locate" anyway.)

    There are other situations where ionice can be helpful, however, and I
    have used it a few times (though I can't remember offhand when).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon Jun 3 03:12:06 2024
    On Thu, 30 May 2024 10:01:42 +0200, David Brown wrote:

    In fact, I think /all/ the pre-built ones have Windows.

    I have set up two MSI Cubi 5 machines for friends, both with Linux Mint. Neither came with Windows, either in the box or preinstalled.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Malcolm McLean on Mon Jun 3 03:10:42 2024
    On Wed, 29 May 2024 22:46:56 +0100, Malcolm McLean wrote:

    Windows costs a fortune. And Microsoft spend billions
    developing it.

    It may not be quite as profitable as it once was. That is why Microsoft
    has been cutting corners on Windows QA lately, and now even resorting to
    force ads on Windows users, in an attempt to shore up sagging revenues.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Paul on Mon Jun 3 03:15:27 2024
    On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:

    WSL uses containers, so of course it is slow.

    WSL1 had a Linux “personality” on top of the NT kernel. So this was emulation, not containers.

    WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.

    Linux has containers, which are based entirely on namespace isolation (and cgroups for process management). These are all standard kernel mechanisms,
    so there should be very little overhead in using them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 08:57:16 2024
    On 03/06/2024 05:15, Lawrence D'Oliveiro wrote:
    On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:

    WSL uses containers, so of course it is slow.

    WSL1 had a Linux “personality” on top of the NT kernel. So this was emulation, not containers.

    WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.

    Linux has containers, which are based entirely on namespace isolation (and cgroups for process management). These are all standard kernel mechanisms,
    so there should be very little overhead in using them.

    I can't answer for WSL, having not used it myself. But I have used
    Linux containers of various sorts since OpenVZ (and even chroot jails
    before that), and there's no doubt that the overhead is usually negligible.

    The whole deal with containers is that everything runs on the same
    kernel, but with different namespaces and file system root. If WSL were
    to work by containers, it would need to run the Linux processes as
    processes under the NT kernel. I suppose that might be possible, with a translation layer for all system API calls. After all, you can run
    Windows processes on Linux with Wine - perhaps a similar principle can
    work for Windows?
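
    (For what it's worth, the namespace mechanism that Linux containers
    build on is directly visible from C. Here is a minimal sketch - it
    only illustrates a UTS namespace, needs root or CAP_SYS_ADMIN to run,
    and says nothing about how WSL itself is implemented:)

    // uts_namespace.c - sketch: give this process its own UTS namespace, so
    // changing the hostname is invisible to the rest of the system.
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char host[64];

        /* Detach from the parent's UTS namespace (needs CAP_SYS_ADMIN / root). */
        if (unshare(CLONE_NEWUTS) != 0) {
            perror("unshare");
            return 1;
        }

        /* Change the hostname inside the new namespace only. */
        if (sethostname("sandbox", strlen("sandbox")) != 0) {
            perror("sethostname");
            return 1;
        }

        gethostname(host, sizeof host);
        printf("hostname inside namespace: %s\n", host);
        return 0;   /* the parent shell still sees the original hostname */
    }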

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Mon Jun 3 07:54:15 2024
    On Wed, 29 May 2024 14:38:14 +0200, Bonita Montero wrote:

    Am 29.05.2024 um 14:10 schrieb David Brown:

    I've seen odd things with timings due to Windows' relatively poor IO,
    file and disk handling.  Many years ago when I had need of
    speed-testing some large windows-based build system, I found it was
    faster running in a virtual windows machine on VirtualBox on a Linux
    host, than in native Windows on the same hardware.

    Windows kernel I/O is rather efficient ...

    But you can’t get to it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to bart on Mon Jun 3 07:49:00 2024
    On Wed, 29 May 2024 21:31:54 +0100, bart wrote:

    Conclusion: beating xxd is apparently not hard if even a scripting
    language can do so. I wonder what slows it down?

    It’s written in a very stdio-dependent vanilla C style.

    Have a look at the source for yourself. It’s part of the “vim” package on Debian and no doubt other distros. The xxd binary itself is built from a
    single source file of just some 1200 lines, and the hex-to-binary
    conversion is done in a function called “huntype” of just 133 lines.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon Jun 3 07:51:43 2024
    On Thu, 30 May 2024 14:14:21 +0200, David Brown wrote:

    Standard PC's and laptops are very low-margin products.

    I think the most expensive components in a typical Windows PC are the
    Intel/AMD CPU and the Windows OS. So Microsoft and Intel (and possibly AMD
    as well) are making money hand over fist, while the PC vendor itself has
    to endure a net margin of maybe 1-2%.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon Jun 3 07:59:46 2024
    On Mon, 3 Jun 2024 08:57:16 +0200, David Brown wrote:

    If WSL were to work by containers, it would need to run the Linux
    processes as processes under the NT kernel. I suppose that might be possible, with a translation layer for all system API calls. After all,
    you can run Windows processes on Linux with Wine - perhaps a similar principle can work for Windows?

    Microsoft tried to get Docker--a well-known Linux container technology-- working under Windows, but gave up.

    “Containers” are not actually a primitive facility that the Linux kernel offers: what it does offer are “namespaces” and “cgroups”. There are maybe
    a dozen different kinds of “containers” you can get that are built on top of these, such as LXC, LXD, systemd-nspawn, Docker and no doubt others I haven’t even heard of.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 11:02:48 2024
    On Mon, 3 Jun 2024 03:15:27 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:

    WSL uses containers, so of course it is slow.

    WSL1 had a Linux “personality” on top of the NT kernel. So this was emulation, not containers.

    WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.

    Linux has containers, which are based entirely on namespace isolation
    (and cgroups for process management). These are all standard kernel mechanisms, so there should be very little overhead in using them.

    The word "container" has many meanings.
    As far as the host FS is concerned, the guest FS is one huge file.
    Despite very different tech under the hood, this applies equally to
    both WSL and WSL2. Calling this file a 'container' sounds like proper
    use of the term.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 10:57:28 2024
    On Mon, 3 Jun 2024 03:12:06 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Thu, 30 May 2024 10:01:42 +0200, David Brown wrote:

    In fact, I think /all/ the pre-built ones have Windows.

    I have set up two MSI Cubi 5 machines for friends, both with Linux
    Mint. Neither came with Windows, either in the box or preinstalled.

    That is what I had seen too. Mini-PCs appear very different from
    laptops (and from AIO desktops?) in that regard. Windows does *not*
    appear subsidized on this type of computer.
    Maybe in Norway it's different; I heard that it depends on importers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Mon Jun 3 08:29:05 2024
    On Thu, 30 May 2024 16:09:00 +0200, David Brown wrote:

    For new systems that don't have the
    legacy requirements, customers will wonder why they should buy one of
    these when something like PostgreSQL is free, has most of the features (including its own unique ones), and will happily scale to the huge
    majority of database needs.

    Open-Source software offers its own unique capabilities. For example,
    Microsoft Office offers (at extra cost) that Access database, which has limitations that make SQLite seem powerful. (OK, so SQLite *is* pretty powerful.)

    LibreOffice Base has an equivalent DBMS backend. But it can also interface
    to databases in SQLite, MySQL/MariaDB and no doubt others I haven’t tried. Though you do have to pay the $0 extra monthly fee for the “Pro” version.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to bart on Mon Jun 3 08:31:17 2024
    On Thu, 30 May 2024 12:23:38 +0100, bart wrote:

    Hardly anybody uses the WinAPI directly.

    What happened to WinRT?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Mon Jun 3 13:01:45 2024
    On Mon, 3 Jun 2024 07:49:00 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Wed, 29 May 2024 21:31:54 +0100, bart wrote:

    Conclusion: beating xxd is apparently not hard if even a scripting
    language can do so. I wonder what slows it down?

    It’s written in a very stdio-dependent vanilla C style.


    So are all our [much much faster] mini-utils.

    Have a look at the source for yourself. It’s part of the “vim”
    package on Debian and no doubt other distros. The xxd binary itself
    is built from a single source file of just some 1200 lines, and the hex-to-binary conversion is done in a function called “huntype” of
    just 133 lines.

    The question was about binary-to-hex rather than hex-to-binary.

    BTW, it seems that 'xxd -i -r' does not work at all. Or, at least, I was
    unable to figure out the right combination of flags.
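
    A DIY fallback for that direction is small enough to sketch here. It
    is illustrative only - it just reads comma- or whitespace-separated
    decimal or 0x-hex byte values from stdin and writes the raw bytes to
    stdout - and is not a claim about what 'xxd -i -r' is supposed to do:

    // list_to_bin.c - sketch of a DIY reverse converter: turn a list of
    // decimal or 0x-hex byte values back into raw bytes on stdout.
    // Tokens with leading zeros are parsed as octal by strtol(,,0).
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char tok[64];
        size_t n = 0;

        for (;;) {
            int c = getchar();
            if ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
                (c >= 'A' && c <= 'F') || c == 'x' || c == 'X') {
                if (n < sizeof tok - 1)
                    tok[n++] = (char)c;      /* accumulate one number token */
                continue;
            }
            if (n > 0) {                     /* delimiter: flush the token */
                tok[n] = '\0';
                long v = strtol(tok, NULL, 0);  /* base 0 handles 0x.. and decimal */
                if (v < 0 || v > 255) {
                    fprintf(stderr, "value out of range: %s\n", tok);
                    return 1;
                }
                putchar((int)v);
                n = 0;
            }
            if (c == EOF)
                break;
        }
        return 0;
    }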

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Michael S on Mon Jun 3 14:41:43 2024
    On 6/3/2024 4:02 AM, Michael S wrote:
    On Mon, 3 Jun 2024 03:15:27 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Thu, 30 May 2024 00:40:07 -0400, Paul wrote:

    WSL uses containers, so of course it is slow.

    WSL1 had a Linux “personality” on top of the NT kernel. So this was
    emulation, not containers.

    WSL2 uses Hyper-V to run Linux inside a VM. Again, not containers.

    Linux has containers, which are based entirely on namespace isolation
    (and cgroups for process management). These are all standard kernel
    mechanisms, so there should be very little overhead in using them.

    The word "container" has many meanings.
    As far as the host FS is concerned, the guest FS is one huge file.
    Despite very different tech under the hood, this applies equally to
    both WSL and WSL2. Calling this file a 'container' sounds like proper
    use of the term.


    I finally found a slightly older Win10 setup on an SSD.
    It has WSL1.

    It's uncontained. I put Ubuntu 18.04 in WSL1, because it had
    no distro. This is an example of a file in the slash tree, in /usr/lib .
    Permissions are restricted in the tree, and I think the Everything.exe
    search tool failed to index this tree of files.

    C:\Users\Bullwinkle\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu18.04LTS_79rhkp1fndgsc\
    LocalState\rootfs\usr\lib\x86_64-linux-gnu\perl5\5.26\vars.pm

    A process called "init" can be seen running in Task Manager.

    This is a test of the / tree for speed.

    bullwinkle@DRAX:/$ dd if=testfile.bin of=/dev/null bs=1048576
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.06495 s, 520 MB/s
    bullwinkle@DRAX:/$ ls -al testfile.bin
    -rw-rw-rw- 1 root root 1073741824 Jun 3 13:57 testfile.bin
    bullwinkle@DRAX:/$

    The /mnt/c speed is next. I didn't bother shutting off Windows Defender
    for this. It's close enough to device speed (SATA SSD); it's basically
    the same speed as the other test.

    bullwinkle@DRAX:/mnt/c/users/bullwinkle/Downloads$ dd if=WIN10-WADK.7z of=/dev/null bs=1048576
    3453+1 records in
    3453+1 records out
    3621128316 bytes (3.6 GB, 3.4 GiB) copied, 7.41205 s, 489 MB/s
    bullwinkle@DRAX:/mnt/c/users/bullwinkle/Downloads$ ls -al WIN10-WADK.7z
    -rwxrwxrwx 1 bullwinkle bullwinkle 3621128316 May 10 2021 WIN10-WADK.7z

    [Picture]

    https://i.postimg.cc/Y2RQd3LM/wsl1-with-ubuntu-1804-and-XMing.gif

    *******

    WSL2 on the machine I'm typing on currently uses "ext4.vhdx" (6,698,303,488 bytes).
    And that is a container. Instead of "init", "vmmemWSL" can be seen running.
    It does not use the third-party Xming X server; it uses WSLg instead for
    graphics. Either of the two setups can run the Firefox browser.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Tue Jun 4 02:07:59 2024
    On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:

    MSVC quitted compilation after allocating 50GB of
    memory, gcc and clang compiled for minutes.

    Next time, don’t even bother with MSVC.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Tue Jun 4 02:07:01 2024
    On Mon, 3 Jun 2024 11:02:48 +0300, Michael S wrote:

    On Mon, 3 Jun 2024 03:15:27 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    Linux has containers, which are based entirely on namespace isolation
    (and cgroups for process management). These are all standard kernel
    mechanisms, so there should be very little overhead in using them.

    The word "container" has many meanings.
    As far as host FS is concerned, guest FS is a one huge file.

    That may be true for Docker, certainly not (necessarily) true for the
    others I mentioned.
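
    As a concrete illustration of the "standard kernel mechanisms" point: a
    new namespace is one system call away, with no container runtime involved.
    A minimal sketch (Linux-only, needs root/CAP_SYS_ADMIN; the file name
    uts_ns_demo.c is made up):

    // uts_ns_demo.c - create a private UTS namespace and rename it
    #define _GNU_SOURCE
    #include <sched.h>    // unshare(), CLONE_NEWUTS
    #include <stdio.h>
    #include <unistd.h>   // sethostname(), gethostname()

    int main(void)
    {
        if (unshare(CLONE_NEWUTS) != 0) {   // detach into a new UTS namespace
            perror("unshare");
            return 1;
        }
        if (sethostname("demo", 4) != 0) {  // visible only inside this namespace
            perror("sethostname");
            return 1;
        }
        char name[64];
        gethostname(name, sizeof name);
        printf("hostname here: %s\n", name);  // the host's hostname is unchanged
        return 0;
    }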

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul@21:1/5 to Bonita Montero on Mon Jun 3 22:46:07 2024
    On 6/3/2024 12:39 PM, Bonita Montero wrote:
    There's a good reason for something like #embed: I just wrote a very quick
    xxd alternative and generated a C file with a char array, and the size
    of this file is 1.2 GB. MSVC quitted compilation after allocating 50GB
    of memory; gcc and clang compiled for minutes. This would be better
    with an #embed tag. So there's a really good reason for that.
    This is my xxd substitute. Compared to xxd it can only dump C files.
    On my PC it's about 15 times faster than xxd because it does its own
    I/O buffering. (A sample of the output format follows the listing.)

    #include <iostream>
    #include <fstream>
    #include <charconv>
    #include <span>
    #include <vector>
    #include <memory>   // std::to_address
    #include <cstdlib>  // EXIT_FAILURE

    using namespace std;

    int main( int argc, char **argv )
    {
        if( argc < 4 )
            return EXIT_FAILURE;
        char const
            *inFile = argv[1],
            *symbol = argv[2],
            *outFile = argv[3];
        ifstream ifs;
        ifs.exceptions( ifstream::failbit | ifstream::badbit );
        ifs.open( inFile, ifstream::binary | ifstream::ate );
        streampos size( ifs.tellg() );  // opened at end, so tellg() gives the file size
        if( size > (size_t)-1 )
            return EXIT_FAILURE;
        ifs.seekg( 0, ifstream::beg );  // rewind after measuring the size
        union ndi { char c; ndi() {} };  // char that vector leaves uninitialized (no zero-fill)
        vector<ndi> rawBytes( size );
        span<char> bytes( &rawBytes.data()->c, rawBytes.size() );
        ifs.read( bytes.data(), bytes.size() );  // slurp the whole input into memory
        ofstream ofs;
        ofs.exceptions( ofstream::failbit | ofstream::badbit );
        ofs.open( outFile );
        vector<ndi> rawBuf( 0x100000 );
        span<char> buf( &rawBuf.begin()->c, rawBuf.size() );
        ofs << "unsigned char " << symbol << "[" << (size_t)size << "] = \n{\n";
        auto rd = bytes.begin();
        auto wrt = buf.begin();
        auto flush = [&]
        {
            ofs.write( buf.data(), wrt - buf.begin() );
            wrt = buf.begin();
        };
        while( rd != bytes.end() )
        {
            size_t remaining = bytes.end() - rd;
            constexpr size_t N_LINE = 1 + 12 * 6 - 1 + 1;  // worst case: tab + 12 * "0xNN, " - trailing space + newline
            size_t n = remaining > 12 ? 12 : remaining;
            auto rowEnd = rd + n;
            *wrt++ = '\t';
            do
            {
                *wrt++ = '0';
                *wrt++ = 'x';
                char *wb = to_address( wrt );  // raw pointer for to_chars()
                (void)(wrt + 2);               // no-op; presumably a bounds assertion for checked-iterator builds
                auto tcr = to_chars( wb, wb + 2, (unsigned char)*rd++, 16 );
                if( tcr.ptr == wb + 1 )
                    wb[1] = wb[0],
                    wb[0] = '0';
                wrt += 2;
                if( rd != bytes.end() )
                {
                    *wrt++ = ',';
                    if( rd != rowEnd )
                        *wrt++ = ' ';
                }
            } while( rd != rowEnd );
            *wrt++ = '\n';
            if( buf.end() - wrt < N_LINE )
                flush();
        }
        flush();
        ofs << "};\n" << endl;
    }
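
    For reference, the output format looks like this; the symbol name, array
    length and byte values are illustrative only (the bytes are borrowed from
    the MZ-header dump in Paul's reply below):

    unsigned char sample[16] =
    {
            0x4d, 0x5a, 0x50, 0x00, 0x02, 0x00, 0x00, 0x00, 0x04, 0x00, 0x0f, 0x00,
            0xff, 0xff, 0x00, 0x00
    };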

    It would be nice to see small samplings of the files used
    for input and output. Just a couple lines of the "meat" section
    would be sufficient.

    For anything larger, https://pastebin.com/ could hold up to 512KB
    of text, for a small sample of each. Some of the USENET servers
    have limits on message size. For a binary file, you can just use
    the hex editor representation of the binary file.

    Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

    00000000  4D 5A 50 00 02 00 00 00 04 00 0F 00 FF FF 00 00  MZP.........ÿÿ..
    00000010  B8 00 00 00 00 00 00 00 40 00 1A 00 00 00 00 00  ¸.......@.......
    00000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00  ................
    00000040  BA 10 00 0E 1F B4 09 CD 21 B8 01 4C CD 21 90 90  º....´.Í!¸.LÍ!..
    00000050  54 68 69 73 20 70 72 6F 67 72 61 6D 20 6D 75 73  This program mus
    00000060  74 20 62 65 20 72 75 6E 20 75 6E 64 65 72 20 57  t be run under W
    00000070  69 6E 33 32 0D 0A 24 37 00 00 00 00 00 00 00 00  in32..$7........

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Bonita Montero on Tue Jun 4 03:58:49 2024
    On Tue, 4 Jun 2024 04:46:59 +0200, Bonita Montero wrote:

    Am 04.06.2024 um 04:07 schrieb Lawrence D'Oliveiro:

    On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:

    MSVC quitted compilation after allocating 50GB of
    memory, gcc and clang compiled for minutes.

    Next time, don’t even bother with MSVC.

    MSVC has the most conforming C++20 frontend.

    Somehow I find that hard to believe <https://gcc.gnu.org/gcc-14/changes.html#cxx>.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 09:52:04 2024
    On 04/06/2024 05:58, Lawrence D'Oliveiro wrote:
    On Tue, 4 Jun 2024 04:46:59 +0200, Bonita Montero wrote:

    Am 04.06.2024 um 04:07 schrieb Lawrence D'Oliveiro:

    On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:

    MSVC quitted compilation after allocating 50GB of
    memory, gcc and clang compiled for minutes.

    Next time, don’t even bother with MSVC.

    MSVC has the most conforming C++20 frontend.

    Somehow I find that hard to believe <https://gcc.gnu.org/gcc-14/changes.html#cxx>.

    For a more independent source, look at:

    <https://en.cppreference.com/w/cpp/compiler_support/20>

    Given the position of "cppreference.com" as the reference site
    recommended by the C++ committee, that's about as close to an "official"
    list as you can reasonably get.

    And it shows that gcc has had most of C++20 language support since
    version 9 or 10, and by version 11 there are just minor issues left.
    (There are /always/ a few minor issues.) Equally, MSVC supports
    everything but a few minor issues. (So does clang.)

    It's a different matter going forward - for C++23, gcc and clang are
    basically complete, while MSVC has just got started on the language
    features (they are doing well for the C++23 library).


    More relevant for /this/ group would be C support:

    <https://en.cppreference.com/w/c/compiler_support/23>

    gcc and clang have implemented most of C23 already, while MSVC has just
    a couple of small bits backported from recent C++ standards. (And
    nobody has #embed yet!)
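
    For reference, the C23 usage being compared against looks roughly like
    this (file and symbol names made up); the preprocessor supplies the
    initializer tokens itself, so no giant generated source file is needed:

    #include <stddef.h>

    static const unsigned char firmware_image[] = {
    #embed "firmware.bin"   /* hypothetical file name */
    };

    static const size_t firmware_image_size = sizeof firmware_image;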


    I am sure MSVC has lots to recommend it as a C++ tool for Windows. But
    rapid support for new standards is /not/ a strong point in comparison to
    the other big toolchains.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Lawrence D'Oliveiro on Tue Jun 4 11:01:56 2024
    On Tue, 4 Jun 2024 02:07:59 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:

    On Mon, 3 Jun 2024 18:39:55 +0200, Bonita Montero wrote:

    MSVC quitted compilation after allocating 50GB of
    memory, gcc and clang compiled for minutes.

    Next time, don’t even bother with MSVC.

    For a smaller file (but still much bigger than what is likely to be
    encountered in practice: a 155 MB binary, 641 MB after conversion to
    text), MSVC was ~1.5x faster than gcc and had lower peak memory
    consumption.
    And that was the "new", slow MSVC. The "old", faster MSVC should fare even
    better, if I ever dare to install it on a computer with enough RAM.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)