• Compiler speed testing

    From James Harris@21:1/5 to All on Sat Nov 5 15:03:44 2022
    Any thoughts on what would be in a good piece of source code to use to
    test a compiler's speed?

    I guess I could do with something that's (1) realistic, not just an
    assignment statement repeated over and over again, (2) large enough to
    give lengthy timings, (3) easy enough to make, and (4) it would be valid
    over multiple generations of the compiler. But what should be in it?

    Here are some comparative timings taken just now. They read and compile
    my parser (as it's the largest source file in the compiler). It is about
    18k in size and from it the compiler produces an asm file of a little
    over 100k.

    cda 566 ms
    cdb 540 ms
    cdc 600 ms
    cdd 24 ms

    The reason for the gratifying jump in performance at the end is that I
    added input and output buffering to cdd but it's got me wondering about
    testing the times taken by future compilers.

    I should say that I don't expect compile times always to improve. The
    run times of later compilers would likely go up as I add facilities and
    down as I switch to other mechanisms. But it would still be something
    I'd like to keep an eye on.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sat Nov 5 16:51:23 2022
    On 05/11/2022 15:03, James Harris wrote:
    Any thoughts on what would be in a good piece of source code to use to
    test a compiler's speed?

    I guess I could do with something that's (1) realistic, not just an assignment statement repeated over and over again, (2) large enough to
    give lengthy timings, (3) easy enough to make, and (4) it would be valid
    over multiple generations of the compiler. But what should be in it?

    Here are some comparative timings taken just now. They read and compile
    my parser (as it's the largest source file in the compiler). It is about
    18k in size and from it the compiler produces an asm file of a little
    over 100k.

    cda   566 ms
    cdb   540 ms
    cdc   600 ms
    cdd    24 ms

    The reason for the gratifying jump in performance at the end is that I
    added input and output buffering to cdd but it's got me wondering about testing the times taken by future compilers.

    I should say that I don't expect compile times always to improve. The
    run times of later compilers would likely go up as I add facilities and
    down as I switch to other mechanisms. But it would still be something
    I'd like to keep an eye on.


    Your figures are too confused to comment on meaningfully. For a start,
    I'd like to know the reason for that 25 times speedup! What was it
    spending 96% of its time doing in those earlier versions?

    I don't believe there's any real buffering involved in 18KB input.

    But I'm also used to working with line counts rather than file sizes;
    how many lines was the input, and how many lines was the output?

    Generating ASM source is usually somewhat slower than writing binary,
    but it should still be fast enough (it becomes an issue if you want the
    fastest possible speeds).


    I can tell you that my current compiler manages some 700K lines per
    second (this is source code to EXE) on the test file in my second link
    below.

    If I tell it to generate ASM, it drops to 260K lines per second (taking
    2.8 seconds to generate 2.2M lines). (The input file is 9MB, and the
    generated ASM is 87MB; the EXE was 8MB.)

    Since I now once again support a C target, I can accelerate it further
    by some 30-40%, so just pushing 1Mlps on that first test.


    But ultimately, compilation just has to be fast enough not to be a
    nuisance (beyond that it's just a sport).

    Your 24ms sounds fine. 600ms would be mildly annoying for me, but is
    still tolerable. For a while I ran an /interpreted/ compiler taking up
    to half a second per module, but that was with independent compilation.

    Since I now work with whole program compilers, the speed is a lot more important. Currently, no applications of mine take more than 100ms, once file-cacheing comes into play.



    Here are links to tests that I'm sure I posted before:


    https://github.com/sal55/langs/blob/master/Compilertest1.md

    https://github.com/sal55/langs/blob/master/Compilertest3.md

    These were all done on a slower machine. My current one (where I can get
    1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I think
    is still low-end.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sat Nov 5 17:54:14 2022
    On 05/11/2022 16:51, Bart wrote:
    On 05/11/2022 15:03, James Harris wrote:
    Any thoughts on what would be in a good piece of source code to use to
    test a compiler's speed?

    I guess I could do with something that's (1) realistic, not just an
    assignment statement repeated over and over again, (2) large enough to
    give lengthy timings, (3) easy enough to make, and (4) it would be
    valid over multiple generations of the compiler. But what should be in
    it?

    Here are some comparative timings taken just now. They read and
    compile my parser (as it's the largest source file in the compiler).
    It is about 18k in size and from it the compiler produces an asm file
    of a little over 100k.

    cda   566 ms
    cdb   540 ms
    cdc   600 ms
    cdd    24 ms

    The reason for the gratifying jump in performance at the end is that I
    added input and output buffering to cdd but it's got me wondering
    about testing the times taken by future compilers.

    I should say that I don't expect compile times always to improve. The
    run times of later compilers would likely go up as I add facilities
    and down as I switch to other mechanisms. But it would still be
    something I'd like to keep an eye on.


    Your figures are too confused to comment on meaningfully. For a start,
    I'd like to know the reason for that 25 times speedup! What was it
    spending 96% of its time doing in those earlier versions?

    In a word, IO. It was reading and writing one character at a time -
    which was enough to start with. You may remember we discussed this a few
    years ago:

    https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ


    I don't believe there's any real buffering involved in 18KB input.

    It was definitely buffering. With the new compiler if I change the
    buffer size to 1 then it takes as long as it did before. Here's how the
    run time changes with different buffer sizes:

    bufsize, approx time in ms
    1, 650
    2, 340
    3, 240
    4, 180
    8, 110
    16, 70
    32, 45
    64, 35
    128, 28
    256, 26
    512, 23
    1024, 23
    2048, 22
    4096, 21
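
    For anyone who wants to reproduce the shape of that curve outside the
    compiler, here is a rough C analogue of what the buffering change
    amounts to - an illustration only, not the compiler's actual code;
    BUFSIZE stands in for my buffer size and the file names are
    placeholders:

        /* copy a file BUFSIZE bytes per read()/write() syscall pair */
        #include <fcntl.h>
        #include <unistd.h>

        #define BUFSIZE 1            /* try 1, 2, 4, ... 4096 and time it */

        int main(void) {
            char buf[BUFSIZE];
            int in  = open("parser.src", O_RDONLY);
            int out = open("parser.asm", O_WRONLY | O_CREAT | O_TRUNC, 0644);
            ssize_t n;
            while ((n = read(in, buf, BUFSIZE)) > 0)
                write(out, buf, (size_t)n);
            close(in);
            close(out);
            return 0;
        }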


    But I'm also used to working with line counts rather than file sizes;
    how many lines was the input, and how many lines was the output?

    Good question.

    Input: 678 lines including blank lines
    Output: 6742 lines including blank lines


    Generating ASM source is usually somewhat slower than writing binary,
    but it should still be fast enough (it becomes an issue if you want the fastest possible speeds).


    I can tell you that my current compiler manages some 700K lines per
    second (this is source code to EXE) on the test file in my second link
    below.

    If I tell it to generate ASM, it drops to 260K lines per second (taking
    2.8 seconds to generate 2.2M lines. (The input file is 9MB, and the
    generated ASM is 87MB; the EXE was 8MB.)

    Based on the above I make mine 32k lines per second.


    Since I now once again support a C target, I can accelerate it further
    by some 30-40%, so just pushing 1Mlps on that first test.


    But ultimately, compilation just has to be fast enough not to be a
    nuisance (beyond that it's just a sport).

    Yes, the compiler is certainly 'fast enough' for now but it would be fun
    to try to improve the speed once it is more mature.

    There's a lot I need to do to the compiler so that the language it can
    compile is more complete (there's a lot missing at the moment) but IO
    buffering was one thing which I thought should have made a big
    difference fairly easily.


    Your 24ms sounds fine. 600ms would be mildly annoying for me, but is
    still tolerable. For a while I ran an /interpreted/ compiler taking up
    to half a second per module, but that was with independent compilation.

    Since I now work with whole program compilers, the speed is a lot more important. Currently, no applications of mine take more than 100ms, once file-cacheing comes into play.



    Here are links to tests that I'm sure I posted before:


    https://github.com/sal55/langs/blob/master/Compilertest1.md

    https://github.com/sal55/langs/blob/master/Compilertest3.md

    These were all done on a slower machine. My current one (where I can get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I think
    is still low-end.

    Surprisingly, that processor doesn't come up at

    https://www.cpubenchmark.net/singleCompare.php

    From /proc/cpuinfo I have the following.

    Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sat Nov 5 20:15:03 2022
    On 05/11/2022 17:54, James Harris wrote:
    On 05/11/2022 16:51, Bart wrote:

    Your figures are too confused to comment on meaningfully. For a start,
    I'd like to know the reason for that 25 times speedup! What was it
    spending 96% of its time doing in those earlier versions?

    In a word, IO. It was reading and writing one character at a time -
    which was enough to start with. You may remember we discussed this a few years ago:

     https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ


    I don't believe there's any real buffering involved in 18KB input.

    It was definitely buffering. With the new compiler if I change the
    buffer size to 1 then it takes as long as it did before.

    bufsize, approx time in ms
    1, 650
    2, 340

    I assumed that such a small file would be loaded in one go by the OS
    anyway. Then any calls you do, even reading a character at a time, would
    read from the OS's in-memory buffer.

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is memory
    to memory on a modern PC with GHz clock rates. Something funny is going on.

    A loop like this:

        to 18'000 do
           c:=fgetc(f)
        od

    in /interpreted/ code (calling the C function) is too fast to measure.

    But reading a 7.8MB input file with such a loop, a character at a time
    in scripting code, takes 0.47 seconds.
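
    Roughly the same test in C, for anyone who wants to try it (the file
    name is a placeholder, and timings will of course vary):

        #include <stdio.h>
        #include <time.h>

        int main(void) {
            FILE *f = fopen("test.dat", "rb");
            if (!f) return 1;
            clock_t t0 = clock();
            int c, n = 0;
            while ((c = fgetc(f)) != EOF)      /* one character per call */
                n++;
            clock_t t1 = clock();
            printf("%d chars in %ld ms\n", n,
                   (long)((t1 - t0) * 1000 / CLOCKS_PER_SEC));
            fclose(f);
            return 0;
        }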

    If I tell it to generate ASM, it drops to 260K lines per second
    (taking 2.8 seconds to generate 2.2M lines. (The input file is 9MB,
    and the generated ASM is 87MB; the EXE was 8MB.)

    Based on the above I make mine 32k lines per second.

    This is the sort of speed of compilers like gcc. Is yours still written
    in Python? I thought you had it self-hosted.

    These were all done on a slower machine. My current one (where I can
    get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
    think is still low-end.

    Surprisingly, that processor doesn't come up at

      https://www.cpubenchmark.net/singleCompare.php

    From /proc/cpuinfo I have the following.

      Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz


    Probably because it's 3650U not 2650U. The rating on that site shows it
    at 3900 compared with 3050 of your device.

    So not as low end as I expected (I assumed it was, being the second
    cheapest PC in the shop). My friend has a laptop with an i5 in it and my
    brief tests with that showed it to be 50% faster than my machine.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Bart on Sat Nov 5 20:44:32 2022
    On 05/11/2022 20:15, Bart wrote:

    These were all done on a slower machine. My current one (where I can
    get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
    think is still low-end.

    Surprisingly, that processor doesn't come up at

       https://www.cpubenchmark.net/singleCompare.php

     From /proc/cpuinfo I have the following.

       Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz


    Probably because it's 3650U not 2650U. The rating on that site shows it

    It's not, it's 3250U. I know it's right this time because I did copy+paste!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Bart on Sat Nov 5 20:52:14 2022
    On 05/11/2022 20:15, Bart wrote:
    On 05/11/2022 17:54, James Harris wrote:
    On 05/11/2022 16:51, Bart wrote:

    Your figures are too confused to comment on meaningfully. For a
    start, I'd like to know the reason for that 25 times speedup! What
    was it spending 96% of its time doing in those earlier versions?

    In a word, IO. It was reading and writing one character at a time -
    which was enough to start with. You may remember we discussed this a
    few years ago:

      https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ

    I don't believe there's any real buffering involved in 18KB input.

    It was definitely buffering. With the new compiler if I change the
    buffer size to 1 then it takes as long as it did before.

    bufsize, approx time in ms
    1, 650
    2, 340

    I assumed that such a small file would be loaded in one go by the OS
    anyway. Then any calls you do, even reading a character at a time, would
    read from the OS's in-memory buffer.

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is memory
    to memory on a modern PC with GHz clock rates. Something funny is going on.

    I should have read your link first. There I compared the speed to a
    serial port at 2400 baud. At least it has improved from that!

    I think with source files, unless you're trying to run on a restricted
    machine that only has a few KB of RAM, just grab the whole file at once
    into a buffer or a string directly accessible by your lexer.

    Otherwise you're going to be getting these silly speeds. Loading and
    scanning 18KB of data should be too fast to measure.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sat Nov 5 21:33:12 2022
    On 05/11/2022 20:52, Bart wrote:
    On 05/11/2022 20:15, Bart wrote:
    On 05/11/2022 17:54, James Harris wrote:
    On 05/11/2022 16:51, Bart wrote:

    Your figures are too confused to comment on meaningfully. For a
    start, I'd like to know the reason for that 25 times speedup! What
    was it spending 96% of its time doing in those earlier versions?

    In a word, IO. It was reading and writing one character at a time -
    which was enough to start with. You may remember we discussed this a
    few years ago:

      https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ

    I don't believe there's any real buffering involved in 18KB input.

    It was definitely buffering. With the new compiler if I change the
    buffer size to 1 then it takes as long as it did before.

    bufsize, approx time in ms
    1, 650
    2, 340

    I assumed that such a small file would be loaded in one go by the OS
    anyway. Then any calls you do, even reading a character at a time,
    would read from the OS's in-memory buffer.

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is
    memory to memory on a modern PC with GHz clock rates. Something funny
    is going on.

    I should have read your link first.

    Always!

    ;)

    There I compared the speed to a
    serial port at 2400 baud. At least it has improved from that!

    I think with source files, unless you're trying to run on a restricted machine that only has a few KB of RAM, just grab the whole file at once
    into a buffer or a string directly accessible by your lexer.

    There's absolutely no need to do that. I just tried it by making each
    buffer 256k. Both files are smaller than the buffers so there would be
    only one read and one write. But it made no clear difference to the run
    time. A buffer of 256k and a buffer of 0.5k both came in with runs
    around 20 to 30 ms.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sat Nov 5 22:10:05 2022
    On 05/11/2022 21:21, James Harris wrote:
    On 05/11/2022 20:15, Bart wrote:

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is
    memory to memory on a modern PC with GHz clock rates. Something funny
    is going on.

    Don't forget all the context switching to and from kernel mode - for
    both read (18k) and write (100k).

    Yeah but, why?


    A loop like this:

        to 18'000 do
           c:=fgetc(f)
        od

    in /interpreted/ code (calling the C function) is too fast to measure.

    Two points:

    1. That only reads - c. 18k. To be a fair test you would also have to
    write c. 100k.

    Nope, still zero, even with an fputc loop run 100K times. These are
    still tiny files.

    Whatever fgetc/fputc are doing behind the scenes with buffering, it's
    being done right.
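
    By default C streams are fully buffered anyway - typically a few KB -
    and if it ever mattered, the buffer can be set explicitly straight
    after the fopen. Just a sketch, not something I actually need here:

        setvbuf(f, NULL, _IOFBF, 64 * 1024);   /* fully buffered, 64KB */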

    2. I'd expect fgetc to be buffered.

    You can see the difference if you run under strace. It will show the individual syscalls. Without buffering there should be something like
    118,000 reads/writes. With buffering of 512 there should only be about
    230 such syscalls - about 1/500th of the number. I suspect that kernel
    calls and returns are where a lot of the time will be going if there's no buffering.

    Still not convinced? Take a look at dd. When run on a file with 112,000 bytes:

    time dd if=infile of=/dev/null bs=1
    time dd if=infile of=/dev/null bs=512

    The first takes 370 ms. The second just 5ms. QED, I think. :)

    OK. TBH I still don't understand what's going on (I don't know what 'dd'
    is). Are you specifically doing file reads with the lowest-level system
    calls possible, and requesting it all be done a character at a time?

    If so, why? I just use C's `fread()` to read an entire file at once; job
    done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
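
    Something along these lines, that is - a sketch with minimal error
    handling, and the function name is only for illustration:

        #include <stdio.h>
        #include <stdlib.h>

        /* Read a whole file into one NUL-terminated buffer. */
        char *readwholefile(const char *filename, long *lenp) {
            FILE *f = fopen(filename, "rb");
            if (!f) return NULL;
            fseek(f, 0, SEEK_END);
            long len = ftell(f);
            rewind(f);
            char *buf = malloc((size_t)len + 1);
            if (buf && fread(buf, 1, (size_t)len, f) == (size_t)len) {
                buf[len] = 0;          /* terminator for the lexer */
            } else {
                free(buf);
                buf = NULL;
            }
            fclose(f);
            if (lenp) *lenp = len;
            return buf;
        }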


    Perhaps more interesting is where an individual compiler spends its
    time: how long in lexing, statement parsing, expression parsing, IR generation, IR alterations, optimisation, code gen, etc. At some point I
    may add code to gather such info.


    Here are some figures from a recent test (any smaller input, is too fast
    to measure accurately):

    Load modules    0 msec  Load all sources (here, 9MB mostly in 1 file)
    Parsing       216 msec  Parse and create AST
    Name Resolve   47 msec  Scan AST and resolve names
    Type Analysis  84 msec
    Codegen       131 msec  To x64 representation
    'SS'          138 msec  To x64 machine code
    Write EXE       8 msec  (varies 0-16 ms) Build EXE image & write (7.6MB)

    Total        ~630 msec

    This benefits from file-cacheing. Maybe the OS has unfinished business
    with committing the EXE to disk after it returns from an `fclose` call;
    I don't know. The elapsed time is some 0.7 seconds for the whole job.

    File-loading should simply not be an issue. Usually a compiler is run on
    a file that has just been edited, so it should still be in memory.

    Compilations from 'cold' are uncommon. But file ops are anyway not under
    your control (or maybe they are for you!).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sat Nov 5 21:21:58 2022
    On 05/11/2022 20:15, Bart wrote:
    On 05/11/2022 17:54, James Harris wrote:
    On 05/11/2022 16:51, Bart wrote:

    Your figures are too confused to comment on meaningfully. For a
    start, I'd like to know the reason for that 25 times speedup! What
    was it spending 96% of its time doing in those earlier versions?

    In a word, IO. It was reading and writing one character at a time -
    which was enough to start with. You may remember we discussed this a
    few years ago:

      https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ

    I don't believe there's any real buffering involved in 18KB input.

    It was definitely buffering. With the new compiler if I change the
    buffer size to 1 then it takes as long as it did before.

    bufsize, approx time in ms
    1, 650
    2, 340

    I assumed that such a small file would be loaded in one go by the OS
    anyway. Then any calls you do, even reading a character at a time, would
    read from the OS's in-memory buffer.

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is memory
    to memory on a modern PC with GHz clock rates. Something funny is going on.

    Don't forget all the context switching to and from kernel mode - for
    both read (18k) and write (100k).


    A loop like this:

       to 18'000 do
          c:=fgetc(f)
       od

    in /interpreted/ code (calling the C function) is too fast to measure.

    Two points:

    1. That only reads - c. 18k. To be a fair test you would also have to
    write c. 100k.

    2. I'd expect fgetc to be buffered.

    You can see the difference if you run under strace. It will show the
    individual syscalls. Without buffering there should be something like
    118,000 reads/writes. With buffering of 512 there should only be about
    230 such syscalls - about 1/500th of the number. I suspect that kernel
    calls and returns are where a lot of the time will be going if there's no buffering.

    Still not convinced? Take a look at dd. When run on a file with 112,000
    bytes:

    time dd if=infile of=/dev/null bs=1
    time dd if=infile of=/dev/null bs=512

    The first takes 370 ms. The second just 5ms. QED, I think. :)

    Perhaps more interesting is where an individual compiler spends its
    time: how long in lexing, statement parsing, expression parsing, IR
    generation, IR alterations, optimisation, code gen, etc. At some point I
    may add code to gather such info.
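
    In C terms it would probably be nothing fancier than wrapping each
    phase in something like the following - just a sketch; parse_module is
    a made-up name, not a routine in my compiler:

        clock_t t0 = clock();          /* needs <time.h> and <stdio.h> */
        parse_module();                /* hypothetical phase being timed */
        clock_t t1 = clock();
        printf("parse: %ld ms\n", (long)((t1 - t0) * 1000 / CLOCKS_PER_SEC));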



    But reading a 7.8MB input file with such a loop, a character at a time
    in scripting code, takes 0.47 seconds.

    If I tell it to generate ASM, it drops to 260K lines per second
    (taking 2.8 seconds to generate 2.2M lines. (The input file is 9MB,
    and the generated ASM is 87MB; the EXE was 8MB.)

    Based on the above I make mine 32k lines per second.

    This is the sort of speed of compilers like gcc. Is yours still written
    in Python? I thought you had it self-hosted.

    The compiler is self-hosted. The first one, which I now call cda, was
    written in asm. The others, cdb to cdd, are written in my language and
    compiled via asm. No other language is used in compilation.

    What you may be remembering is that Python is used for running test scripts.


    These were all done on a slower machine. My current one (where I can
    get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
    think is still low-end.

    Surprisingly, that processor doesn't come up at

       https://www.cpubenchmark.net/singleCompare.php

     From /proc/cpuinfo I have the following.

       Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz


    Probably because it's 3650U not 2650U. The rating on that site shows it
    at 3900 compared with 3050 of your device.

    I see your other post that it's the 3250U.

    As for comparing, unless your compiler is multithreaded it's probably
    best to use the CPU's single-thread rating. Mine comes in at 1206. Yours
    at 1812 - about 50% faster.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sun Nov 6 08:36:52 2022
    On 05/11/2022 22:10, Bart wrote:
    On 05/11/2022 21:21, James Harris wrote:
    On 05/11/2022 20:15, Bart wrote:

    So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
    That's roughly the transfer rate from a floppy disk! Yet this is
    memory to memory on a modern PC with GHz clock rates. Something funny
    is going on.

    Don't forget all the context switching to and from kernel mode - for
    both read (18k) and write (100k).

    Yeah but, why?

    I don't understand the question but see below on timings.

    ..

    Still not convinced? Take a look at dd. When run on a file with
    112,000 bytes:

    time dd if=infile of=/dev/null bs=1
    time dd if=infile of=/dev/null bs=512

    The first takes 370 ms. The second just 5ms. QED, I think. :)

    OK. TBH I still don't understand what's going on (I don't know what 'dd'
    is). Are you specifically doing file reads by doing as low-level systems calls as possible, and requesting it's all done a character at a time?

    In the example dd copies infile to /dev/null (which drops anything sent
    to it). bs= sets the block size, i.e. how much can be returned from each
    read() syscall.

    The one which executes about 230 read() syscalls to do so executes in 5
    ms. The one which reads the same amount of data but uses 118,000
    syscalls takes over 70 times as long. As the input file is the same in
    each case the comparison shows how slow syscalls can be: each one adds
    about 3 microseconds ((370 - 5) ms spread over roughly 118,000 extra
    syscalls) to the overall task.


    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    If your parser needs to backtrack in the source that makes sense. But as
    shown in my prior post there's no appreciable speed advantage.



    Perhaps more interesting is where an individual compiler spends its
    time: how long in lexing, statement parsing, expression parsing, IR
    generation, IR alterations, optimisation, code gen, etc. At some point
    I may add code to gather such info.


    Here are some figures from a recent test (any smaller input, is too fast
    to measure accurately):

    Load modules    0 msec  Load all sources (here, 9MB mostly in 1 file)

    Are you sure that's not mapping the file into virtual memory rather than
    reading it?

    Parsing       216 msec  Parse and create AST
    Name Resolve   47 msec  Scan AST and resolve names
    Type Analysis  84 msec
    Codegen       131 msec  To x64 representation
    'SS'          138 msec  To x64 machine code
    Write EXE       8 msec (varies 0-16 ms) Build EXE image & write (7.6MB)

    Total        ~630 msec

    This benefits from file-cacheing. Maybe the OS has unfinished business
    with committing the EXE to disk after it returns from an `fclose` call;
    I don't know. The elapsed time is some 0.7 seconds for the whole job.

    File-loading should simply not be an issue. Usually a compiler is run on
    a file that has just been edited, so it should still be in memory.

    Yes, the whole file could be cached.


    Compilations from 'cold' are uncommon. But file ops are anyway not under
    your control (or maybe they are for you!).



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sun Nov 6 14:58:03 2022
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once;
    job done. If all the 100s of apps on my PC all used the same slow file
    read methods, the machine would grind to a halt.

    If your parser need to backtrack in the source that makes sense. But as
    shown in my prior post there's no appreciable speed advantage.


    Backtracking isn't the reason. In a fast tokeniser, you want to traverse
    source code by simply incrementing a pointer, for example:

    doswitch lxsptr++^
    when 'a'..'z', '_', '$' then
    ...

    What you don't want is to have to call into the OS for every character.
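
    In C the same shape would be something like this (just a sketch, and it
    assumes the whole source sits in one NUL-terminated buffer):

        /* lxsptr points into a buffer holding the entire source file */
        void lex(const char *lxsptr) {
            for (;;) {
                char c = *lxsptr++;    /* one pointer bump per char, no I/O */
                if (c == 0)
                    break;             /* sentinel: end of source */
                if ((c >= 'a' && c <= 'z') || c == '_' || c == '$') {
                    /* scan the rest of the identifier ... */
                }
                /* ... other character classes / token kinds ... */
            }
        }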

    A basic lexer (recognising tokens but not doing identifier lookups) can
    get through some 300Mcps on my machine.

    But a loop scanning a file with fgetc(), and nothing else, manages only
    60Mcps. And that's my apparently fast fgetc.

    Here's that test program that manages 300Mcps on my machine:

    https://github.com/sal55/langs/blob/master/clex.c

    This is machine generated from my language, and ought to work on
    Linux. (However if your Linux doesn't use 1000 ticks per second for
    clock(), the figures shown will be out by 1000; this should be obvious;
    maybe fix it on line 1000)

    Instructions to build and run are at the top. While ostensibly for C
    syntax, it'll probably cope with anything including your parser code. If
    it does, prepare a version which is 100 or 1000 times bigger (just
    duplicate).





    Perhaps more interesting is where an individual compiler spends its
    time: how long in lexing, statement parsing, expression parsing, IR
    generation, IR alterations, optimisation, code gen, etc. At some
    point I may add code to gather such info.


    Here are some figures from a recent test (any smaller input, is too
    fast to measure accurately):

    Load modules    0 msec  Load all sources (here, 9MB mostly in 1 file)

    Are you sure that's not mapping the file into virtual memory rather than
    read it?

    It's just doing fread(). However this is not the correct load time. It
    excludes loading the project file for the program, which is the lead
    module, usually very small. But here, the lead module is the primary 9MB
    source file.

    But even properly timed, I still usually get a time of 0ms (resolution
    is 8/16msec anyway). After all, this read mainly comes down to copying
    9MB of data from one memory buffer to another; how long can that take?

    If I create a test file which is 10 times the size (90MB) then it takes
    30msec. This is 5000 times the size of your parser module. As I said,
    loading should take no time at all.

    (I like to exclude file load and write times, firstly because I'm timing
    my compiler, not OS operations. But also because the compiler can be
    invoked on source code that is already in memory, as part of some
    resident application, and it may generate code in memory to be run
    directly. But as it happens these are tiny overheads anyway.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Wed Nov 9 03:54:13 2022
    On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once;
    job done. If all the 100s of apps on my PC all used the same slow file
    read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or
    if the hardware lacks paging.


    If your parser need to backtrack in the source that makes sense. But as shown in my prior post there's no appreciable speed advantage.
    Backtracking isn't the reason. In a fast tokeniser, you want to traverse source code by simply incrementing a pointer, for example:

    doswitch lxsptr++^
    when 'a'..'z', '_', '$' then
    ...

    Understood. Presumably you find the beginning and end of the token and then, where appropriate, copy it from there to a symbol table (or have the symbol table point at it where it stands).

    I am assuming from our other discussion that lxsptr++^ retrieves the character which lxsptr points at and then advances lxsptr to point to the next character position (the equivalent of C's *lxsptr++, presumably).


    What you don't want is to have to call into the OS for every character.

    For sure. If each syscall adds 2 or 3 microseconds, say, to the work required they can soon add up.


    A basic lexer (recognising tokens but not doing identifier lookups) can
    get through some 300Mcps on my machine.

    Surely such an algorithm may come down to little more than how fast the machine can scan memory. To make a good test of a compiler's performance a load of other things would need to be included in the source it had to compile. Such things as many
    identifiers of different lengths, symtab insertions, retrievals, scope creation and destruction, different constructs: loops, ifs, switches, exceptions, etc. It would make sense also to have plenty of inefficient source code for the optimiser to do its
    magic on.

    Such source is probably best generated by a program - which would be satisfyingly contrarian. ;)

    --
    James

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Wed Nov 9 14:12:29 2022
    On 09/11/2022 11:54, James Harris wrote:
    On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once;
    job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or
    if the hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is
    some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime
    soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal
    with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the
    source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one
    day have to deal with inputs 1000s of times bigger representing one
    single program.

    Also bear in mind that a smaller target usually means a correspondingly
    smaller program.

    Understood. Presumably you find the beginning and end of the token and then, where appropriate, copy it from there to a symbol table (or have the symbol table point at it where it stands).

    I am assuming from our other discussion that lxsptr++^ retrieves the character which lxsptr points at and then advances lxsptr to point to the next character position.

    Yes.


    What you don't want is to have to call into the OS for every character.

    For sure. If each syscall adds 2 or 3 microseconds, say, to the work required they can soon add up.


    A basic lexer (recognising tokens but not doing identifier lookups) can
    get through some 300Mcps on my machine.

    Surely such an algorithm may come down to little more than how fast the machine can scan memory.

    The 300Mcps figure is how many characters it gets through while it's
    only doing tokenising. The actual figure was 200-360Mcps during basic
    tokenising, depending on input (sqlite3.c is 40% comments so that gives a
    faster throughput).

    A loop that only scans characters (say adding up their values to ensure
    it's not optimised out) manages up to 2500Mcps (here it's independent of
    style of content).

    My point was that if you introduce a bottleneck like a complex OS call per-character, it can have a significant impact, one that is trivially
    avoided.

    To make a good test of a compiler's performance a load of other things would need to be included in the source it had to compile. Such things as many identifiers of different lengths, symtab insertions, retrievals, scope creation and destruction,
    different constructs: loops, ifs, switches, exceptions, etc. It would make sense also to have plenty of inefficient source code for the optimiser to do its magic on.

    Such source is probably best generated by a program - which would be satisfyingly contrarian. ;)

    Such source is best taken from real programs, as synthesised ones will
    never have that variety. Machine-generated input is best for development
    as you can then concentrate on specific token types, or short vs long
    names, or random vs similar identifiers, that sort of thing.

    (Have a look at sqlite3.c here: https://github.com/sal55/langs/blob/master/Parsing/sqlite3.c. This is
    quite brutal code for a C compiler. Also https://github.com/sal55/langs/blob/master/fann4.zip shows my own
    740Kloc test input.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Wed Nov 9 15:44:57 2022
    On 09/11/2022 15:12, Bart wrote:
    On 09/11/2022 11:54, James Harris wrote:
    On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of
    hardware so I don't want to assume that the memory size will always be
    greater than the source size - though I can see definite advantages to
    your approach, especially in terms of speed or if the hardware lacks
    paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is
    some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime
    soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal
    with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the
    source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one
    day have to deal with inputs 1000s of times bigger representing one
    single program.

    Also bear in mind that a smaller target usually means a correspondingly smaller program.

    I agree with everything Bart says here. It has been perhaps 50 years
    since it was necessary to assume a compiler might not be able to
    handle a whole file at a time. (And that means having enough memory to
    do the compilation, optimisation and generation of the object file - not
    just hold the source code.)

    For smaller targets, people use bigger development systems - again, for
    the last 50 years or so. Cross-compilation is the norm for small targets.

    Memory space on build machines only matters when you have huge programs,
    and even then it is usually linking rather than pure compilation (and especially link-time optimisation) for which you need a more powerful build
    system. So if you are building Firefox or LibreOffice, with link-time optimisation, you'll want a system with as many cores and as much ram as
    you can afford. For more "normal" programs, you can happily assume the
    hosting computer has unlimited memory.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Thu Nov 10 10:56:00 2022
    On 09/11/2022 11:54, James Harris wrote:

    Understood. Presumably you find the beginning and end of the token and then, where appropriate, copy it from there to a symbol table (or have the symbol table point at it where it stands).

    I did use this latter idea for a while: the ST points into the source
    code. But it had too many problems:

    * The rest of the program likes zero-terminated strings, but I can't
    inject a zero here, because it could overwrite part of the next token. I
    had a scheme where it worked with a token in hand, or used some convoluted
    logic to get around it, but it got to be too much.

    * It means the source text is modified, either because, due to case-insensitivity, `Abc` is changed to `abc`, or because there
    could be an injected 0 byte. In either case this caused problems (for
    example in reporting an error and displaying that line of source code,
    and in some cases where the same source has to be retokenised).

    In the end I took the hit and copied the identifier onto the heap. The
    compiler is still plenty fast.

    There are similar, and more severe, issues with:

    * Scanning numeric constants with separators. I liked to remove the
    separators to form a compact token (so `1_234_567` becomes `1234567`),
    to simplify later conversion to its value (you don't always know while
    scanning whether it's int or float for example). But this really messes
    up the source code if done in-place.

    * Scanning string constants with embedded escape sequences. I wanted the
    final string, with escapes converted, to be in the same place. (The new
    string will never be longer than the original - I think).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to James Harris on Thu Nov 10 18:45:25 2022
    James Harris <james.harris.1@gmail.com> wrote:

    time dd if=infile of=/dev/null bs=1
    time dd if=infile of=/dev/null bs=512

    The first takes 370 ms. The second just 5ms. QED, I think. :)

    I was going to write that this is too much, system calls should
    be much faster. But then I checked and on two machines I got
    timings roughly agreeing with yours (almost the same time on
    a slow one and about half on a rather fast one). But I also
    checked on a third and got a much lower time: 24ms for a 100k file.

    The third machine runs an old OS; the first two have a newer system. So
    it looks like we are seeing security patches in action. The claim was
    that the slowdown of "typical" programs would be moderate, but 'bs=1'
    is atypical and gives some idea how bad it can get.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Thu Nov 10 19:20:34 2022
    Bart <bc@freeuk.com> wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed
    or if the hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is
    some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime
    soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal
    with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the
    source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one
    day have to deal with inputs 1000s of times bigger representing one
    single program.

    You seem to care about speed. In the past, buffers of the order of 4k
    should theoretically give the best performance: the buffer
    was sized to be significantly smaller than the L1 cache, so that
    reads would not disturb too much of the L1 cache's contents. In the
    modern security climate it seems that the OS takes any possible
    pretext to flush caches, defeating the benefit of small buffers.
    Still, using small buffers is easy, so why use big ones if
    they do not give measurable advantages?

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Thu Nov 10 20:54:02 2022
    On 10/11/2022 19:20, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed
    or if the hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is
    some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime
    soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal
    with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the
    source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be
    ludicrous to start using tiny buffers on the off-chance that I would one
    day have to deal with inputs 1000s of times bigger representing one
    single program.

    You seem to care about speed. In the past buffers of order 4k
    theoretically should give best performance. Namely, buffer
    was sized to be significantly smaller than L1 cache, so that
    reads would not disturb too much content of L1 cache. In
    modern security climate it seems that OS takes any possible
    pretext to flush caches, defeating benefits from small buffers.
    Still, using small buffers is easy, so why use big ones if
    they do not give measurable advantages?

    Because it's extra complexity for no benefit. Lexing code will have to
    deal with the possibility that a token is split across two buffers which
    means checking that on every character, which is much more likely to
    slow things down.

    I remember /having/ to use such techniques over 30 years ago, because
    memory was limited. Now there is tons of memory, but we still have to
    access the file system using 4KB buffers within our apps?

    The largest source file I have is 80KB, which is 0.001% of the memory on
    my PC already. Now you say I should access that 80KB in 4KB chunks; 4KB
    is 0.00005% of my machine's 8GB. It seems that it was pretty pointless
    having all that memory, as I'm not allowed to use it!

    Sorry, but I can't see the point. This would only affect loadtime
    (getting a file's contents into /my/ buffer), which is already too fast
    to measure: about 30ms to read 90MB, so perhaps 30us to read my 80KB in
    one go.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Fri Nov 11 02:32:50 2022
    Bart <bc@freeuk.com> wrote:
    On 10/11/2022 19:20, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of
    speed or if the hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime
    soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal
    with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the
    source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be
    ludicrous to start using tiny buffers on the off-chance that I would one day have to deal with inputs 1000s of times bigger representing one
    single program.

    You seem to care about speed. In the past buffers of order 4k theoretically should give best performance. Namely, buffer
    was sized to be significantly smaller than L1 cache, so that
    reads would not disturb too much content of L1 cache. In
    modern security climate it seems that OS takes any possible
    pretext to flush caches, defeating benefits from small buffers.
    Still, using small buffers is easy, so why use big ones if
    they do not give measurable advantages?

    Because it's extra complexity for no benefit. Lexing code will have to
    deal with the possibility that a token is split across two buffers which means checking that on every character, which is much more likely to
    slow things down.

    Well, with a reasonable language and compiler organization the cost is
    very low: one puts an otherwise-invalid character after the buffer.
    If one uses a dispatch table, handling the invalid character is just one
    more entry in the table, with almost no cost for processing
    normal characters.
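
    In C it might look roughly like this (a sketch; SENTINEL, srcfile and
    the buffer size are all made up for illustration):

        #include <stdio.h>

        #define BUFSIZE  4096
        #define SENTINEL '\0'          /* a byte that cannot appear in source */

        static FILE *srcfile;          /* assumed already open */
        static char buf[BUFSIZE + 1];  /* one spare byte for the sentinel */
        static char *p;

        static void refill(void) {
            size_t n = fread(buf, 1, BUFSIZE, srcfile);
            buf[n] = SENTINEL;         /* always terminates the valid data */
            p = buf;
        }

        /* The per-character path needs no length check: the dispatch entry
           for SENTINEL either calls refill() and retries the character, or
           reports end of file when fread() returned 0. */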

    I remember /having/ to use such techniques over 30 years ago, because
    memory was limited. Now there is tons of memory, but we still have to
    access the file system using 4KB buffers within our apps?

    The largest source file I have is 80KB, which is 0.001% of the memory on
    my PC already. Now you say I should access that 80KB in 4KB chunks; 4KB
    is 0.00005% of my machine's 8GB. It seems that it was pretty pointless
    having all that memory, as I'm not allowed to use it!

    Well, your file is already in memory (in the system cache), so you
    are using memory anyway. And there are other good uses for it.

    Sorry, but I can't see the point. This would only affect loadtime
    (getting a file's contents into /my/ buffer), which is already too fast
    to measure: about 30ms to read 90MB, so perhaps 30us to read my 80KB in
    one go.

    Well, you should be used to the fact that other folks do not see
    the point of what you are doing. So do not worry, points of
    view differ...

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Fri Nov 11 11:57:04 2022
    On 11/11/2022 02:32, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:
    On 10/11/2022 19:20, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of
    speed or if the hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is some embedded device.

    Even the original Raspberry Pi had some 0.5GB IIRC.

    I'm not planning to run my tools on any 64KB or smaller system anytime soon. I passed that stage decades ago.

    So, what is the largest source file you're ever likely to have to deal with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
    PC's memory.

    The largest input I deal with is 50,000 lines which represents all the source files of the application, for my /whole-program/ compiler.

    This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one day have to deal with inputs 1000s of times bigger representing one
    single program.

    You seem to care about speed. In the past buffers of order 4k
    theoretically should give best performance. Namely, buffer
    was sized to be significantly smaller than L1 cache, so that
    reads would not disturb too much content of L1 cache. In
    modern security climate it seems that OS takes any possible
    pretext to flush caches, defeating benefits from small buffers.
    Still, using small buffers is easy, so why use big ones if
    they do not give measurable advantages?

    Because it's extra complexity for no benefit. Lexing code will have to
    deal with the possibility that a token is split across two buffers which
    means checking that on every character, which is much more likely to
    slow things down.

    Well, with reasonable language and compiler organization cost is
    very low: one puts otherwise invalid character after buffer.

    Which currently is seen as end of token. One technique I use is to first identify the span or length of a 'long' token (name, number, string) and
    then to deal separately with that substring (copy to ST, turn into a
    number, etc).

    If I had to use buffering, I'm sure I could find a way of making it work efficiently (I must have done in the past), but it would be more effort
    while not being as efficient or as simple.

    It's convenient these days to treat small files (say files under 100MB)
    as in-memory strings.

    This is from my scripting language:

    s:=readstrfile("/mx/big/fann4.m")
    println s.len

    It reads that same 9MB test file (here already cached). It takes 60ms to
    load that file and create a string which is 9M characters long.

    Well, your file is already in memory (in system cache), so you
    are using memory. And there are other good uses.

    Yes, in most cases the file is already somewhere in memory, just outside
    my program. Because:

    * I've just been editing it
    * It has already been compiled in the last minute or so
    * It has just been downloaded, copied etc.

    So the cost I'm seeing is that of transferring that data (up to 80KB per
    file) from wherever the OS is keeping it, into my memory space.

    I've just restarted my machine. The load time for a 9MB input is:

    * 13/31ms from SSD (two different copies; I don't know the reason
    they're that different)
    * 185ms from HD

    Second time around, load times were 0ms for both (or nearer to 0 than to
    8 or 16ms)
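
    One way to take such timings in portable C11 (just a sketch): a timer
    with a coarse tick will report small loads as 0ms, so a finer wall
    clock matters for numbers like these.

        #include <stdio.h>
        #include <time.h>

        /* Wall-clock timing via C11 timespec_get; resolution is platform
           dependent but usually much finer than a 16ms tick. */
        static double now_ms(void)
        {
            struct timespec ts;
            timespec_get(&ts, TIME_UTC);
            return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
        }

        int main(void)
        {
            double t0 = now_ms();
            /* ... load the file, or run the compile step, here ... */
            double t1 = now_ms();
            printf("elapsed: %.2f ms\n", t1 - t0);
            return 0;
        }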

    This file is about 100 times bigger than my biggest single module. I
    don't imagine that reading a file buffered would make any difference.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sun Nov 13 15:48:16 2022
    On 09/11/2022 14:12, Bart wrote:
    On 09/11/2022 11:54, James Harris wrote:
    On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once;
    job done. If all the 100s of apps on my PC all used the same slow file
    read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of
    hardware so I don't want to assume that the memory size will always be
    greater than the source size - though I can see definite advantages to
    your approach, especially in terms of speed or if the hardware lacks
    paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target is
    some embedded device.

    When I started this I took your view - that compilations (including
    cross compilations) would run on normal PCs. That meant that compilers
    and other build programs would have plenty of resources to play with.
    However, someone pointed out to me that an application which is running
    on a target machine may also want to execute a compile step. I don't
    imagine it would be used much but it's a fair point. Therefore, in my
    case, it makes sense to use only resources which are needed. A program
    which works on a small machine will still work on a full-blown PC
    whereas the converse is not necessarily true.

    ...

    To make a good test of a compiler's performance a load of other things
    would need to be included in the source it had to compile. Such things
    as many identifiers of different lengths, symtab insertions,
    retrievals, scope creation and destruction, different constructs:
    loops, ifs, switches, exceptions, etc. It would make sense also to
    have plenty of inefficient source code for the optimiser to do its
    magic on.

    Such source is probably best generated by a program - which would be
    satisfyingly contrarian. ;)

    Such source is best taken from real programs, as synthesised ones will
    never have that variety. Machine-generated input is best for development
    as you can then concentrate on specific token types, or short vs long
    names, or random vs similar identifiers, that sort of thing.

    Well, I suspect that a human-written program may be too small for an
    indicative speed test. The time taken to compile such source would
    likely be overshadowed by overheads.
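
    As a sketch of the generator idea mentioned above (the emitted syntax
    is made up purely for illustration, not my language or Bart's):

        #include <stdio.h>
        #include <stdlib.h>

        /* Emit N small functions with varied identifier lengths, a loop,
           an if/else, and some redundant arithmetic for an optimiser to
           chew on. The emitted syntax is purely illustrative. */
        int main(int argc, char **argv)
        {
            int n = (argc > 1) ? atoi(argv[1]) : 1000;

            for (int i = 0; i < n; i++) {
                printf("func test_function_%d(count) {\n", i);
                printf("    total_%d := 0\n", i);
                printf("    for k := 1 to count {\n");
                printf("        if k mod 2 = 0 { total_%d := total_%d + k*1 + 0 }\n", i, i);
                printf("        else { total_%d := total_%d + k*2 }\n", i, i);
                printf("    }\n");
                printf("    return total_%d\n", i);
                printf("}\n\n");
            }
            return 0;
        }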


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sun Nov 13 21:28:59 2022
    On 13/11/2022 15:48, James Harris wrote:
    On 09/11/2022 14:12, Bart wrote:
    On 09/11/2022 11:54, James Harris wrote:
    On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
    On 06/11/2022 08:36, James Harris wrote:
    On 05/11/2022 22:10, Bart wrote:

    If so, why? I just use C's `fread()` to read an entire file at once;
    job done. If all the 100s of apps on my PC all used the same slow file
    read methods, the machine would grind to a halt.

    I should say I want my compiler to be able to run on a range of
    hardware so I don't want to assume that the memory size will always
    be greater than the source size - though I can see definite
    advantages to your approach, especially in terms of speed or if the
    hardware lacks paging.

    I think that these days everyone runs their compilers on decent
    hardware, like PCs with plenty of ram and storage. Even if the target
    is some embedded device.

    When I started this I took your view - that compilations (including
    cross compilations) would run on normal PCs. That meant that compilers
    and other build programs would have plenty of resources to play with. However, someone pointed out to me that an application which is running
    on a target machine may also want to execute a compile step. I don't
    imagine it would be used much but it's a fair point. Therefore, in my
    case, it makes sense to use only resources which are needed. A program
    which works on a small machine will still work on a full-blown PC
    whereas the converse is not necessarily true.

    So the requirement when writing any cross-compiler is that the compiler
    should be able to self-host on the same target?

    I'm sceptical as to how useful and how practical that will be.

    Because, it's not just about the source file. You may be able to reduce
    memory requirements for scanning the source file to 4KB, but all the
    other data structures still need to represent the whole file.

    In my compiler, all those data structures, if the memory is never freed
    when no longer needed, add up to approx 30 times the size of the
    source code. About 10 times if considering only the AST and symbol and
    type tables.

    Now you could structure the compiler so that all those outputs are
    written out to files too. Then, congratulations, you've written a 1970s compiler.

    But consider also that the target machine might not have a file system,
    or if it has, it's too small.

    Well, I suspect that a human-written program may be too small for an indicative speed test. The time taken to compile such source would
    likely be overshadowed by overheads.

    The SQLite example is some 250Kloc of actual code in one file.

    My 'Fann' example is 50-100 lines (depending on language) which is
    duplicated for a large line count. So it's semi-synthesised, but look
    at any lines at random and they look like real code.
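
    Producing such a file is trivial; something along these lines would do
    (the filenames and count are only examples, and a real version would
    also rename the functions in each copy, or otherwise avoid clashes):

        #include <stdio.h>
        #include <stdlib.h>

        /* Repeat a small hand-written 'seed' source file N times to make
           a large test input. */
        int main(void)
        {
            const int copies = 1000;            /* pick for the line count wanted */

            FILE *in  = fopen("fann_seed.src", "rb");   /* example filenames */
            FILE *out = fopen("fann_big.src",  "wb");
            if (!in || !out) return 1;

            fseek(in, 0, SEEK_END);
            long size = ftell(in);
            rewind(in);

            char *seed = malloc((size_t)size);
            if (!seed || fread(seed, 1, (size_t)size, in) != (size_t)size)
                return 1;

            for (int i = 0; i < copies; i++)
                fwrite(seed, 1, (size_t)size, out);

            fclose(in);
            fclose(out);
            return 0;
        }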

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Sun Nov 13 22:05:20 2022
    On 13/11/2022 21:28, Bart wrote:
    On 13/11/2022 15:48, James Harris wrote:

    ...

    When I started this I took your view - that compilations (including
    cross compilations) would run on normal PCs. That meant that compilers
    and other build programs would have plenty of resources to play with.
    However, someone pointed out to me that an application which is
    running on a target machine may also want to execute a compile step. I
    don't imagine it would be used much but it's a fair point. Therefore,
    in my case, it makes sense to use only resources which are needed. A
    program which works on a small machine will still work on a full-blown
    PC whereas the converse is not necessarily true.

    So the requirement when writing any cross-compiler is that the compiler should be able to self-host on the same target?

    No, the goal is just to use resources as required! See below.

    I could not quite work out what you meant about self-hosting on the
    same target, and while I used the common term I don't have a specific
    concept of a cross compiler. My build process is intended to be:

    source to IR
    IR adjustments to suit target
    IR to target

    The latter part would be target specific.


    I'm sceptical as to how useful and how practical that will be.

    Because, it's not just about the source file. You may be able to reduce memory requirements for scanning the source file to 4KB, but all the
    other data structures still need to represent the whole file.

    Sure.


    In my compiler, all those data structures, if the memory is never freed
    when no longer needed, add up to approx 30 times the size of the
    source code. About 10 times if considering only the AST and symbol and
    type tables.

    That's interesting. How large would your tree and your composite symbol
    table typically be compared with the source?


    Now you could structure the compiler so that all those outputs are
    written out to files too. Then, congratulations, you've written a 1970s compiler.

    As it happens, my IR and ST are intended to be written out! They are
    meant to be the primary means of program distribution, though I am not
    doing that at the moment.


    But consider also that the target machine might not have a file system,
    or if it has, it's too small.

    I wouldn't see a file system as essential but the executable program has
    to be got into the target machine's memory somehow. Maybe you are
    thinking of some environments which I would find difficult to compile for?


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Mon Nov 14 00:23:14 2022
    On 13/11/2022 22:05, James Harris wrote:
    On 13/11/2022 21:28, Bart wrote:

    In my compiler, all those data structures, if the memory is never
    freed when no longer needed, add up to approx 30 times the size of
    the source code. About 10 times if considering only the AST and symbol
    and type tables.

    That's interesting. How large would your tree and your composite symbol
    table typically be compared with the source?

    I'd expect the AST to take more memory than the source code. For example
    this line:

    a:=b+c*d

    takes 9/10 bytes.

    But this is represented as 7 AST nodes:

    (assign a (add b (mul c d)))

    Each AST node in my static compiler is 64 bytes (this is on x64 where
    pointers are 8 bytes). So that's 448 bytes. ST entries are less relevant
    here as they are shared (there are only 4 in all).

    So, 20-40 times more than the source code for this example. My tests
    showed 10x, because a lot of the source doesn't generate ASTs
    (declarations, comments) and involves more white space, plus usually
    there are keywords and longer identifiers.
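
    As a rough picture of how a node gets to 64 bytes on x64 (the field
    names here are invented for illustration, not the actual layout):

        #include <stdio.h>
        #include <stdint.h>

        /* Illustrative only: one way a fixed-size AST node comes to
           64 bytes when pointers are 8 bytes. */
        typedef struct Node Node;
        struct Node {
            uint16_t tag;          /* assign, add, mul, name, const, ... */
            uint16_t flags;
            uint32_t lineno;       /* source position for diagnostics    */
            Node    *left;         /* operands / children                */
            Node    *right;
            Node    *next;         /* next node in statement lists       */
            void    *symbol;       /* symbol-table entry for names       */
            void    *type;         /* resolved type                      */
            union {
                int64_t ival;      /* constant value                     */
                double  xval;
            } value;
            uint8_t  pad[8];       /* spare fields / padding to 64 bytes */
        };

        int main(void)
        {
            printf("sizeof(Node) = %zu bytes\n", sizeof(Node));  /* 64 on x64 */
            return 0;
        }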

    The next data structure is a representation of x64 code using a list of
    records, one per x64 instruction, each occupying 32 bytes. My expression
    generates 5 instructions, so that's a further 160 bytes.

    The actual machine code generated in the next stage is some 20 bytes,
    when a, b, c, d are stack-frame variables.

    This probably comes across as excessive, but a long repetition of
    `a:=b+c*d` was one of my early tests, and a lot of compilers had severe
    problems with it, including running out of memory.

    My compiler takes 1.5GB for 2M lines of this, of which the above
    accounts for 1.3GB. Some compilers had trouble getting beyond even 20K
    lines.

    The source in this case is 18MB, which is dwarfed by the 1500MB memory
    usage. My compiler turns that 18MB source into a 40MB EXE in about 2
    seconds.
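
    Those per-item figures roughly account for it: 2M lines x 7 AST nodes
    x 64 bytes is about 0.9GB, and 2M x 5 instruction records x 32 bytes
    is about another 0.3GB, so around 1.2GB before counting the symbol
    table and everything else.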

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Tom Lake on Thu Nov 24 14:53:13 2022
    On 24/11/2022 14:34, Tom Lake wrote:
    On Saturday, November 5, 2022 at 11:03:47 AM UTC-4, James Harris wrote:
    Any thoughts on what would be in a good piece of source code to use to
    test a compiler's speed?

    I guess I could do with something that's (1) realistic, not just an
    assignment statement repeated over and over again, (2) large enough to
    give lengthy timings, (3) easy enough to make, and (4) it would be valid
    over multiple generations of the compiler. But what should be in it?

    Here are some comparative timings taken just now. They read and compile
    my parser (as it's the largest source file in the compiler). It is about
    18k in size and from it the compiler produces am asm file of a little
    over 100k.

    cda 566 ms
    cdb 540 ms
    cdc 600 ms
    cdd 24 ms

    The reason for the gratifying jump in performance at the end is that I
    added input and output buffering to cdd but it's got me wondering about
    testing the times taken by future compilers.

    I should say that I don't expect compile times always to improve. The
    run times of later compilers would likely go up as I add facilities and
    down as I switch to other mechanisms. But it would still be something
    I'd like to keep an eye on.


    So you don't care about how fast the generated code runs, instead focusing on how fast the program compiles?
    That seems irrelevant to me since you're going to compile only once or a few times to debug it, but run the program many times (if it's useful).

    I can run my compiler hundreds of times a day so fast edit-run cycles
    are vital during intensive development (I am very impatient, and I
    don't like losing concentration).

    The speed of the generated code is actually less important: for my
    stuff, unoptimised code might only be half the speed of optimised, but
    during development that is largely irrelevant anyway. For production versions
    of working programs, there are ways to get those faster if necessary.

    However, compilation speed can easily vary by 100:1 between fast and
    slow compilers (on one test, but for diverse languages, I estimated
    80,000:1). So, yes, a slow compiler could do with speeding up.

    Complicated languages can hinder that, but I believe the language here,
    like mine, is straightforward to compile.

    As a matter of interest, how long do you typically wait on build-time
    for a project? (And for what sort of line-count.) How long if
    incremental compilation tricks are taken out of play, and a full build
    is needed?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Lake@21:1/5 to James Harris on Thu Nov 24 06:34:16 2022
    On Saturday, November 5, 2022 at 11:03:47 AM UTC-4, James Harris wrote:
    Any thoughts on what would be in a good piece of source code to use to
    test a compiler's speed?

    I guess I could do with something that's (1) realistic, not just an assignment statement repeated over and over again, (2) large enough to
    give lengthy timings, (3) easy enough to make, and (4) it would be valid
    over multiple generations of the compiler. But what should be in it?

    Here are some comparative timings taken just now. They read and compile
    my parser (as it's the largest source file in the compiler). It is about
    18k in size and from it the compiler produces am asm file of a little
    over 100k.

    cda 566 ms
    cdb 540 ms
    cdc 600 ms
    cdd 24 ms

    The reason for the gratifying jump in performance at the end is that I
    added input and output buffering to cdd but it's got me wondering about testing the times taken by future compilers.

    I should say that I don't expect compile times always to improve. The
    run times of later compilers would likely go up as I add facilities and
    down as I switch to other mechanisms. But it would still be something
    I'd like to keep an eye on.


    --
    James Harris
    So you don't care about how fast the generated code runs, instead focusing on how fast the program compiles?
    That seems irrelevant to me since you're going to compile only once or a few times to debug it, but run the program many times (if it's useful).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)