Any thoughts on what would be in a good piece of source code to use to
test a compiler's speed?
I guess I could do with something that's (1) realistic, not just an assignment statement repeated over and over again, (2) large enough to
give lengthy timings, (3) easy enough to make, and (4) it would be valid
over multiple generations of the compiler. But what should be in it?
Here are some comparative timings taken just now. They read and compile
my parser (as it's the largest source file in the compiler). It is about
18k in size and from it the compiler produces an asm file of a little
over 100k.
cda 566 ms
cdb 540 ms
cdc 600 ms
cdd 24 ms
The reason for the gratifying jump in performance at the end is that I
added input and output buffering to cdd but it's got me wondering about testing the times taken by future compilers.
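For the curious, the change was roughly the following shape (a minimal C sketch of the idea, not my actual compiler code; the names are made up). Instead of one OS read per character, a fixed buffer is refilled in bulk and characters are handed out from memory:

    #include <stdio.h>

    #define BUFSIZE 4096

    static FILE *src;                    /* the file being compiled */
    static unsigned char buf[BUFSIZE];
    static size_t buflen, bufpos;        /* bytes in buf / next byte out */

    /* One fread() refills the buffer; everything else is memory access. */
    static int next_char(void) {
        if (bufpos >= buflen) {
            buflen = fread(buf, 1, BUFSIZE, src);
            bufpos = 0;
            if (buflen == 0) return EOF; /* end of file or read error */
        }
        return buf[bufpos++];
    }

    int main(int argc, char **argv) {
        if (argc < 2 || !(src = fopen(argv[1], "rb"))) return 1;
        long n = 0;
        while (next_char() != EOF) n++;  /* stand-in for the real lexer */
        printf("%ld characters\n", n);
        return 0;
    }

Output buffering is the mirror image: append generated text to a buffer and flush it with one write when it fills.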
I should say that I don't expect compile times always to improve. The
run times of later compilers would likely go up as I add facilities and
down as I switch to other mechanisms. But it would still be something
I'd like to keep an eye on.
On 05/11/2022 15:03, James Harris wrote:
Any thoughts on what would be in a good piece of source code to use to
test a compiler's speed?
I guess I could do with something that's (1) realistic, not just an
assignment statement repeated over and over again, (2) large enough to
give lengthy timings, (3) easy enough to make, and (4) it would be
valid over multiple generations of the compiler. But what should be in
it?
Here are some comparative timings taken just now. They read and
compile my parser (as it's the largest source file in the compiler).
It is about 18k in size and from it the compiler produces an asm file
of a little over 100k.
cda 566 ms
cdb 540 ms
cdc 600 ms
cdd 24 ms
The reason for the gratifying jump in performance at the end is that I
added input and output buffering to cdd but it's got me wondering
about testing the times taken by future compilers.
I should say that I don't expect compile times always to improve. The
run times of later compilers would likely go up as I add facilities
and down as I switch to other mechanisms. But it would still be
something I'd like to keep an eye on.
Your figures are too confused to comment on meaningfully. For a start,
I'd like to know the reason for that 25 times speedup! What was it
spending 96% of its time doing in those earlier versions?
I don't believe there's any real buffering involved in 18KB input.
But I'm also used to working with line counts rather than file sizes;
how many lines was the input, and how many lines was the output?
Generating ASM source is usually somewhat slower than writing binary,
but it should still be fast enough (it becomes an issue if you want the fastest possible speeds).
I can tell you that my current compiler manages some 700K lines per
second (this is source code to EXE) on the test file in my second link
below.
If I tell it to generate ASM, it drops to 260K lines per second (taking
2.8 seconds to generate 2.2M lines). (The input file is 9MB, and the
generated ASM is 87MB; the EXE was 8MB.)
Since I now once again support a C target, I can accelerate it further
by some 30-40%, so just pushing 1Mlps on that first test.
But ultimately, compilation just has to be fast enough not to be a
nuisance (beyond that it's just a sport).
Your 24ms sounds fine. 600ms would be mildly annoying for me, but is
still tolerable. For a while I ran an /interpreted/ compiler taking up
to half a second per module, but that was with independent compilation.
Since I now work with whole-program compilers, the speed is a lot more important. Currently, no applications of mine take more than 100ms, once file-caching comes into play.
Here are links to tests that I'm sure I posted before:
https://github.com/sal55/langs/blob/master/Compilertest1.md
https://github.com/sal55/langs/blob/master/Compilertest3.md
These were all done on a slower machine. My current one (where I can get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I think
is still low-end.
On 05/11/2022 16:51, Bart wrote:
Your figures are too confused to comment on meaningfully. For a start,
I'd like to know the reason for that 25 times speedup! What was it
spending 96% of its time doing in those earlier versions?
In a word, IO. It was reading and writing one character at a time -
which was enough to start with. You may remember we discussed this a few years ago:
https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ
I don't believe there's any real buffering involved in 18KB input.
It was definitely buffering. With the new compiler if I change the
buffer size to 1 then it takes as long as it did before.
bufsize, approx time in ms
1, 650
2, 340
If I tell it to generate ASM, it drops to 260K lines per second
(taking 2.8 seconds to generate 2.2M lines). (The input file is 9MB,
and the generated ASM is 87MB; the EXE was 8MB.)
Based on the above I make mine 32k lines per second.
These were all done on a slower machine. My current one (where I can
get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
think is still low-end.
Surprisingly, that processor doesn't come up at
https://www.cpubenchmark.net/singleCompare.php
From /proc/cpuinfo I have the following.
Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz
On 05/11/2022 17:54, James Harris wrote:
On 05/11/2022 16:51, Bart wrote:
Your figures are too confused to comment on meaningfully. For a
start, I'd like to know the reason for that 25 times speedup! What
was it spending 96% of its time doing in those earlier versions?
In a word, IO. It was reading and writing one character at a time -
which was enough to start with. You may remember we discussed this a
few years ago:
https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ
I don't believe there's any real buffering involved in 18KB input.
It was definitely buffering. With the new compiler if I change the
buffer size to 1 then it takes as long as it did before.
bufsize, approx time in ms
1, 650
2, 340
I assumed that such a small file would be loaded in one go by the OS
anyway. Then any calls you do, even reading a character at a time, would
read from the OS's in-memory buffer.
So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
That's roughly the transfer rate from a floppy disk! Yet this is memory
to memory on a modern PC with GHz clock rates. Something funny is going on.
On 05/11/2022 20:15, Bart wrote:
On 05/11/2022 17:54, James Harris wrote:
On 05/11/2022 16:51, Bart wrote:
Your figures are too confused to comment on meaningfully. For a
start, I'd like to know the reason for that 25 times speedup! What
was it spending 96% of its time doing in those earlier versions?
In a word, IO. It was reading and writing one character at a time -
which was enough to start with. You may remember we discussed this a
few years ago:
https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ
I don't believe there's any real buffering involved in 18KB input.
It was definitely buffering. With the new compiler if I change the
buffer size to 1 then it takes as long as it did before.
bufsize, approx time in ms
1, 650
2, 340
I assumed that such a small file would be loaded in one go by the OS
anyway. Then any calls you do, even reading a character at a time,
would read from the OS's in-memory buffer.
So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
That's roughly the transfer rate from a floppy disk! Yet this is
memory to memory on a modern PC with GHz clock rates. Something funny
is going on.
I should have read your link first.
There I compared the speed to a
serial port at 2400 baud. At least it has improved from that!
I think with source files, unless you're trying to run on a restricted machine that only has a few KB of RAM, just grab the whole file at once
into a buffer or a string directly accessible by your lexer.
On 05/11/2022 20:15, Bart wrote:
So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
That's roughly the transfer rate from a floppy disk! Yet this is
memory to memory on a modern PC with GHz clock rates. Something funny
is going on.
Don't forget all the context switching to and from kernel mode - for
both read (18k) and write (100k).
A loop like this:
to 18'000 do
c:=fgetc(f)
od
in /interpreted/ code (calling the C function) is too fast to measure.
Two points:
1. That only reads - c. 18k. To be a fair test you would also have to
write c. 100k.
2. I'd expect fgetc to be buffered.
You can see the difference if you run under strace. It will show the individual syscalls. Without buffering there should be something like
118,000 reads/writes. With buffering of 512 there should only be about
230 such syscalls - about 1/500th of the number. I suspect that kernel
calls and returns is where a lot of the time will be going if there's no buffering.
Still not convinced? Take a look at dd. When run on a file with 112,000 bytes:
time dd if=infile of=/dev/null bs=1
time dd if=infile of=/dev/null bs=512
The first takes 370 ms. The second just 5ms. QED, I think. :)
Perhaps more interesting is where an individual compiler spends its
time: how long in lexing, statement parsing, expression parsing, IR generation, IR alterations, optimisation, code gen, etc. At some point I
may add code to gather such info.
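If I do, the instrumentation might look something like this (a sketch assuming C and POSIX clock_gettime(); the phase names are illustrative, not my actual pipeline):

    #include <stdio.h>
    #include <time.h>

    enum { PH_LEX, PH_PARSE, PH_IRGEN, PH_OPT, PH_CODEGEN, PH_COUNT };

    static const char *phase_name[PH_COUNT] =
        { "lexing", "parsing", "IR gen", "optimise", "code gen" };
    static double phase_ms[PH_COUNT];    /* accumulated time per phase */

    static double now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
    }

    /* Each pass gets bracketed with now_ms() calls; the demo below
       times a busy loop as if it were the parsing pass. */
    int main(void) {
        double t = now_ms();
        volatile long s = 0;             /* stand-in for a real pass */
        for (long i = 0; i < 10000000; i++) s += i;
        phase_ms[PH_PARSE] += now_ms() - t;

        for (int i = 0; i < PH_COUNT; i++)
            printf("%-10s %8.2f ms\n", phase_name[i], phase_ms[i]);
        return 0;
    }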
On 05/11/2022 17:54, James Harris wrote:
On 05/11/2022 16:51, Bart wrote:
Your figures are too confused to comment on meaningfully. For a
start, I'd like to know the reason for that 25 times speedup! What
was it spending 96% of its time doing in those earlier versions?
In a word, IO. It was reading and writing one character at a time -
which was enough to start with. You may remember we discussed this a
few years ago:
https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ
I don't believe there's any real buffering involved in 18KB input.
It was definitely buffering. With the new compiler if I change the
buffer size to 1 then it takes as long as it did before.
bufsize, approx time in ms
1, 650
2, 340
I assumed that such a small file would be loaded in one go by the OS
anyway. Then any calls you do, even reading a character at a time, would
read from the OS's in-memory buffer.
So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
That's roughly the transfer rate from a floppy disk! Yet this is memory
to memory on a modern PC with GHz clock rates. Something funny is going on.
A loop like this:
to 18'000 do
c:=fgetc(f)
od
in /interpreted/ code (calling the C function) is too fast to measure.
But reading a 7.8MB input file with such a loop, a character at a time
in scripting code, takes 0.47 seconds.
If I tell it to generate ASM, it drops to 260K lines per second
(taking 2.8 seconds to generate 2.2M lines). (The input file is 9MB,
and the generated ASM is 87MB; the EXE was 8MB.)
Based on the above I make mine 32k lines per second.
This is the sort of speed of compilers like gcc. Is yours still written
in Python? I thought you had it self-hosted.
These were all done on a slower machine. My current one (where I can
get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
think is still low-end.
Surprisingly, that processor doesn't come up at
https://www.cpubenchmark.net/singleCompare.php
From /proc/cpuinfo I have the following.
Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz
Probably because it's 3650U not 2650U. The rating on that site shows it
at 3900 compared with 3050 of your device.
On 05/11/2022 21:21, James Harris wrote:
On 05/11/2022 20:15, Bart wrote:
So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
That's roughly the transfer rate from a floppy disk! Yet this is
memory to memory on a modern PC with GHz clock rates. Something funny
is going on.
Don't forget all the context switching to and from kernel mode - for
both read (18k) and write (100k).
Yeah but, why?
Still not convinced? Take a look at dd. When run on a file with
112,000 bytes:
time dd if=infile of=/dev/null bs=1
time dd if=infile of=/dev/null bs=512
The first takes 370 ms. The second just 5ms. QED, I think. :)
OK. TBH I still don't understand what's going on (I don't know what 'dd'
is). Are you specifically doing file reads with the lowest-level system calls possible, and requesting that it's all done a character at a time?
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
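In outline, that is all there is to it (a sketch of the idea, not my actual routine):

    #include <stdio.h>
    #include <stdlib.h>

    /* Read a whole file into one malloc'd buffer, NUL-terminated so a
       lexer can walk it directly. Returns NULL on any failure. */
    char *read_whole_file(const char *name, long *lenp) {
        FILE *f = fopen(name, "rb");
        if (!f) return NULL;
        fseek(f, 0, SEEK_END);
        long len = ftell(f);
        rewind(f);
        char *buf = malloc(len + 1);
        if (buf && fread(buf, 1, len, f) == (size_t)len) {
            buf[len] = '\0';          /* terminator doubles as a sentinel */
            if (lenp) *lenp = len;
            fclose(f);
            return buf;
        }
        free(buf);
        fclose(f);
        return NULL;
    }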
Perhaps more interesting is where an individual compiler spends its
time: how long in lexing, statement parsing, expression parsing, IR
generation, IR alterations, optimisation, code gen, etc. At some point
I may add code to gather such info.
Here are some figures from a recent test (any smaller input, is too fast
to measure accurately):
Load modules 0 msec Load all sources (here, 9MB mostly in 1 file)
Parsing 216 msec Parse and create AST
Name Resolve 47 msec Scan AST and resolve names
Type Analysis 84 msec
Codegen 131 msec To x64 representation
'SS' 138 msec To x64 machine code
Write EXE 8 msec (varies 0-16 ms) Build EXE image & write (7.6MB)
Total ~630 msec
This benefits from file-caching. Maybe the OS has unfinished business
with committing the EXE to disk after it returns from an `fclose` call;
I don't know. The elapsed time is some 0.7 seconds for the whole job.
File-loading should simply not be an issue. Usually a compiler is run on
a file that has just been edited, so it should still be in memory.
Compilations from 'cold' are uncommon. But file ops are anyway not under
your control (or maybe they are for you!).
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once;
job done. If all the 100s of apps on my PC all used the same slow file
read methods, the machine would grind to a halt.
If your parser needs to backtrack in the source, that makes sense. But as
shown in my prior post there's no appreciable speed advantage.
Perhaps more interesting is where an individual compiler spends its
time: how long in lexing, statement parsing, expression parsing, IR
generation, IR alterations, optimisation, code gen, etc. At some
point I may add code to gather such info.
Here are some figures from a recent test (any smaller input, is too
fast to measure accurately):
Load modules 0 msec Load all sources (here, 9MB mostly in 1 file)
Are you sure that's not mapping the file into virtual memory rather than
reading it?
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once;
job done. If all the 100s of apps on my PC all used the same slow file
read methods, the machine would grind to a halt.
If your parser needs to backtrack in the source, that makes sense. But as shown in my prior post there's no appreciable speed advantage.
Backtracking isn't the reason. In a fast tokeniser, you want to traverse source code by simply incrementing a pointer, for example:
doswitch lxsptr++^
when 'a'..'z', '_', '$' then
...
What you don't want is to have to call into the OS for every character.
A basic lexer (recognising tokens but not doing identifier lookups) can
get through some 300Mcps on my machine.
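In C terms the shape is roughly this (an illustrative fragment, not the actual lexer):

    #include <ctype.h>

    /* Walk a NUL-terminated, in-memory source buffer by bumping a
       pointer: no OS calls and no bounds checks in the hot loop. */
    void scan(const char *src) {
        const char *p = src;
        for (;;) {
            char c = *p++;
            if (isalpha((unsigned char)c) || c == '_' || c == '$') {
                const char *start = p - 1;
                while (isalnum((unsigned char)*p) || *p == '_') p++;
                (void)start;  /* name token is [start, p); look it up here */
            } else if (c == '\0') {
                break;        /* sentinel: end of source */
            }
            /* ... other character classes: digits, strings, operators ... */
        }
    }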
On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once;
job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or if the hardware lacks paging.
Understood. Presumably you find the beginning and end of the token and then, where appropriate, copy it from there to a symbol table (or have the symbol table point at it where it stands).
I am assuming from our other discussion that lxsptr++^ retrieves the character which lxsptr points at and then advances lxsptr to point to the next character position.
What you don't want is to have to call into the OS for every character.
For sure. If each syscall adds 2 or 3 microseconds, say, to the work required, they can soon add up.
A basic lexer (recognising tokens but not doing identifier lookups) can
get through some 300Mcps on my machine.
Surely such an algorithm may come down to little more than how fast the machine can scan memory.
To make a good test of a compiler's performance a load of other things would need to be included in the source it had to compile. Such things as many identifiers of different lengths, symtab insertions, retrievals, scope creation and destruction, different constructs: loops, ifs, switches, exceptions, etc. It would make sense also to have plenty of inefficient source code for the optimiser to do its magic on.
Such source is probably best generated by a program - which would be satisfyingly contrarian. ;)
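Something along these lines, say (a sketch in C which emits a generic curly-brace style; a real generator would emit the language under test and much more variety):

    #include <stdio.h>

    /* Emit lots of functions with identifiers of varying length, loops
       and branches, to exercise the symbol table and code generator. */
    int main(void) {
        for (int f = 0; f < 1000; f++) {
            /* identifier length varies from 1 to 20 characters */
            printf("int func_%d_%.*s(int a, int b) {\n",
                   f, f % 20 + 1, "xxxxxxxxxxxxxxxxxxxx");
            printf("    int total = 0;\n");
            printf("    for (int i = 0; i < %d; i++) {\n", f % 100 + 1);
            printf("        if (i %% 2) total += a * i; else total -= b;\n");
            printf("    }\n");
            printf("    return total;\n");
            printf("}\n");
        }
        return 0;
    }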
On 09/11/2022 11:54, James Harris wrote:
On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of
hardware so I don't want to assume that the memory size will always be
greater than the source size - though I can see definite advantages to
your approach, especially in terms of speed or if the hardware lacks
paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target is
some embedded device.
Even the original Raspberry Pi had some 0.5GB IIRC.
I'm not planning to run my tools on any 64KB or smaller system anytime
soon. I passed that stage decades ago.
So, what is the largest source file you're ever likely to have to deal
with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
PC's memory.
The largest input I deal with is 50,000 lines which represents all the
source files of the application, for my /whole-program/ compiler.
This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one
day have to deal with inputs 1000s of times bigger representing one
single program.
Also bear in mind that a smaller target usually means a correspondingly smaller program.
Bart <bc@freeuk.com> wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or if the hardware lacks paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target is
some embedded device.
Even the original Raspberry Pi had some 0.5GB IIRC.
I'm not planning to run my tools on any 64KB or smaller system anytime
soon. I passed that stage decades ago.
So, what is the largest source file you're ever likely to have to deal
with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
PC's memory.
The largest input I deal with is 50,000 lines which represents all the
source files of the application, for my /whole-program/ compiler.
This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be
ludicrous to start using tiny buffers on the off-chance that I would one
day have to deal with inputs 1000s of times bigger representing one
single program.
You seem to care about speed. In the past, buffers of around 4K
should theoretically have given the best performance: the buffer
was sized to be significantly smaller than the L1 cache, so that
reads would not disturb too much of the L1 cache's contents. In
the modern security climate it seems that the OS takes any possible
pretext to flush caches, defeating the benefit of small buffers.
Still, using small buffers is easy, so why use big ones if
they do not give measurable advantages?
On 10/11/2022 19:20, antispam@math.uni.wroc.pl wrote:
Bart <bc@freeuk.com> wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or if the hardware lacks paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target is some embedded device.
Even the original Raspberry Pi had some 0.5GB IIRC.
I'm not planning to run my tools on any 64KB or smaller system anytime
soon. I passed that stage decades ago.
So, what is the largest source file you're ever likely to have to deal
with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
PC's memory.
The largest input I deal with is 50,000 lines which represents all the
source files of the application, for my /whole-program/ compiler.
This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be
ludicrous to start using tiny buffers on the off-chance that I would one day have to deal with inputs 1000s of times bigger representing one
single program.
You seem to care about speed. In the past, buffers of around 4K
should theoretically have given the best performance: the buffer
was sized to be significantly smaller than the L1 cache, so that
reads would not disturb too much of the L1 cache's contents. In
the modern security climate it seems that the OS takes any possible
pretext to flush caches, defeating the benefit of small buffers.
Still, using small buffers is easy, so why use big ones if
they do not give measurable advantages?
Because it's extra complexity for no benefit. Lexing code will have to
deal with the possibility that a token is split across two buffers which means checking that on every character, which is much more likely to
slow things down.
I remember /having/ to use such techniques over 30 years ago, because
memory was limited. Now there is tons of memory, but we still have to
access the file system using 4KB buffers within our apps?
The largest source file I have is 80KB, which is 0.001% of the memory on
my PC already. Now you say I should access that 80KB in 4KB chunks; 4KB
is 0.00005% of my machine's 8GB. It seems that it was pretty pointless
having all that memory, as I'm not allowed to use it!
Sorry, but I can't see the point. This would only affect loadtime
(getting a file's contents into /my/ buffer), which is already too fast
to measure: about 30ms to read 90MB, so perhaps 30us to read my 80KB in
one go.
Bart <bc@freeuk.com> wrote:
On 10/11/2022 19:20, antispam@math.uni.wroc.pl wrote:
Bart <bc@freeuk.com> wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of hardware so I don't want to assume that the memory size will always be greater than the source size - though I can see definite advantages to your approach, especially in terms of speed or if the hardware lacks paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target is some embedded device.
Even the original Raspberry Pi had some 0.5GB IIRC.
I'm not planning to run my tools on any 64KB or smaller system anytime soon. I passed that stage decades ago.
So, what is the largest source file you're ever likely to have to deal with? Even a 1Mloc file will only be 30-40MB, barely 1% of a typical
PC's memory.
The largest input I deal with is 50,000 lines which represents all the source files of the application, for my /whole-program/ compiler.
This totals approx 1MB, or 1/8000th of my machine's RAM. So it would be ludicrous to start using tiny buffers on the off-chance that I would one day have to deal with inputs 1000s of times bigger representing one
single program.
You seem to care about speed. In the past, buffers of around 4K
should theoretically have given the best performance: the buffer
was sized to be significantly smaller than the L1 cache, so that
reads would not disturb too much of the L1 cache's contents. In
the modern security climate it seems that the OS takes any possible
pretext to flush caches, defeating the benefit of small buffers.
Still, using small buffers is easy, so why use big ones if
they do not give measurable advantages?
Because it's extra complexity for no benefit. Lexing code will have to
deal with the possibility that a token is split across two buffers which
means checking that on every character, which is much more likely to
slow things down.
Well, with a reasonable language and compiler organization the cost is
very low: one puts an otherwise-invalid character after the buffer.
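That is, something like this (a sketch in C):

    #include <stdio.h>

    #define BUFSIZE 4096
    static char buf[BUFSIZE + 1];     /* one extra byte for the sentinel */

    /* Refill the buffer and plant a character that cannot occur inside
       a token (here NUL). The lexer's inner loops then never test for
       the buffer end; only the code that actually meets the sentinel
       asks whether to refill or stop. */
    static size_t refill(FILE *f) {
        size_t n = fread(buf, 1, BUFSIZE, f);
        buf[n] = '\0';
        return n;
    }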
Well, your file is already in memory (in the system cache), so you
are using the memory anyway. And there are other good uses for it.
On 09/11/2022 11:54, James Harris wrote:
On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once; job done. If all the 100s of apps on my PC all used the same slow file read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of
hardware so I don't want to assume that the memory size will always be
greater than the source size - though I can see definite advantages to
your approach, especially in terms of speed or if the hardware lacks
paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target is
some embedded device.
To make a good test of a compiler's performance a load of other things
would need to be included in the source it had to compile. Such things
as many identifiers of different lengths, symtab insertions,
retrievals, scope creation and destruction, different constructs:
loops, ifs, switches, exceptions, etc. It would make sense also to
have plenty of inefficient source code for the optimiser to do its
magic on.
Such source is probably best generated by a program - which would be
satisfyingly contrarian. ;)
Such source is best taken from real programs, as synthesised ones will
never have that variety. Machine-generated input is best for development
as you can then concentrate on specific token types, or short vs long
names, or random vs similar identifiers, that sort of thing.
On 09/11/2022 14:12, Bart wrote:
On 09/11/2022 11:54, James Harris wrote:
On Sunday, 6 November 2022 at 14:58:06 UTC, Bart wrote:
On 06/11/2022 08:36, James Harris wrote:
On 05/11/2022 22:10, Bart wrote:
If so, why? I just use C's `fread()` to read an entire file at once;
job done. If all the 100s of apps on my PC all used the same slow file
read methods, the machine would grind to a halt.
I should say I want my compiler to be able to run on a range of
hardware so I don't want to assume that the memory size will always
be greater than the source size - though I can see definite
advantages to your approach, especially in terms of speed or if the
hardware lacks paging.
I think that these days everyone runs their compilers on decent
hardware, like PCs with plenty of ram and storage. Even if the target
is some embedded device.
When I started this I took your view - that compilations (including
cross compilations) would run on normal PCs. That meant that compilers
and other build programs would have plenty of resources to play with. However, someone pointed out to me that an application which is running
on a target machine may also want to execute a compile step. I don't
imagine it would be used much but it's a fair point. Therefore, in my
case, it makes sense to use only resources which are needed. A program
which works on a small machine will still work on a full-blown PC
whereas the converse is not necessarily true.
Well, I suspect that a human-written program may be too small for an indicative speed test. The time taken to compile such source would
likely be overshadowed by overheads.
On 13/11/2022 15:48, James Harris wrote:
When I started this I took your view - that compilations (including
cross compilations) would run on normal PCs. That meant that compilers
and other build programs would have plenty of resources to play with.
However, someone pointed out to me that an application which is
running on a target machine may also want to execute a compile step. I
don't imagine it would be used much but it's a fair point. Therefore,
in my case, it makes sense to use only resources which are needed. A
program which works on a small machine will still work on a full-blown
PC whereas the converse is not necessarily true.
So the requirement when writing any cross-compiler is that the compiler should be able to self-host on the same target?
I'm sceptical as to how useful and how practical that will be.
Because it's not just about the source file. You may be able to reduce memory requirements for scanning the source file to 4KB, but all the
other data structures still need to represent the whole file.
In my compiler, all those data structures, if the memory is never freed
when no longer needed, add up to approx 30 times the size of the
source code. About 10 times if considering only the AST and symbol and
type tables.
Now you could structure the compiler so that all those outputs are
written out to files too. Then, congratulations, you've written a 1970s compiler.
But consider also that the target machine might not have a file system,
or if it has, it's too small.
On 13/11/2022 21:28, Bart wrote:
In my compiler, all those data structures, if the memory is never
freed when no longer needed, add up to approx 30 times the size of
the source code. About 10 times if considering only the AST and symbol
and type tables.
That's interesting. How large would your tree and your composite symbol
table typically be compared with the source?
On Saturday, November 5, 2022 at 11:03:47 AM UTC-4, James Harris wrote:
Any thoughts on what would be in a good piece of source code to use to
test a compiler's speed?
I guess I could do with something that's (1) realistic, not just an
assignment statement repeated over and over again, (2) large enough to
give lengthy timings, (3) easy enough to make, and (4) it would be valid
over multiple generations of the compiler. But what should be in it?
Here are some comparative timings taken just now. They read and compile
my parser (as it's the largest source file in the compiler). It is about
18k in size and from it the compiler produces an asm file of a little
over 100k.
cda 566 ms
cdb 540 ms
cdc 600 ms
cdd 24 ms
The reason for the gratifying jump in performance at the end is that I
added input and output buffering to cdd but it's got me wondering about
testing the times taken by future compilers.
I should say that I don't expect compile times always to improve. The
run times of later compilers would likely go up as I add facilities and
down as I switch to other mechanisms. But it would still be something
I'd like to keep an eye on.
So you don't care about how fast the generated code runs, instead focusing on how fast the program compiles?
That seems irrelevant to me since you're going to compile only once or a few times to debug it, but run the program many times (if it's useful).