minforth <minforth@gmx.net> writes:
>Most CPUs have operators for register-based count-down loops
>that are blazingly fast.
Which "operators" do you have in mind, and what do you mean by
"blazingly fast"?
Anyway, we have discussed this repeatedly, e.g., in
<2022Feb13.231208@mips.complang.tuwien.ac.at>, which I wrote in reply to your
posting <f4b89e0b-2ded-4b18-8dc1-bba6dcda47bbn@googlegroups.com>, and where I
cited earlier discussions of the topic:
|"minf...@arcor.de" <minforth@arcor.de> writes:
|[...]
|>F.ex. match NEXT efficiently to x_86 processor LOOP instruction (counter in _CX register)
|>and you'll happily count down from 5 to 1.
|Yes, but why would one do this? As we have established in an earlier
|discussion (see below), the LOOP instruction is typically not faster
|than a sequence of simpler instructions:
|
|<2018Jun6.184616@mips.complang.tuwien.ac.at>:
||minforth@arcor.de writes:
||>FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.
||>Should do speedy enough. ;-)
||
||Have you measured it? I have
||<2017Mar14.183125@mips.complang.tuwien.ac.at>
||<2017Mar15.141411@mips.complang.tuwien.ac.at> and compared the
||following loops:
||
||.L5:                .L5:
||  subq $1, %rax       loop .L5
||  jne .L5
||
||I found that for these loops Sandy Bridge, Haswell, and Skylake take
||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when
||using jne.
|
|<2018Jun7.141731@mips.complang.tuwien.ac.at>:
||cycles for 1000 iterations
||K10          Excavator        Zen
||Phenom II    Athlon X4 845    Ryzen 1600X
||3021         1314             1051         loop
||2020         1484             1051         sub; jne
||2026         1489             1053         add; cmp; jne
|
|There is no performance advantage on modern AMD and Intel CPUs for the
|instruction LOOP over a good implementation of the Forth word LOOP (as
|in the third example).
>If they can be used within Forth-based loop constructs
>I would expect a greater speed increase than what you measured.
You obviously ignore repeated refutations of your claims of superior
performance for LOOP-instruction-based counted loops. Maybe you
should implement and measure such a counted loop yourself and compare
it to the LOOP word on SwiftForth and VFX Forth.
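For anyone who wants to take up that suggestion, here is a minimal sketch of such a measurement, assuming x86-64, GCC or Clang extended inline assembly (AT&T syntax), and POSIX clock_gettime. It writes the three loop shapes compared above ("loop", "sub; jne", and "add; cmp; jne") as C functions; the function names are hypothetical, and this is not how SwiftForth or VFX Forth actually compile LOOP. Timing is in wall-clock nanoseconds, not the core cycles quoted above.

```c
/* Sketch only: x86-64 + GCC/Clang extended asm assumed.
   Function names are hypothetical illustrations. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std=c99/c11 */
#endif
#include <stdint.h>
#include <time.h>

/* x86 LOOP instruction: decrements RCX and branches while RCX != 0,
   so with n = 5 the body runs with RCX = 5, 4, 3, 2, 1. */
static uint64_t count_with_loop(uint64_t n)
{
    uint64_t iterations = 0;
    uint64_t counter = n;            /* "+c" pins this to RCX */
    if (n == 0)
        return 0;                    /* LOOP with RCX = 0 would wrap to 2^64 - 1 */
    __asm__ volatile(
        "1:\n\t"
        "incq %0\n\t"                /* loop body: count one iteration */
        "loop 1b"                    /* RCX -= 1; jump back if RCX != 0 */
        : "+r"(iterations), "+c"(counter)
        :
        : "cc");
    return iterations;
}

/* Count down with sub; jne (the second loop in the table). */
static uint64_t count_with_sub_jne(uint64_t n)
{
    uint64_t iterations = 0;
    uint64_t counter = n;
    if (n == 0)
        return 0;
    __asm__ volatile(
        "1:\n\t"
        "incq %0\n\t"                /* loop body */
        "subq $1, %1\n\t"            /* counter -= 1; sets ZF when it hits 0 */
        "jne 1b"
        : "+r"(iterations), "+r"(counter)
        :
        : "cc");
    return iterations;
}

/* Count up an index to a limit with add; cmp; jne, the shape of the
   third loop in the table (a DO..LOOP-style index/limit comparison). */
static uint64_t count_with_add_cmp_jne(uint64_t n)
{
    uint64_t iterations = 0;
    uint64_t index = 0;
    if (n == 0)
        return 0;
    __asm__ volatile(
        "1:\n\t"
        "incq %0\n\t"                /* loop body */
        "addq $1, %1\n\t"            /* index += 1 */
        "cmpq %2, %1\n\t"            /* compare index with limit n */
        "jne 1b"
        : "+r"(iterations), "+r"(index)
        : "r"(n)
        : "cc");
    return iterations;
}

typedef uint64_t (*counted_loop)(uint64_t);

/* Wall-clock nanoseconds per iteration. Not core cycles: the clock
   frequency may change (turbo, power saving) during the run. */
static double ns_per_iteration(counted_loop f, uint64_t n)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    uint64_t done = f(n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return done ? ns / (double)done : 0.0;
}
```

For example, compare ns_per_iteration(count_with_loop, 100000000) with ns_per_iteration(count_with_sub_jne, 100000000). Each body contains an extra incq so that the iteration count can be checked, which the loops measured above did not have; for a cycle-level comparison, a hardware cycle counter (e.g., perf stat on Linux) is more appropriate than wall-clock time.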
- anton