• Re: Hints in the instruction set (was: Redundant prefixes break fsrm ..

    From Thomas Koenig@21:1/5 to Anton Ertl on Sun Nov 19 15:02:05 2023
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.
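
    [For illustration, not part of the original post: a minimal sketch of
    an explicit software prefetch hint in C, using the GCC/Clang
    __builtin_prefetch intrinsic. The array and the prefetch distance of
    16 elements are invented, and on most modern CPUs the hardware
    prefetcher handles a linear scan like this on its own.]

        #include <stddef.h>

        /* Sum an array, hinting the cache 16 elements ahead.  The second
           argument (0) marks the access as a read; the third (3) requests
           high temporal locality. */
        long sum(const long *a, size_t n)
        {
            long s = 0;
            for (size_t i = 0; i < n; i++) {
                if (i + 16 < n)
                    __builtin_prefetch(&a[i + 16], 0, 3);
                s += a[i];
            }
            return s;
        }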

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.

    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Dallman@21:1/5 to Lurndal on Sun Nov 19 16:09:00 2023
    In article <4aq6N.1807$Jbsd.1159@fx03.iad>, scott@slp53.sl.home (Scott
    Lurndal) wrote:

    ARMv8 has a large space reserved for hint instructions, which
    include a wide-ranging set of functionality, such as:

    Some pointer authentication instructions.

    Another is "Branch Target Indicator" (BTI) which has no-op semantics in
    itself. But if BTI traps are enabled, any branch that doesn't arrive at a
    BTI traps as an illegal instruction in the Android implementation I've
    been tinkering with.

    BTI traps on Android are enabled by the OS, using the page table AFAICS,
    for the text segments of executables and shared libraries which are
    marked as compatible. They're marked that way by the LLVM linker if all
    the object files that went into them were marked as compatible; the
    compiler marks the object files if it's given the appropriate option.
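
    [As an illustration of what that compiler option does, a sketch not
    taken from the post, with invented function names: building with
    -mbranch-protection=bti on AArch64 GCC/Clang makes the compiler emit a
    BTI landing pad at every point an indirect branch may legitimately
    land, such as the entry of functions whose address is taken.]

        /* Compile with: cc -mbranch-protection=bti ... (AArch64) */
        typedef int (*handler_t)(int);

        int dispatch(handler_t h, int x)
        {
            /* Indirect call: with BTI enforcement enabled for this page,
               h must point at a BTI landing pad or the call traps. */
            return h(x);
        }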

    John

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Thomas Koenig on Sun Nov 19 15:51:28 2023
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    ARMv8 has a large space reserved for hint instructions, which
    include a wide-ranging set of functionality, such as:

    WFI, WFE (wait for interrupt or event). These are specified
    such that they don't need to actually do anything, but may
    if an implementation, e.g., supports entering low power states
    while waiting.
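
    [A sketch of how such a hint is typically used, with an invented flag
    name and GCC/Clang inline-assembly syntax. Because WFE is allowed to
    do nothing, the loop must re-check its condition either way.]

        #include <stdatomic.h>

        extern atomic_int ready;   /* set by another core, then an event */

        void wait_for_ready(void)
        {
            while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
                __asm__ volatile("wfe" ::: "memory");  /* may enter low power */
        }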

    Barrier instructions, which may be no-ops on some implementations.
    (e.g. trace buffer barrier, exception synchronization barrier, etc).

    YIELD instruction, which may be a no-op on some implementations.

    Some pointer authentication instructions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Thomas Koenig on Sun Nov 19 15:43:17 2023
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case. And I have not heard
    about such proofs for dynamic branch predictors, so static branch
    prediction (hints) may indeed be the way to go.
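
    [A concrete example of a static branch hint at the source level, a
    sketch with an invented rare-path handler: GCC and Clang expose
    __builtin_expect, which some ISAs can map to hint-carrying branch
    encodings and which otherwise steers code layout.]

        void handle_overrun(void);   /* hypothetical rare-path handler */

        void check(int bytes_left)
        {
            if (__builtin_expect(bytes_left < 0, 0))  /* hinted unlikely */
                handle_overrun();
        }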

    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.

    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it. One
    problem that they had was that the usual n-way set-associative caches
    with n>2 don't have proper LRU replacement, but pseudo-LRU; with these
    caches the guarantees degrade to those of a 2-way cache (with the same
    way sizes). I don't remember if they used that for data or for
    instructions.

    I have not heard of any advances in WCET since that time, but maybe I
    just went to the wrong meetings.

    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    When I heard about the cache work, they also talked about the
    processors and what you know about their execution time. IIRC they
    found that most high-performance architectures of the day were
    problematic.

    One other thing I remember is that on some PowerPC CPU one could lock
    certain cache lines in the cache, so they will not be replaced. So if
    you use 6 of your 8 ways (of a 32KB cache with 4KB ways) for locking
    stuff into the cache, and the other two ways for performing analysable
    accesses, it's a lot better than a CPU without a cache.

    Now ARM offers cores with the R profile (e.g., ARMv8-R), where R
    stands for real-time. I have not looked what the properties of these
    cores are. I found it surprising that the big market for them seems
    to be disk drives and such things.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Anton Ertl on Sun Nov 19 16:36:10 2023
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases.

    There is one exception, whose value to society is debatable, but it
    exists nonetheless: High-speed financial trading. These guys (or
    rather, their computers) spend a lot of time analyzing. They make very
    few trades per CPU cycle, but if they do, they want to be fast. So,
    latency for the less-commonly travelled path becomes the main objective.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Anton Ertl on Sun Nov 19 17:21:26 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    One other thing I remember is that on some PowerPC CPU one could lock
    certain cache lines in the cache, so they will not be replaced.

    The Cavium MIPS cores had similar capabilities. They also had
    a mechanism that allowed software to push a complete reserved
    cache line (128 bytes) to an on-chip coprocessor atomically.

    ARMv8 has an optional architectural feature called memory
    partitioning and management (MPAM) that supports cache
    allocation and memory controller bandwidth allocation.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Thomas Koenig on Sun Nov 19 17:22:46 2023
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases.

    There is one exception, whose value to society is debatable, but it
    exists nonetheless: High-speed financial trading. These guys (or
    rather, their computers) spend a lot of time analyzing. They make very
    few trades per CPU cycle, but if they do, they want to be fast. So,
    latency for the less-commonly travelled path becomes the main objective.

    Although they're more generally interested in network latency than
    cache latency, to the extent that they colocate their trading systems
    at the exchange data centers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Thomas Koenig on Sun Nov 19 18:00:15 2023
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases.

    There is one exception, whose value to society is debatable, but it
    exists nonetheless: High-speed financial trading. These guys (or
    rather, their computers) spend a lot of time analyzing. They make very
    few trades per CPU cycle, but if they do, they want to be fast. So,
    latency for the less-commonly travelled path becomes the main objective.

    I don't think that this use fits in the hard RTS frame at all. They
    have no deadline, but want to be as fast as possible, in as many cases
    as possible, i.e., the typical setup for mainstream processors. They
    don't fail if they miss a deadline, they fail if the competitors make
    the trade faster than they do. But failures are not catastrophic.
    It's good enough if they are faster than the majority of the
    competitors the majority of time.

    Concerning the uncommonness of actually making a trade, one way of
    addressing this that comes to my mind is to have a core that just
    performs trades, and waits for it with a spinlock (so this core does
    not have a low clockspeed when the trade comes along). Because it
    only does trades and the spinlock, everything is warmed up for
    trading, there is only one branch miss for coming out of the spinlock.
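
    [A minimal sketch of that setup under invented names: a dedicated,
    pinned core spinning on a flag so its clock, caches, and predictors
    stay warm for the rare trade.]

        #include <stdatomic.h>

        extern atomic_int trade_pending;   /* hypothetical: set by the feed handler */
        extern void execute_trade(void);   /* hypothetical hot path */

        void trading_core_loop(void)
        {
            for (;;) {
                /* Spin at full speed; no wait hint, so no low-power exit cost. */
                while (atomic_load_explicit(&trade_pending,
                                            memory_order_acquire) == 0)
                    ;
                atomic_store_explicit(&trade_pending, 0, memory_order_relaxed);
                execute_trade();   /* the only branch miss is leaving the spin */
            }
        }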

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Scott Lurndal on Sun Nov 19 18:09:31 2023
    Scott Lurndal <scott@slp53.sl.home> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases.

    There is one exception, whose value to society is debatable, but it
    exists nonetheless: High-speed financial trading. These guys (or
    rather, their computers) spend a lot of time analyzing. They make very
    few trades per CPU cycle, but if they do, they want to be fast. So,
    latency for the less-commonly travelled path becomes the main objective.

    Although they're more generally interested in network latency than
    cache latency, to the extent that they colocate their trading systems
    at the exchange data centers.

    True.

    However, if competitors' machines are in the same building, execution
    speed can still play a decisive role...

    (If it was up to me to regulate, I would probably add a mandated
    random delay to each transaction, with audits to prove later that
    this has been applied fairly).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Anton Ertl on Sun Nov 19 19:25:50 2023
    Anton Ertl wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case. And I have not heard
    about such proofs for dynamic branch predictors, so static branch
    prediction (hints) may indeed be the way to go.

    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.

    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it. One
    problem that they had was that the usual n-way set-associative caches
    with n>2 don't have proper LRU replacement, but pseudo-LRU; with these
    caches the guarantees degrade to those of a 2-way cache (with the same
    way sizes). I don't remember if they used that for data or for
    instructions.

    I have not seen real LRU for 2 decades. We mostly use what is called::
    "not recently used" which is a set of n-bits (n==sets):: When then n-th
    bit gets set, all n-bits are cleared.
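
    [A sketch of that scheme in C, with invented names, reading n as the
    number of ways in a set: marking the last unmarked way clears all the
    bits, and the victim is any way whose bit is still clear.]

        #include <stdint.h>

        #define WAYS 8

        struct nru_set { uint8_t used; };  /* bit i set => way i recently used */

        static void nru_touch(struct nru_set *s, int way)
        {
            s->used |= (uint8_t)(1u << way);
            if (s->used == (1u << WAYS) - 1)   /* n-th bit set: clear all */
                s->used = 0;
        }

        static int nru_victim(const struct nru_set *s)
        {
            for (int w = 0; w < WAYS; w++)
                if (!(s->used & (1u << w)))
                    return w;          /* first not-recently-used way */
            return 0;                  /* unreachable: touch resets at all-ones */
        }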

    I have not heard any advances in WCET since that time, but maybe I
    just went to the wrong meetings.

    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    When I heard about the cache work, they also talked about the
    processors and what you know about their execution time. IIRC they
    found that most high-performance architectures of the day were
    problematic.

    Hard Real Time does not like caches if you are within 50% of consuming
    all CPU cycles; and does not like branch predictors or prefetchers.

    One other thing I remember is that on some PowerPC CPU one could lock
    certain cache lines in the cache, so they will not be replaced. So if
    you use 6 of your 8 ways (of a 32KB cache with 4KB ways) for locking
    stuff into the cache, and the other two ways for performing analysable
    accesses, it's a lot better than a CPU without a cache.

    We used to use Cache line locking to take a set of cache lines (say 4)
    and lock one whose data or tag store was "bad" and that line would go
    from n-way set associative to (n-1)-way set associative.

    Now ARM offers cores with the R profile (e.g., ARMv8-R), where R
    stands for real-time. I have not looked what the properties of these
    cores are. I found it surprising that the big market for them seems
    to be disk drives and such things.

    Why would disk drives NEED a Real Time controller ??

    - anton

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to MitchAlsup on Sun Nov 19 12:46:10 2023
    On 11/19/2023 11:25 AM, MitchAlsup wrote:
    Anton Ertl wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant.  For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints?  We have branch predictors.

    * Branch target hints?  We have BTBs and indirect branch predictors.

    * Prefetch instructions?  Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases.  So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case.  And I have not heard
    about such proofs for dynamic branch predictors, so static branch
    prediction (hints) may indeed be the way to go.

    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.

    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it.  One
    problem that they had was that the usual n-way set-associative caches
    with n>2 don't have proper LRU replacement, but pseudo-LRU; with these
    caches the guarantees degrade to those of a 2-way cache (with the same
    way sizes).  I don't remember if they used that for data or for
    instructions.

    I have not seen real LRU for 2 decades. We mostly use what is called::
    "not recently used" which is a set of n-bits (n==sets):: When then n-th
    bit gets set, all n-bits are cleared.

    I have not heard any advances in WCET since that time, but maybe I
    just went to the wrong meetings.

    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    When I heard about the cache work, they also talked about the
    processors and what you know about their execution time.  IIRC they
    found that most high-performance architectures of the day were
    problematic.

    Hard Real Time does not like caches if you are within 50% of consuming
    all CPU cycles; and does not like branch predictors or prefetchers.

    One other thing I remember is that on some PowerPC CPU one could lock
    certain cache lines in the cache, so they will not be replaced.  So if
    you use 6 of your 8 ways (of a 32KB cache with 4KB ways) for locking
    stuff into the cache, and the other two ways for performing analysable
    accesses, it's a lot better than a CPU without a cache.

    We used to use Cache line locking to take a set of cache lines (say 4)
    and lock one whose data or tag store was "bad" and that line would go
    from n-way set associative to (n-1)-way set associative.

    Now ARM offers cores with the R profile (e.g., ARMv8-R), where R
    stands for real-time.  I have not looked what the properties of these
    cores are.  I found it surprising that the big market for them seems
    to be disk drives and such things.

    Why would disk drives NEED a Real Time controller ??

    Caveat. This was all true about 25 years ago when I retired, but may
    have changed.

    Several areas. As the disk spins, you have a limited amount of time
    from when the header comes under the heads to read the header, verify
    the ECC, check if the record number in the header matches the desired
    one and does not represent a defect area to be skipped, and if
    everything is correct, start the transfer into the buffer, or from the
    buffer to the write circuitry. Of course, in this case, time is space,
    so you want to minimize this time so you can minimize the gap space to
    allow maximum use of the track for data.

    Another area is disk arm tracking. Due to runout, the tracks are
    not perfectly circular, and so the head position must be adjusted in
    real time. There are servo bursts spaced periodically around the track
    and the hardware decodes these to provide information to the head
    positioning algorithm to slightly move the head boom in or out a little
    to place it optimally above the data.


    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Niklas Holsti@21:1/5 to Anton Ertl on Mon Nov 20 01:42:10 2023
    On 2023-11-19 17:43, Anton Ertl wrote:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case.


    Yes, for the classical "static" WCET analysis approach. There are other
    "hybrid" or "probabilistic" methods to compute "WCET estimates" that
    should have a very small probability of being exceeded. The idea is that
    once that probability is smaller than the probability of system failure
    from other causes (say, uncorrectable HW failure) the "WCET" estimate is
    good enough.


    And I have not heard
    about such proofs for dynamic branch predictors, so static branch
    prediction (hints) may indeed be the way to go.


    There are published "static" WCET analyses of various kinds of dynamic
    branch predictors, in general analogous to static WCET analyses of
    caches. But I don't know how well they perform.


    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.


    Indeed, and often they also create side channels for security breaches.


    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it.


    Yes, AbsInt GmbH. See www.absint.com for their commercial WCET analysis
    tools. There are also several open-source, non-commercial tools.


    One problem that they had was that the usual n-way set-associative
    caches with n>2 don't have proper LRU replacement, but pseudo-LRU;
    with these caches the guarantees degrade to those of a 2-way cache
    (with the same way sizes). I don't remember if they used that for
    data or for instructions.

    Yep, and some caches even have randomized replacement (which can in fact
    be good for the probabilistic WCET-analysis methods). Another problem is
    the use of unified I+D caches, where the difficulty of statically
    predicting D addresses harms the analysis of I accesses with statically
    known addresses.


    I have not heard any advances in WCET since that time, but maybe I
    just went to the wrong meetings.


    For many years there has been an annual WCET Analysis Workshop in
    connection with the ECRTS conferences (Euromicro Conference on Real-Time
    Systems, https://www.ecrts.org/about-ecrts/).


    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    When I heard about the cache work, they also talked about the
    processors and what you know about their execution time. IIRC they
    found that most high-performance architectures of the day were
    problematic.


    Indeed. The problems are partly due to increasing processor complexity,
    partly to the poor or lacking (unavailable) documentation of processor
    microarchitectures, making their cycle-accurate modelling/analysis
    impossible.

    In response, AbsInt have broadened their tool-set to include hybrid
    measurement-and-analysis WCET tools and approximate static analysers
    for "exploring" the likely execution times of an application, without
    producing a guaranteed WCET bound.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to All on Sun Nov 19 23:28:19 2023
    On Sun, 19 Nov 2023 19:25:50 +0000, mitchalsup@aol.com (MitchAlsup)
    wrote:

    Hard Real Time does not like caches if you are within 50% of consuming
    all CPU cycles; and does not like branch predictors or prefetchers.

    Please forgive me if I am badly mistaken, but it sounds to me like you
    may be conflating "real time" with "real fast".

    Although they often do go hand in hand, "real time" is only about
    meeting deadlines: speed, load, cache behavior, etc., are relevant
    only to the extent that they cause you to miss deadlines.


    I used to do HRT machine vision industrial QA/QC systems. Unless the
    machinery is on fire[*], the conveyor keeps going - so these systems
    were "hard" in the sense that there were absolute deadlines to provide
    results.

    [*] sometimes the conveyor keeps going even if the machinery is on
    fire. 8-)

    Some systems simultaneously checked multiple parts at different stages
    of production and with differing deadlines. Sometimes the objective
    was to waylay a bad part at an upcoming sort station, other times a
    bad part would just continue on down the line and the objective was to
    notify the machine(s) to avoid doing any more work on it.

    At the same time, the systems had to drive graphic displays showing
    the operator what was going on in near real time. This usually took
    the form of one or more (reduced in size) images of actual inspected
    parts overlaid with identified "defects" [colored to distinguish
    warnings from failures], along with runtime counts of passed, failed,
    warned, and total parts so the operator could judge progress of the
    job and performance of their machinery.

    Most systems had to be made to work with already existing machinery,
    so I usually had no control over deadlines and compute intervals - I
    simply had to deal with them. Deadlines ranged from ~20ms at the very
    low end to ~900ms at the very high end. Depending on cameras just
    capturing images could take 16..66ms before computation could even
    start. Often there were multiple threads[*] simultaneously performing
    different inspections on different cameras with different deadlines.

    [*] using "thread" loosely here: some systems really were co-routines
    or multiple processes due to OS/RTS not supporting real threads.


    Naturally, the idea was to do the job using the lowest cost hardware
    possible, and often that meant a SIMD capable Pentium SBC. I had
    systems running on everything from P5-MMX to Pentium 4. No reason to
    serve up a 1GHz Pentium 4 [with SSE(2,3,4.x)] if a 233MHz Pentium II
    [with MMX] would do the job. Often that meant coding multiple
    versions for MMX and SSE(2,3,4.x), or for MMX+FPU and MMX+SSE so
    [where possible] we had a choice to run on different CPUs at different
    price points.
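
    [A sketch of the usual pattern for that kind of per-CPU dispatch; the
    filter functions are invented, and __builtin_cpu_supports is a
    GCC/Clang x86 builtin that postdates the era described.]

        void filter_mmx(const unsigned char *src, unsigned char *dst, int n);
        void filter_sse2(const unsigned char *src, unsigned char *dst, int n);

        void filter(const unsigned char *src, unsigned char *dst, int n)
        {
            if (__builtin_cpu_supports("sse2"))
                filter_sse2(src, dst, n);   /* Pentium 4 and later */
            else
                filter_mmx(src, dst, n);    /* MMX-only parts, e.g. P5-MMX */
        }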

    YMMV,
    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to George Neuner on Mon Nov 20 06:18:11 2023
    George Neuner <gneuner2@comcast.net> schrieb:

    Although they often do go hand in hand, "real time" is only about
    meeting deadlines: speed, load, cache behavior, etc., are relevant
    only to the extent that they cause you to miss deadlines.

    The most succinct definition I have heard of a real-time system
    is that "a late answer is a wrong answer".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Niklas Holsti on Mon Nov 20 08:18:14 2023
    Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
    On 2023-11-19 17:43, Anton Ertl wrote:
    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case.


    Yes, for the classical "static" WCET analysis approach. There are other
    "hybrid" or "probabilistic" methods to compute "WCET estimates" that
    should have a very small probability of being exceeded. The idea is that
    once that probability is smaller than the probability of system failure
    from other causes (say, uncorrectable HW failure) the "WCET" estimate is
    good enough.

    The question is how much we can trust such estimates. If you asked
    experts in nuclear safety in 2010 to estimate the probability of
    having n meltdowns in light-water reactors in one year, they would
    have provided a vanishingly small number for n=3, probably lower than
    the "probability of system failure" you mention. Yet n=3 happened in
    2011, and, I think, if you asked such experts after 2011, they would
    not give such low estimates.

    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it.


    Yes, AbsInt GmbH. See www.absint.com for their commercial WCET analysis
    tools. There are also several open-source, non-commercial tools.

    Yes, that's the company, thanks for reminding me.

    For many years there has been an annual WCET Analysis Workshop in
    connection with the ECRTS conferences (Euromicro Conference on Real-Time
    Systems, https://www.ecrts.org/about-ecrts/).

    This is not my research area, so I only ever heard about this stuff in
    other compiler researcher meetings. Anyway, if they still have
    meetings, I assume there is still progress in that area, although I
    would have to look at it in more detail to see if there is still work
    on static WCET, or if that work has stopped and they are making do
    with probabilistic methods, because the users want more performance
    than can be guaranteed with static WCET methods.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Niklas Holsti@21:1/5 to Anton Ertl on Mon Nov 20 11:51:28 2023
    On 2023-11-20 10:18, Anton Ertl wrote:
    Niklas Holsti <niklas.holsti@tidorum.invalid> writes:
    On 2023-11-19 17:43, Anton Ertl wrote:
    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that youget a better case.


    Yes, for the classical "static" WCET analysis approach. There are other
    "hybrid" or "probabilistic" methods to compute "WCET estimates" that
    should have a very small probability of being exceeded. The idea is that
    once that probability is smaller than the probability of system failure
    from other causes (say, uncorrectable HW failure) the "WCET" estimate is
    good enough.

    The question is how much we can trust such estimates.


    Indeed, and I don't trust them much or at all.


    If you asked experts in nuclear safety in 2010 to estimate the
    probability of having n meltdowns in light-water reactors in one
    year, they would have provided a vanishingly small number for n=3,
    probably lower than the "probability of system failure" you mention.
    Yet n=3 happened in 2011, and, I think, if you asked such experts
    after 2011, they would not give such low estimates.

    I think the probabilistic WCET estimates are, or try to be, on a bit
    more solid ground. They start by measuring the execution times of basic
    code blocks in a suite of tests, followed by constructing the worst-case
    execution path based on the measured block times, with an estimate of
    the probability of exceeding that worst case based on "extreme value
    statistics" (https://en.wikipedia.org/wiki/Extreme_value_theory). But
    this depends on assumptions about the statistics and statistical
    independence of the variations of the execution time of different code
    blocks, which IMO is suspect for conventional processors, but is perhaps
    true for randomized HW such as caches with randomized replacement policies.
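
    [To make the path-construction step concrete, a toy sketch; the
    four-block diamond CFG and its per-block times are invented. Given a
    worst-case or high-percentile time per basic block, the program's
    bound is the longest path through the control-flow DAG.]

        #include <stdio.h>

        #define NBLOCKS 4

        int main(void)
        {
            /* Per-block times in cycles: entry, then-arm, else-arm, exit. */
            const int t[NBLOCKS] = { 10, 50, 30, 5 };
            /* succ[i][j] != 0 means an edge from block i to block j. */
            const int succ[NBLOCKS][NBLOCKS] = {
                { 0, 1, 1, 0 },   /* entry -> then, entry -> else */
                { 0, 0, 0, 1 },   /* then  -> exit */
                { 0, 0, 0, 1 },   /* else  -> exit */
                { 0, 0, 0, 0 },
            };
            int wcet[NBLOCKS] = { t[0] };  /* longest path ending at each block */

            for (int i = 0; i < NBLOCKS; i++)      /* blocks in topological order */
                for (int j = i + 1; j < NBLOCKS; j++)
                    if (succ[i][j] && wcet[i] + t[j] > wcet[j])
                        wcet[j] = wcet[i] + t[j];

            printf("WCET bound: %d cycles\n", wcet[NBLOCKS - 1]);  /* 65 */
            return 0;
        }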

    There are of course several publications of benchmark and case studies
    showing that such "pWCET" estimates were not exceeded in their tests.
    But this does not convince me that the statistical analysis is correct,
    because for non-trivial programs the construction of the worst-case
    execution path usually introduces a lot of pessimism that increases the
    pWCET estimate and may hide the details of the extreme-value theory.


    For many years there has been an annual WCET Analysis Workshop in
    connection with the ECRTS conferences (Euromicro Conference on Real-Time
    Systems. https://www.ecrts.org/about-ecrts/).

    This is not my research area, so I only ever heard about this stuff in
    other compiler researcher meetings. Anyway, if they still have
    meetings, I assume there is still progress in that area, although I
    would have to look at it in more detail to see if there is still work
    on static WCET, or if that work has stopped and they are making do
    with probabilistic methods, because the users want more performance
    than can be guaranteed with static WCET methods.


    One problem is that it is becoming harder to find any processors with
    simple, fixed execution times. Even small microcontrollers often have
    "flash accelerators" that work a bit like instruction caches.

    Current research in static WCET analysis seems focused mostly on
    multi-core processors, with analyses trying to bound inter-core
    interference / blocking caused by shared resources such as buses and
    higher-level caches. The most common approach is some variation of
    Time-Division Multiple Access (TDMA) methods, which imply restrictions
    on task scheduling that are not pleasant for SW developers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to tkoenig@netcologne.de on Mon Nov 20 10:45:33 2023
    On Mon, 20 Nov 2023 06:18:11 -0000 (UTC), Thomas Koenig
    <tkoenig@netcologne.de> wrote:

    George Neuner <gneuner2@comcast.net> schrieb:

    Although they often do go hand in hand, "real time" is only about
    meeting deadlines: speed, load, cache behavior, etc., are relevant
    only to the extent that they cause you to miss deadlines.

    The most succinct definition I have heard of a real-time system
    is that "a late answer is a wrong answer".

    In many systems early answers also are wrong.

    The truth is that a hard real time (HRT) computation has a defined
    time window during which a delivered result is meaningful. Outside of
    that defined window, any result is meaningless.
    [Of course the delivery "window" may start concurrently with beginning
    the computation, but that isn't always the case.]

    Soft real time (SRT) is distinguished from HRT in that a result /may/
    still be meaningful even if delivered late.
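
    [A sketch of that delivery-window notion in C, using the POSIX
    monotonic clock; the window bounds and delivery function are
    placeholders. A result counts only if it lands inside
    [t_open, t_deadline]; anything earlier or later is rejected.]

        #include <stdbool.h>
        #include <time.h>

        static double now_s(void)
        {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec * 1e-9;
        }

        /* Deliver a result only inside its validity window. */
        bool deliver_in_window(double t_open, double t_deadline,
                               void (*deliver)(void))
        {
            double t = now_s();
            if (t < t_open || t > t_deadline)
                return false;   /* early or late: the result is meaningless */
            deliver();
            return true;
        }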

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Monnier@21:1/5 to All on Mon Nov 20 11:03:31 2023
    Yes, for the classical "static" WCET analysis approach. There are other
    "hybrid" or "probabilistic" methods to compute "WCET estimates" that
    should have a very small probability of being exceeded. The idea is that
    once that probability is smaller than the probability of system failure
    from other causes (say, uncorrectable HW failure) the "WCET" estimate is
    good enough.

    I expect there are often other factors in determining the
    desired probability. IOW it's probably(!) more like "the probability is
    low enough that we can tolerate this rate of failure".

    IOW this turns the distinction between "soft" and "hard" real time
    from a yes/no question to a continuum.


    Stefan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to MitchAlsup on Mon Nov 20 17:18:14 2023
    mitchalsup@aol.com (MitchAlsup) writes:
    Anton Ertl wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
    mitchalsup@aol.com (MitchAlsup) writes:

    Anyway, the question is if hint instructions are still relevant. For
    the most part, they seem to have been replaced by history-based
    mechanisms.

    * Branch direction hints? We have branch predictors.

    * Branch target hints? We have BTBs and indirect branch predictors.

    * Prefetch instructions? Hardware prefetchers tend to work better, so
    they fell into disuse.

    All can be interesting for real-time systems, which react to
    rare occurrences, and where performance for these matters (and
    does not for the normal case).

    The things I have heard about hard real-time systems (RTS) and
    worst-case execution time (WCET) analysis for hard RTS is that all
    cases have to be within the deadline, including uncommon cases. So
    you have to consider the worst case for, e.g., branch prediction,
    unless you can prove that you get a better case. And I have not heard
    about such proofs for dynamic branch predictors, so static branch
    prediction (hints) may indeed be the way to go.

    Suddenly, all the things done for optimizing in hardware in the
    general case (branch prediction, cache eviction, ...) can make
    performance for the unusual, but relevant, case worse.

    Actually, for caches with LRU replacement I have heard that they can
    be analysed; that was research coming out of Saarbruecken ~20 years
    ago, and I think they did a spin-off for commercializing it. One
    problem that they had was that the usual n-way set-associative caches
    with n>2 don't have proper LRU replacement, but pseudo-LRU; with these
    caches the guarantees degrade to those of a 2-way cache (with the same
    way sizes). I don't remember if they used that for data or for
    instructions.

    I have not seen real LRU for 2 decades. We mostly use what is called::
    "not recently used" which is a set of n-bits (n==sets):: When then n-th
    bit gets set, all n-bits are cleared.

    I have not heard any advances in WCET since that time, but maybe I
    just went to the wrong meetings.

    I believe there is active research going on on how to overcome this
    "bias for the common case" with today's processors.

    When I heard about the cache work, they also talked about the
    processors and what you know about their execution time. IIRC they
    found that most high-performance architectures of the day were
    problematic.

    Hard Real Time does not like caches if you are within 50% of consuming
    all CPU cycles; and does not like branch predictors or prefetchers.

    One other thing I remember is that on some PowerPC CPU one could lock
    certain cache lines in the cache, so they will not be replaced. So if
    you use 6 of your 8 ways (of a 32KB cache with 4KB ways) for locking
    stuff into the cache, and the other two ways for performing analysable
    accesses, it's a lot better than a CPU without a cache.

    We used to use Cache line locking to take a set of cache lines (say 4)
    and lock one whose data or tag store was "bad" and that line would go
    from n-way set associative to (n-1)-way set associative.

    Now ARM offers cores with the R profile (e.g., ARMv8-R), where R
    stands for real-time. I have not looked what the properties of these
    cores are. I found it surprising that the big market for them seems
    to be disk drives and such things.

    Why would disk drives NEED a Real Time controller ??

    Managing the heads. Error correction. et alia.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Niklas Holsti@21:1/5 to George Neuner on Mon Nov 20 22:33:38 2023
    On 2023-11-20 6:28, George Neuner wrote:
    On Sun, 19 Nov 2023 19:25:50 +0000, mitchalsup@aol.com (MitchAlsup)
    wrote:

    Hard Real Time does not like caches if you are within 50% of consuming
    all CPU cycles; and does not like branch predictors or prefetchers.

    Please forgive me if I am badly mistaken, but it sounds to me like you
    may be conflating "real time" with "real fast".


    The hard real-time (HRT) domain can be further divided into critical
    and non-critical. Typically, SW for a critical HRT system must be
    validated, perhaps even certified, which requires proof or strong
    arguments that
    deadlines will be met /always/.

    A non-critical HRT system is one where the consequences of deadline
    misses can be tolerated, if such misses are not too frequent, and so one
    can get by with less strict validation. For example, some years ago I
    saw a presentation of an ASML photolithography machine where the SW
    certainly had HRT requirements but where a deadline miss typically led
    to only one of the chips on the wafer being damaged (and later
    discarded), a relatively small loss.

    The SW in that ASML machine did "dry runs" of the processes to "warm up"
    the caches before the actual exposures, and of course monitored deadline
    misses so that it knew which chip(s) might have to be discarded.


    Although they often do go hand in hand, "real time" is only about
    meeting deadlines: speed, load, cache behavior, etc., are relevant
    only to the extent that they cause you to miss deadlines.


    In critical HRT systems, dynamic "accelerators" such as caches and
    predictors are relevant also because they make it much harder to
    verify/prove that deadlines will always be met, for example by making
    static WCET analysis harder or impractical and by making execution-time
    measurements more variable and less dependable.


    I used to do HRT machine vision industrial QA/QC systems. Unless the
    machinery is on fire[*], the conveyor keeps going - so these systems
    were "hard" in the sense that there were absolute deadlines to provide
    results.


    But (I assume) people did not die if a deadline was missed, so I would
    call this a non-critical HRT system.


    Some systems simultaneously checked multiple parts at different stages
    of production and with differing deadlines [...]

    At the same time, the systems had to drive graphic displays showing
    the operator what was going on in near real time. [...]

    Most systems had to be made to work with already existing machinery,
    so I usually had no control over deadlines and compute intervals - I
    simply had to deal with them. Deadlines ranged from ~20ms at the very
    low end to ~900ms at the very high end. Depending on cameras just
    capturing images could take 16..66ms before computation could even
    start. Often there were multiple threads[*] simultaneously performing
    different inspections on different cameras with different deadlines.


    Those deadlines are fairly relaxed. If you have lots of processor
    margin, you can tolerate the unpredictability of caches etc. in a
    non-critical HRT system.

    I'm not saying that you had an easy job -- from your description it was
    certainly complex and demanding, especially as you had to find
    lowest-cost suitable HW -- but it seems you did not have to prove that
    deadlines would always be met, just demonstrate, by testing, that they
    were very rarely missed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stefan Monnier@21:1/5 to All on Mon Nov 20 18:07:02 2023
    The hard real-time (HRT) domain can be further divided into critical
    and non-critical.

    I like to use music players as an example of real-time, since if you're
    late (even by just a few ms), the result is a failure.
    But the expected harm is just that you may lose users/customers if it
    happens too often.


    Stefan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Stefan Monnier on Tue Nov 21 00:46:59 2023
    Stefan Monnier wrote:

    The hard real-time (HRT) domain can be further divided into critical and
    non-critical.

    I like to use music players as an example of real-time, since if you're
    late (even by just a few ms), the result is a failure.

    And yet, modern music is re-timed from the original human-produced sounds
    such that each note is perfectly aligned with its supposed time. This gives
    the sound an artificial and mechanical tonality even if it is "more
    perfect". Almost all of this re-timing is at the millisecond level.

    But the expected harm is just that you may lose users/customers if it
    happens too often.


    Stefan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)