On 2024-06-12 0:14, Stefan Monnier wrote:
>> Not always. If the mistakenly speculated cache-fetch /evicted/ some other
>> data from the (finite-sized) cache, and the evicted data are needed later on
>> the /true/ execution path, the mistakenly speculated fetch has a /negative/
>> effect on performance. (This kind of "timing anomaly" is very bothersome in
>> static WCET analysis.)
>
> BTW, I heard that nowadays "static WCET analysis" is typically based on
> statistical models of the performance of various parts of the code.
> So they don't give an ironclad worst case any more (contrary to what
> you'd get back in the day where you could count cycles and there were no
> caches), but rather one that is "very unlikely" to be wrong.
>
> I remember papers doing "real WCET" by analyzing the code to detect
> those memory accesses which were guaranteed to be cache hits. Do you
> know if this is still done nowadays?

I haven't followed the WCET-analysis field closely for a few years, so I
may not be up to date. But here is my understanding, with apologies for
the long text:
Fully static, non-probabilistic WCET analysis is no longer possible,
practical or profitable for most "big" processors because of their
complex microarchitectures, with out-of-order execution and speculation
being the most bothersome parts. The difficulty of getting accurate
information about the microarchitectures is a large factor, too, as is
the cost of creating a model of such a microarchitecture, compared to
the small market for such WCET tools for a particular microarchitecture.
For in-order processors, static analysis of cache hits and misses is entirely feasible and has a great deal of good theory behind it. It works quite
well for I-caches, but for D-caches it is much harder and more
pessimistic because of the difficulty of predicting the addresses of
data accesses, which depends a lot on the nature of the application
being analyzed. Cache analysis also becomes harder in the presence of preemption, multiple cores and multiple cache levels. However, any
static WCET analysis tool with ambitions to give safe WCET bounds will certainly include static cache analysis, and can then give a safe upper
bound on the WCET, but in some cases it may be quite pessimistic (over-estimated).
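
To give a concrete feel for that kind of analysis, here is a tiny, purely
illustrative Python sketch (not any real tool's code) of the classical
"must" analysis for a single fully associative LRU cache set: the abstract
state records an upper bound on each block's LRU age, a merge point keeps
only blocks that are cached on both incoming paths, and an access whose
block is still in the abstract state can be classified as "always hit".
The associativity, the block names and the access trace are all invented
for the example.

# Minimal sketch of "must" cache analysis for one fully associative LRU
# set, in the style of abstract-interpretation cache analysis.  Names,
# parameters and the trace are illustrative only.

ASSOC = 4  # assumed cache associativity (number of ways)

def access(state, block):
    """Abstract transfer function for a reference to 'block'.

    state: dict mapping memory block -> upper bound on its LRU age
    (0 = most recently used).  Blocks not in the dict may or may not
    be cached.  Returns (classification, new_state)."""
    hit = "always hit" if block in state else "not classified"
    old_age = state.get(block, ASSOC)      # ASSOC acts as "maybe evicted"
    new_state = {}
    for b, age in state.items():
        if b == block:
            continue
        # Blocks younger than the accessed block age by one; older blocks
        # keep their bound.  Anything reaching ASSOC may have been evicted.
        new_age = age + 1 if age < old_age else age
        if new_age < ASSOC:
            new_state[b] = new_age
    new_state[block] = 0                   # accessed block is now youngest
    return hit, new_state

def join(s1, s2):
    """Merge point: a block is definitely cached only if it is definitely
    cached on both incoming paths; take the worse (larger) age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

# Tiny example trace: the 2nd and 3rd accesses to 'A' are proved to hit.
state = {}
for blk in ["A", "B", "A", "C", "A"]:
    cls, state = access(state, blk)
    print(blk, cls)
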
Some time ago, even before the complex microarchitectures became an
acute problem, a class of "WCET" tools was developed that rely on a
combination of execution-time measurements of the basic code blocks,
static analysis of the control flow graphs and possible execution paths,
and a statistical method called extreme-value statistics that aims to
estimate the probability of outlier values (extreme and unlikely values)
for the sum of the observed distributions of the execution times of the
basic blocks. These methods do not need a microarchitecture model for
the processor, but do need a way to measure basic-block execution times
(which is problematic on an OoO, speculative processor) and a "good
enough" test suite. AIUI, these tools are successful and widely used.
However, it is not clear that the probability they compute for exceeding
their WCET "bound" is accurate, because the statistical theory depends
on the distributions of the execution times of the individual basic code
blocks being independent and uncorrelated, which is false for the
ordinary sort of microarchitectures.
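
For illustration, the extreme-value step in such a measurement-based tool
might look roughly like the Python sketch below: fit a generalized extreme
value distribution to block maxima of measured execution times and read off
the time that the fitted model says is exceeded only with some very small
probability. The measurements here are synthetic, the block size and the
exceedance probability are arbitrary, and real tools add the independence
and goodness-of-fit checks that, as said above, are the weak point.

# Illustrative sketch of the extreme-value step in measurement-based
# probabilistic WCET estimation.  All numbers are made up; the result is
# a model-based estimate, not a safe bound.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pretend these are measured end-to-end execution times (cycles) of the
# program over a "good enough" test suite.
measurements = 10_000 + rng.gamma(4.0, 50.0, size=5000)

# Block-maxima approach: split the measurements into blocks and keep the
# maximum of each block, which is what the GEV distribution models.
block_size = 50
maxima = measurements[: len(measurements) // block_size * block_size]
maxima = maxima.reshape(-1, block_size).max(axis=1)

# Fit a generalized extreme value distribution to the block maxima.
shape, loc, scale = stats.genextreme.fit(maxima)

# "pWCET" estimate: execution time exceeded with probability 1e-9
# (per block of runs), according to the fitted model.
exceed_prob = 1e-9
pwcet = stats.genextreme.isf(exceed_prob, shape, loc=loc, scale=scale)
print(f"observed max : {measurements.max():.0f} cycles")
print(f"pWCET @ 1e-9 : {pwcet:.0f} cycles (model-based, not a safe bound)")
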
To overcome that problem, there has been work to create "randomized" HW
components, for example randomized caches, for which statistical models of
each component can be built and for which the statistics of different
components are, by design and construction, independent and uncorrelated.
However, I don't know how far this work has progressed,
and how completely a processor can be randomized in this sense. A couple
of years ago I saw that one practical application of this work simply
used ordinary hardware but proposed to pick a new random memory layout
(link map) of the application every time the application was loaded and started, which seems a cop-out.
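
To show what "randomized" means here, a random-placement cache replaces
the usual modulo set index with a keyed hash of the address, re-keyed at
every boot, so that which addresses conflict is a matter of chance rather
than a fixed consequence of the memory layout. The toy Python below is
only meant to illustrate that idea; the hash, the sizes and the addresses
are made up.

# Toy illustration of random cache placement, not any real cache design.
import random

SET_BITS = 6                       # assumed: 64 cache sets
LINE = 64                          # assumed: 64-byte cache lines

def make_index_fn(seed):
    # New random key for each "boot"; a real design would use a HW RNG.
    key = random.Random(seed).getrandbits(64)
    def index(addr):
        line = addr // LINE
        # Keyed mixing of the line number; real proposals use dedicated
        # hash hardware, this multiply-and-shift is just for illustration.
        x = ((line ^ key) * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
        return x >> (64 - SET_BITS)
    return index

# These addresses are 4 KiB apart: a conventional modulo index maps all
# of them to the same set on every run; here the conflicts change per boot.
addrs = [0x1000, 0x2000, 0x3000, 0x4000]
for boot in range(3):
    index = make_index_fn(seed=boot)
    print(f"boot {boot}:", {hex(a): index(a) for a in addrs})
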
Recently a new type of "WCET" tool has appeared that does not claim to
do more than "estimate" or "explore" the execution time of an
application on a processor that is described only by its visible
architecture (instruction set) and some capacity and performance
numbers. I don't know exactly how these tools work, but it seems likely
that they use a simplified, in-order, non-speculative model of the
processor and can then apply the classical "static WCET" analysis
methods. They don't claim to provide an accurate or even probabilistic
WCET bound, but rather help the user understand which parts of the
application are likely to use most of the time, and give some confidence
that the application will be fast enough on the real processor. This may
be what most developers want and are content with.
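
If they indeed work that way, the final calculation could be as simple as
the classical "structural" combination of per-block time bounds: sum for
sequences, max over branch alternatives, and loop bound times body cost for
loops (real static tools more often use the IPET/ILP formulation). The
Python sketch below uses entirely invented block costs and loop bounds,
just to show the shape of the computation.

# Toy sketch of the classical "structural" WCET calculation.  The block
# costs and the loop bound are hypothetical, e.g. as they might come from
# a simplified in-order processor model.

def seq(*parts):                      # sequential composition
    return sum(parts)

def branch(then_cost, else_cost, test_cost=0):
    return test_cost + max(then_cost, else_cost)

def loop(body_cost, max_iterations):
    return body_cost * max_iterations

# Hypothetical per-block cycle bounds for a small program:
#   init; for i in range(100): { if cond: fast else: slow }; finish
BLOCK = {"init": 20, "test": 4, "fast": 10, "slow": 35, "finish": 15}

wcet = seq(
    BLOCK["init"],
    loop(branch(BLOCK["fast"], BLOCK["slow"], BLOCK["test"]), 100),
    BLOCK["finish"],
)
print(f"structural WCET bound: {wcet} cycles")  # 20 + 100*(4+35) + 15 = 3935
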