anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a long-running program.
Generally hardware folks don't run 'long-running programs' when
analyzing performance, they use the emulator for determining latencies, bandwidths and efficacy of cache coherency algorithms and
cache prefetchers.
Their target is not application analysis.
scott@slp53.sl.home (Scott Lurndal) writes:
Their target is not application analysis.
This sounds like hardware folks that are only concerned with
memory-bound programs.
scott@slp53.sl.home (Scott Lurndal) writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a long-running program.
Generally hardware folks don't run 'long-running programs' when
analyzing performance, they use the emulator for determining latencies, bandwidths and efficacy of cache coherency algorithms and
cache prefetchers.
Their target is not application analysis.
This sounds like hardware folks that are only concerned with
memory-bound programs.
I OTOH expect that designers of out-of-order (and in-order) cores
analyse the performance of various programs to find out where the
bottlenecks of their microarchitectures are in benchmarks and
applications that people look at to determine which CPU to buy. And
that's why we have PMCs not just for memory accesses, but also
for branch prediction accuracy, functional unit utilization, scheduler utilization, etc.
- anton
Anton Ertl wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
scott@slp53.sl.home (Scott Lurndal) writes:
The biggest demand is from the OS vendors. Hardware folks have simulation and emulators.
You don't want to use a full-blown microarchitectural emulator for a long-running program.
Generally hardware folks don't run 'long-running programs' when
analyzing performance, they use the emulator for determining latencies, bandwidths and efficacy of cache coherency algorithms and
cache prefetchers.
Their target is not application analysis.
This sounds like hardware folks that are only concerned with
memory-bound programs.
I OTOH expect that designers of out-of-order (and in-order) cores
analyse the performance of various programs to find out where the
bottlenecks of their microarchitectures are in benchmarks and
applications that people look at to determine which CPU to buy. And
that's why we have PMCs not just for memory accesses, but also
for branch prediction accuracy, functional unit utilization, scheduler
utilization, etc.
Quit being so CPU-centric.
You also need measurements of how many of which transactions flow across
the bus, DRAM use analysis, and PCIe usage to fully tune the system.
- anton