Forum: >>> Magnum BBS <<<

More of my philosophy about the network topology of Intel Xeon and AMD

From Amine Moulay Ramdane@21:1/5 to All on Fri Sep 9 13:05:24 2022

Hello,

More of my philosophy about the network topology of Intel Xeon and
AMD Epyc and more of my thoughts..

I am a white Arab from Morocco, and I think I am smart, since I have also invented many scalable algorithms and other algorithms.

I think I am highly smart, since I have passed two certified IQ tests and scored above 115 IQ. I have just looked at the network topology of the AMD Epyc CPU, here it is:

https://www.anandtech.com/show/11551/amds-future-in-servers-new-7000-series-cpus-launched-and-epyc-analysis/2

Reading the above article carefully, I notice that the different CCXs on the same die, and the cores within the same CCX, are connected by Infinity Fabric in a much more sophisticated manner than a simple bus topology, and I think this makes the AMD Epyc CPU good at "scalability", as is Intel Xeon. I also think that such CPUs efficiently minimize the number of hops between sockets etc., which is why I think the number of hops doesn't go higher than 2, and the latency of two hops is not so problematic. So I think it is good news, since I think that AMD Epyc, Intel Xeon and the like are not using the methodology of the following paper, which uses filters to reduce bus traffic:

I have just read the following interesting paper about Scaling SMP Machines Through Hierarchical Snooping, and I invite you to read it:

https://pages.cs.wisc.edu/~kola/projectreports/cs757.pdf

More of my philosophy about the network topology in multicore CPUs..

I invite you to look at the following video:

Ring or Mesh, or other? AMD's Future on CPU Connectivity

https://www.youtube.com/watch?v=8teWvMXK99I&t=904s

And i invite you to read the following article:

Does an AMD Chiplet Have a Core Count Limit?

Read more here:

https://www.anandtech.com/show/16930/does-an-amd-chiplet-have-a-core-count-limit

I think I am smart, and I say that the above video and the above article
are not so smart, so I will talk about a very important thing.
Read the following:

Performance Scalability of a Multi-core Web Server

https://www.researchgate.net/publication/221046211_Performance_scalability_of_a_multi-core_web_server

So notice carefully that it is saying the following:

"..we determined that performance scaling was limited by the capacity of
the address bus, which became saturated on all eight cores. If this key obstacle is addressed, commercial web server and systems software are well-positioned to scale to a large number of cores."

So as you notice, they were using an 8-core Intel Xeon, and the application was scalable to 8x but the hardware was not, since it scaled only to 4.8x. This was caused by address bus saturation: the address bus carries requests and responses for data, called snoops, and more caches mean more sources and more destinations for snoops, which causes the poor scaling. So a ring bus or a plain bus topology was not sufficient to scale to 8x on an Intel Xeon with 8 cores.

So I think that newer architectures like the Epyc and Threadripper CPUs can use a faster bus and/or a different network topology that ensures full scalability both locally in the same node and globally between the nodes. A sophisticated mesh network topology not only reduces the number of hops inside the CPU for good latency, it is also good for reliability because of its redundancy, and it is faster than previous topologies like the ring bus or the bus, since for example the search on the address bus becomes parallelized. It looks like the Internet, which uses a mesh topology with routers, so it parallelizes.

I also think that using a more sophisticated topology like a mesh is related to queuing theory. In operational research, the mathematics says that we can make a queue like M/M/1 more efficient by making the server more powerful, but the knee of an M/M/1 queue is around 50% utilization. With a mesh topology, on the Internet or inside a CPU, parallelizing more both moves the knee of the queue to a higher utilization and speeds up the execution of the transactions. It is like using many servers in queuing theory, and it permits better scaling inside a CPU or on the Internet.
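
The queuing argument above can be checked numerically. What follows is only a minimal sketch using the textbook M/M/1 and M/M/c (Erlang C) formulas; the function names and the example rates are assumptions chosen for illustration, they do not come from the paper or the articles above, and a real CPU interconnect is not literally an M/M/c queue.

```python
from math import factorial


def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def erlang_c(servers, offered_load):
    """Probability that an arriving job must wait in an M/M/c queue (Erlang C)."""
    utilization = offered_load / servers
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    top = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mmc_response_time(arrival_rate, service_rate, servers):
    """Mean time in system for an M/M/c queue: waiting time plus one service time."""
    offered_load = arrival_rate / service_rate
    p_wait = erlang_c(servers, offered_load)
    waiting = p_wait / (servers * service_rate - arrival_rate)
    return waiting + 1.0 / service_rate


if __name__ == "__main__":
    # Assumed example: every server handles 1 job per time unit,
    # and both systems run at the same 50% utilization.
    print("M/M/1 at 50%:", mm1_response_time(0.5, 1.0))
    print("M/M/4 at 50%:", mmc_response_time(2.0, 1.0, 4))
```

With these assumed rates, the single server takes 2.0 service times per job at 50% utilization, while four parallel servers at the same utilization take about 1.09, which is the sense in which parallel paths push the knee toward higher utilization.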

More of my philosophy about silicon chip fabrication and technology and more of my thoughts..

The atoms used in silicon chip fabrication are around 0.2 nm in size,
so I think that we can make a transistor out of one atom. You can
read about it here:

Scientists create new recipe for single-atom transistors

https://www.sciencedaily.com/releases/2020/05/200511092920.htm

So I think this gives an exponential growth of scalability, with EUV (extreme ultraviolet lithography) or similar technology, of around 2^5. After that, I think we can go to 3D or to superconductor computer chips (read about them in my thoughts below), or use the following inventions. Read about them carefully in my following writing and thoughts:

More of my philosophy about latency and contention and concurrency and parallelism and more of my thoughts..

I think I am highly smart, and I have just posted (read it below) about the two new inventions that will make logic gates thousands of times or a million times faster than those in existing computers. I think that there is still a problem with those new inventions, and it is about latency and concurrency, since you need concurrency, and you need preemptive or non-preemptive scheduling of the coroutines. HBM has a latency of 106.7 ns, DDR4 has a latency of 73.3 ns, and the AMD 3D V-Cache has almost the same latency cost, so this kind of latency is still costly. There is also the latency of the time slice that a coroutine takes to execute. This kind of latency and time slice is a waiting time that looks like the time wasted on contention in parallelism, so by logical analogy it creates something like a contention that reduces scalability. I think this is why those new inventions have this kind of limit or constraint in a "concurrency" environment.
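
To put the latency point in numbers, here is a rough sketch of how many clock cycles pass while one memory access is outstanding. The two latency figures are the ones given above; the 3 GHz and 1 PHz clock rates are assumptions picked only for illustration, and the function name is hypothetical.

```python
def stalled_cycles(latency_ns, clock_hz):
    """Clock cycles that elapse during one memory access of the given latency."""
    return latency_ns * 1e-9 * clock_hz


# Latencies quoted in the post above.
LATENCIES_NS = {"HBM": 106.7, "DDR4": 73.3}

# Assumed clock rates: a present-day core and a hypothetical petahertz-scale gate.
CLOCKS_HZ = {"3 GHz (assumed)": 3e9, "1 PHz (assumed)": 1e15}

if __name__ == "__main__":
    for memory, latency_ns in LATENCIES_NS.items():
        for clock_name, clock_hz in CLOCKS_HZ.items():
            cycles = stalled_cycles(latency_ns, clock_hz)
            print(f"{memory} at {clock_name}: about {cycles:,.0f} cycles per access")
```

Under these assumptions, the faster the logic, the more cycles each memory access costs, so the waiting time dominates unless something else runs in the meantime.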

And i invite you to read my following smart thoughts about preemptive and non-preemptive timesharing:

https://groups.google.com/g/alt.culture.morocco/c/JuC4jar661w

More of my philosophy about Fastest-ever logic gates and more of my thoughts..

"Logic gates are the fundamental building blocks of computers, and researchers at the University of Rochester have now developed the fastest ones ever created. By zapping graphene and gold with laser pulses, the new logic gates are a million times faster
than those in existing computers, demonstrating the viability of “lightwave electronics.”. If these kinds of lightwave electronic devices ever do make it to market, they could be millions of times faster than today’s computers. Currently we measure
processing speeds in Gigahertz (GHz), but these new logic gates function on the scale of Petahertz (PHz). Previous studies have set that as the absolute quantum limit of how fast light-based computer systems could possibly get."

Read more here:

https://newatlas.com/electronics/fastest-ever-logic-gates-computers-million-times-faster-petahertz/

Read my following news:

And with the following new discovery computers and phones could run thousands of times faster..

Prof Alan Dalton in the School of Mathematical and Physical Sciences at the University of Sussex, said:

"We're mechanically creating kinks in a layer of graphene. It's a bit like nano-origami.

"Using these nanomaterials will make our computer chips smaller and faster. It is absolutely critical that this happens as computer manufacturers are now at the limit of what they can do with traditional semiconducting technology. Ultimately, this will
make our computers and phones thousands of times faster in the future.

"This kind of technology -- "straintronics" using nanomaterials as opposed to electronics -- allows space for more chips inside any device. Everything we want to do with computers -- to speed them up -- can be done by crinkling graphene like this."

Dr Manoj Tripathi, Research Fellow in Nano-structured Materials at the University of Sussex and lead author on the paper, said:

"Instead of having to add foreign materials into a device, we've shown we can create structures from graphene and other 2D materials simply by adding deliberate kinks into the structure. By making this sort of corrugation we can create a smart electronic
component, like a transistor, or a logic gate."

The development is a greener, more sustainable technology. Because no additional materials need to be added, and because this process works at room temperature rather than high temperature, it uses less energy to create.

Read more here:

https://www.sciencedaily.com/releases/2021/02/210216100141.htm

But I think that mass production of graphene still hasn't quite begun, so I think that the inventions above, the fastest-ever logic gates that use graphene and the nanomaterial one that also uses graphene, will not be fully commercialized until perhaps around the year 2035 or 2040. Read the following to understand why:

"Because large-scale mass production of graphene still hasn't quite begun , the market is a bit limited. However, that leaves a lot of room open for investors to get in before it reaches commercialization.

The market was worth $78.7 million in 2019 and, according to Grand View Research, is expected to rise drastically to $1.08 billion by 2027.

North America currently has the bulk of market share, but the Asia-Pacific area is expected to have the quickest growth in adoption of graphene uses in coming years. North America and Europe are also expected to have above-market average growth.

The biggest driver of all this growth is expected to be the push for cleaner, more efficient energy sources and the global reduction of emissions in the air."

Read more here:

https://www.energyandcapital.com/report/the-worlds-next-rare-earth-metal/1600

More of my philosophy about superconductor computer chips and more of my thoughts..

"Scientists from the University of Virginia School of Medicine and collaborators used the building blocks of life to potentially revolutionize electronics.

Edward H. Egelman, Ph.D. and his colleagues say their new DNA-enabled method could have a wide range of research applications in physics and materials science. Crucially, it could lead to the creation of Little's room-temperature superconductor, which
could help revolutionize electronics. Their work, combined with other recent breakthroughs in superconductors, could unlock the great potential of quantum computing — which would, in turn, vastly improve countless scientific fields with its hyper-fast
calculations."

Read more here:

https://interestingengineering.com/innovation/engineering-breakthrough-dna-quantum-computing

So i think that superconductor computer chips will be possible in the future, and computer chips with superconducting circuits — circuits with zero electrical resistance — would be 50 to 100 times as energy-efficient as today’s chips, an attractive
trait given the increasing power consumption of the massive data centers that power the Internet’s most popular sites. And superconducting chips also promise greater processing power: Superconducting circuits have been clocked at a Terahertz.

More of my philosophy about energy efficiency of quantum computers and
about IBM's Compute-in-memory and more of my thoughts..

I think I am smart, and I will say that IBM has invented the following compute-in-memory specialized hardware, which works in both analogue and digital, so as to lower energy consumption much more. Read about it here:

IBM: Compute-in-memory beats GPUs by 10x, sometimes

https://www.electronicsweekly.com/news/research-news/ibm-compute-memory-beats-gpus-10x-sometimes-2021-11/

But I think that a much better solution than the above compute-in-memory specialized hardware from IBM is to use powerful quantum computers, and I invite you to read the following article about it:

Quantum computers vastly outperform supercomputers when it comes to energy efficiency

https://physicsworld.com/a/quantum-computers-vastly-outperform-supercomputers-when-it-comes-to-energy-efficiency/

Also i have just looked at the following article about the
benchmark of Intel Xeon Scalable Processor vs. Nvidia V100 GPU,
here it is:

https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-intel-xeon-scalable-processor-vs-nvidia-v100-gpu/

So I think that the main problem of the Intel Xeons in the above benchmark is memory bandwidth. I think that the GFLOPs figure of the Intel Xeons in the above benchmark is the result of multiplying the CPU frequency by the number of cores and by 2x8 FMA, I mean fused multiply-add (FMA) instructions for floating-point scalar and SIMD operations, and it gives a result of 2,240 GFLOPs. So if you want a powerful computer that also has good memory bandwidth, I advise you to use a new two-socket system for new Intel Xeon processors that support a memory bandwidth of something like 5.2 GT/s for DDR5 x 8 bytes per channel x 12 channels per socket. That equals 499.2 GB per second for one socket, or 998.4 GB per second for two sockets, which about equals the memory bandwidth of the Nvidia V100 PCIe (Volta) in the above benchmark, and this will solve the memory bandwidth problem. And of course the two-socket motherboard with two new 64-core Intel Xeons at 3.4 GHz will give you around 6963 GFLOPs, like the Nvidia V100 PCIe (Volta).
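
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical, and the flops-per-cycle breakdown (2 FMA units x 8 double-precision lanes x 2 operations per FMA) is an assumed reading of "2x8 FMA". Note that the 6963 GFLOPs figure corresponds to 64 cores at 3.4 GHz under that reading; the post does not make clear whether 64 is the per-socket or the total core count.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, bytes_per_transfer, channels, sockets=1):
    """Theoretical peak memory bandwidth in GB/s."""
    return transfer_rate_gt_s * bytes_per_transfer * channels * sockets


def peak_gflops(cores, freq_ghz, fma_units=2, simd_lanes=8, flops_per_fma=2):
    """Theoretical peak GFLOPs; the per-core defaults are assumptions (see above)."""
    return cores * freq_ghz * fma_units * simd_lanes * flops_per_fma


if __name__ == "__main__":
    one_socket = peak_bandwidth_gb_s(5.2, 8, 12)
    two_sockets = peak_bandwidth_gb_s(5.2, 8, 12, sockets=2)
    print(f"DDR5 bandwidth, one socket:  {one_socket:.1f} GB/s")
    print(f"DDR5 bandwidth, two sockets: {two_sockets:.1f} GB/s")
    print(f"Peak compute, 64 cores at 3.4 GHz: {peak_gflops(64, 3.4):.1f} GFLOPs")
```

These are theoretical peaks only; sustained bandwidth and sustained GFLOPs in a real benchmark will be lower.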

And I invite you to read my following thoughts about the AVX-512 of AMD Zen 4, since I have just made them a little more precise:

More of my philosophy about AVX-512 of AMD Zen 4 architecture and more of my thoughts..

I think the AVX-512 of the AMD Zen 4 architecture will not be a true 512-bit wide SIMD as we find in Intel architectures. Instead, Zen 4 will execute these AVX-512 instructions over two cycles, so I think it will not need to lower the frequency of the cores when using AVX-512, as Intel does. I think that AMD is doing it this way also for "compatibility" with Intel's AVX-512, and you can read the following article to notice it:

https://www.anandtech.com/show/17552/amd-details-ryzen-7000-launch-up-ryzen-7950x-coming-sept-27

More of my philosophy about RISC-V and about open source and more of my thoughts..

I am a white arab from Morocco, and i think i am smart since i have also invented many scalable algorithms and algorithms..

I have just looked at the following video, and i invite you to look at it:

RISC-V Chips Revolution is coming

https://www.youtube.com/watch?v=irH5eKzezsE

But I don't think RISC-V is a good thing. Read
the following about the open source problems:

"RISC-V is open source, and open-source hardware centers on a design everybody is free to use and modify as they see fit. For its part, ARM claims that it has much more money to fund R&D and develop technology for its customers, whereas RISC-V
International does not and merely provides an ISA. ARM also raised concerns that usage of RISC-V could result in “fragmentation,” which is basically a lack of standards throughout an industry that creates an obstacle for compatibility in both
hardware and software. Since ARM provides standardized cores, the risk of fragmentation is averted. Whereas RISC-V sees this standardization as a weakness, ARM argues it’s a strength."

Read more here:

ARM vs. RISC-V: Is one better than the other?

https://www.digitaltrends.com/computing/arm-vs-risc-v/

And more of my writing about open source:

“Software developers today have their own supply chains. Instead of assembling car parts, they assemble code by patching together existing open source components with their unique code. While this leads to increased productivity and innovation, it has
also created significant security issues,” said Matt Jarvis, Director of Developer Relations at Snyk.

DARPA is concerned about the reliability of open source code: it runs on every computer on the planet and keeps critical infrastructure running, according to NSA's Dave Aitel

Read more here (you will have to translate the article, because it is in French):

https://securite.developpez.com/actu/335029/La-DARPA-s-inquiete-de-la-fiabilite-du-code-open-source-il-fonctionne-sur-tous-les-ordinateurs-de-la-planete-et-assure-le-fonctionnement-des-infrastructures-critiques-selon-Dave-Aitel-de-la-NSA/

More of my philosophy about Intel and about its future and more of my thoughts..

I think Intel and AMD are not the same, since notice how Intel has not used TSMC's semiconductor manufacturing process to solve its problem. I think that Intel is one of the "important" foundations of the USA, since it is a semiconductor manufacturer like TSMC (Taiwan Semiconductor Manufacturing Company), so for US security reasons Intel has to be successful. This is why I think that Intel will soon be really successful in the CPU and semiconductor sector. Intel 4, the Intel implementation of extreme ultraviolet (EUV) lithography, will be manufacturing-ready in the second half of 2022. It delivers an approximate 20% increase in transistor performance per watt. And Intel 3, with additional features, will deliver a further 18% performance per watt and will be manufacturing-ready in the second half of 2023. Ushering in the Angstrom Era with RibbonFET and PowerVia, Intel 20A will deliver up to a 15% performance per watt improvement and will be manufacturing-ready in the first half of 2024. Intel 18A will deliver an additional 10% improvement and will be manufacturing-ready in the second half of 2024. Read carefully the following interesting article to understand more about how Intel will soon be really successful:

https://min.news/en/tech/531c80fc9093694ee8a7eb7db7ac1dc2.html

More of my philosophy about innovation in China and more of my thoughts..

"However, despite successes in fintech and smart cities, China continues to struggle to innovate in key areas, particularly advanced computer chips and the expensive machines that make them. Despite the huge amount of capital (estimated to be upwards of $150 billion USD from 2014 through 2030) and resources that Beijing has poured into bolstering China’s domestic semiconductor manufacturing capacity, many of the country’s leading firms are struggling to realize the government’s goals. After being
added to the U.S. Department of Commerce’s Entity List in 2020, China’s most advanced chip foundry, Semiconductor Manufacturing International Corporation (SMIC), has struggled to meet its goals. Despite promising to produce thin and modern 7-nanometer chips, SMIC lacks the machine tools to make them. U.S. export controls on chip design software and foundry machine tools have also crippled Huawei’s HiSilicon, effectively curbing its only potential rival to U.S. advanced chips.

To be sure, Beijing’s reliance on imported technologies goes well beyond foreign-designed semiconductors. According to a 2018 article from the Ministry of Education, China relies on imports for 35 key technologies that it is unable to produce
domestically in sufficient quality or quantity. These technologies include heavy-duty gas turbines, high-pressure piston pumps, steel for high-end bearings, photolithography machines, core industrial software, and more. With U.S.-China bilateral
technology investment seeing a steep 96 percent decline since 2016, Beijing has been forced to look for new ways to source key technologies, turning to shell companies and intermediary agents to source foreign components, reagents, and other relevant
equipment.

In short, China has demonstrated its capacity to indigenously innovate, but this capacity has not yet proliferated across all key sectors. What initially began as a strategy to import and copy the technology and innovations of other nations has changed
to reflect China’s growing ability to take foreign ideas and concepts and mold them with respect to China’s domestic requirements."

Read more here:

https://www.brookings.edu/techstream/beijings-re-innovation-strategy-is-key-element-of-u-s-china-competition/

More of my philosophy about Random Write in Solid State Drives and more of my thoughts..

I have just read the following paper, so I invite you to read it carefully:

SFS: Random Write Considered Harmful in Solid State Drives

Read more here:

https://www.usenix.org/conference/fast12/sfs-random-write-considered-harmful-solid-state-drives#:~:text=However%2C%20the%20random%20write%20performance,NAND%20block%20erases%20per%20write.

But I don't agree with the above paper, so I invite you to look
at the following article:

SupremeRAID SR-1010 : 110 Go/s en lecture et 22 Go/s en écriture

https://www.tomshardware.fr/supremeraid-sr-1010-110-go-s-en-lecture-et-22-go-s-en-ecriture/

So as you notice in the table in the above article, the 4k random write performance in RAID 10 on Linux is 6 M IOPS, which gives around 25 GB/s of throughput with Solid State Drives. I think that this is decent performance compared to the sequential write throughput, so I think that random write is not as problematic as the above paper says.
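
The conversion from IOPS to throughput used above can be sketched as follows. The function name is hypothetical, and whether "4k" in the article's table means 4096 or 4000 bytes is an assumption, so both are shown.

```python
def iops_to_throughput_gb_s(iops, block_bytes):
    """Throughput in GB/s (10^9 bytes per second) for a given IOPS and block size."""
    return iops * block_bytes / 1e9


if __name__ == "__main__":
    random_write_iops = 6_000_000  # 4k random write in RAID 10, figure from the post
    for block_bytes in (4096, 4000):
        throughput = iops_to_throughput_gb_s(random_write_iops, block_bytes)
        print(f"{block_bytes}-byte blocks: {throughput:.1f} GB/s")
```

Either reading of "4k" lands close to the 25 GB/s quoted above.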

More of my philosophy about Rust and about memory models and about technology and more of my thoughts..

I think I am highly smart, and I say that the new programming language called Rust has an important problem. Read the following interesting article, which says that atomic operations with incorrect memory ordering can still cause race conditions in safe code. This is why the suggestion made by the researchers is:

"Race detection techniques are needed for Rust, and they should focus on unsafe code and atomic operations in safe code."

Read more here:

https://www.i-programmer.info/news/98-languages/12552-is-rust-really-safe.html

More of my philosophy about the Apple Silicon and about Arm Vs. X86 and more of my thoughts..

I invite you to read carefully the following interesting article so
that to understand more:

Overhyped Apple Silicon: Arm Vs. X86 Is Irrelevant

https://seekingalpha.com/article/4447703-overhyped-apple-silicon-arm-vs-x86-is-irrelevant

More of my philosophy about code compression of RISC-V and ARM and more of my thoughts..

I think I am highly smart, and I have just read the following paper, which says that RISC-V Compressed (RVC) programs are 25% smaller than RISC-V programs, fetch 25% fewer instruction bits than RISC-V programs, and incur fewer instruction cache misses. Its code size is competitive with other compressed RISCs. RVC is expected to improve the performance and energy per operation of RISC-V.

Read more here to notice it:

https://people.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf

So I think RVC achieves the same compression as ARM Thumb-2, so I think that I was correct in my previous thoughts (read them below). So I think we now have to look at whether x86 or x64 is still more cache friendly even against Thumb-2 compression or RVC.

More of my philosophy of who will be the winner, x86 or x64 or ARM and more of my thoughts..

I think I am highly smart, and I think that since x86 or x64 has complex instructions and ARM has simple instructions, x86 or x64 is more cache friendly. ARM has tried to solve the problem by compressing the code with Thumb-2, and I think Thumb-2 reduces the size of the code by around 25%, so we have to look at whether x86 or x64 is still more cache friendly even against Thumb-2 compression. I think that x86 or x64 will still optimize power or energy efficiency further, and since x86 or x64 has other big advantages, like the one that I am talking about below, I think x86 or x64 will still be successful big players in the future, so I think it will be the "tendency". So I think that x86 and x64 will be good for a long time for making money in business, and they will be good business for the USA, which makes the AMD and Intel CPUs.

More of my philosophy about x86 or x64 and ARM architectures and more of my thoughts..

I think I am highly smart, and I think that the x86 or x64 architectures
have another big advantage over the ARM architecture, and it is the following:

"The Bright Parts of x86

Backward Compatibility

Compatibility is a two-edged sword. One reason that ARM does better in low-power contexts is that its simpler decoder doesn't have to be compatible with large accumulations of legacy cruft. The downside is that ARM operating systems need to be modified
for every new chip version.

In contrast, the latest 64-bit chips from AMD and Intel are still able to boot PC DOS, the 16-bit operating system that came with the original IBM PC. Other hardware in the system might not be supported, but the CPUs have retained backward compatibility
with every version since 1978.

Many of the bad things about x86 are due to this backward compatibility, but it's worth remembering the benefit that we've had as a result: New PCs have always been able to run old software."

Read more here on the following web link so that to notice it:

https://www.informit.com/articles/article.aspx?p=1676714&seqNum=6

So I think that you cannot compare x86 or x64 to ARM just on power efficiency, as some are doing by comparing the Apple M1 Pro ARM CPU to x86 or x64 CPUs. This is why I think that the x86 or x64 architectures will be here for a long time, so I think that they will be good for a long time for making money in business, and they are good business for the USA, which makes the AMD and Intel CPUs.

More of my philosophy about weak memory model and ARM and more of my thoughts..

I think the ARM hardware memory model is not good, since it is a weak memory model, so ARM has to provide us with a TSO memory model that is compatible with the x86 TSO memory model. Read what Kent Dickey says about it in my following writing:

ProValid, LLC was formed in 2003 to provide hardware design and verification consulting services.

Kent Dickey, founder and President, has had 20 years experience in hardware design and verification. Kent worked at Hewlett-Packard and Intel Corporation, leading teams in ASIC chip design and pre-silicon and post-silicon hardware verification. He
architected bus interface chips for high-end servers at both companies. Kent has received more than 10 patents for innovative work in both design and verification.

Read more here about him:

https://www.provalid.com/about/about.html

And read the following thoughts of Kent Dickey about the weak memory model such as of ARM:

"First, the academic literature on ordering models is terrible. My eyes
glaze over and it's just so boring.

I'm going to guess "niev" means naive. I find that surprising since x86
is basically TSO. TSO is a good idea. I think weakly ordered CPUs are a
bad idea.

TSO is just a handy name for the Sparc and x86 effective ordering for
writeback cacheable memory: loads are ordered, and stores are buffered and will complete in order but drain separately from the main CPU pipeline. TSO can allow loads to hit stores in the buffer and see the new value, this doesn't really matter for
general ordering purposes.

TSO lets you write basic producer/consumer code with no barriers. In fact, about the only type of code that doesn't just work with no barriers on TSO is Lamport's Bakery Algorithm since it relies on "if I write a location and read it back and it's still
there, other CPUs must see that value as well", which isn't true for TSO.

Lock free programming "just works" with TSO or stronger ordering guarantees, and it's extremely difficult to automate putting in barriers for complex algorithms for weakly ordered systems. So code for weakly ordered systems tend to either toss in lots of
barriers, or use explicit locks (with barriers). And extremely weakly ordered systems are very hard to reason about, and especially hard to program since many implementations are not as weakly ordered as the specification says they could be, so just
running your code and having it work is insufficient. Alpha was terrible in this regard, and I'm glad it's silliness died with it.

HP PA-RISC was documented as weakly ordered, but all implementations
guaranteed full system sequential consistency (and it was tested in and enforced, but not including things like cache flushing, which did need barriers). No one wanted to risk breaking software from the original in-order fully sequential machines that might have relied on it. It wasn't really a performance issue, especially once OoO was added.

Weakly ordered CPUs are a bad idea in much the same way in-order VLIW is a bad idea. Certain niche applications might work out fine, but not for a general purpose CPU. It's better to throw some hardware at making TSO perform well, and keep the software
simple and easy to get right.

Kent"

Read the rest on the following web link:

https://groups.google.com/g/comp.arch/c/fSIpGiBhUj0
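
The TSO behaviour described in the quote (loads in order, stores buffered and drained in order, loads allowed to hit the thread's own store buffer) can be explored with a toy model. This is only an illustrative sketch, not how any real CPU is implemented: the function name, the program encoding and the two small test programs are assumptions made for this example. It enumerates every interleaving of two threads whose stores go through per-thread FIFO buffers.

```python
def tso_outcomes(threads, observed):
    """Return every final value of the `observed` registers reachable under toy TSO."""
    # State: program counters, per-thread FIFO store buffers, memory, registers.
    start = (tuple(0 for _ in threads), tuple(() for _ in threads), (), ())
    seen, stack, results = {start}, [start], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        memory, registers = dict(mem), dict(regs)
        successors = []
        for t, program in enumerate(threads):
            if pcs[t] < len(program):
                op = program[pcs[t]]
                next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op[0] == "st":
                    # A store only enters the thread's own buffer.
                    _, var, val = op
                    new_buf = bufs[t] + ((var, val),)
                    successors.append((next_pcs, bufs[:t] + (new_buf,) + bufs[t + 1:], mem, regs))
                else:
                    # A load sees the newest buffered store to that variable, else memory.
                    _, var, reg = op
                    pending = [v for (x, v) in bufs[t] if x == var]
                    value = pending[-1] if pending else memory.get(var, 0)
                    new_regs = dict(registers)
                    new_regs[reg] = value
                    successors.append((next_pcs, bufs, mem, tuple(sorted(new_regs.items()))))
            if bufs[t]:
                # The oldest buffered store may drain to memory at any time.
                var, val = bufs[t][0]
                new_mem = dict(memory)
                new_mem[var] = val
                drained = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                successors.append((pcs, drained, tuple(sorted(new_mem.items())), regs))
        if not successors:  # all programs finished and all buffers empty
            results.add(tuple(registers.get(r, 0) for r in observed))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return results


# Producer/consumer ("message passing"): write data, then set a flag.
MESSAGE_PASSING = [
    [("st", "data", 1), ("st", "flag", 1)],
    [("ld", "flag", "r1"), ("ld", "data", "r2")],
]
# "Store buffering": each thread writes one variable, then reads the other.
STORE_BUFFERING = [
    [("st", "x", 1), ("ld", "y", "r1")],
    [("st", "y", 1), ("ld", "x", "r2")],
]

if __name__ == "__main__":
    print("message passing (r1, r2):", sorted(tso_outcomes(MESSAGE_PASSING, ("r1", "r2"))))
    print("store buffering (r1, r2):", sorted(tso_outcomes(STORE_BUFFERING, ("r1", "r2"))))
```

In this model the producer/consumer program never yields r1 = 1 with r2 = 0, which matches the claim that such code needs no barriers on TSO. The store buffering program can yield r1 = 0 and r2 = 0, which is the kind of effect that breaks algorithms relying on a thread's own write being immediately visible to other CPUs.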

And read all my following thoughts:

USA: Senate passes $280 billion bill to subsidize chip industry,
$9 billion in aid for local production and $13 billion for research

Read more here (and you can translate it to your favorite language)

https://www.developpez.com/actu/335375/USA-le-Senat-adopte-un-projet-de-loi-de-280-milliards-pour-subventionner-l-industrie-des-puces-9-milliards-de-dollars-d-aides-pour-la-production-locale-et-13-milliards-pour-les-recherches/

More of my philosophy about open source and about technology and more of my thoughts..

[continued in next message]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Kristjan Robam@21:1/5 to All on Sat Sep 10 00:23:39 2022

Amine, we are tired of your texts here, please go away, take some time and post here something better.
Learn to respond to people too. You are ignoring us.

Amine Moulay Ramdane wrote on Friday, 9 September 2022 at 23:05:28 UTC+3:

Hello,

More of my philosophy about the network topology of Intel Xeon and
AMD Epyc and more of my thoughts..

I am a white arab from Morocco, and i think i am smart since i have also invented many scalable algorithms and algorithms..

I think i am highly smart since I have passed two certified IQ tests and i have scored above 115 IQ, i have just looked at the network topology
of AMD Epyc CPU, here it is:

https://www.anandtech.com/show/11551/amds-future-in-servers-new-7000-series-cpus-launched-and-epyc-analysis/2

And i am carefully noticing in the above article that the different CCXs on the same die, and the cores on the same CCX, are connected with Infinity Fabric in a much more sophisticated manner than a simple bus topology, and i think it makes the AMD Epyc CPU good at "scalability", as is Intel Xeon. I also think that such CPUs efficiently minimize the number of hops between sockets etc., which is why i think the number of hops doesn't go higher than 2 and the latency of the two hops is not so problematic. So i think that it is good news, since i think that AMD Epyc and Intel Xeon and the like are not using the methodology of the following paper, which uses filters to reduce bus traffic:

I have just read the following interesting paper about Scaling SMP Machines Through Hierarchical Snooping, i invite you to read it:

https://pages.cs.wisc.edu/~kola/projectreports/cs757.pdf

More of my philosophy about the network topology in multicores CPUs..

I invite you to look at the following video:

Ring or Mesh, or other? AMD's Future on CPU Connectivity

https://www.youtube.com/watch?v=8teWvMXK99I&t=904s

And i invite you to read the following article:

Does an AMD Chiplet Have a Core Count Limit?

Read more here:

https://www.anandtech.com/show/16930/does-an-amd-chiplet-have-a-core-count-limit

I think i am smart and i say that the above video and the above article
are not so smart, so i will talk about a very important thing, and it is
the following, read the following:

Performance Scalability of a Multi-core Web Server

https://www.researchgate.net/publication/221046211_Performance_scalability_of_a_multi-core_web_server

So notice carefully that it is saying the following:

"..we determined that performance scaling was limited by the capacity of the address bus, which became saturated on all eight cores. If this key obstacle is addressed, commercial web server and systems software are well-positioned to scale to a large number of cores."

So as you notice, they were using an Intel Xeon with 8 cores, and the application was scalable to 8x but the hardware was not, since it scaled only to 4.8x. This was caused by saturation of the address bus: the address bus carries requests and responses for data, called snoops, and more caches mean more sources and more destinations for snoops, which causes the poor scaling. So a ring bus or a simple bus was not sufficient to scale to 8x on an Intel Xeon with 8 cores, and i think that new architectures like the Epyc CPU and the Threadripper CPU can use a faster bus and/or a different network topology that ensures full scalability both locally in the same node and globally between the nodes.

So we can notice that a sophisticated mesh network topology not only reduces the number of hops inside the CPU for good latency, but is also good for reliability because of its redundancy, and it is faster than previous topologies like the ring bus or the bus since, for example, the search on the address bus becomes parallelized. It looks like the internet, which uses a mesh topology with routers, so it parallelizes. I also think that a more sophisticated topology like a mesh is related to queuing theory: in operational research the mathematics says that we can make a queue like M/M/1 more efficient by making the server more powerful, but the knee of an M/M/1 queue is around 50% utilization. So with a mesh topology, on the internet or inside a CPU, by parallelizing more you can both push the knee of the queue higher and speed up the execution of the transactions; it is like using many servers in queuing theory, and it allows better scaling inside a CPU or on the internet.
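
The queuing argument above can be made concrete with a small sketch. It assumes the textbook M/M/1 and M/M/c models (Poisson arrivals, exponential service times) and uses the standard Erlang C formula; the function names and the choice of four servers are illustrative only and say nothing about how a real CPU interconnect behaves. At 50% utilization an M/M/1 queue already has a mean response time of twice the service time, while several parallel servers at the same per-server utilization stay much closer to the bare service time.

```python
from math import factorial


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate must be below service_rate")
    return 1.0 / (service_rate - arrival_rate)


def mmc_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time in system for an M/M/c queue, using the Erlang C formula."""
    offered_load = arrival_rate / service_rate      # a = lambda / mu
    utilization = offered_load / servers            # rho = a / c
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    head = sum(offered_load ** k / factorial(k) for k in range(servers))
    prob_wait = tail / (head + tail)                # probability an arrival has to queue
    mean_wait = prob_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate


# Same per-server utilization, one server versus four parallel servers
# (service rate 1.0, so times are in units of one mean service time).
for rho in (0.5, 0.8, 0.9):
    one = mm1_response_time(rho * 1.0, 1.0)
    four = mmc_response_time(rho * 4.0, 1.0, 4)
    print(f"utilization {rho:.0%}: M/M/1 = {one:.2f}, M/M/4 = {four:.2f} service times")
```

In this model the parallel system reaches the same response time only at a noticeably higher utilization, which is one way to read the statement that parallelizing moves the knee of the queue.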

More of my philosophy about silicon chip fabrication and technology and more of my thoughts..

The atoms used in silicon chip fabrication are around 0.2 nm,
so i think that we can make a transistor of one atom, you can
read about it here:

Scientists create new recipe for single-atom transistors

https://www.sciencedaily.com/releases/2020/05/200511092920.htm

So i think this gives an exponential growth of scalability with EUV (extreme ultraviolet lithography) or such technology to around 2^5, and after that i think we can go to 3D or to superconductor computer chips (read about them in my thoughts below), or use the following inventions, which you can read about carefully in my following writing and thoughts:

More of my philosophy about latency and contention and concurrency and parallelism and more of my thoughts..

I think i am highly smart and i have just posted (read it below) about the two new inventions that will make logic gates thousands of times or a million times faster than those in existing computers. I think that there is still a problem with those new inventions, and it is about latency and concurrency, since you need concurrency and you need preemptive or non-preemptive scheduling of the coroutines. HBM has a latency of 106.7 ns, DDR4 has a latency of 73.3 ns, and the AMD 3D V-Cache has almost the same latency cost, so as you notice this kind of latency is still costly. There is also the latency of the time slice that a coroutine takes to execute, and it is costly too, since this kind of latency and time slice is a waiting time that looks like the time wasted in contention in parallelism. So by logical analogy, this kind of latency and time slice creates something like the contention in parallelism that reduces scalability, and i think that is why those new inventions have this kind of limit or constraint in a "concurrency" environment.

And i invite you to read my following smart thoughts about preemptive and non-preemptive timesharing:

https://groups.google.com/g/alt.culture.morocco/c/JuC4jar661w

More of my philosophy about Fastest-ever logic gates and more of my thoughts..

"Logic gates are the fundamental building blocks of computers, and researchers at the University of Rochester have now developed the fastest ones ever created. By zapping graphene and gold with laser pulses, the new logic gates are a million times faster than those in existing computers, demonstrating the viability of “lightwave electronics.” If these kinds of lightwave electronic devices ever do make it to market, they could be millions of times faster than today’s computers. Currently we measure processing speeds in Gigahertz (GHz), but these new logic gates function on the scale of Petahertz (PHz). Previous studies have set that as the absolute quantum limit of how fast light-based computer systems could possibly get."

Read more here:

https://newatlas.com/electronics/fastest-ever-logic-gates-computers-million-times-faster-petahertz/

Read my following news:

And with the following new discovery computers and phones could run thousands of times faster..

Prof Alan Dalton in the School of Mathematical and Physical Sciences at the University of Sussex, said:

"We're mechanically creating kinks in a layer of graphene. It's a bit like nano-origami.

"Using these nanomaterials will make our computer chips smaller and faster. It is absolutely critical that this happens as computer manufacturers are now at the limit of what they can do with traditional semiconducting technology. Ultimately, this will make our computers and phones thousands of times faster in the future.

"This kind of technology -- "straintronics" using nanomaterials as opposed to electronics -- allows space for more chips inside any device. Everything we want to do with computers -- to speed them up -- can be done by crinkling graphene like this."

Dr Manoj Tripathi, Research Fellow in Nano-structured Materials at the University of Sussex and lead author on the paper, said:

"Instead of having to add foreign materials into a device, we've shown we can create structures from graphene and other 2D materials simply by adding deliberate kinks into the structure. By making this sort of corrugation we can create a smart electronic component, like a transistor, or a logic gate."

The development is a greener, more sustainable technology. Because no additional materials need to be added, and because this process works at room temperature rather than high temperature, it uses less energy to create.

Read more here:

https://www.sciencedaily.com/releases/2021/02/210216100141.htm

But I think that mass production of graphene still hasn't quite begun, so i think the inventions above, the fastest-ever logic gates that use graphene and the one with nanomaterials that uses graphene, will not be fully commercialized until perhaps around 2035 or 2040 or so. Read the following to understand why:

"Because large-scale mass production of graphene still hasn't quite begun, the market is a bit limited. However, that leaves a lot of room open for investors to get in before it reaches commercialization.

The market was worth $78.7 million in 2019 and, according to Grand View Research, is expected to rise drastically to $1.08 billion by 2027.

North America currently has the bulk of market share, but the Asia-Pacific area is expected to have the quickest growth in adoption of graphene uses in coming years. North America and Europe are also expected to have above-market average growth.

The biggest driver of all this growth is expected to be the push for cleaner, more efficient energy sources and the global reduction of emissions in the air."

Read more here:

https://www.energyandcapital.com/report/the-worlds-next-rare-earth-metal/1600

More of my philosophy about superconductor computer chips and more of my thoughts..

"Scientists from the University of Virginia School of Medicine and collaborators used the building blocks of life to potentially revolutionize electronics.

Edward H. Egelman, Ph.D. and his colleagues say their new DNA-enabled method could have a wide range of research applications in physics and materials science. Crucially, it could lead to the creation of Little's room-temperature superconductor, which could help revolutionize electronics. Their work, combined with other recent breakthroughs in superconductors, could unlock the great potential of quantum computing, which would, in turn, vastly improve countless scientific fields with its hyper-fast calculations."

Read more here:

https://interestingengineering.com/innovation/engineering-breakthrough-dna-quantum-computing

So i think that superconductor computer chips will be possible in the future, and computer chips with superconducting circuits (circuits with zero electrical resistance) would be 50 to 100 times as energy-efficient as today’s chips, an attractive trait given the increasing power consumption of the massive data centers that power the Internet’s most popular sites. And superconducting chips also promise greater processing power: superconducting circuits have been clocked at a Terahertz.

More of my philosophy about energy efficiency of quantum computers and
about IBM's Compute-in-memory and more of my thoughts..

I think i am smart, and i will say that IBM has invented the following Compute-in-memory specialized hardware that works in both analogue and digital so as to lower energy consumption much more, read about it here:

IBM: Compute-in-memory beats GPUs by 10x, sometimes

https://www.electronicsweekly.com/news/research-news/ibm-compute-memory-beats-gpus-10x-sometimes-2021-11/

But i think that a much better solution than the above Compute-in-memory specialized hardware of IBM is to use powerful quantum computers, and i invite you to read the following article about it:

Quantum computers vastly outperform supercomputers when it comes to energy efficiency

https://physicsworld.com/a/quantum-computers-vastly-outperform-supercomputers-when-it-comes-to-energy-efficiency/

Also i have just looked at the following article about the
benchmark of Intel Xeon Scalable Processor vs. Nvidia V100 GPU,
here it is:

https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-intel-xeon-scalable-processor-vs-nvidia-v100-gpu/

So i think that the main problem of the Intel Xeons in the above benchmark is the memory bandwidth. I think that the number of GFLOPs of the Intel Xeons in the above benchmark is the result of multiplying the frequency of the CPU by the number of cores and by 2x8 FMA, i mean fused multiply-add (FMA) instructions for floating-point scalar and SIMD operations (2 FMA units x 8 double-precision lanes, each FMA counting as 2 floating-point operations), and it gives a result of 2,240 GFLOPs. So if you want a powerful computer that also has good memory bandwidth, i advise you to use a new two-socket system for new Intel Xeon processors that support a memory bandwidth of about 5.2 GT/s for DDR5 x 8 bytes per channel x 12 channels for one socket, which equals 499.2 GB per second, or 998.4 GB per second for two sockets. This will roughly equal the memory bandwidth of the Nvidia V100 PCIe (Volta) in the above benchmark, and it will solve the memory bandwidth problem. And of course a two-socket motherboard for new 64-core Intel Xeons at 3.4 GHz will give you around 6963 GFLOPs per 64-core socket (64 x 3.4 GHz x 32 floating-point operations per cycle), which is comparable to the Nvidia V100 PCIe (Volta).
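
The arithmetic behind these figures can be checked with a short sketch. It assumes 32 double-precision floating-point operations per core per cycle (2 FMA units x 8 lanes x 2 operations per FMA), which is consistent with the 6963 GFLOPs figure above; the function names are illustrative, and the 28-core 2.5 GHz configuration is only one hypothetical combination that reproduces 2,240 GFLOPs, since the post does not state the benchmark's inputs. These are theoretical peaks, not measured results.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int = 32) -> float:
    """Theoretical peak in GFLOPs: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * ghz * flops_per_cycle


def memory_bandwidth_gb_s(gt_per_s: float, bytes_per_transfer: int,
                          channels: int, sockets: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return gt_per_s * bytes_per_transfer * channels * sockets


# Figures quoted in the post.
print(f"64 cores at 3.4 GHz: {peak_gflops(64, 3.4):.1f} GFLOPs")
print(f"DDR5 5.2 GT/s, 12 channels, 1 socket: {memory_bandwidth_gb_s(5.2, 8, 12):.1f} GB/s")
print(f"DDR5 5.2 GT/s, 12 channels, 2 sockets: {memory_bandwidth_gb_s(5.2, 8, 12, sockets=2):.1f} GB/s")

# Hypothetical inputs (an assumption, not from the post) that give 2,240 GFLOPs.
print(f"28 cores at 2.5 GHz: {peak_gflops(28, 2.5):.1f} GFLOPs")
```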

And i invite you to read my following thoughts about AVX-512 of AMD Zen 4, since i have just added a little more precision to them:

More of my philosophy about AVX-512 of AMD Zen 4 architecture and more of my thoughts..

I think AVX-512 of the AMD Zen 4 architecture will not be a true 512-bit wide SIMD as we find in the Intel architecture; Zen 4 will execute these AVX-512 instructions over two cycles, so i think it will not need to lower the frequency of the cores when using AVX-512, as Intel does, and i think that AMD is doing it this way also for "compatibility" with Intel's AVX-512. You can read the following article to notice it:

https://www.anandtech.com/show/17552/amd-details-ryzen-7000-launch-up-ryzen-7950x-coming-sept-27

More of my philosophy about RISC-V and about open source and more of my thoughts..

I am a white arab from Morocco, and i think i am smart since i have also invented many scalable algorithms and algorithms..

I have just looked at the following video, and i invite you to look at it:

RISC-V Chips Revolution is coming

https://www.youtube.com/watch?v=irH5eKzezsE

But I don't think RISC-V is a good thing; read the following about the open source problems:

"RISC-V is open source, and open-source hardware centers on a design everybody is free to use and modify as they see fit. For its part, ARM claims that it has much more money to fund R&D and develop technology for its customers, whereas RISC-V International does not and merely provides an ISA. ARM also raised concerns that usage of RISC-V could result in “fragmentation,” which is basically a lack of standards throughout an industry that creates an obstacle for compatibility in both hardware and software. Since ARM provides standardized cores, the risk of fragmentation is averted. Whereas RISC-V sees this standardization as a weakness, ARM argues it’s a strength."

Read more here:

ARM vs. RISC-V: Is one better than the other?

https://www.digitaltrends.com/computing/arm-vs-risc-v/

And more of my writing about open source:

“Software developers today have their own supply chains. Instead of assembling car parts, they assemble code by patching together existing open source components with their unique code. While this leads to increased productivity and innovation, it has also created significant security issues,” said Matt Jarvis, Director of Developer Relations at Snyk.

Read more here (you have to translate the article because it is in French):

DARPA is concerned about the reliability of open source code: it runs on every computer on the planet and keeps critical infrastructure running, according to NSA's Dave Aitel

https://securite.developpez.com/actu/335029/La-DARPA-s-inquiete-de-la-fiabilite-du-code-open-source-il-fonctionne-sur-tous-les-ordinateurs-de-la-planete-et-assure-le-fonctionnement-des-infrastructures-critiques-selon-Dave-Aitel-de-la-NSA/

More of my philosophy about Intel and about its future and more of my thoughts..

I think Intel and AMD are not the same, since notice how Intel has not used the TSMC semiconductor manufacturing process to solve its problem. I think that Intel is one of the "important" foundations of the USA, since it is a semiconductor manufacturer like TSMC (Taiwan Semiconductor Manufacturing Company), so for US security reasons Intel has to be successful, and that is why i think that Intel will soon be really successful in the CPU and semiconductor sector. Intel 4, the Intel implementation of extreme ultraviolet (EUV) lithography, will be manufacturing-ready in the second half of 2022. It delivers an approximate 20% increase in transistor performance per watt. And Intel 3, with additional features, will deliver a further 18% performance per watt and will be manufacturing-ready in the second half of 2023. Ushering in the Angstrom Era with RibbonFET and PowerVia, Intel 20A will deliver up to a 15% performance per watt improvement and will be manufacturing-ready in the first half of 2024. Intel 18A will deliver an additional 10% improvement and will be manufacturing-ready in the second half of 2024. Read carefully the following interesting article to understand more about how Intel will soon be really successful:

https://min.news/en/tech/531c80fc9093694ee8a7eb7db7ac1dc2.html
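
To get a feel for what the per-node figures above add up to, here is a rough sketch. It assumes each quoted gain applies on top of the previous node and that the gains multiply, which the roadmap text does not state explicitly; the "up to" figure for Intel 20A is taken at face value. The dictionary and function names are illustrative.

```python
from math import prod

# Per-node performance-per-watt gains quoted in the post, as fractions.
QUOTED_GAINS = {"Intel 4": 0.20, "Intel 3": 0.18, "Intel 20A": 0.15, "Intel 18A": 0.10}


def cumulative_gain(gains) -> float:
    """Compound fractional gains, assuming each one applies on top of the last."""
    return prod(1.0 + g for g in gains)


total = cumulative_gain(QUOTED_GAINS.values())
print(f"cumulative performance per watt: about {total:.2f}x the starting node")
```

Under those assumptions the four steps compound to roughly 1.79 times the performance per watt of the node before Intel 4.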

More of my philosophy about innovation in China and more of my thoughts..

"However, despite successes in fintech and smart cities, China continues to struggle to innovate in key areas, particularly advanced computer chips and the expensive machines that make them. Despite the huge amount of capital (estimated to be upwards of $150 billion USD from 2014 through 2030) and resources that Beijing has poured into bolstering China’s domestic semiconductor manufacturing capacity, many of the country’s leading firms are struggling to realize the government’s goals. After being added to the U.S. Department of Commerce’s Entity List in 2020, China’s most advanced chip foundry, Semiconductor Manufacturing International Corporation (SMIC), has struggled to meet its goals. Despite promising to produce thin and modern 7-nanometer chips, SMIC lacks the machine tools to make them. U.S. export controls on chip design software and foundry machine tools have also crippled Huawei’s HiSilicon, effectively curbing its only potential rival to U.S. advanced chips.

To be sure, Beijing’s reliance on imported technologies goes well beyond foreign-designed semiconductors. According to a 2018 article from the Ministry of Education, China relies on imports for 35 key technologies that it is unable to produce domestically in sufficient quality or quantity. These technologies include heavy-duty gas turbines, high-pressure piston pumps, steel for high-end bearings, photolithography machines, core industrial software, and more. With U.S.-China bilateral technology investment seeing a steep 96 percent decline since 2016, Beijing has been forced to look for new ways to source key technologies, turning to shell companies and intermediary agents to source foreign components, reagents, and other relevant equipment.

In short, China has demonstrated its capacity to indigenously innovate, but this capacity has not yet proliferated across all key sectors. What initially began as a strategy to import and copy the technology and innovations of other nations has changed to reflect China’s growing ability to take foreign ideas and concepts and mold them with respect to China’s domestic requirements."

Read more here:

https://www.brookings.edu/techstream/beijings-re-innovation-strategy-is-key-element-of-u-s-china-competition/

More of my philosophy about Random Write in Solid State Drives and more of my thoughts..

I have just read the following paper, so i invite you to read it carefully:

SFS: Random Write Considered Harmful in Solid State Drives

Read more here:

https://www.usenix.org/conference/fast12/sfs-random-write-considered-harmful-solid-state-drives#:~:text=However%2C%20the%20random%20write%20performance,NAND%20block%20erases%20per%20write.

But i don't agree with the above paper; i invite you to look
at the following article:

SupremeRAID SR-1010 : 110 Go/s en lecture et 22 Go/s en écriture

https://www.tomshardware.fr/supremeraid-sr-1010-110-go-s-en-lecture-et-22-go-s-en-ecriture/

So as you notice in the table in the above article, the 4k random write in RAID 10 on Linux is 6 M IOPS, which gives around 25 GB/s of throughput with Solid State Drives. I think that is a decent performance compared to the sequential write throughput, so i think that random write is not as problematic as the above paper says.
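
The conversion from IOPS to throughput used here is just IOPS multiplied by the block size. A minimal sketch, assuming that "4k" means 4096-byte blocks and that GB means 10^9 bytes (the function name is illustrative):

```python
def iops_to_gb_per_s(iops: float, block_bytes: int) -> float:
    """Convert an IOPS figure to throughput in decimal GB/s (1 GB = 10**9 bytes)."""
    return iops * block_bytes / 1e9


# 6 M IOPS of 4k random writes, as read from the table in the article.
random_write = iops_to_gb_per_s(6_000_000, 4096)
print(f"4k random write at 6 M IOPS: {random_write:.3f} GB/s")
```

With 4096-byte blocks this comes to a little under 25 GB/s, in line with the rounded figure above.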

More of my philosophy about Rust and about memory models and about technology and more of my thoughts..

I think i am highly smart, and i say that the new programming language called Rust has an important problem: read the following interesting article, which says that atomic operations that do not have correct memory ordering can still cause race conditions in safe code. This is why the suggestion made by the researchers is:

"Race detection techniques are needed for Rust, and they should focus on unsafe code and atomic operations in safe code."

Read more here:

https://www.i-programmer.info/news/98-languages/12552-is-rust-really-safe.html

More of my philosophy about the Apple Silicon and about Arm Vs. X86 and more of my thoughts..

I invite you to read carefully the following interesting article so
that to understand more:

Overhyped Apple Silicon: Arm Vs. X86 Is Irrelevant

https://seekingalpha.com/article/4447703-overhyped-apple-silicon-arm-vs-x86-is-irrelevant

More of my philosophy about code compression of RISC-V and ARM and more of my thoughts..

I think i am highly smart, and i have just read the following paper that says that RISC-V Compressed (RVC) programs are 25% smaller than RISC-V programs, fetch 25% fewer instruction bits than RISC-V programs, and incur fewer instruction cache misses. Its code size is competitive with other compressed RISCs. RVC is expected to improve the performance and energy per operation of RISC-V.

Read more here to notice it:

https://people.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf

So i think RVC has the same compression as ARM Thumb-2, so i think that i was correct in my previous thoughts (read them below), so i think we now have to look at whether x86 or x64 is still more cache friendly even against Thumb-2 compression or RVC.

More of my philosophy of who will be the winner, x86 or x64 or ARM and more of my thoughts..

I think i am highly smart, and i think that since x86 or x64 has complex instructions and ARM has simple instructions, x86 or x64 is more cache friendly, but ARM has wanted to solve the problem by compressing the code with Thumb-2, and i think Thumb-2 reduces the size of the code by around 25%. So i think we have to look at whether x86 or x64 is still more cache friendly even against Thumb-2 compression, and i think that x86 or x64 will still optimize power or energy efficiency further. And since x86 or x64 has other big advantages, like the advantage that i am talking about below, i think x86 and x64 will still be successful big players in the future, so i think that will be the "tendency". So i think that x86 and x64 will be good for a long time to make money in business, and they will be good for business for the USA, which makes the AMD and Intel CPUs.

More of my philosophy about x86 or x64 and ARM architectures and more of my thoughts..

I think i am highly smart, and i think that the x86 or x64 architectures
have another big advantage over the ARM architecture, and it is the following:

"The Bright Parts of x86

Backward Compatibility

Compatibility is a two-edged sword. One reason that ARM does better in low-power contexts is that its simpler decoder doesn't have to be compatible with large accumulations of legacy cruft. The downside is that ARM operating systems need to be modified for every new chip version.

In contrast, the latest 64-bit chips from AMD and Intel are still able to boot PC DOS, the 16-bit operating system that came with the original IBM PC. Other hardware in the system might not be supported, but the CPUs have retained backward compatibility with every version since 1978.

Many of the bad things about x86 are due to this backward compatibility, but it's worth remembering the benefit that we've had as a result: New PCs have always been able to run old software."

Read more here on the following web link so that to notice it:

https://www.informit.com/articles/article.aspx?p=1676714&seqNum=6

So i think that you cannot compare x86 or x64 to ARM only on power efficiency, as some are doing by comparing the Apple M1 Pro ARM CPU to x86 or x64 CPUs. That is why i think that the x86 or x64 architectures will be here for a long time, so i think that they will be good for a long time to make money in business, and they are a good business for the USA, which makes the AMD and Intel CPUs.

More of my philosophy about weak memory model and ARM and more of my thoughts..

I think the ARM hardware memory model is not good, since it is a weak memory model, so ARM has to provide us with a TSO memory model that is compatible with the x86 TSO memory model. Read what Kent Dickey is saying about it in my following writing:

ProValid, LLC was formed in 2003 to provide hardware design and verification consulting services.

Kent Dickey, founder and President, has had 20 years experience in hardware design and verification. Kent worked at Hewlett-Packard and Intel Corporation, leading teams in ASIC chip design and pre-silicon and post-silicon hardware verification. He architected bus interface chips for high-end servers at both companies. Kent has received more than 10 patents for innovative work in both design and verification.

Read more here about him:

https://www.provalid.com/about/about.html

And read the following thoughts of Kent Dickey about the weak memory model such as of ARM:

"First, the academic literature on ordering models is terrible. My eyes glaze over and it's just so boring.

I'm going to guess "niev" means naive. I find that surprising since x86
is basically TSO. TSO is a good idea. I think weakly ordered CPUs are a
bad idea.

TSO is just a handy name for the Sparc and x86 effective ordering for writeback cacheable memory: loads are ordered, and stores are buffered and will complete in order but drain separately from the main CPU pipeline. TSO can allow loads to hit stores in the buffer and see the new value, this doesn't really matter for general ordering purposes.

TSO lets you write basic producer/consumer code with no barriers. In fact, about the only type of code that doesn't just work with no barriers on TSO is Lamport's Bakery Algorithm since it relies on "if I write a location and read it back and it's still there, other CPUs must see that value as well", which isn't true for TSO.

Lock free programming "just works" with TSO or stronger ordering guarantees, and it's extremely difficult to automate putting in barriers for complex algorithms for weakly ordered systems. So code for weakly ordered systems tend to either toss in lots of barriers, or use explicit locks (with barriers). And extremely weakly ordered systems are very hard to reason about, and especially hard to program since many implementations are not as weakly ordered as the specification says they could be, so just running your code and having it work is insufficient. Alpha was terrible in this regard, and I'm glad its silliness died with it.

HP PA-RISC was documented as weakly ordered, but all implementations guaranteed full system sequential consistency (and it was tested in and enforced, but not including things like cache flushing, which did need barriers). No one wanted to risk breaking software from the original in-order fully sequential machines that might have relied on it. It wasn't really a performance issue, especially once OoO was added.

[continued in next message]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
