• Re: large pages access time

    From Marcel Mueller@21:1/5 to All on Sat Oct 12 17:07:35 2024
    On 12.10.24 16:24, Bonita Montero wrote:
    > I wanted to test the performance of 2MiB pages against 4kiB pages.
    > My Zen4 CPU has a fully associative L1 TLB with 72 entries for all
    > page sizes and a second-level TLB with 3072 entries each for 4kiB
    > and 2/4MiB pages. So I wrote a little benchmark that allocates
    > 32GiB of memory with 4kiB and with 2MiB pages and touches each
    > 4kiB block once with one byte, at a random page address. If I
    > touch the pages all at the same page offset, large pages are only
    > a quarter faster. But if I touch the pages at a random offset,
    > large pages become 2.75 times faster. I can't explain this huge
    > difference, since the page address is random, so no prefetching
    > could help. But nevertheless it shows that large pages can make a
    > big difference.
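
    A minimal sketch of such a benchmark, assuming Linux and mmap with
    MAP_HUGETLB for the 2MiB case (not Bonita's actual code, which is not
    shown in the thread), could look like this:

    // Minimal sketch, not Bonita's code: allocate a buffer backed by either
    // 4 kiB or 2 MiB pages (Linux, MAP_HUGETLB) and touch one byte in every
    // 4 kiB block, visiting the blocks in random order.  The huge-page run
    // needs pages reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages.
    #include <sys/mman.h>
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    #ifndef MAP_HUGETLB
    #define MAP_HUGETLB 0x40000
    #endif
    #ifndef MAP_HUGE_2MB
    #define MAP_HUGE_2MB (21 << 26)          // log2(2 MiB) << MAP_HUGE_SHIFT
    #endif

    int main(int argc, char** argv)
    {
        const bool huge = argc > 1 && argv[1][0] == 'h';   // "h" -> 2 MiB pages
        const bool randomOffset = argc > 2;                // any 2nd arg -> random offset
        const std::size_t size = std::size_t(1) << 30;     // 1 GiB here; Bonita used 32 GiB
        const std::size_t block = 4096;

        int flags = MAP_PRIVATE | MAP_ANONYMOUS;
        if (huge)
            flags |= MAP_HUGETLB | MAP_HUGE_2MB;
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
        volatile unsigned char* mem = static_cast<unsigned char*>(p);

        std::vector<std::size_t> order(size / block);      // random visiting order
        std::iota(order.begin(), order.end(), std::size_t(0));
        std::mt19937_64 rng(42);
        std::shuffle(order.begin(), order.end(), rng);
        std::uniform_int_distribution<std::size_t> offset(0, block - 1);

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t b : order)                        // one byte per 4 kiB block
            mem[b * block + (randomOffset ? offset(rng) : 0)] = 1;
        auto t1 = std::chrono::steady_clock::now();

        std::printf("%s pages: %.0f ms\n", huge ? "2MiB" : "4kiB",
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
        munmap(p, size);
    }

    Build with g++ -O2 and run once without and once with the "h"
    argument to compare the two page sizes.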

    Probably you should ask why 4kiB pages are so much slower with random
    access: you simply need many more TLB entries.

    With linear access it is likely that contiguous physical memory is
    mapped, which does not require additional TLB entries regardless of
    the page size.
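
    To put rough numbers on that, using only the figures quoted above:

        32 GiB / 4 kiB = 8,388,608 pages, but 3072 entries * 4 kiB = 12 MiB of TLB reach;
        32 GiB / 2 MiB = 16,384 pages, and 3072 entries * 2 MiB = 6 GiB of TLB reach.

    So with 4 kiB pages virtually every random touch misses the TLB, while
    with 2 MiB pages many touches still miss but each walk is one level
    shorter on x86-64 and the page tables themselves are small enough to
    stay cached.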


    Marcel

  • From Scott Lurndal@21:1/5 to Marcel Mueller on Sat Oct 12 15:27:38 2024
    Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
    > On 12.10.24 16:24, Bonita Montero wrote:
    >> I wanted to test the performance of 2MiB pages against 4kiB pages.
    >> My Zen4 CPU has a fully associative L1 TLB with 72 entries for all
    >> page sizes and a second-level TLB with 3072 entries each for 4kiB
    >> and 2/4MiB pages. So I wrote a little benchmark that allocates
    >> 32GiB of memory with 4kiB and with 2MiB pages and touches each
    >> 4kiB block once with one byte, at a random page address. If I
    >> touch the pages all at the same page offset, large pages are only
    >> a quarter faster. But if I touch the pages at a random offset,
    >> large pages become 2.75 times faster. I can't explain this huge
    >> difference, since the page address is random, so no prefetching
    >> could help. But nevertheless it shows that large pages can make a
    >> big difference.

    > Probably you should ask why 4kiB pages are so much slower with
    > random access: you simply need many more TLB entries.
    >
    > With linear access it is likely that contiguous physical memory is
    > mapped, which does not require additional TLB entries regardless of
    > the page size.

    ARM64 has a 'contiguous' hint bit in the translation table entry that
    supports coalescing multiple consecutively addressed translations into
    a single TLB entry when the OAs (output addresses) are contiguous.
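
    A rough sketch of that coalescing rule for the 4kiB granule, where 16
    adjacent level-3 entries can be folded into one 64kiB TLB entry (bit
    positions per the Armv8-A VMSA; the helper itself is illustrative, not
    a real kernel or library API):

    // Rough model, not a real API: can 16 adjacent 4 kiB level-3 descriptors
    // (Armv8-A, 4 kiB granule) be cached as a single 64 kiB TLB entry via the
    // Contiguous hint?  Bit 52 is the Contiguous bit, bits 47:12 the output
    // address; real hardware additionally requires identical attributes and
    // a naturally aligned group of entries.
    #include <array>
    #include <cstddef>
    #include <cstdint>

    constexpr std::uint64_t kContiguousBit  = std::uint64_t(1) << 52;
    constexpr std::uint64_t kOutputAddrMask = 0x0000FFFFFFFFF000ull;  // OA bits 47:12

    bool coalescible(const std::array<std::uint64_t, 16>& descriptors)
    {
        const std::uint64_t base = descriptors[0] & kOutputAddrMask;
        if (base % (64 * 1024) != 0)                 // first OA must be 64 kiB aligned
            return false;
        for (std::size_t i = 0; i < descriptors.size(); ++i) {
            if (!(descriptors[i] & kContiguousBit))  // hint must be set in every entry
                return false;
            if ((descriptors[i] & kOutputAddrMask) != base + i * 4096)
                return false;                        // OAs must be strictly contiguous
        }
        return true;
    }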
