• Re: large pages access time

    From Marcel Mueller@21:1/5 to All on Sat Oct 12 17:07:35 2024
    On 12.10.24 16:24, Bonita Montero wrote:
    > I wanted to test the performance of 2MiB pages against 4kiB pages.
    > My Zen4 CPU has a fully associative L1 TLB with 72 entries for all
    > page sizes and a second-level TLB with 3072 entries each for 4kiB
    > and 2/4MiB pages. So I wrote a little benchmark that allocates
    > 32GiB of memory with 4kiB and with 2MiB pages and touches each
    > 4kiB block once with one byte, at a random page address. If I
    > touch the pages all at the same page offset, large pages are only
    > a quarter faster. But if I touch the pages at a random offset,
    > large pages become 2.75 times faster. I can't explain this huge
    > difference, since the page address is random, so no prefetching
    > could help. But nevertheless it shows that large pages can make a
    > big difference.
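
    A minimal sketch of such a benchmark, assuming Linux and mmap with
    MAP_HUGETLB for the 2MiB case (not Bonita's actual code, which is not
    shown in the thread), could look like this:

    // Minimal sketch, not Bonita's code: allocate a buffer backed by either
    // 4 kiB or 2 MiB pages (Linux, MAP_HUGETLB) and touch one byte in every
    // 4 kiB block, visiting the blocks in random order.  The huge-page run
    // needs pages reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages.
    #include <sys/mman.h>
    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    #ifndef MAP_HUGETLB
    #define MAP_HUGETLB 0x40000
    #endif
    #ifndef MAP_HUGE_2MB
    #define MAP_HUGE_2MB (21 << 26)          // log2(2 MiB) << MAP_HUGE_SHIFT
    #endif

    int main(int argc, char** argv)
    {
        const bool huge = argc > 1 && argv[1][0] == 'h';   // "h" -> 2 MiB pages
        const bool randomOffset = argc > 2;                // any 2nd arg -> random offset
        const std::size_t size = std::size_t(1) << 30;     // 1 GiB here; Bonita used 32 GiB
        const std::size_t block = 4096;

        int flags = MAP_PRIVATE | MAP_ANONYMOUS;
        if (huge)
            flags |= MAP_HUGETLB | MAP_HUGE_2MB;
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, flags, -1, 0);
        if (p == MAP_FAILED) { std::perror("mmap"); return 1; }
        volatile unsigned char* mem = static_cast<unsigned char*>(p);

        std::vector<std::size_t> order(size / block);      // random visiting order
        std::iota(order.begin(), order.end(), std::size_t(0));
        std::mt19937_64 rng(42);
        std::shuffle(order.begin(), order.end(), rng);
        std::uniform_int_distribution<std::size_t> offset(0, block - 1);

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t b : order)                        // one byte per 4 kiB block
            mem[b * block + (randomOffset ? offset(rng) : 0)] = 1;
        auto t1 = std::chrono::steady_clock::now();

        std::printf("%s pages: %.0f ms\n", huge ? "2MiB" : "4kiB",
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
        munmap(p, size);
    }

    Build with g++ -O2 and run once without and once with the "h"
    argument to compare the two page sizes.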

    Probably you should ask why 4kiB pages are so much slower with random
    access: you simply need many more TLB entries.

    With linear access it is likely that contiguous physical memory is
    mapped, which does not require additional TLB entries regardless of
    the page size.
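
    To put rough numbers on that, using only the figures quoted above:

        32 GiB / 4 kiB = 8,388,608 pages, but 3072 entries * 4 kiB = 12 MiB of TLB reach;
        32 GiB / 2 MiB = 16,384 pages, and 3072 entries * 2 MiB = 6 GiB of TLB reach.

    So with 4 kiB pages virtually every random touch misses the TLB, while
    with 2 MiB pages many touches still miss but each walk is one level
    shorter on x86-64 and the page tables themselves are small enough to
    stay cached.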


    Marcel

  • From Scott Lurndal@21:1/5 to Marcel Mueller on Sat Oct 12 15:27:38 2024
    Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
    > On 12.10.24 16:24, Bonita Montero wrote:
    >> I wanted to test the performance of 2MiB pages against 4kiB pages.
    >> My Zen4 CPU has a fully associative L1 TLB with 72 entries for all
    >> page sizes and a second-level TLB with 3072 entries each for 4kiB
    >> and 2/4MiB pages. So I wrote a little benchmark that allocates
    >> 32GiB of memory with 4kiB and with 2MiB pages and touches each
    >> 4kiB block once with one byte, at a random page address. If I
    >> touch the pages all at the same page offset, large pages are only
    >> a quarter faster. But if I touch the pages at a random offset,
    >> large pages become 2.75 times faster. I can't explain this huge
    >> difference, since the page address is random, so no prefetching
    >> could help. But nevertheless it shows that large pages can make a
    >> big difference.

    > Probably you should ask why 4kiB pages are so much slower with
    > random access: you simply need many more TLB entries.
    >
    > With linear access it is likely that contiguous physical memory is
    > mapped, which does not require additional TLB entries regardless of
    > the page size.

    ARM64 has a 'contiguous' hint bit in the translation table entry that
    supports coalescing multiple consecutively addressed translations into
    a single TLB entry when the OAs (output addresses) are contiguous.
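
    A rough sketch of that coalescing rule for the 4kiB granule, where 16
    adjacent level-3 entries can be folded into one 64kiB TLB entry (bit
    positions per the Armv8-A VMSA; the helper itself is illustrative, not
    a real kernel or library API):

    // Rough model, not a real API: can 16 adjacent 4 kiB level-3 descriptors
    // (Armv8-A, 4 kiB granule) be cached as a single 64 kiB TLB entry via the
    // Contiguous hint?  Bit 52 is the Contiguous bit, bits 47:12 the output
    // address; real hardware additionally requires identical attributes and
    // a naturally aligned group of entries.
    #include <array>
    #include <cstddef>
    #include <cstdint>

    constexpr std::uint64_t kContiguousBit  = std::uint64_t(1) << 52;
    constexpr std::uint64_t kOutputAddrMask = 0x0000FFFFFFFFF000ull;  // OA bits 47:12

    bool coalescible(const std::array<std::uint64_t, 16>& descriptors)
    {
        const std::uint64_t base = descriptors[0] & kOutputAddrMask;
        if (base % (64 * 1024) != 0)                 // first OA must be 64 kiB aligned
            return false;
        for (std::size_t i = 0; i < descriptors.size(); ++i) {
            if (!(descriptors[i] & kContiguousBit))  // hint must be set in every entry
                return false;
            if ((descriptors[i] & kOutputAddrMask) != base + i * 4096)
                return false;                        // OAs must be strictly contiguous
        }
        return true;
    }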
