• Re: VAX pages

    From John Levine@21:1/5 to All on Fri Aug 15 16:53:29 2025
    According to Scott Lurndal <slp53@pacbell.net>:
    Section 2.7 also describes a 128-entry TLB. The TLB is claimed to
    have "typically 97% hit rate". I would go for larger pages, which
    would reduce the TLB miss rate.

    I think that in 1979 the VAX's 512-byte page was close to optimal. ...
    One must also consider that the disks in that era were
    fairly small, and 512 bytes was a common sector size.

    Convenient for both swapping and loading program text
    without wasting space on the disk by clustering
    pages in groups of 2, 4 or 8.

    That's probably it, but even at the time the pages seemed rather small.
    Pages on the PDP-10 were 512 words, which was about 2K bytes.
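
    As a back-of-the-envelope illustration of why larger pages would reduce
    the miss rate of the 128-entry TLB mentioned above (my arithmetic; the
    128-entry figure is from the quote, the page sizes below are just
    examples): the memory a full TLB can map, its "reach", scales directly
    with the page size.

      #include <stdio.h>

      /* TLB "reach" = entries * page size. */
      int main(void)
      {
          int entries = 128;
          long pages[] = { 512, 2048, 4096, 16384 };
          for (int i = 0; i < 4; i++)
              printf("%6ld-byte pages: reach = %ld KB\n",
                     pages[i], entries * pages[i] / 1024L);
          return 0;
      }
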
    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Stephen Fuld@21:1/5 to BGB on Fri Aug 15 12:03:41 2025
    On 8/15/2025 11:19 AM, BGB wrote:
    On 8/15/2025 11:53 AM, John Levine wrote:
    According to Scott Lurndal <slp53@pacbell.net>:
    Section 2.7 also describes a 128-entry TLB.  The TLB is claimed to
    have "typically 97% hit rate".  I would go for larger pages, which
    would reduce the TLB miss rate.

    I think that in 1979 the VAX's 512-byte page was close to optimal. ...
    One must also consider that the disks in that era were
    fairly small, and 512 bytes was a common sector size.

    Convenient for both swapping and loading program text
    without wasting space on the disk by clustering
    pages in groups of 2, 4 or 8.

    That's probably it, but even at the time the pages seemed rather small.
    Pages on the PDP-10 were 512 words, which was about 2K bytes.

    Yeah.


    I can note that in some of my own testing of various page sizes, I seemingly found a local optimum at around 16K.

    I think that is consistent with what some others have found. I suspect
    the average page size should grow as memory gets cheaper, which leads to
    more memory on average in systems. This also leads to larger programs,
    as they can "fit" in larger memory with less paging. And as disks
    (spinning or SSD) get faster transfer rates, the cost (in time) of
    paging a larger page goes down. While 4K was the sweet spot some
    decades ago, I think it has increased, probably to 16K. At some point
    in the future, it may get to 64K, but not for some years yet.


    Going from 4K or 8K to 16K saw a reduction in TLB miss rates, but
    going from 16K to 32K or 64K did not see any significant further
    reduction; it did, however, see a more significant increase in memory
    footprint due to allocation overheads (whereas going from 4K to 16K
    pages did not see much increase in memory footprint).

    The patterns seemed consistent across the multiple programs tested,
    but it is harder to say whether this pattern would be universal.


    I had noted, when running stats on where within the pages memory
    accesses land (basically tracking per-page memory accesses at a finer
    granularity, e.g. 512 bytes):
      4K: Pages tend to be accessed fairly evenly.
     16K: Minor variation as to which parts of the page are being used.
     64K: Significant variation between parts of the page.

    Interesting.
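
    For concreteness, a rough sketch (not BGB's actual tooling; the page
    size, table limit, and function names here are just assumptions) of how
    a simulator might bin accesses into 512-byte sub-blocks of each page to
    measure how evenly a page is being used:

      #include <stdint.h>
      #include <stdio.h>

      #define PAGE_SHIFT    16    /* 64K pages for this example    */
      #define SUB_SHIFT      9    /* 512-byte sub-blocks           */
      #define SUBS_PER_PAGE (1u << (PAGE_SHIFT - SUB_SHIFT))
      #define MAX_PAGES     1024  /* toy table size for the sketch */

      static uint32_t hits[MAX_PAGES][SUBS_PER_PAGE];

      /* Call this from the simulator for every memory access. */
      static void record_access(uint64_t addr)
      {
          uint64_t page = (addr >> PAGE_SHIFT) % MAX_PAGES;
          uint64_t sub  = (addr >> SUB_SHIFT) & (SUBS_PER_PAGE - 1);
          hits[page][sub]++;
      }

      /* Print how many of a page's 512-byte sub-blocks were touched. */
      static void dump_page(uint64_t page)
      {
          unsigned used = 0;
          for (unsigned i = 0; i < SUBS_PER_PAGE; i++)
              if (hits[page % MAX_PAGES][i])
                  used++;
          printf("page %llu: %u of %u sub-blocks touched\n",
                 (unsigned long long)page, used, (unsigned)SUBS_PER_PAGE);
      }

      int main(void)
      {
          /* Fake a hot spot: touch only the first 4K of one 64K page. */
          for (uint64_t a = 0; a < 4096; a += 8)
              record_access(a);
          dump_page(0);
          return 0;
      }
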


    Say, for example, at 64K one part of the page may be accessed heavily
    while another part of the page isn't really being accessed at all
    (and increasing the page size only really benefits the TLB miss rate
    so long as the whole page is "actually being used").

    Not necessarily. Consider the case of a 16K (or larger) page with two
    "hot spots" that are more than 4K apart. That takes 2 TLB slots with 4K
    pages, but only one with larger pages.
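
    A tiny illustration of that point (the addresses are made up): two hot
    spots roughly 10K apart land in different 4K pages but in the same 16K
    page, so the larger page covers both with a single TLB entry.

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
          uint64_t hot_a = 0x10000000;         /* made-up hot spots,  */
          uint64_t hot_b = 0x10002800;         /* about 10K apart     */
          int shifts[]  = { 12, 14, 16 };      /* 4K, 16K, 64K pages  */

          for (int i = 0; i < 3; i++) {
              int s = shifts[i];
              int slots = ((hot_a >> s) == (hot_b >> s)) ? 1 : 2;
              printf("%2dK pages: %d TLB slot(s)\n", 1 << (s - 10), slots);
          }
          return 0;
      }
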


    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

  • From Scott Lurndal@21:1/5 to Stephen Fuld on Fri Aug 15 19:19:50 2025
    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
    On 8/15/2025 11:19 AM, BGB wrote:
    On 8/15/2025 11:53 AM, John Levine wrote:
    According to Scott Lurndal <slp53@pacbell.net>:
    Section 2.7 also describes a 128-entry TLB.  The TLB is claimed to
    have "typically 97% hit rate".  I would go for larger pages, which
    would reduce the TLB miss rate.

    I think that in 1979 the VAX's 512-byte page was close to optimal. ...
    One must also consider that the disks in that era were
    fairly small, and 512 bytes was a common sector size.

    Convenient for both swapping and loading program text
    without wasting space on the disk by clustering
    pages in groups of 2, 4 or 8.

    That's probably it, but even at the time the pages seemed rather small.
    Pages on the PDP-10 were 512 words, which was about 2K bytes.

    Yeah.


    I can note that in some of my own testing of various page sizes, I
    seemingly found a local optimum at around 16K.

    I think that is consistent with what some others have found. I suspect
    the average page size should grow as memory gets cheaper, which leads to
    more memory on average in systems. This also leads to larger programs,
    as they can "fit" in larger memory with less paging. And as disks
    (spinning or SSD) get faster transfer rates, the cost (in time) of
    paging a larger page goes down. While 4K was the sweet spot some
    decades ago, I think it has increased, probably to 16K. At some point
    in the future, it may get to 64K, but not for some years yet.

    ARM64 (ARMv8) architecturally supports 4k, 16k and 64k. When
    ARMv8 first came out, one vendor (Red Hat) shipped using 64k pages,
    while Ubuntu shipped with 4k page support. 16k support by the
    processor was optional (although the Neoverse cores support all
    three, some third-party cores developed before ARM added 16k
    pages to the architecture specification only supported 4k and 64k).
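
    As a side note, a program can check at run time which of those base
    page sizes the kernel it is running on was actually built with; a
    minimal sketch using the standard POSIX call:

      #include <stdio.h>
      #include <unistd.h>

      int main(void)
      {
          /* Reports 4096, 16384 or 65536 depending on how the kernel
             was configured (e.g. 64k-page vs 4k-page ARM64 distros). */
          long psz = sysconf(_SC_PAGESIZE);
          printf("base page size: %ld bytes\n", psz);
          return 0;
      }
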


    Say, for example, at 64K one part of the page may be accessed heavily
    while another part of the page isn't really being accessed at all
    (and increasing the page size only really benefits the TLB miss rate
    so long as the whole page is "actually being used").

    Not necessarily. Consider the case of a 16K (or larger) page with two
    "hot spots" that are more than 4K apart. That takes 2 TLB slots with 4K
    pages, but only one with larger pages.

    Note that the ARMv8 architecture[*] supports terminating the table walk
    before reaching the smallest level, so with 4K pages[**], a single TLB
    entry can cover 4K, 2MB or 1GB blocks. With 16K pages, a single
    TLB entry can cover 16K, 32MB or 64GB blocks. 64K pages support
    64K, 512MB and 4TB block sizes.

    [*] Intel, AMD and others have similar "large page" capabilities.
    [**] Granules, in ARM terminology.
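
    On Linux, one common way for a user program to actually get one of
    those larger block mappings is hugetlbfs via mmap; a minimal sketch
    (the 2MB size assumes a 4K-granule kernel with 2MB huge pages reserved
    in the pool):

      #define _GNU_SOURCE
      #include <stdio.h>
      #include <sys/mman.h>

      int main(void)
      {
          size_t len = 2UL * 1024 * 1024;   /* one 2MB huge page (assumed) */
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
          if (p == MAP_FAILED) {
              /* Fails if no huge pages are reserved by the admin. */
              perror("mmap(MAP_HUGETLB)");
              return 1;
          }
          munmap(p, len);
          return 0;
      }
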
