• Re: [gentoo-dev] EGO_SUM

    From William Hubbs@21:1/5 to Florian Schmaus on Thu Jun 1 22:00:01 2023
    I know I'm pretty late to this thread, but I'm going to respond to some
    of the concerns and suggest another alternative.

    On Mon, Apr 17, 2023 at 09:37:32AM +0200, Florian Schmaus wrote:
    I want to continue the discussion to re-instate EGO_SUM, potentially
    leading to a democratic vote on whether EGO_SUM should be re-instated or
    deprecated.

    For the past months, I tried to find *technical reasons*, e.g., reasons
    that affect end-users, that justify the deprecation of EGO_SUM. However,
    I was unable to find any. The closest thing I could find was portage
    being unable to process an ebuild due to its large environment (bug
    830187). However, as this happens while developing an ebuild, it should
    never affect users. Obviously this is a situation where EGO_SUM cannot
    be used. Fortunately, it does not affect most Go packages, as seen in my
    previous analysis of Go packages in ::gentoo and their EGO_SUM size.
    Furthermore, newer portage versions, with USE=gentoo-dev, will
    proactively warn you if the environment caused by the ebuild becomes
    large.

    All further arguments for the deprecation of EGO_SUM were of a cosmetic
    nature.

    However, the deprecation of EGO_SUM is harmful to Gentoo and its users.
    To briefly re-iterate the reasons:

    The EGO_SUM alternatives
    - do not have the same level of trust and therefore have a negative
    impact on security (a dubious tarball someone put somewhere, especially
    when proxy-maint)

    For this, I would argue that vetting the tarball falls to the developer
    who is proxying. If you don't trust the proxy maintainer you
    are pushing for, it is easy to make a dependency tarball yourself and
    add it to your dev space.

    - are not easily verifiable

    I don't have a response to this other than to say that go does its
    own verification of modules with the dependency tarballs that it can't
    do with vendor tarballs.

    - require additional effort when developing ebuilds

    This "additional effort" is pretty subjective. Making a dependency
    tarball isn't a lot of work, especially with the script that I posted in
    this thread.

    - hinder the packaging and Gentoo's adoption of Go-based projects, which
    is worrisome as Go is very popular

    I don't have a response here. I don't see it as much of a hindrance
    (this is obviously subjective).

    - prevent Go modules from being shared as DISTFILES on the mirrors
    across various packages

    The issue here is really the duplicate data in the dependency or vendor
    tarballs, and yes, there is a lot of it.

    Last but not least, we have the same situation in the Rust ecosystem,
    but we allow the EGO_SUM "equivalent" there.

    I'm not sure it is quite the same because Rust projects tend to have
    much smaller numbers of dependencies.


    Another thing to consider is that using EGO_SUM adds a significant
    amount of processing to the go-module eclass.
    I was advised recently that this isn't a good idea since bash is
    slow, so I am considering moving most of that processing into
    get-ego-vendor by having it generate the contents of SRC_URI directly
    instead of using the eclass code to do that.

    My thought is to have get-ego-vendor output the value for a variable,
    GO_SRC_URI, and add that to SRC_URI in the ebuild like so:

    # The output from get-ego-vendor:
    GO_SRC_URI="
    # dependency 1
    # dependency 2
    "

    SRC_URI="https://main-project-here
    ${GO_SRC_URI}"

    This should speed things up some since most of the processing we are
    doing in the eclass would be removed, so I would rather not see the
    council force the use of EGO_SUM. This, however, is still going to hit
    the limitation of bug 830187.
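
    As a rough illustration of moving this processing out of the eclass,
    the sketch below turns go.sum-style lines into SRC_URI-style entries
    outside of any ebuild. It is not the actual get-ego-vendor output: the
    function names, the "mirror://goproxy" prefix and the naming of the
    "->" rename target are assumptions made for the example. Only the
    "!lowercase" escaping of uppercase letters and the
    "<module>/@v/<version>.{mod,zip}" layout follow the Go module proxy
    protocol.

```shell
# Sketch only: hypothetical helpers, not part of get-ego-vendor or the eclass.

# Escape a module path or version the way the Go module proxy protocol does:
# every uppercase letter becomes "!" followed by its lowercase form.
go_escape() {
    printf '%s\n' "$1" | sed 's/[[:upper:]]/!&/g' | tr '[:upper:]' '[:lower:]'
}

# Read "module version[/go.mod] [hash]" lines on stdin (the go.sum layout)
# and print one "URI -> distfile" entry per line.
# The URI prefix and the distfile naming are assumptions for illustration.
gen_go_src_uri() (
    while read -r module version _hash; do
        [ -n "$module" ] || continue
        case $version in
            */go.mod) version=${version%/go.mod}; ext=mod ;;
            *)        ext=zip ;;
        esac
        path="$(go_escape "$module")/@v/$(go_escape "$version").$ext"
        # Flatten the path into a single file name by encoding the slashes.
        distfile=$(printf '%s\n' "$path" | sed 's|/|%2F|g')
        printf 'mirror://goproxy/%s -> %s\n' "$path" "$distfile"
    done
)
```

    Running "gen_go_src_uri < go.sum" once, when the ebuild is written,
    would produce text that could be pasted into GO_SRC_URI, so that
    nothing needs to be computed in bash when the ebuild is sourced.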

    I am, however, open to another solution, so I will keep following this
    thread.

    I think the better question should be around what we can do to get bug
    721088 or bug 833567 to move forward.

    Thanks,

    William



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Joonas Niilola@21:1/5 to William Hubbs on Fri Jun 2 09:20:01 2023
    To: williamh@gentoo.org

    On 1.6.2023 22.55, William Hubbs wrote:

    The EGO_SUM alternatives
    - do not have the same level of trust and therefore have a negative
    impact on security (a dubious tarball someone put somewhere, especially
    when proxy-maint)

    For this, I would argue that vetting the tarball falls to the developer
    who is proxying. If you don't trust the proxy maintainer you
    are pushing for, it is easy to make a dependency tarball yourself and
    add it to your dev space.


    - require additional effort when developing ebuilds

    This "additional effort" is pretty subjective. Making a dependency
    tarball isn't a lot of work, especially with the script that I posted in
    this thread.


    In theory it's "easy", but in practice how'd you work? This would be
    fine when a single developer is proxying a single maintainer, but when
    a stack of devs (project) are proxying hundreds of different people, it
    becomes messy and unsustainable rather fast.

    I do want to point out that any proxied maintainer can and should upload
    the vendor tarballs to their own Github / Gitlab distfile-repos for the
    time being, but allowing EGO_SUM to be used again would be the easiest
    solution here in my opinion for everyone involved. I'm aware it's pushed
    back due to technicalities.

    -- juippis

  • From William Hubbs@21:1/5 to Joonas Niilola on Fri Jun 2 20:10:01 2023
    On Fri, Jun 02, 2023 at 10:13:55AM +0300, Joonas Niilola wrote:
    On 1.6.2023 22.55, William Hubbs wrote:

    The EGO_SUM alternatives
    - do not have the same level of trust and therefore have a negative
    impact on security (a dubious tarball someone put somewhere, especially
    when proxy-maint)

    For this, I would argue that vetting the tarball falls to the developer
    who is proxying. If you don't trust the proxy maintainer you
    are pushing for, it is easy to make a dependency tarball yourself and
    add it to your dev space.


    - require additional effort when developing ebuilds

    This "additional effort" is pretty subjective. Making a dependency
    tarball isn't a lot of work, especially with the script that I posted in
    this thread.


    In theory it's "easy", but in practice how'd you work? This would be
    fine when a single developer is proxying a single maintainer, but when
    a stack of devs (project) are proxying hundreds of different people, it
    becomes messy and unsustainable rather fast.

    This comment is completely off topic for this thread, so start another
    thread for it if you want, but if hundreds of people are being proxied
    by proxy-maint, that seems to be a concern unrelated to this. It seems
    the fix for that is to advocate for some of these hundreds of people to
    become developers so they don't have to be proxied any more.

    I do want to point out that any proxied maintainer can and should upload
    the vendor tarballs to their own Github / Gitlab distfile-repos for the
    time being, but allowing EGO_SUM to be used again would be the easiest
    solution here in my opinion for everyone involved. I'm aware it's pushed
    back due to technicalities.

    Like I said at another point in the thread, I want to get rid of EGO_SUM
    by moving most of the processing for it out of the eclass. I'm looking
    into that now. This will still run into the same problem as EGO_SUM if
    $A is still exported, but it should speed up ebuild processing.

    William

  • From Joonas Niilola@21:1/5 to William Hubbs on Fri Jun 2 20:50:01 2023
    To: williamh@gentoo.org

    On 2.6.2023 21.06, William Hubbs wrote:

    In theory it's "easy", but in practice how'd you work? This would be
    fine when a single developer is proxying a single maintainer, but when
    a stack of devs (project) are proxying hundreds of different people, it
    becomes messy and unsustainable rather fast.

    This comment is completely off topic for this thread, so start another
    thread for it if you want, but if hundreds of people are being proxied
    by proxy-maint, that seems to be a concern unrelated to this. It seems
    the fix for that is to advocate for some of these hundreds of people to
    become developers so they don't have to be proxied any more.


    How is it offtopic when I'm answering concerns you raised?

    Imagine there are tens of people who do 4 commits a year, roughly, to
    bump random go packages. What do you believe is the time investment for
    reviewing, testing and committing their contributions, vs. mentoring
    them to become devs if they don't involve themselves much outside
    bumping these packages? Also, will _you_ volunteer to mentor them?

    It's so easy to push more work for others to do. Sorry if I come across
    as harsh but this is reality, not just theory.

    -- juippis

  • From Florian Schmaus@21:1/5 to William Hubbs on Fri Jun 9 12:10:02 2023
    On 01/06/2023 21.55, William Hubbs wrote:
    The EGO_SUM alternatives
    - do not have the same level of trust and therefore have a negative
    impact on security (a dubious tarball someone put somewhere, especially
    when proxy-maint)

    For this, I would argue that vetting the tarball falls to the developer
    who is proxying. If you don't trust the proxy maintainer you
    are pushing for, it is easy to make a dependency tarball yourself and
    add it to your dev space.

    - are not easily verifiable

    I don't have a response to this other than to say that go does its
    own verification of modules with the dependency tarballs that it can't
    do with vendor tarballs.

    Yes, go has "go mod verify", which was added to the go-module eclass
    after I asked on 2022-10-21 in #gentoo-dev if the eclass verifies the
    dependency tarball. robbat2 was so kind as to provide a proof of concept
    of the security issue I was pointing out, which is available at
    https://gist.github.com/robbat2/82f4c208b6674e707081eda689096d55.
    This demonstration of the issue triggered
    https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=733b4944c1a061269f96219cc96530f89d8f439e,
    which made the go-module.eclass run "go mod verify".

    Unfortunately, a malicious contributor can trivially sidestep this
    verification step, rendering it ineffective. First, neither portage [1]
    nor PMS prevents a later (source) archive from overriding an existing
    file. This looseness allows, for example, the (non-upstream) dependency
    tarball to override (upstream's) go.sum. Secondly, a dependency tarball
    could create the vendor/ directory, which removes the condition under
    which the go-module.eclass runs "go mod verify". Both approaches allow
    the dependency tarball to inject malicious code. With the first
    approach, "go mod verify" completes successfully; with the second,
    "go mod verify" is simply not invoked.

    The verification, as is, is ineffective.
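
    One way to picture a guard against both approaches is a check on the
    member list of the dependency tarball before it is unpacked. The
    sketch below is only an illustration, not something the eclass or
    portage does: the function name is hypothetical, and it assumes that a
    well-formed dependency tarball contains nothing outside a top-level
    go-mod/ directory.

```shell
# Sketch only: hypothetical check, not part of go-module.eclass or portage.
# Reads archive member paths on stdin (for example from `tar -tf`) and
# reports every member that is not below the assumed top-level go-mod/
# directory. Such members could override go.sum or create vendor/.
# Exit status is 0 when all members are in the expected place, 1 otherwise.
check_deps_members() (
    status=0
    while IFS= read -r member; do
        member=${member#./}                 # tolerate a leading "./"
        case $member in
            ../*|*/../*|*/..)               # path traversal out of go-mod/
                printf 'unexpected member: %s\n' "$member"; status=1 ;;
            go-mod|go-mod/*)                # assumed expected location
                ;;
            *)
                printf 'unexpected member: %s\n' "$member"; status=1 ;;
        esac
    done
    exit "$status"
)
```

    It could be used as "tar -tf foo-1.0-deps.tar.xz | check_deps_members",
    where the file name is a placeholder. A check like this does not make
    the tarball contents trustworthy; it only closes the two specific
    sidesteps described above.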


    Last but not least, we have the same situation in the Rust ecosystem,
    but we allow the EGO_SUM "equivalent" there.

    I'm not sure it is quite the same because Rust projects tend to have
    much smaller numbers of dependencies.

    I am curious to know of any specific reason why Rust projects generally
    get by with fewer dependencies. This impression may be deceiving, caused
    by the fact that the Go-lang ecosystem hosts several projects with a
    more significant number of dependencies. If you look at the analysis
    [2], you find that the top 10 Go packages by EGO_SUM entry count
    include cri-o, prometheus, k3s, and k3d, among others. If someone
    rewrote any of those in Rust, they would probably end up with the same
    number of dependencies.


    Another thing to consider is that using EGO_SUM adds a significant
    amount of processing to the go-module eclass.
    I was advised recently that this isn't a good idea since bash is
    slow, so I am considering moving most of that processing into
    get-ego-vendor by having it generate the contents of SRC_URI directly
    instead of using the eclass code to do that.

    Was this analyzed and quantified? Is this hurting us? The cache
    regeneration of an ebuild tree is an embarrassingly parallel operation,
    so this would need to be exponentially complex [3] to be of any
    significance.

    It may be possible to tune the existing EGO_SUM handling. We should keep
    EGO_SUM if viable, as it maps directly to Go's go.sum and makes
    developing Go-lang ebuilds as frictionless as possible.

    - Flow



    1: https://github.com/gentoo/portage/pull/1030
    2: https://dev.gentoo.org/~flow/gentoo-tree-analysis-results/2023-05-17T100838-gentoo-at-2022-02-16-60dc7a03ff2f/post-processed-ego-sum.txt
    3: something similar to what was recently found in the latex ebuilds,
    see https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6ee282f0645dcfccf1836b9cc7ae55556629eb8b

  • From Ulrich Mueller@21:1/5 to All on Mon Jul 3 13:20:01 2023
    On Mon, 03 Jul 2023, Florian Schmaus wrote:

    So pkgcheck counting EGO_SUM entries would be sufficient for the
    purpose of having a static check that notices if the ebuild would
    likely run into the environment limit?

    To find a common compromise, I would possibly invest my time in
    developing such a test. Even though I do not deem such a check a
    strict prerequisite to reintroduce EGO_SUM.

    The so-called "environment limit" is 32 pages, i.e. normally 128 KiB.
    With the A variable anywhere near this, the size of the Manifest file
    would be close to 1 MiB.
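
    The 32-page figure can be turned into a rough estimate for a given list
    of distfile names. The sketch below is an assumption-laden illustration
    and not an existing pkgcheck or portage check: the function names are
    hypothetical, and it models A as the distfile names joined by single
    spaces.

```shell
# Sketch only: hypothetical helpers, not an existing pkgcheck/portage check.

# Limit for a single environment string: 32 pages
# (131072 bytes, i.e. 128 KiB, with 4 KiB pages).
env_string_limit() {
    echo $(( 32 * $(getconf PAGESIZE) ))
}

# Read distfile names on stdin, one per line, and print the estimated size
# in bytes of "A=<names joined by spaces>" including the terminating NUL.
# Each newline becomes one byte: a separating space, or for the last name
# the stand-in for the NUL terminator. The leading 2 accounts for "A=".
a_variable_size() {
    echo $(( 2 + $(grep -v '^$' | tr '\n' ' ' | wc -c) ))
}

# Succeed if the estimated A would still fit into one environment string.
fits_in_environment() {
    [ "$(a_variable_size)" -le "$(env_string_limit)" ]
}
```

    For example, the two names "a.zip" and "b.mod" give "A=a.zip b.mod"
    plus the terminator, which is 14 bytes.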

    IMHO this is way too large to be used on a regular basis. I am aware
    that we have some packages with large Manifests (71 packages above
    50 KiB, 6 packages above 200 KiB, out of 18812 packages in total),
    but these should really remain the exception.

    Ulrich
