• Kernel features and Cloud (and GCE)

    From Andrew Jorgensen@21:1/5 to All on Wed May 22 17:10:02 2024
    Hi everyone!

    The Debian images in Google Compute Engine use the Debian cloud
    kernel. This has been working well for us, because it includes the
    VirtIO, NVMe, and gVNIC drivers that are needed for most GCE machine
    types. As we move forward, additional kernel features are needed to
    support all features of current and future machine types.

    For example, we’re going to make an Intel 6300ESB watchdog device
    available, and that needs a driver that’s been in Linux a long time
    but isn’t enabled in the cloud kernel. For that one, another Debian
    user +1’d the request because it would benefit users of other
    KVM-based clouds (including private clouds). We can enumerate other
    examples, but many of those also require backports or a future Debian
    release.

    Recently, in response to another feature request for the cloud kernel,
    Noah Meyerhans mentioned that “historically the cloud kernels have
    specifically targeted Amazon EC2 and Microsoft Azure.”

    So we have the problem that the Debian cloud kernel supports some, but
    not all, of the devices our shared users need, and we’re not sure of
    the right way to solve that. We wondered if we should switch the
    images to the generic kernel, or if there’s a way we could help the
    cloud kernel support more clouds, or if there’s a better solution we haven’t thought of.

    What’s the best way to solve some of these problems?

    Kind regards,
    Andrew Jorgensen

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Emanuele Rocca@21:1/5 to Andrew Jorgensen on Mon May 27 15:40:01 2024
    Hey Andrew,

    On 2024-05-22 07:44, Andrew Jorgensen wrote:
    > For example, we’re going to make an Intel 6300ESB watchdog device
    > available, and that needs a driver that’s been in Linux a long time
    > but isn’t enabled in the cloud kernel. For that one, another Debian
    > user +1’d the request because it would benefit users of other
    > KVM-based clouds (including private clouds). We can enumerate other
    > examples, but many of those also require backports or a future Debian
    > release.

    In general the procedure to get a new module enabled in a future Debian
    release is:

    - open a bug report just like you have done for I6300ESB_WDT [1]
    - open a corresponding merge request on Salsa, Vincent did that for
    I6300ESB_WDT already [2]

    Which files to edit, and how to build a kernel with the module(s)
    enabled, may not be immediately obvious. As I was learning how to do it
    myself, I posted some notes at [3]. Now that I think of it, perhaps we
    could add a section to the Debian Kernel Handbook.

    The final step is waiting for someone to review and merge your changes,
    with a gentle ping on IRC (#debian-kernel on OFTC) in case nothing
    happens for a while. :-)
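    To make the "is this option enabled?" question concrete, here is a
    minimal Python sketch (not part of Debian's kernel tooling) that reports
    whether options such as CONFIG_I6300ESB_WDT are built in (=y), modular
    (=m), or not set in a kernel config file. It assumes the usual Kconfig
    .config syntax and that the running Debian kernel's config is installed
    as /boot/config-<release>; the helper names are hypothetical.

```python
import platform
import sys
from pathlib import Path


def parse_kconfig(text: str) -> dict[str, str]:
    """Map each CONFIG_* symbol to its value ('y', 'm', ...), or 'n' if not set."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Kconfig records disabled options as "# CONFIG_FOO is not set".
            options[line[2 : -len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def option_state(options: dict[str, str], name: str) -> str:
    """Describe an option; symbols missing from the file count as not set."""
    value = options.get(name, "n")
    return {"y": "built in", "m": "module", "n": "not set"}.get(value, f"set to {value}")


def main(names: list[str]) -> None:
    # Assumption: Debian kernel packages install their config as /boot/config-<release>.
    config_path = Path("/boot") / f"config-{platform.release()}"
    try:
        options = parse_kconfig(config_path.read_text())
    except OSError as err:
        print(f"cannot read {config_path}: {err}")
        return
    for name in names or ["CONFIG_I6300ESB_WDT", "CONFIG_GVE", "CONFIG_VIRTIO_NET"]:
        print(f"{name}: {option_state(options, name)}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

    Run on an instance with the cloud kernel, with option names as
    arguments, this shows at a glance whether a driver request is still
    needed before opening a bug and merge request.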

    Perhaps someone else can comment on backports; I imagine the steps to
    follow are similar, but I don't know for sure.

    > So we have the problem that the Debian cloud kernel supports some, but
    > not all, of the devices our shared users need, and we’re not sure of
    > the right way to solve that. We wondered if we should switch the
    > images to the generic kernel, or if there’s a way we could help the
    > cloud kernel support more clouds, or if there’s a better solution we
    > haven’t thought of.

    I think the best approach is enabling the needed modules one by one in
    the cloud image following the procedure above.

    Emanuele

    [1] https://bugs.debian.org/1067908
    [2] https://salsa.debian.org/kernel-team/linux/-/merge_requests/1059
    [3] https://www.linux.it/~ema/posts/enabling-kernel-settings-in-debian/

  • From Noah Meyerhans@21:1/5 to Emanuele Rocca on Mon May 27 17:50:01 2024
    On Mon, May 27, 2024 at 03:37:08PM +0200, Emanuele Rocca wrote:
    >> So we have the problem that the Debian cloud kernel supports some, but
    >> not all, of the devices our shared users need, and we’re not sure of
    >> the right way to solve that. We wondered if we should switch the
    >> images to the generic kernel, or if there’s a way we could help the
    >> cloud kernel support more clouds, or if there’s a better solution we
    >> haven’t thought of.

    > I think the best approach is enabling the needed modules one by one in
    > the cloud image following the procedure above.

    Andrew's question is a bit higher level than that, and mostly boils down
    to "Which cloud environments do we actually want to support with the
    cloud kernel?"

    We have declined requests to enable modules in the cloud kernel in the
    past, referring people to the standard kernel instead (see e.g. bug
    #969140). See also the previous discussion at
    https://lists.debian.org/debian-kernel/2020/04/msg00006.html

    We have not, as far as I can recall, ever explicitly stated a policy
    around this, nor have we documented what it would take for us to support
    more fine-grained kernel builds (i.e. what stops us from generating a
    kernel image targeting *only* GCP).

    noah

  • From Bastian Blank@21:1/5 to Andrew Jorgensen on Mon May 27 21:10:01 2024
    Hi Andrew

    On Wed, May 22, 2024 at 07:44:33AM -0700, Andrew Jorgensen wrote:
    > The Debian images in Google Compute Engine use the Debian cloud
    > kernel. This has been working well for us, because it includes the
    > VirtIO, NVMe, and gVNIC drivers that are needed for most GCE machine
    > types. As we move forward, additional kernel features are needed to
    > support all features of current and future machine types.
    >
    > For example, we’re going to make an Intel 6300ESB watchdog device
    > available, and that needs a driver that’s been in Linux a long time
    > but isn’t enabled in the cloud kernel. For that one, another Debian
    > user +1’d the request because it would benefit users of other
    > KVM-based clouds (including private clouds). We can enumerate other
    > examples, but many of those also require backports or a future Debian
    > release.

    We already backport the Microsoft MANA network driver. So at least
    backports to stable are not that much of a problem, if someone does it.
    Backports to oldstable are most likely not happening, as this target is
    too far off.

    > Recently, in response to another feature request for the cloud kernel,
    > Noah Meyerhans mentioned that “historically the cloud kernels have
    > specifically targeted Amazon EC2 and Microsoft Azure.”

    Yes, this is the documented target. We never properly added GCP
    because no communication happened. I think we can do that if someone
    does the due diligence and knows the documentation better than we do.

    > So we have the problem that the Debian cloud kernel supports some, but
    > not all, of the devices our shared users need, and we’re not sure of
    > the right way to solve that. We wondered if we should switch the
    > images to the generic kernel, or if there’s a way we could help the
    > cloud kernel support more clouds, or if there’s a better solution we
    > haven’t thought of.

    We can support more clouds. It is just a matter of taking care of it.

    I am currently playing with splitting the modules into multiple
    different sets, like almost all other distributions already do. We
    would then not need to do multiple builds, and more targeted packages
    would be possible if needed.
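    The idea of splitting the modules of a single build into several sets
    can be pictured with a small Python sketch. Everything in it is a
    hypothetical illustration, not the Debian kernel team's design: the
    target names, the per-target module lists, and the rule that a module
    needed by more than one target goes into a shared "core" set (much as
    virtio would stay unsplit while gve could move out, as suggested later
    in the thread).

```python
from collections import defaultdict

# Hypothetical per-target module lists. The module names are real upstream
# drivers, but which target "needs" which module is illustrative only.
CLOUD_MODULES: dict[str, set[str]] = {
    "ec2": {"nvme", "ena"},
    "azure": {"hv_netvsc", "hv_storvsc", "mana"},
    "gce": {"nvme", "gve", "virtio_net", "i6300esb"},
    "kvm": {"virtio_net", "virtio_blk", "i6300esb"},
}


def split_modules(cloud_modules: dict[str, set[str]]) -> dict[str, set[str]]:
    """Group modules from one build into package sets.

    A module wanted by more than one target goes into a shared 'core' set;
    a module wanted by exactly one target goes into that target's own set.
    """
    users: defaultdict[str, set[str]] = defaultdict(set)
    for target, modules in cloud_modules.items():
        for module in modules:
            users[module].add(target)

    packages: defaultdict[str, set[str]] = defaultdict(set)
    for module, targets in users.items():
        if len(targets) > 1:
            packages["core"].add(module)
        else:
            (only_target,) = targets  # exactly one target uses this module
            packages[only_target].add(module)
    return dict(packages)


if __name__ == "__main__":
    for name, modules in sorted(split_modules(CLOUD_MODULES).items()):
        print(f"{name}: {', '.join(sorted(modules))}")
```

    The point of the sketch is that the kernel is compiled once and only
    the packaging differs, so adding a target means adding a module list
    rather than another full kernel build.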

    Regards,
    Bastian

    --
    Conquest is easy. Control is not.
    -- Kirk, "Mirror, Mirror", stardate unknown

  • From Marco d'Itri@21:1/5 to Andrew Jorgensen on Tue May 28 17:20:01 2024
    On May 22, Andrew Jorgensen <ajorgens@google.com> wrote:

    > For example, we’re going to make an Intel 6300ESB watchdog device
    > available, and that needs a driver that’s been in Linux a long time
    > but isn’t enabled in the cloud kernel. For that one, another Debian
    > user +1’d the request because it would benefit users of other
    > KVM-based clouds (including private clouds). We can enumerate other
    > examples, but many of those also require backports or a future Debian
    > release.

    I, too, would like to see the 6300ESB driver enabled in the cloud
    kernel, because it is the watchdog implemented by the modern KVM Q35
    machine type.
    So this is not just about supporting GCE.
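
    To check whether a guest actually sees an emulated 6300ESB watchdog,
    one can scan sysfs for its PCI ID. This is a minimal Python sketch
    assuming the standard /sys/bus/pci/devices/*/vendor and device
    attribute files and the ID 8086:25ab, which is the Intel ESB watchdog
    ID the upstream i6300esb driver matches; the function name is
    hypothetical.

```python
from pathlib import Path

# Assumed PCI ID of the Intel 6300ESB watchdog (vendor 0x8086, device 0x25ab),
# as matched by the upstream i6300esb driver and emulated by QEMU/KVM.
ESB_WATCHDOG_ID = ("0x8086", "0x25ab")
SYSFS_PCI = Path("/sys/bus/pci/devices")


def find_pci_devices(vendor: str, device: str, root: Path = SYSFS_PCI) -> list[str]:
    """Return the PCI addresses (sysfs entry names) whose vendor/device IDs match."""
    matches: list[str] = []
    if not root.is_dir():
        # Not Linux, or sysfs not mounted: report no matches rather than failing.
        return matches
    for dev in sorted(root.iterdir()):
        try:
            ids = ((dev / "vendor").read_text().strip(), (dev / "device").read_text().strip())
        except OSError:
            continue  # Skip entries whose attributes cannot be read.
        if ids == (vendor.lower(), device.lower()):
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    found = find_pci_devices(*ESB_WATCHDOG_ID)
    if found:
        print("6300ESB watchdog present at:", ", ".join(found))
    else:
        print("no 6300ESB watchdog found")
```

    If the device is listed but the running kernel lacks
    CONFIG_I6300ESB_WDT, the guest has the hardware but no driver for it,
    which is the gap this thread is about.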

    --
    ciao,
    Marco

  • From Andrew Jorgensen@21:1/5 to All on Wed May 29 18:20:01 2024
    > Andrew's question is a bit higher level than that, and mostly boils down
    > to "Which cloud environments do we actually want to support with the
    > cloud kernel?"

    That's right. And I'd like it to include GCE, but there are a lot of
    cloud environments out there so drawing a line somewhere can help keep
    the cloud kernel lean, which appears to be one of the goals.

    >> specifically targeted Amazon EC2 and Microsoft Azure.”
    > Yes, this is the documented target.

    Where is it documented?
    https://wiki.debian.org/Cloud and https://wiki.debian.org/Cloud/SystemsComparison
    both imply at least AWS, Azure, GCP, and OpenStack, but I haven't yet
    found a document about the kernel specifically.
    I don't mean to disagree, only to understand.

    > We can support more clouds. It is just a matter of taking care of it.
    >
    > I am currently playing with splitting the modules into multiple
    > different sets, like almost all other distributions already do. We
    > would then not need to do multiple builds, and more targeted packages
    > would be possible if needed.

    That could make sense. Some would not be split out, like virtio, I
    think, but others like gVNIC (gve) obviously could be.

    > We already backport the Microsoft MANA network driver. So at least
    > backports to stable are not that much of a problem, if someone does it.

    Interesting. I assume there's probably some backporting of Amazon ENA
    (and EFA?) as well.

    Using GCP's current-generation bare metal requires the Intel IDPF
    driver, which landed in 6.7. Other OS vendors have backported it to
    various versions, so it ought to be possible to backport it to 6.1 for
    Debian. I'll talk to some people here about how we might do that.

    Kind regards,
    Andrew
