• Revival of lintian.d.o (sort of)

    From Lucas Nussbaum@21:1/5 to All on Sun Jul 3 16:00:01 2022
    TLDR: I have plans to get fresh archive-wide data about lintian results
    in UDD (and then to any service that wants to consume it), but it's
    still WIP


    Hi,

    Seeing that lintian got adopted, I got motivated to look into whether I
    could help on the lintian.d.o side, that is, provide up-to-date
    archive-wide data to developers.

    Since the architecture of lintian.d.o seemed quite complicated, I
    instead decided to follow what worked for other UDD-based data importers
    (such as the one that scans for new upstream versions). So my plan is
    the following:
    - use a UDD postgresql table for data storage
    - use UDD to decide which packages need to be analyzed
    - coordinate the analysis from UDD, but do the analysis itself on a
    third-party 'worker' machine (since the process is quite CPU intensive)
    - provide visualisation directly on https://udd.debian.org (similar
    to https://udd.debian.org/dmd/ or https://udd.debian.org/bugs/)
    - work with data consumers on how to best export the data from UDD to
    them
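
    A minimal sketch of the "decide which packages need to be analyzed"
    step, assuming a package is due whenever no result exists for its
    current version under the running lintian version. The field names
    (source, version, lintian_version) and the sample data are hypothetical
    and are not taken from the actual UDD schema.

```python
from typing import Dict, Iterable, List, NamedTuple


class SourcePackage(NamedTuple):
    name: str
    version: str


def packages_to_analyze(
    archive: Iterable[SourcePackage],
    results: Iterable[Dict[str, str]],
    lintian_version: str,
) -> List[SourcePackage]:
    """Return archive packages lacking a result for this lintian version."""
    # (source, version) pairs already analyzed with the given lintian version.
    done = {
        (r["source"], r["version"])
        for r in results
        if r["lintian_version"] == lintian_version
    }
    return [p for p in archive if (p.name, p.version) not in done]


# Hypothetical archive contents and stored results, for illustration only.
archive = [
    SourcePackage("hello", "2.10-2"),
    SourcePackage("dpkg", "1.21.9"),
    SourcePackage("lintian", "2.115.2"),
]
results = [
    {"source": "hello", "version": "2.10-2", "lintian_version": "2.115.2"},
    {"source": "dpkg", "version": "1.21.8", "lintian_version": "2.115.2"},
]

pending = packages_to_analyze(archive, results, "2.115.2")
print([p.name for p in pending])
```

    With this rule, a new upload or a lintian upgrade both make a package
    due again without any extra bookkeeping.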

    I know it feels a bit like NIH, but I believe the simpler design will
    help in the long term...

    The current status is:
    - there's now a lintian_results table in UDD
    - there's a new lintian importer that coordinates the analysis
    - the lintian_results table is currently being populated (~6400 source
    packages processed at the moment, ~29000 remaining -- I expect the
    initial analysis to be over in about 3 days)
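
    As a rough check of what the 3-day estimate implies, the arithmetic can
    be worked out as below. The three input figures come from the message;
    the assumption that the worker runs around the clock at a constant rate
    is mine.

```python
def required_rate(remaining: int, days: float) -> float:
    """Packages per hour needed to clear `remaining` packages in `days` days."""
    return remaining / (days * 24)


processed = 6400   # source packages already analyzed (from the message)
remaining = 29000  # source packages still to analyze (from the message)
days_left = 3      # expected time to finish (from the message)

per_hour = required_rate(remaining, days_left)
seconds_per_package = 3600 / per_hour
share_done = processed / (processed + remaining)

print(f"{per_hour:.0f} packages/hour")             # 403 packages/hour
print(f"{seconds_per_package:.1f} s per package")  # 8.9 s per package
print(f"{share_done:.0%} of the archive done")     # 18% of the archive done
```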

    What remains to be done:
    - bugfixing?
    - work on the visualisation part. There was a UDD CGI that did this
    using the old data (from lintian.debian.org) at
    https://udd.debian.org/lintian/ that could serve as a basis
    - talk to DSA about migrating the "worker" VM to Debian infra (it's just
    a dumb VM, so it should not be an issue)
    - talk to lintian data consumers
    - see what we want to do about lintian.d.o

    Lucas

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Wise@21:1/5 to Lucas Nussbaum on Mon Jul 4 04:30:01 2022
    On Sun, 2022-07-03 at 15:51 +0200, Lucas Nussbaum wrote:

    > TLDR: I have plans to get fresh archive-wide data about lintian results
    > in UDD (and then to any service that wants to consume it), but it's
    > still WIP

    This mail reminds me somewhat of buxy's debusine proposal/work.

    > - coordinate the analysis from UDD, but do the analysis itself on a
    >   third-party 'worker' machine (since the process is quite CPU intensive)

    Is the analysis something that could be farmed out to multiple machines
    for parallel processing using Debian AWS credits or Grid5000? I expect
    that would be useful for updating tags when lintian gets updated, or
    for the lintian maintainers or other folks to do experimental checks.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise

  • From Lucas Nussbaum@21:1/5 to Paul Wise on Mon Jul 4 12:00:04 2022
    Hi Paul,

    On 04/07/22 at 10:23 +0800, Paul Wise wrote:
    > On Sun, 2022-07-03 at 15:51 +0200, Lucas Nussbaum wrote:
    >
    > > TLDR: I have plans to get fresh archive-wide data about lintian results
    > > in UDD (and then to any service that wants to consume it), but it's
    > > still WIP
    >
    > This mail reminds me somewhat of buxy's debusine proposal/work.
    >
    > > - coordinate the analysis from UDD, but do the analysis itself on a
    > >   third-party 'worker' machine (since the process is quite CPU intensive)
    >
    > Is the analysis something that could be farmed out to multiple machines
    > for parallel processing using Debian AWS credits or Grid5000? I expect
    > that would be useful for updating tags when lintian gets updated, or
    > for the lintian maintainers or other folks to do experimental checks.

    Yes, and the design for the importer already allows that, but I'd like
    to figure out how much time it takes to do a full update before adding
    other workers.
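
    To illustrate what adding workers could look like, here is a minimal
    sketch in which a thread pool stands in for separate worker machines.
    Everything in it is an assumption for illustration: analyze() is a
    placeholder that does not run lintian, the tag names are made up, and
    the importer's real coordination protocol is not described in this
    thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def analyze(package: str) -> Tuple[str, List[str]]:
    """Placeholder for one lintian run; returns a made-up tag list."""
    return package, [f"hypothetical-tag-for-{package}"]


def run_batch(
    packages: Iterable[str],
    workers: int = 4,
    analyze_fn: Callable[[str], Tuple[str, List[str]]] = analyze,
) -> Dict[str, List[str]]:
    """Spread the analysis of `packages` over `workers` parallel workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each package to exactly one worker and yields
        # (package, tags) pairs in input order.
        return dict(pool.map(analyze_fn, packages))


batch_results = run_batch(["hello", "dpkg", "lintian"], workers=2)
for name, tags in batch_results.items():
    print(name, tags)
```

    Since each package is analyzed independently, the coordinator only has
    to make sure that no package is handed to two workers at once.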

    Lucas

  • From Lucas Nussbaum@21:1/5 to Lucas Nussbaum on Wed Jul 6 15:00:01 2022
    On 03/07/22 at 15:51 +0200, Lucas Nussbaum wrote:
    > TLDR: I have plans to get fresh archive-wide data about lintian results
    > in UDD (and then to any service that wants to consume it), but it's
    > still WIP

    Update on this:
    - the initial scan is still running, and should finish by Friday morning
    (~12000 source packages remaining)
    - https://udd.debian.org/lintian/ now uses the data from the new importer
    - the UDD lintian_results_agg view is probably what you should look at
    if you want to access the raw data.

    Once the initial scan is finished, I will update the worker node to the
    latest lintian version from backports.

    Lucas

  • From Lucas Nussbaum@21:1/5 to Lucas Nussbaum on Wed Jul 13 21:00:01 2022
    Hi,

    On 06/07/22 at 14:47 +0200, Lucas Nussbaum wrote:
    > On 03/07/22 at 15:51 +0200, Lucas Nussbaum wrote:
    > > TLDR: I have plans to get fresh archive-wide data about lintian results
    > > in UDD (and then to any service that wants to consume it), but it's
    > > still WIP
    >
    > Update on this:
    > - the initial scan is still running, and should finish by Friday morning
    >   (~12000 source packages remaining)
    > - https://udd.debian.org/lintian/ now uses the data from the new importer
    > - the UDD lintian_results_agg view is probably what you should look at
    >   if you want to access the raw data.
    >
    > Once the initial scan is finished, I will update the worker node to the
    > latest lintian version from backports.

    Another update:
    - UDD now has up-to-date data for the whole archive, produced with a
    lintian git snapshot (2.115.2+git20220708220500)
    - the importer runs on a regular basis (every 6 hours)
    - the lintian UDD table is now a materialized view for compatibility

    What remains to be done:
    - DSA topics (see
      https://lists.debian.org/debian-qa/2022/07/msg00006.html):
      + move the lintian worker to Debian infra instead of AWS
      + understand what we want to do with lintian.d.o
    - talk to tracker.d.o and https://qa.debian.org/developer.php
    maintainers about switching to this source of data
    - maybe improve the web interface at https://udd.debian.org/lintian/
    (however if you are doing something complex, SQL is probably a better
    tool)

    Lucas

  • From Raphael Hertzog@21:1/5 to Lucas Nussbaum on Sat Jul 30 14:50:01 2022
    Hello Lucas,

    On Sun, 03 Jul 2022, Lucas Nussbaum wrote:
    > Seeing that lintian got adopted, I got motivated to look into whether I
    > could help on the lintian.d.o side, that is, provide up-to-date
    > archive-wide data to developers.

    Thanks for working on this!

    > Since the architecture of lintian.d.o seemed quite complicated, I
    > instead decided to follow what worked for other UDD-based data importers
    > (such as the one that scans for new upstream versions). So my plan is
    > the following:
    > - use a UDD postgresql table for data storage
    > - use UDD to decide which packages need to be analyzed
    > - coordinate the analysis from UDD, but do the analysis itself on a
    >   third-party 'worker' machine (since the process is quite CPU intensive)
    > - provide visualisation directly on https://udd.debian.org (similar
    >   to https://udd.debian.org/dmd/ or https://udd.debian.org/bugs/)
    > - work with data consumers on how to best export the data from UDD to
    >   them
    >
    > I know it feels a bit like NIH, but I believe the simpler design will
    > help in the long term...

    At least it helps to have something running in the short term. But there
    are really parts that I'd like to move to something more standardized
    that we can use for multiple tasks in the context of Debian.

    Paul already hinted at it but debusine is clearly meant to help with:
    - scheduling tasks
    - running tasks on multiple workers

    https://salsa.debian.org/freexian-team/debusine/

    At this point, debusine is not doing much yet but I hope that we have laid
    out some good initial architecture to be able to share the workload on
    multiple workers. Currently the only "Task" that it knows how to run
    is "sbuild" and we don't have any data storage yet (i.e. the generated
    artifacts are not stored anywhere, it's up to the caller to pass some
    --post-build-commands to do something with the result).

    Storing data is the next milestone that we will work on.

    Maybe it's a bit early to try to use it for your use case, but if you want
    to give it a try, you are more than welcome to. Feel free to open tickets
    and ask questions too.

    Cheers,
    --
    ⢀⣴⠾⠻⢶⣦⠀ Raphaël Hertzog <hertzog@debian.org>
    ⣾⠁⢠⠒⠀⣿⡁
    ⢿⡄⠘⠷⠚⠋ The Debian Handbook: https://debian-handbook.info/get/
    ⠈⠳⣄⠀⠀⠀⠀ Debian Long Term Support: https://deb.li/LTS
