Forum: >>> Magnum BBS <<<

Revival of lintian.d.o (sort of)

From Lucas Nussbaum@21:1/5 to All on Sun Jul 3 16:00:01 2022

TLDR: I have plans to get fresh archive-wide data about lintian results
in UDD (and then to any service that wants to consume it), but it's
still WIP

Hi,

Seeing that lintian got adopted, I got motivated to look into whether I
could help on the lintian.d.o side, that is, provide up-to-date
archive-wide data to developers.

Since the architecture of lintian.d.o seemed quite complicated, I
instead decided to follow what worked for other UDD-based data importers
(such as the one that scans for new upstream versions). So my plan is
the following:
- use a UDD postgresql table for data storage
- use UDD to decide which packages need to be analyzed
- coordinate the analysis from UDD, but do the analysis itself on a
third-party 'worker' machine (since the process is quite CPU intensive)
- provide visualisation directly on https://udd.debian.org (similar
to https://udd.debian.org/dmd/ or https://udd.debian.org/bugs/)
- work with data consumers on how to best export the data from UDD to
them
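
As an illustration of the "use UDD to decide which packages need to be
analyzed" step, here is a minimal sketch. It is not the actual importer
code: the function name, the dictionaries standing in for UDD tables, and
the rule of re-checking a package when the lintian version changes are all
assumptions made for this example.

```python
from typing import Dict, List, Tuple


def packages_to_analyze(
    archive: Dict[str, str],
    analyzed: Dict[str, Tuple[str, str]],
    lintian_version: str,
) -> List[Tuple[str, str]]:
    """Return (source, version) pairs that have no up-to-date result.

    archive:  source package -> version currently in the archive
    analyzed: source package -> (version analyzed, lintian version used)
    Both mappings are hypothetical stand-ins for UDD tables.
    """
    todo = []
    for source, version in sorted(archive.items()):
        previous = analyzed.get(source)
        # Re-analyze when the package is new, was uploaded again, or was
        # last checked with a different lintian version (assumed policy).
        if previous != (version, lintian_version):
            todo.append((source, version))
    return todo


# Made-up sample data, for illustration only.
archive = {"hello": "2.10-2", "zlib": "1:1.2.11.dfsg-4", "bash": "5.1-6"}
analyzed = {
    "hello": ("2.10-2", "2.115.1"),
    "zlib": ("1:1.2.11.dfsg-2", "2.115.1"),
}
print(packages_to_analyze(archive, analyzed, "2.115.1"))
```

In a real setup the two mappings would presumably come from SQL queries
against UDD rather than from in-memory dictionaries.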

I know it feels a bit like NIH, but I believe the simpler design will
help in the long term...

The current status is:
- there's now a lintian_results table in UDD
- there's a new lintian importer that coordinates the analysis
- the lintian_results table is currently being populated (~6400 source packages
processed at the moment, ~29000 remaining -- I expect the initial
analysis to be over in about 3 days)
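
For a rough feel of what that estimate implies, the figures above can be
turned into a required throughput. This is only back-of-the-envelope
arithmetic on the approximate numbers quoted in this mail; the variable
names are made up for the example.

```python
# Approximate figures quoted above.
processed = 6400      # source packages already analyzed
remaining = 29000     # source packages still to analyze
days_left = 3         # expected time to finish the initial run

total = processed + remaining
fraction_done = processed / total

# Throughput needed to finish the remaining packages in time.
per_hour = remaining / (days_left * 24)

# Average wall-clock time available per package, regardless of how many
# packages the worker analyzes in parallel.
seconds_each = days_left * 86400 / remaining

print(f"total packages: {total}")
print(f"done so far: {fraction_done:.0%}")
print(f"needed rate: {per_hour:.0f} packages/hour")
print(f"time budget: {seconds_each:.1f} s per package")
```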

What remains to be done:
- bugfixing?
- work on the visualisation part. There was a UDD CGI that did this
using the old data (from lintian.debian.org) at
https://udd.debian.org/lintian/ that could serve as a basis
- talk to DSA about migrating the "worker" VM to Debian infra (it's just
a dumb VM, so it should not be an issue)
- talk to lintian data consumers
- see what we want to do about lintian.d.o

Lucas

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Wise@21:1/5 to Lucas Nussbaum on Mon Jul 4 04:30:01 2022

On Sun, 2022-07-03 at 15:51 +0200, Lucas Nussbaum wrote:

TLDR: I have plans to get fresh archive-wide data about lintian results
in UDD (and then to any service that wants to consume it), but it's
still WIP

This mail reminds me somewhat of buxy's debusine proposal/work.

- coordinate the analysis from UDD, but do the analysis itself on a
third-party 'worker' machine (since the process is quite CPU intensive)

Is the analysis something that could be farmed out to multiple machines
for parallel processing using Debian AWS credits or Grid5000? I expect
that would be useful for updating tags when lintian gets updated, or
for the lintian maintainers or other folks to do experimental checks.

--
bye,
pabs

https://wiki.debian.org/PaulWise


From Lucas Nussbaum@21:1/5 to Paul Wise on Mon Jul 4 12:00:04 2022

Hi Paul,

On 04/07/22 at 10:23 +0800, Paul Wise wrote:

On Sun, 2022-07-03 at 15:51 +0200, Lucas Nussbaum wrote:

TLDR: I have plans to get fresh archive-wide data about lintian results
in UDD (and then to any service that wants to consume it), but it's
still WIP

This mail reminds me somewhat of buxy's debusine proposal/work.

- coordinate the analysis from UDD, but do the analysis itself on a
third-party 'worker' machine (since the process is quite CPU intensive)

Is the analysis something that could be farmed out to multiple machines
for parallel processing using Debian AWS credits or Grid5000? I expect
that would be useful for updating tags when lintian gets updated, or
for the lintian maintainers or other folks to do experimental checks.

Yes, and the design for the importer already allows that, but I'd like
to figure out how much time it takes to do a full update before adding
other workers.
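
To make the "multiple workers" idea concrete, here is a minimal sketch of
one batch of packages being spread over several workers. It is not the
importer's actual design: threads on a single machine stand in for separate
worker machines, and analyze() and run_batch() are hypothetical names that
do not run lintian at all.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def analyze(source: str) -> Dict[str, str]:
    """Hypothetical stand-in for running lintian on one source package."""
    # A real worker would fetch the package, run lintian on it and
    # collect the emitted tags; here we only report that the job ran.
    return {"source": source, "status": "done"}


def run_batch(sources: List[str], workers: int) -> List[Dict[str, str]]:
    """Spread one batch of packages over `workers` parallel workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker ran them.
        return list(pool.map(analyze, sources))


results = run_batch(["bash", "hello", "zlib"], workers=2)
for result in results:
    print(result["source"], result["status"])
```

Timing run_batch() with different worker counts would be one way to
measure how long a full update takes before deciding to add more workers.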

Lucas


From Lucas Nussbaum@21:1/5 to Lucas Nussbaum on Wed Jul 6 15:00:01 2022

On 03/07/22 at 15:51 +0200, Lucas Nussbaum wrote:

TLDR: I have plans to get fresh archive-wide data about lintian results
in UDD (and then to any service that wants to consume it), but it's
still WIP

Update on this:
- the initial scan is still running, and should finish by Friday morning
(~12000 source packages remaining)
- https://udd.debian.org/lintian/ now uses the data from the new importer
- the UDD lintian_results_agg view is probably what you should look at
if you want to access the raw data.

Once the initial scan is finished, I will update the worker node to the
latest lintian version from backports.

Lucas


From Lucas Nussbaum@21:1/5 to Lucas Nussbaum on Wed Jul 13 21:00:01 2022

Hi,

On 06/07/22 at 14:47 +0200, Lucas Nussbaum wrote:

On 03/07/22 at 15:51 +0200, Lucas Nussbaum wrote:

TLDR: I have plans to get fresh archive-wide data about lintian results
in UDD (and then to any service that wants to consume it), but it's
still WIP

Update on this:
- the initial scan is still running, and should finish by Friday morning
(~12000 source packages remaining)
- https://udd.debian.org/lintian/ now uses the data from the new importer
- the UDD lintian_results_agg view is probably what you should look at
if you want to access the raw data.

Once the initial scan is finished, I will update the worker node to the
latest lintian version from backports.

Another update:
- UDD now has up-to-date data for the whole archive, generated with a
lintian git snapshot (2.115.2+git20220708220500)
- the importer runs on a regular basis (every 6 hours)
- the lintian UDD table is now a materialized view for compatibility

What remains to be done:
- DSA topics (see
https://lists.debian.org/debian-qa/2022/07/msg00006.html):
+ move the lintian worker to Debian infra instead of AWS
+ understand what we want to do with lintian.d.o
- talk to tracker.d.o and https://qa.debian.org/developer.php
maintainers about switching to this source of data
- maybe improve the web interface at https://udd.debian.org/lintian/
(however if you are doing something complex, SQL is probably a better
tool)

Lucas


From Raphael Hertzog@21:1/5 to Lucas Nussbaum on Sat Jul 30 14:50:01 2022

Hello Lucas,

On Sun, 03 Jul 2022, Lucas Nussbaum wrote:

Seeing that lintian got adopted, I got motivated to look into whether I
could help on the lintian.d.o side, that is, provide up-to-date
archive-wide data to developers.

Thanks for working on this!

Since the architecture of lintian.d.o seemed quite complicated, I
instead decided to follow what worked for other UDD-based data importers
(such as the one that scans for new upstream versions). So my plan is
the following:
- use a UDD postgresql table for data storage
- use UDD to decide which packages need to be analyzed
- coordinate the analysis from UDD, but do the analysis itself on a
third-party 'worker' machine (since the process is quite CPU intensive)
- provide visualisation directly on https://udd.debian.org (similar
to https://udd.debian.org/dmd/ or https://udd.debian.org/bugs/)
- work with data consumers on how to best export the data from UDD to
them

I know it feels a bit like NIH, but I believe the simpler design will
help in the long term...

At least it helps to have something running in the short term. But there
are really parts that I'd like to move to something more standardized
that we can use for multiple tasks in the context of Debian.

Paul already hinted at it but debusine is clearly meant to help with:
- scheduling tasks
- running tasks on multiple workers

https://salsa.debian.org/freexian-team/debusine/

At this point, debusine is not doing much yet, but I hope that we have laid
out a good initial architecture for sharing the workload across multiple
workers. Currently the only "Task" that it knows how to run is "sbuild",
and we don't have any data storage yet (i.e. the generated artifacts are
not stored anywhere; it's up to the caller to pass some
--post-build-commands to do something with the result).

Storing data is the next milestone that we will work on.

Maybe it's a bit early to try to use it for your use case, but if you want
to give it a try, you are more than welcome to. Feel free to open tickets
and ask questions too.

Cheers,
--
⢀⣴⠾⠻⢶⣦⠀ Raphaël Hertzog <hertzog@debian.org>
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋ The Debian Handbook: https://debian-handbook.info/get/
⠈⠳⣄⠀⠀⠀⠀ Debian Long Term Support: https://deb.li/LTS

