On Sun, 2022-07-03 at 15:51 +0200, Lucas Nussbaum wrote:
> TLDR: I have plans to get fresh archive-wide data about lintian results
> in UDD (and then to any service that wants to consume it), but it's
> still WIP
This mail reminds me somewhat of buxy's debusine proposal/work.
> - coordinate the analysis from UDD, but do the analysis itself on a
>   third-party 'worker' machine (since the process is quite CPU intensive)
Is the analysis something that could be farmed out to multiple machines
for parallel processing using Debian AWS credits or Grid5000? I expect
that would be useful for updating tags when lintian gets updated, or
for the lintian maintainers or other folks to do experimental checks.
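If the scan were farmed out to multiple machines, the simplest scheme would be to split the package list into shares, one per worker. The sketch below is purely illustrative (the function name and round-robin scheme are assumptions, not part of the actual setup):

```python
# Hypothetical sketch: split the list of source packages into roughly
# equal shares so each worker machine (an AWS instance, a Grid5000
# node, ...) can run lintian on its own subset independently.

def chunk_packages(packages, n_workers):
    """Distribute packages round-robin across n_workers buckets."""
    buckets = [[] for _ in range(n_workers)]
    for i, pkg in enumerate(packages):
        buckets[i % n_workers].append(pkg)
    return buckets

if __name__ == "__main__":
    pkgs = ["hello", "bash", "coreutils", "lintian", "dpkg"]
    for worker, share in enumerate(chunk_packages(pkgs, 2)):
        print(f"worker {worker}: {share}")
```

Since lintian runs are independent per source package, this kind of static partitioning would be enough; no coordination between workers is needed during a run.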
On 03/07/22 at 15:51 +0200, Lucas Nussbaum wrote:
> TLDR: I have plans to get fresh archive-wide data about lintian results
> in UDD (and then to any service that wants to consume it), but it's
> still WIP
Update on this:
- the initial scan is still running, and should finish by Friday morning
(~ 12000 source packages remaining)
- https://udd.debian.org/lintian/ now uses the data from the new importer
- the UDD lintian_results_agg view is probably what you should look at
if you want to access the raw data.
Once the initial scan is finished, I will update the worker node to the latest lintian version from backports.
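For reference, a query against the view could look like the sketch below. It assumes the publicly documented read-only UDD mirror (udd-mirror.debian.net with the well-known udd-mirror credentials); the view's columns aren't shown in this thread, so inspect it (e.g. `\d lintian_results_agg` in psql) before relying on any column name:

```python
# Sketch of querying the lintian_results_agg view on the public UDD
# mirror. The DSN uses the publicly documented read-only credentials;
# the column layout of the view is NOT specified here, hence SELECT *.

UDD_DSN = "postgresql://udd-mirror:udd-mirror@udd-mirror.debian.net/udd"
QUERY = "SELECT * FROM lintian_results_agg WHERE source = %s LIMIT 10"

def fetch_results(source):
    """Fetch aggregated lintian results for one source package."""
    import psycopg2  # third-party driver; needed only to actually run this
    with psycopg2.connect(UDD_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(QUERY, (source,))
            return cur.fetchall()
```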
Seeing that lintian got adopted, I got motivated into looking at whether
I could help on the lintian.d.o side, that is, provide up-to-date,
archive-wide lintian results to developers.
Since the architecture of lintian.d.o seemed quite complicated, I
decided instead to follow what worked for other UDD-based data
importers (such as the one that scans for new upstream versions). So my
plan is the following:
- use a UDD postgresql table for data storage
- use UDD to decide which packages need to be analyzed
- coordinate the analysis from UDD, but do the analysis itself on a
third-party 'worker' machine (since the process is quite CPU intensive)
- provide visualisation directly on https://udd.debian.org (similar
to https://udd.debian.org/dmd/ or https://udd.debian.org/bugs/)
- work with data consumers on how to best export the data from UDD to
them
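The "decide which packages need to be analyzed" step above can be sketched as a plain set difference: anything whose current archive version (for the current lintian version) has no row in the results table goes back into the queue. All names below are hypothetical; the real importer works against the UDD tables rather than in-memory dicts:

```python
# Minimal sketch of the coordination step: UDD knows the current
# version of each source package, and which (source, version,
# lintian_version) triples have already been analyzed; everything
# missing or stale is scheduled for the worker machine.

def packages_to_analyze(archive, analyzed, lintian_version):
    """archive: {source: current_version};
    analyzed: set of (source, version, lintian_version) triples
    already present in the results table."""
    return sorted(
        src for src, ver in archive.items()
        if (src, ver, lintian_version) not in analyzed
    )
```

A nice property of keying results on the lintian version as well is that upgrading lintian on the worker (as mentioned above for backports) automatically invalidates the whole archive and triggers a rescan.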
I know it feels a bit like NIH, but I believe the simpler design will
help in the long term...