I'm interested in package popularity. I'm aware of popcon (https://popcon.debian.org/), but I'm more interested in actual
downloads.
I am also interested in usage statistics. I feel it is much more
meaningful to work on packages that I know have a lot of users.
While neither popcon nor download stats are accurate, they still show
trends and relative numbers from which useful conclusions can be
drawn. I would be glad if people could share ideas on what stats we
could collect and publish instead of just pointing out flaws in
various stats.
+1
I suspect that compliance with GDPR would require the data to be
stored minimally.
It seems reasonable to me that a 24-hour window would filter out most
repeat downloads.
If you stream the request log and reduce each entry to
(ip, package, version), the retained data will be minimal.
I think it would fit into memory, e.g. 10 million unique IP addresses
x 100 packages x 40 bytes = 40 GB.
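
To make the idea concrete, here is a rough sketch of the reduction described above. It is only an illustration: the record layout (day, ip, package, version) and both function names are made up for this example and do not describe any existing mirror tooling. It also buckets by calendar day, which only approximates a true 24-hour sliding window.

```python
from collections import Counter


def count_unique_downloads(records):
    """Count downloads per (package, version), ignoring repeats from the
    same IP within the same day.

    `records` is an iterable of (day, ip, package, version) tuples,
    assumed to arrive ordered by day (hypothetical layout).
    """
    counts = Counter()
    seen = set()          # (ip, package, version) keys seen in the current day
    current_day = None
    for day, ip, package, version in records:
        if day != current_day:
            # New day: forget the previous keys so no IP data is retained.
            seen.clear()
            current_day = day
        key = (ip, package, version)
        if key not in seen:
            seen.add(key)
            counts[(package, version)] += 1
    return counts


def estimated_memory_bytes(unique_ips, packages_per_ip, bytes_per_entry):
    """Back-of-the-envelope size of one day's dedup set."""
    return unique_ips * packages_per_ip * bytes_per_entry


if __name__ == "__main__":
    sample = [
        ("2024-01-01", "198.51.100.7", "bash", "5.2"),
        ("2024-01-01", "198.51.100.7", "bash", "5.2"),  # repeat, ignored
        ("2024-01-01", "198.51.100.8", "bash", "5.2"),
        ("2024-01-02", "198.51.100.7", "bash", "5.2"),  # new day, counted
    ]
    print(count_unique_downloads(sample))
    print(estimated_memory_bytes(10_000_000, 100, 40) / 1e9, "GB")
```

Only the per-package counts outlive a day; the (ip, package, version) keys are dropped when the day rolls over. Note that the 40 bytes per entry is the figure from the estimate above; a plain Python set of tuples would need considerably more per entry, so a real implementation would want a more compact structure.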
The problem is that we currently do not want to retain this data.
It'd require a clear measure of usefulness, not just an "it would be
nice if we had it". And there would need to be actual criteria for
what we would be interested in. Raw download count? Some measure of
bucketing by source IP or not? What about container/hermetic builders
fetching the same ancient package over and over again from snapshot?
Does the version matter?
There will be lots of packages that are rarely downloaded and still important.
Back-of-the-envelope math says that'd be 600 GB/d of raw syslog
traffic.