• Community renewal and project obsolescence

    From Rafael Laboissière@21:1/5 to All on Wed Dec 27 21:50:01 2023
    Dear Debian fellows,

    This is a very simple-minded analysis about the Debian community (lack
    of) renewal and project obsolescence:

    https://salsa.debian.org/rafael/debian-contrib-years

    containing interesting comments made by Sébastien Villemot.

    Best,

    Rafael Laboissière, DD

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rafael Laboissière@21:1/5 to All on Thu Dec 28 17:00:02 2023
    * M. Zhou <lumin@debian.org> [2023-12-27 19:00]:

    Thanks for sharing the figure. The data seems correlated with the
    number of new Debian accounts. See the figure below:
    Python Code for this figure:

    ```
    # modified from ChatGPT.
    # XXX: members.csv is copy-pasted from https://nm.debian.org/members/
    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the tab-separated member list and keep rows with a known start date.
    df = pd.read_csv('members.csv', sep='\t')
    df = df[df['Since'] != '(unknown)'] # filter out invalid data
    df['Since'] = pd.to_datetime(df['Since'])
    df['Year'] = df['Since'].dt.year

    # Count new accounts per year and smooth with a 3-year trailing mean.
    account_counts = df['Year'].value_counts().sort_index()
    smoothed_counts = account_counts.rolling(window=3).mean()

    # Bars show the raw yearly counts; the line shows the smoothed trend.
    plt.figure(figsize=(10, 6))
    plt.bar(account_counts.index, account_counts.values, color='skyblue')
    plt.plot(smoothed_counts.index, smoothed_counts.values, color='orange',
             label='Smoothed (Window=3)')
    plt.xlabel('Year')
    plt.ylabel('Number of Accounts Created')
    plt.title('Number of Accounts Created Each Year')
    plt.legend()
    plt.savefig('nm-year.png')
    ```

    Thanks for the code and the figure. Indeed, the trend is confirmed by
    fitting a linear model count ~ year to the new members list. The
    coefficient is -1.39 member/year, which is significantly different from
    zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from year
    2001, which could be interpreted as an outlier, the trend is still
    significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p < 0.01).
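
    For readers who want to reproduce this kind of test, here is a minimal
    sketch of an ordinary least squares fit of count ~ year with the F
    statistic for the slope. It uses only the standard library. The function
    name fit_linear_trend and the yearly counts are placeholders invented for
    illustration; they are not the nm.debian.org data and will not reproduce
    the figures quoted above.

```python
# Least-squares fit of count ~ year, plus the F statistic for the slope.
# NOTE: the counts below are hypothetical placeholders, not real Debian data.

def fit_linear_trend(years, counts):
    """Return (slope, intercept, f_stat, df_resid) for the model count ~ year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation explained by the line versus variation left in the residuals.
    ss_model = slope * sxy
    ss_resid = sum((y - (intercept + slope * x)) ** 2
                   for x, y in zip(years, counts))
    df_resid = n - 2
    f_stat = ss_model / (ss_resid / df_resid)
    return slope, intercept, f_stat, df_resid


if __name__ == "__main__":
    years = list(range(2000, 2024))  # 24 years, hence F[1,22]
    counts = [60 - 1.4 * i + (3 if i % 2 else -3) for i in range(24)]  # made up
    slope, _, f_stat, df = fit_linear_trend(years, counts)
    print(f"slope = {slope:.2f} member/year, F[1,{df}] = {f_stat:.1f}")

    # Same fit with one year removed (here 2001), hence F[1,21].
    kept = [(x, y) for x, y in zip(years, counts) if x != 2001]
    slope, _, f_stat, df = fit_linear_trend([x for x, _ in kept],
                                            [y for _, y in kept])
    print(f"without 2001: slope = {slope:.2f}, F[1,{df}] = {f_stat:.1f}")
    # If SciPy is installed, scipy.stats.f.sf(f_stat, 1, df) gives the p-value.
```

    With the real per-year counts substituted for the placeholder list, the
    slope and F values can be compared directly with the ones reported here.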

    Best,

    Rafael Laboissière

    P.S.1: The correct way to do the analysis above is by using a
    generalized linear model, with the count data from a Poisson distribution
    (or, perhaps, by considering overdispersed data). I will eventually add
    this to my code in Git.
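
    A minimal sketch of what such a Poisson fit could look like, assuming a
    log link and a single year covariate, solved by Newton-Raphson with the
    standard library only. The name fit_poisson_trend and the counts are
    hypothetical, and this is not the code from the Git repository. The
    returned dispersion (Pearson chi-square over residual degrees of
    freedom) is one rough indicator of overdispersion.

```python
import math

def fit_poisson_trend(years, counts, iterations=50, tol=1e-10):
    """Fit log(E[count]) = a + b * (year - mean_year) by Newton-Raphson.

    Returns (a, b, dispersion). A dispersion well above 1 suggests the
    counts are overdispersed relative to a Poisson distribution.
    """
    n = len(years)
    mean_x = sum(years) / n
    xs = [x - mean_x for x in years]  # centring keeps the iteration stable
    a, b = math.log(sum(counts) / n), 0.0
    for _ in range(iterations):
        mu = [math.exp(a + b * x) for x in xs]
        # Gradient of the Poisson log-likelihood.
        g_a = sum(y - m for y, m in zip(counts, mu))
        g_b = sum((y - m) * x for y, m, x in zip(counts, mu, xs))
        # Fisher information (negative Hessian), a 2x2 symmetric matrix.
        h_aa = sum(mu)
        h_ab = sum(m * x for m, x in zip(mu, xs))
        h_bb = sum(m * x * x for m, x in zip(mu, xs))
        det = h_aa * h_bb - h_ab ** 2
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    mu = [math.exp(a + b * x) for x in xs]
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    return a, b, pearson / (n - 2)


if __name__ == "__main__":
    years = list(range(2014, 2024))
    counts = [62, 58, 55, 57, 49, 51, 44, 46, 40, 41]  # hypothetical counts
    a, b, dispersion = fit_poisson_trend(years, counts)
    # exp(b) - 1 is the estimated relative change in new members per year.
    print(f"relative change per year: {math.exp(b) - 1:+.1%}")
    print(f"dispersion: {dispersion:.2f}")
```

    Unlike the linear model, the slope here is multiplicative: it describes a
    percentage change per year rather than a fixed number of members.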

    P.S.2: In your Python code, it is possible to get the data frame directly
    from the web page, without copying&pasting. Just replace the line:

    df = pd.read_csv('members.csv', sep='\t')

    by:

    df = pd.read_html("https://nm.debian.org/members/")[0]

    I am wondering whether ChatGPT could have figured this out…

  • From Mo Zhou@21:1/5 to All on Thu Dec 28 20:10:02 2023
    On 12/28/23 10:34, Rafael Laboissière wrote:

    * M. Zhou <lumin@debian.org> [2023-12-27 19:00]:

    Thanks for the code and the figure. Indeed, the trend is confirmed by
    fitting a linear model count ~ year to the new members list. The
    coefficient is -1.39 member/year, which is significantly different
    from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data
    from year 2001, which could be interpreted as an outlier, the trend is
    still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p
    < 0.01).

    I thought about using some population-statistics models, so that we can
    get data on the DD birth rate and the DD retirement/leave rate, as well
    as a prediction. But since the descendants of DDs are not naturally new
    DDs, typical population models are unlikely to work well. The birth of a
    DD is more like a mutation, sort of.
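
    One way to read this point: in classical growth models, births are
    proportional to the current population, whereas new DDs arrive from
    outside it. A minimal sketch of such a recruitment/attrition model
    follows. The function name project_members, the starting headcount, the
    leave rate and the yearly intake are all hypothetical values chosen for
    illustration, not measured Debian figures.

```python
def project_members(initial, joins_per_year, leave_rate, years):
    """Project headcount when newcomers arrive from outside the population.

    Each year a fraction `leave_rate` of the current members leaves, and
    `joins_per_year[t]` new members join regardless of the current headcount.
    """
    sizes = [initial]
    for t in range(years):
        current = sizes[-1]
        sizes.append(current - leave_rate * current + joins_per_year[t])
    return sizes


if __name__ == "__main__":
    horizon = 20
    # Hypothetical intake that shrinks by a fixed amount each year.
    joins = [max(0.0, 50 - 1.4 * t) for t in range(horizon)]
    sizes = project_members(initial=1000, joins_per_year=joins,
                            leave_rate=0.05, years=horizon)
    for year, size in enumerate(sizes):
        print(year, round(size))
```

    With a constant intake the headcount settles at intake / leave_rate, so
    in this toy model a shrinking intake translates directly into a shrinking
    long-run community size.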

    Anyway, we do not need sophisticated math models to draw the conclusion
    that Debian is an aging community. And yet, we don't seem to have a good
    way to reshape the curve using Debian's funds. This is one of the key
    problems behind the data.

    P.S.1: The correct way to do the analysis above is by using a
    generalized linear model, with the count data from a Poisson
    distribution (or, perhaps, by considering overdispersed data). I will
    eventually add this to my code in Git.

    Why not integrate them into nm.debian.org when they are ready?

    P.S.2: In your Python code, it is possible to get the data frame
    directly from the web page, without copying&pasting. Just replace the
    line:

        df = pd.read_csv('members.csv', sep='\t')

    by:

        df = pd.read_html("https://nm.debian.org/members/")[0]

    I am wondering whether ChatGPT could have figured this out…

    I just specified the CSV input format based on what I had copied. It
    produces well-formatted code with detailed documentation most of the
    time. I deleted a lot of its output to keep the snippet short.

    I have to clarify one thing to avoid giving you a wrong impression about
    large language models. In fact, the performance of an LLM (such as
    ChatGPT) varies greatly depending on the prompt and the context people
    provide to it. Exploring this in-context learning capability is still
    one of the cutting-edge research topics. For current LLMs, the
    answers on boilerplate code like plotting (matplotlib) and simple
    statistics (pandas) are remarkably good.

  • From Daniel Gröber@21:1/5 to Antonio Russo on Fri Dec 29 19:10:01 2023
    Hi,

    On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:
    [...] my personal experience is that making contributions is like
    dropping a message in a bottle into the sea. It feels like a complete
    crap-shot whether I'll even receive a comment on any code contribution
    (including debian-devel RFS, salsa MR, or BTS patch).

    This is also my experience.

    A related question I've been pondering: did salsa make this worse for new
    contributors because some maintainers (seem to) ignore issues/MRs there?

    I figure for the many people coming from GH-style platforms nowadays,
    being ignored on salsa would be a major discouragement to contributing.

    If there were a single thing that could be done, in my mind it would be
    to have someone make sure that contributions do not go entirely ignored.

    I've been thinking along those lines too. Perhaps we just need an
    aggregator that flags mails/comments/other contributions by new people that
    are being ignored.

    I've been meaning to do something like that for the d-mentors list but
    perhaps we need to think bigger.
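
    A rough sketch of the flagging step such an aggregator might perform,
    assuming the contributions have already been collected from wherever
    they live. The Contribution record, the flag_ignored function, the
    14-day threshold and the sample entries are all hypothetical; nothing
    here talks to the BTS, salsa or the mailing lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contribution:
    author: str
    channel: str      # free-form label, e.g. "BTS", "salsa MR", "RFS"
    submitted: date
    replies: int      # responses from anyone other than the author


def flag_ignored(contributions, known_contributors, today,
                 max_wait=timedelta(days=14)):
    """Return contributions from newcomers with no reply after max_wait."""
    return [
        c for c in contributions
        if c.author not in known_contributors
        and c.replies == 0
        and today - c.submitted > max_wait
    ]


if __name__ == "__main__":
    today = date(2024, 1, 15)
    items = [  # invented sample entries
        Contribution("newcomer@example.org", "salsa MR", date(2023, 12, 1), 0),
        Contribution("newcomer@example.org", "BTS", date(2024, 1, 10), 0),
        Contribution("regular@example.org", "RFS", date(2023, 11, 1), 0),
    ]
    for c in flag_ignored(items, {"regular@example.org"}, today):
        waited = (today - c.submitted).days
        print(f"{c.author}: {c.channel} unanswered for {waited} days")
```

    The hard part is not the filter but the collection step and deciding who
    counts as "new"; both are left as inputs here on purpose.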

    --Daniel


  • From Andrey Rakhmatullin@21:1/5 to Antonio Russo on Fri Dec 29 20:30:01 2023
    On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:
    As someone who would like to participate more in the development of Debian, my personal
    experience is that making contributions is like dropping a message in a bottle into
    the sea. It feels like a complete crap-shot whether I'll even receive a comment on
    any code contribution (including debian-devel RFS, salsa MR, or BTS patch).
    There are multiple reasons for that, some common to all of these, some
    specific to some contribution types, but all ultimately boil down to
    other people being volunteers. There is no direct way to improve this
    beyond magically increasing the total amount of time spent by maintainers
    on Debian work. Some processes or tools could be improved but I'm not
    sure how much that would help.

    If there were a single thing that could be done, in my mind it would be to have someone
    make sure that contributions do not go entirely ignored. Even just telling someone "hey,
    none of the stuff you're submitting is really good enough for Debian" would be helpful
    because they could either work on improving, or stop trying to contribute.
    There is no polite way to say that, but also it's not a big problem for
    the project if somebody who submits very bad RFSes gets those RFSes
    ignored instead of being told to stop waiting for feedback on them.
    Giving constructive feedback, on the other hand, can be very
    time-draining, especially to first-time contributors submitting poor
    quality things. This is not even specific to Debian but applies to any
    open source maintainer work.

  • From Andrey Rakhmatullin@21:1/5 to All on Fri Dec 29 20:30:01 2023
    On Fri, Dec 29, 2023 at 06:49:47PM +0100, Daniel Gröber wrote:
    [...] my personal experience is that making contributions is like
    dropping a message in a bottle into the sea. It feels like a complete
    crap-shot whether I'll even receive a comment on any code contribution
    (including debian-devel RFS, salsa MR, or BTS patch).

    This is also my experience.

    A related question I've been pondering: did salsa make this worse for new
    contributors because some maintainers (seem to) ignore issues/MRs there?
    Maybe, but also salsa MRs being ignored by default was an intentional
    decision AFAIK, both a technical decision of not notifying maintainers
    about created MRs and a policy decision of the BTS being the only
    officially promoted way to contact maintainers and submit patches.
    I have no idea if people are actually told that before they submit MRs.

    I figure for the many people coming from GH-style platforms nowadays,
    being ignored on salsa would be a major discouragement to contributing.
    Well, salsa didn't make this worse, it just added something that can be ignored.

    If there were a single thing that could be done, in my mind it would be
    to have someone make sure that contributions do not go entirely ignored.

    I've been thinking along those lines too. Perhaps we just need an
    aggregator that flags mails/comments/other contributions by new people
    that are being ignored.
    You'll still need people to provide feedback.

  • From Sam Hartman@21:1/5 to All on Fri Dec 29 21:00:02 2023
    "Daniel" == Daniel Gröber <dxld@darkboxed.org> writes:

    Daniel> Hi,
    Daniel> On Fri, Dec 29, 2023 at 08:48:28AM -0800, Antonio Russo wrote:
    >> [...] my personal experience is that making contributions is like
    >> dropping a message in a bottle into the sea. It feels like a
    >> complete crap-shot whether I'll even receive a comment on any
    >> code contribution (including debian-devel RFS, salsa MR, or BTS
    >> patch).

    Daniel> A related question I've been pondering: did salsa make this
    Daniel> worse for new contributors because some maintainers (seem
    Daniel> to) ignore issues/MRs there?

    I think so.

    Especially for group-maintained packages, it is very easy to get into a
    situation where no one is actually notified of an MR on a given
    repository.

    More generally, as a maintainer, when I find I'm ignoring someone it's
    typically because:

    * The idea has some merit; if it was complete junk I could close it as
    wontfix or invalid or whatever.

    * But it requires significant effort from me to get to a place where it
    lands.

    * And I don't care that much.

    Examples include ideas where there's significant review that would be
    needed; ideas where there's some rework needed; or especially ideas
    where it's important to consider the implications between the new idea
    and some part of the system that neither I nor the submitter understands
    well.

    Another common challenge is an idea that disturbs some part of something
    that's been mostly chugging along fine for years, but that has entirely
    inadequate test coverage to know whether this new code will break
    things.
    I feel bad saying "that's great, but please write a test suite to cover
    your contribution as well as a significant chunk of the package you are
    touching," but can rarely work up the interest in doing that test suite
    myself if I don't care much about the enhancement/fix.

    Another challenge is when some idea involves significant coordination
    work. For example, there are a few pam bugs that boil down to
    pam-auth-update not being quite fine-grained enough to capture some
    distinctions that matter.
    Proposing a new design, and moving that across the archive would be a
    lot of work.

    Or for example there's a merge request/bug on pam to enable group write
    umask by default with usergroups. Apparently there was a
    consensus to do that way back in the day. I'm concerned that consensus
    predates modern thinking about being restrictive in write permissions,
    and something's probably going to break, but on the other hand Ubuntu
    does it, and it's probably going to enable some valuable use cases.
    Deciding how to act on something like that is hard.

    And yet I completely get your side of things.
    If you try to contribute and aren't welcomed, it totally destroys
    motivation.

  • From Rafael Laboissière@21:1/5 to All on Wed Jan 3 20:20:02 2024
    * Rafael Laboissière <rafael@debian.org> [2023-12-27 21:12]:

    This is a very simple-minded analysis about the Debian community (lack
    of) renewal and project obsolescence:

    https://salsa.debian.org/rafael/debian-contrib-years

    containing interesting comments made by Sébastien Villemot.

    First of all, I wish you all a happy 2024.

    I have updated my repository at salsa.d.o (URL above), integrating some
    elements discussed in the present thread, in particular the analysis
    proposed by Mo Zhou and the comments made by Steffen Möller.

    Best,

    Rafael Laboissière

  • From Gunnar Wolf@21:1/5 to All on Thu Jan 4 19:00:02 2024
    Mo Zhou said [Thu, Dec 28, 2023 at 02:02:18PM -0500]:
    Thanks for the code and the figure. Indeed, the trend is confirmed by
    fitting a linear model count ~ year to the new members list. The
    coefficient is -1.39 member/year, which is significantly different from
    zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data from
    year 2001, which could be interpreted as an outlier, the trend is still
    significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p <
    0.01).

    I thought about using some population-statistics models, so that we can
    get data on the DD birth rate and the DD retirement/leave rate, as well
    as a prediction. But since the descendants of DDs are not naturally new
    DDs, typical population models are unlikely to work well. The birth of a
    DD is more like a mutation, sort of.

    Five years ago, I got a paper published where we analyzed and made
    some forecasts on the curated Web-of-Trust keyrings in Debian:

    https://jisajournal.springeropen.com/articles/10.1186/s13174-018-0082-7

    I did the first part of the article, but the part that better fits
    what you are describing was done by my coauthor, Víctor González (who
    understands statistics far better than I do).

    Anyway, it does not answer the exact question you are raising
    either: there we studied the lifetime of keys, and left for
    later analysis a way to link said keys to people, in order to map
    the life trajectory of an individual in the project. But it might
    still be interesting or useful for your analysis.

    Anyway, we do not need sophisticated math models to draw the conclusion
    that Debian is an aging community. And yet, we don't seem to have a good
    way to reshape the curve using Debian's funds. This is one of the key
    problems behind the data.

    And I think this is hardly an unexpected outcome. There are many
    social and technological patterns that define us as a 1990s project
    that continues to live and thrive, but not necessarily with the best /
    most up-to-date tooling.

  • From Andreas Tille@21:1/5 to All on Fri Jan 26 09:40:02 2024
    Hi Rafael,

    thanks a lot for your analysis and sorry for my late reply.

    On Wed, Jan 03, 2024 at 08:01:16PM +0100, Rafael Laboissière wrote:
    https://salsa.debian.org/rafael/debian-contrib-years

    First of all, I wish you all a happy 2024.

    +1

    I have updated my repository at salsa.d.o (URL above), integrating some
    elements discussed in the present thread, in particular the analysis
    proposed by Mo Zhou and the comments made by Steffen Möller.

    I'd like to share some ideas. Steffen blamed the advent of homebrew and
    conda as one factor, which I think is true to some extent in some fields.
    But I also think that we are a victim of our own success in being the
    distribution with the most derivatives. Ubuntu and Mint might be the
    most famous ones, and if I'm not mistaken the number of Ubuntu users is
    at least one order of magnitude higher than that of (pure) Debian (if you
    don't count Ubuntu users as Debian users, which they actually are
    indirectly). I do not want to discuss whether this is good or bad for
    Debian (which would be a long list of pros and cons), but contributors
    are recruited from users and we simply do not see the number of
    derivatives' contributors in our stats. Maybe we simply see patches
    arriving from some derivative which are simply collected by a single
    contributor (hopefully they really report back issues; my experience
    with bug reports+patches from Ubuntu is pretty good, but I see only
    those issues that are reported, since I do not check the bug trackers
    there for other known problems hidden from our sight).

    You might know that I'm very focussed on Blends. The idea was born when
    I noticed lots of derivatives dealing with the same topic as I do (Debian
    Med with a focus on biology and medicine). While lots of people
    wrongly understood Blends as a way to create a derivative, it's the
    contrary: Don't derive from Debian but rather create a Blend to find a
    solution *inside* Debian.

    Over more than 20 years of Blends effort I learned that it is pretty
    hard to make this concept popular. In Debian Med we finally managed to
    attract some of the derivers in this field to integrate their stuff into
    Debian, and by doing so they also became Debian contributors. But
    meanwhile those contributors drifted away for different reasons (changed
    jobs etc.). So at least their work was kept in Debian instead of being
    lost in an orphaned derivative.

    After all these years I need to confess that my original plan for
    Blends somehow failed. I assumed that every Debian Developer /
    Contributor is part of some community in the non-Debian world. If this
    Contributor worked hard to make Debian fit for this community, new
    contributors would arise from this community inside Debian to make
    Debian fit even better for their own usage. For years I have included the
    proof that this *can* work in my slides of Debian Med related talks[1].
    So some outsider project (biologists and people working in medicine
    are by far a lower percentage of overall Debian users than the >1% of
    Debian Developers we have in this field) can constantly attract
    contributors with a growing tendency, in contrast to your graph! This is
    a good sign that my original assumption, that a Blend can attract
    contributors, might be correct. To come back to the "reasons for
    decreasing number of contributors": We have too few Blends done right.

    My gut feeling is that this is somehow connected to the fact that
    developers usually are overworked even with technical work and do less
    to reach out to some community, which is considered "additional work
    squashing some time limit dedicated to Debian". I confirm that lots of
    my outreach activity (GSoC, Outreachy, MoM, in-person meetings
    (sprints), other ways to contact the community) did not really lead to
    long-term contributors. However, if I had not started to reach
    out, the Debian Med project would never have reached its current
    status of nearly 1000 packages in main[2] with a relatively low number
    of RC bugs, while probably maintaining more than 500 packages
    in other teams (Debian Science, Debian Python Team, R Pkg team, etc.).

    I also did some investigation in team metrics[3] to see how teams
    (originally targeting Blends teams) are performing. I admit I'm
    really proud of being "beaten" last year in the number of bugs
    squashed[4]. The interesting thing in this bug squashing graph is not
    only the fact that it is in contrast to your graph, since the number of
    contributors is increasing over the years. It's also the not-so-visible
    fact that the top 3 bug squashers are not actually experts in our field.
    Étienne and Nilesh just joined since it's fun to work in this team.

    To summarise this long mail: Another item in your list of reasons is
    that we should take better care of our contributors in strong teams
    (either topic-related Blends or kind and inviting language teams).

    Kind regards
    Andreas.

    [1] https://people.debian.org/~tille/talks/20230910_debconf_med-team_talk/teams_handout.pdf
    -> slide 6/31 "Debian Med has attracted one developer per year"
    Data are based on this survey
    https://wiki.debian.org/DebianMed/Developers

    [2] https://qa.debian.org/developer.php?email=debian-med-packaging@lists.alioth.debian.org
    [3] http://blends.debian.net/liststats/
    [4] http://blends.debian.net/liststats/bugs_debian-med.png

    --
    http://fam-tille.de
