I would like Debian to discuss and decide on the usage of AI-generated
content within the project.
You might already know that recently Gentoo made a strong move in
this context and drafted their AI policy:
- https://wiki.gentoo.org/wiki/Project:Council/AI_policy
- https://www.mail-archive.com/gentoo-dev@lists.gentoo.org/msg99042.html
I personally agree with the author's rationale on the aspects pointed
out (copyright, quality and ethical ones). But at this point I guess we
might have more questions than answers, which is why I think it'd be
helpful to have some input before suggesting any concrete proposals.
Perhaps the most important step now is to get an idea of how Debian
folks actually feel about this matter, and how we feel about moving in
a similar direction to what the Gentoo project did.

"Ansgar" == Ansgar 🙀 <ansgar@43-1.org> writes:

It's just another tool that might or might not be non-free, like people
using Photoshop, Google Chrome, Gmail, Windows, ... to make
contributions. Or a spamfilter to filter out some.
Hi,
> It's just another tool that might or might not be non-free like people
> using Photoshop, Google Chrome, Gmail, Windows, ... to make
> contributions. Or a spamfilter to filter out some.
That's entirely not the point.
It is not about **the tool** being non-free, but the result of its use being non-free.
Generative AI tools **produce** derivatives of other people's copyrighted works.
That said, we already have the necessary policies in place:
* d/copyright must be accurate
* all sources must be reproducible from their preferred form of modification
Both are not possible using generative AI.
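As a concrete reminder of what the first point above asks for, here is
a minimal, abridged DEP-5 stanza of the kind d/copyright is built from
(the upstream name, paths, years and copyright holder are invented for
illustration). The difficulty with generative AI output is that there
is often no truthful way to fill in the Copyright and License fields
for such hunks:

  Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
  Upstream-Name: example
  Source: https://example.org/example

  Files: src/*
  Copyright: 2021-2024 Jane Doe <jane@example.org>
  License: GPL-2+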
"Dominik" == Dominik George <natureshadow@debian.org> writes:
Tiago Bortoletto Vaz <tiago@debian.org> writes:
> I personally agree with the author's rationale on the aspects pointed
> out (copyright, quality and ethical ones). But at this point I guess we
> might have more questions than answers, which is why I think it'd be
> helpful to have some input before suggesting any concrete proposals.
> Perhaps the most important step now is to get an idea of how Debian
> folks actually feel about this matter, and how we feel about moving in
> a similar direction to what the Gentoo project did.
I'm dubious of the Gentoo approach because it is (as they admit)
unenforceable, which to me means that it's not a great policy. A
position statement, maybe, but that's a different sort of thing.

About the only statement that I've wanted to make so far is to say that
anyone relying on AI to summarize important project resources like
Debian Policy or the Developers Guide or whatnot is taking full
responsibility
for any resulting failures. If you ask an AI to read Policy for you and
it spits out nonsense or lies, this is not something the Policy Editors
have any time or bandwidth to deal with.
> Generative AI tools **produce** derivatives of other people's copyrighted works.
They *can* do that, but so can humans (and will). Humans look at a
product or code and write new code that sometimes resembles the
original very much.
> If I would hear that other Debian developers use them in that context,
> I would seriously question whether there is any value to spend my
> volunteer time in keeping debian/copyright files accurate to the level
> of details our Policy asks for.

There is a popular old opinion unrelated to AI that there is not.
Right, note that they acknowledged this policy is a work in progress.
Not perfect, but 'something needed to be done, quickly'. It's hard to
find a balance here, but I kind of share this sense of urgency.
[...]
This point resonates with problems we might be facing already, for
instance in the NM process and also in DebConf submissions (there's no
point in going into details here because so far we can't prove
anything, and even if we could, of course we wouldn't bring any of
those involved into the public arena). So I'm actually more concerned
about LLMs being mindlessly applied in our communication processes (NM,
bts, debconf, irc, planet, wiki, website, debian.net stuff, etc) than
about someone using some AI-assisted code in our infra, at least for
now.
On Thu May 2, 2024 at 9:21 PM -03, Tiago Bortoletto Vaz wrote:
> Right, note that they acknowledged this policy is a work in progress.
> Not perfect, but 'something needed to be done, quickly'. It's hard to
> find a balance here, but I kind of share this sense of urgency.
> [...]
> This point resonates with problems we might be facing already, for
> instance in the NM process and also in DebConf submissions (there's no
> point in going into details here because so far we can't prove
> anything, and even if we could, of course we wouldn't bring any of
> those involved into the public arena). So I'm actually more concerned
> about LLMs being mindlessly applied in our communication processes
> (NM, bts, debconf, irc, planet, wiki, website, debian.net stuff, etc)
> than about someone using some AI-assisted code in our infra, at least
> for now.
Hi Tiago,
It seems you have more context than the rest of us, which gives you a
sense of urgency, where others do not have this same information and
can't share that sense of urgency.

If I were to assume, based on the little context you shared, I would
say there's someone doing an NM application using an LLM, answering
stuff with an LLM and passing all their communications through LLMs.

In that case, there's even less point in making a policy about it, in
my opinion, since, as you stated, you can't prove anything, and
ultimately it would land in the hands of the people approving
submissions or NMs to judge whether the person is qualified or not. And
you can't block communications containing LLM-generated content when
you can't even prove it is LLM-generated content. How would you enforce
it?

And I doubt a statement would do much either. What would be
communicated? "Communications produced by LLMs are troublesome"? I
don't know if there's enough substance for a statement of that sort.
"Tiago" == Tiago Bortoletto Vaz <tiago@debian.org> writes:

> So I'm actually more concerned about LLMs being mindlessly applied in
> our communication processes (NM, bts, debconf, irc, planet, wiki,
> website, debian.net stuff, etc) than about someone using some
> AI-assisted code in our infra, at least for now.

On that front, useful "related work" are the policies that scientific
journals and conferences (which are exposed *a lot* to this, given
their main activity is vetting textual documents) have put in place
about this.

The general policy usually contains two main points (paraphrased below):

(1) You are free to use AI tools to *improve* your content, but not to
    create it from scratch for you.

    This point is particularly important for non-native English
    speakers, who can benefit a lot more than natives from tool support
    for tasks like proofreading/editing. I suspect the Debian community
    might be particularly sensitive to this argument. (And note that on
    this one the barrier between ChatGPT-based proofreading and other
    grammar/style checkers will become more and more blurry in the
    future.)

(2) You need to disclose the fact you have used AI tools, and how you
    have used them.

Exactly as in your case, Tiago, people managing scientific journals and
conferences have absolutely no way of checking if these rules are
respected or not. (They have access to large-scale plagiarism detection
tools, which is a related but different concern.) They just ask people
to *state* they followed this policy upon submission, but that's it.

If your main concern is people using LLMs or the like in some of the
processes you mention, a checkbox requiring such a statement upon
submission might go a longer way than a project-wide statement (which
will sit in d-d-a unknown to n-m applicants a few years from now).
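To make the checkbox idea concrete, here is one purely hypothetical
wording (not an existing Debian or DebConf form field, just an
illustration modelled on the journal policies described above):

  [ ] I confirm that this submission is my own work, and that any use
      of AI tools (e.g. for proofreading or translation) is disclosed,
      including which tools were used and how.

As with the journal policies, this asks for disclosure rather than
attempting an unenforceable ban.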
On 5/3/24 12:10, Stefano Zacchiroli wrote:
> On that front, useful "related work" are the policies that scientific
> journals and conferences (which are exposed *a lot* to this, given
> their main activity is vetting textual documents) have put in place
> about this.

Indeed. Here are some examples:
Nature: https://www.nature.com/nature-portfolio/editorial-policies/ai
ICML: https://icml.cc/Conferences/2023/llm-policy
CVPR: https://cvpr.thecvf.com/Conferences/2024/ReviewerGuidelines
      https://cvpr.thecvf.com/Conferences/2024/AuthorGuidelines

Some additional points, beyond the two from Stefano:

1. Nature does not allow an LLM to be listed as an author.
2. CVPR holds the author who used an LLM responsible for all of the
   LLM's faults.
3. CVPR agrees that paper reviewers who skip their review work with an
   LLM are harming the community.
> The general policy usually contains two main points (paraphrased below):
>
> (1) You are free to use AI tools to *improve* your content, but not to
>     create it from scratch for you.

Polishing language is the case where I find LLMs most useful. But in
fact, as an author, when I really care about the quality of whatever I
wrote, I find the state-of-the-art LLMs (such as ChatGPT-4) poor in
logic, poor in understanding my deep insights. They eventually turn
into a smart language tutor for me.
> (2) You need to disclose the fact you have used AI tools, and how you
>     have used them.

Yes, it is commonly encouraged to acknowledge the use of AI tools.
> Exactly as in your case, Tiago, people managing scientific journals and
> conferences have absolutely no way of checking if these rules are
> respected or not. (They have access to large-scale plagiarism detection
> tools, which is a related but different concern.) They just ask people
> to *state* they followed this policy upon submission, but that's it.

If the cheater who uses an LLM is lazy enough not to edit the LLM
outputs at all, you will find it super easy to identify on your own
whether a chunk of text was produced by an LLM. For example, I used
ChatGPT basically every day in March, and its answers always feel like
they are organized in the same format. No human answers questions in
the same boring format all the time.
> If your main concern is people using LLMs or the like in some of the
> processes you mention, a checkbox requiring such a statement upon
> submission might go a longer way than a project-wide statement (which
> will sit in d-d-a unknown to n-m applicants a few years from now).

In the long run, there is no way to enforce a ban on the use of AI over
this project. What is doable, from my point of view, is to confirm that
a person acknowledges the issues, potential risks and implications of
the use of AI tools, and to hold people who use AI responsible for the
AI's faults.
After all, it's easy to identify one's intention in using AI -- it is
either good or bad. If NM applicants can easily get the answer to an NM
question, maybe it is time to refresh the question? After all, nobody
can stop one from learning from AI outputs when they need suggestions
or reference answers -- and they are responsible for the wrong answer
if the AI is wrong.

Apart from deliberately conducting bad acts using AI, one thing that
seems benign but harmful to the community is slacking off and skipping
important work with AI. But still, this can be covered by a single rule
as well -- "Let the person who uses AI be responsible for the AI's
faults."

Simple, and doable.