Forum: >>> Magnum BBS <<<

Brief update about software freedom and artificial intelligence

From M. Zhou@21:1/5 to All on Fri Feb 24 00:20:01 2023

Hi folks,

Recap:
The modern practice of AI has blurred the boundary between code and data,
which leads to some ambiguity in the interpretation of the definition of
open source as well as of the respective licenses. Such ambiguous
interpretation in fact deviates from and violates the spirit of free software.

Several years ago I pointed out this issue on -devel, and eventually drafted ML-Policy [2].
Last year OSI formally recognized this issue and invited me to contribute some
thoughts to their Deep Dive: AI event. The final report is now available here: [1]
It summarizes the discussions of people from various fields.

You may have tried ChatGPT recently -- this field develops rapidly, and some of the
state-of-the-art AIs can be astonishing if you have never tried anything like them before.
If some monopolistic proprietary AGI (artificial general intelligence) emerges in the future,
I personally fear its potential capability of being evil. This resembles a part of the
history of free software in the last decade.

Anyway, from the Debian side, we at least know that we should be careful
when dealing with AI software.

[1] https://deepdive.opensource.org/wp-content/uploads/2023/02/Deep-Dive-AI-final-report.pdf
[2] https://salsa.debian.org/deeplearning-team/ml-policy

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Roberto A. Foglietta@21:1/5 to M. Zhou on Fri Feb 24 01:20:01 2023

On Fri, 24 Feb 2023 at 00:16, M. Zhou <lumin@debian.org> wrote:

Hi folks,

Recap:
The modern practice of AI has blurred the boundary between code and data,
which leads to some ambiguity in the interpretation of the definition of
open source as well as of the respective licenses. Such ambiguous
interpretation in fact deviates from and violates the spirit of free software.

Hi folks and Zhou,

cloud technologies posed a challenge to the GPLv2 because under that
license everyone has the right to change the code without sharing it,
as long as s/he uses it internally, which is exactly how SaaS
works. To fill this gap in freedom, the GPLv3 was proposed.

Unfortunately, GPLv3 adoption did not spread through the community.
Or fortunately, because almost every company involved in cloud adopted
software-libre as its preferred solution. This gave the community
a huge boost, the kind of hype that made software-libre a big
thing. Then A.I. came along and everything is going to change again.

A.I. is a great challenge for humanity, especially because of the
ethical approach that it requires. Ethics is not optional because there
are a lot of things that can go wrong with A.I. - last but not least
its use. The next challenge is about who is going to control this
technology: a proprietary solution under the control of a single
company, or of a cartel of companies under the same national flag. This
will easily lead to a strong concentration, which means: 1. no
freedom, 2. no equality, 3. no innovation because of the lack of
competition. The worst is the lack of freedom because everything else
depends on it.

There are two ways to go, mainly:

1. changing the GPLv3 so that it covers the A.I. topics;
2. a brand new specific license for this topic.

In this e-mail, I will present my proposal about using GPLv3 to
address the new challenges that come with the A.I. - My opinion is
that GPLv3 applied to a composition is a novelty based on two known
legal standards that can fit our needs of freedom with A.I. also.

a) GPLv3 in its last revision has been available since 29 June 2007
and this means that every law firm in the world has had the time to
study and understand it deeply. In a conservative sector like legal
consultancy, every novelty based on a well-known past is welcome -
it might or might not be warmly accepted, but that is a completely
different story.

b) Under the Copyright Act, a compilation is defined as a work formed by
the "collection and assembling of preexisting materials or of data that
are selected, coordinated, or arranged in such a way that the resulting
work as a whole constitutes an original work of authorship." (1996)

Combining these two well-known pieces of law, we can obtain - not a
new license but - a new way to use the GPLv3: apply the GPLv3 to the
composition regardless of how the single pieces of code or data are
licensed. This is obviously a great advantage because changing the
license for every {piece of code} and {set of data} is not feasible
and, if it were necessary, it would be a nightmare from a legal point
of view.

This is a project of mine that uses the GPLv3 to protect a {set,
pool, collection, combination, composition} of files, each with
its own license. Whatever license a file has, the composition
can be protected as software-libre by the GPLv3.

https://github.com/robang74/git-functions#license

I hope this helps the community to easily find a solution for our
freedom needs and to set a standard in A.I. licensing. For the sake of
completeness, I am adding that section here below the signature.

Best regards, R-
--

License

Almost all the files are under MIT license or GPLv3 and the others are
in the public domain. Instead, the composition of these files is
protected by the GPLv3 license.

Copyright Act, title 17. U.S.C. § 101.

Under the Copyright Act, a compilation [NDR: "composition" is used
here as a synonym because "compilation" might be confused with
code compiling] is defined as a "collection and assembling of
preexisting materials or of data [NDR: source code, as well] that are
selected, coordinated, or arranged in such a way that the resulting
work as a whole constitutes an original work of authorship."

This means that everyone can use a single MIT licensed file or a part
of it under the MIT license terms. Instead, using two of them or two
parts of them implies that you are using a subset of this collection,
and thus a derived work of this collection, which is licensed under the
GPLv3 as well.

The GPLv3 license applies to the composition unless you are the
original copyright owner or the author of a specific unmodified file.
This means that everyone who can legally claim rights over the
original files keeps those rights, obviously. So they do not need
to comply with the GPLv3 license applied to the composition, unless
the composition is used for the parts over which they had no rights
before.

The copyright notice, the license and the author are reported in each
file header, summarised here:

colors.shell: MIT
isatty_override.c: MIT
git-commit-edit: public domain
git-isar-send-patch: GPLv3
git.functions: GPLv3
cr-editor.sh: GPLv3
install.sh: GPLv3
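
A per-file summary like the one above could be generated automatically if every header carried a machine-readable tag. The following is only a minimal sketch, assuming (this is not stated in the section above) that each header contains an `SPDX-License-Identifier:` line. The function names `license_of` and `summarise` and the sample header contents are hypothetical and are not taken from the actual repository.

```python
import re

# Assumption: each file header carries a line such as
#   # SPDX-License-Identifier: MIT
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def license_of(header_text, max_lines=20):
    """Return the license tag found in the first max_lines lines, or None."""
    for line in header_text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            # Drop a trailing C comment terminator ("*/"), if any.
            return match.group(1).rstrip("*/ ").strip()
    return None


def summarise(headers):
    """Map file name -> license tag ('unknown' when no tag is present)."""
    return {name: license_of(text) or "unknown" for name, text in headers.items()}


if __name__ == "__main__":
    # Hypothetical header contents, for illustration only.
    sample = {
        "colors.shell": "#!/bin/sh\n# SPDX-License-Identifier: MIT\n",
        "isatty_override.c": "/* SPDX-License-Identifier: MIT */\n",
        "notes.txt": "no license tag here\n",
    }
    for name, tag in summarise(sample).items():
        print(f"{name}: {tag}")
```

Files without a tag are reported as "unknown" rather than guessed, so a human still has to check them before the composition is published.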


From Charles Plessy@21:1/5 to All on Fri Feb 24 05:30:01 2023

Dear Mo,

thank you for the heads-up.

I was using permissive licenses in the past, thinking about making life
easier for individuals, but I feel robbed by massive scraping to train
AI models.

Just in case I updated my email signature.

Also, is there a DFSG-free license that forces the training dataset and
the result of the training process to be open source if a work under that
license is present in the training data? Would GPLv3 be sufficient?

Have a nice day,

Charles

--
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from home https://framapiaf.org/@charles_plessy
- You do not have my permission to use this email to train an AI -


From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Fri Feb 24 09:20:01 2023

On Fri, 24 Feb 2023 at 08:06, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

On Fri, 24 Feb 2023 at 05:23, Charles Plessy <plessy@debian.org> wrote:

One more thing about this:

- Joe tests the NN with the 10+1 images of the TS and decides whether the
NN is fine or not. If he decides that it is fine and it can go into
production, then Joe's employer should share everything stated above.
Instead, if he decides that it is crap, he will trash it and he need
not share anything because the sharing would have zero value for
anyone. This is compliant with the fair use clause, in which I
explicitly added "testing" as a condition that avoids sharing. After all,
if no value is produced, why should we force Joe to share his
failure? In particular cases a failure (vulnerability) is valuable
information, but for security reasons it is better that Joe is not
forced to comply with the GPLv3 terms. It is better to give Joe the
freedom to share only the information that he considers safe to
share in public. However, if Joe's company does business with this -
providing a PoC to a client - then they have to comply with the GPLv3
because, as stated, commercial and business activities are covered
by the GPLv3.

In this specific case the provider of the PoC could make a public
statement in which they promise to share the PoC under GPLv3, but only
after 3 months, in order to give their client the opportunity to
develop an update that fixes the issue and to test it properly. Then
their client does its job but needs 3 more months to give its own
clients a reasonable time to update and test their systems. So,
the client will make a public statement in which it grants the PoC
provider legal coverage for every claim started in those 3 months
to which the provider might be exposed for not having complied with
the GPLv3 terms. In this way they have 3+3 months to fix a critical
issue and let their clients update their systems. In case the 3+3
months become 3+3 years, obviously their risk of facing a trial with a
negative outcome for them is much higher. So, after a reasonable
time, the PoC will be shared as it is supposed to be.

Best regards, R-


From Roberto A. Foglietta@21:1/5 to Charles Plessy on Fri Feb 24 09:20:01 2023

On Fri, 24 Feb 2023 at 05:23, Charles Plessy <plessy@debian.org> wrote:

Dear Mo,

thank you for the heads-up.

I was using permissive licenses in the past, thinking about making life
easier for individuals, but I feel robbed by massive scraping to train
AI models.

Just in case I updated my email signature.

Also, is there a DFSG-free license that forces the training dataset and
the result of the training process to be open source if a work under that
license is present in the training data? Would GPLv3 be sufficient?

Dear Charles,

imagine that you have a collection of data files, images for example,
and each of them has its own license and copyright owner. The ML/AI
trained on that image set will produce a neural network which, for the
purpose of this example, is an N-dimensional grid of floating point
numbers encoded in 64 bits (NN weights). To be even clearer for those
who have never trained a NN - almost all the lawyers - I will present
here a very simple and explicit example.

- 100 images is the dataset protected by GPLv3 as composition, like a database

- Joe is the ML trainer, the employee that is going to train the ML

- Joe legally acquired the dataset because all the licenses allow it

- Joe wrote a script that renames the images to 00.jpg .. 99.jpg and
ran it; this new dataset is still protected by the GPLv3

- Joe wrote a script that randomly chooses 90 images as learning set
(LS) and the others as test set (TS): these are two sub-compositions
and both are covered by GPLv3 because both have more than one
file/piece of the original composition. In fact, in the way I applied
the GPLv3 to the composition I cannot enforce it over a single
file/piece, because in that case I would change the license terms
decided by the original author of that file/piece and I do not want to
do that even if I could (ethics).

- The ML has the aim of deciding whether the image contains at least one
dog or not: image input, binary output. Thus Joe can add his dog image to
the TS, and then that image becomes part of the TS composition, so he
should share it under a license that is compatible with the GPLv3
on the composition. However, Joe is smart and he does not want to share
his dog image, which is equivalent to saying that we cannot prove that
Joe put that image into the TS composition by moving the file into that
folder. However, from a legal point of view the simple fact that it is
used as part of the TS clearly states his will to use his dog image as
part of the training set. So, in principle Joe is smart but honest and,
to avoid legal issues for his employer, will share his dog image.

- So, now the sharing pool brings a little information: the LS, the
TS, and Joe's dog picture. One more image is only +1%, but that
image can be very tricky/important for ML in the same way some patches
are a single line but make a huge difference. So quantity is not a
universal metric of contribution. Moreover, now we know which TS/LS
Joe used to obtain the NN, which in some cases could be relevant
information.

- Joe also needs to tag every LS image in order to back-propagate the
feedback to the NN and train it. This can be a file in which filenames
are associated with a binary label: dog or not. Again, Joe did not put
this file into the LS folder but, as described above, that file is part
of the training set to which a GPLv3 composition belongs. IMHO this
means that Joe should also share this file. This information
could be relevant too, because the most expensive job is labelling the
data.

- Joe trains the NN with an ML engine which produces the NN weights
matrix (BIN). This binary object is a derived work of a GPLv3
composition, just as a binary executable is a derived work of GPLv3
source code. Thus Joe should share the BIN as well under the GPLv3 terms,
which also oblige him to explain the inner encoding (BIN + format
specifications). As you can imagine, this is another step towards
freedom: even if that BIN is supposed to run on patented hardware,
because we know the format specification we can write an emulator,
much slower and without commercial value due to its performance, but
usable for learning purposes or to check a questionable NN
output.

- Joe tests the NN with the 10+1 images of the TS and decides whether the
NN is fine or not. If he decides that it is fine and it can go into
production, then Joe's employer should share everything stated above.
Instead, if he decides that it is crap, he will trash it and he need
not share anything because the sharing would have zero value for
anyone. This is compliant with the fair use clause, in which I
explicitly added "testing" as a condition that avoids sharing. After all,
if no value is produced, why should we force Joe to share his
failure? In particular cases a failure (vulnerability) is valuable
information, but for security reasons it is better that Joe is not
forced to comply with the GPLv3 terms. It is better to give Joe the
freedom to share only the information that he considers safe to
share in public. However, if Joe's company does business with this -
providing a PoC to a client - then they have to comply with the GPLv3
because, as stated, commercial and business activities are covered
by the GPLv3.

- Joe is a student at university and his work has nothing to do with
commercial / business purposes. However, if his university decides to
use Joe's work for commercial or business purposes, then they should ask
Joe for all the information which needs to be shared under the GPLv3 terms.
This forces Joe to share that information when he delivers his work to
his teacher, so that the university can also store the
information that might or might not be shared in the future. Again, no
value produced, no need to share. After all, Joe's work could
be a completely useless failure and be rejected. We do not need to
know about it.
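
The mechanical steps of the example above (renaming, random 90/10 split, the extra dog image, and the list of things to share) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rename_plan`, `split_dataset` and `sharing_pool`, the fixed random seed, and the file names `photo_*.jpeg`, `joe_dog.jpg`, `labels.csv` and `nn_weights.bin` are all hypothetical, and no real training engine is involved.

```python
import random


def rename_plan(original_names):
    """Map every original file name to 00.jpg .. 99.jpg (in sorted order)."""
    return {old: f"{i:02d}.jpg" for i, old in enumerate(sorted(original_names))}


def split_dataset(names, n_learning=90, seed=0):
    """Randomly choose n_learning files as learning set (LS);
    the remaining files form the test set (TS)."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    return sorted(shuffled[:n_learning]), sorted(shuffled[n_learning:])


def sharing_pool(ls, ts, extra_ts_images, labels_file, weights_file):
    """Collect what, according to the example, Joe's employer would
    share once the NN goes into production."""
    return {
        "learning_set": list(ls),
        "test_set": list(ts) + list(extra_ts_images),
        "labels": labels_file,
        "weights": weights_file,
    }


if __name__ == "__main__":
    # Hypothetical original file names of the 100-image dataset.
    originals = [f"photo_{i}.jpeg" for i in range(100)]
    plan = rename_plan(originals)
    ls, ts = split_dataset(plan.values())
    pool = sharing_pool(ls, ts, ["joe_dog.jpg"], "labels.csv", "nn_weights.bin")
    # 90 learning images, 10 + 1 test images.
    print(len(pool["learning_set"]), len(pool["test_set"]))
```

The sketch makes one point of the example visible: LS and TS each contain more than one piece of the original composition, so under the proposal both sub-compositions remain covered by the GPLv3.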

To invalidate the application of the GPLv3 to the NN binary, someone
would have to explain, in legal terms grounded in some law, that training
a neural network is a completely different thing from compiling a
binary from source code. In the same analogy, compiling GPLv3 source
code does not imply that you have to share under GPLv3 the proprietary
compiler that was used for it, right? The same goes for the ML
training engine.

Please feel free to contact me in person in order to go deeper into
some aspects which, as AI experts or law experts, you might want to
challenge or improve. I will be happy to read/hear from you. Just
take into consideration that every relevant discovery (good or bad)
about this new way of using the GPLv3 will be shared here or
wherever I decide to share it. So, if you are under NDA, I am not:
do not write/talk to me, or otherwise do it at your own risk. :-)

I hope this helps,
--
Roberto A. Foglietta
+49.176.274.75.661
+39.349.33.30.697


From Gerardo Ballabio@21:1/5 to Roberto A. Foglietta on Fri Feb 24 10:30:01 2023

Roberto A. Foglietta wrote:

cloud technologies posed a challenge to the GPLv2 because under that
license everyone has the right to change the code without sharing it,
as long as s/he uses it internally, which is exactly how SaaS
works. To fill this gap in freedom, the GPLv3 was proposed.

Well, not exactly.

If I am not mistaken, the GPLv3 was developed to clarify some
ambiguous language in the GPLv2, mostly with respect to patents. It
doesn't address SaaS -- you are still free to modify the code and keep
your modifications private, even if you run a publicly accessible
service on the modified code.

The Affero GPL <https://www.gnu.org/licenses/agpl-3.0.html> was
developed to specifically address SaaS. This license requires that if
you run a service over a network, you must offer the corresponding
source code to all users of the service.

Charles Plessy wrote:

Also, is there a DFSG-free license that forces the training dataset and
the result of the training process to be open source if a work under that
license is present in the training data? Would GPLv3 be sufficient?

As I understand, that is an open legal question. The Affero GPL would
be such a license *if* the training dataset would be considered part
of the code. While that does seem to make sense, as AI code is
essentially non-functional without the training, I am not aware that
there has ever been a pronouncement by a court of law that affirms or
denies it, nor am I aware of any free/open source license that
contains language that deals specifically with that issue, and I'm
pretty sure that there's a lot of room for lawyers to argue their point.

If you explicitly publish a dataset under the GPL or AGPL, I suppose
that anybody who makes use of that dataset would be required to comply
with that. And if you don't explicitly license it at all, I suppose
that nobody would be authorized to use it except for "fair use". But
you must be careful or you might end up "licensing" your data without
even knowing. For example, I don't know the terms of service of
ChatGPT, but it seems a fair guess to assume that whatever you write
into it, you give them unlimited rights to use it. And that may easily
extend to whatever you write into a document processor or other
software that has a "feature" of "integrating" with ChatGPT, even if
you're running it on your own computer (I think I've read that even
LibreOffice is developing such a feature!).

Gerardo


From Roberto A. Foglietta@21:1/5 to gerardo.ballabio@gmail.com on Fri Feb 24 14:30:01 2023

On Fri, 24 Feb 2023 at 10:27, Gerardo Ballabio
<gerardo.ballabio@gmail.com> wrote:

If I am not mistaken, the GPLv3 was developed to clarify some
ambiguous language in the GPLv2, mostly with respect to patents. It
doesn't address SaaS -- you are still free to modify the code and keep
your modifications private, even if you run a publicly accessible
service on the modified code.

The Affero GPL <https://www.gnu.org/licenses/agpl-3.0.html> was
developed to specifically address SaaS. This license requires that if
you run a service over a network, you must offer the corresponding
source code to all users of the service.

Thanks Gerardo for your contribution. Integrating it into my
previous e-mail, I can say that wherever I wrote GPLv3, AGPLv3 could be
used instead. However, the example I gave was based on the
transfer of the NN binary from one party to another, so the GPLv3 was
correctly used in that case because of the distribution. Instead, when
the NN is trained and used internally for offering a SaaS, then AGPLv3
should be considered by the authors.

Charles Plessy wrote:

Also, is there a DFSG-free license that forces the training dataset and
the result of the training process to be open source if a work under that
license is present in the training data? Would GPLv3 be sufficient?

As I understand, that is an open legal question. The Affero GPL would
be such a license *if* the training dataset would be considered part
of the code. While that does seem to make sense, as AI code is
essentially non-functional without the training, I am not aware that
there has ever been a pronouncement by a court of law that affirms or
denies it, nor am I aware of any free/open source license that
contains language that deals specifically with that issue, and I'm
pretty sure that there's a lot of room for lawyers to argue their point.

Gerardo, thanks again for your contribution because you highlight the
main point of my proposal: wherever the GPLv3 or AGPLv3 is used, the
most important thing is protecting the collection with such a license,
and not every single file or piece of data, which instead could have a
completely different authorship and license. This is possible whenever the
various parts of the collection have been licensed under terms that
are compatible with the GPLv3 or AGPLv3 applied to the whole collection.
Any file or data that does not fulfil this requirement should
be delivered separately, ideally in a different manner or repository.

Best regards, R-


From Russ Allbery@21:1/5 to Gerardo Ballabio on Fri Feb 24 19:10:02 2023

Gerardo Ballabio <gerardo.ballabio@gmail.com> writes:

As I understand, that is an open legal question. The Affero GPL would be
such a license *if* the training dataset would be considered part of the
code. While that does seem to make sense, as AI code is essentially
non-functional without the training, I am not aware that there has ever
been a pronouncement by a court of law that affirms or denies it, nor am
I aware of any free/open source license that contains language that
deals specifically with that issue, and I'm pretty sure that there's a lot
of room for lawyers to argue their point.

To add to this, I'm fairly sure that the companies that are training AI
models on, say, every piece of text they can find on the Internet, or all
public GitHub repositories, are going to explicitly argue that doing so is
fair use of the training material. If that argument prevails in court, or
in legislatures, it will not be possible to write a free software license
to prevent this, since the point of fair use is that copyright law does
not apply to that usage and therefore no copyright license can prohibit
it.

I don't think we have any idea yet whether that argument will prevail. It
will probably be years before it reaches a high enough level court in the
United States for a definitive ruling, let alone every other relevant
country that will have its own legal judgments. Consider Google
v. Oracle: a suitable case with litigants willing to appeal all the way to
the highest court about the copyright status of library APIs was only
filed in 2010, years after this became a common issue, and it took more
than ten years for it to be decided, and that only in the United States.
I would expect a similar delay. Court systems work very slowly. It's also
entirely possible that court judgments will go different ways in different
countries to add even more confusion.

The organizations that have every incentive to argue that it's fair use
have very deep pockets, so they have a substantial chance of success on
the prosaic grounds that the best-funded litigant or lobbyist always
stands a reasonable chance of winning.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Sam Hartman@21:1/5 to All on Fri Feb 24 20:00:01 2023

"Russ" == Russ Allbery <rra@debian.org> writes:

Russ> To add to this, I'm fairly sure that the companies that are
Russ> training AI models on, say, every piece of text they can find
Russ> on the Internet, or all public GitHub repositories, are going
Russ> to explicitly argue that doing so is fair use of the training
Russ> material. If that argument prevails in court, or in
Russ> legislatures, it will not be possible to write a free software
Russ> license to prevent this, since the point of fair use is that
Russ> copyright law does not apply to that usage and therefore no
Russ> copyright license can prohibit it.

Russ, I'm sure you are aware, but things get very interesting if the
input to AI training is not fair use.
In particular, if GitHub Copilot is a derivative work of everything fed
to it (including all the copylefted works), that gets kind of awkward
for Microsoft.
Perhaps the GitHub user agreement grants permission for every copyright
holder who has a GitHub account.
But for everyone else, things could be very interesting.

Unfortunately, if there is not some sort of fair use or sui generis
solution, things like ChatGPT would be impossible because of copyright.
That will create significant energy on the legal front to find a
solution that does not involve negotiating with each right holder
individually.
The AI models are useful after all.

And then there's GDPR and privacy concerns of training data.
If I were a European, I'd definitely be very interested in filing a
subject access request to learn what OpenAI knows about me.

--Sam



From Russ Allbery@21:1/5 to Sam Hartman on Fri Feb 24 20:50:01 2023

Sam Hartman <hartmans@debian.org> writes:

Russ, I'm sure you are aware, but things get very interesting if the
input to AI training is not fair use.

In particular, if GitHub Copilot is a derivative work of everything fed
to it (including all the copylefted works), that gets kind of awkward
for Microsoft.

Perhaps the GitHub user agreement grants permission for every copyright
holder who has a GitHub account.

But for everyone else, things could be very interesting.

Yes. I didn't express an opinion on what the correct outcome is because
it's not at all obvious to me and I'm not sure that I have an opinion.

As a general principle, as a free software advocate, I approve of an
expansive definition of fair use and believe that far more uses of
copyrighted material should be fair use than are normally considered fair
use today. Expansive definitions of fair use are a key legal component to
enabling reverse engineering and compatible replacement of non-free
software with free software, for example.

I'm seeing some tendency for free software advocates who are disturbed by
the other social effects of large AI models (and there are quite a few
things to be disturbed about), and about the degree to which some of them
are parasitic on free software and other free information communities, to
respond by advocating for a narrow definition of fair use, at least in
this specific area. I'm worried that this is counterproductive; I think
we rely on fair use much more than incredibly wealthy multinational
software corporations do.

But the specific ramifications of an expansive fair use position for the
societal effect of AI models aren't clear to me, and to be honest I'm
dubious that they're clear to anyone at this point. There are obviously some
significant risks, including the tendency of scale effects with large
models to further consolidate power into the hands of a small number of
very wealthy organizations.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Sun Feb 26 09:50:02 2023

On Sun, 26 Feb 2023 at 09:09, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

ERRATA CORRIGE

I hope this helps to acknowledge and convince us - as the open-source
and software-libre community - about the great responsibility that is
a burden on our shoulders. Such a responsibility cannot be delegated to
a few because the stake on the table is too high.

Thus we all are involved, more or less, depending on everyone's ability
to support and contribute.

s/contribute/support/ is a more correct English term for the idea I
had in my mind. However, one term does not exclude the other.

Best regards, R-


From Roberto A. Foglietta@21:1/5 to All on Sun Feb 26 17:10:01 2023

Hi all,

in these two threads

* https://lists.debian.org/debian-project/2023/02/msg00017.html

* https://lists.debian.org/debian-project/2023/02/msg00022.html

we had the chance to discuss the emerging mass adoption of A.I.
and which licensing model would be useful to adopt to
protect the freedom of code, data, models, etc.

One topic that seems to worry people is "fair use", which is
a legal term but not really well defined - I would call it
"blurry" - and this could be a tricky point to debate
in a trial.

I wish to add my two cents about this topic and I will follow two
main guidelines: 1. historical evolution, 2. priority of legislative
sources.

First of all, we can start with a reductio ad absurdum: because
we do not have a clear legal definition of "fair use", we
consider the worst case, in which it means "everything". Well, this is
exactly the situation before the introduction of copyright:
everything was a fair use case. Then copyright was introduced to
grant authors some kind of exclusive rights: both moral and material
rights. As you can imagine, without the moral rights there would
be no material rights. However, this aspect is not
relevant for our goal here; I mention it just to underline the priority.

Copyright was introduced to move some profit from the publishers to
the authors, who were starving. Thus, the material rights are about
business, commerce and marketing. Obviously these three concepts were
not as developed as they are today, but basically these three
activities are clearly related to value and thus they SHOULD be
exclusive to the author for every copyrighted work that s/he created.
Moreover, copyright applies by default, unless the author explicitly
states otherwise.

From this point of view, we still do not know what "fair use" is, but
we know for certain what is NOT included in "fair use", otherwise
copyright would fail in principle. Symmetrically, the same holds for copyleft.

In fact, copyright claims that {business, commercial, marketing} are
exclusive rights of the author (all rights reserved), implicitly
assuming that the author intends to sell them; otherwise no
one could legally enjoy the author's work.

Copyleft trades these rights not for money but for
freedom, something more valuable to some people. Thus, with copyleft,
if you want to enjoy the {business, commercial, marketing} rights over
someone else's work, then you have to share back something about the
original work.

Thus this equation takes place:

copyright : money ~alike~ copyleft : freedom

My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
but - to a collection of objects using the database protection,
automatically also solves the problem of a blurry "fair use"
definition. However, to be more incisive about "fair use", it is
better to declare explicitly what is not "fair use". Otherwise, we
risk having to explain this in court. Like in this file header:

https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

# (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
# SPDX-License-Identifier: all rights reserved, but fair use allowed
# Fair use includes test, learning and marketing but not sales, redistribution
# leasing, renting or every other commercial/business activities without the
# consent of the author. Every company or individual allowed to use this
# code behind these limitations will be listed here below, if any.

In this specific case, I decided that marketing belongs to "fair use"
because it lets my product be known. In the case of A.I. it would not
be fine, because the A.I. could suggest, directly or indirectly,
drinking an X-soft-drink, and this is clearly marketing.

So, in conclusion, "fair use" was the standard before the introduction
of copyright; then "all rights reserved" became the standard with the
introduction of copyright, but this created other problems because it
was too restrictive, so the "fair use" concept was introduced to relax
copyright, but "fair use" was not well defined. It was not well
defined because "{business, commercial, marketing} rights are
reserved" is enough and, moreover, protecting these rights is the core
reason copyright law exists at all.

IMHO, the best we can do is to ask the Free Software Foundation to
write two more licenses, or to update A/GPLv3 to A/GPLv4, in which it
will be clearly stated that the license applies to the composition and
that the {business, commercial, marketing} rights are reserved and
exchanged for freedom. Then the license presents an open "fair use"
definition in which some rights {testing, learning} are clearly
included. Everything else should be brought back to these two
categories. Finally, the license should state that every collection
item that does not have its own specific copyright and license
note/header is licensed under A/GPLv3.

So, in the simplest case, in which no file carries a specific
copyright note/header but only the repository does, this happens:

- git repository A is licensed with A/GPLv4
- the composition is under A/GPLv4
- every file is under A/GPLv3

Clearly, we can also have an LGPLv4 and, as long as it makes sense, every
other license could be used to create a collection-oriented version of
that license.

Moreover, when an A.I.'s training engine hits a project repository
protected by A/GPLv4, then all the inputs before and after that hit
become part of the input collection, which will be protected by A/GPLv4
with all the related consequences that I have just explained in other
e-mails here.

Everything IMHO and hoping that it helps, R-


From Russ Allbery@21:1/5 to Roberto A. Foglietta on Sun Feb 26 21:50:01 2023

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
but - to a collection of objects using the database protection,
automatically also solves the problem of a blurry "fair use"
definition. However, to be more incisive about "fair use", it is
better to declare explicitly what is not "fair use". Otherwise, we
risk having to explain this in court. Like in this file header:

https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

# (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
# SPDX-License-Identifier: all rights reserved, but fair use allowed
# Fair use includes test, learning and marketing but not sales, redistribution
# leasing, renting or every other commercial/business activities without the
# consent of the author. Every company or individual allowed to use this
# code behind these limitations will be listed here below, if any.

I'm afraid this is not how fair use works. The whole point of fair use is
that the copyright holder has no control over uses that are fair use.
They can grant additional rights with a copyright license, but they cannot
stop legal fair use, no matter what they write in their license and no
matter what their personal opinions are about what would fall into fair
use.

You can see why this must be so if you think about the role of fair use in
copyright law. Fair use is a carve-out for a whole class of uses to which
society wants to put copyrighted works without requiring any permission
from the copyright holder, and if necessary against their explicit wishes.

Consider quoting small portions of a work while reviewing it, which is one
of the classic areas of fair use in US law. A copyright holder might like
to only allow friendly reviews to quote their work and prohibit hostile
reviews from quoting their work, or, failing that, prohibit quoting in any
review. But the point of fair use is that everyone gets to quote their
work and they get no say in the matter.

And, as mentioned earlier, free software relies heavily on this. Among
the things that are carved out for fair use (and closely related concepts
with roughly the same legal properties, such as limits on what types of
works can be protected by copyright), is the ability to reimplement an
API. There are also a lot of activities around reverse engineering that
are protected by fair use and related limitations. The copyright holders
of that software, if allowed to redefine those limits in their licenses,
would use a far more restrictive definition that prohibited free software
competition. But they can't.

This means that it is largely pointless to try to define fair use in a
copyright license, since the whole point of fair use is that it applies
regardless of the content of the copyright license, even if the copyright
license explicitly prohibits things that are legally fair use, and even in
the case of no copyright license at all. The belief or definition offered
by the copyright holder for fair use does not matter and should be ignored
entirely for legal purposes. The only definition that matters is the one
made by the legislature and enforced by the courts.

And yes, it is indeed fuzzy, and it may be beneficial for governments to
define it more precisely (assuming they didn't break it in the process).
But licenses like the GPL cannot do this, since those are just statements
by the copyright holder. The copyright holder *cannot* have any power to
make fair use less fuzzy, since that would defeat the point of fair use.

A similar principle applies to claiming compilation copyright. Either a
compilation is covered by your copyright, in which case you have all the
normal copyright holder rights over it unless you grant them to others
with an explicit license, or it is not covered, in which case it doesn't
matter what you say about it and everyone else can ignore anything you
say. So declaring that any compilation including your work is covered by
your preferred license is only relevant if you have a legal copyright over
the compilation. If you do, then your copyright license matters; if you
don't, everyone else is entitled to ignore your license and your
statements.

I am not a lawyer, let alone a copyright lawyer, and have only an amateur
Internet understanding of the nature of compilation copyrights (and they
may well also vary by jurisdiction), but my understanding (possibly
incorrect) of the law in the US is that holding copyright on a member of a
collection does not give you any copyright ownership of the collection as
a whole. To gain copyright ownership of the collection, you have to
exercise some sort of creative control over the collection itself, such as
by using human creativity to select its membership, choosing some elements
and discarding others. The person distributing the collection has to
comply with copyright law with respect to the material included that you
hold a copyright on (either satisfying your license or following the rules
of fair use), but if you're not involved in creating the collection, you
don't get any separate rights over the collection itself and cannot assert
a license on it.

There's a bunch of US case law on this around things like phone books
(IIRC, found to not involve enough creativity to have a separate
copyright), recipe collections (copyrightable as a compilation even though
recipes themselves are not individually copyrightable), short story
collections, and so forth.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 02:50:01 2023

On Sun, 26 Feb 2023 at 21:47, Russ Allbery <rra@debian.org> wrote:

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
but - to a collection of objects using the database protection,
automatically also solves the problem of a blurry "fair use"
definition. However, to be more incisive about "fair use", it is
better to declare explicitly what is not "fair use". Otherwise, we
risk having to explain this in court. Like in this file header:

https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

# (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
# SPDX-License-Identifier: all rights reserved, but fair use allowed
# Fair use includes test, learning and marketing but not sales, redistribution
# leasing, renting or every other commercial/business activities without the
# consent of the author. Every company or individual allowed to use this
# code behind these limitations will be listed here below, if any.

I'm afraid this is not how fair use works. The whole point of fair use is
that the copyright holder has no control over uses that are fair use.
They can grant additional rights with a copyright license, but they cannot
stop legal fair use, no matter what they write in their license and no
matter what their personal opinions are about what would fall into fair
use.

I am sorry for having confused you while trying to explain a simple fact:

- fair use as legal term is a blurry one
- fair use cannot be limited, only expanded (as I did over there)
- fair use could include {testing, learning, storage} and usually it does

HOWEVER

- fair use cannot include {business, commercial, marketing} rights in
anyway and in any conditions

WHY?

Because the principle behind the existence of copyright is protecting
the authors' exclusivity over those {business, commercial, marketing}
rights.
Because copyleft is a copyright that trades exclusive rights for
freedom instead of money, the same certainly holds for copyleft.

CONCLUSION

We might have problems in identifying all the fair use cases but we
can be very certain about what is NOT fair use.

(in another e-mail about database/collection protection)

Best regards, R-


From Paul Wise@21:1/5 to Russ Allbery on Mon Feb 27 05:10:01 2023

On Fri, 2023-02-24 at 11:37 -0800, Russ Allbery wrote:

As a general principle, as a free software advocate, I approve of an
expansive definition of fair use and believe that far more uses of
copyrighted material should be fair use than are normally considered fair
use today. Expansive definitions of fair use are a key legal component to
enabling reverse engineering and compatible replacement of non-free
software with free software, for example.

I note that fair use isn't a worldwide concept and other parts of the
world have the more varied and restricted concept of "fair dealing".

https://en.wikipedia.org/wiki/Fair_use#Influence_internationally
https://en.wikipedia.org/wiki/Fair_dealing

So, as much as possible, we should try not to rely on fair use.

--
bye,
pabs

https://wiki.debian.org/PaulWise



From Scott Kitterman@21:1/5 to Roberto A. Foglietta on Mon Feb 27 05:10:01 2023

On February 27, 2023 12:45:38 AM UTC, "Roberto A. Foglietta" <roberto.foglietta@gmail.com> wrote:

On Sun, 26 Feb 2023 at 21:47, Russ Allbery <rra@debian.org> wrote:

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
but - to a collection of objects using the database protection,
automatically also solves the problem of a blurry "fair use"
definition. However, to be more incisive about "fair use", it is
better to declare explicitly what is not "fair use". Otherwise, we
risk having to explain this in court. Like in this file header:

https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

# (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
# SPDX-License-Identifier: all rights reserved, but fair use allowed
# Fair use includes test, learning and marketing but not sales, redistribution
# leasing, renting or every other commercial/business activities without the
# consent of the author. Every company or individual allowed to use this
# code behind these limitations will be listed here below, if any.

I'm afraid this is not how fair use works. The whole point of fair use is
that the copyright holder has no control over uses that are fair use.
They can grant additional rights with a copyright license, but they cannot
stop legal fair use, no matter what they write in their license and no
matter what their personal opinions are about what would fall into fair
use.

I am sorry for having confuse you trying to explain a simple fact:

- fair use as legal term is a blurry one
- fair use cannot be limited but expanded (as I did over there)
- fair use could include {testing, learning, storage} and usually it does

HOWEVER

- fair use cannot include {business, commercial, marketing} rights in
anyway and in any conditions

WHY?

Because the principle of the copyright existence is about protecting
the authors' exclusive of that {business, commercial, marketing}
rights.
Because copyleft is a copyright that trades exclusive rights for
freedom instead of money, this is certainly happening also for the
copyleft.

CONCLUSION

We might have problems in identifying all the fair use cases but we
can be very certain about what is NOT fair use.

(in another e-mail about database/collection protection)

This is not correct. Commercial fair use is quite common worldwide.
Please don't confuse what you wish the law says with what it actually
says. In any case, there are many different laws in many different
places, so one can't be very certain about any of this on a global basis.

Russ's main point, that what is or is not fair use is a function of law,
not license, is correct. By definition, fair use is about what copyright
cannot restrict.

Scott K


From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 07:20:01 2023

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

- fair use cannot include {business, commercial, marketing} rights in
anyway and in any conditions

This is definitely not true in the United States; there is a Supreme Court
decision saying the exact opposite. The ruling in Google v. Oracle said
Google's commercial and business use of Oracle's copyrighted APIs met the
test for fair use.

You can't reconstruct the law from first principles without looking at the
actual test that is applied by courts. (And as mentioned this may be
different in different jurisdictions, for additional complexity.) In the
US there's a four-part balancing test for fair use, and the analysis can
be quite complicated.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Mon Feb 27 09:00:01 2023

On Mon, 27 Feb 2023 at 08:38, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

[...]

No court ruling was ever emitted in favour of Google vs Oracle
leveraging fair use but it was an agreement between the two parties
supported by Microsoft.

https://arstechnica.com/tech-policy/2021/04/how-the-supreme-court-saved-the-software-industry-from-api-copyrights/

As you can learn from the Ars Technica's article linked here above.

FUNNY FACT

Microsoft convinced Oracle to settle the case with Google using
a trick (escamotage) based on the author's unilateral right to extend
the definition of fair use, which is the same thing I did for marketing
in a single file in one of my projects. This trick, written down and
deposited with the Supreme Court as an agreement between the two
parties, allows everyone to do what Google did with Oracle's API. Thus
Oracle surrendered not because Google leveraged fair use but
because of Microsoft's pressure.

Google: I can include the header, thus I can use the API

Oracle: no you cannot.

Microsoft: Oracle DB cannot run without an operating system, do you agree?

me: ROTFL

Best regards, R-


From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 08:50:01 2023

On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

- fair use cannot include {business, commercial, marketing} rights in
anyway and in any conditions

This is definitely not true in the United States; there is a Supreme Court
decision saying the exact opposite. The ruling in Google v. Oracle said
Google's commercial and business use of Oracle's copyrighted APIs met the
test for fair use.

It is true despite a single US case judgment. In the USA there is the
habit of using precedents to influence other judgments, but this
applies only if there is a significant analogy between the two cases,
and even if there is a very significant analogy the judge can produce
an opposite ruling for that case.

Google lost two trials against Oracle, apparently because the judges
were not able to distinguish between the API and the code that runs
behind the API. Thus they went to the Supreme Court. This happened
despite the fact that an API is unlikely to be a work that can be
protected by copyright, except in very rare cases. Microsoft filed a
memorandum with the Supreme Court asking it to rule in favour of Google.

Both these facts are in favour of the opinion that I have explained:
1. API might or might not be protected, 2. copyright applies in case
of doubts.

Microsoft used its influence to persuade Oracle in favor of Google.
Thus the parties agreed that Google - when creating libraries with
Oracle's API - made fair use of its declarations. After all, if you
cannot include the headers then you cannot call the original
functions either. The fair use in that agreement was a trick
(escamotage) to prevent the Supreme Court from issuing a ruling that
would have been a disaster in any case AND to let both parties save
face.

CONCLUDING

No court ruling was ever emitted in favour of Google vs Oracle
leveraging fair use but it was an agreement between the two parties
supported by Microsoft.

https://arstechnica.com/tech-policy/2021/04/how-the-supreme-court-saved-the-software-industry-from-api-copyrights/

As you can learn from the Ars Technica's article linked here above.

You can't reconstruct the law from first principles without looking at the
actual test that is applied by courts. (And as mentioned this may be
different in different jurisdictions, for additional complexity.)

I can reconstruct the interpretation of a law from basic principles
otherwise it would not be a law but something that appeared from
nothing: no any law roots, no any law authority. Like every tree, a
law is stronger when it has ancient and well developed roots. Thus, a
law interpretation based on reconstructing it from its principle is
the most significant, the most important and the most persuasive way
of doing such a task.

In the
US there's a four-part balancing test for fair use, and the analysis can
be quite complicated.

The U.S. law interpretation is not the source of the truth. Moreover,
it does not matter how fair use is defined in many different
legislations around the world. By copyright principle, it cannot allow
doing activities like {business, commercial, marketing} without the
consent of the author or of the license. The "fair use" is a false
friend and ignoring it is the best choice.

CONCLUSION

If the question "what is X?" does not work well, then try the opposite,
"what is not X?" - It is not important to define "fair use" as long as
we can define with certainty what the blurry fair use definition does
NOT cover. After all, we have been interested since the beginning in
"what is not fair use", thus asking the right question is half of the
work done.

@Russ: please write to me in private if you need more clarification.
At this point anything further has very little to do with the
community needs.

Best regards, R-


From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 09:00:01 2023

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

A totally automatic procedure like web crawling and web indexing
re-enter in your example, perfectly. However, the input collection that
a ML/AI training system needs is a protectable work because the data
should be structured, selected and properly labeled even if these
activities are done with rules like it happens using SQL for
databases.

Yes, I agree, I think that a trained AI model is a protectable work.
However, it is not protectable *by you* unless you're the one who wrote
the model and chose its training.

Therefore, putting a clause in your copyright license saying that if your
work is incorporated into an AI model, that AI model as a collection is
covered by some particular license is not really a thing you can do. The
best you can do is the standard GPL thing of saying that you don't have to
license your collection under any particular license, but if you don't,
you don't have any right to include this specific work. Maybe that's what
you were getting at, and I just didn't understand.

That second approach of course only works if the use of the GPL-covered
work is not fair use. If it is fair use, then the person creating the
collection can ignore any provision of the license, so we're back to the
question of whether AI training is fair use.

So, web indexing and statistics are created over a input collections
that are *not* a creative works and these tools access to every
copyrighted works in fair use as long as they respect the robots:no
meta-tag when it is applied to a copyrighted work. Instead, training a
ML/AI is a completely another story and their input collections are a
protectable collection under the copyright law.

I don't think it's anywhere near that easy to distinguish a web search
index from an AI training model in copyright law. They seem like very
similar cases to me. A great deal of creativity and human control go into
selecting how pages are chosen for search indices (otherwise, every search
engine would be unusable due to search optimization spam), and search
engines even retain and redistribute portions of the documents they index.

My guess is that *both* of these are protectable collections. And the
entire Internet currently assumes that building a search engine is fair
use of the Internet-accessible indexed documents, even if that search
engine is then used and marketed for commercial and business purposes, as
Google, Bing, etc. all are.

If you believe that AI training is *not* fair use, I think you're going to
have to wrestle with the substantial similarities between AI training and
the Google search engine. I think it may prove challenging to write an
analysis that says AI training is not fair use, but Google's search
indexing is fair use. Or, I guess, argue that Google's search indexing is
also not fair use but falls into some other exception to copyright law
like an implicit license, but there I'm *way* out of the depth of my legal
understanding.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 09:40:01 2023

On Mon, 27 Feb 2023 at 08:50, Russ Allbery <rra@debian.org> wrote:

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

A totally automatic procedure like web crawling and web indexing
re-enter in your example, perfectly. However, the input collection that
a ML/AI training system needs is a protectable work because the data
should be structured, selected and properly labeled even if these
activities are done with rules like it happens using SQL for
databases.

Yes, I agree, I think that a trained AI model is a protectable work.
However, it is not protectable *by you* unless you're the one who wrote
the model and chose its training.

Therefore, putting a clause in your copyright license saying that if your
work is incorporated into an AI model, that AI model as a collection is
covered by some particular license is not really a thing you can do. The
best you can do is the standard GPL thing of saying that you don't have to
license your collection under any particular license, but if you don't,
you don't have any right to include this specific work. Maybe that's what
you were getting at, and I just didn't understand.

Dear Russ, I was completely wrong about your ability to contribute to
this discussion, because the chance you gave me to refute your thesis
is the best opportunity to pave the way for the lawyers who will one
day enforce the A/L/GPLv4 in court. So, let me explain it in a very
simple and straightforward way:

- A/L/GPLv3 applies to source code and scripts that should be compiled
or run by an interpreter

- the AI/ML training engines use source code and scripts as data; this
might or might not be fair use, but it is certainly a novelty which is
not covered by A/L/GPLv3

- then I decided to protect my projects repositories as database
(collection) in addition to the standard way to protect the code with
a well-known license

- because of the copyright law about databases, if someone creates a
larger database that contains my database or a part of it, then they
have to comply with the license that I choose to protect my project as
a database.

You see, it is a very simple and straightforward concept. The only two
ways out of this are 1. to abolish database copyright law, or 2. to
make a law under which the training input collection cannot be covered
by copyright law. In both cases every employee could take home a copy
of a database or a copy of AI training inputs and share it with the
rest of the world. Moreover, 1. includes 2., while 2. would seriously
undermine database copyright law, because every database could be a
training set for an AI/ML engine.

Russ, do you agree? :-)

Best regards, R-


From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 18:50:02 2023

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

This is definitely not true in the United States; there is a Supreme
Court decision saying the exact opposite. The ruling in Google
v. Oracle said Google's commercial and business use of Oracle's
copyrighted APIs met the test for fair use.

It is true despite a single US case judgment.

It's not a single US court judgment. The standard for fair use in the
United States was created by a series of Supreme Court judgments starting
with Folsom v. Marsh in 1841 and enshrined in US national law in 17
U.S.C. § 107 in 1976:

Notwithstanding the provisions of sections 106 and 106A, the fair use
of a copyrighted work, including such use by reproduction in copies or
phonorecords or by any other means specified by that section, for
purposes such as criticism, comment, news reporting, teaching
(including multiple copies for classroom use), scholarship, or
research, is not an infringement of copyright. In determining whether
the use made of a work in any particular case is a fair use the
factors to be considered shall include—

(1) the purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to
the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of
the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of
fair use if such finding is made upon consideration of all the above
factors.

You can find this history numerous places on-line, for example:

https://law.marquette.edu/facultyblog/2022/10/the-surprisingly-confused-history-of-fair-use-is-it-a-limit-or-a-defense-or-both/

Many fair use cases in US history have been about commercial use.
Probably most, since companies with commercial uses are more likely to go
through the trouble of lawsuits. Commercial fair use is routine within
the classic examples of fair use, such as parody and quoting for
commentary.

This is the law in the United States. The law in other countries of
course may be quite different. But given that many of the actors who are
relevant to a discussion of large AI models at present have a significant
locus in the United States, US law is going to play a large role.

No court ruling was ever emitted in favour of Google vs Oracle
leveraging fair use but it was an agreement between the two parties
supported by Microsoft.

This is not a correct summary of the outcome of Google v. Oracle, nor is it
what the Ars Technica article you linked said. There was no agreement
between the parties in the question before the Supreme Court. The case
went to judgment and the Supreme Court ruled in favor of Google on fair
use grounds, mooting (and not ruling on) the question of copyrightability
of the API definitions.

Appeals like this in the US are generally over a specific question of law
and do not settle the *entire case*, so the Supreme Court then remanded
the case to trial court to dispose of the rest of the lawsuit. I didn't
follow it after that because the details following the Supreme Court
decision are generally uninteresting since they're probably forced by the
decision. It's quite possible that the parties mutually agreed to dismiss
the case after that decision because the decision meant Google was certain
to win. But the Supreme Court decision was not an agreement between
parties.

This is important because in US law if the parties had reached an
agreement before the decision, the case would generally be dismissed and
thus not receive a court judgment and therefore not create precedent.
Google v. Oracle did not settle; it was decided by the Supreme Court and therefore did create binding precedent for further district court
decisions on similar cases.

I can reconstruct the interpretation of a law from basic principles
otherwise it would not be a law but something that appeared from
nothing: no any law roots, no any law authority.

If this is your approach to legal analysis, I think I will stop here,
since any further discussion along these lines is going to be pointless.

Moreover, it does not matter how fair use is defined in many different legislations around the world. By copyright principle, it cannot allow
doing activities like {business, commercial, marketing} without the
consent of the author or of the license.

This is simply not true, and it is very good for free software that this
is not true. One is still allowed to do reverse engineering and API
replacement under fair use even if one is doing it for business and
commercial purposes, and lots of free software development is done for
business and commercial purposes.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 19:10:01 2023

"Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

- then I decided to protect my projects repositories as database
(collection) in addition to the standard way to protect the code with
a well-known license

- because of the copyright law about databases, if someone creates a
larger database that contains my database or a part of it, then they
have to comply with the license that I choose to protect my project as
a database.

In the United States, this is only true if (a) the collection is
copyrightable (let's presume that's true in this case), and (b) their use
of your collection is not fair use. If their use of your collection is
fair use, then they do not have to comply with your license.

In other countries, I have no idea. Presumably there is a similar set of
rules under the same or different terms to allow such things as parodies,
but the boundaries may be different and I know very little about how those rules have been applied to software outside of the US. My understanding
is the Berne Convention doesn't standardize the rules around fair use
(under whatever name), so this can differ a lot by jurisdiction.

You see, it is a very simple and straightforward concept. The only two
ways to get off this are 1. make unlawful the database copyright law,
2. make a law for which the training input collection is not coverable
by the copyright law. In both cases every employer can bring to their
home a copy of a database or a copy of AI training inputs and share it
with all the rest of the world. Moreover, the 1. includes the 2 while
the 2. would seriously undermine the database copyright law because
every database could be a training set for an AI/ML engine.

Russ, do you agree? :-)

No. It's entirely possible that using databases as training sets for an
AI/ML engine is fair use under existing United States law and precedent as
long as that use is sufficiently transformative (the first factor of the
test, and I suspect the most important one here). The obvious example is
a search engine, which performs a similar transformation of clearly
copyrighted works into a new service with a different purpose, without the explicit permission of the copyright holders.

This is the reason why people have focused so much on GitHub Copilot's willingness to insert large blocks of code from other projects verbatim. Reproducing code from other projects is less transformative and looks more
like simple copying, and therefore opens GitHub to a legal argument that
their AI model is not sufficiently transformative to be fair use.

--
Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>


From Roberto A. Foglietta@21:1/5 to Russ Allbery on Tue Feb 28 00:40:01 2023

On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:

No. It's entirely possible that using databases as training sets for an AI/ML engine is fair use under existing United States law and precedent as long as that use is sufficiently transformative (the first factor of the test, and I suspect the most important one here).

Considering what you reported in the previous e-mail about US national
law in 17 U.S.C. § 107 (1976), it is not possible to use an entire or
a significant portion of a database for {business, commercial,
marketing} purposes without the copyright holder's consent.

Whoever says the contrary forgets that fair use was introduced to
allow non-profit activities that have a social value, plus a few
for-profit activities (like journalism) that have a social role, and
even then only a very limited portion of the copyrighted work may be
used. A very simple and straightforward example is a newspaper article
that cites a couple of paragraphs from a book or some statistical data
from a private database. There is no chance that the incorporation of an
entire database (or a significant part of it) would count as fair
use for {business, commercial, marketing} purposes; otherwise the
principle of copyright would be gone.

I strongly feel that this discussion cannot continue, because
presenting a mass of legal material without comprehending the
principles of the law leads nowhere except to a show, like some US
trials are. Principles cannot be bent by misinterpretation, misjudgement,
or ill-written law such as 17 U.S.C. § 107 (1976), in
which points (1)...(4) are written in such a way that anyone who is
not well acquainted with the principles could misunderstand them to the
point of absurdity.

This (1) does not mean that non-profit and for-profit activities are
equal in enjoying fair use

(1) the purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes;

but it means the opposite, that the two activities can fair-use a
completely different amount of the copyrighted work

(3) the amount and substantiality of the portion used in relation to
the copyrighted work as a whole

and in particular (3) also means that if I write an article of a
few words, quoting 2 paragraphs of a book in it is not fair use.

One more thing: it does not matter how many trials the two parties went
through; what matters, in principle, is the agreement they reached at the
end, because a significant judgement is a definitive one. Otherwise it
was not significant enough even to close that specific case.

The obvious example is
a search engine, which performs a similar transformation of clearly copyrighted works into a new service with a different purpose, without the explicit permission of the copyright holders.

This is a completely different story, for two reasons:

1. indexing by keywords - the website manager tagged that keyword, so
the content has not been accessed
2. web crawling is an automatic process that identifies keywords and
associates them with the URL

This process has nothing to do with the content, unless you would
claim that the word "cataclysm" cannot be used because it belongs to
a certain copyrighted book; moreover, this process is completely
automated and involves no human creativity. Moreover,
indexing and web crawling are totally different processes that lead to
totally different results and aims than those related to AI
training. Forget about drawing an analogy between AI training and
Google's business, because they are completely different things.

This is the reason why people have focused so much on GitHub Copilot's willingness to insert large blocks of code from other projects verbatim. Reproducing code from other projects is less transformative and looks more like simple copying, and therefore opens GitHub to a legal argument that their AI model is not sufficiently transformative to be fair use.

Being transformative is not the key, and incorporating large pieces of
code is not the key. This is the tip of the iceberg, through which people
realised that their code has been used. The iceberg to handle is the
learning process before it happens, which is about the input collection.
Here we are: the input collection of an AI/ML training system is what we
want to keep free. Why do we want to keep the input collection free?
Because, as in compilation, we then also have the entire model in
freedom. This is in exchange for the right to use our code as input data.

I am pretty sure that those complaining about GitHub Copilot are not
upset because the AI is not transformative enough to masquerade their
code!

Best regards, R-


From Paul Wise@21:1/5 to Roberto A. Foglietta on Tue Feb 28 06:40:01 2023

On Mon, 2023-02-27 at 01:45 +0100, Roberto A. Foglietta wrote:

Because the principle of the copyright existence is about protecting
the authors' exclusive of that {business, commercial, marketing}
rights.

The purpose of copyright is allegedly (in the USA) "To promote the
Progress of Science and useful Arts, by securing for limited Times to
Authors and Inventors the exclusive Right to their respective Writings
and Discoveries.". The author's rights are *secondary*, which is why
fair use exists. Of course these days copyright isn't very time-limited,
and in the age of DMCA/DRM, video mashups, fanfic and supporter-funded
creators, copyright just ends up limiting progress in many fields,
another reason why fair use is important. Making fair use only
available for non-commercial uses would almost destroy it.

--
bye,
pabs

https://wiki.debian.org/PaulWise


From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Tue Feb 28 09:10:01 2023

On Tue, 28 Feb 2023 at 08:33, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

One more thing about this following:

2. if an author does not exercise a right for a long period of time
enforcing it then that right is lost for the principle of "usucapio"
in latin

The "usucapio" principle might not be easily recognised by some
common-law legal systems, but it certainly is in every Latin-law legal
system. However, I am quite optimistic that common-law legal systems
can recognise that principle, because common-law legal systems are also
indirectly based on Latin-law foundations by historical ramification.

Best regards, R-


From Roberto A. Foglietta@21:1/5 to Paul Wise on Tue Feb 28 09:10:01 2023

On Tue, 28 Feb 2023 at 06:31, Paul Wise <pabs@debian.org> wrote:

On Mon, 2023-02-27 at 01:45 +0100, Roberto A. Foglietta wrote:

Because the principle of the copyright existence is about protecting
the authors' exclusive of that {business, commercial, marketing}
rights.

The purpose of copyright is allegedly (in the USA) "To promote the
Progress of Science and useful Arts, by securing for limited Times to
Authors and Inventors the exclusive Right to their respective Writings
and Discoveries.".

A nice political manifesto. However historically it was

- a nice idea someone suggested to their queen
- a way to control and exercise the censorship
- a regalia for someone: salt, pepper, copyright

https://en.wikipedia.org/wiki/History_of_copyright (*)

Across almost all of its different implementations, copyright had one
thing in common:

- protecting the authors' exclusive right to restrict anyone from using
their work without the author's consent

Translated into modern words: the authors' exclusive {business,
commercial, marketing} rights over their work. The idea of fair use
came much later, to boost social benefits rather than to make authors
rich, but one thing was innovative in copyright from the beginning:

- a monopoly established in favour of a class of "many individuals"

I used "many individuals" here because "people" came from Latin
"plebis" and using "people" in this context would be neither
etymologically nor historically correct.

Moreover, because a monopoly established in favour of individuals is a
"right", like the right to own something real - this
explains the roots of the term "intellectual property" - and this
right is established by a law, two things follow:

1. if the law is not respected by almost all the people, then it is not
enforceable in a trial, on the principle that it has not been accepted by
those it is supposed to rule over; enforcing it against one person in
particular, while almost all others are left free not to abide by that
law, is a very nasty discrimination.

2. if an author does not exercise a right by enforcing it for a long
period of time, then that right is lost under the principle of "usucapio"
in Latin

This explains why someone - once informed that, because they
did not enforce a part of their right for a very long period of time,
that part of their right has been lost - tried to reclaim
those parts of their right as well. Unfortunately, when the {speaking,
intention, actions} are not properly aligned, a late attempt is much
worse than accepting the status quo. The road to hell is paved with
good intentions and poor implementations.

CONCLUSION

I am very happy to know that US copyright law has a very nice
political manifesto, but the way in which a law is applied defines its
principle. The application defines the law's aim or, in other
words, the good that it is going to protect and promote. [sarcasm on]
After all, the U.S. is very well known for free schooling and
universities, right? [sarcasm off].

NOTE

(*) there is much more interesting and complete material about the
history of copyright, especially in the London public library.

Best regards, R-


From Sam Hartman@21:1/5 to All on Tue Feb 28 15:50:01 2023

"Roberto" == Roberto A Foglietta <roberto.foglietta@gmail.com> writes:

Roberto> On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:
>>
>> No. It's entirely possible that using databases as training sets
>> for an AI/ML engine is fair use under existing United States law
>> and precedent as long as that use is sufficiently transformative
>> (the first factor of the test, and I suspect the most important
>> one here).

Roberto> Considering what you reported in the previous e-mail about
Roberto> US national law in 17 U.S.C. § 107 in 1976, It is not
Roberto> possible to use an entire or a significant portion of a
Roberto> database for {business, commercial, marketing} purposes
Roberto> without the copyright holder.

Please stop!
It's clear that you are not building support for your argument.
You've made your case to the best of your ability and not been
convincing.

But beyond that, this discussion is no longer on topic for
debian-project.
Debian cannot decide what the law is.
We've established that this situation is complicated.
You've proposed various things that someone could do to limit the use of
free software in AI training sets.
Other people have pointed out that this may or may not work.
You think it will.
You haven't managed to convince your critics.
We won't know until this gets hashed out in courts.

That's about the level of detail appropriate for debian-project.
Further discussion of this issue at this time on this list does not
serve the community.

--Sam


From Roberto A. Foglietta@21:1/5 to Sam Hartman on Tue Feb 28 19:40:01 2023

On Tue, 28 Feb 2023 at 15:36, Sam Hartman <hartmans@suchdamage.org> wrote:

"Roberto" == Roberto A Foglietta <roberto.foglietta@gmail.com> writes:

Roberto> On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:
>>
>> No. It's entirely possible that using databases as training sets
>> for an AI/ML engine is fair use under existing United States law
>> and precedent as long as that use is sufficiently transformative
>> (the first factor of the test, and I suspect the most important
>> one here).

Roberto> Considering what you reported in the previous e-mail about
Roberto> US national law in 17 U.S.C. § 107 in 1976, It is not
Roberto> possible to use an entire or a significant portion of a
Roberto> database for {business, commercial, marketing} purposes
Roberto> without the copyright holder.

Please stop!
It's clear that you are not building support for your argument.
You've made your case to the best of your ability and not been
convincing.

But beyond that, this discussion is no longer on topic for
debian-project.
Debian cannot decide what the law is.
We've established that this situation is complicated.
You've proposed various things that someone could do to limit the use of
free software in AI training sets.
Other people have pointed out that may or may not work.
You think it will.
You haven't managed to convince your critics.
We won't know until this gets hashed out in courts.

That's about the level of detail appropriate for debian-project.
Further discussion of this issue at this time on this list does not
serve the community.

Ok, then. No problem. This will be my last message on this topic.

However, my last suggestion here is to collect this material and share
it with the FSF and FSF Europe. My aim was not to convince people
(to gain consensus) but to give technical details relevant to those who
have a legal education but usually lack the ability to properly
understand technical IT mechanisms in detail. Only a few have the
ability to master both sides. It is not about complexity [1], it is
about complication [2], and the complication arises because IT people
and law people have two completely different mindsets and risk/value
perceptions, and follow different rules to address them.

We won't know until this gets hashed out in courts.

About upgrading A/L/GPLv3 to A/L/GPLv4, it seems to me quite an
urgent thing to do, but a challenge to it in court might happen only
years from now. So there is a lot of time for preparation.

About "usucapio" and related questions, there is a very small
probability that someone will ever bring anyone to court, and in that
case a very small patch in the kernel will make a huge difference in
finding an agreement, the shape of which is already well known. The patch
was shared with some kernel maintainers some months ago, and it is not
pending to be applied because I did not complete all the steps
required. That patch implies both license and technical changes in
perspective.

Debian cannot decide what the law is.

Law is somewhat different in different countries, so start from those
countries in which you have a better chance. There are many of them.
Do not try to win the world in a single step but play chess instead.
The king is the last piece to take, not the first one. If you feel in
danger, secure your position in all the countries in which it is
feasible and cheap enough. Bringing in allies is the first thing to
do. Moreover, allies can be cheap for Debian to acquire and very
costly for your counterparty to bring over to their side.

Anyone who has some urgency about doing business can employ me,
and I will set up a near-complete solution for them that I did not
explain to everyone - oh, it is a risky business, then. No, it is
about thinking outside the box and replicating the same scheme that
worked in the past in other similar cases. And yes, this would greatly
help the Debian community as well, because it will break down every
illusion about finding another way to go.

Their resistance is futile (cit.) but enjoyable. :-)

Good luck, R-

[1] complexity (n.) "composite nature, quality or state of being
composed of interconnected parts," from complex. Meaning "intricacy".

[2] complication (n.) late Middle English: from late Latin
complicatio(n-), from Latin complicare ‘fold together’ - what can be
folded can be unfolded (explained).


From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Tue Feb 28 20:00:01 2023

On Tue, 28 Feb 2023 at 19:23, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

Everyone that has a kind of urgency about doing business can employ me
and I will set up a near-complete solution for them that I did not
explain to everyone

The "near-complete" does not mean that it is a work in progress. It
means that it fully covers every significant business case I know.
Those that are not covered are not significant. As for those I do not
know, I cannot say anything, not even whether they exist.

Best regards, R-


From Richard Stallman@21:1/5 to All on Thu Mar 2 06:00:02 2023

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> About upgrading A/L/GPLv3 in A/L/GPLv4, it seems to me quite an
> urgent thing to do but challenging it in a court might happen years
> from now. So there is a lot of time for preparation.

Making a new version of the GPL is a big effort, and I'm the one who
has to lead it. I have not been able to follow this discussion; it
was long and complicated. If it described a reason to change the GPL,
I could not see it.

Would you please tell me the problem that you think the GPL needs to
be changed for?

--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)


From Roberto A. Foglietta@21:1/5 to Richard Stallman on Thu Mar 2 11:40:01 2023

On Thu, 2 Mar 2023 at 05:31, Richard Stallman <rms@gnu.org> wrote:

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> About upgrading A/L/GPLv3 in A/L/GPLv4, it seems to me quite an
> urgent thing to do but challenging it in a court might happen years
> from now. So there is a lot of time for preparation.

Making a new version of the GPL is a big effort, and I'm the one who
has to lead it. I have not been able to follow this discussion; it
was long and complicated. If it described a reason to change the GPL,
I could not see it.

Would you please tell me the problem that you think the GPL needs to
be changed for?

Microsoft's GitHub Copilot has been shown to reproduce large blocks of
code without citing the author/project or indicating the license terms
of that code. This is only the tip of the iceberg, because the problem is
much worse than this and it will worsen quickly. The debate is about fair
use, but that is a blurry definition, and defining what is not "fair use"
does not seem to have gained enough consensus. Thus a general and standard
solution is required, IMHO.

- A/L/GPLv3 applies to source code and scripts that should be compiled
or run by an interpreter (not only but just to be specific)

- the AI/ML training engines use source code and scripts as data, this
might or might not be a fair use, but for sure is a novelty which is
not covered by A/L/GPLv3

- then I decided to protect my project repositories as a database
(collection) in addition to the standard way to protect the code with
a well-known license

- because of the copyright law about databases, if someone creates a
larger database that contains my database or a part of it, then they
have to comply with the license that I choose to protect my project as
a database.

At this point it is necessary to report how the upgrade of these licenses
has been proposed, but first a brief summary of fair use:

- fair use as a legal term is a blurry one

- fair use cannot be limited, only expanded, by the authors/licenses

- fair use should include {testing, learning, storage} and usually it does

- fair use cannot include {business, commercial, marketing} rights in
any way or under any conditions; it can relax these rights only a little
bit, and only for those activities/professions that have a clear social
role/value.

To better understand this point of view, I suggest digging into the
history of copyright. The London public library has a lot of material
about it considering that the UK was one of the first countries to
develop the law further than a mere top-down dictate.

THE PROPOSAL

A/L/GPLv4 is an update that will clearly state that the license
applies to the composition and that the {business, commercial, marketing}
rights are reserved and exchanged for freedom. Then the license
presents an open definition of "fair use" in which some rights {testing,
learning} are clearly included. Everything else should be brought back
into these two categories. Finally, the license should state that every
collection item that does not have its own specific copyright and
license note/header is licensed under A/GPLv3.

So, in the simplest case, in which no file carries a specific
copyright note/header and only the repository does, this happens:

- git repository A is licensed with A/GPLv4
- the composition is under A/GPLv4
- every file is under A/GPLv3

Thus this equation takes place:

copyright : money ~alike~ copyleft : freedom

and the definition of "fair use" is not intended to "change the law"
but to give a standard interpretation of a blurry definition that
exists in all legislations but is perceived and written differently
in each. Because the A/L/GPLv4 will have a global scope, its
"fair use" definition/clarification will help many countries to align
to a standard definition and interpretation. We cannot change the law,
but we can help those who do that job to converge toward a standard and
reasonable definition.

Moreover, I suggest recalling in the license that without the moral
rights {authorship} the copyright itself has no meaning, and thus all
related rights are void. This is just a reminder for those companies that
are used to removing the names of authors from their source code headers
in such a way that nobody, not even an internal inspection, can find them
and verify that all the rights have been properly and legally
transferred. Again, this would not change the world, but it would make
developers aware of their rights. Education is as important as influence
and as rulings in court, especially in open-source software-libre.

Everything above, IMHO and in the hope that it helps.

Best regards,
--
Roberto A. Foglietta
+49.176.274.75.661
+39.349.33.30.697


From Bradley M. Kuhn@21:1/5 to All on Fri Mar 3 05:30:01 2023

Hey, everyone, as many of you probably know, I've been involved with many of
the GPL and AGPL enforcement efforts that are (publicly) known to have
happened in the USA since 1999, and have also been involved with the drafting
process of various copyleft licenses. I am currently continuing that ongoing
work along with my colleagues at Software Freedom Conservancy (SFC).

From that context and point of view, there are three main points I want to contribute to this discussion:

Point 0:

Always keep in the forefront of your mind that the complexity of legal issues
and enforcement of licenses lags technology by a period measured in decades.
For example, many folks have referenced the Google v. Oracle SCOTUS case,
which dealt with questions of software licensing and copyright that we were
discussing in the copyleft community as far back as the early 1980s. Yet,
the case didn't come before SCOTUS for consideration until a few years ago,
and (on top of that) SCOTUS' decision was complex and didn't really resolve
some of the fundamental questions that we all have about how software
licensing works. Most of the key issues (such as “where is the bright line
for when it becomes copyright infringement if you reimplement a known,
documented API”?) that we in FOSS were worried about, while they *came up* in
Google v. Oracle, still remain open legal questions in the USA.

Point 1:

FOSS licensing doesn't rely solely on copyright law. Yes, grant of a
copyright license is the fundamental part of all FOSS licenses, but they are
also contractual agreements. (For folks unfamiliar with this point, I
encourage you to read the material we published at SFC when we filed our
contract case against Vizio <https://sfconservancy.org/vizio/>.) So, when
thinking about these questions, an exclusive focus on copyright questions
might not be particularly helpful.

Furthermore, copyright law isn't a moral code: it's just an extremely flawed
legal system that we're forced to deal with because various regimes decided
back in the 1970s that software would be governed by copyright. What
copyright law says or doesn't say in any particular jurisdiction never
provides us any moral compass to what is wrong or right for software freedom.
We must approach *that* question “a priori” (and as philosophers) because all
the “a posteriori” explorations of the question in the real world are just too
heavily biased by the incumbent capitalist structures that serve and/or
benefit from the proprietarization of software.

On that point, I do invite everyone over to the mailing list we're hosting at SFC to discuss the morality and ethical implications in FOSS of machine-learning-assisted software development. You can read more about
this, and subscribe to the mailing list, via: <https://sfconservancy.org/news/2022/feb/23/committee-ai-assisted-software-github-copilot/>

Point 2:

There are a number of mistakes FOSS activists have made historically in copyleft licensing creation and drafting. Having been involved myself in the invention and drafting of AGPLv3, and a somewhat-involved witness to the
GPLv3 drafting process, I learned the hard way that trying to address every “issue of the day” quickly in a copyleft license draft leads to problems.

A big example appears in the patent provisions found in A/GPLv3§11¶3-6. They are complicated, unnecessarily wordy, and as full of loopholes as the worst
tax legislation. Admittedly, the primary problem there may be that the drafting process was over-influenced by large patent holders. However, the reason such influence was successful was because of a fervor of concern among FOSS activists about seemingly-urgent patent issues of the day. In
hindsight, those issues were either moot, or turned out even *worse* than we imagined, and therefore poorly addressed by this section anyway.

To be abundantly clear so I'm not misunderstood: I'm analyzing these
issues in hindsight to help inform our current issues of the day. Lots of
really experienced and smart policy people contemporaneously believed
(probably reasonably) that A/GPLv3§11 was the be-all-end-all of patent language for copyleft. But the behavior and legislation both changed in the intervening years, *and* some seemingly huge problems of those days also seem minuscule in the rear view mirror a decade later, and problems that we
thought were solved or could be solved stubbornly got worse.

Most importantly to this point, over the decade after GPLv3's release, lots
of corporate attorneys pushed heavily anti-GPLv3 agendas — claiming that the patent language was the problem. In fact, after years of work responding to those (as it turned out, specious) criticisms, we later learned that the
patent language was just a convenient place to hang their hats in their
broader anti-GPLv3 campaign. So, IMO, we (as a FOSS community) got basically *no* policy gains on patent issues in GPLv3 that we didn't already have in GPLv2, *but* we handed the opposition a bunch of text for them to paint as “big scary reasons” to avoid GPLv3. That's a huge factor in how we ended up
in the complex GPLv2-only / GPLv3-or-later divide in copyleft circles that we have today.

IMO, this seemingly unrelated example really shows three key issues highly relevant to the issue of machine-learning and FOSS:

(a) it's very easy as a FOSS license drafter to be caught up in the issues
of the day and overcompensate by writing more text into the license
thinking it's great policy but then it backfires for
political/social/enforceability/advocacy reasons,

(b) the echo chambers and deference to incumbent authority that have
historically dominated FOSS license drafting really have been
problematic and we've not fully explored how to solve that for future
drafting, and

(c) because copyleft is such an amazing invention, we (as a FOSS community)
have a tendency to see everywhere nails that we think the hammer of
copyleft can hit — even when they may well be screws, not nails.

On (c), I point to my current-favorite license, AGPLv3, which I admittedly helped design and draft. Ultimately, AGPLv3 didn't do nearly as much as we'd hoped to solve software rights for network-deployed software, precisely
because the software freedom and rights issues that come up in such software *can't* fully be addressed merely by a copyleft provision. We erred because
we didn't see the obvious: a good copyleft license is a *necessary* but not a *sufficient* condition to assure users' software rights and freedoms.

Furthermore, we didn't carefully consider when building the Affero clause how much it could be abused in proprietary licensing schemes by companies like Neo4j, MongoDB, and others. Specifically, only years later did the community (thanks to Richard Fontana) figure out that a copyleft equality clause was absolutely mandatory to offset the problem; more on this at <https://sfconservancy.org/blog/2020/jan/06/copyleft-equality/>. Proprietary relicensing is more-or-less a relatively simple problem to describe and
study, yet it took us about 30 years to come up with a copyleft clause that
can actually address the problem elegantly and in an enforceable way.

As such, based on all this that I've learned in copyleft drafting, I advise *extreme caution* about rushing to copyleft as an obvious solution to the disturbing things happening with machine learning applications. There may
well be ways copyleft can be used to fight back against the horrible things that OpenAI, Microsoft's GitHub, and dozens of other for-profit companies are doing with machine learning. However, I'm quite sure that whatever ways we think copyleft can (or can't) be modified/improved/changed/applied to help
may well turn out to be the wrong decision if we rush.

The most important thing we can do now is advocacy: first and foremost, we
need to raise awareness about why this technology is bad for users and
impedes their software freedom and rights. There are natural allies around, from folks in the visual arts to those who have correctly pointed out that machine learning systems trained on existing data usually propagate the
biases inherent in past decisions and work. Time spent coalition building
will serve us better than more navel-gazing at copyleft terms on this front.

Ultimately, if there *is* a legalistic/licensing solution implementable in copyleft, the right one won't become apparent until the dangers and problems are fully understood by society. Similar to the advent of copyleft itself as
a strategy: proprietary software had to actually become a thing and a problem before we could figure out how to answer it with copyleft.

Inventing new copyleft terms shouldn't be the first place we run to when
facing a threat to software rights or freedom; it should be a solution used only sparingly when we're sure no other solution (including, most
importantly, enforcing the copyleft terms that we *have* already) will work
to address the problem.

-- bkuhn

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Stallman@21:1/5 to All on Sat Mar 4 05:50:01 2023

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> - then I decided to [restrict] my projects repositories as database
> (collection) in addition to the standard way to [restrict] the code with
> a well-known license

I absolutely reject using the word "protect" to describe what copyright does.

> - fair use cannot include {business, commercial, marketing} rights in
> any way

My understanding is that it sometimes does permit commercial use of
material, but mostly it does not. Fair use depends on the purpose of
the use. If the work is published commercially for education, for
instance, it might be fair use.

> A/L/GPLv4 is an update in which it will clearly state that the license
> applies to the composition and the {business, commercial, marketing}
> rights are reserved and exchanged for freedom.

I cannot concretely understand "the XYZ rights are reserved and
exchanged for freedom." Are you proposing a substantive change in
what people can do with a GPL-covered work, or an implementation
change intended to result in roughly the same permissions as now?

> Then the license
> presents a "fair use" open definition in which some rights {testing,
> learning} are clearly included. Everything else should be brought back
> in these two categories.

I don't understand "brought back in these two categories".

> So, in the most simple case in which no any file report a specific
> copyright note/header but just the repository, then this happens:

> - git repository A is licensed with A/GPLv4
> - the composition is under A/GPLv4
> - every file is under A/GPLv3

I think that is true already.

--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Roberto A. Foglietta@21:1/5 to Richard Stallman on Sat Mar 4 09:10:01 2023

On Sat, 4 Mar 2023 at 05:16, Richard Stallman <rms@gnu.org> wrote:

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

Dear Richard,

I do not know personally "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
but I appreciate very much his answer in which he set several points

https://lists.debian.org/debian-project/2023/03/msg00004.html

Please, focus on his answer instead of mine. As I wrote to you in
private, I have nothing to add on this subject anymore.

Collaboration is the key to success.

Best regards, R-

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Stallman@21:1/5 to All on Wed Mar 15 05:20:01 2023

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> I do not know personally "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
> but I appreciate very much his answer in which he set several points

> https://lists.debian.org/debian-project/2023/03/msg00004.html

I will take a look. Thanks.

--
Dr Richard Stallman (https://stallman.org)
Chief GNUisance of the GNU Project (https://gnu.org)
Founder, Free Software Foundation (https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Roberto A. Foglietta@21:1/5 to Richard Stallman on Wed Mar 22 05:20:01 2023

On Wed, 15 Mar 2023 at 04:44, Richard Stallman <rms@gnu.org> wrote:

[[[ To any NSA and FBI agents reading my email: please consider ]]]
[[[ whether defending the US Constitution against all enemies, ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

> I do not know personally "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
> but I appreciate very much his answer in which he set several points

> https://lists.debian.org/debian-project/2023/03/msg00004.html

I will take a look. Thanks.

March 16, 2023 - Copyright Office Launches New Artificial Intelligence Initiative by Copyright and Artificial Intelligence

- https://www.copyright.gov/ai/

U.S. Copyright Office Weighs in on the AI Debate

The U.S. Copyright Office has weighed in on the debate and have
ultimately assessed that only human-made works are eligible for
protection. In a report published last week, the Office cites a 2018
submission in which the applicant described their work as
“autonomously created by a computer algorithm running on a machine.”
After a series of appeals, the artwork was ultimately denied a
copyright because it was made “without any creative contribution from
a human actor.”

The Office explained further:

“For example, if a user instructs a text-generating technology to
“write a poem about copyright law in the style of William
Shakespeare,” she can expect the system to generate text that is
recognizable as a poem, mentions copyright, and resembles
Shakespeare’s style. But the technology will decide the rhyming
pattern, the words in each line, and the structure of the text. When
an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship.”

- https://hypebeast.com/2023/3/u-s-copyright-office-ai-report

Best regards, R-

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
