• Brief update about software freedom and artificial intelligence

    From M. Zhou@21:1/5 to All on Fri Feb 24 00:20:01 2023
    Hi folks,

    Recap:
    The modern practice of AI has blurred the boundary between code and data,
    which introduces ambiguity into the interpretation of the definition of
    open source and of the respective licenses. Such ambiguous interpretation
    in fact deviates from and violates the spirit of free software.

    Several years ago I pointed out this issue on -devel, and eventually drafted ML-Policy [2].
    Last year the OSI formally recognized this issue and invited me to contribute some
    thoughts to their Deep Dive: AI event. The final report is now available here: [1]
    It is a summary of discussions among people from various fields.

    You may have tried ChatGPT recently -- this field develops rapidly, and some of the
    state-of-the-art AIs can be astonishing if you have never tried anything like them before.
    If some monopolistic proprietary AGI (artificial general intelligence) emerges in the future,
    I personally fear its potential for evil. This resembles a part of the
    history of free software in the last decade.

    Anyway, from the Debian side, we at least know that we should be careful
    when dealing with AI software.

    [1] https://deepdive.opensource.org/wp-content/uploads/2023/02/Deep-Dive-AI-final-report.pdf
    [2] https://salsa.debian.org/deeplearning-team/ml-policy

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Roberto A. Foglietta@21:1/5 to M. Zhou on Fri Feb 24 01:20:01 2023
    On Fri, 24 Feb 2023 at 00:16, M. Zhou <lumin@debian.org> wrote:

    Hi folks,

    Recap:
    The modern practice of AI has blurred the boundary between the code and data, which leads to some potential ambiguity to the interpretation of the definition of
    open source as well as the respective licenses. Such ambiguous interpretation in fact deviates from and violates the spirit of free software.

    Hi folks and Zhou,

    cloud technologies posed a challenge to the GPLv2 because under that
    license everyone has the right to change the code without sharing it,
    as long as s/he uses it internally, which is exactly how SaaS
    works. To fill this gap in freedom, the GPLv3 was proposed.

    Unfortunately, GPLv3 adoption did not spread through the community.
    Or fortunately, because almost every company involved in cloud adopted
    software-libre as their preferred solution. This gave the community
    a huge wave of hype, the kind of hype that made software-libre a big
    thing. Now A.I. arrives and everything is going to change again.

    A.I. is a great challenge for humanity, especially because of the
    ethical approach it requires. Ethics is not optional because there
    are a lot of things that can go wrong with A.I. - last but not least
    its use. The next challenge is about who is going to control this
    technology: a proprietary solution under the control of a single
    company, or a cartel of companies under the same national flag. This would
    easily lead to a strong concentration, which means: 1. no
    freedom, 2. no equality, 3. no innovation because of the lack of
    competition. The worst is the lack of freedom because everything else
    depends on it.

    There are two ways to go, mainly:

    1. changing the GPLv3 in such a way that it covers A.I. topics;
    2. a brand new specific license for this topic.

    In this e-mail, I will present my proposal about using GPLv3 to
    address the new challenges that come with the A.I. - My opinion is
    that GPLv3 applied to a composition is a novelty based on two known
    legal standards that can fit our needs of freedom with A.I. also.

    a) GPLv3 in its last revision has been available since 29 June 2007
    and this means that every law firm in the world has had the time to
    study it deeply and understand it. In a conservative sector like legal
    consultancy, every novelty based on a well-known past is welcome -
    it might or might not be warmly accepted but that is another
    story entirely.

    b) Under the Copyright Act, a compilation is defined as a "collection
    and assembling of preexisting materials or of data that are selected
    in such a way that the resulting work as a whole constitutes an
    original work of authorship." (1996)

    Combining these two well-known pieces of law, we can obtain - not a
    new license but - a new way to use the GPLv3: apply the GPLv3 to the
    composition regardless of how the single pieces of code or data are
    licensed. This is obviously a great advantage because changing the
    license for every {piece of code} and {set of data} is not feasible
    and, if it were necessary, it would be a nightmare from a legal point
    of view.

    This is a project of mine that uses the GPLv3 to protect a {set,
    pool, collection, combination, composition} of files, each with
    its own license. Whatever license a file has, the composition
    can be protected as software-libre by the GPLv3.

    https://github.com/robang74/git-functions#license

    I hope this helps the community to easily find a solution for our
    freedom needs and set a standard in A.I. licensing. For the sake of
    completeness, I am adding that section below the signature.

    Best regards, R-
    --

    License

    Almost all the files are under MIT license or GPLv3 and the others are
    in the public domain. Instead, the composition of these files is
    protected by the GPLv3 license.

    Copyright Act, title 17. U.S.C. § 101.

    Under the Copyright Act, a compilation [NDR: "composition" is used
    here as synonym because compilation might confuse the reader about
    code compiling] is defined as a "collection and assembling of
    preexisting materials or of data [NDR: source code, as well] that are
    selected in such a way that the resulting work as a whole constitutes
    an original work of authorship."

    This means that everyone can use a single MIT licensed file or a part
    of it under the MIT license terms. Instead, using two of them or two
    parts of them implies that you are using a subset of this collection,
    and thus a derived work of this collection, which is licensed under the
    GPLv3 as well.

    The GPLv3 license applies to the composition unless you are the
    original copyright owner or the author of a specific unmodified file.
    This means that everyone who can legally claim rights over the
    original files retains those rights, obviously. So, they do not need
    to comply with the GPLv3 license applied to the composition, unless
    they adopt the composition for the parts over which they had no rights
    before.

    The copyright notice, the license and the author are reported in each
    file header, summarised here:

    colors.shell: MIT
    isatty_override.c: MIT
    git-commit-edit: public domain
    git-isar-send-patch: GPLv3
    git.functions: GPLv3
    cr-editor.sh: GPLv3
    install.sh: GPLv3
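
    The rule stated in this section can be sketched in a few lines of Python.
    This is only an illustration of the rule as written above, not legal
    advice and not code from the repository: the function name
    applicable_terms and the constant COMPOSITION_LICENSE are hypothetical,
    and the sketch ignores both the "part of a file" case and the exception
    for original copyright owners.

```python
# File licenses as listed in the summary above.
FILE_LICENSES = {
    "colors.shell": "MIT",
    "isatty_override.c": "MIT",
    "git-commit-edit": "public domain",
    "git-isar-send-patch": "GPLv3",
    "git.functions": "GPLv3",
    "cr-editor.sh": "GPLv3",
    "install.sh": "GPLv3",
}

# License applied to the composition as a whole (hypothetical constant name).
COMPOSITION_LICENSE = "GPLv3"


def applicable_terms(files_used):
    """Return the terms governing reuse of the given files, per the stated rule."""
    used = set(files_used)
    if not used:
        raise ValueError("no files given")
    unknown = used.difference(FILE_LICENSES)
    if unknown:
        raise KeyError(f"not part of the collection: {sorted(unknown)}")
    if len(used) == 1:
        # A single file keeps the terms chosen by its own author.
        return FILE_LICENSES[next(iter(used))]
    # Two or more files form a subset of the collection.
    return f"{COMPOSITION_LICENSE} (composition)"


print(applicable_terms(["colors.shell"]))                       # MIT
print(applicable_terms(["colors.shell", "isatty_override.c"]))  # GPLv3 (composition)
```

    Taking one MIT file stays under MIT; taking two MIT files together is
    treated as a subset of the collection and falls under the GPLv3.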

  • From Charles Plessy@21:1/5 to All on Fri Feb 24 05:30:01 2023
    Dear Mo,

    thank you for the heads-up.

    I used permissive licenses in the past, thinking I was making life
    easier for individuals, but I feel robbed by the massive scraping done
    to train AI models.

    Just in case I updated my email signature.

    Also, is there a DFSG-free license that forces the training dataset and
    the result of the training process to be open source if a work under that
    license is present in the training data? Would GPLv3 be sufficient?

    Have a nice day,

    Charles

    --
    Charles Plessy Nagahama, Yomitan, Okinawa, Japan
    Debian Med packaging team http://www.debian.org/devel/debian-med
    Tooting from home https://framapiaf.org/@charles_plessy
    - You do not have my permission to use this email to train an AI -

  • From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Fri Feb 24 09:20:01 2023
    On Fri, 24 Feb 2023 at 08:06, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

    On Fri, 24 Feb 2023 at 05:23, Charles Plessy <plessy@debian.org> wrote:


    One more thing about this:

    - Joe tests the NN with the 10+1 images of TS and decides if the NN is
    fine or not. If he decides that it is fine and it can go into
    production, then Joe's employer should share all above stated.
    Instead, if he decides that it is crap, he will trash it and he can
    not share anything because the sharing will have zero value for
    anyone. This is compliant with the clause of fair use in which I
    explicitly added "testing" as a condition to avoid sharing. After all,
    if there is no value produced why should we force Joe to share his
    failure? In particular cases a failure (vulnerability) is valuable
    information but for security reasons it is better that Joe is not
    forced to comply with the GPLv3 terms. It is better to give Joe the
    freedom to share only those information that he considers safe to
    share in public. However, if Joe's company does a business with this -
    providing a PoC to a client - then they have to comply with GPLv3
    because the statements for which commercial and business are covered
    by GPLv3.

    In this specific case the provider of the PoC could make a public
    statement in which they promise to share the PoC under GPLv3, but only
    after 3 months, in order to give their client the opportunity to
    develop an update that fixes the issue and test it properly. Then
    their client does its job but needs 3 more months to give its own
    clients a reasonable time to update and test their systems. So,
    the client will make a public statement in which it grants its PoC
    provider legal coverage for every claim started in those 3 months
    to which the provider might be exposed for not having complied with
    the GPLv3 terms. In this way they have 3+3 months to fix a critical
    issue and let their clients update their systems. In case the 3+3
    months become 3+3 years, obviously their risk of facing a trial with a
    negative outcome is much higher. So, after a reasonable
    time, the PoC will be shared as it is supposed to be.

    Best regards, R-

  • From Roberto A. Foglietta@21:1/5 to Charles Plessy on Fri Feb 24 09:20:01 2023
    On Fri, 24 Feb 2023 at 05:23, Charles Plessy <plessy@debian.org> wrote:

    Dear Mo,

    thank you for the heads-up.

    I was using permissive licenses in the past thinking about making life
    easier to individuals, but I feel robbed by massive scrapping to train
    AI models.

    Just in case I updated my email signature.

    Also, is there a DFSG-free license that forces the training dataset and
    the result of the training process to be open source if a work under that
    license is present in the training data? Would GPLv3 be sufficient?


    Dear Charles,

    imagine that you have a collection of data files, images for example,
    and each of them has its own license and copyright owner. The ML/AI
    trained on that image set will produce a neural network which, for the
    purpose of this example, is an N-dimensional grid of 64-bit floating
    point numbers (NN weights). To be even clearer for those
    who have never trained a NN - almost all the lawyers - I will present
    here a very simple and explicit example.

    - The dataset is 100 images, protected by GPLv3 as a composition, like a database

    - Joe is the ML trainer, the employee that is going to train the ML

    - Joe legally acquired the dataset because all the licenses allow it

    - Joe wrote a script that renames the images to 00.jpg .. 99.jpg and
    ran it; this new dataset is still protected by the GPLv3

    - Joe wrote a script that randomly chooses 90 images as the learning set
    (LS) and the others as the test set (TS): these are two sub-compositions
    and both are covered by GPLv3 because both have more than one
    file/piece of the original composition. In fact, in the way I applied
    the GPLv3 to the composition I cannot enforce it over a single
    file/piece, because in that case I would change the license terms
    decided by the original author of that file/piece, and I do not want to
    do that even if I could (ethics).

    - The ML has the aim to decide whether the image contains at least one
    dog or not: image input, binary output. Joe can add his own dog image to
    the TS, and then that image becomes part of the TS composition, so he
    should share it under a license that is compatible with the GPLv3
    on the composition. However, Joe is smart and does not want to share
    his dog image, which is equivalent to saying that we cannot prove that
    Joe put that image into the TS composition by moving the file into that
    folder. However, from a legal point of view the simple fact that it is
    used as part of the TS clearly shows his intent to use his dog image as
    part of the training set. So, in principle Joe is smart but honest and,
    to avoid legal issues for his employer, will share his dog image.

    - So, now the sharing pool brings little new information: the LS, the
    TS, and Joe's dog picture. One more image is only +1%, but that
    image can be very tricky/important for ML in the same way some patches
    are a single line but make a huge difference. So quantity is not a
    universal metric of contribution. Moreover, now we know which TS/LS
    Joe used to obtain the NN, which in some cases could be relevant
    information.

    - Joe also needs to tag every LS image in order to back-propagate the
    feedback to the NN and train it. This can be a file in which filenames
    are associated with a binary label: dog or not. Again, Joe did not put
    this file into the LS folder but, as described above, that file is part
    of the training set to which a GPLv3 composition belongs. IMHO this
    means that Joe should also share this file. This information
    could also be relevant because the most expensive job is labelling the
    data.

    - Joe trains the NN with a ML engine which produces the NN weights
    matrix (BIN). This binary object is a derived work of a GPLv3
    composition like a binary executable is a derived work of a GPLv3
    source code. Thus Joe should share the BIN as well under GPLv3 terms,
    which also oblige him to explain the inner coding (BIN + format
    specifications). As you can imagine this is another step towards
    freedom: even if that BIN is supposed to run on patented hardware,
    because we know the format specification we can write an emulator
    - much slower and without commercial value due to its performance, but
    usable for learning purposes or for checking a questionable NN
    output.

    - Joe tests the NN with the 10+1 images of the TS and decides whether the
    NN is fine or not. If he decides that it is fine and it can go into
    production, then Joe's employer should share everything stated above.
    Instead, if he decides that it is crap, he will trash it and he need
    not share anything because the sharing would have zero value for
    anyone. This is compliant with the clause of fair use in which I
    explicitly added "testing" as a condition to avoid sharing. After all,
    if there is no value produced why should we force Joe to share his
    failure? In particular cases a failure (vulnerability) is valuable
    information but for security reasons it is better that Joe is not
    forced to comply with the GPLv3 terms. It is better to give Joe the
    freedom to share only the information that he considers safe to
    share in public. However, if Joe's company does business with this -
    providing a PoC to a client - then they have to comply with GPLv3
    because of the statements under which commercial and business uses are
    covered by GPLv3.

    - Suppose Joe is a student at university and his work has nothing to do
    with commercial / business purposes. However, if his university decides
    to use Joe's work for commercial or business purposes, then they should
    ask Joe for all the information which needs to be shared under GPLv3 terms.
    This forces Joe to hand over that information when he delivers his work to
    his teacher, so that the university can also store the
    information that might or might not be shared in the future. Again, no
    value produced then no need to share. After all, the work of Joe could
    be a completely useless failure and then rejected. We do not need to
    know about it.
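
    The data-handling steps of Joe's example (rename, random 90/10 split,
    extra test image, label file) can be sketched in Python. This is a
    minimal sketch under stated assumptions: the original file names, the
    seed, the name "joes_dog.jpg" and the helper names renamed and split are
    all hypothetical, and no real files are touched.

```python
import random


def renamed(originals):
    # Map each original file name to 00.jpg .. 99.jpg, in sorted order.
    return {old: f"{i:02d}.jpg" for i, old in enumerate(sorted(originals))}


def split(files, learning_size, seed):
    # Randomly choose the learning set (LS); the remaining files are the test set (TS).
    rng = random.Random(seed)
    learning = set(rng.sample(sorted(files), learning_size))
    test = set(files) - learning
    return learning, test


# Hypothetical original names for the 100 images of the composition.
originals = [f"IMG_{i:04d}.jpg" for i in range(100)]
mapping = renamed(originals)

# 90 images for learning, the other 10 for testing.
ls, ts = split(mapping.values(), learning_size=90, seed=42)

# Joe adds his own picture to the test set: the "10+1" images.
ts_used = ts | {"joes_dog.jpg"}

# Label file: one entry per LS image, to be filled in by hand (dog / not dog).
labels = {name: None for name in sorted(ls)}

print(len(ls), len(ts), len(ts_used), len(labels))
```

    In the terms of the example, everything this sketch produces or relies
    on (the LS, the TS including the added image, and the label file) is
    what Joe would be expected to share, alongside the trained weights.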

    To invalidate the application of the GPLv3 to the NN binary, someone
    should explain in legal terms, grounded in some law, that training
    a neural network is a completely different thing from compiling a
    binary from source code. By the same analogy, compiling GPLv3 source
    code does not imply that you have to share under GPLv3 the proprietary
    compiler that was used, right? The same goes for the ML
    training engine.

    Please feel free to contact me in person to go deeper into
    any aspects which, as AI experts or law experts, you might want to
    challenge or improve. I will be happy to read/hear from you. Just
    take into consideration that every relevant discovery (good or bad)
    about this new way of using the GPLv3 will be shared here or
    wherever I decide to share it. So, if you are under NDA, I am not;
    do not write/talk to me, or else do it at your own risk. :-)

    I hope this helps,
    --
    Roberto A. Foglietta
    +49.176.274.75.661
    +39.349.33.30.697

  • From Gerardo Ballabio@21:1/5 to Roberto A. Foglietta on Fri Feb 24 10:30:01 2023
    Roberto A. Foglietta wrote:
    cloud technologies posed a challenge to the GPLv2 because under that
    license everyone has the right to change the code but do not share it
    as long as s/he uses it internally which is exactly how the SaaS
    works. To fulfil this lack of freedom, the GPLv3 was proposed.

    Well, not exactly.

    If I am not mistaken, the GPLv3 was developed to clarify some
    ambiguous language in the GPLv2, mostly with respect to patents. It
    doesn't address SaaS -- you are still free to modify the code and keep
    your modifications private, even if you run a publicly accessible
    service on the modified code.

    The Affero GPL <https://www.gnu.org/licenses/agpl-3.0.html> was
    developed to specifically address SaaS. This license requires that if
    you run a service over a network, you must offer the corresponding
    source code to all users of the service.

    Charles Plessy wrote:
    Also, is there a DFSG-free license that forces the training dataset and
    the result of the training process to be open source if a work under that
    license is present in the training data? Would GPLv3 be sufficient?

    As I understand, that is an open legal question. The Affero GPL would
    be such a license *if* the training dataset were considered part
    of the code. While that does seem to make sense, as AI code is
    essentially non-functional without the training, I am not aware that
    there has ever been a pronouncement by a court of law that affirms or
    denies it, nor am I aware of any free/open source license that
    contains language that deals specifically with that issue, and I'm
    pretty sure that there's a lot of room for lawyers to argue their point.

    If you explicitly publish a dataset under the GPL or AGPL, I suppose
    that anybody who makes use of that dataset would be required to comply
    with that. And if you don't explicitly license it at all, I suppose
    that nobody would be authorized to use it except for "fair use". But
    you must be careful or you might end up "licensing" your data without
    even knowing. For example, I don't know the terms of service of
    ChatGPT, but it seems a fair guess to assume that whatever you write
    into it, you give them unlimited rights to use it. And that may easily
    extend to whatever you write into a document processor or other
    software that has a "feature" of "integrating" with ChatGPT, even if
    you're running it on your own computer (I think I've read that even
    LibreOffice is developing such a feature!).

    Gerardo

  • From Roberto A. Foglietta@21:1/5 to gerardo.ballabio@gmail.com on Fri Feb 24 14:30:01 2023
    On Fri, 24 Feb 2023 at 10:27, Gerardo Ballabio
    <gerardo.ballabio@gmail.com> wrote:

    If I am not mistaken, the GPLv3 was developed to clarify some
    ambiguous language in the GPLv2, mostly with respect to patents. It
    doesn't address SaaS -- you are still free to modify the code and keep
    your modifications private, even if you run a publicly accessible
    service on the modified code.

    The Affero GPL <https://www.gnu.org/licenses/agpl-3.0.html> was
    developed to specifically address SaaS. This license requires that if
    you run a service over a network, you must offer the corresponding
    source code to all users of the service.

    Thanks Gerardo for your contribution. Integrating it into my
    previous e-mail, I can say that wherever I wrote GPLv3, AGPLv3 could be
    used instead. However, the example I gave was based on the
    transfer of the NN binary from one party to another, so the GPLv3 was
    correctly used in that case because of the distribution. Instead, when
    the NN is trained and used internally for offering a SaaS, then AGPLv3
    should be considered by the authors.


    Charles Plessy wrote:
    Also, is there a DFSG-free license that forces the training dataset and
    the result of the training process to be open source if a work under that
    license is present in the training data? Would GPLv3 be sufficient?

    As I understand, that is an open legal question. The Affero GPL would
    be such a license *if* the training dataset would be considered part
    of the code. While that does seem to make sense, as AI code is
    essentially non-functional without the training, I am not aware that
    there has ever been a pronouncement by a court of law that affirms or
    denies it, nor I am aware of any free/open source license that
    contains language that deals specifically with that issue, and I'm
    pretty sure that there's lot of room for lawyers to argue their point.

    Gerardo, thanks again for your contribution because you highlight the
    main point of my proposal: wherever the GPLv3 or AGPLv3 is used, the
    most important thing is protecting the collection with such a license,
    and not every single file/data item, which could instead have a completely
    different authorship and license. This is possible whenever the
    various parts of the collection have been licensed under terms that
    are compatible with GPLv3 or AGPLv3 applied to the whole collection.
    Every file/data item that does not fulfil this requirement should
    be delivered separately, even better in a different manner or repository.

    Best regards, R-

  • From Russ Allbery@21:1/5 to Gerardo Ballabio on Fri Feb 24 19:10:02 2023
    Gerardo Ballabio <gerardo.ballabio@gmail.com> writes:

    As I understand, that is an open legal question. The Affero GPL would be
    such a license *if* the training dataset would be considered part of the
    code. While that does seem to make sense, as AI code is essentially
    non-functional without the training, I am not aware that there has ever
    been a pronouncement by a court of law that affirms or denies it, nor I
    am aware of any free/open source license that contains language that
    deals specifically with that issue, and I'm pretty sure that there's lot
    of room for lawyers to argue their point.

    To add to this, I'm fairly sure that the companies that are training AI
    models on, say, every piece of text they can find on the Internet, or all
    public GitHub repositories, are going to explicitly argue that doing so is
    fair use of the training material. If that argument prevails in court, or
    in legislatures, it will not be possible to write a free software license
    to prevent this, since the point of fair use is that copyright law does
    not apply to that usage and therefore no copyright license can prohibit
    it.

    I don't think we have any idea yet whether that argument will prevail. It
    will probably be years before it reaches a high enough level court in the
    United States for a definitive ruling, let alone every other relevant
    country that will have its own legal judgments. Consider Google
    v. Oracle: a suitable case with litigants willing to appeal all the way to
    the highest court about the copyright status of library APIs was only
    filed in 2010, years after this became a common issue, and it took more
    than ten years for it to be decided, and that only in the United States.
    I would expect a similar delay. Court systems work very slowly. It's also
    entirely possible that court judgments will go different ways in different
    countries to add even more confusion.

    The organizations that have every incentive to argue that it's fair use
    have very deep pockets, so they have a substantial chance of success on
    the prosaic grounds that the best-funded litigant or lobbyist always
    stands a reasonable chance of winning.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Sam Hartman@21:1/5 to All on Fri Feb 24 20:00:01 2023
    "Russ" == Russ Allbery <rra@debian.org> writes:

    Russ> To add to this, I'm fairly sure that the companies that are
    Russ> training AI models on, say, every piece of text they can find
    Russ> on the Internet, or all public GitHub repositories, are going
    Russ> to explicitly argue that doing so is fair use of the training
    Russ> material. If that argument prevails in court, or in
    Russ> legislatures, it will not be possible to write a free software
    Russ> license to prevent this, since the point of fair use is that
    Russ> copyright law does not apply to that usage and therefore no
    Russ> copyright license can prohibit it.

    Russ, I'm sure you are aware, but things get very interesting if the
    input to AI training is not fair use.
    In particular, if GitHub Copilot is a derivative work of everything fed
    to it (including all the copylefted works), that gets kind of awkward
    for Microsoft.
    Perhaps the GitHub user agreement grants permission for every copyright
    holder who has a GitHub account.
    But for everyone else, things could be very interesting.

    Unfortunately, if there is not some sort of fair use or sui generis
    solution, things like ChatGPT would be impossible because of copyright.
    That will create significant energy on the legal front to find a
    solution that does not involve negotiating with each right holder
    individually.
    The AI models are useful after all.

    And then there's GDPR and privacy concerns of training data.
    If I were a European, I'd definitely be very interested in filing a
    subject access request to learn what OpenAI knows about me.

    --Sam


  • From Russ Allbery@21:1/5 to Sam Hartman on Fri Feb 24 20:50:01 2023
    Sam Hartman <hartmans@debian.org> writes:

    Russ, I'm sure you are aware, but things get very interesting if the
    input to AI training is not fair use.

    In particular, if Github copilot is a derivative work of everything fed
    to it (including all the copylefted works), that gets kind of awkward
    for Microsoft.

    Perhaps the Github user agreement grants permission for every copyright
    holder who has a Github account.

    But for everyone else, things could be very interesting.

    Yes. I didn't express an opinion on what the correct outcome is because
    it's not at all obvious to me and I'm not sure that I have an opinion.

    As a general principle, as a free software advocate, I approve of an
    expansive definition of fair use and believe that far more uses of
    copyrighted material should be fair use than are normally considered fair
    use today. Expansive definitions of fair use are a key legal component to
    enabling reverse engineering and compatible replacement of non-free
    software with free software, for example.

    I'm seeing some tendency for free software advocates who are disturbed by
    the other social effects of large AI models (and there are quite a few
    things to be disturbed about), and about the degree to which some of them
    are parasitic on free software and other free information communities, to
    respond by advocating for a narrow definition of fair use, at least in
    this specific area. I'm worried that this is counterproductive; I think
    we rely on fair use much more than incredibly wealthy multinational
    software corporations do.

    But the specific ramifications of an expansive fair use position for the
    societal effect of AI models aren't clear to me, and to be honest I'm
    dubious that they're clear to anyone at this point. There are obviously
    some significant risks, including the tendency of scale effects with large
    models to further consolidate power into the hands of a small number of
    very wealthy organizations.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Sun Feb 26 09:50:02 2023
    On Sun, 26 Feb 2023 at 09:09, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:


    ERRATA CORRIGE

    I hope this helps to acknowledge and convince us - as the open-source
    and software-libre community - about the great responsibility that
    weighs on our shoulders. Such a responsibility cannot be delegated to
    a few because the stakes on the table are too high.

    Thus we are all involved, more or less, depending on everyone's ability
    to support and contribute.

    s/contribute/support/ is a more correct English term for the idea I
    had in my mind. However, one term does not exclude the other.

    Best regards, R-

  • From Roberto A. Foglietta@21:1/5 to All on Sun Feb 26 17:10:01 2023
    Hi all,

    in these two threads

    * https://lists.debian.org/debian-project/2023/02/msg00017.html

    * https://lists.debian.org/debian-project/2023/02/msg00022.html

    we had the chance to discuss the emerging mass adoption of A.I.
    and which licensing model would be useful to adopt to
    protect the freedom of code, data, models, etc.

    One topic that seems to worry people is "fair use", which is
    a legal term but not really well defined - I would use the attribute
    "blurry" for it - and this could be a tricky concept to debate
    in a trial.

    I wish to add my two cents about this topic and I will follow two
    main guidelines: 1. historical evolution, 2. priority of legislative
    sources.

    First of all, we can start with a reductio ad absurdum: because
    we do not have a clear legal definition of "fair use", we
    consider the worst case, which means "everything". Well, this is
    exactly the situation before the introduction of copyright:
    everything was a fair use case. Then copyright was introduced to
    grant authors some kind of exclusive rights: both moral and material
    rights. As you can imagine, without the moral rights there would
    be no material rights. However, this aspect is not
    relevant for our goal here; I mention it just to underline the priority.

    Copyright was introduced to move some profit from the publishers to
    the authors, who were starving. Thus, the material rights are about
    business, commercial and marketing. Obviously these three terms were
    not as developed as they are today, but basically these three
    activities are clearly related to value and thus they SHOULD be
    exclusive to the author for every copyrighted work that s/he created.
    Moreover, copyright applies by default, unless the author explicitly
    states otherwise.

    From this point of view, we still do not know what "fair use" is, but
    we know for certain what is NOT included in "fair use"; otherwise
    copyright would fail in principle. The same holds, symmetrically, for
    copyleft.

    In fact, copyright claims that the {business, commercial, marketing}
    rights are exclusive rights of the author (all rights reserved), implicitly
    assuming that the author intends to sell them; otherwise no
    one could legally enjoy the author's work.

    Copyleft trades these rights not for money but for
    freedom, which is more valuable for some people. Thus, with copyleft,
    if you want to enjoy the {business, commercial, marketing} rights to
    someone else's work, then you have to share back something about the
    original work.

    Thus this equation takes place:

    copyright : money ~alike~ copyleft : freedom

    My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
    but - to a collection of objects using the database protection,
    automatically also solves the problem of a blurry "fair use"
    definition. However, to be more incisive about "fair use", it is
    better to declare explicitly what is not "fair use". Otherwise, we
    risk having to explain this in court. Like in this file header:

    https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

    # (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
    # SPDX-License-Identifier: all rights reserved, but fair use allowed
    # Fair use includes test, learning and marketing but not sales, redistribution
    # leasing, renting or every other commercial/business activities without the
    # consent of the author. Every company or individual allowed to use this
    # code behind these limitations will be listed here below, if any.

    In this specific case, I decided that marketing belongs to "fair use"
    because it lets my product be known. In the case of A.I. it would not be
    fine because the A.I. could suggest, directly or indirectly, drinking an
    X-soft-drink, and this is clearly marketing.

    So, in conclusion, "fair use" was the standard before the introduction
    of copyright; then "all rights reserved" became the standard with
    copyright, but this created other problems because it
    was too restrictive, so the "fair use" concept was introduced to relax
    copyright, but "fair use" was not well defined. It was not well
    defined because "{business, commercial, marketing} rights are
    reserved" is enough and, moreover, protecting these rights is the core
    reason copyright law exists at all.

    IMHO, the best we can do is to ask the Free Software Foundation to
    write two more licenses or update A/GPLv3 to A/GPLv4, in which it will be
    clearly stated that the license applies to the composition and the
    {business, commercial, marketing} rights are reserved and exchanged
    for freedom. Then the license presents an open definition of "fair use" in
    which some rights {testing, learning} are clearly included. Everything
    else should be traced back to these two categories. Finally, the
    license should state that every collection item that does not have its
    own specific copyright and license note/header is licensed under
    A/GPLv3.

    So, in the simplest case, in which no file carries a specific
    copyright note/header but only the repository does, this happens:

    - git repository A is licensed with A/GPLv4
    - the composition is under A/GPLv4
    - every file is under A/GPLv3

    Clearly, we can also have LGPLv4 and, as long as it makes sense, every
    other license could be used to create a collection-oriented version of
    that license.

    Moreover, when an A.I.'s training engine hits a project repository
    protected by A/GPLv4, all the inputs before and after that hit
    become part of the input collection, which will be protected by A/GPLv4
    with all the related consequences that I have just explained in other
    e-mails here.

    Everything IMHO and hoping that it helps, R-

  • From Russ Allbery@21:1/5 to Roberto A. Foglietta on Sun Feb 26 21:50:01 2023
    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
    but - to a collection of objects using the database protection,
    automatically also solves the problem of a blurry "fair use"
    definition. However, to be more incisive about "fair use", it is
    better to declare explicitly what is not "fair use". Otherwise, we
    risk having to explain this in court. Like in this file header:

    https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

    # (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
    # SPDX-License-Identifier: all rights reserved, but fair use allowed
    # Fair use includes test, learning and marketing but not sales, redistribution
    # leasing, renting or every other commercial/business activities without the
    # consent of the author. Every company or individual allowed to use this
    # code behind these limitations will be listed here below, if any.

    I'm afraid this is not how fair use works. The whole point of fair use is
    that the copyright holder has no control over uses that are fair use.
    They can grant additional rights with a copyright license, but they cannot
    stop legal fair use, no matter what they write in their license and no
    matter what their personal opinions are about what would fall into fair
    use.

    You can see why this must be so if you think about the role of fair use in
    copyright law. Fair use is a carve-out for a whole class of uses to which
    society wants to put copyrighted works without requiring any permission
    from the copyright holder, and if necessary against their explicit wishes.

    Consider quoting small portions of a work while reviewing it, which is one
    of the classic areas of fair use in US law. A copyright holder might like
    to only allow friendly reviews to quote their work and prohibit hostile
    reviews from quoting their work, or, failing that, prohibit quoting in any review. But the point of fair use is that everyone gets to quote their
    work and they get no say in the matter.

    And, as mentioned earlier, free software relies heavily on this. Among
    the things that are carved out for fair use (and closely related concepts
    with roughly the same legal properties, such as limits on what types of
    works can be protected by copyright), is the ability to reimplement an
    API. There are also a lot of activities around reverse engineering that
    are protected by fair use and related limitations. The copyright holders
    of that software, if allowed to redefine those limits in their licenses,
    would use a far more restrictive definition that prohibited free software competition. But they can't.

    This means that it is largely pointless to try to define fair use in a
    copyright license, since the whole point of fair use is that it applies
    regardless of the content of the copyright license, even if the copyright
    license explicitly prohibits things that are legally fair use, and even in
    the case of no copyright license at all. The belief or definition offered
    by the copyright holder for fair use does not matter and should be ignored
    entirely for legal purposes. The only definition that matters is the one
    made by the legislature and enforced by the courts.

    And yes, it is indeed fuzzy, and it may be beneficial for governments to
    define it more precisely (assuming they didn't break it in the process).
    But licenses like the GPL cannot do this, since those are just statements
    by the copyright holder. The copyright holder *cannot* have any power to
    make fair use less fuzzy, since that would defeat the point of fair use.

    A similar principle applies to claiming compilation copyright. Either a compilation is covered by your copyright, in which case you have all the
    normal copyright holder rights over it unless you grant them to others
    with an explicit license, or it is not covered, in which case it doesn't
    matter what you say about it and everyone else can ignore anything you
    say. So declaring that any compilation including your work is covered by
    your preferred license is only relevant if you have a legal copyright over
    the compilation. If you do, then your copyright license matters; if you
    don't, everyone else is entitled to ignore your license and your
    statements.

    I am not a lawyer, let alone a copyright lawyer, and have only an amateur
    Internet understanding of the nature of compilation copyrights (and they
    may well also vary by jurisdiction), but my understanding (possibly
    incorrect) of the law in the US is that holding copyright on a member of a
    collection does not give you any copyright ownership of the collection as
    a whole. To gain copyright ownership of the collection, you have to
    exercise some sort of creative control over the collection itself, such as
    by using human creativity to select its membership, choosing some elements
    and discarding others. The person distributing the collection has to
    comply with copyright law with respect to the material included that you
    hold a copyright on (either satisfying your license or following the rules
    of fair use), but if you're not involved in creating the collection, you
    don't get any separate rights over the collection itself and cannot assert
    a license on it.

    There's a bunch of US case law on this around things like phone books
    (IIRC, found to not involve enough creativity to have a separate
    copyright), recipe collections (copyrightable as a compilation even though
    recipes themselves are not individually copyrightable), short story
    collections, and so forth.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 02:50:01 2023
    On Sun, 26 Feb 2023 at 21:47, Russ Allbery <rra@debian.org> wrote:

    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
    but - to a collection of objects using the database protection,
    automatically also solves the problem of a blurry "fair use"
    definition. However, to be more incisive about "fair use", it is
    better to declare explicitly what is not "fair use". Otherwise, we
    risk having to explain this in court. Like in this file header:

    https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

    # (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
    # SPDX-License-Identifier: all rights reserved, but fair use allowed
    # Fair use includes test, learning and marketing but not sales, redistribution
    # leasing, renting or every other commercial/business activities without the
    # consent of the author. Every company or individual allowed to use this
    # code behind these limitations will be listed here below, if any.

    I'm afraid this is not how fair use works. The whole point of fair use is
    that the copyright holder has no control over uses that are fair use.
    They can grant additional rights with a copyright license, but they cannot
    stop legal fair use, no matter what they write in their license and no
    matter what their personal opinions are about what would fall into fair
    use.

    I am sorry for having confused you while trying to explain a simple fact:

    - fair use as a legal term is a blurry one
    - fair use cannot be limited, only expanded (as I did over there)
    - fair use could include {testing, learning, storage} and usually it does

    HOWEVER

    - fair use cannot include {business, commercial, marketing} rights in
    any way and under any conditions

    WHY?

    Because the principle of the copyright existence is about protecting
    the authors' exclusive of that {business, commercial, marketing}
    rights.
    Because copyleft is a copyright that trades exclusive rights for
    freedom instead of money, this is certainly happening also for the
    copyleft.

    CONCLUSION

    We might have problems in identifying all the fair use cases but we
    can be very certain about what is NOT fair use.

    (in another e-mail about database/collection protection)

    Best regards, R-

  • From Paul Wise@21:1/5 to Russ Allbery on Mon Feb 27 05:10:01 2023
    On Fri, 2023-02-24 at 11:37 -0800, Russ Allbery wrote:

    As a general principle, as a free software advocate, I approve of an
    expansive definition of fair use and believe that far more uses of
    copyrighted material should be fair use than are normally considered fair
    use today.  Expansive definitions of fair use are a key legal component to
    enabling reverse engineering and compatible replacement of non-free
    software with free software, for example.

    I note that fair use isn't a worldwide concept and other parts of the
    world have the more varied and restricted concept of "fair dealing".

    https://en.wikipedia.org/wiki/Fair_use#Influence_internationally
    https://en.wikipedia.org/wiki/Fair_dealing

    So, as much as possible, we should try not to rely on fair use.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise


  • From Scott Kitterman@21:1/5 to Roberto A. Foglietta on Mon Feb 27 05:10:01 2023
    On February 27, 2023 12:45:38 AM UTC, "Roberto A. Foglietta" <roberto.foglietta@gmail.com> wrote:
    On Sun, 26 Feb 2023 at 21:47, Russ Allbery <rra@debian.org> wrote:

    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    My proposal to apply the GPLv3 or AGPLv3 - not directly to an object
    but - to a collection of objects using the database protection,
    automatically also solves the problem of a blurry "fair use"
    definition. However, to be more incisive about "fair use", it is
    better to declare explicitly what is not "fair use". Otherwise, we
    risk having to explain this in court. Like in this file header:

    https://github.com/robang74/isar/blob/evo2/meta/recipes-support/expand-on-first-boot/files/expand-last-partition.sh

    # (C) 2022, Roberto A. Foglietta <roberto.foglietta@gmail.com>
    # SPDX-License-Identifier: all rights reserved, but fair use allowed
    # Fair use includes test, learning and marketing but not sales, redistribution
    # leasing, renting or every other commercial/business activities without the
    # consent of the author. Every company or individual allowed to use this
    # code behind these limitations will be listed here below, if any.

    I'm afraid this is not how fair use works. The whole point of fair use is
    that the copyright holder has no control over uses that are fair use.
    They can grant additional rights with a copyright license, but they cannot
    stop legal fair use, no matter what they write in their license and no
    matter what their personal opinions are about what would fall into fair
    use.

    I am sorry for having confused you while trying to explain a simple fact:

    - fair use as a legal term is a blurry one
    - fair use cannot be limited, only expanded (as I did over there)
    - fair use could include {testing, learning, storage} and usually it does

    HOWEVER

    - fair use cannot include {business, commercial, marketing} rights in
    any way and under any conditions

    WHY?

    Because the principle of the copyright existence is about protecting
    the authors' exclusive of that {business, commercial, marketing}
    rights.
    Because copyleft is a copyright that trades exclusive rights for
    freedom instead of money, this is certainly happening also for the
    copyleft.

    CONCLUSION

    We might have problems in identifying all the fair use cases but we
    can be very certain about what is NOT fair use.

    (in another e-mail about database/collection protection)

    This is not correct. Commercial fair use is quite common worldwide. Please
    don't confuse what you wish the law says with what it actually says. In any
    case, there are many different laws in many different places, so one can't
    be very certain about any of this on a global basis.

    Russ's main point, that what is or is not fair use is a function of law,
    not license, is correct. By definition, fair use is about what copyright
    cannot restrict.

    Scott K

  • From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 07:20:01 2023
    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    - fair use cannot include {business, commercial, marketing} rights in
    any way and under any conditions

    This is definitely not true in the United States; there is a Supreme Court
    decision saying the exact opposite. The ruling in Google v. Oracle said
    Google's commercial and business use of Oracle's copyrighted APIs met the
    test for fair use.

    You can't reconstruct the law from first principles without looking at the
    actual test that is applied by courts. (And as mentioned this may be
    different in different jurisdictions, for additional complexity.) In the
    US there's a four-part balancing test for fair use, and the analysis can
    be quite complicated.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Mon Feb 27 09:00:01 2023
    On Mon, 27 Feb 2023 at 08:38, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

    On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:


    [...]


    No court ruling based on fair use was ever issued in favour of Google
    against Oracle; rather, it was an agreement between the two parties
    supported by Microsoft.

    https://arstechnica.com/tech-policy/2021/04/how-the-supreme-court-saved-the-software-industry-from-api-copyrights/

    As you can learn from the Ars Technica article linked above.


    FUNNY FACT

    Microsoft convinced Oracle to settle the case with Google using
    the escamotage based on the author's unilateral right to extend the
    definition of fair use, which is the same thing I did for marketing for a
    single file in one of my projects. This escamotage, written down and
    filed with the Supreme Court as an agreement between the two parties,
    allows everyone to do the same as Google with Oracle's API. Thus
    Oracle surrendered not because Google leveraged fair use but
    because of Microsoft's pressure.

    Google: I can include the header, thus I can use the API

    Oracle: no you cannot.

    Microsoft: Oracle DB cannot run without an operating system, do you agree?

    me: ROTFL

    Best regards, R-

  • From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 08:50:01 2023
    On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    - fair use cannot include {business, commercial, marketing} rights in
    any way and under any conditions

    This is definitely not true in the United States; there is a Supreme Court
    decision saying the exact opposite. The ruling in Google v. Oracle said
    Google's commercial and business use of Oracle's copyrighted APIs met the
    test for fair use.

    It is true despite a single US case judgment. In the USA there is the
    habit of using precedent judgments to influence other judgments, but this
    applies only if there is a significant analogy between the two cases,
    and even if there is a very significant analogy the judge can produce
    an opposite ruling for that case.

    Google lost two trials against Oracle, apparently because the judges were
    not able to distinguish between the API and the code that runs behind
    the API. Thus they went to the Supreme Court. This despite the fact that
    an API is unlikely to be a work that could be protected by copyright
    except in very rare cases. Microsoft filed a brief with the
    Supreme Court asking it to rule in favour of Google.

    Both these facts are in favour of the opinion that I have explained:
    1. API might or might not be protected, 2. copyright applies in case
    of doubts.

    Microsoft used its influence to persuade Oracle in favor of Google.
    Thus the parties agreed that Google - when creating libraries with
    Oracle's API - made fair use of its declarations. After all, if you
    cannot include the headers then you also cannot call the original
    functions. The fair use in that agreement was an escamotage to prevent
    the Supreme Court from issuing a ruling that would have been a disaster
    in any case, AND for both parties to save face.

    CONCLUDING

    No court ruling based on fair use was ever issued in favour of Google
    against Oracle; rather, it was an agreement between the two parties
    supported by Microsoft.

    https://arstechnica.com/tech-policy/2021/04/how-the-supreme-court-saved-the-software-industry-from-api-copyrights/

    As you can learn from the Ars Technica article linked above.

    You can't reconstruct the law from first principles without looking at the
    actual test that is applied by courts. (And as mentioned this may be
    different in different jurisdictions, for additional complexity.)

    I can reconstruct the interpretation of a law from basic principles;
    otherwise it would not be a law but something that appeared from
    nothing: no legal roots, no legal authority. Like every tree, a
    law is stronger when it has ancient and well developed roots. Thus, a
    law interpretation based on reconstructing it from its principles is
    the most significant, the most important and the most persuasive way
    of doing such a task.

    In the
    US there's a four-part balancing test for fair use, and the analysis can
    be quite complicated.

    The U.S. interpretation of the law is not the source of truth. Moreover,
    it does not matter how fair use is defined in the many different
    legislations around the world. By the principle of copyright, it cannot
    allow activities like {business, commercial, marketing} without the
    consent of the author or of the license. "Fair use" is a false
    friend and ignoring it is the best choice.

    CONCLUSION

    If the question "what is X?" does not work well, then try the opposite,
    "what is not X?" - It is not important to define "fair use" as long as
    we can define with certainty what the blurry fair use definition does
    NOT cover. After all, we were interested from the beginning in "what
    is not fair use", thus asking the right question is half of the work
    done.

    @Russ: please write to me in private if you need more clarification.
    At this point anything further has very little to do with the
    community needs.

    Best regards, R-

  • From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 09:00:01 2023
    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    A totally automatic procedure like web crawling and web indexing
    re-enter in your example, perfectly. However, the input collection that
    a ML/AI training system needs is a protectable work because the data
    should be structured, selected and properly labeled even if these
    activities are done with rules like it happens using SQL for
    databases.

    Yes, I agree, I think that a trained AI model is a protectable work.
    However, it is not protectable *by you* unless you're the one who wrote
    the model and chose its training.

    Therefore, putting a clause in your copyright license saying that if your
    work is incorporated into an AI model, that AI model as a collection is
    covered by some particular license is not really a thing you can do. The
    best you can do is the standard GPL thing of saying that you don't have to
    license your collection under any particular license, but if you don't,
    you don't have any right to include this specific work. Maybe that's what
    you were getting at, and I just didn't understand.

    That second approach of course only works if the use of the GPL-covered
    work is not fair use. If it is fair use, then the person creating the
    collection can ignore any provision of the license, so we're back to the
    question of whether AI training is fair use.

    So, web indexing and statistics are created over a input collections
    that are *not* a creative works and these tools access to every
    copyrighted works in fair use as long as they respect the robots:no
    meta-tag when it is applied to a copyrighted work. Instead, training a
    ML/AI is a completely another story and their input collections are a
    protectable collection under the copyright law.

    I don't think it's anywhere near that easy to distinguish a web search
    index from an AI training model in copyright law. They seem like very
    similar cases to me. A great deal of creativity and human control go into
    selecting how pages are chosen for search indices (otherwise, every search
    engine would be unusable due to search optimization spam), and search
    engines even retain and redistribute portions of the documents they index.

    My guess is that *both* of these are protectable collections. And the
    entire Internet currently assumes that building a search engine is fair
    use of the Internet-accessible indexed documents, even if that search
    engine is then used and marketed for commercial and business purposes, as
    Google, Bing, etc. all are.

    If you believe that AI training is *not* fair use, I think you're going to
    have to wrestle with the substantial similarities between AI training and
    the Google search engine. I think it may prove challenging to write an
    analysis that says AI training is not fair use, but Google's search
    indexing is fair use. Or, I guess, argue that Google's search indexing is
    also not fair use but falls into some other exception to copyright law
    like an implicit license, but there I'm *way* out of the depth of my legal
    understanding.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Roberto A. Foglietta@21:1/5 to Russ Allbery on Mon Feb 27 09:40:01 2023
    On Mon, 27 Feb 2023 at 08:50, Russ Allbery <rra@debian.org> wrote:

    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    A totally automatic procedure like web crawling and web indexing
    re-enter in your example, perfectly. However, the input collection that
    a ML/AI training system needs is a protectable work because the data
    should be structured, selected and properly labeled even if these
    activities are done with rules like it happens using SQL for
    databases.

    Yes, I agree, I think that a trained AI model is a protectable work.
    However, it is not protectable *by you* unless you're the one who wrote
    the model and chose its training.

    Therefore, putting a clause in your copyright license saying that if your
    work is incorporated into an AI model, that AI model as a collection is
    covered by some particular license is not really a thing you can do. The
    best you can do is the standard GPL thing of saying that you don't have to
    license your collection under any particular license, but if you don't,
    you don't have any right to include this specific work. Maybe that's what
    you were getting at, and I just didn't understand.


    Dear Russ, I was completely wrong about your ability to contribute to
    this discussion, because the chance you gave me to refute your thesis
    is the best occasion to pave the way for the lawyers that will one day
    enforce the A/L/GPLv4 in a court. So, let me explain it in a very
    simple and straightforward way:

    - A/L/GPLv3 applies to source code and scripts that should be compiled
    or run by an interpreter

    - the AI/ML training engines use source code and scripts as data, this
    might or might not be a fair use, but for sure is a novelty which is
    not covered by A/L/GPLv3

    - then I decided to protect my projects repositories as database
    (collection) in addition to the standard way to protect the code with
    a well-known license

    - because of the copyright law about databases, if someone creates a
    larger database that contains my database or a part of it, then they
    have to comply with the license that I choose to protect my project as
    a database.

    You see, it is a very simple and straightforward concept. The only two
    ways to get around this are 1. to abolish the database copyright law, or
    2. to make a law under which the training input collection cannot be
    covered by copyright law. In both cases every employee could take
    home a copy of a database or a copy of AI training inputs and share it
    with the rest of the world. Moreover, 1. includes 2., while
    2. would seriously undermine the database copyright law because
    every database could be a training set for an AI/ML engine.

    Russ, do you agree? :-)

    Best regards, R-

  • From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 18:50:02 2023
    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:
    On Mon, 27 Feb 2023 at 07:16, Russ Allbery <rra@debian.org> wrote:

    This is definitely not true in the United States; there is a Supreme
    Court decision saying the exact opposite. The ruling in Google
    v. Oracle said Google's commercial and business use of Oracle's
    copyrighted APIs met the test for fair use.

    It is true despite a single US case judgment.

    It's not a single US court judgment. The standard for fair use in the
    United States was created by a series of Supreme Court judgments starting
    with Folsom v. Marsh in 1841 and enshrined in US national law in 17
    U.S.C. § 107 in 1976:

    Notwithstanding the provisions of sections 106 and 106A, the fair use
    of a copyrighted work, including such use by reproduction in copies or
    phonorecords or by any other means specified by that section, for
    purposes such as criticism, comment, news reporting, teaching
    (including multiple copies for classroom use), scholarship, or
    research, is not an infringement of copyright. In determining whether
    the use made of a work in any particular case is a fair use the
    factors to be considered shall include—

    (1) the purpose and character of the use, including whether such use
    is of a commercial nature or is for nonprofit educational purposes;

    (2) the nature of the copyrighted work;

    (3) the amount and substantiality of the portion used in relation to
    the copyrighted work as a whole; and

    (4) the effect of the use upon the potential market for or value of
    the copyrighted work.

    The fact that a work is unpublished shall not itself bar a finding of
    fair use if such finding is made upon consideration of all the above
    factors.

    You can find this history numerous places on-line, for example:

    https://law.marquette.edu/facultyblog/2022/10/the-surprisingly-confused-history-of-fair-use-is-it-a-limit-or-a-defense-or-both/

    Many fair use cases in US history have been about commercial use.
    Probably most, since companies with commercial uses are more likely to go through the trouble of lawsuits. Commercial fair use is routine within
    the classic examples of fair use, such as parody and quoting for
    commentary.

    This is the law in the United States. The law in other countries of
    course may be quite different. But given that many of the actors who are relevant to a discussion of large AI models at present have a significant
    locus in the United States, US law is going to play a large role.

    No court ruling was ever emitted in favour of Google vs Oracle
    leveraging fair use but it was an agreement between the two parties
    supported by Microsoft.

    This is not a correct summary of the outcome of Google v. Oracle, nor is it
    what the Ars Technica article you linked said. There was no agreement
    between the parties in the question before the Supreme Court. The case
    went to judgment and the Supreme Court ruled in favor of Google on fair
    use grounds, mooting (and not ruling on) the question of copyrightability
    of the API definitions.

    Appeals like this in the US are generally over a specific question of law
    and do not settle the *entire case*, so the Supreme Court then remanded
    the case to trial court to dispose of the rest of the lawsuit. I didn't
    follow it after that because the details following the Supreme Court
    decision are generally uninteresting since they're probably forced by the decision. It's quite possible that the parties mutually agreed to dismiss
    the case after that decision because the decision meant Google was certain
    to win. But the Supreme Court decision was not an agreement between
    parties.

    This is important because in US law if the parties had reached an
    agreement before the decision, the case would generally be dismissed and
    thus not receive a court judgment and therefore not create precedent.
    Google v. Oracle did not settle; it was decided by the Supreme Court and therefore did create binding precedent for further district court
    decisions on similar cases.

    I can reconstruct the interpretation of a law from basic principles
    otherwise it would not be a law but something that appeared from
    nothing: no any law roots, no any law authority.

    If this is your approach to legal analysis, I think I will stop here,
    since any further discussion along these lines is going to be pointless.

    Moreover, it does not matter how fair use is defined in many different legislations around the world. By copyright principle, it cannot allow
    doing activities like {business, commercial, marketing} without the
    consent of the author or of the license.

    This is simply not true, and it is very good for free software that this
    is not true. One is still allowed to do reverse engineering and API
    replacement under fair use even if one is doing it for business and
    commercial purposes, and lots of free software development is done for
    business and commercial purposes.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Russ Allbery@21:1/5 to Roberto A. Foglietta on Mon Feb 27 19:10:01 2023
    "Roberto A. Foglietta" <roberto.foglietta@gmail.com> writes:

    - then I decided to protect my projects repositories as database
    (collection) in addition to the standard way to protect the code with
    a well-known license

    - because of the copyright law about databases, if someone creates a
    larger database that contains my database or a part of it, then they
    have to comply with the license that I choose to protect my project as
    a database.

    In the United States, this is only true if (a) the collection is
    copyrightable (let's presume that's true in this case), and (b) their use
    of your collection is not fair use. If their use of your collection is
    fair use, then they do not have to comply with your license.

    In other countries, I have no idea. Presumably there is a similar set of
    rules under the same or different terms to allow such things as parodies,
    but the boundaries may be different and I know very little about how those rules have been applied to software outside of the US. My understanding
    is the Berne Convention doesn't standardize the rules around fair use
    (under whatever name), so this can differ a lot by jurisdiction.

    You see, it is a very simple and straightforward concept. The only two
    ways to get off this are 1. make unlawful the database copyright law,
    2. make a law for which the training input collection is not coverable
    by the copyright law. In both cases every employer can bring to their
    home a copy of a database or a copy of AI training inputs and share it
    with all the rest of the world. Moreover, the 1. includes the 2 while
    the 2. would seriously undermine the database copyright law because
    every database could be a training set for an AI/ML engine.

    Russ, do you agree? :-)

    No. It's entirely possible that using databases as training sets for an
    AI/ML engine is fair use under existing United States law and precedent as
    long as that use is sufficiently transformative (the first factor of the
    test, and I suspect the most important one here). The obvious example is
    a search engine, which performs a similar transformation of clearly
    copyrighted works into a new service with a different purpose, without the explicit permission of the copyright holders.

    This is the reason why people have focused so much on GitHub Copilot's willingness to insert large blocks of code from other projects verbatim. Reproducing code from other projects is less transformative and looks more
    like simple copying, and therefore opens GitHub to a legal argument that
    their AI model is not sufficiently transformative to be fair use.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Roberto A. Foglietta@21:1/5 to Russ Allbery on Tue Feb 28 00:40:01 2023
    On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:


    No. It's entirely possible that using databases as training sets for an AI/ML engine is fair use under existing United States law and precedent as long as that use is sufficiently transformative (the first factor of the test, and I suspect the most important one here).

    Considering what you reported in the previous e-mail about US national
    law in 17 U.S.C. § 107 in 1976, it is not possible to use an entire or
    a significant portion of a database for {business, commercial,
    marketing} purposes without the consent of the copyright holder.

    Whoever says the contrary forgot that fair use has been introduced to
    allow those non-profit activities which have a social value plus few
    profit activities (like journalism) that have a social role but the
    former could use a very limited portion of copyrighted work. Very
    simple and straightforward example is a newspaper article that cites a
    couple of paragraphs from a book or some statistical data from a
    private database. There is no chance that the incorporation of an
    entire database (or a significant part of it) would enter into fair
    use for {business, commercial, marketing} purposes otherwise the
    principle of copyright would be gone.

    I strongly feel that this discussion cannot continue because the
    presentation of a mass of legal material without a comprehension of the
    underlying legal principles leads to nothing more than a show, like some
    US trials are. Principles cannot be bent by misinterpretation,
    misjudgement and ill-written law like US national law in 17 U.S.C. § 107
    in 1976, in which points (1)...(4) are written in such a way that anyone
    who is not well acquainted with the principles could misunderstand them
    to the point of absurdity.

    This (1) does not mean that non-profit and for-profit activities are
    equal in enjoying fair use

    (1) the purpose and character of the use, including whether such use
    is of a commercial nature or is for nonprofit educational purposes;

    but it means the opposite, that the two activities can fair-use a
    completely different amount of the copyrighted work

    (3) the amount and substantiality of the portion used in relation to
    the copyrighted work as a whole

    and in particular (3) also means that if I write an article of a
    few words, quoting 2 paragraphs of a book in it is not fair use.

    One more thing: it does not matter that two parties had N trials
    settled but the agreement they had at the end - principle - because a significant judgement is a definitive one otherwise it means that it
    was not significant enough even to close that specific case.

    The obvious example is
    a search engine, which performs a similar transformation of clearly copyrighted works into a new service with a different purpose, without the explicit permission of the copyright holders.

    This is a completely different story, for two reasons:

    1. indexing by keywords - the website manager tagged that keyword, so
    the content has not been accessed
    2. web crawling is an automatic process that identifies keywords
    and associates them with the URL

    This process has nothing to do with the content unless you would
    affirm that the word "cataclysm" cannot be used because it belongs to
    a certain copyrighted book and moreover this process is completely
    automated in which no human creativity has been involved. Moreover,
    indexing and web crawling are totally different processes that lead to
    totally different results and aims than those related to an AI
    training. Forget to make an analogy between AI training and Google
    business because they are completely different things.


    This is the reason why people have focused so much on GitHub Copilot's willingness to insert large blocks of code from other projects verbatim. Reproducing code from other projects is less transformative and looks more like simple copying, and therefore opens GitHub to a legal argument that their AI model is not sufficiently transformative to be fair use.

    Transformative is not the key, incorporating large pieces of code is
    not the key. This is the tip of the iceberg, through which people realised
    that their code has been used. The iceberg to handle is the learning
    process before it happens which is about the input collection. Here we
    are: the input collection of an AI/ML training system is what we want
    to keep free. Why do we want to keep the input collection? Because
    like in compilation we also have the entire model in freedom. This in
    exchange for the right to use our code as input data.

    I am pretty sure that those complaining about GitHub Copilot are not
    upset because the AI is not transformative enough to masquerade their
    code!

    Best regards, R-

  • From Paul Wise@21:1/5 to Roberto A. Foglietta on Tue Feb 28 06:40:01 2023
    On Mon, 2023-02-27 at 01:45 +0100, Roberto A. Foglietta wrote:

    Because the principle of the copyright existence is about protecting
    the authors' exclusive of that {business, commercial, marketing}
    rights.

    The purpose of copyright is allegedly (in the USA) "To promote the
    Progress of Science and useful Arts, by securing for limited Times to
    Authors and Inventors the exclusive Right to their respective Writings
    and Discoveries.". The author's rights are *secondary*, which is why
    fair use exists. Of course these days copyright isn't very time-limited
    and in the age of DMCA/DRM, video mashups, fanfic and supporter-funded
    creators, copyright just ends up limiting progress in many fields,
    another reason why fair use is important. Making fair use only
    available for non-commercial uses would almost destroy it.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise

  • From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Tue Feb 28 09:10:01 2023
    On Tue, 28 Feb 2023 at 08:33, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:


    One more thing about this following:


    2. if an author does not exercise a right for a long period of time
    enforcing it then that right is lost for the principle of "usucapio"
    in latin

    The "usucapio" principle might not be easily recognised by some
    common-law legal systems but for sure it is in every latin-law legal
    system. However, I am quite optimistic that common-law legal systems
    can recognise that principle because also common-law legal systems are indirectly based on latin-law foundations by historical ramification.

    Best regards, R-

  • From Roberto A. Foglietta@21:1/5 to Paul Wise on Tue Feb 28 09:10:01 2023
    On Tue, 28 Feb 2023 at 06:31, Paul Wise <pabs@debian.org> wrote:

    On Mon, 2023-02-27 at 01:45 +0100, Roberto A. Foglietta wrote:

    Because the principle of the copyright existence is about protecting
    the authors' exclusive of that {business, commercial, marketing}
    rights.

    The purpose of copyright is allegedly (in the USA) "To promote the
    Progress of Science and useful Arts, by securing for limited Times to
    Authors and Inventors the exclusive Right to their respective Writings
    and Discoveries.".

    A nice political manifesto. However historically it was

    - a nice idea someone suggested to their queen
    - a way to control and exercise the censorship
    - a regalia for someone: salt, pepper, copyright

    https://en.wikipedia.org/wiki/History_of_copyright (*)

    Across almost all of its different implementations, copyright had one
    thing in common:

    - protecting the authors' exclusive right to restrict anyone from using
    their work without the author's consent

    Translated into modern words: the authors' exclusive {business,
    commercial, marketing} rights over their work. The idea of fair use
    came much later, to boost social benefits rather than to make authors
    rich, but one thing was innovative in copyright since the beginning

    - a monopoly established in favour of a class of "many individuals"

    I used "many individuals" here because "people" came from latin
    "plebis" and using "people" in this context would not be
    etymologically nor historically correct.

    Moreover, because a monopoly established in favour of individuals is a
    "right" like the right to own something real - this
    explains the roots of the "intellectual property" term - and this
    right is established by a law, then two things happen:

    1. if the law is not respected by almost all the people then it is not
    enforceable in a trial, on the principle that it has not been accepted by
    those it is supposed to rule over; enforcing it against one person in
    particular is a very nasty discrimination while almost all others are left
    free to not abide by that law.

    2. if an author does not exercise a right for a long period of time
    enforcing it then that right is lost for the principle of "usucapio"
    in latin

    This explains why someone - once has been informed that because they
    did not enforce a part of their right for a very long period of time,
    then that part of their right has been lost - tried to reclaim also
    those parts of their right. Unfortunately when the {speaking,
    intention, actions} are not properly aligned a late attempt is much
    worse than accepting the status quo. The way to hell is paved with
    good intentions and poor implementations.

    CONCLUSION

    I am very happy to know that the US copyright law has a very nice
    political manifesto but the way in which a law is applied defines its
    principle. The application defines the law's aim or, in other
    words, the good that it is going to protect and promote. [sarcasm on]
    After all, the U.S. are very notorious for free schooling and
    universities, right? [sarcasm off].

    NOTE

    (*) there is much more interesting and complete material about the
    history of copyright, especially in the London public library.

    Best regards, R-

  • From Sam Hartman@21:1/5 to All on Tue Feb 28 15:50:01 2023
    "Roberto" == Roberto A Foglietta <roberto.foglietta@gmail.com> writes:

    Roberto> On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:
    >>
    >> No. It's entirely possible that using databases as training sets
    >> for an AI/ML engine is fair use under existing United States law
    >> and precedent as long as that use is sufficiently transformative
    >> (the first factor of the test, and I suspect the most important
    >> one here).

    Roberto> Considering what you reported in the previous e-mail about
    Roberto> US national law in 17 U.S.C. § 107 in 1976, it is not
    Roberto> possible to use an entire or a significant portion of a
    Roberto> database for {business, commercial, marketing} purposes
    Roberto> without the consent of the copyright holder.

    Please stop!
    It's clear that you are not building support for your argument.
    You've made your case to the best of your ability and not been
    convincing.

    But beyond that, this discussion is no longer on topic for
    debian-project.
    Debian cannot decide what the law is.
    We've established that this situation is complicated.
    You've proposed various things that someone could do to limit the use of
    free software in AI training sets.
    Other people have pointed out that may or may not work.
    You think it will.
    You haven't managed to convince your critics.
    We won't know until this gets hashed out in courts.

    That's about the level of detail appropriate for debian-project.
    Further discussion of this issue at this time on this list does not
    serve the community.

    --Sam

  • From Roberto A. Foglietta@21:1/5 to Sam Hartman on Tue Feb 28 19:40:01 2023
    On Tue, 28 Feb 2023 at 15:36, Sam Hartman <hartmans@suchdamage.org> wrote:

    "Roberto" == Roberto A Foglietta <roberto.foglietta@gmail.com> writes:

    Roberto> On Mon, 27 Feb 2023 at 19:08, Russ Allbery <rra@debian.org> wrote:
    >>
    >> No. It's entirely possible that using databases as training sets
    >> for an AI/ML engine is fair use under existing United States law
    >> and precedent as long as that use is sufficiently transformative
    >> (the first factor of the test, and I suspect the most important
    >> one here).

    Roberto> Considering what you reported in the previous e-mail about
    Roberto> US national law in 17 U.S.C. § 107 in 1976, it is not
    Roberto> possible to use an entire or a significant portion of a
    Roberto> database for {business, commercial, marketing} purposes
    Roberto> without the consent of the copyright holder.

    Please stop!
    It's clear that you are not building support for your argument.
    You've made your case to the best of your ability and not been
    convincing.

    But beyond that, this discussion is no longer on topic for
    debian-project.
    Debian cannot decide what the law is.
    We've established that this situation is complicated.
    You've proposed various things that someone could do to limit the use of
    free software in AI training sets.
    Other people have pointed out that may or may not work.
    You think it will.
    You haven't managed to convince your critics.
    We won't know until this gets hashed out in courts.

    That's about the level of detail appropriate for debian-project.
    Further discussion of this issue at this time on this list does not
    serve the community.

    Ok, then. No problem. This will be my last message on this topic.

    However, my last suggestion here is to collect this material and share
    it with the FSF and FSF Europe. My aim was not to convince people
    (consensus gain) but to give technical details relevant to those who
    have a law education but usually lack the ability to properly
    understand technical IT mechanisms in detail. Only a few have the
    ability to master both sides. It is not about complexity [1], it is
    about complication [2] and the complication arises because IT people
    and law people have two completely different mindsets and risk/value
    perceptions and follow different rules to address them.

    We won't know until this gets hashed out in courts.

    About upgrading A/L/GPLv3 in A/L/GPLv4, it seems to me quite an
    urgent thing to do but challenging it in a court might happen years
    from now. So there is a lot of time for preparation.

    About "usucapio" and related questions, there is a very very little
    probability that someone will ever bring anyone in court and in case a
    very little patch in the kernel will make a huge difference in finding
    an agreement which is well known how it should be. The patch has been
    shared with some kernel maintainers some months ago and it is not
    pending to be applied because I did not complete all the steps
    required. That patch implies license and technical changes in
    perspectives, both.

    Debian cannot decide what the law is.

    Law is somewhat different in different countries, starting from those
    countries in which you have a better chance. There are many of them.
    Do not try to win the world in a single step but play chess instead.
    The king is the last piece to take, not the first one. If you feel in
    danger, grant your position in all the countries in which it is
    feasible and cheap enough. Bringing in allies is the first thing to
    do. Moreover, allies can be cheap for Debian to acquire and very
    costly for your counterparty to move on their side.

    Everyone that has a kind of urgency about doing business can employ me
    and I will set up a near-complete solution for them that I did not
    explain to everyone - oh, it is a risky business, then. No, it is about
    thinking out of the box and replicating the same scheme that worked in
    the past in other similar cases. And yes, this would greatly help the
    Debian community as well because it will break down every illusion
    about finding another way to go.

    Their resistance is futile (cit.) but enjoyable. :-)

    Good luck, R-

    [1] complexity (n.) "composite nature, quality or state of being
    composed of interconnected parts," from complex. Meaning "intricacy".

    [2] complication (n.) late Middle English: from late Latin
    complicatio(n-), from Latin complicare ‘fold together’ - what can be
    folded, can be unfolded (explained).

  • From Roberto A. Foglietta@21:1/5 to roberto.foglietta@gmail.com on Tue Feb 28 20:00:01 2023
    On Tue, 28 Feb 2023 at 19:23, Roberto A. Foglietta <roberto.foglietta@gmail.com> wrote:

    Everyone that has a kind of urgency about doing business can employ me
    and I will set up a near-complete solution for them that I did not
    explain to everyone

    The "near-complete" does not mean that it is work-in-progress. It
    means that it fully covers every significant business case I know.
    Those that are not covered are not significant. For those I do not
    know, I cannot say anything about even if they exist or not.

    Best regards, R-

  • From Richard Stallman@21:1/5 to All on Thu Mar 2 06:00:02 2023
    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > About upgrading A/L/GPLv3 in A/L/GPLv4, it seems to me quite an
    > urgent thing to do but challenging it in a court might happen years
    > from now. So there is a lot of time for preparation.

    Making a new version of the GPL is a big effort, and I'm the one who
    has to lead it. I have not been able to follow this discussion; it
    was long and complicated. If it described a reason to change the GPL,
    I could not see it.

    Would you please tell me the problem that you think the GPL needs to
    be changed for?

    --
    Dr Richard Stallman (https://stallman.org)
    Chief GNUisance of the GNU Project (https://gnu.org)
    Founder, Free Software Foundation (https://fsf.org)
    Internet Hall-of-Famer (https://internethalloffame.org)

  • From Roberto A. Foglietta@21:1/5 to Richard Stallman on Thu Mar 2 11:40:01 2023
    On Thu, 2 Mar 2023 at 05:31, Richard Stallman <rms@gnu.org> wrote:

    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > About upgrading A/L/GPLv3 in A/L/GPLv4, it seems to me quite an
    > urgent thing to do but challenging it in a court might happen years
    > from now. So there is a lot of time for preparation.

    Making a new version of the GPL is a big effort, and I'm the one who
    has to lead it. I have not been able to follow this discussion; it
    was long and complicated. If it described a reason to change the GPL,
    I could not see it.

    Would you please tell me the problem that you think the GPL needs to
    be changed for?

    Microsoft GitHub Copilot has been shown to reproduce large blocks of code
    without citing the author/project or indicating the license terms of that
    code. This is only the tip of the iceberg, because the problem is much
    worse than this and it will worsen quickly. The debate is about fair use,
    but that is a blurry definition, and no definition of what is not
    "fair use" seems to have gained enough consensus. Thus a general and
    standard solution is required, IMHO.

    - A/L/GPLv3 applies to source code and scripts that should be compiled
    or run by an interpreter (not only but just to be specific)

    - the AI/ML training engines use source code and scripts as data, this
    might or might not be a fair use, but for sure is a novelty which is
    not covered by A/L/GPLv3

    - then I decided to protect my projects repositories as database
    (collection) in addition to the standard way to protect the code with
    a well-known license

    - because of the copyright law about databases, if someone creates a
    larger database that contains my database or a part of it, then they
    have to comply with the license that I choose to protect my project as
    a database.

    At this point it is necessary to report how upgrading these licenses
    has been proposed, but first a brief summary of fair use:

    - fair use as legal term is a blurry one

    - fair use cannot be limited, only expanded, by the authors/licenses

    - fair use should include {testing, learning, storage} and usually it does

    - fair use cannot include {business, commercial, marketing} rights in
    any way and in any conditions and can relax these rights only a little
    bit and for those activities/professions that have a clear social
    role/value.

    To better understand this point of view, I suggest digging into the
    history of copyright. The London public library has a lot of material
    about it considering that the UK was one of the first countries to
    develop the law further than a mere top-down dictate.

    THE PROPOSAL

    A/L/GPLv4 is an update in which it will clearly state that the license
    applies to the composition and the {business, commercial, marketing}
    rights are reserved and exchanged for freedom. Then the license
    presents a "fair use" open definition in which some rights {testing,
    learning} are clearly included. Everything else should be brought back
    in these two categories. Finally, the license should state that every
    collection item that does not have its own specific copyright and
    license note/header is licensed under A/GPLv3.

    So, in the simplest case, in which no file carries a specific
    copyright note/header but only the repository does, this happens:

    - git repository A is licensed with A/GPLv4
    - the composition is under A/GPLv4
    - every file is under A/GPLv3

    Thus this equation takes place:

    copyright : money ~alike~ copyleft : freedom

    and the definition of "fair use" is not intended to "change the law"
    but to give a standard interpretation of a blurry definition that
    exists in all legislations but is perceived and written differently
    in each. Because the A/L/GPLv4 will have a global scope, its
    "fair use" definition/clarification will help many countries to align
    to a standard definition and interpretation. We cannot change the law,
    but we can help those who do that job to converge toward a standard and
    reasonable definition.

    Moreover, I suggest recalling in the license that without the moral
    rights {authorship} the copyright itself has no meaning and thus all
    related rights are void. This is just to remind those companies that
    are used to removing the names of authors from their source code
    headers, in such a way that nobody, not even an internal inspection,
    can find them and verify that all the rights have been properly and
    legally transferred. Again, this would not change the world but would
    make developers aware of their rights. Education is as important as
    influencing rulings in court, especially in open-source
    software-libre.

    Everything above, IMHO and in the hope that it helps.

    Best regards,
    --
    Roberto A. Foglietta
    +49.176.274.75.661
    +39.349.33.30.697

  • From Bradley M. Kuhn@21:1/5 to All on Fri Mar 3 05:30:01 2023
    Hey, everyone, as many of you probably know, I've been involved with many of the GPL and AGPL enforcement efforts that are (publicly) known to have happened in the USA since 1999, and also have been involved with the drafting process
    of various copyleft licenses. I currently am continuing that ongoing work along with my colleagues at Software Freedom Conservancy (SFC).

    From that context and point of view, there are three main points I want to contribute to this discussion:

    Point 0:

    Always keep in the forefront of your mind that the complexity of legal issues and enforcement of licenses lags technology by a period measured in decades. For example, many folks have referenced the Google v. Oracle SCOTUS case, which dealt with questions of software licensing and copyright that we were discussing in the copyleft community as far back as the early 1980s. Yet,
    the case didn't come before SCOTUS for consideration until a few years ago,
    and (on top of that) SCOTUS' decision was complex and didn't really resolve some of the fundamental questions that we all have about how software
    licensing works. Most of the key issues (such as “where is the bright line for when it becomes copyright infringement if you reimplement a known, documented API”?) that we in FOSS were worried about, while they *came up* in Google v. Oracle, they still remain open legal questions in the USA.


    Point 1:

    FOSS licensing doesn't rely solely on copyright law. Yes, grant of a
    copyright license is the fundamental part of all FOSS licenses, but they are also contractual agreements. (For folks unfamiliar with this point, I encourage you to read the material we published at SFC when we filed our contract case against Vizio <https://sfconservancy.org/vizio/>.) So, when
    thinking about these questions, an exclusive focus on copyright questions
    might not be particularly helpful.

    Furthermore, copyright law isn't moral code: it's just an extremely flawed legal system that we're forced to deal with because various regimes decided back in the 1970s that software would be governed by copyright. What
    copyright law says or doesn't say in any particular jurisdiction never
    provides us any moral compass to what is wrong or right for software freedom. We must approach *that* question “a priori” (and as philosophers) because all
    the “a posteriori” explorations of the question in the real world are just too
    heavily biased by the incumbent capitalist structures that serve and/or
    benefit from the proprietarization of software.

    On that point, I do invite everyone over to the mailing list we're hosting at SFC to discuss the morality and ethical implications in FOSS of machine-learning-assisted software development. You can read more about
    this, and subscribe to the mailing list, via: <https://sfconservancy.org/news/2022/feb/23/committee-ai-assisted-software-github-copilot/>

    Point 2:

    There are a number of mistakes FOSS activists have made historically in copyleft licensing creation and drafting. Having been involved myself in the invention and drafting of AGPLv3, and a somewhat-involved witness to the
    GPLv3 drafting process, I learned the hard way that trying to address every “issue of the day” quickly in a copyleft license draft leads to problems.

    A big example appears in the patent provisions found in A/GPLv3§11¶3-6. They are complicated, unnecessarily wordy, and as full of loopholes as the worst
    tax legislation. Admittedly, the primary problem there may be that the drafting process was over-influenced by large patent holders. However, the reason such influence was successful was because of a fervor of concern among FOSS activists about seemingly-urgent patent issues of the day. In
    hindsight, those issues were either moot, or turned out even *worse* than we imagined, and therefore poorly addressed by this section anyway.

    To be abundantly clear so I'm not misunderstood: I'm analyzing these issues in hindsight to help inform our current issues of the day. Lots of really experienced and smart policy people contemporaneously believed
    (probably reasonably) that A/GPLv3§11 was the be-all-end-all of patent language for copyleft. But the behavior and legislation both changed in the intervening years, *and* some seemingly huge problems of those days also seem minuscule in the rear view mirror a decade later, and problems that we
    thought were solved or could be solved stubbornly got worse.

    Most importantly to this point, over the decade after GPLv3's release, lots
    of corporate attorneys pushed heavily anti-GPLv3 agendas — claiming that the patent language was the problem. In fact, after years of work responding to those (as it turned out, specious) criticisms, we later learned that the
    patent language was just a convenient place to hang their hats in their
    broader anti-GPLv3 campaign. So, IMO, we (as a FOSS community) got basically *no* policy gains on patent issues in GPLv3 that we didn't already have in GPLv2, *but* we handed the opposition a bunch of text for them to paint as “big scary reasons” to avoid GPLv3. That's a huge factor in how we ended up
    in the complex GPLv2-only / GPLv3-or-later divide in copyleft circles that we have today.

    IMO, this seemingly unrelated example really shows three key issues highly relevant to the issue of machine-learning and FOSS:

    (a) it's very easy as a FOSS license drafter to be caught up in the issues
    of the day and overcompensate by writing more text into the license
    thinking it's great policy but then it backfires for
    political/social/enforceability/advocacy reasons,

    (b) the echo chambers and deference to incumbent authority that have
    historically dominated FOSS license drafting really have been
    problematic and we've not fully explored how to solve that for future
    drafting, and

    (c) because copyleft is such an amazing invention, we (as a FOSS community)
    have a tendency to see everywhere nails that we think the hammer of
    copyleft can hit — even when they may well be screws, not nails.

    On (c), I point to my current-favorite license, AGPLv3, which I admittedly helped design and draft. Ultimately, AGPLv3 didn't do nearly as much as we'd hoped to solve software rights for network-deployed software, precisely
    because the software freedom and rights issues that come up in such software *can't* fully be addressed merely by a copyleft provision. We erred because
    we didn't see the obvious: a good copyleft license is a *necessary* but not a *sufficient* condition to assure users' software rights and freedoms.

    Furthermore, we didn't carefully consider when building the Affero clause how much it could be abused in proprietary licensing schemes by companies like Neo4j, MongoDB, and others. Specifically, only years later did the community (thanks to Richard Fontana) figure out that a copyleft equality clause was absolutely mandatory to offset the problem (more on this at <https://sfconservancy.org/blog/2020/jan/06/copyleft-equality/>). Proprietary relicensing is more-or-less a relatively simple problem to describe and
    study, yet it took us about 30 years to come up with a copyleft clause that
    can actually address the problem elegantly and in an enforceable way.

    As such, based on all this that I've learned in copyleft drafting, I advise *extreme caution* about rushing to copyleft as an obvious solution to the disturbing things happening with machine learning applications. There may
    well be ways copyleft can be used to fight back against the horrible things that OpenAI, Microsoft's GitHub, and dozens of other for-profit companies are doing with machine learning. However, I'm quite sure that whatever ways we think copyleft can (or can't) be modified/improved/changed/applied to help
    may well turn out to be the wrong decision if we rush.

    The most important thing we can do now is advocacy: first and foremost, we
    need to raise awareness about why this technology is bad for users and
    impedes their software freedom and rights. There are natural allies around, from folks in the visual arts to those who have correctly pointed out that machine learning systems trained on existing data usually propagate the
    biases inherent in past decisions and work. Time spent coalition building
    will serve us better than more navel-gazing at copyleft terms on this front.

    Ultimately, if there *is* a legalistic/licensing solution implementable in copyleft, the right one won't become apparent until the dangers and problems are fully understood by society. Similar to the advent of copyleft itself as
    a strategy: proprietary software had to actually become a thing and a problem before we could figure out how to answer it with copyleft.

    Inventing new copyleft terms shouldn't be the first place we run to when
    facing a threat to software rights or freedom; it should be a solution used only sparingly when we're sure no other solution (including, most
    importantly, enforcing the copyleft terms that we *have* already) will work
    to address the problem.

    -- bkuhn

  • From Richard Stallman@21:1/5 to All on Sat Mar 4 05:50:01 2023
    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > - then I decided to [restrict] my projects repositories as database
    > (collection) in addition to the standard way to [restrict] the code with
    > a well-known license

    I absolutely reject using the word "protect" to describe what copyright does.

    > - fair use cannot include {business, commercial, marketing} rights in
    > any way

    My understanding is that it sometimes does permit commercial use of
    material, but mostly it does not. Fair use depends on the purpose of
    the use. If the work is published commercially for education, for
    instance, it might be fair use.

    > A/L/GPLv4 is an update in which it will clearly state that the license
    > applies to the composition and the {business, commercial, marketing}
    > rights are reserved and exchanged for freedom.

    I cannot concretely understand "the XYZ rights are reserved and
    exchanged for freedom." Are you proposing a substantive change in
    what people can do with a GPL-covered work, or an implementation
    change intended to result in roughly the same permissions as now?

    > Then the license
    > presents a "fair use" open definition in which some rights {testing,
    > learning} are clearly included. Everything else should be brought back
    > in these two categories.

    I don't understand "brought back in these two categories".

    > So, in the most simple case in which no any file report a specific
    > copyright note/header but just the repository, then this happens:

    > - git repository A is licensed with A/GPLv4
    > - the composition is under A/GPLv4
    > - every file is under A/GPLv3

    I think that is true already.

    --
    Dr Richard Stallman (https://stallman.org)
    Chief GNUisance of the GNU Project (https://gnu.org)
    Founder, Free Software Foundation (https://fsf.org)
    Internet Hall-of-Famer (https://internethalloffame.org)

  • From Roberto A. Foglietta@21:1/5 to Richard Stallman on Sat Mar 4 09:10:01 2023
    On Sat, 4 Mar 2023 at 05:16, Richard Stallman <rms@gnu.org> wrote:

    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]


    Dear Richard,

    I do not personally know "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
    but I very much appreciate his answer, in which he makes several points

    https://lists.debian.org/debian-project/2023/03/msg00004.html

    Please, focus on his answer instead of mine. As I wrote to you in
    private, I have nothing to add on this subject anymore.

    Collaboration is the key to success.

    Best regards, R-

  • From Richard Stallman@21:1/5 to All on Wed Mar 15 05:20:01 2023
    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > I do not know personally "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
    > but I appreciate very much his answer in which he set several points

    > https://lists.debian.org/debian-project/2023/03/msg00004.html

    I will take a look. Thanks.

    --
    Dr Richard Stallman (https://stallman.org)
    Chief GNUisance of the GNU Project (https://gnu.org)
    Founder, Free Software Foundation (https://fsf.org)
    Internet Hall-of-Famer (https://internethalloffame.org)

  • From Roberto A. Foglietta@21:1/5 to Richard Stallman on Wed Mar 22 05:20:01 2023
    On Wed, 15 Mar 2023 at 04:44, Richard Stallman <rms@gnu.org> wrote:

    [[[ To any NSA and FBI agents reading my email: please consider ]]]
    [[[ whether defending the US Constitution against all enemies, ]]]
    [[[ foreign or domestic, requires you to follow Snowden's example. ]]]

    > I do not know personally "Bradley M. Kuhn" <bkuhn@sfconservancy.org>
    > but I appreciate very much his answer in which he set several points

    > https://lists.debian.org/debian-project/2023/03/msg00004.html

    I will take a look. Thanks.


    March 16, 2023 - Copyright Office Launches New Artificial Intelligence Initiative by Copyright and Artificial Intelligence

    - https://www.copyright.gov/ai/

    U.S. Copyright Office Weighs in on the AI Debate

    The U.S. Copyright Office has weighed in on the debate and have
    ultimately assessed that only human-made works are eligible for
    protection. In a report published last week, the Office cites a 2018
    submission in which the applicant described their work as
    “autonomously created by a computer algorithm running on a machine.”
    After a series of appeals, the artwork was ultimately denied a
    copyright because it was made “without any creative contribution from
    a human actor.”

    The Office explained further:

    “For example, if a user instructs a text-generating technology to
    “write a poem about copyright law in the style of William
    Shakespeare,” she can expect the system to generate text that is
    recognizable as a poem, mentions copyright, and resembles
    Shakespeare’s style. But the technology will decide the rhyming
    pattern, the words in each line, and the structure of the text. When
    an AI technology determines the expressive elements of its output, the generated material is not the product of human authorship.”

    - https://hypebeast.com/2023/3/u-s-copyright-office-ai-report

    Best regards, R-
