• Re: Re: Proposal -- Interpretation of DFSG on Artificial Intelligence

From Stephan Verbücheln@21:1/5 to All on Mon Apr 28 17:50:02 2025
    XPost: linux.debian.vote

    > Is the change technical or legal/philosophical? You could call this
    > a Turing test for copyright.

    This is not a new issue at all. I remember that back in the day, in
    order to legally reverse engineer a computer program, companies had to
    set up two separate teams of developers.
    One team reads the code and writes documentation. The second team reads
    the documentation and writes the new code. It was crucial that no
    member of the second team ever saw the original code, in order to rule
    out any copyright issues.

    Processing of experiences into expert opinion is IMHO not directly
    comparable with the compilation of source code to a binary.
    The DFSG does not only apply to programming languages and program
    binaries. For all data blobs in Debian packages, it is preferred to
    include the scripts that generate them; for images, it is preferred
    to have the SVG source over the generated pixel graphics, etc.
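
    For illustration, a rough sketch of that preference in practice,
    assuming a hypothetical icon.svg and the cairosvg library:

      # Regenerate the shipped pixel graphic from its SVG source at build
      # time, so the package carries the preferred form (the SVG) plus the
      # script that generates the blob. File names are hypothetical.
      import cairosvg

      cairosvg.svg2png(url="icon.svg", write_to="icon.png",
                       output_width=64, output_height=64)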

    There is a reason why the relevant licenses do not define “source
    code” as being written in a programming language readable by humans.
    Instead, they define it like this (example from GPLv3):

    The “source code” for a work means the preferred form of the work for making modifications to it.

    In that definition, training data is quite obviously relevant. No one
    tweaks neural network model weights manually.

    Compare this to the previously mentioned example of S-boxes in
    cryptography. They are small and usually created manually.
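
    To make the contrast concrete, a rough Python sketch; the table shows
    the first entries of the well-known AES S-box, and the weight array
    is a random stand-in for a real model:

      # An S-box is a small, hand-authored lookup table. The table itself
      # is the preferred form for modifying it, i.e. it is its own source.
      # First 16 of the 256 entries of the AES S-box:
      AES_SBOX = [
          0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
          0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
          # ... the remaining 240 entries are elided here
      ]

      # Neural network weights, by contrast, are millions of opaque floats
      # that nobody edits by hand; the preferred form for modifying them
      # is the training data plus the training code.
      import numpy as np

      weights = np.random.randn(1_000_000).astype(np.float32)  # stand-in
      print(f"{weights.size} parameters, e.g. {weights[0]:+.6f}")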

    Regards
    Stephan

  • From Gunnar Wolf@21:1/5 to All on Tue Apr 29 02:10:01 2025
    XPost: linux.debian.vote

    Stephan Verbücheln said [Mon, Apr 28, 2025 at 03:46:27PM +0000]:
    > (...)
    > The “source code” for a work means the preferred form of the work
    > for making modifications to it.
    >
    > In that definition, training data is quite obviously relevant. No one
    > tweaks neural network model weights manually.
    >
    > Compare this to the previously mentioned example of S-boxes in
    > cryptography. They are small and usually created manually.

    I understand that, when you consider trained models as the "thing" to
    be modified, the preferred form of modification is the model itself:
    what RAG does is take a base trained LLM (conferring the "mastery" of
    language) and train over it with the domain-specific knowledge.

  • From Aigars Mahinovs@1:229/2 to All on Mon Apr 28 18:30:01 2025
    XPost: linux.debian.vote
    From: aigarius@gmail.com

    On Mon, 28 Apr 2025 at 17:46, Stephan Verbücheln <verbuecheln@posteo.de> wrote:

    > > Is the change technical or legal/philosophical? You could call this
    > > a Turing test for copyright.
    >
    > This is not a new issue at all. I remember that back in the day, in
    > order to legally reverse engineer a computer program, companies had to
    > set up two separate teams of developers.
    > One team reads the code and writes documentation. The second team reads
    > the documentation and writes the new code. It was crucial that no
    > member of the second team ever saw the original code, in order to rule
    > out any copyright issues.


    But does it? If we consider the product of trained knowledge to be a
    derivative work of the training input, then the documentation produced
    by the first team would also be tainted by the copyright of the
    original code. Such an interpretation therefore defeats the whole
    two-team process.

    And many modern LLMs are actually trained in stages: there is a very
    large model that is trained on the source data, and then there are
    compact models that are trained on the outputs of the first model.
    This is called model distillation.
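
    A minimal sketch of that idea, assuming hypothetical teacher and
    student models (PyTorch-style; not anyone's actual training pipeline):

      # The compact "student" is trained to match the output distribution
      # of the large "teacher" rather than the original source data.
      import torch
      import torch.nn.functional as F

      teacher = torch.nn.Linear(16, 4)  # stand-in for a large trained model
      student = torch.nn.Linear(16, 4)  # compact model being distilled
      optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
      T = 2.0  # temperature softens the teacher's distribution

      for _ in range(100):
          x = torch.randn(32, 16)  # unlabeled example inputs
          with torch.no_grad():
              teacher_probs = F.softmax(teacher(x) / T, dim=-1)
          student_logp = F.log_softmax(student(x) / T, dim=-1)
          loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean")
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()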

    And then there are other methods of getting new information into
    already-trained models at runtime, such as the RAG technique: with
    that, an LLM may contain only fundamental information and then reach
    out to load additional data sources relevant to the specific query,
    like an expert going online and checking prices and availability of
    various products before advising you what to choose for your planned
    build. At this point the LLM+RAG is just a smart web browser.
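
    A minimal sketch of that retrieval step; the corpus, the embed()
    stand-in and the llm() stub are all hypothetical:

      # Fetch documents relevant to the query and prepend them to the
      # prompt, so fresh data enters at query time rather than being
      # baked into the trained weights.
      import numpy as np

      corpus = ["GPU A costs 499 EUR, in stock.",
                "GPU B costs 649 EUR, sold out."]

      def embed(text: str) -> np.ndarray:
          # Stand-in embedding; a real system would use a trained encoder.
          rng = np.random.default_rng(abs(hash(text)) % 2**32)
          return rng.standard_normal(64)

      def retrieve(query: str, k: int = 1) -> list[str]:
          q = embed(query)
          scores = [float(q @ embed(doc)) for doc in corpus]
          return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

      def llm(prompt: str) -> str:
          return prompt  # stub: a real system would call a language model

      def answer(query: str) -> str:
          context = "\n".join(retrieve(query))
          return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

      print(answer("How much does GPU A cost?"))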

    (Sadly, I am *not* an expert on modern AI technologies)

    --
    Best regards,
    Aigars Mahinovs
