Forum: >>> Magnum BBS <<<

Request for comments: Scorpion protocol/file-format

From news@zzo38computer.org.invalid@21:1/5 to All on Sun Apr 7 18:04:52 2024

XPost: comp.protocols.misc

I would like to hear other people's criticism of the Scorpion protocol
and file format that I have made up. It is an alternative to HTTP/HTML,
Gemini, Gopher, Spartan, etc.

Note that it won't (and is not intended to) replace any of those; you can
even link between them easily (which is intentional). (Gopher requires the
use of a hack to handle this properly, but nevertheless it works OK.)

You can access the specification document with:
echo 'R scorpion://zzo38computer.org/specification.txt' | nc zzo38computer.org 1517 | less

Alternatively, it can be accessed on GitHub:
https://github.com/zzo38/scorpion/blob/trunk/Specification

The document may be changed in the future, in case something is
wrong with it (including if something is missing). (I can also add more
FAQ entries if you have other frequently asked questions.)

If you want to criticize this, then in addition to the above document,
you should also be familiar with section 7 of the Gemini protocol FAQ:
echo 'gemini://geminiprotocol.net/docs/faq-section-7.gmi' | ncat --ssl geminiprotocol.net 1965 | less

My process is different from that described in the Gemini FAQ in many ways, although there are some similarities, and much of what is described there
is still relevant to what I am doing.

Scorpion protocol/file-format is not intended to be a strict subset or
strict superset of anything else. However, it is intended to be simpler
and less messy than the alternatives, in many ways.

--
Don't laugh at the moon when it is day time in France.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From sean@conman.org@21:1/5 to news@zzo38computer.org.invalid on Mon Apr 8 06:42:47 2024

XPost: comp.protocols.misc

In comp.infosystems news@zzo38computer.org.invalid wrote:

I would like to hear other people's criticism of the Scorpion protocol
and file format that I have made up. It is an alternative to HTTP/HTML,
Gemini, Gopher, Spartan, etc.

My initial response to the specification:

First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME.
If you do use TLAs [1] and ETLAs [2], please define it somewhere in the document for those who are unfamiliar with it.

Second, URL support ... do you expect people to follow RFC-3986?
RFC-3987? Or the WHATWG living specification?

Third: On TLS, methinks you underestimate how difficult it is to check
the first byte of a request is 0x16 and have an existing TLS library take
over the connection if it is. I'm not saying it's impossible, just more technically difficult than you may think. Have you implemented a server
that supports both TLS and non-TLS connections on the same port?

Third the second: More TLS---those who like TLS might take offence at support for non-TLS---an attacker can easily MITM [3] requests to force
non-TLS requests, thus defeating the purpose of TLS in the first place.

Third the third: There will be a subset of people who hate TLS, and
demand that you don't use it, but use some other, possibly bespoke,
encryption system instead. Before taking these people seriously, demand a proof-of-concept and an analysis by real cryptographers before you engage
with them. It'll save time.

Third the fourth: What's with the weird SNI support? The client should
use it, but the server should not? What?

Third the fifth: What do you mean by "clients SHOULD allow to use the system's DNS services to implement encrypted Client Hello"? And what's with the following? "if implemented, there MUST be an option to disable this feature."

Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if
it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

Fifth: Some server implementor will hard code a 2147483647 on a 4x reply, which is about 68 years. Clients will obviously ignore such a silly request,
leading to an arms race. Don't bother with a timeout value.
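As a quick check of the arithmetic: 2147483647 is the largest signed 32-bit value, and dividing that many seconds by an average 365.25-day year gives a little over 68 years. The helper name below is made up for illustration.

```c
/* Converts a count of seconds to years, using an average
   (Julian) year of 365.25 days. Illustrative helper only. */
double seconds_to_years(double seconds) {
    const double seconds_per_year = 365.25 * 24.0 * 60.0 * 60.0; /* 31557600 */
    return seconds / seconds_per_year;
}
```

For example, `seconds_to_years(2147483647.0)` comes out at about 68.05.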

Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

Seventh: The Hashed URI section---what? You first said relative URLs
aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just
listing their names? This is getting complicated, quickly.

Eighth: oh, a new document format. Nice. Binary HTML. Even better.
Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained
about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get acceptance of this, or the following database file format.

Ninth: ".special/crawl"? Really? Not "/robots.txt"? Or "/.well-known/robots.txt"? Sigh. Even Gemini repurposed "/robots.txt", a
well known and supported format. But if you insist on a new format, perhaps
an example (or four) could be included?

Tenth: What is the purpose of ".special/conversion"? What file formats
to what file formats?

Thus ends my initial reaction to the specification.

-spc

[1] Three Letter Acronym

[2] Extended Three Letter Acronym

[3] Man-in-the-Middle

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From news@zzo38computer.org.invalid@21:1/5 to sean@conman.org on Mon Apr 8 16:06:58 2024

XPost: comp.protocols.misc

Thank you for your comments. I will try to respond to them the best that I
can, and will add whatever is necessary to the FAQ as well as to modify
other parts of the document as appropriate.

Some of the changes mentioned below I have done; others I have partially
done or not added yet. I will continue to work on it later, though.

sean@conman.org wrote:

First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME.
If you do use TLAs [1] and ETLAs [2], please define it somewhere in the document for those who are unfamiliar with it.

Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

Second, URL support ... do you expect people to follow RFC-3986?
RFC-3987? Or the WHATWG living specification?

RFC 3986. (However, the "hashed:" scheme has its own rules.)

Third: On TLS, methinks you underestimate how difficult it is to check
the first byte of a request is 0x16 and have an existing TLS library take over the connection if it is. I'm not saying it's impossible, just more technically difficult than you may think. Have you implemented a server
that supports both TLS and non-TLS connections on the same port?

I thought you could use recv with the MSG_PEEK flag. (However, I have not actually tried that yet. If I am wrong, then please tell me what is wrong
with it.)
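A minimal sketch of that idea, assuming POSIX sockets and an already-accepted, blocking connection `fd`: peek at the first byte without consuming it, treat 0x16 (the TLS handshake record type) as TLS, and treat anything else (for example the `R` that starts a plain-text request) as non-TLS. The function name is hypothetical, and the hand-off to a TLS library is left out because it depends on which library is used.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns 1 if the first byte waiting on fd is 0x16 (a TLS handshake
   record), 0 if it is some other byte, or -1 on error or end of stream.
   MSG_PEEK leaves the byte in the receive buffer, so whichever code
   path runs next (TLS library or plain-text parser) still sees it. */
int peek_is_tls(int fd) {
    unsigned char first;
    ssize_t n;
    do {
        n = recv(fd, &first, 1, MSG_PEEK);
    } while (n < 0 && errno == EINTR); /* retry if interrupted by a signal */
    if (n != 1)
        return -1;
    return first == 0x16;
}
```

Because the peek consumes nothing, the server does not need to "push back" the byte before handing the socket over.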

Third the second: More TLS---those who like TLS might take offence at support for non-TLS---an attacker can easily MITM [3] requests to force non-TLS requests, thus defeating the purpose of TLS in the first place.

An implementation may allow the user to configure it to not use non-TLS for some (or all) servers. (This is similar to "HTTPS-Everywhere", but it is
not specific to HTTP(S).)

Additionally, the client is supposed to display a warning message if a
redirect from TLS to non-TLS (or vice-versa) occurs.

I think non-TLS has benefits such as improved simplicity and improved
energy efficiency. However, sometimes encryption is desirable, so TLS
is permitted, too.

Third the third: There will be a subset of people who hate TLS, and
demand that you don't use it, but use some other, possibly bespoke, encryption system instead. Before taking these people seriously, demand a proof-of-concept and an analysis by real cryptographers before you engage with them. It'll save time.

I have considered that, and have decided against it (at least for now), for
the reasons you specify, and for reasons mentioned in the Gemini FAQ (see section 4.5.3). So, for now, it uses TLS.

Third the fourth: What's with the weird SNI support? The client should use it, but the server should not? What?

Maybe it is unclear. What I mean is that the server shouldn't require SNI
since the host name is included in the request anyway.

However, possibly SNI might be needed for the server to present the proper certificate to the client; if that is the case, then the server may present
an invalid certificate when the wrong (or no) SNI is used.

Third the fifth: What do you mean by "clients SHOULD allow to use the system's DNS services to implement encrypted Client Hello"? And what's with the following? "if implemented, there MUST be an option to disable this feature."

Perhaps my specification is unclear. However, I am not sure how to write it more clearly.

Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

OK. I added it.

Fifth: Some server implementor will hard code a 2147483647 on a 4x reply, which is about 68 years. Clients will obviously ignore such a silly request, leading to an arms race. Don't bother with a timeout value.

OK, it is a good point. Even for the Gemini protocol, it has been suggested that the time specification be removed from a 4x reply.

Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

OK, I will add that; it is a good idea.

Seventh: The Hashed URI section---what? You first said relative URLs aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just listing their names? This is getting complicated, quickly.

It is correct that relative URLs aren't allowed in a request, although hashed: URLs are not necessarily relative (although they can be). Anyway,
they aren't useful in a request (although some servers might allow
them in proxied requests (if the URL after the comma is absolute), but this
is generally discouraged).

Its use is that links to files can specify the hash so that you can verify
on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).
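To make the client-side check concrete, here is a minimal sketch in C, assuming the link supplies a SHA-256 digest written as 64 lowercase hex digits. How the `hashed:` scheme actually numbers its algorithms and encodes the digest is defined in the specification and is not reproduced here; `verify_sha256_hex` is a hypothetical name, and the SHA-256 code follows FIPS 180-4 but is not hardened (a real client would more likely call an existing crypto library).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

/* Processes one 64-byte block, updating the eight-word state st. */
static void sha256_block(uint32_t st[8], const uint8_t blk[64]) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];
    uint32_t e = st[4], f = st[5], g = st[6], h = st[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = h + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d;
    st[4] += e; st[5] += f; st[6] += g; st[7] += h;
}

/* Computes the SHA-256 digest of msg[0..len) into out. */
static void sha256(const uint8_t *msg, size_t len, uint8_t out[32]) {
    uint32_t st[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    uint8_t blk[64];
    size_t off = 0;
    for (; len - off >= 64; off += 64)
        sha256_block(st, msg + off);
    size_t rem = len - off;
    memset(blk, 0, sizeof blk);
    memcpy(blk, msg + off, rem);
    blk[rem] = 0x80;                   /* append the single 1 bit */
    if (rem >= 56) {                   /* no room left for the length field */
        sha256_block(st, blk);
        memset(blk, 0, sizeof blk);
    }
    uint64_t bits = (uint64_t)len * 8; /* message length in bits, big-endian */
    for (int j = 0; j < 8; j++)
        blk[63 - j] = (uint8_t)(bits >> (8 * j));
    sha256_block(st, blk);
    for (int j = 0; j < 8; j++) {
        out[4 * j] = (uint8_t)(st[j] >> 24);
        out[4 * j + 1] = (uint8_t)(st[j] >> 16);
        out[4 * j + 2] = (uint8_t)(st[j] >> 8);
        out[4 * j + 3] = (uint8_t)st[j];
    }
}

/* Returns 1 if SHA-256(data) equals expected_hex (64 lowercase hex digits), else 0. */
int verify_sha256_hex(const uint8_t *data, size_t len, const char *expected_hex) {
    uint8_t digest[32];
    char hex[65];
    sha256(data, len, digest);
    for (int i = 0; i < 32; i++)
        snprintf(hex + 2 * i, 3, "%02x", digest[i]);
    return strlen(expected_hex) == 64 && memcmp(hex, expected_hex, 64) == 0;
}
```

The client hashes what it actually received and compares that with the digest from the link; as the paragraph above says, this only helps if the digest itself came from a trustworthy source.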

Eighth: oh, a new document format. Nice. Binary HTML. Even better.
Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get acceptance of this, or the following database file format.

The Internet is supposed to be big-endian, isn't it? Although I think that little-endian is better (regardless of which byte order computers use), I think
that it isn't significant enough to be worth violating the
Internet convention in this way. (Also, uxn is big-endian.)
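On the practical side, the byte order of a file format costs little either way when fields are decoded with shifts rather than by casting memory, because the same code then works on big- and little-endian hosts. A generic sketch in C, assuming 32-bit fields; the actual field layout of the Scorpion document format is given in the specification and not reproduced here, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Writes v into p[0..3], most significant byte first (big-endian). */
void put_be32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Reads a big-endian 32-bit value from p[0..3], independent of host byte order. */
uint32_t get_be32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* The little-endian equivalent (the order ZIP headers use) costs exactly the same. */
uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
           (uint32_t)p[1] << 8 | (uint32_t)p[0];
}
```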

A text-based format would be much more difficult for the client to parse, since it would have to handle difficult escaping and nesting and other stuff like that. A binary format will be simpler, especially a "flat" one such as this one,
rather than being nested like HTML and XML.

There are a few possibilities for how to write the document, such as using
a specialized editor, or using a converter or a static site generator.

Ninth: ".special/crawl"? Really? Not "/robots.txt"? Or "/.well-known/robots.txt"? Sigh. Even Gemini repurposed "/robots.txt", a well known and supported format. But if you insist on a new format, perhaps an example (or four) could be included?

I think that there are problems with the robots.txt format, including a possible confusion of what is mandatory and optional.

I will add an example, because you are correct that it is a good idea to do so.
(I have not added it yet; sorry. I will do so later.)

Tenth: What is the purpose of ".special/conversion"? What file formats
to what file formats?

Any file formats to any file formats.

--
Don't laugh at the moon when it is day time in France.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From sean@conman.org@21:1/5 to news@zzo38computer.org.invalid on Tue Apr 9 04:06:28 2024

XPost: comp.protocols.misc

In comp.infosystems news@zzo38computer.org.invalid wrote:

Thank you for your comments. I will try to respond to them the best that I can, and will add whatever is necessary to the FAQ as well as to modify
other parts of the document as appropriate.

Some of the changes mentioned below I have done; others I have partially
done or not added yet. I will continue to work on it later, though.

sean@conman.org wrote:

First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME. If you do use TLAs [1] and ETLAs [2], please define it somewhere in the
document for those who are unfamiliar with it.

Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

You go into some detail, but not enough to answer "what is this for?" It looks like it's supposed to replace MIME but ... how? There are no
examples, and a web search only brings up references to unordered lists in HTML.

Third: On TLS, methinks you underestimate how difficult it is to check
the first byte of a request is 0x16 and have an existing TLS library take
over the connection if it is. I'm not saying it's impossible, just more
technically difficult than you may think. Have you implemented a server
that supports both TLS and non-TLS connections on the same port?

I thought you could use recv with the MSG_PEEK flag. (However, I have not actually tried that yet. If I am wrong, then please tell me what is wrong with it.)

You aren't wrong, but such a method isn't mentioned that much (if at all)
in most networking tutorials, and if you are going for implementation simplicity (which you haven't explicitly stated) then yes, this is "more technically difficult than you may think." I would try an implementation
before pushing for this myself. This was never done for HTTP---I wonder
why?

Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

OK. I added it.

Not strong enough. In RFC-speak, MUST is stronger (mandatory) than
SHOULD.

Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

OK, I will add that; it is a good idea.

You still lack a description of what this is used for.

Seventh: The Hashed URI section---what? You first said relative URLs
aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just
listing their names? This is getting complicated, quickly.

Its use is that links to files can specify the hash so that you can verify
on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).

"Spies" that tamper with the file on the server can just as easily tamper with the hash. Tampering in transit is protected though.

Eighth: oh, a new document format. Nice. Binary HTML. Even better.
Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained
about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get
acceptance of this, or the following database file format.

The Internet is supposed to be big-endian, isn't it? Although I think that little-endian is better (regardless of which byte order computers use), I think
that it isn't significant enough to be worth violating the
Internet convention in this way. (Also, uxn is big-endian.)

Yes, most networking protocols for the Internet are big-endian, but man,
do people complain about it now that Intel won. Besides, there are file formats, like ZIP files, that are little-endian in nature. I'm not arguing
for little-endian (like I said, I like big-endian myself). I'm just saying
be prepared for pushback on this.

A text-based format would be much more difficult for the client to parse, since it would

And yet, in ULFI section, you have people parsing

a.b:c SAME AS c:a.b
a:b+c:d SAME AS a:b:b.c:d

and you say "text-based format would be much more difficult for the client
to parse" with a straight face?

have to handle difficult escaping and nesting and other stuff like that. A binary format will be simpler, especially a "flat" one such as this one, rather than being nested like HTML and XML.

One of the big complaints about text/gemini is the lack of nested lists.

There are a few possibilities for how to write the document, such as using
a specialized editor, or using a converter or a static site generator.

From arguments I've seen about binary data in otherwise text documents
[3], if it can't be done in an existing editor, it's a non-starter.

Tenth: What is the purpose of ".special/conversion"? What file formats to what file formats?

Any file formats to any file formats.

Why is that a part of a *protocol* specification?

-spc

[1] Three Letter Acronym

[2] Extended Three Letter Acronym

[3] Whenever CSV (Comma separated values) files come up on Hacker News
or Lobste.rs, inevitably, someone will mention that ASCII defines
four explicit separator characters, FS (File Separator), GS (Group
Separator), RS (Record Separator) and US (Unit Separator) and the
use of those will fix most problems with CSV. The pushback comes
when opponents of ASCII separators claim a file that uses such
characters can't be edited in a normal text editor so STFU! It's so
bad that people who push for TSV (Tab separated values) will get
pushback for the (ab)use of tabs in a text file.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From news@zzo38computer.org.invalid@21:1/5 to sean@conman.org on Wed Apr 10 16:01:25 2024

XPost: comp.protocols.misc

sean@conman.org wrote:

Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

You go into some detail, but not enough to answer "what is this for?" It looks like it's supposed to replace MIME but ... how?

I explained more below, because you had written another comment below.

I thought you could use recv with the MSG_PEEK flag. (However, I have not actually tried that yet. If I am wrong, then please tell me what is wrong with it.)

You aren't wrong, but such a method isn't mentioned that much (if at all)
in most networking tutorials, and if you are going for implementation simplicity (which you haven't explicitly stated) then yes, this is "more technically difficult than you may think." I would try an implementation before pushing for this myself. This was never done for HTTP---I wonder
why?

Implementation simplicity is more important for mandatory parts than for optional parts. Of course TLS is itself complicated, which is one of the reasons it is optional (although there are other reasons too).

I will try the implementation; so far I have not implemented TLS on the
server side at all. However, this will require more changes just to make
it work with TLS at all, so it might take a while before I manage to implement this. (Other people are free to write their own implementations,
and if they want to implement TLS, then they can try this.)

(I have once accessed an HTTPS server that does actually support this;
after I sent an unencrypted HTTP request on port 443, I received
a valid HTTP response, but it was just an error message saying that unencrypted requests on port 443 are not allowed.)

Fourth: impose a hard limit on clients following redirects ...

OK. I added it.

Not strong enough. In RFC-speak, MUST is stronger (mandatory) than
SHOULD.

It says:
If the number of consecutive redirects exceeds the limit (which MUST be
not more than five by default, although it may be configurable by the
user), then the client MUST NOT automatically follow further redirects.

It does say MUST.
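A minimal sketch of a client loop that enforces the quoted rule, assuming a default limit of five. `fetch_fn`, `follow_redirects`, and the demo redirect table (with its `scorpion://example.org/...` URLs) are hypothetical; a real client would issue actual requests and tell the user when it stops.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REDIRECTS 5 /* default limit; the spec says it MUST be no more than five */

/* Hypothetical request hook: returns the redirect target for url,
   or NULL if the response is not a redirect. */
typedef const char *(*fetch_fn)(const char *url);

/* Follows consecutive redirects starting at url. Returns the final URL, or
   NULL once more than `limit` consecutive redirects have been seen, at which
   point the client stops following them automatically. */
const char *follow_redirects(const char *url, fetch_fn fetch, int limit) {
    int redirects = 0;
    for (;;) {
        const char *next = fetch(url);
        if (next == NULL)
            return url;  /* not a redirect: this is the final URL */
        if (++redirects > limit)
            return NULL; /* limit exceeded: do not follow further */
        url = next;
    }
}

/* Toy redirect table for the usage example (hypothetical URLs). */
static const char *const demo_table[][2] = {
    {"scorpion://example.org/a", "scorpion://example.org/b"},
    {"scorpion://example.org/b", "scorpion://example.org/c"},
    {"scorpion://example.org/loop", "scorpion://example.org/loop"},
};

/* Stand-in for a real request: looks the URL up in demo_table. */
static const char *demo_fetch(const char *url) {
    for (size_t i = 0; i < sizeof demo_table / sizeof demo_table[0]; i++)
        if (strcmp(demo_table[i][0], url) == 0)
            return demo_table[i][1];
    return NULL;
}
```

With the default limit, `.../a` resolves to `.../c` after two redirects, while the self-redirecting `.../loop` gives up after five.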

Sixth: For the sub-protocol I ...

OK, I will add that; it is a good idea.

You still lack a description of what this is used for.

The document does explain what it is used for. (If it is unclear, then hopefully someone who can explain it better is able to do so.)

Its use is that links to files can specify the hash so that you can verify on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).

"Spies" that tamper with the file on the server can just as easily tamper with the hash. Tampering in transit is protected though.

That depends on where (and when) you got the hash from.

(The hash is useful for other purposes too, such as for caching, for finding another copy of the file (if someone has it indexed by its hash then you can verify that it is correct), verifying that if you linked to a file that the file has not been changed since then, etc.)

TLS does not prevent the server operator from changing the files to
malicious ones, nor does it prevent some other stuff; TLS does not (and
cannot) solve everything.

Yes, most networking protocols for the Internet are big-endian, but man,
do people complain about it now that Intel won. Besides, there are file formats, like ZIP files, that are little-endian in nature. I'm not arguing for little-endian (like I said, I like big-endian myself). I'm just saying be prepared for pushback on this.

I understand, but it isn't really a significant issue.

And yet, in ULFI section, you have people parsing

a.b:c SAME AS c:a.b
a:b+c:d SAME AS a:b:b.c:d

and you say "text-based format would be much more difficult for the client
to parse" with a straight face?

It is not generally necessary for implementations to compare ULFI for
equality. If you are looking for a piece with a specific name then you
will find it. (This is also true if you are looking for multiple names.)

It is possible that there are multiple names that an implementation will recognize, with different meanings in each case (it is also possible that
it will treat several of them as having the same meaning), and it might define priorities to decide which one to use (or use them together if it can).

As one example where it might use multiples at once, an EPUB file is also
a ZIP file, and you can easily specify both, so that an implementation that
can open ZIP archives but not EPUB can still display the ZIP archive (there might also be some command for the user to select explicitly which one to
use if both are implemented, but usually the implementation would give one
of them priority). MIME does have such a mechanism too, but it
seems to be just "added on" and is not a clean way to do it, in my opinion.
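A minimal sketch of the lookup described above, under loudly stated assumptions: a ULFI string is treated here only as ':'-separated labels whose order does not matter (as the `a.b:c SAME AS c:a.b` example quoted earlier suggests), the `+` shorthand and any other rules in the specification are not handled, and the labels `epub` and `zip` are hypothetical stand-ins rather than real ULFI names. The client walks its own list of supported labels in priority order and uses the first one the file names.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if label appears as one of the ':'-separated pieces of ulfi.
   Simplified model only: the '+' shorthand is not expanded. */
int ulfi_has_label(const char *ulfi, const char *label) {
    size_t n = strlen(label);
    const char *p = ulfi;
    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == n && strncmp(p, label, n) == 0)
            return 1;
        if (end == NULL)
            return 0;
        p = end + 1;
    }
}

/* Returns the index of the first entry in supported[] (listed in the
   client's priority order) that the ULFI names, or -1 if none match. */
int ulfi_pick(const char *ulfi, const char *const supported[], size_t count) {
    for (size_t i = 0; i < count; i++)
        if (ulfi_has_label(ulfi, supported[i]))
            return (int)i;
    return -1;
}
```

With a ULFI naming both labels, a client whose list is {"epub", "zip"} opens the file as EPUB, while a client that only lists "zip" still opens it as a ZIP archive.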

Another alternative to MIME is UTI (used by Apple), which can specify
that a type conforms to one or more other types. However, this has its own problems: you need all of the definitions in order to compare them, there are no parameters, and a file must always be
exactly one type that conforms to one or more others (doing it this way is sometimes wrong; e.g. a PostScript file can be text or binary and can be considered as a document or as a program).

From arguments I've seen about binary data in otherwise text documents
[3], if it can't be done in an existing editor, it's a non-starter.

That is only because such an editor has not been written yet; people can try to
write one if they like. A converter program already exists, so that is another way to do it.

Tenth: What is the purpose of ".special/conversion"? What file formats to what file formats?

Any file formats to any file formats.

Why is that a part of a *protocol* specification?

It is both the protocol specification and file format specification.

[3] Whenever CSV (Comma separated values) files come up on Hacker News
or Lobste.rs, inevitably, someone will mention that ASCII defines
four explicit separator characters, FS (File Separator), GS (Group
Separator), RS (Record Separator) and US (Unit Separator) and the
use of those will fix most problems with CSV. The pushback comes
when opponents of ASCII separators claim a file that uses such
characters can't be edited in a normal text editor so STFU! It's so
bad that people who push for TSV (Tab separated values) will get
pushback for the (ab)use of tabs in a text file.

I am aware of this, and I am one of the people who have suggested the use
of ASCII separated values.
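For readers unfamiliar with the idea, a minimal sketch of reading ASCII separated values in C: records end at RS (0x1E) and units (fields) end at US (0x1F), so a field can contain commas, tabs, quotes, or newlines without any escaping, provided it never contains those two bytes. `asv_field` is a hypothetical helper, not something from the Scorpion specification.

```c
#include <stddef.h>

#define ASV_RS 0x1E /* Record Separator */
#define ASV_US 0x1F /* Unit Separator */

/* Returns a pointer to unit `unit` of record `rec` (both 0-based) within
   buf[0..len) and stores its length in *flen, or returns NULL if that
   field does not exist. No quoting or escaping is needed. */
const char *asv_field(const char *buf, size_t len, size_t rec, size_t unit, size_t *flen) {
    size_t r = 0, u = 0, start = 0;
    for (size_t i = 0; i <= len; i++) {
        int at_end = (i == len);
        char c = at_end ? 0 : buf[i];
        if (at_end || c == ASV_US || c == ASV_RS) {
            if (r == rec && u == unit) { /* the field just ended at i */
                *flen = i - start;
                return buf + start;
            }
            if (at_end)
                break;
            if (c == ASV_RS) { r++; u = 0; } /* next record */
            else u++;                        /* next unit in this record */
            start = i + 1;
        }
    }
    return NULL;
}
```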

--
Don't laugh at the moon when it is day time in France.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
