• Request for comments: Scorpion protocol/file-format

    From news@zzo38computer.org.invalid@21:1/5 to All on Sun Apr 7 18:04:52 2024
    XPost: comp.protocols.misc

    I would like to see other people's criticism of the Scorpion protocol
    and file format that I have made up. It is an alternative to HTTP/HTML,
    Gemini, Gopher, Spartan, etc.

    Note that it won't (and is not intended to) replace any of those; you can
    even link between them easily (which is intentional). (Gopher requires the
    use of a hack to handle this properly, but nevertheless it works OK.)

    You can access the specification document by:
    echo 'R scorpion://zzo38computer.org/specification.txt' | nc zzo38computer.org 1517 | less
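
    For readers without nc, the command above can be approximated in Python. This is only a sketch: it assumes nothing beyond what the one-liner shows, namely that the request is the text "R <url>" followed by a newline (which is what echo sends), that the server listens on TCP port 1517, and that the response ends when the server closes the connection. The names build_request and fetch are hypothetical; the specification itself is the authority on the request format.

```python
import socket


def build_request(url: str) -> bytes:
    # Mirrors `echo 'R <url>'`: the request text plus a trailing newline.
    return f"R {url}\n".encode("ascii")


def fetch(url: str, host: str, port: int = 1517, timeout: float = 10.0) -> bytes:
    # Send one request and read until the server closes the connection,
    # which is roughly what piping echo into nc does.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(url))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

    Calling fetch("scorpion://zzo38computer.org/specification.txt", "zzo38computer.org") would then return the raw response bytes, including whatever response header the protocol defines.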

    Alternatively, it can be accessed by GitHub:
    https://github.com/zzo38/scorpion/blob/trunk/Specification

    The document may be changed in the future in case something is
    wrong with it (including if something is missing). (I can also add more
    FAQ entries if you have other frequent questions.)

    If you want to criticize this, then in addition to the above document,
    you should also be familiar with section 7 of the Gemini protocol FAQ:
    echo 'gemini://geminiprotocol.net/docs/faq-section-7.gmi' | ncat --ssl geminiprotocol.net 1965 | less

    My process is different from that described in the Gemini FAQ in many ways, although there are some similarities, and much of what is described there
    is still relevant to what I am doing.

    Scorpion protocol/file-format is not intended to be a strict subset or
    strict superset of anything else. However, it is intended to be simpler
    and less messy than the alternatives, in many ways.

    --
    Don't laugh at the moon when it is day time in France.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sean@conman.org@21:1/5 to news@zzo38computer.org.invalid on Mon Apr 8 06:42:47 2024
    XPost: comp.protocols.misc

    In comp.infosystems news@zzo38computer.org.invalid wrote:
    I would like to see other people's criticism of the Scorpion protocol
    and file format that I have made up. It is an alternative to HTTP/HTML,
    Gemini, Gopher, Spartan, etc.

    My initial response to the specification:

    First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME.
    If you do use TLAs [1] and ETLAs [2], please define it somewhere in the document for those who are unfamiliar with it.

    Second, URL support ... do you expect people to follow RFC-3986?
    RFC-3987? Or the WHATWG living specification?

    Third: On TLS, methinks you underestimate how difficult it is to check
    the first byte of a request is 0x16 and have an existing TLS library take
    over the connection if it is. I'm not saying it's impossible, just more technically difficult than you may think. Have you implemented a server
    that supports both TLS and non-TLS on the same port?

    Third the second: More TLS---those who like TLS might take offence at support for non-TLS---an attacker can easily MITM [3] requests to force
    non-TLS requests, thus defeating the purpose of TLS in the first place.

    Third the third: There will be a subset of people who hate TLS, and
    demand that you don't use it, but use some other, possibly bespoke,
    encryption system instead. Before taking these people seriously, demand a proof-of-concept and an analysis by real cryptographers before you engage
    with them. It'll save time.

    Third the fourth: What's with the weird SNI support? The client should
    use it, but the server should not? What?

    Third the fifth: What do you mean by "clients SHOULD allow to use the system's DNS services to implement encrypted Client Hello"? And what's with the following? "if implemented, there MUST be an option to disable this feature."

    Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if
    it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

    Fifth: Some server implementor will hard code a 2147483647 on a 4x reply, which is about 68 years. Clients will obviously ignore such a silly request,
    leading to an arms race. Don't bother with a timeout value.

    Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

    Seventh: The Hashed URI section---what? You first said relative URLs
    aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just
    listing their names? This is getting complicated, quickly.

    Eighth: oh, a new document format. Nice. Binary HTML. Even better.
    Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained
    about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get acceptance of this, or the following database file format.

    Ninth: ".special/crawl"? Really? Not "/robots.txt"? Or "/.well-known/robots.txt"? Sigh. Even Gemini repurposed "/robots.txt", a
    well known and supported format. But if you insist on a new format, perhaps
    an example (or four) could be included?

    Tenth: What is the purpose of ".special/conversion"? What file formats
    to what file formats?

    Thus ends my initial reaction to the specification.

    -spc

    [1] Three Letter Acronym

    [2] Extended Three Letter Acronym

    [3] Man-in-the-Middle

  • From news@zzo38computer.org.invalid@21:1/5 to sean@conman.org on Mon Apr 8 16:06:58 2024
    XPost: comp.protocols.misc

    Thank you for your comments. I will try to respond to them the best that I
    can, and will add whatever is necessary to the FAQ as well as to modify
    other parts of the document as appropriate.

    Some of the changes mentioned below I have done; others I have partially
    done or not added yet. I will continue to work on it later, though.

    sean@conman.org wrote:
    First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME.
    If you do use TLAs [1] and ETLAs [2], please define it somewhere in the document for those who are unfamiliar with it.

    Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

    Second, URL support ... do you expect people to follow RFC-3986?
    RFC-3987? Or the WHATWG living specification?

    RFC 3986. (However, the "hashed:" scheme has its own rules.)

    Third: On TLS, methinks you underestimate how difficult it is to check
    the first byte of a request is 0x16 and have an existing TLS library take over the connection if it is. I'm not saying it's impossible, just more technically difficult than you may think. Have you implemented a server
    that supports both TLS and non-TLS on the same port?

    I thought you could use recv with the MSG_PEEK flag. (However, I did not actually try that (yet). If I am wrong, then you can tell me what is wrong
    with that, please.)
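
    The MSG_PEEK idea can be sketched as follows. This is an untested-in-production illustration, not the server's actual code: it assumes a blocking, already-accepted TCP socket, and the helper name looks_like_tls is hypothetical. The value 0x16 is the TLS handshake record type discussed above.

```python
import socket

TLS_HANDSHAKE = 0x16  # content type of the TLS record that carries a ClientHello


def looks_like_tls(conn: socket.socket) -> bool:
    # Peek at one byte without removing it from the receive buffer, so that
    # whichever parser runs next (TLS library or plain-text request reader)
    # still sees the complete stream from its first byte.
    first = conn.recv(1, socket.MSG_PEEK)
    return len(first) == 1 and first[0] == TLS_HANDSHAKE
```

    If the check returns True, a server could then hand the untouched socket to a TLS library (in Python, for example, SSLContext.wrap_socket with server_side=True); otherwise it reads the request line directly. Whether a given TLS library accepts a socket in this state is something to verify per library.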

    Third the second: More TLS---those who like TLS might take offence at support for non-TLS---an attacker can easily MITM [3] requests to force non-TLS requests, thus defeating the purpose of TLS in the first place.

    An implementation may allow the user to configure it to not use non-TLS for some (or all) servers. (This is similar to "HTTPS-Everywhere", but it is
    not specific to HTTP(S).)

    Additionally, the client is supposed to display a warning message if a
    redirect from TLS to non-TLS (or vice-versa) occurs.

    I think non-TLS has benefits such as improved simplicity and improved
    energy efficiency. However, sometimes encryption is desirable, so TLS
    is permitted, too.

    Third the third: There will be a subset of people who hate TLS, and
    demand that you don't use it, but use some other, possibly bespoke, encryption system instead. Before taking these people seriously, demand a proof-of-concept and an analysis by real cryptographers before you engage with them. It'll save time.

    I have considered that, and have decided against it (at least for now), for
    the reasons you specify, and for reasons mentioned in the Gemini FAQ (see section 4.5.3). So, for now, it uses TLS.

    Third the fourth: What's with the weird SNI support? The client should use it, but the server should not? What?

    Maybe it is unclear. What I mean is that the server shouldn't require SNI
    since the host name is included in the request anyways.

    However, possibly SNI might be needed for the server to present the proper certificate to the client; if that is the case, then the server may present
    an invalid certificate when the wrong (or no) SNI is used.

    Third the fifth: What do you mean by "clients SHOULD allow to use the system's DNS services to implement encrypted Client Hello"? And what's with the following? "if implemented, there MUST be an option to disable this feature."

    Perhaps my specification is unclear. However, I am not sure how to write it more clearly.

    Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

    OK. I added it.

    Fifth: Some server implementor will hard code a 2147483647 on a 4x reply, which is about 68 years. Clients will obviously ignore such a silly request, leading to an arms race. Don't bother with a timeout value.

    OK, it is a good point. Even in Gemini protocol they suggested removing the time specification in a 4x reply.

    Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

    OK, I will add that; it is a good idea.

    Seventh: The Hashed URI section---what? You first said relative URLs aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just listing their names? This is getting complicated, quickly.

    It is correct that relative URLs aren't allowed in a request, although hashed: URLs are not necessarily relative (although they can be). Anyway,
    a hashed: URL isn't useful in a request (some servers might allow
    them in proxied requests if the URL after the comma is absolute, but this
    is generally discouraged).

    Its use is that links to files can specify the hash so that you can verify
    on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).
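
    The client-side check described here might look like the sketch below. It deliberately does not parse the "hashed:" scheme, whose syntax and algorithm numbering are defined only in the specification; it assumes the client has already extracted an expected digest from a link, and it assumes SHA-256 purely for illustration. The function name is hypothetical.

```python
import hashlib
import hmac


def matches_expected_hash(data: bytes, expected_hex: str) -> bool:
    # SHA-256 is an assumption for this example; the specification
    # defines which algorithms a hashed: link may actually name.
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_hex.lower())
```

    A client would download the file, call matches_expected_hash on the received bytes, and warn the user (or refuse to display the file) when the result is False.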

    Eighth: oh, a new document format. Nice. Binary HTML. Even better.
    Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get acceptance of this, or the following database file format.

    The internet is supposed to be big-endian, isn't it? Although I think that little-endian is better (independently of which computers use it), I think
    that it isn't so significant that it is worth violating the convention
    of the internet in this way. (Also, uxn is big-endian.)
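
    To make the byte-order point concrete, here is a small sketch of how the same integer is laid out in each order. The 32-bit field width is an assumption chosen for the example and is not taken from the specification; read_u32_be is a hypothetical helper.

```python
import struct

value = 0x01020304
big = struct.pack(">I", value)     # big-endian ("network order")
little = struct.pack("<I", value)  # little-endian


def read_u32_be(buf: bytes, offset: int = 0) -> int:
    # Decode a 32-bit unsigned big-endian integer starting at `offset`.
    return int.from_bytes(buf[offset:offset + 4], "big")
```

    Either order costs a parser one explicit conversion call, which is why the choice matters more for convention than for implementation effort.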

    A text-based format would be much more difficult for the client to parse, to have to handle difficult escaping and nesting and other stuff like that. A binary format will be simpler, especially a "flat" one such as this one,
    rather than being nested like HTML and XML.

    There are a few possibilities for how to write the document, such as using
    a specialized editor, or using a converter or a static site generator.

    Ninth: ".special/crawl"? Really? Not "/robots.txt"? Or "/.well-known/robots.txt"? Sigh. Even Gemini repurposed "/robots.txt", a well known and supported format. But if you insist on a new format, perhaps an example (or four) could be included?

    I think that there are problems with the robots.txt format, including a possible confusion of what is mandatory and optional.

    I will add an example because you are correct it is a good idea to do so.
    (I did not add it yet; sorry. I will do so later.)

    Tenth: What is the purpose of ".special/conversion"? What file formats
    to what file formats?

    Any file formats to any file formats.

    --
    Don't laugh at the moon when it is day time in France.

  • From sean@conman.org@21:1/5 to news@zzo38computer.org.invalid on Tue Apr 9 04:06:28 2024
    XPost: comp.protocols.misc

    In comp.infosystems news@zzo38computer.org.invalid wrote:
    Thank you for your comments. I will try to respond to them the best that I can, and will add whatever is necessary to the FAQ as well as to modify
    other parts of the document as appropriate.

    Some of the changes mentioned below I have done; others I have partially
    done or not added yet. I will continue to work on it later, though.

    sean@conman.org wrote:
    First, what is ULFI? All I bring up when I search on that is "Upper Limb Functional Index"---I can't seem to locate anything that is close to MIME. If you do use TLAs [1] and ETLAs [2], please define it somewhere in the
    document for those who are unfamiliar with it.

    Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

    You go into some detail, but not enough to answer "what is this for?" It looks like it's supposed to replace MIME but ... how? There are no
    examples, and a web search only brings up references to unordered lists in HTML.

    Third: On TLS, methinks you underestimate how difficult it is to check
    the first byte of a request is 0x16 and have an existing TLS library take
    over the connection if it is. I'm not saying it's impossible, just more
    technically difficult than you may think. Have you implemented a server
    that supports both TLS and non-TLS on the same port?

    I thought you could use recv with the MSG_PEEK flag. (However, I did not actually try that (yet). If I am wrong, then you can tell me what is wrong with that, please.)

    You aren't wrong, but such a method isn't mentioned that much (if at all)
    in most networking tutorials, and if you are going for implementation simplicity (which you haven't explicitly stated) then yes, this is "more technically difficult than you may think." I would try an implementation
    before pushing for this myself. This was never done for HTTP---I wonder
    why?

    Fourth: impose a hard limit on clients following redirects. I know from experience that if this isn't mandatory, no one will implement it. Even if it is mandatory, some won't implement it, but hopefully it'll be a smaller subset who ignore this.

    OK. I added it.

    Not strong enough. In RFC-speak, MUST is stronger (mandatory) than
    SHOULD.

    Sixth: For the sub-protocol I, please use BNF for capability codes. And what's with terminal emulators?

    OK, I will add that; it is a good idea.

    You still lack a description of what this is used for.

    Seventh: The Hashed URI section---what? You first said relative URLs
    aren't allowed in a request, so is this meant for documents? What does the hash buy you here? And why number the hash algorithms instead of just
    listing their names? This is getting complicated, quickly.

    Its use is that links to files can specify the hash so that you can verify
    on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).

    "Spies" that tamper with the file on the server can just as easily tamper with the hash. Tampering in transit is protected though.

    Eighth: oh, a new document format. Nice. Binary HTML. Even better.
    Big endian---I don't mind, but it's not fashionable among kids today (because Intel won; Motorola lost and get over it Boomer!) and will be complained
    about. And by "nice" I mean "oh god!" You'll get people bitching about not being able to include control data with their favorite editors and besides, you're redefining well defined control codes. You are NOT going to get
    acceptance of this, or the following database file format.

    The internet is supposed to be big-endian, isn't it? Although I think that little-endian is better (independently of which computers use it), I think
    that it isn't so significant that it is worth violating the convention
    of the internet in this way. (Also, uxn is big-endian.)

    Yes, most networking protocols for the Internet are big-endian, but man,
    do people complain about it now that Intel won. Besides, there are file formats, like ZIP files, that are little-endian in nature. I'm not arguing
    for little-endian (like I said, I like big-endian myself). I'm just saying
    be prepared for pushback on this.

    A text-based format would be much more difficult for the client to parse, to

    And yet, in ULFI section, you have people parsing

    a.b:c SAME AS c:a.b
    a:b+c:d SAME AS a:b:b.c:d

    and you say "text-based format would be much more difficult for the client
    to parse" with a straight face?

    have to handle difficult escaping and nesting and other stuff like that. A binary format will be simpler, especially a "flat" one such as this one, rather than being nested like HTML and XML.

    One of the big complaints about text/gemini is the lack of nested lists.

    There are a few possibilities for how to write the document, such as using
    a specialized editor, or using a converter or a static site generator.

    From arguments I've seen about binary-data in otherwise text documents
    [3], if it can't be done in an existing editor, it's a non-starter.

    Tenth: What is the purpose of ".special/conversion"? What file formats to what file formats?

    Any file formats to any file formats.

    Why is that a part of a *protocol* specification?

    -spc

    [1] Three Letter Acronym

    [2] Extended Three Letter Acronym

    [3] Whenever CSV (Comma separated values) files come up on Hacker News
    or Lobste.rs, inevitably, someone will mention that ASCII defines
    four explicit separator characters, FS (File Separator), GS (Group
    Separator), RS (Record Separator) and US (Unit Separator) and the
    use of those will fix most problems with CSV. The pushback comes
    when opponents of ASCII separators claim a file that uses such
    characters can't be edited in a normal text editor so STFU! It's so
    bad that people who push for TSV (Tab separated values) will get
    pushback for the (ab)use of tabs in a text file.
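
    For reference, the separator scheme this footnote describes can be sketched in a few lines. US (0x1F) and RS (0x1E) are the ASCII control codes named above; the function names dump_asv and load_asv are hypothetical, and the sketch assumes field values never contain those two characters themselves.

```python
US = "\x1f"  # ASCII Unit Separator: placed between fields
RS = "\x1e"  # ASCII Record Separator: placed between records


def dump_asv(rows):
    # Join fields with US and records with RS; no quoting or escaping is
    # needed as long as the data contains neither control character.
    return RS.join(US.join(fields) for fields in rows)


def load_asv(text):
    # Inverse of dump_asv; an empty string yields no records.
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]
```

    Commas, double quotes, tabs and newlines inside a field survive a round trip unchanged, which is the advantage proponents cite; the cost is the editor support problem described above.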

  • From news@zzo38computer.org.invalid@21:1/5 to sean@conman.org on Wed Apr 10 16:01:25 2024
    XPost: comp.protocols.misc

    sean@conman.org wrote:
    Thank you; I will write about it. (In this context, ULFI is short for "Unordered Labels File Identification".)

    You go into some detail, but not enough to answer "what is this for?" It looks like it's supposed to replace MIME but ... how?

    I explained more below, because you had written another comment below.

    I thought you could use recv with the MSG_PEEK flag. (However, I did not actually try that (yet). If I am wrong, then you can tell me what is wrong with that, please.)

    You aren't wrong, but such a method isn't mentioned that much (if at all)
    in most networking tutorials, and if you are going for implementation simplicity (which you haven't explicitly stated) then yes, this is "more technically difficult than you may think." I would try an implementation before pushing for this myself. This was never done for HTTP---I wonder
    why?

    Implementation simplicity is more important for mandatory parts than for optional parts. Of course TLS is itself complicated, which is one of the reasons for being made optional (although there are other reasons too).

    I will try the implementation; so far I have not implemented TLS in the
    server side at all. However, this will require more changes just to make
    it work with TLS at all, so it might take a while before I will manage to implement this. (Other people are free to write their own implementations,
    and if they want to implement TLS, then they will try this instead.)

    (I have once accessed a HTTPS server that does support this actually,
    although after sending an unencrypted HTTP request on port 443, I received
    a valid HTTP response but it was just an error message that says that unencrypted requests on port 443 are not allowed.)

    Fourth: impose a hard limit on clients following redirects ...
    OK. I added it.

    Not strong enough. In RFC-speak, MUST is stronger (mandatory) than
    SHOULD.

    It says:
    If the number of consecutive redirects exceed the limit (which MUST be
    not more than five by default, although it may be configurable by the
    user), then the client MUST NOT automatically follow further redirects.

    It does say MUST.
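
    A client loop that honours the quoted rule could be sketched like this. The default of five comes from the quoted text; everything else is an assumption for illustration: fetch is a hypothetical callable that returns either ("redirect", new_url) or ("ok", body), and the exception name is made up.

```python
MAX_REDIRECTS = 5  # default limit from the quoted specification text


class TooManyRedirects(Exception):
    pass


def follow(url, fetch, limit=MAX_REDIRECTS):
    # `fetch` is a hypothetical function: ("redirect", new_url) or ("ok", body).
    redirects = 0
    while True:
        kind, value = fetch(url)
        if kind != "redirect":
            return value
        redirects += 1
        if redirects > limit:
            # Stop following automatically; a real client would ask the user.
            raise TooManyRedirects(url)
        url = value
```

    With the default limit, five consecutive redirects are followed and the sixth raises TooManyRedirects instead of being followed.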

    Sixth: For the sub-protocol I ...
    OK, I will add that; it is a good idea.
    You still lack a description of what this is used for.

    The document does explain what it is used for. (If it is unclear, then hopefully someone who can explain it better will be able to do so.)

    Its use is that links to files can specify the hash so that you can verify on the client side that the file has not changed (and that spies have not tampered with it, if the source of the hash is trustworthy).

    "Spies" that tamper with the file on the server can just as easily tamper with the hash. Tampering in transit is protected though.

    That depends on where (and when) you got the hash from.

    (The hash is useful for other purposes too, such as for caching, for finding another copy of the file (if someone has it indexed by its hash then you can verify that it is correct), verifying that if you linked to a file that the file has not been changed since then, etc.)

    TLS does not prevent the server operator from changing the files to
    malicious ones, nor does it prevent some other stuff; TLS does not (and
    cannot) solve everything.

    Yes, most networking protocols for the Internet are big-endian, but man,
    do people complain about it now that Intel won. Besides, there are file formats, like ZIP files, that are little-endian in nature. I'm not arguing for little-endian (like I said, I like big-endian myself). I'm just saying be prepared for pushback on this.

    I understand, but it isn't really a significant issue.

    And yet, in ULFI section, you have people parsing

    a.b:c SAME AS c:a.b
    a:b+c:d SAME AS a:b:b.c:d

    and you say "text-based format would be much more difficult for the client
    to parse" with a straight face?

    It is not generally necessary for implementations to compare ULFI for
    equality. If you are looking for a piece with a specific name then you
    will find it. (This is also true if you are looking for multiple names.)

    It is possible that there are multiple names that an implementation will recognize, with different meanings in each case (it is also possible that
    it will treat multiples with the same meanings), and it might define the priorities to decide which one to use (or use them together if they can).

    As one example where it might use multiples at once, an EPUB file is also
    a ZIP file, and you can easily specify both, so that an implementation that
    can open ZIP archives but not EPUB can still display the ZIP archive (there might also be some command for the user to select explicitly which one to
    use if both are implemented, but usually the implementation would make one
    of them to have priority). MIME does have such a mechanism too, but it
    seems to be just "added on" and is not a clean way to do it, in my opinion.
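
    The EPUB/ZIP fallback described above can be illustrated with a toy selection routine. This is not ULFI syntax or parsing: the plain strings "epub" and "zip" are hypothetical stand-ins for whatever labels the specification defines, and choose_handler is a made-up name.

```python
def choose_handler(labels, handlers, priority):
    # labels:   set of names attached to the file by the server
    # handlers: set of names this implementation knows how to open
    # priority: list of names, most preferred first
    for name in priority:
        if name in labels and name in handlers:
            return name
    return None  # nothing usable; e.g. offer to save the file instead
```

    A client that only understands ZIP would get "zip" for a file labelled with both names, while a client that understands both would get whichever it ranks first.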

    Another alternative to MIME is UTI (used by Apple), which can specify
    that a type conforms to one or more other types. However, this has its own problems: you need all of the definitions in order to compare them, there are no parameters, and a file must always have
    exactly one type that conforms to one or more others (doing it this way is sometimes wrong; e.g. a PostScript file can be text or binary and can be considered as a document or as a program).

    From arguments I've seen about binary-data in otherwise text documents
    [3], if it can't be done in an existing editor, it's a non-starter.

    That is only because such an editor is not written yet; people can try to
    write one if they like. A converter program already exists, so that is another way to do it.

    Tenth: What is the purpose of ".special/conversion"? What file formats to what file formats?

    Any file formats to any file formats.

    Why is that a part of a *protocol* specification?

    It is both the protocol specification and file format specification.

    [3] Whenever CSV (Comma separated values) files come up on Hacker News
    or Lobste.rs, inevitably, someone will mention that ASCII defines
    four explicit separator characters, FS (File Separator), GS (Group
    Separator), RS (Record Separator) and US (Unit Separator) and the
    use of those will fix most problems with CSV. The pushback comes
    when opponents of ASCII separators claim a file that uses such
    characters can't be edited in a normal text editor so STFU! It's so
    bad that people who push for TSV (Tab separated values) will get
    pushback for the (ab)use of tabs in a text file.

    I am aware of this, and I am one of the people who have suggested the use
    of ASCII separated values.

    --
    Don't laugh at the moon when it is day time in France.
