• Re: web

    From Computer Nerd Kev@21:1/5 to Ivan Shmakov on Fri Jan 17 07:10:03 2025
    In comp.misc Ivan Shmakov <ivan@siamics.netremove.invalid> wrote:
    On 2025-01-12, Bozo User wrote:
    Once you get a Gopher/Gemini browser, along with yt-dlp, the web can go away.

    While I do appreciate the availability of yt-dlp, I feel like
    a huge part of the reason Chromium is huge is so it can support
    Youtube. Granted, there doesn't seem to be as many DSAs for
    video software (codecs and players) [1], but it's still the
    kind of software I'd rather keep at least in a container.

    You fear that a hacker can upload a YouTube video containing an
    exploit and manage to pass that exploit through YouTube's
    transcoding in order to attack Linux video player programs? Seems
    like a big stretch to me.

    By the by, what's the equivalent of wget(1) for gopher:?

    Curl supports Gopher. Not Gemini though.
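
    For what it's worth, a quick way to try that (just a sketch; it assumes
    a curl build with Gopher support, and gopher.floodgap.com is merely a
    well-known public server used here as an example):

    $ curl -s gopher://gopher.floodgap.com/1/ | less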

    --
    __ __
    #_ < |\| |< _#

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From yeti@21:1/5 to Computer Nerd Kev on Fri Jan 17 04:58:19 2025
    not@telling.you.invalid (Computer Nerd Kev) wrote:

    Curl supports Gopher. Not Gemini though.

    Ncat and Netcat (check the existence of '-c' and '-T') can fetch stuff
    from Gemini servers:

    ------------------------------------------------------------------------
    $ printf 'gemini://geminiprotocol.net/\r\n' \
    | ncat --ssl geminiprotocol.net 1965 | less
    ------------------------------------------------------------------------

    ------------------------------------------------------------------------
    $ printf 'gemini://geminiprotocol.net/\r\n' \
    | nc -c -T noverify geminiprotocol.net 1965 | less
    ------------------------------------------------------------------------

    Wrapping that in a few handfuls of AWK to find links and iterate over
    them should not require deep magic.
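
    A minimal sketch of that link-extraction step (gemtext link lines
    start with "=>", so something like this prints just the link targets;
    it does not resolve relative links):

    ------------------------------------------------------------------------
    $ printf 'gemini://geminiprotocol.net/\r\n' \
    | ncat --ssl geminiprotocol.net 1965 | tr -d '\r' \
    | awk '/^=>/ { sub(/^=>[ \t]*/, ""); print $1 }'
    ------------------------------------------------------------------------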

    Some browsers capable of accessing gemini: can save the fetched files'
    and gemini pages' source; maybe they would even be easier to integrate
    into one's own scripts?

    TL;DR: There is no showstopper.

    --
    Trust me, I know what I'm doing...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ivan Shmakov@21:1/5 to All on Sat Jan 18 14:05:40 2025
    On 2025-01-16, Computer Nerd Kev wrote:
    In comp.misc Ivan Shmakov wrote:
    On 2025-01-12, Bozo User wrote:

    Once you get a Gopher/Gemini browser, along with yt-dlp, the web can go away.

    I. e., my point being: you can't escape web by switching to
    Gopher, because Gopher /is/ web. (Even if 'darker' part of it.)

    While I do appreciate the availability of yt-dlp, I feel like a
    huge part of the reason Chromium is huge is so it can support
    Youtube. Granted, there doesn't seem to be as many DSAs for video
    software (codecs and players), but it's still the kind of software
    I'd rather keep at least in a container.

    You fear that a hacker can upload a YouTube video containing
    an exploit and manage to pass that exploit through YouTube's
    transcoding in order to attack Linux video player programs?
    Seems like a big stretch to me.

    I'm not familiar with how Youtube processes its videos; I've
    never even uploaded anything there myself, much less looked at
    their sources for security issues that might or might not be
    there.

    (I do have experience with Wikimedia Commons, and I'm reasonably
    certain that while they offer processed versions of the user
    uploads, they still keep the originals in publicly accessible
    locations on their servers. Why, I distinctly recall uploading
    a fixed version of someone else's malformed SVG file there.)

    Neither do I have any idea how opposed they would be to requests
    from companies to introduce such security issues deliberately.
    (I believe such hypothetical business entities are usually
    referred to as "MAFIAA" in colloquial speech, but I can't help
    but note that the company that pioneered the approach was in
    fact Sony [1].)

    [1] http://duckduckgo.com/html/?kd=-1&q="sony"+rootkit+controversy

    And even were I to believe for videos downloaded from Youtube
    to never ever have any potential security flaw whatsoever,
    having two copies of video player software installed, one
    within and one without container, would still be ill-advised,
    if only for the reason that I might use an out-of-container
    install for a potentially unsafe, non-Youtube video by accident.

    By the by, what's the equivalent of wget(1) for gopher:?

    Curl supports Gopher. Not Gemini though.

    Curl is my tool of choice for doing API calls; say (JFTR, [2]
    has a couple of complete examples):

    $ curl -iv --form-string comment="New file." \
    -F file=@my.jpeg -F text=\</dev/fd/5 5< my.jpeg.mw \
    --form-string filesize="$(wc -c < my.jpeg)" \
    --form-string token="1337cafe+\\" \
    ... -- https://commons.wikimedia.org/w/api.php\
    "?action=upload&format=xml&assert=user"

    [2] http://am-1.org/~ivan/src/examples-2024/webwatch.mk

    However, I distinctly recall finding it inadequate as a mirroring
    tool back in the day. (Though that might've changed meanwhile.)

    And similarly for yeti's comment in [3]: I try to share what I know
    with others. Such as on IRC. So, suppose someone asks on IRC,
    "how do I get an offline copy of gopher://example.com/?"

    "You can easily write your own Gopher / Gemini recursive
    downloader yourself" is not something I'd be comfortable giving
    as an answer, TBH. (Though I /would/ be comfortable with
    providing assistance if someone explicitly asks for help with
    writing one in the first place.)

    [3] news:874j1yyt0s.fsf@tilde.institute

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Computer Nerd Kev@21:1/5 to Ivan Shmakov on Sun Jan 19 09:09:15 2025
    Ivan Shmakov <ivan@siamics.netremove.invalid> wrote:
    On 2025-01-16, Computer Nerd Kev wrote:
    In comp.misc Ivan Shmakov wrote:
    On 2025-01-12, Bozo User wrote:

    Once you get a Gopher/Gemini browser, along with yt-dlp, the web can go away.

    I. e., my point being: you can't escape web by switching to
    Gopher, because Gopher /is/ web. (Even if 'darker' part of it.)

    While I do appreciate the availability of yt-dlp, I feel like a
    huge part of the reason Chromium is huge is so it can support
    Youtube. Granted, there doesn't seem to be as many DSAs for video
    software (codecs and players), but it's still the kind of software
    I'd rather keep at least in a container.

    You fear that a hacker can upload a YouTube video containing
    an exploit and manage to pass that exploit through YouTube's
    transcoding in order to attack Linux video player programs?
    Seems like a big stretch to me.

    I'm not familiar with how Youtube processes its videos; I've
    never even uploaded anything there myself, much less looked at
    their sources for security issues that might or might not be
    there.

    The files I download from YouTube always contain the metadata
    string (in both audio and video streams):
    "ISO Media file produced by Google Inc."

    But I always use the lowest quality option.

    By the by, what's the equivalent of wget(1) for gopher:?

    Curl supports Gopher. Not Gemini though.

    Curl is my tool of choice for doing API calls; say (JFTR, [2]
    has a couple of complete examples):

    $ curl -iv --form-string comment="New file." \
    -F file=@my.jpeg -F text=\</dev/fd/5 5< my.jpeg.mw \
    --form-string filesize="$(wc -c < my.jpeg)" \
    --form-string token="1337cafe+\\" \
    ... -- https://commons.wikimedia.org/w/api.php\
    "?action=upload&format=xml&assert=user"

    [2] http://am-1.org/~ivan/src/examples-2024/webwatch.mk

    However, I distinctly recall finding it inadequate as a mirroring
    tool back in the day. (Though that might've changed meanwhile.)

    That's true, Curl doesn't do mirroring. Other command-line options may
    exist, but the one I'm aware of is that the Gopherus Gopher client,
    since version 1.2, has the feature "all files from current folder
    can be downloaded by pressing F10". Not comparable to Wget's
    recursive mode, but enough for some tasks.
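
    From a shell, a rough stand-in for that might be a loop over the menu
    items (only a sketch: it assumes a curl build with Gopher support, a
    placeholder host example.org, and type 0/9 items whose selectors
    contain no spaces):

    $ curl -s gopher://example.org/1/ | tr -d '\r' \
    | awk -F'\t' '/^[09]/ { print substr($1, 1, 1), $3, $4, $2 }' \
    | while read -r type host port sel; do
        curl -s "gopher://$host:$port/$type$sel" -o "$(basename "$sel")"
      done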

    --
    __ __
    #_ < |\| |< _#

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Collver@21:1/5 to Ivan Shmakov on Sun Jan 19 14:47:24 2025
    On 2025-01-18, Ivan Shmakov <ivan@siamics.netREMOVE.invalid> wrote:

    I. e., my point being: you can't escape web by switching to
    Gopher, because Gopher /is/ web. (Even if 'darker' part of it.)

    I realize that the distinction between the web and the Internet can be
    confusing. At one point Microsoft labeled the web browser desktop icon
    "The Internet". At another point Gmail made it mainstream to do email
    in a web browser.

    <https://www.getmyos.com/upload/files/2018/10/05/windows_95_screenshot_1_1_bedc52f3b61686c533b5b318405508a6.png>

    Below is a link explaining the difference between the web and the
    Internet.

    <https://askleo.com/whats-the-difference-between-the-web-and-the-internet/>

    In short, gopher is not the web. It does not use the HTTP protocol, the
    HTML format, nor other web standards such as Javascript. Gopher is a
    separate protocol that is not directly viewable in mainstream browsers
    such as Chrome and Mozilla.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ivan Shmakov@21:1/5 to All on Sun Jan 19 19:15:29 2025
    On 2025-01-19, yeti wrote:
    Ben Collver <bencollver@tilde.pink> wrote:

    Newsgroups: comp.infosystems,comp.misc

    I took the liberty to disregard the crosspost.

    In short, gopher is not the web. It does not use the HTTP protocol,
    the HTML format, nor other web standards such as Javascript. Gopher
    is a separate protocol that is not directly viewable in mainstream
    browsers such as Chrome and Mozilla.

    Gopher resources are indeed not directly viewable in /modern/
    browsers, so I can agree they're not part of /modern/ web.

    From where I stand, they're still part of the web at large.

    As an aside, who decides what is or is not a /web/ standard?
    If the suggestion is to only consider official W3C TRs as
    "web standards proper" then, well, HTML is currently maintained
    by WHATWG, not W3C; and HTTP/1.1 is IETF RFC 9112 / IETF STD 99.

    I disagree.

    When browsers appeared, we thought of the web as what was accessible
    by them. FTP, HTTP and Gopher were among this in the early days.

    Gopher is not the web. Yes.

    HTTP is not the web!

    They just are part of the web.

    Today's big$$$-browsers converge to single-protocol network file
    viewers and unfortunately the smallweb browsers do too.

    That's how I see it as well. I've been using Lynx for over
    two decades now, and I have no trouble using it to read HTML
    documents (local or delivered over HTTP/1; provided, of course,
    they are documents, rather than Javascript programs wrapped
    in HTML, as is not uncommon today), gopherholes, or Usenet
    articles (such as news:87cygivmy3.fsf@tilde.institute I'm
    responding to.) It "just works."

    I have no trouble understanding the difference between the web
    proper and Internet as the technology it relies upon, either.

    DNS is not web because even though it's essential for the web
    as it is today to work, you can't point your browser, modern or
    otherwise, to a DNS server, a DNS zone, or even an individual DNS
    resource record (even though your browser /will/ request one from
    your local recursive resolver, or its DNS-over-HTTP equivalent,
    when you point it to a URL with a DNS name in that, be that
    http://example.net/ or nntp://news.example.com/comp.misc .)

    NTP is not web for much the same reason: there're no URIs for NTP
    servers or individual NTP packets. Neither are there URIs for
    currently active TCP connections or UDP datagrams or IP hosts.

    There /are/ URIs for email mailboxes (mailto:jsmith@example.net)
    to send mail to, and phone numbers (tel:) to call, though.

    To summarize, from a purely practical PoV, if you can access it
    from /your/ browser, it is part of /your/ web. From a conceptual
    PoV, I'd define "web" as a collection of interlinked resources
    identified by their URIs. So, if it has an URI and that URI is
    mentioned somewhere on the web, it's part of the web too.

    Modern web is important because that's often where the people
    you can talk to are. But non-modern portions of the web could
    be just as important, especially if it's where most of the
    people you /actually/ talk to are. Such as news:comp.misc .

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Collver@21:1/5 to yeti on Mon Jan 20 15:37:07 2025
    On 2025-01-19, yeti <yeti@tilde.institute> wrote:
    Ben Collver <bencollver@tilde.pink> wrote:

    In short, gopher is not the web. It does not use the HTTP protocol,
    the HTML format, nor other web standards such as Javascript. Gopher
    is a separate protocol that is not directly viewable in mainstream
    browsers such as Chrome and Mozilla.

    I disagree.

    When browsers appeared, we thought of the web as what was accessible
    by them. FTP, HTTP and Gopher were among this in the early days.

    At the dawn of the Internet some people used a service called FTPmail
    because it could be faster and cheaper to transfer data over email
    than over direct Internet connections. By your logic, one could argue
    that FTP is email because it was historically used in email clients.
    One could also argue that, because browsers could view HTML content
    over the Server Message Block protocol when they appeared, CIFS is
    also the web. Such arguments strike me as disingenuous.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ivan Shmakov@21:1/5 to All on Fri Jan 24 18:45:05 2025
    On 2025-01-20, Ben Collver wrote:
    On 2025-01-19, yeti <yeti@tilde.institute> wrote:
    Ben Collver <bencollver@tilde.pink> wrote:

    In short, gopher is not the web. It does not use the HTTP protocol,
    the HTML format, nor other web standards such as Javascript. Gopher
    is a separate protocol that is not directly viewable in mainstream
    browsers such as Chrome and Mozilla.

    There's a variety of formats that modern browsers allow viewing
    directly, in addition to (X)HTML. Such as WebM; e. g. (URI split
    for readability; tr -d \\n before use):

    http://upload.wikimedia.org/wikipedia/commons/2/22/
    %C2%AB%D0%9C%D0%B0%D1%81%D1%82%D0%B5%D1%80
    -%D0%A2%D1%83%D0%BD%D0%BA%D0%B0%C2%BB
    _%D0%BE%D1%82%D0%BA%D1%80%D1%8B%D0%B2%D0%B0%D0%B5%D1%82%D1%81%D1%8F%2C
    _2020-011_092050.webm

    Given the lack of hyperlinking in WebM, I'd hesitate to call
    such a file a "webpage." SVG does support hyperlinks, however,
    so I don't see much reason myself to be opposed to SVG webpages.

    I disagree.

    When browsers appeared, we thought of the web as what was accessible
    by them. FTP, HTTP and Gopher were among this in the early days.

    For instance, per http://en.wikipedia.org/wiki/NCSA_Mosaic :

    Mosaic is based on the libwww library and thus supported a wide
    variety of Internet protocols included in the library: Archie, FTP,
    gopher, HTTP, NNTP, telnet, WAIS.

    My understanding is that Lynx retains the libwww codebase to
    this day, hence its support for a variety of web protocols well
    beyond the modern notion of HTTP(S)-only web.

    Call me old-fashioned, but my understanding of what "web" is
    /is/ heavily influenced by the example of Mosaic.

    At the dawn of the Internet some people used a service called FTPmail
    because it could be faster and cheaper to transfer data over email
    than over direct Internet connections. By your logic, one could argue
    that FTP is email because it was historically used in email clients.

    What I think you're referring to falls under the concept of a
    /gateway./ There used to be servers that you'd send a web URI
    via email to, get it downloaded by a batch web client (such as
    Wget) there, and get the result delivered to you in a response
    email. Possibly over a cheaper, high-latency link, such as UUCP.

    (Wouldn't make as much sense to request a JPEG this way, only
    to download it later over POP3 over SLIP, Base64 and all, now
    would it?)

    It's not dissimilar to how one can read netnews articles via
    http://al.howardknight.net/ . By itself, that doesn't make
    netnews a part of web, nor does it make HTTP a netnews protocol
    (even if it /is/ used in this case for netnews transmission.)

    Also, "email client" is a misnomer. An email user agent
    would commonly act as /two/ clients: an ESMTP client for mail
    submission, and, say, an IMAP client for mailbox access.

    A modern MUA, such as Thunderbird, would also embed a web browser
    so it can display HTML parts in email /as well as/ images
    referenced in those parts, including those that need retrieval
    over HTTP. Hence HTTP client being /also/ part of the so-called
    "email client." (Even though its use would typically be disabled
    for privacy reasons.)

    Conversely, a traditional MUA, such as BSD mailx(1), would
    contain /no/ network client code within at all, relying instead
    on system facilities, such as the conventional sendmail(1) MTA
    entrypoint. (Or a program like esmtp(1) posing as one.)
    And (or) a program like fetchmail(1) or mbsync(1).

    Curiously enough, email transmission between hosts was
    originally implemented on top of the FTP protocol; consider, e. g.:

    rfc475> This paper describes my understanding of the results of the
    rfc475> Network Mail System meeting SRI-ARC on February 23, 1973, and
    rfc475> the implications for FTP (File Transfer Protocol). There was
    rfc475> general agreement at the meeting that network mail function
    rfc475> should be within FTP.

    rfc475> FTP currently provides two commands for handling mail. The MAIL
    rfc475> command allows a user to send mail via the TELNET connection
    rfc475> (the server collects the mail and determines its end by
    rfc475> searching for the character sequence "CRLF.CRLF"). The MLFL
    rfc475> (mail file) command allows a user to send mail via the data
    rfc475> connection (requires a user-FTP to handle the command but
    rfc475> transfer is more efficient as server need not search for
    rfc475> a special character sequence). [...]

    Not only this predates the transition from Transmission Control
    /Program/ ("IPv3") to Transmission Control Protocol + Internet
    Protocol (TCP/IPv4), but apparently even the first (?) formal
    specification of the former in 1974:

    rfc-index> 0675 Specification of Internet Transmission Control Program.
    rfc-index> V. Cerf, Y. Dalal, C. Sunshine. December 1974.
    rfc-index> (Obsoleted by RFC7805) (Status: HISTORIC)
    rfc-index> (DOI: 10.17487/RFC0675)

    The dawn of the Internet, indeed.

    One could also argue that, because browsers could view HTML content
    over the Server Message Block protocol when they appeared, CIFS is
    also the web. Such arguments strike me as disingenuous.

    I'm not aware of such browsers, aside of the fact that some
    Windows-based ones have allowed \\host\path syntax in place
    of proper URLs. I doubt that aside of the syntax, the browser
    had any SMB/CIFS client code within itself, however.

    So far in this thread, I see two possible definitions of the
    web: one I've suggested that boils down to "documents with
    hyperlinks based on URI syntax and semantics", and the other,
    that to me sounds like "what Google says." (I don't see Mozilla
    as a major driving force behind the web this day and age.)

    And I /do/ understand why Google would push for HTTP(S)-only
    web (even with "HTTP" now being expanded to mean /three/ similar
    in concept, but otherwise mutually incompatible protocols.)
    And I won't envy any Google manager who'll have to explain to
    the investors a decision that lowers the profits in the short
    term, and hardly promises any tangible benefits later, such as
    the decision to add (and take responsibility for maintaining) a
    Gopher client to the browser.

    I do not understand why people outside of Google have to be
    bound by the decisions of their management, however. The web
    browser I use has supported Gopher since before Google existed;
    I fail to see why "Google saying so" has to be a sufficient
    reason to at once stop deeming the protocol part of the web.

    As to the definition I've suggested, I could only add the
    requirement for the relevant protocol(s) to have at least two
    independent implementations.

    My understanding is that Gopher does have such implementations.
    No idea about CIFS, but given (if Wikipedia [1] is to be believed)
    that Microsoft has never made good use of it, it sounds doubtful.

    Hence: not web.

    [1] http://en.wikipedia.org/wiki/Server_Message_Block#CIFS

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From candycanearter07@21:1/5 to Computer Nerd Kev on Wed Jan 29 20:10:03 2025
    Computer Nerd Kev <not@telling.you.invalid> wrote at 23:09 this Saturday (GMT):
    Ivan Shmakov <ivan@siamics.netremove.invalid> wrote:
    [snip]
    The files I download from YouTube always contain the metadata
    string (in both audio and video streams):
    "ISO Media file produced by Google Inc."


    Weird. I think yt-dlp has an option to overwrite the metadata with info
    about the video itself?
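
    If memory serves, that's --embed-metadata (alias --add-metadata),
    which tags the downloaded file with the video's own title, uploader,
    date and so on; roughly (placeholder URL):

    $ yt-dlp --embed-metadata 'https://www.youtube.com/watch?v=XXXXXXXXXXX'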
    --
    user <candycane> is generated from /dev/urandom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to All on Tue Feb 4 21:42:08 2025
    On Wed, 29 Jan 2025 20:10:03 -0000 (UTC), candycanearter07 wrote:

    Computer Nerd Kev <not@telling.you.invalid> wrote at 23:09 this Saturday (GMT):

    The files I download from YouTube always contain the metadata string
    (in both audio and video streams):
    "ISO Media file produced by Google Inc."

    Weird. I think yt-dlp has an option to overwrite the metadata with info
    about the video itself?

    This *is* info about the video itself.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From anthk@21:1/5 to yeti on Sat Mar 22 21:52:32 2025
    On 2025-01-17, yeti <yeti@tilde.institute> wrote:
    not@telling.you.invalid (Computer Nerd Kev) wrote:

    Curl supports Gopher. Not Gemini though.

    Ncat and Netcat (check the existence of '-c' and '-T') can fetch stuff
    from Gemini servers:

    ------------------------------------------------------------------------
    $ printf 'gemini://geminiprotocol.net/\r\n' \
    | ncat --ssl geminiprotocol.net 1965 | less
    ------------------------------------------------------------------------

    ------------------------------------------------------------------------
    $ printf 'gemini://geminiprotocol.net/\r\n' \
    | nc -c -T noverify geminiprotocol.net 1965 | less
    ------------------------------------------------------------------------

    Wrapping that in a few handfuls of AWK to find links and iterate over
    them should not require deep magic.

    Some browsers capable of accessing gemini: can save the fetched files'
    and gemini pages' source; maybe they would even be easier to integrate
    into one's own scripts?

    TL;DR: There is no showstopper.


    gem.awk (a Gemini client written with gawk+openssl) works like that.

    I expanded it with some nice features.

    Another one I'd like is a tool to batch-download a full phlog; that's
    easy to do with basename, mkdir -p, and a for loop iterating over the
    array of links.
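
    Something along these lines, perhaps (a rough sketch reusing the ncat
    trick quoted above; it assumes absolute gemini:// links on one
    placeholder host, strips the response status line, and ignores
    redirects and error statuses):

    ------------------------------------------------------------------------
    $ mkdir -p phlog && cd phlog
    $ printf 'gemini://example.org/phlog/\r\n' \
    | ncat --ssl example.org 1965 | tr -d '\r' \
    | awk '/^=>/ { sub(/^=>[ \t]*/, ""); print $1 }' \
    | grep '^gemini://example\.org/' \
    | while read -r url; do
        printf '%s\r\n' "$url" \
        | ncat --ssl example.org 1965 \
        | sed 1d > "$(basename "$url")"
      done
    ------------------------------------------------------------------------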

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)