• webecv4 questions

    From Hemo@1:103/705 to Al on Wed Apr 1 18:39:33 2020
    Re: webecv4 questions
    By: Al to Hemo on Wed Apr 01 2020 02:28 pm

    I've looked for a while but my google-fu is failing me.

    I want to have the BBS web pages present, but not allow anyone to
    browse the message areas unless logged in. Perhaps allow one or two
    areas like a local/main, if possible. I want to keep the network
    areas from being web crawling/indexing targets.

    You can stop the web crawlers with your robots.txt.

    I'm not sure, but I think the default robots.txt that comes with
    Synchronet will do this. My own robots.txt looks like this:

    User-agent: *
    Disallow: /bbbs


    I've got this:
    User-agent: *
    Disallow: /

    It's not stopping things that are not identifying as a crawler, I think. I think a legitimate crawler starts by looking for the robots.txt file; I see some of those too.

    Here are some snips of what I see in the log:

    Apr 1 12:31:32 bbs synchronet: web 0045 HTTP connection accepted from: 52.82.96.27 port 49946
    Apr 1 12:31:32 bbs synchronet: web 0045 Hostname: ec2-52-82-96-27.cn-northwest-1.compute.amazonaws.com.cn [52.82.96.27]
    Apr 1 12:31:32 bbs synchronet: web 0045 Request: GET /api/files.ssjs?call=download-file&dir=sndmodv1mod_hl&file=INFLNCIA.MOD HTTP/1.1
    Apr 1 12:31:32 bbs synchronet: web 0045 Unable to send to peer
    Apr 1 12:31:32 bbs synchronet: web 0045 Sending file: /sbbs/tmp/SBBS_SSJS.31685.45.html (0 bytes)
    Apr 1 12:31:33 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 219 served)
    Apr 1 12:32:16 bbs synchronet: web 0045 HTTPS connection accepted from: 111.225.148.163 port 55238
    Apr 1 12:32:17 bbs synchronet: web 0045 Hostname: bytespider-111-225-148-163.crawl.bytedance.com [111.225.148.163]
    Apr 1 12:32:17 bbs synchronet: web 0045 Request: GET /robots.txt HTTP/1.1
    Apr 1 12:32:17 bbs synchronet: web 0045 Sending file: /sbbs/webv4/root/robots.txt (2076 bytes)
    Apr 1 12:32:17 bbs synchronet: web 0045 Sent file: /sbbs/webv4/root/robots.txt (2076 bytes)
    Apr 1 12:32:18 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 220 served)
    Apr 1 12:32:58 bbs synchronet: web 0045 HTTP connection accepted from: 111.225.148.177 port 46388
    Apr 1 12:32:58 bbs synchronet: web 0045 Hostname: bytespider-111-225-148-177.crawl.bytedance.com [111.225.148.177]
    Apr 1 12:32:58 bbs synchronet: web 0045 Request: GET /robots.txt HTTP/1.1
    Apr 1 12:32:58 bbs synchronet: web 0045 Sending file: /sbbs/webv4/root/robots.txt (2076 bytes)
    Apr 1 12:32:58 bbs synchronet: web 0045 Sent file: /sbbs/webv4/root/robots.txt (2076 bytes)
    Apr 1 12:32:59 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 221 served)
    Apr 1 12:33:42 bbs synchronet: web 0045 HTTPS connection accepted from: 52.83.249.124 port 52734
    Apr 1 12:33:42 bbs synchronet: web 0045 Hostname: ec2-52-83-249-124.cn-northwest-1.compute.amazonaws.com.cn [52.83.249.124]
    Apr 1 12:33:43 bbs synchronet: web 0045 Request: GET /api/files.ssjs?call=download-file&dir=st20s92msdosc&file=CNEWS003.ARC HTTP/1.1
    Apr 1 12:33:43 bbs synchronet: web 0045 Sending file: /sbbs/tmp/SBBS_SSJS.31685.45.html (0 bytes)
    Apr 1 12:33:44 bbs synchronet: web 0045 Session thread terminated (0 clients, 3 threads remain, 222 served)


    Every minute or so, something comes in and goes directly to a specific file and tries to download it. Most of these seem to come from cn-northwest-1.compute.amazonaws.com.cn.
    --
    H

    ... It is impossible to please the whole world and your mother-in-law.

    ---
    þ Synchronet þ - Running madly into the wind and screaming - bbs.ujoint.org
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Hemo@1:103/705 to poindexter FORTRAN on Wed Apr 1 18:49:38 2020
    Re: webecv4 questions
    By: poindexter FORTRAN to Hemo on Wed Apr 01 2020 03:21 pm

    Re: webecv4 questions
    By: Hemo to All on Wed Apr 01 2020 03:45 pm

    I am wanting to have the BBS web pages present, but not allow anyone
    to browse the message areas unless logged in. Perhaps allow one or
    two areas like a local/main, if possible. I want to shutdown the
    network areas from being web crawling/indexing targets.

    The security levels of the groups determine what can be seen on the web. The guest user's security level controls what un-authenticated users can see from the web.

    Boom, that was it, thank you. I wasn't picking up that 'non-logged-in' web access was controlled by the security level of the Guest account, and somehow my Guest account got 'validated' (I'm sure I either did that without realizing the implications or it was a mis-typed key). Validation cranked up the security level and opened up the forums and files on the web pages to the public.

    I got my Guest account security back where it should be, and that sealed the web portion back up to where I wanted it. Whew!
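
    For anyone else who trips over this, the knobs involved (a sketch; exact menu labels vary by Synchronet version): each message group and sub-board in SCFG carries an Access Requirements string (ARS), e.g.

    Access Requirements: LEVEL 50

    and the web pages present non-logged-in visitors as the Guest account, so anything whose ARS is above Guest's security level stays hidden from the public side.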

    thanks!

    --
    Hemo

    ... I love criticism just so long as it's unqualified praise.

    ---
    þ Synchronet þ - Running madly into the wind and screaming - bbs.ujoint.org
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rampage@1:103/705 to Hemo on Thu Apr 2 07:27:16 2020
    Re: webecv4 questions
    By: Hemo to Al on Wed Apr 01 2020 18:39:33


    Hemo> It's not stopping things that are not identifying as a crawler, I think.

    robots.txt cannot stop anything... it is only a guide from the site operator to the spider operator indicating the areas the spider is allowed to crawl or not...
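
    (if you actually need to stop a pest rather than just ask it nicely, the trashcan files are one option... adding its address to /sbbs/ctrl/ip.can, one entry per line, makes the servers refuse the connection... filename per the stock Synchronet layout, and i believe wildcard patterns are accepted there too...)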

    Hemo> I think a legitimate crawler starts by looking for the robots.txt file,
    Hemo> I see some of those too.

    close... robots.txt may or may not be gathered on each visit by a spider... if it is gathered, it may not be taken into account until later visits...


    Hemo> every minute or so, something comes in and goes directly to a specific
    Hemo> file and tries to download it. Most of these seem to come from
    Hemo> cn-northwest-1.compute.amazonaws.com.cn

    look in your /sbbs/data/logs directory for the http logs (if you have them enabled) and you will see a traditional apache-style log format... the last field contains the user agent, which will generally tell you whether the visitor really is a spider or not... what you're seeing from that amazon cloud domain may be a spider, or it may be someone's file getter, or possibly even an indexer (which is like a spider or crawler)...
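
    if you want a quick census of who's hitting you, a sketch like this under jsexec will tally that last field... the log filename below is only a placeholder, point it at whatever file is actually sitting in your logs directory...

    // tally user agents from an apache-style access log...
    // the agent is assumed to be the last double-quoted field per line
    var f = new File("/sbbs/data/logs/http-access.log"); // placeholder name
    if (f.open("r")) {
        var counts = {};
        var line;
        while ((line = f.readln()) !== null) {
            var m = line.match(/"([^"]*)"\s*$/);
            if (m)
                counts[m[1]] = (counts[m[1]] || 0) + 1;
        }
        f.close();
        for (var ua in counts)
            writeln(counts[ua] + "\t" + ua);
    }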


    )\/(ark

    ---
    þ Synchronet þ The SouthEast Star Mail HUB - SESTAR
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Hemo@1:103/705 to Rampage on Thu Apr 2 13:17:26 2020
    Re: webecv4 questions
    By: Rampage to Hemo on Thu Apr 02 2020 07:27 am

    Re: webecv4 questions
    By: Hemo to Al on Wed Apr 01 2020 18:39:33
    Hemo> every minute or so, something comes in and goes directly to a specific
    Hemo> file and tries to download it. Most of these seem to come from
    Hemo> cn-northwest-1.compute.amazonaws.com.cn

    look in your /sbbs/data/logs directory for the http logs (if you have
    them enabled) and you will see a traditional apache-style log
    format... the last field contains the user agent, which will
    generally tell you whether the visitor really is a spider or not...
    what you're seeing from that amazon cloud domain may be a spider, or
    it may be someone's file getter, or possibly even an indexer (which
    is like a spider or crawler)...

    Interesting. I need to spend more time reading logs, I think.
    The lines in question all show this:
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.9740.1940 Mobile Safari/537.36"

    I also see some that are not even trying to hide anything. polaris botnet, ZmEu, zgrab, The Knowledge AI, and so forth. Even some from this fella: "masscan/1.0 (https://github.com/robertdavidgraham/masscan)"


    Coincidence or not, about an hour after closing down reading of files and forums to anyone/anything not logged in, I was slammed for a couple of hours from no-reverse-dns-configured.com with what looks like attempted PHP exploits.

    I see the PHP exploit attempts randomly here and there in all the log files, but this period was non-stop for about 2 hours, 2-5 attempts every second. The log file is huge.


    Man... this stuff felt simpler when we were just dealing with a modem and baud rates.

    ... Buy Land Now. It's Not Being Made Any More.

    ---
    þ Synchronet þ - Running madly into the wind and screaming - bbs.ujoint.org
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From poindexter FORTRAN@1:103/705 to Rampage on Thu Apr 2 09:09:00 2020
    Rampage wrote to Hemo <=-

    look in your /sbbs/data/logs directory for the http logs (if you have
    them enabled) and you will see a traditional apache-style log
    format... the last field contains the user agent, which will
    generally tell you whether the visitor really is a spider or not...
    what you're seeing from that amazon cloud domain may be a spider, or
    it may be someone's file getter, or possibly even an indexer (which
    is like a spider or crawler)...

    That's a good point - ROBOTS.TXT can block by *user agent*, so if you
    have a particularly annoying web crawler, you can block that user
    agent from getting to anything, instead of trying to block specific
    areas from all crawlers.

    This is all voluntary; a badly behaved crawler can simply ignore your ROBOTS.TXT file.
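
    For example (Bytespider is the token ByteDance's crawler reports, judging by the hostnames in Hemo's log; the /msgs/ path is only a placeholder for whatever you'd rather shield):

    User-agent: Bytespider
    Disallow: /

    User-agent: *
    Disallow: /msgs/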


    ... What do you think management's real interests are?
    --- MultiMail/XT v0.52
    þ Synchronet þ realitycheckBBS -- http://realitycheckBBS.org
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Mortifis@1:103/705 to Hemo on Fri Apr 3 15:49:07 2020
    Re: webecv4 questions
    By: Al to Hemo on Wed Apr 01 2020 02:28 pm

    I've looked for a while but my google-fu is failing me.

    I want to have the BBS web pages present, but not allow anyone to
    browse the message areas unless logged in. Perhaps allow one or two
    areas like a local/main, if possible. I want to keep the network
    areas from being web crawling/indexing targets.

    You can stop the web crawlers with your robots.txt.

    I'm not sure, but I think the default robots.txt that comes with Synchronet will do this. My own robots.txt looks like this:

    User-agent: *
    Disallow: /bbbs


    I've got this:
    User-agent: *
    Disallow: /

    It's not stopping things that are not identifying as a crawler, I think. I think a legitimate crawler starts by looking for the robots.txt file; I see some of those too.

    Here are some snips of what I see in the log:

    Every minute or so, something comes in and goes directly to a specific file and tries to download it. Most of these seem to come from cn-northwest-1.compute.amazonaws.com.cn.
    --
    H

    I wonder if adding if(user.alias === 'Guest') { writeln('You must be logged in to view files!'); exit(); } to /sbbs/webv4/root/api/files.ssjs would help? Or something like that.
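
    Spelled out, that idea is a rough sketch like this near the top of files.ssjs (user, writeln() and exit() are all standard in Synchronet's SSJS environment):

    // bail out before any file handling when running as the Guest account
    if (user.alias === 'Guest') {
        writeln('You must be logged in to view files!');
        exit();
    }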

    ---
    þ Synchronet þ AlleyCat! BBS Lake Echo, NS Canada
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Digital Man@1:103/705 to Mortifis on Fri Apr 3 12:11:03 2020
    Re: Re: webecv4 questions
    By: Mortifis to Hemo on Fri Apr 03 2020 03:49 pm

    Re: webecv4 questions
    By: Al to Hemo on Wed Apr 01 2020 02:28 pm

    I've looked for a while but my google-fu is failing me.

    I want to have the BBS web pages present, but not allow anyone to
    browse the message areas unless logged in. Perhaps allow one or two
    areas like a local/main, if possible. I want to keep the network
    areas from being web crawling/indexing targets.

    You can stop the web crawlers with your robots.txt.

    I'm not sure, but I think the default robots.txt that comes with Synchronet will do this. My own robots.txt looks like this:

    User-agent: *
    Disallow: /bbbs


    I've got this:
    User-agent: *
    Disallow: /

    It's not stopping things that are not identifying as a crawler, I think.
    I think a legitimate crawler starts by looking for the robots.txt file; I see some of those too.

    Here are some snips of what I see in the log:

    Every minute or so, something comes in and goes directly to a specific file and tries to download it. Most of these seem to come from cn-northwest-1.compute.amazonaws.com.cn.
    --
    H

    I wonder if adding if(user.alias === 'Guest') { writeln('You must be logged in to view files!'); exit(); } to /sbbs/webv4/root/api/files.ssjs would help? Or something like that.

    I don't think bots are logging in as Guest, but ecweb might do an auto-login-as-guest thing.

    digital man

    Synchronet "Real Fact" #4:
    Synchronet version 3 is written mostly in C, with some C++, x86 ASM, and Pascal.
    Norco, CA WX: 63.1°F, 58.0% humidity, 3 mph E wind, 0.00 inches rain/24hrs
    --- SBBSecho 3.10-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From echicken@1:103/705 to Mortifis on Fri Apr 3 15:28:05 2020
    Re: Re: webecv4 questions
    By: Mortifis to Hemo on Fri Apr 03 2020 15:49:07

    I wonder if adding if(user.alias === 'Guest') { writeln('You must be
    logged in to view files!'); exit(); } to /sbbs/webv4/root/api/files.ssjs
    would help? Or something like that.

    That script already checks if the current user has the ability to download, so this shouldn't be necessary.

    Likewise I think all of the file stuff uses 'file_area.lib_list', which is:

    "File Transfer Libraries (current user has access to) - introduced in v3.10"

    So I would expect it not to include areas that the current user isn't supposed to be able to see. Maybe I'm wrong or maybe that isn't working as expected.
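
    For reference, a minimal sketch of how that scoping plays out (property names per the Synchronet JS object model; run it where file_area is defined, i.e. under the web server or jsexec):

    // lib_list holds only the libraries the current user can access,
    // so enumerating it as Guest shows exactly the public view
    for (var l in file_area.lib_list) {
        var lib = file_area.lib_list[l];
        writeln(lib.description);
        for (var d in lib.dir_list)
            writeln('  ' + lib.dir_list[d].code);
    }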

    I suspect OP needs to tweak the guest account in use, along with settings on file and message areas.

    ---
    echicken
    electronic chicken bbs - bbs.electronicchicken.com
    þ Synchronet þ electronic chicken bbs - bbs.electronicchicken.com
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)