• Shroedinger's Variable?

    From Rick Parrish@1:103/705 to GitLab issue in main/sbbs on Fri Sep 6 20:27:40 2024
    open https://gitlab.synchro.net/main/sbbs/-/issues/782

    The sysop of Tucumcari BBS reported garbage input when trying to log in via fTelnet. Dove-net post for reference:
    https://vert.synchro.net/?page=001-forum.ssjs&sub=sync&thread=52391

    - I asked him to roll back websocketservice.js to a previous version, and input worked good with it
    - I modified the latest version of websocketservice.js to output debug information, which indicated the issue is related to the XORing used to unmask incoming websocket data frames.
    - For example, if I hit the "a" key, the debug output showed that a masking byte of 212 and a data byte of 181 were received, and when XORed they should have resulted in an input byte of 97 to match the "a" key I pressed, but instead the input byte was still 181 like the raw data byte. This means either the XOR was skipped, or somehow 181 XOR 0 ran instead of 181 XOR 212.
    - I made a second modification to output a bit more debug information to determine whether the XOR was being skipped, or the XOR 0 was happening, and unexpectedly input worked good with it!
    - I asked him to run both of my modified versions on different ports, to see whether the first debug version now works, but it still has garbage input
    - I took a closer look at the output from the first debug version, and found that it actually starts unmasking correctly, but seems to reliably fail at the same point on repeated connection attempts:
    - 255 252 1 received OK
    - 255 251 23 received OK
    - 255 253 29 not OK (29 should have been 1)
    - Everything after that is not unmasking

    The difference between the first debug version and the second debug version is the addition of this line, which references several variables but does not modify any of them:
    `SendToWebSocketClient_RickDebug(' InByte = ' + InByte + ', FFrameMasked = ' + FFrameMasked + ', FFramePayloadReceived = ' + FFramePayloadReceived + ', Mask = ' + FFrameMask[FFramePayloadReceived % 4] + ', OutByte = ' + (InByte ^ FFrameMask[FFramePayloadReceived % 4]) + '\r\n');`

    And in case it helps, this is the commit that introduces the garbage input behaviour. It switches from reading one byte at a time to reading a message at a time, to speed up file transfers. The line responsible for unmasking the data didn't change though.
    https://gitlab.synchro.net/main/sbbs/-/commit/fbefb47f10a19a68c58ae0bdd6ef64484ac100cc

    @echicken mentioned he ran into something like this sometime last year, and suggests changing options controlling JS runtime behaviour. Is this something that can be done via configuration options? Or only at compile time? And however it's done, do you have any suggestions on what to change and what to change it to?

    Here's links to connect to the two debug versions of websocketservice.js, if you'd like to see the issue in action (scrollback is limited so you won't be able to see the earliest debug output, but it's all visible in developer tools console)

    [Version 1 (garbage input)](https://embed-v2.ftelnet.ca/connect/?BareLFtoCRLF=false&BitsPerSecond=57600&ConnectionType=telnet&Emulation=ansi-bbs&Enter=\r&Font=CP437&ForceWss=false&Hostname=tucumcar.synchro.net&LocalEcho=false&NegotiateLocalEcho=true&Port=11255&ScreenColumns=80&ScreenRows=25&SendLocation=true#ftelnetdebug=1)

    [Version 2 (works)](https://embed-v2.ftelnet.ca/connect/?BareLFtoCRLF=false&BitsPerSecond=57600&ConnectionType=telnet&Emulation=ansi-bbs&Enter=\r&Font=CP437&ForceWss=false&Hostname=tucumcar.synchro.net&LocalEcho=false&NegotiateLocalEcho=true&Port=11245&ScreenColumns=80&ScreenRows=25&SendLocation=true#ftelnetdebug=1)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From echicken@1:103/705 to GitLab note in main/sbbs on Fri Sep 6 21:46:56 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5571

    I looked through some old commits and found [this](https://gitlab.synchro.net/main/sbbs/-/merge_requests/237#note_2998), which is where I encountered and worked around a similar issue.

    Looking at the two commits there, it seems that maybe:
    - computing an array index inside the accessor is sometimes problematic? (`array[somevalue + someothervalue]`)
    - introducing a situation where the dynamic index value might be coerced to string / passed into log() will make it become real?

    You might try changing:

    ```js
    if (FFrameMasked) InByte ^= FFrameMask[FFramePayloadReceived++ % 4];
    ```

    to

    ```js
    var idx = FFramePayloadReceived++ % 4;
    if (FFrameMasked) InByte ^= FFrameMask[idx];
    ```

    or even

    ```js
    var idx = FFramePayloadReceived++ % 4;
    if (idx < 0 || idx >= 4) log(LOG_DEBUG, 'This will never happen');
    if (FFrameMasked) InByte ^= FFrameMask[idx];
    ```

    which seems similar to what I ended up doing. Which of course would be a workaround but I wonder if either of those would "solve" the problem.

    Regarding JS runtime behavior, it's wild speculation on my part. I remember seeing discussion within the past few years of what I think were compile-time options for SpiderMonkey and something in that is nagging at me, like this could be a symptom of some JS optimization gone wrong.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From echicken@1:103/705 to GitLab note in main/sbbs on Fri Sep 6 21:52:12 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5571

    I looked through some old commits and found [this](https://gitlab.synchro.net/main/sbbs/-/merge_requests/237#note_2998), which is where I encountered and worked around a similar issue.

    Looking at the two commits there, it seems that maybe:
    - computing an array index inside the accessor is sometimes problematic? (`array[somevalue + someothervalue]`)
    - introducing a situation where the dynamic index value might be coerced to string / passed into log() will make it become real?

    You might try changing:

    ```js
    if (FFrameMasked) InByte ^= FFrameMask[FFramePayloadReceived++ % 4];
    ```

    to

    ```js
    var idx = FFramePayloadReceived++ % 4;
    if (FFrameMasked) InByte ^= FFrameMask[idx];
    ```

    or even

    ```js
    var idx = FFramePayloadReceived++ % 4;
    if (idx < 0 || idx >= 4) log(LOG_DEBUG, 'idx is ' + idx + '(this will never happen)');
    if (FFrameMasked) InByte ^= FFrameMask[idx];
    ```

    which seems similar to what I ended up doing. Which of course would be a workaround but I wonder if either of those would "solve" the problem.

    Regarding JS runtime behavior, it's wild speculation on my part. I remember seeing discussion within the past few years of what I think were compile-time options for SpiderMonkey and something in that is nagging at me, like this could be a symptom of some JS optimization gone wrong.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Sep 6 22:22:11 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5573

    Or try:
    ```
    var idx = FFramePayloadReceived % 4;
    FFramePayloadReceived++;
    ```
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Sat Sep 7 07:05:19 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5574

    Thanks for finding your old commit @echicken. At least two things in common, which does make it look like it's probably a platform specific optimization or bug:

    - You documented the sysop using "Linux Debian 11 on a raspi 4, 32 bits" and Tucumcari's sysop said they followed the raspi instructions, although they're using a banana pi on armbian (I don't know anything about these devices, but I guess they're similar and if they followed the same instructions, would be using the same js library)

    - You said "Some invalid index was being used to access the array of strings, I don't know what. This seemed to begin after eight iterations, don't know if that was consistent.", and based on my testing the unmasking works fine for the first 8 data bytes and then fails from the 9th byte and beyond, very consistently (tried about 10 times)

    Since the sysop is willing to help test, I'm going to go with your first suggested fix first, which if it works will let us know to avoid using logic when accessing an array. If it doesn't work, then I'll move on to your second suggestion and incorporate DM's increment-after suggestion as well.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Sun Sep 8 20:47:38 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5585

    The sysop confirmed the first suggested fix didn't work.

    For the next attempted fix I created an ugly function to unmask in multiple steps with lots of checks. I haven't heard back from the sysop yet, but I got a raspi working under qemu, and the fix didn't work:
    https://gitlab.synchro.net/main/sbbs/-/commit/aa419b47f090be8fccd23ba811304b48182b3064?view=inline

    By "doesn't work" I mean it gets through the function without throwing, which means the various values are never undefined or out-of-range, but the input is garbled nonetheless.

    After trying random shit I found that the modulus operation seems to be the culprit, because when I manually track the mask index it works (at least under qemu, hopefully it'll work on real hardware too):
    https://gitlab.synchro.net/main/sbbs/-/commit/da0597791fa88035fe31c75683b22d7df828b1b3?view=inline

    Weird thing is the version from last year that works for him uses the same modulus math that the broken version uses. And the issue echicken ran into doesn't use modulus at all. My commit that broke things added a for loop, and echicken's code was also in a for loop, so maybe the real culprit is broken for loop optimization?
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From echicken@1:103/705 to GitLab note in main/sbbs on Sun Sep 8 21:17:09 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5588

    "Broken for loop optimization" is about what I was thinking at the time. Forcing the variable to be logged (the "observer effect") made some loop optimization impossible since the value had to be known and used then and there.

    The latter of my two commits introduced the log statement and I vaguely recall this being the "real" fix, nothing to do with making the calculation inside the square brackets, so I'm not surprised. In your case, I guess that explicitly setting FFrameMaskIndex to 0 is the secret sauce, since zero is zero and it doesn't need to be lazy about doing the math.

    Of course, goodness knows what's going on in the guts of ye olden SpiderMonkey on ARM. I'd love to have a better understanding of this so I can avoid introducing similar problems in the future, but it's also annoying that perfectly reasonable code just shits the bed unexpectedly.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Mon Sep 9 05:26:12 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5589

    Forcing the variable to be logged (the "observer effect") made some loop optimization impossible since the value had to be known and used then and there.

    That's what I tried doing in https://gitlab.synchro.net/main/sbbs/-/commit/aa419b47f090be8fccd23ba811304b48182b3064?view=inline but I just realized I'm a dumbass and I hardcoded the word 'undefined' in the error message instead of referencing the variables themselves. I also used `throw` instead of `log()` so maybe that makes a difference too.

    Sysop confirmed the latest commit did the trick on real hardware too, so I'll clean up my branch and open a merge request.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Mon Sep 9 12:25:58 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5597

    Replacing the hardcoded 'undefined' in the `log` call with a variable didn't help. No matter what I try, if I conditionally call `log` the input is garbled. The only thing that works is unconditionally calling `log` for every byte.

    I was curious why the conditional call fixed things for you in `randomString` but not for me, and after investigating it looks like it didn't, at least not fully. When I checked my 0001.web file I found:

    `csrf_token=8EMorJ4QundefinedKundefinedundefinedundefinedundefinedundefinedg2mBMKfpagGBuOm2oOveEShPThIs`

    So it looks like calling `log` for the impossible numbers eventually allows it to recover and start generating valid numbers again, but there's still a stretch of impossible numbers in there as well.

    If I alter `randomString` to unconditionally call `log`, then a good `csrf_token` is generated.

    Going back to the websocket unmasking, I found multiple function calls that workaround the bug. For example `mswait(InByte / InByte)` works, but would introduce undesirable delays when uploading files. `ascii` seems to be the most promising, since `Result += ascii(InByte)` is equivalent to the existing `Result += String.fromCharCode(InByte)`, so is a drop-in replacement (at least as far as I can tell).

    And `randomString` could be refactored to use `ascii` as well. Drop the `.split("")` to keep `chars` as a string, then change `str += chars[rn]` to `str += ascii(ascii(chars[rn]))`, and while super-janky to double-call `ascii` like that, it results in a random string without any 'undefined' bits. Alternatively you could switch from a for loop to a while loop, and put the `str += chars[rn]` portion into an else so it only appends when `chars[rn]` is defined.

    Speaking of while loop, I checked to see whether it would have the same issue as the for loop, and it does. Then I manually unrolled the for loop, and it worked fine. So definitely seems to be related to loop optimization going haywire.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From echicken@1:103/705 to GitLab note in main/sbbs on Mon Sep 9 13:44:38 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5598

    I was curious why the conditional call fixed things for you in `randomString` but not for me, and after investigating it looks like it didn't, at least not fully.

    That's interesting - I guess it solved the immediate problem for the sysop and appeared fixed, even if the string wasn't what I intended to generate.

    `randomString` could be refactored to use `ascii` as well.

    I'll definitely keep this in mind especially if I go poking around in webv4 again.

    The problem now is ... how many scripts in our codebase have similar "problems", and do we need to use similar tricks going forward in new ones? It'd be great if a fix of the underlying problem were possible, but that's hard to say. I'll have to see what kind of RPi stuff I have kicking around here and do some tinkering soon.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Mon Sep 9 20:38:22 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5610

    That's interesting - I guess it solved the immediate problem for the sysop and appeared fixed, even if the string wasn't what I intended to generate.

    The generated csrf_token was significantly shorter than the fully-broken version, so likely it didn't exceed whatever limit the fully-broken version was hitting.

    The problem now is ... how many scripts in our codebase have similar "problems"

    I'm actually leaning towards NOT merging the input garbage workaround into master, because as you said it probably leaves other hidden problems that could do who knows what damage. Leaving it un-worked-around means it could act as a litmus test for future js-related bug reports that only seem to affect pis. @rswindell any thoughts on whether to merge the workaround or not?

    Also, I found a cloud provider that rents pis, and things work great there without any workarounds. I noticed it was a 32bit install vs my qemu install being a 64bit install, and as far as I can tell Armbian (what the sysop who reported the bug is running) only comes in 64bit. Not a big enough sample size to say anything for sure, but it does make me wonder if 64bit is the problem, and the solution to future js-on-pi bug reports is "install a 32bit os instead".
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Tue Sep 10 16:10:53 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5611

    I'd probably be inclined to commit the workaround. If we can/want to write a separate litmus test that can be easily executed (e.g. via jsexec), that'd be preferable.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Wed Sep 11 07:19:24 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5612

    I copied randomString and setSessionValue out into a stand-alone .js file, then ran it via jsexec, and it generated a good string without any 'undefined' bits, so it doesn't look like a jsexec-based litmus will work.

    But creating a (hidden) webv4/pages/000-raspi-test.xjs that calls randomString works (and by works I mean generates 'undefined' so can be used as a litmus):

    ![image](/uploads/d921af3a6a980b9431f2fe5f10279352/image.png)

    On systems where randomString doesn't generate 'undefined' bits it could show:

    ![image](/uploads/6f80a6fc36396bc3026cbd537f7c6a80/image.png)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rick Parrish@1:103/705 to GitLab note in main/sbbs on Wed Sep 11 08:23:14 2024
    https://gitlab.synchro.net/main/sbbs/-/issues/782#note_5612

    I copied randomString and setSessionValue out into a stand-alone .js file, then ran it via jsexec, and it generated a good string without any 'undefined' bits, so it doesn't look like a jsexec-based litmus will work.

    But creating a (hidden) webv4/pages/000-raspi-test.xjs that calls randomString works (and by works I mean generates 'undefined' so can be used as a litmus):

    ![image](/uploads/9d4005240bbdf022c5d13cf0e7f5d54d/image.png)

    On systems where randomString doesn't generate 'undefined' bits it could show:

    ![image](/uploads/1b7d0d44c1a3af482b364122849c230c/image.png)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab issue in main/sbbs on Thu Sep 12 13:58:23 2024
    close https://gitlab.synchro.net/main/sbbs/-/issues/782
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)