• Netbooting from MAME emulator

    From toshok@gmail.com@21:1/5 to All on Sat Jul 22 18:25:58 2023
    Hey all,

    Helping a friend get a dn4500 netbooting over ethernet from relatively current MAME. It's fetching domain_os from netman just fine, it spits out the low/high/start addresses, and even prints the kernel version + build time. But then it hangs without making it to the phase 2 boot environment.

    Happens regardless of the SR version I've tried (10.4 and 10.2 both show it.)

    Once the kernel shows the build time, wireshark on the linux-end shows a heartbeat-like proto 0x8019 packet being sent every ~6 seconds. But no other 0x8019 communication and no further output on the dn4500 screen.

    Any ideas on what the next steps could be? There doesn't seem to be useful output/logging available anywhere.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jim Rees@21:1/5 to All on Sun Jul 23 12:56:25 2023
    I had a good friend at Apollo, Norman Garfinkle, who sadly died last year. He was a sort of engineering troubleshooter and quite a character. He used to say that the answer to all problems, not just netbooting, was "did you try running netman in a window?" It became a running joke. Like, "Your car won't start? Did you try running netman in a window?" That's what I would do here.

    Does wireshark show any network activity after the domain_os fetch other than the heartbeat? Can you tell if it has fetched all of domain_os or just part of it?

    Are client and server both running the same version of domain os?

  • From toshok@gmail.com@21:1/5 to tos...@gmail.com on Sun Jul 23 15:22:17 2023
    On Sunday, July 23, 2023 at 3:19:47 PM UTC-7, tos...@gmail.com wrote:
    I see the range requests for pages, and it looks like the client refetches the last set 0x2e0-0x2ef (which I'm guessing is just a way to make sure that it actually got it all, if the returned length is less than a multiple of page_size?)

    Oops, I meant to remove this block. The multiple fetches were an illusion caused by me selecting both the physical interface and the MAME tunnel in wireshark. There is no refetch of the last set of pages.

  • From toshok@gmail.com@21:1/5 to Jim Rees on Sun Jul 23 15:19:46 2023
    On Sunday, July 23, 2023 at 12:56:26 PM UTC-7, Jim Rees wrote:
    I had a good friend at Apollo, Norman Garfinkle, who sadly died last year. He was a sort of engineering troubleshooter and quite a character. He used to say that the answer to all problems, not just netbooting, was "did you try running netman in a window?" It became a running joke. Like, "Your car won't start? Did you try running netman in a window?" That's what I would do here.

    I have run it in a window (with debug mode on to see everything), and... it all looks good. I also spent some time decompiling it to figure out the protocol (using your mach-netman as a base). Looks like the boot rom is using range requests, giving both start and end page numbers.

    Can you tell if it has fetched all of domain_os or just part of it?

    I see the range requests for pages, and it looks like the client refetches the last set 0x2e0-0x2ef (which I'm guessing is just a way to make sure that it actually got it all, if the returned length is less than a multiple of page_size?)

    One thing that's interesting (and I didn't notice before): the high address is one byte off.

    Emulator: low: 01001C00 high: 010BCEE8 start: 01001C24
    DN4500: low: 01001C00 high: 010BCEE7 start: 01001C24
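
    A quick way to see whether the one-byte difference matters is to compute the image size both ways. This is only a sketch under two assumptions that are not confirmed anywhere in the thread: that one boot rom prints the last byte address (inclusive) while the other prints one past the end (exclusive), and that pages are 0x400 bytes, which is what the page 0x39 / offset 0xe400 figures further down imply. The 16-page request grouping is likewise inferred from the 0x2e0-0x2ef range mentioned above. The function names are made up for illustration.

```python
PAGE_SIZE = 0x400      # assumption: 1 KiB pages, inferred from page 0x39 <-> offset 0xe400
PAGES_PER_REQUEST = 16  # assumption: inferred from the 0x2e0-0x2ef range request


def image_span(low, high, inclusive):
    """Return (size_in_bytes, page_count, last_page_index) for a loaded image.

    `inclusive` says whether `high` is the address of the last byte (True)
    or the address one past the last byte (False).
    """
    size = high - low + (1 if inclusive else 0)
    pages = -(-size // PAGE_SIZE)  # ceiling division
    return size, pages, pages - 1


def request_group(page):
    """Return the (first, last) page of the 16-page request containing `page`."""
    first = (page // PAGES_PER_REQUEST) * PAGES_PER_REQUEST
    return first, first + PAGES_PER_REQUEST - 1


if __name__ == "__main__":
    emulator = image_span(0x01001C00, 0x010BCEE8, inclusive=False)
    dn4500 = image_span(0x01001C00, 0x010BCEE7, inclusive=True)
    print([hex(v) for v in emulator], [hex(v) for v in dn4500])
    print([hex(p) for p in request_group(emulator[2])])
```

    If the inclusive/exclusive guess is right, both machines loaded the same number of bytes and the final page falls inside the 0x2e0-0x2ef request, so the "high" difference would be cosmetic.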

    Instead of `EX DOMAIN_OS` I tried `LO DOMAIN_OS` and then poked at various memory locations to see if I could see any differences and did find one. In the emulator:

    LO DOMAIN_OS

    ...

    A 1010000
    1010000: 8745

    That memory location holds 9845 on the dn4500. I looked at the next 2 locations on the dn4500 and saw 6606 and DBBA, respectively.

    Given the address, figuring out the ethernet packet from the page number was pretty easy (it's page 0x39), and wireshark shows 9845 was sent. The byte sequence is actually 98 45 66 06 DB BA at byte. So that's all lining up. I also found that byte sequence in /sau7/domain_os at that page's offset (0xe400), so I'm not sure what's going on here, except "mame might have a bug.."
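
    The address-to-page arithmetic above can be sketched as follows. The 0x400 page size is not stated anywhere in the thread; it is inferred from the numbers (address 0x1010000 minus the load address 0x01001C00 is 0xe400, and 0xe400 / 0x39 = 0x400). The helper names are hypothetical, and the comparison helper simply takes two byte strings, for example a page read from the boot file and the same page dumped from memory.

```python
PAGE_SIZE = 0x400   # assumption: inferred from page 0x39 <-> file offset 0xe400
LOW = 0x01001C00    # "low" load address printed by the boot rom


def locate(addr, low=LOW, page_size=PAGE_SIZE):
    """Map a memory address to (page_number, offset_in_page, file_offset)."""
    file_offset = addr - low
    return file_offset // page_size, file_offset % page_size, file_offset


def first_mismatch(expected, actual):
    """Return the index of the first differing byte, or None if none differ
    over the common length of the two byte strings."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    return None


if __name__ == "__main__":
    page, in_page, offset = locate(0x1010000)
    print(hex(page), hex(in_page), hex(offset))
    # 9845 is what the dn4500 and the wire show; 8745 is what the emulator shows.
    print(first_mismatch(bytes.fromhex("9845"), bytes.fromhex("8745")))
```

    Running the same comparison over whole pages (file contents versus an emulator memory dump) might show whether the corruption is a single byte or a pattern, which could help narrow down where in MAME to look.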

    I also suspect the difference in "high" address in the boot rom output might be due to differences in the boot roms themselves. The emulator is running MD7C REV 8.00, and the dn4500 is running MD7R REV 4.00.

    re: further traffic from the dn4500, there was an ARP request ("Who has <emulator ethernet address>? tell <dn4500 ethernet address>.")

    Does wireshark show any network activity after the domain_os fetch other than the heartbeat?

    I went back and looked at the wireshark output, and there's a bit more between the domain_os fetch and the heartbeat:

    1. Another boot service request from the dn4500 to the emulator (get_uids.), which the emulator responds to.
    2. Another packet from the dn4500 to the emulator with proto 0x8019, but I don't think it's a boot service request. That one goes unanswered.
    3. A packet from the dn4500 to the Apollo-DOMAIN multicast address. No clue what that one is. It also goes unanswered.

    Then the heartbeat packet starts up.
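
    For sorting captured frames like the ones listed above, a minimal sketch that only looks at the Ethernet II header: it checks for EtherType 0x8019 and whether the destination is a group (multicast/broadcast) address. It says nothing about the Apollo payload, whose layout is not described in this thread. The function names and the MAC addresses in the example are made up for illustration.

```python
import struct

APOLLO_ETHERTYPE = 0x8019  # the proto value seen in the captures


def parse_ethernet(frame):
    """Split a raw Ethernet II frame into (dst, src, ethertype, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


def is_group_address(mac):
    """True for multicast and broadcast destinations (I/G bit set)."""
    return bool(mac[0] & 0x01)


def classify(frame):
    """Label a frame as 'apollo-unicast', 'apollo-multicast', or 'other'."""
    dst, _src, ethertype, _payload = parse_ethernet(frame)
    if ethertype != APOLLO_ETHERTYPE:
        return "other"
    return "apollo-multicast" if is_group_address(dst) else "apollo-unicast"


if __name__ == "__main__":
    # Hypothetical addresses, not taken from the actual capture.
    src = bytes.fromhex("08001e000001")
    unicast = bytes.fromhex("08001e000002") + src + b"\x80\x19" + b"\x00" * 46
    arp = b"\xff" * 6 + src + b"\x08\x06" + b"\x00" * 46
    print(classify(unicast), classify(arp))
```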

    Are client and server both running the same version of domain os?

    yeah. client is netbooting off a 10.2 mame 3500, so same sau# and should have the same kernel bits.

  • From Jim Rees@21:1/5 to All on Mon Jul 24 15:29:21 2023
    Well I'm out of ideas. You should ask Hans but I'm sure you thought of that.

  • From toshok@gmail.com@21:1/5 to Jim Rees on Mon Jul 24 16:35:56 2023
    On Monday, July 24, 2023 at 3:29:22 PM UTC-7, Jim Rees wrote:
    Well I'm out of ideas. You should ask Hans but I'm sure you thought of that.

    Oh actually, you might have some deep, repressed memories about something that might help me figure out what those remaining packets are. Any memory of where the demuxing of 0x8019 packets to services happens? I.e., how are boot service packets routed to netman? Maybe I can find where the other packets are meant to end up and figure out the protocol there.

    I had thought about pinging Hans directly but figured he might be around here as well. Will try directly :)

    -c
