• Serial Bus Speed on PCs

    From Rick C@21:1/5 to All on Tue Nov 29 23:33:55 2022
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message-level handshake creates a bottleneck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate, so the message-and-response pair rate would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if there were no other delays.
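
    Roughly, the arithmetic I'm doing looks like this (a quick back-of-envelope sketch in Python; the 10 bits per character on the wire and the 256 end points are my own working assumptions):

        usb_poll_hz = 8000                    # hi-speed USB microframe rate
        pairs_per_s = usb_poll_hz / 2         # one poll out, one poll back per command/response pair
        end_points = 256
        per_end_point = pairs_per_s / end_points                      # ~15.6 pairs/s to each target
        chars_per_msg = 15
        bits_per_char = 10                    # start + 8 data + stop
        bus_load = pairs_per_s * 2 * chars_per_msg * bits_per_char    # ~1.2 Mbps on the serial side
        print(per_end_point, bus_load)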

    While investigating other units, I found some Ethernet-to-serial devices, some of which claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive the 16 responses on separate lines and stream them out to the PC port as one continuous message. This is a bit messier since the 16 lines from this new module would need to be marked so they plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC; the application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single broadcast command, since the same tests are done on all targets in parallel. So using an RJ-45 connector, there would be two pairs for the serial port and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB-based FTDI RS-422 cable and a priority scheme so every target gets many more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.

    --

    Rick C.

    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209

  • From Richard Damon@21:1/5 to Rick C on Wed Nov 30 07:42:10 2022
    On 11/30/22 2:33 AM, Rick C wrote:
    <snip>


    You can get much more than 8,000 cps with an FTDI interface. This is
    because you can send/receive more than one character per "poll".

    My first thought is: why are you trying to combine everything into one
    USB serial port? Why not give each test fixture its own serial port (or
    lump just a few onto a given port) and let the USB bus do the bulk of
    the multi-drop?

    The Ethernet unit might be just a 10 Mbit device, or maybe a 100 Mbit one,
    and you need to send it a whole message block; it processes it, sends the
    data in it on, and then it can send back the answer when it figures the
    full answer has come back. It likely doesn't even TRY to transmit on a
    character basis; because of the much larger overhead of an Ethernet
    packet, it presumes network bandwidth is more important than delay.

    Also, they may be quoting figures with typical routing delays assuming a multi-hop route from computer to destination, which adds to the delay,
    since that is the sort of application you use those for. Ethernet is a
    "long haul" medium, not normally thought of as short haul, particularly
    when talking about lower bandwidth applications.

  • From Bernd Linsel@21:1/5 to Rick C on Wed Nov 30 16:11:19 2022
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not have
    message handling delays. Same concern with the 12 Mbps TTL level interface.


    So what speaks against using such a 12 Mbps USB/serial device and attaching an RS-422/485 transceiver (e.g. https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf)?

    That should meet all your requirements mentioned so far.

    Regards,
    Bernd

  • From Rick C@21:1/5 to Richard Damon on Wed Nov 30 06:21:18 2022
    On Wednesday, November 30, 2022 at 8:42:17 AM UTC-4, Richard Damon wrote:
    <snip>

    You can get much more than 8,000 cps with an FTDI interface. This is
    because you can send/receive more than one character per "poll".

    Yes, I'm aware of that. I suppose I didn't spell out everything in my post, but the 8,000 per second polling rate translates into 4,000 message pairs, one Tx, one Rx. With 256 end points to be controlled, this is just 16 message pairs per second per end point. The length of the messages is around 15 chars, so this gives a bit over 1 Mbps. The RS-422 FTDI adapter can manage 3 Mbps, or the TTL hi-speed adapter can be set for up to 12 Mbps, but I'm still waiting to hear from them about any internal or software overhead that would slow the message rate.


    My first thought is why are you trying to combine everything into one
    USB serial port. Why not give each test fixture its own serial port (or
    lump just a few onto a given port) and let the USB bus do the bulk of
    the multi-drop.

    I don't know if that will work any better. I have questions in to the various vendors.


    The ethernet unit might be just a 10 MBit device, or maybe a 100MBit

    10, 100 Mbps and 1 Gbps.


    and
    you need to send a whole message block, process it, then send the data
    in it, and then it can send back the answer when it figures the full
    answer has come back.

    "It"??? What is "it" exactly? The message blocks are 15 characters. The bus runs with a single command from the master resulting in a single response from the slave, lather, rinse, repeat. The short message size results in a low bit rate, or, really,
    the message rate is the choke point, not the bit rate.


    It likely doesn't even TRY to transmit on a
    character basis, but because of the much larger overhead of an Ethernet packet, presumes network bandwidth is more important than delay.

    I don't know where you got the "character" idea. I don't know what the adapter decides is a block to send, but I assume there is a maximum size and, short of that, a timeout.


    Also, they may be quoting figures with typical routing delays assuming a multi-hop route from computer to destination, which adds to the delay,
    since that is the sort of application you use those for. Ethernet is a
    "long haul" medium, not normally thought of as short haul, particularly
    when talking about lower bandwidth applications.

    No one said anything about Ethernet "routing" delays. I've explained to them what I'm doing and one vendor said there is a 1 ms delay in handling each "message" as I described it.

    I could go with something much fancier, where the same command is sent to all slaves and the slaves respond in turn, controlled by a separate priority signal that grants the right to write a reply onto the shared bus. The message from the master can be a single broadcast message, with 128 replies.

    So far, no one has indicated the specific baud rates they support; they only list the maximum rate. I have to design the slaves with a clock at some multiple of the baud rate. It would be nice to share that with the rest of the design, which needs a clock around 33 MHz for comms to the UUTs.

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL-level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not have message-handling delays. Same concern with the 12 Mbps TTL-level interface.

    --

    Rick C.

    + Get 1,000 miles of free Supercharging
    + Tesla referral code - https://ts.la/richard11209

  • From Rick C@21:1/5 to Bernd Linsel on Wed Nov 30 07:58:44 2022
    On Wednesday, November 30, 2022 at 11:11:25 AM UTC-4, Bernd Linsel wrote:
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not
    have message handling delays. Same concern with the 12 Mbps TTL level interface.

    So what is there against you using such a 12 Mbps USB/serial thing and attaching an RS-422/485 transceiver (e.g. https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf).

    That should meet all your requirements mentioned so far.

    I heard back from FTDI and they only support polling rates up to 1 kHz. So I guess I'm stuck with Ethernet. I might be stuck with changing the protocol. Someone suggested that the OS will introduce delays as well. So I might have to either install 16 serial ports directly in the PC, or change the protocol so the master talks to all the slaves in a burst or a single broadcast command, and the replies are controlled by a priority scheme so they are back to back.

    I didn't expect this to be the difficult part of the job.

    I could also automate the test steps into the FPGA on each test fixture board. But that makes the whole thing much less flexible while developing.

    --

    Rick C.

    -- Get 1,000 miles of free Supercharging
    -- Tesla referral code - https://ts.la/richard11209

  • From David Brown@21:1/5 to Rick C on Wed Nov 30 18:14:12 2022
    On 30/11/2022 16:58, Rick C wrote:
    <snip>


    The general issue is that PCs are great at throughput, but poor at
    latency. USB in particular has a scheduler and polls the devices on the
    bus at regular intervals. (This can't really be avoided in a
    half-duplex master-slave system.) For Ethernet, a gigabit switch will
    usually have a latency of 50 - 125 us. Even with a direct connection
    with no switch, you'll be hard pushed to get latencies lower than 50 us,
    and thus a query-reply peak rate of 10,000 telegram pairs a second.

    You can get higher throughput if you have multiple outstanding
    query-replies going to different USB devices or different IP
    connections. So while you are not going to get more than 4000
    send/receive transactions a second to one USB 2.0 high speed FTDI serial
    port device, you could probably do that simultaneously to several such
    devices on the same bus as long as you don't need to wait for the reply
    from one target before sending a message to a different target. (The
    same principle goes for Ethernet.)

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and broadcast,
    rather than divided up. You could even make an EtherCAT-style serial
    protocol (using the hybrid RS-422 bus you suggested earlier). The PC
    could send a single massive serial telegram consisting of multiple small
    ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough to be complete in good time before the next slave starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.
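
    A rough sketch of building such a telegram on the PC side (Python; the
    header byte, the per-slave commands and the padding length here are
    placeholders rather than a worked-out protocol):

        def build_telegram(commands, pad_chars=4):
            """Concatenate per-slave commands into one broadcast telegram,
            with padding after each so the addressed slave has time to reply."""
            pad = b"\x00" * pad_chars            # idle/dummy characters
            parts = [b"\x01"]                    # placeholder header byte
            for cmd in commands:
                parts.append(pad)
                parts.append(cmd)
            parts.append(pad)                    # final pause before the bus goes idle
            return b"".join(parts)

        telegram = build_telegram([b"%02d TEST\r\n" % n for n in range(16)])
        # ser.write(telegram), then read back one long burst of replies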

  • From Dimiter_Popoff@21:1/5 to David Brown on Wed Nov 30 20:52:40 2022
    On 11/30/2022 19:14, David Brown wrote:
    <snip>


    David, that kind of detailed problem solving should not go out free
    of charge you know :-).
    Of course this is the way to do it.

  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Thu Dec 1 01:08:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    <snip>

    I am not sure if you get that there are two issues: throughput and latency.
    If you wait for the answer before sending the next request, you will be
    bounded by latency. OTOH, if you fire several requests without waiting,
    then you will be limited by throughput. With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s
    CH340 2Mb/s, overlapped 1.058s
    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    The other end was an STM32F030, which was simply echoing back the
    received characters.

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH, the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course, a real application would need some way
    to ensure that receive buffers do not overflow.
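
    For what it is worth, the two cases can be timed with something like this
    pyserial sketch (the port name, baud rate and overlap depth are
    placeholders, and as noted above a real test has to keep the receive
    buffer from overflowing):

        import time
        import serial   # pyserial

        MSG = b"0123456789ABCD\n"       # 15-byte test message
        N = 10000

        def waiting(port):
            ser = serial.Serial(port, 2_000_000, timeout=1)
            t0 = time.perf_counter()
            for _ in range(N):
                ser.write(MSG)
                ser.read(len(MSG))      # wait for the echo before the next request
            return time.perf_counter() - t0

        def overlapped(port, depth=32):
            ser = serial.Serial(port, 2_000_000, timeout=1)
            t0 = time.perf_counter()
            for _ in range(depth):      # prime the pipeline with several requests
                ser.write(MSG)
            for _ in range(N - depth):
                ser.read(len(MSG))      # as each echo arrives...
                ser.write(MSG)          # ...fire the next request
            for _ in range(depth):
                ser.read(len(MSG))      # drain the remaining echoes
            return time.perf_counter() - t0

        print(waiting("/dev/ttyUSB0"), overlapped("/dev/ttyUSB0"))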

    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    the PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side, a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in hub(s)).

    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale, you could have a single such board
    per batch of devices and run a copy of your program there, with
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    --
    Waldek Hebisch

  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Thu Dec 1 02:48:15 2022
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    <snip>
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.

    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.


    OTOH, if you fire several requests without waiting, then
    you will be limited by throughput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine messages to multiple targets into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read the various registers that manipulate the details being tested.

    Concerns with this include the need to reload all the FPGAs any time they are updated with a new test feature or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.

    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?

    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?

    The other end was an STM32F030, which was simply echoing back the
    received characters.

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH, the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course, a real application would need some way
    to ensure that receive buffers do not overflow.

    Wait, what? How would overlapped operation work if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?

    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.


    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    the PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side, a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there's no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.
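
    In software terms, the reply side of that flag-passing is just this (a toy Python model with a made-up reply format; the real version would be FPGA logic watching the handshake pins):

        def collect_replies(boards, command):
            """Toy model: the flag starts at board 0 and is passed on
            after each board has put its replies onto the shared bus."""
            bus = []
            flag = 0
            while flag < len(boards):
                bus.append(boards[flag](command))   # only the flag holder transmits
                flag += 1                           # flag passed to the next board
            return b"".join(bus)                    # the master sees one continuous burst

        boards = [lambda cmd, n=n: b"%02d %s\r\n" % (n, cmd) for n in range(16)]
        print(collect_replies(boards, b"OK"))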


    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version, their devices use a polling interval of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422, full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.


    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale, you could have a single such board
    per batch of devices and run a copy of your program there, with
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance of running very fast is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low-end x86 CPUs. There's also the overhead of the PC OS, so maybe I need to do some testing before I worry about this further. I have one FTDI cable. I can use an embedded MCU board for the other end, I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast the MSP430 UART will run? I might have an ARM board that runs Mecrisp; I can't recall.

    --

    Rick C.

    -+ Get 1,000 miles of free Supercharging
    -+ Tesla referral code - https://ts.la/richard11209

  • From David Brown@21:1/5 to Rick C on Fri Dec 2 13:30:19 2022
    On 01/12/2022 11:48, Rick C wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4,
    anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    <snip>
    LOL, so now I'm leaning back toward the USB based FTDI RS-422
    cable and a priority scheme so every target gets many, more
    commands per second. I just ran the math, and this would be
    almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect
    with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and
    latency.

    Of course I'm aware of it. That's the entirety of the problem.


    I would be rather surprised if you were not aware of the difference -
    but your posts show you don't seem to be familiar with the level of the latencies inherent in USB and Ethernet. It seems you think it is just
    poor implementations of hardware or drivers. (Of course, limited implementations can make it worse.)


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect
    their hardware to have such excessive latencies. Especially in the
    Ethernet converter, I would have expected better hardware. Being an
    FPGA sort of guy, I didn't even realize they would not implement the
    data path in an FPGA.

    No one implements the data path of Ethernet in an FPGA. Sometimes a few
    bits (such as checksums) are accelerated in hardware, and there can even
    be filtering or re-direction done in hardware, but the data in Ethernet
    packets is always handled in software.

    Even if it was all handled instantly in perfect hardware, an Ethernet
    frame is 72 bytes plus 12 bytes gap. Then there is at least 20 bytes of
    IP header, then 20 bytes for the TCP header. That's 124 bytes before
    there is any content whatsoever, or 10 us for 100 Mbps Ethernet.
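
    Taking those byte counts at face value, the round number works out like this (just restating the figures above):

        overhead_bytes = 72 + 12 + 20 + 20               # frame + gap + IP header + TCP header
        wire_time_us = overhead_bytes * 8 / 100e6 * 1e6  # ~9.9 us per direction at 100 Mbps
        print(wire_time_us)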


    I found one company that does use an FPGA for a USB to serial
    adapter, but I expect the PC side USB software may be problematic as
    well. It makes you wonder how they ever get audio to work over USB.
    I guess lots of buffering.


    USB works by cyclic polling. There is inevitably a latency. USB 1 had
    1 kHz polling, while USB 2 has 8 kHz. (I don't know off-hand what USB 3
    has, but USB serial devices are invariably USB 1 or 2.)

    Most serial port drivers have lower polling rates than strictly
    necessary by USB cycle times, since polling very fast is difficult to do efficiently. I believe it is difficult on Windows to have periodic
    events at a resolution below 1 millisecond without busy-waiting, and
    drivers can't have busy-waiting - you can't have a driver that eats one
    of your cpu cores just because you've plugged in a USB to serial cable!

    If you write your own code that accesses the USB lower levels directly
    (such as using Linux libusb, or its Windows port) then you can, I
    believe, call USB transfer functions faster, up to the base USB cycle rate.


    None of this should make you wonder about audio. You just need enough buffering to cover USB cycles (125 us for USB 2). Any application delay
    is typically /far/ longer, such as when collecting streaming audio from
    a dodgy internet connection.


    I wonder if you are confusing the two related kinds of latency - one-way latency (time difference between when an application starts to send
    something at one end, and the application at the other end has got the
    data), and two-way latency for a query-reply two-way communication. You
    might also be mixing up jitter in this.

    I say this because there are such critical differences between the needs
    of audio and the needs of your communication. In particular, audio does
    not care about two-way latencies, and can cope with significant one-way
    latency (up to perhaps 20 ms) even when there is video. Without video,
    latency is irrelevant for audio as long as the jitter is low.

  • From Rick C@21:1/5 to David Brown on Fri Dec 2 09:01:54 2022
    On Friday, December 2, 2022 at 7:30:25 AM UTC-5, David Brown wrote:
    On 01/12/2022 11:48, Rick C wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    <snip>
    LOL, so now I'm leaning back toward the USB based FTDI RS-422
    cable and a priority scheme so every target gets many, more
    commands per second. I just ran the math, and this would be
    almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect
    with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and
    latency.

    Of course I'm aware of it. That's the entirety of the problem.

    I would be rather surprised if you were not aware of the difference -
    but your posts show you don't seem to be familiar with the level of the latencies inherent in USB and Ethernet. It seems you think it is just
    poor implementations of hardware or drivers. (Of course, limited implementations can make it worse.)

    I was warned that the polling rate in USB is at best 1 kHz for full-speed and 8 kHz for hi-speed, which definitely creates significant delays in this application. I've been told by FTDI (possibly in error) that even using the hi-speed interface, the best they can set their device for is 1 kHz polling. This does not result in a terrible data throughput, but it's not as fast as I'd like. If FTDI supported the hi-speed polling rate of 8 kHz, I would probably settle for that and quit looking.

    I'm pretty confident there is nothing inherent in 100 Mbps Ethernet that would create delays significant to this application. I've been told by one supplier that their device has a 1 ms built-in delay. I'm wondering if this is a timeout to indicate a packet should be sent even if no more data is being received, but so far, no one has said it is adjustable. I just spoke with Perle and I was told of a 5 ms delay on their Ethernet unit. Again, that's not inherent in the Ethernet protocol.


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect
    their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an
    FPGA sort of guy, I didn't even realize they would not implement the
    data path in an FPGA.
    No one implements the data path of Ethernet in an FPGA. Sometimes a few
    bits (such as checksums) are accelerated in hardware, and there can even
    be filtering or re-direction done in hardware, but the data in Ethernet packets is always handled in software.

    You are the second person to tell me that I didn't design FPGAs for the TTC/Acterna/Viavi TBerd to process OC-12 data. I guess I just dreamed it.

    I'd like to know what you base your assertion on.


    Even if it was all handled instantly in perfect hardware, an Ethernet
    frame is 72 bytes plus 12 bytes gap. Then there is at least 20 bytes of
    IP header, then 20 bytes for the TCP header. That's 124 bytes before
    there is any content whatsoever, or 10 us for 100 Mbps Ethernet.

    10 us would be wonderful! 100 times faster than anyone else. Where do you sell your devices?


    I found one company that does use an FPGA for a USB to serial
    adapter, but I expect the PC side USB software may be problematic as
    well. It makes you wonder how they ever get audio to work over USB.
    I guess lots of buffering.

    USB works by cyclic polling. There is inevitably a latency. USB 1 had
    1 kHz polling, while USB 2 has 8 kHz. (I don't know off-hand what USB 3
    has, but USB serial devices are invariably USB 1 or 2.)

    Most serial port drivers have lower polling rates than strictly
    necessary by USB cycle times, since polling very fast is difficult to do efficiently. I believe it is difficult on Windows to have periodic
    events at a resolution below 1 millisecond without busy-waiting, and
    drivers can't have busy-waiting - you can't have a driver that eats one
    of your cpu cores just because you've plugged in a USB to serial cable!

    So far, no one has said it was the PC software. They have *all* said the delays are in their box.


    If you write your own code that accesses the USB lower levels directly
    (such as using Linux libusb, or its Windows port) then you can, I
    believe, call USB transfer functions faster, up to the base USB cycle rate.


    None of this should make you wonder about audio. You just need enough buffering to cover USB cycles (125 us for USB 2). Any application delay
    is typically /far/ longer, such as when collecting streaming audio from
    a dodgy internet connection.

    Please don't say USB 2. The number you cite is for hi-speed USB, regardless of the version of USB being used.


    I wonder if you are confusing the two related kinds of latency - one-way latency (time difference between when an application starts to send something at one end, and the application at the other end has got the data), and two-way latency for a query-reply two-way communication. You might also be mixing up jitter in this.

    Or not. The application sends messages both ways as a means of preventing collisions on the RS-485 bus. The delay at the slave is near zero, approximately 0.5 us. The two messages are each 150 bits long, and each takes 100 us to transmit on a 1.5 Mbps bus. Everything else is due to the equipment. With a 1 ms delay added, that's a 10x slowdown.


    I say this because there are such critical differences between the needs
    of audio and the needs of your communication. In particular, audio does
    not care about two-way latencies, and can cope with significant one-way latency (up to perhaps 20 ms) even when there is video. Without video, latency is irrelevant for audio as long as the jitter is low.

    Ok, then forget about audio. Far too much has been said about that already. Thank you.

    At this point I am looking at using an Ethernet-to-serial module on each test fixture card and an Ethernet switch to connect them all to the PC. I don't like this in terms of the connectivity and the reliance on not just one, but two different vendors to make it work. Also, most of the modules are either rather large or expensive, or from an Asian company with awkward documentation. They often design their modules without regard to height, which makes them skyscrapers compared to the rest of the board. I have a couple identified as potential candidates, but they will be much harder to test since they need to be attached to a board.

    --

    Rick C.

    +- Get 1,000 miles of free Supercharging
    +- Tesla referral code - https://ts.la/richard11209

  • From Rick C@21:1/5 to Rick C on Sat Dec 3 12:55:29 2022
    On Saturday, December 3, 2022 at 3:42:49 PM UTC-5, Rick C wrote:
    <snip>
    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've been told that the messages will see delays in the adapters, but no one has indicated how they group the data into blocks. In the case of the FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of 26 chars per end point. At 3 Mbps a message takes 87 us to transmit on the serial bus for 11,500 messages a second, or 90 messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut this rate in half, and it's still around 45 messages per end point each second.
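
    Checking that arithmetic (assuming 10 bits per character on the wire and
    128 end points, as elsewhere in the thread):

        chars_per_slot = 26                              # 11 command chars + 15 padding
        bits_per_char = 10                               # start + 8 data + stop
        baud = 3_000_000
        t_slot = chars_per_slot * bits_per_char / baud   # ~87 us per message slot
        rate = 1 / t_slot                                # ~11,500 slots per second
        print(round(rate), round(rate / 128))            # -> 11538 slots/s, ~90 per end point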

    I wish I had something I could run tests with. I suppose any old MCU board would do the job. All it needs to do is see the \n and return a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232
    and I don't have RS-422 on an MCU card, but I do have TTL! I should be able to make that work from an RS-422 signal. The RS-422 receiver will work too, if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to wait a bit. I could try it with an RS-232 dongle just to see how it will work at slower data rates. I think the fastest might be around 250 kbps.
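
    Something like this is what I have in mind for the PC side (a rough sketch
    in Python with pyserial; the port name, baud rate and fixed reply length
    are placeholders):

        import time
        import serial  # pyserial

        PORT = "COM5"            # placeholder, e.g. /dev/ttyUSB0 on Linux
        BAUD = 250_000           # start slow with the RS-232 dongle, raise later
        QUERY = b"01 23 45 C\r\n"
        REPLY_LEN = 15           # length of the fixed reply the MCU sends back

        with serial.Serial(PORT, BAUD, timeout=1) as ser:
            n = 1000
            t0 = time.perf_counter()
            for _ in range(n):
                ser.write(QUERY)
                if len(ser.read(REPLY_LEN)) != REPLY_LEN:
                    raise RuntimeError("short or missing reply")
            dt = time.perf_counter() - t0
            print(f"{n / dt:.0f} query/reply pairs per second")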

    Actually, I wasn't taking into account that the dummy characters only need to provide a small amount of delay to prevent slave collisions. The padding doesn't need to be as long as a slave message. So, with a 3 character difference in length, 4 char of
    padding should suffice, and make the replies look almost like a continuous stream of characters.
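
    Roughly, taking the slave turnaround as about one character time (a guess):

        cmd_len, reply_len = 11, 14                 # characters, per the formats above
        turnaround = 1                              # assumed slave turnaround, in character times
        print((reply_len - cmd_len) + turnaround)   # -> 4 characters of padding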

    I hate sending dummy characters though. They get in the way of debugging if you connect to the bus with an analyzer. But that shouldn't be needed, right? LOL In the first iteration of this test fixture, I had a bug in the FPGA code that showed up as
    random characters being dropped or changed. It was hard to find because that code had been used elsewhere. It was a failure in the documentation (not unlike the Ariane rocket failure) that resulted in my omission of a synchronizing FF that should have
    been at the input. The protocol that echoes the command helped a LOT.

    --

    Rick C.

    --- Get 1,000 miles of free Supercharging
    --- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Sat Dec 3 12:42:46 2022
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown wrote:
    On 30/11/2022 16:58, Rick C wrote:
    On Wednesday, November 30, 2022 at 11:11:25 AM UTC-4, Bernd Linsel
    wrote:
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a
    TTL level UART interface that runs up to 12 Mbps, while the
    RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps.
    Still, 3 Mbps will work a champ if the interface does not have
    message handling delays. Same concern with the 12 Mbps TTL level
    interface.

    So what is there against you using such a 12 Mbps USB/serial thing
    and attaching an RS-422/485 transceiver (e.g.
    https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf).

    That should meet all your requirements mentioned so far.

    I heard back from FTDI and they only support polling rates up to 1
    kHz. So I guess I'm stuck with Ethernet. I might be stuck with
    changing the protocol. Someone suggested that the OS will inject
    delays as well. So I might have to either install 16 serial ports
    directly in the PC, or change the protocol so the master talks to
    all the slaves in a burst or a single broadcast command, and the
    replies are controlled by a priority scheme so they are back to
    back.

    I didn't expect this to be the difficult part of the job.

    I could also automate the test steps into the FPGA on each test
    fixture board. But that makes the whole thing much less flexible
    while developing.

    The general issue is that PCs are great at throughput, but poor at
    latency. USB in particular has a scheduler and polls the devices on the
    bus at regular intervals. (This can't really be avoided in a
    half-duplex master-slave system.) For Ethernet, a gigabit switch will usually have a latency of 50 - 125 us. Even with a direct connection
    with no switch, you'll be hard pushed to get latencies lower than 50 us,
    and thus a query-reply peak rate of 10,000 telegram pairs a second.

    You can get higher throughput if you have multiple outstanding
    query-replies going to different USB devices or different IP
    connections. So while you are not going to get more than 4000
    send/receive transactions a second to one USB 2.0 high speed FTDI serial port device, you could probably do that simultaneously to several such devices on the same bus as long as you don't need to wait for the reply
    from one target before sending a message to a different target. (The
    same principle goes for Ethernet.)

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and broadcast, rather than divided up. You could even make an EtherCAT-style serial protocol (using the hybrid RS-422 bus you suggested earlier). The PC
    could send a single massive serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough to be complete in good time before the next slave starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've
    been told that the messages will see delays in the adapters, but no one has indicated how they block the data. In the case of the FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of 26 chars per end point. At 3 Mbps a message takes 87 us to transmit on the serial bus for 11,500 messages a second, or 90 messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut this rate in half, and it's still around 45 messages per end point each second.

    I wish I had something I could run tests with. I suppose any old MCU board would do the job. All it needs to do is see the \n and return a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232
    and I don't have RS-422 on an MCU card, but I do have TTL! I should be able to make that work from an RS-422 signal. The RS-422 receiver will work too, if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to wait a bit. I could try it with an RS-232 dongle just to see how it will work at slower data rates. I think the fastest might be around 250 kbps.

    --

    Rick C.

    ++ Get 1,000 miles of free Supercharging
    ++ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Sun Dec 4 13:21:49 2022
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown
    wrote:

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus you
    suggested earlier). The PC could send a single massive serial
    telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough
    to be complete in good time before the next slave starts. (Adjust
    padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one
    another. But I don't know if the replies will be seen as a unit for
    shipping over Ethernet or USB by the adapter. I've been told that
    the messages will see delays in the adapters, but no one has
    indicated how they block the data. In the case of the FTDI adapter,
    the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of
    26 chars per end point. At 3 Mbps a message takes 87 us to transmit
    on the serial bus for 11,500 messages a second, or 90 messages per
    second per end point. That certainly would do the job, if I've done
    the math right. Even assuming other factors cut this rate in half,
    and it's still around 45 messages per end point each second.


    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream, they
    turn on their driver, send their reply, then turn off the driver.

    The master side might need dummy characters for padding if the slave
    replies (including any handling delay - the slaves might be fast, but
    they still take some time) can be longer than the master side telegrams.

    Each subtelegram in the master's telegram chain must be self-contained -
    a start character, an ending CRC or simple checksum, and so on. Replies
    from slaves must also be self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial adaptors
    break up the messages - applications read the data as serial streams,
    not synchronous timed data. The only timing you have is a pause between
    master telegrams, which can be many milliseconds long, used to ensure
    that if a slave has gone wrong or lost synchronisation, its
    receiving state machine is reset and ready for the next round.
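
    A minimal sketch of building such a master telegram chain (Python here;
    the ':' start character, two-hex-digit address, XOR checksum and '\n'
    terminator are placeholders only, since none of that has been settled
    in this thread):

        def checksum(body):
            # XOR of all bytes, rendered as two hex digits (placeholder scheme)
            x = 0
            for b in body:
                x ^= b
            return f"{x:02X}".encode()

        def build_chain(commands, pad=4):
            # One self-contained subtelegram per slave, each followed by pad
            # dummy characters ('.') so the addressed slave can finish its
            # reply in good time before the next slave hears its own telegram.
            out = bytearray()
            for addr, cmd in enumerate(commands):
                body = f"{addr:02X} ".encode() + cmd
                out += b":" + body + checksum(body) + b"\n"
                out += b"." * pad
            return bytes(out)

        # Example: the same command broadcast to 16 end points.
        chain = build_chain([b"23 45 C"] * 16)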


    I wish I had something I could run tests with. I suppose any old MCU
    board would do the job. All it needs to do is see the \n and return
    a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232 and I don't have
    RS-422 on an MCU card, but I do have TTL! I should be able to make
    that work from an RS-422 signal. The RS-422 receiver will work too,
    if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to
    wait a bit. I could try it with an RS-232 dongle just to see how it
    will work at slower data rates. I think the fastest might be around
    250 kbps.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Sun Dec 4 08:54:32 2022
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown
    wrote:

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus you
    suggested earlier). The PC could send a single massive serial
    telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough
    to be complete in good time before the next slave starts. (Adjust
    padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one
    another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've been told that
    the messages will see delays in the adapters, but no one has
    indicated how they block the data. In the case of the FTDI adapter,
    the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of
    26 chars per end point. At 3 Mbps a message takes 87 us to transmit
    on the serial bus for 11,500 messages a second, or 90 messages per
    second per end point. That certainly would do the job, if I've done
    the math right. Even assuming other factors cut this rate in half,
    and it's still around 45 messages per end point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream, they
    turn on their driver, send their reply, then turn off the driver.

    The master side might need dummy characters for padding if the slave
    replies (including any handling delay - the slaves might be fast, but
    they still take some time) can be longer than the master side telegrams.

    Each subtelegram in the master's telegram chain must be self-contained -
    a start character, an ending CRC or simple checksum, and so on. Replies
    from slaves must also be self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial adaptors
    break up the messages - applications read the data as serial streams,
    not synchronous timed data. The only timing you have is a pause between
    master telegrams, which can be many milliseconds long, used to ensure
    that if a slave has gone wrong or lost synchronisation, its
    receiving state machine is reset and ready for the next round.

    It absolutely does matter how the messages get broken up. That's where the delays come in. If the slave replies are sent over the network/USB bus one at a time, it's not significantly better than the original approach.

    --

    Rick C.

    --+ Get 1,000 miles of free Supercharging
    --+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Sun Dec 4 21:30:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10
    Mbps. It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable
    if there were no other delays.

    While investigating other units, I found some Ethernet to serial devices and found some claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or
    maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module, to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive on separate lines, the 16
    responses and stream them out to the port to the PC as one, continuous message. This is a bit messier since now, the 16 lines from this new module would need to be marked since they have to plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC. The
    application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single, broadcast command since the same tests are done on all targets in parallel. So using an RJ-45
    connector, there would be the two pairs for the serial port, and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB based FTDI RS-422 cable and a priority scheme so every target gets many, more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times per
    second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.

    How do you know that the data path is not in hardware? One question is
    whether the hardware is able to operate with low latency. Another is
    whether it should. And frequently the answer to the second question is
    no, it should not try to minimize latency. Namely, Ethernet has a
    minimum packet size of about 60 bytes. If you send each character in a
    separate packet, then there is very bad utilization of the medium.
    So, the converter is expected to wait until there are enough characters
    to transmit. Note that at 115200 bits/s a delay of 1 ms is roughly
    11 characters, so not so big. At lower rates the delay becomes less
    significant, and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is a
    shared medium; even if a converter could manage to transmit with
    lower latency within the available Ethernet bandwidth, it could
    do that only at the cost of other users (possibly a second converter).

    And from a slightly different point of view: normally there will be
    software in the path, giving you 0.1 ms of latency on good, modern,
    unloaded hardware and much more in worse conditions. Also,
    Ethernet likes packets of about 1400 bytes. On 10 Mbit/s
    Ethernet that is roughly 1.1 ms for transmission of a packet.
    If the network is not dedicated to the converter, such packets are
    likely to appear from time to time, and the converter has to wait
    until such a packet is fully transmitted before it gets a chance
    to transmit. So, you should regularly expect delays on the order of
    1 ms. Of course, with 100 Mbit/s or gigabit Ethernet the
    media delays are smaller, but serial converters are frequently
    deployed in legacy contexts where 10 Mbit/s still matters.

    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.

    Audio is quite different from serial. Audio can be pre-scheduled,
    but in general you do not know when there will be traffic on a
    serial port.

    OTOH if you fire several requests without waiting, then
    you will be limited by throughput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine
    messages to multiple targets, into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read various registers that
    manipulate the details being tested.

    Concerns with this include the need to reload all the FPGAs, any time they are updated with a new test feature, or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more
    problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.

    Well, my use is rather light. Most is for debugging at, say, 9600 or
    115200, and when plugged in the converter mostly sits idle. I previously
    wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different; I had to
    set my MCU to 847000 to communicate. This could be a bug in the Linux
    driver (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to
    576800 I had to set the MCU to 541300. Even after matching speeds, at
    nominal 576800, 921600 and 1152000 the test time was much higher (more
    than 10 times) than for other rates (I only tested 1-character messages
    at those rates, as I did not want to wait for the full test). Also,
    500000 was significantly slower than 460800 (but "merely" 2 times slower
    for 1-character messages, and catching up with longer messages). Still,
    ATM the CH340 looks reasonably good.

    Remark: I bought all my converters from Chinese sellers. IIUC the
    FTDI chip is faked a lot, but others are too. Still, I think they
    show what is possible and illustrate some difficulties.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.

    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.

    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?

    No. My slave simply echoes back each received character. There is
    some software delay but it should be less than 2 us. So even the waiting
    test has some overlap at the character level. To get more overlap above,
    I cheated: my test program was sending 1 more character than it should.
    So the sent message was 16 bytes and the read was 15. After reading 15,
    another batch of 16 was sent, and so on. In total there were 10000 more
    characters sent than received. My hope was that the OS would read
    and buffer the excess characters, but it seems that at least for the
    CP2104 they cause trouble. My current guess is that the OS is
    reading only when requested, but I did not investigate further...

    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?

    OK, I tried a modified version of my test program. It first sends
    k messages without reading anything, then goes into a main loop where
    after sending each message it reads one. At the end a tail loop
    reads the last k messages without sending anything. So, there
    are k + 1 messages in transit: after sending message k + i the program
    waits for the answer to message i. In total there are 10000 messages.
    Results are:

    CH340      15 char message   20 char message
    k = 0      6.869s            7.163s
    k = 1      4.682s            1.320s
    k = 2      0.992s            1.320s
    k = 3      0.991s            1.319s
    k = 4      0.991s            1.320s
    k = 5      0.990s            1.319s
    k = 8      0.992s            1.320s
    k = 12     0.990s            1.320s
    k = 20     0.992s            1.319s
    k = 36     0.991s            1.321s
    k = 128    0.991s            1.319s

    CP2104     15 char message   20 char message
    k = 0      2.508s            3.756s
    k = 1      1.897s            1.993s
    k = 2      1.668s            2.087s
    k = 3      1.486s            1.887s
    k = 4      1.457s            1.917s
    k = 5      1.559s            1.877s
    k = 8      1.455s            1.803s
    k = 12     1.337s            1.501s
    k = 20     1.123s            1.499s
    k = 36     1.125s            1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    With the PL2303HX at 2 Mbit/s I had a lot of transmission errors,
    so I did not test speed.

    The other end was an STM32F030, which was simply echoing back the
    received characters.
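
    In outline, the overlapped loop looks something like this on the PC side
    (sketched here in Python with pyserial rather than the actual test
    program; the port, baud rate and message contents are placeholders):

        import time
        import serial  # pyserial

        MSG = b"X" * 15                 # 15-character message, echoed back by the slave
        TOTAL = 10_000
        K = 8                           # extra messages kept in flight

        with serial.Serial("/dev/ttyUSB0", 2_000_000, timeout=2) as ser:
            t0 = time.perf_counter()
            for _ in range(K):                  # prime the pipeline
                ser.write(MSG)
            for _ in range(TOTAL - K):          # steady state: one out, one in
                ser.write(MSG)
                ser.read(len(MSG))
            for _ in range(K):                  # drain the tail
                ser.read(len(MSG))
            dt = time.perf_counter() - t0
            print(f"{dt:.3f}s for {TOTAL} round trips")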

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course a real application would need some way
    to ensure that the receive buffers do not overflow.

    Wait, what? How would overlapped operation operate if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?

    My program uses blocking system calls; it did not finish in a reasonable
    time. I did not investigate further. ATM I assume that the OS/driver
    is correct, so that my program would get the characters if the converter
    delivered them. I also assume that the MCU is fast enough to avoid
    loss of any character (character processing should take less than
    2 us, and at 2 Mbit/s I have 5 us per character). In the initial test
    I sent more characters than I wanted to receive, so loss of
    some characters would not stop the program (OK, loss of more than
    10000 would be too much). In this batch of tests I sent exactly
    the number of characters that I wanted to receive, so loss of
    any would cause an infinite wait.

    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.

    Each converter has finite transmit and receive buffers.
    According to the datasheet the CP2104 has a 576-character receive buffer.
    For the others I do not have numbers handy, but I would expect something
    between 200 characters and a kilobyte. When characters arrive via the
    serial port they fill the receive buffer. The driver/OS/user program has
    to read them promptly. When doing the first test my hope was that the
    OS/driver would read characters from the converter and store them
    in a system buffer. But then I saw stalls with the CP2104. After I had
    seen this, my guess was that in my test I overflowed the CP2104 receive
    buffer (in my initial test I was sending 10000 more characters than
    I received, so much more than the receive buffer size). However, I have
    seen stalls with k = 18 and message size 15, and even with k = 0 and
    message size 20. In both cases the new test program guaranteed that the
    amount of data in transit was much smaller than the stated buffer size.
    So, at least for the CP2104, there must be some other reason.

    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to the
    PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there is no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.


    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in the hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.

    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale you could have a single such board
    per batch of devices and run a copy of your program there, and
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance at running very fast, is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low end x86 CPUs. There's also the
    overhead of the PC OS, so maybe I need to do some testing before I worry with this further. I have one FTDI cable. I can use an embedded MCU board for the other end I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast
    the MSP430 UART will run?

    The MSP430G2553 theoretically allows setting quite high rates like 4 Mbit/s,
    but it is not clear if it will run (whether the noise immunity is good enough).
    AFAICS 1 Mbit/s is supposed to work. The other thing is software speed;
    I think that software can handle 1 Mbit/s, but probably not more.

    I might have an ARM board that runs Mecrisp, I can't recall.


    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Sun Dec 4 14:57:18 2022
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10
    Mbps. It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable
    if there were no other delays.

    While investigating other units, I found some Ethernet to serial devices and found some claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or
    maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module, to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive on separate lines, the 16
    responses and stream them out to the port to the PC as one, continuous message. This is a bit messier since now, the 16 lines from this new module would need to be marked since they have to plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC. The
    application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single, broadcast command since the same tests are done on all targets in parallel. So using an RJ-45
    connector, there would be the two pairs for the serial port, and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB based FTDI RS-422 cable and a priority scheme so every target gets many, more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request you will be bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.
    How do you know that data path is not in hardware?

    Not only did the vendor tell me it's through a CPU, he laughed at the idea of implementing Ethernet in an FPGA. That's when I sent him a link to the TBERD product line I had worked on around 2000.


    One question is
    if hardware is able to opperate with low latency. Another is if it
    should. And frequently answer to secend question is no, it should
    not try to minimize latency. Namely, Ethernet has minimal packet
    size which is about 60 characters. If you send each character in
    separate packet, then there would be very bad utilization of media.
    So, converter is expected to wait till there is enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What you
    are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.


    Note that at 115200 bits/s delay of 1ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates delay becomes less
    signifincant and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is
    shared medium, even if convertor could manage to transmit with
    lower latency withing available Ethernet bandwidth, it could
    do that only at cost of other users (possibly second convertor).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.


    And from a bit different point of view: normally there will be
    software in the path, giving you 0.1ms of latency on good modern
    unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes size. On 10 Mbit/s
    Ethernet this is about 1.4 ms for transmitssion of packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.


    If network in not dedicated to convertor such packets are likely
    to appear from time to time and convertor has to wait till
    such packet is fully transmitted and only then gets chance
    to transmit. So, you should regularly expect delays of order
    1ms. Of course, with 100 Mbit/s Ethernet or gigabit one
    media delays are smaller, but serial convertors are frequently
    deployed in legacy contexts where 10 Mbit/s matter.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?


    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.
    Audio is quite different than serial. Audio can be pre-scheduled
    but in general you do not know when there will be traffic on
    serial port.
    OTOH if you fire several request without waiting, then
    you will be limited by througput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine
    messages to multiple targets, into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read various registers that manipulate
    the details being tested.

    Concerns with this include the need to reload all the FPGAs, any time they are updated with a new test feature, or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more
    problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or
    115200. And when plugged in convertor mostly sits idle. I previously
    wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver
    (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times)
    higher than for other rates (I only tested 1 character messages at those rates, did not want to wait for full test). Also, 500000 was significantly slower than 460800 (but "merely" 2 times slower for 1 character messages
    and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.


    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting
    test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should.
    So sent message was 16 bytes, read was 15. After reading 15 another
    batch of 16 was sent and so on. In total there were 10000 more
    characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages.
    Results are:

    CH340      15 char message   20 char message
    k = 0      6.869s            7.163s
    k = 1      4.682s            1.320s
    k = 2      0.992s            1.320s
    k = 3      0.991s            1.319s
    k = 4      0.991s            1.320s
    k = 5      0.990s            1.319s
    k = 8      0.992s            1.320s
    k = 12     0.990s            1.320s
    k = 20     0.992s            1.319s
    k = 36     0.991s            1.321s
    k = 128    0.991s            1.319s

    CP2104     15 char message   20 char message
    k = 0      2.508s            3.756s
    k = 1      1.897s            1.993s
    k = 2      1.668s            2.087s
    k = 3      1.486s            1.887s
    k = 4      1.457s            1.917s
    k = 5      1.559s            1.877s
    k = 8      1.455s            1.803s
    k = 12     1.337s            1.501s
    k = 20     1.123s            1.499s
    k = 36     1.125s            1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    Some of the results seem odd and hard to understand, like why the message rate improves so much as k is increased, and why so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k = 0 they are around 1 ms per message, which is
    the polling rate... if you adjust it. I think the default for FTDI was 8 ms.

    Thanks for doing this.


    With PL2303HX at 2 Mbit/s I had a lot of transmission errors,
    so did not test speed.
    The other end was STM32F030, which was simply replaying back
    received characters.

    Note: there results are not fully comparable. Apparently CH340
    will silently drop excess characters, so for overalapped operation
    I simply sent more charactes than I read. OTOH CP2104 seem to
    stall when its receive buffer overflows, so I limited overlap to
    avoid stalls. Of course real application would need some way
    to ensure that receive buffers do not overflow.

    Wait, what? How would overlapped operation operate if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?
    My program uses blocking system calls, it did not finish in resonable
    time. I did not investigate deeper. ATM I assume that OS/driver
    is correct os that my program would get characters if convertor
    delivered them. I also assume that MCU is fast enough to avoid
    loss of any character (character processing should be less than
    2us, at 2 Mbit/s I have 5us per character). In inital test
    I have sent more characters then I wanted receive, so loss of
    some characters would not stop the program (OK, loss of more than
    10000 would be too much). I this batch of tests I sent exactly
    the number of characters that I wanted to receive, so loss of
    any would cause infinite wait.
    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.
    Each convertor has finite transmission and receive buffers.
    Accordinng to datasheet CP2104 have 576 character receive buffer.
    For other I do now have numbers handy, but I would expect something
    between 200 characters and kilobyte. When characters arrive via
    serial port they fill receive buffers. Driver/OS/user program have
    to promptly read them. When doing first test my hope was that
    OS/driver will read characters from convertor and store them
    is system buffer. But then I saw stalls with CP2104. After I have
    seen this my guess was that in my test I overflowed CP2104 receive
    buffer (in my initial test I was sending 10000 characters more than
    I received, so much more than receive buffer size). However I have
    seen stalls with k = 18 and message size 15. And even with k = 0 and
    message size 20. In both cases new test program guaranteed that amount
    of data in transit was much smaller than stated buffer size.
    So, at least for CP2104 there must be some other reason.
    So, you should be easily able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    PC. If you have several FPGA-s on a single board, coordinating
    them should be easy. Of couse, you need some free pins and
    extra tracks. I would use single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls transceiver at given time. Anyway, this would
    allow overlapped transmisson to all devices on single
    board. With multiple boards you would need some hardware
    or software protocol decide which board can transmit.
    On hardware side a single pair of extra wires could
    carry needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there is no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.


    As other suggested you could use multiple convertors for
    better overlap. My convertors are "full speed" USB, that
    is they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    convertes would saturate single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on specific motherboard.
    Theoreticaly, when using "high speed" USB converters,
    several could easily work from single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.


    An extra thing: there are reasonably cheap PC compatible
    boards, supposedly they are cheaper and more easy to buy
    than Raspberry Pi (but I did not try buy them). If you
    need really large scale you could have a single such board
    per batch of devices and run copy of your program there. And
    a single laptop connecting to satelite board via ethernet
    and collecting results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance at running very fast, is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low end x86 CPUs. There's also the
    overhead of the PC OS, so maybe I need to do some testing before I worry with this further. I have one FTDI cable. I can use an embedded MCU board for the other end I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast the
    MSP430 UART will run?
    MSP430G2553 theoretically allows setting quite high rates like 4 Mbit/s,
    but it is not clear it it will run (if noise immunity is good enough). AFAICS 1 Mbit/s is supposed to work. Other thing is software speed,
    I think that software can handle 1 Mbit/s, but probably not more.

    I have an FTDI adapter which I will try running my own tests with. To be realistic, they should be with a target, but that might be a problem just now. We'll see what I can cobble up. Most of my stuff is not convenient at the moment.

    I was playing with it using PuTTY, but that's not the best terminal emulator in the world. I can't get it to show control characters or use different colors for transmit and receive. Heck, maybe I'm just being stupid, but I can't find how to send a
    file through the port. I'm pretty sure I've done that using PuTTY before, because that's how you compile programs on an embedded Forth. You simply send the file through the serial port like you were typing it.

    --

    Rick C.

    -+- Get 1,000 miles of free Supercharging
    -+- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Mon Dec 5 03:33:22 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    One question is whether the hardware is able to operate with low latency.
    Another is whether it should. And frequently the answer to the second question
    is no, it should not try to minimize latency. Namely, Ethernet has a minimum
    packet size, which is about 60 characters. If you send each character in a
    separate packet, there would be very bad utilization of the medium.
    So, a converter is expected to wait until there are enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What
    you are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.

    Well, a delay from Ethernet to the serial port clearly means that the implementer
    did not spend enough effort to make it fast.

    Note that at 115200 bits/s a delay of 1 ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates the delay becomes less
    significant, and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is a
    shared medium: even if the converter could manage to transmit with
    lower latency within the available Ethernet bandwidth, it could
    do that only at the cost of other users (possibly a second converter).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.

    You were talking about connecting more converters. Normally laptops
    have only a single Ethernet port, so all the converters that you connect
    will share a single Ethernet link. If you use 100 converters at 3 Mbit/s each
    plus switches, it should be possible to get 30 Mbytes/s of aggregate bandwidth (assuming a gigabit port in the laptop and a gigabit switch at the top of the tree).
    But if each converter wasted a lot of bandwidth due to a small
    payload per packet, then such a rate would be impossible.

    And from a slightly different point of view: normally there will be
    software in the path, giving you 0.1 ms of latency on good, modern,
    unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes. On 10 Mbit/s
    Ethernet that is about 1.4 ms for transmission of a packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.

    It is your planned use that would kill throughput. I would expect
    that when the product is used as intended you would get a reasonable fraction
    (say 70%) of the nominal throughput (that is, 2*16*3 Mbit/s). If not,
    then I will join you in calling it a bad product.

    If the network is not dedicated to the converter, such packets are likely
    to appear from time to time, and the converter has to wait until
    such a packet is fully transmitted and only then gets a chance
    to transmit. So, you should regularly expect delays on the order of
    1 ms. Of course, with 100 Mbit/s or gigabit Ethernet the
    media delays are smaller, but serial converters are frequently
    deployed in legacy contexts where 10 Mbit/s matters.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?

    Sometimes you get a product designed for 10 Mbit/s which just got a faster
    Ethernet part in order to be a good citizen on a fast network. Above you wrote
    about a 16-port unit. That should be designed for a faster network, but
    on common 100 Mbit/s Ethernet, running the ports in parallel, it would be
    limited by Ethernet throughput. And even on 1 Gbit/s Ethernet it
    needs enough bandwidth that you cannot waste it, even if it is
    the only thing on the network. And just a little thing: you
    wrote Ethernet, but raw Ethernet is problematic on PC OSes,
    so I would guess that you really mean TCP/IP over Ethernet.
    TCP requires every packet to be acknowledged, which may add more
    small-packet traffic.

    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at, say, 9600 or
    115200, and when plugged in the converter mostly sits idle. I previously
    wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different: I had to
    set my MCU to 847000 to communicate. This could be a bug in the Linux driver
    (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to 576800 I had to set the MCU to 541300. Even after matching speeds, at nominal
    576800, 921600 and 1152000 the test time was much higher (more than 10 times)
    than for other rates (I only tested 1-character messages at those rates; I did not want to wait for the full test). Also, 500000 was significantly slower than 460800 (but "merely" 2 times slower for 1-character messages and catching up with longer messages). Still, ATM the CH340 looks
    reasonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.

    Well, I am telling you what I observed. People say various things on the
    net. I was interested whether the net knew something about my trouble with
    the CP2104, so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using a CP2104. So, there
    is a lot of noise, and ATM I prefer to stay with what I see.

    Remark: I bought all my converters from Chinese sellers. IIUC
    the FTDI chip is faked a lot, but others are too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.

    AFAIK the Linux driver does not discriminate against non-FTDI devices,
    so the fact that a converter works with the Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. The datasheet gives a contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns each received character. There is
    some software delay, but it should be less than 2 us. So even the waiting
    test has some overlap at the character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should.
    So the sent message was 16 bytes, the read was 15. After reading 15, another
    batch of 16 was sent, and so on. In total there were 10000 more
    characters sent than received. My hope was that the OS would read
    and buffer the excess characters, but it seems that at least for
    the CP2104 they cause trouble. My current guess is that the OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried a modified version of my test program. It first sends
    k messages without reading anything, then goes to the main loop where,
    after sending each message, it reads one. At the end there is a tail loop
    which reads the last k messages without sending anything. So there
    are k + 1 messages in transit: after sending message k + i the program
    waits for the answer to message i. In total there are 10000 messages.
    Results are:

    CH340                15 char message    20 char message
    k = 0                6.869s             7.163s
    k = 1                4.682s             1.320s
    k = 2                0.992s             1.320s
    k = 3                0.991s             1.319s
    k = 4                0.991s             1.320s
    k = 5                0.990s             1.319s
    k = 8                0.992s             1.320s
    k = 12               0.990s             1.320s
    k = 20               0.992s             1.319s
    k = 36               0.991s             1.321s
    k = 128              0.991s             1.319s

    CP2104               15 char message    20 char message
    k = 0                2.508s             3.756s
    k = 1                1.897s             1.993s
    k = 2                1.668s             2.087s
    k = 3                1.486s             1.887s
    k = 4                1.457s             1.917s
    k = 5                1.559s             1.877s
    k = 8                1.455s             1.803s
    k = 12               1.337s             1.501s
    k = 20               1.123s             1.499s
    k = 36               1.125s             1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message    20 char message
    k = 0                5.478s             3.755s
    k = 1                4.929s             3.030s
    k = 2                2.506s             3.339s
    k = 3                2.459s             2.020s
    k = 4                1.708s             1.061s
    k = 5                1.671s             1.032s
    k = 8                0.764s             1.021s
    k = 12               0.772s             1.014s
    k = 20               0.763s             1.009s
    k = 36               0.758s             1.007s
    k = 128              0.757s             1.008s

    FTDI232R, 3 Mbit/s   15 char message    20 char message
    k = 0                8.216s             10.007s
    k = 1                5.006s             4.344s
    k = 2                3.338s             1.602s
    k = 3                2.406s             1.444s
    k = 4                1.766s             1.316s
    k = 5                1.599s             1.673s
    k = 8                1.040s             1.327s
    k = 12               1.071s             1.312s

    With k = 20, k = 36 and k = 128 communication stalled.
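
    For concreteness, the shape of that k-in-flight round-trip loop is roughly the
    following (a minimal sketch, assuming pyserial, a made-up port name, and a slave
    that echoes every byte it receives):

        import serial, time

        PORT, BAUD = "/dev/ttyUSB0", 2000000   # assumptions
        MSG_LEN, N_MSGS, K = 15, 10000, 8      # message size, total messages, extra messages in flight
        MSG = b"A" * MSG_LEN                   # dummy payload

        def read_exact(port, n):
            buf = b""
            while len(buf) < n:
                chunk = port.read(n - len(buf))
                if not chunk:
                    raise RuntimeError("stalled waiting for echo")
                buf += chunk
            return buf

        with serial.Serial(PORT, BAUD, timeout=2) as port:
            t0 = time.time()
            for _ in range(K):                 # prime the pipeline: k messages out, none read
                port.write(MSG)
            for _ in range(N_MSGS - K):        # main loop: k + 1 messages in transit
                port.write(MSG)
                read_exact(port, MSG_LEN)
            for _ in range(K):                 # tail loop: drain the last k echoes
                read_exact(port, MSG_LEN)
            print("%.3fs for %d round trips" % (time.time() - t0, N_MSGS))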

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, and why so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k = 0 they are around 1 ms per message, which is the
    polling rate... if you adjust it. I think the default for FTDI was 8 ms.
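
    On a Linux host the ftdi_sio driver exposes that latency timer through sysfs, so
    it can be dropped to 1 ms before a test run; a minimal sketch (the ttyUSB0 path is
    an assumption, and on Windows the equivalent setting is in the FTDI driver's
    advanced port properties):

        # Sketch: read and then lower the FTDI latency timer to 1 ms on Linux
        # (ftdi_sio driver). The ttyUSB0 path is an assumption; needs root or a udev rule.
        LAT = "/sys/bus/usb-serial/devices/ttyUSB0/latency_timer"

        with open(LAT) as f:
            print("old latency_timer:", f.read().strip(), "ms")
        with open(LAT, "w") as f:
            f.write("1")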

    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that the actual payload is 62 bytes and there are 2 bytes
    of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible. The corresponding k = 8 really means 9 messages in transit, so 135
    characters, which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20-character messages the main improvement is at k = 4, which
    means 100 characters, which is smaller than 2 buffers, with extra
    improvement for more data in transit. With the CH340 and 15-character
    messages we see the main improvement at k = 2, which corresponds
    to 45 characters in transit. With 20-character messages we get the
    improvement at k = 1, which is 40 characters in transit.
    The CH340 uses 32-character transfer buffers, so the improvement corresponds
    to somewhat more than 1 buffer in transit. Now, if transfers
    between the converter and the PC happened at optimal times, then one buffer
    + one character would be enough to get full serial speed. But
    USB transfers cannot be started at arbitrary times; IIUC there
    are discrete time slots when a transfer can occur. When a transfer
    cannot be done in a given slot it must wait for the next slot.
    So, depending on the locations of the possible slots, more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow the PC to get full
    bandwidth, and this is in good agreement with the FTDI results.

    In the case of the CH340 there is an extra factor: the CH340 also uses 8-byte
    transfers. I do not know what function they have, but a
    reasonably likely guess is that those 8 bytes pack the transfer-control
    info that FTDI bundles with the normal data. Anyway, those
    are "interrupt" transfers in the USB sense, so they have higher priority
    than the data transfers. A reasonable guess is that they steal some
    USB bandwidth from the data transfers. Also, a smaller-than-maximal
    data block size limits efficiency, so it is possible that the
    CH340 is limited by USB bandwidth (lack of enough slots).
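
    As a sanity check on that 98% figure (assuming 10 wire bits per character,
    i.e. 8N1):

        # 10000 round trips of 15 characters, 10 wire bits per character (8N1)
        bits = 10000 * 15 * 10          # 1,500,000 bits in each direction
        ideal = bits / 2_000_000        # 0.75 s at 2 Mbit/s, full duplex
        print(ideal, ideal / 0.764)     # 0.75 s, and 0.75/0.764 = ~98% utilization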

    Now, concerning 3 Mbit/s: due to the different serial speed,
    the optimal times for transfers are different than in the 2 Mbit/s
    case. It is possible that there is a worse fit between desired
    and possible transfer times. Buffering allows this to be at least
    partially cured, hence the initial improvement. But clearly,
    there is some extra bottleneck. Now some speculation:
    with the 1/8 ms USB 2.0 cycle, there are 1500 FS clocks per
    cycle. I would have to look at the spec to be sure, but this
    is close to a 150-byte worst-case FS transfer. Besides the data
    there is some USB protocol overhead, and (speculatively) it
    is possible that the low-level USB driver may refuse to schedule
    two 64-byte transfers in a single cycle. In such a case the effective
    bandwidth for serial data would be 4096000 bits/s, which
    corresponds to 5120000 serial bits/s (serial sends start and stop
    bits which are not needed on USB). This is less than
    full-duplex 3 Mbit/s (both directions add up to 6 Mbit/s and
    must go through the same USB). With a larger amount of data in
    transit this could give wild oscillations in the amount of
    buffered data, leading to slowdowns when buffers get empty
    and giving a stall when the receive buffer overflows.
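
    The numbers in that speculation work out as follows (whether the host really
    schedules only one 64-byte bulk transfer per direction per 125 us slot is the
    speculative part):

        usb_payload = 64 * 8 * 8000            # one 64-byte transfer per 1/8 ms slot = 4,096,000 bits/s
        serial_equiv = usb_payload * 10 // 8   # = 5,120,000 wire bits/s (10 wire bits per 8-bit byte)
        needed = 2 * 3_000_000                 # full-duplex 3 Mbit/s adds up to 6 Mbit/s
        print(serial_equiv, needed)            # 5.12 M < 6 M, so USB could not keep both directions full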

    Of course there is another speculation: the converter may be a fake.
    Supposedly fakes use MCUs with a special program. The software
    could create delays which limit the transfer rate at 3 Mbit/s
    and lead to data loss/stalls with more data in transit.

    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mbit/s. USB has significant
    protocol overhead, so probably two 2 Mbit/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in the hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.

    Well, you wrote that you have the needed experience, so build a low-latency Ethernet-to-serial converter based on an FPGA. Give your numbers and see
    how many customers come in.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Sun Dec 4 22:39:24 2022
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    One question is
    if hardware is able to opperate with low latency. Another is if it should. And frequently answer to secend question is no, it should
    not try to minimize latency. Namely, Ethernet has minimal packet
    size which is about 60 characters. If you send each character in separate packet, then there would be very bad utilization of media.
    So, converter is expected to wait till there is enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What you
    are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.
    Well, delay from Ethernet to serial port clearly means that implementer
    did not spent enough effort to make it fast.

    I won't argue with that!


    Note that at 115200 bits/s delay of 1ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates delay becomes less
    signifincant and at higher rates people usually care more about throughput than latency. And do not forget that Ethernet is
    shared medium, even if convertor could manage to transmit with
    lower latency withing available Ethernet bandwidth, it could
    do that only at cost of other users (possibly second convertor).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.
    You were talking about connecting more convertors. Normally laptops
    have only single Ethernet port, so all convertors that you connect
    will share single Ethernet. If you use 100 convertors, 3 Mbits/s each
    + switches it should be possible to get 30 Mbytes/s of aggregate bandwidth (assuming gigabyte port in laptop and gigabyte switch at top of tree).
    But if each converter would waste a lot of bandwidth due to small
    payload per packet, then such rate would be impossible.

    Good thing we aren't trying to use 100 converters. The vendors who produce 4, 8 and 16 port versions don't actually do much to make them fast. I think the matter of small messages just doesn't come up often enough to be on their radar.

    And from a bit different point of view: normally there will be
    software in the path, giving you 0.1ms of latency on good modern unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes size. On 10 Mbit/s
    Ethernet this is about 1.4 ms for transmitssion of packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.
    It is your planned use that would kill throughput. I would expect
    that when product is used as intended you would get resonable fraction
    (say 70%) of nominal throughput (that is 2*16*3Mbits/s). If not,
    then I will join you in calling it bad product.

    "Intended"!? I saw nothing in any document that said serial port traffic had to meet any particular specifications. They didn't set this sort of spec when they designed the product. It happened that it had this limitation and someone said, "Good
    enough, ship it"!


    If network in not dedicated to convertor such packets are likely
    to appear from time to time and convertor has to wait till
    such packet is fully transmitted and only then gets chance
    to transmit. So, you should regularly expect delays of order
    1ms. Of course, with 100 Mbit/s Ethernet or gigabit one
    media delays are smaller, but serial convertors are frequently
    deployed in legacy contexts where 10 Mbit/s matter.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?
    Sometimes you get product designed for 10 Mbit/s which just got faster Ethernet part to be good citizen on fast network. Above you wrote
    about 16 port thing. That should be designed for faster network, but
    on common 100 Mbit/s Ethernet running ports in parallel it would be
    limited by Ethernet troughput. And even on 1 Gbit/s Ethernet it
    needs enough bandwidth that you can not waste it even if it is
    the only thing on the network. And just a litte thing: you
    wrote Ethernet, but raw Ethernet is problematic on PC OSes.
    So I would guess that you really mean TCP/IP over Ethernet.
    TCP requires every packet to be acknowleged, which may add more
    small-packet trafic.

    I'm not sure what your point is. But it is not important. We are discussing nits at this point.


    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in convertor mostly sits idle. I previously wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times) higher than for other rates (I only tested 1 character messages at those rates, did not want to wait for full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1 character messages and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I say you what I observed. People say various thing on the
    net. I was interested if net know something about my trouble with
    CP2104 so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?


    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.
    AFAIK Linux driver does not discriminate againt non-FTDI devices.
    So fact that convertors works with Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    I'm not using Linux. I don't have any FTDI fakes. I have some Prolific fakes somewhere, if I could find them. I never had one bricked, but I think it was Prolific that did that some years ago. Or, I may have them confused with FTDI. I remember the
    bricking driver was released with a Windows update and MS was pretty pissed off when the bricking hit the news.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting
    test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should. So sent message was 16 bytes, read was 15. After reading 15 another batch of 16 was sent and so on. In total there were 10000 more characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages.
    Results are:

    CH340                15 char message    20 char message
    k = 0                6.869s             7.163s
    k = 1                4.682s             1.320s
    k = 2                0.992s             1.320s
    k = 3                0.991s             1.319s
    k = 4                0.991s             1.320s
    k = 5                0.990s             1.319s
    k = 8                0.992s             1.320s
    k = 12               0.990s             1.320s
    k = 20               0.992s             1.319s
    k = 36               0.991s             1.321s
    k = 128              0.991s             1.319s

    CP2104               15 char message    20 char message
    k = 0                2.508s             3.756s
    k = 1                1.897s             1.993s
    k = 2                1.668s             2.087s
    k = 3                1.486s             1.887s
    k = 4                1.457s             1.917s
    k = 5                1.559s             1.877s
    k = 8                1.455s             1.803s
    k = 12               1.337s             1.501s
    k = 20               1.123s             1.499s
    k = 36               1.125s             1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message    20 char message
    k = 0                5.478s             3.755s
    k = 1                4.929s             3.030s
    k = 2                2.506s             3.339s
    k = 3                2.459s             2.020s
    k = 4                1.708s             1.061s
    k = 5                1.671s             1.032s
    k = 8                0.764s             1.021s
    k = 12               0.772s             1.014s
    k = 20               0.763s             1.009s
    k = 36               0.758s             1.007s
    k = 128              0.757s             1.008s

    FTDI232R, 3 Mbit/s   15 char message    20 char message
    k = 0                8.216s             10.007s
    k = 1                5.006s             4.344s
    k = 2                3.338s             1.602s
    k = 3                2.406s             1.444s
    k = 4                1.766s             1.316s
    k = 5                1.599s             1.673s
    k = 8                1.040s             1.327s
    k = 12               1.071s             1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, but so dramatically at 3 Mbps. They all seem to approach ~1.3 second as k increases. At k=0 they are around 1 ms per message, which is the
    polling rate... if you adjust it. I think the default for FTDI was 8 ms.
    Let me first comment 2Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that actual payload is 62-bytes and there are 2-bytes
    of protocol info). With 15 characters messages 0.764s really means
    98% of use of serial bandwidth, so essentiall as good as possible.

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???


    Corresponding k = 8 means really 9 messages in transit, so 135
    characters which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20 charaster messages main improvement is at k = 4 which
    means 100 characters, which is smaller than 2 buffers, with extra improvements for more data in transit. With CH340 and 15 char
    messages we see main improvement for k = 2, which corresponds
    to 45 characters in transit. With 20 char messages we get
    impovement for k = 1 which is 40 charactes in transit.
    CH340 uses 32 character transfer buffers, so improvemnet corresponds
    to somwhat more than 1 buffer in transit. Now, if transfers
    between converter and PC were at optimal times, then one buffer
    + one character would be enough to get full serial speed. But
    USB tranfers can not be started at arbitrary times, IIUC there
    are discrete time slots when transfer can occur. When tranfer
    can not be done in given slot it must wait for next slot.
    So, depending on locations of possible slots more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow PC to get full
    bandwidth and this is in good agreement with FTDI results.
    In case of CH340 there is extra factor: CH340 also uses 8 byte
    transfers. I do not know what function they have, but
    resonably likely guess is that those 8 byte pack tranfer control
    info that FTDI bundles with normal data. Anyway, those
    are "interrupt" tranfers in USB sense, so have higher priority
    than data transfer. Resonable guess it that they steal some
    USB bandwith from data tranfers. Also, smaller than maximal
    data block size limits efficiency, so it is possible that
    CH340 is limited by USB bandwith (lack of enough slots).

    Now, concerning 3 Mbits/s, due to different serial speed
    optimal times for transfers are different than in 2 Mbits/s
    case. It is possible that there is worse fit of desired
    and possible transfer times. Buffering allows to at least
    partially cure this, so initial improvement. But clearly,
    there is some extra bottleneck. Now some speculation:
    with 1/8 ms USB-2.0 cycle, there is 1500 FS clock per
    cycle. I would have to look at spec to be sure, but this
    is close to 150 byte worst case FS transfer. Beside data
    there is some USB protocol overhead and (speculatively) it
    is possible that low level USB diver may refuse to schedule
    two 64-byte transfers in single cycle. In such case effective
    bandwith for serial data would be 4096000 bits, which
    correspond to 5120000 serial bits (serial sends start and stop
    bits which are not needed for USB). This is less than
    full duplex 3 Mbits/s (both directions add to 6 Mbits/s and
    must go trouh the same USB). With larger amount of data in
    transit this could give wild oscilations in amount of
    buffered data, leading to slowdown when buffers get empty
    and giving stall when receive buffer overflows.

    Of course there is another speculation: convertor may be fake.
    Supposedly fakes use MCU-s with special program. Software
    could crate delays which limit transfer rate at 3 Mbits/s
    and lead to data loss/stall with more data in transit.

    It's too late for me to try to read all this.


    As other suggested you could use multiple convertors for
    better overlap. My convertors are "full speed" USB, that
    is they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    convertes would saturate single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on specific motherboard.
    Theoreticaly, when using "high speed" USB converters,
    several could easily work from single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.
    Well, you wrote that you have needed experience, so do low-latency Ethernet-serial convertor based on FPGA. Give your numbers and look
    how many customers come in.

    No, I never said I've designed Ethernet interfaces. I said I worked on FPGA code in a comms tester, which also tested Ethernet. I worked on one of the telecom formats, OC-12 rings a bell. Besides, that would be a major project. I have two other
    major projects to work on. This should be something I can buy.

    Reading your tests has made me realize that while combining the messages for every target into one batch can be a bit unwieldy, I could limit the combinations to the end points on a single card. The responses have to be combined for the one driver
    anyway. Between the 8 end points on a single board I could easily combine those commands, and then stagger the replies without any extra signals between the boards and no special characters in the command stream.

    Again, thinking out loud: at 3 Mbps, 8 * 150 bits per command is 1,200 bits or 400 us. That would greatly reduce the wasted time, even with a 1 ms polling period. It would allow an exchange of 8 commands and 8 replies every 2 ms, or 4,000 per second.
    That would be almost 32 per end point, which would be great! Actually, it could be faster than this, since the staggering of the replies doesn't require the first reply to wait for the last command. So replies will start at the end of the first command.
    The beauty of full-duplex!
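
    Running the same numbers explicitly (150 bits per command is an assumption of
    roughly 15 characters at 10 wire bits each):

        rate = 3_000_000                        # serial rate, bits/s
        burst = 8 * 150                         # 8 commands of ~150 bits = 1,200 bits
        print(burst / rate)                     # 0.0004 s = 400 us per 8-command block
        pairs_per_s = 8 / 0.002                 # 8 command/reply pairs every 2 ms
        print(pairs_per_s, pairs_per_s / 128)   # 4000/s total, ~31 per end point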

    Any chance you could run your test on the FTDI cable at 3 Mbps with a 1,200 bit block of data (120 characters)? I imagine the RS-232 waveform is getting a bit triangular at that speed.

    --

    Rick C.

    -++ Get 1,000 miles of free Supercharging
    -++ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Mon Dec 5 08:57:07 2022
    On 04/12/2022 17:54, Rick C wrote:
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David
    Brown wrote:

    A communication hierarchy is likely the best way to handle
    this.

    Alternatively, at the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus
    you suggested earlier). The PC could send a single massive
    serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast
    enough to be complete in good time before the next slave
    starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one
    big telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's
    not guaranteed. This will prevent the slaves from talking over
    one another. But I don't know if the replies will be seen as a
    unit for shipping over Ethernet or USB by the adapter. I've been
    told that the messages will see delays in the adapters, but no
    one has indicated how they block the data. In the case of the
    FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of 01 23 45 C\r\n - 11
    chars 01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total
    of 26 chars per end point. At 3 Mbps a message takes 87 us to
    transmit on the serial bus for 11,500 messages a second, or 90
    messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut
    this rate in half, and it's still around 45 messages per end
    point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream,
    they turn on their driver, send their reply, then turn off the
    driver.

    The master side might need dummy characters for padding if the
    slave replies (including any handling delay - the slaves might be
    fast, but they still take some time) can be longer than the master
    side telegrams.

    Each subtelegram in the master's telegram chain must be
    self-contained - a start character, an ending CRC or simple
    checksum, and so on. Replies from slaves must also be
    self-contained.
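
    A minimal sketch of such a telegram chain, in Python with a made-up framing
    (':' start character, two-hex-digit address, payload, hex checksum, CR/LF) and
    an arbitrary amount of padding - the real format would be whatever the slaves
    are coded to expect:

        # Sketch of a master-side telegram chain in the spirit of the scheme above.
        # The framing and the padding length are made up for illustration.
        PAD = b"\xff" * 15   # idle filler so each slave's reply finishes before the next subtelegram

        def subtelegram(addr, payload):
            body = (":%02X" % addr).encode() + payload
            csum = sum(body) & 0xFF
            return body + ("%02X" % csum).encode() + b"\r\n"

        def telegram_chain(commands):
            # commands: list of (address, payload-bytes); each subtelegram is self-contained
            frame = b""
            for addr, payload in commands:
                frame += subtelegram(addr, payload) + PAD
            return frame

        # One broadcast frame carrying the same command to 8 end points:
        frame = telegram_chain([(i, b"23 45 C") for i in range(8)])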

    It doesn't matter how the USB-to-serial or Ethernet-to-serial
    adaptors break up the messages - applications read the data as
    serial streams, not synchronous timed data. The only timing you
    have is a pause between master telegrams, which can be many
    milliseconds long, used to ensure that if something has gone wrong
    or lost synchronisation, their receiving state machine is reset and
    ready for the next round.

    It absolutely does matter how the messages get broken up. That's
    where the delays come in. If the slave replies are sent over the
    network/USB bus one at a time, it's not significantly better than the original approach.


    I mean it doesn't matter how the messages are broken up from the
    application code's viewpoint, as long as you handle it correctly as a
    stream and don't incorrectly assume you always read whole telegrams at a
    time.

    You can expect the converter to buffer up the incoming data and send it
    in large lumps up the USB or Ethernet bus. That's how it can work at
    high baud rates and throughputs. You lose the precise timing
    information, however, and have extra latency and jitter - so be sure
    to treat the incoming data as a stream and then that does not matter.
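
    A minimal sketch of that stream-style receive side, using pyserial and the same
    made-up framing as the earlier sketch; the point is only that complete replies
    are peeled out of whatever chunks the adapter happens to deliver:

        import serial

        def read_replies(port, expected):
            buf = b""
            replies = []
            while len(replies) < expected:
                chunk = port.read(64)      # whatever arrived within the timeout, 0..64 bytes
                if not chunk:
                    break                  # timed out - caller resynchronises on the next round
                buf += chunk
                while True:
                    start = buf.find(b":")
                    end = buf.find(b"\r\n", start)
                    if start < 0 or end < 0:
                        break              # no complete reply buffered yet
                    replies.append(buf[start:end])
                    buf = buf[end + 2:]
            return replies

        # Usage (port name and rate are assumptions):
        # port = serial.Serial("/dev/ttyUSB0", 3000000, timeout=0.01)
        # replies = read_replies(port, 8)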

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andrew Smallshaw@21:1/5 to Rick C on Mon Dec 5 09:58:32 2022
    On 2022-11-30, Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if
    there were no other delays.

    Use some multidrop standard at the physical layer, such as RS-485.
    At the DLL adopt a token-ring-style arbitration system. The first
    device interprets the request from the host as both receiving the
    token and a request for data - for consistency with the other units
    you'd probably want to format that initial request as a dummy "Device
    0" response. Device N interprets the reply from N-1 as sending it
    the token and its request to transmit. From the host perspective
    you send a single request and get back a byte stream with the
    results from all devices.
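
    A toy sketch of how the traffic on the shared bus would interleave under that
    scheme (pure simulation with made-up framing; the real devices would do this
    in firmware):

        # The host's request acts as the dummy "Device 0" reply, and device N
        # transmits as soon as it has seen the complete reply from device N-1.
        def bus_traffic(host_request, devices):
            # devices: list of callables; devices[i](request) -> that device's reply bytes
            stream = host_request                   # everything that appears on the bus, in order
            for handler in devices:
                reply = handler(host_request)       # triggered by the end of the previous reply
                stream += reply
            return stream

        # Example: 4 devices that each answer with a canned result
        devs = [lambda req, n=n: ("R%02X:OK\r\n" % n).encode() for n in range(1, 5)]
        print(bus_traffic(b"REQ:ALL\r\n", devs))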

    --
    Andrew Smallshaw
    andrews@sdf.org

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to Andrew Smallshaw on Mon Dec 5 14:03:11 2022
    On Monday, December 5, 2022 at 4:58:39 AM UTC-5, Andrew Smallshaw wrote:
    On 2022-11-30, Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if
    there were no other delays.
    Use some multidrop standard at the physical layer such as RS485.
    At the DLL adopt a token ring style arbitration system. The first
    device interprets the request from the host as both receiving the
    token and a request for data - for consistency with the other units
    you'd probably want to format that initial request as a dumy "Device
    0" response. Device N interprets the reply from N-1 as sending it
    the token and its request to transmit. From the host perspective
    you send a single request and get back a byte stream with the
    results from all devices.

    That scheme requires every end point to know where it is in the grand scheme, but more importantly, to know what other end points are in the system. It also requires the master to address every end point in sequence. How would you address one end point
    only, or handle some number of missing slots? This would require the end point to keep track of what commands have been sent, as well as who has replied.

    I've mulled this over for the last few days, including a priority scheme where handshake lines would be used to pass the priority more mechanically. This priority "token" could be passed through the entire chain of 16 boards and 8 end points on each
    board, but it can also be done by using a priority chain only within the 8 slaves on each test fixture board. This will provide a burst of serial port operation for about 500 us at a 3 Mbps rate. So if USB has a polling rate of 1 ms, we would get half
    the bandwidth, which would be pretty good. I feel better about blocking 8 commands for a given test fixture than blocking all 128 commands.

    Someone had suggested padding the transmitted data to set the timing of the replies. That would work as well, and order would no longer be significant at all. But I'm not comfortable with sending garbage data to control timing. It can make debugging
    more difficult. Too bad there's no way to send a data byte without a start bit! lol

    --

    Rick C.

    +-- Get 1,000 miles of free Supercharging
    +-- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Mon Dec 5 14:12:37 2022
    On Monday, December 5, 2022 at 2:57:46 AM UTC-5, David Brown wrote:
    On 04/12/2022 17:54, Rick C wrote:
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David
    Brown wrote:

    A communication hierarchy is likely the best way to handle
    this.

    Alternatively, at the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus
    you suggested earlier). The PC could send a single massive
    serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast
    enough to be complete in good time before the next slave
    starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one
    big telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's
    not guaranteed. This will prevent the slaves from talking over
    one another. But I don't know if the replies will be seen as a
    unit for shipping over Ethernet or USB by the adapter. I've been
    told that the messages will see delays in the adapters, but no
    one has indicated how they block the data. In the case of the
    FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of 01 23 45 C\r\n - 11
    chars 01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total
    of 26 chars per end point. At 3 Mbps a message takes 87 us to
    transmit on the serial bus for 11,500 messages a second, or 90
    messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut
    this rate in half, and it's still around 45 messages per end
    point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream,
    they turn on their driver, send their reply, then turn off the
    driver.

    The master side might need dummy characters for padding if the
    slave replies (including any handling delay - the slaves might be
    fast, but they still take some time) can be longer than the master
    side telegrams.

    Each subtelegram in the master's telegram chain must be
    self-contained - a start character, an ending CRC or simple
    checksum, and so on. Replies from slaves must also be
    self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial
    adaptors break up the messages - applications read the data as
    serial streams, not synchronous timed data. The only timing you
    have is a pause between master telegrams, which can be many
    milliseconds long, used to ensure that if something has gone wrong
    or lost synchronisation, their receiving state machine is reset and
    ready for the next round.

    It absolutely does matter how the messages get broken up. That's
    where the delays come in. If the slave replies are sent over the network/USB bus one at a time, it's not significantly better than the original approach.

    I mean it doesn't matter how the messages are broken up from the
    application code's viewpoint, as long as you handle it correctly as a
    stream and don't incorrectly assume you always read whole telegrams at a time.

    Of course the application doesn't care. No one is worried about the application. The concern is the timing of the messages on the various buses. A message broken up too much may be sent in multiple small pieces resulting in more delays.


    You can expect the converter to buffer up the incoming data and send it
    in large lumps up the USB or Ethernet bus. That's how it can work at
    high baud rates and throughputs. You lose the precise timing
    information, however, and have extra latency and jitter - so you be sure
    to treat the incoming data as a stream and then that does not matter.

    I don't "expect" anything of the adapter. They have delays that are largely unexplained, at least in any detail. That's why this is hard to deal with.

    Right now I'm looking at using a priority enable across the 8 end points within a test fixture board. That will allow a 400 us message block at 3 Mbps, with 350 us of overlap between the commands and the replies, so 450 us total. That would work well
    with either a 1 ms polling rate or a 0.5 ms polling rate, if available, and provide 50 us of breathing room for the adapter.

    This is a lot like making gears for a mechanical clock, with a calendar and an appointment reminder. LOL

    --

    Rick C.

    +-+ Get 1,000 miles of free Supercharging
    +-+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Tue Dec 6 02:30:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:

    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in convertor mostly sits idle. I previously wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times) higher than for other rates (I only tested 1 character messages at those
    rates, did not want to wait for full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1 character messages
    and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a
    name brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I say you what I observed. People say various thing on the
    net. I was interested if net know something about my trouble with
    CP2104 so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?

    Things like the computer locking up (IIUC fixed by a newer driver). Or "communication did not work" (no real info). ATM I have enough
    converters. If I need more/better I will look at FTDI products
    and possibly ask them questions.

    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.
    AFAIK Linux driver does not discriminate againt non-FTDI devices.
    So fact that convertors works with Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    I'm not using Linux. I don't have any FTDI fakes. I have some Prolific fakes somewhere, if I could find them. I never had one bricked, but I think it was Prolific that did that some years ago. Or, I may have them confused with FTDI. I remember the
    bricking driver was released with a Windows update and MS was pretty pissed off when the bricking hit the news.

    It was FTDI who bricked fakes, that was widely discussed. I did not
    hear about Prolific doing something like that.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should. So sent message was 16 bytes, read was 15. After reading 15 another batch of 16 was sent and so on. In total there were 10000 more characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages. Results are:

    CH340, 15 char message 20 char message
    k = 0 6.869s 7.163s
    k = 1 4.682s 1.320s
    k = 2 0.992s 1.320s
    k = 3 0.991s 1.319s
    k = 4 0.991s 1.320s
    k = 5 0.990s 1.319s
    k = 8 0.992s 1.320s
    k = 12 0.990s 1.320s
    k = 20 0.992s 1.319s
    k = 36 0.991s 1.321s
    k = 128 0.991s 1.319s

    CP2104, 15 char message 20 char message
    k = 0 2.508s 3.756s
    k = 1 1.897s 1.993s
    k = 2 1.668s 2.087s
    k = 3 1.486s 1.887s
    k = 4 1.457s 1.917s
    k = 5 1.559s 1.877s
    k = 8 1.455s 1.803s
    k = 12 1.337s 1.501s
    k = 20 1.123s 1.499s
    k = 36 1.125s 1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.
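
    A minimal sketch of that test loop, assuming a POSIX serial port already opened and configured in raw mode (VMIN = 1) at the desired rate; the device path, payload and constants below are only illustrative, not the actual test program:

        /* Pipelined round-trip test: prime the link with k extra messages,
         * then send one / read one in the main loop, and drain the last k
         * replies at the end.  Assumes the port is already configured in
         * raw mode (VMIN = 1) at the desired baud rate. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <fcntl.h>
        #include <unistd.h>

        #define MSG_LEN 15      /* illustrative message length */
        #define TOTAL   10000   /* total messages per run      */

        static void write_full(int fd, const char *p, size_t n)
        {
            while (n > 0) {
                ssize_t r = write(fd, p, n);
                if (r < 0) { perror("write"); exit(1); }
                p += r; n -= (size_t)r;
            }
        }

        static void read_full(int fd, char *p, size_t n)
        {
            while (n > 0) {
                ssize_t r = read(fd, p, n);
                if (r <= 0) { perror("read"); exit(1); }
                p += r; n -= (size_t)r;
            }
        }

        int main(int argc, char **argv)
        {
            const char *dev = argc > 1 ? argv[1] : "/dev/ttyUSB0"; /* illustrative */
            int k = argc > 2 ? atoi(argv[2]) : 0;  /* extra messages in transit */
            int fd = open(dev, O_RDWR | O_NOCTTY);
            if (fd < 0) { perror("open"); return 1; }
            /* termios setup (raw mode, baud rate) omitted in this sketch */

            char msg[MSG_LEN], reply[MSG_LEN];
            memset(msg, 'U', sizeof msg);

            for (int i = 0; i < k; i++)            /* prime: k messages ahead */
                write_full(fd, msg, sizeof msg);
            for (int i = 0; i < TOTAL - k; i++) {  /* main loop: send one, read one */
                write_full(fd, msg, sizeof msg);
                read_full(fd, reply, sizeof reply);
            }
            for (int i = 0; i < k; i++)            /* tail: drain the last k replies */
                read_full(fd, reply, sizeof reply);

            close(fd);
            return 0;
        }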

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, but so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k=0 they are around 1 ms per message, which is
    the polling rate... if you adjust it. I think the default for FTDI was 8 ms.
    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that the actual payload is 62 bytes and there are 2 bytes
    of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible.
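    As a rough check of that figure, assuming 8N1 framing (10 bits per character on the wire): 10000 messages x 15 characters x 10 bits = 1,500,000 bits per direction, which takes 0.75s at 2 Mbit/s, and 0.75s / 0.764s is about 98%.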

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???

    It seems that your news agent messed up the formatting of the tables. I gave
    results in two columns, one column for 15-character messages, the second
    for 20-character messages. 0.764s is for 15-character messages,
    1.021s is for 20-character messages.

    The corresponding k = 8 really means 9 messages in transit, so 135
    characters, which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20-character messages the main improvement is at k = 4, which
    means 100 characters, which is smaller than 2 buffers, with extra
    improvements for more data in transit. With the CH340 and 15-char
    messages we see the main improvement at k = 2, which corresponds
    to 45 characters in transit. With 20-char messages we get the
    improvement at k = 1, which is 40 characters in transit.
    The CH340 uses 32-character transfer buffers, so the improvement
    corresponds to somewhat more than 1 buffer in transit. Now, if
    transfers between the converter and the PC happened at optimal times,
    then one buffer + one character would be enough to get full serial
    speed. But USB transfers can not be started at arbitrary times; IIUC
    there are discrete time slots in which a transfer can occur. When a
    transfer can not be done in a given slot it must wait for the next slot.
    So, depending on the locations of the possible slots, more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow the PC to get full
    bandwidth, and this is in good agreement with the FTDI results.
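    To spell out the arithmetic: characters in transit = (k + 1) x message length, so the knees above fall at (8 + 1) x 15 = 135 and (4 + 1) x 20 = 100 characters for the FTDI (vs. 2 x 62 = 124 bytes of buffered payload), and (2 + 1) x 15 = 45 and (1 + 1) x 20 = 40 characters for the CH340 (vs. its 32-byte buffer).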
    In the case of the CH340 there is an extra factor: the CH340 also uses
    8-byte transfers. I do not know what function they have, but a
    reasonably likely guess is that those 8 bytes pack the transfer control
    info that FTDI bundles with the normal data. Anyway, those
    are "interrupt" transfers in the USB sense, so they have higher priority
    than the data transfers. A reasonable guess is that they steal some
    USB bandwidth from the data transfers. Also, a smaller-than-maximal
    data block size limits efficiency, so it is possible that the
    CH340 is limited by USB bandwidth (lack of enough slots).

    Now, concerning 3 Mbit/s: due to the different serial speed, the
    optimal times for transfers are different than in the 2 Mbit/s
    case. It is possible that there is a worse fit between the desired
    and the possible transfer times. Buffering at least partially
    cures this, hence the initial improvement. But clearly
    there is some extra bottleneck. Now some speculation:
    with the 1/8 ms USB 2.0 cycle, there are 1500 FS clocks per
    cycle. I would have to look at the spec to be sure, but this
    is close to a 150-byte worst-case FS transfer. Besides the data
    there is some USB protocol overhead, and (speculatively) it
    is possible that the low-level USB driver may refuse to schedule
    two 64-byte transfers in a single cycle. In such a case the effective
    bandwidth for serial data would be 4096000 bits/s, which
    corresponds to 5120000 serial bits/s (serial sends start and stop
    bits which are not needed for USB). This is less than
    full-duplex 3 Mbit/s (both directions add up to 6 Mbit/s and
    must go through the same USB). With a larger amount of data in
    transit this could give wild oscillations in the amount of
    buffered data, leading to slowdowns when buffers get empty
    and giving a stall when the receive buffer overflows.
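    Spelling out that estimate, under the assumption of a single 64-byte bulk transfer per 125 us microframe shared by both directions: 64 bytes x 8 bits x 8000 microframes/s = 4,096,000 bit/s of USB payload; since each payload byte carries a character that occupies 10 bits on the serial line, that is equivalent to 5,120,000 serial bit/s, which falls short of the 6,000,000 bit/s needed to keep 3 Mbit/s running in both directions at once.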

    Of course there is another speculation: the converter may be a fake.
    Supposedly fakes use MCUs with a special program. The software
    could create delays which limit the transfer rate at 3 Mbit/s
    and lead to data loss/stalls with more data in transit.

    In the last part I was partially wrong. The USB 2.0 spec says that transmission
    between the PC and a high-speed hub is always high speed. For full-speed
    devices the hub is supposed to buffer messages and transmit to the device
    at its speed. In effect the PC needs two high-speed messages per
    lower-speed message. My tests above were with the converter connected via a
    high-speed hub. There was also an Stlink dongle plugged into the
    same hub. To remove the effect of the hub I tried plugging the converter
    directly into a USB 1.1 port on a separate USB controller. That
    led to significantly longer times. I also tried connecting the Stlink
    to a separate port so that the converter was the only thing connected
    to the hub. I ran a few cases several times at 3 Mbit/s; for short
    messages and low k the results vary significantly, and
    for 120 characters and k = 0 I got times from 6.375s to 6.598s.
    At 2 Mbit/s, in 25 runs I got one outlier at 6.667s; the rest
    were between 6.029s and 6.049s.
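    As a rough check, assuming those runs were again 10000 round trips of 120 characters at 10 bits each: that is 12,000,000 bits per direction, so the wire-time floor (with the reply overlapping the ongoing send) is 6.0s at 2 Mbit/s and 4.0s at 3 Mbit/s. The 6.03-6.05s results are therefore within about 1% of the 2 Mbit/s limit, while 6.4-6.6s is only about 60% of what 3 Mbit/s could in principle deliver.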

    Anyway, USB seems to have a significant impact on the possible speed;
    with a full-speed converter and full-duplex transmission, 2 Mbit/s
    seems to give better speed than 3 Mbit/s. Maybe a better USB
    hub could help (I do not know how to find out the size of the buffers
    in my hub, but by the spec a hub may have buffers for just 2
    bulk transfers or much more). Given the above I would expect a
    converter connected via high-speed USB to perform better at 3 Mbit/s.

    Reading your tests has made me realize that while combining the messages for every target into one batch can be a bit unwieldy, I could limit the combinations to the end points on a single card. The responses have to be combined for the one driver
    anyway. Between the 8 end points on a single board I could easily combine those commands, and then stagger the replies without any extra signals between the boards and no special characters in the command stream.

    Again, thinking out loud, at 3 Mbps, 8 * 150 bits per command is 1,200 bits or 400 us. That would greatly reduce the wasted time, even with a 1 ms polling period. It would allow an exchange of 8 commands and 8 replies every 2 ms, or 4,000 per second.
    That would be almost 32 per end point, which would be great! Actually, it could be faster than this, since the staggering of the replies doesn't require the first reply to wait for the last command. So replies will start at the end of the first command.
    The beauty of full-duplex!
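
    Checking those numbers, assuming the 128 targets mentioned earlier: 8 commands x 150 bits = 1,200 bits, which is 400 us at 3 Mbit/s; with a 2 ms budget per batch (command burst, staggered replies and a ~1 ms poll) that is 500 batches/s x 8 = 4,000 command/reply pairs per second, and 4,000 / 128 end points is about 31 pairs per end point per second.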

    Any chance you could run your test on the FTDI cable at 3 Mbps with a 1,200 bit block of data (120 characters)? I imagine the RS-232 waveform is getting a bit triangular at that speed.

    See above. Note that my test slave started replying after receiving the first character. ATM it seems that with enough overlap at 2 Mbit/s I am getting repeatably almost optimal speed (even with the 1.1 port). But with less
    overlap there are random-looking variations, which probably means
    high sensitivity to the precise timing of messages. And at 3 Mbit/s
    the variations seem to be much worse.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Mon Dec 5 23:41:14 2022
    On Monday, December 5, 2022 at 9:30:24 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:

    With relatively cheap converters
    on Linux, to handle 10000 round trips for 15-byte messages I need the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in, the converter mostly sits idle. I previously wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different; I had to set my MCU to 847000 to communicate. This could be a bug in the Linux driver (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to 576800
    I had to set the MCU to 541300. Even after matching speeds, at nominal 576800, 921600 and 1152000 the test time was much (more than 10 times) higher than for other rates (I only tested 1-character messages at those
    rates, did not want to wait for the full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1-character messages
    and catching up with longer messages). Still, ATM the CH340 looks reasonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a
    name brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I am telling you what I observed. People say various things on the
    net. I was interested in whether the net knew something about my trouble with
    the CP2104, so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using a CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?
    Things like the computer locking up (IIUC fixed by a newer driver). Or "communication did not work" (no real info). ATM I have enough
    converters. If I need more/better I will look at FTDI products
    and possibly ask them questions.
    Remark: I bought all my converters from Chinese sellers. IIUC the
    FTDI chip is faked a lot, but others too. Still, I think they
    show what is possible and illustrate some difficulties.
    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte blocks (they say that the actual payload is 62 bytes and there are 2 bytes of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible.

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???
    It seems that your news agent messed up the formatting of the tables. I gave
    results in two columns, one column for 15-character messages, the second
    for 20-character messages. 0.764s is for 15-character messages,
    1.021s is for 20-character messages.

    Yes, I see that now. Google Groups removes excess spaces. Not a good idea and for no apparent reason. If they want to conserve bytes, maybe they should delete the message contents. That would greatly reduce the noise and only reduce the signal
    slightly in many cases.


    Anyway, USB seems to have a significant impact on the possible speed;
    with a full-speed converter and full-duplex transmission, 2 Mbit/s
    seems to give better speed than 3 Mbit/s. Maybe a better USB
    hub could help (I do not know how to find out the size of the buffers
    in my hub, but by the spec a hub may have buffers for just 2
    bulk transfers or much more).

    I would not be using a hub at all. This PC would be dedicated to testing and only a mouse would use a USB port in addition to the serial dongle. Oh, I think they use a bar code scanner too, so three USB ports.


    See above. Note that my test slave started replying after receiving the first character. ATM it seems that with enough overlap at 2 Mbit/s I am getting repeatably almost optimal speed (even with the 1.1 port). But with less
    overlap there are random-looking variations, which probably means
    high sensitivity to the precise timing of messages. And at 3 Mbit/s
    the variations seem to be much worse.

    The priority protocol I described above would overlap after one message. So not a lot of difference.

    Thanks for the info. It was very useful.

    --

    Rick C.

    +-- Get 1,000 miles of free Supercharging
    +-- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Tue Dec 6 13:32:21 2022
    On 06/12/2022 08:41, Rick C wrote:

    Yes, I see that now. Google Groups removes excess spaces. Not a
    good idea and for no apparent reason. If they want to conserve
    bytes, maybe they should delete the message contents. That would
    greatly reduce the noise and only reduce the signal slightly in many
    cases.


    You do realise it is up to /you/, the person making a post, to snip
    excess content? For some reason, google posters do this extremely badly
    - either they never snip, or they cut too much (including attributions).

    Google groups ruins the format of Usenet posts - including removing
    leading spaces and screwing up line endings. It's one of the reasons
    why so many Usenet users dislike it.

    (Yes, I know you have some particular personal reasons for using GG.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Tue Dec 6 18:03:15 2022
    On Tuesday, December 6, 2022 at 8:32:28 AM UTC-4, David Brown wrote:

    You do realise it is up to /you/, the person making a post, to snip
    excess content? For some reason, google posters do this extremely badly
    - either they never snip, or they cut too much (including attributions).

    Google groups ruins the format of Usenet posts - including removing
    leading spaces and screwing up line endings. It's one of the reasons
    why so many Usenet users dislike it.

    (Yes, I know you have some particular personal reasons for using GG.)

    I guess I should have used a smiley. I was trying to say that much of what is posted here would be better not posted at all... as a joke.

    --

    Rick C.

    +-+ Get 1,000 miles of free Supercharging
    +-+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Wed Dec 7 08:08:15 2022
    On 07/12/2022 03:03, Rick C wrote:

    I guess I should have used a smiley. I was trying to say that much of what is posted here would be better not posted at all... as a joke.


    I think there's been a lot of interesting stuff posted in this thread.
    Maybe not all of it has been useful to /you/, but you're not paying us
    for the job. So we chatter - sometimes people learn something new or
    get some new ideas.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Wed Dec 7 03:36:38 2022
    On Wednesday, December 7, 2022 at 3:08:22 AM UTC-4, David Brown wrote:

    I think there's been a lot of interesting stuff posted in this thread.
    Maybe not all of it has been useful to /you/, but you're not paying us
    for the job. So we chatter - sometimes people learn something new or
    get some new ideas.

    Again, I was not being clear enough. By "here", I did not mean this thread. I was referring to newsgroups as a whole.

    IT WAS JUST A JOKE!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)