• Serial Bus Speed on PCs

    From Rick C@21:1/5 to All on Tue Nov 29 23:33:55 2022
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message-level handshake creates a bottleneck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate, so the message-and-response pair rate would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if there were no other delays.
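
    Roughly, the arithmetic I'm doing looks like this (a quick back-of-envelope sketch in Python; the 10 bits per character on the wire and the 256 end points are my own working assumptions):

        usb_poll_hz = 8000                    # hi-speed USB microframe rate
        pairs_per_s = usb_poll_hz / 2         # one poll out, one poll back per command/response pair
        end_points = 256
        per_end_point = pairs_per_s / end_points                      # ~15.6 pairs/s to each target
        chars_per_msg = 15
        bits_per_char = 10                    # start + 8 data + stop
        bus_load = pairs_per_s * 2 * chars_per_msg * bits_per_char    # ~1.2 Mbps on the serial side
        print(per_end_point, bus_load)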

    While investigating other units, I found some Ethernet-to-serial devices, some of which claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive the 16 responses on separate lines and stream them out to the PC port as one continuous message. This is a bit messier since the 16 lines from this new module would need to be marked so they plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC; the application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single broadcast command, since the same tests are done on all targets in parallel. So using an RJ-45 connector, there would be two pairs for the serial port and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB-based FTDI RS-422 cable and a priority scheme so every target gets many more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.

    --

    Rick C.

    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209

  • From Richard Damon@21:1/5 to Rick C on Wed Nov 30 07:42:10 2022
    On 11/30/22 2:33 AM, Rick C wrote:
    <snip>


    You can get much more than 8,000 cps with an FTDI interface. This is
    because you can send/receive more than one character per "poll".

    My first thought is: why are you trying to combine everything into one
    USB serial port? Why not give each test fixture its own serial port (or
    lump just a few onto a given port) and let the USB bus do the bulk of
    the multi-drop?

    The Ethernet unit might be just a 10 Mbit device, or maybe a 100 Mbit one,
    and you need to send it a whole message block; it processes it, sends the
    data in it on, and then it can send back the answer when it figures the
    full answer has come back. It likely doesn't even TRY to transmit on a
    character basis; because of the much larger overhead of an Ethernet
    packet, it presumes network bandwidth is more important than delay.

    Also, they may be quoting figures with typical routing delays assuming a multi-hop route from computer to destination, which adds to the delay,
    since that is the sort of application you use those for. Ethernet is a
    "long haul" medium, not normally thought of as short haul, particularly
    when talking about lower bandwidth applications.

  • From Bernd Linsel@21:1/5 to Rick C on Wed Nov 30 16:11:19 2022
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not have
    message handling delays. Same concern with the 12 Mbps TTL level interface.


    So what speaks against using such a 12 Mbps USB/serial device and attaching an RS-422/485 transceiver (e.g. https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf)?

    That should meet all your requirements mentioned so far.

    Regards,
    Bernd

  • From Rick C@21:1/5 to Richard Damon on Wed Nov 30 06:21:18 2022
    On Wednesday, November 30, 2022 at 8:42:17 AM UTC-4, Richard Damon wrote:
    <snip>

    You can get much more than 8,000 cps with an FTDI interface. This is
    because you can send/receive more than one character per "poll".

    Yes, I'm aware of that. I suppose I didn't spell out everything in my post, but the 8,000 per second polling rate translates into 4,000 message pairs, one Tx, one Rx. With 256 end points to be controlled, this is just 16 message pairs per second per end point. The length of the messages is around 15 chars, so this gives a bit over 1 Mbps. The RS-422 FTDI adapter can manage 3 Mbps, or the TTL hi-speed adapter can be set for up to 12 Mbps, but I'm still waiting to hear from them about any internal or software overhead that would slow the message rate.


    My first thought is why are you trying to combine everything into one
    USB serial port. Why not give each test fixture its own serial port (or
    lump just a few onto a given port) and let the USB bus do the bulk of
    the multi-drop.

    I don't know if that will work any better. I have questions in to the various vendors.


    The ethernet unit might be just a 10 MBit device, or maybe a 100MBit

    10, 100 Mbps and 1 Gbps.


    and
    you need to send a whole message block, process it, then send the data
    in it, and then it can send back the answer when it figures the full
    answer has come back.

    "It"??? What is "it" exactly? The message blocks are 15 characters. The bus runs with a single command from the master resulting in a single response from the slave, lather, rinse, repeat. The short message size results in a low bit rate, or, really,
    the message rate is the choke point, not the bit rate.


    It likely doesn't even TRY to transmit on a
    character basis, but because of the much larger overhead of an Ethernet packet, presumes network bandwidth is more important than delay.

    I don't know where you got the "character" idea. I don't know what the adapter decides is a block to send, but I assume there is a maximum size and, short of that, a timeout.


    Also, they may be quoting figures with typical routing delays assuming a multi-hop route from computer to destination, which adds to the delay,
    since that is the sort of application you use those for. Ethernet is a
    "long haul" medium, not normally thought of as short haul, particularly
    when talking about lower bandwidth applications.

    No one said anything about Ethernet "routing" delays. I've explained to them what I'm doing and one vendor said there is a 1 ms delay in handling each "message" as I described it.

    I could go with something much fancier, where the same command is sent to all slaves and the slaves respond in turn, controlled by a separate priority signal that grants the right to write a reply onto the shared bus. The message from the master can be a single broadcast message, with 128 replies.

    So far, no one has indicated the specific baud rates they support; they only list the maximum rate. I have to design the slaves with a clock at some multiple of the baud rate. It would be nice to share that with the rest of the design, which needs a clock around 33 MHz for comms to the UUTs.

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL-level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not have message-handling delays. Same concern with the 12 Mbps TTL-level interface.

    --

    Rick C.

    + Get 1,000 miles of free Supercharging
    + Tesla referral code - https://ts.la/richard11209

  • From Rick C@21:1/5 to Bernd Linsel on Wed Nov 30 07:58:44 2022
    On Wednesday, November 30, 2022 at 11:11:25 AM UTC-4, Bernd Linsel wrote:
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a TTL level UART interface that runs up to 12 Mbps, while the RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps. Still, 3 Mbps will work a champ if the interface does not
    have message handling delays. Same concern with the 12 Mbps TTL level interface.

    So what is there against you using such a 12 Mbps USB/serial thing and attaching an RS-422/485 transceiver (e.g. https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf).

    That should meet all your requirements mentioned so far.

    I heard back from FTDI and they only support polling rates up to 1 kHz. So I guess I'm stuck with Ethernet. I might be stuck with changing the protocol. Someone suggested that the OS will introduce delays as well. So I might have to either install 16 serial ports directly in the PC, or change the protocol so the master talks to all the slaves in a burst or a single broadcast command, and the replies are controlled by a priority scheme so they are back to back.

    I didn't expect this to be the difficult part of the job.

    I could also automate the test steps into the FPGA on each test fixture board. But that makes the whole thing much less flexible while developing.

    --

    Rick C.

    -- Get 1,000 miles of free Supercharging
    -- Tesla referral code - https://ts.la/richard11209

  • From David Brown@21:1/5 to Rick C on Wed Nov 30 18:14:12 2022
    On 30/11/2022 16:58, Rick C wrote:
    <snip>


    The general issue is that PCs are great at throughput, but poor at
    latency. USB in particular has a scheduler and polls the devices on the
    bus at regular intervals. (This can't really be avoided in a
    half-duplex master-slave system.) For Ethernet, a gigabit switch will
    usually have a latency of 50 - 125 us. Even with a direct connection
    with no switch, you'll be hard pushed to get latencies lower than 50 us,
    and thus a query-reply peak rate of 10,000 telegram pairs a second.

    You can get higher throughput if you have multiple outstanding
    query-replies going to different USB devices or different IP
    connections. So while you are not going to get more than 4000
    send/receive transactions a second to one USB 2.0 high speed FTDI serial
    port device, you could probably do that simultaneously to several such
    devices on the same bus as long as you don't need to wait for the reply
    from one target before sending a message to a different target. (The
    same principle goes for Ethernet.)

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and broadcast,
    rather than divided up. You could even make an EtherCAT-style serial
    protocol (using the hybrid RS-422 bus you suggested earlier). The PC
    could send a single massive serial telegram consisting of multiple small
    ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough to be complete in good time before the next slave starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.
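
    A rough sketch of building such a telegram on the PC side (Python; the
    header byte, the per-slave commands and the padding length here are
    placeholders rather than a worked-out protocol):

        def build_telegram(commands, pad_chars=4):
            """Concatenate per-slave commands into one broadcast telegram,
            with padding after each so the addressed slave has time to reply."""
            pad = b"\x00" * pad_chars            # idle/dummy characters
            parts = [b"\x01"]                    # placeholder header byte
            for cmd in commands:
                parts.append(pad)
                parts.append(cmd)
            parts.append(pad)                    # final pause before the bus goes idle
            return b"".join(parts)

        telegram = build_telegram([b"%02d TEST\r\n" % n for n in range(16)])
        # ser.write(telegram), then read back one long burst of replies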

  • From Dimiter_Popoff@21:1/5 to David Brown on Wed Nov 30 20:52:40 2022
    On 11/30/2022 19:14, David Brown wrote:
    <snip>


    David, that kind of detailed problem solving should not go out free
    of charge you know :-).
    Of course this is the way to do it.

  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Thu Dec 1 01:08:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    <snip>

    I am not sure if you get that there are two issues: throughput and latency.
    If you wait for the answer before sending the next request, you will be
    bounded by latency. OTOH, if you fire several requests without waiting,
    then you will be limited by throughput. With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s
    CH340 2Mb/s, overlapped 1.058s
    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    The other end was an STM32F030, which was simply echoing back the
    received characters.

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH, the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course, a real application would need some way
    to ensure that receive buffers do not overflow.
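
    For what it is worth, the two cases can be timed with something like this
    pyserial sketch (the port name, baud rate and overlap depth are
    placeholders, and as noted above a real test has to keep the receive
    buffer from overflowing):

        import time
        import serial   # pyserial

        MSG = b"0123456789ABCD\n"       # 15-byte test message
        N = 10000

        def waiting(port):
            ser = serial.Serial(port, 2_000_000, timeout=1)
            t0 = time.perf_counter()
            for _ in range(N):
                ser.write(MSG)
                ser.read(len(MSG))      # wait for the echo before the next request
            return time.perf_counter() - t0

        def overlapped(port, depth=32):
            ser = serial.Serial(port, 2_000_000, timeout=1)
            t0 = time.perf_counter()
            for _ in range(depth):      # prime the pipeline with several requests
                ser.write(MSG)
            for _ in range(N - depth):
                ser.read(len(MSG))      # as each echo arrives...
                ser.write(MSG)          # ...fire the next request
            for _ in range(depth):
                ser.read(len(MSG))      # drain the remaining echoes
            return time.perf_counter() - t0

        print(waiting("/dev/ttyUSB0"), overlapped("/dev/ttyUSB0"))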

    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    the PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side, a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in hub(s)).

    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale, you could have a single such board
    per batch of devices and run a copy of your program there, with
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    --
    Waldek Hebisch

  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Thu Dec 1 02:48:15 2022
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    <snip>
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.

    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.


    OTOH, if you fire several requests without waiting, then
    you will be limited by throughput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine messages to multiple targets into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read the various registers that manipulate the details being tested.

    Concerns with this include the need to reload all the FPGAs any time they are updated with a new test feature or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.

    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?

    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?

    The other end was an STM32F030, which was simply echoing back the
    received characters.

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH, the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course, a real application would need some way
    to ensure that receive buffers do not overflow.

    Wait, what? How would overlapped operation work if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?

    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.


    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    the PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side, a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there's no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.
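
    In software terms, the reply side of that flag-passing is just this (a toy Python model with a made-up reply format; the real version would be FPGA logic watching the handshake pins):

        def collect_replies(boards, command):
            """Toy model: the flag starts at board 0 and is passed on
            after each board has put its replies onto the shared bus."""
            bus = []
            flag = 0
            while flag < len(boards):
                bus.append(boards[flag](command))   # only the flag holder transmits
                flag += 1                           # flag passed to the next board
            return b"".join(bus)                    # the master sees one continuous burst

        boards = [lambda cmd, n=n: b"%02d %s\r\n" % (n, cmd) for n in range(16)]
        print(collect_replies(boards, b"OK"))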


    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version, their devices use a polling interval of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422, full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.


    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale, you could have a single such board
    per batch of devices and run a copy of your program there, with
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance of running very fast is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low-end x86 CPUs. There's also the overhead of the PC OS, so maybe I need to do some testing before I worry about this further. I have one FTDI cable. I can use an embedded MCU board for the other end, I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast the MSP430 UART will run? I might have an ARM board that runs Mecrisp; I can't recall.

    --

    Rick C.

    -+ Get 1,000 miles of free Supercharging
    -+ Tesla referral code - https://ts.la/richard11209

  • From David Brown@21:1/5 to Rick C on Fri Dec 2 13:30:19 2022
    On 01/12/2022 11:48, Rick C wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4,
    anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    <snip>
    LOL, so now I'm leaning back toward the USB based FTDI RS-422
    cable and a priority scheme so every target gets many, more
    commands per second. I just ran the math, and this would be
    almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect
    with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and
    latency.

    Of course I'm aware of it. That's the entirety of the problem.


    I would be rather surprised if you were not aware of the difference -
    but your posts show you don't seem to be familiar with the level of the latencies inherent in USB and Ethernet. It seems you think it is just
    poor implementations of hardware or drivers. (Of course, limited implementations can make it worse.)


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect
    their hardware to have such excessive latencies. Especially in the
    Ethernet converter, I would have expected better hardware. Being an
    FPGA sort of guy, I didn't even realize they would not implement the
    data path in an FPGA.

    No one implements the data path of Ethernet in an FPGA. Sometimes a few
    bits (such as checksums) are accelerated in hardware, and there can even
    be filtering or re-direction done in hardware, but the data in Ethernet
    packets is always handled in software.

    Even if it was all handled instantly in perfect hardware, an Ethernet
    frame is 72 bytes plus 12 bytes gap. Then there is at least 20 bytes of
    IP header, then 20 bytes for the TCP header. That's 124 bytes before
    there is any content whatsoever, or 10 us for 100 Mbps Ethernet.
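
    Taking those byte counts at face value, the round number works out like this (just restating the figures above):

        overhead_bytes = 72 + 12 + 20 + 20               # frame + gap + IP header + TCP header
        wire_time_us = overhead_bytes * 8 / 100e6 * 1e6  # ~9.9 us per direction at 100 Mbps
        print(wire_time_us)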


    I found one company that does use an FPGA for a USB to serial
    adapter, but I expect the PC side USB software may be problematic as
    well. It makes you wonder how they ever get audio to work over USB.
    I guess lots of buffering.


    USB works by cyclic polling. There is inevitably a latency. USB 1 had
    1 kHz polling, while USB 2 has 8 kHz. (I don't know off-hand what USB 3
    has, but USB serial devices are invariably USB 1 or 2.)

    Most serial port drivers have lower polling rates than strictly
    necessary by USB cycle times, since polling very fast is difficult to do efficiently. I believe it is difficult on Windows to have periodic
    events at a resolution below 1 millisecond without busy-waiting, and
    drivers can't have busy-waiting - you can't have a driver that eats one
    of your cpu cores just because you've plugged in a USB to serial cable!

    If you write your own code that accesses the USB lower levels directly
    (such as using Linux libusb, or its Windows port) then you can, I
    believe, call USB transfer functions faster, up to the base USB cycle rate.


    None of this should make you wonder about audio. You just need enough buffering to cover USB cycles (125 us for USB 2). Any application delay
    is typically /far/ longer, such as when collecting streaming audio from
    a dodgy internet connection.


    I wonder if you are confusing the two related kinds of latency - one-way latency (time difference between when an application starts to send
    something at one end, and the application at the other end has got the
    data), and two-way latency for a query-reply two-way communication. You
    might also be mixing up jitter in this.

    I say this because there are such critical differences between the needs
    of audio and the needs of your communication. In particular, audio does
    not care about two-way latencies, and can cope with significant one-way
    latency (up to perhaps 20 ms) even when there is video. Without video,
    latency is irrelevant for audio as long as the jitter is low.

  • From Rick C@21:1/5 to David Brown on Fri Dec 2 09:01:54 2022
    On Friday, December 2, 2022 at 7:30:25 AM UTC-5, David Brown wrote:
    On 01/12/2022 11:48, Rick C wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    <snip>
    LOL, so now I'm leaning back toward the USB based FTDI RS-422
    cable and a priority scheme so every target gets many, more
    commands per second. I just ran the math, and this would be
    almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect
    with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and
    latency.

    Of course I'm aware of it. That's the entirety of the problem.

    I would be rather surprised if you were not aware of the difference -
    but your posts show you don't seem to be familiar with the level of the latencies inherent in USB and Ethernet. It seems you think it is just
    poor implementations of hardware or drivers. (Of course, limited implementations can make it worse.)

    I was warned that the polling rate in USB is at best 1 kHz for full-speed and 8 kHz for hi-speed, which definitely creates significant delays in this application. I've been told by FTDI (possibly in error) that even using the hi-speed interface, the best they can set their device for is 1 kHz polling. This does not result in a terrible data throughput, but it's not as fast as I'd like. If FTDI supported the hi-speed polling rate of 8 kHz, I would probably settle for that and quit looking.

    I'm pretty confident there is nothing inherent in 100 Mbps Ethernet that would create delays significant to this application. I've been told by one supplier that their device has a 1 ms built-in delay. I'm wondering if this is a timeout to indicate a packet should be sent even if no more data is being received, but so far, no one has said it is adjustable. I just spoke with Perle and I was told of a 5 ms delay on their Ethernet unit. Again, that's not inherent in the Ethernet protocol.


    If you wait for the answer before sending the next request, you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect
    their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an
    FPGA sort of guy, I didn't even realize they would not implement the
    data path in an FPGA.
    No one implements the data path of Ethernet in an FPGA. Sometimes a few
    bits (such as checksums) are accelerated in hardware, and there can even
    be filtering or re-direction done in hardware, but the data in Ethernet packets is always handled in software.

    You are the second person to tell me that I didn't design FPGAs for the TTC/Acterna/Viavi TBerd to process OC-12 data. I guess I just dreamed it.

    I'd like to know what you base your assertion on.


    Even if it was all handled instantly in perfect hardware, an Ethernet
    frame is 72 bytes plus 12 bytes gap. Then there is at least 20 bytes of
    IP header, then 20 bytes for the TCP header. That's 124 bytes before
    there is any content whatsoever, or 10 us for 100 Mbps Ethernet.

    10 us would be wonderful! 100 times faster than anyone else. Where do you sell your devices?


    I found one company that does use an FPGA for a USB to serial
    adapter, but I expect the PC side USB software may be problematic as
    well. It makes you wonder how they ever get audio to work over USB.
    I guess lots of buffering.

    USB works by cyclic polling. There is inevitably a latency. USB 1 had
    1 kHz polling, while USB 2 has 8 kHz. (I don't know off-hand what USB 3
    has, but USB serial devices are invariably USB 1 or 2.)

    Most serial port drivers have lower polling rates than strictly
    necessary by USB cycle times, since polling very fast is difficult to do efficiently. I believe it is difficult on Windows to have periodic
    events at a resolution below 1 millisecond without busy-waiting, and
    drivers can't have busy-waiting - you can't have a driver that eats one
    of your cpu cores just because you've plugged in a USB to serial cable!

    So far, no one has said it was the PC software. They have *all* said the delays are in their box.


    If you write your own code that accesses the USB lower levels directly
    (such as using Linux libusb, or its Windows port) then you can, I
    believe, call USB transfer functions faster, up to the base USB cycle rate.


    None of this should make you wonder about audio. You just need enough buffering to cover USB cycles (125 us for USB 2). Any application delay
    is typically /far/ longer, such as when collecting streaming audio from
    a dodgy internet connection.

    Please don't say USB 2. The number you cite is for hi-speed USB, regardless of the version of USB being used.


    I wonder if you are confusing the two related kinds of latency - one-way latency (time difference between when an application starts to send something at one end, and the application at the other end has got the data), and two-way latency for a query-reply two-way communication. You might also be mixing up jitter in this.

    Or not. The application sends messages both ways as a means of preventing collisions on the RS-485 bus. The delay at the slave is near zero, approximately 0.5 us. The two messages are each 150 bits long, and each takes 100 us to transmit on a 1.5 Mbps bus. Everything else is due to the equipment. With a 1 ms delay added, that's a 10x slowdown.


    I say this because there are such critical differences between the needs
    of audio and the needs of your communication. In particular, audio does
    not care about two-way latencies, and can cope with significant one-way latency (up to perhaps 20 ms) even when there is video. Without video, latency is irrelevant for audio as long as the jitter is low.

    Ok, then forget about audio. Far too much has been said about that already. Thank you.

    At this point I am looking at using an Ethernet-to-serial module on each test fixture card and an Ethernet switch to connect them all to the PC. I don't like this in terms of the connectivity and the reliance on not just one, but two different vendors to make it work. Also, most of the modules are either rather large or expensive, or from an Asian company with awkward documentation. They often design their modules without regard to height, which makes them skyscrapers compared to the rest of the board. I have a couple identified as potential candidates, but they will be much harder to test since they need to be attached to a board.

    --

    Rick C.

    +- Get 1,000 miles of free Supercharging
    +- Tesla referral code - https://ts.la/richard11209

  • From Rick C@21:1/5 to Rick C on Sat Dec 3 12:55:29 2022
    On Saturday, December 3, 2022 at 3:42:49 PM UTC-5, Rick C wrote:
    <snip>
    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've been told that the messages will see delays in the adapters, but no one has indicated how they group the data into blocks. In the case of the FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of 26 chars per end point. At 3 Mbps a message takes 87 us to transmit on the serial bus for 11,500 messages a second, or 90 messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut this rate in half, and it's still around 45 messages per end point each second.
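
    Checking that arithmetic (assuming 10 bits per character on the wire and
    128 end points, as elsewhere in the thread):

        chars_per_slot = 26                              # 11 command chars + 15 padding
        bits_per_char = 10                               # start + 8 data + stop
        baud = 3_000_000
        t_slot = chars_per_slot * bits_per_char / baud   # ~87 us per message slot
        rate = 1 / t_slot                                # ~11,500 slots per second
        print(round(rate), round(rate / 128))            # -> 11538 slots/s, ~90 per end point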

    I wish I had something I could run tests with. I suppose any old MCU board would do the job. All it needs to do is see the \n and return a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232
    and I don't have RS-422 on an MCU card, but I do have TTL! I should be able to make that work from an RS-422 signal. The RS-422 receiver will work too, if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to wait a bit. I could try it with an RS-232 dongle just to see how it will work at slower data rates. I think the fastest might be around 250 kbps.
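
    Something like this is what I have in mind for the PC side (a rough sketch
    in Python with pyserial; the port name, baud rate and fixed reply length
    are placeholders):

        import time
        import serial  # pyserial

        PORT = "COM5"            # placeholder, e.g. /dev/ttyUSB0 on Linux
        BAUD = 250_000           # start slow with the RS-232 dongle, raise later
        QUERY = b"01 23 45 C\r\n"
        REPLY_LEN = 15           # length of the fixed reply the MCU sends back

        with serial.Serial(PORT, BAUD, timeout=1) as ser:
            n = 1000
            t0 = time.perf_counter()
            for _ in range(n):
                ser.write(QUERY)
                if len(ser.read(REPLY_LEN)) != REPLY_LEN:
                    raise RuntimeError("short or missing reply")
            dt = time.perf_counter() - t0
            print(f"{n / dt:.0f} query/reply pairs per second")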

    Actually, I wasn't taking into account that the dummy characters only need to provide a small amount of delay to prevent slave collisions. The padding doesn't need to be as long as a slave message. So, with a 3 character difference in length, 4 char of
    padding should suffice, and make the replies look almost like a continuous stream of characters.
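
    Roughly, taking the slave turnaround as about one character time (a guess):

        cmd_len, reply_len = 11, 14                 # characters, per the formats above
        turnaround = 1                              # assumed slave turnaround, in character times
        print((reply_len - cmd_len) + turnaround)   # -> 4 characters of padding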

    I hate sending dummy characters though. They get in the way of debugging if you connect to the bus with an analyzer. But that shouldn't be needed, right? LOL In the first iteration of this test fixture, I had a bug in the FPGA code that showed up as
    random characters being dropped or changed. It was hard to find because that code had been used elsewhere. It was a failure in the documentation (not unlike the Ariane rocket failure) that resulted in my omission of a synchronizing FF that should have
    been at the input. The protocol that echoes the command helped a LOT.

    --

    Rick C.

    --- Get 1,000 miles of free Supercharging
    --- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Sat Dec 3 12:42:46 2022
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown wrote:
    On 30/11/2022 16:58, Rick C wrote:
    On Wednesday, November 30, 2022 at 11:11:25 AM UTC-4, Bernd Linsel
    wrote:
    On 30.11.2022 15:21, Rick C wrote:

    It's kind of odd that FTDI has a hi-speed serial adapter with a
    TTL level UART interface that runs up to 12 Mbps, while the
    RS-422/485 UART interfaces only run full-speed, at up to 3 Mbps.
    Still, 3 Mbps will work a champ if the interface does not have
    message handling delays. Same concern with the 12 Mbps TTL level
    interface.

    So what is there against you using such a 12 Mbps USB/serial thing
    and attaching an RS-422/485 transceiver (e.g.
    https://www2.mouser.com/datasheet/2/256/MAX22025_MAX22028-1701782.pdf).

    That should meet all your requirements mentioned so far.

    I heard back from FTDI and they only support polling rates up to 1
    kHz. So I guess I'm stuck with Ethernet. I might be stuck with
    changing the protocol. Someone suggested that the OS will inject
    delays as well. So I might have to either install 16 serial ports
    directly in the PC, or change the protocol so the master talks to
    all the slaves in a burst or a single broadcast command, and the
    replies are controlled by a priority scheme so they are back to
    back.

    I didn't expect this to be the difficult part of the job.

    I could also automate the test steps into the FPGA on each test
    fixture board. But that makes the whole thing much less flexible
    while developing.

    The general issue is that PCs are great at throughput, but poor at
    latency. USB in particular has a scheduler and polls the devices on the
    bus at regular intervals. (This can't really be avoided in a
    half-duplex master-slave system.) For Ethernet, a gigabit switch will usually have a latency of 50 - 125 us. Even with a direct connection
    with no switch, you'll be hard pushed to get latencies lower than 50 us,
    and thus a query-reply peak rate of 10,000 telegram pairs a second.

    You can get higher throughput if you have multiple outstanding
    query-replies going to different USB devices or different IP
    connections. So while you are not going to get more than 4000
    send/receive transactions a second to one USB 2.0 high speed FTDI serial port device, you could probably do that simultaneously to several such devices on the same bus as long as you don't need to wait for the reply
    from one target before sending a message to a different target. (The
    same principle goes for Ethernet.)

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and broadcast, rather than divided up. You could even make an EtherCAT-style serial protocol (using the hybrid RS-422 bus you suggested earlier). The PC
    could send a single massive serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough to be complete in good time before the next slave starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've
    been told that the messages will see delays in the adapters, but no one has indicated how they block the data. In the case of the FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of 26 chars per end point. At 3 Mbps a message takes 87 us to transmit on the serial bus for 11,500 messages a second, or 90 messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut this rate in half, and it's still around 45 messages per end point each second.

    I wish I had something I could run tests with. I suppose any old MCU board would do the job. All it needs to do is see the \n and return a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232
    and I don't have RS-422 on an MCU card, but I do have TTL! I should be able to make that work from an RS-422 signal. The RS-422 receiver will work too, if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to wait a bit. I could try it with an RS-232 dongle just to see how it will work at slower data rates. I think the fastest might be around 250 kbps.

    --

    Rick C.

    ++ Get 1,000 miles of free Supercharging
    ++ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Sun Dec 4 13:21:49 2022
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown
    wrote:

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus you
    suggested earlier). The PC could send a single massive serial
    telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough
    to be complete in good time before the next slave starts. (Adjust
    padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one
    another. But I don't know if the replies will be seen as a unit for
    shipping over Ethernet or USB by the adapter. I've been told that
    the messages will see delays in the adapters, but no one has
    indicated how they block the data. In the case of the FTDI adapter,
    the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of
    26 chars per end point. At 3 Mbps a message takes 87 us to transmit
    on the serial bus for 11,500 messages a second, or 90 messages per
    second per end point. That certainly would do the job, if I've done
    the math right. Even assuming other factors cut this rate in half,
    and it's still around 45 messages per end point each second.


    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream, they
    turn on their driver, send their reply, then turn off the driver.

    The master side might need dummy characters for padding if the slave
    replies (including any handling delay - the slaves might be fast, but
    they still take some time) can be longer than the master side telegrams.

    Each subtelegram in the master's telegram chain must be self-contained -
    a start character, an ending CRC or simple checksum, and so on. Replies
    from slaves must also be self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial adaptors
    break up the messages - applications read the data as serial streams,
    not synchronous timed data. The only timing you have is a pause between
    master telegrams, which can be many milliseconds long, used to ensure
    that if a slave has gone wrong or lost synchronisation, its
    receiving state machine is reset and ready for the next round.
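
    A minimal sketch of building such a master telegram chain (Python here;
    the ':' start character, two-hex-digit address, XOR checksum and '\n'
    terminator are placeholders only, since none of that has been settled
    in this thread):

        def checksum(body):
            # XOR of all bytes, rendered as two hex digits (placeholder scheme)
            x = 0
            for b in body:
                x ^= b
            return f"{x:02X}".encode()

        def build_chain(commands, pad=4):
            # One self-contained subtelegram per slave, each followed by pad
            # dummy characters ('.') so the addressed slave can finish its
            # reply in good time before the next slave hears its own telegram.
            out = bytearray()
            for addr, cmd in enumerate(commands):
                body = f"{addr:02X} ".encode() + cmd
                out += b":" + body + checksum(body) + b"\n"
                out += b"." * pad
            return bytes(out)

        # Example: the same command broadcast to 16 end points.
        chain = build_chain([b"23 45 C"] * 16)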


    I wish I had something I could run tests with. I suppose any old MCU
    board would do the job. All it needs to do is see the \n and return
    a fixed response. The PC can perform this repeatedly and time it. Unfortunately, 3 Mbps is a bit fast for RS-232 and I don't have
    RS-422 on an MCU card, but I do have TTL! I should be able to make
    that work from an RS-422 signal. The RS-422 receiver will work too,
    if I bias one input to ~1.5V.

    Unfortunately I don't have the dongle yet, so the test will need to
    wait a bit. I could try it with an RS-232 dongle just to see how it
    will work at slower data rates. I think the fastest might be around
    250 kbps.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Sun Dec 4 08:54:32 2022
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David Brown
    wrote:

    A communication hierarchy is likely the best way to handle this.

    Alternatively, the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus you
    suggested earlier). The PC could send a single massive serial
    telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast enough
    to be complete in good time before the next slave starts. (Adjust
    padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one big
    telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's not guaranteed. This will prevent the slaves from talking over one
    another. But I don't know if the replies will be seen as a unit for shipping over Ethernet or USB by the adapter. I've been told that
    the messages will see delays in the adapters, but no one has
    indicated how they block the data. In the case of the FTDI adapter,
    the issue is the polling rate.

    This is the format I'm currently thinking of:
    01 23 45 C\r\n - 11 chars
    01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total of
    26 chars per end point. At 3 Mbps a message takes 87 us to transmit
    on the serial bus for 11,500 messages a second, or 90 messages per
    second per end point. That certainly would do the job, if I've done
    the math right. Even assuming other factors cut this rate in half,
    and it's still around 45 messages per end point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream, they
    turn on their driver, send their reply, then turn off the driver.

    The master side might need dummy characters for padding if the slave
    replies (including any handling delay - the slaves might be fast, but
    they still take some time) can be longer than the master side telegrams.

    Each subtelegram in the master's telegram chain must be self-contained -
    a start character, an ending CRC or simple checksum, and so on. Replies
    from slaves must also be self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial adaptors
    break up the messages - applications read the data as serial streams,
    not synchronous timed data. The only timing you have is a pause between
    master telegrams, which can be many milliseconds long, used to ensure
    that if a slave has gone wrong or lost synchronisation, its
    receiving state machine is reset and ready for the next round.

    It absolutely does matter how the messages get broken up. That's where the delays come in. If the slave replies are sent over the network/USB bus one at a time, it's not significantly better than the original approach.

    --

    Rick C.

    --+ Get 1,000 miles of free Supercharging
    --+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Sun Dec 4 21:30:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10
    Mbps. It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable
    if there were no other delays.

    While investigating other units, I found some Ethernet to serial devices and found some claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or
    maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module, to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive on separate lines, the 16
    responses and stream them out to the port to the PC as one, continuous message. This is a bit messier since now, the 16 lines from this new module would need to be marked since they have to plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC. The
    application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single, broadcast command since the same tests are done on all targets in parallel. So using an RJ-45
    connector, there would be the two pairs for the serial port, and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB based FTDI RS-422 cable and a priority scheme so every target gets many, more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times per
    second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request you will be
    bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.

    How do you know that the data path is not in hardware? One question is
    whether the hardware is able to operate with low latency. Another is
    whether it should. And frequently the answer to the second question is
    no, it should not try to minimize latency. Namely, Ethernet has a
    minimum packet size of about 60 bytes. If you send each character in a
    separate packet, then there is very bad utilization of the medium.
    So, the converter is expected to wait until there are enough characters
    to transmit. Note that at 115200 bits/s a delay of 1 ms is roughly
    11 characters, so not so big. At lower rates the delay becomes less
    significant, and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is a
    shared medium; even if a converter could manage to transmit with
    lower latency within the available Ethernet bandwidth, it could
    do that only at the cost of other users (possibly a second converter).

    And from a slightly different point of view: normally there will be
    software in the path, giving you 0.1 ms of latency on good, modern,
    unloaded hardware and much more in worse conditions. Also,
    Ethernet likes packets of about 1400 bytes. On 10 Mbit/s
    Ethernet that is roughly 1.1 ms for transmission of a packet.
    If the network is not dedicated to the converter, such packets are
    likely to appear from time to time, and the converter has to wait
    until such a packet is fully transmitted before it gets a chance
    to transmit. So, you should regularly expect delays on the order of
    1 ms. Of course, with 100 Mbit/s or gigabit Ethernet the
    media delays are smaller, but serial converters are frequently
    deployed in legacy contexts where 10 Mbit/s still matters.

    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.

    Audio is quite different from serial. Audio can be pre-scheduled,
    but in general you do not know when there will be traffic on a
    serial port.

    OTOH if you fire several requests without waiting, then
    you will be limited by throughput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine
    messages to multiple targets, into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read various registers that
    manipulate the details being tested.

    Concerns with this include the need to reload all the FPGAs, any time they are updated with a new test feature, or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more
    problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.

    Well, my use is rather light. Most is for debugging at, say, 9600 or
    115200, and when plugged in the converter mostly sits idle. I previously
    wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different; I had to
    set my MCU to 847000 to communicate. This could be a bug in the Linux
    driver (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to
    576800 I had to set the MCU to 541300. Even after matching speeds, at
    nominal 576800, 921600 and 1152000 the test time was much higher (more
    than 10 times) than for other rates (I only tested 1-character messages
    at those rates, as I did not want to wait for the full test). Also,
    500000 was significantly slower than 460800 (but "merely" 2 times slower
    for 1-character messages, and catching up with longer messages). Still,
    ATM the CH340 looks reasonably good.

    Remark: I bought all my converters from Chinese sellers. IIUC the
    FTDI chip is faked a lot, but others are too. Still, I think they
    show what is possible and illustrate some difficulties.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.

    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.

    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?

    No. My slave simply echoes back each received character. There is
    some software delay but it should be less than 2 us. So even the waiting
    test has some overlap at the character level. To get more overlap above,
    I cheated: my test program was sending 1 more character than it should.
    So the sent message was 16 bytes and the read was 15. After reading 15,
    another batch of 16 was sent, and so on. In total there were 10000 more
    characters sent than received. My hope was that the OS would read
    and buffer the excess characters, but it seems that at least for the
    CP2104 they cause trouble. My current guess is that the OS is
    reading only when requested, but I did not investigate further...

    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?

    OK, I tried a modified version of my test program. It first sends
    k messages without reading anything, then goes into a main loop where
    after sending each message it reads one. At the end a tail loop
    reads the last k messages without sending anything. So, there
    are k + 1 messages in transit: after sending message k + i the program
    waits for the answer to message i. In total there are 10000 messages.
    Results are:

    CH340      15 char message   20 char message
    k = 0      6.869s            7.163s
    k = 1      4.682s            1.320s
    k = 2      0.992s            1.320s
    k = 3      0.991s            1.319s
    k = 4      0.991s            1.320s
    k = 5      0.990s            1.319s
    k = 8      0.992s            1.320s
    k = 12     0.990s            1.320s
    k = 20     0.992s            1.319s
    k = 36     0.991s            1.321s
    k = 128    0.991s            1.319s

    CP2104     15 char message   20 char message
    k = 0      2.508s            3.756s
    k = 1      1.897s            1.993s
    k = 2      1.668s            2.087s
    k = 3      1.486s            1.887s
    k = 4      1.457s            1.917s
    k = 5      1.559s            1.877s
    k = 8      1.455s            1.803s
    k = 12     1.337s            1.501s
    k = 20     1.123s            1.499s
    k = 36     1.125s            1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    With the PL2303HX at 2 Mbit/s I had a lot of transmission errors,
    so I did not test speed.

    The other end was an STM32F030, which was simply echoing back the
    received characters.
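
    In outline, the overlapped loop looks something like this on the PC side
    (sketched here in Python with pyserial rather than the actual test
    program; the port, baud rate and message contents are placeholders):

        import time
        import serial  # pyserial

        MSG = b"X" * 15                 # 15-character message, echoed back by the slave
        TOTAL = 10_000
        K = 8                           # extra messages kept in flight

        with serial.Serial("/dev/ttyUSB0", 2_000_000, timeout=2) as ser:
            t0 = time.perf_counter()
            for _ in range(K):                  # prime the pipeline
                ser.write(MSG)
            for _ in range(TOTAL - K):          # steady state: one out, one in
                ser.write(MSG)
                ser.read(len(MSG))
            for _ in range(K):                  # drain the tail
                ser.read(len(MSG))
            dt = time.perf_counter() - t0
            print(f"{dt:.3f}s for {TOTAL} round trips")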

    Note: these results are not fully comparable. Apparently the CH340
    will silently drop excess characters, so for overlapped operation
    I simply sent more characters than I read. OTOH the CP2104 seems to
    stall when its receive buffer overflows, so I limited the overlap to
    avoid stalls. Of course a real application would need some way
    to ensure that the receive buffers do not overflow.

    Wait, what? How would overlapped operation operate if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?

    My program uses blocking system calls; it did not finish in a reasonable
    time. I did not investigate further. ATM I assume that the OS/driver
    is correct, so that my program would get the characters if the converter
    delivered them. I also assume that the MCU is fast enough to avoid
    loss of any character (character processing should take less than
    2 us, and at 2 Mbit/s I have 5 us per character). In the initial test
    I sent more characters than I wanted to receive, so loss of
    some characters would not stop the program (OK, loss of more than
    10000 would be too much). In this batch of tests I sent exactly
    the number of characters that I wanted to receive, so loss of
    any would cause an infinite wait.

    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.

    Each converter has finite transmit and receive buffers.
    According to the datasheet the CP2104 has a 576-character receive buffer.
    For the others I do not have numbers handy, but I would expect something
    between 200 characters and a kilobyte. When characters arrive via the
    serial port they fill the receive buffer. The driver/OS/user program has
    to read them promptly. When doing the first test my hope was that the
    OS/driver would read characters from the converter and store them
    in a system buffer. But then I saw stalls with the CP2104. After I had
    seen this, my guess was that in my test I overflowed the CP2104 receive
    buffer (in my initial test I was sending 10000 more characters than
    I received, so much more than the receive buffer size). However, I have
    seen stalls with k = 18 and message size 15, and even with k = 0 and
    message size 20. In both cases the new test program guaranteed that the
    amount of data in transit was much smaller than the stated buffer size.
    So, at least for the CP2104, there must be some other reason.

    So, you should easily be able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to the
    PC. If you have several FPGAs on a single board, coordinating
    them should be easy. Of course, you need some free pins and
    extra tracks. I would use a single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls the transceiver at a given time. Anyway, this would
    allow overlapped transmission to all devices on a single
    board. With multiple boards you would need some hardware
    or software protocol to decide which board can transmit.
    On the hardware side a single pair of extra wires could
    carry the needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there is no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.


    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in the hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.

    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    An extra thing: there are reasonably cheap PC-compatible
    boards; supposedly they are cheaper and easier to buy
    than a Raspberry Pi (but I did not try to buy them). If you
    need really large scale you could have a single such board
    per batch of devices and run a copy of your program there, and
    a single laptop connecting to the satellite boards via Ethernet
    and collecting the results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance at running very fast, is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low end x86 CPUs. There's also the
    overhead of the PC OS, so maybe I need to do some testing before I worry with this further. I have one FTDI cable. I can use an embedded MCU board for the other end I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast
    the MSP430 UART will run?

    The MSP430G2553 theoretically allows setting quite high rates like 4 Mbit/s,
    but it is not clear if it will run (whether the noise immunity is good enough).
    AFAICS 1 Mbit/s is supposed to work. The other thing is software speed;
    I think that software can handle 1 Mbit/s, but probably not more.

    I might have an ARM board that runs Mecrisp, I can't recall.


    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Sun Dec 4 14:57:18 2022
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10
    Mbps. It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable
    if there were no other delays.

    While investigating other units, I found some Ethernet to serial devices and found some claim the serial port can run at up to 3.7 Mbps. But when I contacted them, they said each message has a 1 ms delay, so that's only 500 pairs per second, or
    maybe 2 pairs per second per channel. That's slow!

    They have multi-port boxes, up to 16, so I've asked them if they will run with a larger aggregate rate, or if the delay on one port impacts all of them.

    I've also found another vendor with a similar product, and I've asked about that too.

    I'm surprised and disappointed the Ethernet devices have such delays. I would have expected them to work better given their rather high prices.

    I could add a module, to interface between the PC serial port and the 16 test fixtures. It would allow the test application on the PC to send messages to all 16 test fixtures in a row. The added module would receive on separate lines, the 16
    responses and stream them out to the port to the PC as one, continuous message. This is a bit messier since now, the 16 lines from this new module would need to be marked since they have to plug into the right test fixture each day.

    Or, if I could devise a manner of assigning priority, the slaves could all manage the priority themselves and still share the receive bus to the serial port on the PC. Again, this would look like one long message to the port and the PC. The
    application program would see the individual messages and parse them separately. Many of the commands from the PC could actually be shortened to a single, broadcast command since the same tests are done on all targets in parallel. So using an RJ-45
    connector, there would be the two pairs for the serial port, and two pairs for the priority daisy-chain.

    I guess I'm thinking out loud here.

    LOL, so now I'm leaning back toward the USB based FTDI RS-422 cable and a priority scheme so every target gets many, more commands per second. I just ran the math, and this would be almost 20,000 bits per command. Try to run that at 8,000 times
    per second and a 100 Mbps Ethernet port won't keep up.

    I've written to FTDI about the actual throughput I can expect with their cables. We'll see what they come back with.
    I am not sure if you get that there are two issues: throughput and latency.

    Of course I'm aware of it. That's the entirety of the problem.


    If you wait for the answer before sending the next request you will be bounded by latency.

    Until I contacted the various vendors, I had no reason to expect their hardware to have such excessive latencies. Especially in the Ethernet converter, I would have expected better hardware. Being an FPGA sort of guy, I didn't even realize they would
    not implement the data path in an FPGA.
    How do you know that data path is not in hardware?

    Not only did the vendor tell me it's through a CPU, he laughed at the idea of implementing Ethernet in an FPGA. That's when I sent him a link to the TBERD product line I had worked on around 2000.


    One question is
    if hardware is able to opperate with low latency. Another is if it
    should. And frequently answer to secend question is no, it should
    not try to minimize latency. Namely, Ethernet has minimal packet
    size which is about 60 characters. If you send each character in
    separate packet, then there would be very bad utilization of media.
    So, converter is expected to wait till there is enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What you
    are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.


    Note that at 115200 bits/s delay of 1ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates delay becomes less
    signifincant and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is
    shared medium, even if convertor could manage to transmit with
    lower latency withing available Ethernet bandwidth, it could
    do that only at cost of other users (possibly second convertor).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.


    And from a bit different point of view: normally there will be
    software in the path, giving you 0.1ms of latency on good modern
    unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes size. On 10 Mbit/s
    Ethernet this is about 1.4 ms for transmitssion of packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.


    If network in not dedicated to convertor such packets are likely
    to appear from time to time and convertor has to wait till
    such packet is fully transmitted and only then gets chance
    to transmit. So, you should regularly expect delays of order
    1ms. Of course, with 100 Mbit/s Ethernet or gigabit one
    media delays are smaller, but serial convertors are frequently
    deployed in legacy contexts where 10 Mbit/s matter.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?


    I found one company that does use an FPGA for a USB to serial adapter, but I expect the PC side USB software may be problematic as well. It makes you wonder how they ever get audio to work over USB. I guess lots of buffering.
    Audio is quite different than serial. Audio can be pre-scheduled
    but in general you do not know when there will be traffic on
    serial port.
    OTOH if you fire several request without waiting, then
    you will be limited by througput.

    Yes, but the current protocol using a single target works with one command at a time. In ignorance of the many problems with serial port converters, I was planning to use the same protocol. I have several new ideas, including various ways to combine
    messages to multiple targets, into one message. Or... I could move the details of the various tests into the target FPGAs, so they receive a command to test function X, rather than the multiple commands to write and read various registers that manipulate
    the details being tested.

    Concerns with this include the need to reload all the FPGAs, any time they are updated with a new test feature, or bug fix. That's probably 64 FPGAs. I could use one FPGA per test fixture, for a total of 16, but that makes the routing a bit more
    problematic. Even 16 is a PITA.

    Also, I've relied on monitoring the command stream to spot bugs. That would require attaching a serial debugger of some sort to the interface to the UUT, and the internal test controller would be much harder to observe. Currently, that is controlled
    by commands as well.


    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or
    115200. And when plugged in convertor mostly sits idle. I previously
    wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver
    (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times)
    higher than for other rates (I only tested 1 character messages at those rates, did not want to wait for full test). Also, 500000 was significantly slower than 460800 (but "merely" 2 times slower for 1 character messages
    and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.


    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting
    test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should.
    So sent message was 16 bytes, read was 15. After reading 15 another
    batch of 16 was sent and so on. In total there were 10000 more
    characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages.
    Results are:

    CH340      15 char message   20 char message
    k = 0      6.869s            7.163s
    k = 1      4.682s            1.320s
    k = 2      0.992s            1.320s
    k = 3      0.991s            1.319s
    k = 4      0.991s            1.320s
    k = 5      0.990s            1.319s
    k = 8      0.992s            1.320s
    k = 12     0.990s            1.320s
    k = 20     0.992s            1.319s
    k = 36     0.991s            1.321s
    k = 128    0.991s            1.319s

    CP2104     15 char message   20 char message
    k = 0      2.508s            3.756s
    k = 1      1.897s            1.993s
    k = 2      1.668s            2.087s
    k = 3      1.486s            1.887s
    k = 4      1.457s            1.917s
    k = 5      1.559s            1.877s
    k = 8      1.455s            1.803s
    k = 12     1.337s            1.501s
    k = 20     1.123s            1.499s
    k = 36     1.125s            1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    Some of the results seem odd and hard to understand, like why the message rate improves so much as k is increased, and why so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k = 0 they are around 1 ms per message, which is
    the polling rate... if you adjust it. I think the default for FTDI was 8 ms.

    Thanks for doing this.


    With PL2303HX at 2 Mbit/s I had a lot of transmission errors,
    so did not test speed.
    The other end was STM32F030, which was simply replaying back
    received characters.

    Note: there results are not fully comparable. Apparently CH340
    will silently drop excess characters, so for overalapped operation
    I simply sent more charactes than I read. OTOH CP2104 seem to
    stall when its receive buffer overflows, so I limited overlap to
    avoid stalls. Of course real application would need some way
    to ensure that receive buffers do not overflow.

    Wait, what? How would overlapped operation operate if you have to worry about lost characters???

    I'm not sure what "stall" means. Did it send XOFF or something?
    My program uses blocking system calls, it did not finish in resonable
    time. I did not investigate deeper. ATM I assume that OS/driver
    is correct os that my program would get characters if convertor
    delivered them. I also assume that MCU is fast enough to avoid
    loss of any character (character processing should be less than
    2us, at 2 Mbit/s I have 5us per character). In inital test
    I have sent more characters then I wanted receive, so loss of
    some characters would not stop the program (OK, loss of more than
    10000 would be too much). I this batch of tests I sent exactly
    the number of characters that I wanted to receive, so loss of
    any would cause infinite wait.
    Any idea on what size of aggregated messages would prevent character loss? That's kind of important.
    Each convertor has finite transmission and receive buffers.
    Accordinng to datasheet CP2104 have 576 character receive buffer.
    For other I do now have numbers handy, but I would expect something
    between 200 characters and kilobyte. When characters arrive via
    serial port they fill receive buffers. Driver/OS/user program have
    to promptly read them. When doing first test my hope was that
    OS/driver will read characters from convertor and store them
    is system buffer. But then I saw stalls with CP2104. After I have
    seen this my guess was that in my test I overflowed CP2104 receive
    buffer (in my initial test I was sending 10000 characters more than
    I received, so much more than receive buffer size). However I have
    seen stalls with k = 18 and message size 15. And even with k = 0 and
    message size 20. In both cases new test program guaranteed that amount
    of data in transit was much smaller than stated buffer size.
    So, at least for CP2104 there must be some other reason.
    So, you should be easily able to handle 10000 round trips
    per second provided there is enough overlap. For this
    you need to ensure that only one device is transmitting to
    PC. If you have several FPGA-s on a single board, coordinating
    them should be easy. Of couse, you need some free pins and
    extra tracks. I would use single transceiver per board,
    depending on coordination to ensure that only one FPGA
    controls transceiver at given time. Anyway, this would
    allow overlapped transmisson to all devices on single
    board. With multiple boards you would need some hardware
    or software protocol decide which board can transmit.
    On hardware side a single pair of extra wires could
    carry needed signals (that is your "priority daisy chain").

    Yes, the test fixture boards have to be set up each day, and to make it easy to connect (there is no backplane), I was planning to have two RJ-45 connectors on the front panel. A short jumper would string the RS-422 ports together.

    My thinking, if the aggregated commands were needed, was to use the other pins for "handshake" lines to implement a priority chain for the replies. The master sets the flag when starting to transmit. The first board gives all the needed replies, then
    passes the flag on to the next board. When the last reply is received by the master, the flag is removed and the process is restarted.


    As other suggested you could use multiple convertors for
    better overlap. My convertors are "full speed" USB, that
    is they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    convertes would saturate single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on specific motherboard.
    Theoreticaly, when using "high speed" USB converters,
    several could easily work from single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.


    An extra thing: there are reasonably cheap PC compatible
    boards, supposedly they are cheaper and more easy to buy
    than Raspberry Pi (but I did not try buy them). If you
    need really large scale you could have a single such board
    per batch of devices and run copy of your program there. And
    a single laptop connecting to satelite board via ethernet
    and collecting results.

    Yeah, but more complexity. Maybe it doesn't need to run so fast. I've been working with the idea that it is not a hard thing to do, but I just keep finding more and more problems.

    The one approach that seems to have the best chance at running very fast, is a PCIe board with 4 or 8 ports. I'd have to use an embedded PC, or at least a mini-tower or something. Many of these seem to have rather low end x86 CPUs. There's also the
    overhead of the PC OS, so maybe I need to do some testing before I worry with this further. I have one FTDI cable. I can use an embedded MCU board for the other end I suppose. It will give me a chance to get back into Mecrisp Forth. I wonder how fast the
    MSP430 UART will run?
    MSP430G2553 theoretically allows setting quite high rates like 4 Mbit/s,
    but it is not clear it it will run (if noise immunity is good enough). AFAICS 1 Mbit/s is supposed to work. Other thing is software speed,
    I think that software can handle 1 Mbit/s, but probably not more.

    I have an FTDI adapter which I will try running my own tests with. To be realistic, they should be with a target, but that might be a problem just now. We'll see what I can cobble up. Most of my stuff is not convenient at the moment.

    I was playing with it using PuTTY, but that's not the best terminal emulator in the world. I can't get it to show control characters or use different colors for transmit and receive. Heck, maybe I'm just being stupid, but I can't find how to send a
    file through the port. I'm pretty sure I've done that using PuTTY before, because that's how you compile programs on an embedded Forth. You simply send the file through the serial port like you were typing it.

    --

    Rick C.

    -+- Get 1,000 miles of free Supercharging
    -+- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Mon Dec 5 03:33:22 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    One question is whether the hardware is able to operate with low latency.
    Another is whether it should. And frequently the answer to the second question
    is no, it should not try to minimize latency. Namely, Ethernet has a minimum
    packet size, which is about 60 characters. If you send each character in a
    separate packet, there would be very bad utilization of the medium.
    So, a converter is expected to wait until there are enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What
    you are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.

    Well, a delay from Ethernet to the serial port clearly means that the implementer
    did not spend enough effort to make it fast.

    Note that at 115200 bits/s a delay of 1 ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates the delay becomes less
    significant, and at higher rates people usually care more about
    throughput than latency. And do not forget that Ethernet is a
    shared medium: even if the converter could manage to transmit with
    lower latency within the available Ethernet bandwidth, it could
    do that only at the cost of other users (possibly a second converter).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.

    You were talking about connecting more converters. Normally laptops
    have only a single Ethernet port, so all the converters that you connect
    will share a single Ethernet link. If you use 100 converters at 3 Mbit/s each
    plus switches, it should be possible to get 30 Mbytes/s of aggregate bandwidth (assuming a gigabit port in the laptop and a gigabit switch at the top of the tree).
    But if each converter wasted a lot of bandwidth due to a small
    payload per packet, then such a rate would be impossible.

    And from a slightly different point of view: normally there will be
    software in the path, giving you 0.1 ms of latency on good, modern,
    unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes. On 10 Mbit/s
    Ethernet that is about 1.4 ms for transmission of a packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.

    It is your planned use that would kill throughput. I would expect
    that when the product is used as intended you would get a reasonable fraction
    (say 70%) of the nominal throughput (that is, 2*16*3 Mbit/s). If not,
    then I will join you in calling it a bad product.

    If the network is not dedicated to the converter, such packets are likely
    to appear from time to time, and the converter has to wait until
    such a packet is fully transmitted and only then gets a chance
    to transmit. So, you should regularly expect delays on the order of
    1 ms. Of course, with 100 Mbit/s or gigabit Ethernet the
    media delays are smaller, but serial converters are frequently
    deployed in legacy contexts where 10 Mbit/s matters.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?

    Sometimes you get a product designed for 10 Mbit/s which just got a faster
    Ethernet part in order to be a good citizen on a fast network. Above you wrote
    about a 16-port unit. That should be designed for a faster network, but
    on common 100 Mbit/s Ethernet, running the ports in parallel, it would be
    limited by Ethernet throughput. And even on 1 Gbit/s Ethernet it
    needs enough bandwidth that you cannot waste it, even if it is
    the only thing on the network. And just a little thing: you
    wrote Ethernet, but raw Ethernet is problematic on PC OSes,
    so I would guess that you really mean TCP/IP over Ethernet.
    TCP requires every packet to be acknowledged, which may add more
    small-packet traffic.

    With relatively cheap converters
    on Linux, to handle 10000 round trips of 15-byte messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at, say, 9600 or
    115200, and when plugged in the converter mostly sits idle. I previously
    wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different: I had to
    set my MCU to 847000 to communicate. This could be a bug in the Linux driver
    (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to 576800 I had to set the MCU to 541300. Even after matching speeds, at nominal
    576800, 921600 and 1152000 the test time was much higher (more than 10 times)
    than for other rates (I only tested 1-character messages at those rates; I did not want to wait for the full test). Also, 500000 was significantly slower than 460800 (but "merely" 2 times slower for 1-character messages and catching up with longer messages). Still, ATM the CH340 looks
    reasonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.

    Well, I am telling you what I observed. People say various things on the
    net. I was interested whether the net knew something about my trouble with
    the CP2104, so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using a CP2104. So, there
    is a lot of noise, and ATM I prefer to stay with what I see.

    Remark: I bought all my converters from Chinese sellers. IIUC
    the FTDI chip is faked a lot, but others are too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.

    AFAIK the Linux driver does not discriminate against non-FTDI devices,
    so the fact that a converter works with the Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. The datasheet gives a contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns each received character. There is
    some software delay, but it should be less than 2 us. So even the waiting
    test has some overlap at the character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should.
    So the sent message was 16 bytes, the read was 15. After reading 15, another
    batch of 16 was sent, and so on. In total there were 10000 more
    characters sent than received. My hope was that the OS would read
    and buffer the excess characters, but it seems that at least for
    the CP2104 they cause trouble. My current guess is that the OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried a modified version of my test program. It first sends
    k messages without reading anything, then goes to the main loop where,
    after sending each message, it reads one. At the end there is a tail loop
    which reads the last k messages without sending anything. So there
    are k + 1 messages in transit: after sending message k + i the program
    waits for the answer to message i. In total there are 10000 messages.
    Results are:

    CH340                15 char message    20 char message
    k = 0                6.869s             7.163s
    k = 1                4.682s             1.320s
    k = 2                0.992s             1.320s
    k = 3                0.991s             1.319s
    k = 4                0.991s             1.320s
    k = 5                0.990s             1.319s
    k = 8                0.992s             1.320s
    k = 12               0.990s             1.320s
    k = 20               0.992s             1.319s
    k = 36               0.991s             1.321s
    k = 128              0.991s             1.319s

    CP2104               15 char message    20 char message
    k = 0                2.508s             3.756s
    k = 1                1.897s             1.993s
    k = 2                1.668s             2.087s
    k = 3                1.486s             1.887s
    k = 4                1.457s             1.917s
    k = 5                1.559s             1.877s
    k = 8                1.455s             1.803s
    k = 12               1.337s             1.501s
    k = 20               1.123s             1.499s
    k = 36               1.125s             1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message    20 char message
    k = 0                5.478s             3.755s
    k = 1                4.929s             3.030s
    k = 2                2.506s             3.339s
    k = 3                2.459s             2.020s
    k = 4                1.708s             1.061s
    k = 5                1.671s             1.032s
    k = 8                0.764s             1.021s
    k = 12               0.772s             1.014s
    k = 20               0.763s             1.009s
    k = 36               0.758s             1.007s
    k = 128              0.757s             1.008s

    FTDI232R, 3 Mbit/s   15 char message    20 char message
    k = 0                8.216s             10.007s
    k = 1                5.006s             4.344s
    k = 2                3.338s             1.602s
    k = 3                2.406s             1.444s
    k = 4                1.766s             1.316s
    k = 5                1.599s             1.673s
    k = 8                1.040s             1.327s
    k = 12               1.071s             1.312s

    With k = 20, k = 36 and k = 128 communication stalled.
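
    For concreteness, the shape of that k-in-flight round-trip loop is roughly the
    following (a minimal sketch, assuming pyserial, a made-up port name, and a slave
    that echoes every byte it receives):

        import serial, time

        PORT, BAUD = "/dev/ttyUSB0", 2000000   # assumptions
        MSG_LEN, N_MSGS, K = 15, 10000, 8      # message size, total messages, extra messages in flight
        MSG = b"A" * MSG_LEN                   # dummy payload

        def read_exact(port, n):
            buf = b""
            while len(buf) < n:
                chunk = port.read(n - len(buf))
                if not chunk:
                    raise RuntimeError("stalled waiting for echo")
                buf += chunk
            return buf

        with serial.Serial(PORT, BAUD, timeout=2) as port:
            t0 = time.time()
            for _ in range(K):                 # prime the pipeline: k messages out, none read
                port.write(MSG)
            for _ in range(N_MSGS - K):        # main loop: k + 1 messages in transit
                port.write(MSG)
                read_exact(port, MSG_LEN)
            for _ in range(K):                 # tail loop: drain the last k echoes
                read_exact(port, MSG_LEN)
            print("%.3fs for %d round trips" % (time.time() - t0, N_MSGS))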

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, and why so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k = 0 they are around 1 ms per message, which is the
    polling rate... if you adjust it. I think the default for FTDI was 8 ms.
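
    On a Linux host the ftdi_sio driver exposes that latency timer through sysfs, so
    it can be dropped to 1 ms before a test run; a minimal sketch (the ttyUSB0 path is
    an assumption, and on Windows the equivalent setting is in the FTDI driver's
    advanced port properties):

        # Sketch: read and then lower the FTDI latency timer to 1 ms on Linux
        # (ftdi_sio driver). The ttyUSB0 path is an assumption; needs root or a udev rule.
        LAT = "/sys/bus/usb-serial/devices/ttyUSB0/latency_timer"

        with open(LAT) as f:
            print("old latency_timer:", f.read().strip(), "ms")
        with open(LAT, "w") as f:
            f.write("1")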

    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that the actual payload is 62 bytes and there are 2 bytes
    of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible. The corresponding k = 8 really means 9 messages in transit, so 135
    characters, which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20-character messages the main improvement is at k = 4, which
    means 100 characters, which is smaller than 2 buffers, with extra
    improvement for more data in transit. With the CH340 and 15-character
    messages we see the main improvement at k = 2, which corresponds
    to 45 characters in transit. With 20-character messages we get the
    improvement at k = 1, which is 40 characters in transit.
    The CH340 uses 32-character transfer buffers, so the improvement corresponds
    to somewhat more than 1 buffer in transit. Now, if transfers
    between the converter and the PC happened at optimal times, then one buffer
    + one character would be enough to get full serial speed. But
    USB transfers cannot be started at arbitrary times; IIUC there
    are discrete time slots when a transfer can occur. When a transfer
    cannot be done in a given slot it must wait for the next slot.
    So, depending on the locations of the possible slots, more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow the PC to get full
    bandwidth, and this is in good agreement with the FTDI results.

    In the case of the CH340 there is an extra factor: the CH340 also uses 8-byte
    transfers. I do not know what function they have, but a
    reasonably likely guess is that those 8 bytes pack the transfer-control
    info that FTDI bundles with the normal data. Anyway, those
    are "interrupt" transfers in the USB sense, so they have higher priority
    than the data transfers. A reasonable guess is that they steal some
    USB bandwidth from the data transfers. Also, a smaller-than-maximal
    data block size limits efficiency, so it is possible that the
    CH340 is limited by USB bandwidth (lack of enough slots).
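
    As a sanity check on that 98% figure (assuming 10 wire bits per character,
    i.e. 8N1):

        # 10000 round trips of 15 characters, 10 wire bits per character (8N1)
        bits = 10000 * 15 * 10          # 1,500,000 bits in each direction
        ideal = bits / 2_000_000        # 0.75 s at 2 Mbit/s, full duplex
        print(ideal, ideal / 0.764)     # 0.75 s, and 0.75/0.764 = ~98% utilization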

    Now, concerning 3 Mbit/s: due to the different serial speed,
    the optimal times for transfers are different than in the 2 Mbit/s
    case. It is possible that there is a worse fit between desired
    and possible transfer times. Buffering allows this to be at least
    partially cured, hence the initial improvement. But clearly,
    there is some extra bottleneck. Now some speculation:
    with the 1/8 ms USB 2.0 cycle, there are 1500 FS clocks per
    cycle. I would have to look at the spec to be sure, but this
    is close to a 150-byte worst-case FS transfer. Besides the data
    there is some USB protocol overhead, and (speculatively) it
    is possible that the low-level USB driver may refuse to schedule
    two 64-byte transfers in a single cycle. In such a case the effective
    bandwidth for serial data would be 4096000 bits/s, which
    corresponds to 5120000 serial bits/s (serial sends start and stop
    bits which are not needed on USB). This is less than
    full-duplex 3 Mbit/s (both directions add up to 6 Mbit/s and
    must go through the same USB). With a larger amount of data in
    transit this could give wild oscillations in the amount of
    buffered data, leading to slowdowns when buffers get empty
    and giving a stall when the receive buffer overflows.
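
    The numbers in that speculation work out as follows (whether the host really
    schedules only one 64-byte bulk transfer per direction per 125 us slot is the
    speculative part):

        usb_payload = 64 * 8 * 8000            # one 64-byte transfer per 1/8 ms slot = 4,096,000 bits/s
        serial_equiv = usb_payload * 10 // 8   # = 5,120,000 wire bits/s (10 wire bits per 8-bit byte)
        needed = 2 * 3_000_000                 # full-duplex 3 Mbit/s adds up to 6 Mbit/s
        print(serial_equiv, needed)            # 5.12 M < 6 M, so USB could not keep both directions full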

    Of course there is another speculation: the converter may be a fake.
    Supposedly fakes use MCUs with a special program. The software
    could create delays which limit the transfer rate at 3 Mbit/s
    and lead to data loss/stalls with more data in transit.

    As others suggested, you could use multiple converters for
    better overlap. My converters are "full speed" USB, that
    is, half-duplex 12 Mbit/s. USB has significant
    protocol overhead, so probably two 2 Mbit/s duplex serial
    converters would saturate a single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on the specific motherboard.
    Theoretically, when using "high speed" USB converters,
    several could easily work from a single USB port (provided
    that you have enough places in the hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.

    Well, you wrote that you have the needed experience, so build a low-latency Ethernet-to-serial converter based on an FPGA. Give your numbers and see
    how many customers come in.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Sun Dec 4 22:39:24 2022
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    One question is
    if hardware is able to opperate with low latency. Another is if it should. And frequently answer to secend question is no, it should
    not try to minimize latency. Namely, Ethernet has minimal packet
    size which is about 60 characters. If you send each character in separate packet, then there would be very bad utilization of media.
    So, converter is expected to wait till there is enough characters
    to transmit.

    At the high serial rates we are talking about (3 Mbps) waiting for milliseconds is a bit absurd. Why? Because it can cause exactly the poor results I'm talking about. But the vendor with the Ethernet box never said the delay was intentional. What you
    are talking about would only be on the slave to master path. Why would data received from the network be delayed before sending to the slave? That sounds like a recipe for disaster.
    Well, delay from Ethernet to serial port clearly means that implementer
    did not spent enough effort to make it fast.

    I won't argue with that!


    Note that at 115200 bits/s delay of 1ms is roughly
    11 characters, so not so big.

    We are not talking about 0.1 Mbps, rather 30x faster, 3 Mbps.


    At lower rates delay becomes less
    signifincant and at higher rates people usually care more about throughput than latency. And do not forget that Ethernet is
    shared medium, even if convertor could manage to transmit with
    lower latency withing available Ethernet bandwidth, it could
    do that only at cost of other users (possibly second convertor).

    Most Ethernet is not shared, rather point to point. In this case it definitely is not.
    You were talking about connecting more convertors. Normally laptops
    have only single Ethernet port, so all convertors that you connect
    will share single Ethernet. If you use 100 convertors, 3 Mbits/s each
    + switches it should be possible to get 30 Mbytes/s of aggregate bandwidth (assuming gigabyte port in laptop and gigabyte switch at top of tree).
    But if each converter would waste a lot of bandwidth due to small
    payload per packet, then such rate would be impossible.

    Good thing we aren't trying to use 100 converters. The vendors who produce 4, 8 and 16 port versions don't actually do much to make them fast. I think the matter of small messages just doesn't come up often enough to be on their radar.

    And from a bit different point of view: normally there will be
    software in the path, giving you 0.1ms of latency on good modern unloaded hardware and much more in worse conditions.

    Ok, now it sounds like you are agreeing with me that the hardware is poor.


    Also,
    Ethernet likes packets of about 1400 bytes size. On 10 Mbit/s
    Ethernet this is about 1.4 ms for transmitssion of packet.

    No one is using 10 Mbps. If needed, I would use 1 Gbps. I'm assuming that's not required. But you keep talking about "likes" and shared which don't apply here, at all. If a product is going to support up to 16 ports at 3 Mbps each, it seems like a
    bad idea to saddle them with such throughput killers.
    It is your planned use that would kill throughput. I would expect
    that when product is used as intended you would get resonable fraction
    (say 70%) of nominal throughput (that is 2*16*3Mbits/s). If not,
    then I will join you in calling it bad product.

    "Intended"!? I saw nothing in any document that said serial port traffic had to meet any particular specifications. They didn't set this sort of spec when they designed the product. It happened that it had this limitation and someone said, "Good
    enough, ship it"!


    If network in not dedicated to convertor such packets are likely
    to appear from time to time and convertor has to wait till
    such packet is fully transmitted and only then gets chance
    to transmit. So, you should regularly expect delays of order
    1ms. Of course, with 100 Mbit/s Ethernet or gigabit one
    media delays are smaller, but serial convertors are frequently
    deployed in legacy contexts where 10 Mbit/s matter.

    Now you are being silly. If they design the equipment to work on 100 Mbps or even 1 Gbps Ethernet, you think it's reasonable for them to limit it to what they can do on 10 Mbps?
    Sometimes you get product designed for 10 Mbit/s which just got faster Ethernet part to be good citizen on fast network. Above you wrote
    about 16 port thing. That should be designed for faster network, but
    on common 100 Mbit/s Ethernet running ports in parallel it would be
    limited by Ethernet troughput. And even on 1 Gbit/s Ethernet it
    needs enough bandwidth that you can not waste it even if it is
    the only thing on the network. And just a litte thing: you
    wrote Ethernet, but raw Ethernet is problematic on PC OSes.
    So I would guess that you really mean TCP/IP over Ethernet.
    TCP requires every packet to be acknowleged, which may add more
    small-packet trafic.

    I'm not sure what your point is. But it is not important. We are discussing nits at this point.


    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need
    the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in convertor mostly sits idle. I previously wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times) higher than for other rates (I only tested 1 character messages at those rates, did not want to wait for full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1 character messages and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a name
    brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I say you what I observed. People say various thing on the
    net. I was interested if net know something about my trouble with
    CP2104 so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?


    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.
    AFAIK Linux driver does not discriminate againt non-FTDI devices.
    So fact that convertors works with Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    I'm not using Linux. I don't have any FTDI fakes. I have some Prolific fakes somewhere, if I could find them. I never had one bricked, but I think it was Prolific that did that some years ago. Or, I may have them confused with FTDI. I remember the
    bricking driver was released with a Windows update and MS was pretty pissed off when the bricking hit the news.


    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address
    in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting
    test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should. So sent message was 16 bytes, read was 15. After reading 15 another batch of 16 was sent and so on. In total there were 10000 more characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages.
    Results are:

    CH340                15 char message    20 char message
    k = 0                6.869s             7.163s
    k = 1                4.682s             1.320s
    k = 2                0.992s             1.320s
    k = 3                0.991s             1.319s
    k = 4                0.991s             1.320s
    k = 5                0.990s             1.319s
    k = 8                0.992s             1.320s
    k = 12               0.990s             1.320s
    k = 20               0.992s             1.319s
    k = 36               0.991s             1.321s
    k = 128              0.991s             1.319s

    CP2104               15 char message    20 char message
    k = 0                2.508s             3.756s
    k = 1                1.897s             1.993s
    k = 2                1.668s             2.087s
    k = 3                1.486s             1.887s
    k = 4                1.457s             1.917s
    k = 5                1.559s             1.877s
    k = 8                1.455s             1.803s
    k = 12               1.337s             1.501s
    k = 20               1.123s             1.499s
    k = 36               1.125s             1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message    20 char message
    k = 0                5.478s             3.755s
    k = 1                4.929s             3.030s
    k = 2                2.506s             3.339s
    k = 3                2.459s             2.020s
    k = 4                1.708s             1.061s
    k = 5                1.671s             1.032s
    k = 8                0.764s             1.021s
    k = 12               0.772s             1.014s
    k = 20               0.763s             1.009s
    k = 36               0.758s             1.007s
    k = 128              0.757s             1.008s

    FTDI232R, 3 Mbit/s   15 char message    20 char message
    k = 0                8.216s             10.007s
    k = 1                5.006s             4.344s
    k = 2                3.338s             1.602s
    k = 3                2.406s             1.444s
    k = 4                1.766s             1.316s
    k = 5                1.599s             1.673s
    k = 8                1.040s             1.327s
    k = 12               1.071s             1.312s

    With k = 20, k = 36 and k = 128 communication stalled.

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, but so dramatically at 3 Mbps. They all seem to approach ~1.3 second as k increases. At k=0 they are around 1 ms per message, which is the
    polling rate... if you adjust it. I think the default for FTDI was 8 ms.
    Let me first comment 2Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that actual payload is 62-bytes and there are 2-bytes
    of protocol info). With 15 characters messages 0.764s really means
    98% of use of serial bandwidth, so essentiall as good as possible.

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???


    Corresponding k = 8 means really 9 messages in transit, so 135
    characters which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20 charaster messages main improvement is at k = 4 which
    means 100 characters, which is smaller than 2 buffers, with extra improvements for more data in transit. With CH340 and 15 char
    messages we see main improvement for k = 2, which corresponds
    to 45 characters in transit. With 20 char messages we get
    impovement for k = 1 which is 40 charactes in transit.
    CH340 uses 32 character transfer buffers, so improvemnet corresponds
    to somwhat more than 1 buffer in transit. Now, if transfers
    between converter and PC were at optimal times, then one buffer
    + one character would be enough to get full serial speed. But
    USB tranfers can not be started at arbitrary times, IIUC there
    are discrete time slots when transfer can occur. When tranfer
    can not be done in given slot it must wait for next slot.
    So, depending on locations of possible slots more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow PC to get full
    bandwidth and this is in good agreement with FTDI results.
    In case of CH340 there is extra factor: CH340 also uses 8 byte
    transfers. I do not know what function they have, but
    resonably likely guess is that those 8 byte pack tranfer control
    info that FTDI bundles with normal data. Anyway, those
    are "interrupt" tranfers in USB sense, so have higher priority
    than data transfer. Resonable guess it that they steal some
    USB bandwith from data tranfers. Also, smaller than maximal
    data block size limits efficiency, so it is possible that
    CH340 is limited by USB bandwith (lack of enough slots).

    Now, concerning 3 Mbits/s, due to different serial speed
    optimal times for transfers are different than in 2 Mbits/s
    case. It is possible that there is worse fit of desired
    and possible transfer times. Buffering allows to at least
    partially cure this, so initial improvement. But clearly,
    there is some extra bottleneck. Now some speculation:
    with 1/8 ms USB-2.0 cycle, there is 1500 FS clock per
    cycle. I would have to look at spec to be sure, but this
    is close to 150 byte worst case FS transfer. Beside data
    there is some USB protocol overhead and (speculatively) it
    is possible that low level USB diver may refuse to schedule
    two 64-byte transfers in single cycle. In such case effective
    bandwith for serial data would be 4096000 bits, which
    correspond to 5120000 serial bits (serial sends start and stop
    bits which are not needed for USB). This is less than
    full duplex 3 Mbits/s (both directions add to 6 Mbits/s and
    must go trouh the same USB). With larger amount of data in
    transit this could give wild oscilations in amount of
    buffered data, leading to slowdown when buffers get empty
    and giving stall when receive buffer overflows.

    Of course there is another speculation: convertor may be fake.
    Supposedly fakes use MCU-s with special program. Software
    could crate delays which limit transfer rate at 3 Mbits/s
    and lead to data loss/stall with more data in transit.

    It's too late for me to try to read all this.


    As other suggested you could use multiple convertors for
    better overlap. My convertors are "full speed" USB, that
    is they are half-duplex 12 Mb/s. USB has significant
    protocol overhead, so probably two 2 Mb/s duplex serial
    convertes would saturate single USB bus. In desktops
    it is normal to have several separate USB controllers
    (buses), but that depends on specific motherboard.
    Theoreticaly, when using "high speed" USB converters,
    several could easily work from single USB port (provided
    that you have enough places in hub(s)).

    I've been shying away from USB because of the inherent speed issues with small messages. But with larger messages, hi-speed converters can work, I would hope. Maybe FTDI did not understand my question, but they said even on the hi-speed version,
    their devices use a polling rate of 1 ms. They call it "latency", but since it is adjustable, I think it is the same thing. I asked about the C232HD-EDHSP-0, which is a hi-speed device, but also mentioned the USB-RS422-WE-5000-BT, which is an RS-422,
    full-speed device. So maybe he got confused. They don't offer many hi-speed devices.

    But the Ethernet implementations also have speed issues, likely because they are actually software based.
    The issues are more fundamental: both in USB and Ethernet there
    is per message/packet overhead. Low latency means sending data
    soon after it is available, which means small packets/messages.
    But due to overheads small packets are bad for throughput.
    So designers have to choose what they value more and in both
    cases the whole system is normally optimized for throughput.

    With 100 Mbps Ethernet the inherent latencies are very low compared to my message transmission rates. One vendor specifically indicated the delays were in their software. That was when I mentioned FPGAs and he talked as if I were being ridiculous.
    Well, you wrote that you have needed experience, so do low-latency Ethernet-serial convertor based on FPGA. Give your numbers and look
    how many customers come in.

    No, I never said I've designed Ethernet interfaces. I said I worked on FPGA code in a comms tester, which also tested Ethernet. I worked on one of the telecom formats, OC-12 rings a bell. Besides, that would be a major project. I have two other
    major projects to work on. This should be something I can buy.

    Reading your tests has made me realize that while combining the messages for every target into one batch can be a bit unwieldy, I could limit the combinations to the end points on a single card. The responses have to be combined for the one driver
    anyway. Between the 8 end points on a single board I could easily combine those commands, and then stagger the replies without any extra signals between the boards and no special characters in the command stream.

    Again, thinking out loud: at 3 Mbps, 8 * 150 bits per command is 1,200 bits or 400 us. That would greatly reduce the wasted time, even with a 1 ms polling period. It would allow an exchange of 8 commands and 8 replies every 2 ms, or 4,000 per second.
    That would be almost 32 per end point, which would be great! Actually, it could be faster than this, since the staggering of the replies doesn't require the first reply to wait for the last command. So replies will start at the end of the first command.
    The beauty of full-duplex!
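
    Running the same numbers explicitly (150 bits per command is an assumption of
    roughly 15 characters at 10 wire bits each):

        rate = 3_000_000                        # serial rate, bits/s
        burst = 8 * 150                         # 8 commands of ~150 bits = 1,200 bits
        print(burst / rate)                     # 0.0004 s = 400 us per 8-command block
        pairs_per_s = 8 / 0.002                 # 8 command/reply pairs every 2 ms
        print(pairs_per_s, pairs_per_s / 128)   # 4000/s total, ~31 per end point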

    Any chance you could run your test on the FTDI cable at 3 Mbps with a 1,200 bit block of data (120 characters)? I imagine the RS-232 waveform is getting a bit triangular at that speed.

    --

    Rick C.

    -++ Get 1,000 miles of free Supercharging
    -++ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Mon Dec 5 08:57:07 2022
    On 04/12/2022 17:54, Rick C wrote:
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David
    Brown wrote:

    A communication hierarchy is likely the best way to handle
    this.

    Alternatively, at the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus
    you suggested earlier). The PC could send a single massive
    serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast
    enough to be complete in good time before the next slave
    starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one
    big telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's
    not guaranteed. This will prevent the slaves from talking over
    one another. But I don't know if the replies will be seen as a
    unit for shipping over Ethernet or USB by the adapter. I've been
    told that the messages will see delays in the adapters, but no
    one has indicated how they block the data. In the case of the
    FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of 01 23 45 C\r\n - 11
    chars 01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total
    of 26 chars per end point. At 3 Mbps a message takes 87 us to
    transmit on the serial bus for 11,500 messages a second, or 90
    messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut
    this rate in half, and it's still around 45 messages per end
    point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream,
    they turn on their driver, send their reply, then turn off the
    driver.

    The master side might need dummy characters for padding if the
    slave replies (including any handling delay - the slaves might be
    fast, but they still take some time) can be longer than the master
    side telegrams.

    Each subtelegram in the master's telegram chain must be
    self-contained - a start character, an ending CRC or simple
    checksum, and so on. Replies from slaves must also be
    self-contained.
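
    A minimal sketch of such a telegram chain, in Python with a made-up framing
    (':' start character, two-hex-digit address, payload, hex checksum, CR/LF) and
    an arbitrary amount of padding - the real format would be whatever the slaves
    are coded to expect:

        # Sketch of a master-side telegram chain in the spirit of the scheme above.
        # The framing and the padding length are made up for illustration.
        PAD = b"\xff" * 15   # idle filler so each slave's reply finishes before the next subtelegram

        def subtelegram(addr, payload):
            body = (":%02X" % addr).encode() + payload
            csum = sum(body) & 0xFF
            return body + ("%02X" % csum).encode() + b"\r\n"

        def telegram_chain(commands):
            # commands: list of (address, payload-bytes); each subtelegram is self-contained
            frame = b""
            for addr, payload in commands:
                frame += subtelegram(addr, payload) + PAD
            return frame

        # One broadcast frame carrying the same command to 8 end points:
        frame = telegram_chain([(i, b"23 45 C") for i in range(8)])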

    It doesn't matter how the USB-to-serial or Ethernet-to-serial
    adaptors break up the messages - applications read the data as
    serial streams, not synchronous timed data. The only timing you
    have is a pause between master telegrams, which can be many
    milliseconds long, used to ensure that if something has gone wrong
    or lost synchronisation, their receiving state machine is reset and
    ready for the next round.

    It absolutely does matter how the messages get broken up. That's
    where the delays come in. If the slave replies are sent over the
    network/USB bus one at a time, it's not significantly better than the original approach.


    I mean it doesn't matter how the messages are broken up from the
    application code's viewpoint, as long as you handle it correctly as a
    stream and don't incorrectly assume you always read whole telegrams at a
    time.

    You can expect the converter to buffer up the incoming data and send it
    in large lumps up the USB or Ethernet bus. That's how it can work at
    high baud rates and throughputs. You lose the precise timing
    information, however, and have extra latency and jitter - so be sure
    to treat the incoming data as a stream and then that does not matter.
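
    A minimal sketch of that stream-style receive side, using pyserial and the same
    made-up framing as the earlier sketch; the point is only that complete replies
    are peeled out of whatever chunks the adapter happens to deliver:

        import serial

        def read_replies(port, expected):
            buf = b""
            replies = []
            while len(replies) < expected:
                chunk = port.read(64)      # whatever arrived within the timeout, 0..64 bytes
                if not chunk:
                    break                  # timed out - caller resynchronises on the next round
                buf += chunk
                while True:
                    start = buf.find(b":")
                    end = buf.find(b"\r\n", start)
                    if start < 0 or end < 0:
                        break              # no complete reply buffered yet
                    replies.append(buf[start:end])
                    buf = buf[end + 2:]
            return replies

        # Usage (port name and rate are assumptions):
        # port = serial.Serial("/dev/ttyUSB0", 3000000, timeout=0.01)
        # replies = read_replies(port, 8)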

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andrew Smallshaw@21:1/5 to Rick C on Mon Dec 5 09:58:32 2022
    On 2022-11-30, Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if
    there were no other delays.

    Use some multidrop standard at the physical layer, such as RS-485.
    At the DLL adopt a token-ring-style arbitration system. The first
    device interprets the request from the host as both receiving the
    token and a request for data - for consistency with the other units
    you'd probably want to format that initial request as a dummy "Device
    0" response. Device N interprets the reply from N-1 as sending it
    the token and its request to transmit. From the host perspective
    you send a single request and get back a byte stream with the
    results from all devices.
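
    A toy sketch of how the traffic on the shared bus would interleave under that
    scheme (pure simulation with made-up framing; the real devices would do this
    in firmware):

        # The host's request acts as the dummy "Device 0" reply, and device N
        # transmits as soon as it has seen the complete reply from device N-1.
        def bus_traffic(host_request, devices):
            # devices: list of callables; devices[i](request) -> that device's reply bytes
            stream = host_request                   # everything that appears on the bus, in order
            for handler in devices:
                reply = handler(host_request)       # triggered by the end of the previous reply
                stream += reply
            return stream

        # Example: 4 devices that each answer with a canned result
        devs = [lambda req, n=n: ("R%02X:OK\r\n" % n).encode() for n in range(1, 5)]
        print(bus_traffic(b"REQ:ALL\r\n", devs))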

    --
    Andrew Smallshaw
    andrews@sdf.org

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to Andrew Smallshaw on Mon Dec 5 14:03:11 2022
    On Monday, December 5, 2022 at 4:58:39 AM UTC-5, Andrew Smallshaw wrote:
    On 2022-11-30, Rick C <gnuarm.del...@gmail.com> wrote:
    I am using laptops to control test fixtures via a USB serial port. I'm looking at combining many test fixtures in one chassis, controlled over one serial port. The problem I'm concerned about is not the speed of the bus, which can range up to 10 Mbps.
    It's the interface to the serial port.

    The messages are all short, around 15 characters. The master PC addresses a slave and the slave promptly replies. It seems this message level hand shake creates a bottle neck in every interface I've looked at.

    FTDI has a high-speed USB cable that is likely limited by the 8 kHz polling rate. So the message and response pair would be limited to 4 kHz. Spread over 256 end points, that's only 16 message pairs a second to each target. That might be workable if
    there were no other delays.
    Use some multidrop standard at the physical layer such as RS485.
    At the DLL adopt a token ring style arbitration system. The first
    device interprets the request from the host as both receiving the
    token and a request for data - for consistency with the other units
    you'd probably want to format that initial request as a dumy "Device
    0" response. Device N interprets the reply from N-1 as sending it
    the token and its request to transmit. From the host perspective
    you send a single request and get back a byte stream with the
    results from all devices.

    That scheme requires every end point to know where it is in the grand scheme, but more importantly, to know what other end points are in the system. It also requires the master to address every end point in sequence. How would you address one end point
    only, or handle some number of missing slots? This would require the end point to keep track of what commands have been sent, as well as who has replied.

    I've mulled this over for the last few days, including a priority scheme where handshake lines would be used to pass the priority more mechanically. This priority "token" could be passed through the entire chain of 16 boards and 8 end points on each
    board, but it can also be done by using a priority chain only within the 8 slaves on each test fixture board. This will provide a burst of serial port operation for about 500 us at a 3 Mbps rate. So if USB has a polling rate of 1 ms, we would get half
    the bandwidth, which would be pretty good. I feel better about blocking 8 commands for a given test fixture than blocking all 128 commands.

    Someone had suggested padding the transmitted data to set the timing of the replies. That would work as well, and order would no longer be significant at all. But I'm not comfortable with sending garbage data to control timing. It can make debugging
    more difficult. Too bad there's no way to send a data byte without a start bit! lol

    --

    Rick C.

    +-- Get 1,000 miles of free Supercharging
    +-- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Mon Dec 5 14:12:37 2022
    On Monday, December 5, 2022 at 2:57:46 AM UTC-5, David Brown wrote:
    On 04/12/2022 17:54, Rick C wrote:
    On Sunday, December 4, 2022 at 7:21:56 AM UTC-5, David Brown wrote:
    On 03/12/2022 21:42, Rick C wrote:
    On Wednesday, November 30, 2022 at 12:14:18 PM UTC-5, David
    Brown wrote:

    A communication hierarchy is likely the best way to handle
    this.

    Alternatively, at the messages from the PC can be large and
    broadcast, rather than divided up. You could even make an
    EtherCAT-style serial protocol (using the hybrid RS-422 bus
    you suggested earlier). The PC could send a single massive
    serial telegram consisting of multiple small ones:

    <header><padding><tele1><padding><tele2><padding>...<pause>

    Each slave would reply after hearing its own telegram, fast
    enough to be complete in good time before the next slave
    starts. (Adjust padding as necessary to give this timing.)

    Then from the PC side, you have one big telegram out, and one
    big telegram in - using 3 MBaud if you like.

    I've been giving this some thought and it might work, but it's
    not guaranteed. This will prevent the slaves from talking over
    one another. But I don't know if the replies will be seen as a
    unit for shipping over Ethernet or USB by the adapter. I've been
    told that the messages will see delays in the adapters, but no
    one has indicated how they block the data. In the case of the
    FTDI adapter, the issue is the polling rate.

    This is the format I'm currently thinking of 01 23 45 C\r\n - 11
    chars 01 23 45 C 67\r\n - 14 chars

    The transmitted message would add 15 char of padding for a total
    of 26 chars per end point. At 3 Mbps a message takes 87 us to
    transmit on the serial bus for 11,500 messages a second, or 90
    messages per second per end point. That certainly would do the
    job, if I've done the math right. Even assuming other factors cut
    this rate in half, and it's still around 45 messages per end
    point each second.

    Just to be clear - the slaves should not send any kind of dummy
    characters. When they have read their part of the incoming stream,
    they turn on their driver, send their reply, then turn off the
    driver.

    The master side might need dummy characters for padding if the
    slave replies (including any handling delay - the slaves might be
    fast, but they still take some time) can be longer than the master
    side telegrams.

    Each subtelegram in the master's telegram chain must be
    self-contained - a start character, an ending CRC or simple
    checksum, and so on. Replies from slaves must also be
    self-contained.

    It doesn't matter how the USB-to-serial or Ethernet-to-serial
    adaptors break up the messages - applications read the data as
    serial streams, not synchronous timed data. The only timing you
    have is a pause between master telegrams, which can be many
    milliseconds long, used to ensure that if something has gone wrong
    or lost synchronisation, their receiving state machine is reset and
    ready for the next round.

    It absolutely does matter how the messages get broken up. That's
    where the delays come in. If the slave replies are sent over the network/USB bus one at a time, it's not significantly better than the original approach.

    I mean it doesn't matter how the messages are broken up from the
    application code's viewpoint, as long as you handle it correctly as a
    stream and don't incorrectly assume you always read whole telegrams at a time.

    Of course the application doesn't care. No one is worried about the application. The concern is the timing of the messages on the various buses. A message broken up too much may be sent in multiple small pieces resulting in more delays.


    You can expect the converter to buffer up the incoming data and send it
    in large lumps up the USB or Ethernet bus. That's how it can work at
    high baud rates and throughputs. You lose the precise timing
    information, however, and have extra latency and jitter - so you be sure
    to treat the incoming data as a stream and then that does not matter.

    I don't "expect" anything of the adapter. They have delays that are largely unexplained, at least in any detail. That's why this is hard to deal with.

    Right now I'm looking at using a priority enable across the 8 end points within a test fixture board. That will allow a 400 us message block at 3 Mbps, with 350 us of overlap between the commands and the replies, so 450 us total. That would work well
    with either a 1 ms polling rate or a 0.5 ms polling rate, if available, and provide 50 us of breathing room for the adapter.

    This is a lot like making gears for a mechanical clock, with a calendar and an appointment reminder. LOL

    --

    Rick C.

    +-+ Get 1,000 miles of free Supercharging
    +-+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Rick C on Tue Dec 6 02:30:19 2022
    Rick C <gnuarm.deletethisbit@gmail.com> wrote:
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:

    With relatively cheap convertors
    on Linux to handle 10000 roundtrips for 15 bytes messages I need the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in convertor mostly sits idle. I previously wrote that CH340 did not work at 921600. More testing showed that
    it actually worked, but speed was significantly different, I had to
    set my MCU to 847000 communicate. This could be bug in Linux driver (there is rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similary, when CH340 was set to 576800
    I had to set MCU to 541300. Even after matching speed at nomial
    576800, 921600 and 1152000 test time was much (more than 10 times) higher than for other rates (I only tested 1 character messages at those
    rates, did not want to wait for full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1 character messages
    and catching up with longer messages). Still, ATM CH340 looks
    resonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a
    name brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I say you what I observed. People say various thing on the
    net. I was interested if net know something about my trouble with
    CP2104 so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?

    Things like the computer locking up (IIUC fixed by a newer driver). Or "communication did not work" (no real info). ATM I have enough
    converters. If I need more/better I will look at FTDI products
    and possibly ask them questions.

    Remark: I bought all my convertors from Chinese sellers. IIUC
    FTDI chip is faked a lot, but other too. Still, I think they
    show what is possible and illustrate some difficulties.

    FTDI fakes no longer work with the FTDI drivers. Maybe they play a cat and mouse game, with each side one upping the other, but it's not worth the bother to try it out. FTDI sells cables. It's easier to just buy them from FTDI.
    AFAIK Linux driver does not discriminate againt non-FTDI devices.
    So fact that convertors works with Linux driver tells you nothing
    about its origin. And for the record, I bought mine several years
    ago.

    I'm not using Linux. I don't have any FTDI fakes. I have some Prolific fakes somewhere, if I could find them. I never had one bricked, but I think it was Prolific that did that some years ago. Or, I may have them confused with FTDI. I remember the
    bricking driver was released with a Windows update and MS was pretty pissed off when the bricking hit the news.

    It was FTDI who bricked fakes, that was widely discussed. I did not
    hear about Prolific doing something like that.

    CP2104 2Mb/s, waiting, 2.514s
    CP2104 2Mb/s, overlapped 1.214s

    I don't know what the CP2104 is.
    It is a chip by Silicon Laboratories. Datasheet gives contact address in Austin, TX.
    I'm not certain what "overlapped" means in this test. Did you just continue to send 15 byte messages with no delays 10,000 times?
    No. My slave simply returns back each received character. There is
    some software delay but it should be less than 2us. So even waiting test has some overlap at character level. To get more overlap above
    I cheated: my test program was sending 1 more character than it should. So sent message was 16 bytes, read was 15. After reading 15 another batch of 16 was sent and so on. In total there were 10000 more characters sent than received. My hope was that OS would read
    and buffer excess characters, but it seems that at least for
    CP2104 they cause trouble. My current guess is that OS is
    reading only when requested, but I did not investigate deeper...
    Since you are in the mood for testing, what happens if you run overlapped, with 128 messages of 15 characters and wait for the replies before sending the next batch? Also, if you don't mind, can you try 20 character messages?
    OK, I tried modifeed version of my test program. It first sends
    k messages without reading anything, then goes to main loop where
    after sending each message it read one. At the end it tail loop
    which reads last k messages without sending anything. So, there
    is k + 1 messages in transit: after sending message k + i program
    waits for answer to message i. In total there is 10000 messages. Results are:

    CH340, 15 char message 20 char message
    k = 0 6.869s 7.163s
    k = 1 4.682s 1.320s
    k = 2 0.992s 1.320s
    k = 3 0.991s 1.319s
    k = 4 0.991s 1.320s
    k = 5 0.990s 1.319s
    k = 8 0.992s 1.320s
    k = 12 0.990s 1.320s
    k = 20 0.992s 1.319s
    k = 36 0.991s 1.321s
    k = 128 0.991s 1.319s

    CP2104, 15 char message 20 char message
    k = 0 2.508s 3.756s
    k = 1 1.897s 1.993s
    k = 2 1.668s 2.087s
    k = 3 1.486s 1.887s
    k = 4 1.457s 1.917s
    k = 5 1.559s 1.877s
    k = 8 1.455s 1.803s
    k = 12 1.337s 1.501s
    k = 20 1.123s 1.499s
    k = 36 1.125s 1.502s

    k = 128 reliably stalled, there were random stalls in other cases

    FTDI232R, 2 Mbit/s   15 char message   20 char message
    k = 0                5.478s            3.755s
    k = 1                4.929s            3.030s
    k = 2                2.506s            3.339s
    k = 3                2.459s            2.020s
    k = 4                1.708s            1.061s
    k = 5                1.671s            1.032s
    k = 8                0.764s            1.021s
    k = 12               0.772s            1.014s
    k = 20               0.763s            1.009s
    k = 36               0.758s            1.007s
    k = 128              0.757s            1.008s

    FTDI232R, 3 Mbit/s   15 char message   20 char message
    k = 0                8.216s            10.007s
    k = 1                5.006s            4.344s
    k = 2                3.338s            1.602s
    k = 3                2.406s            1.444s
    k = 4                1.766s            1.316s
    k = 5                1.599s            1.673s
    k = 8                1.040s            1.327s
    k = 12               1.071s            1.312s

    With k = 20, k = 36 and k = 128 communication stalled.
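
    A minimal sketch of that test loop, assuming a POSIX serial port already opened and configured in raw mode (VMIN = 1) at the desired rate; the device path, payload and constants below are only illustrative, not the actual test program:

        /* Pipelined round-trip test: prime the link with k extra messages,
         * then send one / read one in the main loop, and drain the last k
         * replies at the end.  Assumes the port is already configured in
         * raw mode (VMIN = 1) at the desired baud rate. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/types.h>
        #include <fcntl.h>
        #include <unistd.h>

        #define MSG_LEN 15      /* illustrative message length */
        #define TOTAL   10000   /* total messages per run      */

        static void write_full(int fd, const char *p, size_t n)
        {
            while (n > 0) {
                ssize_t r = write(fd, p, n);
                if (r < 0) { perror("write"); exit(1); }
                p += r; n -= (size_t)r;
            }
        }

        static void read_full(int fd, char *p, size_t n)
        {
            while (n > 0) {
                ssize_t r = read(fd, p, n);
                if (r <= 0) { perror("read"); exit(1); }
                p += r; n -= (size_t)r;
            }
        }

        int main(int argc, char **argv)
        {
            const char *dev = argc > 1 ? argv[1] : "/dev/ttyUSB0"; /* illustrative */
            int k = argc > 2 ? atoi(argv[2]) : 0;  /* extra messages in transit */
            int fd = open(dev, O_RDWR | O_NOCTTY);
            if (fd < 0) { perror("open"); return 1; }
            /* termios setup (raw mode, baud rate) omitted in this sketch */

            char msg[MSG_LEN], reply[MSG_LEN];
            memset(msg, 'U', sizeof msg);

            for (int i = 0; i < k; i++)            /* prime: k messages ahead */
                write_full(fd, msg, sizeof msg);
            for (int i = 0; i < TOTAL - k; i++) {  /* main loop: send one, read one */
                write_full(fd, msg, sizeof msg);
                read_full(fd, reply, sizeof reply);
            }
            for (int i = 0; i < k; i++)            /* tail: drain the last k replies */
                read_full(fd, reply, sizeof reply);

            close(fd);
            return 0;
        }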

    Some of the results seem odd, hard to understand, like why the message rate improves so much as k is increased, but so dramatically at 3 Mbps. They all seem to approach ~1.3 seconds as k increases. At k=0 they are around 1 ms per message, which is
    the polling rate... if you adjust it. I think the default for FTDI was 8 ms.
    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte
    blocks (they say that the actual payload is 62 bytes and there are 2 bytes
    of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible.
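    As a rough check of that figure, assuming 8N1 framing (10 bits per character on the wire): 10000 messages x 15 characters x 10 bits = 1,500,000 bits per direction, which takes 0.75s at 2 Mbit/s, and 0.75s / 0.764s is about 98%.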

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???

    It seems that your news agent messed up the formatting of the tables. I gave
    results in two columns, one column for 15-character messages, the second
    for 20-character messages. 0.764s is for 15-character messages,
    1.021s is for 20-character messages.

    The corresponding k = 8 really means 9 messages in transit, so 135
    characters, which is slightly more than 2 buffers. More data in
    transit does not help, but also does not make things worse.
    With 20-character messages the main improvement is at k = 4, which
    means 100 characters, which is smaller than 2 buffers, with extra
    improvements for more data in transit. With the CH340 and 15-char
    messages we see the main improvement at k = 2, which corresponds
    to 45 characters in transit. With 20-char messages we get the
    improvement at k = 1, which is 40 characters in transit.
    The CH340 uses 32-character transfer buffers, so the improvement
    corresponds to somewhat more than 1 buffer in transit. Now, if
    transfers between the converter and the PC happened at optimal times,
    then one buffer + one character would be enough to get full serial
    speed. But USB transfers can not be started at arbitrary times; IIUC
    there are discrete time slots in which a transfer can occur. When a
    transfer can not be done in a given slot it must wait for the next slot.
    So, depending on the locations of the possible slots, more buffering
    and more data in transit may be needed for optimal performance.
    OTOH 2-3 buffers should be enough to allow the PC to get full
    bandwidth, and this is in good agreement with the FTDI results.
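    To spell out the arithmetic: characters in transit = (k + 1) x message length, so the knees above fall at (8 + 1) x 15 = 135 and (4 + 1) x 20 = 100 characters for the FTDI (vs. 2 x 62 = 124 bytes of buffered payload), and (2 + 1) x 15 = 45 and (1 + 1) x 20 = 40 characters for the CH340 (vs. its 32-byte buffer).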
    In the case of the CH340 there is an extra factor: the CH340 also uses
    8-byte transfers. I do not know what function they have, but a
    reasonably likely guess is that those 8 bytes pack the transfer control
    info that FTDI bundles with the normal data. Anyway, those
    are "interrupt" transfers in the USB sense, so they have higher priority
    than the data transfers. A reasonable guess is that they steal some
    USB bandwidth from the data transfers. Also, a smaller-than-maximal
    data block size limits efficiency, so it is possible that the
    CH340 is limited by USB bandwidth (lack of enough slots).

    Now, concerning 3 Mbit/s: due to the different serial speed, the
    optimal times for transfers are different than in the 2 Mbit/s
    case. It is possible that there is a worse fit between the desired
    and the possible transfer times. Buffering at least partially
    cures this, hence the initial improvement. But clearly
    there is some extra bottleneck. Now some speculation:
    with the 1/8 ms USB 2.0 cycle, there are 1500 FS clocks per
    cycle. I would have to look at the spec to be sure, but this
    is close to a 150-byte worst-case FS transfer. Besides the data
    there is some USB protocol overhead, and (speculatively) it
    is possible that the low-level USB driver may refuse to schedule
    two 64-byte transfers in a single cycle. In such a case the effective
    bandwidth for serial data would be 4096000 bits/s, which
    corresponds to 5120000 serial bits/s (serial sends start and stop
    bits which are not needed for USB). This is less than
    full-duplex 3 Mbit/s (both directions add up to 6 Mbit/s and
    must go through the same USB). With a larger amount of data in
    transit this could give wild oscillations in the amount of
    buffered data, leading to slowdowns when buffers get empty
    and giving a stall when the receive buffer overflows.
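    Spelling out that estimate, under the assumption of a single 64-byte bulk transfer per 125 us microframe shared by both directions: 64 bytes x 8 bits x 8000 microframes/s = 4,096,000 bit/s of USB payload; since each payload byte carries a character that occupies 10 bits on the serial line, that is equivalent to 5,120,000 serial bit/s, which falls short of the 6,000,000 bit/s needed to keep 3 Mbit/s running in both directions at once.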

    Of course there is another speculation: the converter may be a fake.
    Supposedly fakes use MCUs with a special program. The software
    could create delays which limit the transfer rate at 3 Mbit/s
    and lead to data loss/stalls with more data in transit.

    In the last part I was partially wrong. The USB 2.0 spec says that transmission
    between the PC and a high-speed hub is always high speed. For full-speed
    devices the hub is supposed to buffer messages and transmit to the device
    at its speed. In effect the PC needs two high-speed messages per
    lower-speed message. My tests above were with the converter connected via a
    high-speed hub. There was also an Stlink dongle plugged into the
    same hub. To remove the effect of the hub I tried plugging the converter
    directly into a USB 1.1 port on a separate USB controller. That
    led to significantly longer times. I also tried connecting the Stlink
    to a separate port so that the converter was the only thing connected
    to the hub. I ran a few cases several times at 3 Mbit/s; for short
    messages and low k the results vary significantly, and
    for 120 characters and k = 0 I got times from 6.375s to 6.598s.
    At 2 Mbit/s, in 25 runs I got one outlier at 6.667s; the rest
    were between 6.029s and 6.049s.
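    As a rough check, assuming those runs were again 10000 round trips of 120 characters at 10 bits each: that is 12,000,000 bits per direction, so the wire-time floor (with the reply overlapping the ongoing send) is 6.0s at 2 Mbit/s and 4.0s at 3 Mbit/s. The 6.03-6.05s results are therefore within about 1% of the 2 Mbit/s limit, while 6.4-6.6s is only about 60% of what 3 Mbit/s could in principle deliver.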

    Anyway, USB seems to have a significant impact on the possible speed;
    with a full-speed converter and full-duplex transmission, 2 Mbit/s
    seems to give better speed than 3 Mbit/s. Maybe a better USB
    hub could help (I do not know how to find out the size of the buffers
    in my hub, but by the spec a hub may have buffers for just 2
    bulk transfers or much more). Given the above I would expect a
    converter connected via high-speed USB to perform better at 3 Mbit/s.

    Reading your tests has made me realize that while combining the messages for every target into one batch can be a bit unwieldy, I could limit the combinations to the end points on a single card. The responses have to be combined for the one driver
    anyway. Between the 8 end points on a single board I could easily combine those commands, and then stagger the replies without any extra signals between the boards and no special characters in the command stream.

    Again, thinking out loud, at 3 Mbps, 8 * 150 bits per command is 1,200 bits or 400 us. That would greatly reduce the wasted time, even with a 1 ms polling period. It would allow an exchange of 8 commands and 8 replies every 2 ms, or 4,000 per second.
    That would be almost 32 per end point, which would be great! Actually, it could be faster than this, since the staggering of the replies doesn't require the first reply to wait for the last command. So replies will start at the end of the first command.
    The beauty of full-duplex!
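
    Checking those numbers, assuming the 128 targets mentioned earlier: 8 commands x 150 bits = 1,200 bits, which is 400 us at 3 Mbit/s; with a 2 ms budget per batch (command burst, staggered replies and a ~1 ms poll) that is 500 batches/s x 8 = 4,000 command/reply pairs per second, and 4,000 / 128 end points is about 31 pairs per end point per second.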

    Any chance you could run your test on the FTDI cable at 3 Mbps with a 1,200 bit block of data (120 characters)? I imagine the RS-232 waveform is getting a bit triangular at that speed.

    See above. Note that my test slave started replying after receiving the first character. ATM it seems that with enough overlap at 2 Mbit/s I am getting repeatably almost optimal speed (even with the 1.1 port). But with less
    overlap there are random-looking variations, which probably means
    high sensitivity to the precise timing of messages. And at 3 Mbit/s
    the variations seem to be much worse.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to anti...@math.uni.wroc.pl on Mon Dec 5 23:41:14 2022
    On Monday, December 5, 2022 at 9:30:24 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 10:33:28 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Sunday, December 4, 2022 at 4:30:35 PM UTC-5, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:
    On Wednesday, November 30, 2022 at 9:08:25 PM UTC-4, anti...@math.uni.wroc.pl wrote:
    Rick C <gnuarm.del...@gmail.com> wrote:

    With relatively cheap converters
    on Linux, to handle 10000 round trips for 15-byte messages I need the following times:

    CH340 2Mb/s, waiting, 6.890s

    That's 11.3 per target, per second. (128 targets)

    CH340 2Mb/s, overlapped 1.058s

    That's pretty close to 74 per target, per second.

    I used to use the CH340 devices, but we had intermittent lockups of the serial port when testing all day long. I switched to FTDI and that went away. I think you told me you have no such problems. Maybe it's the CH340 Windows serial drivers.
    Well, my use is rather light. Most is for debugging at say 9600 or 115200. And when plugged in, the converter mostly sits idle. I previously wrote that the CH340 did not work at 921600. More testing showed that
    it actually worked, but the speed was significantly different; I had to set my MCU to 847000 to communicate. This could be a bug in the Linux driver (there is a rather funky formula connecting speed to parameters
    and it looks easy to get it wrong). Similarly, when the CH340 was set to 576800
    I had to set the MCU to 541300. Even after matching speeds, at nominal 576800, 921600 and 1152000 the test time was much (more than 10 times) higher than for other rates (I only tested 1-character messages at those
    rates, did not want to wait for the full test). Also, 500000 was significantly
    slower than 460800 (but "merely" 2 times slower for 1-character messages
    and catching up with longer messages). Still, ATM the CH340 looks reasonably good.

    Yes, it's reasonably good for situations where it does not need to work reliably. I was surprised when the finger was pointed to the CH340 adapter. But someone (probably here) had warned me they are not dependable, and now I know. The cost of a
    name brand adapter is not so much that it's worth saving the difference, only to have to throw it out and go with FTDI anyway, when you have real work to do.
    Well, I am telling you what I observed. People say various things on the
    net. I was interested in whether the net knew something about my trouble with
    the CP2104, so I googled for "CP2104 lockup". And I got a bunch of
    complaints about FTDI devices, solved by using a CP2104. So, there
    is a lot of noise and ATM I prefer to stay with what I see.

    What sort of complaints about FTDI? Did you contact them about it?
    Things like the computer locking up (IIUC fixed by a newer driver). Or "communication did not work" (no real info). ATM I have enough
    converters. If I need more/better I will look at FTDI products
    and possibly ask them questions.
    Remark: I bought all my converters from Chinese sellers. IIUC the
    FTDI chip is faked a lot, but others too. Still, I think they
    show what is possible and illustrate some difficulties.
    Let me first comment on the 2 Mbit/s results. FTDI transfers data in 64-byte blocks (they say that the actual payload is 62 bytes and there are 2 bytes of protocol info). With 15-character messages, 0.764s really means
    98% use of the serial bandwidth, so essentially as good as possible.

    Yeah, I'm not following that at all. At k=8, the 2 Mbps FTDI transferred in 1.021s. What is 0.764s???
    It seems that your news agent messed up the formatting of the tables. I gave
    results in two columns, one column for 15-character messages, the second
    for 20-character messages. 0.764s is for 15-character messages,
    1.021s is for 20-character messages.

    Yes, I see that now. Google Groups removes excess spaces. Not a good idea and for no apparent reason. If they want to conserve bytes, maybe they should delete the message contents. That would greatly reduce the noise and only reduce the signal
    slightly in many cases.


    Anyway, USB seems to have a significant impact on the possible speed;
    with a full-speed converter and full-duplex transmission, 2 Mbit/s
    seems to give better speed than 3 Mbit/s. Maybe a better USB
    hub could help (I do not know how to find out the size of the buffers
    in my hub, but by the spec a hub may have buffers for just 2
    bulk transfers or much more).

    I would not be using a hub at all. This PC would be dedicated to testing and only a mouse would use a USB port in addition to the serial dongle. Oh, I think they use a bar code scanner too, so three USB ports.


    See above. Note that my test slave started replying after receiving the first character. ATM it seems that with enough overlap at 2 Mbit/s I am getting repeatably almost optimal speed (even with the 1.1 port). But with less
    overlap there are random-looking variations, which probably means
    high sensitivity to the precise timing of messages. And at 3 Mbit/s
    the variations seem to be much worse.

    The priority protocol I described above would overlap after one message. So not a lot of difference.

    Thanks for the info. It was very useful.

    --

    Rick C.

    +-- Get 1,000 miles of free Supercharging
    +-- Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Tue Dec 6 13:32:21 2022
    On 06/12/2022 08:41, Rick C wrote:

    Yes, I see that now. Google Groups removes excess spaces. Not a
    good idea and for no apparent reason. If they want to conserve
    bytes, maybe they should delete the message contents. That would
    greatly reduce the noise and only reduce the signal slightly in many
    cases.


    You do realise it is up to /you/, the person making a post, to snip
    excess content? For some reason, google posters do this extremely badly
    - either they never snip, or they cut too much (including attributions).

    Google groups ruins the format of Usenet posts - including removing
    leading spaces and screwing up line endings. It's one of the reasons
    why so many Usenet users dislike it.

    (Yes, I know you have some particular personal reasons for using GG.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Tue Dec 6 18:03:15 2022
    On Tuesday, December 6, 2022 at 8:32:28 AM UTC-4, David Brown wrote:

    You do realise it is up to /you/, the person making a post, to snip
    excess content? For some reason, google posters do this extremely badly
    - either they never snip, or they cut too much (including attributions).

    Google groups ruins the format of Usenet posts - including removing
    leading spaces and screwing up line endings. It's one of the reasons
    why so many Usenet users dislike it.

    (Yes, I know you have some particular personal reasons for using GG.)

    I guess I should have used a smiley. I was trying to say that much of what is posted here would be better not posted at all... as a joke.

    --

    Rick C.

    +-+ Get 1,000 miles of free Supercharging
    +-+ Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Rick C on Wed Dec 7 08:08:15 2022
    On 07/12/2022 03:03, Rick C wrote:

    I guess I should have used a smiley. I was trying to say that much of what is posted here would be better not posted at all... as a joke.


    I think there's been a lot of interesting stuff posted in this thread.
    Maybe not all of it has been useful to /you/, but you're not paying us
    for the job. So we chatter - sometimes people learn something new or
    get some new ideas.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rick C@21:1/5 to David Brown on Wed Dec 7 03:36:38 2022
    On Wednesday, December 7, 2022 at 3:08:22 AM UTC-4, David Brown wrote:

    I think there's been a lot of interesting stuff posted in this thread.
    Maybe not all of it has been useful to /you/, but you're not paying us
    for the job. So we chatter - sometimes people learn something new or
    get some new ideas.

    Again, I was not being clear enough. By "here", I did not mean this thread. I was referring to newsgroups as a whole.

    IT WAS JUST A JOKE!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)