Hi all,

Since I saw the tasks module from ET here on the list, I'm thinking about switching to threads and handling the requests in a thread pool, in the hope of getting speed improvements. So, what do you think, which is the better way? Can I expect speed improvements from a thread pool compared to forking on every new incoming connection, and will they be big enough to justify the effort of switching?
Currently I have one binary, with all the procedures for handling the requests included, which is then forked. As far as I know, if I switch to threads, I have to import all my defined procedures into every newly started thread, so I would have to split my binary into two parts. Is that right, or is there a simple way to define all the procedures of the currently running process in a thread that this process starts?
On 5/31/2022 11:38 PM, Michael Niehren wrote:
Hi all,

Currently I have one binary, with all the procedures for handling the requests included, which is then forked. As far as I know, if I switch to threads, I have to import all my defined procedures into every newly started thread, so I would have to split my binary into two parts. Is that right, or is there a simple way to define all the procedures of the currently running process in a thread that this process starts?
The first question I'd ask is: are you facing performance issues now? Is your program running out of steam, or are you just looking to improve something that's already working?
One difference between thread pools and tasks is that tpool has an upper and a lower bound on the number of threads in the pool. Tasks allocate just one set at startup. While it would be possible to add more or reduce their number, there is no support for that at present and none likely in the future. That was one complication I decided was not worth the trouble, but that's just my opinion.
If you decide to use tpool, you can set the upper and lower bounds to the same number, and it will not (afaik) allocate any more threads or kill off any of the existing ones, so there won't be any new importing of code into a new thread, since you'll just be reusing the ones you have. Then it should work like tasks. And what sort of importing do you think you are going to need?
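
A minimal sketch of that fixed-size setup with the Thread extension's tpool (the handle proc and the pool size are just placeholders):

    package require Thread

    # Fixed-size pool: min == max, so no workers are created or
    # destroyed after startup, and -initcmd runs once per worker.
    set pool [tpool::create \
        -minworkers 4 \
        -maxworkers 4 \
        -initcmd {
            # Define (or source) the request-handling procs here.
            proc handle {req} { return "handled: $req" }
        }]

    # Post a job, wait for it, fetch the result.
    set job [tpool::post $pool {handle "some request"}]
    tpool::wait $pool $job
    puts [tpool::get $pool $job]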
Tasks have a proc re-constructor, and it can take several. If you specify just * as one of the elements of the import-list argument, it will use [info procs *] and reconstruct each proc. Likewise if you have these in namespaces, you could give name::* as an element. If you have other inits to do, say TclOO, then you would have to import them differently. I've often wondered whether TclOO can be completely introspected so it can be imported into a thread. I don't know enough about it, and since I don't use TclOO myself I can't speculate on that.
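
For reference, the general idea behind that kind of proc reconstruction can be done with plain thread::send and standard introspection (copyProcs is a made-up helper name for this sketch, not the tasks API):

    package require Thread

    # Rebuild every proc matching a pattern inside a target thread,
    # using info procs/args/default/body for introspection.
    proc copyProcs {tid pattern} {
        foreach p [info procs $pattern] {
            set arglist {}
            foreach a [info args $p] {
                if {[info default $p $a dflt]} {
                    lappend arglist [list $a $dflt]  ;# keep default values
                } else {
                    lappend arglist [list $a]
                }
            }
            thread::send $tid [list proc $p $arglist [info body $p]]
        }
    }

    proc greet {name {greeting Hello}} { return "$greeting, $name" }

    set tid [thread::create]
    copyProcs $tid *
    puts [thread::send $tid {greet World}]   ;# -> Hello, World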
With tasks, you can have a script variable, i.e. set script {...}, and then specify -$script as one of the initializers. Tasks allow any number of these, along with any number of wildcards that an [info procs pattern] can take. Tpool has a single argument for that, but you could probably easily combine several into one.
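
For tpool, combining several init fragments into its single -initcmd argument could look like this (the fragment contents are placeholders):

    # Several init fragments, joined into tpool's single -initcmd.
    set initProcs  { proc handle {req} { return "ok: $req" } }
    set initConfig { set ::maxRequests 100 }

    set pool [tpool::create -minworkers 2 -maxworkers 2 \
        -initcmd [join [list $initProcs $initConfig] \n]]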
I've not used ttrace, but it would appear that its purpose is similar, though it seems to do other things as well.
As to performance, do you fork off a process for each connection or do
you keep them around for additional ones? What does each fork do? Do
they talk to each other?
As to resources, I've estimated the cost of a new thread in a rather crude way: on 32-bit Windows I could only create about 150 threads, given the 2 GB address-space limit, so it's on the order of 10-20 MB per thread. You could do some easy tpool tests. On 64-bit this likely won't matter, what with cheap RAM these days.
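
An easy version of that test is to create a fixed-size pool and watch the process in your OS's memory monitor (the pool size of 50 is arbitrary):

    package require Thread

    # -minworkers threads are started right away, so per-thread
    # memory cost can be read off the process's total growth.
    set pool [tpool::create -minworkers 50 -maxworkers 50]
    puts "pool created, check the process memory now"
    vwait forever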
I know that tsv is reasonably fast, because I've measured the time it takes to give a task work (and it does that via tsv), and it's on the order of 50 microseconds, where a proc call is about 1 microsecond (on my 4 GHz Intel 4790K chip). How reasonable that is depends on how much work you do in each call. It would not be worthwhile to use tasks to compute anything that can be done with a single proc call in, say, 100 microseconds or less. I also found that a synchronous thread::send was about 1/3 the cost of doing task calls.
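
You can get numbers like those on your own machine with [time]; here's a sketch comparing a plain proc call, a tsv access, and a synchronous send (the work proc and iteration count are arbitrary):

    package require Thread

    proc work {x} { expr {$x * 2} }

    set tid [thread::create]
    thread::send $tid {proc work {x} { expr {$x * 2} }}

    # tsv shared-variable access, the mechanism used to hand tasks work.
    tsv::set bench x 42

    puts "proc call: [time {work 21} 1000]"
    puts "tsv get:   [time {tsv::get bench x} 1000]"
    puts "sync send: [time {thread::send $tid {work 21}} 1000]"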
One thing you might do, if it won't cause your program to crash, is to have your forked processes simply bypass the workload. It's like putting a return at the top of some proc you want to measure and running it both ways, once doing the real job and once reduced to no work, so you can measure the overhead. This works as long as you don't need to compute anything. Another way is to return a canned answer to simulate your workload.
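
In proc form the trick is just this (handleRequest and expensiveWork are stand-in names):

    proc expensiveWork {req} { after 50; return "real answer for $req" }

    proc handleRequest {req} {
        # Uncomment the next line to bypass the real work and measure
        # only the connection/dispatch overhead around this proc:
        # return "canned answer"
        return [expensiveWork $req]
    }

    puts [time {handleRequest demo} 10]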
Anyway, you likely need to know the cost of each connection vs. the cost of what you do in each connection. With that, you can probably tell whether it's worth switching to threads or tasks. If you do a lot of work in each connection, I'd stay with what you've got, since it works. And as Rich said, it will also depend on any inter-thread/process communication you are doing, if any.