What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?
It seems that with Tcl 8.4, we were able to get the original content
of the strings, but that Tcl 8.6 is altering the input in some way
that makes it incompatible with C++.
Is setting the Tcl_SetSystemEncoding call a reasonable fix for this,
or will we run into other difficulties now or in the future (I notice
that there are a lot of Unicode enhancements set up for Tcl 8.7 and
Tcl 9)?
What happens if someone gives us some non utf-8 encoded string? Is
there a way to support that in this case?
Phillip Brooks <philbrks@gmail.com> wrote:
What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?
Most likely, Tcl became more properly Unicode aware.
Hi,
We have noticed a problem in our application that started occurring with our transition to Tcl 8.6 from Tcl 8.4. The problem is that we read some user provided text using Tcl that eventually gets printed by our application. Although our application is not localized, enterprising users found that they can enter Unicode text into the file and then when it prints out, it ends up the same way it came in when we print it out from C++. When they started using the Tcl 8.6 version of our product, that
Unless you can:
1) be informed of what actual encoding was used; or
2) write a bunch of code to try to infer the encoding used (and this
will likely be fragile)
then there is not really a general way to 'interpret' any possible
encoding.
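
As an illustration of option 2 and of why it is fragile, here is a minimal sketch in C that only checks whether a byte buffer is well-formed UTF-8. The function name looks_like_utf8 is made up for this example and is not part of Tcl. A "yes" answer is only a hint: pure ASCII passes the check even though it is also valid in many other encodings, and a "no" answer does not say which encoding was actually used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Tcl API): returns true if buf[0..len) is
 * well-formed UTF-8. Rejects truncated sequences, overlong forms,
 * UTF-16 surrogates and code points above U+10FFFF. */
bool looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* code point being assembled */
        unsigned long min;    /* smallest code point allowed for this length */

        if (b < 0x80) {       /* plain ASCII byte */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            need = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            need = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            need = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;     /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need) {
            return false;     /* sequence runs past the end of the buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80) {
                return false; /* not a continuation byte */
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;     /* overlong, out of range, or surrogate */
        }
        i += need + 1;
    }
    return true;
}
```

For example, the UTF-8 bytes EC 95 9C (the Korean syllable U+D55C) pass this check, while the EUC-KR bytes C7 D1 for the same syllable do not.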
You could set the channels to 'binary' mode and that will
disable all the translating of bytes between encodings.
You need to look at the "fconfigure" command for adjusting the encoding
used for file channels (the C API equivalent is the
Tcl_SetChannelOption function). You may simply need to set the
input and output channels to utf-8 for things to work correctly again.
Thanks for the response, Rich. It was very helpful.
On Tuesday, November 15, 2022 at 1:59:36 PM UTC-8, Rich wrote:
What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?
Most likely, Tcl became more properly Unicode aware.
When I look through the 8.4 and 8.5 Tcl release notes, I am not
finding anything about Unicode. Similarly for the list of TIPs -
there are several Unicode TIPs for 8.7/9.0, though.
Unless you can:
1) be informed of what actual encoding was used; or
2) write a bunch of code to try to infer the encoding used (and this
will likely be fragile)
then there is not really a general way to 'interpret' any possible
encoding.
That's what I was thinking.
You could set the channels to 'binary' mode and that will disable
all the translating of bytes between encodings.
The binary setting didn't help - rather it breaks 8.4 in the same way
that 8.6 is broken. This was after calling:
Tcl_SetChannelOption(interp, fc, "-encoding", "binary");
You need to look at the "fconfigure" command for adjusting the
encoding used for file channels (the C API equivalent is the
Tcl_SetChannelOption function). You may simply need to set the
input and output channels to utf-8 for things to work correctly
again.
Thanks for that pointer, fconfigure and Tcl_Get/SetChannelOption have
been very illuminating.
In Tcl 8.4, the "C" Tcl_Channel seems to have "-encoding" set to
"identity" by default. In Tcl 8.6, it is set to "iso8859-1" by
default. In the Tcl script, however, fconfigure shows default
"-encoding" set to "utf-8" for both Tcl 8.4 and Tcl 8.6.
Setting "-encoding" to "identity" in Tcl 8.6 seems to reestablish the
previous behavior. Also, setting it explicitly to "utf-8" works as
well. Calling Tcl_SetSystemEncoding with "utf-8" changes the default
to "utf-8" in both Tcl 8.4 and Tcl 8.6.
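
One plausible reading of these observations: with "identity" the file's UTF-8 bytes reach the C++ side unchanged, whereas with "iso8859-1" (or "binary") every input byte is taken as one character in the range U+0000 to U+00FF and is then re-encoded as UTF-8 when the C++ code asks for the string. The sketch below reproduces that double encoding in plain C. It is an illustration of the suspected mechanism, not Tcl's actual code, and the name latin1_to_utf8 is made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper (not a Tcl API): treat each input byte as an
 * ISO 8859-1 character and write its UTF-8 encoding to out.
 * out must have room for 2 * len bytes. Returns the bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        if (b < 0x80) {
            out[n++] = b;                                  /* ASCII is unchanged */
        } else {
            out[n++] = (unsigned char)(0xC0 | (b >> 6));   /* lead byte */
            out[n++] = (unsigned char)(0x80 | (b & 0x3F)); /* continuation byte */
        }
    }
    return n;
}
```

Feeding it the three UTF-8 bytes EC 95 9C yields the six bytes C3 AC C2 95 C2 9C, which is the kind of garbled output one would expect to see printed in place of the original Korean text.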
I see this in the fconfigure doc page under -encoding:
"The default encoding for newly opened channels is the same platform-
and locale-dependent system encoding used for interfacing with the
operating system, as returned by encoding system."
Does that mean that the user can alter this behavior by setting an
environment variable on Unix? Any idea where I can find out more
about that?
I am thinking that if I can provide the user with an environment
variable setting, then I won't have to worry about breaking someone
else's clever use of some other international strings in some other
place by forcing it to utf-8. I tried explicitly setting
LANG=en_US.UTF-8, but that didn't help. I'd also like to avoid
breaking things in new ways for Tcl 8.7 and Tcl 9.
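
To check what the environment actually reports, independent of Tcl, a small POSIX sketch like the following may help. It assumes (this is my understanding, not something confirmed in this thread) that on Unix Tcl derives its system encoding from the process locale when the library is initialized, so the codeset printed here is roughly what "encoding system" would be based on. The name codeset_for_locale is made up for this example.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not a Tcl API): select the LC_CTYPE locale `name`
 * and return the codeset it uses, or NULL if the locale is not available.
 * Pass "" to use the environment (LC_ALL, LC_CTYPE, LANG).
 * Note: setlocale changes process-wide state. */
const char *codeset_for_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL) {
        return NULL;   /* e.g. the named locale is not installed */
    }
    return nl_langinfo(CODESET);
}
```

Calling codeset_for_locale("") in a shell where LANG=en_US.UTF-8 is set shows whether that locale is really in effect. A NULL result would mean the locale is not installed on the machine, in which case setting the variable alone cannot change anything.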
I don't know if it was mentioned before.
The Tcl initialization code changed. To initialize static stuff, first:
Tcl_FindExecutable(argv[0])
should be called.
On Wednesday, November 16, 2022 at 9:33:08 AM UTC-8, Harald Oehlmann wrote:
I don't know if it was mentioned before.
The Tcl initialization code changed. To initialize static stuff, first:
Tcl_FindExecutable(argv[0])
should be called.
That helps immensely. If I add the Tcl_FindExecutable(argv[0]) call
before creating the interpreter, it resolves the issue in my small
testcase. We'll try that in the main application and see how it goes.
Thanks!
Unfortunately, using Tcl_FindExecutable(argv[0]), which works in the
small example program, is not working in our application. What is this
call doing? Clearly it must be more than setting the executable name.
Also, in my small testcase, I don't see how knowing the executable name
(which is nowhere near the Tcl install tree) helps with anything. Does
anyone know what is going on under the covers there?
Ralf - thanks for the info. Also, in searching for info about iso8859-1, I found that it isn't suitable for Korean anyway, as it only covers Roman alphabet derivatives.
As mentioned previously, we don't see issues in a pure Tcl script (see main.tcl in the original post), but only when creating a Tcl interpreter from C/C++ code.
Perhaps it is something that gets handled during initialization and isn't being initialized properly for Tcl 8.6?