• Re: Reading Unicode text in a non-localized application

    From Phillip Brooks@21:1/5 to All on Tue Nov 15 12:33:14 2022
    Note that I am building and running this on Red Hat Enterprise Linux 6.

    When I build and run this on Red Hat Enterprise Linux 8, the Tcl 8.4 case also fails to print properly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phillip Brooks@21:1/5 to All on Tue Nov 15 12:22:45 2022
    Hi,

    We have noticed a problem in our application that started with our transition from Tcl 8.4 to Tcl 8.6. We use Tcl to read some user-provided text that our application eventually prints. Although our application is not localized, enterprising users found that they can enter Unicode text into the file, and when we print it from C++ it comes out the same way it went in. When they started using the Tcl 8.6 version of our product, that stopped working, and garbage is now printed where the nice Unicode output was printed previously.

    Here is a small example program that illustrates this problem:

    #include <iostream>
    #include <sstream>
    #include <fstream>
    #include <tcl.h>
    #include <string.h>

    void dump_buffer( Tcl_Obj* read_obj_ptr ) {
        size_t buflen = strlen( read_obj_ptr->bytes );
        for( size_t i=0; i != buflen; ++i ) {
            if ( i > 0 && ( i % 10 == 0 )) { std::cout << std::endl; }
            unsigned char c = read_obj_ptr->bytes[i];
            std::cout << (unsigned int)c << " ";
        }
        std::cout << std::endl;
    }

    int main()
    {
        Tcl_Interp * interp = Tcl_CreateInterp();
        //Tcl_SetSystemEncoding(interp, "utf-8");

        Tcl_Channel fc = Tcl_OpenFileChannel(interp, "file", "r", 0644);
        if (!fc)
        {
            std::cout << "ERROR: Cannot open input TVF file for reading" << std::endl;
            return 1;
        }

        Tcl_Obj *read_obj_ptr = Tcl_NewObj();
        int chars_read = Tcl_ReadChars(fc, read_obj_ptr, -1, 0);
        char* str = Tcl_GetStringFromObj( read_obj_ptr, nullptr );
        std::cout << "TCL READ String\n";
        std::cout << str << std::endl;
        dump_buffer( read_obj_ptr );

        Tcl_Close(interp, fc);

        std::ifstream fc1("file");
        if ( fc1.fail() ) {
            std::cout << "ERROR: Cannot open input TVF file for reading" << std::endl;
            fc1.close();
            return 1;
        }
        std::stringstream buffer;
        buffer << fc1.rdbuf();

        if ( fc1.fail() || buffer.str().empty() )
        {
            std::cout << "ERROR: No data read from input TVF file" << std::endl;
            fc1.close();
            return 1;
        }
        fc1.close();

        Tcl_Obj *read_obj_ptr1 = Tcl_NewObj();
        Tcl_AppendToObj(read_obj_ptr1, buffer.str().c_str(), -1);
        std::cout << "C++ READ\n";
        std::cout << read_obj_ptr1->bytes << std::endl;
        dump_buffer( read_obj_ptr1 );

        return 0;
    }

    The file "file" contains unicode:
    $ cat file
    Korean : 서요한 가나다라 아야어여
    Armenian : Թեստ
    English : This line is redundant :)

    A Tcl only version of the program is:

    set f [ open file "r" ]
    set lines [ read $f ]
    puts "Tcl script READ Unicode"
    puts $lines

    It behaves as expected in both Tcl 8.4 and Tcl 8.6.

    Note that the commented call to Tcl_SetSystemEncoding will cause the program to work the same way for Tcl 8.6 and Tcl 8.4.

    The questions I have are:

    What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior? It seems that with Tcl 8.4, we were able to get the original content of the strings, but that Tcl 8.6 is altering the input in some way that makes it incompatible with C++.

    Is setting the Tcl_SetSystemEncoding call a reasonable fix for this, or will we run into other difficulties now or in the future (I notice that there are a lot of Unicode enhancements set up for Tcl 8.7 and Tcl 9)? What happens if someone gives us some
    non utf-8 encoded string? Is there a way to support that in this case?

    Be patient - I am not, by any means, a Unicode expert.

    Thanks!

  • From Rich@21:1/5 to Phillip Brooks on Tue Nov 15 21:59:32 2022
    Phillip Brooks <philbrks@gmail.com> wrote:
    > What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?

    Most likely, Tcl became more properly Unicode aware.

    > It seems that with Tcl 8.4, we were able to get the original content
    > of the strings, but that Tcl 8.6 is altering the input in some way
    > that makes it incompatible with C++.

    8.4 was likely using a setting that was transparent while 8.6 is likely
    trying to convert the incoming data into Tcl's internal UTF-8 variant.

    > Is setting the Tcl_SetSystemEncoding call a reasonable fix for this,
    > or will we run into other difficulties now or in the future (I notice
    > that there are a lot of Unicode enhancements set up for Tcl 8.7 and
    > Tcl 9)?

    The 'system' encoding is also used when passing strings to the OS API, so modifying it /may/ cause other strange issues.

    > What happens if someone gives us some non utf-8 encoded string? Is
    > there a way to support that in this case?

    Unless you can:
    1) be informed of what actual encoding was used; or
    2) write a bunch of code to try to infer the encoding used (and this
    will likely be fragile)
    then there is not really a general way to 'interpret' any possible
    encoding.
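
To make option 2 concrete: one common heuristic (fragile, as noted) is to test whether the bytes are well-formed UTF-8 and treat them as some other encoding if they are not. The following is a minimal C++ sketch of such a check. The function name looks_like_utf8 is hypothetical and nothing here is part of Tcl. Passing the check does not prove the data was written as UTF-8, since plain ASCII and some legacy-encoded text pass as well.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8: correct lead/continuation
// structure, no overlong forms, no surrogates, nothing above U+10FFFF.
// Heuristic only: "valid" does not prove the text was meant as UTF-8.
bool looks_like_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of length N.
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += len;
    }
    return true;
}
```

For example, looks_like_utf8("\xE2\x80\x99") accepts the UTF-8 form of U+2019, while the iso8859-1 bytes "\xE9t\xE9" are rejected because 0xE9 announces a three-byte sequence that never arrives.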

    However, if you just want the exact bytes present in the files to come
    back out, you could set the channels to 'binary' mode and that will
    disable all the translating of bytes between encodings.

    You need to look at the "fconfigure" command for adjusting the encoding
    used for file channels (the C API equivalent is the
    Tcl_SetChannelOption function). You may simply need to set the
    input and output channels to utf-8 for things to work correctly again.

  • From Luc@21:1/5 to Rich on Tue Nov 15 19:44:24 2022
    On Tue, 15 Nov 2022 21:59:32 -0000 (UTC), Rich wrote:

    > Phillip Brooks <philbrks@gmail.com> wrote:
    >> What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?

    > Most likely, Tcl became more properly Unicode aware.


    At least that I can attest to. I've had these two applications that I made for myself for about 15 years; they use the clipboard and text widgets.
    They never handled Unicode correctly in the 8.4 and 8.5 era, and I just
    gave up on that, learning to live in resignation with some occasional
    garbled content.

    Only a few months ago I decided to try to fix them and it was very easy
    because the old problems I used to have with Unicode just weren't there anymore. I just removed the ugly kludges I had had in place to hide some
    of the problem and everything just worked.

    --
    Luc


  • From saitology9@21:1/5 to Phillip Brooks on Tue Nov 15 20:13:06 2022
    On 11/15/2022 3:22 PM, Phillip Brooks wrote:
    Hi,

    We have noticed a problem in our application that started occurring with our transition to Tcl 8.6 from Tcl 8.4. The problem is that we read some user provided text using Tcl that eventually gets printed by our application. Although our application
    is not localized, enterprising users found that they can enter Unicode text into the file and then when it prints out, it ends up the same way it came in when we print it out from C++. When they started using the Tcl 8.6 version of our product, that
    stopped working and now garbage is printed where the nice unicode output was printed previously.


    You seem to have access to both versions of your application.
    Therefore, you could find out the exact encoding that was in place in 8.4
    and enforce it in 8.6, or change it to something else.

    # find out current encoding
    % encoding system
    cp1251

    # change it to something else
    % encoding system unicode
    unicode

    # check
    % encoding system
    unicode

    # list all
    % encoding names
    ...

  • From Phillip Brooks@21:1/5 to Rich on Tue Nov 15 16:34:08 2022
    Thanks for the response, Rich. It was very helpful.

    On Tuesday, November 15, 2022 at 1:59:36 PM UTC-8, Rich wrote:
    >> What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?

    > Most likely, Tcl became more properly Unicode aware.

    When I look through the 8.4 and 8.5 Tcl release notes, I am not finding anything about Unicode. Similarly for the list of TIPs - there are several Unicode TIPs for 8.7/9.0, though.

    > Unless you can:
    > 1) be informed of what actual encoding was used; or
    > 2) write a bunch of code to try to infer the encoding used (and this
    > will likely be fragile)
    > then there is not really a general way to 'interpret' any possible
    > encoding.

    That's what I was thinking.

    > you could set the channels to 'binary' mode and that will
    > disable all the translating of bytes between encodings.

    The binary setting didn't help - rather it breaks 8.4 in the same way that 8.6 is broken. This was after calling:

    Tcl_SetChannelOption(interp, fc, "-encoding", "binary");

    > You need to look at the "fconfigure" command for adjusting the encoding
    > used for file channels (the C API equivalent is the
    > Tcl_SetChannelOption function). You may simply need to set the
    > input and output channels to utf-8 for things to work correctly again.

    Thanks for that pointer, fconfigure and Tcl_Get/SetChannelOption have been very illuminating.

    In Tcl 8.4, the "C" Tcl_Channel seems to have "-encoding" set to "identity" by default. In Tcl 8.6, it is set to "iso8859-1" by default. In the Tcl script, however, fconfigure shows default "-encoding" set to "utf-8" for both Tcl 8.4 and Tcl 8.6.

    Setting "-encoding" to "identity" in Tcl 8.6 seems to reestablish the previous behavior. Also, setting it explicitly to "utf-8" works as well. Setting Tcl_SetSystemEncoding to "utf-8" changes the default to "utf-8" in both Tcl 8.4 and Tcl 8.6.

    I see this in the fconfigure doc page under -encoding:

    "The default encoding for newly opened channels is the same platform- and locale-dependent system encoding used for interfacing with the operating system, as returned by encoding system."

    Does that mean that the user can alter this behavior by setting an environment variable on Unix? Any idea where I can find out more about that? I am thinking that if I can provide the user with an environment variable setting, then I won't have to
    worry about breaking someone else's clever use of some other international strings in some other place by forcing it to utf-8. I tried explicitly setting LANG=en_US.UTF-8, but that didn't help. I'd also like to avoid breaking things in new ways for Tcl
    8.7 and Tcl 9.

  • From Rich@21:1/5 to Phillip Brooks on Wed Nov 16 02:17:56 2022
    Phillip Brooks <philbrks@gmail.com> wrote:
    > Thanks for the response, Rich. It was very helpful.

    > On Tuesday, November 15, 2022 at 1:59:36 PM UTC-8, Rich wrote:
    >>> What changed between Tcl 8.4 and Tcl 8.6 to alter the behavior?

    >> Most likely, Tcl became more properly Unicode aware.

    > When I look through the 8.4 and 8.5 Tcl release notes, I am not
    > finding anything about Unicode. Similarly for the list of TIPs -
    > there are several Unicode TIPs for 8.7/9.0, though.

    The change might not necessarily have referenced Unicode; it might have
    referred to channel encodings, or other terms. Note I'm not saying you
    are wrong, just that if changes did happen (and 8.4 to 8.6 is a wide
    time window) they might not have used the word "Unicode" but still
    might have been impactful.

    >> Unless you can:
    >> 1) be informed of what actual encoding was used; or
    >> 2) write a bunch of code to try to infer the encoding used (and this
    >> will likely be fragile)
    >> then there is not really a general way to 'interpret' any possible
    >> encoding.

    > That's what I was thinking.

    >> you could set the channels to 'binary' mode and that will disable
    >> all the translating of bytes between encodings.

    > The binary setting didn't help - rather it breaks 8.4 in the same way
    > that 8.6 is broken. This was after calling:

    > Tcl_SetChannelOption(interp, fc, "-encoding", "binary");

    Interesting...

    >> You need to look at the "fconfigure" command for adjusting the
    >> encoding used for file channels (the C API equivalent is the
    >> Tcl_SetChannelOption function). You may simply need to set the
    >> input and output channels to utf-8 for things to work correctly
    >> again.

    > Thanks for that pointer, fconfigure and Tcl_Get/SetChannelOption have
    > been very illuminating.

    > In Tcl 8.4, the "C" Tcl_Channel seems to have "-encoding" set to
    > "identity" by default. In Tcl 8.6, it is set to "iso8859-1" by
    > default. In the Tcl script, however, fconfigure shows default
    > "-encoding" set to "utf-8" for both Tcl 8.4 and Tcl 8.6.

    If your users have been sneaking in UTF-8 encoded data, and the channel
    is now set for iso8859-1, you'll get ugly messes out as a result.

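    I.e., if your users entered a Unicode right single quote (U+2019) but
    the channel is set to iso8859-1, its three UTF-8 bytes (E2 80 99) are
    each read as a separate character, so you get 'â' followed by two
    control characters out instead of a right single quote mark.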

    But, if your users have been entering UTF-8 encoded text, you'd also be
    safe setting the channels to UTF-8 as well.
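
The effect described above can be reproduced without Tcl. The sketch below is an illustration only, not Tcl's actual conversion code: it decodes every input byte as one iso8859-1 character and re-encodes the result as UTF-8, which is roughly what happens when UTF-8 data passes through a channel configured as iso8859-1 and the string is then fetched as UTF-8. The function name latin1_to_utf8 is hypothetical.

```cpp
#include <string>

// Treat each input byte as one ISO-8859-1 character (code point equals the
// byte value) and re-encode it as UTF-8. Bytes below 0x80 pass through
// unchanged; every byte from 0x80 to 0xFF becomes a two-byte sequence.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // continuation
        }
    }
    return out;
}
```

Feeding it the UTF-8 bytes of U+2019 (E2 80 99) yields C3 A2 C2 80 C2 99. A byte dump like the dump_buffer function in the original post would print that as 195 162 194 128 194 153, twice as many bytes as were in the file.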

    > Setting "-encoding" to "identity" in Tcl 8.6 seems to reestablish the
    > previous behavior. Also, setting it explicitly to "utf-8" works as
    > well. Setting Tcl_SetSystemEncoding to "utf-8" changes the default
    > to "utf-8" in both Tcl 8.4 and Tcl 8.6.

    The Tcl wiki has this to say about the 'identity' encoding:

    https://wiki.tcl-lang.org/page/encoding+system

    Can someone elaborate on the meaning of the 'identity' encoding?
    When using freewrap I get:

    % encoding system
    identity

    What is this and what is it used for?

    schlenk 2005-06-27: The identity encoding is for testing purposes,
    it should not be used without very good reasons. If you see your
    encoding system set to identity, you are missing the proper encoding
    files for your setup. This happens with tclkit-sh.exe on windows or
    other wrapped applications which do not include the right encodings
    for the local system they are running on.

    Googie 2012-08-09: The 'identity' encoding is the default encoding
    in my Tcl, even I use regular tclsh and not tclkit. Why is so? (I
    use Linux)

    PYK 2018-12-04: It is so because your Tcl configuration is borked.

    Is your code running inside a 'wrapped' executable -- if the Wiki
    statements here are correct, the fact that you get 'identity' on 8.4
    would imply that the fact that "it worked" was more of a stroke of luck
    than anything else.

    If setting to UTF-8 'fixes things' then your likely best course is to
    set the channels to UTF-8 and let it be. UTF-8 is all but the
    'universal' encoding now for just about everything, so you'd be more
    'future proof' to explicitly set UTF-8 than not.

    > I see this in the fconfigure doc page under -encoding:

    > "The default encoding for newly opened channels is the same platform-
    > and locale-dependent system encoding used for interfacing with the
    > operating system, as returned by encoding system."

    > Does that mean that the user can alter this behavior by setting an
    > environment variable on Unix? Any idea where I can find out more
    > about that?

    Sadly, no. And the only real mention of LANG= in the wiki is that Tcl
    uses it to guess what encoding to set as 'system' when it initializes.

    > I am thinking that if I can provide the user with an environment
    > variable setting, then I won't have to worry about breaking someone
    > else's clever use of some other international strings in some other
    > place by forcing it to utf-8. I tried explicitly setting
    > LANG=en_US.UTF-8, but that didn't help. I'd also like to avoid
    > breaking things in new ways for Tcl 8.7 and Tcl 9.

    Try LANG=C, which might 'trick' things. But if you do want to avoid
    future breakage, if switching to 'utf-8' 'fixes' things now, then that
    switch should cause less breakage in the future than not. Anything
    else you do would just be a band-aid over another band-aid and itself
    likely to subtly break in other ways in the future.

  • From Ralf Fassel@21:1/5 to All on Wed Nov 16 17:23:48 2022
    * Rich <rich@example.invalid>
    | Phillip Brooks <philbrks@gmail.com> wrote:
    | > I am thinking that if I can provide the user with an environment
    | > variable setting, then I won't have to worry about breaking someone
    | > else's clever use of some other international strings in some other
    | > place by forcing it to utf-8. I tried explicitly setting
    | > LANG=en_US.UTF-8, but that didn't help. I'd also like to avoid
    | > breaking things in new ways for Tcl 8.7 and Tcl 9.

    | Try LANG=C, which might 'trick' things. But if you do want to avoid
    | future breakage, if switching to 'utf-8' 'fixes' things now, then that
    | switch should cause less breakage in the future than not. Anything
    | else you do would just be a band-aid over another band-aid and itself
    | likely to subtly break in other ways in the future.

    Linux/Opensuse 15.4:

    $ env LANG=de_DE.UTF-8 tclsh
    % fconfigure stdout -encoding
    utf-8

    $ env LANG=en_US.UTF-8 tclsh
    % fconfigure stdout -encoding
    utf-8

    $ env LANG=C tclsh
    % fconfigure stdout -encoding
    iso8859-1

    So LANG=C is probably not the Right Thing in the context of this thread.


    If the LANG=en_US.UTF-8 did not work for the OP, most likely he had set
    some other env-vars (namely LC_ALL or LC_CTYPE):

    unix/tclUnixInit.c, Tcl_GetEncodingNameFromEnvironment():

    /*
    * Determine the current encoding from the LC_* or LANG environment
    * variables.
    --<snip-snip>--
    encoding = getenv("LC_ALL");

    if (encoding == NULL || encoding[0] == '\0') {
    encoding = getenv("LC_CTYPE");
    }
    if (encoding == NULL || encoding[0] == '\0') {
    encoding = getenv("LANG");
    }
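
The lookup order in the snippet above can be checked from the embedding application itself, which helps when a wrapper script or shell profile changes the environment. This is a minimal sketch under that assumption. The function name effective_ctype_locale is hypothetical, and the code only reports which variable wins; it does not reproduce how Tcl maps the locale name to an encoding name.

```cpp
#include <cstdlib>
#include <string>

// Returns the first non-empty value among LC_ALL, LC_CTYPE and LANG, in
// that order, or an empty string if none of them is set to anything.
std::string effective_ctype_locale() {
    const char* names[] = {"LC_ALL", "LC_CTYPE", "LANG"};
    for (const char* name : names) {
        const char* value = std::getenv(name);
        if (value != nullptr && value[0] != '\0') {
            return value;  // earlier variables override later ones
        }
    }
    return std::string();
}
```

Printing the result at startup, before the first Tcl call, shows at a glance whether LANG=en_US.UTF-8 is actually in effect or is being overridden by LC_ALL or LC_CTYPE.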

    R'

  • From Harald Oehlmann@21:1/5 to All on Wed Nov 16 18:32:00 2022
    On 16.11.2022 at 18:16, Phillip Brooks wrote:

    I don't know if this was mentioned before.
    The Tcl initialization code changed. To initialize the static state,
    Tcl_FindExecutable(argv[0]) should be called first.

    Hope this helps,
    Harald

  • From Phillip Brooks@21:1/5 to All on Wed Nov 16 09:16:28 2022
    The encoding system command doesn't seem to yield anything meaningful in terms of my observed behavior of what default encoding is present. I am finding various builds of tcl, both 8.4 and 8.6, that seem to set it different ways - possibly by something
    in the install tree?

    From my 8.4 product install tree:
    $MGC_HOME/bin/tclsh
    % encoding system
    iso8859-1

    From my generic 8.4 build:
    $ /usr/local/tcl8.4b/bin/tclsh8.4
    % encoding system
    utf-8

    As mentioned previously, we don't see issues in a pure Tcl script (see main.tcl in the original post), but only when creating a Tcl interpreter from C/C++ code.

    Perhaps it is something that gets handled during initialization and isn't being initialized properly for Tcl 8.6?

    I do note that there are a lot of references to iso8859-1 in the Tcl source tree. One of them is in unix/README regarding the configure script:

    --with-encoding=ENCODING Specifies the encoding for compile-time
    configuration values. Defaults to iso8859-1,
    which is also sufficient for ASCII.

    Might it be that I can ask the customer to use iso8859-1 encoding instead of utf-8 for their localized comments?

  • From Phillip Brooks@21:1/5 to Harald Oehlmann on Wed Nov 16 11:03:48 2022
    On Wednesday, November 16, 2022 at 9:33:08 AM UTC-8, Harald Oehlmann wrote:

    > I don't know if this was mentioned before.
    > The Tcl initialization code changed. To initialize the static state,
    > Tcl_FindExecutable(argv[0]) should be called first.

    That helps immensely - If I add the Tcl_FindExecutable(argv) call before creating the interpreter, it resolves the issue in my small testcase. We'll try that in the main application and see how it goes.

    Thanks!

  • From Harald Oehlmann@21:1/5 to All on Thu Nov 17 08:40:53 2022
    On 16.11.2022 at 20:03, Phillip Brooks wrote:
    > On Wednesday, November 16, 2022 at 9:33:08 AM UTC-8, Harald Oehlmann wrote:

    >> I don't know if this was mentioned before.
    >> The Tcl initialization code changed. To initialize the static state,
    >> Tcl_FindExecutable(argv[0]) should be called first.

    > That helps immensely - If I add the Tcl_FindExecutable(argv) call before creating the interpreter, it resolves the issue in my small testcase. We'll try that in the main application and see how it goes.

    > Thanks!

    Great to hear. Kudos to the Tcl designers, who worked a lot on the
    embedding issue.
    Harald

  • From Ralf Fassel@21:1/5 to All on Thu Nov 17 11:33:01 2022
    * Phillip Brooks <philbrks@gmail.com>
    | Might it be that I can ask the customer to use iso8859-1 encoding
    | instead of utf-8 for their localized comments?

    Don't. UTF-8 is the way to go. iso8859-1 will not even transfer
    properly to Windows, where the default codepage for Western Europe
    (cp1252) is subtly different from iso8859-1 for byte values 128 and up
    (specifically 0x80 through 0x9F).
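
To show where the two encodings part ways, here is a small C++ sketch that decodes a single byte under each. The function names are hypothetical. The table holds the cp1252 assignments for 0x80 through 0x9F; outside that range both encodings map a byte to the code point of the same value.

```cpp
#include <cstdint>

// Code point for byte `b` under ISO-8859-1: always the byte value itself,
// so 0x80..0x9F come out as C1 control characters.
std::uint32_t decode_iso8859_1(unsigned char b) { return b; }

// Code point for byte `b` under Windows-1252. Same as ISO-8859-1 except in
// 0x80..0x9F, where a 0 entry marks a byte that cp1252 leaves undefined.
std::uint32_t decode_cp1252(unsigned char b) {
    static const std::uint16_t high[32] = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };
    if (b >= 0x80 && b <= 0x9F) return high[b - 0x80];
    return b;
}
```

Byte 0x80 decodes to the euro sign (U+20AC) under cp1252 but to the control character U+0080 under iso8859-1, while a byte such as 0xE9 gives U+00E9 in both.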

    R'

  • From Harald Oehlmann@21:1/5 to All on Thu Nov 17 18:12:01 2022
    On 17.11.2022 at 17:57, Phillip Brooks wrote:
    Unfortunately, using Tcl_FindExecutable(argv), which works in the small example program, is not working in our application. What is this call doing? Clearly it must be more than setting the executable name - also, in my small testcase, I don't see
    how knowing the executable name (which is nowhere near the Tcl install tree) helps with anything. Does anyone know what is going on under the covers there?

    Ralf - thanks for the info. Also, in searching for info about iso8859-1, it isn't suitable for Korean anyway as it only covers Roman alphabet derivatives.

    In one project
    https://wiki.tcl-lang.org/page/Embedding+TCL+program+in+DLL
    I spent a lot of time debugging the embedding code.
    Tcl_FindExecutable(NULL) does a lot more than its name suggests.
    I don't remember exactly where the system encoding gets set,
    but it happens somewhere along the way.
    You may also need to call Tcl_Init after creating the interpreter...

    Take care,
    Harald

  • From Phillip Brooks@21:1/5 to All on Thu Nov 17 08:57:38 2022
    Unfortunately, using Tcl_FindExecutable(argv), which works in the small example program, is not working in our application. What is this call doing? Clearly it must be more than setting the executable name - also, in my small testcase, I don't see how
    knowing the executable name (which is nowhere near the Tcl install tree) helps with anything. Does anyone know what is going on under the covers there?

    Ralf - thanks for the info. Also, in searching for info about iso8859-1, it isn't suitable for Korean anyway as it only covers Roman alphabet derivatives.

  • From Ralf Fassel@21:1/5 to All on Thu Nov 17 18:13:06 2022
    * Phillip Brooks <philbrks@gmail.com>
    | Unfortunately, using Tcl_FindExecutable(argv), which works in the
    | small example program, is not working in our application. What is
    | this call doing?

    Read the source, Luke.

    tcl8.6.13: generic/tclEncoding.c:1449
    void
    Tcl_FindExecutable(
        const char *argv0)      /* The value of the application's argv[0]
                                 * (native). */
    {
        TclInitSubsystems();
        TclpSetInitialEncodings();
        TclpFindExecutable(argv0);
    }

    Could you show the relevant code from your application (i.e. the
    Tcl_Open* calls, the write calls etc) together with what happens, and
    what you expect to happen?

    | Ralf - thanks for the info. Also, in searching for info about
    | iso8859-1, it isn't suitable for Korean anyway as it only covers Roman
    | alphabet derivatives.

    iso8859-1 also does not even contain €, you need iso8859-15 for that ;-)

    R'

  • From saitology9@21:1/5 to Phillip Brooks on Thu Nov 17 14:23:28 2022
    On 11/16/2022 12:16 PM, Phillip Brooks wrote:

    As mentioned previously, we don't see issues in a pure Tcl script (see main.tcl in the original post), but only when creating a Tcl interpreter from C/C++ code.


    Well, the idea was that you'd find out which encoding works on your
    client side and enforce that everywhere. However, ....


    Perhaps it is something that gets handled during initialization and isn't being initialized properly for Tcl 8.6?


    This is interesting. You are embedding Tcl in a larger C/C++
    application and as you state, Tcl takes care of things fine. So, if you
    still have the issue, it would behoove you to look at the rest of the
    C/C++ program. Namely, I would expect that you'd have to handle the
    encoding there as well. I am not sure if the embedded Tcl interpreter's control reaches outwards into the embedding system.

  • From briang@21:1/5 to Phillip Brooks on Thu Nov 17 12:39:50 2022
    On Thursday, November 17, 2022 at 8:57:40 AM UTC-8, Phillip Brooks wrote:
    Unfortunately, using Tcl_FindExecutable(argv), which works in the small example program, is not working in our application. What is this call doing? Clearly it must be more than setting the executable name - also, in my small testcase, I don't see how
    knowing the executable name (which is nowhere near the Tcl install tree) helps with anything. Does anyone know what is going on under the covers there?

    Are you running multi-threaded? Are you running multiple interps in multiple threads? I think you need to call Tcl_FindExecutable(NULL) in each thread, before creating any interps in the thread.

    -Brian

  • From Christian Werner@21:1/5 to All on Thu Nov 17 13:21:56 2022
    Unfortunately, using Tcl_FindExecutable(argv), which works in the small example program, is not working in our application......

    Is it a larger C++-based program? Does it have global constructors that run before main() and call Tcl_SomeThing()?

  • From Phillip Brooks@21:1/5 to All on Mon Nov 28 08:14:41 2022
    It turns out that the difference between our main C++ application and the smaller test program is that a wrapper script launches the main application, and that script also unsets the LANG variable. I think this was done in response to an earlier case where some particular setting of LANG was causing problems with our non-localized Tk GUI code. Having LANG unset or set to blank also causes whatever initialization was happening in Tcl_FindExecutable not to happen anymore. I think we'll need to hardwire LANG to en_US.UTF-8 or some such.

    Thanks for all the help in tracking this down.
