• Re: How to set string to NULL (Hex value 00)

    From Torsten Berg@21:1/5 to Donal K. Fellows on Sat Jan 28 15:34:39 2023
    Wow, this is an old discussion ... but this is the problem I seem to have with Tcl 8.6.12 ...

    I need to build a BLOB for a field in an SQLite table. It should start with these four bytes:

    byte[2] magic = 0x4750;
    byte version;
    byte flags;

    So, the first field is the ASCII string "GP", the second should be zero as an 8-bit unsigned integer, and the third is a flags byte, which in my case is 00000011 (only the two right-most bits set). What I do is:

    binary format a2BB8 GP 0 00000011
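For comparison, here is a sketch of an equivalent way to build the same four-byte header. It uses the `c` format code (one byte taken from an integer) for the version and flags fields instead of bit strings. The variable names are illustrative and do not come from the original post:

```tcl
# Header layout: 2-byte magic "GP", then a version byte, then a flags byte.
set version 0
set flags   0x03   ;# binary 00000011: only the two right-most bits set

# "a2" emits the 2 characters of "GP" as bytes;
# each "c" emits the low 8 bits of an integer argument.
set header [binary format a2cc GP $version $flags]

# Look at the raw bytes, not at any encoded form of the value.
binary scan $header H* hex
puts $hex   ;# prints 47500003
```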

    Looking at the hex representation of the BLOB, it looks like this: "47 50 c0 80 03"
    I can see the correct first two bytes (the "GP") and the last byte (the flags) but the NULL comes out as "c080".

    Even if I do

    set BLOB \x47\x50\x00\x03

    I get the same output.
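You can confirm that both constructions produce the same value inside Tcl by comparing them directly. This sketch uses illustrative variable names. It suggests that the extra byte is added only when the value is handed to something outside Tcl, not when the value is built:

```tcl
# The two constructions from the question.
set fromFormat  [binary format a2BB8 GP 0 00000011]
set fromLiteral \x47\x50\x00\x03

# Inside Tcl they are the same four-character value.
puts [string equal $fromFormat $fromLiteral]   ;# prints 1
puts [string length $fromLiteral]              ;# prints 4

# binary scan sees the same four bytes in both.
binary scan $fromFormat  H* hexFormat
binary scan $fromLiteral H* hexLiteral
puts "$hexFormat $hexLiteral"                  ;# prints 47500003 47500003
```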

    So, how do I get the BLOB to look like this (hex representation): "47 50 00 03"

    On Thursday, April 1, 2004 at 2:43:49 PM UTC+2, Donal K. Fellows wrote:
    Michael Schlenker wrote:
    So your above code probably should work, as c080 was a valid NULL in
    some UTF-8 variants, but nowadays this is broken.
    There was some discussion of this topic before Xmas between some UNICODE people and some of the Core Team. We couldn't reach agreement over what
    the right way forward was; their preferred solutions (which varied from making the app exit immediately to substituting such sequences with the UNICODE "unknown character sequence" character) would have broken far
    too much existing code and data for our taste, and our preferred
    solutions (which can be summed up largely by the IETF dictum "Be liberal
    in what you accept and strict in what you generate") had them throwing
    up their arms in horror. We did not see eye-to-eye... :^(
    Donal.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christian Gollwitzer@21:1/5 to All on Sun Jan 29 11:58:10 2023
    On 29.01.23 at 00:34, Torsten Berg wrote:
    Looking at the hex representation of the BLOB, it looks like this: "47 50 c0 80 03"
    I can see the correct first two bytes (the "GP") and the last byte (the flags) but the NULL comes out as "c080".

    This means that the string has been encoded in UTF-8 (more precisely,
    in Tcl's internal variant of UTF-8, which stores the NUL character as
    the two bytes c0 80). The problem is not with "binary format"; it is
    the transition from the string to the database. For example, if you
    were writing the content to a file, you would need to do
    "fconfigure $fd -encoding binary -translation binary", and what you
    see is similar to writing it with "fconfigure $fd -encoding utf8".

    Hence, you need to check whether the database interface layer has an
    option to pass binary content.
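Christian's file analogy can be made concrete. The following is a minimal sketch that assumes Tcl 8.6 or later, because it uses `file tempfile`. When the value is written through a channel configured with `-translation binary`, exactly four bytes end up on disk:

```tcl
set blob [binary format a2BB8 GP 0 00000011]

# Create a temporary file; the variable "path" receives its name.
set fd [file tempfile path]
# -translation binary also sets -encoding binary, so every
# character is written as exactly one byte with no conversion.
fconfigure $fd -translation binary
puts -nonewline $fd $blob
close $fd

set size [file size $path]

# Read the bytes back the same way to check what was stored.
set fd [open $path r]
fconfigure $fd -translation binary
set data [read $fd]
close $fd
file delete $path

binary scan $data H* hex
puts "$size bytes: $hex"   ;# prints 4 bytes: 47500003
```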

    Christian

  • From Rich@21:1/5 to Torsten Berg on Sun Jan 29 13:54:29 2023
    Torsten Berg <berg@typoscriptics.de> wrote:
    Looking at the hex representation of the BLOB, it looks like this:
    "47 50 c0 80 03"

    I can see the correct first two bytes (the "GP") and the last byte
    (the flags) but the NULL comes out as "c080".

    Even if I do

    set BLOB \x47\x50\x00\x03

    I get the same output.

    So, how do I get the BLOB to look like this (hex representation): "47 50 00 03"

    You have not stated how you are looking at the "hex representation".
    If you ask for the 'hex' in the normal Tcl way, it appears to work
    properly:

    $ rlwrap tclsh
    % set blob [binary format a2BB8 GP 0 00000011]
    GP
    % binary scan $blob H* hex
    1
    % set hex
    47500003
    %

    And as Christian pointed out, the hex you quote is the UTF-8 encoding
    of the binary blob. So it looks like the blob is being UTF-8 encoded
    somewhere along the way.
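Part of the question is how the hex was obtained. The helper below, `hexbytes`, is hypothetical and not part of Tcl. It prints the bytes of a value in the same spaced format that the question uses. Because it relies only on `binary scan`, no text encoding is involved. It assumes the value is a byte string, meaning every character is at most 0xFF.

```tcl
# Hypothetical helper: render a byte string as space-separated hex pairs.
proc hexbytes {value} {
    binary scan $value H* hex
    # Put a space after every pair of hex digits, then trim the trailing one.
    return [string trimright [regsub -all {..} $hex {& }]]
}

puts [hexbytes [binary format a2BB8 GP 0 00000011]]   ;# prints 47 50 00 03
```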

  • From Torsten Berg@21:1/5 to Rich on Sun Jan 29 11:01:20 2023
    Hi,

    and thanks for your thoughts! They made me reread the SQLite documentation for the Tcl binding and its 'eval' command carefully, and I found this passage:

    "If the $bigstring variable has both a string and a "bytearray" representation, then TCL inserts the value as a string. If it has only a "bytearray" representation, then the value is inserted as a BLOB. To force a value to be inserted as a BLOB even if it also has a text representation, use a "@" character in place of the "$"."

    That did the trick!
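Here is a minimal sketch of the fix. It assumes that the `sqlite3` Tcl package that ships with SQLite is available. The in-memory database and the table `t` are illustrative. The sketch inserts the same value once with `$` and once with `@`, then asks SQLite how each copy was stored:

```tcl
package require sqlite3

sqlite3 db :memory:
db eval {CREATE TABLE t(data)}

# A plain Tcl string with no bytearray representation.
set blob \x47\x50\x00\x03

# "$blob": bound as TEXT from Tcl's internal string bytes (NUL is stored as C0 80).
db eval {INSERT INTO t VALUES($blob)}
# "@blob": always bound as a BLOB of raw bytes.
db eval {INSERT INTO t VALUES(@blob)}

# SQLite's hex() returns upper-case hex digits.
set rows [db eval {SELECT typeof(data), hex(data) FROM t ORDER BY rowid}]
puts $rows   ;# expected: text 4750C08003 blob 47500003
db close
```

The first row reproduces the "47 50 c0 80 03" seen in the question. The second row holds the intended four bytes.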
