• Re: Hex string literals (was Re: C23 thoughts and opinions)

    From David Brown@21:1/5 to Keith Thompson on Mon Jun 17 11:42:22 2024
    On 17/06/2024 01:48, Keith Thompson wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
    could be useful in other contexts, and programmers will want
    flexibility. Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
    [...]

    *If* hexadecimal string literals were to be added to a future version
    of the language, I think I have a syntax that I like better than
    what I suggested.


    I like your suggestion here. It's very similar to mine, though with a
    prefix 0x"..." rather than b"...". I'd be fine with either.

    Inspired by the existing syntax for integer and floating-point
    hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
    expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
    with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
    irrelevant; we're specifying byte values in order, not bytes of
    the representation of some larger type. memcpy()ing 0x"deadbeef"
    to a uint32 might yield either 0xdeadbeef or 0xefbeadde (or other
    more exotic possibilities).
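    Here is a small sketch of the byte-order point. It uses today's brace
    initializer in place of the proposed 0x"deadbeef". The names are made
    up for the example, and CHAR_BIT == 8 is assumed. Copying the bytes
    into a uint32_t gives a host-dependent value, while assembling the
    value with shifts does not.

```c
#include <stdint.h>
#include <string.h>

/* The bytes the proposed 0x"deadbeef" would denote, in today's syntax. */
static const unsigned char deadbeef[4] = { 0xde, 0xad, 0xbe, 0xef };

/* Host-dependent: 0xdeadbeef on a big-endian host, 0xefbeadde on a
   little-endian one (assuming CHAR_BIT == 8). */
static uint32_t via_memcpy(void)
{
    uint32_t v;
    memcpy(&v, deadbeef, sizeof v);
    return v;
}

/* Host-independent: the shifts make the first byte the most
   significant, so this is 0xdeadbeef everywhere. */
static uint32_t via_shifts(void)
{
    return (uint32_t)deadbeef[0] << 24 | (uint32_t)deadbeef[1] << 16 |
           (uint32_t)deadbeef[2] << 8 | (uint32_t)deadbeef[3];
}
```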

    Again, unlike other string literals, there is no implicit terminating
    null byte. And I suggest making them const, since there's no
    existing code to break.

    If CHAR_BIT==8, each byte is represented by two hex digits. More
    generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
    the absence of whitespace. Added whitespace marks the end of a byte;
    0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
    respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
    0x"" is a syntax error, since C doesn't support zero-length arrays.
    Anything between the quotes other than hex digits and spaces is a
    syntax error.

    Fair enough.


    0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
    end of a byte, but the usage of spaces doesn't have to be consistent.
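    The digit-grouping rules above can be sketched as a small parser.
    parse_hex_literal() is a hypothetical helper, not something any
    compiler provides. It takes CHAR_BIT as a parameter so that several
    widths can be tried on one host. The proposal leaves two details
    open, and this sketch makes assumptions about both. First, a space
    may end a byte early, so 0x"1 23" gives 0x01, 0x23. Second, a digit
    group too large for CHAR_BIT bits is rejected.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse the text between the quotes of a proposed
   0x"..." literal into byte values, using char_bit in place of CHAR_BIT.
   Returns the number of bytes, or -1 on a syntax error. */
static int parse_hex_literal(const char *body, int char_bit,
                             unsigned long long *out, size_t out_cap)
{
    const int digits_per_byte = (char_bit + 3) / 4;
    const unsigned long long max =
        char_bit >= 64 ? ~0ULL : (1ULL << char_bit) - 1;
    unsigned long long cur = 0;   /* byte being built */
    int ndigits = 0;              /* hex digits in cur so far */
    size_t n = 0;

    for (const char *p = body; ; p++) {
        if (*p == '\0' || *p == ' ') {
            if (ndigits > 0) {    /* a space (or the end) finishes a byte early */
                if (n == out_cap) return -1;
                out[n++] = cur;
                cur = 0;
                ndigits = 0;
            }
            if (*p == '\0') break;
            continue;
        }
        unsigned char c = (unsigned char)*p;
        if (!isxdigit(c)) return -1;   /* only hex digits and spaces allowed */
        cur = cur * 16 + (unsigned)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
        if (cur > max) return -1;      /* assumed: group too big for char_bit bits */
        if (++ndigits == digits_per_byte) {  /* a full byte needs no space */
            if (n == out_cap) return -1;
            out[n++] = cur;
            cur = 0;
            ndigits = 0;
        }
    }
    return n == 0 ? -1 : (int)n;       /* 0x"" is an error: no zero-length arrays */
}
```

    With char_bit 8, both "dead beef" and "deadbeef" give four bytes.
    With char_bit 32, "deadbeef" gives one byte and "de ad be ef" gives
    four.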

    This could be made more flexible by allowing various backslash
    escapes, but I'm not inclined to complicate it too much.

    I would /definitely/ vote against any kind of backslash escapes here.
    That would mess up the simplicity of the syntax.

    There might be benefits in having standardised macros that generate
    multiple copies of a given hex string and that sort of thing.


    Note that the value of a (proposed) hex string literal is not a
    string unless it happens to end in zero. I still use the term
    "string literal" because it's closely tied to existing string
    literal syntax, and existing string literals don't necessarily
    represent strings anyway ("embedded\0null\0characters").
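    The terminator point and the "not a string" point can both be seen
    with today's syntax. The array names here are illustrative. A string
    literal always gets an implicit '\0', but a brace list does not. A
    literal with embedded null characters is still one array, even
    though strlen() stops at the first null.

```c
#include <string.h>

/* What 0x"deadbeef" would replace today, written two ways: */
static const unsigned char via_string[] = "\xde\xad\xbe\xef";         /* 5 elements: implicit '\0' */
static const unsigned char via_braces[] = { 0xde, 0xad, 0xbe, 0xef }; /* 4 elements */

/* 8 + 1 + 4 + 1 + 10 characters, plus the implicit terminator: 25. */
static const char embedded[] = "embedded\0null\0characters";

static size_t embedded_strlen(void)
{
    return strlen(embedded); /* stops at the first '\0', after "embedded" */
}
```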

    Binary string literals 0b"11001001" might also be worth
    considering (that's of type `const unsigned char[1]`).

    That is /highly/ unlikely to be useful. I work in the field that uses
    binary more than anywhere else, and where compilers have supported
    0b11001001 format for binary literals from /long/ before they reached
    the C standards - and I have very rarely seen them in practice. When
    you do see them, they are in isolation - no one will write enough binary
    values in a row for such a format to be useful. Hex strings are
    potentially useful because you are cutting { 0x12, 0x34, 0x45, 0x67 } to 0x"12344567", which is a fair bit more compact. For binary, the
    compaction is irrelevant and indeed counter-productive - binary literals
    became a lot more practical with the introduction of digit separators.
    (For standard C, these are from C23, but for C++ they came in C++14, and compilers have supported them as extensions in C.)


    Octal
    string literals 0"012 345 670" *might* be worth considering.

    Most situations where octal could be useful died out many decades ago -
    it is vastly more likely that "012" is intended to mean 12 than 10. No
    serious programming language supports a leading 0 as an indication of
    octal unless they are forced to do so by backwards compatibility, and
    many that used to support them have dropped them.

    Having /some/ way to write octal can be helpful to old *nix programmers
    who prefer 0640 to "S_IRUSR | S_IWUSR | S_IRGRP" in their chmod calls.
    (And to be fair, the constant names made in ancient history with short identifier length limits are pretty ugly.) But it is not something to
    be encouraged, and I think there is no simple syntax that is obviously
    octal, and not easily mistaken for something else.

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
    proposes a new "0o123" syntax for octal constants; if that's adopted,
    I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
    to suggest hex only, or doing hex, octal, and binary for the sake
    of completeness.

    Binary support is useless, and octal support would be worse than useless
    - even using an 0o rather than 0 prefix. Completeness is not a
    justification for repeating old mistakes or complicating a good idea
    with features that will never be used.


    What I'm trying to design here is a more straightforward way to
    represent raw (unsigned char[]) data in C code, largely but not
    exclusively for use by #embed.


    Personally, I'd see it as useful when /not/ using #embed. I really do
    not think programmers will care what format #embed uses. I don't share
    your concerns about efficiency of implementation, or that programmers
    need to know when it is efficient or not. In almost all circumstances,
    C programmers never see or need to think about a separation between a C preprocessor and a post-processed C compiler - they are seen as a single entity, and can use whatever format is convenient between them. And
    once you ignore the implementation details, which are an SEP, the way
    #embed is defined is better than a definition using these new hex blob
    strings.

    But I have seen situations where it is useful to have embedded blobs
    directly in the source file, and then a compact solution would be
    convenient. Currently most people use a list of hex constants, either
    byte for byte or sometimes in larger units, and hex strings like this
    would make it neater and more convenient. (Attempts to use current
    string literals for the purpose look more like corruption in the file
    than source code.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Kettlewell@21:1/5 to Keith Thompson on Mon Jun 17 11:41:03 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Inspired by the existing syntax for integer and floating-point
    hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
    expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
    with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
    irrelevant; we're specifying byte values in order, not bytes of
    the representation of some larger type. memcpy()ing 0x"deadbeef"
    to a uint32 might yield either 0xdeadbeef or 0xefbeadde (or other
    more exotic possibilities).

    I like the syntax and I’d find it useful.

    There’s more to life than byte arrays, though, so I wonder if there’s
    more to be said here. I find myself dealing a lot with large integers
    generally represented as arrays of some unsigned type (commonly uint32_t
    but other possibilities arise too).

    In C as it stands today this requires a translation step before
    constants can be embedded in source code (which is error-prone if
    someone attempts to do it manually).

    So being able to say ‘0x8732456872648956348596893765836543 as array of uint64_t, LSW first’ (in some suitably C-like syntax) would be a big improvement from my perspective, primarily as an accelerator to
    development but also as a small improvement in robustness.
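    The translation step described above could be done by a build-time
    tool. Below is a minimal sketch of that step, assuming 64-bit limbs
    stored least significant word first. hex_to_limbs_lsw() is a
    hypothetical name. Note that this runs at run time; it is not the
    compile-time syntax being asked for.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a big hex constant (digits only, no "0x")
   into 64-bit limbs, least significant word first. Returns the number
   of limbs written, or 0 on empty input, a non-hex character, or an
   output array that is too small. */
static size_t hex_to_limbs_lsw(const char *hex, uint64_t *out, size_t out_cap)
{
    size_t len = strlen(hex);
    size_t nlimbs = (len + 15) / 16;      /* 16 hex digits per limb */
    if (len == 0 || nlimbs > out_cap)
        return 0;
    for (size_t i = 0; i < nlimbs; i++) {
        /* limb i holds the digits hex[start..end-1], counted from the right */
        size_t end = len - 16 * i;
        size_t start = end > 16 ? end - 16 : 0;
        uint64_t limb = 0;
        for (size_t j = start; j < end; j++) {
            unsigned char c = (unsigned char)hex[j];
            if (!isxdigit(c))
                return 0;
            limb = limb * 16 + (uint64_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
        }
        out[i] = limb;
    }
    return nlimbs;
}
```

    For the example constant, this yields
    { 0x8596893765836543, 0x3245687264895634, 0x87 }.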

    Again, unlike other string literals, there is no implicit terminating
    null byte. And I suggest making them const, since there's no
    existing code to break.

    If CHAR_BIT==8, each byte is represented by two hex digits. More
    generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
    the absence of whitespace. Added whitespace marks the end of a byte;
    0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
    respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
    0x"" is a syntax error, since C doesn't support zero-length arrays.
    Anything between the quotes other than hex digits and spaces is a
    syntax error.

    Would "0x1 23 45 67" be a syntax error or { 0x1, 0x23, 0x45, 0x67 }?

    What I'm trying to design here is a more straightforward way to
    represent raw (unsigned char[]) data in C code, largely but not
    exclusively for use by #embed.

    Compilers can already implement #embed however they like, there’s no
    need for a standardized way to represent the ‘inside’ of a #embed.

    --
    https://www.greenend.org.uk/rjk/

  • From bart@21:1/5 to Keith Thompson on Mon Jun 17 14:21:32 2024
    On 17/06/2024 00:48, Keith Thompson wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    uc"..." string literals might be made even simpler, for example allowing
    only hex digits and not requiring \x (uc"01020304" rather than
    uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
    could be useful in other contexts, and programmers will want
    flexibility. Maybe something like hex"01020304" (embedded spaces could
    be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
    [...]

    *If* hexadecimal string literals were to be added to a future version
    of the language, I think I have a syntax that I like better than
    what I suggested.

    Inspired by the existing syntax for integer and floating-point
    hex constants, I propose using a "0x" prefix. 0x"deadbeef" is an
    expression of type `const unsigned char[4]` (assuming CHAR_BIT==8),
    with values 0xde, 0xad, 0xbe, 0xef in that order. Byte order is
    irrelevant; we're specifying byte values in order, not bytes of
    the representation of some larger type. memcpy()ing 0x"deadbeef"
    to a uint32 might yield either 0xdeadbeef or 0xefbeadde (or other
    more exotic possibilities).

    Some points:

    * Can the hex string span multiple lines? (You say space is the only
    white-space allowed)

    * If not, would adjacent hex strings be concatenated, as happens with
    ordinary strings? Since hex data for one char array can be large.

    * Your examples use only digits a-f but I assume A-F will work too.

    * Can individual byte values end early, so allowing B to mean 0B? (My
    scheme requires hex digits to be in pairs.)


    Again, unlike other string literals, there is no implicit terminating
    null byte. And I suggest making them const, since there's no
    existing code to break.

    If CHAR_BIT==8, each byte is represented by two hex digits. More
    generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
    the absence of whitespace. Added whitespace marks the end of a byte;
    0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
    respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
    0x"" is a syntax error, since C doesn't support zero-length arrays.
    Anything between the quotes other than hex digits and spaces is a
    syntax error.

    0x"dead beef" is still 4 bytes if CHAR_BIT==8; the space forces the
    end of a byte, but the usage of spaces doesn't have to be consistent.

    Here it gets confusing. But first, I understand that CHAR_BIT could be
    64, where hex literals get long enough that they could do with
    separators. But spaces now are significant in marking the early end of a
    64-bit value.

    What I have in mind is that somebody might write 0x"12 34 56 78" to
    designate 4 8-bit values totalling 32 bits, and wants the spaces for readability. Compiled for a machine with 16-bit characters, it will now represent (in little-endian) the 64-bit value 0x0078005600340012 instead
    of 0x78563412.
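    The arithmetic in that example can be checked with a short sketch.
    little_endian_value() is an illustrative helper that treats the first
    "byte" as least significant. It shows that the same four values
    combine differently depending on the byte width.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine n "bytes", each char_bit bits wide, into one integer with the
   first byte least significant (little-endian). Illustrative only;
   assumes char_bit < 64 and n * char_bit <= 64. */
static uint64_t little_endian_value(const unsigned vals[], size_t n, int char_bit)
{
    uint64_t v = 0;
    for (size_t i = n; i-- > 0; )
        v = (v << char_bit) | vals[i];
    return v;
}
```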

    I assume the hex string can only be used to initialise a char[] array?
    (The feature I presented elsewhere, 'data-strings', could be used to
    initialise any array type, just like #embed IIUC.)



    This could be made more flexible by allowing various backslash
    escapes, but I'm not inclined to complicate it too much.

    Note that the value of a (proposed) hex string literal is not a
    string unless it happens to end in zero. I still use the term
    "string literal" because it's closely tied to existing string
    literal syntax, and existing string literals don't necessarily
    represent strings anyway ("embedded\0null\0characters").

    Binary string literals 0b"11001001" might also be worth
    considering (that's of type `const unsigned char[1]`).

    You mean, values that can only be one byte long? I don't get it. How
    many use-cases are there for char-arrays that are only a byte long?

    Assuming that [1] was a typo for [], then I still have trouble finding
    uses for this.

    Perhaps initialise a char[][] table representing a one-bit-per-pixel
    image? Bit-order becomes critical here.

    Here, C already has 64-bit binary literals, using those might be a
    better idea, since a char[][] is the wrong type anyway, unless you can
    have bool[][] which is guaranteed to use 1-bit bools.

    Octal
    string literals 0"012 345 670" *might* be worth considering.

    AFAIK nobody uses octal anymore.


    What I'm trying to design here is a more straightforward way to
    represent raw (unsigned char[]) data in C code, largely but not
    exclusively for use by #embed.

    Sorry, I thought this was an alternative to #embed, for smaller amounts
    of data directly written in source code.

  • From Richard Kettlewell@21:1/5 to Richard Kettlewell on Mon Jun 17 14:57:54 2024
    Richard Kettlewell <invalid@invalid.invalid> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    If CHAR_BIT==8, each byte is represented by two hex digits. More
    generally, each byte is represented by (CHAR_BIT+3)/4 hex digits in
    the absence of whitespace. Added whitespace marks the end of a byte;
    0x"deadbeef" is 1, 2, 3, or 4 bytes if CHAR_BIT is 32, 16, 12, or 8
    respectively, but 0x"de ad be ef" is 4 bytes regardless of CHAR_BIT.
    0x"" is a syntax error, since C doesn't support zero-length arrays.
    Anything between the quotes other than hex digits and spaces is a
    syntax error.

    Would "0x1 23 45 67" be a syntax error or { 0x1, 0x23, 0x45, 0x67 }?

    FTAOD I mean
    0x"1 23 45 67"

    --
    https://www.greenend.org.uk/rjk/

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Tue Jun 18 04:19:59 2024
    On Mon, 17 Jun 2024 17:19:50 -0700, Keith Thompson wrote:

    C23 adds the option to use apostrophes as separators in numeric
    constants: 123'456'789 or 0xdead'beef, for example. (This is borrowed
    from C++.

    Why not underscores, as supported in both Ada and Python?

  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Tue Jun 18 04:19:19 2024
    On Mon, 17 Jun 2024 11:42:22 +0200, David Brown wrote:

    Most situations where octal could be useful died out many decades ago -
    it is vastly more likely that "012" is intended to mean 12 than 10. No serious programming language supports a leading 0 as an indication of
    octal unless they are forced to do so by backwards compatibility, and
    many that used to support them have dropped them.

    For one example, Python didn’t drop octal numbers in the 2→3 transition, but it changed the syntax from a simple “0” prefix to having a “0o” prefix
    (analogous to “0x” for hex literals) instead.

  • From Tim Rentsch@21:1/5 to bart on Mon Jun 17 22:39:00 2024
    bart <bc@freeuk.com> writes:

    AFAIK nobody uses octal anymore.

    There are circumstances where being able to write constants
    in octal is useful. It also would be nice to be able to
    write constants in base 4 and base 32 (because 5 is half
    of 10). I don't have occasion to prefer octal very often
    but I'm glad it's there for those times when I do.

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Tue Jun 18 08:12:15 2024
    On Mon, 17 Jun 2024 18:57:09 -0700, Keith Thompson wrote:

    You could use some kind of type punning. For example, this is currently legal:

    union {
        unsigned char buf[4];
        uint32_t n;
    } obj = {
        .buf = { 0x01, 0x02, 0x03, 0x04 }
    };

    The { 0x01, 0x02, 0x03, 0x04 } could be replaced with 0x"01020304".

    In Open Source, the definition of “source” is (to the effect of) “the preferred representation of the program for doing development with”.

    The implication to me is, if the C source form has to be cryptic and error-prone and basically hard to work with, then I would back up one step
    and use some other form for that part of the source, that would be
    translated to C source as part of the build process and linked against the
    rest of the code. The generated C source would not be part of the repo
    commit history, but the input used to generate it (along with any custom
    build tool setup/programming) would.

  • From Michael S@21:1/5 to Tim Rentsch on Tue Jun 18 12:39:40 2024
    On Mon, 17 Jun 2024 22:39:00 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    bart <bc@freeuk.com> writes:

    AFAIK nobody uses octal anymore.

    There are circumstances where being able to write constants
    in octal is useful. It also would be nice to be able to
    write constants in base 4 and base 32 (because 5 is half
    of 10). I don't have occasion to prefer octal very often
    but I'm glad it's there for those times when I do.

    Ada/VHDL permits any base from 2 to 16. They didn't go as far up as
    32.
    I would imagine that reading base 32 numbers would take some getting used to.
    Besides, using I and O as digits is problematic because of their visual
    similarity to 1 and 0. Using l is problematic both because of its visual
    similarity to 1 and because it clashes with its existing use as a suffix.

  • From Kaz Kylheku@21:1/5 to bart on Tue Jun 18 09:50:34 2024
    On 2024-06-17, bart <bc@freeuk.com> wrote:
    AFAIK nobody uses octal anymore.

    Unix shell and C programmers fairly often use octal unintentionally,
    whereby a harmless-looking leading zero changes what was supposed to
    be, say, 77 to 63.
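    The trap is easy to demonstrate; the names below are illustrative.
    It applies both to constants in source code and to strtol() called
    with base 0.

```c
#include <stdlib.h>

/* A leading zero makes an integer constant octal: 077 is 7*8 + 7 == 63. */
static const int intended = 77;
static const int written = 077;

/* strtol() with base 0 applies the same prefix rules to text at run
   time, so the same surprise can come from parsed input. */
static long parse_auto(const char *s) { return strtol(s, NULL, 0); }
static long parse_decimal(const char *s) { return strtol(s, NULL, 10); }
```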


    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From bart@21:1/5 to Michael S on Tue Jun 18 11:28:01 2024
    On 18/06/2024 10:39, Michael S wrote:
    On Mon, 17 Jun 2024 22:39:00 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    bart <bc@freeuk.com> writes:

    AFAIK nobody uses octal anymore.

    There are circumstances where being able to write constants
    in octal is useful. It also would be nice to be able to
    write constants in base 4 and base 32 (because 5 is half
    of 10). I don't have occasion to prefer octal very often
    but I'm glad it's there for those times when I do.

    Ada/VHDL permits any base from 2 to 16. They didn't go as far up as
    32.

    So did I for a while (in my language), for both integer and float
    constants. (Possibly influenced by languages such as Ada which allowed
    any base.)

    But it was more of a novelty. In the end I decided they were just not
    useful enough, and simplified it so that only bases 2, 10 and 16 were supported.

    Although output of integers in those bases is still possible (supported
    by code in a library, rather than within a compiler):

    print 81:"x3" # displays 10000

    (Output of floats is done via sprintf; the C library won't do those odd
    bases.)


    While I did have it, though, I noticed this strange discrepancy
    between my hex floats and C's. In C, this value:

    0x12.34p10

    has the decimal value 18640.0; why was that? It turns out that this is:

    18.203125 x 2 ** 10

    The main part is interpreted as actual hex; the exponent is in decimal,
    and exponent scaling is in binary bits. So THREE different bases are
    involved!
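    The C behaviour described here can be checked directly. The names
    are illustrative. Hex floating constants, and their parsing by
    strtod(), have been standard since C99.

```c
#include <stdlib.h>

/* Hex mantissa, decimal exponent, power-of-two scaling:
   0x12.34 = 18 + 0x34/256 = 18.203125, and p10 means * 2^10. */
static const double c_hexfloat = 0x12.34p10;
static const double by_hand = 18.203125 * 1024.0;

/* strtod() accepts the same hex-float syntax at run time. */
static double parse_double(const char *s) { return strtod(s, NULL); }
```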

    In my scheme (which as I said worked for bases from 2 to 16), the
    0x12.34p10 value would mean this:

    18.203125 x 16 ** 16 ~= 3.36e20

    The same base is used for all parts, including the exponent, and the
    digits that are to be shifted. After all in decimal, 12.34e10 would mean:

    12.34 x 10 ** 10 = 123400000000.0

    In my language, a base 5 version 5x12.34e10 (not valid below base 5)
    would mean this (I think, as I no longer have the compiler):

    7.76 x 5 ** 5 = 24250.0

    This has a consistency lacking in C's hex floats.
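    Here is a sketch of that "same base everywhere" reading, for
    comparison with C's hex floats. same_base_float() is hypothetical and
    only models the description above. The mantissa digits, the exponent
    digits and the scaling are all in base b, which can be from 2 to 16.
    Signs are not handled.

```c
#include <ctype.h>

/* Value of one digit character, or 99 if it is not a digit in any
   base up to 16. */
static int digit_value(char c)
{
    if (isdigit((unsigned char)c)) return c - '0';
    if (isxdigit((unsigned char)c)) return tolower((unsigned char)c) - 'a' + 10;
    return 99;
}

/* Hypothetical model: mant like "12.34" and exp like "10" are both read
   in base b, and the result is mant * b^exp. Returns -1.0 on a digit
   that is invalid for base b. */
static double same_base_float(int b, const char *mant, const char *exp)
{
    double m = 0.0;
    int frac_digits = 0, seen_point = 0;
    for (const char *p = mant; *p; p++) {
        if (*p == '.') { seen_point = 1; continue; }
        int d = digit_value(*p);
        if (d >= b) return -1.0;
        m = m * b + d;
        if (seen_point) frac_digits++;
    }
    int e = 0;
    for (const char *p = exp; *p; p++) {
        int d = digit_value(*p);
        if (d >= b) return -1.0;
        e = e * b + d;
    }
    /* value = (all mantissa digits as an integer) * b^(e - frac_digits) */
    for (int k = e - frac_digits; k > 0; k--) m *= b;
    for (int k = e - frac_digits; k < 0; k++) m /= b;
    return m;
}
```

    It gives 123400000000.0 for base 10, 24250.0 for base 5, and
    0x1234 * 16**14 (about 3.36e20) for base 16.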


    I would imagine that reading base 32 numbers would take some getting used to.

    My big-number /decimal/ library uses base 1000000000. That value fits
    into one i32 'limb'. But it doesn't need a billion distinct symbols, it
    just uses base-10 decimal as input and output.

    As to why you can't use a similar idea for base-32, you'd have to look
    at what base-32 was used for. Certainly within source code, you could
    have a construct like:

    base32(31,31,31) equivalent to 31*32**2 + 31*32 + 31 = 32767

    A user running an application that promises base-32 arithmetic may
    expect something more exotic however.
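    The base32(31,31,31) idea is just Horner's rule. Here is a sketch
    with a hypothetical from_digits() helper.

```c
#include <stddef.h>

/* Horner's rule: the value of digits[0..n-1], most significant first,
   in base b. A hypothetical stand-in for a base32(...) construct. */
static unsigned long from_digits(unsigned b, const unsigned digits[], size_t n)
{
    unsigned long v = 0;
    for (size_t i = 0; i < n; i++)
        v = v * b + digits[i];
    return v;
}
```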

  • From David Brown@21:1/5 to Keith Thompson on Tue Jun 18 15:54:15 2024
    On 18/06/2024 02:19, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 17/06/2024 01:48, Keith Thompson wrote:
    [...]
    For binary,
    the compaction is irrelevant and indeed counter-productive - binary
    literals became a lot more practical with the introduction of digit
    separators. (For standard C, these are from C23, but for C++ they came
    in C++14, and compilers have supported them as extensions in C.)

    I forgot about digit separators.

    C23 adds the option to use apostrophes as separators in numeric
    constants: 123'456'789 or 0xdead'beef, for example. (This is
    borrowed from C++. Commas are more commonly used in real life,
    at least in my experience, but that wouldn't work given the other
    meanings of commas.)

    Commas would be entirely unsuitable here, since half the world uses
    decimal commas rather than decimal points. I think underscores are a
    nicer choice, used by many languages, but C++ could not use underscores
    due to their use in user-defined literals, and C followed C++.


    I briefly considered that, for consistency, we might want to
    use apostrophes rather than spaces in hex string constants:
    0x"de'ad'be'ef". But since digit separators are purely decorative,
    and spaces in my proposed hex string literals are semantically
    significant (they terminate a byte), I'll stick with spaces.

    I think you were using spaces as byte separators, whereas apostrophes
    should be completely ignored when parsing.


    You could even write 0x"0 0 0 0" to denote 4 zero bytes (where
    0x"0000" is 2 bytes) but 0x"00 00 00 00" or 0x"00000000" is probably
    clearer.

    I think allowing both spaces and apostrophes would be too confusing.


    Fair enough.

    Octal
    string literals 0"012 345 670" *might* be worth considering.

    Most situations where octal could be useful died out many decades ago
    - it is vastly more likely that "012" is intended to mean 12 than 10.
    No serious programming language supports a leading 0 as an indication
    of octal unless they are forced to do so by backwards compatibility,
    and many that used to support them have dropped them.

    Having /some/ way to write octal can be helpful to old *nix
    programmers who prefer 0640 to "S_IRUSR | S_IWUSR | S_IRGRP" in their
    chmod calls. (And to be fair, the constant names made in ancient
    history with short identifier length limits are pretty ugly.) But it
    is not something to be encouraged, and I think there is no simple
    syntax that is obviously octal, and not easily mistaken for something
    else.

    There is, the proposed "0o" prefix. It's already supported in both Perl
    and Python, and likely other languages.

    Some languages apparently use 0q, because 0o might be confusing in some
    fonts. I'm not sure I agree, and 0q is not very intuitive. I'd rate 0o
    as vastly better than 0, but I would not bother with supporting it in a
    new feature like this.


    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3193.htm>
    proposes a new "0o123" syntax for octal constants; if that's adopted,
    I propose allowing 0o"..." and *not* 0"...". I'm not sure whether
    to suggest hex only, or doing hex, octal, and binary for the sake
    of completeness.

    Binary support is useless, and octal support would be worse than
    useless - even using an 0o rather than 0 prefix. Completeness is not
    a justification for repeating old mistakes or complicating a good idea
    with features that will never be used.

    I like binary integer constants (0b11001001), but I suppose I
    agree that they're not useful for larger chunks of data.

    Perhaps I am so used to binary and hex that I convert without thinking,
    and thus rarely need binary.

    The one place I find binary useful is for bitmap fonts. I use these a
    lot less than I used to, but sometimes you need to make new characters
    for an old-style low resolution LCD screen, and then binary constants
    can be useful. Often, however, I prefer characters like . and @ rather
    than 0 and 1 as it makes the contrast much higher.
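    Here is a sketch of the '.' and '@' approach. pack_row() is a
    hypothetical helper that turns one row of ASCII art into a byte. The
    bit order used here, with the leftmost pixel in the most significant
    bit, is an assumption. As noted earlier in the thread, bit order is
    critical, so it has to match the display hardware.

```c
#include <stddef.h>

/* Pack 8 pixels ('@' = on, anything else = off) into one byte,
   leftmost pixel in the most significant bit (an assumed bit order). */
static unsigned char pack_row(const char row[8])
{
    unsigned char bits = 0;
    for (size_t i = 0; i < 8; i++)
        bits = (unsigned char)((bits << 1) | (row[i] == '@'));
    return bits;
}

/* Example glyph rows; each string literal has 8 pixels plus '\0'. */
static const char *const glyph_rows[] = {
    "..@@@...",
    ".@...@..",
    ".@@@@@..",
    ".@...@..",
};
```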

    I have no
    problem supporting only hex string literals, not binary or octal --
    but I'd have no problem with having all three if anyone thinks that
    would be sufficiently useful.


    Fair enough.

    What I'm trying to design here is a more straightforward way to
    represent raw (unsigned char[]) data in C code, largely but not
    exclusively for use by #embed.

    Personally, I'd see it as useful when /not/ using #embed. I really do
    not think programmers will care what format #embed uses. I don't
    share your concerns about efficiency of implementation, or that
    programmers need to know when it is efficient or not. In almost all
    circumstances, C programmers never see or need to think about a
    separation between a C preprocessor and a post-processed C compiler -
    they are seen as a single entity, and can use whatever format is
    convenient between them. And once you ignore the implementation
    details, which are an SEP, the way #embed is defined is better than a
    definition using these new hex blob strings.

    I think my main problem with the current #embed is that it's
    conceptually messy. I'm probably an outlier in how much I care about
    that.

    It's not clear whether the problems with the current definition of
    #embed are as serious as I suggest; you clearly think they aren't.

    I am still not convinced that there /are/ problems, never mind serious problems, nor that it is "conceptually messy". (I'd care about that
    too, at least to some extent.) I don't think the feature will lead to
    any dramatic changes in the way I work, but it could sometimes be
    convenient and avoid the need for external scripts or programs in a build
    file.

    But
    even if the current #embed is ok, I think adding hex string literals and adding a language defined embed parameter that specifies using hex
    string literals rather than a list of integer constant expressions would
    be useful.

    Agreed.

    Among other things, it lets the programmer specify that a
    given #embed is only to be used to initialize an array of unsigned char.

    For example, given a 4-byte foo.dat containing bytes 1, 2, 3, and 4:
    const unsigned char buf[] = {
    #embed "foo.dat"
    };
    would expand to something like:
    const unsigned char buf[] = {
    1, 2, 3, 4
    };
    (and the same if buf is of type int[] or double[]), while this:
    const unsigned char buf[] =
    #embed "foo.dat" hex(true) // proposed new parameter
    ;
    would expand to something like:
    const unsigned char buf[] =
    0x"01020304"
    ;
    (and would result in an error if buf is of type int[] or double[]).

    [...]


    I don't see the benefit here. This is C - the programmer is expected to
    get the type right, and I think it would be rare to get it wrong (or
    more badly wrong than forgetting "unsigned") in a case like this. So the
    extra type checking here has little or no benefit. (In general, I am a
    fan of stronger type checking, but it is only important if it catches
    real errors.)

    The end result is completely identical to the user - adding "hex(true)"
    makes no difference to the generated code. Thus it is just an
    implementation detail which the user should not have to deal with.

  • From Scott Lurndal@21:1/5 to Kaz Kylheku on Tue Jun 18 13:56:26 2024
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2024-06-17, bart <bc@freeuk.com> wrote:
    AFAIK nobody uses octal anymore.

    Unix shell and C programmers fairly often use octal unintentionally,

    Perhaps you do. Don't speak for others, please.

  • From Richard Kettlewell@21:1/5 to Keith Thompson on Tue Jun 18 16:14:38 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Richard Kettlewell <invalid@invalid.invalid> writes:
    There’s more to life than byte arrays, though, so I wonder if there’s
    more to be said here. I find myself dealing a lot with large integers
    generally represented as arrays of some unsigned type (commonly uint32_t
    but other possibilities arise too).

    In C as it stands today this requires a translation step before
    constants can be embedded in source code (which is error-prone if
    someone attempts to do it manually).

    So being able to say ‘0x8732456872648956348596893765836543 as array of
    uint64_t, LSW first’ (in some suitably C-like syntax) would be a big
    improvement from my perspective, primarily as an accelerator to
    development but also as a small improvement in robustness.

    You could use some kind of type punning. For example, this is currently legal:

    union {
        unsigned char buf[4];
        uint32_t n;
    } obj = {
        .buf = { 0x01, 0x02, 0x03, 0x04 }
    };

    I can’t use type punning if the data type is already set, which it often
    is.

    The { 0x01, 0x02, 0x03, 0x04 } could be replaced with 0x"01020304".

    Of course you have to deal with endianness.

    That’s a fatal problem for my use case.

    --
    https://www.greenend.org.uk/rjk/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Scott Lurndal on Tue Jun 18 17:21:00 2024
    On 18/06/2024 15:56, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2024-06-17, bart <bc@freeuk.com> wrote:
    AFAIK nobody uses octal anymore.

    Unix shell and C programmers fairly often use octal unintentionally,

    Perhaps you do. Don't speak for others, please.


    Others do it too. Perhaps not "fairly often", but it certainly happens.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Tue Jun 18 17:20:12 2024
    On 18/06/2024 11:39, Michael S wrote:
    On Mon, 17 Jun 2024 22:39:00 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    bart <bc@freeuk.com> writes:

    AFAIK nobody uses octal anymore.

    There are circumstances where being able to write constants
    in octal is useful. It also would be nice to be able to
    write constants in base 4 and base 32 (because 5 is half
    of 10). I don't have occasion to prefer octal very often
    but I'm glad it's there for those times when I do.

    Ada/VHDL permits any base from 2 to 16. They didn't go as far up as
    32.
    I would imagine that reading base-32 numbers would take some time to get used to.

    I can't imagine any possible use for such bases. Base 16 is very
    common, and base 2 is useful in some circumstances. Base 8 has, to my knowledge, a single non-archaic use-case and that is for chmod modes in
    *nix programming.

    I think support for other bases exists in some languages as a
    side-effect of wanting an explicit numbered base notation for bases 2,
    16 and possibly 8, rather than using 0b, 0x and 0o (or 0q, or just 0 as
    an April Fools' joke).

    If I were the BDFL of C, I'd remove octal constants and add a macro
    "_Octal" with definition:

    #define _Octal(n) (((n) % 10) + ((n) / 10 % 10) * 8 \
    + ((n) / 100 % 10) * 64 + ((n) / 1000 % 10) * 512)


    If anyone can present a good use for base 4 or base 32, I might change
    my mind :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Tue Jun 18 11:04:09 2024
    Michael S <already5chosen@yahoo.com> writes:

    On Mon, 17 Jun 2024 22:39:00 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    bart <bc@freeuk.com> writes:

    AFAIK nobody uses octal anymore.

    There are circumstances where being able to write constants
    in octal is useful. It also would be nice to be able to
    write constants in base 4 and base 32 (because 5 is half
    of 10). I don't have occasion to prefer octal very often
    but I'm glad it's there for those times when I do.

    Ada/VHDL permits any base from 2 to 16. They didn't go as far up as
    32.
    I would imagine that reading base-32 numbers would take some time to get used to.
    Besides, using I and O as digits is problematic because of visual
    similarity to 1 and 0. Using l is problematic both because of its visual similarity to 1 and because it clashes with its existing use as a suffix.

    It would be nice (in some circumstances) to be able to write
    constants in base 32. That doesn't mean I'm proposing that
    such constants be written using the common 10-digits-22-letters
    form of representation. Realistically I think it's unlikely
    that the C standard will ever add a base-32 form for integer
    constants, and even if it did I wouldn't want to wait that
    long before it could be used reliably. So all I'm saying is
    that base-32 constants are sometimes useful, even if they
    aren't incorporated into standard C.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to bart on Tue Jun 18 11:12:21 2024
    bart <bc@freeuk.com> writes:

    [...] I noticed this strange discrepancy between
    my hex floats and C's. In C, this value:

    0x12.34p10

    has the decimal value 18640.0; why was that? It turns out that this is:

    18.203125 x 2 ** 10

    The main part is interpreted as actual hex; the exponent is in
    decimal, and exponent scaling is in binary bits. So THREE different
    bases are involved!

    In my scheme (which as I said worked for bases from 2 to 16), the
    0x12.34p10 value would mean this:

    18.203125 x 16 ** 16 ~= 3.36e20

    The same base is used for all parts, including the exponent, and the
    digits that are to be shifted. After all in decimal, 12.34e10 would
    mean:

    12.34 x 10 ** 10 = 123400000000.0

    In my language, a base 5 version 5x12.34e10 (not valid below base 5)
    would mean this (I think, as I no longer have the compiler):

    7.76 x 5 ** 5 = 24250.0

    This has a consistency lacking in C's hex floats.

    The C hex float format has benefits that your format does not.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Harnden@21:1/5 to David Brown on Tue Jun 18 19:25:38 2024
    On 18/06/2024 16:21, David Brown wrote:
    On 18/06/2024 15:56, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2024-06-17, bart <bc@freeuk.com> wrote:
    AFAIK nobody uses octal anymore.

    Unix shell and C programmers fairly often use octal unintentionally,

    Perhaps you do.   Don't speak for others, please.


    Others do it too.  Perhaps not "fairly often", but it certainly happens.


    This has bit me ...

    $ ping -c 1 192.168.1.17
    PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
    64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.097 ms

    --- 192.168.1.17 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.097/0.097/0.097/0.000 ms

    $ ping -c 1 192.168.001.017
    PING 192.168.001.017 (192.168.1.15) 56(84) bytes of data.
    From 192.168.1.17 icmp_seq=1 Destination Host Unreachable

    --- 192.168.001.017 ping statistics ---
    1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Harnden@21:1/5 to Richard Harnden on Tue Jun 18 19:38:11 2024
    On 18/06/2024 19:25, Richard Harnden wrote:
    On 18/06/2024 16:21, David Brown wrote:
    On 18/06/2024 15:56, Scott Lurndal wrote:
    Kaz Kylheku <643-408-1753@kylheku.com> writes:
    On 2024-06-17, bart <bc@freeuk.com> wrote:
    AFAIK nobody uses octal anymore.

    Unix shell and C programmers fairly often use octal unintentionally,

    Perhaps you do.   Don't speak for others, please.


    Others do it too.  Perhaps not "fairly often", but it certainly happens.


    This has bit me ...

    $ ping -c 1 192.168.1.17
    PING 192.168.1.17 (192.168.1.17) 56(84) bytes of data.
    64 bytes from 192.168.1.17: icmp_seq=1 ttl=64 time=0.097 ms

    --- 192.168.1.17 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.097/0.097/0.097/0.000 ms

    $ ping -c 1 192.168.001.017
    PING 192.168.001.017 (192.168.1.15) 56(84) bytes of data.
    From 192.168.1.17 icmp_seq=1 Destination Host Unreachable

    --- 192.168.001.017 ping statistics ---
    1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms



    Which is obviously more fun when the .15 also exists.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Wed Jun 19 07:25:12 2024
    On Tue, 18 Jun 2024 15:54:15 +0200, David Brown wrote:

    ... C++ could not use underscores
    due to their use in user-defined literals, and C followed C++.

    C can still offer the option for them, though.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Keith Thompson on Wed Jun 19 09:37:20 2024
    On 19/06/2024 00:00, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 18/06/2024 02:19, Keith Thompson wrote:
    [...]
    I forgot about digit separators.
    C23 adds the option to use apostrophes as separators in numeric
    constants: 123'456'789 or 0xdead'beef, for example. (This is
    borrowed from C++. Commas are more commonly used in real life,
    at least in my experience, but that wouldn't work given the other
    meanings of commas.)

    Commas would be entirely unsuitable here, since half the world uses
    decimal commas rather than decimal points. I think underscores are a
    nicer choice, used by many languages, but C++ could not use
    underscores due to their use in user-defined literals, and C followed
    C++.

    C already uses '.' as the decimal point, though half the world uses ','.

    Sure - a programming language has to pick /one/ such option. And since
    C was written by Americans, they got to choose. (I don't mind - we use
    a decimal point in the UK too.) However, if someone from one of several European countries sees "1,234" they will read it as 1.234, not 1234.
    This means using a comma as a separator would be a bad idea.

    Of course the current multiple uses of commas in C grammar mean that
    commas as digit separators are out of the question anyway.

    That's already US-centric. ',' is unusable as a digit separator because `123,456` already has any of several meanings, depending on the context.
    If it weren't for that issue, I think that using ',' as a digit separator would be no more problematic than using '.' as a decimal point. And C++
    and C23 already use the apostrophe as a digit separator, which is likely
    to be jarring to anyone outside Switzerland.


    Yes, but the apostrophe is equally jarring to almost everyone, and
    directly confusing to almost no one.

    It might have been better if C++ had used ' for user-defined literals (reminiscent of Ada attributes), and then left underscore for a digit separator, but that's all history now.

    In any case, as discussed, I'm not proposing an ignorable digit
    separator for hex string literals.


    OK. Digit separators are useful for numbers you read and write
    manually, while these hex string literals are more likely to come from generated sources (such as copy-and-paste from a hexdump output).

    [...]

    I don't see the benefit here. This is C - the programmer is expected
    to get the type right, and I think it would be rare to get it wrong
    (or worse wrong than forgetting "unsigned") in a case like this. So
    the extra type checking here has little or no benefit. (In general, I
    am a fan of stronger type checking, but it is only important if it
    catches real errors.)

    The end result is completely identical to the user - adding
    "hex(true)" makes no difference to the generated code. Thus it is
    just an implementation detail which the user should not have to deal
    with.

    The point isn't to change the generated code. The point is to let programmers say more directly what they mean: "Treat the contents
    of this file as an array of unsigned char", rather than the existing
    "Treat the contents of this file as a sequence of comma-separated
    integer constant expressions (which, by the way, I'm going to use
    in an initializer for an array of unsigned char)".

    (I don't think either of us is going to change our minds on the
    esthetics. And yes, that sentence isn't entirely grammatically
    correct.)


    Fair enough.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Wed Jun 19 10:49:24 2024
    On 19/06/2024 09:25, Lawrence D'Oliveiro wrote:
    On Tue, 18 Jun 2024 15:54:15 +0200, David Brown wrote:

    ... C++ could not use underscores
    due to their use in user-defined literals, and C followed C++.

    C can still offer the option for them, though.

    Sometimes it makes sense for C to do the same thing in a different way
    from C++ - but it is rare, and needs very strong justification.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Kaz Kylheku on Wed Jun 19 13:44:24 2024
    On Wed, 19 Jun 2024 10:17:45 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Computing is American, basically. Almost any big co. you can think of,
    current or historic, was or is American. IBM, DEC, Sun Microsystems, Microsoft, Google, Apple, ...


    Arguably, today's most influential CPU company is British, even if two
    of the 3 founders were American and its current owner is Japanese.
    Today's most important silicon manufacturer, the one that keeps the
    progress crawling forward instead of standing still, is Taiwanese.
    And the company that did most of the research that allowed this
    manufacturer to make progress is based in the Netherlands.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to David Brown on Wed Jun 19 10:17:45 2024
    On 2024-06-19, David Brown <david.brown@hesbynett.no> wrote:
    On 19/06/2024 00:00, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 18/06/2024 02:19, Keith Thompson wrote:
    [...]
    I forgot about digit separators.
    C23 adds the option to use apostrophes as separators in numeric
    constants: 123'456'789 or 0xdead'beef, for example. (This is
    borrowed from C++. Commas are more commonly used in real life,
    at least in my experience, but that wouldn't work given the other
    meanings of commas.)

    Commas would be entirely unsuitable here, since half the world uses
    decimal commas rather than decimal points. I think underscores are a
    nicer choice, used by many languages, but C++ could not use
    underscores due to their use in user-defined literals, and C followed
    C++.

    C already uses '.' as the decimal point, though half the world uses ','.

    Sure - a programming language has to pick /one/ such option. And since
    C was written by Americans, they got to choose. (I don't mind - we use
    a decimal point in the UK too.) However, if someone from one of several European countries sees "1,234" they will read it as 1.234, not 1234.
    This means using a comma as a separator would be a bad idea.

    That is what I chose to integrate in TXR Lisp:

    (+ 1,234.56 2,000)
    3234.56

    That format lets me copy and paste figures from the outside world (e.g. financial or scientific reports or applications) and have them be
    understood.

    As you can see, commas are not used as separators between items; simple whitespace is. That helps to make this possible.

    The pic macro I developed also only outputs commas and decimal points.

    (pic "#,###,###" 1)
    " 1"
    (pic "0,###,###" 1)
    "0,000,001"
    (pic "0,###,###.##" 1)
    "0,000,001.00"
    (pic "0,###_###.##" 1)
    ** expr-1:1: pic: insufficient arguments for format
    (pic "0,###_###.##" 1 2)
    "0,001_ 2.00"

    I don't care about conventions outside of North America.

    Computing is American, basically. Almost any big co. you can think of,
    current or historic, was or is American. IBM, DEC, Sun Microsystems,
    Microsoft, Google, Apple, ...

    Everyone doing any programming in any mainstream programming language
    for which there are significant job postings has to begrudgingly accept
    the period as the fraction separator, and English keywords and function
    names.

    (I suspect that one day the world will convert to a single standard for
    this, and it might not be far off.)

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bart@21:1/5 to Michael S on Wed Jun 19 11:57:59 2024
    On 19/06/2024 11:44, Michael S wrote:
    On Wed, 19 Jun 2024 10:17:45 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Computing is American, basically. Almost any big co. you can think of,
    current or historic, was or is American. IBM, DEC, Sun Microsystems,
    Microsoft, Google, Apple, ...


    Arguably, today's most influential CPU company is British, even if two
    of the 3 founders were American and its current owner is Japanese.
    Today's most important silicon manufacturer, the one that keeps the
    progress crawling forward instead of standing still, is Taiwanese.
    And the company that did most of the research that allowed this
    manufacturer to make progress is based in the Netherlands.



    And the language which seems dominant in North America, and within
    programming languages, and the basis for ASCII, is English.

    Which came from England, I guess!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Wed Jun 19 13:46:30 2024
    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 19 Jun 2024 10:17:45 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Computing is American, basically. Almost any big co. you can think of,
    current or historic, was or is American. IBM, DEC, Sun Microsystems,
    Microsoft, Google, Apple, ...


    Arguably, today's most influential CPU company is British,

    By what criteria? Yes, they ship a lot of CPUs, but architecturally,
    what have they influenced?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Wed Jun 19 18:02:56 2024
    On Wed, 19 Jun 2024 13:46:30 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    Michael S <already5chosen@yahoo.com> writes:
    On Wed, 19 Jun 2024 10:17:45 -0000 (UTC)
    Kaz Kylheku <643-408-1753@kylheku.com> wrote:


    Computing is American, basically. Almost any big co. you can think
    of, current or historic, was or is American. IBM, DEC, Sun
    Microsystems, Microsoft, Google, Apple, ...


    Arguably, today's most influential CPU company is British,

    By what criteria? Yes, they ship a lot of CPUs, but architecturally,
    what have they influenced?



    They convinced everybody except UC Berkeley that development of new general-purpose instruction sets is futile.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Michael S on Thu Jun 20 06:51:49 2024
    On Tue, 18 Jun 2024 12:39:40 +0300, Michael S wrote:

    I would imagine that reading base-32 numbers would take some time to get used to.

    Base 30 would be a useful base, because it includes 2, 3 and 5 as prime factors. This means fractions incorporating those factors as denominators
    can be represented exactly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Fri Jun 21 07:13:00 2024
    On Wed, 19 Jun 2024 10:49:24 +0200, David Brown wrote:

    On 19/06/2024 09:25, Lawrence D'Oliveiro wrote:

    On Tue, 18 Jun 2024 15:54:15 +0200, David Brown wrote:

    ... C++ could not use underscores due to their use in user-defined
    literals, and C followed C++.

    C can still offer the option for them, though.

    Sometimes it makes sense for C to do the same thing in a different way
    from C++ - but it is rare, and needs very strong justification.

    The fact that it is something of a de-facto standard among other popular languages would count.

    Is C doomed to remain forever a strict subset of C++?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Fri Jun 21 13:06:14 2024
    On 21/06/2024 09:13, Lawrence D'Oliveiro wrote:
    On Wed, 19 Jun 2024 10:49:24 +0200, David Brown wrote:

    On 19/06/2024 09:25, Lawrence D'Oliveiro wrote:

    On Tue, 18 Jun 2024 15:54:15 +0200, David Brown wrote:

    ... C++ could not use underscores due to their use in user-defined
    literals, and C followed C++.

    C can still offer the option for them, though.

    Sometimes it makes sense for C to do the same thing in a different way
    from C++ - but it is rare, and needs very strong justification.

    The fact that it is something of a de-facto standard among other popular languages would count.


    The apostrophe was already the standard - not just a "de-facto standard"
    - in the language that is most relevant for cooperation with C.

    Is C doomed to remain forever a strict subset of C++?

    C is not a subset of C++. Their intersection covers most of C, but not
    all of it.

    But C and C++ are often used together and compiled together in the same binaries. A large proportion of C and C++ programmers work with both languages, while almost none of them have any use for, say, Ada with its underscore digit separator.

    It makes sense when introducing new features to either language to be compatible with the other (if the feature is relevant to both
    languages). C thus copies from C++, and C++ copies from C. Sometimes
    there must be differences, but gratuitous differences are bad for
    everyone, even if they might seem a little nicer in one language in
    isolation.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Lawrence D'Oliveiro on Fri Jun 21 10:15:51 2024
    On 6/21/24 03:13, Lawrence D'Oliveiro wrote:
    On Wed, 19 Jun 2024 10:49:24 +0200, David Brown wrote:
    ...
    Sometimes it makes sense for C to do the same thing in a different way
    from C++ - but it is rare, and needs very strong justification.

    The fact that it is something of a de-facto standard among other popular languages would count.

    Is C doomed to remain forever a strict subset of C++?

    C is not now, nor has it ever been, a strict subset of C++, so it seems unlikely that it is doomed to become one. C++ was initially intended to
    be an extension to C, and a desire to maintain backwards compatibility
    with C played a role in many of the design decisions for C++. Nowadays,
    the C committee and C++ committee have agreed to a policy of avoiding incompatibilities with each other. That doesn't mean that there should
    be no incompatibilities, only that they need to be strongly motivated.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Richard Harnden on Fri Jun 21 22:49:59 2024
    On Tue, 18 Jun 2024 19:25:38 +0100, Richard Harnden wrote:

    This has bit me ...

    $ ping -c 1 192.168.001.017 ...

    Haha, the ghost of the PDP-11 will never be exorcised ...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to David Brown on Fri Jun 21 22:48:07 2024
    On Fri, 21 Jun 2024 13:06:14 +0200, David Brown wrote:

    - in the language that is most relevant for cooperation with C.

    Not sure why C++ is relevant to C at all, since C++ does pretty much
    everything that C does (if a bit differently) and more, which renders C essentially obsolete in that scenario.

    Where the usage of C would be more relevant is as an implementation
    language for CPU-intensive “engine” code meant to be callable from higher-level languages. For example, extension modules for Python. Being able to interoperate with such languages would be more of a benefit.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Lawrence D'Oliveiro on Sat Jun 22 13:40:01 2024
    On 22/06/2024 00:48, Lawrence D'Oliveiro wrote:
    On Fri, 21 Jun 2024 13:06:14 +0200, David Brown wrote:

    - in the language that is most relevant for cooperation with C.

    Not sure why C++ is relevant to C at all, since C++ does pretty much everything that C does (if a bit differently) and more, which renders C essentially obsolete in that scenario.


    Have you actually worked with any C++ programs? Have you ever included
    a C header in your C++ code?

    Fortunately, the C and C++ standards committees both know more about the importance of consistency than you do.

    Where the usage of C would be more relevant is as an implementation
    language for CPU-intensive “engine” code meant to be callable from higher-
    level languages. For example, extension modules for Python. Being able to interoperate with such languages would be more of a benefit.

    I wonder why nobody ever thought of that before you suggested it!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)