I like using this for long strings:
fputs
(
"When an uncleft or a bulkbit wins one or more bernstonebits above\n"
"its own, it takes on a backward lading. When it loses one or\n"
"more, it takes on a forward lading. Such a mote is called a\n"
"*farer*, for that the drag between unlike ladings flits it. When\n"
"bernstonebits flit by themselves, it may be as a bolt of\n"
"lightning, a spark off some faststanding chunk, or the everyday\n"
"flow of bernstoneness through wires.\n",
stdout
);
Of languages that derive ideas from C, only C++ and Python seem to have
kept this. Java, JavaScript and PHP have not, for some reason.
In Java you have at least the string concatenation operator + which is,
IMO, pretty good for that line structuring across source lines.
On Sun, 25 Feb 2024 17:38:38 +0100, Janis Papanagnou wrote:
In Java you have at least the string concatenation operator + which is,
IMO, pretty good for that line structuring across source lines.
Implicit concatenation works well in Python because you also have the
“%” operator overloaded to perform printf-style formatting with a
string. If you had to use “+” then, because that binds less tightly
than “%”, you would have to have parentheses as well, which are
unnecessary with implicit concatenation. E.g.
# depreciation entries
sql.cursor.execute \
(
"insert into payments set when_made = %(when_made)s,"
" description = %(description)s, other_party_name = \"\","
" amount = %(amount)d, kind = \"D\", tax_year = %(tax_year)d"
%
{
"when_made" : end_for_tax_year(tax_year) - 1,
"description" :
sql_string
(
"%s: %s $%s at %d%% from %s"
%
(
entry["description"],
entry["method"],
format_amount(entry["initial_value"]),
entry["rate"],
format_date(entry["when_purchased"]),
)
),
"amount" : - entry["amount"],
"tax_year" : tax_year,
}
)
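A minimal sketch of the precedence point, using made-up values rather than the SQL above (the `params` dict and its keys are purely illustrative). Adjacent literals are joined before “%” is applied, whereas with “+” the “%” grabs only the last literal:

```python
# Hypothetical values, chosen only to illustrate operator precedence.
params = {"name": "widget", "count": 3}

# Implicit concatenation: the two literals become one string first,
# then "%" formats the whole thing.
implicit = (
    "item = %(name)s,"
    " count = %(count)d"
    %
    params
)

# With "+", "%" binds tighter and formats only the second literal,
# so the first placeholder is left untouched ...
unparenthesized = "item = %(name)s," + " count = %(count)d" % params

# ... unless the concatenation is parenthesized explicitly.
parenthesized = ("item = %(name)s," + " count = %(count)d") % params

print(implicit)
print(unparenthesized)
print(parenthesized)
```

The second line printed still contains the raw `%(name)s` placeholder, which is the failure mode the extra parentheses are there to prevent.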
Or, for added fun, how about parameterizing a format:
num_format = "%%.%dg" % nr_digits
...
for axis in range(3) :
    out.write \
      (
            " (%s, %s),\n"
        %
            (num_format, num_format)
        %
            (
                min(v.co[axis] for v in the_mesh.vertices),
                max(v.co[axis] for v in the_mesh.vertices)
            )
      )
#end for
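Here is the same two-stage trick stripped down to plain numbers, as a sketch; the precision and the two sample values are assumptions standing in for the mesh data. Since “%” is left-associative, the format string is built first and the numbers are applied last:

```python
nr_digits = 3  # hypothetical precision

# Stage 1: "%%" collapses to a literal "%", and "%d" takes nr_digits.
num_format = "%%.%dg" % nr_digits

# Stage 2: drop that format into the template twice ...
line_format = " (%s, %s),\n" % (num_format, num_format)

# Stage 3: ... and only now format the actual numbers.
line = line_format % (1.23456, 7.0)

print(repr(num_format))
print(repr(line_format))
print(repr(line))
```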
Java (Text Blocks):
String s = """
multi line string""";
JavaScript (Template Literal):
let s = `multi line string`;
Still more convenient than C.
PHP? Don't care about PHP, it's shit, not even checking, most likely
some kind of a Perl-ish <<<EOF expression.
Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
I like using this for long strings:
fputs
(
"When an uncleft or a bulkbit wins one or more bernstonebits above\n"
"its own, it takes on a backward lading. When it loses one or\n"
"more, it takes on a forward lading. Such a mote is called a\n"
"*farer*, for that the drag between unlike ladings flits it. When\n"
"bernstonebits flit by themselves, it may be as a bolt of\n"
"lightning, a spark off some faststanding chunk, or the everyday\n"
"flow of bernstoneness through wires.\n",
stdout
);
Of languages that derive ideas from C, only C++ and Python seem to have
kept this. Java, JavaScript and PHP have not, for some reason.
Easy solution Lawrence. Why not use something like bin2c:
<https://www.segger.com/free-utilities/bin2c/>
The <<EOD construct that Perl has comes from POSIX shells, and it is very useful in both places. Bash also adds a <<<-construct.
Question: How would you do two separate <<-strings in the same shell
command?
My tool for easy editing of such embedded text is the Emacs macros in multiquote.el, here <https://gitlab.com/ldo/emacs-prefs>.
On 26/02/2024 23:03, Mike Sanders wrote:
Easy solution Lawrence. Why not use something like bin2c:
<https://www.segger.com/free-utilities/bin2c/>
Because it generates files that have Segger copyright notices stamped on them? At least, that's how it appears from that web page.
There are lots of open source alternatives that do similar things, with different variations in the way they generate the output. Or you can
write your own in about 10 lines of Python, which of course makes it a
lot easier to customise to fit your own styles and requirements.
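For what it's worth, a sketch of what such a home-grown converter might look like. The function name `bin2c`, the array name, and the 12-bytes-per-line layout are all assumptions made for illustration, not the behaviour of any existing tool, and it assumes non-empty input:

```python
def bin2c(data, name="blob", per_line=12):
    """Render non-empty bytes as a C unsigned char array definition."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i : i + per_line]
        # One row of hex constants, with a trailing comma (legal in C).
        lines.append("    " + ", ".join("0x%02x" % b for b in chunk) + ",")
    return (
        "static const unsigned char %s[%d] = {\n" % (name, len(data))
        + "\n".join(lines)
        + "\n};\n"
    )

print(bin2c(b"Hello\n", name="greeting"))
```

Reading the input with `open(path, "rb").read()` and writing the result to a header file would be the obvious next step.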
And with C23, we will get #embed, though it is not yet supported by
major tools.
<https://en.cppreference.com/w/c/preprocessor/embed>
On Tue, 27 Feb 2024 09:36:38 +0100, David Brown wrote:
And with C23, we will get #embed, though it is not yet supported by
major tools.
More and more hacks on the preprocessor. Why not just get rid of it and replace it with something like m4?
Because then you will discover that string-based macros are inherently an unmanageable problem.
On 27/02/2024 08:36, David Brown wrote:
And with C23, we will get #embed, though it is not yet supported by
major tools.
<https://en.cppreference.com/w/c/preprocessor/embed>
Actually I've had such a feature, for text files, for some years in my
older compiler:
#include <stdio.h>
int main(void) {
puts(strinclude(__FILE__));
}
This prints out the contents of this sourcefile. Binary files don't work because of embedded zeros, but could have been made to.
Some stuff is just very easy to do; other stuff like designator chains
less easy and also less useful.
On 26.02.2024 21:31, Lawrence D'Oliveiro wrote:
Question: How would you do two separate <<-strings in the same shell
command?
Can you give an example what you intend here? (With what semantics?)
Since '<<' is redirecting the here-document text to stdin of the command
you can have only one channel.
(m4? Seriously?)
The C preprocessor operates on preprocessor tokens, not just strings.
The #embed pre-processor directive turns the file into a list of integer constants, one per byte (unless an implementation offers other options).
On Tue, 27 Feb 2024 13:18:20 +0100, Janis Papanagnou wrote:
On 26.02.2024 21:31, Lawrence D'Oliveiro wrote:
Question: How would you do two separate <<-strings in the same shell
command?
Can you give an example what you intend here? (With what semantics?)
Since '<<' is redirecting the here-document text to stdin of the command
you can have only one channel.
Perl lets you do something like
func(<<EOD1, <<EOD2);
... contents of first string ...
EOD1
... contents of second string ...
EOD2
But this doesn’t work in Bash. However, in a Posix shell, remember you can specify the number of the file descriptor you want to redirect, e.g.
diff -u /dev/fd/8 /dev/fd/9 8<<'EOD1' 9<<'EOD2'
... contents of first string ...
EOD1
... contents of second string ...
EOD2
Note I add the single quotes to prevent expansion of “$”-sequences within the strings. (I think this might be needed in Perl, too.)
On Tue, 27 Feb 2024 23:21:28 +0100, David Brown wrote:
The #embed pre-processor directive turns the file into a list of integer
constants, one per byte (unless an implementation offers other options).
What a waste of time.
On 27/02/2024 20:25, Lawrence D'Oliveiro wrote:
On Tue, 27 Feb 2024 09:36:38 +0100, David Brown wrote:
And with C23, we will get #embed, though it is not yet supported by
major tools.
More and more hacks on the preprocessor. Why not just get rid of it and
replace it with something like m4?
Because then you will discover that string-based macros are inherently an
unmanageable problem.
I hadn't noticed that #embed was a preprocessor directive. But that is
not the problem here, it is this:
"The expansion of a #embed directive is a token sequence formed from the
list of integer constant expressions described below."
If a string like "ABC" really is converted to the five tokens 'A' comma
'B' comma 'C', then it's going to make long strings and binary files inefficient.
Embedding a 100KB file will result in a 100KB bigger executable, but
along the way it may have to generate 200,000 tokens within the
compiler, half of them commas. Which in turn will need to be turned into 100,000 integer expressions.
I would hope that implementations find some way of streamlining that
process, perhaps by turning that 100KB of data directly into a 100KB
string.
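The arithmetic can be checked with a small sketch that imitates a literal-minded expansion: one integer constant per byte, separated by commas. The helper name `embed_tokens` is made up, and real implementations are not obliged to work this way internally:

```python
def embed_tokens(data):
    """Return the token list a literal-minded #embed expansion would yield."""
    tokens = []
    for i, byte in enumerate(data):
        if i > 0:
            tokens.append(",")
        tokens.append(str(byte))
    return tokens

print(embed_tokens(b"ABC"))          # integer constants separated by commas

big = embed_tokens(bytes(100_000))   # stand-in for a 100,000-byte file
commas = big.count(",")
print(len(big), commas)
```

So "200,000 tokens, half of them commas" is right to within one token: n bytes give n constants and n - 1 commas.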
On 2024-02-27, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Tue, 27 Feb 2024 23:21:28 +0100, David Brown wrote:
The #embed pre-processor directive turns the file into a list of integer
constants, one per byte (unless an implementation offers other options).
What a waste of time.
Plus easily doable in 1970's Lisp.
On 27/02/2024 23:12, bart wrote:
I would hope that implementations find some way of streamlining that
process, perhaps by turning that 100KB of data directly into a 100KB
string.
They won't use strings, they will use data blobs - binary data. Then
there is no issue with null bytes. And yes, implementations will skip
the token generation (unless you are doing something weird, such as
using #embed to read the parameters to a function call).
On 28/02/2024 11:54, David Brown wrote:
They won't use strings, they will use data blobs - binary data. Then
there is no issue with null bytes.
AFAIK strings in C can have embedded zeros when not assumed to be zero-terminated. So here:
char s[]={1,2,3,0,4,5,6};
s will have a length of 7.
And yes, implementations will skip the token generation (unless you
are doing something weird, such as using #embed to read the parameters
to a function call).
What happens if you do -E to preprocess only?
... people write utilities for them in a variety of languages ...
But it will often be more convenient to have it built into the language
and compiler.
On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:
... people write utilities for them in a variety of languages ...
But it will often be more convenient to have it built into the language
and compiler.
What can be built into the language can only ever be a small subset of
the many and varied ways that people have incorporated data blobs into
their programs. Often these will need to have custom structures with
computed header fields, that kind of thing. So you will need custom
build tools to construct these structures, and then you might as well
include those blobs directly into the final build, rather than go
through some extra step of pretending to turn them back into some
source form.
For example, here’s an old Android project of mine (OK, so the app is
Java code, but the same principle applies) <https://bitbucket.org/ldo17/unicode_browser_android/src/master/>
where I wrote a custom Python script to read a Nameslist.txt file
downloaded from unicode.org to generate a table which could be loaded
into memory quickly for easy searching.
David Brown <david.brown@hesbynett.no> writes:
[...]
They won't use strings, they will use data blobs - binary data. Then
there is no issue with null bytes. And yes, implementations will skip
the token generation (unless you are doing something weird, such as
using #embed to read the parameters to a function call).
Tests with prototype implementations gave extremely fast results.
I'm not sure how that would work. #embed is a preprocessor directive,
and at least in the abstract model it has to expand to valid C code.
I would have expected that it would simply generate the list of comma-separated integer constants described in the standard; later
phases would simply parse that list and generate code as if that
sequence had been written in the original source file. Do you know of
an implementation that does something else?
For example, say you have a file "foo.dat" containing 4 bytes with
values 0, 1, 2, and 3. This would be perfectly valid:
struct foo {
unsigned char a;
unsigned short b;
unsigned int c;
double d;
};
struct foo obj = {
#embed "foo.dat"
};
#embed isn't defined to translate an input file to a sequence of bytes.
It's defined to translate an input file to a sequence of integer
constant expressions.
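To make the "values, not bytes" point concrete, here is a rough model using Python's ctypes. The `Foo` class mirrors the struct above, and the four-byte `data` stands in for foo.dat; this only imitates what the C initializer does, it is not how a compiler implements #embed:

```python
import ctypes

class Foo(ctypes.Structure):
    # Same member order and types as struct foo above.
    _fields_ = [
        ("a", ctypes.c_ubyte),
        ("b", ctypes.c_ushort),
        ("c", ctypes.c_uint),
        ("d", ctypes.c_double),
    ]

data = bytes([0, 1, 2, 3])   # stand-in for the contents of foo.dat

# Iterating over bytes yields integers, so this is Foo(0, 1, 2, 3):
# each value initializes one member, converted to that member's type.
obj = Foo(*data)

print(obj.a, obj.b, obj.c, obj.d)
print(len(data), ctypes.sizeof(Foo))   # file size vs. object size
```

The object ends up several times larger than the file, because each byte of the file became the value of a whole member.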
In C:
void Add(int CategoryCode, ItemType Item) {
CodeToIndex_put(CategoryCode, getCount());
add(Item);
}
4 non-comment lines versus 9. I know Java needs tons of boilerplate, but
it is not all the language's fault.
bart <bc@freeuk.com> writes:
It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.
Depending on the sizes and alignments of the various types, sure.
So what?
If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.
On Wed, 28 Feb 2024 21:34:14 +0000, bart wrote:
In C:
void Add(int CategoryCode, ItemType Item) {
CodeToIndex_put(CategoryCode, getCount());
add(Item);
}
4 non-comment lines versus 9. I know Java needs tons of boilerplate, but
it is not all the language's fault.
Or how about
void Add(int CategoryCode, ItemType Item) {CodeToIndex_put(CategoryCode, getCount());add(Item);}
Wow! I never realized you could do that in C!! I thought it was an
error to put stuff after column 72 or something. Thanks for the tip!!!
Or you can use common sense and avoid writing code which is either
too compact or so spread out vertically that you have to hunt for the
actual code. Like trying to find the bits of meat in a thin soup.
On Thu, 29 Feb 2024 00:15:17 +0000, bart wrote:
Or you can use common sense and avoiding writing code which is either
too compact or so spread out vertically that you have to hunt for the
actual code. Like trying to find the bits of meat in a thin soup.
Terribly sorry about that. I wonder if you could look at this part of the same code file:
final android.util.SparseArray<Integer> CodeToIndex =
new android.util.SparseArray<Integer>();
and show me how to thicken that part of my humble, tasteless gruel? Maybe using that same “_” trick you used to do OO in C in your previous example?
On 28/02/2024 22:57, Keith Thompson wrote:
*Maybe* a compiler could optimize for the case where it knows that it's
being used to initialize an array of unsigned char, but (a) that would
require the preprocessor to have information that normally doesn't exist
until later phases, and (b) I'm not convinced it would be worth the
effort.
Look at <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
In those tests, for a 40 MB file gcc #embed is 200 times faster than
"xxd -i" generated files, and takes about 2.5% of the memory. It scales
to 1 GB files. And that's just a proof-of-concept implementation.
bart <bc@freeuk.com> writes:
[...]
AFAIK strings in C can have embedded zeros when not assumed to be
zero-terminated. So here:
char s[]={1,2,3,0,4,5,6};
s will have a length of 7.
Strings *by definition* cannot have embedded zeros. A null character terminates a string.
A string literal can have embedded \0 characters, but if you're
suggesting that #embed should expand to a string literal, I can see
several disadvantages and no significant advantages. For one thing, the
data may or may not end with a null character; string literals always
do.
On 28/02/2024 21:36, Keith Thompson wrote:
Strings *by definition* cannot have embedded zeros. A null character
terminates a string.
A string literal can have embedded \0 characters, but if you're
suggesting that #embed should expand to a string literal, I can see
several disadvantages and no significant advantages. For one thing, the
data may or may not end with a null character; string literals always
do.
Not here:
char s[] = "ABC";
char t[3] = "DEF";
The "DEF" string doesn't end with a zero.
Is 'string' given a special meaning in the standard?
/That/ would seem to me to be too restrictive. Does this:
char *s;
define a pointer to such a string, or can it be any kind of data? For
example, `char*` is used by the GetOpenFileName WinAPI function for a
/series/ of zero-terminated strings which itself is terminated with two
zero bytes.
So it is some property that is attributed to the data that will be stored.
I normally use `cstring` or `stringz` outside the language when referring
to a zero-terminated sequence of characters, which implies that
embedded zeros aren't allowed.
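The cases being argued over can be poked at from Python with ctypes, as a sketch. `create_string_buffer` is a real ctypes call; the variable names are made up, and the buffers only model the C declarations named in the comments:

```python
import ctypes

# Like: char lit[] = "abc\0def";  (the literal adds a terminating NUL)
lit = ctypes.create_string_buffer(b"abc\0def")
print(len(lit), lit.raw)      # whole array, embedded and trailing NUL included
print(lit.value)              # what strlen() and friends would see

# Like: char s[] = {1,2,3,0,4,5,6};  (7 elements, no terminator added)
s = ctypes.create_string_buffer(bytes([1, 2, 3, 0, 4, 5, 6]), 7)
print(len(s), s.value)

# Like: char t[3] = "DEF";  (exactly fills the array, no room for a NUL)
t = ctypes.create_string_buffer(b"DEF", 3)
print(len(t), t.raw)
```

The array sizes and the "string" as seen by the string functions differ in the first two cases, and the third holds no NUL at all, which is roughly where the disagreement about terminology comes from.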
On 28/02/2024 21:36, Keith Thompson wrote:
Strings *by definition* cannot have embedded zeros. A null character
terminates a string.
C strings. Not strings in other programming languages. And only if you
define "C strings" in a rather restrictive but, to be fair, totally
legitimate way. So I wouldn't have put in the asterisks.
My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.
Basically, #embed is dumb.
Using 'strinclude' in my old C compiler, it took about 1 second to build
this program:
#include <stdio.h>
#include <string.h>
char* s=strinclude("data");
int main(void) {
printf("%zu\n", strlen(s));
}
On 28/02/2024 23:52, Lawrence D'Oliveiro wrote:
Or how about
void Add(int CategoryCode, ItemType Item) {CodeToIndex_put(CategoryCode, getCount());add(Item);}
Wow! I never realized you could do that in C!! I thought it was an
error to put stuff after column 72 or something. Thanks for the tip!!!
Well, you could write an entire program on one line.
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
(A winner from the obfuscated C contest).
On 28/02/2024 23:31, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.
Depending on the sizes and alignments of the various types, sure.
So what?
If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.
My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.
DB suggested something like that was actually done. But you can't do
that if those 100,000 numbers represent from 100KB to 800KB of memory
depending on the data type of the structure they're initialising.
On 29.02.2024 16:48, Scott Lurndal wrote:
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
What does it do?
What preconditions must be fulfilled or what additions
does it need to compile?
(A winner from the obfuscated C contest).
(Are non-compiling C sources allowed in the contest?)
On 2/29/24 01:47, bart wrote:
My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions representated as tokens,
and to then parse those 100,000 expressions into AST nodes etc.
But you HAVE to do that if #embed is in the preprocessor,
because its job is to give compilable text to the real
compiler. No other way is possible.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 29.02.2024 16:48, Scott Lurndal wrote:
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
What does it do?
What preconditions must be fulfilled or what additions
does it need to compile?
(A winner from the obfuscated C contest).
(Are non-compiling C sources allowed in the contest?)
https://www.ioccc.org/years.html
The above is from 'burton'.
"abc\0def" is a valid string literal, but its value is not a string.
(No, the standard doesn't say that the value of a string literal is a string.)
scott@slp53.sl.home (Scott Lurndal) writes:
An implementation is free to simply pass a variant (or the directive
itself) of #embed from the pre-processor to the compiler if the programmer
isn't using -E, and the compiler could simply copy the embedded file
into the object file directly, without processing it as a series of
integer values. Much like the #file and #line directives passed by
the pre-processor to the compiler.
Sure, an implementation has to operate *as if* it implemented the 8
translation phases separately. But given a structure initialized with
#embed, it would have to generate additional code to initialize the
structure members from the bytes of the binary blob.
On 29.02.2024 17:17, Scott Lurndal wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 29.02.2024 16:48, Scott Lurndal wrote:
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
What does it do?
What preconditions must be fulfilled or what additions
does it need to compile?
With the link below I see it "needs" a 600+ lines long Makefile.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
bart <bc@freeuk.com> writes:
On 28/02/2024 23:31, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.
Depending on the sizes and alignments of the various types, sure.
So what?
If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.
My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.
DB suggested something like that was actually done. But you can't do
that if those 100,000 numbers represent from 100KB to 800KB of memory
depending on the data type of the structure they're initialising.
An implementation is free to simply pass a variant (or the directive
itself) of #embed from the pre-processor to the compiler if the programmer
isn't using -E, and the compiler could simply copy the embedded file
into the object file directly, without processing it as a series of
integer values. Much like the #file and #line directives passed by
the pre-processor to the compiler.
Sure, an implementation has to operate *as if* it implemented the 8
translation phases separately. But given a structure initialized with
#embed, it would have to generate additional code to initialize the
structure members from the bytes of the binary blob.
Would it? Or could it simply assume that the binary blob
is already in the same binary format that writing an instance
of the structure from a C application on the same host would have created?
On 29/02/2024 18:28, Scott Lurndal wrote:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
bart <bc@freeuk.com> writes:
On 28/02/2024 23:31, Keith Thompson wrote:
bart <bc@freeuk.com> writes:
It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.
Depending on the sizes and alignments of the various types, sure.
So what?
If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.
My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.
DB suggested something like that was actually done. But you can't do
that if those 100,000 numbers represent from 100KB to 800KB of memory
depending on the data type of the structure they're initialising.
An implementation is free to simply pass a variant (or the directive
itself) of #embed from the pre-processor to the compiler if the programmer
isn't using -E, and the compiler could simply copy the embedded file
into the object file directly, without processing it as a series of
integer values. Much like the #file and #line directives passed by
the pre-processor to the compiler.
Sure, an implementation has to operate *as if* it implemented the 8
translation phases separately. But given a structure initialized with
#embed, it would have to generate additional code to initialize the
structure members from the bytes of the binary blob.
Would it? Or could it simply assume that the binary blob
is already in the same binary format that writing an instance
of the structure from a C application on the same host would have created?
That would depend on the sizes of the fields in the struct, and the size
of the integer constants in the #embed.
David Brown <david.brown@hesbynett.no> writes:
On 29/02/2024 18:28, Scott Lurndal wrote:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
scott@slp53.sl.home (Scott Lurndal) writes:
An implementation is free to simply pass a variant (or the directive
itself) of #embed from the pre-processor to the compiler if the programmer
isn't using -E, and the compiler could simply copy the embedded file
into the object file directly, without processing it as a series of
integer values. Much like the #file and #line directives passed by
the pre-processor to the compiler.
Sure, an implementation has to operate *as if* it implemented the 8
translation phases separately. But given a structure initialized with
#embed, it would have to generate additional code to initialize the
structure members from the bytes of the binary blob.
Would it? Or could it simply assume that the binary blob
is already in the same binary format that writing an instance
of the structure from a C application on the same host would have created?
That would depend on the sizes of the fields in the struct, and the size
of the integer constants in the #embed.
I'm embedding a binary file. I want the representation in memory
to be _exactly_ the same as in the file, regardless of how it is
defined in the C code (array of char, array of int, array of long, struct whatever).
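For comparison, here is a minimal sketch of the "same binary format" reading described above: the bytes are copied into the object unchanged with memcpy, instead of being used as one initializer per member. The `struct record` layout and the `load_record` helper are hypothetical names made up for this illustration, and the result is only meaningful if the bytes were produced on a host with the same layout and byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct record {
    unsigned char  tag;
    unsigned short count;
    unsigned int   id;
};

/* Copy a raw byte image into *out without interpreting the bytes.
   Returns 1 on success, 0 if the blob is too short to hold a record. */
int load_record(struct record *out, const unsigned char *blob, size_t blob_size)
{
    assert(out != NULL && blob != NULL);
    if (blob_size < sizeof *out)
        return 0;
    memcpy(out, blob, sizeof *out);   /* padding bytes are copied as-is */
    return 1;
}
```

Called with a byte image that was itself written from a `struct record` on the same host, `load_record` reproduces the original member values. That is a different operation from brace initialization, where each value in the list is converted to the type of the member it lands on.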
David Brown <david.brown@hesbynett.no> writes:
On 28/02/2024 22:57, Keith Thompson wrote:[...]
David Brown <david.brown@hesbynett.no> writes:
[...]
They won't use strings, they will use data blobs - binary data. Then
there is no issue with null bytes. And yes, implementations will skip
the token generation (unless you are doing something weird, such as
using #embed to read the parameters to a function call).
Tests with prototype implementations gave extremely fast results.
I'm not sure how that would work. #embed is a preprocessor directive,
and at least in the abstract model it has to expand to valid C code.
I would have expected that it would simply generate the list of
comma-separated integer constants described in the standard; later
phases would simply parse that list and generate code as if that
sequence had been written in the original source file. Do you know of
an implementation that does something else?
The key thing, as I understand it, is that the compiler gets to know
that the integers in the list are all "nice". And since the
preprocessor and the compiler are part of the same implementation
(even if they are separate programs communicating with pipes or
temporary files), the preprocessor could pass on the binary blob in a
pre-parsed form.
Sure, an implementation *could* optimize #embed so it expands to some implementation-defined nonstandard form that later phases can treat as
raw data. But since it's defined as a preprocessor directive, it's
difficult to see how it could do so while covering all cases.
[...]
The results of testing are that #embed is /massively/ faster and lower
memory compared to external generators, especially for larger files.
And it gives you the data on-hand for optimisation purposes, unlike
external direct linking of binary blobs. (So you can get the size of
the array, or use values from it as compile-time known values.)
What testing? The very latest versions of gcc and clang (I checked both their git repos yesterday) do not yet implement #embed.
For example, say you have a file "foo.dat" containing 4 bytes with
values 0, 1, 2, and 3. This would be perfectly valid:
struct foo {
unsigned char a;
unsigned short b;
unsigned int c;
double d;
};
struct foo obj = {
#embed "foo.dat"
};
#embed isn't defined to translate an input file to a sequence of
bytes.
It's defined to translate an input file to a sequence of integer
constant expressions.
Yes. But the prime speed (and memory usage) gains are for
large files, and that means array initialisers. That does not
conflict with using it for cases like yours.
So a compiler that does this would have to be able to handle
struct foo obj = {
#blob
<binary data>
#endblob
};
and initialize a, b, c, and d to 0, 1, 2, and 3.0, respectively from successive bytes of the binary data. Either that, or the preprocessor
would have to use information it doesn't have to determine how to expand #embed.
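A sketch of what the abstract model implies for the example above, assuming `foo.dat` really holds the four bytes 0, 1, 2, 3: the directive behaves as if the comma-separated list `0, 1, 2, 3` had been written in its place, so each value initializes one member in order and is converted to that member's type. The list is written out by hand here so that the sketch does not depend on a compiler that implements #embed; `make_foo` is a hypothetical helper.

```c
#include <assert.h>

struct foo {
    unsigned char a;
    unsigned short b;
    unsigned int c;
    double d;
};

/* Hand-written stand-in for initializing a struct foo with
   #embed "foo.dat", where foo.dat is assumed to hold the bytes 0, 1, 2, 3. */
struct foo make_foo(void)
{
    struct foo obj = { 0, 1, 2, 3 };
    assert(obj.d == 3.0);   /* the integer 3 was converted to double */
    return obj;
}
```

The resulting object is larger than the 4-byte file that initialized it, which is the point being argued: the file supplies values, not a byte image of the struct.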
*Maybe* a compiler could optimize for the case where it knows that it's
being used to initialize an array of unsigned char, but (a) that would
require the preprocessor to have information that normally doesn't exist
until later phases, and (b) I'm not convinced it would be worth the
effort.
Look at
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
In those tests, for a 40 MB file gcc #embed is 200 times faster than
"xxd -i" generated files, and takes about 2.5% of the memory. It
scales to 1 GB files. And that's just a proof-of-concept
implementation.
That's for std::embed, a proposed C++ feature that's *not* defined as a preprocessor directive. Sample usage from the paper:
constexpr std::span<const std::byte> fxaa_binary =
std::embed( "fxaa.spirv" );
So the compiler knows the type of the object being initialized.
(Note that the author of that C++ paper is also the editor for the C standard.)
I'm still skeptical that C's #embed will actually be implemented other
than as expanding to a sequence of integer constants.
On the other hand, C23 allows for additional implementation-defined
parameters to #embed (as well as the standard embed parameters limit,
prefix, suffix, and if_empty). Such a parameter could specify how it's
expanded, perhaps to some implementation-defined blob format. *If*
compilers optimize #embed to something other than a sequence of integer
constant expressions, that's probably how it would be done. But since
neither gcc nor clang implements #embed at all, it may be too early to
speculate.
On 2024-02-29, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Exactly, "string" is not a type.
It is a type in the broader sense, in that is a logical proposition
about the attributes of an object that is true or false.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 29.02.2024 17:18, Keith Thompson wrote:
"abc\0def" is a valid string literal, but its value is not a string.
(No, the standard doesn't say that the value of a string literal is a
string.)
This sounds somewhat strange in my ears. Usually a literal for a type
will constitute an instance of the type. - I suppose the irregularity
stems from the fact that there's no explicit string object type in C.
Exactly, "string" is not a type.
On 28/02/2024 21:56, Lawrence D'Oliveiro wrote:
On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:
... people write utilities for them in a variety of languages ...
But it will often be more convenient to have it built into the
language and compiler.
What can be built into the language can only ever be a small subset of
the many and varied ways that people have incorporated data blobs into
their programs.
Of course. But that doesn't mean that a language should not include a feature that makes it easy for a lot of people to get some data blobs
into their code.
I have an actual use case today where #embed of a (C++) std::map binary object created by separate tool would be very useful. I'm planning on
using mmap to load it at runtime at the moment.
scott@slp53.sl.home (Scott Lurndal) writes:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 29.02.2024 17:17, Scott Lurndal wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 29.02.2024 16:48, Scott Lurndal wrote:
int main(int b,char**i){long long n=B,a=I^n,r=(a/b&a)>>4,y=atoi(*++i),_=(((a^n/b)*(y>>T)|y>>S)&r)|(a^r);printf("%.8s\n",(char*)&_);}
What does it do?
What preconditions must be fulfilled or what additions
does it need to compile?
With the link below I see it "needs" a 600+ lines long Makefile.
The readme simply says compile it and run it
as ./prog <value between 1 and 512>.
No, you have to compile it with specific command-line arguments to
define B and I. The Makefile does that (don't ask me why it's so long),
but you can do it manually.
From hint.txt:
"""
On a little-endian machine:
clang -include stdio.h -include stdlib.h -Wall -Weverything -pedantic -DB=6945503773712347754LL -DI=5859838231191962459LL -DT=0 -DS=7 -o prog prog.c
On a big-endian machine:
clang -include stdio.h -include stdlib.h -Wall -Weverything -pedantic -DB=7091606191627001958LL -DI=6006468689561538903LL -DT=1 -DS=0 -o prog.be prog.c
"""
An array of bytes is not a "string".
On Thu, 29 Feb 2024 08:58:40 +0100, David Brown wrote:
On 28/02/2024 21:56, Lawrence D'Oliveiro wrote:
On Wed, 28 Feb 2024 12:50:10 +0100, David Brown wrote:
... people write utilities for them in a variety of languages ...
But it will often be more convenient to have it built into the
language and compiler.
What can be built into the language can only ever be a small subset of
the many and varied ways that people have incorporated data blobs into
their programs.
Of course. But that doesn't mean that a language should not include a
feature that makes it easy for a lot of people to get some data blobs
into their code.
Maybe the C compiler should concentrate on compiling C code, and leave it
to the rest of the build toolchain to deal with other data.
On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
I have an actual use case today where #embed of a (C++) std::map binary
object created by separate tool would be very useful. I'm planning on
using mmap to load it at runtime at the moment.
Why not convert it to a .o file and statically link it into your program
as part of the build process?
On 2/29/24 11:18, bart wrote:
Using 'strinclude' in my old C compiler, it took about 1 second to
build this program:
#include <stdio.h>
#include <string.h>
char* s=strinclude("data");
int main(void) {
printf("%zu\n", strlen(s));
}
tth@redlady:~/Desktop$ man strinclude
No manual entry for strinclude
tth@redlady:~/Desktop$
On Thu, 29 Feb 2024 16:19:45 +0100, David Brown wrote:
An array of bytes is not a "string".
It is in PHP, I think also in Perl, and also in (obsolete) Python 2.
And what about C string functions that take explicit lengths?
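One concrete case: `strncpy` with a count no larger than the source length writes no terminating null, so the destination then holds a character array that is not a string in the standard's sense. A minimal sketch; `copy_prefix` is a hypothetical wrapper.

```c
#include <assert.h>
#include <string.h>

/* Copy at most n characters of src into dst, exactly as strncpy does.
   Returns 1 if dst[0..n-1] contains a null character (so dst holds a string),
   0 if the copy was cut off before any terminator was written. */
int copy_prefix(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}
```

Copying 3 characters of "abcdef" returns 0 and leaves whatever followed in the buffer untouched, while copying "ab" with a count of 5 returns 1 because strncpy pads the remainder with nulls.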
On 29/02/2024 15:34, tTh wrote:
On 2/29/24 11:18, bart wrote:
Using 'strinclude' in my old C compiler, it took about 1 second to
build this program:
#include <stdio.h>
#include <string.h>
char* s=strinclude("data");
int main(void) {
printf("%zu\n", strlen(s));
}
tth@redlady:~/Desktop$ man strinclude
No manual entry for strinclude
tth@redlady:~/Desktop$
'strinclude' is an extension I made for that compiler.
#embed is the new feature of C23. Although I'm not sure how it would be
used to initialise a char* pointer. Perhaps like this:
char dummy[] = {
#embed "data"
,0};
char* s = dummy;
(I've added a 0-terminator here; I don't know if #embed will take care
of that.)
My 'strinclude' produces a zero-terminated string, but it is done within
the parser rather than lexer.
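As far as the C23 text goes, #embed does not append a terminator by itself; the standard `suffix` parameter can be used for that, along the lines of `#embed "data" suffix(, 0)`. The sketch below emulates the expansion by hand, assuming the file `data` contains the three ASCII bytes 'a', 'b', 'c', so that it builds on compilers without #embed; `data_text` and `data_size` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written stand-in for an array initialized with #embed "data"
   followed by ", 0". The file is assumed to hold the bytes 'a', 'b', 'c'
   (97, 98, 99 in ASCII). */
const char data_text[] = { 97, 98, 99, 0 };

/* Number of embedded bytes, not counting the terminator added above. */
size_t data_size(void)
{
    assert(data_text[sizeof data_text - 1] == '\0');
    return sizeof data_text - 1;
}
```

Because the array has a known size, the length is available at compile time through `sizeof`, whereas a `char *` pointing at the data only gets it back via `strlen`, and only if the file contains no earlier zero byte.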
It is possible to be actively involved in the development of the
standards - preparing and discussing proposals, joining committees, or
at least joining mailing lists for the discussions. If you are not
doing the work and showing the interest /before/ decisions are made, you don't get a say afterwards. It is more productive to discuss what you
can do with the features C has, than to wish it never had them.
bart <bc@freeuk.com> writes:
[...]
Isn't it cheating when half the program is part of the build
instructions?
Apparently not. If it were, the judges of the IOCCC would not have
accepted it.
[...]
One of the winners of the 1988 contest was:
```
#include "/dev/tty"
```
On 2024-03-01, David Brown <david.brown@hesbynett.no> wrote:
It is possible to be actively involved in the development of the
standards - preparing and discussing proposals, joining committees, or
at least joining mailing lists for the discussions. If you are not
doing the work and showing the interest /before/ decisions are made, you
don't get a say afterwards. It is more productive to discuss what you
can do with the features C has, than to wish it never had them.
Also, if you don't join the gang that breaks windows and spray-paints
walls, you don't get to say afterward which windows are broken
and what is scribbled on what wall.
You mean: There's a danger that a function that returns a 'string', but truncates it to n chars, might not be returning a string at all ?
Like most Abuse of the Rules winners, it resulted in a rule change for
the following years.
On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
I have an actual use case today where #embed of a (C++) std::map
binary object created by separate tool would be very useful. I'm
planning on using mmap to load it at runtime at the moment.
Why not convert it to a .o file and statically link it into your
program as part of the build process?
That's exactly what #embed will enable.
"A *string* is a contiguous sequence of characters
terminated by and including the first null character."
On Fri, 1 Mar 2024 11:52:16 +0000, bart wrote:
On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
I have an actual use case today where #embed of a (C++) std::map
binary object created by separate tool would be very useful. I'm
planning on using mmap to load it at runtime at the moment.
Why not convert it to a .o file and statically link it into your
program as part of the build process?
That's exactly what #embed will enable.
You can call it a toy version of objcopy
<https://manpages.debian.org/1/objcopy.1.html>.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Fri, 1 Mar 2024 11:52:16 +0000, bart wrote:
On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
I have an actual use case today where #embed of a (C++) std::map
binary object created by separate tool would be very useful. I'm
planning on using mmap to load it at runtime at the moment.
Why not convert it to a .o file and statically link it into your
program as part of the build process?
That's exactly what #embed will enable.
You can call it a toy version of objcopy
<https://manpages.debian.org/1/objcopy.1.html>.
While objcopy supports a number of ways to manipulate an ELF file, I
wouldn't equate it with #embed at all.
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
"A *string* is a contiguous sequence of characters terminated by and
including the first null character."
So how come strlen(3) does not include the null?
Because the *length of a string* is by definition "the number of bytes preceding the null character".
On Mon, 04 Mar 2024 20:55:28 -0800, Keith Thompson wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
"A *string* is a contiguous sequence of characters terminated by and
including the first null character."
So how come strlen(3) does not include the null?
Because the *length of a string* is by definition "the number of bytes
preceding the null character".
So the “string” itself includes the null character, but its “length” does
not?
Kaz Kylheku <433-929-6894@kylheku.com> writes:
On 2024-03-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Mon, 04 Mar 2024 20:55:28 -0800, Keith Thompson wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
"A *string* is a contiguous sequence of characters terminated by and >>>>>> including the first null character."
So how come strlen(3) does not include the null?
Because the *length of a string* is by definition "the number of bytes >>>> preceding the null character".
So the “string” itself includes the null character, but its “length” does
not?
That's correct. However, its size includes it.
sizeof "abc" == 4
strlen("abc") == 3
The abstract string does not include the null character;
we understand "abc" to be a three character string.
Sure, if you define "abstract string" that way. I'll just note that C's definition of the word "string" does include the terminating null
character, and does not talk about "abstract strings". (A string in the abstract machine clearly includes the null character, but that's a bit
of a stretch.)
Kaz Kylheku <433-929-6894@kylheku.com> writes:
On 2024-03-07, Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Mon, 04 Mar 2024 20:55:28 -0800, Keith Thompson wrote:
Lawrence D'Oliveiro <ldo@nz.invalid> writes:
On Thu, 29 Feb 2024 14:14:52 -0800, Keith Thompson wrote:
"A *string* is a contiguous sequence of characters terminated by and >>>>>> including the first null character."
So how come strlen(3) does not include the null?
Because the *length of a string* is by definition "the number of bytes >>>> preceding the null character".
So the “string” itself includes the null character, but its “length” does
not?
That's correct. However, its size includes it.
sizeof "abc" == 4
strlen("abc") == 3
The abstract string does not include the null character;
we understand "abc" to be a three character string.
Sure, if you define "abstract string" that way. I'll just note that C's definition of the word "string" does include the terminating null
character, and does not talk about "abstract strings". (A string in the abstract machine clearly includes the null character, but that's a bit
of a stretch.)
Yes, I'm being annoyingly pedantic.
The C representation of the string includes the null character;
the size is a representational concept so it counts it.
It is common for C programs to break encapsulation and openly deal with
that terminating null.
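The distinctions drawn in this subthread can be checked directly. A minimal sketch follows; all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* sizeof counts the terminating null of the literal's array: 4. */
size_t literal_size(void)   { return sizeof "abc"; }

/* strlen counts the bytes preceding the first null: 3. */
size_t literal_length(void) { return strlen("abc"); }

/* "abc\0def" is an array of 8 chars, but the string that starts at its
   first element ends at the embedded null. */
size_t embedded_null_size(void)   { return sizeof "abc\0def"; }
size_t embedded_null_length(void) { return strlen("abc\0def"); }

/* Aborts if any of the relationships discussed above fails to hold. */
void check_string_facts(void)
{
    assert(literal_size() == literal_length() + 1);
    assert(embedded_null_size() == 8);
    assert(embedded_null_length() == 3);
}
```

The last two functions illustrate the earlier remark that "abc\0def" is a valid string literal whose value is not a string: the array has 8 elements, but any string function stops after the first three.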