Do you guys have any thoughts on the best ways for strings of characters
to be stored?
1. There's the C way, of course, of reserving one value (zero) and using
it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and past (where 'past' means first plus length such that the second pointer
points one position beyond the last character).
Any others?
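A minimal C sketch of the three layouts listed above, assuming byte-sized characters. The type and function names (`PrefixStr`, `PairStr`, `len_zterm` and so on) are made up for illustration. The point is where the length lives and what it costs to recover it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* 1. Zero-terminated: the string is just a pointer; finding the length
      means scanning for the terminator, O(n). */
size_t len_zterm(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* 2. Length prefix: a machine word holding the length, then the
      characters, kept in one allocation via a flexible array member. */
typedef struct {
    size_t len;
    char chars[];   /* body follows the length word */
} PrefixStr;

PrefixStr *prefix_new(const char *src, size_t len) {
    PrefixStr *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->chars, src, len);
    return p;
}

/* 3. Double pointer: first and past, where past == first + length. */
typedef struct {
    const char *first;
    const char *past;   /* one position beyond the last character */
} PairStr;

size_t len_pair(PairStr s) {
    return (size_t)(s.past - s.first);   /* O(1) */
}
```

A length-prefixed string built from the three bytes `"a\0b"` keeps all of them, whereas `len_zterm` on the same bytes stops at 1.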
Options 1 and 2 have the advantage that they can be referred to simply
by address. Option 3 needs an additional place in which to store the
(first, past) control block.
Option 1 has the advantage that it's easy for a program to process (by
either pointer or index).
Options 1 and 3 have the advantage that one can refer to the tail of the string (anything past the first character) without creating a copy,
although option 3 would need a new control block to be created. Option 2 would require a new string to be created.
In fact, option 3 has the advantage that it allows any contiguous
substring (head, middle, or tail) to be referred to without copying
that part of the string.
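A C sketch of option 3's no-copy substrings, assuming read-only byte strings. `StrView`, `view_of`, `view_sub` and `view_eq` are hypothetical names. Only a new (first, past) pair is created; the characters stay where they are.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A (first, past) view; it never owns the bytes it refers to. */
typedef struct {
    const char *first;
    const char *past;
} StrView;

/* Make a view of a zero-terminated string (one scan, no copy). */
StrView view_of(const char *s) {
    StrView v = { s, s + strlen(s) };
    return v;
}

/* View characters [from, to) of v. Only a new control block is made;
   the characters are shared with the original string. */
StrView view_sub(StrView v, size_t from, size_t to) {
    assert(from <= to && to <= (size_t)(v.past - v.first));
    StrView r = { v.first + from, v.first + to };
    return r;
}

/* Compare a view with a zero-terminated string. */
int view_eq(StrView v, const char *s) {
    size_t n = (size_t)(v.past - v.first);
    return strlen(s) == n && memcmp(v.first, s, n) == 0;
}
```

With a zero-terminated string the equivalent trick only works for tails (`s + k`), since a head or middle cannot be terminated without writing into, or copying, the original.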
Options 2 and 3 make it fast to find the length. They also allow any
value (i.e. including zero) to be part of the string.
So: Which of those should a compiler support? Should it support more
than one form? If so, should the language allow the programmer to
specify which form to use on any particular string?
If that's not complicated enough, the above essentially considers
strings whose contents could be read-only or read-write but whose
lengths don't change. If the lengths can change then there are
additional issues of storage management. Eek! ;)
On 2022-10-24 12:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
1. There's the C way, of course, of reserving one value (zero) and
using it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and
past (where 'past' means first plus length such that the second
pointer points one position beyond the last character).
Any others?
4. String body only. The constraints are known outside.
This is the way string slices and fixed-length strings are implemented.
In the latter case the compiler knows the string's bounds (first and last indices and thus the length). In the former case the compiler passes a "string dope" along with the naked body. The dope contains the bounds.
This has an effect on pointers. E.g. if you want slices and efficient
raw strings you must distinguish pointers to definite (constrained) vs. indefinite (unconstrained) objects of the same type.
E.g. in Ada you cannot take an indefinite string pointer to a fixed-length
string because there are no bounds. If you wanted that feature you would use a "fat pointer" to carry the bounds with it.
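C has no constrained and unconstrained subtypes in Ada's sense, but a loose analogue, assuming byte strings, looks like this. A pointer to a fixed-length array carries its bound in its type, so it is a plain pointer; a pointer that must work for any length has to carry the bound with it, i.e. a fat pointer. `FixedName`, `FatPtr` and the functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { NAME_LEN = 8 };

/* Definite (constrained): the bound is part of the type, so a plain
   pointer suffices and no bound is stored at run time. */
typedef char FixedName[NAME_LEN];

size_t fixed_len(FixedName *p) {
    return sizeof *p;   /* known at compile time: NAME_LEN */
}

/* Indefinite (unconstrained): the bound travels with the pointer. */
typedef struct {
    const char *data;
    size_t len;
} FatPtr;

/* Going from definite to indefinite means building the fat pointer,
   filling in the bound the compiler already knew. */
FatPtr fat_from_fixed(FixedName *p) {
    FatPtr f = { *p, sizeof *p };
    return f;
}
```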
This is similar to atomic and volatile objects and pointers to them. The
mechanics are the same. You cannot take a general-purpose pointer to an
atomic object, because the client code would not know that it should
take care upon dereferencing.
James Harris <james.harris.1@gmail.com> writes:
So: Which of those should a compiler support? Should it support more
than one form? If so, should the language allow the programmer to
specify which form to use on any particular string?
I think the idea of C is to leave it up to the programmer.
The C string literals and functions are just some kind of
suggestion, and they help to provide basic services, such
as printing some text to the terminal. But otherwise, the
programmer is free to implement his own string type(s) or
use string libraries.
The choice depends on the expected type of use. For example,
some ways to store strings are known as "ropes" (Hans J Boehm,
1994), others are known as "gap buffers". A text editor
might simultaneously use ropes for its text buffers and
C strings for filenames.
The crucial thing for allowing programmers to implement
their own string type is that the language is fast enough
to do this with little overhead compared to an implementation
of strings in the language itself. Implementing custom
string representations in slow languages might not be feasible.
On 24/10/2022 11:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
For lower level strings, I'd highly recommend using zero-terminated
strings, or using them as the basis, or at least having it as an option.
This is not the 'C way', as I'd long used this outside of C and Unix
(e.g. in DEC assembly, and in my own stuff for at least a decade before I first dealt with C).
I still use them. Among many advantages, such as sheer simplicity,
they allow you to make direct use of innumerable APIs that specify such
strings.
They can be used in contexts such as the compact string fields of
structs, since the only overhead is allowing space for that terminator.
The next step up, in lower-level code, is to use a slice. This is a
(pointer, length) descriptor. Here no terminator is necessary, and
strings can also contain embedded zeros (so they can hold any binary data).
String slices can point into another string (allowing sharing), or into another slice, or into a regular zero-terminated string.
However to call an API function expecting a zero-terminated string
('stringz', as I sometimes call it), the pointer is not enough: you need
to ensure there's a zero following those <length> characters!
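One way to handle that in C, sketched under the assumption that a temporary copy is acceptable: copy the slice into a buffer with room for the terminator. `Slice` and `slice_to_stringz` are hypothetical names. The embedded-zero check is there because a zero-terminated API would silently see a truncated string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t len;
} Slice;   /* (pointer, length); no terminator required */

/* Copy the slice into a fresh zero-terminated buffer that can be passed
   to an API expecting a C string. The caller frees the result.
   Returns NULL on allocation failure, or if the slice contains a zero
   byte (the callee would otherwise stop early at it). */
char *slice_to_stringz(Slice s) {
    if (memchr(s.ptr, '\0', s.len) != NULL)
        return NULL;
    char *z = malloc(s.len + 1);
    if (z == NULL)
        return NULL;
    memcpy(z, s.ptr, s.len);
    z[s.len] = '\0';
    return z;
}
```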
Within my dynamic scripting language, I have a full-on counted string
type, with reference counting to manage sharing and allow automatic
memory management.
But it has the same headache when calling low-level FFI
functions that expect C-like strings.
On 24/10/2022 13:07, Dmitry A. Kazakov wrote:
On 2022-10-24 12:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
1. There's the C way, of course, of reserving one value (zero) and
using it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and
past (where 'past' means first plus length such that the second
pointer points one position beyond the last character).
Any others?
4. String body only. The constraints are known outside.
This is the way string slices and fixed-length strings are
implemented. In the latter case the compiler knows the string's bounds
(first and last indices and thus the length). In the former case the
compiler passes a "string dope" along with the naked body. The dope
contains the bounds.
That doesn't seem meaningfully different from case 3. To be clear, case
3 would be represented by, in addition to the bytes of the string,
struct
    first: pointer to first byte of string
    past: pointer to byte after last byte of string
    .... other fields ....
end struct
The string length would be past - first. The bytes of the string would
be those pointed at (which I presume is what you are calling the naked
body).
This has an effect on pointers. E.g. if you want slices and efficient
raw strings you must distinguish pointers to definite (constrained)
vs. indefinite (unconstrained) objects of the same type.
E.g. in Ada you cannot take an indefinite string pointer to a fixed-length
string because there are no bounds. If you wanted that feature
you would use a "fat pointer" to carry the bounds with it.
Any reason you'd recommend against storing bounds as in the struct, above?
This is similar to atomic and volatile objects and pointers to them. The
mechanics are the same. You cannot take a general-purpose pointer to an
atomic object, because the client code would not know that it should
take care upon dereferencing.
I am not sure what that means. I guess the point you are making is that
there are levels of classification which don't affect the data type but
they do affect how it can be accessed - with the language needing to
prevent a reference weakening the storage model. For example, a
read-write reference to a substring should be prevented from being used
to access part of a string which is supposed to be read-only.
Potential operations on string structures:
* allocate a new string
* create a slice (view) of an existing string
* index into a string
On 24/10/2022 15:28, Bart wrote:
On 24/10/2022 11:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
..
For lower level strings, I'd highly recommend using zero-terminated
strings, or using them as the basis, or at least having it as an option.
They certainly seem easiest to work with although they do have
limitations such as:
* cannot include a character with the encoding of zero (as you say)
* must be scanned to determine length
* awkward to add to or delete from the end of as they don't carry any
data about whether the memory immediately following is available or not
The next step up, in lower-level code, is to use a slice. This is a
(pointer, length) descriptor. Here no terminator is necessary, and
strings can also contain embedded zeros (so they can hold any
binary data).
String slices can point into another string (allowing sharing), or
into another slice, or into a regular zero-terminated string.
That's more universal and therefore perhaps the best to implement if
only one scheme is to be available.
However, the memory behind such slices would be hard to manage. Instead of just (first, length)
or (first, past) perhaps one would need something like
struct
    first: pointer to first element
    past: pointer just past last element
    count: number of slices pointing to this slice/string
    base: the parent string or memory
    flags: various
end struct
The base field would refer to the string object we were a slice of or,
if we were not a slice but the base string, the memory area in which the string was stored.
The flags would indicate whether the string/slice could have its
contents changed and whether it could have its length changed, whether
the contents could be moved in memory, etc.
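A C sketch of a descriptor along those lines. It assumes `count` is an ordinary reference count that includes the owner's own reference (a slightly different convention from counting only slices, chosen because it makes release straightforward), and it splits `base` into a parent pointer plus a separate `memory` pointer. All names and flag values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits. */
enum {
    STR_MUTABLE   = 1 << 0,   /* contents may be changed */
    STR_RESIZABLE = 1 << 1,   /* length may be changed */
    STR_MOVABLE   = 1 << 2    /* body may be relocated in memory */
};

typedef struct StrDesc StrDesc;
struct StrDesc {
    char *first;      /* first element */
    char *past;       /* just past the last element */
    size_t count;     /* references to this descriptor (owner + slices) */
    StrDesc *base;    /* parent string for a slice, NULL for a base string */
    char *memory;     /* body allocation for a base string, NULL for a slice */
    unsigned flags;
};

/* Allocate a base string holding a copy of len bytes from src. */
StrDesc *str_new(const char *src, size_t len, unsigned flags) {
    StrDesc *d = malloc(sizeof *d);
    char *body = malloc(len ? len : 1);
    if (d == NULL || body == NULL) {
        free(d);
        free(body);
        return NULL;
    }
    memcpy(body, src, len);
    d->first = body;
    d->past = body + len;
    d->count = 1;
    d->base = NULL;
    d->memory = body;
    d->flags = flags;
    return d;
}

/* Create a slice [from, to) of s without copying the characters. The
   parent's count is bumped so its body outlives the slice. A slice
   inherits the parent's flags but is never resizable. */
StrDesc *str_slice(StrDesc *s, size_t from, size_t to) {
    assert(from <= to && to <= (size_t)(s->past - s->first));
    StrDesc *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->first = s->first + from;
    d->past = s->first + to;
    d->count = 1;
    d->base = s;
    d->memory = NULL;
    d->flags = s->flags & ~(unsigned)STR_RESIZABLE;
    s->count++;
    return d;
}

/* Drop one reference. A descriptor whose count reaches zero is freed
   (with its body, if it owns one) and its hold on the parent dropped. */
void str_release(StrDesc *d) {
    while (d != NULL && --d->count == 0) {
        StrDesc *parent = d->base;
        free(d->memory);   /* no-op for slices, whose memory is NULL */
        free(d);
        d = parent;
    }
}
```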
However to call an API function expecting a zero-terminated string
('stringz', as I sometimes call it), the pointer is not enough: you
need to ensure there's a zero following those <length> characters!
Within my dynamic scripting language, I have a full-on counted string
type, with reference counting to manage sharing and allow automatic
memory management.
What fields did you use to manage such stuff? Am I on the right lines
with the ideas above?
But it has the same headache when calling low-level FFI functions that
expect C-like strings.
Just a thought: ensure there is always at least one more byte of memory
than the string requires and put a zero byte at the end of the string
before calling any function which expects a C-like string. (User responsibility to ensure there are no zero bytes embedded in the string.)
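A sketch of that suggestion in C, assuming byte strings: the type always keeps one spare byte beyond the length, so producing a C string is just a matter of writing the terminator, with no copy. `ZStr`, `zstr_init` and `zstr_cstr` are hypothetical names, and the caller still has to rule out embedded zero bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A string that always reserves one byte beyond its length. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;   /* allocated bytes; invariant: cap >= len + 1 */
} ZStr;

int zstr_init(ZStr *s, const char *src, size_t len) {
    s->data = malloc(len + 1);
    if (s->data == NULL)
        return -1;
    memcpy(s->data, src, len);
    s->len = len;
    s->cap = len + 1;
    return 0;
}

/* Write the terminator into the spare byte and return a C string.
   No copy is made; an embedded zero byte would make the callee see a
   truncated string. */
const char *zstr_cstr(ZStr *s) {
    s->data[s->len] = '\0';
    return s->data;
}
```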
On 2022-10-26 21:43, James Harris wrote:
On 24/10/2022 13:07, Dmitry A. Kazakov wrote:
On 2022-10-24 12:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
1. There's the C way, of course, of reserving one value (zero) and
using it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and
past (where 'past' means first plus length such that the second
pointer points one position beyond the last character).
Any others?
4. String body only. The constraints are known outside.
This is the way string slices and fixed-length strings are
implemented. In the latter case the compiler knows the string's bounds
(first and last indices and thus the length). In the former case the
compiler passes a "string dope" along with the naked body. The dope
contains the bounds.
That doesn't seem meaningfully different from case 3. To be clear,
case 3 would be represented by, in addition to the bytes of the string,
struct
    first: pointer to first byte of string
    past: pointer to byte after last byte of string
    .... other fields ....
end struct
The string length would be past - first. The bytes of the string would
be those pointed at (which I presume is what you are calling the naked
body).
That is the structure of a string dope, not the string itself, unless
you have the body in other fields, but then why would you need pointers?
To clarify terms. String representation must include the string body if
we are talking about values of strings. Things like pointers and dope vectors are references to a string, not strings. You can pass a string by reference, sure. But the string value is somewhere else.
What you pass is not a string; it is a substitute.
This has an effect on pointers. E.g. if you want slices and efficient
raw strings you must distinguish pointers to definite (constrained)
vs. indefinite (unconstrained) objects of the same type.
E.g. in Ada you cannot take an indefinite string pointer to a fixed-length
string because there are no bounds. If you wanted that feature
you would use a "fat pointer" to carry the bounds with it.
Any reason you'd recommend against storing bounds as in the struct,
above?
Start with interoperability of strings and slices of them. The crucial requirements would be:
A slice can be passed to a subprogram expecting a string without copying.
Consider efficiency and low-level, close-to-hardware stuff:
Aggregation of strings with known bounds does not require storing them.
E.g. you can have arrays of fixed length strings (like an image buffer).
If a member of a structure is a fixed length string, no bounds are
stored. A pointer to a fixed length string is a plain pointer etc.
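In C terms, assuming byte strings whose length is fixed at compile time, this is simply a `char` array member or an array of `char` arrays: the bound lives in the type, and `sizeof` shows that nothing but the characters is stored. `Entry`, `table` and `code_eq` are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { CODE_LEN = 4 };

/* A record whose member is a fixed-length string: no bounds are stored;
   the compiler knows the length from the type. */
typedef struct {
    char code[CODE_LEN];   /* exactly CODE_LEN chars, no terminator */
    int value;
} Entry;

/* An array of fixed-length strings, laid out back to back like the
   rows of an image buffer. */
static char table[3][CODE_LEN] = {
    {'A', 'B', 'C', 'D'}, {'E', 'F', 'G', 'H'}, {'I', 'J', 'K', 'L'}
};

/* Compare a fixed-length field with a C string of the same length. */
int code_eq(const char code[CODE_LEN], const char *s) {
    return strlen(s) == CODE_LEN && memcmp(code, s, CODE_LEN) == 0;
}
```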
This is similar to atomic and volatile objects and pointers to them. The
mechanics are the same. You cannot take a general-purpose pointer to an
atomic object, because the client code would not know that it should
take care upon dereferencing.
I am not sure what that means. I guess the point you are making is
that there are levels of classification which don't affect the data
type but they do affect how it can be accessed - with the language
needing to prevent a reference weakening the storage model. For
example, a read-write reference to a substring should be prevented
from being used to access part of a string which is supposed to be
read-only.
Yes, it is a type constraint. There are all sorts of constraints one
could put on a type in order to produce a constrained subtype.
Constraining limits operations, e.g. immutability removes mutators. It
also directs certain implementations like using locking instructions or dropping known bounds.
On 27/10/2022 08:28, Dmitry A. Kazakov wrote:
On 2022-10-26 21:43, James Harris wrote:
On 24/10/2022 13:07, Dmitry A. Kazakov wrote:
On 2022-10-24 12:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
1. There's the C way, of course, of reserving one value (zero) and
using it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and
past (where 'past' means first plus length such that the second
pointer points one position beyond the last character).
Any others?
4. String body only. The constraints are known outside.
This is the way string slices and fixed-length strings are
implemented. In the latter case the compiler knows the string's bounds
(first and last indices and thus the length). In the former case the
compiler passes a "string dope" along with the naked body. The dope
contains the bounds.
That doesn't seem meaningfully different from case 3. To be clear,
case 3 would be represented by, in addition to the bytes of the string,
struct
    first: pointer to first byte of string
    past: pointer to byte after last byte of string
    .... other fields ....
end struct
The string length would be past - first. The bytes of the string
would be those pointed at (which I presume is what you are calling
the naked body).
That is the structure of a string dope, not the string itself, unless
you have the body in other fields, but then why would you need pointers?
Curious use of terms. I presume that by "dope" you mean a dope vector
which can also be called a control block or a descriptor.
As for this specific case, the same information can be conveyed in
different ways: (start, length), (start, memsize), (first, last),
(first, past). I chose the latter as it should be slightly faster than
the others and does not run into problems when the elements are other
than single bytes.
To explain, for common operations (alignbits being log2 of the element size):
    memsize() = past - first
    length() = memsize() >> alignbits
    forward iteration proceeds while address < past
    backward iteration proceeds while address >= first
The only stipulation is that the body must not be allocated at the very
top or bottom of the addressable range.
Using (first, past) should be as simple as that. By contrast, the
similar (first, last) runs into a slight problem when elements are wider
than single bytes: should the last pointer point to the start or the end
of the last item?
The others, which involve memsize or length, make it slightly slower to
judge the limits of iteration in the general case, requiring a
calculation to see if a pointer is outside the limits of the string
being referred to.
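The (first, past) operations listed above, sketched in C for 32-bit characters. `Str32` and the `str_*` functions are hypothetical, and `ALIGNBITS` assumes a power-of-two element size. One C-specific adjustment: standard C guarantees that a pointer one past the end of an array can be formed and compared, but not a pointer below its start, so the backward loop tests `p > first` before decrementing rather than testing `address >= first` afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A (first, past) string of 32-bit characters. */
typedef struct {
    const uint32_t *first;
    const uint32_t *past;
} Str32;

enum { ALIGNBITS = 2 };   /* log2(sizeof(uint32_t)) */
_Static_assert(sizeof(uint32_t) == (1u << ALIGNBITS),
               "ALIGNBITS must match the element size");

/* memsize() = past - first, measured in bytes. */
size_t str_memsize(Str32 s) {
    return (size_t)((const char *)s.past - (const char *)s.first);
}

/* length() = memsize() >> alignbits. */
size_t str_length(Str32 s) {
    return str_memsize(s) >> ALIGNBITS;
}

/* Forward iteration proceeds while address < past.
   Returns the index of the first c, or the length if absent. */
size_t str_find_first(Str32 s, uint32_t c) {
    for (const uint32_t *p = s.first; p < s.past; p++)
        if (*p == c)
            return (size_t)(p - s.first);
    return str_length(s);
}

/* Backward iteration: test p > first, then decrement, so no pointer
   below first is ever formed. Returns the length if c is absent. */
size_t str_find_last(Str32 s, uint32_t c) {
    for (const uint32_t *p = s.past; p > s.first; ) {
        --p;
        if (*p == c)
            return (size_t)(p - s.first);
    }
    return str_length(s);
}
```

In C, `s.past - s.first` already yields an element count, so the byte-level shift is only needed when addresses are treated as raw byte addresses, as in the post.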
To clarify terms. String representation must include the string body
if we are talking about values of strings. Things like pointers
and dope vectors are references to a string, not strings. You can
pass a string by reference, sure. But the string value is somewhere
else. What you pass is not a string; it is a substitute.
That depends, surely, on how "a string" is defined. If strings are
defined as descriptors starting with the fields first and past then the bodies of such strings can be elsewhere. (There would be other fields of
a string descriptor to assist with memory management and probably some
flags, though I am open to suggestions as to what those fields should be.)
This has an effect on pointers. E.g. if you want slices and
efficient raw strings you must distinguish pointers to definite
(constrained) vs. indefinite (unconstrained) objects of the same type.
E.g. in Ada you cannot take an indefinite string pointer to a fixed-length
string because there are no bounds. If you wanted that feature
you would use a "fat pointer" to carry the bounds with it.
Any reason you'd recommend against storing bounds as in the struct,
above?
Start with interoperability of strings and slices of them. The crucial
requirements would be:
A slice can be passed to a subprogram expecting a string without
copying.
Indeed, that's a major benefit of slices, IMO, being able to pass
something which looks and acts like a string but which doesn't need the elements of the string to be copied.
That said, a slice would probably have a length which the callee can determine but which the callee cannot change. I presume that's what
you'd call a constraint.
If a callee wanted to be able to change the length of a string then it
would have to be passed a real string, not a slice.
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be altered
by the callee. (Could be a real string or a slice.)
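A rough C mapping of those three kinds, assuming byte strings; the type and function names are hypothetical. Kinds 2 and 3 can both be plain views that differ only in `const`, while kind 1 needs the owning object so that the callee can grow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Kind 3, read-only: neither length nor contents may change. */
typedef struct { const char *ptr; size_t len; } ConstSlice;

/* Kind 2, read-write fixed-length: contents may change, length may not. */
typedef struct { char *ptr; size_t len; } MutSlice;

/* Kind 1, read-write string: the callee may also change the length, so
   it must receive the owning object (with its capacity), not a view. */
typedef struct { char *ptr; size_t len; size_t cap; } OwnedStr;

size_t count_char(ConstSlice s, char c) {
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.ptr[i] == c)
            n++;
    return n;
}

void to_upper_ascii(MutSlice s) {
    for (size_t i = 0; i < s.len; i++)
        if (s.ptr[i] >= 'a' && s.ptr[i] <= 'z')
            s.ptr[i] = (char)(s.ptr[i] - 'a' + 'A');
}

/* Append n bytes, growing the allocation geometrically when needed. */
int append(OwnedStr *s, const char *src, size_t n) {
    if (s->len + n > s->cap) {
        size_t cap = s->cap ? s->cap : 8;
        while (cap < s->len + n)
            cap *= 2;
        char *p = realloc(s->ptr, cap);
        if (p == NULL)
            return -1;
        s->ptr = p;
        s->cap = cap;
    }
    memcpy(s->ptr + s->len, src, n);
    s->len += n;
    return 0;
}
```

C's `const` only goes so far: a cast can turn read-only access into writable access, which is the kind of weakening a language could choose to forbid.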
James Harris <james.harris.1@gmail.com> writes:
Potential operations on string structures:
* allocate a new string
* create a slice (view) of an existing string
* index into a string
Many such operations are provided by the standard library
of C++. You could have a look at its implementation. One might
even think of kinda "backporting" it to C. Or use C++.
Suggested Video: "The strange details of std::string at
Facebook" - Nicholas Ormrod (2016)
On 26/10/2022 21:33, James Harris wrote:
On 24/10/2022 15:28, Bart wrote:
On 24/10/2022 11:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
String slices can point into another string (allowing sharing), or
into another slice, or into a regular zero-terminated string.
That's more universal and therefore perhaps the best to implement if
only one scheme is to be available.
Most strings are fixed-length once created; strings that can grow are
rare. You don't need a 'capacity' field, for example (like the one C++'s
std::vector has).
But managing memory can still be an issue because you don't know if a particular slice owns its memory, or points to a string literal, or
points into a shared string, or points to external memory.
Within my dynamic scripting language, I have a full-on counted string
type, with reference counting to manage sharing and allow automatic
memory management.
What fields did you use to manage such stuff? Am I on the right lines
with the ideas above?
The structure I use is not lightweight because it is for interpreted
code. The following object descriptor is a 32-byte record, used for all objects. I've shown only the fields used by string objects:
record objrec =
    u32 refcount
    byte mutable          # 1 for mutable strings
    byte objtype
    u16 dummy
    ichar strptr          # (ref char)
    u64 length
    union
        u64 alloc64
        object objptr2    # (ref objptr)
    end
end
The string data itself is separate, pointed to by 'strptr'. This is nil
when the length is zero (it doesn't point to ""). It is not
zero-terminated (unless an external slice happens to be).
Most strings are mutable, in which case .alloc64 gives the capacity of
the allocation.
An important field is objtype; its values are:
    Normal      Regular string (uses alloc64)
    Slice       Slice into another string (uses objptr2)
    Extslice    String data lies outside the object scheme
Note that if you take those 32 bytes, then the middle 16 bytes (.strptr
and .length fields) correspond to a raw Slice as used in my lower level language.
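A rough C transcription of that record, assuming a 64-bit target with 8-byte pointers and reading `ichar` as a `char` pointer. The numeric `objtype` values and the `RawSlice` and `middle_as_slice` names are hypothetical. The static assertions check the 32-byte size and that the middle 16 bytes have the shape of a slice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(void *) == 8, "this sketch assumes 64-bit pointers");

/* Hypothetical numeric values for the objtype field. */
enum { OBJ_NORMAL = 0, OBJ_SLICE = 1, OBJ_EXTSLICE = 2 };

typedef struct objrec objrec;
struct objrec {
    uint32_t refcount;
    uint8_t mutable;      /* 1 for mutable strings */
    uint8_t objtype;      /* OBJ_NORMAL, OBJ_SLICE or OBJ_EXTSLICE */
    uint16_t dummy;
    char *strptr;         /* string data; NULL when length is zero */
    uint64_t length;
    union {
        uint64_t alloc64; /* capacity, for normal strings */
        objrec *objptr2;  /* owning object, for slices */
    };
};

/* The raw (pointer, length) slice of the lower-level language. */
typedef struct {
    char *ptr;
    uint64_t length;
} RawSlice;

_Static_assert(sizeof(objrec) == 32, "objrec should be 32 bytes");
_Static_assert(offsetof(objrec, strptr) == 8, "strptr should start at byte 8");
_Static_assert(sizeof(RawSlice) == 16, "RawSlice should be 16 bytes");

/* Reinterpret the middle 16 bytes of the record as a RawSlice. */
RawSlice middle_as_slice(const objrec *o) {
    RawSlice s;
    memcpy(&s, (const char *)o + offsetof(objrec, strptr), sizeof s);
    return s;
}
```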
On 29/10/2022 12:23, James Harris wrote:
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be
altered by the callee. (Could be a real string or a slice.)
4. Extensible string. This is not quite the same as your (1) which
requires only a mutable string.
You can mutate a string (alter individual characters) without needing to
know the overall length or its allocated capacity.
(You might further split that into mutable/non-mutable extensible
strings. Usually if growing a string by appending to it, you don't want
to also alter existing parts of the string.)
(You probably need to consider Unicode strings too, especially if
represented as UTF8, as the meaning of 'length' needs pinning down.)
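To illustrate why 'length' needs pinning down, here is a C sketch that counts code points in a byte sequence, assuming the input is already valid UTF-8 (it does no validation). For "café" the byte length is 5 but the code point count is 4; user-perceived characters (grapheme clusters) are yet another measure, not handled here. `utf8_codepoints` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in valid UTF-8: every byte that is not a
   continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_codepoints(const unsigned char *s, size_t nbytes) {
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}
```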
On 29/10/2022 13:24, Bart wrote:
On 29/10/2022 12:23, James Harris wrote:
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be
altered by the callee. (Could be a real string or a slice.)
4. Extensible string. This is not quite the same as your (1) which
requires only a mutable string.
You mean a string which can be made longer but the existing contents
could not be changed? I cannot think of a use case for that.
You can mutate a string (alter individual characters) without needing
to know the overall length or its allocated capacity.
Wouldn't you need to know how long the string was so that a callee could
make sure it was trying to modify characters within the string rather
than memory locations outside it?
(You might further split that into mutable/non-mutable extensible
strings. Usually if growing a string by appending to it, you don't
want to also alter existing parts of the string.)
Mutable and extensible are good descriptions though as above I don't yet
see the value in allowing a string to be extensible but its existing
contents to be immutable.
A slice would be inextensible but could be mutable or immutable, AISI.
(You probably need to consider Unicode strings too, especially if
represented as UTF8, as the meaning of 'length' needs pinning down.)
I haven't mentioned it but ATM my chars are 32-bit and any 32-bit value
can be stored in them, including zero. It also means there's no way to reserve a value for EOF so that condition has to be handled a different
way from what C programmers are used to where EOF is a value which is
outside the range permitted for chars. Challenges aplenty!
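One way to handle end-of-stream when every 32-bit value is a valid character, sketched in C: report it out of band through the return value rather than as a reserved character value. `CharStream` and `next_char` are hypothetical. An alternative in the spirit of C's `getchar`, which returns `int` so that `EOF` lies outside the `unsigned char` range, would be to return a wider type such as `int64_t` with -1 meaning end of stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An in-memory stream of 32-bit characters. */
typedef struct {
    const uint32_t *data;
    size_t len;
    size_t pos;
} CharStream;

/* Every uint32_t value is a legal character, so end of stream cannot be
   signalled in-band; it is reported through the return value instead. */
bool next_char(CharStream *s, uint32_t *out) {
    if (s->pos >= s->len)
        return false;   /* end of stream; *out is left unchanged */
    *out = s->data[s->pos++];
    return true;
}
```

A reading loop then looks like `while (next_char(&s, &c)) { ... }`.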
On 29/10/2022 15:16, James Harris wrote:
On 29/10/2022 13:24, Bart wrote:
On 29/10/2022 12:23, James Harris wrote:
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be
altered by the callee. (Could be a real string or a slice.)
4. Extensible string. This is not quite the same as your (1) which
requires only a mutable string.
You mean a string which can be made longer but the existing contents
could not be changed? I cannot think of a use case for that.
That's a pattern I used all the time to incrementally build strings, for example to generate C or ASM source files from a language app.
Or it can be as simple as this:
errormess +:= " on line "+tostr(linenumber)
Once extended, the existing parts of the string are never modified.
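A C sketch of that append-only pattern, assuming byte strings: the builder can grow, but its interface offers no way to change characters already written. `Builder`, `builder_append`, `builder_append_int` and `builder_str` are hypothetical names, and doubling the capacity is one common growth policy, not the only one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* An append-only string: it can grow, but nothing already written
   can be modified through this interface. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} Builder;

int builder_append(Builder *b, const char *s) {
    size_t n = strlen(s);
    if (b->len + n + 1 > b->cap) {        /* +1 keeps room for a terminator */
        size_t cap = b->cap ? b->cap : 16;
        while (cap < b->len + n + 1)
            cap *= 2;                     /* geometric growth: amortised O(1) appends */
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, n + 1);   /* copies the terminator too */
    b->len += n;
    return 0;
}

/* Append a decimal integer, playing the role of tostr() in the example. */
int builder_append_int(Builder *b, long v) {
    char buf[32];
    snprintf(buf, sizeof buf, "%ld", v);
    return builder_append(b, buf);
}

/* Read-only access to what has been built so far. */
const char *builder_str(const Builder *b) {
    return b->data ? b->data : "";
}
```

The equivalent of the `errormess` line would be `builder_append(&b, " on line ")` followed by `builder_append_int(&b, linenumber)`.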
Perhaps you can give an example of where mutating the characters of a
string, extensible or otherwise, comes in useful.
(My strings generally are mutable, but it's not a feature I use a great
deal.
For applications like text editors, I use a list of strings, one per
line, and editing within a line creates a new string for each edit. Efficiency here is not critical, and the needs are diverse, like
deleting within the string, or insertion. It's just easier to construct
a new one.)
(You might further split that into mutable/non-mutable extensible
strings. Usually if growing a string by appending to it, you don't
want to also alter existing parts of the string.)
Mutable and extensible are good descriptions though as above I don't
yet see the value in allowing a string to be extensible but its
existing contents to be immutable.
A slice would be inextensible but could be mutable or immutable, AISI.
(You probably need to consider Unicode strings too, especially if
represented as UTF8, as the meaning of 'length' needs pinning down.)
I haven't mentioned it but ATM my chars are 32-bit and any 32-bit
value can be stored in them, including zero. It also means there's no
way to reserve a value for EOF so that condition has to be handled a
different way from what C programmers are used to where EOF is a value
which is outside the range permitted for chars. Challenges aplenty!
But you're not using all 2**32 bit patterns? It could reserve -1 or all
1s for EOF just like C does, since EOF would generally be used for character-at-a-time streaming, which is typically 8-bit anyway.
Or have you developed a binary file system which works with 32-bit-wide 'bytes'?
On 27/10/2022 08:28, Dmitry A. Kazakov wrote:
On 2022-10-26 21:43, James Harris wrote:
On 24/10/2022 13:07, Dmitry A. Kazakov wrote:
On 2022-10-24 12:31, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of
characters to be stored?
1. There's the C way, of course, of reserving one value (zero) and
using it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and
past (where 'past' means first plus length such that the second
pointer points one position beyond the last character).
Any others?
4. String body only. The constraints are known outside.
This is the way string slices and fixed-length strings are
implemented. In the latter case the compiler knows the string's bounds
(first and last indices and thus the length). In the former case the
compiler passes a "string dope" along with the naked body. The dope
contains the bounds.
That doesn't seem meaningfully different from case 3. To be clear,
case 3 would be represented by, in addition to the bytes of the string,
struct
    first: pointer to first byte of string
    past: pointer to byte after last byte of string
    .... other fields ....
end struct
The string length would be past - first. The bytes of the string
would be those pointed at (which I presume is what you are calling
the naked body).
That is the structure of a string dope, not the string itself, unless
you have the body in other fields, but then why would you need pointers?
Curious use of terms. I presume that by "dope" you mean a dope vector
which can also be called a control block or a descriptor.
As for this specific case, the same information can be conveyed in
different ways: (start, length), (start, memsize), (first, last),
(first, past). I chose the latter as it should be slightly faster than
the others and does not run into problems when the elements are other
than single bytes.
Using (first, past) should be as simple as that. By contrast, the
similar (first, last) runs into a slight problem when elements are wider
than single bytes: should the last pointer point to the start or the end
of the last item?
To clarify terms. String representation must include the string body
if we are talking about values of strings. Things like pointers
and dope vectors are references to a string, not strings. You can
pass a string by reference, sure. But the string value is somewhere
else. What you pass is not a string; it is a substitute.
That depends, surely, on how "a string" is defined.
This has an effect on pointers. E.g. if you want slices and
efficient raw strings you must distinguish pointers to definite
(constrained) vs. indefinite (unconstrained) objects of the same type.
E.g. in Ada you cannot take an indefinite string pointer to a fixed-length
string because there are no bounds. If you wanted that feature
you would use a "fat pointer" to carry the bounds with it.
Any reason you'd recommend against storing bounds as in the struct,
above?
Start with interoperability of strings and slices of them. The crucial
requirements would be:
A slice can be passed to a subprogram expecting a string without
copying.
Indeed, that's a major benefit of slices, IMO, being able to pass
something which looks and acts like a string but which doesn't need the elements of the string to be copied.
That said, a slice would probably have a length which the callee can determine but which the callee cannot change. I presume that's what
you'd call a constraint.
If a callee wanted to be able to change the length of a string then it
would have to be passed a real string, not a slice.
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be altered
by the callee. (Could be a real string or a slice.)
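One way to map these argument kinds onto C types, as a sketch only; all names are hypothetical. A read-only view and a fixed-length read-write span are both plain (first, past) pairs that a slice can supply, while kind 1 would need a pointer to the owning descriptor because lengthening may move the body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kind 3, read-only: neither length nor contents may change. */
typedef struct { const char *first, *past; } str_view;

/* Kind 2, read-write fixed length: contents may change, bounds may not. */
typedef struct { char *first, *past; } str_span;

/* Kind 1 (not shown) would pass a pointer to the owning descriptor,
   because lengthening the string may move its body. */

/* A slice of a larger buffer: no copying, just two pointers. */
static str_span span_of(char *buf, size_t lo, size_t n) {
    str_span s = { buf + lo, buf + lo + n };
    return s;
}

/* A writable span can always be narrowed to a read-only view. */
static str_view view_of(str_span s) {
    str_view v = { s.first, s.past };
    return v;
}

/* Read-only use: count occurrences of c. */
static size_t view_count(str_view v, char c) {
    size_t n = 0;
    for (const char *p = v.first; p != v.past; ++p)
        n += (*p == c);
    return n;
}

/* Fixed-length read-write use: upper-case ASCII letters in place. */
static void span_upper(str_span s) {
    assert(s.first <= s.past);
    for (char *p = s.first; p != s.past; ++p)
        if (*p >= 'a' && *p <= 'z')
            *p = (char)(*p - 'a' + 'A');
}
```

Note that C's const only stops writes through the view; it does not promise that nobody else changes the contents.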
Consider efficiency and low-level close to hardware stuff:
Aggregation of strings with known bounds does not require storing
them.
E.g. you can have arrays of fixed length strings (like an image
buffer). If a member of a structure is a fixed length string, no
bounds are stored. A pointer to a fixed length string is a plain
pointer etc.
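In C terms, and only as an illustration with hypothetical names: a fixed-length string member or array element carries its bound in the type, so nothing extra is stored and a pointer to it is a plain pointer, whereas an indefinite string needs a fat pointer that carries the bound at run time.

```c
#include <assert.h>
#include <stddef.h>

/* The length 16 is part of the type; nothing extra is stored. */
typedef char name16[16];

struct record {
    name16 name;    /* fixed-length member: no bounds field */
    int id;
};

static name16 table[100];   /* exactly 100 * 16 bytes */

/* A pointer to a fixed-length string is thin: the callee knows the
   length statically, as sizeof *p. */
static size_t fixed_len(name16 *p) {
    assert(p != NULL);
    return sizeof *p;
}

/* An indefinite string needs a fat pointer carrying its bound. */
typedef struct {
    const char *data;
    size_t len;
} fat_str;

static fat_str fat_from_fixed(name16 *p) {
    fat_str f = { *p, sizeof *p };
    return f;
}
```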
You mean the string bounds could be known at compile time, say, rather
than at run time. Good point. Any suggestions on how that should be implemented?
This is similar to atomic and volatile objects and pointers to them. The
mechanics are the same. You cannot take a general-purpose pointer to an
atomic object, because the client code would not know that it should
take care upon dereferencing.
I am not sure what that means. I guess the point you are making is
that there are levels of classification which don't affect the data
type but they do affect how it can be accessed - with the language
needing to prevent a reference weakening the storage model. For
example, a read-write reference to a substring should be prevented
from being used to access part of a string which is supposed to be
read-only.
Yes, it is a type constraint. There are all sorts of constraints one
could put on a type in order to produce a constrained subtype.
Constraining limits operations, e.g. immutability removes mutators. It
also directs certain implementations like using locking instructions
or dropping known bounds.
Was with you all the way until you mentioned dropping known bounds. What
does that mean? How can it be legitimate to drop any bounds?
On 29/10/2022 16:30, Bart wrote:
Further, functions which /return/ a string would create the string and
return it whole.
It is only functions which /modify/ a string, i.e. take it as an inout parameter, where it would matter whether the string was read/write or extensible. For an inout string, what should the defaults be? If we say
an inout string defaults to immutable and inextensible then that would
lead to the following ways to specify a string, s, as a parameter:
f: function(s: inout string char)
f: function(s: inout string char rw)
f: function(s: inout string char ext rw)
f: function(s: inout string char ext)
Note the "ext" and "rw" attributes. The idea is that they would specify
how the string could be modified in the function. Adding rw would allow
the string's existing contents to be taken as read-write rather than read-only. Adding ext would allow the string to be extended.
That's effectively me thinking out loud and trying out some ideas. How
does it look to you?
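A compiler might treat rw and ext as permission bits and reject a call when the argument lacks a permission the parameter asks for. This is a hypothetical sketch (the flag names and the checking function are assumptions, not part of the proposal), capturing the rule that a caller may hand over a string with more permissions than the callee needs, never fewer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical permission bits for an inout string parameter. */
enum {
    STR_RO  = 0,        /* default: immutable and inextensible */
    STR_RW  = 1u << 0,  /* existing contents may be overwritten */
    STR_EXT = 1u << 1   /* the string may be lengthened */
};

/* An argument is acceptable when it grants every permission the
   parameter asks for; extra permissions are simply not used. */
static bool arg_acceptable(unsigned arg_perms, unsigned param_perms) {
    assert((arg_perms | param_perms) <= (STR_RW | STR_EXT));
    return (arg_perms & param_perms) == param_perms;
}
```

Under this rule a read-only slice could be passed to f(s: inout string char) but not to the rw or ext versions.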
What about other permissions such as prepend, split, insert, delete,
etc?
I intend a string to be simply an array whose length can be changed.
On 27/10/2022 12:14, Bart wrote:
Most strings are fixed-length once created; strings that can grow are
rare. You don't need a 'capacity' field, for example (like C++'s
std::vector type).
Having watched some videos on string storage recently I now think I know
what you mean by the capacity field - basically that a string descriptor would consist of these fields:
start
length
capacity
so that the string could be extended at the end (up to the capacity).
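A minimal C sketch of such a (start, length, capacity) descriptor, with hypothetical names. Appending is cheap while the new length fits in the capacity; only otherwise is the body reallocated, here by doubling, which is one common growth policy among several.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *start;    /* first byte of the body */
    size_t length;   /* bytes in use */
    size_t capacity; /* bytes allocated */
} dstr;

/* Append n bytes; returns 0 on success, -1 if allocation fails. */
static int dstr_append(dstr *s, const char *src, size_t n) {
    assert(s != NULL && (src != NULL || n == 0));
    if (s->length + n > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 8;
        while (cap < s->length + n)
            cap *= 2;
        char *p = realloc(s->start, cap);
        if (p == NULL)
            return -1;
        s->start = p;
        s->capacity = cap;
    }
    memcpy(s->start + s->length, src, n);
    s->length += n;
    return 0;
}
```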
That may be a bit restrictive. A programmer might want to remove or add characters at the beginning rather than just at the end, even though
such would be done less often.
So what do you think of having a string descriptor more like
first
past
memfirst
mempast
where memfirst and mempast would define the allocated space in which the string body would sit.
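Here is one way such a four-pointer descriptor could prepend without moving the body when there is slack in front of first. This is a sketch under assumptions (hypothetical names, no reallocation path); when the slack runs out, a real implementation would have to reallocate or shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char *first, *past;        /* the string body */
    char *memfirst, *mempast;  /* the allocation it sits in */
} str4;

/* Prepend n bytes in place if there is room before first. */
static bool str4_prepend(str4 *s, const char *src, size_t n) {
    assert(s->memfirst <= s->first && s->first <= s->past
           && s->past <= s->mempast);
    if ((size_t)(s->first - s->memfirst) < n)
        return false;          /* caller must reallocate or shift */
    s->first -= n;
    memcpy(s->first, src, n);
    return true;
}

/* Append n bytes in place if there is room after past. */
static bool str4_append(str4 *s, const char *src, size_t n) {
    if ((size_t)(s->mempast - s->past) < n)
        return false;
    memcpy(s->past, src, n);
    s->past += n;
    return true;
}
```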
An important field is objtype; its values are:
  Normal     Regular string (uses alloc64)
  Slice      Slice into another (uses objptr2)
  Extslice   Strings lie outside the object scheme
OK. I may use something like that or, possibly, some flags.
..
Note that if you take those 32 bytes, then the middle 16 bytes
(.strptr and .length fields) correspond to a raw Slice as used in my
lower level language.
Good point. I'd need slices to have the same format as strings and for
both to have flags. As there's no space for flags in the (first, past)
pair I'd need to add a flags word, making the structure
first
past
misc
memfirst
mempast
where misc would store various pieces of information, not just flag
bits. Slices would have only the first three fields. Strings would have
all five. Flags would indicate whether this was a string or a slice.
For me it's too early to optimise, but it's worth noting that even on
64-bit machines the above would occupy only 24 or 40 bytes of a 64-byte
cache line, so short string bodies could be stored in the same line,
again with flags indicating that that was so.
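A sketch of that layout in C, assuming the misc word is a uintptr_t and using hypothetical type names. Embedding the slice as the first member of the string keeps the shared prefix without type punning; on a typical 64-bit target the two come to 24 and 40 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Three-word slice: first, past and a misc word for flags etc. */
typedef struct {
    char *first, *past;
    uintptr_t misc;
} slice3;

/* Five-word string: the slice fields first, then the allocation. */
typedef struct {
    slice3 head;
    char *memfirst, *mempast;
} string5;

/* Hypothetical flag bit in misc: set for strings, clear for slices. */
#define IS_STRING ((uintptr_t)1)

static_assert(sizeof(slice3) == 3 * sizeof(void *), "slice is 3 words");
static_assert(sizeof(string5) == 5 * sizeof(void *), "string is 5 words");

static int is_string(const slice3 *s) {
    return (s->misc & IS_STRING) != 0;
}
```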
On 2022-10-29 13:23, James Harris wrote:
As for this specific case, the same information can be conveyed in
different ways: (start, length), (start, memsize), (first, last),
(first, past). I chose the latter as it should be slightly faster than
the others and does not run into problems when the elements are other
than single bytes.
Yes. The problem with (first, next) is that next could be inexpressible.
Most difficulties arise with strings/arrays over enumerations and
modular types. (first, last) has no such problem.
Both have issues with empty strings, e.g. with a multitude of
representations of them. Compare with the +/-0 problem for
non-two's-complement integers.
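To make the inexpressible-next point concrete: if a string's index type is a modular 8-bit type and the string covers every index 0..255, (first, last) is (0, 255), but past would be 256, which wraps to 0 and makes the full string look empty. A small C sketch with hypothetical helpers, using uint8_t to stand in for a modular index type:

```c
#include <assert.h>
#include <stdint.h>

/* Bounds held in an 8-bit modular index type (arithmetic wraps mod 256). */
typedef struct { uint8_t first, last; } bounds_fl;  /* inclusive */
typedef struct { uint8_t first, past; } bounds_fp;  /* half-open */

/* Length computed in a wider type: 256 is representable here. */
static unsigned len_fl(bounds_fl b) {
    return (unsigned)b.last - b.first + 1u;
}

/* Length computed back in the index type: wraps like the index does. */
static unsigned len_fp(bounds_fp b) {
    return (uint8_t)(b.past - b.first);
}

/* past = first + count, truncated to the index type. */
static bounds_fp make_fp(unsigned first, unsigned count) {
    assert(first <= 255u && count <= 256u);
    bounds_fp b = { (uint8_t)first, (uint8_t)(first + count) };
    return b;
}
```

C pointers do not have this problem for arrays, because the language guarantees that a pointer one past the last element can be formed and compared.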
That said, a slice would probably have a length which the callee can
determine but which the callee cannot change. I presume that's what
you'd call a constraint.
It could be a constraint for fixed length slices.
If a callee wanted to be able to change the length of a string then it
would have to be passed a real string, not a slice.
A callee might be passed a variable-length slice, which, for example, can be enlarged or shortened. Many languages with dynamically allocated strings
have this.
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be
altered by the callee. (Could be a real string or a slice.)
Think of it in terms of constraints. Immutability is a constraint. Fixed length is a constraint. Bounded length is a constraint. Non-sliding
lower bound is a constraint. Non-sliding upper bound is a constraint.
This should cover the whole spectrum. You can express all cases in terms of constraints.
On 29/10/2022 21:17, Dmitry A. Kazakov wrote:
On 2022-10-29 13:23, James Harris wrote:
..
As for this specific case, the same information can be conveyed in
different ways: (start, length), (start, memsize), (first, last),
(first, past). I chose the latter as it should be slightly faster
than the others and does not run into problems when the elements are
other than single bytes.
Yes. The problem with (first, next) is that next could be
inexpressible. Most difficulties arise with strings/arrays over
enumerations and modular types. (first, last) has no such problem.
Both have issues with empty strings, e.g. with a multitude of
representations of them. Compare with the +/-0 problem for
non-two's-complement integers.
That sounds interesting. Do you see multiple representations of the
empty string in the following? Monospacing required. Here's how the
string "abcd" would be stored:
!_a_!_b_!_c_!_d_!
^               ^
!               !
first           past
* so first would point at the first element of the string
* and past would point one cell beyond the last element of the string.
I don't see where you see a multitude of representations of the null
string. AISI the empty string would simply have past equal to first in
all cases.
That said, a slice would probably have a length which the callee can
determine but which the callee cannot change. I presume that's what
you'd call a constraint.
It could be a constraint for fixed length slices.
If a callee wanted to be able to change the length of a string then
it would have to be passed a real string, not a slice.
A callee might be passed a variable-length slice, which, for example,
can be enlarged or shortened. Many languages with dynamically allocated
strings have this.
What is your definition of a slice? Is it /part/ of an underlying string
or is it a /copy/ of part of a string? For example, if
string S = "abcde"
slice T = S[1..3] ;"bcd"
then changes to T would do what to S?
If slice is a view of an underlying string (which is what I had in mind)
then I don't get how you could meaningfully enlarge or shorten it.
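Under the view interpretation, a slice is just a (first, past) pair pointing into S's body, so a write through T lands in S, and T's length is fixed by its bounds. A minimal C sketch with hypothetical names, using the same zero-based indexing as the S[1..3] example:

```c
#include <assert.h>
#include <stddef.h>

/* A slice as a view: two pointers into someone else's body. */
typedef struct { char *first, *past; } slice;

/* View of s[lo..hi], inclusive and zero-based; nothing is copied. */
static slice slice_of(char *s, size_t lo, size_t hi) {
    assert(lo <= hi + 1);          /* lo == hi + 1 gives an empty view */
    slice t = { s + lo, s + hi + 1 };
    return t;
}

/* Writing through the view writes into the underlying string. */
static void slice_set(slice t, size_t i, char c) {
    assert(i < (size_t)(t.past - t.first));
    t.first[i] = c;
}
```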
I guess there would be these kinds of string argument:
1. Read-write string. Anything could be done to the string by the
callee. (Would have to be a real string.)
2. Read-write fixed-length string. The string's contents could be
altered but it could not be made longer or shorter. (Could be a real
string or a slice.)
3. Read-only string. Neither its length nor its contents could be
altered by the callee. (Could be a real string or a slice.)
Think of it in terms of constraints. Immutability is a constraint.
Fixed length is a constraint. Bounded length is a constraint.
Non-sliding lower bound is a constraint. Non-sliding upper bound is a
constraint.
This should cover the whole spectrum. You can express all cases in terms of
constraints.
I presume such constraints would be specified when objects are declared.
As a programmer how would you want to specify such constraints? Would
each have a reserved word, for example?
On 2022-10-30 12:24, James Harris wrote:
On 29/10/2022 21:17, Dmitry A. Kazakov wrote:
On 2022-10-29 13:23, James Harris wrote:
..
As for this specific case, the same information can be conveyed in
different ways: (start, length), (start, memsize), (first, last),
(first, past). I chose the latter as it should be slightly faster
than the others and does not run into problems when the elements are
other than single bytes.
Yes. The problem with (first, next) is that next could be
inexpressible. Most difficulties arise with strings/arrays over
enumerations and modular types. (first, last) has no such problem.
Both have issues with empty strings, e.g. with a multitude of
representations of them. Compare with the +/-0 problem for
non-two's-complement integers.
That sounds interesting. Do you see multiple representations of the
empty string in the following? Monospacing required. Here's how the
string "abcd" would be stored:
!_a_!_b_!_c_!_d_!
^               ^
!               !
first           past
* so first would point at the first element of the string
* and past would point one cell beyond the last element of the string.
I don't see where you see a multitude of representations of the null
string. AISI the empty string would simply have past equal to first in
all cases.
...
(0..0)
(1..1)
(2..2)
...
(n..n)
...
With pointers it becomes even worse as some of them might point to
invalid addresses.
string S = "abcde"
slice T = S[1..3] ;"bcd"
then changes to T would do what to S?
No idea. It depends. Is slice in your example an independent object?
But considering this:
declare
S : String := "abcde";
begin
S (1..3) := "x"; -- Illegal in Ada
But should it be legal, then the result would be
"xde"
Many implementations make this illegal because it would require either a bounded or a dynamically allocated unbounded string.
As above, the language is meant to treat strings as arrays. So AISI it should not ascribe any particular meaning to their contents.
Do you guys have any thoughts on the best ways for strings of characters
to be stored?
1. There's the C way, of course, of reserving one value (zero) and using
it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and past (where 'past' means first plus length such that the second pointer
points one position beyond the last character).
Any others?
Options 1 and 2 have the advantage that they can be referred to simply
by address. Option 3 needs an additional place in which to store the
(first, past) control block.
Option 1 has the advantage that it's easy for a program to process (by
either pointer or index).
Options 1 and 3 have the advantage that one can refer to the tail of the string (anything past the first character) without creating a copy,
although option 3 would need a new control block to be created. Option 2 would require a new string to be created.
In fact, option 3 has the advantage that it allows any contiguous
substring - head, mid, or tail - to be referred to without making a copy
of the required part of the string.
Options 2 and 3 make it fast to find the length. They also allow any
value (i.e. including zero) to be part of the string.
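The trade-offs above can be seen in a small C sketch of the three layouts. The names are hypothetical, and the length prefix is assumed to be a size_t placed just before the characters. Finding the length is a scan for option 1 but a single read or subtraction for options 2 and 3, and only options 2 and 3 can hold a zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: zero-terminated; finding the length is an O(n) scan. */
static size_t len_nul(const char *s) { return strlen(s); }

/* Tail without copying: just step past the first character. */
static const char *tail_nul(const char *s) { assert(*s != '\0'); return s + 1; }

/* Option 2: length word stored just before the characters. */
typedef struct {
    size_t len;
    char chars[];   /* flexible array member holds the body */
} prefixed;

static prefixed *prefixed_new(const char *src, size_t n) {
    prefixed *p = malloc(sizeof *p + n);
    if (p != NULL) {
        p->len = n;
        memcpy(p->chars, src, n);
    }
    return p;
}

static size_t len_prefixed(const prefixed *p) { return p->len; }

/* Option 3: (first, past); length is a subtraction. */
typedef struct { const char *first, *past; } fp_str;

static size_t len_fp(fp_str s) { return (size_t)(s.past - s.first); }

/* Any contiguous substring, here the tail, needs only a new pair. */
static fp_str tail_fp(fp_str s) {
    assert(s.first < s.past);
    fp_str t = { s.first + 1, s.past };
    return t;
}
```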
So: Which of those should a compiler support? Should it support more
than one form? If so, should the language allow the programmer to
specify which form to use on any particular string?
If that's not complicated enough, the above essentially considers
strings whose contents could be read-only or read-write but their
lengths don't change. If the lengths can change then there are
additional issues of storage management. Eek! ;)
Recommendations welcome!
On 29/10/2022 15:01, James Harris wrote:
On 27/10/2022 12:14, Bart wrote:
Most strings are fixed-length once created; strings that can grow are
rare. You don't need a 'capacity' field, for example (like C++'s
std::vector type).
Having watched some videos on string storage recently I now think I
know what you mean by the capacity field - basically that a string
descriptor would consist of these fields:
start
length
capacity
so that the string could be extended at the end (up to the capacity).
That may be a bit restrictive. A programmer might want to remove or
add characters at the beginning rather than just at the end, even
though such would be done less often.
Doing a prepend is not a problem. What's critical is whether the new
length is still within the current allocation. (Prepend requires
shifting of the old string so is less efficient anyway.)
If a new allocation is needed, you may be copying data for both prepend
and append.
With delete, however, you may need to think about whether to /reduce/ the allocation size.
So what do you think of having a string descriptor more like
first
past
memfirst
mempast
where memfirst and mempast would define the allocated space in which
the string body would sit.
What's the difference between 'first' and 'memfirst'? Would you have a
string that doesn't start at the beginning of its allocated block?
On 29/10/2022 18:42, James Harris wrote:
On 29/10/2022 16:30, Bart wrote:
Further, functions which /return/ a string would create the string and
return it whole.
Not necessarily. My dynamic language can return a string which is a
slice into another. (Slices are not exposed in this language; they are
in the static one, where slices are distinct types.)
Example:
func trim(s) =
if s.len<=2 then return "" fi
return s[2..$-1]
end
This trims the first and last characters of a string. But here it returns a slice into the original string. If I wanted a fresh copy, I'd have to
use copy() inside the function, or copy() (or a special kind of
assignment) outside it.
On Monday, October 24, 2022 at 5:31:14 AM UTC-5, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of characters
to be stored?
1. There's the C way, of course, of reserving one value (zero) and using
it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and past
(where 'past' means first plus length such that the second pointer
points one position beyond the last character).
Any others?
I think an exhaustive list of options would be very large if you're not pre-judging and filtering as you're adding options.
4) [List|Array|Tuple|Iterator] of character objects
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
6) Use UCS4 codes (24bit) padded out to 32 bits, and then you get a
whole byte for metadata attached to each character.
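For concreteness, options 5 and 6 might look like this in C. It is a sketch only: the function names are hypothetical, option 5 assumes plain 7-bit ASCII with bit 7 set on the final character, and option 6 assumes the code point sits in the low 24 bits with a metadata byte on top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Option 5: mark the last character by setting its high bit.
   Writes n bytes into dst; n must be at least 1. */
static void enc7(unsigned char *dst, const char *src, size_t n) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        assert(((unsigned char)src[i] & 0x80u) == 0); /* 7-bit only */
        dst[i] = (unsigned char)src[i];
    }
    dst[n - 1] |= 0x80u;
}

/* Length is found by scanning for the marked byte. */
static size_t len7(const unsigned char *s) {
    size_t n = 1;
    while ((*s++ & 0x80u) == 0)
        n++;
    return n;
}

/* Option 6: 24-bit code point in the low bits, metadata in the top byte. */
static uint32_t pack6(uint32_t cp, uint8_t meta) {
    assert(cp <= 0xFFFFFFu);
    return ((uint32_t)meta << 24) | cp;
}
static uint32_t cp6(uint32_t w)  { return w & 0xFFFFFFu; }
static uint8_t  meta6(uint32_t w) { return (uint8_t)(w >> 24); }
```

Note that the option 5 encoding, as sketched, has no way to express an empty string.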
On 2022-10-30 12:24, James Harris wrote:
I don't see where you see a multitude of representations of the null
string. AISI the empty string would simply have past equal to first in
all cases.
...
(0..0)
(1..1)
(2..2)
...
(n..n)
...
With pointers it becomes even worse as some of them might point to
invalid addresses.
What is your definition of a slice? Is it /part/ of an underlying
string or is it a /copy/ of part of a string? For example, if
string S = "abcde"
slice T = S[1..3] ;"bcd"
then changes to T would do what to S?
No idea. It depends. Is slice in your example an independent object?
But considering this:
declare
S : String := "abcde";
begin
S (1..3) := "x"; -- Illegal in Ada
But should it be legal, then the result would be
"xde"
Many implementations make this illegal because it would require either a bounded or a dynamically allocated unbounded string.
You could consider making it legal for these, but then you would have
different semantics of slices for different strings. And this would contradict the design principle of having all strings interchangeable regardless of the implementation method.
I presume such constraints would be specified when objects are declared.
Objects and/or subtypes, depending on the language's preferences. Note
also that you can have constrained views of the same object. E.g. you
have a mutable variable passed down as an in-argument. That would be an immutable view of the same object.
As a programmer how would you want to specify such constraints? Would
each have a reserved word, for example?
In some cases constraints might be implied. But usually languages have
lots of [sub]type modifiers like
in, in out, out, constant
atomic, volatile, shared
aliased (can get pointers to)
external, static
public, private, protected (visibility constraints)
range, length, bounds
parameter AKA discriminant (general purpose constraint)
specific type AKA static/dynamic up/downcast (view as another type)
class-wide (view as a class of types rooted in this one)
...
measurement unit
On 30/10/2022 14:20, Dmitry A. Kazakov wrote:
On 2022-10-30 12:24, James Harris wrote:
..
I don't see where you see a multitude of representations of the null
string. AISI the empty string would simply have past equal to first
in all cases.
...
(0..0)
(1..1)
(2..2)
...
(n..n)
...
With pointers it becomes even worse as some of them might point to
invalid addresses.
In the general case strings would live at arbitrary addresses so no
meaning could be inferred from any address.
In all cases
past - first
would define the length of the string.
If the length was zero then it would be an empty string.
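If length alone decides emptiness, the many empty representations stop mattering, provided code never dereferences a pointer when the length is zero. A sketch of an equality test along those lines (hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *first, *past; } fp_str;

static size_t fp_len(fp_str s) {
    assert(s.first <= s.past);
    return (size_t)(s.past - s.first);
}

/* Equal if same length and same contents; all empties compare equal
   and their pointers are never dereferenced. */
static bool fp_equal(fp_str a, fp_str b) {
    size_t n = fp_len(a);
    if (n != fp_len(b))
        return false;
    return n == 0 || memcmp(a.first, b.first, n) == 0;
}
```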
But considering this:
declare
S : String := "abcde";
begin
S (1..3) := "x"; -- Illegal in Ada
In Ada would the following be legal?
S (1..3) := "xxx"; --replacement same size as what it is replacing
I'd be happy with that.
But should it be legal, then the result would be
"xde"
Many implementations make this illegal because it would require either a
bounded or a dynamically allocated unbounded string.
You could consider making it legal for these, but then you would have
different semantics of slices for different strings. And this would
contradict the design principle of having all strings interchangeable
regardless of the implementation method.
I don't mind there being differences along the lines of 'constraints',
where an object can be passed to a callee which expects the same
constraints or imposes more, but not to one which needs fewer constraints.
I presume such constraints would be specified when objects are declared.
Objects and/or subtypes, depending on the language's preferences. Note
also that you can have constrained views of the same object. E.g. you
have a mutable variable passed down as an in-argument. That would be an
immutable view of the same object.
Yes, and an immutable object could not be passed to a callee which
wanted a mutable object.
As a programmer how would you want to specify such constraints? Would
each have a reserved word, for example?
In some cases constraints might be implied. But usually languages have
lots of [sub]type modifiers like
in, in out, out, constant
atomic, volatile, shared
aliased (can get pointers to)
external, static
public, private, protected (visibility constraints)
range, length, bounds
parameter AKA discriminant (general purpose constraint)
specific type AKA static/dynamic up/downcast (view as another type)
class-wide (view as a class of types rooted in this one)
...
measurement unit
So you wouldn't have a keyword to indicate a constraint such as
"Non-sliding lower bound" which you mentioned before but IIUC you might
have some qualification of the 'bounds' keyword as in
bounds(^..)
to indicate an unchangeable lower bound (with ^ meaning the start of the string)?
On 29/10/2022 22:02, Bart wrote:
On 29/10/2022 18:42, James Harris wrote:
On 29/10/2022 16:30, Bart wrote:
Further, functions which /return/ a string would create the string
and return it whole.
Not necessarily. My dynamic language can return a string which is a
slice into another. (Slices are not exposed in this language; they are
in the static one, where slices are distinct types.)
Example:
func trim(s) =
if s.len<=2 then return "" fi
return s[2..$-1]
end
This trims the first and last characters of a string. But here it returns
a slice into the original string. If I wanted a fresh copy, I'd have
to use copy() inside the function, or copy() (or a special kind of
assignment) outside it.
That's a challenging example. In a sense it returns either of two
different types: the caller could be handed a string or a slice.
On 30/10/2022 16:21, luserdroog wrote:
On Monday, October 24, 2022 at 5:31:14 AM UTC-5, James Harris wrote:
Do you guys have any thoughts on the best ways for strings of characters
to be stored?
1. There's the C way, of course, of reserving one value (zero) and using
it as a terminator.
2. There's the 'length prefix' option of putting the length of the
string in a machine word before the characters.
3. There's the 'double pointer' way of pointing at, say, first and past
(where 'past' means first plus length such that the second pointer
points one position beyond the last character).
Any others?
..
I think an exhaustive list of options would be very large if you're not
pre-judging and filtering as you're adding options.
4) [List|Array|Tuple|Iterator] of character objects
You mean where the characters are stored individually (one per node)?
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
Interesting idea. It's certainly one I hadn't thought of.
6) Use UCS4 codes (24bit) padded out to 32 bits, and then you get a
whole byte for metadata attached to each character.
That's definitely thinking outside the box. I can see it working if the
user (the programmer) wanted a string of 24-bit values but it could be awkward in other cases such as if he wanted a string of 32-bit or 8-bit values. I don't think I mentioned it but I'd like the programmer to be
able to choose what the elements of the string would be.
On 30/10/2022 19:13, James Harris wrote:
On 30/10/2022 16:21, luserdroog wrote:
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
Interesting idea. It's certainly one I hadn't thought of.
Nor should you - that is a crazy idea. It is massively inefficient, as
well as being inconsistent with everything else.
On 31/10/2022 16:58, David Brown wrote:
On 30/10/2022 19:13, James Harris wrote:
On 30/10/2022 16:21, luserdroog wrote:
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
Interesting idea. It's certainly one I hadn't thought of.
Nor should you - that is a crazy idea. It is massively inefficient,
as well as being inconsistent with everything else.
It's a perfectly fine idea - for the 1970s.
(Now the 8th bit is better put to use to represent UTF-8.)
On 30/10/2022 17:52, James Harris wrote:
On 29/10/2022 22:02, Bart wrote:
On 29/10/2022 18:42, James Harris wrote:
On 29/10/2022 16:30, Bart wrote:
Further, functions which /return/ a string would create the string
and return it whole.
Not necessarily. My dynamic language can return a string which is a
slice into another. (Slices are not exposed in this language; they
are in the static one, where slices are distinct types.)
Example:
func trim(s) =
if s.len=2 then return "" fi
return s[2..$-1]
end
This trims the first and last character of string. But here it
returns a slice into the original string. If I wanted a fresh copy,
I'd have to use copy() inside the function, or copy() (or a special
kind of assignment) outside it.
That's a challenging example. In a sense it returns either of two
different types: the caller could be handed a string or a slice.
This language has only a String type, not a Slice type. Slicing is an operation you apply on strings to yield another String object.
(Internally, it has to distinguish between owned strings and slices into strings owned by other objects, but as I said that aspect is not exposed.)
On 30/10/2022 19:13, James Harris wrote:
On 30/10/2022 16:21, luserdroog wrote:
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
Interesting idea. It's certainly one I hadn't thought of.
Nor should you - that is a crazy idea. It is massively inefficient, as
well as being inconsistent with everything else.
I would also recommend treating characters and character strings as
something very different from raw bytes and binary blobs. Users want to
do very different things with them, and many of the useful operations
are completely different. Some languages have made the mistake of conflating the two concepts - it's difficult to fix once that design
flaw is set into a language.
On 2022-10-30 18:46, James Harris wrote:
In Ada would the following be legal?
Yes, in Ada slice length is constrained as the string length is.
S (1..3) := "xxx"; --replacement same size as what it is replacing
I'd be happy with that.
It is still not fully defined. You need to consider the issue of sliding bounds. E.g.
S (2..4) (2) := 'x'; -- Assign a character
Now with sliding:
S (2..4) (2) := 'x' gives "abxde", x is second in the slice
without sliding
S (2..4) (2) := 'x' gives "axcde", x is at 2 in the original string
In Ada the right side slides, the left does not. Sliding the right side allows doing logical things like:
S1 (1..5) := S1 (5..9); -- 5..9 slides to 1..5
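The two readings of S (2..4) (2) can be sketched in C by giving a slice both its start pointer and the lower bound it inherited from S. This is illustrative only (hypothetical names, 1-based indices to match the Ada example): the sliding accessor renumbers the slice from 1, the non-sliding one keeps the original string's numbering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A slice of a 1-based string, remembering its original bounds. */
typedef struct {
    char *first;   /* element at index lo of the original string */
    size_t lo, hi; /* original bounds, inclusive */
} ada_slice;

static ada_slice slice_1based(char *s, size_t lo, size_t hi) {
    assert(lo >= 1 && lo <= hi + 1);
    ada_slice t = { s + (lo - 1), lo, hi };
    return t;
}

/* Sliding: the slice is renumbered 1..(hi - lo + 1). */
static char *at_sliding(ada_slice t, size_t i) {
    assert(i >= 1 && i <= t.hi - t.lo + 1);
    return t.first + (i - 1);
}

/* Non-sliding: the slice keeps the indices lo..hi. */
static char *at_fixed(ada_slice t, size_t i) {
    assert(i >= t.lo && i <= t.hi);
    return t.first + (i - t.lo);
}
```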
I am not sure whether a sliding constraint would be usable. It is a different
issue to constraining bounds because it involves operations like
assignment. And it is not clear how to implement such a constraint efficiently. Most constraints are either static (compile time), or
simple to represent, like bounds or type tags. Sliding might be
implemented as a flag, but then you would have to check it all the time.
Maybe it is not worth having as a choice. And it is unclear which is the unconstrained state, sliding or non-sliding. (:-))
On 31/10/2022 16:58, David Brown wrote:
On 30/10/2022 19:13, James Harris wrote:
On 30/10/2022 16:21, luserdroog wrote:
5) Use 7 bits for data, 8th bit for terminator. Either ASCII7 or UTF-7
can be used to format the data to squeeze it into 7 bits.
Interesting idea. It's certainly one I hadn't thought of.
Nor should you - that is a crazy idea. It is massively inefficient,
as well as being inconsistent with everything else.
The model I have chosen (at least, for now) is to have a string indexed logically from zero (so indices do not need to be stored) and, for implementation, delimited by two pointers.
The one downside I am aware of is that it will, at times, require
creation and destruction of a small descriptor. I'll have to see how the approach works out in practice.
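Here is a minimal C sketch of that model, using a hypothetical StrView descriptor that holds first and past pointers. Indexing starts from zero, so no lower bound is stored. The length is past - first. A substring is a new two-pointer descriptor rather than a copy, and that descriptor is the small object that has to be created and destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: the characters live in [first, past). */
typedef struct {
    const char *first;
    const char *past;  /* one position beyond the last character */
} StrView;

StrView view_of(const char *s) {
    StrView v = { s, s + strlen(s) };
    return v;
}

size_t view_len(StrView v) {
    return (size_t)(v.past - v.first);
}

/* Logical index from zero. */
char view_at(StrView v, size_t i) {
    assert(i < view_len(v));
    return v.first[i];
}

/* Any contiguous substring [from, to) is a new descriptor;
   the characters themselves are not copied. */
StrView view_sub(StrView v, size_t from, size_t to) {
    assert(from <= to && to <= view_len(v));
    StrView r = { v.first + from, v.first + to };
    return r;
}
```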
..
I would also recommend treating characters and character strings as
something very different from raw bytes and binary blobs. Users want
to do very different things with them, and many of the useful
operations are completely different. Some languages have made the
mistake of conflating the two concepts - it's difficult to fix once
that design flaw is set into a language.
That sounds interesting but I cannot tell what you have in mind.
One could consider strings as having two categories of operation: those
which involve only the memory used by strings such as allocation, concatenation, insertion, deletion, etc; and those which care about the contents of a string such as capitalisation, comparison, whitespace recognition, parsing, etc. Why could the mechanics not apply to raw
bytes and blobs?
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or conversion operations on them.
The concept of "signed char" and "unsigned char" in C is a serious
design flaw. A type designed to hold letters should not have a sign,
and should not be used to hold arbitrary raw, low-level data.
On 04/11/2022 13:58, David Brown wrote:
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or
conversion operations on them.
Bytes are small integers, typically of u8 type.
I can't see why arithmetic can't be done with them, unless you want a
purer kind of language where arithmetic is only allowed on signed
numbers, and bitwise ops only on unsigned numbers, which is usually
going to be a pain for all concerned.
On 04/11/2022 17:50, Bart wrote:
On 04/11/2022 13:58, David Brown wrote:
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or
conversion operations on them.
Bytes are small integers, typically of u8 type.
I can't see why arithmetic can't be done with them, unless you want a
purer kind of language where arithmetic is only allowed on signed
numbers, and bitwise ops only on unsigned numbers, which is usually
going to be a pain for all concerned.
I think what David means is that arithmetic operations don't apply to characters
(even though some languages permit such operations). For
example, neither
'a' * 5
nor even
'R' + 1
have any meaning over the set of characters.
Prohibiting arithmetic on
them could be done but would make classifying and manipulating
characters difficult unless one had a comprehensive set of library
functions such as
is_digit(char)
is_alphanum(locale, char)
is_lower(locale, char)
upper(locale, char)
and many more.
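Standard C already has functions of this kind in <ctype.h>: isdigit, isalnum, islower and toupper. They consult the global locale set by setlocale rather than taking a locale as a parameter. POSIX adds explicit-locale variants such as isalnum_l. The sketch below uses only the standard functions, and the counts assume the default "C" locale. Counts, classify and upper_in_place are illustrative names.

```c
#include <assert.h>
#include <ctype.h>

typedef struct { int digits, alnums, lowers; } Counts;

/* The ctype functions take an int that must be representable as
   unsigned char (or be EOF), hence the cast before each call. */
Counts classify(const char *s) {
    Counts c = { 0, 0, 0 };
    for (; *s; s++) {
        unsigned char ch = (unsigned char)*s;
        if (isdigit(ch)) c.digits++;
        if (isalnum(ch)) c.alnums++;
        if (islower(ch)) c.lowers++;
    }
    return c;
}

/* Upper-case in place according to the current C locale. */
void upper_in_place(char *s) {
    for (; *s; s++)
        *s = (char)toupper((unsigned char)*s);
}
```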
On 03/11/2022 16:48, James Harris wrote:
On 31/10/2022 16:58, David Brown wrote:
I would also recommend treating characters and character strings as
something very different from raw bytes and binary blobs. Users want
to do very different things with them, and many of the useful
operations are completely different. Some languages have made the
mistake of conflating the two concepts - it's difficult to fix once
that design flaw is set into a language.
That sounds interesting but I cannot tell what you have in mind.
I mean you should consider a "string" to be a way of holding a sequence
of "character" units which can hold a code unit of UTF-8 (since any
other choice of character encoding is madness).
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or conversion operations on them.
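Here is one way to get that rule in C, sketched under some assumptions. The code unit is wrapped in a struct, so the compiler rejects c * 5 or c + 1 outright. Every numeric use then has to go through an explicit conversion. Char, Byte, ord, chr and the converters are hypothetical names. Character constants such as 'A' are assumed to have their ASCII values.

```c
#include <assert.h>
#include <stdint.h>

/* Distinct wrapper types: no arithmetic operators apply to either. */
typedef struct { uint8_t unit; } Char;  /* one UTF-8 code unit */
typedef struct { uint8_t bits; } Byte;  /* raw, uninterpreted data */

/* Explicit conversions between characters and integers. */
static inline uint32_t ord(Char c) { return c.unit; }
static inline Char chr(uint32_t n) {
    assert(n <= 0xFF);
    return (Char){ (uint8_t)n };
}

/* Explicit conversions between characters and raw bytes. */
static inline Byte char_to_byte(Char c) { return (Byte){ c.unit }; }
static inline Char byte_to_char(Byte b) { return (Char){ b.bits }; }

/* Comparison is still provided, as a named operation. */
static inline int char_eq(Char a, Char b) { return a.unit == b.unit; }
```

With this layout, chr(ord(c) + 1) can still be written, but only as a deliberate conversion. The arithmetic happens on uint32_t, not on Char.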
The concept of "signed char" and "unsigned char" in C is a serious
design flaw. A type designed to hold letters should not have a sign,
and should not be used to hold arbitrary raw, low-level data.
You might also consider not having a character type at all. Python 3
has no character types - "a" is a string, not a character.
Raw binary buffers require nothing more than an address and a size to describe them - anything more, and it is too high level. (Again,
there's nothing wrong with providing higher level features and
interfaces, but they have to build on the fundamental ones.)
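The raw-buffer point, sketched in C: a hypothetical Buffer is only an address and a size, with no element type and no terminator. Anything higher level, such as a string view or a typed record, would be built on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A raw binary buffer: an address and a size, nothing more. */
typedef struct {
    void  *addr;
    size_t size;
} Buffer;

/* Copy as many bytes as fit; memmove also handles overlapping regions. */
size_t buffer_copy(Buffer dst, Buffer src) {
    size_t n = src.size < dst.size ? src.size : dst.size;
    memmove(dst.addr, src.addr, n);
    return n;
}
```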
On 04/11/2022 21:35, James Harris wrote:
On 04/11/2022 17:50, Bart wrote:
On 04/11/2022 13:58, David Brown wrote:
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or
conversion operations on them.
Bytes are small integers, typically of u8 type.
I can't see why arithmetic can't be done with them, unless you want a
purer kind of language where arithmetic is only allowed on signed
numbers, and bitwise ops only on unsigned numbers, which is usually
going to be a pain for all concerned.
I think what David means is that arithmetic operations don't apply to
characters
I was picking on the 'byte' type; it seems extraordinary that you
shouldn't be allowed to do arithmetic with them. If you can initialise a
byte value with a number like this:
byte a = 123
then it's a number!
(even though some languages permit such operations). For example, neither
'a' * 5
nor even
'R' + 1
have any meaning over the set of characters.
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet?
Why should code like this be made illegal:
a := a * 10 + (c - '0')
Then I realised I shouldn't be telling the programmer what they can and
can't do with characters, as there might be some perfectly valid
use-case that I simply hadn't thought of.
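In C, this idiom is portable for digits specifically. The standard requires '0'..'9' to be contiguous and ascending in every execution character set, but it gives no such guarantee for letters. Below is a sketch of the accumulation with input and overflow checks. parse_decimal is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* a := a * 10 + (c - '0'), with checks. Returns false for empty input,
   a non-digit, or a value that would overflow unsigned long. */
bool parse_decimal(const char *s, unsigned long *out) {
    unsigned long a = 0;
    if (*s == '\0')
        return false;                /* empty input is not a number */
    for (; *s; s++) {
        if (*s < '0' || *s > '9')
            return false;            /* not a decimal digit */
        unsigned long d = (unsigned long)(*s - '0');
        if (a > (ULONG_MAX - d) / 10)
            return false;            /* a * 10 + d would overflow */
        a = a * 10 + d;
    }
    *out = a;
    return true;
}
```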
Prohibiting arithmetic on them could be done but would make
classifying and manipulating characters difficult unless one had a
comprehensive set of library functions such as
is_digit(char)
is_alphanum(locale, char)
is_lower(locale, char)
upper(locale, char)
and many more.
As I said, you and I don't know all the possibilities.
Of course there
would need to be conversions between char and int, but this can become a nuisance.
On 04/11/2022 13:58, David Brown wrote:
I mean you should consider a "string" to be a way of holding a
sequence of "character" units which can hold a code unit of UTF-8
(since any other choice of character encoding is madness).
We've discussed before that (IMO) Unicode is useful for physical
printing to paper or electronic rendering such as to PDF but that it's a nightmare for programmers and users when it is used for any kind of
input so I won't go over that again except to say that AISI Unicode
should be handled by library functions rather than a language.
What I do have in mind is strings of 'containers' where a string might
be declared as of type
string of char8 -- meaning a string of char8 containers
string of char32 -- meaning a string of char32 containers
What goes in each 8-bit or 32-bit 'container' would be another matter.
That agnostic ideal is somewhat in tension with the desire to include
string literals in a program text. For that, as I've mentioned before,
my preference is to have the program text and any literals within it
written in ASCII and American English; supplementary files would express
the string literals in other languages.
For example,
print "Hello world"
would be accompanied by a file for French which included
"Hello world" --> "Bonjour le monde"
Naturally, multilingual programming is much more complex than that
simple example but it shows the basic idea. The compiler would be able
to check that language files had everything required for a given piece
of source code.
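Here is a sketch of the runtime side of that scheme. It assumes a language file has been loaded into a table of source/translation pairs. Entry, find_entry, translate and count_missing are hypothetical names. The last function is the kind of check a compiler could run: it counts literals used in the program that have no entry in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One line of a language file: "source literal" --> "translation". */
typedef struct {
    const char *source;
    const char *translated;
} Entry;

const Entry *find_entry(const Entry *table, size_t n, const char *source) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].source, source) == 0)
            return &table[i];
    return NULL;
}

/* Fall back to the source text when no translation exists. */
const char *translate(const Entry *table, size_t n, const char *source) {
    const Entry *e = find_entry(table, n, source);
    return e ? e->translated : source;
}

/* Build-time check: how many literals lack a translation? */
size_t count_missing(const Entry *table, size_t n,
                     const char *const *literals, size_t m) {
    size_t missing = 0;
    for (size_t j = 0; j < m; j++)
        if (find_entry(table, n, literals[j]) == NULL)
            missing++;
    return missing;
}
```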
On 04/11/2022 22:28, Bart wrote:
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet?
That's because 'R' + 1 may not be the next character in all alphabets. Defining 'next' is more than difficult. It depends on intended collation order which varies in different parts of the world and can even change
over time as authorities choose different collation orders. Some
plausible meanings of 'R' + 1:
'S' (as in ASCII)
'r' (what the user may want as sort order)
's' (what the user may want as sort order)
a non-character (as in EBCDIC)
Perhaps a pseudo-call would be better such as
char_plus(collation, 'R', 1)
where 'collation' would be used to determine what was the specified
number of characters away from 'R'.
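Here is a sketch of char_plus, assuming a collation can be modelled as the ordered list of an alphabet's letters, given as Unicode code points. The Spanish table puts Ñ (U+00D1) after N, which plain code-point arithmetic would not. The Collation type is hypothetical. The ASCII letters are written as C character constants, so an ASCII-compatible execution character set is assumed. Real collation rules (accents, digraphs, multi-level comparison) are considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical collation: an alphabet's letters in order. */
typedef struct {
    const uint32_t *letters;
    size_t count;
} Collation;

/* Spanish alphabet: N is followed by U+00D1 (N with tilde). */
static const uint32_t spanish_letters[] = {
    'A','B','C','D','E','F','G','H','I','J','K','L','M','N',
    0x00D1,
    'O','P','Q','R','S','T','U','V','W','X','Y','Z'
};
static const Collation spanish = {
    spanish_letters, sizeof spanish_letters / sizeof spanish_letters[0]
};

/* The letter n places after c (n may be negative), or 0 if c is not
   in the alphabet or the result falls off either end. */
uint32_t char_plus(Collation col, uint32_t c, long n) {
    for (size_t i = 0; i < col.count; i++) {
        if (col.letters[i] == c) {
            long j = (long)i + n;
            if (j < 0 || (size_t)j >= col.count)
                return 0;
            return col.letters[j];
        }
    }
    return 0;
}
```

Because the count is a parameter, char_plus(spanish, 'A', 10) reaches 'K' directly instead of stepping one letter at a time.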
Aside from converting digits (and any other characters used as digits in
a higher number base), is there any meaning to converting chars to/from ints?
On 05/11/2022 10:36, James Harris wrote:
On 04/11/2022 22:28, Bart wrote:
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet?
That's because 'R' + 1 may not be the next character in all alphabets.
Defining 'next' is more than difficult. It depends on intended
collation order which varies in different parts of the world and can
even change over time as authorities choose different collation
orders. Some plausible meanings of 'R' + 1:
'S' (as in ASCII)
You said elsewhere that you want to use ASCII within programs. Which, as
it happens, corresponds to the first 128 code points of Unicode. Here:
char c
for c in 'A'..'Z' do
print c
od
this displays 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. The for-loop works by adding
+1 to 'c'; it doesn't care about collating order!
(This also illustrates the difference between `byte` and `char`; using a
byte type, the output would be '656667...90'.)
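The same loop in C writes the letters once as characters and once as numbers, which mirrors the char/byte difference described above. The function names are illustrative. The sketch assumes an ASCII-compatible character set. C guarantees only that the digits are contiguous. In EBCDIC the loop would also visit non-letters, since 'I' is 0xC9 but 'J' is 0xD1.

```c
#include <assert.h>
#include <stdio.h>

/* "ABC...Z": out needs room for 27 chars. */
void letters_as_chars(char *out) {
    int n = 0;
    for (char c = 'A'; c <= 'Z'; c++)
        out[n++] = c;
    out[n] = '\0';
}

/* "656667...90": out needs room for 53 chars (26 two-digit codes). */
void letters_as_codes(char *out) {
    char *p = out;
    for (int c = 'A'; c <= 'Z'; c++)
        p += sprintf(p, "%d", c);
}
```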
'r' (what the user may want as sort order)
's' (what the user may want as sort order)
a non-character (as in EBCDIC)
My feeling is that it is these diverse requirements that require user-supplied functions.
My 'char' is still a thinly veiled numeric type, so ordinary integer arithmetic can be used. Otherwise even something like this becomes impossible:
['A'..'Z']int histogram
midpoint := (histogram.upb - histogram.lwb)/2
++histogram[midpoint+1]
This requires certain properties of array indices, like being able to
do arithmetic, as well as being consecutive ordinal values.
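For comparison, here is a C version of the histogram. C arrays always start at 0, so each letter is mapped to an offset from 'A'. The midpoint here is computed as an absolute letter, lwb + (upb - lwb) / 2. The names are illustrative, and the sketch assumes the letters are contiguous, as in ASCII.

```c
#include <assert.h>

/* Bounds of the letter range and the resulting array length. */
enum { LWB = 'A', UPB = 'Z', NLETTERS = UPB - LWB + 1 };

/* Count upper-case letters; each letter maps to offset letter - LWB. */
void count_letters(const char *s, int histogram[NLETTERS]) {
    for (; *s; s++)
        if (*s >= LWB && *s <= UPB)
            histogram[*s - LWB]++;
}

/* The middle of the range, as a letter. */
int midpoint_letter(void) {
    return LWB + (UPB - LWB) / 2;
}
```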
Perhaps a pseudo-call would be better such as
char_plus(collation, 'R', 1)
where 'collation' would be used to determine what was the specified
number of characters away from 'R'.
Sure, as I said, you can provide any interpretation you like. But if you
do C+1, you expect to get the code of the next character (or next
codepoint if venturing outside ASCII).
Aside from converting digits (and any other characters used as digits
in a higher number base), is there any meaning to converting chars
to/from ints?
My static language makes byte and char slightly different types. (Types involving char may get printed differently.)
That meant that `ref byte` and `ref char` were incompatible, which
rapidly turned into a nightmare: I might have a readfile() routine that returned a `ref byte` type, a pointer to a block of memory.
But I wanted to interpret that block as `ref char` - a string. So this
meant loads of casts to either `ref byte` or `ref char` to get things to work, but it got too much (a bit like 'const poisoning' in C where it
just propagates everywhere). That was clearly the wrong approach.
In the end I relaxed the type rules so that `ref byte` and `ref char`
are compatible, and everything is now SO much simpler.
On 04/11/2022 13:58, David Brown wrote:
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or
conversion operations on them.
Bytes are small integers, typically of u8 type.
I can't see why arithmetic can't be done with them, unless you want a
purer kind of language where arithmetic is only allowed on signed
numbers, and bitwise ops only on unsigned numbers, which is usually
going to be a pain for all concerned.
The concept of "signed char" and "unsigned char" in C is a serious
design flaw. A type designed to hold letters should not have a sign,
and should not be used to hold arbitrary raw, low-level data.
Signed and unsigned chars are not so bad;
presumably C intended these to
do the job of a 'byte' type for small integers. So it was just a poor
choice of name. (After all there is no separate type in C for bytes
holding character data.)
What's bad is that third kind: a 'plain char' type, which is
incompatible with both signed and unsigned char, even though it
necessarily needs to be one of the other on a specific platform. It
occurs in no other language, and causes problems within FFI APIs.
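C11's _Generic selects on the exact type of its operand, which makes the three-way split visible. Plain char, signed char and unsigned char are distinct types, even though plain char always has the same range as one of the other two. Because they are distinct, char * and unsigned char * need casts between them, which is the source of the FFI friction mentioned above. TYPE_NAME and plain_char_is_signed are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Select a string according to the exact type of x. */
#define TYPE_NAME(x) _Generic((x),      \
    char: "char",                       \
    signed char: "signed char",         \
    unsigned char: "unsigned char",     \
    default: "other")

/* CHAR_MIN is 0 when plain char is unsigned, SCHAR_MIN when signed. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

A character constant such as 'x' has type int in C, so TYPE_NAME('x') selects the default branch.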
On 05/11/2022 11:14, James Harris wrote:
For example,
print "Hello world"
would be accompanied by a file for French which included
"Hello world" --> "Bonjour le monde"
Naturally, multilingual programming is much more complex than that
simple example but it shows the basic idea. The compiler would be able
to check that language files had everything required for a given piece
of source code.
Is it? This is pretty much all I did when I used to write internationalised applications. Although that was only done for French, German and Dutch.
But that print example would be written like this:
print /"Hello World"
The "/" was a translation operator, so only certain strings were
translated. This also made it easy to scan source code to build a list
of messages, used to maintain the dictionary as entries were added,
deleted or modified.
The scheme did need some hints sometimes, written like this, to get
around ambiguities:
print /"Green!colour"
print /"Green!fresh"
The hint was usually filtered out.
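Here is a sketch of filtering out the hint. It assumes, as in the examples above, that '!' separates the message from the hint and never occurs in the message itself. strip_hint is a hypothetical name. The full key, hint included, would still be what is looked up in the dictionary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy key to out without the hint (everything from the first '!').
   Truncates to fit the buffer; returns the length written. */
size_t strip_hint(const char *key, char *out, size_t size) {
    assert(size > 0);
    size_t n = strcspn(key, "!");  /* length of the part before '!' */
    if (n >= size)
        n = size - 1;
    memcpy(out, key, n);
    out[n] = '\0';
    return n;
}
```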
But this is little to do with how strings are represented. Even in
English, messages may include characters like "£" (pound sign) which is
not part of ASCII.
So a way to represent Unicode within literals is still needed (didn't we discuss this a couple of years ago?).
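For instance, "£" is the single code point U+00A3, but it takes two bytes in UTF-8 (0xC2 0xA3). C11 lets a literal spell the same character as u8"\u00A3", so source text can stay ASCII while still naming non-ASCII characters. The sketch below encodes one code point. utf8_encode is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8. Returns the byte count (1 to 4),
   or 0 for a surrogate or a value above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;  /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```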
On 04/11/2022 21:35, James Harris wrote:
On 04/11/2022 17:50, Bart wrote:
On 04/11/2022 13:58, David Brown wrote:
Neither "byte" nor "character" should have any kind of arithmetic
operators - they are not integers. But you will need cast or
conversion operations on them.
Bytes are small integers, typically of u8 type.
I can't see why arithmetic can't be done with them, unless you want a
purer kind of language where arithmetic is only allowed on signed
numbers, and bitwise ops only on unsigned numbers, which is usually
going to be a pain for all concerned.
I think what David means is that arithmetic operations don't apply to
characters
I was picking on the 'byte' type; it seems extraordinary that you
shouldn't be allowed to do arithmetic with them. If you can initialise a
byte value with a number like this:
byte a = 123
then it's a number!
(even though some languages permit such operations). For example, neither
'a' * 5
nor even
'R' + 1
have any meaning over the set of characters.
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet? Why should code like this be made illegal:
a := a * 10 + (c - '0')
Then I realised I shouldn't be telling the programmer what they can and
can't do with characters, as there might be some perfectly valid
use-case that I simply hadn't thought of.
Maybe 'a' * 5 yields the value 'aaaaa' or the string "aaaaa", or this is
some kind of encryption algorithm.
So now they are treated like integers, other than printing an array of
char or pointer to char assumes they are strings.
Prohibiting arithmetic on them could be done but would make
classifying and manipulating characters difficult unless one had a
comprehensive set of library functions such as
is_digit(char)
is_alphanum(locale, char)
is_lower(locale, char)
upper(locale, char)
and many more.
As I said, you and I don't know all the possibilities. Of course there
would need to be conversions between char and int, but this can become a nuisance.
On 05/11/2022 13:38, Bart wrote:
On 05/11/2022 10:36, James Harris wrote:
On 04/11/2022 22:28, Bart wrote:
I actually had such a restriction for a while: char*5 wasn't
allowed, but char+1 was. After all why on earth shouldn't you want
the next character in that alphabet?
That's because 'R' + 1 may not be the next character in all
alphabets. Defining 'next' is more than difficult. It depends on
intended collation order which varies in different parts of the world
and can even change over time as authorities choose different
collation orders. Some plausible meanings of 'R' + 1:
'S' (as in ASCII)
You said elsewhere that you want to use ASCII within programs. Which,
as it happens, corresponds to the first 128 code points of Unicode. Here:
char c
for c in 'A'..'Z' do
print c
od
this displays 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. The for-loop works by
adding +1 to 'c'; it doesn't care about collating order!
That piece of code is fine for an English-speaking user but to a
Spaniard the alphabet has a character missing, and Greeks wouldn't agree
with it at all.
Where L is the locale why not allow something like
for c in L.alpha_first..L.alpha_last
print c
od
?
That should work in English or Spanish or Greek etc, shouldn't it?
(This also illustrates the difference between `byte` and `char`; using
a byte type, the output would be '656667...90'.)
'r' (what the user may want as sort order)
's' (what the user may want as sort order)
a non-character (as in EBCDIC)
My feeling is that it is these diverse requirements that require
user-supplied functions.
Functions, yes, (or what appear to be functions) though surely they
should be part of a library that comes with the language.
My 'char' is still a thinly veiled numeric type, so ordinary integer
arithmetic can be used. Otherwise even something like this becomes
impossible:
['A'..'Z']int histogram
midpoint := (histogram.upb - histogram.lwb)/2
++histogram[midpoint+1]
This requires certain properties of array indices, like being able
to do arithmetic, as well as being consecutive ordinal values.
Why not
[L.alpha_first..L.alpha_last] int histogram
?
As for the calculations what about using L.ord and L.chr to convert
between chars and integers?
Perhaps a pseudo-call would be better such as
char_plus(collation, 'R', 1)
where 'collation' would be used to determine what was the specified
number of characters away from 'R'.
Sure, as I said, you can provide any interpretation you like. But if
you do C+1, you expect to get the code of the next character (or next
codepoint if venturing outside ASCII).
If you use codepoints then you might not get the next character in
sequence - as in the case of 'R' in EBCDIC (you'd get a non-printing character) or 'N' in Spanish (you'd get 'O' rather than the N with a tilde, Ñ,
that a Spaniard would expect).
If the programmer wants "the next character in the alphabet" then
shouldn't the programming language or a standard library help him get
that irrespective of the human language the program is meant to be processing?
Aside from converting digits (and any other characters used as digits
in a higher number base), is there any meaning to converting chars
to/from ints?
My static language makes byte and char slightly different types.
(Types involving char may get printed differently.)
That meant that `ref byte` and `ref char` were incompatible, which
rapidly turned into a nightmare: I might have a readfile() routine
that returned a `ref byte` type, a pointer to a block of memory.
But I wanted to interpret that block as `ref char` - a string. So this
meant loads of casts to either `ref byte` or `ref char` to get things
to work, but it got too much (a bit like 'const poisoning' in C where
it just propagates everywhere). That was clearly the wrong approach.
C's const propagation sounds like Java with its horrible, and sticky, exception propagation.
In the end I relaxed the type rules so that `ref byte` and `ref char`
are compatible, and everything is now SO much simpler.
Would there have been any value in defining a layout for the untyped
area of bytes (or parts thereof)? That's where I think I am headed.
On 04/11/2022 23:28, Bart wrote:
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet? Why should code like this be made illegal:
a := a * 10 + (c - '0')
Why not :
char a; // Use whatever syntax you prefer
int i; // and whatever type names you prefer
a = digit(i);
The function "digit" might be defined :
char digit(int i) {
return char(i + ord('0'));
}
You want to find the next letter after "x"? "char(ord(x) + 1)". Or perhaps, like Pascal and Ada, "succ(x)".
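Here is the digit() idea in real C, plus its inverse and a succ in the Pascal sense. C guarantees that the digits are contiguous, so digit and digit_value are portable. succ gives the next code value, which is the next letter only where the character set makes it so. The names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* 0..9 to '0'..'9'; portable because the digits are contiguous. */
char digit(int i) {
    assert(i >= 0 && i <= 9);
    return (char)(i + '0');
}

/* The inverse: '0'..'9' to 0..9. */
int digit_value(char c) {
    assert(c >= '0' && c <= '9');
    return c - '0';
}

/* Pascal-style successor: the next code value, not "the next letter". */
char succ(char c) {
    assert(c < CHAR_MAX);
    return (char)(c + 1);
}
```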
On 05/11/2022 17:07, James Harris wrote:
On 05/11/2022 13:38, Bart wrote:
My feeling is that it is these diverse requirements that require
user-supplied functions.
Functions, yes, (or what appear to be functions) though surely they
should be part of a library that comes with the language.
How much of a library you provide depends on the goals of the language.
You don't have to provide /everything/ !
C's const propagation sounds like Java with its horrible, and sticky,
exception propagation.
Getting "const" right is something to think long and hard about. When
do you mean "constant", when do you mean "read-only", when do you mean
"I promise this data will never change", "I will assume this data will
never change", "I promise /I/ won't change this data via this
reference", "This data will be unchanged logically but may change in underlying representation, such as using a cache of some sort", etc. ?
Constness is a hugely powerful concept, and something you definitely
want in a language. Modern language design fashion is to make things constant by default and require explicit indication that they can
change. Some programming languages (pure functional programming
languages, for example) have /only/ constant data - there is no such
thing as variables.
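C's const covers only part of that list. const char *s means "I promise not to change the data through s", although the caller may still change it. A top-level char *const s means only that the pointer itself cannot be re-pointed. Neither form promises that the data never changes. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Read-only view: this function cannot modify *s through s. */
size_t count_char(const char *s, char c) {
    size_t n = 0;
    for (; *s; s++)
        if (*s == c)
            n++;
    return n;
}

/* s cannot be reassigned here, but the data it points to can be written. */
void replace_first(char *const s, char from, char to) {
    for (char *p = s; *p; p++)
        if (*p == from) {
            *p = to;
            return;
        }
}
```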
On 05/11/2022 17:08, David Brown wrote:
On 05/11/2022 17:07, James Harris wrote:
On 05/11/2022 13:38, Bart wrote:
..
My feeling is that it is these diverse requirements that require
user-supplied functions.
Functions, yes, (or what appear to be functions) though surely they
should be part of a library that comes with the language.
How much of a library you provide depends on the goals of the
language. You don't have to provide /everything/ !
That's good to hear. :)
While I am trying to make it easy to invoke functions written by other
people ISTM that it's also right for the language to have associated
with it a load of standard provisions - i18n support, various data structures, display support, maths libraries, etc for one simple reason:
code maintenance; it's easier to maintain software which uses library
calls one already knows than to have to learn yet another set of i18n
calls, for example.
..
C's const propagation sounds like Java with its horrible, and sticky,
exception propagation.
Getting "const" right is something to think long and hard about. When
do you mean "constant", when do you mean "read-only", when do you mean
"I promise this data will never change", "I will assume this data will
never change", "I promise /I/ won't change this data via this
reference", "This data will be unchanged logically but may change in
underlying representation, such as using a cache of some sort", etc. ?
That sounds really interesting and I'd like to get in to it but this is
not the thread. If you wanted to start a new thread on the topic I would reply. Suffice to say here that I don't use "const" but do have "ro" and
"rw" as usable in various contexts which effect many of the things you mention but I don't know if I have covered everything a programmer might need.
Constness is a hugely powerful concept, and something you definitely
want in a language. Modern language design fashion is to make
things constant by default and require explicit indication that they
can change. Some programming languages (pure functional programming
languages, for example) have /only/ constant data - there is no such
thing as variables.
I haven't gone that far but, for example, I have globals as, by default,
read only and while parameters are read-write a function would have to
keep the originals around if there's a chance they would be needed.
As I say, though, such things need a thread of their own so I'll resist
the urge to say more.
On 05/11/2022 17:01, David Brown wrote:
On 04/11/2022 23:28, Bart wrote:
..
I actually had such a restriction for a while: char*5 wasn't allowed,
but char+1 was. After all why on earth shouldn't you want the next
character in that alphabet? Why should code like this be made illegal:
a := a * 10 + (c - '0')
Why not :
char a; // Use whatever syntax you prefer
int i; // and whatever type names you prefer
a = digit(i);
The function "digit" might be defined :
char digit(int i) {
return char(i + ord('0'));
}
Wouldn't char and ord need a locale?
That may be the wrong term but by locale I mean a bundled set of rules (including, in this case, what the digits are and how many there are of
them) which apply to the language and region the program is executing for.
Maybe it is the right term. I see on Wikipedia: "In computing, a locale
is a set of parameters that defines the user's language, region and any special variant preferences that the user wants to see in their user interface."
https://en.wikipedia.org/wiki/Locale_(computer_software)
You want to find the next letter after "x"? "char(ord(x) + 1)". Or
perhaps, like Pascal and Ada, "succ(x)".
pred and succ are great, and I was thinking to start a thread on how
they might be used for different data types. But I have to point out
that they are not enough on their own. If a user wanted the character
ten away from the current one then he wouldn't want to code ten succ operations.
On 06/11/2022 10:17, James Harris wrote:
On 05/11/2022 17:08, David Brown wrote:
How much of a library you provide depends on the goals of the
language. You don't have to provide /everything/ !
That's good to hear. :)
While I am trying to make it easy to invoke functions written by other
people ISTM that it's also right for the language to have associated
with it a load of standard provisions - i18n support, various data
structures, display support, maths libraries, etc for one simple
reason: code maintenance; it's easier to maintain software which uses
library calls one already knows than to have to learn yet another set
of i18n calls, for example.
Sure. But pick one i18n library, write the FFI wrapper, and call that
your standard library. Then users don't have to deal with third-party
code and libraries, and you don't have to learn the intricacies of how
to support multiple languages properly (you don't have enough lifetimes
to learn enough to write it yourself). Everyone wins!