It's been a while since we had a discussion. I hope you guys are all well.
Am I right that a programmer might want to distinguish between multiple potential meanings of EOF, as below? How should a programming language
or library support such different indications?
1. In the simplest case (that which is most familiar) one has a file of
8-bit bytes that nothing else has open for writing. If we are reading
that file and there are no bytes left then we get an EOF indication.
That's fine.
2. But what if we have a file or a stream that something else is
potentially still writing to? The reader finding that there are no bytes
left to read is just being told that there are no bytes /now/ but there
may be more soon. (To an extent, that's true of the simple case, too,
where some other task could append to a file.)
3. There's also record-based IO where the file ends with a record which
is so-far incomplete. Something may still be writing that last record
(but doing so without full-record writes).
You may think that strict record-based IO is unusual - and it is in many contexts - but it also applies with byte-by-byte IO where, for example,
one program wants to read a line at a time and yet that file is being
written to character-by-character. If the last line of the file doesn't
/yet/ have a terminating newline does that mean that it's still being
written or that the writer has closed the file without adding the newline?
4. There's a further case of a composite or multiplexed stream which
consists of multiple logical streams but I'll not go into that in this
post in order to try to keep the post short.
So on a read, if there are /at that moment/ no more bytes left to read
what indication(s) should be returned to the program?
IOW, what facilities should a programming language or library provide to
a programmer to help him to handle different cases of EOF?
On 2022-08-31 10:40, James Harris wrote:
> It's been a while since we had a discussion. I hope you guys are all
> well.
> Am I right that a programmer might want to distinguish between
> multiple potential meanings of EOF, as below? How should a programming
> language or library support such different indications?
> 1. In the simplest case (that which is most familiar) one has a file
> of 8-bit bytes that nothing else has open for writing. If we are
> reading that file and there are no bytes left then we get an EOF
> indication. That's fine.
> 2. But what if we have a file or a stream that something else is
> potentially still writing to? The reader finding that there are no
> bytes left to read is just being told that there are no bytes /now/
> but there may be more soon. (To an extent, that's true of the simple
> case, too, where some other task could append to a file.)
> 3. There's also record-based IO where the file ends with a record
> which is so-far incomplete. Something may still be writing that last
> record (but doing so without full-record writes).
> You may think that strict record-based IO is unusual - and it is in
> many contexts - but it also applies with byte-by-byte IO where, for
> example, one program wants to read a line at a time and yet that file
> is being written to character-by-character. If the last line of the
> file doesn't /yet/ have a terminating newline does that mean that it's
> still being written or that the writer has closed the file without
> adding the newline?
> 4. There's a further case of a composite or multiplexed stream which
> consists of multiple logical streams but I'll not go into that in this
> post in order to try to keep the post short.
> So on a read, if there are /at that moment/ no more bytes left to read
> what indication(s) should be returned to the program?
There is only one case of EOF, namely the end of file (:-))
Other scenarios you described are not.
#2. This is called potential blocking. Blocking /= end of file, as
simple as that. The potential blocking scenario is treated depending on
the I/O mode. Blocking I/O occasionally blocks, non-blocking (also
called immediate) I/O faults with "more data" error. See terminal
interfaces, overlapped I/O under Windows etc.
#3. This is either encoding/buffering or concurrency. It is unclear
which one you meant but either has nothing to do with end of file.
Encoding is transparent to the higher I/O level, so however you encode
the file end it does not matter. Even the QLC SSDs have all files ended
(:-))
Concurrency in I/O just means that you have a race condition while
determining the end of file status. Normally, when undesired, this sort
of stuff is prevented using transactions. Again, that is transparent and
thus does not really matter. The reader starts one transaction, the
writer another, so whatever the writer does the reader sees the old end
of file and a consistent stream. For pipes etc. see #2.
#4. However you combine non-ends they remain unended... (:-))
> IOW, what facilities should a programming language or library provide
> to a programmer to help him to handle different cases of EOF?
Do not let the programmer test for EOF. It is either greatly inefficient
or impossible and non-portable. So just do not provide the test. Force
exceptions for exceptional cases:
   begin
      loop
         Foo := Read (File);
         ... -- Do something useful
      end loop;
   exception
      when End_Error =>
         null; -- We are done
      when Timeout_Error =>
         ... -- Hey, the peer is too slow
      when Data_Error =>
         ... -- Oops, the file is corrupted
   end;
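For readers who want something runnable, the exception-driven loop above can be sketched in Python. This is only an illustration under stated assumptions: `RecordReader`, `EndError` and `DataError` are hypothetical names invented here (they are not part of any library), and the reader works on an in-memory byte string of fixed-size records rather than a real file.

```python
class EndError(Exception):
    """Hypothetical: the stream has definitely ended."""


class DataError(Exception):
    """Hypothetical: the stream content is malformed."""


class RecordReader:
    """Hypothetical reader over in-memory data made of fixed-size records."""

    def __init__(self, data: bytes, record_size: int):
        self._data = data
        self._size = record_size
        self._pos = 0

    def read(self) -> bytes:
        # No test-for-EOF call is offered; the end is reported by exception.
        if self._pos == len(self._data):
            raise EndError("no more records")
        chunk = self._data[self._pos:self._pos + self._size]
        if len(chunk) < self._size:
            # The data stops in the middle of a record.
            raise DataError("truncated final record")
        self._pos += self._size
        return chunk


def read_all(reader: RecordReader) -> list:
    """Collect records until EndError; DataError propagates to the caller."""
    records = []
    try:
        while True:
            records.append(reader.read())
    except EndError:
        pass  # Normal termination: we are done.
    return records
```

The calling loop never asks "is this the end?"; it simply reads, and each abnormal condition arrives as its own exception type that the caller may handle or ignore.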
On 31/08/2022 10:36, Dmitry A. Kazakov wrote:
> There is only one case of EOF, namely the end of file (:-))
So you would prefer EOF to mean "there are zero more bytes available to
read at the moment"?
> Other scenarios you described are not.
> #2. This is called potential blocking. Blocking /= end of file, as
> simple as that. The potential blocking scenario is treated depending
> on the I/O mode. Blocking I/O occasionally blocks, non-blocking (also
> called immediate) I/O faults with "more data" error. See terminal
> interfaces, overlapped I/O under Windows etc.
Have to say that "blocking" is possibly a bad name as it has another
meaning of "assembling into blocks". Maybe "waiting" would be better
meaning that the caller doesn't mind waiting for data.
> #3. This is either encoding/buffering or concurrency. It is unclear
> which one you meant but either has nothing to do with end of file.
To make up an example, say a process is reading a Unix-form text file
expecting whole lines but the last line in the file doesn't have a
terminating newline character. Its options for each 'line read' are
1) Don't wait.
2) Wait forever.
3) Wait a limited amount of time.
I am not trying to be awkward, by the way. :) Just to think about
situations a programmer might want to distinguish between.
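The three waiting options listed above can be sketched as a single `timeout` parameter. This is a minimal sketch, not a definitive design: `LineReader` and its `pending` property are hypothetical names, the code assumes a Unix-like system (where `select.select` works on pipe descriptors), and the timeout applies to each wait for more bytes rather than to the whole call.

```python
import os
import select


class LineReader:
    """Hypothetical line reader over a pipe or socket file descriptor."""

    def __init__(self, fd: int):
        self._fd = fd
        self._buf = b""

    @property
    def pending(self) -> bytes:
        """Bytes received so far that do not yet form a complete line."""
        return self._buf

    def read_line(self, timeout):
        """Return one complete line, including its newline.

        timeout=0    -> don't wait
        timeout=None -> wait forever
        timeout=n    -> wait up to n seconds for each further chunk

        Raises TimeoutError if no complete line is available in time and
        EOFError if the writer has closed the stream.
        """
        while b"\n" not in self._buf:
            ready, _, _ = select.select([self._fd], [], [], timeout)
            if not ready:
                raise TimeoutError("no complete line yet")
            chunk = os.read(self._fd, 4096)
            if not chunk:
                # A zero-length read on a pipe means every writer closed it.
                raise EOFError("stream closed by writer")
            self._buf += chunk
        line, _, self._buf = self._buf.partition(b"\n")
        return line + b"\n"
```

In this sketch a partial last line is never lost: after a `TimeoutError` or `EOFError` it remains available through `pending`, so the caller can decide whether to wait again, use it, or discard it.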
> #4. However you combine non-ends they remain unended... (:-))
I don't understand that. [:(] Is it important?
>> IOW, what facilities should a programming language or library provide
>> to a programmer to help him to handle different cases of EOF?
> Do not let the programmer test for EOF. It is either greatly
> inefficient or impossible and non-portable. So just do not provide the
> test. Force exceptions for exceptional cases:
>    begin
>       loop
>          Foo := Read (File);
>          ... -- Do something useful
>       end loop;
>    exception
>       when End_Error =>
>          null; -- We are done
>       when Timeout_Error =>
>          ... -- Hey, the peer is too slow
>       when Data_Error =>
>          ... -- Oops, the file is corrupted
>    end;
That's good. My language has exceptions and they are the mechanism I
planned to use for EOF and the other cases. I just wasn't sure which conditions (including exceptions) a programmer might want to test for.
In your code I take it that there being zero bytes to read would lead to End_Error.
On 2022-09-01 15:57, James Harris wrote:
> On 31/08/2022 10:36, Dmitry A. Kazakov wrote:
>> There is only one case of EOF, namely the end of file (:-))
> So you would prefer EOF to mean "there are zero more bytes available
> to read at the moment"?
No. EOF means the file/container ends here, like 'Z' is the last letter
of the alphabet.
> To make up an example, say a process is reading a Unix-form text file
> expecting whole lines but the last line in the file doesn't have a
> terminating newline character. Its options for each 'line read' are
> 1) Don't wait.
> 2) Wait forever.
> 3) Wait a limited amount of time.
If EOF™ has been reached, none of the above happens. Instead you get:
0) Data_Error exception: file is corrupt, no end-of-line delimiter found
before the file end.
You are trying to break an encoding abstraction here: a line is a
sequence of characters ending with LF. Arguably a poor one, but most of
the UNIX stuff is... (:-))
[...]
> I am not trying to be awkward, by the way. :) Just to think about
> situations a programmer might want to distinguish between.
He need not, because in properly designed software he would call
"gets" or "Get_Line" and let the library take care of it.
>> Do not let the programmer test for EOF. It is either greatly
>> inefficient or impossible and non-portable. So just do not provide
>> the test. Force exceptions for exceptional cases:
>>    begin
>>       loop
>>          Foo := Read (File);
>>          ... -- Do something useful
>>       end loop;
>>    exception
>>       when End_Error =>
>>          null; -- We are done
>>       when Timeout_Error =>
>>          ... -- Hey, the peer is too slow
>>       when Data_Error =>
>>          ... -- Oops, the file is corrupted
>>    end;
> That's good. My language has exceptions and they are the mechanism I
> planned to use for EOF and the other cases. I just wasn't sure which
> conditions (including exceptions) a programmer might want to test for.
> In your code I take it that there being zero bytes to read would lead
> to End_Error.
AFAIK, that is a convention deployed in TCP sockets only. If you read an
empty payload from a TCP socket, that is an indication that the
connection was gracefully closed by the peer, an equivalent of EOF.
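The zero-length-read convention for stream sockets described above can be seen in a few lines of Python. This is a sketch under an explicit assumption: it uses `socket.socketpair()` as a local stand-in for a TCP connection so that it runs without a network; `recv` on a stream socket returns an empty bytes object once the peer has shut down its sending side in either case. The function names are made up for the example.

```python
import socket


def read_until_closed(sock: socket.socket) -> bytes:
    """Read from a stream socket until the peer closes its sending side."""
    received = bytearray()
    while True:
        chunk = sock.recv(4096)
        if chunk == b"":
            # Empty result: the peer closed gracefully, the equivalent of EOF.
            break
        received += chunk
    return bytes(received)


def demo() -> bytes:
    # socketpair() is an assumption standing in for a real TCP connection.
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"hello")
        a.shutdown(socket.SHUT_WR)  # signal "no more data" to the peer
        return read_until_closed(b)
```

Note that an empty result here is distinct from "nothing available yet": on a blocking socket `recv` simply waits in that case, and returns empty only after the peer has finished sending.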
On 01/09/2022 16:12, Dmitry A. Kazakov wrote:
> On 2022-09-01 15:57, James Harris wrote:
>> On 31/08/2022 10:36, Dmitry A. Kazakov wrote:
>>> There is only one case of EOF, namely the end of file (:-))
>> So you would prefer EOF to mean "there are zero more bytes available
>> to read at the moment"?
> No. EOF means the file/container ends here, like 'Z' is the last
> letter of the alphabet.
How is "the file ends here" different from there being nothing left to
read? If the file ends here (your definition) then there's nothing left, surely.
There's a further case, too, as follows.
Imagine that the offset of the next byte to read is the same as the
file's size. Take that as the standard condition for a read to return EOF.
What if the offset is set to some byte /after/ where the file ends, e.g.
the file length is 50 and the offset is 55.
As a programmer, would you want to get EOF in that case, too, or would
you want some separate exception such as 'read past EOF'?
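For comparison, here is how the "offset 55 in a 50-byte file" case behaves with Python's ordinary file objects, plus a sketch of how a library could tell the two conditions apart. The helper names `classify_read_position` and `demo` are hypothetical; the observed behaviour (seeking past the end is allowed, and a read there returns an empty result just as at the exact end) is what Python does on a regular file.

```python
import os
import tempfile


def classify_read_position(f) -> str:
    """Compare the current offset with the file size (regular files only)."""
    size = os.fstat(f.fileno()).st_size
    pos = f.tell()
    if pos < size:
        return "data available"
    if pos == size:
        return "at end"
    return "past end"


def demo(offset: int):
    """Create a 50-byte file, seek to `offset`, classify, then try a read."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 50)
        f.flush()
        f.seek(offset)
        state = classify_read_position(f)
        data = f.read(10)
        return state, len(data)
```

With this sketch, `demo(50)` and `demo(55)` both read zero bytes, and only the explicit offset comparison separates "at end" from "past end". Whether a language should surface that difference as a separate exception is exactly the design question being asked.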
> You are trying to break an encoding abstraction here: a line is a
> sequence of characters ending with LF. Arguably a poor one, but most
> of the UNIX stuff is... (:-))
> [...]
>> I am not trying to be awkward, by the way. :) Just to think about
>> situations a programmer might want to distinguish between.
> He need not, because in properly designed software he would call
> "gets" or "Get_Line" and let the library take care of it.
Well, gets or Get_Line will still have the same issues to deal with. In
fact, the programmer might want to be able to tell such a function how
it should respond to different conditions: throw exception, return a
partial result, return nothing, etc.
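One way to let the caller choose the response, as suggested above, is a policy argument on the line-reading function. The following is a minimal sketch under stated assumptions: `get_line`, `OnUnterminated` and `DataError` are hypothetical names, and the stream is any Python binary stream whose `readline` returns an empty result only at end of data.

```python
import io
from enum import Enum


class DataError(Exception):
    """Hypothetical: the final line has no terminating newline."""


class OnUnterminated(Enum):
    """Hypothetical policies for an unterminated final line."""
    RAISE = "raise"      # throw an exception
    PARTIAL = "partial"  # return the partial line as it stands
    NOTHING = "nothing"  # discard it and return None


def get_line(stream, on_unterminated=OnUnterminated.RAISE):
    """Read one line; the caller chooses what an unterminated tail means."""
    line = stream.readline()
    if line == b"":
        raise EOFError("end of stream")
    if line.endswith(b"\n"):
        return line
    # The data ended before a newline was seen.
    if on_unterminated is OnUnterminated.RAISE:
        raise DataError("last line is not terminated")
    if on_unterminated is OnUnterminated.PARTIAL:
        return line
    return None  # NOTHING: the partial line has been consumed and dropped
```

For example, `get_line(io.BytesIO(b"two"), OnUnterminated.PARTIAL)` hands back the unterminated text, while the default policy treats the same input as corrupt.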
>>> Do not let the programmer test for EOF. It is either greatly
>>> inefficient or impossible and non-portable. So just do not provide
>>> the test. Force exceptions for exceptional cases:
>>>    begin
>>>       loop
>>>          Foo := Read (File);
>>>          ... -- Do something useful
>>>       end loop;
>>>    exception
>>>       when End_Error =>
>>>          null; -- We are done
>>>       when Timeout_Error =>
>>>          ... -- Hey, the peer is too slow
>>>       when Data_Error =>
>>>          ... -- Oops, the file is corrupted
>>>    end;
>> That's good. My language has exceptions and they are the mechanism I
>> planned to use for EOF and the other cases. I just wasn't sure which
>> conditions (including exceptions) a programmer might want to test for.
>> In your code I take it that there being zero bytes to read would lead
>> to End_Error.
> AFAIK, that is a convention deployed in TCP sockets only. If you read
> an empty payload from a TCP socket, that is an indication that the
> connection was gracefully closed by the peer, an equivalent of EOF.
At the point of the call
   Foo := Read (File);
let's say the file has ended. What would you expect to happen? I thought
that was where your End_Error exception would be thrown.
On 2022-09-03 14:12, James Harris wrote:
> On 01/09/2022 16:12, Dmitry A. Kazakov wrote:
>> On 2022-09-01 15:57, James Harris wrote:
>>> On 31/08/2022 10:36, Dmitry A. Kazakov wrote:
>>>> There is only one case of EOF, namely the end of file (:-))
>>> So you would prefer EOF to mean "there are zero more bytes available
>>> to read at the moment"?
>> No. EOF means the file/container ends here, like 'Z' is the last
>> letter of the alphabet.
> How is "the file ends here" different from there being nothing left to
> read? If the file ends here (your definition) then there's nothing
> left, surely.
Reading is an operation, EOF is a state. The semantics of read depends
on the state.
> There's a further case, too, as follows.
> Imagine that the offset of the next byte to read is the same as the
> file's size. Take that as the standard condition for a read to return
> EOF.
Unnecessary assumptions. There could be no bytes, and the file may never
be constructed as a whole, as is the case with pipes.
> What if the offset is set to some byte /after/ where the file ends,
> e.g. the file length is 50 and the offset is 55.
Then you have a wrong offset, provided an offset exists at all, since
that depends on the type of file, e.g. a random access file.
> As a programmer, would you want to get EOF in that case, too, or would
> you want some separate exception such as 'read past EOF'?
You cannot read past EOF, by definition. Whether an attempt to read past
the file end raises an exception or, as in the C library, returns a
special value is up to the designer of the API.
On 03/09/2022 15:47, Dmitry A. Kazakov wrote:
> Unnecessary assumptions. There could be no bytes, and the file may never
> be constructed as a whole, as is the case with pipes.
Some streams of data (e.g. TCP and pipes) do have out-of-band indication
of the difference between "there is nothing more to read now" and "the
stream has ended; nothing more will or can be added".
But plain files do not. Which comes back to the question of what are the
best indications to return to a program.
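The pipe half of that distinction can be demonstrated directly. This is a sketch, assuming a Unix-like system and Python's `os` module; `poll_pipe` and `demo` are hypothetical helper names. With the read end set to non-blocking, "nothing to read now" surfaces as `BlockingIOError`, while "the stream has ended" surfaces as a zero-length read once the writer has closed.

```python
import os


def poll_pipe(fd: int):
    """Try one non-blocking read and report which condition was seen."""
    try:
        chunk = os.read(fd, 4096)
    except BlockingIOError:
        return ("nothing now", b"")   # writer still open, no bytes yet
    if chunk == b"":
        return ("ended", b"")         # every writer has closed the pipe
    return ("data", chunk)


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)
    results = [poll_pipe(r)]          # nothing written yet
    os.write(w, b"abc")
    results.append(poll_pipe(r))      # data is available
    os.close(w)
    results.append(poll_pipe(r))      # writer gone: the stream has ended
    os.close(r)
    return results
```

A regular file offers no such signal: a read at the current end returns an empty result whether or not some other process still has the file open for appending, which is why the question of what to report for plain files remains open.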
> You cannot read past EOF, by definition.
OK, "read requested when file pointer is past EOF", if you prefer.
Basically, AISI an ostensibly simple 'read' call could get a reply of
any of the above responses and maybe others. The problem is to work out
how such info could be returned so as to make a programmer's life as
easy as possible, especially bearing in mind that his program may run in different environments and on different types of stream.