• End of File (EOF) - multiple meanings

    From James Harris@21:1/5 to All on Wed Aug 31 09:40:34 2022
    It's been a while since we had a discussion. I hope you guys are all well.

    Am I right that a programmer might want to distinguish between multiple potential meanings of EOF, as below? How should a programming language
    or library support such different indications?

    1. In the simplest case (that which is most familiar) one has a file of
    8-bit bytes that nothing else has open for writing. If we are reading
    that file and there are no bytes left then we get an EOF indication.
    That's fine.

    2. But what if we have a file or a stream that something else is
    potentially still writing to? The reader finding that there are no bytes
    left to read is just being told that there are no bytes /now/ but there
    may be more soon. (To an extent, that's true of the simple case, too,
    where some other task could append to a file.)

    3. There's also record-based IO where the file ends with a record which
    is so-far incomplete. Something may still be writing that last record
    (but doing so without full-record writes).

    You may think that strict record-based IO is unusual - and it is in many contexts - but it also applies with byte-by-byte IO where, for example,
    one program wants to read a line at a time and yet that file is being
    written to character-by-character. If the last line of the file doesn't
    /yet/ have a terminating newline does that mean that it's still being
    written or that the writer has closed the file without adding the newline?

    4. There's a further case of a composite or multiplexed stream which
    consists of multiple logical streams but I'll not go into that in this
    post in order to try to keep the post short.

    So on a read, if there are /at that moment/ no more bytes left to read
    what indication(s) should be returned to the program?

    IOW, what facilities should a programming language or library provide to
    a programmer to help him to handle different cases of EOF?
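    At the POSIX level, cases 1 and 2 above already come back from `read(2)` as different results, at least for pipes and sockets: a positive byte count, `0` once every writer has closed its end, and `-1` with `errno` set to `EAGAIN` or `EWOULDBLOCK` when a descriptor in non-blocking mode has nothing to offer yet. For a regular file, `0` only says that there are no bytes at the current offset, which is exactly the ambiguity of case 2. A minimal C sketch, assuming a POSIX system; `read_classified`, `set_nonblocking` and the `RS_*` names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a single read(2) attempt. */
enum read_status {
    RS_DATA,        /* one or more bytes were read */
    RS_NOTHING_NOW, /* non-blocking descriptor, no bytes available yet */
    RS_END,         /* read() returned 0: all writers closed (pipe, socket)
                       or offset at the current end (regular file) */
    RS_ERROR        /* any other failure; errno is left as read() set it */
};

/* Reads up to cap (> 0) bytes into buf and stores the count in *got. */
enum read_status read_classified(int fd, void *buf, size_t cap, size_t *got)
{
    ssize_t n;

    *got = 0;
    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR); /* retry if a signal interrupted us */

    if (n > 0) {
        *got = (size_t)n;
        return RS_DATA;
    }
    if (n == 0)
        return RS_END;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return RS_NOTHING_NOW;
    return RS_ERROR;
}

/* Switches fd to non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

    With a pipe, an empty non-blocking read gives `RS_NOTHING_NOW` while the write end is still open, and `RS_END` only after it has been closed.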


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Aug 31 11:36:04 2022
    On 2022-08-31 10:40, James Harris wrote:
    It's been a while since we had a discussion. I hope you guys are all well.

    Am I right that a programmer might want to distinguish between multiple potential meanings of EOF, as below? How should a programming language
    or library support such different indications?

    1. In the simplest case (that which is most familiar) one has a file of
    8-bit bytes that nothing else has open for writing. If we are reading
    that file and there are no bytes left then we get an EOF indication.
    That's fine.

    2. But what if we have a file or a stream that something else is
    potentially still writing to? The reader finding that there are no bytes
    left to read is just being told that there are no bytes /now/ but there
    may be more soon. (To an extent, that's true of the simple case, too,
    where some other task could append to a file.)

    3. There's also record-based IO where the file ends with a record which
    is so-far incomplete. Something may still be writing that last record
    (but doing so without full-record writes).

    You may think that strict record-based IO is unusual - and it is in many contexts - but it also applies with byte-by-byte IO where, for example,
    one program wants to read a line at a time and yet that file is being
    written to character-by-character. If the last line of the file doesn't
    /yet/ have a terminating newline does that mean that it's still being
    written or that the writer has closed the file without adding the newline?

    4. There's a further case of a composite or multiplexed stream which
    consists of multiple logical streams but I'll not go into that in this
    post in order to try to keep the post short.

    So on a read, if there are /at that moment/ no more bytes left to read
    what indication(s) should be returned to the program?

    There is only one case of EOF, namely the end of file (:-))

    Other scenarios you described are not.

    #2. This is called potential blocking. Blocking /= end of file, as
    simple as that. The potential blocking scenario is treated depending on
    the I/O mode. Blocking I/O occasionally blocks, non-blocking (also
    called immediate) I/O faults with "more data" error. See terminal
    interfaces, overlapped I/O under Windows etc.

    #3. This is either encoding/buffering or concurrency. It is unclear
    which one you meant but either has nothing to do with end of file.

    Encoding is transparent to the higher I/O level, so however you encode
    the file end it does not matter. Even the QLC SSDs have all files ended
    (:-))

    Concurrency in I/O just means that you have a race condition while
    determining the end of file status. Normally, when undesired, this sort
    of stuff is prevented using transactions. Again, that is transparent and
    thus does not really matter. The reader starts one transaction, the
    writer another, so whatever the writer does the reader sees the old end
    of file and a consistent stream of data. For pipes etc. see #2.
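    Plain files offer no transactions, but the reader's side of the idea can be approximated by recording the file's size once, when reading starts, and treating that offset as the end even if a writer appends afterwards. A hypothetical sketch in standard C; it does not guard against a writer that truncates or overwrites bytes the reader has yet to reach, which real transactional isolation would.

```c
#include <stdio.h>

/* Hypothetical reader that fixes the end of the file when it starts,
   loosely imitating the reader's side of a transaction. */
struct snapshot_reader {
    FILE *f;
    long end;  /* file size observed when the snapshot was taken */
};

/* Records the current size and rewinds; returns 0 on success, -1 on error. */
int snapshot_open(struct snapshot_reader *r, FILE *f)
{
    if (fseek(f, 0L, SEEK_END) != 0)
        return -1;
    r->end = ftell(f);
    if (r->end < 0 || fseek(f, 0L, SEEK_SET) != 0)
        return -1;
    r->f = f;
    return 0;
}

/* Reads at most cap bytes without going past the recorded end.
   Returns the byte count; 0 means the end of the snapshot (or an error). */
size_t snapshot_read(struct snapshot_reader *r, void *buf, size_t cap)
{
    long pos = ftell(r->f);

    if (pos < 0 || pos >= r->end)
        return 0;
    if ((unsigned long)(r->end - pos) < cap)
        cap = (size_t)(r->end - pos);
    return fread(buf, 1, cap, r->f);
}
```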

    #4. However you combine non-ends they remain unended... (:-))

    IOW, what facilities should a programming language or library provide to
    a programmer to help him to handle different cases of EOF?

    Do not let the programmer test for EOF. It is either greatly inefficient
    or impossible and non-portable. So just do not provide the test. Force exceptions for exceptional cases:

       begin
          loop
             Foo := Read (File);
             ... -- Do something useful
          end loop;
       exception
          when End_Error =>
             null; -- We are done
          when Timeout_Error =>
             ... -- Hey, the peer is too slow
          when Data_Error =>
             ... -- Oops, the file is corrupted
       end;

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Thu Sep 1 14:57:08 2022
    On 31/08/2022 10:36, Dmitry A. Kazakov wrote:
    On 2022-08-31 10:40, James Harris wrote:
    It's been a while since we had a discussion. I hope you guys are all
    well.

    Am I right that a programmer might want to distinguish between
    multiple potential meanings of EOF, as below? How should a programming
    language or library support such different indications?

    1. In the simplest case (that which is most familiar) one has a file
    of 8-bit bytes that nothing else has open for writing. If we are
    reading that file and there are no bytes left then we get an EOF
    indication. That's fine.

    2. But what if we have a file or a stream that something else is
    potentially still writing to? The reader finding that there are no
    bytes left to read is just being told that there are no bytes /now/
    but there may be more soon. (To an extent, that's true of the simple
    case, too, where some other task could append to a file.)

    3. There's also record-based IO where the file ends with a record
    which is so-far incomplete. Something may still be writing that last
    record (but doing so without full-record writes).

    You may think that strict record-based IO is unusual - and it is in
    many contexts - but it also applies with byte-by-byte IO where, for
    example, one program wants to read a line at a time and yet that file
    is being written to character-by-character. If the last line of the
    file doesn't /yet/ have a terminating newline does that mean that it's
    still being written or that the writer has closed the file without
    adding the newline?

    4. There's a further case of a composite or multiplexed stream which
    consists of multiple logical streams but I'll not go into that in this
    post in order to try to keep the post short.

    So on a read, if there are /at that moment/ no more bytes left to read
    what indication(s) should be returned to the program?

    There is only one case of EOF, namely the end of file (:-))

    So you would prefer EOF to mean "there are zero more bytes available to
    read at the moment"?


    Other scenarios you described are not.

    #2. This is called potential blocking. Blocking /= end of file, as
    simple as that. The potential blocking scenario is treated depending on
    the I/O mode. Blocking I/O occasionally blocks, non-blocking (also
    called immediate) I/O faults with "more data" error. See terminal
    interfaces, overlapped I/O under Windows etc.

    Have to say that "blocking" is possibly a bad name as it has another
    meaning of "assembling into blocks". Maybe "waiting" would be better
    meaning that the caller doesn't mind waiting for data.


    #3. This is either encoding/buffering or concurrency. It is unclear
    which one you meant but either has nothing to do with end of file.

    To make up an example, say a process is reading a Unix-form text file
    expecting whole lines but the last line in the file doesn't have a
    terminating newline character. Its options for each 'line read' are

    1) Don't wait.
    2) Wait forever.
    3) Wait a limited amount of time.

    Each has its disadvantages. Under option 1 a program may conclude that
    it has reached EOF (and the last line in the file doesn't have a
    terminating newline) even when another program is still writing to the file.

    Imagine a file which had been created years ago but the last line of
    which omitted the trailing newline which it should have. Under option 2
    a program reading it would wait forever and never see EOF.

    Option 3 could have the disadvantages of the other two but in addition
    it could ignore an unterminated trailing line even though a user looking
    at the file would see text which he expected to be processed.
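    The three policies map directly onto the timeout argument of POSIX `poll`. A minimal sketch with a hypothetical `wait_readable` wrapper. Note that POSIX specifies that regular files always poll as readable, so for the growing-text-file case a reader can only re-read after a delay, and the OS still cannot say whether the missing newline is coming.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Hypothetical result of waiting for input. */
enum wait_result { WAIT_READY, WAIT_TIMEOUT, WAIT_ERROR };

/* timeout_ms selects the policy: 0 = don't wait (option 1),
   -1 = wait forever (option 2), > 0 = wait that long (option 3).
   WAIT_READY only means read() will not block: data, end of stream or
   an error is pending, so the caller must still inspect read()'s result. */
enum wait_result wait_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    int rc = poll(&p, 1, timeout_ms);

    if (rc > 0)
        return WAIT_READY;
    if (rc == 0)
        return WAIT_TIMEOUT;
    return WAIT_ERROR; /* includes EINTR; a caller may choose to retry */
}
```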

    I am not trying to be awkward, by the way. :) Just to think about
    situations a programmer might want to distinguish between.


    Encoding is transparent to the higher I/O level, so however you encode
    the file end it does not matter. Even the QLC SSDs have all files ended
    (:-))

    Concurrency in I/O just means that you have a race condition while determining the end of file status. Normally, when undesired, this sort
    of stuff is prevented using transactions. Again, that is transparent and
    thus does not really matter. The reader starts one transaction, the
    writer another, so whatever the writer does the reader sees the old end
    of file and a consistent stream of data. For pipes etc. see #2.

    Yes, a higher-level concept can be imposed on a simple stream.


    #4. However you combine non-ends they remain unended... (:-))

    I don't understand that. [:(] Is it important?


    IOW, what facilities should a programming language or library provide
    to a programmer to help him to handle different cases of EOF?

    Do not let the programmer test for EOF. It is either greatly inefficient
    or impossible and non-portable. So just do not provide the test. Force exceptions for exceptional cases:

       begin
          loop
             Foo := Read (File);
             ... -- Do something useful
          end loop;
       exception
          when End_Error =>
             null; -- We are done
          when Timeout_Error =>
             ... -- Hey, the peer is too slow
          when Data_Error =>
             ... -- Oops, the file is corrupted
       end;


    That's good. My language has exceptions and they are the mechanism I
    planned to use for EOF and the other cases. I just wasn't sure which
    conditions (including exceptions) a programmer might want to test for.

    In your code I take it that there being zero bytes to read would lead to End_Error.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Thu Sep 1 17:12:11 2022
    On 2022-09-01 15:57, James Harris wrote:
    On 31/08/2022 10:36, Dmitry A. Kazakov wrote:

    There is only one case of EOF, namely the end of file (:-))

    So you would prefer EOF to mean "there are zero more bytes available to
    read at the moment"?

    No. EOF means the file/container ends here, like 'Z' is the last letter
    of the alphabet.

    Other scenarios you described are not.

    #2. This is called potential blocking. Blocking /= end of file, as
    simple as that. The potential blocking scenario is treated depending
    on the I/O mode. Blocking I/O occasionally blocks, non-blocking (also
    called immediate) I/O faults with "more data" error. See terminal
    interfaces, overlapped I/O under Windows etc.

    Have to say that "blocking" is possibly a bad name as it has another
    meaning of "assembling into blocks". Maybe "waiting" would be better
    meaning that the caller doesn't mind waiting for data.

    No, blocking vs. non-blocking I/O is more or less the official term. The thing
    you meant is called just "block," e.g. "block device" under Linux,
    "block I/O" (in blocks).

    #3. This is either encoding/buffering or concurrency. It is unclear
    which one you meant but either has nothing to do with end of file.

    To make up an example, say a process is reading a Unix-form text file expecting whole lines but the last line in the file doesn't have a terminating newline character. Its options for each 'line read' are

    1) Don't wait.
    2) Wait forever.
    3) Wait a limited amount of time.

    If EOF™ has been reached, none of the above happens. Instead you get:

    0) Data_Error exception: file is corrupt, no end line delimiter found
    before the file end.

    You are trying to break an encoding abstraction here: a line is a
    sequence of characters ending with LF. Arguably a poor one, but most
    UNIX stuff is... (:-))

    [...]

    I am not trying to be awkward, by the way. :) Just to think about
    situations a programmer might want to distinguish between.

    He need not, because in properly designed software he would call
    "gets" or "Get_Line" and let the library take care of it.

    #4. However you combine non-ends they remain unended... (:-))

    I don't understand that. [:(] Is it important?

    Yes, abstractions are important, just in order to keep things
    consistent. If you break abstraction you might get into a situation
    where no answer exists.

    IOW, what facilities should a programming language or library provide
    to a programmer to help him to handle different cases of EOF?

    As I said, there is just one case, and it is handled pretty much
    consistently across different OSes, e.g.

    https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-setendoffile

    Do not let the programmer test for EOF. It is either greatly
    inefficient or impossible and non-portable. So just do not provide the
    test. Force exceptions for exceptional cases:

        begin
           loop
              Foo := Read (File);
              ... -- Do something useful
           end loop;
        exception
           when End_Error =>
              null; -- We are done
           when Timeout_Error =>
              ... -- Hey, the peer is too slow
           when Data_Error =>
              ... -- Oops, the file is corrupted
        end;


    That's good. My language has exceptions and they are the mechanism I
    planned to use for EOF and the other cases. I just wasn't sure which conditions (including exceptions) a programmer might want to test for.

    In your code I take it that there being zero bytes to read would lead to End_Error.

    AFAIK, that is a convention deployed in TCP sockets only. If you read
    empty payload from TCP socket that is an indication that the connection
    was gracefully closed by the peer, an equivalent of EOF.

    This is a quirk of the socket library. It could have used an error
    state to signal a closed socket and allowed zero-length payloads. Zero
    payloads are used in many protocols, e.g. for pinging, keep-alives,
    time synchronization etc., so the choice is rather unfortunate. On the
    other hand, reads that never return zero bytes are very helpful for
    parsers and messaging: the guarantee that you never stay in the same
    place protects against live-locks.
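    The convention, sketched in C for a stream socket (TCP, or a Unix-domain stream socket, which behaves the same way here). With `cap > 0`, a return of 0 from `recv` can only mean an orderly shutdown by the peer, never an empty message; on a datagram socket, by contrast, a zero-length datagram is legal and also produces 0. The names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of one recv() on a stream socket. */
enum recv_status { RECV_DATA, RECV_PEER_CLOSED, RECV_ERROR };

/* Receives up to cap (> 0) bytes; *got receives the byte count. */
enum recv_status recv_classified(int s, void *buf, size_t cap, size_t *got)
{
    ssize_t n;

    *got = 0;
    do {
        n = recv(s, buf, cap, 0);
    } while (n < 0 && errno == EINTR);

    if (n > 0) {
        *got = (size_t)n;
        return RECV_DATA;
    }
    if (n == 0)
        return RECV_PEER_CLOSED; /* orderly shutdown, the socket "EOF" */
    return RECV_ERROR;
}
```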

    In serial communication the same idea led to the introduction of the
    EOT character in ASCII. So you could read EOT instead of nothing when
    there is nothing to read but you still wanted to... (:-)) Of course, in
    practice nobody ever uses EOT. If I remember correctly, MS-DOS text
    files ended with a similar marker, although it was Ctrl-Z (SUB) rather
    than EOT.
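    An in-band end marker has to be found by scanning the data itself. CP/M recorded file sizes only in 128-byte records, so its text files marked the logical end with Ctrl-Z (SUB, 0x1A), and MS-DOS tools kept that convention. A small hypothetical helper in C:

```c
#include <stddef.h>
#include <string.h>

#define CPM_EOF 0x1A /* Ctrl-Z, ASCII SUB */

/* Returns the logical length of a CP/M- or DOS-style text buffer:
   the offset of the first Ctrl-Z, or n if there is none. */
size_t logical_text_length(const unsigned char *buf, size_t n)
{
    const unsigned char *sub = memchr(buf, CPM_EOF, n);

    return sub != NULL ? (size_t)(sub - buf) : n;
}
```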

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Sat Sep 3 13:12:48 2022
    On 01/09/2022 16:12, Dmitry A. Kazakov wrote:
    On 2022-09-01 15:57, James Harris wrote:
    On 31/08/2022 10:36, Dmitry A. Kazakov wrote:

    There is only one case of EOF, namely the end of file (:-))

    So you would prefer EOF to mean "there are zero more bytes available
    to read at the moment"?

    No. EOF means the file/container ends here, like 'Z' is the last letter
    of the alphabet.

    How is "the file ends here" different from there being nothing left to
    read? If the file ends here (your definition) then there's nothing left, surely.

    There's a further case, too, as follows.

    Imagine that the offset of the next byte to read is the same as the
    file's size. Take that as the standard condition for a read to return EOF.

    What if the offset is set to some byte /after/ where the file ends, e.g.
    the file length is 50 and the offset is 55.

    As a programmer, would you want to get EOF in that case, too, or would
    you want some separate exception such as 'read past EOF'?
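    In C and POSIX the two positions are not distinguished: `fgetc` returns `EOF`, and `read` returns 0, whether the offset is exactly at the end of a regular file or beyond it (seeking past the end is allowed; a later write there leaves a gap that reads back as bytes of value 0). A library wanting a separate "read past EOF" condition would have to compare the position with the current size itself, as in this hypothetical sketch; the check is racy if another process is resizing the file.

```c
#include <stdio.h>

/* Hypothetical classification of the current position of a stream. */
enum pos_status { POS_DATA, POS_AT_END, POS_PAST_END, POS_ERROR };

/* Compares the current position with the current file size,
   restoring the position afterwards. */
enum pos_status classify_position(FILE *f)
{
    long pos = ftell(f);
    long size;

    if (pos < 0 || fseek(f, 0L, SEEK_END) != 0)
        return POS_ERROR;
    size = ftell(f);
    if (size < 0 || fseek(f, pos, SEEK_SET) != 0)
        return POS_ERROR;
    if (pos < size)
        return POS_DATA;
    if (pos == size)
        return POS_AT_END;
    return POS_PAST_END;
}
```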


    ...


    To make up an example, say a process is reading a Unix-form text file
    expecting whole lines but the last line in the file doesn't have a
    terminating newline character. Its options for each 'line read' are

    1) Don't wait.
    2) Wait forever.
    3) Wait a limited amount of time.

    If EOF™ has been reached, none of the above happens. Instead you get:

    0) Data_Error exception: file is corrupt, no end line delimiter found
    before the file end.

    I like that. Throwing an exception is a good way to address the problem
    of an incomplete last line. Then the caller can choose how to respond.


    You are trying to break an encoding abstraction here: a line is a
    sequence of characters ending with LF. Arguably a poor one, but most
    UNIX stuff is... (:-))

    [...]

    I am not trying to be awkward, by the way. :) Just to think about
    situations a programmer might want to distinguish between.

    He need not, because in properly designed software he would call
    "gets" or "Get_Line" and let the library take care of it.

    Well, gets or Get_Line will still have the same issues to deal with. In
    fact, the programmer might want to be able to tell such a function how
    it should respond to different conditions: throw exception, return a
    partial result, return nothing, etc.
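    One way to let the caller choose is a policy argument on the line reader. A hypothetical C sketch over standard `FILE` streams (the names are illustrative, not from any library); `PARTIAL_IS_ERROR` gives the Data_Error behaviour suggested above. Like any reader, it can only judge what is in the file at the moment `fgetc` reports `EOF`, so on a file that is still being written an unterminated last line may simply be unfinished.

```c
#include <stdio.h>

/* Hypothetical policies for a final line with no terminating '\n'. */
enum partial_policy {
    PARTIAL_IS_ERROR, /* report LINE_DATA_ERROR (the Data_Error view) */
    PARTIAL_IS_LINE,  /* hand the text back as an ordinary line */
    PARTIAL_DISCARD   /* drop the text and report LINE_END */
};

enum line_status {
    LINE_OK, LINE_END, LINE_DATA_ERROR, LINE_TOO_LONG, LINE_IO_ERROR
};

/* Reads one '\n'-terminated line into buf without the '\n'.
   cap must be at least 1; buf is always NUL-terminated. */
enum line_status get_line(FILE *f, char *buf, size_t cap,
                          enum partial_policy policy)
{
    size_t len = 0;
    int c;

    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            buf[len] = '\0';
            return LINE_OK;
        }
        if (len + 1 >= cap) {        /* no room: push the byte back */
            ungetc(c, f);
            buf[len] = '\0';
            return LINE_TOO_LONG;
        }
        buf[len++] = (char)c;
    }
    buf[len] = '\0';
    if (ferror(f))
        return LINE_IO_ERROR;
    if (len == 0)
        return LINE_END;             /* nothing after the last '\n' */
    switch (policy) {
    case PARTIAL_IS_LINE:
        return LINE_OK;
    case PARTIAL_DISCARD:
        buf[0] = '\0';
        return LINE_END;
    default:
        return LINE_DATA_ERROR;      /* text ends without a '\n' */
    }
}
```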

    ...


    Do not let the programmer test for EOF. It is either greatly
    inefficient or impossible and non-portable. So just do not provide
    the test. Force exceptions for exceptional cases:

        begin
           loop
              Foo := Read (File);
              ... -- Do something useful
           end loop;
        exception
           when End_Error =>
              null; -- We are done
           when Timeout_Error =>
              ... -- Hey, the peer is too slow
           when Data_Error =>
              ... -- Oops, the file is corrupted
        end;


    That's good. My language has exceptions and they are the mechanism I
    planned to use for EOF and the other cases. I just wasn't sure which
    conditions (including exceptions) a programmer might want to test for.

    In your code I take it that there being zero bytes to read would lead
    to End_Error.

    AFAIK, that is a convention deployed in TCP sockets only. If you read
    empty payload from TCP socket that is an indication that the connection
    was gracefully closed by the peer, an equivalent of EOF.

    At the point of the call

    Foo := Read (File);

    let's say the file has ended. What would you expect to happen? I thought
    that was where your End_Error exception would be thrown.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Sat Sep 3 16:47:36 2022
    On 2022-09-03 14:12, James Harris wrote:
    On 01/09/2022 16:12, Dmitry A. Kazakov wrote:
    On 2022-09-01 15:57, James Harris wrote:
    On 31/08/2022 10:36, Dmitry A. Kazakov wrote:

    There is only one case of EOF, namely the end of file (:-))

    So you would prefer EOF to mean "there are zero more bytes available
    to read at the moment"?

    No. EOF means the file/container ends here, like 'Z' is the last
    letter of the alphabet.

    How is "the file ends here" different from there being nothing left to
    read? If the file ends here (your definition) then there's nothing left, surely.

    Reading is an operation, EOF is a state. The semantics of read depends
    on the state.

    There's a further case, too, as follows.

    Imagine that the offset of the next byte to read is the same as the
    file's size. Take that as the standard condition for a read to return EOF.

    Unnecessary assumptions. There might be no bytes at all, and the file
    might never be constructed as a whole, as in the case of pipes.

    What if the offset is set to some byte /after/ where the file ends, e.g.
    the file length is 50 and the offset is 55.

    Then you have a wrong offset, provided offsets exist at all, which
    depends on the type of file, e.g. a random-access file.

    As a programmer, would you want to get EOF in that case, too, or would
    you want some separate exception such as 'read past EOF'?

    You cannot read past EOF, by definition. Whether reading past the
    file end causes an exception or, as in the C library, returns a
    special value is up to the designer of the API.

    You are trying to break an encoding abstraction here: a line is a
    sequence of characters ending with LF. Arguably a poor one, but most
    UNIX stuff is... (:-))

    [...]

    I am not trying to be awkward, by the way. :) Just to think about
    situations a programmer might want to distinguish between.

    He need not, because in properly designed software he would call
    "gets" or "Get_Line" and let the library take care of it.

    Well, gets or Get_Line will still have the same issues to deal with. In
    fact, the programmer might want to be able to tell such a function how
    it should respond to different conditions: throw exception, return a
    partial result, return nothing, etc.

    If you are not satisfied with the abstraction use a different one.

    Note that before the Dark Age, advanced file systems supported
    lines/records physically. E.g. in VMS a line would be a variable-length
    record with no delimiters. If you make an abstraction transparent, you
    limit its possible implementations.

    Do not let the programmer test for EOF. It is either greatly
    inefficient or impossible and non-portable. So just do not provide
    the test. Force exceptions for exceptional cases:

        begin
           loop
              Foo := Read (File);
              ... -- Do something useful
           end loop;
        exception
           when End_Error =>
              null; -- We are done
           when Timeout_Error =>
              ... -- Hey, the peer is too slow
           when Data_Error =>
              ... -- Oops, the file is corrupted
        end;


    That's good. My language has exceptions and they are the mechanism I
    planned to use for EOF and the other cases. I just wasn't sure which
    conditions (including exceptions) a programmer might want to test for.

    In your code I take it that there being zero bytes to read would lead
    to End_Error.

    AFAIK, that is a convention deployed in TCP sockets only. If you read
    empty payload from TCP socket that is an indication that the
    connection was gracefully closed by the peer, an equivalent of EOF.

    At the point of the call

      Foo := Read (File);

    let's say the file has ended. What would you expect to happen? I thought
    that was where your End_Error exception would be thrown.

    In the case of sockets? They are not proper files, so you need to
    reformulate the question. E.g. suppose you wanted to add a stream
    interface to a socket. The answer is that you could do that, but such
    streams are usually useless, because most network protocols are
    packet-oriented. Thus you read packets rather than a raw stream of
    octets, and you always know how much there is to read. If the peer
    closes the connection in the middle of a packet, you have a Data_Error
    rather than End_Error. Furthermore, production code tends to use socket
    select rather than blocking reads:

    https://man7.org/linux/man-pages/man2/select.2.html

    It is a totally inverse abstraction. You get a socket signaled when
    there is something to read and then take the buffered stuff from the
    socket. So if Foo is a composite object you could not just read as:

    Foo := Read (Socket);

    because you do not know if the encoded instance of the object is all
    there. It is like the situation with lines. The line might be incomplete
    and there is nothing to read yet. You cannot recover from that unless
    you block but you are not allowed to block. The abstraction (stream of
    octets) leaks. This is why socket streams have very limited use in
    practice, namely only for quick and dirty implementations deploying
    blocking I/O.
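    A sketch of the packet-oriented alternative in C. It assumes a hypothetical framing (a 2-byte big-endian length followed by that many payload bytes); whatever arrives when `select` reports the socket readable is appended to a buffer, and the buffer is then checked for a whole frame. An incomplete frame means "wait for more", and only a close in the middle of a frame is a data error.

```c
#include <stddef.h>

/* Hypothetical framing: 2-byte big-endian payload length, then payload. */
enum frame_status {
    FRAME_READY,      /* a whole frame is buffered */
    FRAME_INCOMPLETE, /* more bytes needed; nothing has ended */
    FRAME_END,        /* peer closed cleanly between frames */
    FRAME_DATA_ERROR  /* peer closed in the middle of a frame */
};

/* buf/len: bytes received so far; peer_closed: recv() has returned 0.
   On FRAME_READY, *frame_len is the size of header plus payload. */
enum frame_status check_frame(const unsigned char *buf, size_t len,
                              int peer_closed, size_t *frame_len)
{
    *frame_len = 0;
    if (len >= 2) {
        size_t need = 2 + (((size_t)buf[0] << 8) | buf[1]);

        if (len >= need) {
            *frame_len = need;
            return FRAME_READY;
        }
    }
    if (!peer_closed)
        return FRAME_INCOMPLETE;
    return len == 0 ? FRAME_END : FRAME_DATA_ERROR;
}
```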

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From luserdroog@21:1/5 to James Harris on Sun Sep 11 18:58:13 2022
    On Wednesday, August 31, 2022 at 3:40:37 AM UTC-5, James Harris wrote:
    It's been a while since we had a discussion. I hope you guys are all well.

    Am I right that a programmer might want to distinguish between multiple potential meanings of EOF, as below? How should a programming language
    or library support such different indications?


    Ignoring all of your actual questions, I've been using a few different means of indicating an EOF condition in my parser combinator library (which doubles as the compiler's data structure library or the interpreter runtime -- depending on
    what application is built from the parsers).

    An EOF condition that is produced while reading the bytes of a file, where the file is represented as a lazy list of its bytes, is represented by the <symbol> EOF.
    This symbol can also be chopped off, in which case the end of file is indicated by a NIL instead of another cons node. While the symbol EOF is *backed*
    by its code value of -1, its type is distinct from that of the file's bytes, which
    are <integer>-typed objects. A file stream might even contain a 32-bit -1 encoded
    in extended UTF-8 if the file is fed through the UTF-8 decoder, but the <integer> -1 and
    the <symbol> -1 (whose print name is "EOF") are different things.
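    C's standard I/O makes a similar type distinction at a smaller scale: `fgetc` returns an `int`, so that `EOF` lies outside the range of byte values, while `feof` and `ferror` tell the two reasons for stopping apart. A minimal sketch (`count_bytes` is a hypothetical name):

```c
#include <stdio.h>

/* Counts the bytes in f. c must be an int: fgetc returns each byte as a
   value in 0..UCHAR_MAX and EOF (a negative int) at end of file or on
   error. With a plain char, a 0xFF byte would look like EOF where char is
   signed, and EOF would never be recognised where char is unsigned. */
size_t count_bytes(FILE *f, int *io_error)
{
    size_t count = 0;
    int c;

    while ((c = fgetc(f)) != EOF)
        count++;
    *io_error = ferror(f) != 0; /* EOF alone does not say why reading stopped */
    return count;
}
```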

  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 16 10:39:21 2022
    On 03/09/2022 15:47, Dmitry A. Kazakov wrote:
    On 2022-09-03 14:12, James Harris wrote:
    On 01/09/2022 16:12, Dmitry A. Kazakov wrote:
    On 2022-09-01 15:57, James Harris wrote:
    On 31/08/2022 10:36, Dmitry A. Kazakov wrote:

    Going back to this thread as I had to make a choice for some code I
    wrote recently.


    There is only one case of EOF, namely the end of file (:-))

    So you would prefer EOF to mean "there are zero more bytes available
    to read at the moment"?

    No. EOF means the file/container ends here, like 'Z' is the last
    letter of the alphabet.

    How is "the file ends here" different from there being nothing left to
    read? If the file ends here (your definition) then there's nothing
    left, surely.

    Reading is an operation, EOF is a state. The semantics of read depends
    on the state.

    My best guess at what you are driving at is that to you EOF is a
    higher-level, logical state rather than a physical one. That's fine.
    Sometimes one needs to recognise that "/at the moment/ the file ends
    here" is different from "we are at the end of a complete file".

    For instance, a file may currently end at byte 49 but a nanosecond later something is going to write byte 50 and the file is not complete until
    byte 50 is also present.

    The problem is that the OS may well not know, so it could not tell a
    program anything other than "there are currently no more bytes to read".

    If the OS had some way to know that byte 50 was needed it could block
    the reader until byte 50 arrived or return an indication that more data
    was to follow. But that's not always possible.


    There's a further case, too, as follows.

    Imagine that the offset of the next byte to read is the same as the
    file's size. Take that as the standard condition for a read to return
    EOF.

    Unnecessary assumptions. There might be no bytes at all, and the file might never be constructed as a whole, as in the case of pipes.

    Some streams of data (e.g. TCP and pipes) do have out-of-band indication
    of the difference between "there is nothing more to read now" and "the
    stream has ended; nothing more will or can be added".

    But plain files do not. Which comes back to the question of what are the
    best indications to return to a program.
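    For plain files this means that a program wanting to follow a growing file, in the manner of `tail -f`, has to treat "no bytes" as "no bytes yet" and poll. A hypothetical C sketch using stdio plus POSIX `nanosleep`; the `EOF` indicator on a `FILE` is sticky, so it is cleared with `clearerr` before each retry. Giving up after `max_tries` empty reads is an arbitrary policy standing in for a timeout, and it still cannot tell a finished file from a stalled writer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Reads up to cap bytes from a regular file that may still be growing.
   An empty read is retried up to max_tries times, pausing pause_ms
   milliseconds between attempts. Returns the byte count; 0 means nothing
   arrived in time or a read error occurred (check ferror). */
size_t read_follow(FILE *f, void *buf, size_t cap, int max_tries,
                   long pause_ms)
{
    struct timespec pause;

    pause.tv_sec = pause_ms / 1000;
    pause.tv_nsec = (pause_ms % 1000) * 1000000L;
    for (int i = 0; i < max_tries; i++) {
        size_t n = fread(buf, 1, cap, f);

        if (n > 0 || ferror(f))
            return n;
        clearerr(f);            /* forget the sticky EOF before retrying */
        if (i + 1 < max_tries)
            nanosleep(&pause, NULL);
    }
    return 0;
}
```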



    What if the offset is set to some byte /after/ where the file ends,
    e.g. the file length is 50 and the offset is 55.

    Then you have wrong offset provided offset exist, since that depends on
    the type of file, e.g. a random access file.

    As a programmer, would you want to get EOF in that case, too, or would
    you want some separate exception such as 'read past EOF'?

    You cannot read past EOF, by definition.

    OK, "read requested when file pointer is past EOF", if you prefer.


    Whether reading past the
    file end causes an exception or, as in the C library, returns a
    special value is up to the designer of the API.

    Sure. I was just wondering what a programmer would find most convenient
    and useful to cover the different cases. There are two parts which must
    be brought together:

    * what the programmer would like to know
    * what the environment (the RTS or OS) may be able to say

    The programmer may want full information but the environment in which
    the program runs may not be able to give that much detail. Yet the same
    program will need to be able to run in different environments and on
    different types of stream.

    This doesn't sound easy to reconcile. For example, a program may want to
    know when the logical end of the data has been reached but the
    environment may only be able to say "there's nothing more just now" as
    with the case of files, above.

    So, trying to brainstorm what a program could be told when it tries to
    read from a stream of data:

    For blocking reads
    * Here's all the data you asked for.
    * Here's some but less than you asked for.
    * I have x amount of the data but it's less than you asked for so I am returning nothing.
    * I have nothing more for you to read just now.
    * I have nothing more for you to read just now but the stream is
    closable and is not closed so there may be more.
    * The stream is closable and is closed; there will be nothing more.
    * There was an unrecoverable input error.
    * There was an input error which may be correctable but could not be
    corrected before the timeout expired.
    Or the environment could block until enough data arrive or there's a
    timeout.

    For nonblocking reads the responses would probably be the same except,
    of course, for the potential for blocking. Put another way, nonblocking
    reads may be the same as blocking reads with a timeout of zero (?).

    Basically, AISI an ostensibly simple 'read' call could get a reply of
    any of the above responses and maybe others. The problem is to work out
    how such info could be returned so as to make a programmer's life as
    easy as possible, especially bearing in mind that his program may run in different environments and on different types of stream.

    Can anyone see where I am going with this - and save me a few steps!
    There must already be some standard model of reads that resolves these
    issues.


    --
    James Harris

  • From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Nov 16 13:02:40 2022
    On 2022-11-16 11:39, James Harris wrote:
    On 03/09/2022 15:47, Dmitry A. Kazakov wrote:

    Unnecessary assumptions. There might be no bytes at all, and the file
    might never be constructed as a whole, as in the case of pipes.

    Some streams of data (e.g. TCP and pipes) do have out-of-band indication
    of the difference between "there is nothing more to read now" and "the
    stream has ended; nothing more will or can be added".

    No, in the general case there is no way to tell unless the connection
    or the pipe is closed.

    But plain files do not. Which comes back to the question of what are the
    best indications to return to a program.

    Whatever way is suitable in the language, e.g. if you have exceptions,
    then an exception. Note that streams are not files, but in an OO
    language you can put a file interface on top of a stream and vice
    versa. If you create a file interface to a stream, then the interface
    implementation is responsible for signalling end of file when that can
    be determined from the stream state. For example, when the stream is
    memory-resident, e.g. backed by a string, the end of the string = the
    end of the file.
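    A minimal sketch of such a memory-backed file interface in C, with hypothetical names. Because the backing string knows its own length, the end-of-file state is always determinable here, and the read operation simply consults it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical file interface over a memory-resident string:
   the end of the string is the end of the file. */
struct mem_file {
    const char *data;
    size_t size;
    size_t pos;
};

void mem_open(struct mem_file *m, const char *s)
{
    m->data = s;
    m->size = strlen(s);
    m->pos = 0;
}

/* EOF as a state: true once every byte has been consumed. */
int mem_at_end(const struct mem_file *m)
{
    return m->pos >= m->size;
}

/* Copies up to cap bytes; returns 0 only at the end (or if cap is 0). */
size_t mem_read(struct mem_file *m, char *buf, size_t cap)
{
    size_t left = m->size - m->pos;
    size_t n = cap < left ? cap : left;

    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}
```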

    You cannot read past EOF, by definition.

    OK, "read requested when file pointer is past EOF", if you prefer.

    File offsets can be a burden in some cases. E.g. null device, stock
    device, terminal device may have no pointers at all.

    Basically, AISI an ostensibly simple 'read' call could get a reply of
    any of the above responses and maybe others. The problem is to work out
    how such info could be returned so as to make a programmer's life as
    easy as possible, especially bearing in mind that his program may run in different environments and on different types of stream.

    You pass an array in and get the index of the last overwritten element.
    Errors raise exceptions:

    - Timeout
    - Data error
    - End of file, when requested data unavailable
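    C has no exceptions, so a rough C equivalent of this interface returns a status code instead. A hypothetical sketch over POSIX `poll` and `read`; `IO_DATA_ERROR` also absorbs any other failure, which a real library would likely report separately.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes standing in for the three exceptions. */
enum io_status { IO_OK, IO_TIMEOUT, IO_DATA_ERROR, IO_END };

/* Waits up to timeout_ms (0 = don't wait, -1 = forever), then reads into
   buf[0 .. cap-1] (cap > 0). On IO_OK, *last is the index of the last
   element overwritten; a short read is still IO_OK. IO_END means no data
   is available because the stream has ended. */
enum io_status read_into(int fd, unsigned char *buf, size_t cap,
                         int timeout_ms, size_t *last)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    ssize_t n;
    int rc;

    do {
        rc = poll(&p, 1, timeout_ms); /* a signal restarts the full timeout */
    } while (rc < 0 && errno == EINTR);
    if (rc == 0)
        return IO_TIMEOUT;
    if (rc < 0)
        return IO_DATA_ERROR;

    do {
        n = read(fd, buf, cap);
    } while (n < 0 && errno == EINTR);
    if (n > 0) {
        *last = (size_t)n - 1;
        return IO_OK;
    }
    if (n == 0)
        return IO_END;
    return IO_DATA_ERROR;
}
```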

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de
