I have run into this a few times and finally reproduced it. Whether it
is expected behavior I am not sure, since it is partly on the user, but
I can think of scenarios where this would be undesirable. This occurs
on 3.11.1 and 3.11.2 on Debian 12 (testing), in case the cause lies
somewhere else.
If a file is still open, even if all operations on it have ceased for a
time, the tail of the written data does not get flushed to the file
until close is issued and the file closes cleanly.
Two methods to recreate it. First, run from the interpreter directly:
f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")
You can cat the file and see it stops at 49626 until you issue an f.close().
Second, a script to recreate it:
f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")
# keep the process alive with the file still open
while True:
    pass
Cat out the file and it is the same thing: it stops at 49626. A Ctrl-C
exit closes the file cleanly, but if the process exits uncleanly, e.g.
from a kill command or something else catastrophic, the remaining
buffer is lost.
Of course one SHOULD manage the closing of their files, and this is
partially on the user, but if by design something hangs on to a file
while it is waiting for something and then a crash occurs, they lose a
portion of what was assumed already complete...
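For reference, the tail here is sitting in Python's userspace write
buffer. Two sketches of the usual ways around it, using the same toy
"abc" file:

# 1. a context manager closes, and therefore flushes, deterministically
with open("abc", "w") as f:
    for i in range(50000):
        f.write(str(i) + "\n")

# 2. or flush explicitly once the writes have ceased, before idling
f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")
f.flush()  # the tail reaches the file even though it stays open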
On 05Mar2023 10:38, aapost <aapost@idontexist.club> wrote:
Additionally (not sure if this still applies):
flush() does not necessarily write the file’s data to disk. Use
flush() followed by os.fsync() to ensure this behavior.
Yes. You almost _never_ need or want this behaviour. A database tends to fsync at the end of a transaction and at other critical points.
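For concreteness, the pattern being quoted above looks like this; a
minimal sketch, where the "abc" filename and the record are made up:

import os

with open("abc", "w") as f:
    f.write("important record\n")
    f.flush()              # push Python's userspace buffer to the OS
    os.fsync(f.fileno())   # ask the OS to commit its buffers to the disc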
However, once you've `flush()`ed the file the data are then in the hands
of the OS, to get to disc in a timely but efficient fashion. Calling
fsync(), like calling flush(), affects writing _efficiency_ by depriving
the OS (or, for flush(), the Python I/O buffering system) of the
opportunity to bundle further data efficiently. It will degrade the
overall performance.
Also, fsync() need not expedite the data getting to disc. It is equally
valid that it just blocks your programme _until_ the data have gone to
disc. In practice it probably does expedite things slightly, but the
real world effect is that your programme will gratuitously block anyway,
when it could just get on with its work, secure in the knowledge that
the OS has its back.
flush() is for causality - ensuring the data are on their way so that
some external party _will_ see them rather than waiting forever for data
which are lurking in the buffer. If that external party, for you, is an
end user tailing a log file, then you might want to flush() at the end
of every line. Note that there is a presupplied line-buffering mode you
can choose which will cause a file to flush like that for you
automatically.
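A sketch of that mode; the log filename here is made up:

f = open("app.log", "w", buffering=1)  # buffering=1: line buffered (text mode only)
f.write("one line\n")  # the "\n" triggers a flush, so someone tailing the file sees it at once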
So when you flush is a policy decision which you can make either during
the programme flow or to a less flexible degree when you open the file.
As an example of choosing-to-flush, here's a little bit of code in a
module I use for writing packet data to a stream (eg a TCP connection): https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624
Starting at line 640: `if Q.empty():` it optionally pauses briefly to
see if more packets are coming on the source queue. If another arrives,
the flush() is _skipped_, and the decision to flush made again after the
next packet is transcribed. In this way a busy source of packets can
write maximally efficient data (full buffers) as long as there's new
data coming from the queue, but if the queue is empty and stays empty
for more than `grace` seconds we flush anyway so that the receiver
_will_ still see the latest packet.
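A minimal sketch of that pattern, not the packetstream.py code itself;
`Q`, `grace` and the None end-of-stream sentinel are assumptions
following the prose:

import queue

def transcribe(Q, f, grace=0.1):
    # Copy packets from Q to f, flushing only when the queue goes idle.
    while True:
        try:
            packet = Q.get(timeout=grace)
        except queue.Empty:
            # idle for `grace` seconds: flush so the receiver sees the latest packet
            f.flush()
            packet = Q.get()  # block until more data (or the end sentinel) arrives
        if packet is None:  # None as sentinel: end of stream
            f.flush()
            return
        f.write(packet)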
Cheers,
Cameron Simpson <cs@cskk.id.au>
On 6/03/23 1:02 pm, Cameron Simpson wrote:
Also, fsync() need not expedite the data getting to disc. It is equally valid that it just blocks your programme _until_ the data have gone to disc.
Or until it *thinks* the data has gone to the disk. Some drives
do buffering of their own, which may impose additional delays
before the data actually gets written.
--
Greg