• Bug 3.11.x behavioral, open file buffers not flushed til file closed.

    From aapost@21:1/5 to All on Sun Mar 5 09:35:15 2023
    I have run into this a few times and finally reproduced it. Whether it
    is as expected I am not sure, since it is slightly on the user, but I can
    think of scenarios where this would be undesirable behavior. This
    occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
    reasoning lingers somewhere else.

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get
    flushed to the file until close is issued and the file closes cleanly.

    2 methods to recreate - 1st run from interpreter directly:

    f = open("abc", "w")
    for i in range(50000):
    f.write(str(i) + "\n")

    you can cat the file and see it stops at 49626 until you issue an f.close()

    a script to recreate:

    f = open("abc", "w")
    for i in range(50000):
    f.write(str(i) + "\n")
    while(1):
    pass

    cat out the file and same thing, stops at 49626. A ctrl-c exit closes
    the files cleanly, but if the program exits uncleanly, i.e. via a kill
    command or something else catastrophic, the remaining buffer is lost.

    Of course one SHOULD manage the closing of their files, and this is
    partially on the user, but if by design something is hanging on to a
    file while it is waiting for something, and then a crash occurs, they
    lose a portion of what was assumed already complete...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From aapost@21:1/5 to aapost on Sun Mar 5 10:38:41 2023
    On 3/5/23 09:35, aapost wrote:


    Guess it could just be an annoying gotcha thing on me.

    calling at least

    f.flush()

    in any case where an explicit close is delayed would be the solution.

    Additionally (not sure if this still applies):
    flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
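
    A minimal sketch of that combination (file name illustrative):

    import os

    f = open("abc", "w")
    f.write("important data\n")
    f.flush()             # move Python's buffer into the OS page cache
    os.fsync(f.fileno())  # ask the OS to push its cache to the device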

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank B@21:1/5 to All on Sun Mar 5 17:37:38 2023
    Am 05.03.23 um 15:35 schrieb aapost:
    I have run into this a few times and finally reproduced it. Whether it
    is as expected I am not sure, since it is slightly on the user, but I can
    think of scenarios where this would be undesirable behavior. This
    occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
    reasoning lingers somewhere else.

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get flushed to the file until close is issued and the file closes cleanly.

    2 methods to recreate - 1st run from interpreter directly:

    f = open("abc", "w")
    for i in range(50000):
      f.write(str(i) + "\n")

    use

    with open("abc", "w") as f:
    for i in range(50000):
    f.write(str(i) + "\n")

    and all is well

    Frank

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cameron Simpson@21:1/5 to aapost on Mon Mar 6 10:48:09 2023
    On 05Mar2023 09:35, aapost <aapost@idontexist.club> wrote:
    I have run into this a few times and finally reproduced it. Whether
    it is as expected I am not sure, since it is slightly on the user, but
    I can think of scenarios where this would be undesirable behavior.
    This occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
    reasoning lingers somewhere else.

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get
    flushed to the file until close is issued and the file closes cleanly.

    Yes, because files are _buffered_ by default. See the `buffering`
    parameter to the open() function in the docs.
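
    For instance (a sketch of the two common alternatives; buffering=1
    requests line buffering and only applies to text mode, while
    buffering=0 disables buffering and is only valid in binary mode):

    # line-buffered text file: flushed at every newline
    f = open("abc", "w", buffering=1)
    f.write("visible after this newline\n")

    # unbuffered binary file: each write goes straight to the OS
    g = open("abc.bin", "wb", buffering=0)
    g.write(b"visible immediately\n")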

    2 methods to recreate - 1st run from interpreter directly:

    f = open("abc", "w")
    for i in range(50000):
    f.write(str(i) + "\n")

    you can cat the file and see it stops at 49626 until you issue an f.close()

    Or until you issue an `f.flush()`, which is what flush is for.

    cat out the file and same thing, stops at 49626. A ctrl-c exit closes
    the files cleanly, but if the program exits uncleanly, i.e. via a kill
    command or something else catastrophic, the remaining buffer is lost.

    Yes, because of buffering. This is normal and IMO correct. You can turn
    it off, or catch-and-flush these circumstances (SIGKILL excepted,
    because SIGKILL's entire purpose is to be uncatchable).
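
    A sketch of the catch-and-flush idea (the handler and names here are
    illustrative): flush on normal exits via atexit, and trap SIGTERM, the
    signal a plain `kill` sends; SIGKILL remains uncatchable:

    import atexit
    import signal
    import sys

    f = open("abc", "w")
    atexit.register(f.close)   # covers normal interpreter exits

    def _flush_and_exit(signum, frame):
        f.flush()
        sys.exit(1)            # raises SystemExit, running atexit handlers

    signal.signal(signal.SIGTERM, _flush_and_exit)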

    Of course one SHOULD manage the closing of their files, and this is
    partially on the user, but if by design something is hanging on to a
    file while it is waiting for something, and then a crash occurs, they
    lose a portion of what was assumed already complete...

    f.flush()

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eryk Sun@21:1/5 to aapost on Sun Mar 5 18:36:54 2023
    On 3/5/23, aapost <aapost@idontexist.club> wrote:

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get flushed to the file until close is issued and the file closes cleanly.

    This is normal behavior for buffered file I/O. There's no timer set to
    flush the buffer after operations have "ceased for a time". It
    automatically flushes only when the buffer is full or, for line
    buffering, when a newline is written.

    The default buffer size is based on the raw file object's _blksize
    attribute. If st_blksize can't be determined via fstat(), the default
    _blksize is 8 KiB.

    Here's an example on Linux. In this example, the buffer size is 4 KiB.

    >>> import os
    >>> f = open('abc', 'w')
    >>> os.fstat(f.fileno()).st_blksize
    4096
    >>> f.buffer.raw._blksize
    4096
    >>> f.writelines(f'{i}\n' for i in range(50000))
    >>> with open('abc') as g: g.readlines()[-1]
    ...
    '49626\n'

    >>> pre_flush_size = os.path.getsize('abc')
    >>> f.flush()
    >>> post_flush_size = os.path.getsize('abc')
    >>> post_flush_size - pre_flush_size
    2238

    Verify that this makes sense, based on what was left in the buffer
    prior to flushing:

    >>> remaining_lines = 50000 - 49626 - 1
    >>> bytes_per_line = 6
    >>> remaining_lines * bytes_per_line
    2238

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Cameron Simpson@21:1/5 to aapost on Mon Mar 6 11:02:02 2023
    On 05Mar2023 10:38, aapost <aapost@idontexist.club> wrote:
    Additionally (not sure if this still applies):
    flush() does not necessarily write the file’s data to disk. Use flush()
    followed by os.fsync() to ensure this behavior.

    Yes. You almost _never_ need or want this behaviour. A database tends to
    fsync at the end of a transaction and at other critical points.

    However, once you've `flush()`ed the file the data are then in the hands
    of the OS, to get to disc in a timely but efficient fashion. Calling
    fsync(), like calling flush(), affects writing _efficiency_ by depriving
    the OS (or, for flush(), the Python I/O buffering system) of the
    opportunity to bundle further data efficiently. It will degrade the
    overall performance.

    Also, fsync() need not expedite the data getting to disc. It is equally
    valid that it just blocks your programme _until_ the data have gone to
    disc. In practice it probably does expedite things slightly, but the real
    world effect is that your programme will gratuitously block anyway, when
    it could just get on with its work, secure in the knowledge that the OS
    has its back.

    flush() is for causality - ensuring the data are on their way so that
    some external party _will_ see them rather than waiting forever for data
    which are lurking in the buffer. If that external party, for you, is an
    end user tailing a log file, then you might want to flush() at the end
    of every line. Note that there is a presupplied line-buffering mode you
    can choose which will cause a file to flush like that for you
    automatically.

    So when you flush is a policy decision which you can make either during
    the programme flow or to a less flexible degree when you open the file.

    As an example of choosing-to-flush, here's a little bit of code in a
    module I use for writing packet data to a stream (eg a TCP connection):
    https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624

    Starting at line 640: `if Q.empty():` it optionally pauses briefly to
    see if more packets are coming on the source queue. If another arrives,
    the flush() is _skipped_, and the decision to flush made again after the
    next packet is transcribed. In this way a busy source of packets can
    write maximally efficient data (full buffers) as long as there's new
    data coming from the queue, but if the queue is empty and stays empty
    for more than `grace` seconds we flush anyway so that the receiver
    _will_ still see the latest packet.
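
    The same policy can be sketched independently of that module (the
    function name, sentinel convention and `grace` default below are
    illustrative, not the library's API):

    import queue

    def drain(q, f, grace=0.1):
        """Copy items from q to f, flushing only when q goes idle."""
        while True:
            try:
                item = q.get(timeout=grace)
            except queue.Empty:
                f.flush()       # idle for `grace` seconds: publish the tail
                item = q.get()  # block until the stream resumes
            if item is None:    # sentinel: end of stream
                f.flush()
                return
            f.write(item)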

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From aapost@21:1/5 to aapost on Sun Mar 5 20:40:07 2023
    On 3/5/23 09:35, aapost wrote:
    I have run into this a few times and finally reproduced it. Whether it
    is as expected I am not sure, since it is slightly on the user, but I can
    think of scenarios where this would be undesirable behavior. This
    occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
    reasoning lingers somewhere else.

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get flushed to the file until close is issued and the file closes cleanly.

    2 methods to recreate - 1st run from interpreter directly:

    f = open("abc", "w")
    for i in range(50000):
      f.write(str(i) + "\n")

    you can cat the file and see it stops at 49626 until you issue an f.close()

    a script to recreate:

    f = open("abc", "w")
    for i in range(50000):
      f.write(str(i) + "\n")
    while(1):
      pass

    cat out the file and same thing, stops at 49626. A ctrl-c exit closes
    the files cleanly, but if the program exits uncleanly, i.e. via a kill
    command or something else catastrophic, the remaining buffer is lost.

    Of course one SHOULD manage the closing of their files, and this is
    partially on the user, but if by design something is hanging on to a
    file while it is waiting for something, and then a crash occurs, they
    lose a portion of what was assumed already complete...

    Cameron
    Eryk

    Yeah, I later noticed open() has the buffering option in the docs, and
    the warning on a subsequent page:

    Warning
    Calling f.write() without using the with keyword or calling f.close()
    might result in the arguments of f.write() not being completely written
    to the disk, even if the program exits successfully.

    I will have to set the buffer arg to 1. I just hadn't thought about
    buffering in quite a while since python just handles most of the things
    lower level languages don't. I guess my (of course incorrect)
    assumptions would have leaned toward some sort of auto handling of the
    flush, or a non-buffer default (not saying it should).

    And I understand why it is the way it is from a developer standpoint,
    it's sort of a mental thing in the moment, I was in a sysadmin way of
    thinking, switching around from doing things in bash with multiple
    terminals, forgetting the fundamentals of what the python interpreter is
    vs a sequence of terminal commands.

    That being said, while "with" is great for many use cases, I think its
    overuse causes concepts like flush and the underlying "whys" to atrophy
    (especially since it is obviously a concept that is still important). It
    also doesn't work well when doing quick and dirty work in the
    interpreter to build a file on the fly with a sequence of commands you
    haven't completely thought through yet; in addition to the not wanting
    to close yet, the subsequent indentation requirement is annoying. f =
    open("fn", "w", 1) will be the go-to for that type of work since now I
    know. Again, just nitpicking, lol.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From aapost@21:1/5 to Cameron Simpson on Sun Mar 5 20:50:17 2023
    On 3/5/23 19:02, Cameron Simpson wrote:
    On 05Mar2023 10:38, aapost <aapost@idontexist.club> wrote:
    Additionally (not sure if this still applies):
    flush() does not necessarily write the file’s data to disk. Use
    flush() followed by os.fsync() to ensure this behavior.

    Yes. You almost _never_ need or want this behaviour. A database tends to fsync at the end of a transaction and at other critical points.

    However, once you've `flush()`ed the file the data are then in the hands
    of the OS, to get to disc in a timely but efficient fashion. Calling
    fsync(), like calling flush(), affects writing _efficiency_ by depriving
    the OS (or, for flush(), the Python I/O buffering system) of the
    opportunity to bundle further data efficiently. It will degrade the
    overall performance.

    Also, fsync() need not expedite the data getting to disc. It is equally
    valid that it just blocks your programme _until_ the data have gone to
    disc. In practice it probably does expedite things slightly, but the real
    world effect is that your programme will gratuitously block anyway, when
    it could just get on with its work, secure in the knowledge that the OS
    has its back.

    flush() is for causality - ensuring the data are on their way so that
    some external party _will_ see them rather than waiting forever for data
    which are lurking in the buffer.  If that external party, for you, is an
    end user tailing a log file, then you might want to flush() at the end
    of every line.  Note that there is a presupplied line-buffering mode you
    can choose which will cause a file to flush like that for you
    automatically.

    So when you flush is a policy decision which you can make either during
    the programme flow or to a less flexible degree when you open the file.

    As an example of choosing-to-flush, here's a little bit of code in a
    module I use for writing packet data to a stream (eg a TCP connection):
    https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624

    Starting at line 640: `if Q.empty():` it optionally pauses briefly to
    see if more packets are coming on the source queue. If another arrives,
    the flush() is _skipped_, and the decision to flush made again after the
    next packet is transcribed. In this way a busy source of packets can
    write maximally efficient data (full buffers) as long as there's new
    data coming from the queue, but if the queue is empty and stays empty
    for more than `grace` seconds we flush anyway so that the receiver
    _will_ still see the latest packet.

    Cheers,
    Cameron Simpson <cs@cskk.id.au>

    Thanks for the details. And yes, that above quote was from a
    non-official doc without a version reference that several forum posts
    were referencing, with no further reasoning as to why they make the
    suggestion or of what importance it is (for the uninformed trying to
    parse it, the suggestion could be because of anything, like python
    lacking something that maybe was fixed, or who knows). Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Chris Angelico@21:1/5 to python-list@python.org on Mon Mar 6 12:46:44 2023
    On Mon, 6 Mar 2023 at 12:41, Greg Ewing via Python-list <python-list@python.org> wrote:

    On 6/03/23 1:02 pm, Cameron Simpson wrote:
    Also, fsync() need not expedite the data getting to disc. It is equally valid that it just blocks your programme _until_ the data have gone to disc.

    Or until it *thinks* the data has gone to the disk. Some drives
    do buffering of their own, which may impose additional delays
    before the data actually gets written.


    Sadly true. Usually with SSDs. Unfortunately, at that point, there's
    nothing ANYONE can do about it, since the OS is deceived as much as
    anyone else.

    But Cameron is completely right in that fsync's primary job is "block
    until" rather than "do this sooner". Adding fsync calls might possibly
    cause a flush when one otherwise wouldn't have happened, but generally
    they'll slow things down in the interests of reliability.

    ChrisA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Greg Ewing@21:1/5 to Cameron Simpson on Mon Mar 6 14:36:37 2023
    On 6/03/23 1:02 pm, Cameron Simpson wrote:
    Also, fsync() need not expedite the data getting to disc. It is equally
    valid that it just blocks your programme _until_ the data have gone to
    disc.

    Or until it *thinks* the data has gone to the disk. Some drives
    do buffering of their own, which may impose additional delays
    before the data actually gets written.

    --
    Greg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Barry@21:1/5 to All on Mon Mar 6 12:34:28 2023
    On 6 Mar 2023, at 01:42, Greg Ewing via Python-list <python-list@python.org> wrote:

    On 6/03/23 1:02 pm, Cameron Simpson wrote:
    Also, fsync() need not expedite the data getting to disc. It is equally valid that it just blocks your programme _until_ the data have gone to disc.

    Or until it *thinks* the data has gone to the disk. Some drives
    do buffering of their own, which may impose additional delays
    before the data actually gets written.

    This used to be an issue until Microsoft refused to certify any drive that lied about when data was persisted to the medium. WHQL?

    That had the effect of stopping drive manufacturers from shipping firmware that lied to win benchmarking.

    Now the OS will use the commands to the drive that allow the OS to know the data is safe.

    Barry




    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Weatherby,Gerard@21:1/5 to All on Mon Mar 6 16:03:51 2023
    Add f.reconfigure() if you want line buffering in your example:

    f = open("abc", "w")
    f.reconfigure(line_buffering=True)
    for i in range(50000):
    f.write(str(i) + "\n")

    More Pythonic would be:

    with open("abc", "w") as f:
    for i in range(5000):
    print(i,file=f)

    From: Python-list <python-list-bounces+gweatherby=uchc.edu@python.org> on behalf of aapost <aapost@idontexist.club>
    Date: Sunday, March 5, 2023 at 6:33 PM
    To: python-list@python.org <python-list@python.org>
    Subject: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

    I have run into this a few times and finally reproduced it. Whether it
    is as expected I am not sure, since it is slightly on the user, but I can
    think of scenarios where this would be undesirable behavior. This
    occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
    reasoning lingers somewhere else.

    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get
    flushed to the file until close is issued and the file closes cleanly.

    2 methods to recreate - 1st run from interpreter directly:

    f = open("abc", "w")
    for i in range(50000):
    f.write(str(i) + "\n")

    you can cat the file and see it stops at 49626 until you issue an f.close()

    a script to recreate:

    f = open("abc", "w")
    for i in range(50000):
    f.write(str(i) + "\n")
    while(1):
    pass

    cat out the file and same thing, stops at 49626. A ctrl-c exit closes
    the files cleanly, but if the program exits uncleanly, i.e. via a kill
    command or something else catastrophic, the remaining buffer is lost.

    Of course one SHOULD manage the closing of their files, and this is
    partially on the user, but if by design something is hanging on to a
    file while it is waiting for something, and then a crash occurs, they
    lose a portion of what was assumed already complete...
    --
    https://mail.python.org/mailman/listinfo/python-list

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dieter Maurer@21:1/5 to aapost on Mon Mar 6 19:31:00 2023
    aapost wrote at 2023-3-5 09:35 -0500:
    ...
    If a file is still open, even if all the operations on the file have
    ceased for a time, the tail of the written operation data does not get
    flushed to the file until close is issued and the file closes cleanly.

    This is normal: the buffer is flushed if one of the following conditions
    is met:
    1. you call `flush`
    2. the buffer overflows
    3. the file is closed.
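
    A quick way to watch conditions 1 and 2 in action (the sizes assume
    the usual 4-8 KiB default buffer):

    import os

    f = open("abc", "w")               # block-buffered by default
    f.write("x")
    print(os.path.getsize("abc"))      # 0: still sitting in the buffer
    f.write("y" * 100_000)             # overflows the buffer (condition 2)
    print(os.path.getsize("abc") > 0)  # True: full blocks got written out
    f.flush()                          # condition 1: remainder written
    f.close()                          # condition 3: file complete on disk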

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)