Hi, I'm interested in writing a script that will:
1. Find and compress files recursively
2. After the first 5 seconds of compressing, if the compression ratio >1 (i.e. the compressed file will be larger than the uncompressed file), it tries another compression algorithm.
3. If the other compression algorithm still has a ratio >1, it tries another algorithm, until a list is exhausted.
4. If the list is exhausted, it skips compressing that file.
Any suggestions on how to proceed?
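For illustration, a rough bash sketch of the loop described above,
leaving out the 5-second probe and simply trying each candidate in
full (gzip, bzip2 and xz are assumed as the list, and the original
file is deleted when a smaller compressed copy is produced; adjust
both to taste):

#!/bin/bash
# Try each candidate algorithm in turn; keep the first result that is
# actually smaller than the original, otherwise skip the file.
algos=(gzip bzip2 xz)
exts=(gz bz2 xz)

find . -type f -print0 | while IFS= read -r -d '' f; do
    orig=$(stat -c %s -- "$f")
    for i in "${!algos[@]}"; do
        out="$f.${exts[$i]}"
        "${algos[$i]}" -c -- "$f" > "$out"
        comp=$(stat -c %s -- "$out")
        if [ "$comp" -lt "$orig" ]; then
            rm -- "$f"        # keep the smaller compressed copy
            continue 2        # move on to the next file
        fi
        rm -- "$out"          # ratio >= 1: discard, try the next algorithm
    done
    echo "skipping $f: no algorithm made it smaller"
done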
On 6/11/24 01:53, J Newman wrote:
Any suggestions on how to proceed?
As others have said, it's very difficult to tell within the first five seconds what the ultimate compression ratio will be.
Grant Taylor <gtaylor@tnetconsulting.net> writes:
On 6/11/24 01:53, J Newman wrote:
Any suggestions on how to proceed?
As others have said, it's very difficult to tell within the first five
seconds what the ultimate compression ratio will be.
Not just difficult but impossible in general: the input file could
change character in its second half, switching the overall result from
one that is (for example) a gzip win to an xz win.
It's true that you cannot tell within the first 5 seconds what the
ultimate compression ratio will be, but it seems to me (from
compressing avi/mp4/mov files with lzma -9evv) that you can tell, to
within +/- 5% and with a high degree of confidence, what the ultimate
compression ratio will be from the first 5 seconds.
On 12/06/2024 16:13, D wrote:
On Wed, 12 Jun 2024, Richard Kettlewell wrote:
Grant Taylor <gtaylor@tnetconsulting.net> writes:
On 6/11/24 01:53, J Newman wrote:
Any suggestions on how to proceed?
As others have said, it's very difficult to tell within the first
five seconds what the ultimate compression ratio will be.
Not just difficult but impossible in general: the input file could
change character in its second half, switching the overall result from
one that is (for example) a gzip win to an xz win.
This is true! The only things I can imagine are parsing the file type
and, from that, drawing conclusions about the compressibility of the
data, or doing a flawed statistical analysis, but as said, the end
could be vastly different from the start.
OK, good point... as mentioned elsewhere, my experience is with
compressing video files with lzma.
But if we accept that the script will sometimes choose the wrong
algorithm, which do you suggest as the option with the fewest errors:
parsing the file type, or test-compressing each file for the first 5
seconds?
J Newman <jenniferkatenewman@gmail.com> writes:
It's true that you cannot tell within the first 5 seconds what the
ultimate compression ratio will be, but it seems to me (from
compressing avi/mp4/mov files with lzma -9evv) that you can tell, to
within +/- 5% and with a high degree of confidence, what the ultimate
compression ratio will be from the first 5 seconds.
Well then, I believe the solution was already posted. Grab 5% of your
files with dd and see how it compresses.
I'm a little curious: what kind of space savings do you expect to get
by doing this? And wouldn't it make more sense to re-encode for a
lower bitrate if space saving is your goal?
Anssi Saari <anssi.saari@usenet.mail.kapsi.fi> wrote:
Well then, I believe the solution was already posted. Grab 5% of your
files with dd and see how it compresses.
The solution that I see grabs the first 1MB, but it would make more
sense to sample e.g. 1% of the file size in five places within the
file. 100MB file = 1MB sample, 100MB/5 = 20MB, so use dd to grab one
1MB sample from the start of the file, then four more at offsets that
increment by 20MB each time. Store these separately, compress them
separately, then average the compression ratio of all the samples.
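Something along these lines, as a rough sketch (GNU dd's
skip_bytes/count_bytes flags and xz are assumed; very small files
aren't handled):

#!/bin/bash
# Estimate the compression ratio of a file from five 1%-sized samples
# taken at evenly spaced offsets, as described above.
f=$1
size=$(stat -c %s -- "$f")
sample=$(( size / 100 ))               # 1% of the file per sample
step=$(( size / 5 ))                   # spacing between sample offsets
total_in=0
total_out=0
for i in 0 1 2 3 4; do
    offset=$(( i * step ))
    out=$(dd if="$f" bs=64k iflag=skip_bytes,count_bytes \
             skip="$offset" count="$sample" 2>/dev/null | xz -9 -c | wc -c)
    total_in=$(( total_in + sample ))
    total_out=$(( total_out + out ))
done
# Estimated overall ratio = compressed bytes / uncompressed bytes
awk -v o="$total_out" -v i="$total_in" \
    'BEGIN { printf "estimated ratio: %.3f\n", o / i }'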
perhaps have a little database that maps file type to compression algorithm
On 6/13/24 04:55, D wrote:
perhaps have a little database that maps file type to compression algorithm
# Dispatch on the file name extension
case "${FILE##*.}" in
   txt)
      # plain text compresses well; run the chosen compressor here, e.g. xz
      ;;
   jpg|jpeg)
      # JPEG is already compressed; skip it or use a fast algorithm
      ;;
   *)
      echo "unknown file type: $FILE"
      ;;
esac
;-)
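Or, for what it's worth, the same idea as an actual little lookup
table, using a bash associative array (the extensions and choices
below are just examples):

# Sketch: map extension -> compressor (requires bash 4+ for declare -A)
declare -A algo_for=(
    [txt]=xz  [log]=xz  [csv]=gzip
    [jpg]=skip [jpeg]=skip [mp4]=skip   # already-compressed formats
)
ext=${FILE##*.}
choice=${algo_for[${ext,,}]:-gzip}      # lowercase the extension; default to gzip
echo "$FILE -> $choice"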