Forum: >>> Magnum BBS <<<

How to manipulate PDF documents in Debian?

From Richard Owlett@21:1/5 to All on Sun Jul 20 13:00:02 2025

I'm running Debian 12.8.

I have a 100+ page PDF document.
I wish to extract 2 of those pages, each to their own PDF file.
I wish to edit those 2 files.
How?
[Simple question but I suspect answer may not be so simple.
What I've read confuses me.]

TIA

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 14:30:01 2025

On Sun, 20 Jul 2025, Richard Owlett wrote:

> I have a 100+ page PDF document.
> I wish to extract 2 of those pages, each to their own PDF file.

How about

mutool merge -o Page-n.pdf <100-page.pdf> n

where <100-page.pdf> is the original file
Page-n.pdf is the one page file extracted

See man mutool. Roger


From Richard Owlett@21:1/5 to Roger Price on Sun Jul 20 15:00:01 2025

On 7/20/25 7:24 AM, Roger Price wrote:

> On Sun, 20 Jul 2025, Richard Owlett wrote:
>
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.
>
> How about
>
> mutool merge -o Page-n.pdf <100-page.pdf> n
>
> where <100-page.pdf> is the original file
> Page-n.pdf is the one page file extracted
>
> See man mutool. Roger

I'm on my way out for most of the day.
I did a quick browse of https://manpages.debian.org/bookworm/mupdf-tools/mutool.1.en.html
but am not sure it works the way I was wanting/thinking/???.
Is there some demo or tutorial that would clarify what it tries to do?
Thank you.


From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 16:30:01 2025

On Sun, 20 Jul 2025, Richard Owlett wrote:

> I have a 100+ page PDF document.
> I wish to extract 2 of those pages, each to their own PDF file.

For a simple graphical solution, try xpdf. The print option allows you to print
specified pages to file.

We cannot help with the editing since you haven't said what you want to do. Is this text or image or something else? Roger


From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 16:20:01 2025

On Sun, 20 Jul 2025, Richard Owlett wrote:

> On 7/20/25 7:24 AM, Roger Price wrote:
>
> > mutool merge -o Page-n.pdf <100-page.pdf> n
> >
> > where <100-page.pdf> is the original file
> > Page-n.pdf is the one page file extracted
>
> Is there some demo or tutorial that would clarify what it tries to do?

I have a 2-page PDF called Documents/Permis.pdf (my driving license).
I want to put the first page, which has my photo, in file Photo.pdf.

The command I use is

rprice@maria ~ mutool merge -o Photo.pdf Documents/Permis.pdf 1

The "1" says I want only page 1. I got file Photo.pdf with my face.

Program pdfseparate could do the same job. The command is

rprice@maria ~ pdfseparate -f 1 -l 1 Documents/Permis.pdf Photo.%d.pdf

and again I got file Photo.1.pdf with my photo. Roger
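
The pdfseparate call above handles one page per run. For the original goal (two pages, each in its own file) it could be wrapped in a small loop. This is only a sketch: the function name extract_pages and the example file name and page numbers are made up for illustration, while -f, -l and the %d pattern are the pdfseparate options already used above.

```shell
# Extract each listed page of a PDF into its own one-page file.
# Usage: extract_pages INPUT.pdf PREFIX PAGE [PAGE...]
extract_pages() {
    input=$1
    prefix=$2
    shift 2
    for page in "$@"; do
        # -f and -l give the first and last page to extract; %d in the
        # output pattern is replaced by the page number.
        pdfseparate -f "$page" -l "$page" "$input" "$prefix.%d.pdf" || return 1
    done
}

# Example call (hypothetical file name and page numbers):
#   extract_pages big-report.pdf Table 57 58
```

Going by the Photo.1.pdf result reported above, the example call should leave Table.57.pdf and Table.58.pdf in the current directory.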


From Stefan Monnier@21:1/5 to All on Sun Jul 20 17:50:01 2025

> I wish to extract 2 of those pages, each to their own PDF file.
> I wish to edit those 2 files.

I've used Inkscape in the past to edit PDFs, and more
recently LibreOffice.

As a general rule, the better option is to do something else, because
editing PDFs is fundamentally "wrong" so the tools have a hard time
doing "the right thing" (which is often ill-defined).

Stefan


From Hans@21:1/5 to All on Sun Jul 20 21:10:01 2025

On Sunday, 20 July 2025, 20:46:18 CEST, Van Snyder wrote:

> On Sun, 2025-07-20 at 14:24 +0200, Roger Price wrote:
>
> > On Sun, 20 Jul 2025, Richard Owlett wrote:
> >
> > > I have a 100+ page PDF document.
> > > I wish to extract 2 of those pages, each to their own PDF file.
> >
> > How about
> >
> > mutool merge -o Page-n.pdf <100-page.pdf> n
> >
> > where <100-page.pdf> is the original file
> > Page-n.pdf is the one page file extracted
> >
> > See man mutool. Roger
>
> pdftk can do what you want, and more.

Try "pdfarranger"; it is in the Debian repo and should work for your needs.
"pdfsam" might also be able to do it, but personally I think
pdfarranger is more comfortable to use.

Best

Hans


From Richard Owlett@21:1/5 to Roger Price on Mon Jul 21 15:00:01 2025

"pdfseparate" is the tool I need.
I need to tweak content of some tables in a large PDF document.
Wish I had known about it ~2 years ago.
*THANK YOU*

On 7/20/25 9:19 AM, Roger Price wrote:

> On Sun, 20 Jul 2025, Richard Owlett wrote:
>
> > On 7/20/25 7:24 AM, Roger Price wrote:
> >
> > > mutool merge -o Page-n.pdf <100-page.pdf> n
> > >
> > > where <100-page.pdf> is the original file
> > > Page-n.pdf is the one page file extracted
> >
> > Is there some demo or tutorial that would clarify what it tries to do?
>
> I have a 2-page PDF called Documents/Permis.pdf (my driving license).
> I want to put the first page, which has my photo, in file Photo.pdf.
>
> The command I use is
>
> rprice@maria ~ mutool merge -o Photo.pdf Documents/Permis.pdf 1
>
> The "1" says I want only page 1. I got file Photo.pdf with my face.
>
> Program pdfseparate could do the same job. The command is
>
> rprice@maria ~ pdfseparate -f 1 -l 1 Documents/Permis.pdf Photo.%d.pdf
>
> and again I got file Photo.1.pdf with my photo. Roger


From Richard Owlett@21:1/5 to Roger Price on Mon Jul 21 15:20:01 2025

On 7/20/25 9:29 AM, Roger Price wrote:

> On Sun, 20 Jul 2025, Richard Owlett wrote:
>
> > I have a 100+ page PDF document.
> > I wish to extract 2 of those pages, each to their own PDF file.
>
> For a simple graphical solution, try xpdf. The print option allows you to print
> specified pages to file.
>
> We cannot help with the editing since you haven't said what you want to do. Is this text or image or something else? Roger

Text, i.e. the tables in Appendix 4 of "Thrifty Food Plan, 2021": https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf

Your previous reference to pdfseparate is what I needed :}!


From Kent West@21:1/5 to Mike Castle on Mon Jul 21 21:40:01 2025

On 7/21/25 2:25 PM, Mike Castle wrote:

> Annoyingly, I am currently trying to print a filled-form PDF with FF
> and it is not working.
>
> When I try to print the page, it comes up with the form without all of
> my filling.
>
> So, treat my previous comment with suspicion.
>
> mrc

I just used Firefox 140.0.4 on Debian 13 (sid) to download a 1040 pdf
from https://www.irs.gov/pub/irs-pdf/f1040.pdf; the tax form opened in
my Firefox window, and allowed me to click in the Last Name and First
Name fields, and put in my name. I then "printed" the document to a new
.pdf file, and from a terminal window was able to use evince to open
that new .pdf, and it still contained my names in those fields.

I didn't try actually printing to a printer, though.

--
Kent West <")))><
IT Support / Client Support
Abilene Christian University
Westing Peacefully - http://kentwest.blogspot.com


From Hans@21:1/5 to All on Mon Jul 21 21:50:01 2025

On Monday, 21 July 2025, 21:25:36 CEST, Mike Castle wrote:

> Annoyingly, I am currently trying to print a filled-form PDF with FF
> and it is not working.
>
> When I try to print the page, it comes up with the form without all of
> my filling.
>
> So, treat my previous comment with suspicion.
>
> mrc

Did you try the following?

1. Save the PDF file to your computer.

2. Open it with "okular" and click "show formfields".

3. Now edit it and save it (you can also save it as a second file under another name).

4. Open the saved file with okular and print.

This works here.

Hans


From Richard Owlett@21:1/5 to Richard Owlett on Tue Jul 22 17:20:01 2025

On 7/20/25 5:52 AM, Richard Owlett wrote:

> I'm running Debian 12.8.
>
> I have a 100+ page PDF document.
> I wish to extract 2 of those pages, each to their own PDF file.
> I wish to edit those 2 files.
> How?
> [Simple question but I suspect answer may not be so simple.
> What I've read confuses me.]
>
> TIA

I should have put more "em-FAY-sis" on my goal for this thread being
learning how to extract specific pages of a large PDF document.[1] I had
not fully appreciated how graphically oriented the PDF format is.

The sub-goal is to perceive the byte-level structure of *that*
page in order to extract the semantic content perceived by a human. I
would then edit/reformat the content to be *useful* to a different
target audience.

As to "how to edit", I need to collate and evaluate the multiple tools mentioned in this thread. *THANK YOU!*

The original target audience was bureaucrats defining policy/procedures
based on monetary value of food assistance.

My audience would be much better served by a weekly grocery shopping
list. If possible I would like to create an at least semi-automated
procedure to do the same for the other similar tables in this
[and possible future] edition(s) of _Thrifty Food Plan_.

[1] Specifically Table A4.14 of _Thrifty Food Plan, 2021_ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
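
One possible starting point for such a semi-automated procedure is pdftotext from poppler-utils, which can dump a single page as plain text. A minimal sketch, assuming poppler-utils is installed; the function name page_to_text and the output file name are hypothetical, and the page number has to be looked up by hand because the thread does not give it.

```shell
# Dump one page of a PDF as plain text, trying to keep the column layout.
# Usage: page_to_text INPUT.pdf PAGE OUTPUT.txt
page_to_text() {
    # -f and -l select the first and last page to convert; -layout asks
    # pdftotext to keep the physical layout, which tends to keep table
    # columns lined up.
    pdftotext -f "$2" -l "$2" -layout "$1" "$3"
}

# Example call (set page to the page that holds the table of interest):
#   page_to_text TFP2021.pdf "$page" table.txt
```

Whether the resulting columns are clean enough to turn into a shopping list depends on how the PDF was produced, so the text output would still need checking against the original page.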


From Richard Owlett@21:1/5 to David Wright on Tue Jul 22 20:20:01 2025

On 7/22/25 11:19 AM, David Wright wrote:

> On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
>
> > On 7/20/25 5:52 AM, Richard Owlett wrote:
> >
> > > I'm running Debian 12.8.
> > >
> > > I have a 100+ page PDF document.
> > > I wish to extract 2 of those pages, each to their own PDF file.
> > >
> > > [ … ]
> >
> > I should have put more "em-FAY-sis" on my goal for this thread being
> > learning how to extract specific pages of a large PDF document.[1] I
> > had not fully appreciated how graphically oriented the PDF format is.
> >
> > The sub-goal being to perceive the byte-level structure of *that*
> > page in order to extract the semantic content perceived by a human. I
> > would then edit/reformat the content to be *useful* to a different
> > target audience.
>
> It's very simple to burst a document into individual pages with pdftk:
>
> $ pdftk document.pdf burst
> $

I'm running Debian 12.8 and package install failed with

Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre

Don't know if an upgrade from Debian 12.8 would resolve the issue.
In any case an upgrade is not feasible at the moment ;{

The manpage does look interesting.

[snip]


From Greg Wooledge@21:1/5 to Richard Owlett on Tue Jul 22 20:40:01 2025

On Tue, Jul 22, 2025 at 13:17:52 -0500, Richard Owlett wrote:

> I'm running Debian 12.8 and package install failed with
>
> Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre

There's no version number on that file. Looks weird. Maybe that's
supposed to be a symlink to the most recent package or something, but
I don't know. The URL looks truncated.

hobbit:~$ cat /etc/debian_version
12.11

hobbit:~$ apt-cache show openjdk-17-jre
[...]
Filename: pool/main/o/openjdk-17/openjdk-17-jre_17.0.15+6-1~deb12u1_amd64.deb

In any case, being 3 point releases behind the current patchlevel is
probably not helping.

What happens if you install the openjdk-17-jre package first?
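
A "Failed to fetch" error can also mean that the local package index is stale and points at a file that is no longer on the mirror; whether that is the cause here is not confirmed in the thread. A hedged sketch of refreshing the index before installing, where refresh_and_install is a made-up helper name and sudo is assumed to be configured:

```shell
# Refresh the package index, then install the named packages.
# Usage: refresh_and_install PACKAGE [PACKAGE...]
refresh_and_install() {
    # apt-get update re-downloads the package lists, so the later install
    # asks the mirror for versions it currently carries.
    sudo apt-get update &&
    sudo apt-get install "$@"
}

# Example calls, in the order suggested above:
#   refresh_and_install openjdk-17-jre
#   refresh_and_install pdftk
```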


From Richard Owlett@21:1/5 to Greg Wooledge on Tue Jul 22 21:00:02 2025

On 7/22/25 1:31 PM, Greg Wooledge wrote:

> On Tue, Jul 22, 2025 at 13:17:52 -0500, Richard Owlett wrote:
>
> > I'm running Debian 12.8 and package install failed with
> >
> > Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre
>
> There's no version number on that file. Looks weird. Maybe that's
> supposed to be a symlink to the most recent package or something, but
> I don't know. The URL looks truncated.

That was copy-n-paste from the Synaptic message box.

> hobbit:~$ cat /etc/debian_version
> 12.11
>
> hobbit:~$ apt-cache show openjdk-17-jre
> [...]
> Filename: pool/main/o/openjdk-17/openjdk-17-jre_17.0.15+6-1~deb12u1_amd64.deb
>
> In any case, being 3 point releases behind the current patchlevel is
> probably not helping.
>
> What happens if you install the openjdk-17-jre package first?

Don't know. On the 2nd try I checked the boxes for both in Synaptic and got the same error.

Real world is interfering with installing current Debian.
Don't know how sane tomorrow will be ;}


From Richard Owlett@21:1/5 to Richard Owlett on Wed Jul 23 16:50:02 2025

On 7/20/25 5:52 AM, Richard Owlett wrote:

> I'm running Debian 12.8.
>
> I have a 100+ page PDF document.
> I wish to extract 2 of those pages, each to their own PDF file.
> I wish to edit those 2 files.
> How?
> [Simple question but I suspect answer may not be so simple.
> What I've read confuses me.]
>
> TIA

For convenience of future readers - these tools have been suggested to
me in this thread.

mutool - all purpose tool for dealing with PDF files
pdftk - Portable Document Format (PDF) page extractor
qpdf - PDF transformation software
xpdf - Portable Document Format (PDF) file viewer for X (xpopple)
pdfarranger - Application for PDF Merging, Rearranging, Splitting, and Cropping
pdfsam - PDF Split and Merge

The poppler-utils package includes:
pdfseparate -- page extraction tool
pdftotext -- text extraction
pdftohtml -- PDF to HTML converter

I'll try each under Debian 12.8.
I could not install pdftk. I'll retry when I have updated my Debian.
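
Before trying each tool, it may help to see which of them are already on the system. A small sketch, assuming each command has the same name as it is listed above (not checked for every package); check_tools is a made-up helper name.

```shell
# Report, for each named command, whether it is found on PATH.
# Usage: check_tools NAME [NAME...]
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found"
        else
            echo "$tool: not found"
        fi
    done
}

# Example call with the names from the list above:
#   check_tools mutool pdftk qpdf xpdf pdfarranger pdfsam \
#       pdfseparate pdftotext pdftohtml
```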


From Richard Owlett@21:1/5 to Greg on Wed Jul 23 19:10:02 2025

On 7/23/25 11:23 AM, Greg wrote:

> On 2025-07-23, Richard Owlett <rowlett@access.net> wrote:
>
> > On 7/20/25 5:52 AM, Richard Owlett wrote:
> >
> > > I'm running Debian 12.8.
> > >
> > > I have a 100+ page PDF document.
> > > I wish to extract 2 of those pages, each to their own PDF file.
> > > I wish to edit those 2 files.
> > > How?
> > > [Simple question but I suspect answer may not be so simple.
> > > What I've read confuses me.]
> > >
> > > TIA
> >
> > For convenience of future readers - these tools have been suggested to
> > me in this thread.
> >
> > mutool - all purpose tool for dealing with PDF files
> > pdftk - Portable Document Format (PDF) page extractor
> > qpdf - PDF transformation software
> > xpdf - Portable Document Format (PDF) file viewer for X (xpopple)
> > pdfarranger - Application for PDF Merging, Rearranging, Splitting, and Cropping
> > pdfsam - PDF Split and Merge
> >
> > The poppler-utils package includes:
> > pdfseparate -- page extraction tool
> > pdftotext -- text extraction
> > pdftohtml -- PDF to HTML converter
> >
> > I'll try each under Debian 12.8.
> > I could not install pdftk. I'll retry when I have updated my Debian.
>
> Or you just print to file in any browser.

Quoting myself ;}

> I wish to edit those 2 files.

