• How to manipulate PDF documents in Debian?

    From Richard Owlett@21:1/5 to All on Sun Jul 20 13:00:02 2025
    I'm running Debian 12.8.

    I have a 100+ page PDF document.
    I wish to extract 2 of those pages, each to their own PDF file.
    I wish to edit those 2 files.
    How?
    [Simple question but I suspect answer may not be so simple.
    What I've read confuses me.]

    TIA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 14:30:01 2025
    On Sun, 20 Jul 2025, Richard Owlett wrote:

    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file.

    How about

    mutool merge -o Page-n.pdf <100-page.pdf> n

    where <100-page.pdf> is the original file,
    Page-n.pdf is the one-page file extracted, and n is the page number.

    See man mutool.

    Roger

  • From Richard Owlett@21:1/5 to Roger Price on Sun Jul 20 15:00:01 2025
    On 7/20/25 7:24 AM, Roger Price wrote:
    > On Sun, 20 Jul 2025, Richard Owlett wrote:
    >
    >> I have a 100+ page PDF document.
    >> I wish to extract 2 of those pages, each to their own PDF file.
    >
    > How about
    >
    > mutool merge -o Page-n.pdf <100-page.pdf> n
    >
    > where <100-page.pdf> is the original file
    > Page-n.pdf is the one page file extracted
    >
    > See man mutool. Roger

    I'm on my way out for most of the day.
    I did a quick browse of https://manpages.debian.org/bookworm/mupdf-tools/mutool.1.en.html
    and am not sure it works the way I wanted or expected.
    Is there some demo or tutorial that would clarify what it tries to do?
    Thank you.

  • From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 16:30:01 2025
    On Sun, 20 Jul 2025, Richard Owlett wrote:

    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file.

    For a simple graphical solution, try xpdf. The print option allows you to print
    specified pages to file.

    We cannot help with the editing since you haven't said what you want to do. Is
    this text, an image, or something else?

    Roger

  • From Roger Price@21:1/5 to Richard Owlett on Sun Jul 20 16:20:01 2025
    On Sun, 20 Jul 2025, Richard Owlett wrote:
    > On 7/20/25 7:24 AM, Roger Price wrote:
    >> mutool merge -o Page-n.pdf <100-page.pdf> n
    >>
    >> where <100-page.pdf> is the original file
    >> Page-n.pdf is the one page file extracted
    >
    > Is some demo or tutorial that would clarify what it tries to do?

    I have a 2-page PDF called Documents/Permis.pdf (my driving license).
    I want to put the first page, which has my photo, in file Photo.pdf.

    The command I use is

    rprice@maria ~ mutool merge -o Photo.pdf Documents/Permis.pdf 1

    The "1" says I want only page 1. I got file Photo.pdf with my face.

    The program pdfseparate could do the same job. The command is

    rprice@maria ~ pdfseparate -f 1 -l 1 Documents/Permis.pdf Photo.%d.pdf

    and again I got file Photo.1.pdf with my photo.

    Roger
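
Applied to the original question (two pages, each to its own file), the two commands above could be wrapped roughly as follows. This is only a sketch: the input name big.pdf, the page numbers 41 and 42, and the helper names page_name and extract_page are placeholders that do not come from the thread, and it assumes mupdf-tools or poppler-utils is installed.

```shell
#!/bin/sh
# Sketch: extract single pages from a PDF, each into its own file.
# INPUT and PAGES are placeholders (assumptions), not values from the thread.
INPUT="${INPUT:-big.pdf}"
PAGES="${PAGES:-41 42}"

# Build the output name for one page, e.g. page 41 -> Page-41.pdf
page_name() {
    printf 'Page-%s.pdf\n' "$1"
}

# Extract one page, preferring mutool and falling back to pdfseparate.
extract_page() {
    src=$1
    page=$2
    if command -v mutool >/dev/null 2>&1; then
        mutool merge -o "$(page_name "$page")" "$src" "$page"
    elif command -v pdfseparate >/dev/null 2>&1; then
        # With -f and -l equal, exactly one page is written;
        # %d in the pattern is replaced by the page number.
        pdfseparate -f "$page" -l "$page" "$src" 'Page-%d.pdf'
    else
        echo "neither mutool nor pdfseparate found" >&2
        return 1
    fi
}

if [ -f "$INPUT" ]; then
    for p in $PAGES; do
        extract_page "$INPUT" "$p"
    done
else
    echo "no such file: $INPUT (set INPUT to your PDF)" >&2
fi
```

Run it as, for example, INPUT=report.pdf PAGES="12 57" sh extract-pages.sh (both names hypothetical); each page lands in Page-N.pdf in the current directory.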

  • From Stefan Monnier@21:1/5 to All on Sun Jul 20 17:50:01 2025
    > I wish to extract 2 of those pages, each to their own PDF file.
    > I wish to edit those 2 files.

    I've used Inkscape in the past to edit PDFs, and more
    recently LibreOffice.

    As a general rule, the better option is to do something else, because
    editing PDFs is fundamentally "wrong" so the tools have a hard time
    doing "the right thing" (which is often ill-defined).


    Stefan

  • From Hans@21:1/5 to All on Sun Jul 20 21:10:01 2025
    On Sunday, 20 July 2025, 20:46:18 CEST, Van Snyder wrote:
    > On Sun, 2025-07-20 at 14:24 +0200, Roger Price wrote:
    >> On Sun, 20 Jul 2025, Richard Owlett wrote:
    >>> I have a 100+ page PDF document.
    >>> I wish to extract 2 of those pages, each to their own PDF file.
    >>
    >> How about
    >>
    >> mutool merge -o Page-n.pdf <100-page.pdf> n
    >>
    >> where <100-page.pdf> is the original file
    >> Page-n.pdf is the one page file extracted
    >>
    >> See man mutool. Roger
    >
    > pdftk can do what you want, and more.

    Try "pdfarranger"; it is in the Debian repo.
    It should work for your needs. "pdfsam" might also be able to do it;
    however, I personally think pdfarranger is more comfortable.

    Best

    Hans

  • From Richard Owlett@21:1/5 to Roger Price on Mon Jul 21 15:00:01 2025
    "pdfseparate" is the tool I need.
    I need to tweak content of some tables in a large PDF document.
    Wish I had known about it ~2 years ago.
    *THANK YOU*

    On 7/20/25 9:19 AM, Roger Price wrote:
    > On Sun, 20 Jul 2025, Richard Owlett wrote:
    >> On 7/20/25 7:24 AM, Roger Price wrote:
    >>> mutool merge -o Page-n.pdf <100-page.pdf> n
    >>>
    >>> where <100-page.pdf> is the original file
    >>> Page-n.pdf is the one page file extracted
    >>
    >> Is some demo or tutorial that would clarify what it tries to do?
    >
    > I have a 2 page PDF called Documents/Permis.pdf (My driving license).
    > I want to put the first page which has my photo in file Photo.pdf .
    >
    > The command I use is
    >
    > rprice@maria ~ mutool merge -o Photo.pdf Documents/Permis.pdf 1
    >
    > The "1" says I want only the page 1. I got file Photo.pdf with my face.
    >
    > Program pdfseparate could do the same job. The command is
    >
    > rprice@maria ~ pdfseparate -f 1 -l 1 Documents/Permis.pdf Photo.%d.pdf
    >
    > and again I got file Photo.1.pdf with my photo. Roger

  • From Richard Owlett@21:1/5 to Roger Price on Mon Jul 21 15:20:01 2025
    On 7/20/25 9:29 AM, Roger Price wrote:
    > On Sun, 20 Jul 2025, Richard Owlett wrote:
    >
    >> I have a 100+ page PDF document.
    >> I wish to extract 2 of those pages, each to their own PDF file.
    >
    > For a simple graphical solution, try xpdf. The print option allows you to print
    > specified pages to file.
    >
    > We cannot help with the editing since you haven't said what you want to do.
    > Is this text or image or something else? Roger

    Text, i.e. the tables in Appendix 4 of "Thrifty Food Plan, 2021":
    https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf

    Your previous reference to pdfseparate is what I needed :}!

  • From Kent West@21:1/5 to Mike Castle on Mon Jul 21 21:40:01 2025
    On 7/21/25 2:25 PM, Mike Castle wrote:
    > Annoyingly, I am currently trying to print a filled-form PDF with FF
    > and it is not working.
    >
    > When I try to print the page, it comes up with the form without all of
    > my filling.
    >
    > So, treat my previous comment with suspicion.
    >
    > mrc

    I just used Firefox 140.0.4 on Debian 13 (sid) to download a 1040 PDF
    from https://www.irs.gov/pub/irs-pdf/f1040.pdf; the tax form opened in
    my Firefox window, and allowed me to click in the Last Name and First
    Name fields, and put in my name. I then "printed" the document to a new
    .pdf file, and from a terminal window was able to use evince to open
    that new .pdf, and it still contained my names in those fields.

    I didn't try actually printing to a printer, though.

    --
    Kent West <")))><
    IT Support / Client Support
    Abilene Christian University
    Westing Peacefully - http://kentwest.blogspot.com

  • From Hans@21:1/5 to All on Mon Jul 21 21:50:01 2025
    On Monday, 21 July 2025, 21:25:36 CEST, Mike Castle wrote:
    > Annoyingly, I am currently trying to print a filled-form PDF with FF
    > and it is not working.
    >
    > When I try to print the page, it comes up with the form without all of
    > my filling.
    >
    > So, treat my previous comment with suspicion.
    >
    > mrc

    Did you try the following?

    1. Save the PDF file to your computer.

    2. Open it with "okular" and click "show formfields".

    3. Now edit it and save it (you can also save it as a second file under another name).

    4. Open the saved file with okular, and print.

    This works here.

    Hans

  • From Richard Owlett@21:1/5 to Richard Owlett on Tue Jul 22 17:20:01 2025
    On 7/20/25 5:52 AM, Richard Owlett wrote:
    > I'm running Debian 12.8.
    >
    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file.
    > I wish to edit those 2 files.
    > How?
    > [Simple question but I suspect answer may not be so simple.
    >  What I've read confuses me.]
    >
    > TIA

    I should have put more "em-FAY-sis" on my goal for this thread being
    learning how to extract specific pages of a large PDF document.[1] I had
    not fully appreciated how graphically oriented the PDF format is.

    The sub-goal is to perceive the byte-level structure of *that*
    page in order to extract the semantic content perceived by a human. I
    would then edit/reformat the content to be *useful* to a different
    target audience.

    As to "how to edit", I need to collate and evaluate the multiple tools
    mentioned in this thread. *THANK YOU!*

    The original target audience was bureaucrats defining policy/procedures
    based on monetary value of food assistance.

    My audience would be much better served by a weekly grocery shopping
    list. If possible I would like to create an at least semi-automated
    procedure to do the same for the other similar tables in this
    [and possible future] edition(s) of _Thrifty Food Plan_.


    [1] Specifically Table A4.14 of _Thrifty Food Plan, 2021_ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
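
For the "extract the semantic content" step, one possible starting point is pdftotext from poppler-utils, which can dump a single page as plain text while trying to keep the columns aligned. The sketch below is an assumption-laden illustration: the local file name TFP2021.pdf, the page number, and the helper text_name are placeholders, since the thread does not say which page holds Table A4.14.

```shell
#!/bin/sh
# Sketch: dump the text of one PDF page, keeping column alignment.
# PDF and PAGE are placeholders; the thread does not give the page
# number of Table A4.14.
PDF="${PDF:-TFP2021.pdf}"
PAGE="${PAGE:-1}"

# Name of the text file for a given page, e.g. 7 -> page-7.txt
text_name() {
    printf 'page-%s.txt\n' "$1"
}

if command -v pdftotext >/dev/null 2>&1 && [ -f "$PDF" ]; then
    # -f/-l select the first and last page; -layout keeps the physical layout
    pdftotext -f "$PAGE" -l "$PAGE" -layout "$PDF" "$(text_name "$PAGE")"
else
    echo "pdftotext or $PDF not available; nothing done" >&2
fi
```

The resulting text file can then be post-processed with ordinary text tools, which may be easier to automate across editions than editing the PDF itself.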



  • From Richard Owlett@21:1/5 to David Wright on Tue Jul 22 20:20:01 2025
    On 7/22/25 11:19 AM, David Wright wrote:
    > On Tue 22 Jul 2025 at 10:14:37 (-0500), Richard Owlett wrote:
    >> On 7/20/25 5:52 AM, Richard Owlett wrote:
    >>> I'm running Debian 12.8.
    >>>
    >>> I have a 100+ page PDF document.
    >>> I wish to extract 2 of those pages, each to their own PDF file.
    >
    > [ … ]
    >
    >> I should have put more "em-FAY-sis" on my goal for this thread being
    >> learning how to extract specific pages of a large PDF document.[1] I
    >> had not fully appreciated how graphically oriented the PDF format is.
    >>
    >> The sub-goal being to perceive the byte-level structure of *that*
    >> page in order to extract the semantic content perceived by a human. I
    >> would then edit/reformat the content to be *useful* to a different
    >> target audience.
    >
    > It's very simple to burst a document into individual pages with pdftk:
    >
    > $ pdftk document.pdf burst
    > $

    I'm running Debian 12.8 and the package install failed with
    Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre

    I don't know if an upgrade from Debian 12.8 would resolve the issue.
    In any case an upgrade is not feasible at the moment ;{

    The manpage does look interesting.

    [snip]

  • From Greg Wooledge@21:1/5 to Richard Owlett on Tue Jul 22 20:40:01 2025
    On Tue, Jul 22, 2025 at 13:17:52 -0500, Richard Owlett wrote:
    > I'm running Debian 12.8 and package install failed with
    > Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre

    There's no version number on that file. Looks weird. Maybe that's
    supposed to be a symlink to the most recent package or something, but
    I don't know. The URL looks truncated.

    hobbit:~$ cat /etc/debian_version
    12.11

    hobbit:~$ apt-cache show openjdk-17-jre
    [...]
    Filename: pool/main/o/openjdk-17/openjdk-17-jre_17.0.15+6-1~deb12u1_amd64.deb

    In any case, being 3 point releases behind the current patchlevel is
    probably not helping.

    What happens if you install the openjdk-17-jre package first?
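
A "Failed to fetch" error is often caused by a package index that is older than the archive, so refreshing the index before installing may be worth trying. The sketch below assumes that cause, which the thread does not confirm; the function name install_pdftk is hypothetical and the package names are taken from the messages above.

```shell
#!/bin/sh
# Sketch: refresh the package index, then install the JRE and pdftk.
# Assumption: the "Failed to fetch" came from an outdated package index.
install_pdftk() {
    sudo apt update &&
    sudo apt install openjdk-17-jre pdftk
}
# Call install_pdftk by hand; it is deliberately not run here.
```

In Synaptic, the equivalent of the first step would be the Reload button before marking the packages.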

  • From Richard Owlett@21:1/5 to Greg Wooledge on Tue Jul 22 21:00:02 2025
    On 7/22/25 1:31 PM, Greg Wooledge wrote:
    > On Tue, Jul 22, 2025 at 13:17:52 -0500, Richard Owlett wrote:
    >> I'm running Debian 12.8 and package install failed with
    >> Failed to fetch http://security.debian.org/debian-security/pool/updates/main/o/openjdk-17/openjdk-17-jre
    >
    > There's no version number on that file. Looks weird. Maybe that's
    > supposed to be a symlink to the most recent package or something, but
    > I don't know. The URL looks truncated.

    That was copy-and-paste from the Synaptic message box.

    > hobbit:~$ cat /etc/debian_version
    > 12.11
    >
    > hobbit:~$ apt-cache show openjdk-17-jre
    > [...]
    > Filename: pool/main/o/openjdk-17/openjdk-17-jre_17.0.15+6-1~deb12u1_amd64.deb
    >
    > In any case, being 3 point releases behind the current patchlevel is
    > probably not helping.
    >
    > What happens if you install the openjdk-17-jre package first?

    Don't know. On the 2nd try I checked both boxes in Synaptic and got the same error.

    Real world is interfering with installing current Debian.
    Don't know how sane tomorrow will be ;}

  • From Richard Owlett@21:1/5 to Richard Owlett on Wed Jul 23 16:50:02 2025
    On 7/20/25 5:52 AM, Richard Owlett wrote:
    > I'm running Debian 12.8.
    >
    > I have a 100+ page PDF document.
    > I wish to extract 2 of those pages, each to their own PDF file.
    > I wish to edit those 2 files.
    > How?
    > [Simple question but I suspect answer may not be so simple.
    >  What I've read confuses me.]
    >
    > TIA

    For convenience of future readers - these tools have been suggested to
    me in this thread.

    mutool - all purpose tool for dealing with PDF files
    pdftk - Portable Document Format (PDF) page extractor
    qpdf - PDF transformation software
    xpdf - Portable Document Format (PDF) file viewer for X (xpopple)
    pdfarranger - Application for PDF Merging, Rearranging, Splitting, and Cropping
    pdfsam - PDF Split and Merge
    The poppler-utils package includes:
    pdfseparate -- page extraction tool
    pdftotext -- text extraction
    pdftohtml -- PDF to HTML converter

    I'll try each under Debian 12.8
    I could not install pdftk. I'll retry when I have updated my Debian.
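
For comparison, the single-page extraction command for each of the command-line tools in the list can be sketched as below. Nothing is executed; the function only prints the command. The mutool and pdfseparate forms come from earlier in the thread, while the pdftk and qpdf forms are my reading of those tools' documentation and should be checked against their man pages. The names extract_cmd, big.pdf, and page-7.pdf are placeholders.

```shell
#!/bin/sh
# Sketch: print the single-page extraction command for each CLI tool.
# Nothing is executed; src, page, and out are placeholders.
extract_cmd() {
    tool=$1
    src=$2
    page=$3
    out=$4
    case $tool in
        mutool)      echo "mutool merge -o $out $src $page" ;;
        pdfseparate) echo "pdfseparate -f $page -l $page $src $out" ;;
        pdftk)       echo "pdftk $src cat $page output $out" ;;
        qpdf)        echo "qpdf --empty --pages $src $page -- $out" ;;
        *)           echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}

for t in mutool pdfseparate pdftk qpdf; do
    extract_cmd "$t" big.pdf 7 page-7.pdf
done
```

The graphical tools in the list (xpdf, pdfarranger, pdfsam) are left out because they are driven interactively.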

  • From Richard Owlett@21:1/5 to Greg on Wed Jul 23 19:10:02 2025
    On 7/23/25 11:23 AM, Greg wrote:
    > On 2025-07-23, Richard Owlett <rowlett@access.net> wrote:
    >> On 7/20/25 5:52 AM, Richard Owlett wrote:
    >>> I'm running Debian 12.8.
    >>>
    >>> I have a 100+ page PDF document.
    >>> I wish to extract 2 of those pages, each to their own PDF file.
    >>> I wish to edit those 2 files.
    >>> How?
    >>> [Simple question but I suspect answer may not be so simple.
    >>>  What I've read confuses me.]
    >>>
    >>> TIA
    >>
    >> For convenience of future readers - these tools have been suggested to
    >> me in this thread.
    >>
    >> mutool - all purpose tool for dealing with PDF files
    >> pdftk - Portable Document Format (PDF) page extractor
    >> qpdf - PDF transformation software
    >> xpdf - Portable Document Format (PDF) file viewer for X (xpopple)
    >> pdfarranger - Application for PDF Merging, Rearranging, Splitting, and Cropping
    >> pdfsam - PDF Split and Merge
    >> The poppler-utils package includes:
    >> pdfseparate -- page extraction tool
    >> pdftotext -- text extraction
    >> pdftohtml -- PDF to HTML converter
    >>
    >> I'll try each under Debian 12.8
    >> I could not install pdftk. I'll retry when I have updated my Debian.
    >
    > Or you just print to file in any browser.

    Quoting myself ;}
    > I wish to edit those 2 files.
