• Support forum for pdfminer.six [especially pdf2txt]

    From Richard Owlett@21:1/5 to All on Wed Oct 30 11:00:01 2024
    I'm attempting to read a USDA document, "Thrifty Food Plan, 2021", that
    seems to be available only as a PDF document
    [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    I have a combination of vision and perception problems which make using
    a PDF viewer impractical. pdf2txt can convert it to HTML. It's still not
    easily usable for me, but examining the produced file and reading various
    pdfminer.six web pages suggests a possible solution.
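One way to get something more readable than the HTML output is to extract plain text and rejoin the hard-wrapped lines into paragraphs. The sketch below is only an illustration, not the approach the poster had in mind: it assumes pdfminer.six is installed and uses its high-level `extract_text` function, while `reflow` and `pdf_to_plain_text` are hypothetical helper names introduced here.

```python
import re


def reflow(text: str) -> str:
    """Join hard-wrapped lines into paragraphs.

    Blank lines are treated as paragraph breaks; runs of whitespace
    (including the form feeds pdfminer.six emits between pages) are
    collapsed to single spaces.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


def pdf_to_plain_text(pdf_path: str) -> str:
    """Extract text from a PDF and reflow it (hypothetical helper)."""
    # Imported lazily so reflow() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return reflow(extract_text(pdf_path))
```

Calling `pdf_to_plain_text("TFP2021.pdf")` on a local copy of the report would return a single string that can be saved to a text file and opened in any editor or screen reader. Tables and multi-column pages are likely to come out jumbled, so this is a starting point at best.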

    [ https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md ]
    points readers to a chat group at gitter [
    https://gitter.im/pdfminer-six/Lobby ]. That comes up as a blank page in
    the current SeaMonkey configured for my needs.

    I'm looking for a mailing list or USENET forum where my questions would
    be On Topic.

    Suggestions?
    TIA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From debian-user@howorth.org.uk@21:1/5 to Richard Owlett on Wed Oct 30 16:20:01 2024
    Richard Owlett <rowlett@access.net> wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    No direct answers from me, I'm afraid but some suggestions.

    (1) Although I've only found the document as a PDF, a page it is
    linked from <https://www.fns.usda.gov/research/cnpp/usda-food-plans>
    gives an email address FNS.FoodPlans@usda.gov for "technical
    inquiries". I'd guess your questions fall under that description, and
    the US gov is fairly good on accessibility issues, so you might be able
    to resolve your issue there.

    I have a combination of vision and perception problems which make
    using a PDF viewer impractical. pdf2txt can convert it to HTML. It's
    still not easily usable for me but examining the produced file and
    reading various pdfminer.six web pages suggest possible solution.

    [ https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md ]
    points readers to a chat group at gitter [
    https://gitter.im/pdfminer-six/Lobby ]. That comes up as a blank page in
    the current SeaMonkey configured for my needs.

    (2) The chat group comes up fine for me in FF 128.4.0esr (64-bit) once I
    enable javascript. It uses the element framework, which may give you
    some hints as to how you might access it.

    I'm looking for a mailing list or USENET forum where my questions
    would be On Topic.

    (3) Presumably as well as the github site, you've also found its
    documentation at https://pdfminersix.readthedocs.io/en/latest/ Maybe
    that will answer some questions?

    Suggestions?
    TIA

    (4) It seems Pieter Marsman (pietermarsman) is active on gitter and may
    be the best developer to contact for further information, but I've no
    idea how :(

  • From David Wright@21:1/5 to Richard Owlett on Wed Oct 30 17:00:01 2024
    On Wed 30 Oct 2024 at 04:53:27 (-0500), Richard Owlett wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
    ].

    Is this actually true? The document itself contains on page 2:

    "Persons using assistive technology should be able to access
    information in this report. Persons with disabilities who require
    alternative means of communication for program information (e.g.,
    Braille, large print, audiotape, American Sign Language, etc.)
    should contact the responsible Agency or USDA’s TARGET Center at
    (202) 720-2600 (voice and TTY) or contact USDA through the Federal
    Relay Service at (800) 877-8339. Additionally, program information
    may be made available in languages other than English."

    Cheers,
    David.

  • From debian-user@howorth.org.uk@21:1/5 to David Wright on Wed Oct 30 19:00:01 2024
    David Wright <deblis@lionunicorn.co.uk> wrote:
    On Wed 30 Oct 2024 at 04:53:27 (-0500), Richard Owlett wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document
    [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    Is this actually true? The document itself contains on page 2:

    "Persons using assistive technology should be able to access
    information in this report. Persons with disabilities who require
    alternative means of communication for program information (e.g.,
    Braille, large print, audiotape, American Sign Language, etc.)
    should contact the responsible Agency or USDA’s TARGET Center at
    (202) 720-2600 (voice and TTY) or contact USDA through the Federal
    Relay Service at (800) 877-8339. Additionally, program information
    may be made available in languages other than English."

    I suppose Richard, and others' problem may be that if they can't read
    the PDF they can't read the instructions for where to get help? The
    help should be outside the document.

    But at least you were able to point it out to him :)

    Cheers,
    David.

  • From David Wright@21:1/5 to debian-user@howorth.org.uk on Wed Oct 30 21:50:01 2024
    On Wed 30 Oct 2024 at 17:50:27 (+0000), debian-user@howorth.org.uk wrote:
    David Wright <deblis@lionunicorn.co.uk> wrote:
    On Wed 30 Oct 2024 at 04:53:27 (-0500), Richard Owlett wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that seems to be only available as a PDF document
    [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    Is this actually true? The document itself contains on page 2:

    "Persons using assistive technology should be able to access
    information in this report. Persons with disabilities who require
    alternative means of communication for program information (e.g.,
    Braille, large print, audiotape, American Sign Language, etc.)
    should contact the responsible Agency or USDA’s TARGET Center at
    (202) 720-2600 (voice and TTY) or contact USDA through the Federal
    Relay Service at (800) 877-8339. Additionally, program information
    may be made available in languages other than English."

    I suppose Richard, and others' problem may be that if they can't read
    the PDF they can't read the instructions for where to get help? The
    help should be outside the document.

    It is: in the web page you quoted, there's an accessibility link near
    the bottom that leads to mechanisms for improving accessibility if you
    think it's inadequate. (Of course, we don't know how, in the first
    place, the OP came upon this document last June.)

    Or do you mean outside the internet—our local library is a good source
    for so much information like this, and they distribute publications
    from the local university extension department, and from city, county,
    state and federal sources, particularly for non-internet people.

    The Thrifty Food Plan came up on this list earlier in the year:

    https://lists.debian.org/debian-user/2024/06/msg00690.html
    https://lists.debian.org/debian-user/2024/06/msg00711.html

    Cheers,
    David.

  • From Richard Owlett@21:1/5 to debian-user@howorth.org.uk on Thu Oct 31 12:10:02 2024
    On 10/30/24 10:12 AM, debian-user@howorth.org.uk wrote:
    Richard Owlett <rowlett@access.net> wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [
    https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    No direct answers from me, I'm afraid but some suggestions.

    (1) Although I've only found the document as a PDF, a page it is
    linked from <https://www.fns.usda.gov/research/cnpp/usda-food-plans>
    gives an email address FNS.FoodPlans@usda.gov for "technical
    inquiries". I'd guess your questions fall under that description, and
    the US gov is fairly good on accessibility issues, so you might be able
    to resolve your issue there.

    I hadn't been to that page recently. I had taken "technical inquiries"
    to refer to nutritional issues. Having recently explored "PDF to text"
    tools, I can comment on format changes that would improve things. At the
    same time I can ask how to order a printed copy.


    I have a combination of vision and perception problems which make
    using a PDF viewer impractical. pdf2txt can convert it to HTML. It's
    still not easily usable for me but examining the produced file and
    reading various pdfminer.six web pages suggest possible solution.

    [ https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md ]
    points readers to a chat group at gitter [
    https://gitter.im/pdfminer-six/Lobby ]. That comes up as a blank page in
    the current SeaMonkey configured for my needs.

    (2) The chat group comes up fine for me in FF 128.4.0esr (64-bit) once I enable javascript. It uses the element framework, which may give you
    some hints as to how you might access it.

    Debian 12 comes with Firefox 115.14.0esr (64-bit). The site reports it as
    "unsupported" and recommends switching to Firefox ;/
    I've never used chat. I'll see if the local library has a compatible
    browser. Then I can determine if chat is worth the upgrade hassle.


    I'm looking for a mailing list or USENET forum where my questions
    would be On Topic.

    (3) Presumably as well as the github site, you've also found its documentation at https://pdfminersix.readthedocs.io/en/latest/ Maybe
    that will answer some questions?

    It didn't. It was the last link on the page that had led me to gitter.


    Suggestions?
    TIA

    (4) It seems Pieter Marsman (pietermarsman) is active on gitter and may
    be the best developer to contact for further information, but I've no
    idea how :(



  • From Richard Owlett@21:1/5 to David Wright on Thu Oct 31 12:20:01 2024
    On 10/30/24 10:49 AM, David Wright wrote:
    On Wed 30 Oct 2024 at 04:53:27 (-0500), Richard Owlett wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
    ].

    Is this actually true? The document itself contains on page 2:

    "Persons using assistive technology should be able to access
    information in this report. Persons with disabilities who require
    alternative means of communication for program information (e.g.,
    Braille, large print, audiotape, American Sign Language, etc.)
    should contact the responsible Agency or USDA’s TARGET Center at
    (202) 720-2600 (voice and TTY) or contact USDA through the Federal
    Relay Service at (800) 877-8339. Additionally, program information
    may be made available in languages other than English."

    Cheers,
    David.



    I'll try the FNS.FoodPlans@usda.gov address first.

  • From Nate Bargmann@21:1/5 to Richard Owlett on Thu Oct 31 14:20:01 2024
    * On 2024 31 Oct 06:02 -0500, Richard Owlett wrote:
    On 10/30/24 10:12 AM, debian-user@howorth.org.uk wrote:
    Richard Owlett <rowlett@access.net> wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that seems to be only available as a PDF document [ https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
    ].

    No direct answers from me, I'm afraid but some suggestions.

    (1) Although I've only found the document as a PDF, a page it is
    linked from <https://www.fns.usda.gov/research/cnpp/usda-food-plans>
    gives an email address FNS.FoodPlans@usda.gov for "technical
    inquiries". I'd guess your questions fall under that description, and
    the US gov is fairly good on accessibility issues, so you might be able
    to resolve your issue there.

    I hadn't been to that page recently. I had taken "technical inquiries" to refer to nutritional issues. Having recently explored "PDF to text" tools, I can comment on format changes that would improve things. At the same time I can ask how to order a printed copy.


    I have a combination of vision and perception problems which make
    using a PDF viewer impractical. pdf2txt can convert it to HTML. It's still not easily usable for me but examining the produced file and reading various pdfminer.six web pages suggest possible solution.

    [ https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md ]
    points readers to a chat group at gitter [
    https://gitter.im/pdfminer-six/Lobby ]. That comes up as a blank page in
    the current SeaMonkey configured for my needs.

    (2) The chat group comes up fine for me in FF 128.4.0esr (64-bit) once I enable javascript. It uses the element framework, which may give you
    some hints as to how you might access it.

    Debian 12 comes with Firefox 115.14.0esr(64-bit). The site reports it as "unsupported" and recommends switching to Firefox ;/

    Currently, my Debian 12 installations have Firefox 128.3.1esr-1~deb12u1. Aptitude also shows that this version is from the Debian Security team.
    You do have security updates enabled, right?

    I opened the USDA PDF in your OP with this version of Firefox with zero
    issues and can zoom to ridiculous levels, and also opened the links
    in the quoted text above just fine.

    Keep abreast of updates and a lot of "issues" disappear.

    - Nate

    --
    "The optimist proclaims that we live in the best of all
    possible worlds. The pessimist fears this is true."
    Web: https://www.n0nb.us
    Projects: https://github.com/N0NB
    GPG fingerprint: 82D6 4F6B 0E67 CD41 F689 BBA6 FB2C 5130 D55A 8819


  • From Richard Owlett@21:1/5 to Nate Bargmann on Thu Oct 31 15:10:01 2024
    On 10/31/24 7:46 AM, Nate Bargmann wrote:
    * On 2024 31 Oct 06:02 -0500, Richard Owlett wrote:
    On 10/30/24 10:12 AM, debian-user@howorth.org.uk wrote:
    Richard Owlett <rowlett@access.net> wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [
    https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf
    ].

    No direct answers from me, I'm afraid but some suggestions.

    (1) Although I've only found the document as a PDF, a page it is
    linked from <https://www.fns.usda.gov/research/cnpp/usda-food-plans>
    gives an email address FNS.FoodPlans@usda.gov for "technical
    inquiries". I'd guess your questions fall under that description, and
    the US gov is fairly good on accessibility issues, so you might be able
    to resolve your issue there.

    I hadn't been to that page recently. I had taken "technical inquiries" to
    refer to nutritional issues. Having recently explored "PDF to text" tools, I
    can comment on format changes that would improve things. At the same time I
    can ask how to order a printed copy.


    I have a combination of vision and perception problems which make
    using a PDF viewer impractical. pdf2txt can convert it to HTML. It's
    still not easily usable for me but examining the produced file and
    reading various pdfminer.six web pages suggest possible solution.

    [ https://github.com/pdfminer/pdfminer.six/blob/master/CONTRIBUTING.md ]
    points readers to a chat group at gitter [
    https://gitter.im/pdfminer-six/Lobby ]. That comes up as a blank page in
    the current SeaMonkey configured for my needs.

    (2) The chat group comes up fine for me in FF 128.4.0esr (64-bit) once I
    enable javascript. It uses the element framework, which may give you
    some hints as to how you might access it.

    Debian 12 comes with Firefox 115.14.0esr(64-bit). The site reports it as
    "unsupported" and recommends switching to Firefox ;/

    Currently, my Debian 12 installations have Firefox 128.3.1esr-1~deb12u1. Aptitude also shows that this version is from the Debian Security team.
    You do have security updates enabled, right?

    I opened the USDA PDF in your OP with this version of Firefox with zero
    issues and can zoom to ridiculous levels, and also opened the links
    in the quoted text above just fine.

    Keep abreast of updates and a lot of "issues" disappear.

    - Nate


    In general <GRIN> I agree with you.
    I date back to Netscape {Navigator IIRC}. I continued with SeaMonkey
    having found design decisions with Firefox/Thunderbird annoying. I've
    used Firefox when desperate to access peculiar sites. Now at 80+ I'm
    reasonably set in my ways. YMMV ;}

  • From Nate Bargmann@21:1/5 to Richard Owlett on Thu Oct 31 23:10:01 2024
    * On 2024 31 Oct 09:07 -0500, Richard Owlett wrote:
    On 10/31/24 7:46 AM, Nate Bargmann wrote:
    * On 2024 31 Oct 06:02 -0500, Richard Owlett wrote:
    Currently, my Debian 12 installations have Firefox 128.3.1esr-1~deb12u1. Aptitude also shows that this version is from the Debian Security team.
    You do have security updates enabled, right?

    I opened the USDA PDF in your OP with this version of Firefox with zero
    issues and can zoom to ridiculous levels, and also opened the links
    in the quoted text above just fine.

    Keep abreast of updates and a lot of "issues" disappear.

    - Nate


    In general <GRIN> I agree with you.
    I date back to Netscape {Navigator IIRC}. I continued with SeaMonkey having found design decisions with Firefox/Thunderbird annoying. I've used Firefox when desperate to access peculiar sites. Now at 80+ I'm reasonably set in my ways. YMMV ;}

    I date back to some Quarterdeck thing my first ISP provided way back and
    then migrated to Navigator not long after. The UI isn't really all that different as I recall. Regardless, if your installation has the Firefox
    115 version then it is not only out of date but is a real security risk.
    My /etc/apt/sources.list contains:

    deb https://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware
    deb-src https://security.debian.org/debian-security bookworm-security main contrib non-free non-free-firmware

    This will provide your installation with security updates and the latest Firefox ESR which just received a security update since my last post and
    is now at version 128.4.0esr-1~deb12u1.
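A quick way to confirm that a system has such an entry is to scan the one-line-style sources file for an active `deb` line naming the security suite. This is a minimal sketch under stated assumptions: `has_security_suite` is a hypothetical helper, it only understands the classic one-line format shown above (not deb822 `.sources` files), and it does not look inside `/etc/apt/sources.list.d/`.

```python
def has_security_suite(sources_text: str, suite: str = "bookworm-security") -> bool:
    """Return True if an active 'deb' line in one-line format names `suite`."""
    for raw in sources_text.splitlines():
        # Drop comments, so commented-out entries do not count.
        fields = raw.split("#", 1)[0].split()
        if not fields or fields[0] != "deb":
            continue  # skips blank lines and deb-src entries
        rest = fields[1:]
        # Skip an optional "[ option=value ... ]" block before the URI.
        if rest and rest[0].startswith("["):
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        # Remaining layout: URI, suite, components...
        if len(rest) >= 2 and rest[1] == suite:
            return True
    return False
```

For example, `has_security_suite(open("/etc/apt/sources.list").read())` would report whether the main sources file carries the entry. A False result is not conclusive, since the entry may live in another file.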

    - Nate

    --
    "The optimist proclaims that we live in the best of all
    possible worlds. The pessimist fears this is true."
    Web: https://www.n0nb.us
    Projects: https://github.com/N0NB
    GPG fingerprint: 82D6 4F6B 0E67 CD41 F689 BBA6 FB2C 5130 D55A 8819


  • From Richard Owlett@21:1/5 to Richard Owlett on Fri Nov 1 16:00:02 2024
    On 10/31/24 6:14 AM, Richard Owlett wrote:
    On 10/30/24 10:49 AM, David Wright wrote:
    On Wed 30 Oct 2024 at 04:53:27 (-0500), Richard Owlett wrote:
    I'm attempting to read a USDA document "Thrifty Food Plan,2021" that
    seems to be only available as a PDF document [
    https://fns-prod.azureedge.us/sites/default/files/resource-files/TFP2021.pdf ].

    Is this actually true? The document itself contains on page 2:

      "Persons using assistive technology should be able to access
       information in this report. Persons with disabilities who require
       alternative means of communication for program information (e.g.,
       Braille, large print, audiotape, American Sign Language, etc.)
       should contact the responsible Agency or USDA’s TARGET Center at
       (202) 720-2600 (voice and TTY) or contact USDA through the Federal
       Relay Service at (800) 877-8339. Additionally, program information
       may be made available in languages other than English."

    Cheers,
    David.



    I'll try the FNS.FoodPlans@usda.gov address first.


    They are sending me a printed copy. ~25 hr turnaround.
    Is that a record for a federal agency?
