I wish to extract CSV formatted data from a PDF document. [1]
Page ES-7 has a weekly grocery list for males grouped by age.
I need only the first and last columns.
Can someone point me in a suitable direction?
TIA
[1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
Table ES-1. Thrifty Food Plan market baskets, quantities of food
purchased for a week, by age-gender group, 2006
Richard Owlett <rowlett@access.net> wrote:
I wish to extract CSV formatted data from a PDF document. [1]
Page ES-7 has a weekly grocery list for males grouped by age.
I need only the first and last columns.
Can someone point me in a suitable direction?
TIA
[1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
Table ES-1. Thrifty Food Plan market baskets, quantities of food
purchased for a week, by age-gender group, 2006
If you look at
https://www.fns.usda.gov/cnpp/thrifty-food-plan-2021 instead, you can
find the underlying data in spreadsheet form (.xlsx). Perhaps that will
be an adequate substitute?
On 2025-02-21, David Wright <deblis@lionunicorn.co.uk> wrote:
[1] https://www.fns.usda.gov/cnpp/thrifty-food-plan-2006
Table ES-1. Thrifty Food Plan market baskets, quantities
of food purchased for a week, by age-gender group, 2006
I don't read PDFs /in/ the browser: it downloads it instead.
So while held captive at home by the weather, I dragged the mouse
across the Males table and dumped it in a file.
I get:
Access Denied
You don't have permission to access "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this server. Reference #18.dd831002.1740148075.35e89c97
https://errors.edgesuite.net/18.dd831002.1740148075.35e89c97
On 21/02/2025 08:00, David Wright wrote:
I dragged the mouse
across the Males table and dumped it in a file.
David, I recall you mentioned xpdf in your messages. It allows
selecting rectangular regions. That is sometimes convenient, since the
result does not depend on the order of objects inside the PDF file.
Other PDF viewers allow convenient selection of contiguous spans of
text, e.g. the end of one line and the beginning of the next.
Unfortunately, many PDF files store pieces of text in almost random
order. In Firefox, at least, selection may behave quite peculiarly,
skipping some fragments and adding visually unrelated ones.
So selecting text in PDF files may depend strongly on the viewer.
P.S. In some cases "pdftotext -layout" gives better results than
running it without "-layout".
When the text file has properly aligned columns, it may be better to
insert TAB characters at certain positions on each line instead of
"quoting" some spaces. Perhaps LibreOffice Calc even has a GUI for
selecting column widths when importing text files.
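
As a rough illustration of the column-splitting idea, here is a minimal Python sketch. It does not insert TABs at fixed positions; instead it treats any run of two or more spaces as a column separator, which is an assumption that only holds if no cell contains a double space. The function names and the sample rows are made up for the example and are not taken from the USDA table.

```python
import csv
import io
import re


def first_and_last_columns(lines):
    """Yield (first, last) field pairs from column-aligned text lines."""
    for line in lines:
        # Assumption: columns are separated by two or more spaces,
        # and single spaces occur only inside a cell.
        fields = re.split(r" {2,}", line.strip())
        # Skip blank lines and lines with only one field (e.g. headings).
        if len(fields) >= 2:
            yield fields[0], fields[-1]


def to_csv(lines):
    """Return CSV text holding only the first and last columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(first_and_last_columns(lines))
    return buf.getvalue()


# Made-up rows that imitate column-aligned "pdftotext -layout" output.
sample = [
    "Whole grain breads      0.50    1.20    2.10",
    "Potato products         0.30    0.80    1.75",
    "",
]

if __name__ == "__main__":
    print(to_csv(sample), end="")
```

For a real file, the lines could be read with open() from the text file that pdftotext produced and passed to to_csv(). Rows whose cells wrap over several lines would still need fixing by hand.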
On Fri 21 Feb 2025 at 21:20:45 (+0000), debian-user@howorth.org.uk wrote:
I get:
Access Denied
You don't have permission to access "http://www.fns.usda.gov/cnpp/thrifty-food-plan-2006" on this server. Reference #18.dd831002.1740148075.35e89c97
https://errors.edgesuite.net/18.dd831002.1740148075.35e89c97
Perhaps it depends on browser settings (and which browser),
or perhaps on where you are (your timezone is unknown), or
perhaps on your ISP.
In discussions about PDF utilities, I don't recall atril being mentioned. It's become my go-to viewer.