I want to extract (scrape) the “Matrix form” data from the BCS (Bilbao Crystallographic Server) website [1], i.e., the data that appears in the third column of the table.
I tried the following Python snippet, but I have not been able to figure out how to do it:
import requests
from bs4 import BeautifulSoup
import re
proxies = {
    'http': 'socks5h://127.0.0.1:18888',
    'https': 'socks5h://127.0.0.1:18888',
}
requests.packages.urllib3.disable_warnings()
r = requests.get('https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane', proxies=proxies, verify=False)
soup = BeautifulSoup(r.content, features="lxml")
table = soup.find('table')
id = table.find_all('id')
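A note on the last line of the snippet: find_all('id') searches for <id> tags, which HTML does not have, so it returns an empty list (and the name id shadows a Python builtin). The following is only a minimal sketch of one possible direction, not a tested solution for the BCS page. It assumes that the outermost table has one row per generator, that the matrix sits in the third <td> of each row (possibly inside a nested table or <pre>), and that header cells use <th>. The SAMPLE markup and the names ColumnText and third_column are made up for illustration; the real page may be laid out differently. It uses only the standard library's html.parser, so it can be tried without a network connection or proxy.

```python
from html.parser import HTMLParser

# Invented markup that mimics the ASSUMED layout; not copied from the BCS page.
SAMPLE = """
<table>
<tr><th>No.</th><th>(x,y) form</th><th>Matrix form</th></tr>
<tr><td>1</td><td>x,y</td><td><table><tr><td><pre>
1 0 0
0 1 0
</pre></td></tr></table></td></tr>
<tr><td>2</td><td>-x,-y</td><td><table><tr><td><pre>
-1  0 0
 0 -1 0
</pre></td></tr></table></td></tr>
</table>
"""


class ColumnText(HTMLParser):
    """Collect one <td> column of the outermost table, nested content included."""

    def __init__(self, column):
        super().__init__()
        self.column = column  # zero-based index of the wanted column
        self.depth = 0        # current <table> nesting depth
        self.cell = -1        # index of the current cell in the outer row
        self.buf = None       # text pieces of the wanted cell, else None
        self.cells = []       # one entry per wanted cell: a list of token rows

    def _flush(self):
        # Turn the buffered text into rows of whitespace-separated tokens.
        if self.buf is not None:
            text = "".join(self.buf)
            rows = [line.split() for line in text.splitlines() if line.strip()]
            self.cells.append(rows)
            self.buf = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.depth += 1
        elif self.depth == 1 and tag == "tr":
            self._flush()
            self.cell = -1
        elif self.depth == 1 and tag == "td":
            self._flush()
            self.cell += 1
            if self.cell == self.column:
                self.buf = []

    def handle_endtag(self, tag):
        if tag == "table":
            self.depth -= 1
            if self.depth == 0:
                self._flush()
        elif self.depth == 1 and tag == "td":
            self._flush()

    def handle_data(self, data):
        # Text inside nested tables still belongs to the open outer cell.
        if self.buf is not None:
            self.buf.append(data)


def third_column(html):
    """Return the third-column cells of the outermost table as token rows."""
    parser = ColumnText(column=2)
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.cells


if __name__ == "__main__":
    for matrix in third_column(SAMPLE):
        print(matrix)
```

With the request from the snippet above, the call would be third_column(r.text). Counting only cells at table depth 1 is the design choice here: it keeps the <td> elements of a nested matrix table from being mistaken for columns of the outer table. If the page wraps everything in an extra layout table, that depth check would have to be adjusted.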
My Python environment is as follows:
werner@X10DAi:~$ pyenv shell datasci
(datasci) werner@X10DAi:~$ python --version
Python 3.11.1
Any tips will be appreciated.
[1] https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane
Regards,
Zhao