• Extract the “Matrix form” dataset from BCS we

    From hongyi.zhao@gmail.com@21:1/5 to All on Thu Dec 22 05:35:04 2022
    I want to extract (scrape) the “Matrix form” dataset from the BCS website [1], i.e., the data that appears in the 3rd column.

    I tried the following Python code snippet, but could not figure out how to get the data:

    import requests
    from bs4 import BeautifulSoup
    import re

    proxies = {
        'http': 'socks5h://127.0.0.1:18888',
        'https': 'socks5h://127.0.0.1:18888'
    }

    requests.packages.urllib3.disable_warnings()
    r = requests.get('https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane', proxies=proxies, verify=False)
    soup = BeautifulSoup(r.content, features="lxml")

    table = soup.find('table')
    id = table.find_all('id')

    My Python environment is as follows:

    werner@X10DAi:~$ pyenv shell datasci
    (datasci) werner@X10DAi:~$ python --version
    Python 3.11.1

    Any tips will be appreciated.

    [1] https://www.cryst.ehu.es/cgi-bin/plane/programs/nph-plane_getgen?gnum=17&type=plane

    Regards,
    Zhao

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Passin@21:1/5 to hongy...@gmail.com on Thu Dec 22 12:34:05 2022
    On 12/22/2022 8:35 AM, hongy...@gmail.com wrote:
    > I want to extract / scrape the “Matrix form” dataset from the BCS website [1], a.k.a., the data appeared in the 3rd column.
    >
    > I tried with the following python code snippet, but still failed to figure out the trick:

    Tell us what you observed and what you expected. For example, does the
    data get downloaded? Do you get error messages, and if so, what are
    they? Does the id variable contain anything at all?
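
    As a rough way to check the last point without touching the network,
    here is a minimal sketch run against a small stand-in table. The
    SAMPLE_HTML markup and the third_column() helper are made up for
    illustration and are not taken from the BCS page, whose real layout
    may differ (nested tables, for instance). It uses the standard-library
    "html.parser" backend rather than lxml, only to keep the example free
    of extra dependencies. The point it shows: find_all('id') searches for
    elements named <id>, not for an id attribute, so on ordinary HTML it
    comes back empty, while the cells of a column are <td> elements inside
    each <tr>.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real BCS page may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><th>No.</th><th>(x,y) form</th><th>Matrix form</th></tr>
  <tr><td>1</td><td>x,y</td><td>1 0 0 / 0 1 0</td></tr>
  <tr><td>2</td><td>-x,-y</td><td>-1 0 0 / 0 -1 0</td></tr>
</table>
"""


def third_column(html):
    """Return the text of the 3rd <td> in every row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table found: worth inspecting r.status_code and r.text first.
        return []
    cells = []
    for row in table.find_all("tr"):
        tds = row.find_all("td")  # header rows use <th>, so they yield no <td>
        if len(tds) >= 3:
            cells.append(tds[2].get_text(" ", strip=True))
    return cells


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find_all('id') matches elements whose tag name is "id"; there are none here.
print(len(soup.find("table").find_all("id")))  # prints 0
print(third_column(SAMPLE_HTML))
```

    On the real page the same checks would be run on r.content; if the
    helper returns an empty list there, printing r.status_code and the
    first part of r.text should show whether the download itself worked.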
