How to extract supplementary info when a certain link resides in the div

I would like to extract a URL together with supplementary information from each div.

Currently, I'm able to extract all the download URLs with:

for link in soup.find_all('a', href=True):
    if 'https://figshare.com/ndownloader/files' in link['href']:
        print(link['href'])

But I would also like to extract the information that resides in class="_2qbpz". For example, in the HTML below, the information within class="_2qbpz" is s01_060926_1n.set.

<div class="_2bEHu"><button class="xaHFp" tabindex="-1" type="button"><span class="_2qbpz">s01_060926_1n.set.</span><span class="_2f04f">zip</span><span class="_3c2ks"> (233.62 MB)</span></button>
  <div class="ElLvs"><button aria-label="View file" tabindex="-1" class="_1Nt_k _1jrLT _2Cyxu" type="button"><svg aria-hidden="true" fill="transparent" height="36" preserveAspectRatio="xMidYMid meet" width="20" class="_746D7" focusable="false" viewBox="0 0 20 36" xmlns="https://www.w3.org/2000/svg"><path clip-rule="evenodd" d="M20 17.965C20 18.89 16.713 24 10 24c-6.095 0-10-5.134-10-6.035C0 17.161 3.819 12 9.943 12 16.498 12 20 17.161 20 17.965zm-4.887.05c0-2.659-2.301-4.814-5.14-4.814-2.84 0-5.14 2.155-5.14 4.814 0 2.658 2.3 4.813 5.14 4.813 2.839 0 5.14-2.155 5.14-4.813zm-5.14-3.916c-2.31 0-4.183 1.753-4.183 3.916 0 2.163 1.873 3.917 4.183 3.917 2.31 0 4.182-1.753 4.182-3.917 0-2.163-1.873-3.916-4.182-3.916z" fill-rule="evenodd"></path></svg></button>
    <a
      aria-label="Download file" href="https://figshare.com/ndownloader/files/14249783" class="_2tVAZ"><svg aria-hidden="true" fill="transparent" height="36" preserveAspectRatio="xMidYMid meet" width="16" class="_1VlIg" focusable="false" viewBox="0 0 16 36" xmlns="https://www.w3.org/2000/svg"><path clip-rule="evenodd" d="m9.807 18.891 1.698-1.66a.82.82 0 0 1 1.14 0l.855.836a.776.776 0 0 1 0 1.114L8.806 23.77a.82.82 0 0 1-1.14 0L2.972 19.18a.776.776 0 0 1 0-1.114l.855-.836a.82.82 0 0 1 1.14 0l1.697 1.659V11a1 1 0 0 1 1-1h1.143a1 1 0 0 1 1 1v7.891zM0 25.5a.5.5 0 0 1 .5-.5h15a.5.5 0 0 1 .5.5v1a.5.5 0 0 1-.5.5H.5a.5.5 0 0 1-.5-.5v-1z" fill-rule="evenodd"></path></svg></a>
  </div>
</div>

May I know how to properly extract the said supplementary information?

import requests
from bs4 import BeautifulSoup

URL = "https://figshare.com/articles/dataset/Multi-channel_EEG_recordings_during_a_sustained-attention_driving_task_preprocessed_dataset_/7666055/3"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

for link in soup.find_all('a', href=True):
    if 'https://figshare.com/ndownloader/files' in link['href']:
        print(link['href'])

Answer
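
If the markup shown in the question is what your parser actually receives, one way to pair each download link with its label is to walk up from the anchor to the enclosing row div and read the span from there. The following is a minimal sketch under that assumption: the class names `_2bEHu` and `_2qbpz` are copied from the question's snippet and look auto-generated, so they may change, and `extract_files` is a hypothetical helper name. The HTML here is a trimmed copy of the question's snippet with the SVG icons omitted.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (SVG icons omitted).
HTML = """
<div class="_2bEHu">
  <button class="xaHFp" type="button"><span class="_2qbpz">s01_060926_1n.set.</span><span class="_2f04f">zip</span><span class="_3c2ks"> (233.62 MB)</span></button>
  <div class="ElLvs">
    <a aria-label="Download file" href="https://figshare.com/ndownloader/files/14249783" class="_2tVAZ"></a>
  </div>
</div>
"""

PREFIX = "https://figshare.com/ndownloader/files"


def extract_files(html):
    """Return (label, url) pairs for every download link found in html."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.find_all("a", href=True):
        if PREFIX not in link["href"]:
            continue
        # Climb to the row container, then look for the label span inside it.
        row = link.find_parent("div", class_="_2bEHu")
        span = row.find("span", class_="_2qbpz") if row else None
        label = span.get_text(strip=True) if span else None
        results.append((label, link["href"]))
    return results


pairs = extract_files(HTML)
print(pairs)
```

To run it against the live page, pass `page.content` from the question's code to `extract_files`. If the file list is built client-side, the response may not contain every row, in which case the API approach below is the more reliable route.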

You can call figshare's GraphQL API directly and adjust offset and limit to page through the file list:

import requests

limit = 10

data = requests.post('https://figshare.com/api/graphql?thirdPartyCookies=true&type=current&operation=getPublicItemFiles',
             json = {"operationName":"getPublicItemFiles","variables":{"itemId":7666055,"version":3,"offset":20,"limit":limit},"query":"query getPublicItemFiles($itemId: Int!, $version: Int, $offset: Int!, $limit: Int!) {\n  publicItem: itemVersion(id: $itemId, version: $version) {\n    id\n    files(offset: $offset, limit: $limit) {\n      hasMore\n      items: elements {\n        id\n        name\n        status\n        extension\n        size\n        viewerType\n        mimeType\n        virusScanInfo {\n          virusFound\n        }\n        md5\n        isLinkOnly\n        thumb\n        previewMeta\n        suppliedMd5\n        previewState\n        previewLocation\n        downloadUrl\n      }\n    }\n  }\n}\n"}).json()

files = [[i['name'], i['downloadUrl']] for i in data['data']['publicItem']['files']['items']]
print(files)

# [['s13_060217m.set.zip', 'https://figshare.com/ndownloader/files/14249882'],
#  ['s14_060319m.set.zip', 'https://figshare.com/ndownloader/files/14249885'],
#  ['s14_060319n.set.zip', 'https://figshare.com/ndownloader/files/14249888'],
#  ['s22_080513m.set.zip', 'https://figshare.com/ndownloader/files/14249891'],
#  ['s22_090825n.set.zip', 'https://figshare.com/ndownloader/files/14249894'],
#  ['s22_090922m.set.zip', 'https://figshare.com/ndownloader/files/14249900'],
#  ['s22_091006m.set.zip', 'https://figshare.com/ndownloader/files/14249903'],
#  ['s23_060711_1m.set.zip', 'https://figshare.com/ndownloader/files/14249906'],
#  ['s31_061020m.set.zip', 'https://figshare.com/ndownloader/files/14249909'],
#  ['s31_061103n.set.zip', 'https://figshare.com/ndownloader/files/14249912']]
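
To collect every file rather than a single window, the offset can be advanced until the response reports no more pages. The sketch below assumes the endpoint and response shape used in the answer above, and that the server accepts a query trimmed to the fields actually needed. `iter_files`, `fetch_figshare_page`, and `fake_page` are hypothetical names introduced for illustration. The paging loop takes the fetch function as a parameter, so it can be exercised offline with a stub before pointing it at the real service.

```python
API_URL = ("https://figshare.com/api/graphql"
           "?thirdPartyCookies=true&type=current&operation=getPublicItemFiles")

# Assumption: a reduced version of the answer's query is accepted by the server.
QUERY = """
query getPublicItemFiles($itemId: Int!, $version: Int, $offset: Int!, $limit: Int!) {
  publicItem: itemVersion(id: $itemId, version: $version) {
    files(offset: $offset, limit: $limit) {
      hasMore
      items: elements {
        name
        downloadUrl
      }
    }
  }
}
"""


def fetch_figshare_page(offset, limit, item_id=7666055, version=3):
    """Fetch one page of files; returns the dict holding hasMore and items."""
    import requests  # third-party; imported lazily so the paging logic runs without it
    response = requests.post(API_URL, timeout=30, json={
        "operationName": "getPublicItemFiles",
        "variables": {"itemId": item_id, "version": version,
                      "offset": offset, "limit": limit},
        "query": QUERY,
    })
    response.raise_for_status()
    return response.json()["data"]["publicItem"]["files"]


def iter_files(fetch_page, limit=10):
    """Yield (name, download_url) pairs, advancing offset until hasMore is false."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items = page["items"]
        for item in items:
            yield item["name"], item["downloadUrl"]
        if not page.get("hasMore") or not items:
            break
        offset += limit


def fake_page(offset, limit):
    """Offline stand-in for the API: serves five made-up files in pages."""
    names = [f"file_{i}.zip" for i in range(5)]
    chunk = names[offset:offset + limit]
    return {"hasMore": offset + limit < len(names),
            "items": [{"name": n, "downloadUrl": f"https://example.org/{n}"}
                      for n in chunk]}


demo = list(iter_files(fake_page, limit=2))
print(len(demo))  # 5
```

Against the real service, the call would be `list(iter_files(fetch_figshare_page, limit=10))`. Keeping the fetch function separate from the loop means the stopping condition can be checked without any network access.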