How to extract latitude and longitude from a web page

I want to extract some information from https://www.peakbagger.com/list.aspx?lid=5651, starting with the first peak in the first column, https://www.peakbagger.com/peak.aspx?pid=10882:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.peakbagger.com/peak.aspx?pid=10882'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

a = soup.select("td")
a

I only want to retrieve the latitude and longitude, which are 35.360638, 138.727347, from the output I get.

They appear in the output as

E<br/>35.360638, 138.727347 

Also, is there a better way to retrieve the latitude and longitude from every peak link on https://www.peakbagger.com/list.aspx?lid=5651 other than doing them one by one?

Thanks

Answer

As Brutus mentioned, this is very page-specific. If you don't want to use etree, this could be an alternative:

  • find() the <td> with the string Latitude/Longitude (WGS84)
  • findNext() its following <td>, which holds the value
  • grab its contents – the decimal coordinates are the third child (index 2), right after the <br/>
  • replace the comma and split on whitespace
  • slicing the result to the first two elements gives you a list with lat and long


data = (
    soup.find('td', string='Latitude/Longitude (WGS84)')  # label cell
        .findNext('td')                                    # value cell next to it
        .contents[2]                                       # text node after the <br/>
        .replace(',', '')                                  # drop the comma
        .split()[:2]                                       # first two tokens: lat, long
)

data
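This gives you data == ['35.360638', '138.727347']. If you want the values as numbers rather than strings, an optional one-liner:

lat, long = map(float, data)  # -> 35.360638, 138.727347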

EDIT

You have a list of urls, so loop over it. To be considerate of the website and avoid getting banned, add a short delay between requests with time.sleep().

import time
import requests
from bs4 import BeautifulSoup
urls = ['https://www.peakbagger.com/peak.aspx?pid=10882',
        'https://www.peakbagger.com/peak.aspx?pid=10866',
        'https://www.peakbagger.com/peak.aspx?pid=10840',
        'https://www.peakbagger.com/peak.aspx?pid=10868',
        'https://www.peakbagger.com/peak.aspx?pid=10832']

data = {}

for url in urls:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')

    ll = (
        soup.find('td', string='Latitude/Longitude (WGS84)')
            .findNext('td')
            .contents[2]
            .replace(',', '')
            .split()[:2]
    )

    # key the result by the peak name from the page's <h1>
    data[soup.select_one('h1').get_text()] = {
        'url': url,
        'lat': ll[0],
        'long': ll[1]
    }

    time.sleep(3)

data

Output

{'Fuji-san, Japan': {'url': 'https://www.peakbagger.com/peak.aspx?pid=10882',
  'lat': '35.360638',
  'long': '138.727347'},
 'Kita-dake, Japan': {'url': 'https://www.peakbagger.com/peak.aspx?pid=10866',
  'lat': '35.674537',
  'long': '138.238833'},
 'Hotaka-dake, Japan': {'url': 'https://www.peakbagger.com/peak.aspx?pid=10840',
  'lat': '36.289203',
  'long': '137.647986'},
 'Aino-dake, Japan': {'url': 'https://www.peakbagger.com/peak.aspx?pid=10868',
  'lat': '35.646037',
  'long': '138.228292'},
 'Yariga-take, Japan': {'url': 'https://www.peakbagger.com/peak.aspx?pid=10832',
  'lat': '36.34198',
  'long': '137.647625'}}
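If you don't want to type the urls by hand, you can also collect them from the list page itself. Here is a minimal sketch, assuming the list table links each peak through anchors whose href contains peak.aspx?pid= (check the page source to confirm; adjust the selector if needed):

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

list_url = 'https://www.peakbagger.com/list.aspx?lid=5651'
r = requests.get(list_url)
soup = BeautifulSoup(r.content, 'lxml')

# grab every anchor that points to a peak page and make the href absolute
urls = [urljoin(list_url, a['href'])
        for a in soup.select('a[href*="peak.aspx?pid="]')]

urls[:5]

Feed that list into the loop above. And since you already import pandas, pd.DataFrame.from_dict(data, orient='index') should turn the collected dictionary into a dataframe with the peak names as the index.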