Scraping financial data from the web is a common task, and Python, being a versatile language, is well suited to it.
In this article we'll scrape stock quote data from Yahoo Finance using Python.
Web scraping can be divided into two parts:
- Fetching data by making an HTTP request. We’ll be using the requests library to make a GET request to the website that we want to scrape.
- Extracting important data by parsing the HTML DOM. We’ll be using the BeautifulSoup library to parse the HTML document that we get back from the website.
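Keeping the two parts in separate functions makes the parsing half testable without touching the network. The sketch below is a minimal illustration, not the article's method. `fetch_html` and `parse_title` are hypothetical helpers. The `get` parameter defaults to `requests.get` but accepts any callable with the same shape. The User-Agent value is a placeholder assumption.

```python
from bs4 import BeautifulSoup


def fetch_html(url, get=None, timeout=10):
    """Part 1: fetch the raw HTML with an HTTP GET request."""
    if get is None:
        import requests  # imported here so the parsing half works without it
        get = requests.get
    # Placeholder User-Agent (an assumption); some sites reject requests without one
    response = get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def parse_title(html):
    """Part 2: parse the HTML DOM and pull out one value (here, the <title> text)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


# Usage with a real network call (not executed here):
# print(parse_title(fetch_html("https://finance.yahoo.com/quote/NVDA")))
```

Because `fetch_html` accepts any `get` callable, a test can pass in a fake that returns canned HTML.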
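First, import the required libraries and define two helper functions that pull individual values out of the parsed HTML.
import re
from datetime import datetime
import requests
from bs4 import BeautifulSoup
def ExtractField(sHtml, fieldName):
    # Return the raw data-value attribute of the <fin-streamer> tag whose
    # data-field matches fieldName, or None if that tag is not on the page
    it = sHtml.select_one(f'fin-streamer[data-field="{fieldName}"]')
    return it.get('data-value') if it is not None else None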
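ExtractField reads the `data-value` attribute rather than the tag's visible text. The small example below shows why. It runs against a made-up snippet shaped like the `<fin-streamer>` tags the helper targets. The markup and numbers are hypothetical, and the live page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the <fin-streamer> tags ExtractField targets;
# the values are invented for illustration.
SAMPLE_HTML = """
<div data-testid="quote-statistics">
  <fin-streamer data-field="regularMarketOpen" data-value="1234.5">1,234.50</fin-streamer>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS attribute selector: a <fin-streamer> whose data-field equals the given name
tag = soup.select_one('fin-streamer[data-field="regularMarketOpen"]')

print(tag["data-value"])  # raw, machine-readable value: 1234.5
print(tag.get_text())     # formatted display text: 1,234.50

# A field that is not on the page yields None rather than a tag
print(soup.select_one('fin-streamer[data-field="noSuchField"]'))  # None
```

The attribute holds an unformatted number, so it converts cleanly to `float`. The display text includes thousands separators.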
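def ExtractValueByLabel(sHtml, labelName):
    # Find the <span class="label"> whose text contains labelName and return the
    # text of the next <span class="value">, or "N/A" if either is missing.
    # re.escape makes characters such as "(" in a label match literally.
    label_pattern = re.compile(re.escape(labelName))
    oLabel = sHtml.find('span', class_='label', string=label_pattern)
    if oLabel:
        oValue = oLabel.find_next('span', class_='value')
        if oValue:
            return oValue.get_text(strip=True)
    return "N/A"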
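The label/value lookup used by ExtractValueByLabel can be tried on its own against a small hypothetical snippet. The markup, the "Beta (5Y Monthly)" label, and the numbers are invented for illustration. `value_by_label` is a hypothetical stand-in for the helper. When `string=` is given a compiled pattern, BeautifulSoup matches it with a search. A partial label such as "Forward Dividend" is therefore enough. Escaping the label keeps parentheses from being read as regex groups.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical label/value markup; the values are made up for illustration.
SAMPLE_HTML = """
<ul>
  <li><span class="label">Forward Dividend &amp; Yield</span><span class="value"> 6.64 (4.91%) </span></li>
  <li><span class="label">Beta (5Y Monthly)</span><span class="value">1.23</span></li>
</ul>
"""


def value_by_label(soup, label_text):
    # Escape the label so "(" and ")" match literally instead of forming a group
    pattern = re.compile(re.escape(label_text))
    # A compiled pattern passed as string= is matched with search(), so partial labels work
    label = soup.find("span", class_="label", string=pattern)
    value = label.find_next("span", class_="value") if label else None
    return value.get_text(strip=True) if value else "N/A"


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(value_by_label(soup, "Forward Dividend"))   # 6.64 (4.91%)
print(value_by_label(soup, "Beta (5Y Monthly)"))  # 1.23
print(value_by_label(soup, "Ex-Dividend Date"))   # N/A
```

`class_="label"` also matches spans that carry additional classes. This matters because generated pages often append extra class names to an element.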
Next, use these helpers to extract the quote fields from the parsed statistics section and store them in the oQuote dictionary.
def ParseStockData(sHtml, oQuote):
    oQuote['PreviousClose'] = ExtractField(sHtml, 'regularMarketPreviousClose')
    oQuote['Open'] = ExtractField(sHtml, 'regularMarketOpen')
    # The day's range arrives as a single "low - high" string
    sRange = ExtractField(sHtml, 'regularMarketDayRange')
    if sRange and ' - ' in sRange:
        oQuote['Low'], oQuote['High'] = (part.strip() for part in sRange.split(' - ', 1))
    # The forward dividend looks like '6.64 (4.91%)'; capture the percentage in parentheses
    sDividendYield = ExtractValueByLabel(sHtml, 'Forward Dividend')
    match = re.search(r'\(([0-9.]+)%\)', sDividendYield)
    if match is not None:
        oQuote['Yield'] = match.group(1)
    # Convert the ex-dividend date (e.g. "Mar 15, 2024") to ISO format (YYYY-MM-DD)
    dt = ExtractValueByLabel(sHtml, 'Ex-Dividend Date')
    if dt != 'N/A':
        try:
            oQuote['ExDividendDate'] = datetime.strptime(dt, "%b %d, %Y").strftime('%Y-%m-%d')
        except ValueError:
            pass  # leave the field unset if the date is in an unexpected format
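The string handling in ParseStockData uses only the standard library, so it can be checked in isolation. The sketch below assumes the input formats that appear in the code comments: a `low - high` range, a `6.64 (4.91%)` dividend string, and `Mon DD, YYYY` dates. The function names are hypothetical. `%b` matches English month abbreviations only under an English or default C locale.

```python
import re
from datetime import datetime


def parse_yield(text):
    """Return the percentage inside parentheses, e.g. '6.64 (4.91%)' -> '4.91'."""
    match = re.search(r"\(([0-9.]+)%\)", text)
    return match.group(1) if match else None


def split_day_range(text):
    """Split a 'low - high' string; raises ValueError if the separator is missing."""
    low, high = (part.strip() for part in text.split(" - ", 1))
    return low, high


def to_iso_date(text):
    """Convert e.g. 'Mar 15, 2024' to '2024-03-15'; return None if it does not parse."""
    try:
        return datetime.strptime(text, "%b %d, %Y").strftime("%Y-%m-%d")
    except ValueError:
        return None


print(parse_yield("6.64 (4.91%)"))         # 4.91
print(parse_yield("N/A"))                  # None
print(split_day_range("120.10 - 125.40"))  # ('120.10', '125.40')
print(to_iso_date("Mar 15, 2024"))         # 2024-03-15
print(to_iso_date("N/A"))                  # None
```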
Then fetch the quote page with a GET request, parse the raw HTML with BeautifulSoup, and pass the statistics section to ParseStockData.
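def GetStockData(symbol, oQuote):
    url = 'https://finance.yahoo.com/quote/' + symbol
    # Send a browser-like User-Agent; requests without one may be rejected
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    page = requests.get(url, headers=headers, timeout=10)
    page.raise_for_status()  # stop on 4xx/5xx responses instead of parsing an error page
    soup = BeautifulSoup(page.text, 'html.parser')
    # Optional: uncomment to save the prettified HTML and inspect the page structure
    # with open('C:/Export/soup.html', 'wb') as file:
    #     file.write(soup.prettify('utf-8'))
    # Find the div that holds the quote statistics
    sHtml = soup.find('div', {'data-testid': 'quote-statistics'})
    if sHtml is None:
        raise ValueError(f'Quote statistics not found for {symbol}; the page layout may have changed.')
    ParseStockData(sHtml, oQuote)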
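A copy of the page saved to disk, for example with the `soup.prettify('utf-8')` dump in GetStockData, can be re-parsed during development without sending a new request to Yahoo Finance for every change. The sketch below is a minimal example of that idea. `load_quote_statistics` is a hypothetical helper. It assumes the file is UTF-8 and still contains the `quote-statistics` div.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def load_quote_statistics(path):
    """Re-parse a saved copy of the quote page and return its statistics div."""
    html = Path(path).read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"data-testid": "quote-statistics"})
    if container is None:
        raise ValueError(f"quote-statistics container not found in {path}")
    return container


# Usage (path from the original example; adjust for your system):
# oQuote = {}
# ParseStockData(load_quote_statistics('C:/Export/soup.html'), oQuote)
```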
Finally, call GetStockData for Nvidia (ticker NVDA) and print the resulting dictionary of quote data.
symbol = 'NVDA'
oQuote = {}
oQuote['Symbol'] = symbol
GetStockData(symbol, oQuote)
print(oQuote)
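The values in `oQuote` are scraped text. They need converting before any arithmetic. The sketch below is a minimal example that assumes numbers may contain thousands separators. Anything that does not parse, such as `None` or `"N/A"`, becomes `None`. `to_number` and `numeric_quote` are hypothetical helpers, and the sample dictionary is made up.

```python
def to_number(value):
    """Convert scraped text like '1,234.50' to a float; return None if not numeric."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", ""))
    except ValueError:
        return None


def numeric_quote(quote, keys=("PreviousClose", "Open", "Low", "High", "Yield")):
    """Return a copy of the quote dict with the listed price fields converted to floats."""
    result = dict(quote)
    for key in keys:
        if key in result:
            result[key] = to_number(result[key])
    return result


sample = {"Symbol": "NVDA", "Open": "1,234.50", "Yield": "N/A"}  # made-up values
print(numeric_quote(sample))  # {'Symbol': 'NVDA', 'Open': 1234.5, 'Yield': None}
```

Copying the dictionary leaves the original scraped strings untouched, so they remain available for logging or debugging.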