Scraping Baseball-Reference.com with Python BeautifulSoup and pandas
This handy little program scrapes batting statistics from an HTML table on the Boston Red Sox page at www.baseball-reference.com. It converts the table into a pandas DataFrame, cleans up the data set, and then inserts the data into a PostgreSQL table.

# -*- coding: utf-8 -*-
"""
Get Red Sox batting statistics from baseball-reference.com.
Turn it into a pandas DataFrame.
Insert the data into PostgreSQL.
"""
import requests
from bs4 import BeautifulSoup
import lxml
import pandas as pd
import os
from sqlalchemy import create_engine
import psycopg2
import io

DATABASE_URL = os.environ['DATABASE_URL']

def red_sox_batting_stats():
    # Get a page from the web.
    url = 'https://www.baseball-reference.com/teams/BOS/2018.shtml'
    response = requests.get(url)
    # Process page from the web.
    soup = BeautifulSoup(response.text, 'lxml')
    # Find the batting stats table.
    table = soup.find('table',