Posts

Showing posts from August, 2018

Scraping Baseball-Reference.com with Python BeautifulSoup and pandas

This handy little program scrapes batting statistics from an HTML table on the Boston Red Sox page at www.baseball-reference.com. It converts the table into a pandas DataFrame, cleans up the data set, and then inserts the data into a PostgreSQL table.

    # -*- coding: utf-8 -*-
    """
    Get Red Sox batting statistics from baseball-reference.com.
    Turn it into a pandas DataFrame.
    Insert the data into PostgreSQL.
    """
    import requests
    from bs4 import BeautifulSoup
    import lxml
    import pandas as pd
    import os
    from sqlalchemy import create_engine
    import psycopg2
    import io

    DATABASE_URL = os.environ['DATABASE_URL']

    def red_sox_batting_stats():
        # Get a page from the web.
        url = 'https://www.baseball-reference.com/teams/BOS/2018.shtml'
        response = requests.get(url)
        # Process page from the web.
        soup = BeautifulSoup(response.text, 'lxml')
        # Find the batting stats table.
        table = soup.find('table',
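The excerpt cuts off before the table is parsed, so here is a minimal sketch of the table-to-DataFrame step it describes. It runs against a small inline HTML sample instead of the live page. The table id `batting_sample`, the column names, the player rows, and the helper `table_to_dataframe` are all made up for illustration and are not taken from the original post.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup. The real page's table id and columns
# are not shown in the excerpt, so these are placeholders.
SAMPLE_HTML = """
<table id="batting_sample">
  <thead><tr><th>Name</th><th>AB</th><th>H</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>500</td><td>150</td></tr>
    <tr><td>Player B</td><td>400</td><td>100</td></tr>
  </tbody>
</table>
"""


def table_to_dataframe(html, table_id):
    """Parse one HTML table (looked up by id) into a pandas DataFrame."""
    # 'html.parser' ships with Python, so this sketch does not need lxml.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no table with id {table_id!r}")

    # Header cells become column names, body rows become records.
    headers = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find("tbody").find_all("tr")
    ]
    df = pd.DataFrame(rows, columns=headers)

    # Scraped cells are strings; convert every column except the first to numbers.
    for col in df.columns[1:]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


df = table_to_dataframe(SAMPLE_HTML, "batting_sample")
# Example cleanup step: derive batting average from hits and at-bats.
df["AVG"] = df["H"] / df["AB"]
print(df)
```

From a frame like this, pandas' `DataFrame.to_sql` with a SQLAlchemy engine is one way to load the rows into PostgreSQL. Whether the original program does it that way is not visible in the excerpt.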

Remote GUI Access from Windows to Linux

Set up remote GUI access to a Linux host from a Windows host with VNC (Virtual Network Computing). It's easy and takes only a few minutes. It took me far longer to document it than to do it, and since I had not done it for a few years, that included time to research.

When using virtual machines, GUI access can be very clunky even when the VM is hosted on your own fast, powerful, local machine. VNC provides a much faster and better experience.

Description: vncserver is used to start a VNC desktop. vncserver is a Perl script which simplifies the process of starting an Xvnc server. It runs Xvnc with appropriate options and starts a window manager on the VNC desktop.

Here is a good implementation of VNC: TigerVNC http://tigervnc.org/

Install the server software on the Linux host. I use CentOS 7. I did this as root. You can use sudo if it makes you happy.

    yum install tigervnc-server.x86_64

View documentat