Posts

Scraping Baseball-Reference.com with Python BeautifulSoup and pandas

This handy little program scrapes batting statistics data from an HTML table on the Boston Red Sox page at www.baseball-reference.com . It converts that into a pandas DataFrame, then does various cleanup to the data set. And then the data is inserted into a PostgreSQL table. # -*- coding: utf-8 -*- """ Get Red Sox batting statistics from baseball-reference.com. Turn it into a pandas DataFrame. Insert the data into PostgreSQL. """ import requests from bs4 import BeautifulSoup import lxml import pandas as pd import os from sqlalchemy import create_engine import psycopg2 import io DATABASE_URL=os.environ['DATABASE_URL'] def red_sox_batting_stats():     # Get a page from the web     url = 'https://www.baseball-reference.com/teams/BOS/2018.shtml'     response = requests.get(url)     # Process page from the web.     soup = BeautifulSoup(response.text, 'lxml')     # Find the batting stats table.     table = soup.find('table',

Remote GUI Access from Windows to Linux

Image
Set up remote GUI access to a Linux host from a Windows host with VNC (Virtual Network Computing) . It's easy and takes only a few minutes. It took me way longer to document it than to do it, and I had not done it for a few years, so that included time to research. When using virtual machines, GUI access can be very clunky even when the VM is hosted on your own fast, powerful, local machine. VNC provides a much faster and better experience. Description: vncserver is used to start a VNC desktop. vncserver is a Perl script which simplifies the process of starting an Xvnc server. It runs Xvnc with appropriate options and starts a window manager on the VNC desktop. Here is a good implementation of VNC: TigerVNC http://tigervnc.org/ Install the server software on the Linux host. Install the server on the Linux host. I use CentOS 7. I did this as root. You can use sudo if it makes you happy. yum install tigervnc-server.x86_64 View documentat

The View from Here and There

I wrote and recorded a new song today, called The View from Here and There. You can hear it here .

Date Dimension Table Definition

A few weeks ago I found this definition of a date dimension table, created by a smart person named  Nicholas Duffy . I'm placing it here so I can find it easily. DROP TABLE IF EXISTS dim_date; CREATE TABLE dim_date ( date_dim_id INT NOT NULL, date_actual DATE NOT NULL, epoch BIGINT NOT NULL, day_suffix VARCHAR(4) NOT NULL, day_name VARCHAR(9) NOT NULL, day_of_week INT NOT NULL, day_of_month INT NOT NULL, day_of_quarter INT NOT NULL, day_of_year INT NOT NULL, week_of_month INT NOT NULL, week_of_year INT NOT NULL, week_of_year_iso CHAR(10) NOT NULL, month_actual INT NOT NULL, month_name VARCHAR(9) NOT NULL, month_name_abbreviated CHAR(3) NOT NULL, quarter_actual INT NOT NULL, quarter_name VARCHAR(9) NOT NULL, year_actual INT NOT

Kingsville, Ontario Lies to the South of Uxbridge, Massachusetts, and West

While looking at Google Maps for the umpteenth time, I finally noticed that Canada comes very far south near Detroit. It appeared to lie further south than my town, Uxbridge, Massachusetts, which is on the border between Massachusetts and Rhode Island, which is not an island. I looked it up. Sure enough, it's true. The uninhabited Pelee Island in Lake Erie, not far north of Sandusky, Ohio is, as I learned, the southernmost point in Canada. Leaving that aside, though,  one can see on a map that Kingsville, Ontario is the southernmost town in Canada, at 42° 2' 10.68" N 82° 44' 20.4" W. The location of Uxbridge is 42° 4' 37.92" N 71° 37' 49.08" W. That puts Uxbridge 2 minutes, 27 seconds north of the southernmost town in Canada. That means I can get in my car and head southwest to Canada. And I might.

tomoptamist

to-mop-ta-mist  \ tǝ-'mäp-tǝ-mǝst \ n  [fr. tomato + optimist] : one who is optimistic about tomatoes -  to-mop-ta-mis-tic  adj  - to-mop-ta-mis-ti-cal adj - to-mop-ta-mis-ti-cal-ly adv Sample usage: 1. Of some leftover tomato : "I'll leave it on the counter for now. Someone might want it. I'm a tomoptamist.' 2. Of a tomato of unknown quality : I'll just try it. I like to think tomoptamistically.' Origin: I just made it up.

Creating a PostgreSQL Database, the Use of Database Templates, and Simple Command Line Tool Commands

A PostgreSQL server installation includes database templates. These are used when creating new databases. By default, template1 is copied when you create a new database. This makes it easy to create copies of a customized database design. You can customize template1, and then easily make as many copies as you like, and they will all start out with the same objects, such as tables, languages, etc. Specify template0 to create a standard, out-of-the-box database with no custom additions that may have been added to your server installation. Make sure that you do not make any changes to template0. Create a database using template0, from a command line A database can be created in various ways. Here we will create one from the command line, using the tool PostgreSQL createdb. By default, the logged in user will be the owner of a newly created database. While logged in as user postgres, from a command line, create a database named exampledb, using template0: bash-4.2$ createdb