Posts

Showing posts from 2018

Scraping Baseball-Reference.com with Python BeautifulSoup and pandas

This handy little program scrapes batting statistics data from an HTML table on the Boston Red Sox page at www.baseball-reference.com . It converts that into a pandas DataFrame, then does various cleanup to the data set. And then the data is inserted into a PostgreSQL table. # -*- coding: utf-8 -*- """ Get Red Sox batting statistics from baseball-reference.com. Turn it into a pandas DataFrame. Insert the data into PostgreSQL. """ import requests from bs4 import BeautifulSoup import lxml import pandas as pd import os from sqlalchemy import create_engine import psycopg2 import io DATABASE_URL=os.environ['DATABASE_URL'] def red_sox_batting_stats():     # Get a page from the web     url = 'https://www.baseball-reference.com/teams/BOS/2018.shtml'     response = requests.get(url)     # Process page from the web.     soup = BeautifulSoup(response.text, 'lxml')     # Find the batting stats table.     table = soup.find('table',

Remote GUI Access from Windows to Linux

Image
Set up remote GUI access to a Linux host from a Windows host with VNC (Virtual Network Computing) . It's easy and takes only a few minutes. It took me way longer to document it than to do it, and I had not done it for a few years, so that included time to research. When using virtual machines, GUI access can be very clunky even when the VM is hosted on your own fast, powerful, local machine. VNC provides a much faster and better experience. Description: vncserver is used to start a VNC desktop. vncserver is a Perl script which simplifies the process of starting an Xvnc server. It runs Xvnc with appropriate options and starts a window manager on the VNC desktop. Here is a good implementation of VNC: TigerVNC http://tigervnc.org/ Install the server software on the Linux host. Install the server on the Linux host. I use CentOS 7. I did this as root. You can use sudo if it makes you happy. yum install tigervnc-server.x86_64 View documentat

The View from Here and There

I wrote and recorded a new song today, called The View from Here and There. You can hear it here .

Date Dimension Table Definition

A few weeks ago I found this definition of a date dimension table, created by a smart person named  Nicholas Duffy . I'm placing it here so I can find it easily. DROP TABLE IF EXISTS dim_date; CREATE TABLE dim_date ( date_dim_id INT NOT NULL, date_actual DATE NOT NULL, epoch BIGINT NOT NULL, day_suffix VARCHAR(4) NOT NULL, day_name VARCHAR(9) NOT NULL, day_of_week INT NOT NULL, day_of_month INT NOT NULL, day_of_quarter INT NOT NULL, day_of_year INT NOT NULL, week_of_month INT NOT NULL, week_of_year INT NOT NULL, week_of_year_iso CHAR(10) NOT NULL, month_actual INT NOT NULL, month_name VARCHAR(9) NOT NULL, month_name_abbreviated CHAR(3) NOT NULL, quarter_actual INT NOT NULL, quarter_name VARCHAR(9) NOT NULL, year_actual INT NOT

Kingsville, Ontario Lies to the South of Uxbridge, Massachusetts, and West

While looking at Google Maps for the umpteenth time, I finally noticed that Canada comes very far south near Detroit. It appeared to lie further south than my town, Uxbridge, Massachusetts, which is on the border between Massachusetts and Rhode Island, which is not an island. I looked it up. Sure enough, it's true. The uninhabited Pelee Island in Lake Erie, not far north of Sandusky, Ohio is, as I learned, the southernmost point in Canada. Leaving that aside, though,  one can see on a map that Kingsville, Ontario is the southernmost town in Canada, at 42° 2' 10.68" N 82° 44' 20.4" W. The location of Uxbridge is 42° 4' 37.92" N 71° 37' 49.08" W. That puts Uxbridge 2 minutes, 27 seconds north of the southernmost town in Canada. That means I can get in my car and head southwest to Canada. And I might.

tomoptamist

to-mop-ta-mist  \ tǝ-'mäp-tǝ-mǝst \ n  [fr. tomato + optimist] : one who is optimistic about tomatoes -  to-mop-ta-mis-tic  adj  - to-mop-ta-mis-ti-cal adj - to-mop-ta-mis-ti-cal-ly adv Sample usage: 1. Of some leftover tomato : "I'll leave it on the counter for now. Someone might want it. I'm a tomoptamist.' 2. Of a tomato of unknown quality : I'll just try it. I like to think tomoptamistically.' Origin: I just made it up.

Creating a PostgreSQL Database, the Use of Database Templates, and Simple Command Line Tool Commands

A PostgreSQL server installation includes database templates. These are used when creating new databases. By default, template1 is copied when you create a new database. This makes it easy to create copies of a customized database design. You can customize template1, and then easily make as many copies as you like, and they will all start out with the same objects, such as tables, languages, etc. Specify template0 to create a standard, out-of-the-box database with no custom additions that may have been added to your server installation. Make sure that you do not make any changes to template0. Create a database using template0, from a command line A database can be created in various ways. Here we will create one from the command line, using the tool PostgreSQL createdb. By default, the logged in user will be the owner of a newly created database. While logged in as user postgres, from a command line, create a database named exampledb, using template0: bash-4.2$ createdb

Python and PostgreSQL: Create a PostgreSQL Database Connection Using a Config File

This technique can easily be applied to other databases. 1. pip install two packages if you do not already have them in your Python environment. (Search the internet for ‘python pip install’ if you need help.) A. psycopg2: To interact with PostgreSQL databases B. configparser: To interact with config files 2. Create a config file named config.ini. Paste in the text below: [ default] host = localhost db_name = your_database_name postgres_user = postgres_username postgres_password = your_password Edit the details to match your environment. 3. Create a Python file named main.py. Paste in the text below: from configparser import SafeConfigParser import psycopg2 parser = SafeConfigParser() parser.read('config.ini') host = parser.get('default', 'host') db_name = parser.get('default', 'db_name') postgres_user = parser.get('default', 'postgres_user') postgres_password = parser.get('default', 'po

Convert a wav file to mp3 with FFmpeg, pydub, and three lines of Python

Ten minutes ago I played a song I had not heard before and said, that was a good song! I should send it to Sam. But it's in WAV format so it's really big. Wouldn't it be nice if Python could convert it to mp3 to make it smaller? Ten minutes later it was done. Amazing. Here are the easy steps to follow if you would like to try it. 1. Search the internet for 'python convert wav mp3'. 2. Download FFmpeg and add it to your system path: http://adaptivesamples.com/how-to-install-ffmpeg-on-windows/ 3. 'pip install pydub', then copy some sample code from here and make slight adjustments to match your file paths and song file name: https://github.com/jiaaro/pydub 4. Run the script: from pydub import AudioSegment song = AudioSegment.from_wav("G:/Music/Rock/The I Don't Cares/Wild Stab/Sorry for Tomorrow Night.wav") song.export("Sorry for Tomorrow Night.mp3", format="mp3") 5. Email the mp3 file to Sam. And do

The Use of Views Has No Impact on Query Performance in PostgreSQL

More evidence that PostgreSQL is the best database. The documentation supports one of my strongest beliefs about database design: Making liberal use of views is a key aspect of good SQL database design. Views allow you to encapsulate the details of the structure of your tables, which might change as your application evolves, behind consistent interfaces. Views can be used in almost any place a real table can be used. Building views upon other views is not uncommon. Source: https://www.postgresql.org/docs/10/static/tutorial-views.html In answer to a question about the impact of views on query performance, veteran PostgreSQL core developer Bruce Momjian gave the following answer: They have no impact whatsoever. The parser thinks the view is a real table. The query structure gets passed to the re-writer and the re-writer wipes out the view and substitutes the underlying tables. And that re-write is what gets passed to the query optimizer. The optimizer doesn't see the v

The Internet Can Be Useful If Used Properly

This post is here because I had not added one for two-and-a-half years. In the interim, the world has not suffered from the lack of my posting. (It has suffered for other reasons, but let's not get into that. The idea here is not to be depressing, no, far from it.) Nonetheless, I feel compelled to put something here, if only to be able to say to myself, "Good job, Pete. You have done it. Let this be a start, and soon you will add many fabulous things to your blog. It will be part of a personal Renaissance. Let many flowers bloom." Or something along those lines. Something very positive and good. Anyway, music continues to be really good. You only need to look for it. One of the best ways is via YouTube. Being a reasonably intelligent person, I find a visit to the front page of YouTube demoralizing. I really don't want to even know what is popular. Knowing about popular things is a great way for me to get really discouraged. However, if you search for so