Name that Baby!

I had another itch to create a fun Data Visualization.  This time, I explore baby names.  The Social Security Administration has a great webpage to look at popular baby names by year.  I knew this data would be perfect for a Tableau visualization.  The problem was, the CGI interface only allowed you to view one year of data at a time and didn't have a data export link.  So, I was faced with two options: Manually go to each of the 135 pages and copy/paste the data into Excel, or write a script.

I knew the social security web page was a CGI program (meaning all processing and HTML generation happens on the server).  It allowed the user to POST to it and pass in a "year" parameter, and in return would get an HTML page with a table of baby names for that year.  I knew I could write a script to loop through the years 1880-2015 and call the page and somehow parse the HTML results into a table I could easily turn into a tableau data set.

I've been getting re-acquainted with coding and have been learning (and loving) Python.  It has turned into a swiss-army-knife of sorts.  There is an http request extension which allows you to generate a POST request and receive the resulting HTML as a string.  So, looping through the years and generating POST requests and receiving HTML output was the easy part.  The hard part was turning that into a data set (either excel or csv) that could be imported.

I could have parsed the HTML to create a csv file by looking for the <TD> and <TR> tags, but I knew for this purpose it was over-engineering.  However, I knew that in a web browser, I could copy and paste the contents of an HTML table and paste them into Excel directly.  Since this was a one-time-use of this data, I thought it was a good approach.

I created a python script to generate an HTML table with 2,700 rows: one row for each rank (1-20) for each year (1880-2015).  To find the "relevant" part of the HTML, I just ham-fisted a find() function to find the start of where their HTML table began.  Of course, this method is not flexible, and would break if they changed their CGI code.  But, again, for a one-time-use, I was happy with it.  I also, for good measure, included a short sleep() at the bottom of each loop just so I didn't accidentally simulate a Denial-of-Service attack.

The code is below, but I AM NOT CERTIFYING THAT IT WORKS.  If you use it it is AT YOUR OWN RISK.  I claim NO RESPONSIBILITIES for what you use this code to do.  That said, it worked for me as a quick one-time-use hack.  I think Python is a great tool to have in the toolkit, and it has a wide array of extensions for things just like this.

So, sorry for geeking out for a bit.  Below you'll see the final result.  My goal was a clean, intuitive, interactive and fun visualization.  

Click on the thumbnail below to see it on Tableau Public!

Some fun facts I discovered while playing with it:

Did you know Bertha was once a popular baby name?

Did you know John is like the Michael Jordan of Baby Names (persistently #1).  Ironically, Michael is the Kobe Bryant of Baby Names, the more recent #1.  

I'm looking forward to all of the college-aged Jacob's entering the workforce in about 10 years.

Emma has got to be the "comeback name of the year."  It only took 100 years for it to come back.

Again, please let me know what you think!

Tableau Dashboard: Labor Statistics 1990-2015

I found some interesting data at the Bureau of Labor Statistics.  I wanted to make a dashboard that focused on one measure (in this case, number of employees) across a time dimension, a categorical dimension and a geographic dimension.    It's a little slow to load because it's driven by 1.2 million rows of data (25 years x every industry x every county).



I tried to emphasize clean and usable visualizations that allowed you to answer interesting questions.  The version above is narrow format to fit on this page.  Click here to view the full version which includes additional graphs.  As always I appreciate any feedback or comments about this.

Analyzing Dice Statistics with Tableau

And I thought it couldn't get nerdier than Bowling and Data Visualzation, oh boy...

More powerful dice simulation tools exist, but I wanted to challenge myself to make an interactive and attractive way to visualize common RPG dice rolls (in this case, damage).  I am also going to work on a "damage per round" simulator where you can "build" a character and monster and have a visualization that shows you the average damage you might do per round as well as the expected number of rounds to defeat it.

I generated the data using a little bit of Python scripting.  I wanted to challenge myself to use python.  I can't take credit for the algorithm, but here's the script.  It generates text in csv format:


And here is the visualization, which is available on Tableau public. 

Click to view on Tableau Public