Introduction
I'm a big Harry Potter fan, I grew up with the movies and have read the books at least three times. I've been following Maven Analytics, a platform that helps people learn new skills, create portfolios, and more, for a while on LinkedIn. The site has a data playground where you can explore and download sample datasets. This data playground has sample datasets that people can explore and analyze. Each month or so they have a challenge to create a visualization with that months dataset.
For December their challenge was analyzing the Harry Potter Movie Scripts in their Maven Magic Challenge. I decided to participate in the challenge and create my own analysis. You can read more about the challenge here.
The dataset contains 6 tables in CSV format:
Dialogue - with lines of dialogue from the 8 Harry Potter movies
Chapter - contains the chapters from the Harry Potter movie scripts
Movies - has all 8 movies from the Harry Potter series
Characters - contains the characters that had dialogue
Places - all the locations from the Wizarding World
Spells - with all the spells cast in the movie
Quick Links:
Table of Contents:
Process
I analyzed my data in Microsoft Excel, then used Tableau Public to create an interactive dashboard.
Microsoft Excel
I used Microsoft Excel to analyze all 6 tables. Below are what I analyzed for each table.
Dialogue
I separated the dialogue into four categories:
Who - which character had the most dialogue using =COUNTIF for each of the different characters
What - determined how many times a character's name was shouted and how many times a spell was used. I used a combination of =SUMPRODUCT and =LEN. I also determined the top 5 most commonly used spells and created a bar chart with the totals
Where - calculated the top 5 locations in the films
When - determined when the most lines of dialogue were spoken (which movie)
Chapter
Determined how many chapters were in each movie. Along with the average number of chapters per movie and the movie with the lowest and highest amount of chapters.
Movies
Calculated the average, minimum, maximum and total for the following:
Runtime
Budget
Box Office
Revenue
I created two charts both with the same information (revenue, box office, and budget) for each movie. I wanted to see which chart would be better for the information. I created a bar chart and a line chart.
Characters
Analyzed the species, gender, school and Patronus breakdown. Counted the different types of species, gender, schools and Patronus'. I also created several charts with this breakdown:
bar chart for the type of species count
pie chart for the gender (male vs. female)
bar chart for the schools (3 total)
bar chart for the Patronus
pie chart for the Hogwarts Houses (how many characters were in each)
Places
Determined the number of places per category. Along with the average number of places per category and which category had the most amount of places and the least.
Spells
For the spells I added in the type of spell (Charm, Counter, Curse, Healing, Jinx) based off of the Harry Potter wiki. Then I counted the number per type of spell along with the category with the most spells and the least. I also calculated the number of light effects for each spell. Along with the the average length of incantation and the longest spell and the shortest spell.
Tableau
In my dashboard I included the following metrics:
Total number of movies: 8
Total revenue: 6.4 billion
Total minutes: 1197
Total number of characters
Total number of places
Total number of spells: 61
Total lines of dialogue: 7444
These were the main stats but there were more. You can view that in the overview of the data section.
Below are the charts (broken up by sections)
Movies
Budget
Box office earnings
Characters
How many characters were in each house (Gryffindor, Hufflepuff, Ravenclaw, Slytherin)
How many witches vs. wizards
Places
Which category had the most locations
Spells
Counting the type of spells
Counting the light effects of each spell
Dialogue
Top 5 characters with the most dialogue
Top 5 spells spoken the most
Top 5 places where the most dialogue was spoken
What movie had the most dialogue
Finished Project
You can view in Tableau Public.

Overview of the Data
Total number of movies: 8
Total revenue: 6.4 billion
Total minutes: 1197
Total number of characters: 166
3 different schools
13 species
17 types of Patronus
Total number of places: 74
5 categories
Total number of spells: 61
6 types of spells
17 light effects
Total lines of dialogue: 7444
234 chapters
Insights
Harry Potter and the Half-Blood Prince and Harry Potter and the Deathly Hallows Part 2 have the biggest budgets at $250,000,000.
Harry Potter and the Deathly Hallows Part 2 has the highest revenue at $1,342,000,000
The Gryffindor house has the most number of characters.
There are more wizards than witches with 83 wizards.
Hogwarts has the most locations at 45 places.
Charms were the most common spell type.
The most common light effects are white and blue.
Harry Potter is the character with the most lines of dialogue at 1,922 lines.
The spell that was used the most was Stupefy with 58 times it was cast.
The Great Hall was the location with the most dialogue at 577 times.
Harry Potter and the Order of the Phoenix has the most lines of dialogue of any of the movies at 1,157.
What I Learned
Advanced Microsoft functions like VLOOKUP and SUMPRODUCT
Importance of checking the data (even if it's been "cleaned"). Because in the Dialogue table several spells were misspelled.
Keeping things simple using common charts like: bar charts and pie charts. No need to overcomplicate the visuals
How vital tooltips for Tableau are. They can show important information on the dashboard without overcrowding the visualization. In my original dashboard I completely forgot to edit the tooltips so it is almost useless.
Summary
This project was my second major project for my portfolio. I wanted to focus on Microsoft Excel and using more advanced functions like VLOOKUPs. This challenge took a total of 30 hours to complete. I improve on my analysis skills specifically on learning how answering questions and explore data. My background knowledge (having read the books and watched the movies) helped immensely, especially when cleaning the data.