I found a hidden gem in Matplotlib’s library: Packed Bubble Charts in Python | by Anna Gordun Peiro | Jul, 2024


For my chart, I am using an Olympic Historical Dataset from Olympedia.org, which Joseph Cheng shared on Kaggle under a public domain license.

Screenshot of dataset

It contains event-level to athlete-level Olympic Games results from Athens 1896 to Beijing 2022. After an exploratory data analysis (EDA), I transformed it into a dataset that details the number of female athletes in each sport/event per year. My idea for the bubble chart is to show which sports have a 50/50 female-to-male athlete ratio and how that has evolved over time.

My plotting data consists of two datasets, one per year: 2020 and 1996. For each dataset, I computed the total number of athletes who participated in each event (athlete_sum) and how much that sum represents compared with the total number of athletes, male plus female (difference). See a screenshot of the data below:

Screenshot of plotting dataset

This is my approach to visualise it:

  • Size proportion. The radius of each bubble compares the number of athletes per sport. Bigger bubbles represent highly competitive events with many athletes, such as Athletics.
  • Multi-variable interpretation. Colour encodes female representation. Light green bubbles represent events with a 50/50 split, such as Hockey.
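
The two encodings above can be sketched in a few lines. This is a minimal sketch, assuming bubble area (rather than radius) is made proportional to the athlete count, which is how Matplotlib's packed-bubble gallery example derives its radii. The helper names, the bin edges and most of the colours are hypothetical and only for illustration; only the orange #fc8d62 for roughly 20% comes from the legend code later in this post.

```python
import math
from bisect import bisect_right

# Hypothetical bins for the female share of athletes (0.0 to 1.0).
SHARE_EDGES = [0.10, 0.40, 0.60, 0.90]
# One colour per bin; only "#fc8d62" (about 20%) appears in the article.
SHARE_COLOURS = ["#66c2a5", "#fc8d62", "#a6d854", "#8da0cb", "#e78ac3"]


def bubble_radius(athlete_count):
    """Radius of a circle whose area equals the athlete count."""
    return math.sqrt(athlete_count / math.pi)


def share_to_colour(female_share):
    """Pick the colour of the bin that the female share falls into."""
    return SHARE_COLOURS[bisect_right(SHARE_EDGES, female_share)]
```

Because the radius grows with the square root of the count, a sport with four times as many athletes gets a bubble twice as wide, not four times as wide.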

Here is my starting point (using the code and approach from above):

First result

Some easy fixes: increasing the figure size and setting labels to empty when the size isn’t over 250, so that no words spill outside their bubbles.

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))

# Labels edited directly in dataset
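
The labels here were edited by hand in the dataset, but the same rule could be applied in code. A minimal sketch, assuming the labels and sizes are available as two parallel lists; the function name and the sample values are hypothetical and not taken from the real data:

```python
def blank_small_labels(labels, sizes, threshold=250):
    """Keep a label only when its bubble size is over the threshold."""
    return [label if size > threshold else ""
            for label, size in zip(labels, sizes)]


# Illustrative values, not taken from the real dataset.
sports = ["Athletics", "Boxing", "Baseball"]
athlete_sum = [2000, 280, 140]

labels = blank_small_labels(sports, athlete_sum)
print(labels)  # ['Athletics', 'Boxing', '']
```

The blanked list can then be passed wherever the chart expects its labels, leaving the sizes and colours untouched.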

Second result

Well, now at least it’s readable. But why is Athletics pink and Boxing blue? Let’s add a legend to illustrate the relationship between colours and female representation.

Because this isn’t a regular bar chart, plt.legend() doesn’t do the trick here.

Using Matplotlib’s AnnotationBbox, we can create rectangles (or circles) to show the meaning behind each colour. We can do the same thing to show a bubble scale.

import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, HPacker, TextArea
from matplotlib.patches import Circle, Rectangle

# This is an example for one section of the legend

# Define where the annotation (legend) will be
xy = [50, 128]

# Create your colored rectangle or circle
da = DrawingArea(20, 20, 0, 0)
p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
da.add_artist(p)

# Add text
text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight="bold"))

# Combine rectangle and text side by side
hbox = HPacker(children=[da, text], align="top", pad=0, sep=3)

# Annotate both in a box (change alpha if you want to see the box)
ab = AnnotationBbox(
    hbox, xy,
    xybox=(1.005, xy[1]),
    xycoords="data",
    boxcoords=("axes fraction", "data"),
    box_alignment=(0.2, 0.5),
    bboxprops=dict(alpha=0),
)

# Add to your bubble chart
ax.add_artist(ab)
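
The same building blocks can be reused for the bubble scale. What follows is a minimal sketch, not the code behind the final chart: the helper add_scale_entry, the positions, the radii and the labels are all hypothetical. Note that DrawingArea sizes are in display units rather than data units, so these circles do not automatically match the radii of the bubbles in the chart and would need to be tuned by eye or converted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, HPacker, TextArea
from matplotlib.patches import Circle


def add_scale_entry(ax, y, radius, label, colour="grey"):
    """Add one 'circle + text' row to the right of the axes at data height y."""
    da = DrawingArea(2 * radius, 2 * radius, 0, 0)
    da.add_artist(Circle((radius, radius), radius, color=colour))
    text = TextArea(label, textprops=dict(color=colour, size=12))
    row = HPacker(children=[da, text], align="center", pad=0, sep=3)
    ab = AnnotationBbox(
        row, (0, y),
        xybox=(1.005, y),
        xycoords="data",
        boxcoords=("axes fraction", "data"),
        box_alignment=(0.2, 0.5),
        bboxprops=dict(alpha=0),
    )
    ax.add_artist(ab)
    return ab


fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.set_xlim(0, 150)
ax.set_ylim(0, 150)

# Illustrative scale: four times the athletes gives twice the radius.
scale = [(60, 6, "100 athletes"), (40, 12, "400 athletes")]
entries = [add_scale_entry(ax, y, r, label) for y, r, label in scale]
fig.canvas.draw()
```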

I’ve also added a subtitle and a text description under the chart, just by using plt.text().
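
For reference, a call of that kind might look like the sketch below. The wording, positions and font sizes are placeholders, since the actual values are not shown here; passing transform=ax.transAxes is an assumption that places the text relative to the axes (0 to 1) instead of in data coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))

# Placeholder subtitle just above the axes, left-aligned.
subtitle = plt.text(
    0.0, 1.02, "Subtitle goes here",
    transform=ax.transAxes, fontsize=14, ha="left", va="bottom",
)

# Placeholder description just below the axes.
description = plt.text(
    0.0, -0.05, "Short description of the data goes here.",
    transform=ax.transAxes, fontsize=10, ha="left", va="top",
)
```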

Final visualisation

Straightforward, user-friendly interpretations of the graph:

  • The majority of bubbles are light green → light green means 50% female athletes → the majority of Olympic competitions have an even 50/50 female-to-male split (yay🙌).
  • Only one sport (Baseball), shown in dark green, has no female participation.
  • Three sports have only female participation, but their number of athletes is fairly low.
  • The biggest sports in terms of number of athletes (Swimming, Athletics and Gymnastics) are very close to a 50/50 split.
