Data Science at Home: Solving the Nanny Schedule Puzzle with Monte Carlo and Genetic Algorithms | by Courtney Perigo | Sep, 2024

Armed with simulation of all the possible ways our schedule can throw curveballs at us, I knew it was time to bring in some heavy-hitting optimization techniques. Enter genetic algorithms — a natural selection-inspired optimization method that finds the best solution by iteratively evolving a population of candidate solutions.

In this case, each “candidate” was a potential set of nanny characteristics, such as their availability and flexibility. The algorithm evaluates different nanny characteristics, and iteratively improves those characteristics to find the one that fits our family’s needs. The result? A highly optimized nanny with scheduling preferences that balance our parental coverage gaps with the nanny’s availability.

At the heart of this approach is what I like to call the “nanny chromosome.” In genetic algorithm terms, a chromosome is simply a way to represent potential solutions — in our case, different nanny characteristics. Each “nanny chromosome” had a set of features that defined their schedule: the number of days per week the nanny could work, the maximum hours she could cover in a day, and their flexibility to adjust to varying start times. These features were the building blocks of every potential nanny schedule the algorithm would consider.

Defining the Nanny Chromosome

In genetic algorithms, a “chromosome” represents a possible solution, and in this case, it’s a set of features defining a nanny’s schedule. Here’s how we define a nanny’s characteristics:

# Function to generate nanny characteristics
def generate_nanny_characteristics():
return {
'flexible': np.random.choice([True, False]),  # Nanny's flexibility
'days_per_week': np.random.choice([3, 4, 5]),  # Days available per week
'hours_per_day': np.random.choice([6, 7, 8, 9, 10, 11, 12])  # Hours available per day
}

Each nanny’s schedule is defined by their flexibility (whether they can adjust start times), the number of days they are available per week, and the maximum hours they can work per day. This gives the algorithm the flexibility to evaluate a wide variety of potential schedules.

Building the Schedule for Each Nanny

Once the nanny’s characteristics are defined, we need to generate a weekly schedule that fits those constraints:

# Function to calculate a weekly schedule based on nanny's characteristics
def calculate_nanny_schedule(characteristics, num_days=5):
shifts = []
for _ in range(num_days):
start_hour = np.random.randint(6, 12) if characteristics['flexible'] else 9  # Flexible nannies have varying start times
end_hour = start_hour + characteristics['hours_per_day']  # Calculate end hour based on hours per day
shifts.append((start_hour, end_hour))
return shifts  # Return the generated weekly schedule

This function builds a nanny’s schedule based on their defined flexibility and working hours. Flexible nannies can start between 6 AM and 12 PM, while others have fixed schedules that start and end at set times. This allows the algorithm to evaluate a range of possible weekly schedules.

Selecting the Best Candidates

Once we’ve generated an initial population of nanny schedules, we use a fitness function to evaluate which ones best meet our childcare needs. The most fit schedules are selected to move on to the next generation:

# Function for selection in genetic algorithm
def selection(population, fitness_scores, num_parents):
# Normalize fitness scores and select parents based on probability
min_fitness = np.min(fitness_scores)
if min_fitness < 0:
fitness_scores = fitness_scores - min_fitnessfitness_scores_sum = np.sum(fitness_scores)
probabilities = fitness_scores / fitness_scores_sum if fitness_scores_sum != 0 else np.ones(len(fitness_scores)) / len(fitness_scores)
# Select parents based on their fitness scores
selected_parents = np.random.choice(population, size=num_parents, p=probabilities)
return selected_parents

In the selection step, the algorithm evaluates the population of nanny schedules using a fitness function that measures how well the nanny’s availability aligns with the family’s needs. The most fit schedules, those that best cover the required hours, are selected to become “parents” for the next generation.

Adding Mutation to Keep Things Interesting

To avoid getting stuck in suboptimal solutions, we add a bit of randomness through mutation. This allows the algorithm to explore new possibilities by occasionally tweaking the nanny’s schedule:

# Function to mutate nanny characteristics
def mutate_characteristics(characteristics, mutation_rate=0.1):
if np.random.rand() < mutation_rate:
characteristics['flexible'] = not characteristics['flexible']
if np.random.rand() < mutation_rate:
characteristics['days_per_week'] = np.random.choice([3, 4, 5])
if np.random.rand() < mutation_rate:
characteristics['hours_per_day'] = np.random.choice([6, 7, 8, 9, 10, 11, 12])
return characteristics

By introducing small mutations, the algorithm is able to explore new schedules that might not have been considered otherwise. This diversity is important for avoiding local optima and improving the solution over multiple generations.

Evolving Toward the Perfect Schedule

The final step was evolution. With selection and mutation in place, the genetic algorithm iterates over several generations, evolving better nanny schedules with each round. Here’s how we implement the evolution process:

# Function to evolve nanny characteristics over multiple generations
def evolve_nanny_characteristics(all_childcare_weeks, population_size=1000, num_generations=10):
population = [generate_nanny_characteristics() for _ in range(population_size)]  # Initialize the populationfor generation in range(num_generations):
print(f"\n--- Generation {generation + 1} ---")
fitness_scores = []
hours_worked_collection = []
for characteristics in population:
fitness_score, yearly_hours_worked = fitness_function_yearly(characteristics, all_childcare_weeks)
fitness_scores.append(fitness_score)
hours_worked_collection.append(yearly_hours_worked)
fitness_scores = np.array(fitness_scores)
# Find and store the best individual of this generation
max_fitness_idx = np.argmax(fitness_scores)
best_nanny = population[max_fitness_idx]
best_nanny['actual_hours_worked'] = hours_worked_collection[max_fitness_idx]
# Select parents and generate a new population
parents = selection(population, fitness_scores, num_parents=population_size // 2)
new_population = []
for i in range(0, len(parents), 2):
parent_1, parent_2 = parents[i], parents[i + 1]
child = {
'flexible': np.random.choice([parent_1['flexible'], parent_2['flexible']]),
'days_per_week': np.random.choice([parent_1['days_per_week'], parent_2['days_per_week']]),
'hours_per_day': np.random.choice([parent_1['hours_per_day'], parent_2['hours_per_day']])
}
child = mutate_characteristics(child)
new_population.append(child)
population = new_population  # Replace the population with the new generation
return best_nanny  # Return the best nanny after all generations

Here, the algorithm evolves over multiple generations, selecting the best nanny schedules based on their fitness scores and allowing new solutions to emerge through mutation. After several generations, the algorithm converges on the best possible nanny schedule, optimizing coverage for our family.

Final Thoughts

With this approach, we applied genetic algorithms to iteratively improve nanny schedules, ensuring that the selected schedule could handle the chaos of Parent 2’s unpredictable work shifts while balancing our family’s needs. Genetic algorithms may have been overkill for the task, but they allowed us to explore various possibilities and optimize the solution over time.

The images below describe the evolution of nanny fitness scores over time. The algorithm was able to quickly converge on the best nanny chromosome after just a few generations.