Landing a DS/MLE Role in the Hell-Mode 2024 US Market (Part 4): How to Prepare for the Machine Learning & Statistics Interview | by Bert Lee // 李慕家 | Jun, 2024


5. How to Prepare for the ML Coding Interview?


For MLE/AS roles, and even for some of the more hardcore DS roles, you may be asked to implement a model from scratch: no sklearn, no pytorch, just numpy to implement the model or algorithm. That said, the ones that actually come up are usually just the same few:

Unsupervised Models:

  • K-Means Clustering

import numpy as np

def initialize_centroids(X, k):
    """Randomly initialize k centroids from the dataset X."""
    indices = np.random.permutation(X.shape[0])
    centroids = X[indices[:k]]
    return centroids

def closest_centroid(X, centroids):
    """For each point in X, find the index of the closest centroid."""
    distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, labels, k):
    """Recalculate each centroid as the mean of its assigned points
    (assumes no cluster goes empty)."""
    new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return new_centroids

def kmeans(X, k, max_iters=100):
    """The main k-means loop: assign points, update centroids, repeat."""
    centroids = initialize_centroids(X, k)
    for i in range(max_iters):
        labels = closest_centroid(X, centroids)
        new_centroids = update_centroids(X, labels, k)
        # Check for convergence (centroids no longer change)
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example usage
# Generate some data
np.random.seed(42)
X = np.random.rand(100, 2)

# Perform k-means clustering
k = 3
centroids, labels = kmeans(X, k)

print("Centroids:", centroids)

Supervised Models:

  • Logistic Regression

import numpy as np

# Sigmoid function to map predicted values to probabilities
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Loss function to compute the cost
def compute_loss(y, y_hat):
    # Binary cross-entropy loss; clip predictions to avoid log(0)
    eps = 1e-15
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradient descent function to update parameters
def gradient_descent(X, y, params, learning_rate, iterations):
    m = len(y)
    loss_history = np.zeros((iterations,))

    for i in range(iterations):
        # Calculate predictions
        y_hat = sigmoid(np.dot(X, params))
        # Update parameters with the cross-entropy gradient X^T (y_hat - y) / m
        params -= learning_rate * np.dot(X.T, y_hat - y) / m
        # Save loss
        loss_history[i] = compute_loss(y, y_hat)

    return params, loss_history

# Predict function: threshold probabilities at 0.5
def predict(X, params):
    return np.round(sigmoid(np.dot(X, params)))

# Generate synthetic data
np.random.seed(42)  # for reproducibility
X = np.random.rand(100, 2)  # 100 samples and 2 features
y = np.random.randint(0, 2, 100)  # Binary targets

# Add intercept term to feature matrix
X = np.hstack((np.ones((X.shape[0], 1)), X))

# Initialize parameters to zero
params = np.zeros(X.shape[1])

# Set learning rate and number of iterations
learning_rate = 0.01
iterations = 1000

# Perform gradient descent
params, loss_history = gradient_descent(X, y, params, learning_rate, iterations)

# Predict
predictions = predict(X, params)

# Calculate accuracy
accuracy = np.mean(predictions == y)

print(f"Accuracy: {accuracy}")

  • (Multiple) Linear Regression

import numpy as np

def multiple_linear_regression(X, y):
    # Adding a column of ones to add the intercept term (b_0)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])

    # Normal Equation: theta = (X^T X)^(-1) X^T y.
    # pinv (Moore-Penrose pseudo-inverse) is used instead of inv so the
    # solution stays stable even when X^T X is singular, as it is in the
    # example below, where the two features are perfectly collinear.
    theta_best = np.linalg.pinv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

    return theta_best  # First element is the intercept, others are coefficients

# Example usage:
X = np.array([
    [1, 2],  # Two features for each data point
    [2, 3],
    [3, 4],
    [4, 5],
    [5, 6]
])
y = np.array([5, 7, 9, 11, 13])  # Target values

# Train the model to find the intercept and coefficients
theta_best = multiple_linear_regression(X, y)

print(f"Intercept and coefficients: {theta_best}")

# Predict function using the derived coefficients
def predict(X, theta_best):
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # Add the intercept term
    return X_b.dot(theta_best)

# Predicting values
X_new = np.array([
    [6, 7],
    [7, 8]
])  # New data points
predictions = predict(X_new, theta_best)

print(f"Predictions: {predictions}")

Sorting:

Sometimes you'll be asked to implement a sorting algorithm. Here is Insertion Sort as an example:

def insertion_sort(arr):
    # Traverse through 1 to len(arr)
    for i in range(1, len(arr)):

        key = arr[i]

        # Move elements of arr[0..i-1] that are greater than key
        # one position ahead of their current position
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key

    return arr

# Example usage
my_list = [64, 34, 25, 12, 22, 11, 90]
sorted_list = insertion_sort(my_list)
print("Sorted list:", sorted_list)

I've also seen candidates asked to implement Attention or a CNN, though I never ran into those myself; it's worth looking up more of these from-scratch model implementations for practice.
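For a taste, here is a minimal numpy sketch of single-head scaled dot-product attention, in the same spirit as the examples above. The shapes and helper names are my own illustrative choices, not from any specific interview:

import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V, weights

# Example usage
np.random.seed(42)
Q = np.random.rand(4, 8)  # 4 queries, dimension d_k = 8
K = np.random.rand(6, 8)  # 6 keys
V = np.random.rand(6, 8)  # 6 values
output, weights = scaled_dot_product_attention(Q, K, V)
print("Output shape:", output.shape)  # (4, 8)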

Besides implementing models from scratch, I've also run into PyTorch fill-in-the-blank questions in ML coding rounds: you may be asked to implement an entire class in PyTorch and then debug the model pipeline until training runs end to end. I hadn't prepared for this at all and failed miserably; I had become too dependent on ChatGPT day to day. These questions really test your hands-on experience with PyTorch/Tensorflow, and just skimming a cheat sheet probably isn't enough.

https://www.datacamp.com/cheat-sheet/deep-learning-with-py-torch
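
To give a concrete feel for this format, here is a minimal sketch of the kind of class-plus-training-loop exercise you might face. The architecture, hyperparameters, and synthetic data are illustrative assumptions on my part, not from any actual interview question:

import torch
import torch.nn as nn

# A small, illustrative classifier; real interview questions will vary
class SimpleClassifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):
        return self.net(x)

# Synthetic data (random labels, so accuracy will hover near chance)
torch.manual_seed(42)
X = torch.rand(100, 2)
y = torch.randint(0, 2, (100,))

model = SimpleClassifier(in_dim=2, hidden_dim=16, n_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Standard training loop: zero grads, forward, loss, backward, step
for epoch in range(100):
    optimizer.zero_grad()
    logits = model(X)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()

# Evaluate on the training data
with torch.no_grad():
    preds = model(X).argmax(dim=1)
print("Accuracy:", (preds == y).float().mean().item())

Being able to write this loop from memory, including where zero_grad and backward go, is exactly the kind of fluency these rounds probe.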
