2. How to Prepare for the ML Coding Interview?
For MLE/AS roles (and even some of the more hardcore DS roles), you may be asked to do Model Implementation from scratch: implementing a model or algorithm using only numpy, without sklearn or PyTorch. That said, what actually gets asked usually comes down to a short list:
Unsupervised Models:
import numpy as np

def initialize_centroids(X, k):
    """Randomly pick k points from the dataset X as the initial centroids."""
    indices = np.random.permutation(X.shape[0])
    return X[indices[:k]]

def closest_centroid(X, centroids):
    """For each point in X, find the index of the closest centroid."""
    distances = np.sqrt(((X - centroids[:, np.newaxis]) ** 2).sum(axis=2))
    return np.argmin(distances, axis=0)

def update_centroids(X, labels, k):
    """Recalculate each centroid as the mean of the points assigned to it."""
    return np.array([X[labels == i].mean(axis=0) for i in range(k)])

def kmeans(X, k, max_iters=100):
    """The main k-means loop: assign points, update centroids, repeat."""
    centroids = initialize_centroids(X, k)
    for _ in range(max_iters):
        labels = closest_centroid(X, centroids)
        new_centroids = update_centroids(X, labels, k)
        # Check for convergence (centroids no longer change)
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Example usage: generate some data and cluster it
np.random.seed(42)
X = np.random.rand(100, 2)
k = 3
centroids, labels = kmeans(X, k)
print("Centroids:", centroids)
Supervised Models:
- Logistic Regression
import numpy as np

# Sigmoid function to map raw scores to probabilities
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Binary cross-entropy loss
def compute_loss(y, y_hat):
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_hat = np.clip(y_hat, 1e-15, 1 - 1e-15)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Gradient descent to update the parameters
def gradient_descent(X, y, params, learning_rate, iterations):
    m = len(y)
    loss_history = np.zeros(iterations)
    for i in range(iterations):
        # Calculate predictions
        y_hat = sigmoid(np.dot(X, params))
        # Update parameters with the gradient of the cross-entropy loss
        params -= learning_rate * np.dot(X.T, y_hat - y) / m
        # Save loss
        loss_history[i] = compute_loss(y, y_hat)
    return params, loss_history

# Predict function: threshold probabilities at 0.5
def predict(X, params):
    return np.round(sigmoid(np.dot(X, params)))

# Generate synthetic data
X = np.random.rand(100, 2)        # 100 samples and 2 features
y = np.random.randint(0, 2, 100)  # Binary targets

# Add intercept term to the feature matrix
X = np.hstack((np.ones((X.shape[0], 1)), X))

# Initialize parameters to zero
params = np.zeros(X.shape[1])

# Set learning rate and number of iterations
learning_rate = 0.01
iterations = 1000

# Perform gradient descent
params, loss_history = gradient_descent(X, y, params, learning_rate, iterations)

# Predict and calculate accuracy
predictions = predict(X, params)
accuracy = np.mean(predictions == y)
print(f"Accuracy: {accuracy}")
- (Multiple) Linear Regression
import numpy as np

def multiple_linear_regression(X, y):
    # Add a column of ones for the intercept term (b_0)
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])
    # Solve the Normal Equation; pinv (pseudo-inverse) is used instead of inv
    # so the solve still works when X_b.T @ X_b is singular, as it is here
    # (the second feature is the first plus one, so the columns are collinear)
    theta_best = np.linalg.pinv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
    return theta_best  # First element is the intercept, the rest are coefficients

# Example usage:
X = np.array([
    [1, 2],  # Two features per data point
    [2, 3],
    [3, 4],
    [4, 5],
    [5, 6],
])
y = np.array([5, 7, 9, 11, 13])  # Target values

# Fit the model to find the intercept and coefficients
theta_best = multiple_linear_regression(X, y)
print(f"Intercept and coefficients: {theta_best}")

# Predict function using the fitted coefficients
def predict(X, theta_best):
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # Add the intercept term
    return X_b.dot(theta_best)

# Predict for new data points
X_new = np.array([
    [6, 7],
    [7, 8],
])
predictions = predict(X_new, theta_best)
print(f"Predictions: {predictions}")
Sorting:
Sometimes you are asked to implement a sorting algorithm; here is Insertion Sort as an example:
def insertion_sort(arr):
    # Traverse from index 1 to len(arr) - 1
    for i in range(1, len(arr)):
        key = arr[i]
        # Shift elements of arr[0..i-1] that are greater than key
        # one position ahead of their current position
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

# Example usage
my_list = [64, 34, 25, 12, 22, 11, 90]
sorted_list = insertion_sort(my_list)
print("Sorted list:", sorted_list)
I have also heard of candidates being asked to implement Attention, CNNs, and so on, though I never ran into those myself; there are plenty more of these model-implementation-from-scratch write-ups worth studying for practice.
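Since Attention gets mentioned above but no code is shown, here is a minimal numpy sketch of single-head scaled dot-product attention, in the same numpy-only spirit. It is an illustrative sketch only: no masking, no multi-head projections, and the shapes in the usage example are my own choices.

import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> output (n_q, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row of weights sums to 1
    return weights @ V                  # weighted average of the values

# Example usage
np.random.seed(0)
Q = np.random.rand(4, 8)  # 4 queries of dimension 8
K = np.random.rand(6, 8)  # 6 keys of dimension 8
V = np.random.rand(6, 8)  # 6 values of dimension 8
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)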
Besides implementing models from scratch, ML Coding rounds sometimes test PyTorch fill-in-the-blank questions: you may be asked to implement an entire Class in PyTorch and then debug the model pipeline until training runs end to end. I did not prepare this part at all at the time (a real regret); day to day I relied too heavily on ChatGPT. These questions really test your hands-on experience with PyTorch/TensorFlow, and skimming a cheatsheet or two on its own probably is not enough. One cheatsheet to start from:
https://www.datacamp.com/cheat-sheet/deep-learning-with-py-torch
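As a starting point for that kind of question, here is a minimal sketch of the "implement the whole Class, then get training running" pattern. The tiny MLP, the synthetic data, and the hyperparameters are illustrative choices of mine, not from any particular interview:

import torch
import torch.nn as nn

class MLP(nn.Module):
    """A small feed-forward classifier (illustrative architecture)."""
    def __init__(self, in_dim, hidden_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; CrossEntropyLoss applies softmax

# Synthetic data: 100 samples, 2 features, binary labels
torch.manual_seed(42)
X = torch.rand(100, 2)
y = torch.randint(0, 2, (100,))

model = MLP(in_dim=2, hidden_dim=16, n_classes=2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# A bare-bones training loop: the part interviewers often ask you to debug
for epoch in range(100):
    optimizer.zero_grad()        # clear gradients from the previous step
    logits = model(X)            # forward pass
    loss = criterion(logits, y)  # compute the loss
    loss.backward()              # backpropagate
    optimizer.step()             # update the parameters

print("final loss:", loss.item())

The usual bugs to watch for in these exercises are a missing zero_grad(), mismatched shapes or dtypes between logits and targets, and forgetting model.train()/model.eval() when dropout or batch norm is involved.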