[Machine Learning] Interview Questions - Part 1

Authored by Tony Feng

Created on Oct 1st, 2022

Last Modified on Oct 1st, 2022

Intro

This post contains a collection of questions for machine learning interviews.


Questions

1) Explain bias and variance, and the trade-off between them

Bias is the difference between the average prediction of our model and the ground truth.

Variance refers to the variability in the model prediction. In other words, it reflects the changes in the model when using different portions of the training dataset.

Bias and variance are inversely related. An underfitting model has high bias and low variance, while an overfitting model has high variance and low bias. So we need to find the right balance between the two.
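To make the decomposition concrete, here is a minimal Python sketch (the true function, noise level, and polynomial degrees are arbitrary illustrative choices): it refits polynomials of different degrees on many resampled training sets and estimates squared bias and variance at fixed test points.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)      # assumed ground-truth function
x_test = np.linspace(0, 1, 50)                # fixed test points

def fit_predict(degree, n_train=30, n_repeats=200):
    """Fit a polynomial of the given degree on many resampled training sets
    and collect its predictions on the fixed test points."""
    preds = []
    for _ in range(n_repeats):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, 0.3, n_train)   # noisy observations
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, x_test))
    return np.array(preds)

for degree in (1, 4, 9):
    preds = fit_predict(degree)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)  # squared bias
    variance = np.mean(preds.var(axis=0))                        # variance across refits
    print(f"degree={degree}  bias^2={bias2:.4f}  variance={variance:.4f}")
```

The low-degree model shows high bias and low variance (underfitting), while the high-degree model shows the opposite (overfitting).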

2) What is gradient descent?

GD is an iterative optimization algorithm that finds the values of a model's parameters (coefficients) that minimize a cost function.

GD is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.
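A minimal sketch of batch gradient descent for linear regression with an MSE cost (the learning rate, iteration count, and toy data are arbitrary):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent for linear regression with an MSE cost.
    The intercept is folded in as an extra column of ones."""
    X = np.c_[np.ones(len(X)), X]
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = 2 / len(X) * X.T @ (X @ w - y)   # gradient of the MSE cost w.r.t. w
        w -= lr * grad                          # step in the direction of steepest descent
    return w

# Toy usage: recover y = 1 + 2x from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (100, 1))
y = 1 + 2 * x[:, 0] + rng.normal(0, 0.1, 100)
print(gradient_descent(x, y))   # approximately [1.0, 2.0]
```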

3) Difference between Loss Function and Cost Function

The loss function is associated with a single training example, while the cost function is the average of the loss over the entire training dataset. In ML, we usually optimize the cost function rather than the loss function.
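A tiny illustration of the distinction, using a squared-error loss (the numbers are made up):

```python
import numpy as np

def squared_loss(y_true, y_pred):
    """Loss: measured on a single training example."""
    return (y_true - y_pred) ** 2

def mse_cost(y_true, y_pred):
    """Cost: the average of the per-example losses over the whole dataset."""
    return np.mean([squared_loss(t, p) for t, p in zip(y_true, y_pred)])

print(squared_loss(1.0, 1.1))                       # loss on one example
print(mse_cost([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))   # cost over the dataset (~0.0367)
```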

4) How to handle overfitting and underfitting

Handling Overfitting:

  • More Training Data
  • Regularization
  • Reducing Iterations
  • Reducing Features
  • Increasing Learning Rate
  • Other Strategies: Pruning, Dropout, etc.

Handling Underfitting:

  • Removing Data Noise
  • Reducing Regularization
  • Increasing Iterations
  • Increasing Features
  • Reducing Learning Rate
  • Increasing Model Complexity
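Since dropout shows up in the overfitting list above, here is a minimal sketch of inverted dropout (the drop probability and layer shape are arbitrary):

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero a fraction of activations during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones((2, 4))
print(dropout(h, p_drop=0.5))       # roughly half the units zeroed, survivors scaled by 2
print(dropout(h, training=False))   # unchanged at inference time
```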

5) What is curse of dimensionality? How to prevent it?

The curse of dimensionality (CoD) means that, as the number of features grows, the data become increasingly sparse, so prediction error tends to increase and the computational effort grows exponentially (the short simulation after the list below illustrates this). Common ways to mitigate it include:

  • Improving the ratio of observations over attributes
  • Making distances in feature space more meaningful
  • Removing features that have no correlation with the target distribution
  • Removing or combining features that have redundant correlation with target distribution
  • Extracting new features with a more direct correlation with target distribution
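As a rough illustration of why distances become less meaningful in high dimensions, the sketch below shows the gap between the nearest and farthest neighbor shrinking relative to the average distance (the point counts and dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.uniform(0, 1, (500, d))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)   # distances to one query point
    contrast = (dists.max() - dists.min()) / dists.mean()    # relative nearest/farthest gap
    print(f"dim={d}  relative contrast={contrast:.3f}")
```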

6) What is regularization, and give some examples of common methods?

Regularization refers to techniques that are used to calibrate machine learning models so as to reduce the risk of overfitting. Two common methods are ridge (L2) and lasso (L1) regularization, which add a penalty on the model weights to the cost function.

The obvious disadvantage of ridge is model interpretability: it shrinks the coefficients of the least important features towards zero, but never exactly to zero. Lasso, in contrast, can force some coefficient estimates to be exactly zero when the regularization penalty is sufficiently large. Therefore, the lasso method also performs variable selection and is said to yield sparse models.

L1 norm (Lasso):

  • Penalizes the sum of absolute values of the weights
  • Sparse solution
  • Performs feature selection
  • Robust to outliers
  • Unable to learn complex data patterns

L2 norm (Ridge):

  • Penalizes the sum of squares of the weights
  • Non-sparse solution
  • No feature selection
  • Not robust to outliers
  • Able to learn complex data patterns
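A small scikit-learn sketch showing the sparsity difference in practice (the data, alpha values, and feature counts are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data where only 3 of 20 features actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(0, 0.5, 200)

lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty
ridge = Ridge(alpha=10.0).fit(X, y)   # L2 penalty

print("non-zero lasso coefficients:", np.sum(lasso.coef_ != 0))   # only a few survive
print("non-zero ridge coefficients:", np.sum(ridge.coef_ != 0))   # all 20, just shrunk
```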

7) Explain Principal Component Analysis (PCA)

PCA is a dimensionality-reduction technique that finds a lower-dimensional surface onto which to project the high-dimensional data while preserving as much variance as possible.

Common Steps:

  • Standardizing the range of continuous initial variables
  • Computing the covariance matrix to identify correlations
  • Computing the eigenvectors and eigenvalues of the covariance matrix to identify the principal components
  • Creating a feature vector to decide which principal components to keep
  • Recasting the data along the principal components axes
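A minimal NumPy sketch that follows these steps directly (real projects would typically use a library implementation such as sklearn.decomposition.PCA instead):

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA following the steps above."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)     # 1. standardize
    cov = np.cov(X_std, rowvar=False)                # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # 3. eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1]                # sort by explained variance
    components = eigvecs[:, order[:n_components]]    # 4. keep the top components
    return X_std @ components                        # 5. project onto those axes

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(pca(X).shape)   # (100, 2)
```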

8) What is data normalization and why do we need it?

Data normalization is a preprocessing step that re-scales values to fit in a specific range to ensure better convergence. In general, it boils down to subtracting each feature's mean and dividing by its standard deviation. Normalization makes all features weighted equally; otherwise, features with large magnitudes would dominate the cost function, while features with smaller values would be given less weight.
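A minimal z-score normalization sketch (the small epsilon is an assumption added to guard against constant features):

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Z-score normalization: each feature gets mean 0 and standard deviation 1."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
print(standardize(X))   # both columns end up on the same scale
```

In practice, the mean and standard deviation should be computed on the training set only and then reused to transform the validation and test sets.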

9) What is the difference between training, validation set and test set?

The training dataset is used for fitting the model’s parameters. However, the accuracy we achieve on the training set is not a reliable indicator of how accurate the model will be on new samples.

The validation dataset is used to measure how well the model does on examples that weren’t part of the training dataset and to provide information for adjusting the model. The more tuning decisions we base on it, the more information about it leaks into the model, so we can end up overfitting to the validation data.

The test dataset is used to measure how well the model does on previously unseen examples. It should only be used once we have tuned the parameters using the validation set.
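A common way to carve out the three sets with scikit-learn (the 60/20/20 proportions are just one typical choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(1000).reshape(-1, 1), np.arange(1000)

# First hold out the test set, then split the remainder into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 600 200 200
```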

10) What is cross-validation?

CV is a statistical resampling technique that uses different parts of the dataset to train and test an ML algorithm across iterations. The aim of CV is to estimate how the model will perform on unseen data.
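A minimal example of k-fold cross-validation with scikit-learn (the model and dataset are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, evaluate on the held-out fold,
# and rotate until every fold has served as the evaluation fold once.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```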

