[Machine Learning] Interview Questions - Part 3

Authored by Tony Feng

Created on Oct 8th, 2022

Last Modified on Oct 8th, 2022

Intro

The post contains a collection of questions for machine learning interviews.


Questions

1) What’s the difference between Type I and Type II error?

A Type I error is a false positive, while a Type II error is a false negative. Briefly stated, a Type I error means claiming something has happened when it hasn't, while a Type II error means claiming nothing is happening when in fact something is.
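The distinction can be made concrete by counting both error types from paired labels and predictions; the data and function name below are illustrative.

```python
# Counting Type I (false positive) and Type II (false negative) errors
# from binary labels, where 1 marks the positive class.

def error_counts(y_true, y_pred):
    """Return (type_i, type_ii) counts for binary labels (1 = positive)."""
    type_i = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)   # false positives
    type_ii = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    return type_i, type_ii

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(error_counts(y_true, y_pred))  # (1, 1): one of each error type
```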

2) Generative model vs. Discriminative model

  • A generative model cares about how the data was generated in order to learn the categories: it estimates the data distribution first, then derives the classifier from it.
  • A discriminative model simply learns the distinction between the different categories of data, defining the classifier directly. Discriminative models generally outperform generative models on classification tasks.
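A minimal 1-D illustration of the generative route, with made-up data: estimate per-class distributions p(x|y) and priors p(y), then classify via Bayes' rule. A discriminative model would instead fit p(y|x) directly (e.g. logistic regression) without modeling how x is generated.

```python
# Generative classification sketch: fit a Gaussian per class, then pick
# the class maximizing p(x|y) * p(y).
import math

def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

class_a = [1.0, 1.2, 0.8, 1.1]   # toy samples of class A
class_b = [3.0, 2.8, 3.2, 3.1]   # toy samples of class B

params = {"A": fit_gaussian(class_a), "B": fit_gaussian(class_b)}
priors = {"A": 0.5, "B": 0.5}

def classify(x):
    # argmax over classes y of p(x|y) * p(y)
    return max(params, key=lambda y: gaussian_pdf(x, *params[y]) * priors[y])

print(classify(1.3), classify(2.9))  # A B
```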

3) Instance-Based Learning vs. Model-Based Learning

  • Instance-based Learning: The system learns the examples by heart, then generalizes to new cases using a similarity measure.
  • Model-based Learning: The system generalizes from a set of examples by building a model of these examples, then uses that model to make predictions.
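The contrast can be sketched on a toy regression task: a 1-nearest-neighbour predictor memorizes the training points (instance-based), while a fitted line summarizes them with just two parameters (model-based). The data here is illustrative.

```python
# Instance-based vs. model-based prediction on the same (x, y) pairs.

train = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

def predict_instance(x):
    # 1-NN: answer with the y of the closest stored example (similarity measure)
    return min(train, key=lambda p: abs(p[0] - x))[1]

def fit_line(points):
    # Least-squares line: the "model" built from the examples
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return slope, my - slope * mx

slope, intercept = fit_line(train)

def predict_model(x):
    return slope * x + intercept

print(predict_instance(2.4))           # 4.1 (copied from the nearest example)
print(round(predict_model(2.4), 2))    # 4.78 (computed from the fitted line)
```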

4) Label Encoding vs. One-Hot Encoding?

This question generally depends on your dataset and the model which you wish to apply.

One-Hot Encoding simply creates additional features based on the number of unique values in the categorical feature. Every unique value in the category will be added as a feature. We apply One-Hot Encoding when:

  • The categorical feature is not ordinal.
  • The number of unique categories is small, so one-hot encoding can be effectively applied without blowing up the feature dimension.

Label Encoding means each label is assigned a unique integer based on alphabetical ordering. We apply Label Encoding when:

  • The categorical feature is ordinal.
  • The number of categories is quite large, which may lead to high memory consumption.
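Both encodings can be sketched in a few lines on a toy categorical column:

```python
# Label encoding vs. one-hot encoding on a small categorical feature.

colors = ["red", "green", "blue", "green"]

# Label encoding: integers from the alphabetical ordering of unique values.
classes = sorted(set(colors))                  # ['blue', 'green', 'red']
label_map = {c: i for i, c in enumerate(classes)}
labels = [label_map[c] for c in colors]
print(labels)                                  # [2, 1, 0, 1]

# One-hot encoding: one binary feature per unique value.
one_hot = [[1 if c == cls else 0 for cls in classes] for c in colors]
print(one_hot)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that label encoding imposes an ordering (blue < green < red here), which is why it suits ordinal features.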

5) LDA vs. PCA for dimensionality reduction

Both LDA and PCA are linear transformation techniques: LDA is supervised, whereas PCA is unsupervised.

  • PCA tries to maximize the data’s variability while reducing the dataset’s dimensionality.
  • LDA finds the linear discriminants in order to maximize the variance between the different classes while minimizing the variance within the class.
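A minimal PCA sketch with NumPy, assuming toy random data: center the data, then project it onto the top eigenvectors of its covariance matrix (the directions of maximum variance). LDA would additionally use class labels to pick directions that separate the classes.

```python
# PCA via eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy data: 100 samples, 3 features

Xc = X - X.mean(axis=0)                 # center each feature
cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort by variance explained, descending
components = eigvecs[:, order[:2]]      # keep the top-2 principal directions

X_reduced = Xc @ components             # project down to 2 dimensions
print(X_reduced.shape)                  # (100, 2)
```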

6) How does Content-based image retrieval work?

CBIR is the concept of using the visual content of images to generate metadata for retrieval. Compared to image retrieval based on keywords manually associated with the images, this technique derives its metadata from computer vision techniques that extract the relevant information used for queries. Many approaches are possible, from feature detection used to recover keywords to CNNs that extract dense features associated with a known distribution of keywords.

We care less about what is literally shown in the image and more about the similarity between the metadata generated from a query image and a list of known labels projected into this metadata space.
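The retrieval step can be sketched as nearest-neighbour search in the feature space; the vectors below are made-up stand-ins for features a CNN might extract.

```python
# CBIR retrieval sketch: rank indexed images by cosine similarity to the
# query's feature vector. Filenames and vectors are illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

index = {
    "beach.jpg":  [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg":   [0.2, 0.1, 0.9],
}

query = [0.8, 0.2, 0.1]   # features extracted from the query image
best = max(index, key=lambda name: cosine(index[name], query))
print(best)  # beach.jpg
```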

7) Why do we use convolutions for images rather than just FC layers?

Firstly, convolutions preserve, encode, and actually use the spatial information from the image. If we used only FC layers, we would have no relative spatial information.

Secondly, CNNs have a partially built-in translation invariance, since we apply the convolution in a sliding-window fashion across the entire image anyway.
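Weight sharing also makes convolutions far cheaper in parameters. A rough comparison for an illustrative 224x224x3 input, mapping to 64 outputs either with one FC layer or with a 3x3 convolution with 64 filters:

```python
# Parameter counts: FC layer vs. 3x3 conv, both producing 64 channels/units.

h, w, c = 224, 224, 3   # illustrative input size

# FC: one weight per input value per output unit, plus 64 biases.
fc_params = (h * w * c) * 64 + 64

# Conv: a 3x3xC kernel per filter, shared across all spatial positions.
conv_params = (3 * 3 * c) * 64 + 64

print(fc_params)    # 9633856
print(conv_params)  # 1792
```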

8) Why do we have max-pooling in classification CNNs?

Max-pooling in a CNN allows you to reduce computation, since your feature maps are smaller after pooling. You don't lose too much semantic information, since you're taking the maximum activation. There's also a theory that max-pooling contributes a bit to giving CNNs more translation invariance.
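A minimal sketch of 2x2 max-pooling with stride 2 on a toy feature map: each output cell keeps only the strongest activation in its window, quartering the spatial size.

```python
# 2x2 max-pooling with stride 2 over a 2-D feature map (even dimensions assumed).

def max_pool_2x2(fmap):
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, cols, 2)]
        for i in range(0, rows, 2)
    ]

fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 0, 3, 6],
]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 6]]
```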

9) What is the significance of Residual Networks?

The skip connections in ResNet mitigate the vanishing gradient problem in deep neural networks by allowing direct access to features from previous layers. They also help by letting the model learn identity functions, which ensures that a higher layer will perform at least as well as a lower layer, and not worse.
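The identity-function point can be sketched as follows, with toy random weights: a residual block outputs F(x) + x, so even if the learned transform F contributes nothing, the block still passes x through.

```python
# Residual-block sketch: output = activation(F(x) + x).
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(8, 8))  # toy weights for the transform F
W2 = rng.normal(scale=0.1, size=(8, 8))

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    fx = relu(x @ W1) @ W2   # the learned transform F(x)
    return relu(fx + x)      # skip connection adds the input back

x = rng.normal(size=(1, 8))
y = residual_block(x)
print(y.shape)  # (1, 8)

# If F collapses to zero, the block reduces to relu(x): an identity on
# positive activations, so stacking the block cannot hurt representation.
zero = np.zeros((8, 8))
fx_zero = relu(x @ zero) @ zero
assert np.allclose(relu(fx_zero + x), relu(x))
```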

10) What is batch normalization?

Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. A network is just a series of layers, where the output of one layer becomes the input to the next. The idea is then to normalize the inputs of each layer in such a way that they have a mean output activation of 0 and standard deviation of 1. This is analogous to how the inputs to networks are standardized.
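A minimal sketch of the normalization step with NumPy, including the learnable scale (gamma) and shift (beta) that batch normalization applies after standardizing; the data is a toy stand-in for a layer's inputs.

```python
# Per-feature batch normalization over a mini-batch.
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # standardize; eps avoids div-by-zero
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # toy layer inputs
y = batch_norm(x)
print(np.round(y.mean(axis=0), 6))  # ~0 per feature
print(np.round(y.std(axis=0), 3))   # ~1 per feature
```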

