ML Algorithms

Logistic Regression

  • Ordered (ordinal) logistic regression is the variant used when the outcomes are ordered categories
  • Multi-class logistic regression is also called multinomial or softmax regression
  • The logit (log-odds) function, logit(p) = ln(p / (1 - p)), takes a probability p strictly between 0 and 1 and maps it onto the entire real-number line
  • Under the logistic model, we assume the log-odds of the outcome is a linear function of the inputs, i.e. a weighted sum of the inputs plus an intercept
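
The logit and its inverse can be sketched in a few lines. This is a minimal illustration, not a fitted model: the function names (logit, sigmoid, predict_proba) and the weights, bias, and inputs in the usage note are hypothetical and chosen only to show that the model's log-odds is linear in the inputs.

```python
import math


def logit(p):
    """Log-odds of a probability p; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: maps any real number to a probability in (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive value.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Logistic model: log-odds = bias + sum of weight * input."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)
```

For example, with the made-up values weights = [0.5, -0.25], bias = 0.1 and x = [2, 4], the weighted sum is 0.1, and logit(predict_proba(weights, 0.1, x)) recovers that value up to floating-point error.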

Decision Trees

Classification Trees

  • Handle both categorical and continuous data
  • Leaves that contain a mixture of classes are called impure
  • Gini impurity and entropy measure the impurity of a node; information gain measures how much a split reduces that impurity
  • The lower the weighted Gini impurity of the split on a variable, the better that variable separates the classes
  • For a continuous variable, we sort the values and take the average of each pair of consecutive values as a candidate threshold. We compute the Gini impurity of the split at each candidate and prefer the threshold with the lowest impurity
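
The threshold search for a continuous variable can be sketched as follows. This is an illustrative version under stated assumptions, not a full tree builder: the function names (gini, best_threshold) are hypothetical, the split is scored by the size-weighted average of the Gini impurity of its two sides, and the data in the usage note is made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split of one feature.

    Candidate thresholds are the midpoints between consecutive distinct
    sorted values; each is scored by the size-weighted Gini impurity of
    the left (value < threshold) and right (value >= threshold) sides.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

With the made-up data values = [1, 2, 3, 10, 11, 12] and labels = ["no", "no", "no", "yes", "yes", "yes"], the candidates are 1.5, 2.5, 6.5, 10.5 and 11.5, and the split at 6.5 separates the classes perfectly, so its weighted impurity is 0.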

K-Means Clustering

  • K-medoids uses actual data points as cluster centers rather than the mean position of the points in a cluster. The medoid of a cluster is the point that minimizes the total distance to all other points in that cluster. This variation is more interpretable because the centers are always data points
  • Fuzzy C-Means clustering lets each data point belong to multiple clusters to varying degrees. It replaces hard cluster assignments with degrees of membership that depend on the point's distance from each centroid
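
Both variations can be illustrated with small helper functions. This is a sketch of single steps, not complete clustering algorithms: the names medoid and fuzzy_memberships are hypothetical, distances are assumed to be Euclidean, and the fuzzifier m = 2 is a commonly used default rather than anything fixed by these notes. The membership rule used is the standard fuzzy c-means update, u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).

```python
import math


def medoid(points):
    """Return the point with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def fuzzy_memberships(point, centers, m=2.0):
    """Degrees of membership of one point in each cluster (they sum to 1).

    m > 1 is the fuzzifier: values near 1 approach hard assignments,
    larger values spread membership more evenly across clusters.
    """
    dists = [math.dist(point, c) for c in centers]
    at_center = [d == 0.0 for d in dists]
    if any(at_center):
        # The point coincides with one or more centers: full membership
        # is shared among those centers, which avoids dividing by zero.
        share = 1.0 / sum(at_center)
        return [share if hit else 0.0 for hit in at_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
```

For the made-up cluster [(0, 0), (1, 0), (2, 0), (3, 0), (100, 0)], the medoid is the data point (2, 0), whereas the mean (21.2, 0) is pulled toward the outlier and is not a data point. A point at (1, 0) with centers at (0, 0) and (4, 0) gets memberships of about 0.9 and 0.1 instead of a hard assignment to the nearer cluster.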