Linear regression

  • Linear regression predicts the target as a weighted sum of the features, and the estimated weights come with confidence intervals.

Assumptions of linear regression

  • Linearity between the features (independent variables) and the target (dependent variable)
  • The target given the features follows a normal distribution. If this assumption is violated, the estimated confidence intervals of the feature weights are invalid
  • Homoscedasticity (constant variance of the prediction errors) - Suppose the average error (difference between predicted and actual price) in your linear regression model is 50,000 Euros. If you assume homoscedasticity, you assume that this average error of 50,000 is the same for houses that cost 1 million and for houses that cost only 40,000. This is unreasonable because it would mean that we can expect negative house prices (see the diagnostic sketch after this list).
  • Independence between data instances
  • Fixed features - Features are free of measurement errors
  • Absence of multicollinearity
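
A minimal sketch, assuming Python with numpy and statsmodels on synthetic data (all numbers are invented for illustration), of how two of these assumptions can be checked: the Breusch-Pagan test probes homoscedasticity, and variance inflation factors (VIF) flag multicollinearity.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)

Xc = sm.add_constant(X)        # add an intercept column
fit = sm.OLS(y, Xc).fit()

# Breusch-Pagan test: a small p-value suggests heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, Xc)
print("Breusch-Pagan p-value:", lm_pvalue)

# Variance inflation factors: values above roughly 5-10 flag multicollinearity
for i in range(1, Xc.shape[1]):  # column 0 is the constant
    print(f"VIF of feature {i}:", variance_inflation_factor(Xc, i))
```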

Interpretation

  • Increasing a numerical feature by one unit, while keeping all other features fixed, changes the estimated outcome by its weight
  • Changing a categorical feature from the reference category to another category changes the estimated outcome by that category’s weight
  • For an instance with all numerical feature values at zero and all categorical features at their reference categories, the model prediction is the intercept weight
  • R-squared increases with the number of features, even if they contain no information about the target. We therefore use adjusted R-squared, which accounts for the number of features used in the model: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of instances and p the number of features.
  • The importance of a feature can be measured by the absolute value of its t-statistic. The t-statistic is the estimated weight divided by its standard error: t = weight / SE(weight).
  • A weight plot visualizes the estimated weights together with their confidence intervals.
  • An effect plot shows how much the combination of weight and feature value contributes to the predictions in the data (effect = weight × feature value). For an individual prediction, the feature effects of that instance can be overlaid on top of the effect plot.
  • In Lasso regression, the regularization strength lambda can be used to control the interpretability of the model: the larger lambda, the more weights are set to exactly zero (see the Lasso sketch under Feature selection).
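
To make these points concrete, here is a minimal sketch using statsmodels on synthetic housing-style data (the feature names and all numbers are invented for illustration); the fitted model exposes the weights, standard errors, t-statistics, confidence intervals, and adjusted R-squared discussed above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
size = rng.uniform(40, 200, n)               # hypothetical house size in m^2
rooms = rng.integers(1, 7, n).astype(float)  # hypothetical number of rooms
price = 3000 * size + 10000 * rooms + rng.normal(0, 20000, n)

X = sm.add_constant(np.column_stack([size, rooms]))
model = sm.OLS(price, X).fit()

print(model.params)      # intercept and feature weights
print(model.bse)         # standard errors of the weights
print(model.tvalues)     # t-statistics = weight / standard error
print(model.conf_int())  # 95% confidence intervals for the weights
print(model.rsquared, model.rsquared_adj)  # R-squared and adjusted R-squared
```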

Feature selection

  • Forward selection: Fit a model with a single feature, once for each feature, and select the model with the highest R-squared. Keep adding one feature at a time, each time selecting the best model, until some criterion is reached, such as a maximum number of features (see the first sketch after this list).
  • Backward selection: Similar to forward selection, but we start with all features and keep removing them until some criterion is reached.
  • Lasso can also be used, as it considers all features simultaneously and can be automated (see the second sketch after this list).
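
A minimal sketch of forward selection on synthetic data (the dataset and the stopping criterion are invented for illustration): each round fits one model per remaining feature and keeps the one with the highest R-squared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

selected, remaining = [], list(range(X.shape[1]))
max_features = 3  # stopping criterion

while remaining and len(selected) < max_features:
    # Fit one model per candidate feature and keep the best R-squared
    scores = {j: LinearRegression().fit(X[:, selected + [j]], y)
                                   .score(X[:, selected + [j]], y)
              for j in remaining}
    best = max(scores, key=scores.get)
    selected.append(best)
    remaining.remove(best)
    print(f"added feature {best}, R-squared = {scores[best]:.3f}")
```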
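And a minimal sketch of the Lasso route on the same kind of synthetic data (scikit-learn names the lambda parameter `alpha`): increasing the regularization strength drives more weights to exactly zero, which doubles as automated feature selection.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

# Larger alpha -> stronger shrinkage -> fewer nonzero weights
for alpha in [0.01, 0.1, 1.0]:
    weights = Lasso(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha}: {np.sum(weights != 0)} nonzero weights", weights.round(2))
```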

Pros and Cons

Advantages
  • Transparent model
  • Comes with confidence intervals, significance tests, etc.
Disadvantages
  • Non-linearities need to be hand-crafted
  • Often poor predictive performance, because real-world relationships are rarely exactly linear
  • The interpretation of a weight can be unintuitive because it depends on all other features. For example, house size and number of rooms are highly correlated; due to this correlation, the weight of house size can come out positive while the weight of number of rooms comes out negative (see the sketch below).
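
A minimal sketch of this last point on synthetic data (size and rooms are invented stand-ins): because rooms is almost determined by size, its weight carries no signal of its own and can take a counterintuitive sign.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
size = rng.uniform(40, 200, 200)
rooms = size / 30 + rng.normal(0, 0.3, 200)      # rooms almost determined by size
price = 3000 * size + rng.normal(0, 20000, 200)  # price driven by size alone

weights = LinearRegression().fit(np.column_stack([size, rooms]), price).coef_
print("size weight:", weights[0], "rooms weight:", weights[1])
# Given size, rooms adds no information, so its weight hovers near zero
# and can easily come out negative even though rooms never lowers the price.
```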