Abstract (EN):
A common problem in distribution planning is the scarcity of historical data (training examples) relative to the number of variables. Because of the risk of overfitting, most data-driven techniques cannot be applied in such situations. The suitable regression techniques are therefore restricted to efficient models, preferably with embedded regularization. This article compares three such techniques: LASSO, Bayesian regression and CMLR (Conditioned multi-linear regression, a new approach developed within the scope of a project with a distribution company). The results show that each technique has its own advantages and limitations. The main advantage of Bayesian regression is that it inherently provides confidence intervals. LASSO is a very economical and efficient regression tool. CMLR is versatile and provided the best performance.
Language:
English
Type (Professor's evaluation):
Scientific
No. of pages:
6