
Tackling Underfitting And Overfitting Problems In Data Science


One of the major challenges in data science, especially in machine learning, is how well a model fits its training data. Underfitting and overfitting are the familiar names for the two ways this can go wrong.

For the uninitiated, overfitting simply means that the learning model is far too dependent on its training data, memorising its noise along with its patterns, while underfitting means that the model fails to capture the relationship in the training data at all. Ideally, neither should be present in a model, but both are usually hard to eliminate.

Overcoming Overfitting

ML experts and statisticians use a range of techniques for bringing down overfitting in ML models. The most popular are cross-validation and regularisation, both proven effective at detecting and reducing overfitting. Beyond these, there are other ways to curb overfitting in models.
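
For concreteness, here is a minimal sketch of regularisation using scikit-learn and NumPy (library choices assumed, as the article names none): ridge regression penalises large coefficients, which discourages a linear model from fitting noise.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 20))            # few samples, many features
    y = X[:, 0] + 0.1 * rng.normal(size=50)  # only the first feature matters

    plain = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)       # alpha sets the penalty strength

    # The penalised model typically keeps the 19 spurious coefficients
    # closer to zero than the unpenalised one.
    print(np.abs(plain.coef_[1:]).mean(), np.abs(ridge.coef_[1:]).mean())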

Generalisation: Make sure the model learns to generalise rather than merely memorising the training data. One way is to feed more data to the model; more data usually also improves the accuracy the model achieves. However, it makes training more computation- and memory-intensive.
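
Whether more data would actually help can be checked with a learning curve. The sketch below (again assuming scikit-learn, with an arbitrary decision-tree model) watches the train/validation gap as the training set grows:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import learning_curve
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, random_state=0)
    sizes, train_scores, val_scores = learning_curve(
        DecisionTreeClassifier(max_depth=8), X, y,
        train_sizes=[0.1, 0.3, 0.6, 1.0], cv=5)

    # A large train/validation gap signals overfitting; it tends to
    # narrow as more training data becomes available.
    for n, t, v in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
        print(f"n={n}: train={t:.2f}, validation={v:.2f}, gap={t - v:.2f}")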

Data Augmentation: When collecting more data is impractical, another technique called data augmentation comes into the picture. Instead of gathering loads of new data, reworking the existing data, for example by transforming or perturbing it, can go a long way in reducing overfitting.
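
As an illustration, here is a minimal NumPy sketch (the augment helper is hypothetical) that triples an image training set by flipping and lightly perturbing the existing samples:

    import numpy as np

    def augment(images, rng):
        # Hypothetical helper: create extra training samples by flipping
        # each image horizontally and by adding slight Gaussian noise.
        flipped = images[:, :, ::-1]
        noisy = images + rng.normal(0, 0.01, images.shape)
        return np.concatenate([images, flipped, noisy])

    rng = np.random.default_rng(0)
    batch = rng.random((8, 28, 28))   # stand-in for 8 grayscale 28x28 images
    print(augment(batch, rng).shape)  # (24, 28, 28): three times the data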

Example: Neural networks, which are mostly used in pattern recognition tasks, are prone to overfitting. The larger the network, the more complex the functions it creates as a consequence. Hence, finding an optimum size for the right statistical fit is key. This can be done in a number of ways; one of the simplest is to retrain the network at different sizes and compare, since it does not involve tweaking many other parameters.
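
A minimal sketch of that size search, assuming scikit-learn's MLPClassifier on a toy dataset (both are assumptions, not from the article):

    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # Retrain at several widths and keep the one that scores best on
    # held-out data rather than on the training set.
    for width in (2, 16, 256):
        net = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000,
                            random_state=0).fit(X_tr, y_tr)
        print(width, net.score(X_tr, y_tr), net.score(X_val, y_val))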

Generally, overfitting occurs in non-linear ML models, since many variables are at play in deciding the relationships within the data, leaving the model free to fit spurious patterns. A better way to address this problem is k-fold cross-validation. Here, the model is trained and evaluated k times on different subsets of the data, which shows how it performs on data it has never seen. Overfitting detected this way can then be reduced before the model is put to use.
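
In scikit-learn terms (an assumed library), k-fold cross-validation is nearly a one-liner:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    model = DecisionTreeClassifier(max_depth=None)  # unpruned trees overfit easily

    # Train and evaluate on k=5 different train/validation splits; a training
    # score far above the cross-validated mean indicates overfitting.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())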

Lately, ensemble methods such as Bayesian averaging, boosting and bagging have indirectly assisted in eliminating overfitting. How? By combining many complex ML models, an ensemble averages out the overfitting tendencies of the individual models. Of the three, boosting and bagging are far more widely used than Bayesian averaging.
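
A sketch comparing a single high-variance model against bagged and boosted ensembles, again assuming scikit-learn:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    single = DecisionTreeClassifier()                      # high-variance baseline
    bagged = BaggingClassifier(single, n_estimators=50)    # averages many trees
    boosted = GradientBoostingClassifier(n_estimators=50)  # stage-wise additive fit

    for name, m in [("tree", single), ("bagging", bagged), ("boosting", boosted)]:
        print(name, cross_val_score(m, X, y, cv=5).mean())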

Eliminating Underfitting

Although underfitting is observed less often in ML models than overfitting, it should not be overlooked. The general symptom is a mismatch between the data and the model: either the model is far too simple to establish a stable learning pattern, or it performs very poorly even on the training data.

Experts suggest that this problem can be alleviated by simply using more (good!) data for the project. In addition, the following ways can also be used to tackle underfitting.

  • Increase the number or size of the parameters in the ML model.
  • Increase the complexity of the model, or switch to a more expressive type.
  • Increase the training time, until the cost function is minimised.

Example: Converting a linear model into a non-linear one, for instance by transforming its input features (sketched below). The added flexibility lets the model capture relationships, in the training data and in new data alike, that a straight line cannot.
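
A minimal sketch of that conversion with scikit-learn's PolynomialFeatures (an assumed choice of library and transform):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)   # a non-linear target

    linear = LinearRegression().fit(X, y)                # too simple: underfits
    cubic = make_pipeline(PolynomialFeatures(degree=3),
                          LinearRegression()).fit(X, y)  # added capacity

    print(linear.score(X, y), cubic.score(X, y))         # training R^2 jumps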

Comment

Both overfitting and underfitting should be reduced as far as possible. As ML expert Jason Brownlee perfectly puts it, a statistically “good fit” is what matters when it comes to choosing an ML model. This can only be achieved by repeatedly testing the model on different data and seeing where it falls on the spectrum between overfitting and underfitting.

Furthermore, before starting with an ML model to solve a problem, it is also worth taking a hard look at the data itself. After all, the type of data used may simply conflict with the model.


