Cut Through the Complexity — Navigating Overfitting and Underfitting

Venky
12 min read · Jan 8, 2024


[Image generated by DALL·E 3]

Introduction

Imagine you’re a chef trying to perfect a recipe. You tweak the ingredients again and again, tasting each version. Finally, you create a dish that’s absolutely delicious in your kitchen. But when others try to replicate it in their kitchens, it just doesn’t taste the same. This is similar to a challenge in machine learning called ‘overfitting’ — where a model works perfectly with the data it was trained on, but fails to perform well with new data. On the flip side, there’s ‘underfitting’ — like making a dish that’s too bland, not capturing the essence of the flavors. Both overfitting and underfitting are common hurdles in machine learning, affecting how well our ‘recipes’ — or models — perform in the real world. In this guide, we’ll dive into these concepts, understanding why striking the right balance is crucial for the success of any machine learning model.

Contents:

  • A quick look at the glossary of terms
  • Generalization in machine learning
  • Overfitting vs Underfitting
  • How to spot and tackle overfitting
  • Underfitting unraveled
  • Navigating the model complexity
  • Bias-Variance trade-off: The ultimate goal in model training
  • Unlocking the Power of Regularization
  • Closing

Decoding the Jargon

A Quick-Fire Machine Learning Glossary

1. Machine Learning (ML): A subset of artificial intelligence focused on creating systems that can learn and make decisions from data, evolving without explicit programming for specific tasks.

2. Model: The artifact produced by training, combining an algorithm with the parameters learned from data, which enables the system to predict outcomes or make decisions.

3. Algorithm: A set of rules or processes followed by a machine learning model for learning from data.

4. Training Data: This dataset is used to ‘train’ or ‘teach’ a machine learning model. It includes the input data and, in supervised learning, the corresponding labels.

5. Test Data: Separate from the training set, this data is used to evaluate how well a machine learning model performs, ensuring it can generalize from its training.

6. Overfitting: This occurs when a machine learning model learns too much from its training data, including noise and outliers, leading to poor performance on new, unseen data due to its over-specialization.

7. Underfitting: Happens when a model is too simplistic, failing to capture the complexities of the data it’s trained on, thus performing poorly on both training and new data.

8. Bias: In machine learning, bias is error arising from overly simplistic assumptions in the learning algorithm, leading to underfitting. It’s the gap between the model’s average predictions and the actual values.

9. Variance: Variance is how much a model’s predictions change if it’s trained on different subsets of the training data. High variance can lead to overfitting, as the model becomes too tailored to the specific data it was trained on.

10. Regularization: A technique used to reduce overfitting by imposing penalties on the model for increased complexity, encouraging simplicity and better generalization.

11. Bias-Variance Tradeoff: The balancing act between a model’s complexity (which can lead to high variance and overfitting) and its ability to generalize (which can lead to high bias and underfitting). The goal is to minimize both for the most accurate predictions on unseen data.

12. Generalization: The process of identifying patterns and commonalities across different examples or experiences, and applying the insights gained to novel situations. It is an important concept in machine learning and artificial intelligence.

The Art of Generalization in ML: Why It Matters

Generalization is a fundamental concept in machine learning that refers to the ability of a model to apply the knowledge gained from its training data to new, unseen data. It’s essentially the measure of how well a model can adapt to the broader, real-world context beyond the specific examples it was trained on.

Importance of Generalization:

1. Real-world Application: The ultimate goal of machine learning is not just to perform well on the training data but to make accurate predictions or decisions in real-world scenarios. Generalization ensures that the insights a model learns from its training data are applicable and useful in real-world situations.

2. Robustness: A model that generalizes well is robust. It can handle variations and complexities in data that it wasn’t explicitly trained on. This is crucial because real-world data often comes with unexpected features and noise.

3. Preventing Overfitting and Underfitting: Generalization strikes a balance between overfitting (where a model learns the training data too well, including its noise and outliers) and underfitting (where the model is too simplistic to capture the underlying patterns in the data). A model that generalizes well has learned the true patterns in the training data without being misled by irrelevant details, i.e., noise in the data.

4. Long-term Reliability: In many applications, from healthcare to finance, the stakes for machine learning predictions can be high. Generalization ensures that a model remains reliable and accurate over time, even as it encounters data that may slightly differ from its training set.

The ultimate goal in machine learning is to build models with high generalization capability, ensuring reliable and accurate predictions or decisions in varied and unforeseen situations.

The Tightrope Walk: Balancing Overfitting and Underfitting

In the world of machine learning, developing a model is like walking a tightrope. On one side, there’s the risk of overfitting, and on the other, the danger of underfitting. Both are missteps we need to avoid for the model to perform its best.

[Image generated by DALL·E 3 via ChatGPT]

The image above is a visual metaphor that captures the complexity and intricacy associated with overfitting in machine learning. It depicts a robot navigating through a complex, tangled maze, representing the challenges posed by overfitting.

Overfitting is like a memorization trap. Imagine studying for a test by memorizing the questions and answers without understanding the concepts. If the actual test has different questions, you’re likely to fail. Similarly, an overfitted model performs exceptionally well on the data it was trained on, but it struggles with new, unseen data. It’s like the model learned the training data by heart, including random quirks and noise, but failed to grasp the underlying patterns that apply more broadly.

Underfitting is the opposite. It’s like a student who hasn’t studied enough — they understand the topics too vaguely. An underfitted model is too simple, almost naive. It doesn’t capture the complexity of the data it’s trained on, missing out on important patterns. Consequently, it performs poorly not just on new data, but even on the training data.

The plot above demonstrates the concept of underfitting in machine learning. It shows how a simplistic linear model (red line) fails to capture the complexity and the underlying pattern of the data (blue points). This illustrates underfitting, where the model is too simplistic relative to the complexity of the data, resulting in poor performance.
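A plot like the one described can be reproduced with a short sketch. This is purely illustrative, assuming scikit-learn, NumPy, and Matplotlib are available; the synthetic sinusoidal data stands in for the blue points in the figure.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic nonlinear (sinusoidal) data with a little noise
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=X.shape[0])

# A straight line is far too simple for this pattern, so it underfits
model = LinearRegression().fit(X, y)

plt.scatter(X, y, color="blue", label="data")
plt.plot(X, model.predict(X), color="red", label="linear fit (underfits)")
plt.legend()
plt.show()
```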

The challenge in machine learning is to balance these two extremes. We need a model that doesn’t just repeat what it has seen (avoiding overfitting) but also understands the core ideas deeply enough to apply them to new situations (avoiding underfitting). Achieving this balance is crucial for building a model that not only excels with its training data but also adapts effectively to new and diverse scenarios it encounters in the real world.

The Overfitting Conundrum: How to Spot It and Tackle It

Let us use the analogy of studying for a test. If you have only memorized specific examples from the study guide without understanding the broader concepts, you’ll do well on questions from the study guide but struggle with new questions covering the same concepts.

We can detect overfitting by checking the model’s performance on a validation set, i.e., data not used in training. If the model does much worse on the validation set than on the training set, that indicates overfitting.
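To make this check concrete, here is a minimal sketch (assuming scikit-learn; the synthetic dataset is a stand-in for real data) that compares training and validation accuracy. A large gap between the two is the tell-tale sign of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# An unconstrained decision tree can memorize the training set
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(f"train accuracy:      {model.score(X_train, y_train):.2f}")
print(f"validation accuracy: {model.score(X_val, y_val):.2f}")
# A big gap (e.g. 1.00 on train vs. much lower on validation) indicates overfitting
```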

To address overfitting, we want to generalize the model better. Below are some common techniques to handle overfitting in machine learning models:

  • Get More Training Data: More diverse and representative training data exposes the model to more examples to learn from and reduces over-reliance on particular training instances.
  • Regularization: Adding penalties like L1 and L2 regularization to the loss function penalizes model complexity, discouraging the model from learning highly specialized patterns.
  • Dropout: Randomly dropping nodes during training prevents complex co-adaptations and forces the model to learn more robust features.
  • Early Stopping: Stop training as soon as performance on a validation set starts decreasing to prevent over-specializing to the training data.
  • Data Augmentation: Generating additional training examples through transformations (e.g. rotation, shifts) reduces overfitting by exposing the model to more variation.
  • Reducing Model Complexity: Using fewer parameters in the model (e.g. fewer hidden layers/units) forces the model to learn more generalizable patterns.
  • Ensembling: Averaging predictions from multiple models cancels out individual model errors and reduces overfitting.
  • Transfer Learning: Starting with pre-trained weights from a model trained on a much larger dataset improves generalizability.

The key is striking a balance — keeping the model complex enough to learn patterns but simple enough to generalize well. Checking validation performance lets you spot and adjust for overfitting during model development.
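To illustrate a few of these techniques together, here is a minimal sketch using TensorFlow/Keras (an assumption; the article does not prescribe a framework) that combines L2 regularization, dropout, and early stopping on synthetic data:

```python
import numpy as np
import tensorflow as tf

# Synthetic binary-classification data standing in for a real dataset
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20)).astype("float32")
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype("float32")

# L2 penalties and dropout constrain complexity; early stopping halts
# training once validation loss stops improving.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),
    tf.keras.layers.Dropout(0.3),          # randomly drop 30% of units each step
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[early_stop], verbose=0)
```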

Underfitting Unraveled: Identifying and Remedying the Issue

Underfitting occurs when a model is not complex enough to accurately capture the underlying pattern in the data. The hallmarks of underfitting are poor performance on both the training data and validation/test data.

We can recognize underfitting by checking the model error on the training data — if it is high, that indicates underfitting since the model cannot even fit the data it was trained on well. Another sign is when training and validation accuracy are both low and close to each other — the model has not grasped the pattern in the data.
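The symptom is easy to reproduce. Below is a small sketch (assuming scikit-learn and NumPy; the XOR-style data is purely illustrative) where a linear classifier is too simple for a nonlinear decision boundary, so training and validation accuracy are both low and close together:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# XOR-like data: the class depends on a nonlinear interaction of two features
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear model cannot represent this boundary, so it underfits
model = LogisticRegression().fit(X_train, y_train)
print(f"train accuracy:      {model.score(X_train, y_train):.2f}")  # low
print(f"validation accuracy: {model.score(X_val, y_val):.2f}")      # low, close to train
```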

Here are some common techniques to fix underfitting in machine learning models:

  • Increase Model Complexity — Use a more powerful model with more parameters to increase its capacity to learn complex patterns. For neural networks, add more hidden layers and units.
  • Try Different Models — Switch to a more expressive model better suited to the problem, like random forest instead of linear regression.
  • Reduce Regularization — Relax regularization like L1/L2 penalties to allow the model to fit the training data better. But don’t eliminate regularization completely.
  • Add More Features — Introduce additional informative features to help the model better capture important patterns in the data.
  • Train Longer — Run training for more epochs/iterations to give the model more chance to utilize its full capacity.
  • Enhance Training Data — Generate more training data via augmentation or sampling techniques to expose model to more examples.
  • Try Ensemble Models — Combine underfitting models together to achieve better performance than individual models.
  • Change Hyperparameters — Tune model hyperparameters like learning rate, batch size, etc. to improve model fitting capability.
  • Remove Outliers — Eliminate or down-weight outliers that could be skewing model training.

The key is gradually making the model more complex while monitoring train/validation errors to find the right balance between underfitting and overfitting.
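As a small illustration of the first two remedies, the sketch below (assuming scikit-learn and NumPy, with synthetic sinusoidal data) swaps a plain linear regression for a more expressive polynomial pipeline and shows the improvement in training fit:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Nonlinear data that a straight line cannot fit well
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)

# Underfitting baseline: a straight line
linear = LinearRegression().fit(X, y)

# More complex model: cubic polynomial features feeding the same linear learner
poly = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)

print(f"linear R^2:     {linear.score(X, y):.2f}")  # low, a sign of underfitting
print(f"polynomial R^2: {poly.score(X, y):.2f}")    # noticeably better training fit
```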

Navigating Model Complexity: Finding the Sweet Spot

Model complexity in machine learning refers to the representational capacity and flexibility of a model. Models with higher complexity have more flexibility to fit a variety of complex patterns in the data.

Some key aspects that determine a model’s complexity:

  • Number of parameters — Models with more tunable parameters (weights, coefficients, etc.) tend to be more complex and flexible. For example, linear regression with many feature coefficients is more complex than with few coefficients.
  • Number of hidden layers and units — In neural networks, more layers and more nodes per layer increase model complexity. Each node and layer adds more representational power.
  • Depth of trees — In decision trees and random forests, deeper trees with more branch points are more complex. Shallow trees are more constrained.
  • Degree of polynomial — For polynomial regression, higher degree polynomials can fit more complex curves and are thus more complex.
  • Kernel choice — The kernel function in SVM and other kernelized models impacts complexity. Some kernels like RBF are able to fit very complex patterns.
  • Regularization — Techniques like L1/L2 regularization purposefully limit model complexity by penalizing large parameter values.

In general, more model parameters, more flexibility in fitting curved or nonlinear decision boundaries, and deeper representations imply higher model complexity. Complexity is increased or decreased based on the needs of the problem to avoid underfitting or overfitting.

Model complexity plays a key role in balancing underfitting and overfitting when training machine learning models:

  • Models with low complexity have limited flexibility to fit the training data. They may fail to capture important patterns, leading to underfitting with poor performance on training and test sets.
  • Highly complex models have lots of flexibility to fit the training data very closely, including random noise. This can lead to overfitting where the model performs very well on training data but poorly on new test data.
  • The ideal model complexity fits the training set well but does not overfit. It learns the true underlying patterns in the data without fitting random noise.
  • Start with simpler models, then increase complexity as needed to improve fitting of training data. Use validation sets to detect and prevent overfitting.
  • Regularization techniques like L1/L2 regularization, dropout, early stopping etc. reduce model complexity to address overfitting.
  • Model selection is the process of choosing the optimal model complexity for a given problem and dataset. It balances accuracy on training data with generalization performance.
  • Simpler models like Naive Bayes may underfit complex patterns. Highly flexible neural networks can readily overfit without proper regularization.
  • The optimal model complexity provides good training accuracy and good generalization. Tuning complexity via hyperparameters is crucial to achieve this balance.

So in summary, model complexity must match the problem complexity to achieve good fitting without overfitting. Monitoring validation performance guides proper tuning of model complexity.
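One practical way to do this tuning is to sweep a complexity hyperparameter and watch training versus cross-validated scores. Here is a sketch (assuming scikit-learn; the dataset and the choice of tree depth as the complexity knob are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

# Sweep tree depth and compare training vs. cross-validated scores
depths = np.arange(1, 15)
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"depth={d:2d}  train={tr:.2f}  val={va:.2f}")
# Very shallow trees: both scores low (underfitting).
# Very deep trees: training score near 1.0 while validation drops (overfitting).
```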

Striking the Perfect Balance: The Ultimate Goal in Model Training

When training machine learning models, striking the perfect balance between bias and variance is the ultimate goal. The ideal model minimizes both bias and variance to reach optimal performance.

Bias refers to the error between a model’s predictions and the true values that results from overly simplistic assumptions. High bias leads to underfitting, where the model fails to capture important patterns in the data.

Variance describes error from a model’s sensitivity to minor fluctuations in the training data. High variance causes overfitting, where the model learns noise and random peculiarities.

The objective is to strike a balance — low enough bias so the model fits the true signal, and low enough variance so it ignores the noise. With high bias or variance, performance suffers.

Plotting learning curves during training reveals when models start to overfit or underfit and helps tune complexity to land in the sweet spot between both extremes. Regularization techniques also constrain variance while improving bias.
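Here is a quick sketch of such a learning-curve check, assuming scikit-learn (the logistic-regression model and synthetic data are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Score the model on growing fractions of the training data
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.2f}  val={va:.2f}")
# A persistent gap between the curves suggests high variance (overfitting);
# two low curves that converge suggest high bias (underfitting).
```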

In the end, machine learning models must balance fitting reality versus fitting the noise. Controlling model complexity and training carefully to reach an optimum between bias and variance is crucial: it enables accurate generalization to new data, which is the essence of effective modeling.

Unlocking the Power of Regularization: A Key to Model Success

Here is an overview of how regularization unlocks the power of machine learning models and enhances their generalization capabilities:

Regularization is a pivotal technique for improving model performance and generalization. It works by constraining model complexity to reduce overfitting. This “unlocks” the true power of models to learn robust patterns rather than latching onto noise.

Techniques like L1/L2 regularization add a penalty term to the loss function that grows with the magnitude of model parameters. This incentivizes the model to keep parameters relatively low, limiting complexity. Ridge and lasso regression are common regularized algorithms.
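A brief sketch of this effect, assuming scikit-learn (the regression problem and penalty strengths are illustrative): ordinary least squares leaves coefficients unconstrained, Ridge (L2) shrinks them, and Lasso (L1) zeroes many of them out.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Regression problem where only a handful of the 30 features truly matter
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: drives many coefficients to exactly zero

print("OLS   largest |coefficient|:", round(float(np.abs(ols.coef_).max()), 1))
print("Ridge largest |coefficient|:", round(float(np.abs(ridge.coef_).max()), 1))
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))
```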

Regularization provides a “just right” level of model complexity — enough flexibility to fit important patterns, but not so much that it overfits the peculiarities of the training data. This enhances the model’s ability to generalize.

Other regularization methods like dropout randomly omit subsets of nodes during training to prevent co-adaptation. This makes each node more robust and improves generalization. Data augmentation also regularizes by expanding the training data.

Overall, regularization constrains models as they train to focus on explanatory factors rather than noise. This avoids overspecialization and makes the model favor simpler patterns.

Proper regularization is both an art and a science — it requires careful tuning based on validation performance. But when done right, it unlocks a model’s maximum potential by optimizing the balance between fitting reality versus fitting noise.

Closing: Walking the Path of Balanced Learning

Building good machine learning models is all about balance. Going too far in one direction causes problems.

If a model is too simple, it suffers from underfitting — it fails to grasp the underlying patterns in the data. But models that are too complex overfit — they memorize noise and details instead of learning general concepts.

The goal is to find the sweet spot between these two extremes. Models should be complex enough to capture meaningful relationships in the data but not so intricate that they latch onto coincidences or noise.

Regularization techniques like L1/L2 penalties are useful to prevent overfitting by limiting model complexity. Monitoring validation error helps tune models to the right level of complexity.

In the end, machine learning is an exercise in balance. Avoiding the pitfalls of underfitting and overfitting requires walking a nuanced path between simplicity and complexity. The payoff for finding this balance is models that genuinely understand patterns and can generalize to make good predictions on new data.

Happy Modeling!

