Data Science & Machine Learning

Data Bias

The section below is a bullet-point summary of this excellent article from Geckoboard, which is also available as a PDF:

https://www.geckoboard.com/best-practice/statistical-fallacies/
https://www.geckoboard.com/uploads/data-fallacies-to-avoid.pdf
  • Cherry Picking

The practice of selecting results that fit your claim and excluding those that don’t. The worst and most harmful example of being dishonest with data.
  • Data Dredging

Repeatedly testing a data set for patterns and then failing to acknowledge that a correlation found this way may in fact be the result of chance.
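To make this concrete, here is a minimal Python sketch, not taken from the article. It compares one random target against many random, unrelated features and reports the strongest correlation it finds. The sizes (20 rows, 1000 features), the seed, and the helper names `pearson` and `best_chance_correlation` are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_chance_correlation(n_rows=20, n_features=1000, seed=0):
    """Largest |r| between a random target and many unrelated random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        # Each feature is pure noise, so any correlation with the target is chance.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


print(f"strongest correlation found by chance: {best_chance_correlation():.2f}")
```

With enough candidate features and few rows, some feature will look strongly related to the target even though every value is noise.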
  • Survivorship Bias

Drawing conclusions from an incomplete set of data, because that data has ‘survived’ some selection criteria.
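As a rough illustration (an assumed scenario, not one from the article), the sketch below simulates yearly returns for hypothetical investment funds whose true average return is zero, then closes every fund that falls below a threshold. The fund count, the 10% spread, and the -5% closure threshold are all made-up parameters.

```python
import random
import statistics


def fund_returns(n_funds=10_000, seed=0):
    """Simulated yearly returns; the true average return is zero."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.10) for _ in range(n_funds)]


def survivors(returns, closure_threshold=-0.05):
    """Funds that stay open: only those above the (hypothetical) closure threshold."""
    return [r for r in returns if r > closure_threshold]


all_returns = fund_returns()
surviving = survivors(all_returns)

print(f"mean return, all funds:       {statistics.mean(all_returns):+.3f}")
print(f"mean return, surviving funds: {statistics.mean(surviving):+.3f}")
```

Anyone who only sees the surviving funds would conclude the funds perform well on average, although the full population does not.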
  • Cobra Effect

When an incentive produces the opposite of the intended result. Also known as a Perverse Incentive.
  • False Causality

Falsely assuming that because two events occur together, one must have caused the other.
  • Gerrymandering

The practice of deliberately manipulating boundaries of political districts in order to sway the result of an election.
  • Sampling Bias

Drawing conclusions from a set of data that isn’t representative of the population you’re trying to understand.
  • Gambler's Fallacy

The mistaken belief that because something has happened more frequently than usual, it’s now less likely to happen in future and vice versa.
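A quick way to check this belief is to simulate it. The sketch below (parameters and the function name `heads_rate_after_streak` are illustrative assumptions) flips a fair coin many times and measures how often heads comes up immediately after a run of three heads.

```python
import random


def heads_rate_after_streak(n_flips=200_000, streak=3, seed=0):
    """Share of heads among flips that directly follow `streak` heads in a row."""
    rng = random.Random(seed)
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    after_streak = [
        flips[i]
        for i in range(streak, n_flips)
        if all(flips[i - streak:i])  # the previous `streak` flips were all heads
    ]
    return sum(after_streak) / len(after_streak)


print(f"P(heads | three heads in a row) is about {heads_rate_after_streak():.3f}")
```

For independent flips the rate stays close to 0.5: a streak does not make tails "due".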
  • Regression Toward the Mean

When something happens that’s unusually good or bad, later observations tend to move back toward the average.
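The effect can be sketched with a simple simulation, assuming each test score is a stable skill plus independent luck. The population size, score scale, and function names below are assumptions for illustration only.

```python
import random
import statistics


def simulate(n_people=10_000, seed=0):
    """Two test scores per person: the same skill plus fresh luck each time."""
    rng = random.Random(seed)
    skill = [rng.gauss(100, 10) for _ in range(n_people)]
    test1 = [s + rng.gauss(0, 10) for s in skill]
    test2 = [s + rng.gauss(0, 10) for s in skill]
    return test1, test2


def top_decile_means(test1, test2):
    """Mean score on both tests for the people in the top 10% of the first test."""
    cutoff = sorted(test1)[int(0.9 * len(test1))]
    top = [i for i, score in enumerate(test1) if score >= cutoff]
    first = statistics.mean([test1[i] for i in top])
    second = statistics.mean([test2[i] for i in top])
    return first, second


test1, test2 = simulate()
first, second = top_decile_means(test1, test2)
print(f"top scorers, first test:  {first:.1f}")
print(f"same people, second test: {second:.1f}")
print(f"everyone, second test:    {statistics.mean(test2):.1f}")
```

The top scorers still beat the overall average on the second test, but by less than before, because part of their first result was luck that does not repeat.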
  • Hawthorne Effect / Observer Effect

When the act of monitoring someone affects that person’s behavior.
  • Simpson's Paradox

A phenomenon in which a trend appears in different groups of data but disappears or reverses when the groups are combined.
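The reversal is easiest to see with numbers. The sketch below uses success counts that are commonly quoted for this paradox (a comparison of two kidney stone treatments, split by stone size); treat them as illustrative input rather than something taken from the article. The helper names are assumptions.

```python
# (successes, patients) per treatment and stone size.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def group_rate(treatment, group):
    """Success rate of one treatment within one group."""
    successes, patients = data[treatment][group]
    return successes / patients


def overall_rate(treatment):
    """Success rate of one treatment with the groups combined."""
    successes = sum(s for s, _ in data[treatment].values())
    patients = sum(n for _, n in data[treatment].values())
    return successes / patients


for group in ("small", "large"):
    print(f"{group:>5} stones: A={group_rate('A', group):.0%}  B={group_rate('B', group):.0%}")
print(f"     combined: A={overall_rate('A'):.0%}  B={overall_rate('B'):.0%}")
```

Treatment A has the higher success rate in both groups, yet B looks better once the groups are combined, because A was used far more often on the harder (large stone) cases.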
  • McNamara Fallacy

Relying solely on metrics in complex situations can cause you to lose sight of the bigger picture.
  • Overfitting

A more complex explanation will often describe your data better than a simple one. However, a simpler explanation is usually more representative of the underlying relationship.
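One way to see this is to compare a model that memorizes the training data with a simple one, and score both on unseen data. The sketch below is only an illustration under assumed conditions: the data come from a noisy straight line, the "complex" model is a 1-nearest-neighbor lookup, and the "simple" model is a least-squares line. All names and sizes are hypothetical.

```python
import random


def make_data(n, rng):
    """Points from the line y = 2x + 1 with Gaussian noise added."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(xs, ys):
    """Complex model: return the y of the closest training x (pure memorization)."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(2000, rng)

line = fit_line(train_x, train_y)
memorizer = fit_nearest_neighbor(train_x, train_y)

for name, model in (("line", line), ("memorizer", memorizer)):
    print(
        f"{name:>9}: train MSE={mse(model, train_x, train_y):.2f}  "
        f"test MSE={mse(model, test_x, test_y):.2f}"
    )
```

The memorizer describes the training data perfectly but does worse than the line on new data, because it has learned the noise as well as the relationship.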
  • Publication Bias

How interesting a research finding is affects how likely it is to be published, distorting our impression of reality.
  • Danger of Summary Metrics

It can be misleading to look only at the summary metrics of data sets: data sets with very different shapes can share almost identical means and variances.
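A well-known demonstration is Anscombe's quartet. The sketch below uses the y-values of its first three data sets (the shared x-values are omitted for brevity) and compares their summary statistics; the variable names are assumptions, and the values are worth checking against a reference copy of the quartet before reuse.

```python
import statistics

# y-values of the first three data sets in Anscombe's quartet.
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

for name, ys in (("set 1", y1), ("set 2", y2), ("set 3", y3)):
    print(
        f"{name}: mean={statistics.mean(ys):.2f}  "
        f"variance={statistics.variance(ys):.2f}  "
        f"min={min(ys):.2f}  max={max(ys):.2f}"
    )
```

The means and variances agree to two decimal places, while the minimum and maximum already hint that the underlying shapes differ; plotting the data makes the difference obvious.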