I saw this post on LinkedIn the other day and have been thinking about how true it is. My favorite part: in 90% of cases generalized linear regression will do the trick. This makes me feel a lot better about just how often I use it myself!
To make it easier to copy/paste:
- Data is never clean.
- You will spend most of your time cleaning and preparing data.
- 95% of tasks do not require deep learning.
- In 90% of cases generalized linear regression will do the trick.
- Big Data is just a tool.
- No one cares how you did it.
- Academia and business are two different worlds.
- Presentation is key - be a master of PowerPoint.
- All models are false, but some are useful.
- There is no fully automated Data Science. You need to get your hands dirty.
While I'm at it, I also came across this interesting article from Nature:
It argues that about 50% of scientists have come to believe that a lack of statistical significance means the absence of any effect. The authors argue that this conclusion is incorrect. A result that differs from zero (or from the null hypothesis) but lacks statistical significance could still very well be real, and may even be likely. It just cannot be distinguished from zero with the data at hand. More data or a different method of investigation is needed to get estimates precise enough to reach statistical significance.
I know I've made this mistake a few times myself. Just the other day, in fact, I said our model showed a negative project benefit, but downplayed it by saying it wasn't statistically significant. Next time, I'll say that our best estimate is a negative impact, but that additional data is required to be certain.
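A minimal sketch of what reporting "a best estimate plus its uncertainty" could look like in Python. The `benefits` numbers are made up for illustration (they are not real project data), `estimate_with_interval` is a hypothetical helper, and the interval uses a normal approximation, which is rough for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def estimate_with_interval(samples, confidence=0.95):
    """Return (point_estimate, lower, upper) for the mean of `samples`.

    Assumption: uses a normal approximation for the interval; a
    t-distribution would be more appropriate for very small samples.
    """
    n = len(samples)
    est = mean(samples)
    std_err = stdev(samples) / sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return est, est - z * std_err, est + z * std_err


# Hypothetical project-benefit measurements (invented for this example).
benefits = [-4.0, 1.5, -2.0, 0.5, -3.0, 2.0, -1.0, -2.0]

est, lower, upper = estimate_with_interval(benefits)
print(f"Best estimate: {est:.2f} (95% interval: {lower:.2f} to {upper:.2f})")

if lower < 0 < upper:
    # The interval includes zero, so the result is "not significant",
    # but the best estimate is still negative, not "no effect".
    print("Interval includes zero: more data is needed to be certain.")
```

With these invented numbers the best estimate is negative while the interval still spans zero, which is exactly the situation where "not significant" should not be read as "no effect".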