"Since all models are wrong the scientist cannot obtain a 'correct' one by excessive elaboration. ... Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity." - George Box, 1976, "Science and Statistics"
A few years ago we wrote a bit about how experimentation, and I think data science more generally, can take ideas from Box (many of which predate Box) and use them to be more useful. Shout out to Ben Labay - his earlier post reminded me to revisit this.
Box's paper on the nature of the scientific approach is, I think, extremely useful reading for anyone in analysis and A/B testing, as it provides a deep foundational grounding and principles to guide how to think about experimentation and analytics.
It also serves as a counterweight to the industry pressure to use more complex approaches, the use of which is often more a shibboleth than of any actual utility wrt the task/problem. More often than not, using advanced methods for their own sake is a more-is-less move, while the simpler approaches deliver a less-is-more outcome*.
Of course, depending on the problem, it can make a great deal of sense to use more complex, advanced approaches, but these should be considered at the margin for the specific problem at hand - is the additional complexity worth its cost for THIS task? If so, then it is perfectly rational to take it on.
Link in comments. Also feel free to comment - I assume not everyone agrees but even so, I do think it is worthwhile to be intentional and to explicitly consider the marginal value of complexity.
I should also note that I didn't always see it this way. I too used to want to use and apply more advanced approaches in order to come across as more legitimate and to be part of the cutting edge. It was almost always a mistake for the users. It wasn't until the early days at Conductrics Inc. that the light went off on my admittedly dim bulb 😉. As far back as 2010, Conductrics originally treated optimization as a multi-state RL problem, where we used temporal difference learning (TD(0)) as a message passer between peers of contextual bandits. The original system used online SGD to estimate value functions using an approximately normalized radial basis function net. I was experimenting with running online k-means as a preprocessor to find centroids and length scales, using the distance between tessellations as input to the kernel functions (e.g. squared exponential and polynomial). One night it just dawned on me how idiotic it was. Not that what we were doing was wrong, or wasn't done in a principled way. Just that it was way too complex to be reasonable given the contexts in which it would most likely be used.
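For the curious, here is a minimal, hypothetical sketch of what that kind of setup involves: TD(0) value estimation via online SGD over a normalized RBF feature map, with the centroids and length scales assumed to come from an upstream online k-means step. This is not the Conductrics code; all names, shapes, and hyperparameters are illustrative assumptions, and the point is mainly how much machinery even a "simple" version of this approach drags along.

```python
import numpy as np

# Hypothetical sketch (not the actual Conductrics implementation): TD(0) with a linear
# value function over normalized squared-exponential RBF features. Centroids and length
# scales are assumed to be supplied by a preprocessing step such as online k-means.

def rbf_features(x, centroids, length_scales):
    """Squared-exponential kernels around each centroid, normalized to sum to 1."""
    d2 = np.sum((centroids - x) ** 2, axis=1)          # squared distance to each centroid
    phi = np.exp(-d2 / (2.0 * length_scales ** 2))     # squared-exponential activations
    return phi / (phi.sum() + 1e-12)                   # normalized RBF net features

def td0_update(w, x, reward, x_next, centroids, length_scales,
               alpha=0.05, gamma=0.9, terminal=False):
    """One online SGD step on the TD(0) error for V(x) = w . phi(x)."""
    phi = rbf_features(x, centroids, length_scales)
    v = w @ phi
    v_next = 0.0 if terminal else w @ rbf_features(x_next, centroids, length_scales)
    td_error = reward + gamma * v_next - v             # one-step temporal-difference error
    return w + alpha * td_error * phi                  # gradient step on the weights

# Toy usage with made-up centroids/length scales standing in for the k-means output.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(8, 3))                    # 8 RBF centers in a 3-d context space
length_scales = np.full(8, 1.0)
w = np.zeros(8)
x = rng.normal(size=3)
for _ in range(100):
    x_next = rng.normal(size=3)                        # stand-in for the next observed context
    reward = rng.normal()                              # stand-in for an observed reward
    w = td0_update(w, x, reward, x_next, centroids, length_scales)
    x = x_next
```

Even this stripped-down toy has a learning rate, a discount factor, a feature map, and an upstream clustering step to tune - which is exactly the kind of overhead that made the simpler alternative look so much better in practice.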
Now I believe very strongly that time is almost always better spent defining the problem more carefully and thinking about ways to solve a related, relaxed problem, so that a simpler, more robust approach can be used.