Thanks for your perspectives! It also takes a hell of a lot of time to do, even with small(ish) data sets and random search (instead of exhaustive grid search). Ultimately it's about preventing information from leaking out of the "unseen" data: if hyperparameters are tuned on some of that "unseen" data, then it's not entirely unseen. But it's not always appropriate or necessary, and as with everything it has its pros and cons, which should be weighed against the problem at hand.
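
To make the leakage point concrete, here is a rough sketch using only the Python standard library. It is not anyone's actual setup from this thread: the synthetic data, the 1-D k-NN classifier, and the names `make_data`, `knn_predict`, `accuracy`, and `tune_and_evaluate` are all made up for illustration. It also uses a single train/validation/test split rather than full nested cross-validation, which would repeat the same idea over several outer folds. The hyperparameter `k` is picked by random search on the validation split only, and the test split is scored once at the very end.

```python
import random


def make_data(n, rng):
    """Synthetic two-class data (illustrative only): label is 1 when x plus noise exceeds 0.5."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x + rng.gauss(0, 0.15) > 0.5 else 0
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k is odd, so no ties)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0


def accuracy(train, evaluation, k):
    """Fraction of evaluation points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return hits / len(evaluation)


def tune_and_evaluate(seed=0, n_candidates=5):
    rng = random.Random(seed)
    data = make_data(300, rng)
    train, valid, test = data[:180], data[180:240], data[240:]

    # Random search: sample a few odd values of k instead of trying every one.
    candidates = rng.sample(range(1, 40, 2), n_candidates)

    # The hyperparameter is chosen using the validation split only.
    best_k = max(candidates, key=lambda k: accuracy(train, valid, k))

    # The test split is touched exactly once, after tuning has finished.
    test_acc = accuracy(train + valid, test, best_k)
    return best_k, test_acc


if __name__ == "__main__":
    best_k, test_acc = tune_and_evaluate()
    print(f"chosen k = {best_k}, test accuracy = {test_acc:.3f}")
```

If `best_k` were instead chosen by scoring the candidates on `test`, the reported test accuracy would be optimistically biased, because the test data would have influenced the model selection.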