What works for whom, where, when, and why is the ultimate question of evidence-based software engineering. Still, the empirical research seems mostly concerned with identifying universal relationships that are independent of how work settings and other contexts interact with the processes important to software practice. Questions of “What is best?” seem to prevail. For example, “Which is better: pair or solo programming? test-first or test-last?” However, just as the question of whether a helicopter is better than a bicycle is meaningless, so are these questions because the answers depend on the settings and goals of the projects studied. Practice settings are rarely, if ever, the same. For example, the environments of software organizations differ, as do their sizes, customer types, countries or geography, and history. All these factors influence engineering practices in unique ways. Additionally, the human factors underlying the organizational culture differ from one organization to the next and also influence the way software is developed. We know these issues and the ways they interrelate are important for the successful uptake of research into practice. However, the nature of these relationships is poorly understood. Consequently, we can't a priori assume that the results of a particular study apply outside the specific context in which it was run. Here, I offer an overview of how context affects empirical research and how to better contextualize empirical evidence so that others can better understand what works for whom, where, when, and why.