Over the past decade, economics has experienced a "randomista" revolution in which randomized controlled trials (RCTs) have become the dominant approach to hypothesis testing. Nowadays PhD students and tenure-track assistant professors can be found all over the world designing small RCTs to test ideas ranging from the obvious (does school feeding increase attendance?) to the truly exciting (do unconditional cash transfers reduce HIV risk?). The strength of a well-designed RCT is its internal validity, its unparalleled ability to establish causality; the randomistas have built their case around this idea. What is less frequently discussed is external validity, the extent to which a study's result applies in other settings--whether the result can be generalized. Here, the case for RCTs is much weaker. Policy research needs both internal and external validity. An RCT with airtight internal validity but no real-world relevance or applicability is useless for policy-making.
There are at least two inherent reasons why RCTs suffer from a lack of external validity. First, to make the claim for causality airtight, the researcher typically tries to control and manage every detail of the intervention, ensuring that the intervention group receives the treatment and that the control group does not receive any other comparable form of treatment. The second reason follows from the first: controlling every detail of a study requires a rather controlled or localized environment.
Both of these features of the typical RCT almost automatically reduce its external validity. In the real world, not only do beneficiaries sometimes fail to receive the treatment (think about adherence to drug regimens or large-scale food distribution), but non-members of the program usually receive other forms of treatment, eliminating any form of control group. In fact, the absence of a "clean" control group is often a good thing: in most rural districts in Africa, communities purposely try to spread out benefits so that each household receives something. Meanwhile, localized RCTs raise the question of whether the results would hold true in different regions and among populations with different characteristics.
Perhaps the most insidious aspect of over-emphasizing internal validity is that it steers research toward the interventions and behaviors that most readily fit the conditions required to perform an RCT, and these become the programs that receive attention in the development debate. If we follow the extreme position of the randomistas, we run the risk of sidelining potentially useful development ideas simply because they are not amenable to RCTs.
In policy research what is needed is a two-stage decision tree. First, all studies that pass a minimum internal validity bar are considered (there are more ways to establish causality than an RCT--James Heckman won the Nobel Prize in Economics for his work in non-experimental program evaluation). Then, among the studies that pass the internal validity test, those with greater external validity rank higher. We should also appreciate that established programs operating at large scale deserve to be evaluated: for such programs, messy non-experimental methods are the only approach possible. Evaluations of globally influential large-scale programs such as South Africa’s Child Grant or Brazil’s Bolsa Familia stand less of a chance of making the pages of the journals published by the American Economic Association than an RCT on a largely irrelevant topic using U.S. college students as subjects.
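The two-stage decision tree can be made concrete with a minimal sketch. Everything in it is an assumption for illustration only: the `Study` class, the 0-to-1 validity scores, the `min_internal` threshold of 0.6, and the three example studies are all hypothetical, and in practice such scores would come from expert judgment rather than a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    name: str
    internal_validity: float  # hypothetical 0-1 score, assigned by a reviewer
    external_validity: float  # hypothetical 0-1 score, assigned by a reviewer


def rank_studies(studies, min_internal=0.6):
    """Two-stage screen: filter on internal validity, then rank on external validity."""
    # Stage 1: keep only studies that clear the minimum internal validity bar.
    eligible = [s for s in studies if s.internal_validity >= min_internal]
    # Stage 2: among those, studies with greater external validity rank higher.
    return sorted(eligible, key=lambda s: s.external_validity, reverse=True)


# Illustrative, made-up studies and scores (not real evaluations).
studies = [
    Study("small localized RCT", 0.95, 0.30),
    Study("non-experimental evaluation of a national program", 0.70, 0.90),
    Study("uncontrolled before-after comparison", 0.40, 0.80),
]

ranked = rank_studies(studies)
for study in ranked:
    print(study.name)
```

Under these made-up scores, the before-after comparison is dropped at the first stage despite its broad relevance, and the non-experimental evaluation outranks the RCT at the second stage because both clear the bar and only external validity matters from then on.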
If we want our best, most creative minds working on problems that can contribute to policy and people, we need to take a stand in defense of external validity.