How RCTs inform scaling decisions.

By Ken Chomitz, Chief Economist

May 10, 2024

In the first of a series of blogs looking at GIF’s use of evidence and practice of scaling, Ken Chomitz examines the role of randomised controlled trials (RCTs).


Early lessons from GIF’s portfolio

GIF pursues social impact the way venture capital pursues profit. We fund promising innovations, hoping to launch them on a journey to scale but mindful that innovation is risky. 

To mitigate risk, we practise stage-based investment. Small pilots test an innovation’s feasibility on the ground. An innovation which has a pilot under its belt can proceed to a demonstration phase, with greater funding. Innovations with solid demonstrations are eligible for still more expansive funding at what we call scale – but even here, the goal is to pave the way to follow-on support by funders with deeper pockets than ours.

At the end of each stage there comes a decision point: double down on investment, or abandon? It’s a tough choice. Innovation is a journey of exploration with inevitable dead ends and diversions. Initial failure can spur learning and eventual success. However, the power of stage-based investing lies in the option to abandon ventures with dim prospects. On what basis should we decide: go vs no-go?

The answer that underpins GIF’s model is evidence. The use of rigorous evidence – including randomised controlled trials (RCTs) – is baked into GIF’s founding documents. Although RCTs are not the only source of evidence for GIF, they have played a prominent role in deciding what to fund, what to scale up, and how to do so.

During GIF’s lifetime, the use of RCTs in economic development has massively expanded and practice itself has evolved. At the same time, practitioners have asked: when is an RCT the right tool, when should it be deployed, and how should its results be used?

Here are some preliminary reflections on those questions, based on an evaluation of GIF’s first five RCT-related innovations. 

How did the innovations fare?

  • CCTs for Immunisation (mCCTs) showed that small incentive payments could cost-effectively increase uptake of child immunisation in Pakistan. Further, the RCT showed which combination of incentive amount and timing had the greatest impact. The results were crucial in securing funding and government support for scale-up. By 2025, two million children are expected to benefit. This is a best-case scenario for RCT effectiveness.
  • Youth Impact (YI) tested whether a brief educational session – just a few hours – could empower young women in Botswana to avoid HIV infection and unintended pregnancies. Preliminary RCT results were not supportive, and GIF declined further support. Nonetheless, the programme went on to scale.
  • Labelled Remittances (LR) tested an intriguing proposition: that simply labelling the intended use of Philippine migrants’ remittances to their family would increase the amount remitted and steer the funds to the desirable use. The RCT did not support the proposition – a disappointing but valuable finding.
  • No Lean Season (NLS) sought to increase the income of poor rural households by offering small travel subsidies to farm workers, enabling them to migrate to find work during the ‘lean season’ of the year when the demand for labour falls. Prior, smaller-scale RCTs had shown favourable results. But this one didn’t. It’s possible that the problem was with implementation rather than the fundamental approach. In any event, NLS did not proceed to further scale.
  • Reducing Anaemia (RA) intended to test an approach to fighting anaemia in Tamil Nadu, India. The innovation was to fortify rice that was already being distributed free of charge. The study was discontinued due to a variety of implementation problems and the RCT was never conducted.

Was an RCT the right approach? 

Four of the five innovations were behavioural nudges. These are well-suited to RCTs: the nudge can be offered to a randomly selected treatment group, and outcomes compared to those of a control group, allowing researchers and policymakers to test whether the nudge really had an impact. (Without an RCT, one might argue that the causality worked in reverse: people tending toward better outcomes are more likely to accept the nudge. For instance, girls interested in signing up for a training course on HIV prevention might be at lower risk than others.)
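The selection-bias point can be made concrete with a minimal simulation (my own illustration, not part of GIF's analysis; all variable names and parameters are hypothetical). It models a nudge with zero true effect, where the same latent "motivation" drives both opting in and better outcomes. A naive comparison of opt-ins against everyone else shows a large spurious effect; randomised assignment recovers the truth.

```python
import random

random.seed(0)

N = 10_000
TRUE_EFFECT = 0.0  # assume the nudge genuinely does nothing

# Each person has a latent "motivation" that raises their baseline
# outcome AND makes them more likely to opt in to the nudge.
motivations = [random.gauss(0, 1) for _ in range(N)]

def outcome(motivation, treated):
    # Outcome depends on motivation, the (zero) treatment effect, and noise.
    return motivation + TRUE_EFFECT * treated + random.gauss(0, 1)

def mean_diff(assignments):
    # Average outcome of the treated group minus that of the untreated group.
    treated = [outcome(m, True) for m, t in assignments if t]
    control = [outcome(m, False) for m, t in assignments if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Self-selection: more motivated people are more likely to opt in.
self_selected = [(m, m + random.gauss(0, 1) > 0) for m in motivations]
naive_diff = mean_diff(self_selected)

# RCT: a coin flip decides treatment, independent of motivation.
randomised = [(m, random.random() < 0.5) for m in motivations]
rct_diff = mean_diff(randomised)

print(f"naive (self-selected) difference: {naive_diff:+.2f}")  # large, spurious
print(f"randomised (RCT) difference:      {rct_diff:+.2f}")    # near zero
```

The naive comparison attributes the motivated group's better outcomes to the nudge; randomisation breaks the link between who is treated and who was going to do well anyway.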

However, none of the grant proposals considered any alternatives to an RCT or justified why an RCT was the best methodology for the study. 

Were these RCTs the right approach at the right time? The record is mixed: two were clearly timed right, two might have benefited from better advance planning, and one foundered on logistical challenges.

  • For mCCTs, implementation built on prior experience yet was still able to adapt mid-course to changes in the regulatory environment. The multi-arm RCT design provided valuable information on the cost-effectiveness of different designs and levels of incentive, applicable to the next stage of scale-up. GiveWell, a donor and advisor to philanthropists, had supported the RCT. After its completion, GiveWell’s recommendation was important to mCCTs’ receipt of a major grant. The RCT results also led to the government’s decision to scale up implementation.
  • LR made a significant change to the original design, but was able to complete the RCT, with informative results. 
  • NLS built on a sequence of prior RCTs, incrementally scaling up to larger populations based on prior learning. A retrospective on the 2017 RCT reported that “subsidies mainly reached those who would have migrated anyway, and the programme was promptly discontinued” and that discontinuation was justified. However, the grantee hypothesised that potentially correctable implementation issues were responsible for the outcome.
  • YI, in retrospect, might have collected more descriptive data up front or conducted multiple A/B tests, to test assumptions and modify the programme further and sooner. However, the RCT ultimately provided valuable lessons that YI could use to refine the programme, and the organisation has incorporated rapid A/B testing into implementation and scale-up.
  • RA did undertake pre-testing of the fortified rice’s acceptability to consumers, but was unsuccessful in working out manufacturing and distribution, leading to cancellation of the RCT. 

Did the grants position the investees to inform and influence scale-up?

The two innovations (YI and mCCTs) that proceeded to scale benefited from strong partnerships with the scaling government. The grantees combined research and implementation experience; mCCTs is a particularly good example. IRD, the researcher/implementer, had a long-standing relationship with local immunisation authorities and had been involved in setting up an Electronic Immunisation Registry. IRD consulted with other donors and implementers. However, initial government support doesn’t guarantee scale-up, as shown in the case of RA.

The other innovations lacked well-defined scale plans. GIF recognised from the start that securing funding and buy-in for NLS would be challenging. LR lost its initial bank partner. The weak results of its RCT rendered moot the question of scale.

Communications plans are potentially critical to the scaling process. Explicitly or implicitly, the purpose of an RCT is to provide actionable information to decision-makers and stakeholders. Only two of the five investees produced communications plans – the two successful scalers, mCCTs and YI.

Some reflections

It is perilous to generalise ‘lessons’ from a small sample. GIF has several additional RCT-related investments that are ripe for evaluation, and they will provide a wider evidence base. Nonetheless, the following propositions emerge for consideration.

1. Prioritise innovators who have experience in implementation as well as research, and who have strong ties to those who will scale up.

2. Establish the purpose of a proposed RCT, and weigh it against alternatives

Proposals should be clear on the purpose, relevance, and cost-effectiveness of an RCT. For instance, is it to establish causality, test innovation design, measure cost-effectiveness, or a combination? Is the innovation ripe for an RCT, or is it more appropriate to test operational or design issues first? To what extent is the journey to scale iterative, and how relevant will the results be to the decision to scale or replicate?

3. Ensure that there is a strong communications plan

Communications are key to ensuring that scaling happens. They are needed throughout the scaling process to keep stakeholders informed and engaged, and at the end to share results – encouraging further scaling where results are positive, or drawing out lessons from projects that failed to scale.

4. Examine how formative or operational evaluations fit into GIF’s Investment Policy

For some innovations, operational issues – such as how to ensure fidelity of implementation – may be critical areas for testing. Yet these may not fit well into GIF’s current criteria for Test & Transition-stage (demonstration) investments, nor be feasible within the tight budgetary limits of the Pilot stage. 

5. Be flexible

For all five cases, implementation deviated from the original plan. Flexibility to deal with regulatory changes and implementation setbacks is essential.