What we wish people would do in leadership evaluation

Juliette Phillipson

From our experience conducting leadership programme evaluations, and from an extensive review of the literature, we have drawn conclusions about the components of leadership programme evaluation that facilitate ongoing development and allow organisations to increase their internal capacity to deliver effective leadership development. Effective evaluation allows investments of time and money to be optimised, both by improving programmes and by informing decisions to pare them down or scale them up.

Our literature review found that the majority of published studies evaluating leadership interventions lack rigorous evaluation methods such as validated tools, multi-method approaches, and standardised instruments. The result is generally poor-quality programme evaluation and a weak body of evidence on which teaching methods and content are most effective.

We recommend that programme evaluation be considered and incorporated from the start of programme design. It may be beneficial to consult study quality checklists in the design phase to help build high-quality quantitative and qualitative evaluation methods into programmes. As with programme content, evaluation needs to be tailored to the specific goals of the programme, which in turn requires those goals to be clearly defined. We recommend conducting a thorough needs assessment to ensure that content, teaching methods, and evaluation components are tailored to core stakeholders (including the target audience), the organisational context, and the desired outcomes. This approach embeds ongoing evaluation in the design of the programme, allowing for adaptation and improvement.

Evaluation needs to extend beyond self-assessment and participant satisfaction scores. Although cost-effective and convenient, self-reporting has significant limitations as an outcome measure: it may not accurately reflect actual behavioural change or the impact of the intervention, and it is subject to several biases. Effective evaluations incorporate multiple methods, including multi-source data, validated tools, and standardised instruments. This allows for comparisons across different interventions and contexts. Both quantitative and qualitative measures are essential: to assess the impact of the programme objectively, and to provide the context and depth needed to understand the nature and extent of that impact.

Control groups and long-term follow-up should be standard: follow-up to ascertain whether the interventions’ effects are sustained, and control groups to account for environmental confounders. Control groups may best be established through stepped-wedge designs, in which the intervention is introduced in steps and each group serves as a control for another group before entering the intervention arm. Other approaches to control groups have been used in the literature, though many of these (for example, waitlist controls or unsuccessful applicants) introduce important sources of bias. Long-term follow-up also allows for a more comprehensive impact assessment and the opportunity to assess higher-order organisational outcomes.
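
To make the stepped-wedge idea concrete, the following is a minimal sketch of how such a rollout schedule could be laid out. The function names, the group labels, and the choice of one baseline period followed by one crossover per period are illustrative assumptions, not taken from any particular programme. The crossover order is fixed here for readability; in practice it would typically be randomised.

```python
def stepped_wedge_schedule(groups, n_periods=None):
    """Map each group to its arm ("control" or "intervention") in every period.

    Assumption for illustration: one baseline period in which all groups are
    controls, then exactly one group crosses over at the start of each period.
    """
    if n_periods is None:
        n_periods = len(groups) + 1  # baseline + one step per group
    schedule = {}
    for i, group in enumerate(groups):
        crossover = i + 1  # first period in which this group receives the intervention
        schedule[group] = [
            "intervention" if period >= crossover else "control"
            for period in range(n_periods)
        ]
    return schedule


def controls_at(schedule, period):
    """List the groups still acting as controls in the given period."""
    return [g for g, arms in schedule.items() if arms[period] == "control"]


if __name__ == "__main__":
    plan = stepped_wedge_schedule(["Cohort A", "Cohort B", "Cohort C"])
    for group, arms in plan.items():
        print(group, arms)
    print("Controls in period 1:", controls_at(plan, 1))
```

In this layout every cohort eventually receives the programme, and at each intermediate period the cohorts that have not yet crossed over provide the comparison data.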

In conclusion, evaluation design should, at a minimum, consider assessment at multiple time points, inclusion of control groups, and collection of objective data, as well as collection of qualitative data from interviews, focus groups, questionnaires, or observations. Improving the quality of study design and building tailored evaluation methods into programmes will allow evaluators and educators to better understand the factors that are reliably associated with high-level programme outcomes. This could both inform the improvement of individual programmes and contribute to the medical leadership literature as a whole.