Oz Blog News Commentary

The Evaluator General

May 29, 2020 - 11:36 -- Admin

I recently sent a couple of emails explaining the Evaluator General and also did an extended interview explaining the ideas in the context of Matt Jones’ Public Policy class at Melbourne Uni. The first email below is the one I sent him proposing that we explain the Evaluator General in terms of the course of my own thinking in developing it.


Given the subtlety of the idea of the Evaluator General – most people think of it as one idea when it’s several – one way to understand it is to go through the way in which it was the product of my own history in thinking about certain problems.

Steve Jobs talked about how in life you can only connect the dots looking backwards, not forwards, and the Evaluator General is a response to the following points.

  • In 1983, working for John Button as Industry Minister, I encountered the Toyota production system and its extraordinary radicalism – the kind of thing which (for once) it’s not an exaggeration to describe as a paradigm shift. For some of the flavour of this, check out the productivity chart below and this video by an American Toyota engineer.
  • Narrated back to myself through a few decades of thinking, I see this as standing for
    • The importance of building accountability from the ground up from the perspective of those who are being held accountable, not those holding them to account.
    • The gravitational force of the latter (wrong) way of doing it is almost of black-hole magnitude – we are close to the event horizon. Warren Buffett has a term for it from his point of view: “the institutional imperative”. He’s talking about the institutional imperative to grow – to aggrandise the business and its managers rather than to husband capital to the advantage of its owners. In government the institutional imperatives are different, but business and government share one in common: the institutional imperative of bureaucracy. This is summarised in my little aphorism: “if truth is the first casualty of war, candour is the first casualty of bureaucracies”.
    • The resulting tendency for systems of accountability to become systems of accountability theatre. In that regard, this essay is intended as a practical ‘prequel’ to the idea of the Evaluator General with this speech to the Australian Evaluation Society being the philosophical prequel though reading that one is only optional :)
    • Be that as it may, there are some miraculous cases where the institutional imperative has been avoided (as Warren Buffett has avoided it). They include
      • Open-source software;
      • The Toyota production system

Not coincidentally, in both cases the profound, subtle and pervasive problem of truth-telling from the bottom to the top of the hierarchy appears to have been solved.


  • Arriving at The Australian Centre for Social Innovation (TACSI) in 2009, I discovered human-centred design and co-design, a powerful tool in seeking to deliver services from the perspective of the people you claim to be helping. But, despite wider claims being made for it, it is only that – a tool. It is not the system, and the system is broadly correct in thinking of it as just one possible way to improve services. Neither human-centred design (nor co-design) nor any other new tool contains within itself any clear recipe for the system to perform the tasks it must perform to
    • nurture good practice
    • learn from it
    • preserve and endlessly improve and optimise that learning
    • scale what works, change (and if necessary abolish) what doesn’t, and progressively fit the relevant parts together as their roles and the division of labour between them change as we learn how to improve them.
  • The Evaluator General is my attempt to build a system that might
    • Help innovations like TACSI’s be introduced;
    • Validate them against the evidence in such a way as to protect them from the institutional imperatives of managers further up the hierarchy;
    • Expand successes and improve or contract failures.
  • I did this by reference to a principle of modern government within the Westminster system, which is to structurally separate doing and knowing. Thus Treasury is the line department responsible for advice and action to optimise growth, while the ABS measures how well we’re doing in a way that’s independent of Treasury – but nevertheless closely collaborative with it.
  • Another principle, which I came across when thinking about political questions and have realised in retrospect is of great significance here, is the ancient Athenian term isegoria, or equality of speech (or “ισηγορια” if you’re Plato, Aristotle or you’re just trying to be a smartarse).

Toyota was my first engagement with isegoria, but it rumbles on through my life – and is of great significance to public policy.

In summary, though people typically think of my Evaluator General as a top-down compliance-type mechanism – using independence to browbeat the system towards addressing the objectives given to it from the top – it’s actually two things, and neither works very well without the other.

  • Independence
  • That independence is there not to perform ‘accountability theatre’[1] by imposing it from the top, but to build an accountability system (as Toyota did) based on the self-accountability of those in the field. This is what science does. And as Richard Feynman says, “The first principle is that you must not fool yourself and you are the easiest person to fool”.[2]
    • In Toyota that’s workers on the line (and beyond them suppliers and customers).
    • In government programs it’s ‘street level bureaucrats’ (and the communities they serve). So it’s teachers, their students and their communities; nurses and doctors, their patients and communities; prison warders and so on.


The second email, explaining the Evaluator General from scratch:

As usual, I learned much from your speech on economics and the third sector. The one thing I wanted to see among the measures in the speech was some commentary on our failure to properly build evidence-based policy and practice. The cost/benefit analysis you guys have done is obviously important, and even this is missing in much of the interface between government and the charitable sector.

But I think there’s something much more fundamental which receives virtually no attention because I think the people who should be leading the debate – particularly policy economists – think they know what evidence-based policy and practice are, but in fact they do not.

The most dramatic way I can suggest the potential significance of what I’m proposing is by contrasting the labour productivity achieved with the usual top-down approach to evidence-based practice with a bottom-up evidence-based practice developed in business – by comparing US to Japanese automotive productivity over the 1970s and 80s.

My argument is as follows.

  • In the charitable sector and amongst many of the social services funded by government, we are not even at the level of top-down evidence-based practice, because, as you acknowledge in your speech, we make far less use of cost/benefit analysis than we should.
  • Part of the rhetoric of contracting out and purchasing services from the charitable sector involves the idea of innovation – we tell ourselves that we’ll expand the most successful projects and strategies and scale back those that don’t work.
  • But mysteriously, we’ve been saying this for at least two decades and it’s remarkable how little of this actually takes place.
  • I think I have the beginnings of a quite powerful explanation for why that is – there’s a Catch-22 at the heart of this learning system that those running it don’t really acknowledge, even to themselves. I document that here.
  • Further, why are we so bad at learning from what works out in the field – with or without ‘what works centres’? We think of accountability as essentially a top-down activity – imposed by those above on those below. (Or alternatively ‘accredited’ by our researchers using the tools of their trade – CBAs and RCTs – and then propagated into practice by tools of ‘translation’ such as What Works Centres.) But if this is really how it should be done, why are full-blown CBAs and RCTs so rarely used in business (as opposed to much lighter-weight experimentation and measurement with things like A/B testing)?
  • The system must build accountability to the facts and possibilities revealing themselves in the field. But that knowledge can’t travel upwards in the hierarchy while the system is engaging in accountability theatre and those above are holding those below ‘accountable’. How can they know what those below should be improving, or what innovations will be most promising to try, if they do not understand the conditions in the field and if those in the field may be penalised on the basis of information they pass up the line? In these circumstances, candour about what is and is not working is replaced by the white-out of accountability theatre. A proper accountability system needs to:
    • be focused not just on measuring the system, but principally on measuring it with a view to learning and improving it,[3]
    • learn from the field (or the bottom of the hierarchy) where most of the existing knowledge will be and most of the learning needs to take place,
    • have that learning objectively validated, so that ‘experts’ and the domain knowledge on which they draw remain accountable to the emerging evidence,
    • have that validated learning given appropriate weight against senior managers responding to institutional imperatives. For it is in this step that what I call ‘accountability theatre’ actively displaces true accountability for understanding what’s going on.
  • Believe it or not, this is what Toyota achieved in its development of a new way of managing car manufacture. It did so by spending literally ten times the industry standard amount on employee training, training shop floor workers to understand and manage the CNC (Computer numerical control) machine tools and then building the company’s accountability for its own productivity on the self-accountability of shop floor teams.
  • My own proposal for an Evaluator General tries to build the same system for the more complex and ‘social’ world of delivering services to improve social wellbeing. It’s based on
    • Structurally separating doing things from knowing how well they’re performing. This occurs at the agency level within government where an agency like the statistical office will measure inflation and unemployment independently of politicians or the agencies whose performance will be measured by reference to those numbers.
    • Seeking to do this not just agency by agency, but in principle wherever an agency’s work is done.
    • Seeking to build close cooperation as well as structural independence between knowing and doing and from that
    • A system of evidence-based professional knowledge and accountability built from self-accountability in the field, with learning built from that. (As the great scientist Richard Feynman put it, “the first rule of science is that you must not fool yourself, and you are the easiest person to fool”.)



  1. Or, to be generous to sceptics: to accept the inevitability of accountability theatre but to try to change the terms on which it takes place.
  2. This is also Adam Smith’s idea when he talks of the impartial spectator as the foundation of morality (and implicitly of knowledge).
  3. For instance, in my experience, the wellbeing agenda has often been measurement-heavy but learning-light – and this is true of its latest poster boy, New Zealand’s wellbeing budgeting. The measurement system they’ve built aims to be able to tell you Maori wellbeing in Rotorua, but is not being built to ensure that those measurements help identify how to improve it.