Wednesday, November 10, 2010

Theory of Change

One of the most discussed problems in the four-day HBS course "Governing for Nonprofit Excellence", both in the sessions and outside them, is how to measure the impact of a strategy or an organization, and how to measure the performance of an organization as well as of its parts.

It is one of the questions that interested almost all attendees the most. And it is also important for the Wikimedia Foundation.

Unfortunately, there is no easy answer to this question, not even from HBS.

The researchers at HBS approach the question very systematically. First, every organization has outputs. Outputs are things an organization can influence directly through its strategy and actions, and they can be measured directly. Outputs are contrasted with outcomes. In HBS jargon, outcomes are the effects an organization achieves with its outputs. They are less under the organization's control and are more of a public effect. The sum of all outcomes is what the HBS researchers call impact.

For the Wikimedia Foundation, the number of articles is an obvious output. By issuing different policies we can influence (or at least try to influence) this output. It is easy to measure. Wikimedia has many such measurable values, like article length, article depth, visitor counts, etc. These are what we usually call metrics when we discuss them on our mailing list or in the projects.

As everyone who has taken part in these discussions knows, these metrics are not good measurements. The reason is that they can be interpreted in many ways, and they do not necessarily correlate with the outcomes we want.

The outcomes we want to achieve are higher quality of our articles, greater penetration of our projects, more participation by our users, more diversity in our projects, etc. And these are not so easy to measure.

Let's take the example of article quality. I have known discussions about article quality ever since I joined our editing community. How can you design a measurement for so many articles, in more than 270 languages, on topics as different as the top quark and Professor Layton and the Eternal Diva? The most obvious suggestion is article length. But length alone doesn't really reflect the quality of an article. An article can be very long and still be badly structured, poorly referenced and written from a strong point of view. Article depth is a more sophisticated approach, which treats a language version as a whole and tries to calculate how often the articles are updated. Besides technical and methodological problems, there are other difficulties in measuring quality. The perception of what is a good article and what is a bad one can differ between the editing community, the general public and experts in the respective fields. Each of these groups can have different criteria for article quality. For example, the general public may rate an article as higher quality because it is more comprehensible, but for an expert, comprehensible may mean explanations that contain more ambiguous and misleading analogies.

Because outcomes are so difficult to measure, there is often a big gap between the measurable output of an organization and its impact. This problem is annoying for most nonprofit organizations and highly uncomfortable for their boards. Nevertheless, most organizations believe that they achieve impact with their work. The HBS researchers call this belief a Theory of Change. It is a hypothetical and in many cases unproven theory of the form: if we do this, then we will change society in that way, and that will lead to the fulfillment of our mission. Most strategies of nonprofit organizations are based on theories of change.

So the theory of change behind article length is that longer articles tend to contain more information and tend to be more thorough, and are thus of higher quality. The theory of change behind article depth is that if more updates are made to a language version, then we can assume that its articles are more up to date and that more errors have been corrected by the editing community, and thus that the articles are better.

But as the many past and ongoing discussions suggest, these are all hypothetical theories, and we don't really know whether they hold.

The best way to prove a theory of change is to measure the outcome. As I wrote above, this is not easy. In many cases the organization also lacks the know-how, manpower and money to conduct a measurement or survey. The Wikimedia Foundation and our communities have in the past tried a number of experiments and methods to measure quality:

The featured article is doubtless the most successful of these. It is a measurement of high-quality articles from the point of view of our editing community. Across all projects the threshold for featured articles is very high. With the public policy project, the Wikimedia Foundation began in recent months a test of user feedback on quality as perceived by readers. Although there have been some outside evaluations by experts, like those conducted by Nature or c't, these evaluations are often too small in scale and not consistent enough to give us an overall trend over the years and across the language versions, or in more general fields.

So one of our major tasks in the coming years is still to find a way to bridge some of the gaps in our theories of change. And one of the tasks of the board would be to engage our community and outside experts to free up their resources and expertise to help us in this endeavor.

