Prometheus Assessed?: Research Measurement, Peer Review, and Citation Analysis
Ebook, 448 pages

About this ebook

This book examines the problems, pitfalls and opportunities of different models of assessing research quality, drawing on studies from around the world. Aimed at academics, education officials and public servants, it provides an overview of the argument over whether research should be assessed and how research quality should be determined. Prometheus Assessed? offers a survey of research assessment models in the US, UK, Japan and New Zealand, and includes an examination of citation analysis and a comparison between the different models.
  • Should research be assessed and what is research quality?
  • Survey of research assessment models in US, UK, Japan and New Zealand
  • Examination of citation analysis
Language: English
Release date: Apr 11, 2012
ISBN: 9781780633015
Author

Shaun Goldfinch

Shaun Goldfinch is an Associate Professor at the Nottingham University Business School, University of Nottingham.

    Book preview

    Prometheus Assessed? - Shaun Goldfinch

    1

    Prometheus assessed?

    Abstract:

    We outline the key arguments of the book, before comparing bibliometric, panel review and decentralised models. We then argue that the philosophy and sociology of science provide considerable illumination of the problems of research assessment, before outlining in greater detail the structure of the book and the key topics and arguments of its chapters.

    Key words

    research assessment

    panel review

    citation analysis

    peer review

    philosophy of science

    sociology of science

    Popper

    fallibilism

    Lakatos

    Feyerabend

    Foucault

    social science

    positivism

    hermeneutics

    Prometheus assessed? The commonly used Promethean metaphor – perhaps cliché – of science as a creative, innovative, changing, and even explosive force, always pushing the boundaries of accepted belief and understanding, seems anathema to the whole concept of assessment, measurement and bureaucratic control. In this noble vision, disinterested seekers after the truth – working in an autonomous, democratic, collegial and self-regulating community of scholars and researchers – develop and pass on this knowledge, as Prometheus to human kind.¹ Science develops along its own paths, through serendipity, individualism and heroic achievement. Assessment by outsiders and bureaucratic interference can only pervert and derail this endeavour.

    But there have always been standards of measurement in the processes of science and ‘knowledge creation’, and restrictions on the autonomy of scientists and scientific communities. Power, hierarchy, and the very nature of the disciplines themselves have selected and codified what is considered, at any one time, acceptable knowledge. Governments, businesses and military-industrial complexes have steered the social and physical sciences in particular directions. Peer review, referees and editors of journals, learned societies, editors and publishers of books, key intellectual figures, departmental heads, university and science administrators, even appointment committees, have filtered and shaped these disciplines. Hopefully the truth will out, but perhaps not, or perhaps not yet; and only the most idealistic or naive commentator would ignore the politics, ideologies, values, power, pecuniary interests, conformity and herd mentalities, pettiness and personalities that are part and parcel of academic and other research.

    What has changed, then, is not that assessment has been introduced, but that it has intensified. It has in some cases been centralised to state agencies. Different assessment measures, and different agents of measurement, have been introduced. Public money is being spent; the fashion of the day is to evaluate how successfully this has been done and whether ‘good’ results are being achieved. A trend towards performance evaluation and a suspicion of professional self-regulation and autonomy to some extent underpin these models, and they reflect wider developments in public sectors influenced by international trends such as New Public Management (NPM) (Goldfinch and Wallis, 2010).² Accountability is a key theme in this process, just as it became a key concern of the wider public sector in recent times. Individuals and organisations must give an account of, and be blamed (and hopefully rewarded) for, their (in)actions and their spending of public money.

    In any event, whether or not one can, or indeed should, measure and rank performance seems an argument that is lost – at least if we talk in public policy terms, and at least for the foreseeable future. As this book outlines, a number of countries have introduced research assessment exercises of various forms, often with a central body or bodies responsible for assessing research quality, albeit with a considerable array of measures and variety as to what is considered research. These evaluations may have direct funding implications for organisations, or they may form parts of larger evaluation mechanisms that have funding implications. Indeed, organisational and individual status and reputation and the allocation of large sums of public money may depend on the outcome (Alexander, 2000; Stolz et al., 2010). None of these models is without flaws, and indeed parts of some are of questionable usefulness. Some perhaps are even counter-productive. Research assessment itself may or may not be a good idea in a more general sense, of course, and we also touch on this in this book.

    At the risk of caricature, research assessment models seem to consist of three broad types, as follows.

    Bibliometric measures

    Bibliometric measures include citation analysis, publication counts, ‘journal rankings’, and other measures such as patents and plant varietal rights. Such measures have certain benefits, it is claimed. They are often portrayed as ‘objective’ measures. They are generally quantitative in nature. They seemingly remove the element of qualitative and subjective assessment, with all its attendant problems of personality and politics and fallible human nature, found in peer and panel review models. Research is good, it is claimed or assumed, because it is published in a highly rated journal, because it is highly cited by other scientists or scholars, or because it has led to a patent or commercial outcome. Quantitative measures have an apparent ‘scientificity’ and an apparent ease and lower cost of measurement. Increasing mechanisation of such measures through citation and other databases such as the Web of Science and Google Scholar magnifies these apparent beneficial effects.
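    To give a concrete sense of what such quantitative indicators involve, the brief sketch below (not drawn from the book, and using purely hypothetical citation counts) computes two widely used bibliometric measures for a single researcher: total citations, and the h-index, the largest number h such that h of the researcher's papers each have at least h citations.

# Illustrative sketch only: simple bibliometric indicators computed
# from a hypothetical list of per-paper citation counts.

def total_citations(citations):
    """Sum of citations across all papers."""
    return sum(citations)

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one researcher's publications
papers = [42, 17, 9, 6, 3, 1, 0]
print(total_citations(papers))  # 78
print(h_index(papers))          # 4

    Indicators of this kind are cheap to compute once an underlying database such as the Web of Science or Google Scholar exists, which is part of their appeal; the contested question, taken up below and in Chapter 2, is what, if anything, they say about research quality.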

    However, the cost-savings, quality and ‘objective’ status of bibliometric measures remain highly contested, as we will examine. Citations themselves, and publication decisions, are the outcome of social processes, some of which are highly flawed and/or subjective. It is simply naive to view bibliometric measures as objective. There are no simple relationships between citation, publication and quality; and the value of bibliometric measures differs across disciplines. In some fields, for example the creative ones, citations are essentially useless. They are of mixed use in the humanities, and critics even debate whether they serve well for the social and physical sciences. Chapter 2 examines these issues in greater depth, including looking at the Excellence in Research for Australia (ERA), which used journal lists and citations in its first assessment. Chapter 3 examines peer review and publication generally, particularly the limitations of article refereeing.

    Panel review models

    In panel review models, a panel of experts review and rate submitted evidence of research and research excellence. For proponents of panel review models, this maintains some scholarly control of research assessment – assuming of course that those ‘peers’ on review panels are themselves scholars, or even that they are ‘peers’. Such panel review, it is proposed, allows experts to oversee quality research, by directly assessing and/or reading (some of) the research itself, along with other evidence of its excellence and that of the subject under examination. Experts are able to assess evidence of research excellence because of their own purported excellence in the field, however that might be defined. Such peer review circumvents the problems of so-called objective measures such as citation and journal publication counts by engaging with the quality of research itself, it is claimed. These peer review panel processes are sometimes portrayed as selfless, sometimes disinterested, technocratic exercises.

    However, panel review processes are also highly flawed, particularly but not only in small societies, with the usual problems of mixed expertise, sometimes cronyism, and the biases of individual and collective decision-making. How much research is directly assessed seems to vary greatly, and some indicators used in some models seem to have little, or at best a highly indirect, relationship to research output and quality. Indeed, some assessment models may not be primarily measures of research performance at all. The bureaucratic requirements of panel review are onerous and expensive, raising the question of whether their contested benefits are worth the cost. Panel review is central to the British Research Assessment Exercise (RAE) and its successor model, the Research Excellence Framework (REF), and to the New Zealand Performance Based Research Fund (PBRF), and it forms a large part of the Japanese assessment of research of the National University Corporations. A chapter is dedicated to each of these.

    Decentralised models

    This approach is perhaps exemplified (at least in theory) by private colleges and particularly Ivy League universities in the United States. There is a variety of decentralised models of assessment, with each university and academic or science organisation having the autonomy to assess particular individuals, particular organisations, and their research and other expertise as they see fit. To an extent, universities then compete for reputation, research money and student enrolments. Indeed, in so far as the golden age of scholarly self-government ever existed, perhaps aspects of the private US model remain closer to it than the more centralised university and science systems found elsewhere. The dominance of US universities in international rankings and the leading role the United States plays in a number of research areas suggest, at worst, that this competitive model has not harmed research performance.

    However, US universities themselves can be highly bureaucratically controlled, and scholarly self-government should not be overstated. Public university systems in the United States do have considerable oversight of various types in the various states, and some evaluation models with funding implications include a limited research performance component. Inherent problems with peer review, the use of bibliometric measures, and research and disciplinary culture in general do not disappear in the US model, and may even be intensified. The United States may, of course, have the luxury, due to its size and wealth, of allowing a highly competitive model, ranging from elite universities with massive private endowments down to a long tail of less than perfect organisations, and some that are decidedly substandard.

    But models overlap

    As we will examine, despite often being portrayed as such, none of these three types of assessment are mutually exclusive models. Citation models inherently involve peer review, in terms of acceptance for publication in journals and acceptance of research as worthy of being cited. There will always be ‘qualitative’ and subjective elements to this – although perhaps the community of peers doing the assessment in citation measures is greater than a panel. Some panel review models explicitly use quantitative measures such as citations as evidence of ‘peer esteem’ or research quality, and can be influenced in their assessment of research by journal counts and journal rankings, which also draw on bibliometric measures. Indeed, the Australian ERA explicitly mixes elements from different types of research assessment. Decentralised models use a variety of measures to assess research and other aspects of quality, including citations and publication counts, peer assessment including voting for tenure, and so on. As such, these three supposedly independent methods of assessment are highly interdependent and overlapping. The process and development of science is an inherently social activity. All of these assessments of research involve humans, and as in any situation involving humans, biases, values, interests, politics and power play a part. How much of a part is, of course, highly disputed. As such, the search for a simple, objective measure of research, unalloyed by these issues, is perhaps a vain one. Indeed, it is a potentially harmful misreading of the nature of research and its assessment, and of the limitations of these assessments.

    The philosophy of science and research assessment

    How and what constitutes acceptable, reasonable knowledge differs between disciplines – between the social sciences, the humanities and the natural sciences – as do methods of research, study and methods of persuasion. Disciplines themselves are unstable and change over time, with ‘what is called a discipline… [being] a complex set of practices, whose unity, such as it is, is given as much by historical accident and institutional convenience as by a coherent intellectual rationale’ (Collini, 2001). Indeed, there is a large body of research that examines how science and research have developed over the centuries, their limitations and successes, and how they are (and/or should be) carried out – in the broad fields of the philosophy, history and sociology of science. This literature is often neglected in studies of research assessment, which often treat the process as a technocratic and generally unproblematic one. This neglect can lead to a general naivety and to inferior policy design decisions.

    In particular, the philosophy and the sociology of science focus attention on several key issues of particular relevance for research assessment.

     • First, the highly contested nature of knowledge and its claims to truth.

     • Second, the highly provisional and perhaps temporary nature of knowledge, with it liable to challenge and abandonment over time, albeit within periods of stability.

     • Third, the ways beliefs, and what is accepted knowledge, are formed, and the processes of such in the development of science and research. In particular, the social processes involved in all of these, particularly the power, discipline and high levels of conformity involved.

     • Fourth, the highly contested methodological issues in all disciplines, but particularly in the social sciences. As such, cautions are raised against attempts to provide one method in various fields.

     • Fifth, the problems of finding models of assessment that can apply across highly variable fields that may use different and perhaps mutually incompatible methods. Indeed, despite numerous claims otherwise, there is no one accepted method of doing research, nor one type of science.

     • Sixth, Continental and other philosophers focussed attention on the job-like and imperious nature of some types of research, and questioned some of its overblown claims to rationality and objectivity.

     • Seventh, the difficulty of assessing activities termed ‘research’ in some assessment models, particularly in the arts and to some extent in the humanities, where there may be little basis to do this beyond highly subjective taste.

    There is not necessarily consensus. Views on reality can range from a world (sometimes called radical empiricism or some variant of naive realism) that exists out there with accessible ‘facts’ easily amenable to testing, measurement and/or corroboration of hypotheses, untainted by values, and where results are self-evidently true or not; to other extremes where the truth claims of science are questioned entirely, where the natural and social sciences are seen as no more valid than the ‘knowledges’ of pre-scientific peoples or myths and religion, and where ‘facts’ themselves are largely social and/or power constructions that can serve such things as interests, class and/or gender, floating detached from a ‘real world’ (which may or may not exist). Rationality, research, scholarship and science might be glass bead or linguistic games played within their own rules – games that do not, or may not, correspond to a real world and/or refer only to themselves. Or rationality might be rejected altogether. We are often seen to experience the world filtered through various perceptual, conceptual, linguistic, cultural and/or theoretical lenses, with a distinction drawn between things as they are (assuming there is a real world existing outside our perception at all) and things as they appear to be to us, while our perceptions are subjective and error-ridden. The autonomy of science from wider society, business and the state, both in choosing what to study and in how to interpret and use the findings of research, is debated at length (Feyerabend, 2010; Mirowski, 2004), as is the role of values in initiating and judging research.

    Given the vastness and complexity of these issues, we focus on several influential thinkers to examine them in greater depth, and to draw out their relevance for research assessment.

    Popper and fallibilism

    For much of the early twentieth century, there was a focus on defining a unifying method for science – indeed, on defining what science was and was not – drawing on the methods of logic and experimental methods in science. The mid-twentieth-century ‘logical positivists’ in particular combined logical methods with a strongly held empiricism, and held that only those statements that could be verified by evidence were meaningful. It is a widely held, but not unanimous, belief that this endeavour to find a unifying method failed.

    A contemporary and critic of the logical positivists, the highly influential Austrian-born philosopher Karl Popper, proposed that science could be distinguished not by the verification of its statements or theories through the accumulation of facts and cases (such as through the inductive methods proposed by the logical positivists), but instead by its potential for ‘falsifiability’ (Popper, 2002 [1959]). This was the notion that science was distinguished by its ability to be proved wrong by empirical research. The claims of some fields, like religion or Freudian psychoanalysis, are generally irrefutable and are largely articles of faith, and hence at best pseudo- or non-science. Induction itself was rejected as a basis of truth or certain knowledge: that all swans have been white (until now) does not mean the next one will not be a black swan from Western Australia. Knowledge itself was not simply derived from empirical experience, but was shaped by conjecture and theory, and Popper proposed bold hypothesising about the world, to be tested through empirical work. Knowledge and theories could never be entirely certain and as such were only provisional, but their survival through testing, and their ‘fruitfulness’, increased confidence in their usefulness; however, this could only be an inconclusive and fallible confidence. Whether statements and theories that survived such serious tests could be said to be ‘true’ is contested (Andersson, 2009; Gorton, 2006; Musgrave, 1993; 2009; Psillos, 2000; 2003). Popperian and similar approaches are often termed fallibilism.

    Such a division of studies into science and non-science has particular implications for fields not considered science in this formulation, particularly in the context of research assessment. How does one assess faith? What and/or who decides which non-scientific disciplines are meaningful, and what constitutes research in these instances? Where the term ‘research’ might be applied to artistic or cultural or humanistic works, as in some assessment models, might assessment simply be a highly subjective aesthetic decision? Or just taste? Popper himself noted that while ‘metaphysical’ notions and persuasive texts or interpretations were unfalsifiable and hence unscientific, they could still be subject to critical debate and perhaps fit with other knowledge, and as such contain a degree of plausibility (Gorton, 2006). To put this more broadly, non-scientific disciplines could still be rational ones.

    While a powerful notion with strong normative appeal, and something that is at least given lip service in the physical and other sciences, falsifiability is sometimes seen as lacking as an account of how natural science, let alone other disciplines, is carried out. Historians and sociologists of science in particular developed accounts of how scientific and other ideas came to be accepted, and questioned whether falsifiability was widely used in the processes of natural science (Feyerabend, 2010; Kuhn, 1996). Ironically, falsification seems to be perceived by some practitioners (or at least journal referees) as a failing of research, and negative or ambiguous findings that go against the predominant theoretical orthodoxy can often be rejected (Mahoney, 1977; Nickerson, 1998). Indeed, as we note in our discussion of publication practices and peer review in Chapter 3, negative findings and refutation of hypotheses can often be given as a reason to refuse to publish an article. In the Duhem-Quine thesis, it was argued that the inter-connectedness of models, theories and statements means it is difficult to test and falsify them in isolation in any event (Weber, 2009). Failure of one theory might be addressed by adjusting other parts of the system. The fallibility of our senses implied it might not be rational simply to let some tests refute well-established theories, at least without very good reason. The complexity and interconnectedness of scientific models, theories, statements and scientific language and concepts was also underlined by a number of Continental (i.e. European and non-analytic) philosophers through the twentieth century. Indeed, statements and theories were seen to draw their meanings from relationships with other theories and statements, and worked together to structure and interpret notions of
