Thursday, September 22, 2016

Q&A with CNI’s Clifford Lynch: Time to re-think the institutional repository?

(A print version of this interview is available here)

Seventeen years ago 25 people gathered in Santa Fe, New Mexico, to discuss ways in which the growing number of e-print servers and digital repositories could be made interoperable. 

As scholarly archives and repositories began to proliferate, a number of issues arose. There was a concern, for instance, that archives would needlessly replicate each other’s content, and that users would have to learn multiple interfaces in order to use them. 
[Photo courtesy of Susan van Hengstum]
It was therefore felt there was a need to develop tools and protocols that would allow repositories to copy content from each other, and to work in concert on a distributed basis.
 

With this aim in mind, those attending the New Mexico event – dubbed the Santa Fe Convention for the Open Archives Initiative (OAI) – agreed to create the (somewhat wordily named) Open Archives Initiative Protocol for Metadata Harvesting, or OAI-PMH for short.

Key to the OAI-PMH approach was the notion that data providers – the individual archives – would be given easy-to-implement mechanisms for making information about what they held in their archives externally available. This external availability would then enable third-party service providers to build higher levels of functionality by using the metadata harvesting protocol.
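
To give a concrete sense of how this was meant to work, here is a minimal sketch (in Python, against a hypothetical repository endpoint – the URL is illustrative only) of the kind of harvest a service provider might run: the harvester calls a repository’s OAI-PMH base URL with the ListRecords verb, reads back simple Dublin Core metadata for each record, and follows resumption tokens until the full result set has been paged through.

```python
# A minimal sketch of an OAI-PMH harvest. The endpoint used below is hypothetical;
# real repositories expose the same verbs and parameters at their own base URLs.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, datestamp, title) for every record the repository exposes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as response:
            root = ET.parse(response).getroot()
        for record in root.iter(OAI + "record"):
            header = record.find(OAI + "header")
            identifier = header.findtext(OAI + "identifier")
            datestamp = header.findtext(OAI + "datestamp")
            title = record.findtext(".//" + DC + "title")  # None for deleted records
            yield identifier, datestamp, title
        # Large result sets are paged: keep following the resumption token until none is returned.
        token = root.findtext(".//" + OAI + "resumptionToken")
        if not token:
            break
        params = {"verb": "ListRecords", "resumptionToken": token}

# Example (hypothetical endpoint):
# for identifier, datestamp, title in harvest("https://repository.example.edu/oai"):
#     print(datestamp, identifier, title)
```

The point to note is how little is asked of the repository: it simply answers these requests with metadata. Everything else – search, aggregation, de-duplication – was expected to be layered on top by third-party service providers.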

The repository model that the organisers of the Santa Fe meeting had very much in mind was the physics preprint server arXiv. This had been created in 1991 by physicist Paul Ginsparg, who was one of the attendees of the New Mexico meeting. As a result, the early focus of the initiative was on increasing the speed with which research papers were shared, and it was therefore assumed that the emphasis would be on archiving papers that had yet to be published (i.e. preprints).
 

However, amongst the Santa Fe attendees were a number of open access advocates. They saw OAI-PMH as a way of aggregating content hosted in local – rather than central – archives. And they envisaged that the archived content would be papers that had already been published, rather than preprints. These local archives later came to be known as institutional repositories, or IRs.

In other words, the OA advocates present were committed to the concept of author self-archiving (aka green open access). The objective for them was to encourage universities to create their own repositories and then instruct their researchers to deposit in them copies of all the papers they published in subscription journals. 

As these repositories would be on the open internet, outside any paywall, the papers would be freely available to all. And the expectation was that OAI-PMH would allow the content from all these local repositories to be aggregated into a single searchable virtual archive of (eventually) all published research.

Given these different perspectives there was inevitably some tension around the OAI from the beginning. And as the open access movement took off, and IRs proliferated, a number of other groups emerged, each with their own ideas about what the role and target content of institutional repositories should be. The resulting confusion continues to plague the IR landscape.

Moreover, today we can see that the interoperability promised by OAI-PMH has not really materialised, few third-party service providers have emerged, and content duplication has not been avoided. And to the exasperation of green OA advocates, author self-archiving has remained a minority sport, with researchers reluctant to take on the task of depositing their papers in their institutional repository. Given this, some believe the IR now faces an existential threat. 

In light of the challenging, volatile, but inherently interesting situation that IRs now find themselves in, I recently decided to contact a few of the Santa Fe attendees and put some questions to them. My first two approaches were unsuccessful, but it was third time lucky when Clifford Lynch, director of the Washington-based Coalition for Networked Information (CNI), agreed to answer my questions.

I am publishing the resultant Q&A today. This can be accessed in the pdf file here.

As is my custom, I have prefaced the interview with a long introduction. However, those who only wish to read the Q&A need simply click on the link at the head of the file and go directly to it. 

12 comments:

Anonymous said...

Part of the problem is that the same few voices are chiming in year after year and the analysis is coming from the think tanks and not enough from those on the ground. If you were to ask IR managers what the barriers are, they might list a finite set of obstacles that could be overcome with some more resources and some clear-headed community-wide coordination. A few include: pushing back against the stranglehold that publishers claim over copyright, permissions, and licensing of scholarly work; largely mediating deposit; and hiring the right people (and enough of them) to do the work in the IR, i.e. those who understand publishing production. What is lacking in the discussion now is what we have to lose if we just throw up our hands and give up on the IR experiment. Allowing the scholarly communication lifecycle to be driven solely by hyper-monetized means does a disservice to authors and to readers. Have we really come to the point of saying: Oh, Elsevier and a few others do it all so well, why should we even bother? They don't really do it very well, they just do it financially successfully. Our values may be skewing toward hyper-monetization of everything we do (IRs, libraries, university presses, all together), but I suspect that we will rue the day when we let go of our small, individualized contributions to the publishing stream. We are seeing the Whole Foods/Walmartization of scholarly communication.

Mike Taylor said...

(Note: I have not read the interview yet, just the blog-post.)

I too have my reservations about institutional repositories; but now seems a very strange time to be predicting their downfall, what with all the new institutional mandates and climbing compliance rates. Five years ago it would have made more sense. (And in five more years, it might again.)

Stevan Harnad said...

Just for the record: I have never said or thought that "the only purpose of the IR is to provide a platform on which researchers can post copies of the papers they have published in subscription journals, thereby freeing them from the 'subscription firewall'."

I said (more often than I would care to recall) that "the primary purpose of the IR is to..." (etc.).

On the contrary, I myself proposed several secondary purposes, including impact metrics and research evaluation (and indeed predicted and preached that once Green OA prevailed, it would force journals to downsize to Fair-Gold OA, with all access-provision and archiving offloaded onto the worldwide network of Green OA IRs).

My argument against confusing IRs with (peer-reviewed) research publishers was peer review itself, which is not the métier either of librarians or IR-managers.

(My disagreements with Cliff go way back, too, but I have no interest in disinterring them.)

Erstwhile Archivangelist

T Scott said...

The quote of mine that you include in the intro was made in the context of a talk I gave at the NASIG meeting in June, titled "Dialectic: The Aims of Institutional Repositories". The title is a riff on a comment Lynch made in the foreword to the recently published book "Making Institutional Repositories Work." In that talk, I track the impact of the contrasting views of IRs as given in the Lynch and Crow papers that you mention, to come up with some recommendations about how to refocus what the role of IRs might usefully be. I've linked to the video of the talk, as well as the transcript and slides, in a blog post here: http://tscott.typepad.com/tsp/2016/09/dialectic-the-future-of-institutional-repositories.html

Tony Ross-Hellauer said...

The calls for a fundamental rethink of repositories are already being answered! See the ongoing work of the COAR next-generation repositories working group: https://www.coar-repositories.org/activities/advocacy-leadership/working-group-next-generation-repositories/

Also, OpenAIRE's long-term future as a sustainable infrastructure for open science in Europe and beyond is increasingly secure. We'll be establishing ourselves as a legal entity within the next 6-12 months.

Richard Poynder said...

Thank you for this, Tony. The vision and priorities page you point to does not strike me as a fundamental rethink of the institutional repository, but as essentially more of the same.

I did interview COAR's Kathleen Shearer in 2014 (here). In doing so, I did not get the sense that COAR had taken a leadership role in rethinking the IR, but rather that it was trying to keep up with what is happening in practice. Consequently, Shearer seemed a little reluctant to define what an IR is, and exactly what role it should play, which should perhaps have been a starting point.

Shearer said "[I]n practice, repository services and infrastructures are diverse and there is a lot of overlap with other systems. Perhaps most significantly, practices and technologies are changing quickly, making it a challenge to concretely define their services. My feeling is that we need to be flexible in the way we conceptualize repositories."

As I see it, such new initiatives as are emerging are coming from commercial publishers, although it is true that these are generally based on projects originally started by the research community. What I guess is key is that publishers have the money to turn ideas into reality (although of course this money came from the research community in the first place!).

That said, I applaud the efforts of organisations like OpenAIRE and COAR and wish them every success. I also wish you good luck in achieving sustainability for OpenAIRE.

David B. Lowe said...

As for the trajectory of IRs, the mission-critical reason that IRs will be with us for the foreseeable future in higher education is not related to OA publishing as such, but instead to ETDs (electronic theses and dissertations). ETD workflows have many moving parts that require approval and buy-in along a chain of scholarly and bureaucratic offices, so once they are agreed upon and put in place, they achieve an instant inertia of their own, having replaced the paper paths that led to them. Documentation is the coin of the realm, but a return to paper is highly unlikely, leaving the IR as a pretty stable bet. ETDs are an anchoring pillar since they enable the imprimatur of every institution of higher learning, namely the terminal degrees conferred. It's the very reason we exist.

As for the trajectory of OA, the Max Planck Society plans are welcome news indeed and I look forward to hearing more about them, but for an operating model of re-engineering the funding arrangement from traditional publishing in one research community to OA at the journal level, we may turn to the related SCOAP3 project, which dates from 2007 and which presumably will serve as the foundational case study for the Max Planck effort. SCOAP3 was recently renewed for another 3-year phase that includes 8 journals. My hope is that, as a next step, we may begin to convince professional society publishers that the same sort of model can work for them.

Anonymous said...

As a publishing researcher I can second the comment by Richard. All this is not really offering a new way; it is more like reacting to the flow. Maybe that has to do with the kind of people working on it: the IR crowd usually comes from the library field, and their job is not to be inventive but to archive and keep stuff safe. No offense meant, quite the contrary.

What bugs me is that platforms that are really on to something, like ScienceOpen or ResearchGate, are in very close cooperation with either publishers or the advertising industry. Neither is a healthy partner for this topic, to put it politely.

I would love to see librarians take a more active role here because these are people I trust.

Unknown said...

(part 1)

“The reports of our death have been greatly exaggerated” (to paraphrase Mark Twain)

Although I agree with some of what Richard Poynder writes in the introduction to his recent interview with Cliff Lynch published on September 22, 2016, I do take exception to a number of the assertions he makes about the current state of IRs, especially his comments that green OA has failed (although this is clearly what the publishers would have us believe).

It is true that repositories have not yet completely fulfilled their potential, and there are efforts to drive the transition to open access through APC-based gold OA. However, this is a critical time for IRs. The global network is now at a point where we have an international mechanism to communicate with each other (COAR) and we are consolidating around a common vision and strategy for repositories.

In the last 3 months I have been traveling extensively in Europe, Latin America and China. All of these regions are investing in repository infrastructure to support open access, are working actively to improve interoperability across regions, and are establishing regional and/or national networks for repositories. In this respect, the United States is an outlier, since it has yet to leverage the strategic value of its institutional repositories through developing a national network. I hope this will change in the near future.

As Poynder alludes to in his introduction, highly centralized systems are far easier to launch, nurture and promote; however, there are significant benefits to a distributed system. It is much less vulnerable to buy-out, manipulation, or failure. Furthermore, a global network, managed collectively by the university and research community around the world, can be more attuned to local values, regional issues and a variety of perspectives. Repositories do have the potential to change scholarly communication, but there is some urgency that we start to build greater momentum now.

Recognizing the current challenges and opportunities for repositories, COAR launched a working group in April 2016 to identify priority functionalities for the next generation of repositories. In this activity, our vision is clearly articulated,

"To position distributed repositories as the foundation of a globally networked infrastructure for scholarly communication that is collectively managed by the scholarly community. The resulting global repository network should have the potential to help transform the scholarly communication system by emphasizing the benefits of collective, open and distributed management, open content, uniform behaviors, real-time dissemination, and collective innovation.”

Ultimately, what we are promoting is a conceptual model, not a technology. Technologies will and must change over time, including repository technologies. We are calling for the scholarly community to take back control of the knowledge production process via a distributed network based at scholarly institutions around the world.

The aim of our next generation repositories working group is to better integrate repositories into the research process and make repositories truly ‘of the web, not just on the web’. Once we do that, we can support the creation of better, more sophisticated value-added services.

Unknown said...

(part 2)

In his comments, Poynder also talks about the lack of full-text content in repositories and cites one example, the University of Florida, which is working with Elsevier to add metadata records. However, one repository does not make a trend, and COAR does not support this type of model. The vast majority of repositories focus on collecting full-text content, and the primary raison d’être of repositories has always been, and remains, to provide access to full-text articles and other valuable research outputs so they can be re-used, maximizing the value and impact of research.

Poynder also mischaracterizes many of the centralized services aggregating repository content, saying they “appear (like SSRN) to be operated by for-profit concerns”. On the contrary, there are numerous examples of not-for-profit aggregators including BASE, CORE, SemanticScholar, CiteSeerX, OpenAIRE, LA Referencia and SHARE (I could go on). These services index and provide access to a large set of articles, while also, in some cases, keeping a copy of the content.

And finally, Poynder’s comments about the current protocol used for interoperability, OAI-PMH, are somewhat misleading. OAI-PMH was a child of its time (1999) and was pretty good at what it was designed to do. However, it is out of date and we need a new approach; the OAI has proposed ResourceSync, based on Sitemaps, for the discovery and synchronization of repository resources. A major outcome of the COAR Next Generation Repositories Working Group will be recommendations about new standards for repository interoperability.
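
To illustrate the difference in approach (a sketch only – the URL below is hypothetical, and the namespace is the standard Sitemap one used by ResourceSync resource lists): rather than answering protocol-specific queries, a repository simply publishes a Sitemap-style resource list, and a consumer reads resource locations and modification dates straight from it.

```python
# A rough sketch of reading a ResourceSync resource list, which is a standard
# Sitemap with ResourceSync extensions. The URL used in the example is hypothetical.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def read_resource_list(resource_list_url):
    """Return (location, last-modified) pairs listed in a resource list."""
    with urllib.request.urlopen(resource_list_url) as response:
        root = ET.parse(response).getroot()
    return [(url.findtext(SITEMAP + "loc"), url.findtext(SITEMAP + "lastmod"))
            for url in root.iter(SITEMAP + "url")]

# Example (hypothetical):
# for loc, lastmod in read_resource_list("https://repository.example.edu/resourcelist.xml"):
#     print(lastmod, loc)
```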

And so, there is an African proverb that I often quote in my presentations about the future of repositories, ‘If you want to go fast, go alone. If you want to go far, go together’. Indeed, it has taken longer than we had anticipated to coalesce around a common vision in a distributed, global environment, but we are now well positioned to offer a viable alternative for an open and community led scholarly communication system.

Kathleen Shearer, Executive Director, COAR

Unknown said...

Speaking as an Open Access (OA) advocate and (disclaimer) someone who works for a repository aggregation service (CORE https://core.ac.uk/ - a non-profit service that caches the aggregated content and maintains a fairly large collection), I felt your introduction painted a rather too gloomy picture of the purpose of repositories and their future. It is agreed that OAI-PMH has disadvantages, but it has served the field well for quite some time now. Having said that, I am very much looking forward to seeing the conclusions of COAR’s next-generation repositories working group. In addition, to me it is dreadful to consider that Gold OA is the future, especially for commercial publishers; it is an expensive route to OA - for some commercial publishers it is even too expensive - and asking taxpayers to sustain it for a long period of time is not to their benefit. We have to accept that commercial companies/publishers will get into the OA arena, acquire a small number of OA products, like SSRN, and perhaps shift their OA character.

Nonetheless, the beauty of repositories, especially institutional ones, lies in the fact that they grow within academic institutions and can be used as live archives for the institution; they can host an institution’s “fruits”: from research papers to course syllabi, and from organisational bureaucratic documents to outdated webpages. They can serve as a portfolio to demonstrate a researcher’s work, as a research impact tool for the university, as a means to text-mine content, and as a tool through which OA content can be discovered by everyone around the world for free. It is my strong belief that we don’t need to abandon repositories; on the contrary, we have to work harder to improve their functionalities based on current needs.

Richard Poynder said...

Many thanks to those who posted the above comments. I have responded to them in a new post here.