Thursday, August 26, 2010

Open Data: The Panton Discussions

If you are interested in the Open Data (OD) movement but unclear about the issues, or what scientists can do to support the movement, what better way of finding out than by talking to leading OD advocates Peter Murray-Rust of the University of Cambridge and Jordan Hatcher of Open Data Commons?

That was what I did last Tuesday as part of a new initiative called the Panton Discussions. The first in a planned series, the event lasted around two hours and took place in the Panton Arms in Cambridge.

Below is a sample of the kind of questions discussed:

* What is Open Data and why is the OD movement important? What is the problem it aims to fix?

* Amongst the OD tools available there is the Public Domain Dedication and Licence (PDDL), a process of Public Domain Dedication and Certification (PDDC), and Creative Commons Zero (CC0). What are these tools, how do they work, and how do they differ?

* Likewise, there is the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition. How do these differ? Why do we need two similar initiatives?

* More recently we have also seen the introduction of The Panton Principles. What does this initiative provide that was not available before?

* Where does Open Data fit with Open Access (OA)?

* Where does Open Science fit in?

* What about Open Notebook Science (ONS)? Where does OD fit with ONS?

* How should scientists go about making their data open? What pitfalls do they need to avoid?

Help sought

Peter hopes to crowdsource the creation of a transcript of the discussion. Jamaica Jones and Graham Steel have both kindly offered to help, but more volunteers would make the task easier, and quicker. Peter can be contacted by email here.

Thursday, August 12, 2010

Preserving the Scholarly Record: Interview with digital preservation specialist Neil Beagrie

One of the many challenges of our increasingly digital world is that of establishing effective ways of preserving digital information — which is far more fragile than printed material. What are the implications of this for the scholarly record, and where does Open Access (OA) fit into the picture?

In a 1999 report for the Council on Library and Information Resources, Jeff Rothenberg, a senior research scientist at the RAND Corporation, pointed out that while we were generating more and more digital content each year, no one really knew how to preserve it effectively. If we didn't find a way of doing it soon, he warned, "our increasingly digital heritage is in grave risk of being lost."

In launching the UK Web Archive earlier this year British Library chief executive Dame Lynne Brindley estimated that the Library would only be able to archive about one per cent of the 8.8 million .co.uk domains expected to exist by 2011. The remaining 99 per cent, she said, was in danger of falling into a "digital black hole".

In the context of Rothenberg's earlier warning Brindley's comment might seem to suggest that very little has changed in the past eleven years so far as digital preservation is concerned. But that would be the wrong conclusion to reach. Rather, it draws attention to the fact that digital preservation is not just a technical issue.

As it happens, many of the technical issues associated with digital preservation have now been resolved. In their place, however, a bunch of other issues have emerged — including legal, organisational, social, and financial issues.

What concerns Brindley, for instance, are not the technical issues associated with archiving the Web, but the undesirable barrier that today's copyright laws impose on anyone trying to do so. Since copyright requires obtaining permission from the owner of every web site before archiving it, the task is time-consuming, expensive, and quite often impossible.

Clearly there are implications here for the research community.

State of play

So what is the current state of play so far as preserving the scholarly record is concerned?

First we need to distinguish between two different categories of digital information. There is retro-digitised material, which in the research context consists mainly of data created as a result of research libraries digitising their print holdings — journals, books, theses, special collections etc. Then there is born-digital material — which includes ejournals, eBooks and raw data produced during the research process.

It is worth noting that the quantities of raw data generated by Big Science can be mind-boggling. In the case of the Large Hadron Collider, for instance, CERN expects that it will generate 27 terabytes of raw data every day when it is running at full throttle — plus 10 terabytes of "event summary data".

To cater for this deluge CERN has created a bespoke computing grid called the WLCG. While the costs associated with the WLCG will be shared amongst 130 computing centres around the world, the personnel and materials costs to CERN alone reached 100 million Euros in 2008, and CERN's budget for the grid going forward is 14 million Euros per annum.

Of course, these figures by no means represent preservation costs alone, and they are not typical — but they provide some perspective on the kind of challenges the science community faces.

So how is the research community coping with the challenges? With the aim of finding out the Alliance of German Science Organisations recently commissioned a report (which was published in February).

What were the main findings?

So far as retro-digitisation is concerned, the Report points out that funding is limited and "the quantity of non-digitised material is huge". Even so, it adds, there is general concern about "the sustainability of hosting" the data that has been generated from digitisation. This is a particular concern for small and medium-sized institutions.

With regard to born-digital material the Report found that the largest gaps are currently in the "provision for perpetual access for e-journals".

The situation with regard to eBooks and databases is less clear since, as the Report points out, "experience in digital preservation with these content types is currently more limited."

While the Report focused on the situation in Germany, the international nature of today's research environment suggests the situation will be similar in all developed nations (although Germany does have two unique mass digitisation centres).

We should not be surprised that the German Report found the largest gap to be in the preservation of journal content. As we shall see, the migration from a print to a digital environment has disrupted traditional practices and responsibilities, and led to some uncertainty about who is ultimately responsible for preserving the scholarly record.

We should also point out that one important area that the German Report did not look at is the growing trend for scholars to make use of blogs, wikis, open notebooks and other Web 2.0 applications. Should this data not be preserved? If it should, whose responsibility is it to do it, and what peculiar challenges does it raise? As we have seen, for instance, preserving web content is not a technical issue alone. Amongst other things there are copyright issues (although as the research community starts to use more liberal copyright licences these difficulties should ease somewhat).

Another recently published report did look at the issue of web-created scholarly content, but reached no firm conclusion. Produced by the Blue Ribbon Task Force, this Report concluded: "[I]n scholarly discourse there is a clear community consensus about the value of e-journals over time. There is much less clarity about the long-term value of emerging forms of scholarly communication such as blogs, products of collaborative workspaces, digital lab books, and grey literature (at least in those fields that do not use preprints). Demand may be hypothesised — social networking sites should be preserved for future generations — but that does not tell us what to do or why."

Open Access

One issue likely to be of interest to OA advocates is whether institutional repositories should be expected to play a part in preserving research output.

Evidence cited by the German Report suggests that repositories are not generally viewed as preservation tools. It pointed out, for instance, that the Dutch National Library's KB e-Depot currently archives the content hosted in 13 institutional repositories in the Netherlands.

The Blue Ribbon Report, by contrast, appears to believe that repositories do have a long-term archiving role. It suggests, for instance, that self-archiving mandates should always be accompanied by a "preservation mandate".

The Report goes on to suggest that the inevitable additional costs associated with repository preservation should be taken out of the institution's Gold OA fund (where such a fund exists).

##

If you wish to read the rest of this introduction, and the interview with preservation specialist Neil Beagrie, please click on the link below. I am publishing it under a Creative Commons licence, so you are free to copy and distribute it as you wish, so long as you credit me as the author, do not alter or transform the text, and do not use it for any commercial purpose.

If you would like to republish the interview on a commercial basis, or have any comments on it, please email me at richard.poynder@btinternet.com.

To read the rest of the introduction and the interview with Neil Beagrie (as a PDF file) click here.

Monday, August 02, 2010

University of Ottawa Press launches OA book initiative

Last week the University of Ottawa Press (UOP) announced a new open access (OA) book initiative. This, it says, will provide "free and unrestricted access to scholarly research". But what does it mean in practice? And what issues arise?

UOP's new initiative is part of a wider open access strategy first unveiled last December. Initially it will consist of making 36 French-language and English-language in-print titles in the arts, humanities and social sciences freely available online via the University of Ottawa's institutional repository (IR), uO Research.

The UOP news is of interest for a couple of reasons.

First, until relatively recently open access was seen as an issue of relevance only to scholarly journals, not books, and for the sciences rather than the humanities.

It is only in the last few years, for instance, that new OA publishers like Bloomsbury Academic, Open Humanities Press (OHP), and re.press have appeared on the scene; and only recently that traditional publishers and university presses have started to introduce OA book initiatives — e.g. The University of Michigan Press' digitalculturebooks project and Penn State University Press' Romance Studies.

Second, unlike Bloomsbury Academic, OHP, re.press, and the University of Michigan, UOP has not released its OA books under creative commons licences, but simply placed the text in a PDF file with the original "all rights reserved" notice still attached to it. (E.g. in this 24 MB file).

UOP's move suggests that traditional presses can no longer afford to ignore the rising OA tide — despite the fact that there is still no tried and trusted business model for OA books. It also demonstrates that there is as yet no consensus on how best to go about it, or what to do about copyright.

The latter issue could prove a source of some confusion for readers of UOP's books.

Libre/Gratis

For instance, anyone who read UOP's announcement that it is providing its books on a "free and unrestricted access" basis, and then downloaded one of the books, would surely scratch their head when they saw the "all rights reserved" notice attached to it.

While they could be confident that they were free to read the book, they might wonder whether they were permitted to forward it to a colleague. They might also wonder whether they were free to print it, whether they could cut and paste text from it, or whether they were permitted to create derivative versions.

Free and unrestricted access would seem to imply they could do all those things. All rights reserved suggests quite the opposite — indeed, a copyright lawyer might argue that even downloading a book infringes an all-rights licence.

It does not help that there appears to be no terms and conditions notice on the UOP web site clarifying what readers can and cannot do with the books — as there is, for instance, on PSU's Romance Studies site.

In fact, UOP is only granting permission for people to read, download and print the books.

But it need not be that confusing. OA comes in different flavours, and what UOP is offering is what OA advocates call Gratis OA (that is, it has removed the price barriers); it is not offering Libre OA (which would require removing permission barriers too — i.e. relaxing the copyright restrictions).

Gratis OA is a perfectly legitimate way of providing OA, so long as you make it clear that that is what you are offering. Some, however, might argue that there is a contradiction between what UOP says it is offering and the true state of affairs — that the publisher is claiming to offer something that it is not.

"While there's nothing deceptive in using the term 'OA' for work that is Gratis OA, there is something deceptive in using language suggesting Libre OA for work that is Gratis OA," the de facto leader of the OA movement Peter Suber commented when I asked for his views. Stressing that he has not yet looked at the details of the UOP initiative, Suber added: "The phrase 'unrestricted access' suggests Libre OA."

There is no reason to doubt UOP's motives: It believes that using the term free and unrestricted is accurate given that the OA books do not come with DRM, and "any user with a computer can access the books, download them and read them freely".

Nevertheless, it does seem to be sending out a confusing message. And when putting content online publishers should really aim to be as precise as possible in the terms they use, and the claims they make — particularly in light of the many copyright controversies that have arisen in connection with digital content. UOP has surely failed to do this.

Suber would perhaps agree. "I realise that most people aren't familiar with the Gratis/Libre distinction", he emailed me. "But at the same time, people who do understand the distinction should use it, and could help everyone by describing the Ottawa position accurately. If it's Gratis and not Libre (which I haven't had time to check), then it should be described as Gratis."

We might however add that some OA advocates believe Gratis OA to be an inadequate way of making research available online. And it is noteworthy here that in his definition of OA, Suber assumes Libre OA to be the default. Open access literature, he states, is "digital, online, free of charge, and free of most copyright and licensing restrictions."

Some would doubtless claim that worrying about such matters is a non-issue. After all, they might say, aside from reading it, what more could you possibly want to do with a book? So why does it matter whether you make it available online as Gratis OA or Libre OA?

But a few years ago exactly this issue led to some heated debates in connection with making scholarly papers OA, with many insisting that worrying about such matters was a complete irrelevancy — until Peter Murray-Rust pointed out that in a Web 2.0 environment there are very good reasons for providing re-use rights to scholarly work.

Indeed, it was as a result of that long-running debate that the movement eventually hammered out the Gratis/Libre distinction.

It is, of course, early days for book publishers, who are still in experimental mode vis-à-vis OA. But they would surely benefit from reviewing some of the debates that have taken place in connection with providing OA to refereed papers.

In order to get UOP's views on these matters I contacted the publisher's eBook Coordinator Rebecca Ross, who kindly agreed to an email interview. Below are her answers.

The good news is that UOP does hope to adopt creative commons licences in the future!

Rebecca Ross, UOP eBook Coordinator

RP: I understand that UOP has made 36 of its books available on an OA basis and these can be accessed via the University's institutional repository. Is there a list of these books you can point me to?

RR: You can browse the books by title here.

RP: Where can people obtain more information about UOP, and its activities?

RR: Unfortunately the UOP website is in a state of transition with a new website launching very soon. To give you a bit of background about UOP, we are Canada's oldest French-language university press and the only fully bilingual (English-French) university press in North America.

RP: How many books does UOP publish each year, and what kinds of books does it publish?

RR: UOP was founded in 1936 and has published over 800 titles. We currently publish 25-30 books annually in four main subject areas: social and cultural studies, translation and interpretation, literature and the arts, and political and international affairs.

RP: How did you choose which books to make OA?

RR: The books were chosen based on input from Michael O'Hearn (UOP Director), Eric Nelson (Acquisitions Editor), Marie Clausén (Managing Editor), Jessica Clark (Marketing Manager) and myself as a collaborative process to determine a collection of books that are diverse in terms of language, date published, and subject matter.

This will help UOP best determine the books that work effectively as open access. For example, we want to test questions like: does an 800 page collected work about social policy work better as OA than a monograph about Canadian literature?

We also wanted to test if an electronic open access version gives a second life to the print edition or generates interest in a second edition. In this sense, open access is also a marketing tool for us to reach a wider audience than traditional marketing.

In our decision process we also made sure to include books whose authors would be amenable to licensing their work open access (we have several authors who are very excited by having their work as open access!), and to include topics that are relevant, timely and even timeless (for example a reappraisal of Stephen Leacock's work).

Free and unrestricted access

RP: The UOP press release says that the books are being made available on a "free and unrestricted access" basis. What does that mean?

RR: All of the books included in the open access collection are protected by copyright. UOP does not support DRM or restrictive access to our eBooks, whether they are part of the open access collection or for sale.

RP: The books are not being made available under Creative Commons licences, are they?

RR: The books are not under Creative Commons. The authors granted a non-exclusive distribution license to the Press for providing access via uO Research.

RP: Many OA advocates might argue that OA implies using creative commons licensing. You don't agree?

RR: Where possible, UOP is very interested in moving forward with creative commons licensing. We're learning from our colleagues at Athabasca University Press, who publish, where possible, using a Creative Commons license: (Attribution-Noncommercial-No Derivative Works 2.5 Canada).

The decision to use the current licensing model was made to best align UOP with the University of Ottawa Library and the University’s institutional repository uO Research.

RP: I do not think it says anywhere on your site exactly what users can do with the books. Anyone downloading the files will see a traditional "all rights reserved" notice attached. The UOP announcement, however, says that the works are available on a free and unrestricted access basis. Readers might therefore wonder what exactly they are permitted to do with the text: whether, for instance, they can print them out, whether they can freely copy and distribute them, whether they can cut and paste text from them, and whether they can create derivative versions. What exactly can they do?

RR: So far as the open access collection is concerned users can read them, download them and print them.

Without DRM we are unable to control what exactly users do with the books, but as I said, they are protected by copyright.

In the end, we are pleased that users are accessing our content and our authors are pleased that their research is reaching a wider audience.

RP: Would you agree that you are offering the books Gratis OA rather than Libre OA? That is, you have removed the price barriers, but not the permission barriers?

RR: Describing UOP's open access collection (as it is now) as Gratis OA rather than Libre OA is both fair and accurate. We've removed the price barrier as a first step; the next step will be working with our authors and editors to remove the permission barriers.

RP: Do you not think that saying the books are being offered on a free and unrestricted access basis might be a slight overstatement? Does not "unrestricted access" imply the removal of both price and permission barriers?

RR: When compared to print books offered at sometimes very high and restrictive prices and made available only in certain parts of the world, I don't think the description of "free and unrestricted access" is an overstatement.

UOP's open access books are free, and their access is unrestricted: any user with a computer can access the books, download them and read them freely. At this stage UOP open access books are protected by copyright: this is partly for us and partly for our authors.

Once UOP's open access program has been fully defined and the level of support we will receive from our host institution is determined, we will be in a better position to remove permission barriers.

In conceptualising UOP's open access program the first objective was to provide a wider reach for our authors and books. Most of our authors write, not to make a living, but to further scholarship and research in their fields; allowing their work to be distributed for free is an excellent way to do so.

As I said, the next step for UOP's open access program will be working in collaboration with our authors and the University of Ottawa Library to remove the remaining permission barriers. We are looking into Creative Commons and defining what it means to offer UOP books as open access.

Right now we are very excited to be involved with open access and looking forward to the next steps of the project.

RP: Would you say you were offering the books as Green OA or Gold OA, or do such distinctions only make sense in the context of journals?

RR: As it stands right now, these distinctions seem appropriate only in the context of journals.

If I had to make the distinction I would say we fall into Green OA because of our participation in uO Research, the University of Ottawa's institutional repository.

When preparing and researching for the open access program we found that much of the literature is about open access for journals and many university presses, both in Canada and the United States, are just starting to think about how open access can work for books.

Still at an early stage

RP: Does UOP believe that OA is an inevitable development for scholarly monographs?

RR: The University of Ottawa announced its open access program in late 2009. This includes support to UOP in publishing a collection of OA books. Although there is much research surrounding open access in academic journals, open access book publishing is still at an early stage.

UOP launched this open access collection to determine the effects of open access on our publishing program, to eventually determine what kind of support we require to become an open access press.

It would be difficult to say with certainty that OA is now an inevitable way for scholarly monographs to be published in the future, but it does appear that way and UOP is interested in testing and researching this notion.

At this stage, it is UOP's assumption that open access will only suit certain books; for example, we are not including any textbooks in the open access collection. However, this assumption is based on previously published books, and going forward open access will be an important aspect of UOP's acquisition procedure and publishing program.

RP: When I interviewed Northwestern University Dean Sarah Pritchard about Northwestern University Press earlier this year I suggested that the model many advocates see for OA books is that of making the text freely available online but selling the print version. Pritchard replied that she saw that as a very logical model, and one that she envisages NUP adopting before it moves to a totally OA environment. Is that your view too?

RR: Absolutely. We are in the business of publishing books both in print and electronically. At the moment we are borrowing models and ideas from many of our colleagues within Canada and the US, including Athabasca University Press and the International Development Research Centre.

Sarah Pritchard brings forward many important issues that are relevant to us at UOP. I do believe that electronic versions of print books will drive print sales — that assumption is the backbone of our open access program.

The wider the distribution an author or a publisher has, the better the chance for course adoptions, sales and even translation rights. The model UOP has adopted is a hybrid model: we are doing a bit of everything right now, and we will continue experimenting to see what fits best.

RP: Does UOP pay its way today, or is it subsidised by the University? Can you see OA affecting the current state of affairs?

RR: UOP is subsidised by the University. Our publications are too specialised to make the best-seller list; alas, we will never become a cash cow for our home institution! This is a bit of an experiment.

We don't know if OA will have a negative effect on the sales of the print version or if it will encourage people to buy the print version, especially in the case of single-authored volumes containing long and complex arguments — books like these are likely easier to read in the traditional paper format than on a computer screen.

In either case the level of support the University provides its Press will change accordingly.