BPRECISE : Benchmarking PREservation in a Common InfraStructurE | Remco van Veenendaal | June 2017

ioi_ab's bookmarks 2022-08-12

Summary:

"...Imagine that we develop an open, extensible, standards-based research infrastructure for objective preservation tool testing, benchmarking and establishing community best practices. (The acronym in the title is a first attempt – I couldn’t find the words to form BPRACTICAL. Sorry Jon.) It will stimulate cooperation and help answer research questions like: given a certain set of files, which tools are best able to identify, validate, extract metadata from and/or convert that set, according to your specific needs (profile, policy)? How does this new tool I found/developed compare against existing tools? What is the best suited pipeline of identification, validation, metadata extraction and conversion tools for a certain data set? We could include emulation and other DP tasks by providing a template infrastructure ‘silo’ (see below) for additional tool categories. Moreover, by researching what we have, we will have an opportunity to find where there are gaps in the preservation ecosystem, or where there is room for improvement, and form priorities for new initiatives. To achieve this, one of the things we need is cooperate on a standardised way of starting and sending input to tools, measuring their performance, and storing, comparing and visualising the results. This requires a community standardisation effort. No, we don’t have to redesign the tools themselves, but we do have to develop standardised APIs or e.g. WSDL documents, define some XML output collection format and implement comparison and visualisation solutions. Good news The good news is that we already have tools, services like PRONOM and Wikidata, and test corpora (although there’s no data like more data). We have the experience of proposing and executing projects like PLANETS, SCAPE, E-ARK and BenchmarkDP, with e.g. PREFORMA and PERICLES still running. We can build services, workflow systems and (commercial) repositories that use the tools. 
We also have tool sets like FITS that harmonise tool output to FITS-XML, and tools like c3po that can gather, profile and visualise FITS-XML. We know of EUDAT and the Research Data Alliance. Several OPF members are active in the field of digital humanities, where they have a distributed research infrastructure called CLARIN, with virtual workspaces where you can fiddle with and pipeline tools. All this knowledge and technology can be reused to build our DP research infrastructure...."

Link:

https://openpreservation.org/blogs/bprecise-benchmarking-preservation-in-a-common-infrastructure/

Updated:

08/12/2022, 07:20

From feeds:

[IOI] Open Infrastructure Tracking Project » ioi_ab's bookmarks

Tags:

preservation recommendations standards

Date tagged:

08/12/2022, 11:20

Date published:

08/12/2021, 07:20