Thursday, January 24, 2008

Session: NorStore

NorStore - A National Distributed Storage Infrastructure for Research & Education
By Jan Meijer

Norway is a mountainous, rocky country, so it is hard to dig and lay cable. There are 4 high performance computing (HPC) locations.

Why? To separate long-term storage from HPC: information can be lost when computers are replaced. There is a need for a strategy, policy, and practice regarding the creation, management, and long-term storage of data.

Trends: The size of data has increased to terabyte scale. IT tools evolve rapidly, and the flexibility in using these tools puts the very data they create and transform at risk. Survival of digital scientific information depends on a hierarchy of constantly shifting technologies.

Reasons to keep data:
- retention of unique observational information which is impossible to recreate
- retention of expensively generated data which is cheaper to maintain than to recreate
- reuse of data for new or future research purposes
- validate and account for publicly funded research
- for compliance with legal requirements
- for educational and teaching purposes

Objective: establish and maintain a broad and sustainable infrastructure for the curation, archiving,... [?]

- operate storage resources and peripheral equipment
- provide support to researchers needing storage capacity, digital repositories, and curation services
- promote a set of standard services and establish best practices and policies that aim to improve the reuse and reusability of scientific data
- provide easy, secure, and transparent access to distributed storage resources, and provide access to larger aggregate storage

Non-technical issues: security, confidentiality and continued privacy, ownership, assured provenance, authenticity, and integrity. How to guarantee the quality of the primary data and associated metadata? One solution to some of this is to handle only open-access data, so that questions of ownership, etc. go away.

Initial consortium: the universities in Bergen, Trondheim (NTNU), Oslo, and Tromsø, plus UNINETT (the NREN) and UNINETT Sigma

2007: initial specs, choice of technologies, levels of curation, roadmap for 2008, and accumulation of experience.
2008: investment in hardware for initial infrastructure.

Initially two locations with 600 TB each.
