
Journal of Digital Information

The Digital Assets Repository (DAR) is an Institutional Repository developed at the Bibliotheca Alexandrina to manage the full lifecycle of a digital asset: its creation and ingestion, its metadata management, and its storage and archival, in addition to the necessary mechanisms for publishing and dissemination. DAR was designed with a focus on integration with different sources of digital objects and metadata, as well as with applications built on top of the repository. As a modern repository, the system architecture demonstrates a modular design built from best-of-breed components and a flexible content model for digital objects that is based on current standards and relies heavily on RDF triples to define relations. In this paper we present the building blocks of DAR as an example of a modern repository, discussing how the system addresses the challenges an institution faces in consolidating its assets, with a focus on solving scalability issues.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5396 2014/08/14 - 22:24
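
To illustrate the kind of relation modelling the DAR abstract above describes, the following is a minimal Python sketch using rdflib. It is not DAR's actual data model: the repository namespace, object identifiers, and predicates are assumptions chosen only to show how RDF triples can link a digital object to its collection and to a derived file.

```python
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import DCTERMS, RDF

# Hypothetical repository namespace; DAR's real vocabulary may differ.
REPO = Namespace("http://example.org/repo/terms#")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("repo", REPO)

book = URIRef("http://example.org/repo/objects/book-0001")
scan = URIRef("http://example.org/repo/objects/book-0001/page-001.tiff")
collection = URIRef("http://example.org/repo/collections/rare-books")

# Relations between digital objects are expressed as plain RDF triples.
g.add((book, RDF.type, REPO.DigitalAsset))
g.add((book, DCTERMS.title, Literal("Example digitized book")))
g.add((book, DCTERMS.isPartOf, collection))
g.add((scan, DCTERMS.isFormatOf, book))  # derivative file points back to the asset

print(g.serialize(format="turtle"))
```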

As more institutions continue to work with large and diverse types of content for their digital repositories, there is an inherent need to evaluate, prototype, and implement user-friendly websites, regardless of the digital files' size, format, location, or the content management system in use. This article aims to provide an overview of the need for, and current development of, Document Viewers for digitized objects in DSpace repositories, including a local viewer developed for a newspaper collection and four other viewers currently implemented in DSpace repositories. According to the DSpace Registry, 22% of institutions are currently storing "Images" in their repositories and 21% are using DSpace for non-traditional IR content such as: Image Repository, Subject Repository, Museum Cultural, or Learning Resources. The combination of current technologies such as Djatoka Image Server, IIPImage Server, DjVu Libre, and the Internet Archive BookReader, as well as the growing number of digital repositories hosting digitized content, suggests that the DSpace community would benefit from an "out-of-the-box" Document Viewer, especially one for large, high-resolution, multi-page objects.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5600 2014/08/14 - 22:24

In his oft-quoted seminal paper ‘Institutional Repositories: Essential Infrastructure For Scholarship In The Digital Age’, Clifford Lynch (2003) described the Institutional Repository as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.” This paper seeks instead to define the repository service at a more primitive level, without the specialism of being an ‘Institutional Repository’, and looks at how it can be viewed as providing a service within appropriate boundaries, what that could mean for the future development of repositories and our expectations of what repositories should be, and how they could fit into the set of services required to deliver an Institutional Repository service as described by Lynch.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5872 2014/08/14 - 22:24

Chempound is a new-generation repository architecture based on RDF, semantic dictionaries and linked data. It has been developed to hold any type of chemical object expressible in CML and is exemplified by crystallographic experiments and computational chemistry calculations. In both examples, the repository can hold more than 50,000 entries, which can be searched via SPARQL endpoints and pre-indexing of key fields. The Chempound architecture is general and adaptable to other fields of data-rich science.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5873 2014/08/14 - 22:24
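
As a rough illustration of the SPARQL-based search mentioned above, here is a small self-contained sketch that queries an in-memory RDF graph with rdflib. The vocabulary (a hypothetical chem: namespace with formula and meltingPoint properties) is invented for the example and is not Chempound's actual schema; a real deployment would query the repository's SPARQL endpoint instead.

```python
from rdflib import Graph, Namespace, Literal, URIRef

CHEM = Namespace("http://example.org/chem#")  # hypothetical vocabulary

g = Graph()
g.add((URIRef("http://example.org/entry/1"), CHEM.formula, Literal("C6H6")))
g.add((URIRef("http://example.org/entry/1"), CHEM.meltingPoint, Literal(5.5)))
g.add((URIRef("http://example.org/entry/2"), CHEM.formula, Literal("H2O")))
g.add((URIRef("http://example.org/entry/2"), CHEM.meltingPoint, Literal(0.0)))

# Find entries whose melting point exceeds 1 degree Celsius.
query = """
PREFIX chem: <http://example.org/chem#>
SELECT ?entry ?formula WHERE {
    ?entry chem:formula ?formula ;
           chem:meltingPoint ?mp .
    FILTER(?mp > 1)
}
"""
for row in g.query(query):
    print(row.entry, row.formula)
```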

The Penn State University Libraries and Information Technology Services (ITS) collaborated on the development of Curation Architecture Prototype Services (CAPS), a web application for ingest and management of digital objects. CAPS is built atop a prototype service platform providing atomistic curation functions in order to address the current and emerging requirements in the Libraries and ITS for digital curation, defined as “... maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle” (Pennock, 2006)[7]. Additional key goals for CAPS were application of an agile-style methodology to the development process and an assessment of the resulting tool and stakeholders’ experience in the project. This article focuses in particular on the community-building aspects of CAPS, which emerged from a combination of agile-style approaches and our commitment to engage stakeholders actively throughout the process, from the construction of use cases, to decisions on metadata standards, to ingest and management functionalities of the tool. The ensuing community of curatorial practice effectively set the stage for the next iteration of CAPS, which will be devoted to planning and executing the development of a production-ready, enterprise-quality infrastructure to support publishing and curation services at Penn State.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5874 2014/08/14 - 22:24

In the realm of digital data, vendor-supplied cloud systems will still leave the user with responsibility for curation of digital data. Some of the very tasks users thought they were delegating to the cloud vendor may be a requirement for users after all. For example, cloud vendors most often require that users maintain archival copies. Beyond the better known vendor cloud model, we examine curation in two other models: in-house clouds, and what we call "open" clouds, which are neither in-house nor vendor. In open clouds, users come aboard as participants or partners, for example by invitation. In open cloud systems users can develop their own software and data management, control access, and purchase their own hardware while running securely in the cloud environment. To do so will still require working within the rules of the cloud system, but in some open cloud systems those restrictions and limitations can be worked around easily with surprisingly little loss of freedom. It is in this context that REDDnet (Research and Education Data Depot network) is presented as the place where the Texas Tech University (TTU) Libraries have been conducting research on long-term digital archival storage. The REDDnet network by year's end will be at 1.2 petabytes (PB), with an additional 1.4 PB for a related project (Compact Muon Solenoid Heavy Ion [CMS-HI]); additionally, there are over 200 TB of tape storage. These numbers exclude any disk space which TTU will be purchasing during the year. National Science Foundation (NSF) funding covering REDDnet and CMS-HI was in excess of $850,000, with $850,000 earmarked for REDDnet. In the terminology we used above, REDDnet is an open cloud system that invited TTU Libraries to participate. This means that we run software which fits the REDDnet structure. We are beginning to complete the final design of our system and starting to move into the first stages of construction, and we have made a decision to move forward and purchase one-half petabyte of disk storage in the initial phase. The concerns, deliberations and testing are presented here along with our initial approach.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5875 2014/08/14 - 22:24

The UK JISC-funded Content Lifecycle Integration Framework (CLIF) project has explored the management of digital content throughout its lifecycle from creation through to preservation or disposal. Whilst many individual systems offer the capability of carrying out lifecycle stages to varying degrees, CLIF recognised that only by facilitating the movement of content between systems could the full lifecycle take advantage of systems specifically geared towards different stages of the digital lifecycle. The project has also placed the digital repository at the heart of this movement and has explored this through carrying out integrations between Fedora and Sakai, and Fedora and SharePoint. This article will describe these integrations in the context of lifecycle management and highlight the issues discovered in enabling the smooth movement of content as required.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5876 2014/08/14 - 22:24

The paper describes the investigations and outcomes of the JISC-funded Kindura project, which is piloting the use of hybrid cloud infrastructure to provide repository-focused services to researchers. The hybrid cloud services integrate external commercial cloud services with internal IT infrastructure, which has been adapted to provide cloud-like interfaces. The system provides services to manage and process research outputs, primarily focusing on research data. These services include both repository services, based on use of the Fedora Commons repository, and common services such as preservation operations that are provided by cloud compute services. Kindura is piloting the use of DuraCloud, open source software developed by DuraSpace, to provide a common interface for interacting with cloud storage and compute providers. A storage broker integrates with DuraCloud to optimise the usage of available resources, taking into account such factors as cost, reliability, security and performance. The development is focused on the requirements of target groups of researchers.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5877 2014/08/14 - 22:24
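
The storage broker described above is, at its core, a weighted selection among providers. The sketch below is a purely illustrative Python stand-in, not Kindura's or DuraCloud's actual interface; the provider names, criteria and weights are made up to show how such a broker might rank storage targets by cost, reliability and performance.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    cost_per_gb: float   # lower is better
    reliability: float   # 0..1, higher is better
    performance: float   # 0..1, higher is better

def score(p: Provider, weights=(0.5, 0.3, 0.2)) -> float:
    """Combine the criteria into a single score (higher is better)."""
    w_cost, w_rel, w_perf = weights
    return w_cost * (1.0 / (1.0 + p.cost_per_gb)) + w_rel * p.reliability + w_perf * p.performance

providers = [
    Provider("commercial-cloud-a", cost_per_gb=0.10, reliability=0.99, performance=0.8),
    Provider("institutional-storage", cost_per_gb=0.02, reliability=0.95, performance=0.6),
]

best = max(providers, key=score)
print(f"Broker would place new content on: {best.name}")
```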

Open data is becoming increasingly important in research. While individual researchers are slowly becoming aware of its value, funding agencies are taking the lead by requiring that data be made available, and also by requiring data management plans to ensure the data is available in a usable form. Some journals also require that data be made available; however, in most cases, “available upon request” is considered sufficient. We describe a number of historical examples of data use and discovery, then describe two current test cases at the University of New Mexico. The lessons learned suggest that an institutional data services program needs not only to facilitate fulfilling the mandates of granting agencies but also to realize the true value of open data. Librarians and institutional archives should actively collaborate with their researchers. We should also work to find ways to make open data enhance a researcher's career. In the long run, better quality data and metadata will result if researchers are engaged and willing participants in the dissemination of their data.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5878 2014/08/14 - 22:24

In 2008 the University of Hull, Stanford University and the University of Virginia decided to collaborate with Fedora Commons (now DuraSpace) on the Hydra project. This project has sought to define and develop repository-enabled solutions for multiple digital content management needs, solutions that are multi-purpose and multi-functional in such a way as to allow their use across multiple institutions. This article describes the evolution of Hydra as a project, but most importantly as a community that can sustain the outcomes from Hydra and develop them further. The data modelling and technical implementation are touched on in this context, and examples of the Hydra heads in development or production are highlighted. Finally, the benefits of working together, and having worked together, are explored as a key element in establishing a sustainable open source solution.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5879 2014/08/14 - 22:24

In this paper, we describe our recent work in using cloud computing to provision digital library services. We consider our original and current motivations, technical details of our implementation, the path we took, and our future work and lessons learned. We also compare our work with other digital library cloud efforts.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5881 2014/08/14 - 22:24

This paper describes an environment for the “sheer curation” of the experimental data of a group of researchers in the fields of biophysics and structural biology. The approach involves embedding data capture and interpretation within researchers' working practices, so that it is automatic and invisible to the researcher. The environment does not capture just the individual datasets generated by an experiment, but the entire workflow that represents the “story” of the experiment, including intermediate files and provenance metadata, so as to support the verification and reproduction of published results. As the curation environment is decoupled from the researchers' processing environment, the provenance is inferred from a variety of domain-specific contextual information, using software that implements the knowledge and expertise of the researchers. We also present an approach to publishing the data files and their provenance according to linked data principles by using OAI-ORE (Open Archives Initiative Object Reuse and Exchange) and OPMV.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5883 2014/08/14 - 22:24
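
A minimal sketch of the publishing idea described above, with invented URIs (the actual environment's aggregations and provenance graph will differ): an experiment's files grouped as an OAI-ORE aggregation, with a simple OPMV statement linking a derived file to the process that produced it, built with rdflib.

```python
from rdflib import Graph, Namespace, URIRef

ORE = Namespace("http://www.openarchives.org/ore/terms/")
OPMV = Namespace("http://purl.org/net/opmv/ns#")

g = Graph()
g.bind("ore", ORE)
g.bind("opmv", OPMV)

aggregation = URIRef("http://example.org/experiments/42/aggregation")
raw = URIRef("http://example.org/experiments/42/raw.dat")
processed = URIRef("http://example.org/experiments/42/result.csv")
process = URIRef("http://example.org/experiments/42/processing-run")

# ORE: the aggregation groups the files that tell the experiment's "story".
g.add((aggregation, ORE.aggregates, raw))
g.add((aggregation, ORE.aggregates, processed))

# OPMV: the processed file was generated by a process that used the raw file.
g.add((processed, OPMV.wasGeneratedBy, process))
g.add((process, OPMV.used, raw))

print(g.serialize(format="turtle"))
```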

This paper describes the FISHNet project, which developed a repository environment for the curation and sharing of data relating to freshwater science, a discipline whose research community is distributed thinly across a variety of institutions, and usually works in relative isolation as individual researchers or within small groups. As in other “small sciences”, these datasets tend to be small and “hand-crafted”, created to address particular research questions rather than with a view to reuse, so they are rarely curated effectively, and the potential for sharing and reusing them is limited. The paper addresses a variety of issues and concerns raised by freshwater researchers as regards data sharing, describes our approach to developing a repository environment that addresses these concerns, and identifies the potential impact within the research community of the system.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5884 2014/08/14 - 22:24

The article describes the integrated adoption of Fedora Commons and MediaMosa for managing a digital repository. The integration was tested during the development of a cooperative project, the Sapienza Digital Library (SDL). The functionalities of the two applications were exploited to build a weaving factory useful for archiving, preserving and disseminating multi-format and multi-protocol audio-video content in different usage contexts. The integration was achieved by means of both repository-to-repository interaction and the mapping of the video Content Model's disseminators to MediaMosa's RESTful services. The outcomes of this integration will lead to more flexible management of the dissemination services, as well as reducing the overproduction of different dissemination formats.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5911 2014/08/14 - 22:24

As academia in general, and research funders in particular, place ever greater importance on data as an output of research, so the value of good research data management practices becomes ever more apparent. In response to this, the Innovative Design and Manufacturing Research Centre (IdMRC) at the University of Bath, UK, with funding from the JISC, ran a project to draw up a data management planning regime. In carrying out this task, the ERIM (Engineering Research Information Management) Project devised a visual method of mapping out the data records produced in the course of research, along with the associations between them. This method, called Research Activity Information Development (RAID) Modelling, is based on the Unified Modelling Language (UML) for portability. It is offered to the wider research community as an intuitive way for researchers both to keep track of their own data and to communicate this understanding to others who may wish to validate the findings or re-use the data.

https://journals.tdl.org/jodi/index.php/jodi/article/view/5917 2014/08/14 - 22:24

Focus groups conducted at Carnegie Mellon reveal that what motivates many faculty to self-archive on a website or disciplinary repository will not motivate them to deposit their work in the institutional repository. Recruiting a critical mass of content for the institutional repository is contingent on increasing awareness, aligning deposit with existing workflows, and providing value-added services that meet needs not currently being met by other tools. Faculty share concerns about quality and the payoff for time invested in publishing and disseminating their work, but disagree about metrics for assessing quality, the merit of disseminating work prior to peer review, and the importance of complying with publisher policies on open access. Bridging the differences among disciplinary cultures and belief systems presents a significant challenge to marketing the institutional repository and developing coherent guidelines for deposit.

http://journals.tdl.org/jodi/article/view/2068 2011/07/01 - 19:28

The complexity and flexibility of some XML schemas can make their implementation difficult in working environments. This is particularly true of CERIF, a standard for the interchange of research management information, which consists of 192 interlinked XML schemas. This article examines a possible approach of using 'intermediary' XML schemas, and associated XSLT stylesheets, to make such applications easier to employ. It specifically examines the use of an intermediary schema, CERIF4REF, which was designed to allow UK Higher Education institutions to submit data for a national periodic research assessment exercise in CERIF. The wider applicability of this methodology, particularly in relation to the METS standard, is also discussed.

http://journals.tdl.org/jodi/article/view/2069 2011/07/01 - 19:28
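
To make the intermediary-schema approach concrete, here is a generic sketch of applying an XSLT stylesheet with lxml to turn a simple record into a different target XML vocabulary. The element names, the target namespace and the stylesheet are invented for illustration; they are not the real CERIF4REF schema or its mappings.

```python
from lxml import etree

# Hypothetical "intermediary" record, much simpler than the target schema.
source = etree.XML("""
<submission>
  <staff id="s1"><name>A. Researcher</name></staff>
  <output staff="s1"><title>An example paper</title></output>
</submission>
""")

# Hypothetical stylesheet mapping the intermediary elements onto a richer vocabulary.
stylesheet = etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/submission">
    <target:Records xmlns:target="http://example.org/target">
      <xsl:for-each select="output">
        <target:ResultPublication>
          <target:Title><xsl:value-of select="title"/></target:Title>
        </target:ResultPublication>
      </xsl:for-each>
    </target:Records>
  </xsl:template>
</xsl:stylesheet>
""")

transform = etree.XSLT(stylesheet)
print(etree.tostring(transform(source), pretty_print=True).decode())
```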

Our activities are becoming more and more computer-mediated. For documenting these activities, it is no longer sufficient to automatically record their traces. In this paper we introduce the redocumentation process of computer-mediated activity as a narrative construction that ties together the content of activity traces and the users’ knowledge in describing their activities in new easily exchangeable documents. We present a generic semi-automatic approach for this process, which is based on rhetorical structure theory. This approach uses formal models for process input and output, and handles the process through two main phases: an automatic phase to generate a fragmented document from traces as a first description of the activity and an interactive phase to allow the user to tailor this first description according to his particular needs and choices. We also present ActRedoc, a tool developed for text-based redocumentation, for which a first evaluation was conducted.

http://journals.tdl.org/jodi/article/view/2088 2011/07/01 - 19:28

The exposure of an organisation’s illegal or unethical practices is often known as whistleblowing. It is currently a high-profile activity as a consequence of whistleblowing websites such as Wikileaks. However, modern digital fingerprinting technologies allow the identification of the human users associated with a particular copy of a leaked digital file. Fear of such discovery may discourage the public from exposing illegal or unethical practices. This paper therefore introduces the novel whistleblower-defending problem, a unique variant of the existing document-marking and traitor-tracing problems. It is addressed here by outlining practical steps that real-world whistleblowers can take to improve their safety, using only standard desktop OS features. ZIP compression is found to be useful for indirect file comparison, in cases where direct file comparison or use of checksums is impossible, inconvenient or easily traceable. The methods of this paper are experimentally evaluated and found to be effective.

http://journals.tdl.org/jodi/article/view/2136 2011/07/01 - 19:28
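
The ZIP-based indirect comparison described above boils down to comparing compressed sizes rather than the files themselves. The following is a minimal, generic Python sketch of that idea; it is not the paper's exact procedure, and the file paths are placeholders.

```python
import zlib

def compressed_size(path: str) -> int:
    """Compressed length of a file's contents, used as an indirect
    proxy for its content when direct comparison is impractical."""
    with open(path, "rb") as f:
        return len(zlib.compress(f.read(), 9))

# Two copies of a leaked document can then be compared indirectly:
# differing compressed sizes hint that one copy carries extra embedded
# data (for example a digital fingerprint), without ever diffing them.
# print(compressed_size("copy_a.pdf") == compressed_size("copy_b.pdf"))
```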

In Europe over 2.5 million publications of universities and research institutions are stored in institutional repositories. Although institutional repositories make these publications accessible over time, a repository does not have the task to preserve the content for the long term. Some countries have developed an infrastructure dedicated to sustainability. The Netherlands is one of those countries. The Dutch situation could be regarded as a successful example of how long term preservation of scholarly publications is organised through an open access environment. In this article it will be explained how this infrastructure is structured, and some preservation issues related to it will be discussed. This contribution is based on the long term preservation studies into Enhanced Publications, performed in the FP7 project DRIVER II (2007-2009). The overall conclusion of the DRIVER studies about long term preservation is that the issues are rather of an organisational nature than of a technical one. The nature of publications in scholarly communication is changing. Enhanced Publications and Collaborative Research Environments are new phenomena in scholarly communication using the wide range of possibilities of the digital environment in which researchers and their audience act. This rapidly changing digital environment also affects long term preservation archives. Raising awareness of long term preservation in the research community is important because researchers are responsible for public dissemination of their research output and need to understand their role in the life cycle of the digital object. Researchers should be aware that constant curation and preservation actions must be undertaken to keep the research results fit for verification, reuse, learning and history over time.

http://journals.tdl.org/jodi/article/view/1764 2011/05/25 - 02:43

The effective long-term curation of digital content requires expert analysis, policy setting, and decision making, and a robust technical infrastructure that can effect and enforce curation policies and implement appropriate curation activities. Since the number, size, and diversity of content under curation management will undoubtedly continue to grow over time, and the state of curation understanding and best practices relative to that content will undergo a similar constant evolution, one of the overarching design goals of a sustainable curation infrastructure is flexibility. In order to provide the necessary flexibility of deployment and configuration in the face of potentially disruptive changes in technology, institutional mission, and user expectation, a useful design metaphor is provided by the Unix pipeline, in which complex behavior is an emergent property of the coordinated action of a number of simple independent components. The decomposition of repository function into a highly granular and orthogonal set of independent but interoperable micro-services is consistent with the principles of prudent engineering practice. Since micro-services are small and self-contained, they are individually more robust and collectively easier to implement and maintain. By being freely interoperable in various strategic combinations, any number of micro-services-based repositories can be easily constructed to meet specific administrative or technical needs. Importantly, since these repositories are purposefully built from policy neutral and protocol and platform independent components to provide the function minimally necessary for a specific context, they are not constrained to conform to an infrastructural monoculture of prepackaged repository solutions. The University of California Curation Center has developed an open source micro-services infrastructure that is being used to manage the diverse digital collections of the ten-campus University system and a number of non-university content partners. This paper provides a review of the conceptual design and technical implementation of this micro-services environment, a case study of initial deployment, and a look at ongoing micro-services developments.

http://journals.tdl.org/jodi/article/view/1605 2011/04/30 - 22:44
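
As a toy illustration of the pipeline metaphor above, small independent curation steps can be composed in sequence, each consuming the previous step's output. The function names and the identifier scheme are invented for the example and are not the actual UC3 micro-services.

```python
import hashlib
from functools import reduce

# Each "micro-service" is a small, self-contained step that takes and
# returns a simple dict describing the object under curation.
def assign_identifier(obj):
    obj["id"] = "id:/" + obj["name"]      # placeholder identifier scheme
    return obj

def fixity(obj):
    obj["sha256"] = hashlib.sha256(obj["payload"]).hexdigest()
    return obj

def store(obj):
    obj["stored"] = True                  # stand-in for writing to storage
    return obj

def pipeline(obj, services):
    """Compose independent micro-services, Unix-pipeline style."""
    return reduce(lambda o, service: service(o), services, obj)

result = pipeline({"name": "thesis.pdf", "payload": b"%PDF-1.4 ..."},
                  [assign_identifier, fixity, store])
print(result["id"], result["sha256"][:12], result["stored"])
```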

Document servers complying to the standards of the Open Archives Initiative (OAI) are rich, yet seldom exploited source of textual primary data for research fields in text mining, natural language processing or computational linguistics. We present a bilingual (English and German) text corpus consisting of bibliographic OAI records and the associated full texts. A particular added value is that we annotated each record with at least one Dewey Decimal Classification (DDC) number, inducing a subject-based categorization of the corpus. By this means, it can be used as training data for machine learning-based text categorization tasks in digital libraries, but also as primary data source for linguistic research on academic language use related to specific disciplines. We describe the construction of the corpus using data from the Bielefeld Academic Search Engine (BASE), as well as its characteristics.

http://journals.tdl.org/jodi/article/view/1765 2011/04/30 - 22:44
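
For readers unfamiliar with OAI-PMH harvesting of the kind that underlies the corpus above, here is a generic sketch of fetching Dublin Core records with the standard protocol verbs. The endpoint URL is only an illustrative placeholder (not the real BASE service), and a real harvester would also handle resumption tokens, rate limits and error responses.

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"
ENDPOINT = "https://example.org/oai"  # placeholder, not the actual BASE endpoint

# One OAI-PMH ListRecords request for Dublin Core metadata.
url = ENDPOINT + "?verb=ListRecords&metadataPrefix=oai_dc"
with urllib.request.urlopen(url) as response:
    tree = ET.parse(response)

# Collect title and subject fields; subject fields could carry DDC-style classes.
for record in tree.iter(OAI_NS + "record"):
    titles = [t.text for t in record.iter(DC_NS + "title")]
    subjects = [s.text for s in record.iter(DC_NS + "subject")]
    print(titles, subjects)
```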

The stated aim of many repositories is to provide permanent open access to their content. However, relatively few repositories have implemented practical action plans towards permanence. Repository managers often lack time and confidence to tackle the important but scary problem of preservation. Written by, and aimed at, repository managers, this paper describes how the JISC-funded KeepIt project has been bringing together existing preservation tools and services with appropriate training and advice to enable repository managers to formulate practical and achievable preservation plans. Three elements of the KeepIt project are described: 1. The initial, exploratory phase in which repository managers and a preservation specialist established the current status of each repository and its preservation objectives; 2. The repository-specific KeepIt preservation training course which covered the organisational and financial framework of repository preservation; metadata; the new preservation tools; and issues of trust between repository, users and services; 3. The application of tools and lessons learned from the training course to four exemplar repositories and the impact that this has made. The paper concludes by recommending practical steps that all repository managers may take to ensure their repositories are preservation-ready.

http://journals.tdl.org/jodi/article/view/1767 2011/04/30 - 22:44

This paper proposes using OAI-ORE as the basis for a new method to represent and manage the description of archival collections. This strategy adapts traditional archival description methods for the contemporary reality of digital collections and takes advantage of the power of OAI-ORE to allow for a multitude of non-linear relationships, providing richer and more powerful access and description. A schema for representing finding aids in OAI-ORE would facilitate more sophisticated methods for modeling archival collection descriptions.

http://journals.tdl.org/jodi/article/view/1814 2011/04/30 - 22:44
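
A minimal sketch of the proposal's core idea, with invented URIs and an invented collection: a collection-level OAI-ORE aggregation that aggregates a series, which in turn aggregates items, with non-linear relationships added as further triples.

```python
from rdflib import Graph, Namespace, URIRef

ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ore", ORE)

collection = URIRef("http://example.org/findingaids/papers-of-x/aggregation")
series1 = URIRef("http://example.org/findingaids/papers-of-x/series-1/aggregation")
letter = URIRef("http://example.org/findingaids/papers-of-x/series-1/item-03")
photo = URIRef("http://example.org/findingaids/papers-of-x/series-2/item-17")

# Hierarchical description: the collection aggregates a series,
# and the series aggregates an item.
g.add((collection, ORE.aggregates, series1))
g.add((series1, ORE.aggregates, letter))

# Non-linear relationship cutting across the hierarchy: the same series
# aggregation can also point at material arranged elsewhere.
g.add((series1, ORE.aggregates, photo))

print(g.serialize(format="turtle"))
```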

IT-based research environments with an integrated repository component are increasingly important in research. While grid technologies and their relatives used to draw most attention, the e-Infrastructure community is now often looking to the repository and preservation communities to learn from their experiences. After all, trustworthy data management and concepts to foster the agenda for data-intensive research are among the key requirements of researchers from a great variety of disciplines. The WissGrid project aims to provide cross-disciplinary data curation tools for a grid environment by adapting repository concepts and technologies to the existing D-Grid e-Infrastructure. To achieve this, it combines existing systems including Fedora, iRODS, dCache, JHove, and others. WissGrid respects the diversity of these systems and aims to improve the interoperability of the interfaces between them.

http://journals.tdl.org/jodi/article/view/1896 2011/04/30 - 22:44