Purpose

  • The project charter defines the scope, objectives, and overall approach for the work to be completed. It is a critical element for initiating, planning, executing, controlling, and assessing the project. It should be the single point of reference on the project for project goals and objectives, scope, organization, estimates, work plan, and budget. In addition, it serves as a contract between the Project Team and the Project Sponsors, stating what will be delivered according to the budget, time constraints, risks, resources, and standards agreed upon for the project.

Executive Summary

  • Currently, version 1.0 of the VIVO Harvester application has been successfully used to harvest grant information from the UF Division of Sponsored Research (DSR). However, there is no mechanism in place to allow harvesting of DSR data on a recurring schedule. The purpose of this project is to plan, design, and implement a recurring harvest of DSR data into VIVO at the University of Florida. Ultimately, this project will lay the groundwork for future implementations of recurring data harvesting from additional data sources. This project will be referred to as “DSR Reproducible Harvest”. It is likely that this will be an upgrade to the existing Harvester application rather than a standalone application. This project is considered to require a minimal time investment while delivering high value to VIVO @ UF upon successful completion. This project should define a reproducible process that supports harvesting any data source with the Harvester.
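
  To make the recurring schedule concrete, below is a minimal sketch of a cron entry that would launch a weekly harvest. The script name run-dsr.sh and the log path are hypothetical; the install and example-script paths follow the Addendum.

    # Hypothetical crontab entry: run the DSR harvest every Sunday at 02:00.
    # run-dsr.sh and the log path are assumptions; the install path is from the Addendum.
    0 2 * * 0 /usr/share/vivo/harvester/example-scripts/example-dsr/run-dsr.sh >> /var/log/dsr-harvest.log 2>&1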

Goals

  • The following are major goals of this project:
    • Assemble a project team and define roles
    • Analyze and document the current status of the Harvester as it pertains to a DSR harvest
    • Analyze the current status of the development, staging and production VIVO data as it pertains to DSR data
    • Design the DSR Harvest software
    • Write a functional specification
    • Build the DSR Harvest software
    • Build a logging and email notification system
    • Write a technical specification
    • Implement, test and refine the system in a development environment
    • Implement, test and refine the system in a staging environment
    • Implement and test the system in a production environment
    • Contribute the software to the community via SourceForge

Objectives

  • Team assembled
  • Roles defined and disseminated
  • Project plan defined
  • Timeline created
  • Harvester analyzed
  • VIVO development, staging, production environment analyzed
  • Harvester for DSR designed
  • Functional specification written
  • DSR Harvest application built
  • Technical specification written
  • Development version implemented, tested, refined
  • Staging version implemented, tested, refined, approved by sponsors
  • Production version implemented, tested, approved by sponsors
  • Notification system implemented
  • Server specification(s) updated to reflect implementation of the Harvester
  • Source code contributed to the community site at SourceForge

Scope

  • The scope of this project is limited to reproducible harvesting of DSR data at the University of Florida. No additional data will be harvested or tested as a part of this project. This project does not include support for the DSR Harvest beyond the production implementation. Support will be provided through the normal channels of communication via SourceForge. This project is not a new feature of the current Harvester. It is a separate application that will use the Harvester as a tool to complete its processing.

Assumptions

  • It is assumed that Harvester 1.0 is able to successfully harvest and map data into VIVO.
  • It is assumed that Harvester 1.0 is able to re-harvest and ignore existing or unchanged data.
  • It is assumed that all Grants are removed from VIVO prior to the first harvest of data from DSR.
  • It is assumed that each person working on the project ensures that he/she dedicates a reasonable amount of time to the project with regard to FTE on the VIVO grant.
  • It is assumed that DSR representatives will not charge a fee for meetings, data feeds, or data acquisition.
  • It is assumed that CTRIP will have Alex Rockwell as a resource in person for at least three days a week until completion of the project as defined by the project manager.

Risks

  • A medium risk exists that scope creep may occur. To mitigate this risk, the project charter will be reviewed on a weekly basis to ensure that any tasks out of scope are presented to the sponsors.
  • A high risk exists that the dates of milestones defined in the timeline may not be met for many reasons. To mitigate this risk, all actors in the project will have ongoing access to the timeline and will be expected to review it on a weekly basis.
  • A low risk exists that certain actors will not be available when needed. To mitigate this risk, meetings with the sponsor(s) will be held weekly so decisions about how to proceed can be agreed upon.
  • A low risk exists that the DSR data will not be clean. To mitigate this risk, a process will be defined that excludes data that does not fit the minimum requirements for the Harvester.
  • A medium risk exists that the specifications may not exist or may not accurately represent data mappings. To mitigate this risk, the functional specification designed for this project will clearly define data mapping.
  • A medium risk exists that logging and notifications may be difficult to represent accurately in an automated fashion.
  • A medium risk exists that the Harvester will have bugs or may not be able to reproduce a harvest successfully. To mitigate this risk, thorough discovery and testing will be conducted with the developers of the Harvester.
  • A risk exists that the systems and infrastructure may not support the required software or processes needed to implement the reproducible harvesting. To mitigate this risk, a thorough evaluation of the Development server will be conducted after the design phase of the project.

Organization

Person               Organization   Role
Mike Conlon          PI             Sponsor
Valrie Davis         MSL            Sponsor
Narayan Raum         CTRIP          Project Manager
Christopher Barnes   CTRIP          Harvester Project Manager
James Pence          CTRIP          Harvester Technical Contact
Alex Rockwell        MSL            Implementation Expert
Logan Clapp          MSL            Implementation Expert
Nick Dunham          DSR            Data source

Resources (Costs)

Person               FTE
Mike Conlon          .05
Valrie Davis         .1
Narayan Raum         .25
Christopher Barnes   .1
James Pence          .25
Alex Rockwell        .66

Timeline

  • The following is a general overview of the project timeline. For a more detailed timeline, refer to the DSR Harvest timeline spreadsheet, available as a Google Doc shared by the project manager.

Item                           Start Date   Finish Date
Project Start                  4/22/2011    4/22/2011
Project Plan                   4/25/2011    4/29/2011
Discovery                      5/2/2011     5/6/2011
Design                         5/9/2011     5/13/2011
Build                          5/16/2011    5/27/2011
Dev Implementation             5/30/2011    6/3/2011
Staging Implementation         6/6/2011     6/10/2011
Production Implementation      6/13/2011    6/17/2011
Contribute Source to SF Site   6/20/2011    6/24/2011
Monitor Harvesting             6/20/2011    6/24/2011
Project Complete               6/24/2011    6/24/2011

Addendum

Installation

  • .deb package, v1.1.1, from SourceForge
  • Install with dpkg
  • Installs to /usr/share/vivo/harvester/
  • A shell script exists, specific to DSR at UF
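
  A minimal sketch of the install step, assuming the downloaded package is named harvester_1.1.1.deb (the actual file name on SourceForge may differ):

    # Install the Harvester package (file name is an assumption)
    sudo dpkg -i harvester_1.1.1.deb
    # Confirm the install location noted above
    ls /usr/share/vivo/harvester/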

Configuration

  • vivo.model.xml: config file in /harvester/example-scripts/example-dsr/
  • Settings must comply with the deploy.properties settings for VIVO
  • /harvester/example-scripts/example-images/jdbcfetch.config.xml
  • A where clause in the query is required for confidential filtering
  • All other example where clauses are optional; they allow subsets of data for testing
  • The only changes necessary to the example file are the connection, username, and password
  • All other settings are part of the Harvester application
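
  A minimal sketch of checking that the two files agree, assuming the install path from the Installation notes. VitroConnection.DataSource.* are the standard VIVO deploy.properties keys; the location of deploy.properties is hypothetical.

    # Connection settings on the Harvester side (path per the notes above)
    grep -i -e "connection" -e "username" -e "password" \
        /usr/share/vivo/harvester/example-scripts/example-dsr/vivo.model.xml
    # Corresponding settings on the VIVO side (deploy.properties location is hypothetical)
    grep "VitroConnection.DataSource" /path/to/vivo/deploy.properties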

DSR Models

  • Views have been created by DSR
  • The views are contracts, project team, and view_vivo
  • Fields: the fields from the views that are to be harvested

Fetch

  • Full fetch takes approximately 3-4 hours
  • Logs are stored in the harvester directory
  • The Harvester does not report or log when data is omitted; data that does not meet the requirements of the query is silently ignored
  • “Dirty” data that meets the query requirements will be harvested; curation at the source is then required
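
  A minimal sketch of launching a full fetch and following its log, assuming the paths above; the script name run-dsr.sh and the logs directory name are hypothetical.

    cd /usr/share/vivo/harvester/example-scripts/example-dsr/
    # Launch the full fetch in the background; expect roughly 3-4 hours
    nohup ./run-dsr.sh > fetch-console.log 2>&1 &
    # Follow the newest log in the harvester directory (directory name is an assumption)
    tail -f "$(ls -t /usr/share/vivo/harvester/logs/*.log | head -n 1)"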

Scoring

  • Scored fields: UFID, Contract Number, Sponsors, Flow-through sponsors

Curation

  • It is assumed that data will be corrected at the source; the fields scored during harvest are those listed under Scoring above.

Update

  • Compares a backup of the previous fetch to the new fetch
  • Removes deleted fields and triples
  • Adds new ones
  • Updates are actually a delete and an add of triples (see the sketch after this list)
  • Duplicates are reduced nearly 100% and are now rare, due to the new Smush feature in v1.1.x
  • If data is curated in VIVO and not at the source, duplicates may occur
  • Rectify: if a duplicate occurs, curators are responsible for manually removing the edited triples
  • Note: in VIVO 1.2, if triples are identical, only one is displayed; if one has been edited, two will display
  • New Harvester feature suggested:
    • Ignore harvesting of records that have been curated in VIVO
    • Should also require curating at the source, not the target
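
  A minimal sketch of the delete-and-add comparison, assuming the previous and new fetches are serialized as N-Triples files. The file names are hypothetical; the Harvester performs the equivalent step internally.

    # Sort both dumps so they can be compared line by line
    sort -o previous-fetch.nt previous-fetch.nt
    sort -o new-fetch.nt new-fetch.nt
    # Triples present only in the previous fetch are deletions;
    # triples present only in the new fetch are additions
    comm -23 previous-fetch.nt new-fetch.nt > triples-to-delete.nt
    comm -13 previous-fetch.nt new-fetch.nt > triples-to-add.nt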

Apply

  • In VIVO 1.2, a re-index of the database is needed
  • New Harvester feature suggested:
    • Upgrade the Harvester to include a function that rebuilds the luceneIndex upon completion of harvest and ingest
  • James Pence is asking Cornell about VIVO functions for luceneIndex re-indexing
  • Without this, manual indexing via the admin interface is required, which is not acceptable

Logging

  • Log files include all details, including stack traces
  • The Harvester has a configuration in scripts.env that allows the console log level to be set to “info”, which captures actions throughout the process
  • This is well suited to emailing the reproducible harvest log to sysadmins
  • We do not want to parse message text in the logs, since wording can change across versions of Java, the Harvester, etc.
  • Suggestion: attach the full log file to each email notification to sysadmins
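
  A minimal sketch of the suggested notification step. The log path and recipient address are hypothetical, and the -a attachment flag assumes a mailx build (e.g. heirloom mailx or s-nail) that supports it.

    # Email the full harvest log as an attachment rather than parsing its text.
    # Log path and address are hypothetical; verify that your mailx supports -a for attachments.
    LOG=/usr/share/vivo/harvester/logs/latest-dsr-harvest.log
    echo "DSR harvest finished on $(hostname) at $(date)" \
        | mailx -s "DSR reproducible harvest log" -a "$LOG" sysadmins@example.ufl.edu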

Suggested Documentation

  • As part of this project, document the errors that can appear in the log file and what they mean, and include this reference in the notification email

Questions

  • Are grants editable in VIVO by users?
  • If we had a “last modified by” field, could we use it to determine whether data was curated at the target?