Call-in details:

Sprinters

Developers

Testing and Validation

Documentation

Meetings

Meeting 01 - August 29, 2016

Agenda

  1. Introductions
  2. Logistics
    1. Virtual daily stand-ups?
    2. IRC? 
    3. Conference calls?
    4. Email check-in?
    5. Sprint retrospective?
  3. Phase 1 priorities
    1. Support transacting in RDF
    2. Support allowing the option to include Binaries
    3. Support references from exported resources to other exported resources
    4. Support import into a non-existing Fedora container
    5. Support export of resource and its "members" based on the ldp:contains predicate
    6. The URIs of the round-tripped resources must be the same as the original URIs
  4. Assign work

Minutes

  1. Introductions
  2. Logistics
    1. Virtual daily stand-ups on IRC by 10 am ET
      1. What you did yesterday
      2. What you plan on doing today
      3. Blockers
    2. Conference calls on this line if needed
    3. Sprint retrospective on Sep 9 at 3:30 pm ET
      1. What worked
      2. What didn’t work
      3. Try to wrap up by Thursday
  3. Phase I Priorities
    1. Camel serializer
      1. Supports transacting in RDF and has an option to include binaries
      2. Listens for messages on queues; need to run the indexer before running export
      3. Basic case: dump to disk and gzip (a minimal sketch follows at the end of these minutes)
      4. Need to parse the RDF to figure out information required for import, e.g., the base URL for the repository
      5. Provides options for the RDF serializations supported by the REST API and supplies a default serialization
  4. Looking ahead to Phases II and III
    1. Versions
      1. JCR versioning will include any version of parent
      2. For Camel serializer, does Fedora send out a message when it creates a version?
      3. Possible way forward is to consider basic case as export of current version and then to build in capability to export version that is not the current version with additional metadata
      4. Need to consider layout of versions on disk
      5. Fedora API specification is leaning towards making versions first-class resources
    2. Bags
      1. Client tool
      2. Generate from REST API calls to repository or from filesystem?
      3. Only LDP basic containers? Connected graph from root; basic containers if not starting at root
      4. Stakeholder use cases include exporting individual resources, not entire graph, and being able to generate resources that can be imported into other repositories (such as APT) from documented export formats
  5. Design considerations
    1. Camel may make the bar too high for users; code may need to be re-implemented
    2. Some discussion of using bash or Python for prototyping; final consensus was to use Java
    3. Testing plan should define success, describe the tests that will be run, and include test data
    4. Integration test suite in Java; unit tests included with utilities
    5. Logging
  6. In-flight tickets
    1. FCREPO-2127: Document how the Camel RDF Serializer exports content to disk
    2. FCREPO-2128: Review Camel RDF Serializer implementation
    3. FCREPO-2129: Create import-export github repository in the fcrepo4-labs organization
    4. FCREPO-2130 (close out FCREPO-1990): Create skeletal import client utility
    5. FCREPO-2131: Create skeletal export client utility
    6. FCREPO-2132: Create a test plan
    7. FCREPO-2133: Create sample test dataset
    8. FCREPO-2134: Document sprint resources
    9. FCREPO-2135: Create user documentation
    10. FCREPO-2136: Create Import/Export wiki documentation
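
As a point of reference for the "basic case" above, here is a minimal sketch of dumping a resource's RDF to disk and gzipping it. It is plain Java against the Fedora REST API, not the Camel serializer itself, and the repository URL, resource path, and output location are illustrative assumptions only.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

/**
 * Minimal sketch of the "basic case": fetch a resource's RDF from the
 * Fedora REST API, dump it to disk, and gzip it. Not the Camel serializer;
 * the resource URI and output path below are hypothetical.
 */
public class BasicExportSketch {

    public static void main(final String[] args) throws Exception {
        // Hypothetical resource URI and output location
        final String resourceUri = "http://localhost:8080/rest/sample/object1";
        final Path outFile = Paths.get("export", "object1.ttl.gz");
        Files.createDirectories(outFile.getParent());

        final HttpURLConnection conn =
                (HttpURLConnection) new URL(resourceUri).openConnection();
        // Ask the REST API for one of its supported RDF serializations
        conn.setRequestProperty("Accept", "text/turtle");

        try (InputStream in = conn.getInputStream();
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(outFile))) {
            // Stream the Turtle response straight into a gzipped file on disk
            final byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
    }
}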

 

Meeting 02 - September 2, 2016 
 

Agenda

  1. Open issues
  2. Review ticket process
  3. Squashing commits
  4. Review use of IRC
  5. Helping everyone use the jar and load data
    1. We need data with cross-references and external references (?)
  6. Consider refactoring export to have 'ldp:contains' passed in?
  7. IMPORT!
  8. Review tickets
  9. AuthZ

Minutes


Ticket review

  • 2127: Justin will add step-by-step instructions with output to this ticket, to provide more info.
  • 2130: Esmé will take over the ticket and start on the import utility, getting done what he can for others to review on Monday. This will be a basic, skeletal version, working with a small repo containing a couple of small hand-built resources.
  • 2132: We reviewed Youn's test plan doc and resolved comments: https://docs.google.com/document/d/1WW0dU9LDWvnRPGbzKpIE-ODkGq39md17sKkWVXC0P8U/edit
    Nick and Youn will work on adding this to the wiki.
  • 2133: Josh has sample data in the sample data repo; almost ready to close the ticket.
    Youn has sample data and is working on a script to put that data into a Fedora repo; it currently lives in a Google Drive folder.
    Esmé provided feedback on how to modify the existing sample OWL file; Justin will help next week with scripting the loading of the OWL data, if needed.
  • 2134: Will leave open for extra comment/input until next week.
  • 2135: Leaving open and unassigned until next week.
  • 2139: Low priority.
  • 2143 and 2144: Leaving open until next week.
  • 2146: Discussed; needs to be revisited once a basic import tool exists. Some discussion about how to pass in and save config (command line and
  • 2155: Andrew is working on integration tests: run export and import from the command line to test round-tripping.
    Esmé suggested writing the test in the other order: import a fixture, then export it. Andrew will commit initial driver-level tests first, then create a follow-on ticket for sub-module-level tests.


    Some takeaways from the ticket review discussion:

    General process for making sample datasets:
    - write up a document that provides links to the raw data
    - write a script to import that data into a Fedora repo (a minimal loading sketch follows below)
    - do an fcr:backup on that repo
    - tar up the output of fcr:backup
    - add all the outputs from the above steps to the fcrepo sampledata GitHub repo

    Small test fixtures will be exercised by the integration tests.
    Larger sample datasets (like those in the fcrepo sampledata repo) will be tested to confirm they work with fcrepo-vagrant as the test environment, and documented in the test plan, which will be added to the wiki.
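
    A minimal sketch of the loading step above ("write a script to import that data into a Fedora repo"), shown in Java for consistency with the rest of the project; the actual loaders in use are shell/Python scripts, and the repository URL, data directory, and resource naming below are illustrative assumptions only.

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Minimal sketch of loading sample RDF into a Fedora repo by PUTting Turtle
 * files to the REST API. The repository URL and data directory are
 * hypothetical; the team's actual loaders are shell/Python scripts.
 */
public class SampleDataLoaderSketch {

    public static void main(final String[] args) throws IOException {
        final String repoBase = "http://localhost:8080/rest/sampledata/";
        final Path dataDir = Paths.get("sampledata");

        try (DirectoryStream<Path> files = Files.newDirectoryStream(dataDir, "*.ttl")) {
            for (final Path file : files) {
                // Use the filename (minus extension) as the resource path
                final String name = file.getFileName().toString().replace(".ttl", "");
                final URL target = new URL(repoBase + name);

                final HttpURLConnection conn = (HttpURLConnection) target.openConnection();
                conn.setDoOutput(true);
                conn.setRequestMethod("PUT");
                conn.setRequestProperty("Content-Type", "text/turtle");
                try (OutputStream out = conn.getOutputStream()) {
                    Files.copy(file, out);
                }
                System.out.println(target + " -> HTTP " + conn.getResponseCode());
            }
        }
    }
}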

    We reviewed the process for how to use Jira:
    - make sure to take ownership of the tickets you are working on
    - mark them as in progress, ready for review, etc. as you go

    We reviewed the pull request process:
    - put in a pull request
    - others comment
    - based on the comments, make subsequent commits (don't squash)
    - once the pull request is complete, squash down to one commit before merging to master

    This is not a hard and fast rule; there can be exceptions:
    - sometimes it is better not to squash, or to squash to a set of commits, if there is a logical separation between commits
    - sometimes the branch has to be rebased due to other work going into master, and in that case it can make sense to squash when rebasing

    Esmé provided a link to a good blog post on squashing commits by Jeremy Friesen: http://ndlib.github.io/practices/one-commit-per-pull-request/

    Discussed communication:
    - when using IRC, make sure your IRC client can notify you when your name is used; try to keep IRC on so you can participate during the sprint

    Discussed how to actually get the code, making sure all of us can build and/or acquire the jar file and run it. Ask in IRC if there are questions.

    Talked about the process of dependency management and the use of ldp:contains; Esmé and Andrew will do some refactoring.

Meeting 03 - September 7, 2016 

Agenda

  1. Goal: complete all phase-1 requirements and documentation
    1. Support transacting in RDF
    2. Support allowing the option to include Binaries
    3. Support references from exported resources to other exported resources
    4. Support import into a non-existing Fedora container
    5. Support export of resource and its "members" based on the ldp:contains predicate
    6. The URIs of the round-tripped resources must be the same as the original URIs
  2. Open pull requests
    1. https://github.com/fcrepo4-labs/fcrepo-import-export/pull/17
    2. https://github.com/fcrepo4-labs/fcrepo-import-export/pull/22
  3. Check-in on importer
  4. Jira
    DuraSpace JIRA query: project = FCREPO AND status in (Open, "In Progress", Reopened, "In Review", Received) AND component = f4-import-export ORDER BY key ASC
  5. ...

Minutes

 

  • Sprint goals: Once the current PR that Esmé has is merged (PR-22), we will be done with the crossed-off items on the list from #1 above.
    • Andrew has a ticket in progress where a config file is created upon export and can be used by import and export: the options will be written to a file and, on subsequent runs, can be passed in and used.
    • The config should/could contain at least the base URI and the directories where things are stored. Both of those would tell how to map the URIs in exported RDF to files on disk (see the mapping sketch after this list).
    • Bags
      • Bags: In the next phase, will the export utility have to generate bags, or can you convert an export to a bag? Whatever is making bags will need the config Andrew is working on to know how the mapping works. Should the exporter export bags, or should we have a utility to take an export and convert it to a bag? Unclear which direction to go in. Postponed talking about bags in order to look at tickets, since bags are not part of phase one.
    • Import Utility
      • If you try to restore RDF, it will only work if you import into a repo at the same URL, because no mapping is being done. You might want/expect a new URL if you move to a new system. The import will not work in that scenario as it's currently implemented.
      • Import would need to know the old baseURL and the new baseURL. Would it be helpful instead, on the export side of things, to do a stream replace on the base URL, replacing it with fedora:info? That way you don't need to know the base URL that was exported from; you just put it here. Will this help us out with bags down the line? Unsure.
    • Export Utility
      • Esmé will make a ticket for replacing the baseURL with fedora:info in the export.
  • Leave the call knowing how we will meet a-f, as well as documentation. Completion of phase 1 is defined by a-f being done, a utility to point people to, and a top-level wiki page that describes what we've done and how to use it.
  • Pull Requests:
  • Open tickets:
    • FCREPO-2166 & FCREPO-2167 - Josh will finish today.
      • Have a dataset with cross-object relationships & relationships to external content.
      • There is a Python script that's a batch loader specific to this set of content. If someone wants to run the Python script to load it, contact Josh. Josh created a ticket for the namespace issue he was seeing yesterday.
    • FCREPO-2168 - Josh can also work on this one.
    • FCREPO-2132 - Nick wants feedback. Leave this open throughout all phases of the sprints? How to proceed: it needs a section talking about what we are going to do to validate the utilities, plus one more section with bullets so that someone could, potentially, run the tests themselves. After that, closing the issue makes sense.
    • Testing: How does one determine that the RDF is the same? How can you show that, even though the graphs are the same, the points in them are identical (i.e., the same exact triples)? The simplest way is to open them up in text editors and compare them. Automate with SPARQL. There are tools to do diffs of RDF. Not using blank nodes should help.
      • One way: run the Turtle docs through Unix-like facilities (sort or something like that), then do a line-by-line diff. Some canonical form that can be compared byte by byte.
      • Using N-Triples might make more sense since they are not multi-line.
      • Are there things to filter out when doing this? Server-managed triples? (Example of filtering here: https://github.com/fcrepo4-labs/fcrepo-import-export/pull/22/files#diff-d48f0429a55f2afa5045356359360d80R154) The Python RDF library has isIsomorphic to test with. So, filter out server-managed triples and then compare the graphs. Is that a good approach? (See the comparison sketch after this list.)
    • The fact that we are comparing RDF should be mentioned in the test plan, along with the status of where that effort is:
      • mention that we are not currently comparing RDF before/after
      • tools that we might use to do this
    • Josh can help with FCREPO-2132; Youn may add some as well. Bethany will start a test section that will contain details of what actual tests we will be doing.
    • FCREPO-2134 - Just add a link to the dataset and this one is done. Link to the Google Drive folder and the fcrepo-exts/sample-dataset repo. Nick will take this one.
    • FCREPO-2164 - Andrew working on today
    • FCREPO-2178 - Nick doing today.
    • Documentation issues: 2135, 2143, 2144, 2145 - are these sufficient for communicating what this tool does and how to use it?
    • FCREPO-2143 & FCREPO-2144 sub pages (or part of a summary page). Justin will work on these on this page: https://wiki.duraspace.org/display/FEDORA4x/Import+and+Export+Tools
    • FCREPO-2145 - Other than a few things (like indirect containers and remapping the baseURL), it's good enough to start documenting. There may be issues that change the format. Do we just want to do URL escaping for resource filenames, and do that now versus later? Take care of it this sprint? Yes, via URL encoding/decoding. Document it as if it's fixed. Mike to create a ticket about it. The entire file name will be URL-escaped (see the mapping sketch after this list).
    • FCREPO-2135 - will be updated to reflect where we end up on Friday. May be done on Friday.
    • FCREPO-2146 - Can be rolled into PR 22; Esmé to work on it.
    • FCREPO-2163 - Mike will work on today.
    • FCREPO-2169 & 2170 - related to one another. Not a goal for phase 1, so lower priority. Feel free to work on if you have time. Nice to have, but not necessary for phase 1.
    • FCREPO-2172 - same as above, not part of phase 1, nice to have. Keep around, but not as a requirement for phase 1.
    • FCREPO-2180 - newly added, based on earlier conversation
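
As a sketch of the comparison approach discussed under "Testing" above: load the before/after RDF, strip server-managed triples, and test graph isomorphism. It uses Apache Jena; the file names and the short list of server-managed predicates are illustrative assumptions, not the definitive filter list.

import java.util.Arrays;
import java.util.List;

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.riot.RDFDataMgr;

/**
 * Minimal sketch of round-trip validation: load the before/after RDF,
 * remove server-managed triples, then test isomorphism.
 * File names and the predicate list below are illustrative only.
 */
public class RdfRoundTripCheckSketch {

    // A few Fedora server-managed predicates, for illustration
    private static final List<String> SERVER_MANAGED = Arrays.asList(
            "http://fedora.info/definitions/v4/repository#created",
            "http://fedora.info/definitions/v4/repository#createdBy",
            "http://fedora.info/definitions/v4/repository#lastModified",
            "http://fedora.info/definitions/v4/repository#lastModifiedBy");

    public static void main(final String[] args) {
        final Model before = filtered(RDFDataMgr.loadModel("export/object1.ttl"));
        final Model after = filtered(RDFDataMgr.loadModel("reexport/object1.ttl"));

        // isIsomorphicWith handles blank-node matching; identical triples match trivially
        System.out.println("Graphs match: " + before.isIsomorphicWith(after));
    }

    private static Model filtered(final Model m) {
        final Model copy = ModelFactory.createDefaultModel().add(m);
        for (final String uri : SERVER_MANAGED) {
            final Property p = copy.getProperty(uri);
            copy.removeAll(null, p, null);
        }
        return copy;
    }
}

Sorting canonical N-Triples and diffing them line by line, as also suggested above, is an equally workable alternative when no blank nodes are involved.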
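
And a sketch of the config-driven mapping and URL-escaped filenames discussed above (base URI plus export directory, with the remainder of the resource URI encoded to form the file name). The property names, paths, and ".ttl" suffix are illustrative assumptions, not the utility's actual config format.

import java.io.IOException;
import java.io.Reader;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Minimal sketch of config-driven URI-to-file mapping: a properties file
 * holds the base URI and export directory, and the rest of a resource URI
 * is URL-encoded to form the filename. Property names, paths, and the
 * ".ttl" suffix are hypothetical, not the utility's actual format.
 */
public class ExportMappingSketch {

    public static void main(final String[] args) throws IOException {
        // Hypothetical config written at export time and read back at import time
        final Properties config = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get("export.config"))) {
            config.load(r);
        }
        final String baseUri = config.getProperty("baseUri");     // e.g. http://localhost:8080/rest/
        final String exportDir = config.getProperty("exportDir"); // e.g. /data/export

        final String resourceUri = "http://localhost:8080/rest/sample/object 1";
        System.out.println(mapToFile(resourceUri, baseUri, exportDir));
    }

    static Path mapToFile(final String resourceUri, final String baseUri, final String exportDir)
            throws UnsupportedEncodingException {
        // Strip the repository base, then URL-encode the rest so any character
        // that is unsafe in a filename is escaped (decode on import to reverse it)
        final String relative = resourceUri.startsWith(baseUri)
                ? resourceUri.substring(baseUri.length())
                : resourceUri;
        final String filename = URLEncoder.encode(relative, "UTF-8") + ".ttl";
        return Paths.get(exportDir, filename);
    }
}

Decoding the filename on import reverses the mapping, which is what lets round-tripped URIs come back unchanged.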

Everyone can take responsibility for signing off on phase 1: saying yes where it works and raising a flag where it doesn't. Please raise issues by tomorrow afternoon so we have a chance to address them.

Note that there are two dependencies on snapshots, which is not ideal: fcrepo (because of Jena) and the java-client. Maybe after the sprint we will want to get a java-client release out so we can remove that dependency.

Wrap-up meeting on Friday at 3:30

Meeting 04 - September 9, 2016

Agenda

  • Stakeholder satisfaction; Goal: complete all phase-1 requirements and documentation
    1. Support transacting in RDF
    2. Support allowing the option to include Binaries
    3. Support references from exported resources to other exported resources
    4. Support import into a non-existing Fedora container
    5. Support export of resource and its "members" based on the ldp:contains predicate
    6. The URIs of the round-tripped resources must be the same as the original URIs
  • Retrospective
    1. What went well during the sprint cycle?
    2. What went wrong during the sprint cycle?
    3. What could we do differently to improve?
  • Penn State Sprint; September 19-23

Minutes

  • Stakeholder satisfaction
    • Requirements 1-6 were largely met
    • Requirement 6: true for relative paths
    • Possible future requirement to consider: option to delete fcr:tombstone
    • Focus on finding bugs and creating tickets, making it clear where we are now
    • Readme should include link to JIRA query
    • Bugs discussed: handling of pairtree nodes in plant patents data set; loss of properties upon import of plant patents data set; authorization errors
  • Penn State Sprint; September 19-23
    • Nick, Esmé, Jon Stroop, and Andrew will participate and will touch base next week
    • Justin and Josh will check their availability and will confirm on IRC by 10 am ET on Monday, September 12
    • Esmé: The Penn State Sprint will focus on bags; work on bugs and other issues can happen concurrently; LDP-ICs are a priority for Sufia and Hydra
  • Other follow up
    • Mike will be working over the weekend; the expectation is that a mini-code freeze will be in place
    • Bethany will write a script to validate import and export by comparing triples, which could be provided with the utility
    • Andrew: code refactoring will happen early next week
    • Nick will draft an email notification to stakeholders and share it in a Google document
  • Retrospective
    • Andrew
      1. IRC; integration of roles
      2. Didn't finish everything
      3. Do a better job of making sure we have time available
    • Esme
      1. IRC; code review
      2. Dependency chaining; interruptions and other work
      3. More preparation at the beginning of the sprint
    • Mike
      1. Coordination working on same code base
      2. Unclear standards for fcrepo-labs code reviews
      3. Limit distractions in code reviews
    • Youn
      1. Communication
      2. Level of effort at the beginning of the sprint
      3. More preparation
    • Bethany
      1. IRC; teamwork
      2. Didn't have models to follow and know where to contribute at the beginning of the sprint
      3. Start working on scripts earlier
    • Justin
      1. IRC; group dynamics; overall what was accomplished
      2. Naive sense of commitment required; could have started sooner; learning curve with Fedora
      3. Review time commitments at kickoff; document code review process; bring data sets and examples upfront
    • Josh
      1. Daily stand ups; wiki documentation
      2. IRC somewhat distracting
      3. Clear testing phase
    • Nick
      1. Communication
      2. Better job scoping requirements; dependency chaining
      3. Blocking off calendars; assigning work; communicating schedules

 

Standups

Stand-up report - 2016-08-30

<awoods> [Import/Export Standup]
<awoods> # Completed yesterday:
<awoods> ** Create a basic executable jar for fcrepo-import-export
<awoods> https://jira.duraspace.org/browse/FCREPO-2137
<awoods> ** Create a basic CLI framework for fcrepo-import-export
<awoods> https://jira.duraspace.org/browse/FCREPO-2138
<awoods> # Planning on completing today:
<awoods> Whatever help is useful/needed
<awoods> # Blockers / Need help with:
<awoods> None

<ruebot> [Import/Export Standup]
<ruebot> * Completed yesterday: - Stub page created for https://jira.duraspace.org/browse/FCREPO-2134 - Done https://jira.duraspace.org/browse/FCREPO-2129
<ruebot> * Planning on completing today: - https://jira.duraspace.org/browse/FCREPO-2134
<ruebot> * Blockers / Need help with: - None

<westgard> [Import/Export Standup]
<westgard> * Completed yesterday:
<westgard> Gathered test dataset (100 objects, ea. with a PDF file, total 55MB)
<westgard> Started batch load script to load to fcrepo (about 2/3 complete):
<westgard> https://github.com/jwestgard/plantpatents-batchload
<westgard> * Planning on completing today:
<westgard> Finish batch loader, load data to fcrepo vagrant, and export using serializer
<westgard> * Blockers / Need help with:
<westgard> None

<bseeger> [import / export standup]
<bseeger> * Completed yesterday:
<bseeger> I've been lurking mostly, glad to help where I can, but not sure what that is right now, though I'm sure there will be more later.
<bseeger> * Planning on completing today:
<bseeger> helping with the test doc
<bseeger> * Blocker / Need help with:
<bseeger> Nothing right now. Been wondering though, what about fedora triples and maintaining them on an import - ie, restoring based on export. Probably not part of first phase, but should be designed in.

<youn> * Completed yesterday: initial draft of test plan for import export sprint (edits and feedback welcome!)
<youn> * Planning on completing today: updating test plan; starting work on components
<youn> * Blockers / Need help with: feedback on test plan

<escowles> [import/export standup]:
<escowles> * completed yesterday: added stub readme and maven execution plugin, started on export utility
<escowles> * today: just merged #4 (command-line parsing), plan on finishing basic export utility and creating tickets for obvious improvements
<escowles> * blockers: none

<justinsimpson> [Import/Export Standup]
<justinsimpson> * Completed yesterday: - attended kick off meeting (first hour)
<justinsimpson> * Planning on completing today: - setting up test environment and populating with test data
<justinsimpson> * Blockers / Need help with: - None

...