
Meeting 03 - September 7, 2016 

Agenda

  1. Goal: complete all phase-1 requirements and documentation
    1. Support transacting in RDF
    2. Support allowing the option to include Binaries
    3. Support references from exported resources to other exported resources
    4. Support import into a non-existing Fedora container
    5. Support export of resource and its "members" based on the ldp:contains predicate
    6. The URIs of the round-tripped resources must be the same as the original URIs
  2. Open pull requests
    1. https://github.com/fcrepo4-labs/fcrepo-import-export/pull/17
    2. https://github.com/fcrepo4-labs/fcrepo-import-export/pull/22
  3. Check-in on importer
  4. Jira
    DuraSpace JIRA query: project = FCREPO AND status in (Open, "In Progress", Reopened, "In Review", Received) AND component = f4-import-export ORDER BY key ASC
  5. ...

Minutes

 

  • Sprint goals: Once Esmé's current PR (PR-22) is merged, we will be done with the crossed-off items on the list from #1 above.
    • Andrew has a ticket in progress in which a config file is created on export and can be used by both import and export: the options are written to a file and can be passed in and reused on subsequent runs.
    • The config should contain at least the base URI and the directories where the exported files are stored. Together, those two values tell how to map the URIs in exported RDF to files on disk.
    • Bags
      • Will the next-phase export utility have to generate bags directly, or should a separate utility convert an existing export into a bag? Whatever makes the bags will need the config Andrew is working on to know how the mapping works. It is unclear which direction to take. Postponed the bag discussion in order to look at tickets, since bags are not part of phase 1.
    • Import Utility
      • Restoring RDF currently only works if you import into a repository at the same URL, because no URI mapping is done. When moving to a new system you might want or expect a new URL, and the import as currently implemented will not work in that scenario.
      • Import would need to know both the old base URL and the new base URL. Alternatively, would it help to do a stream replace on the export side, replacing the base URL with fedora:info? Then the importer would not need to know the base URL that was exported from and could simply substitute the target base URL. Will this help with bags down the line? Unsure.
    • Export Utility
      • Esmé will make a ticket for replacing baseURL with fedora:info in the export.
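
The URI-to-file mapping and the base URL replacement discussed above can be sketched as follows. This is a minimal illustration in Python, not the utility's actual implementation: the base URI, export directory, ".ttl" extension, and function names are all hypothetical, and percent-encoding each path segment is an assumption about how filenames are escaped.

```python
from pathlib import Path
from urllib.parse import quote

# Hypothetical config values; the real config format is not settled in these minutes.
BASE_URI = "http://localhost:8080/rest"
EXPORT_DIR = Path("/tmp/export")


def uri_to_path(uri, base_uri=BASE_URI, export_dir=EXPORT_DIR, extension=".ttl"):
    """Map a repository URI to a file under the export directory."""
    base = base_uri.rstrip("/")
    if not uri.startswith(base + "/"):
        raise ValueError(f"{uri} is not below {base}")
    # Percent-encode every segment so the result is a safe filename (assumption).
    segments = [quote(s, safe="") for s in uri[len(base):].split("/") if s]
    if not segments:
        raise ValueError("URI has no path below the base URI")
    segments[-1] += extension
    return export_dir.joinpath(*segments)


def rebase(rdf_text, old_base, new_base):
    """Naive 'stream replace': swap the old base URI for the new one.

    A plain substring replace also rewrites any longer URI that merely
    begins with old_base, so a real implementation would need to be stricter.
    """
    return rdf_text.replace(old_base, new_base)
```

With these two values in a shared config, the importer could both locate the file for a given URI and rewrite URIs for a repository at a different base URL.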
  • Leave the call knowing how we will meet a-f, as well as documentation. Completion of phase 1 is defined by a-f being done, a utility to point people to, and a top-level wiki page that describes what we've done and how to use it.
  • Pull Requests:
  • Open tickets:
    • FCREPO-2166 & FCREPO-2167 - Josh will finish today.
      • We have a dataset with cross-object relationships and relationships to external content.
      • There is a Python script, a batch loader specific to this set of content. If someone wants to run it to load the dataset, contact Josh. Josh created a ticket for the namespace issue he was seeing yesterday.
    • FCREPO-2168 - Josh can also work on this one.
    • FCREPO-2132 - Nick wants feedback. Should it stay open throughout all phases of the sprints? How to proceed: it needs a section describing what we are going to do to validate the utilities, plus one more section with bullets so that someone could, potentially, run the tests themselves. After that, closing the issue makes sense.
    • Testing - how does one determine that the RDF is the same? How can you show not just that the graphs are the same, but that the statements in them are identical (i.e., the exact same triples)? The simplest way is to open both in text editors and compare them. The comparison could be automated with SPARQL, and there are tools for diffing RDF. Not using blank nodes should help.
      • One way: run the Turtle documents through Unix-like facilities (sort or something similar), then do a line-by-line diff. The goal is some canonical form that can be compared byte by byte.
      • Using N-Triples might make more sense, since each triple sits on a single line.
      • Are there things to filter out when doing this? Server-managed triples? (Example of filtering here: https://github.com/fcrepo4-labs/fcrepo-import-export/pull/22/files#diff-d48f0429a55f2afa5045356359360d80R154) Python's rdflib has a graph isomorphism check (rdflib.compare.isomorphic). So: filter out server-managed triples and then compare the graphs. Is that a good approach?
    • The test plan should mention that we are comparing RDF, along with the current status of that work:
      • mention that we are not currently comparing RDF before/after
      • tools that we might use to do this
    • Josh can help with FCREPO-2132. Youn may add some as well. Bethany will start test section that will contain details of what actual tests we will be doing.
    • FCREPO-2134 - just add a link to the dataset and this one is done: link to the Google Drive and to fcrepo-exts/sample-dataset. Nick will take this one.
    • FCREPO-2164 - Andrew working on today
    • FCREPO-2178 - Nick doing today.
    • Documentation issues: 2135, 2143, 2144, 2145 - are these sufficient for communicating what this tool does and how to use it?
    • FCREPO-2143 & FCREPO-2144 - these will be sub-pages (or part of a summary page). Justin will work on them on this page: https://wiki.duraspace.org/display/FEDORA4x/Import+and+Export+Tools
    • FCREPO-2145 - other than a few things (like indirect containers and remapping the base URL), it's good enough to start documenting. There may be issues that change the format. Do we want to do URL escaping for resource filenames, and do it now rather than later? Take care of it this sprint? Yes, via URL encoding/decoding. Document it as if it's fixed. Mike will create a ticket for it. The entire file name will be URL escaped.
    • FCREPO-2135 - will be updated to reflect where we end up on Friday. May be done on Friday.
    • FCREPO-2146 - can be rolled into PR 22. Esmé will work on it.
    • FCREPO-2163 - Mike will work on today.
    • FCREPO-2169 & 2170 - related to one another. Not a goal for phase 1, so lower priority. Feel free to work on if you have time. Nice to have, but not necessary for phase 1.
    • FCREPO-2172 - same as above, not part of phase 1, nice to have. Keep around, but not as a requirement for phase 1.
    • FCREPO-2180 - newly added, based on earlier conversation
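
The sort-and-diff comparison raised under Testing above can be sketched as follows. This is a minimal Python illustration using only the standard library, not an agreed test procedure. It assumes both graphs are serialized as N-Triples by the same serializer and contain no blank nodes. The function names are hypothetical, and the set of filtered predicates is an illustrative sample rather than the full list of server-managed properties.

```python
import difflib

# Illustrative sample of server-managed predicates to ignore (assumption:
# the real list would be longer and would come from the repository).
SERVER_MANAGED_PREDICATES = {
    "<http://fedora.info/definitions/v4/repository#created>",
    "<http://fedora.info/definitions/v4/repository#lastModified>",
}


def canonical_lines(ntriples_text, ignored=SERVER_MANAGED_PREDICATES):
    """Return sorted, de-duplicated triples with ignored predicates removed."""
    lines = set()
    for raw in ntriples_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] in ignored:
            continue  # drop server-managed triples
        lines.add(line)
    return sorted(lines)


def diff_ntriples(before, after):
    """Line-by-line diff of two N-Triples documents; empty list means equal."""
    return list(
        difflib.unified_diff(
            canonical_lines(before),
            canonical_lines(after),
            "before",
            "after",
            lineterm="",
        )
    )
```

An empty result from diff_ntriples means the two exports contain the same triples once the filtered predicates are set aside. If blank nodes or differing serializers are involved, a graph isomorphism check would be needed instead of a textual diff.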

Everyone can take the responsibility for signing off on phase 1. Saying yes to where it works and raising a flag for where it doesn't. Please raise issues by tomorrow afternoon so we have a chance to address them.

Note that there are two dependencies on snapshots, which is not ideal: fcrepo (because of Jena) and the java-client. After the sprint we may want to get a java-client release out so we can remove that dependency.

Wrap-up meeting on Friday at 3:30
