Governance/Committers Call
17/03/2022
Discussion Topic: How to best begin collecting information on Fedora user installations
Attendees:
Daniel Bernstein
Arran Griffith
Tim Shearer
Jennifer Gilbert
Jared Whiklo
Demian Katz
Robin Ruggaber
Clavin Xu
Mike Ritter
Ben Pennell
Thomas Bernhart
Additional input sent to Arran from:
Scott Prater
Jakov Vezic
Notes:
Known concerns:
- Privacy - specifically in Europe and UK
- Community - we want to build the community, not tear it down by doing something members don’t agree with
- Definitions of what counts as data
- E.g., is an IP address personal data?
- Additional stakeholder groups
- Fedora sits deep in the stack, so how do Islandora/Samvera installs know we’re there, and how do we engage with them to get the information?
- Who has access to this data?
- Can sister communities come and ask for this data?
- Where does it live?
- Having the tech community involved in this conversation is one of our key points of interest, so that we know what is possible and what is not
- Jakov’s notes
- From a technical point of view - totally doable
- If it was “opt-in” we could just provide a pop-up
- Dockerized instances make this more difficult because
- Could provide a flag to override
- Ben: Our Fedora instances are on servers which are probably blocked by firewalls from phoning home
- Could end up with a large number of entries that are random because it’s a self-reporting system
- Maybe need to put parameters on data collection over time to track active installations
- IP address possibly not as useful as we think nowadays
- User-supplied contact and organizational information, and making it easy for people to know where to go to join the community, seem more useful than IPs
- Sounds like a good suggestion because we are simply inserting ourselves into the installation process
- Upgrade process - would you need to re-opt-in? Or could we just carry it over?
- Fedora instance reporting tool - separate .jar file
- Stands alone and you run it independently
- Jared - had this discussion with Tech group and with Islandora
- Many institutions will not want something running without notice
- Firewalls will block getting this information in the first place
- Language - people in places where English is not a first language may not know
- Would want to provide explicit reasoning for why the information is being collected
- Maybe have levels of information that people can offer
- E.g., a base level of info, and then people can offer up additional levels as they feel comfortable (more than one person liked this idea)
- Thomas - want to have an option for people who WANT to provide this information
- Can we leverage downloads?
- Put in something less intrusive, maybe something form-based with user input
- We can tie a link or opt-in at new downloads (like a reminder)
- Maybe have a form in the Fedora admin UI which displays on the landing page and as a prompt in the header until it is filled out the first time (maybe it links out to a form hosted by LYRASIS if Fedora can’t communicate with the internet). Then maybe there could be an easy-to-use, optional data release option in the UI and API.
- A click-through would be easier to get people to do than typed input
- Pre-filled information
- How would we be able to determine the difference between someone’s personal web browser and where Fedora is running?
- Fedora distributed through Maven
- People also download it off of the releases page in GitHub
- Having prompts in the Fedora UI would be a long-term reminder. Maybe there could be something that displays on GET requests to the root of the repository too.
- We realize that we are looking toward the future, and we recognize that we may never know what we have in past versions
- What data does not present risk?
- Version
- Continent
- Scott’s notes
- Many, though maybe not a majority, of the Fedora 3 repositories out in the wild have been inherited by IT groups, digital archivists and collections managers who may not be plugged into the community via email lists, etc.
- So I'd recommend an outreach effort that focused on publicizing Fedora 6 and migration help outside the usual channels
- Posting to forums like code4lib, the DSpace forum (many institutions that run DSpace also run Fedora), NDSA lists, PASIG lists, Samvera, Islandora, etc.
- Any place where a repo manager, of any skill level, might be listening.
- Digital curation listserv as well
- End question to consider: what is the risk to the program if we don’t collect information?
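
Illustrative sketch (not discussed on the call): one way to picture the tiered, opt-in idea from the notes above, where a base level carries only the low-risk fields named (version, continent) and a higher level adds user-supplied organization and contact details. Everything here is an assumption made for illustration: the names `Level`, `Installation`, and `build_report`, the field names, and the sample values are hypothetical and do not correspond to any existing Fedora API or agreed design.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class Level(IntEnum):
    """Hypothetical consent levels; the default is to report nothing."""
    NONE = 0     # opt-in not given: no report is built
    BASE = 1     # low-risk fields only: version and continent
    CONTACT = 2  # adds user-supplied organization and contact details


@dataclass
class Installation:
    """Hypothetical record of what an administrator chose to enter."""
    version: str
    continent: str
    organization: Optional[str] = None
    contact_email: Optional[str] = None


def build_report(install: Installation, level: Level) -> Optional[Dict[str, str]]:
    """Return only the fields allowed at the chosen level, or None if not opted in."""
    if level == Level.NONE:
        return None
    report = {"version": install.version, "continent": install.continent}
    if level >= Level.CONTACT:
        # Include contact fields only when the user actually supplied them.
        if install.organization:
            report["organization"] = install.organization
        if install.contact_email:
            report["contact_email"] = install.contact_email
    return report


# Sample values are made up for the example.
install = Installation("6.x", "Europe", "Example University", "repo@example.org")
print(build_report(install, Level.BASE))
```

Because the report is only built and returned, not sent, the same function could sit behind a form in the admin UI or a stand-alone reporting tool; no IP address is included at any level in this sketch.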