This page describes several ways that uploading a file to Fedora can fail and records, for each, the error the client receives and what is persisted to disk.  The primary objective is to understand how Fedora fails under different circumstances and to identify any bugs that can be fixed to improve reliability.

Summary

  • Every upload that completed without an error message could be copied to another VM and read successfully there.
  • Filling up the disk or undeploying Fedora results in files being written to the Fedora home directory that are not tracked by the repository.
  • Filling up the disk and then attempting to update Fedora results in an unresponsive repository.

General procedure

  • Run Fedora 4 in Tomcat on one VM.
  • Upload a large file from a second VM using curl.  4 GB files were used so that transfers would take long enough to be interrupted.
  • Upload a second file and interrupt the upload in various ways to simulate application, network, or hardware failures.
  • Copy the Fedora home directory to the second VM and run a separate Fedora instance there to verify that the files can be retrieved.
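
The upload step above can be sketched as a small helper that assembles the curl command line. This is only an illustration: the host name server-vm, the port, the /fcrepo/rest path, and the resource and file names are all hypothetical, and the notes above do not record the exact command that was used. The sketch uses curl's --upload-file option, which streams the file with an HTTP PUT, so a 4 GB file does not need to fit in memory.

```python
import shlex


def build_upload_command(base_url, resource_path, local_file):
    """Return a curl argument list that PUTs local_file to base_url/resource_path."""
    url = f"{base_url.rstrip('/')}/{resource_path.lstrip('/')}"
    return [
        "curl",
        "-i",  # print response headers so the HTTP status line is visible
        "-H", "Content-Type: application/octet-stream",
        "--upload-file", local_file,  # streams the file using HTTP PUT
        url,
    ]


# Hypothetical values, chosen only for illustration.
command = build_upload_command(
    "http://server-vm:8080/fcrepo/rest", "test/large2", "large2.bin"
)
print(shlex.join(command))
```

Building the command as a list keeps it safe to pass to subprocess.run without a shell, which also makes curl's exit status easy to capture for the interrupted-upload cases.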

Fill up disk

  • Before the second upload, add files to the filesystem containing the Fedora home directory so there isn't enough disk space for the upload to finish.
  • The client receives a 400 Bad Request error.
  • The file uploads to a temporary directory and is then copied over to the Fedora home directory.
  • When the disk is full, any attempt to create containers makes Fedora unresponsive, and it cannot be restarted.
  • Freeing up disk space allows Fedora to handle updates and be started again.
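
Because a full disk leaves Fedora unresponsive, it may be worth checking free space before starting a large upload. The following is a minimal sketch, not part of the original test procedure. The function name and the copies parameter are hypothetical. The default of two copies assumes that the temporary directory and the Fedora home directory share a filesystem, so that the temporary file and its copy briefly coexist; that layout was not confirmed in these tests.

```python
import shutil


def has_room_for_upload(directory, upload_bytes, copies=2):
    """Return True if the filesystem holding `directory` can fit `copies` copies of the upload.

    copies=2 is an assumption: the upload is written to a temporary
    directory and then copied to the Fedora home directory, so both
    copies may exist at once if they share a filesystem.
    """
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= upload_bytes * copies


# Example: check the current directory for a 4 GB upload.
four_gb = 4 * 1024 ** 3
print(has_room_for_upload(".", four_gb))
```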

Undeploy or redeploy Fedora

  • While the second upload is ongoing, undeploy the Fedora webapp (by removing the fcrepo.war file from Tomcat's webapps directory).
  • The upload finishes and the client receives a 500 Internal Server Error.
  • The file is moved to the Fedora home directory but does not appear in the REST API.

Kill Tomcat

  • While the second upload is ongoing, kill Tomcat's java process on the server VM.
  • The client exits with a non-zero exit status.
  • The file is not written to the Fedora home directory.
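
When scripting these tests, curl's exit status can be translated into a short description so that failures are easier to tell apart. The sketch below lists a few exit codes documented in the curl manual that are plausible for dropped connections and timeouts. Which code was actually observed in this test was not recorded, and the helper name is hypothetical.

```python
# A few exit codes from the curl manual; this is not a complete list.
CURL_EXIT_MEANINGS = {
    0: "success",
    7: "failed to connect to host",
    28: "operation timed out",
    52: "empty reply from server",
    55: "failed sending network data",
    56: "failure receiving network data",
}


def describe_curl_exit(code):
    """Return a short description for a curl exit status."""
    return CURL_EXIT_MEANINGS.get(code, f"other curl error (exit status {code})")


print(describe_curl_exit(56))
```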

Kill Client

  • While the second upload is ongoing, kill curl on the client VM.
  • If the checksum parameter is specified, the upload fails.
    • If the checksum parameter is not specified, the partial upload is treated as a complete file and added to the repository.
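
The difference between the two cases comes down to whether the server has a digest to compare against. The sketch below illustrates the idea with SHA-1: a truncated payload produces a different digest from the complete file, so a supplied checksum exposes the partial upload. The helper names are hypothetical, and the urn:sha1: value format shown for the checksum parameter is an assumption that should be checked against the Fedora release in use.

```python
import hashlib


def sha1_hex(chunks):
    """Return the SHA-1 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha1()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=1024 * 1024):
    """Yield a file's contents in chunks so large files are never fully in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Simulate an upload interrupted halfway through.
complete = b"example payload " * 1000
partial = complete[: len(complete) // 2]

expected = sha1_hex([complete])
received = sha1_hex([partial])
upload_is_intact = expected == received  # False: the truncation is detected

# Assumed format for the checksum query parameter value.
checksum_value = f"urn:sha1:{expected}"
```

For a real file, the digest would be computed with sha1_hex(read_chunks(path)) before the upload starts.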

Block Client connections

  • Before the second upload, block the client's IP address on the server VM using the iptables firewall.
  • The second upload times out.

 

3 Comments

  1. Should this page be factored into the documentation in some way? It's kind of out here on its own now and would be hard to find.

    1. The audience for the page is not obvious to me... therefore, neither is its proper location in the wiki. Thoughts?

      1. I think some of this content would be useful for the administrator's guide under the general topic of known failure scenarios.  It could also be part of a page on durability, describing how Fedora does its best to signal errors and not lose data silently.

        But all of these failure scenarios will need to be retested before the 4.7 release, and the content as written up here is really my testing notes, not packaged in a way that would be useful for many people.