Appendix C


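The next topic concerns problems in the source data. Each figure below shows a fragment of EduRS.xml. Each bullet describes one highlighted problem and is labelled by figure and highlight number (for example, [F18H0] is Figure 18, Highlight 0).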


EduRS.xml Fragment 1 - Figure 18

  • [F18H0] The middle name is entirely in lower case. If all name parts are converted to the same case (all upper or all lower) before comparison, this does not matter.
  • [F18H1] The MAJOR field is empty. A major is needed to label the EducationalTraining instance properly, so this row will be rejected.
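The case-insensitive comparison described for [F18H0] can be sketched in Python as follows. This minimal sketch assumes each name part arrives as a separate string. It uses `str.casefold()`, which is Python's more aggressive form of lower-casing. For plain ASCII names it gives the same result as shifting everything to lower case. The function name `names_match` is hypothetical.

```python
def names_match(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """Compare name parts (e.g. first, middle, last) ignoring case."""
    # Fold every part to one case so "anne" and "Anne" compare equal;
    # tuples with a different number of parts never match.
    return len(a) == len(b) and all(x.casefold() == y.casefold() for x, y in zip(a, b))


print(names_match(("Mary", "anne", "Smith"), ("Mary", "Anne", "Smith")))  # True
```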


EduRS.xml Fragment 2 - Figure 19

  • [F19H0] The INSTITUTION field contains mixed-case characters. This causes a mismatch if organizational matching requires character-for-character equality. If all characters are converted to the same case before comparison, however, it is not a problem.
  • [F19H1] The NETID field is missing, which makes name comparison uncertain. It is not uncommon for several distinct people to share exactly the same name parts, so an identifier such as the netid is ideal for telling people apart. This is because the netid is guaranteed to be unique, even when a person's name has changed through choice, marriage or divorce. A missing netid therefore weakens the association between a degree record and a person.
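The matching policy implied by [F19H0] and [F19H1] can be sketched as follows. The netid decides the match when both sides have one. Otherwise the code falls back to case-insensitive name parts, and that fallback match is labelled weak. The column names `FIRST`, `MIDDLE`, `LAST` and `NETID`, the function names, and the choice to compare netids exactly are all assumptions made for illustration.

```python
# Hypothetical column names; the real EduRS.xml fields may be named differently.
NAME_FIELDS = ("FIRST", "MIDDLE", "LAST")


def _name_key(row: dict) -> tuple[str, ...]:
    # Case-fold every name part; a missing part counts as an empty string.
    return tuple((row.get(f) or "").casefold() for f in NAME_FIELDS)


def same_institution(a: str, b: str) -> bool:
    # [F19H0]: compare organization names without regard to case.
    return a.casefold() == b.casefold()


def match_strength(record: dict, person: dict) -> str:
    """Return "strong", "weak" or "none" for a degree record vs. a person."""
    rec_id = (record.get("NETID") or "").strip()
    per_id = (person.get("NETID") or "").strip()
    if rec_id and per_id:
        # Both netids present: they alone decide, since a netid is unique.
        return "strong" if rec_id == per_id else "none"
    # [F19H1]: a netid is missing, so fall back to name parts, which
    # several distinct people may share; such a match is only "weak".
    return "weak" if _name_key(record) == _name_key(person) else "none"


person = {"NETID": "msmith2", "FIRST": "Mary", "MIDDLE": "Anne", "LAST": "Smith"}
print(match_strength({"FIRST": "MARY", "MIDDLE": "anne", "LAST": "Smith"}, person))  # weak
```

Keeping "weak" matches separate lets later processing decide whether to accept them, rather than silently treating a name coincidence as proof of identity.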


EduRS.xml Fragment 3 - Figure 20

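  • [F20H0] See Figure 18, Highlight 0.
  • [F20H1] Extra embedded whitespace. The XPath function normalize-space() deals effectively with this sort of issue. It strips leading and trailing whitespace and collapses each internal run of whitespace to a single space.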
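When the data is processed outside XPath/XSLT, the same normalization can be reproduced in code. Within XPath itself one would simply write `normalize-space(INSTITUTION)`, where `INSTITUTION` stands for whichever element holds the value. The Python sketch below mirrors XPath 1.0 `normalize-space()`, which treats only space, tab, carriage return and line feed as whitespace. The function name `normalize_space` is our own.

```python
import re

# XPath 1.0 whitespace is exactly #x20, #x9, #xD and #xA.
_XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s: str) -> str:
    """Equivalent of XPath normalize-space(): strip leading and trailing
    whitespace and collapse each internal whitespace run to one space."""
    return _XML_WS.sub(" ", s).strip(" ")


print(repr(normalize_space("  Mary \t  Anne\nSmith  ")))  # 'Mary Anne Smith'
```

Characters such as the non-breaking space (U+00A0) are deliberately left alone, matching XPath rather than Python's broader `str.split()` notion of whitespace.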


EduRS.xml Fragment 4 - Figure 21

  • [F21H0] Trailing whitespace. The XPath function normalize-space() removes this as well.
  • [F21H1] Missing netid. See Figure 19, Highlight 1.
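The following sketch reads rows with Python's standard `xml.etree.ElementTree` and normalizes every field as it is read. This means trailing whitespace such as [F21H0] never reaches the comparison step, and rows without a netid ([F21H1]) are easy to list. The `ROWSET`/`ROW` layout and the element names in `SAMPLE` are assumptions, since the actual structure of EduRS.xml is not shown here. `read_rows` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# XPath normalize-space() semantics: only space, tab, CR and LF are whitespace.
_XML_WS = re.compile(r"[ \t\r\n]+")


def read_rows(xml_text: str, row_tag: str = "ROW") -> list[dict[str, str]]:
    """Map each row element to {child tag: whitespace-normalized text}."""
    root = ET.fromstring(xml_text)
    return [
        # An empty element such as <NETID/> has text None, read as "".
        {child.tag: _XML_WS.sub(" ", child.text or "").strip(" ") for child in row}
        for row in root.iter(row_tag)
    ]


# Hypothetical sample: a surname with trailing spaces and an empty NETID.
SAMPLE = """<ROWSET>
  <ROW><LAST>Smith   </LAST><NETID/></ROW>
  <ROW><LAST>Jones</LAST><NETID>bj42</NETID></ROW>
</ROWSET>"""

rows = read_rows(SAMPLE)
print(rows[0])  # {'LAST': 'Smith', 'NETID': ''}
print([r["LAST"] for r in rows if not r.get("NETID")])  # ['Smith']
```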


EduRS.xml Fragment 5 - Figure 22

  • [F22H0] Same as [F18H1]. This row will also be rejected.
  • [F22H1] Same as [F19H1].
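This appendix describes two outcomes: a row with an empty MAJOR is rejected, and a row with no netid is kept but its person match is weakened. Both can be combined into a single triage step. This is a hedged sketch, and the field names, the `triage` function and the warning text are hypothetical. A row like Figure 22, with both problems, is rejected because the missing major alone is disqualifying.

```python
def triage(row: dict[str, str]) -> tuple[str, list[str]]:
    """Return ("reject" | "accept", warnings) for one source row."""

    def blank(field: str) -> bool:
        # Absent, empty and whitespace-only values all count as missing.
        return not (row.get(field) or "").strip()

    if blank("MAJOR"):
        # Without a major the EducationalTraining instance cannot be labelled.
        return "reject", ["MAJOR is empty"]
    warnings = []
    if blank("NETID"):
        warnings.append("NETID missing: person match relies on name parts only")
    return "accept", warnings


print(triage({"MAJOR": "", "NETID": ""}))  # ('reject', ['MAJOR is empty'])
```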


EduRS.xml Fragment 6 - Figure 23

