XSLT Ingest Example: Appendix C

This section describes the problems in the source data, as flagged in the EduRS.xml fragments that follow.

EduRS.xml Fragment 1 - Figure 18

[F18H0] The middle name is entirely in lower case. If all name parts are shifted to the same case (all upper or all lower) before comparison, this difference no longer matters.
[F18H1] The MAJOR data field is empty. Because this value is needed to label the EducationalTraining instance properly, this row will be rejected.
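
The two checks above can be illustrated outside of XSLT. The following is a minimal Python sketch, not the stylesheet logic itself: the function names and the representation of a row as a dictionary are assumptions made for illustration, and only the MAJOR field name comes from the source data.

```python
def normalize_name_part(part):
    """Trim a name part and shift it to lower case (None becomes '')."""
    return (part or "").strip().lower()


def names_match(parts_a, parts_b):
    """Compare two sequences of name parts, ignoring case and padding."""
    norm_a = [normalize_name_part(p) for p in parts_a]
    norm_b = [normalize_name_part(p) for p in parts_b]
    return norm_a == norm_b


def accept_row(row):
    """Return False when MAJOR is missing or blank, so the row is rejected.

    `row` is assumed to be a dict keyed by field name (hypothetical layout).
    """
    return bool((row.get("MAJOR") or "").strip())
```

With this sketch, `names_match(("Jane", "ann", "Doe"), ("JANE", "Ann", "DOE"))` evaluates to true, while a row whose MAJOR is empty or whitespace fails `accept_row` and would be skipped.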

EduRS.xml Fragment 2 - Figure 19

[F19H0] The INSTITUTION field contains mixed-case characters, which will cause a mismatch if character-for-character equality is required for organizational matching. However, if all characters are shifted to the same case before comparison, this will not be a problem.
[F19H1] The NETID field is missing. This causes uncertainty when comparing names. Since it is not uncommon for several distinct people to have exactly the same name parts, a token like netid is ideal for disambiguating people. This is because a netid is guaranteed to be unique to a person, even when that person’s name has changed by choice, marriage, or divorce. A missing netid weakens the association between a degree record and a person.
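
One way to picture the effect of a missing netid is the matching sketch below. It is written in Python for illustration only and is not the XSLT used in this example. The field names FNAME, MNAME, and LNAME, the function names, and the returned labels are hypothetical. Only NETID and INSTITUTION appear in the source data.

```python
def same_org(name_a, name_b):
    """Compare two institution names after shifting both to lower case."""
    return name_a.strip().lower() == name_b.strip().lower()


def name_key(rec):
    """Build a case-insensitive key from (hypothetical) name-part fields."""
    return tuple((rec.get(f) or "").strip().lower()
                 for f in ("FNAME", "MNAME", "LNAME"))


def match_person(record, people):
    """Return (person, basis) for a degree record.

    basis is 'netid' for a match on netid, 'name' for a unique match on
    name parts only, 'ambiguous' when several people share the name, and
    'none' when nobody matches.
    """
    netid = (record.get("NETID") or "").strip()
    if netid:
        for person in people:
            if (person.get("NETID") or "").strip() == netid:
                return person, "netid"
        return None, "none"
    # No netid on the record: fall back to the weaker name comparison.
    candidates = [p for p in people if name_key(p) == name_key(record)]
    if len(candidates) == 1:
        return candidates[0], "name"
    return None, ("ambiguous" if candidates else "none")
```

In this sketch a record without a netid can still be linked when exactly one person has the same name parts, but the link is labeled as name-based so that it can be treated as weaker than a netid match.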

EduRS.xml Fragment 3 - Figure 20