Note: This HOWTO is a work-in-progress -- please contribute if you have any comments or advice to add, or corrections to make!

Note: This HOWTO is called "Character Encoding" but mainly deals with enabling Unicode (UTF-8) so that repositories correctly handle and display UTF-8 characters.

Character encoding in DSpace

Character encoding is an important consideration in digital repositories, archives and catalogues. Even if the majority of your digital resources are described in English, or in characters from the ISO-8859-1 (Latin1) character set, it is likely that users will eventually wish to search using scientific symbols or other characters outside ISO-8859-1, or that your repository will need to be compatible with other institutional systems that only accept Unicode UTF-8.

This HOWTO will give some tips and tricks to ensure your DSpace repository, user interfaces and servlet container are consistent in their handling of character encoding (and, better yet, compliant with UTF-8).
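
To illustrate why consistency matters, here is a minimal Java sketch of what happens when one component writes text as UTF-8 and another reads the same bytes as ISO-8859-1 (or the reverse). It is not DSpace code: the class name, the helper method and the sample title are hypothetical, and only standard JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of DSpace.
class EncodingMismatchDemo {

    // Encode text with one charset and decode the resulting bytes with
    // another, mimicking two components that disagree about the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // Sample title, written with Unicode escapes so the result does
        // not depend on the encoding of this source file.
        String title = "\u00C5ngstr\u00F6m";

        // Both sides agree on UTF-8: the text survives unchanged.
        String consistent = roundTrip(title,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("Unchanged: " + consistent.equals(title));

        // Written as UTF-8, read as ISO-8859-1: each two-byte character
        // is decoded as two separate characters, so the string grows.
        String garbled = roundTrip(title,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("Length " + title.length()
                + " became " + garbled.length());

        // Written as ISO-8859-1, read as UTF-8: the non-ASCII bytes are
        // not valid UTF-8 and are replaced with U+FFFD.
        String lossy = roundTrip(title,
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("Contains U+FFFD: " + lossy.contains("\uFFFD"));
    }
}
```

The second mismatch is the more damaging one, because the original characters cannot be recovered from the replacement characters. The same kind of disagreement can occur between any two layers that exchange text, which is why every layer should be configured for the same encoding.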

In DSpace 1.5.2 and 1.6.0, many character encoding fixes were submitted to make DSpace more compliant with UTF-8. In earlier versions, handling of text in search forms, license text, and collection and community names may be inconsistent, particularly in XMLUI (Manakin). A list of relevant JIRA issues at the end of this page can help you identify possible character encoding issues with your version of DSpace.

...