This is an archived listing of DSpace project ideas for Google Summer of Code in 2008. To see which projects actually took part in GSoC 2008, please visit the Past DSpace Summer of Code Projects page.
DSpace code is currently fairly closely tied to the underlying database on which it runs. Refactoring the data access layer to use something like Spring or Hibernate would open up the possibility of providing better support for more databases, rather than the current options of PostgreSQL or Oracle (with MySQL support available through a patch).
A good project would be to build a version of DSpace that uses Hibernate as the storage layer, removing all database platform-specific dependencies.
<i> Several efforts already exist in this area on the 2.0 roadmap, but there may be room for participation --Mark Diggory 10:56, 18 March 2008 (EDT) </i>
This is already being worked on in the sandbox repository (a Hibernate prototype) --James Rutherford
Create (or integrate) some AJAX tools for the DSpace Web UI. Specific examples could include: auto-completion or listing of already entered values (in search and in metadata entry for subjects, author names, etc.), simple spell checking (in metadata entry and search), highlighting of matching terms (in search results), etc.
I believe that the Manakin UI is an excellent starting point to Semantic Web enable DSpace. By making some simple adjustments to our internal metadata formats and crosswalks and exposing those to the Manakin Aspect transformation chain, we can begin a process to make all DSpace instances better participants in the Semantic Web world (and likewise lay a foundation for introducing upcoming standards like ORE into DSpace). --Mark Diggory 11:29, 10 March 2008 (EDT)
History repeats itself... the Cocoon folks learned long ago that pipelines would be both more efficient and more flexible if they were SAX driven rather than DOM driven. DSpace could take a page from that playbook and re-implement the Crosswalk API to be a suite of SAX XMLReaders that take DSpace Objects and serialize them to XML. This would leverage the existing work done in the Manakin DSpace Adapter API and bring it into a new Addon or into dspace-api directly, making it available as a much more efficient mechanism for getting DSpace Objects into XML.
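As a rough illustration of the SAX-driven idea, the sketch below pushes SAX events for one object straight into a ContentHandler, so no DOM tree is ever built. It is a simplification: it is a plain event producer rather than a full XMLReader, and SimpleItem, SaxItemCrosswalk and the item/field element names are hypothetical stand-ins, not part of the DSpace or Manakin APIs.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical stand-in for a DSpace Item: a handle plus flat metadata fields.
final class SimpleItem {
    final String handle;
    final Map<String, String> metadata = new LinkedHashMap<>();

    SimpleItem(String handle) {
        this.handle = handle;
    }
}

final class SaxItemCrosswalk {
    // Streams one item as SAX events; any downstream pipeline stage can consume them.
    static void disseminate(SimpleItem item, ContentHandler out) throws SAXException {
        out.startDocument();
        AttributesImpl itemAtts = new AttributesImpl();
        itemAtts.addAttribute("", "handle", "handle", "CDATA", item.handle);
        out.startElement("", "item", "item", itemAtts);
        for (Map.Entry<String, String> field : item.metadata.entrySet()) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "name", "name", "CDATA", field.getKey());
            out.startElement("", "field", "field", atts);
            char[] text = field.getValue().toCharArray();
            out.characters(text, 0, text.length);
            out.endElement("", "field", "field");
        }
        out.endElement("", "item", "item");
        out.endDocument();
    }

    // Convenience wrapper: serializes the event stream to a string with the JDK's
    // identity TransformerHandler, which also takes care of XML escaping.
    static String toXml(SimpleItem item) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            handler.setResult(new StreamResult(writer));
            disseminate(item, handler);
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize item " + item.handle, e);
        }
    }
}
```

Because disseminate only depends on ContentHandler, the same producer could in principle feed a Cocoon pipeline, an XSLT transformer, or a plain serializer without change.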
Replace DIM with RDFXML as the default "DSpace Internal Metadata" format for embedding RDF into METS manifests in Manakin, OAI, or Packager.
This is a project that would begin to solidify the use of RDF within DSpace as a descriptive metadata solution. It would allow Manakin to begin working with RDFXML out of the box, and standard adapters could be written which work with the RDF Crosswalks to render descriptive metadata. Ideally, RDF would be a replacement for DIM and allow us to begin utilizing RDF at the core of DSpace.
Utilize the above work to introduce RDFa into the Manakin Site, Community, Collection and Item dri2html theme templates, so that Semantic Web tools can glean descriptive metadata directly out of the XHTML.
http://www.w3.org/TR/xhtml-rdfa-primer/
http://en.wikipedia.org/wiki/RDFa
Ideally this will also form a foundation for introducing ORE embedded directly into the rendered page.
This should almost certainly be for both the JSPUI and XMLUI --James Rutherford (GSoC Projects shouldn't be forced to have such a requirement IMO --Mark Diggory 12:30, 5 April 2008 (EDT))
Enhance Manakin Aspects to support multiple "named" pipelines (RDFXML, DRI, etc.). Expose RDFXML in a Cocoon pipeline (or Manakin style Aspect) which can be customized by the implementor to inject new RDF into the UI for rendering. An RDFXML SAX generation library has already been created and integrated (external to Aspects) by MIT Libraries and can be the foundation for further integration into Aspects.
Proposed by: Mark Diggory, DSpace Systems Manager, MIT Libraries, email: mdiggory@mit.edu
Solr is an Apache project that extends Lucene to provide (perhaps most notably among its features) faceted search and the ability to index and search specific fields.
Enable tracings on such metadata fields as Author, Subject, etc. to launch a repository search for the metadata value from within another item or search results list (i.e. clicking 'Albert Einstein' in the author field display of one item would search the repository for all occurrences of 'dc.contributor.author=Albert Einstein').
<i>I think DSpace JSPUI already has this feature? In XMLUI this would be XSLT development and in general Manakin/XMLUI could use some more extensive development in this area --Mark Diggory 12:22, 5 April 2008 (EDT)</i>
Stackable/Pluggable service that can be used to query disparate CV, ontological, and naming registries via a shared query syntax (XQuery? SPARQL?) returning XML/RDF. Our first goal is the ability to plug these services into fields in the Customizable Submission workflow pages to populate suggested values for fields. (Sources: LDAP, JDBC, DNS, XCat/OCLC/Barton, Google Scholar etc., GFR, other Metadata Registries.)
<i>Stretch it even further... DSpace itself is a managed registry of metadata. A properly architected infrastructure that would allow the meshing of DSpace and other registry metadata in a "Semantic Web, Linked Open Data" way would establish a foundation for exposing DSpace instances as LOD. Providing SPARQL endpoints on both ends of the design means that DSpace instances could become both producers and consumers of LOD. --Mark Diggory 12:28, 5 April 2008 (EDT)</i>
The metadata of a document can be very useful to propose services to the user around the document.
For instance:
<i> I would propose to be able to parameterize a link from any given metadata field to a "linked services" page (services provided by external applications OR by DSpace itself, for instance documents of the same author, on the same subject, etc.).
This is a generalisation of existing features proposed by different patches. Christophe Dupriez 14:50, 17 March 2007 (EDT) </i>
Maybe a better idea would be to create a Java based installer that can work across platforms?
Open source installer generators
Unit tests that cover a lot of this have been written for 1.6 (though they aren't yet in trunk). What I would like to see is a mechanism for the construction of 'sandbox' environments for running such tests in. For example, proper testing would require a test database, test asset store, etc., which is all quite messy to do by hand, and tricky to automate. --James Rutherford
A useful precursor to unit tests, IMHO, would be an Eclipse project file setting. This would allow developers to more easily build DSpace in Eclipse without having to do all the IDE setup themselves. Then you could use the built-in JUnit support that Eclipse has. Plus, of course, Eclipse would help with development generally.
With some given hardware, load large numbers of objects, large objects, lots of users, etc. into a DSpace and get some real, empirical data about where the bottlenecks are, at what point the system starts to fail, how many concurrent users the system can cope with, etc.
Exhibit is an excellent and easily embeddable report explorer that could be utilized when exploring small (<2000) sets of items. --Mark Diggory 11:29, 10 March 2008 (EDT)
Exhibit: http://simile.mit.edu/exhibit/
Inline example: http://members.porchlight.ca/bower/simile.html
Embed Exhibit with inline JSON directly into new Aspects to do things like "My Submissions", "Manage Collection Submission" and/or "Manage Collection Workflows", where the Collection Manager would have the ability to browse existing Workflow Items, filtering down to the criteria exposed in facets, and take/give/remove ownership of these tasks.
Proposed by: Mark Diggory, DSpace Systems Manager, MIT Libraries, email: mdiggory@mit.edu
Add an export feature so anyone can export the metadata in DSpace to bibliographic software such as EndNote and RefWorks. Second, develop an import feature so administrators can import citations from bibliographic software such as EndNote and RefWorks to create records in DSpace.
Exporting citations is requested by many faculty and is a positive selling feature for Librarians/Project Managers to help convince faculty to deposit their work into DSpace. Importing citations will help DSpace administrators get records into DSpace faster and more easily. It's all about getting content into local instances of DSpace.
Some generic bibliographic file formats should be offered as a minimum: RIS, which is widely adopted by both reference managers (EndNote, RefWorks, ...) and digital library systems (IEEE Xplore, ACM, ...), and BibTeX.
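To make the RIS export idea concrete, here is a minimal sketch that turns a set of Dublin Core values into one RIS record. The field-to-tag mapping is an illustrative assumption (a real exporter would need a fuller, configurable crosswalk), the record type is fixed to the generic GEN, and RisExporter is a hypothetical class, not an existing DSpace component.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RisExporter {
    // Assumed mapping from DSpace Dublin Core fields to RIS tags (illustrative only).
    private static final Map<String, String> DC_TO_RIS = new LinkedHashMap<>();
    static {
        DC_TO_RIS.put("dc.contributor.author", "AU");
        DC_TO_RIS.put("dc.title", "TI");
        DC_TO_RIS.put("dc.date.issued", "PY");
        DC_TO_RIS.put("dc.identifier.uri", "UR");
    }

    // Builds one RIS record: a TY line first, one line per mapped value, ER last.
    static String toRis(Map<String, List<String>> metadata) {
        StringBuilder ris = new StringBuilder();
        appendLine(ris, "TY", "GEN");
        for (Map.Entry<String, String> mapping : DC_TO_RIS.entrySet()) {
            List<String> values =
                    metadata.getOrDefault(mapping.getKey(), Collections.<String>emptyList());
            for (String value : values) {
                appendLine(ris, mapping.getValue(), value);
            }
        }
        appendLine(ris, "ER", "");
        return ris.toString();
    }

    // RIS lines have the shape: two-letter tag, two spaces, hyphen, space, value.
    private static void appendLine(StringBuilder ris, String tag, String value) {
        ris.append(tag).append("  - ").append(value).append("\r\n");
    }
}
```

Repeatable fields such as authors simply produce one AU line per value, which is how RIS represents multiple authors.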
Users should be able to manage documents of the IR in their DSpace. They should be able to create structures to manage documents. This could be done as a kind of mapping to MyDSpace. Furthermore, there should be an option to "automap" new documents based on the alerting mechanism.
This is closely related to being able to export and import different bibliographic formats.
Add a mechanism to allow for selection of multiple individually-selected items, queried search results, and filtered search results into a 'temporary' holding space for a later activity. Ideally, this holding space would be persistent for a logged-in user of DSpace, perhaps as an expansion of the present MyDSpace functionality. At least, allow this to be available for the length of the session. Possible (current or future) uses for the cart contents might include:
Create an Admin UI which allows users to edit most simple configurations (from dspace.cfg, input-forms.xml, etc.) without actually having to dig into the configuration file (and restart Tomcat to refresh the cache).
<i> This will require a significant rewrite of the ConfigurationManager. IMHO the UI should know nothing about what the serialization of a configuration property looks like, and an infrastructure for managing editable Configuration properties would need to be added to the core of DSpace. --Mark Diggory 12:34, 5 April 2008 (EDT)</i>
The item mapper should be redone. Items should be mappable from the item itself. At the moment you must start a search (author only) to get the items.
This would among other things include a "navigable select collection" functionality, which would be useful for other things like starting a submission from My DSpace.
Define different workflow types like
Create a plug-in interface (a la PluginManager; maybe using MediaFilter?) that can convert from BitstreamFormat X to BitstreamFormat Y. This could wrap tools like OpenOffice (Tim Donohue's work), ps2pdf, latex2html, etc. Then a simple admin UI could allow administrators to initiate conversions of particular items, collections, or the whole site.
NOTE: The first part of this is already somewhat "completed" in my work on the OpenOffice.org MediaFilter. To create such a custom MediaFilter, I had to change the old MediaFilter abstract class into a Java interface, so that you can create MediaFilters which are
Institutional repositories gather an organization's scholarly content and support knowledge sharing and dissemination of intellectual output. The communities and collections in a repository are designed according to an institution's distribution of research centers, schools and divisions. Each collection that represents a school, division or research centre holds scholarly content produced by the academic projects mentored at that centre. By performing Co-Word analysis on each document collection, the individual research strength of that division or school can be determined. The same principle can be applied to sub-collections as well as communities in a repository, and in general can be extended to determine the overall research strength of an institution. In addition, a trend in the research strength of individual divisions/schools can be found by applying Co-Word analysis over a predetermined period of time. This feature can be built as an add-on to the existing DSpace framework and would strengthen DSpace's role as a research platform.
Proposed by: Jayan C Kurian, Research staff, National University of Singapore, Singapore. email: Jayan@comp.nus.edu.sg, jayanntu@gmail.com
Potential Student: Ashly Markose, Post-graduate student, National University of Singapore, Singapore. email: xxxxx@nus.edu.sg
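The core counting step of Co-Word analysis can be sketched as follows: for every document in a collection, each unordered pair of its keywords is counted once, and the pairs with the highest totals hint at the collection's dominant topics. This is only the counting stage under simplifying assumptions (keywords are already extracted and normalized); CoWordAnalysis and the "a | b" pair key format are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class CoWordAnalysis {
    // Counts how many documents each unordered keyword pair appears in together.
    // Keys look like "metadata | rdf", with the two keywords in alphabetical order.
    static Map<String, Integer> coOccurrences(List<Set<String>> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> keywords : documents) {
            // Sort so that (a, b) and (b, a) always map to the same key.
            List<String> sorted = new ArrayList<>(new TreeSet<>(keywords));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    String pair = sorted.get(i) + " | " + sorted.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Running the same count over documents grouped by year would give the per-period figures needed for the trend analysis mentioned above.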
Content submission strategies need to make it easy for individuals to submit digital resources to repositories. One of the factors affecting content submission is authenticated sign-on to repository collections. This can be supported by a single sign-on authentication mechanism for content submissions by authorized users. Users in a domain directory service are identified by a fully qualified user context. A user context of the form cn=Jayan, ou=Users, ou=SCI, dc=staff, dc=ntu, dc=edu, dc=sg specifically identifies the Active Directory group to which a user belongs. Each collection (e.g. Schools/Divisions) in a DSpace repository can be assigned an authorization group using DSpace's built-in functions. By extracting specific features from a user context, a user can be added to a pre-defined authorization group corresponding to the respective collections. Thus, when an authorized user authenticates against a DSpace instance, the user would automatically be granted content submission privileges only for the collections they are authorized for. This also removes the need for the user to select an appropriate collection from a list of all collections. In addition, specialized collection access privileges can be given to content submitters who share common Active Directory context profiles. The authentication strategy would be tested on a Windows environment supported by Windows Active Directory Services.
Proposed by: Jayan C Kurian, Research staff, Nanyang Technological University, Singapore. email: Jayan@ntu.edu.sg
Potential Student: Sunil Thomas, Post-graduate student, National University of Singapore, Singapore. email: thomas.sunil@nus.edu.sg
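The "extract features from a user context" step might look like the sketch below, which parses a distinguished name with the JDK's javax.naming.ldap.LdapName and maps its organizational units to authorization group names. The OU-to-group table and the group name used in the usage note are assumptions for illustration; how groups are actually named and assigned in DSpace is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;

final class LdapGroupMapper {
    // Returns the ou values of a distinguished name in the order they are written.
    static List<String> organizationalUnits(String userContext) {
        try {
            List<Rdn> rdns = new LdapName(userContext).getRdns();
            List<String> units = new ArrayList<>();
            // getRdns() lists the rightmost RDN first, so walk it backwards.
            for (int i = rdns.size() - 1; i >= 0; i--) {
                Rdn rdn = rdns.get(i);
                if ("ou".equalsIgnoreCase(rdn.getType())) {
                    units.add(rdn.getValue().toString());
                }
            }
            return units;
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Not a valid user context: " + userContext, e);
        }
    }

    // Looks up each OU in a site-defined table (hypothetical) of OU -> group name.
    static List<String> groupsFor(String userContext, Map<String, String> ouToGroup) {
        List<String> groups = new ArrayList<>();
        for (String unit : organizationalUnits(userContext)) {
            String group = ouToGroup.get(unit);
            if (group != null) {
                groups.add(group);
            }
        }
        return groups;
    }
}
```

For the user context given above, a table that maps SCI to a submitters group for the corresponding collection would place the user in that group only; OUs without an entry, such as Users, are ignored.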
Recent years have witnessed tremendous growth in the usage of repository software, since the majority of scholarly content is published in digital form, and DSpace instances are no exception to this proliferation. Popular software manages user queries through mailing lists supported by dedicated committers and contributors. A good number of questions asked on a mailing list have been answered previously. In this case, a Question Answering (QA) system would help users by answering their question if it has been answered earlier, or by suggesting related answers on the subject asked. For this, information available in the DSpace mailing list knowledge base can be extracted using template based extraction techniques or a rule based system. Once extracted, it can be classified according to a taxonomical structure (i.e. Functional Overview, Installation, Upgrading, Configuration, Customization, Architecture, and Versions) that represents the DSpace system documentation. Keywords automatically generated from the message text improve the adaptive retrieval of relevant information in this QA system. A test-bed for this QA system can be built using the DSpace platform, and the taxonomical structure can be supported by the built-in controlled vocabulary feature.
Proposed by: Jayan C Kurian, Research staff, National University of Singapore, Singapore. email: Jayan@comp.nus.edu.sg, jayanntu@gmail.com
Potential Student: Sunil Thomas, Post-graduate student, National University of Singapore, Singapore. email: thomas.sunil@nus.edu.sg
Efficient content acquisition strategies make it easier to import scholarly information into repositories. DSpace supports batch content acquisition through the ItemImport procedure. This procedure requires digital resources to be represented in a Submission Information Package (SIP). The lead time required for preparing this format can be reduced by encoding document metadata and digital resource locations in a spreadsheet. This has been implemented at The Nanyang Technological University (Singapore), The Institute of Scientific and Technical Information of CNRS (INIST-CNRS, France), The University of Calgary Library, National Informatics Centre (India), and The Lanzhou Branch of Chinese Academy of Sciences (China). A few recent requests include The University of Waikato Library, The University of Sydney Library and the NITLE (U.S.A.). Although the current implementation on the Windows environment looks promising for the user community, there have been considerable requests (New York University Library, Raman Research Institute Library (India)...) to make this development compatible with the UNIX environment. The proposal describes the following additions: (1) a GUI to facilitate SIP preparation; (2) compatibility with the UNIX environment; (3) a utility to automatically bridge SIP generation and ingestion into specified collections; (4) exploring the feasibility of template based document metadata extraction from digital resources into a spreadsheet. It is anticipated that this add-on would facilitate content acquisition in DSpace installations.
Proposed by: Jayan C Kurian, Research staff, Nanyang Technological University, Singapore. email: Jayan@ntu.edu.sg
Potential Student: Sunil Thomas, Post-graduate student, National University of Singapore, Singapore. email: thomas.sunil@nus.edu.sg
Potential Student: Blooma Mohan John, Research Student, Nanyang Technological University, Singapore. email: bl0002hn@ntu.edu.sg
<i>Changed AIP to SIP per the OAIS recommendations. AIPs are more of an internal abstraction of an IP; SIPs and DIPs (Dissemination Information Packages) are IPs that are transported between systems. --Mark Diggory 12:16, 17 March 2008 (EDT) </i>
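As one possible shape for the spreadsheet-to-SIP step, the sketch below converts a single spreadsheet row into the dublin_core.xml file that the ItemImport simple archive format expects for each item. It assumes the row has already been split into cells and that column headers are written as element or element.qualifier (for example title, contributor.author); SipRowWriter and that header convention are hypothetical, and writing the accompanying contents file and bitstreams is not shown.

```java
final class SipRowWriter {
    // Builds the dublin_core.xml content for one row; empty cells are skipped.
    static String dublinCoreXml(String[] headers, String[] row) {
        StringBuilder xml = new StringBuilder("<dublin_core>\n");
        for (int i = 0; i < headers.length && i < row.length; i++) {
            if (row[i] == null || row[i].isEmpty()) {
                continue;
            }
            // "contributor.author" -> element "contributor", qualifier "author";
            // a header without a dot gets the qualifier "none".
            String[] parts = headers[i].split("\\.", 2);
            String qualifier = parts.length == 2 ? parts[1] : "none";
            xml.append("  <dcvalue element=\"").append(escape(parts[0]))
               .append("\" qualifier=\"").append(escape(qualifier)).append("\">")
               .append(escape(row[i]))
               .append("</dcvalue>\n");
        }
        xml.append("</dublin_core>\n");
        return xml.toString();
    }

    // Minimal XML escaping; the ampersand must be replaced first.
    private static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }
}
```

Since the code only builds strings, it behaves the same on Windows and UNIX, which is the portability concern raised in the proposal.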
JA-SIG Central Authentication Service (CAS) is an open source authentication system originally created by Yale University. Single sign-on authenticates the user once for all the applications he or she has been authorized to access. It eliminates further authentication requests when the user switches applications during that particular session. It is the most popular single sign-on solution for universities. For details about JA-SIG CAS, please visit http://www.ja-sig.org/products/cas/. JA-SIG CAS has been deployed by (at least) 80 universities worldwide, and the user base is over 1 million!
My idea is to create a CAS plug-in for DSpace to allow single sign-on with CAS. It will allow university users to use their campus-wide ID/password to use DSpace. This plug-in will do more than just single sign-on. It will integrate with DSpace's built-in access control so that an administrator can restrict access to a particular user group, for example: faculty only, staff only, or students in Computer Science only, etc.
DSpace and CAS are popular open source applications that both have a large education user base. The integration of the two will make DSpace more appealing.
Proposed by Minghui Yu, 4th year Computer Science student, University of British Columbia, Vancouver, BC, Canada. Email: minghuiy AT interchange DOT ubc DOT ca.
Personal website: http://www.ugrad.cs.ubc.ca/~s3p5/
This has already been done. See http://sourceforge.net/tracker/index.php?func=detail&aid=1601221&group_id=19984&atid=319984 --James Rutherford
Fedora is a popular Middle Tier solution for IR that utilizes SOAP/REST for accessing and modifying stored content. Explore options for integrating Fedora with DSpace at the Assetstore level and with integration into the Community, Collection, Item, Metadata areas of DSpace.