'''NOTE''': ''This is just a proposal. There is no guarantee that any of this will ever end up in the actual codebase, I just felt it was worth experimenting.'' --[[User:JamesRutherford|JR]]

'''Update (09-05-2007)''': ''I've made this work successfully for <code>Collection</code>s, <code>Item</code>s, and <code>Bundle</code>s. The performance improvements aren't fully implemented, but the separation is there, and in theory, that was the hard part. The <code>[[#org.dspace.core.ArchiveManager|ArchiveManager]]</code> seems to be working pretty well too.'' --[[User:JamesRutherford|JR]]

'''Update (11-05-2007)''': ''I've implemented DAOs for the <code>Community</code> class as well. The <code>[[#org.dspace.core.ArchiveManager|ArchiveManager]]</code> now supports moving <code>Item</code>s, <code>Collection</code>s, and <code>Communities</code> between containers.'' --[[User:JamesRutherford|JR]]

'''Update (23-05-2007)''': ''I've totally reimplemented persistent identifiers in DSpace as well (see [[PersistentIdentifiers]]). As well as removing the Handle System dependency, they also use DAOs.'' --[[User:JamesRutherford|JR]]

'''Update (20-06-2007)''': ''After a bit of a hiatus while I [[PersistentIdentifiers|fixed persistent identifiers]] I've come back to DAOs, and I've now (mostly) got them in place for <code>Bitstream</code>s as well. The two major classes that still need doing are <code>EPerson</code> and <code>Group</code>; once they're done, there are a few others (eg: <code>SupervisedItem</code>, <code>WorkspaceItem</code>, etc) but they should be relatively simple.'' --[[User:JamesRutherford|JR]]

''Just adding a comment that Handle/Pid management could be greatly improved by such an addition as well. Currently, with item caching, the DSpaceObject.getHandle method can become stale, and using DAOs behind the scenes for the HandleManagement might be beneficial'' -- [[User:MarkDiggory|Mark Diggory]] 13:56, 10 May 2007 (EDT)

'''Update (14-08-2007)''': ''Everything (apart from the code in <code>org.dspace.checker</code>) has been pushed through the DAO layer. Non-DAO classes no longer <code>import</code> the <code>DatabaseManager</code> or <code>throw SQLException</code>s. There are interfaces for CRUD and link operations in <code>org.dspace.storage.dao</code> that I intend to write some tests for, to throw at all the implementing DAOs.'' --[[User:JamesRutherford|JR]]
----
It has often struck me that DSpace would benefit from the use of [http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html Data Access Objects] (DAO). If nothing else, it would make porting to alternative database platforms far easier; all we would need to do is provide alternative implementations for the DAO interfaces that worked for a given database. To this end, I have broken up some of the core classes in <code>org.dspace.content</code> to use DAOs.

As part of the same effort, I have done some work on making the <code>Context</code> less data-layer dependent (by having it hold a <code>[[#org.dspace.storage.dao.GlobalDAO|org.dspace.storage.dao.GlobalDAO]]</code> rather than a <code>java.sql.Connection</code>, etc). I've also introduced a [[#org.dspace.content.proxy.ItemProxy|proxy]] for the <code>Item</code> that is a bit smarter about when it retrieves content from the data layer, and an <code>[[#org.dspace.core.ArchiveManager|ArchiveManager]]</code> class that takes care of some core "archive operations" (so that other core classes don't need to).

The process of integration (if it happens at all) would go as follows:

* Incorporate the new DAO classes into the codebase
* Refactor <code>org.dspace.content.Item</code> (etc) to use the DAO implementations of the data access methods internally
* Mark relevant methods in <code>org.dspace.content.Item</code> as <code>@Deprecated</code>
* Using the compile-time deprecation warnings as a guide, refactor the rest of the code to use the DAOs explicitly rather than hiding the functionality behind existing methods
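
The migration steps above can be sketched in miniature as follows. This is only an illustration under stated assumptions: <code>SimpleItem</code>, <code>SimpleItemDAO</code> and <code>InMemoryItemDAO</code> are hypothetical stand-ins invented for the example, not classes from the DSpace codebase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical DAO contract, reduced to what the demo needs.
interface SimpleItemDAO {
    void update(SimpleItem item);
    String storedTitle(int id);
}

// Hypothetical in-memory implementation standing in for a database-backed DAO.
class InMemoryItemDAO implements SimpleItemDAO {
    private final Map<Integer, String> rows = new HashMap<>();

    public void update(SimpleItem item) {
        rows.put(item.getId(), item.getTitle());
    }

    public String storedTitle(int id) {
        return rows.get(id);
    }
}

// Hypothetical stand-in for org.dspace.content.Item.
class SimpleItem {
    private final int id;
    private String title;
    private final SimpleItemDAO dao;

    SimpleItem(int id, String title, SimpleItemDAO dao) {
        this.id = id;
        this.title = title;
        this.dao = dao;
    }

    int getId() { return id; }
    String getTitle() { return title; }
    void setTitle(String title) { this.title = title; }

    // Steps 2 and 3: the old data access method survives, but it only
    // delegates to the DAO and is marked as deprecated.
    @Deprecated
    void update() {
        dao.update(this);
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        SimpleItemDAO dao = new InMemoryItemDAO();
        SimpleItem item = new SimpleItem(1, "Draft", dao);

        item.setTitle("Final");
        item.update();      // old style: still works, but is a deprecated call
        dao.update(item);   // step 4: callers use the DAO explicitly

        System.out.println(dao.storedTitle(1));
    }
}
```

Because the deprecated method only delegates, both call styles behave identically during the transition, and the old one can be deleted once no callers remain.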

Without further ado, here is how I have refactored <code>org.dspace.content.Item</code> to use DAOs. A few important things to note:

* "old" code has been used where possible to avoid reinventing the wheel
* I've never liked <code>org.dspace.content.ItemIterator</code> so I've switched to using a "real" <code>Iterator</code> from a <code>List<Item></code>

For examples of both of these principles, see the implementation of <code>getItems()</code> [[#org.dspace.content.dao.postgres.ItemDAOPostgres|below]]. It is a fairly straightforward wrapper for the current <code>Item.findAll()</code>, except that it returns a <code>List<Item></code> rather than an <code>ItemIterator</code>.

== <code>org.dspace.content</code> ==

The <code>Item</code> class will be broken up into the following classes:

* <code>[[#org.dspace.content.Item|org.dspace.content.Item]]</code>: core class that doesn't go near the database (it doesn't even know about the DAOs); behaves much like the current implementation.
* <code>[[#org.dspace.content.dao.ItemDAO|org.dspace.content.dao.ItemDAO]]</code>: interface defining DAO API
* <code>[[#org.dspace.content.dao.ItemDAOFactory|org.dspace.content.dao.ItemDAOFactory]]</code>: factory for dishing out implementations of the above interface
* <code>[[#org.dspace.content.dao.postgres.ItemDAOPostgres|org.dspace.content.dao.postgres.ItemDAOPostgres]]</code>: default implementation of the above interface for use with PostgreSQL
* <code>[[#org.dspace.content.proxy.ItemProxy|org.dspace.content.proxy.ItemProxy]]</code>: subclass of <code>Item</code> that needs to know about the DAO. It will be used for (eg) only loading metadata on demand, to reduce the memory footprint of <code>Item</code>s etc.

The following classes have also been introduced:

* <code>[[#org.dspace.core.ArchiveManager|org.dspace.core.ArchiveManager]]</code>
* <code>[[#org.dspace.storage.dao.GlobalDAO|org.dspace.storage.dao.GlobalDAO]]</code>
* <code>[[#org.dspace.storage.dao.GlobalDAOFactory|org.dspace.storage.dao.GlobalDAOFactory]]</code>
* <code>[[#org.dspace.storage.dao.GlobalDAOPostgres|org.dspace.storage.dao.GlobalDAOPostgres]]</code>

Note that it might be preferable to have a more generic implementation of
the <code>ItemDAO</code> interface that supports both PostgreSQL and Oracle, but given that one motivation for adopting DAOs is to remove db-specificities from the code making it easier to port, I thought it was sensible to start with just PostgreSQL. Eventually, it ought to be possible to drop in <code>ItemDAOHibernate</code> (etc) implementations that make db portability ''far'' easier.
=== <code>org.dspace.content.Item</code> ===
Basic implementation of the <code>Item</code> object. This class has been stripped down to remove all contact with the database, including (but not limited to) constructors, factory methods, <code>update()</code>, <code>delete()</code>, <code>find()</code>, etc. I haven't decided exactly how the <code>Item</code> API will look, but it will probably be much the same as before, only without any of the aforementioned methods. Another key difference is that it will have actual Java objects as member variables instead of pulling everything out of a <code>TableRow</code>.
=== <code>org.dspace.content.proxy.ItemProxy</code> ===
This will be a fairly simple [http://en.wikipedia.org/wiki/Proxy_pattern proxy] implementation. Specifically, it will be closest to being a ''virtual proxy'', in that it will appear to be a regular <code>Item</code> object, but will have a slightly smarter implementation (not loading metadata until requested, keeping track of what has changed to make updates more efficient etc).
 public class ItemProxy extends Item
 {
     // Overrides relevant methods of Item.
 }
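
To make the "virtual proxy" idea concrete, here is a minimal sketch of lazy metadata loading. It assumes a much simpler world than DSpace: <code>PlainItem</code>, <code>MetadataLoader</code> and <code>LazyItemProxy</code> are hypothetical names made up for this example, and the loader stands in for whatever DAO call would actually fetch the metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal item: plain Java state, no data-layer access.
class PlainItem {
    protected List<String> metadata = new ArrayList<>();

    public List<String> getMetadata() {
        return metadata;
    }
}

// Hypothetical callback standing in for the DAO method that loads metadata.
interface MetadataLoader {
    List<String> load(int itemId);
}

// Virtual proxy: looks like a PlainItem, but defers the expensive load
// until metadata is first requested, and performs it only once.
class LazyItemProxy extends PlainItem {
    private final int id;
    private final MetadataLoader loader;
    private boolean loaded = false;

    LazyItemProxy(int id, MetadataLoader loader) {
        this.id = id;
        this.loader = loader;
    }

    @Override
    public List<String> getMetadata() {
        if (!loaded) {
            metadata = new ArrayList<>(loader.load(id));
            loaded = true;
        }
        return metadata;
    }

    boolean isLoaded() {
        return loaded;
    }
}

class LazyProxyDemo {
    public static void main(String[] args) {
        int[] loads = {0};
        LazyItemProxy proxy = new LazyItemProxy(42, itemId -> {
            loads[0]++;
            return Arrays.asList("dc.title=Example");
        });

        System.out.println("loaded before access: " + proxy.isLoaded());
        proxy.getMetadata();
        proxy.getMetadata();
        System.out.println("number of loads: " + loads[0]);
    }
}
```

Code that only needs an item's identity never triggers the load, which is where the reduced memory footprint would come from.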
=== <code>org.dspace.content.dao.ItemDAO</code> ===
This isn't final, but it's a good start.
 public interface ItemDAO extends ContentDAO, CRUD<Item>, Link<Item, Bundle>
 {
     public Item create() throws AuthorizeException;
     public Item retrieve(int id);
     public Item retrieve(UUID uuid);
     public void update(Item item) throws AuthorizeException;
     public void delete(int id) throws AuthorizeException;
     public List<Item> getItems();
     public List<Item> getItemsBySubmitter(EPerson eperson);
     public List<Item> getItemsByCollection(Collection collection);
     public List<Item> getParentItems(Bundle bundle);
 }
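
The <code>CRUD</code> and <code>Link</code> interfaces are not shown on this page, so the following is only a guess at what such generic contracts could look like, with a toy in-memory implementation. The method names on both interfaces, and the <code>Note</code>, <code>Tag</code> and <code>InMemoryNoteDAO</code> classes, are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed shape of a generic create/retrieve/update/delete contract.
interface CRUD<T> {
    T create();
    T retrieve(int id);
    void update(T t);
    void delete(int id);
}

// Assumed shape of a contract for linking a parent T to a child U.
interface Link<T, U> {
    void link(T parent, U child);
    void unlink(T parent, U child);
    boolean linked(T parent, U child);
}

// Toy domain objects, unrelated to DSpace.
class Note {
    final int id;
    String text = "";
    Note(int id) { this.id = id; }
}

class Tag {
    final String name;
    Tag(String name) { this.name = name; }
}

// One DAO can implement both contracts, as ItemDAO does for Item and Bundle.
class InMemoryNoteDAO implements CRUD<Note>, Link<Note, Tag> {
    private final Map<Integer, Note> notes = new HashMap<>();
    private final Map<Integer, List<Tag>> links = new HashMap<>();
    private int nextId = 1;

    public Note create() {
        Note n = new Note(nextId++);
        notes.put(n.id, n);
        return n;
    }

    public Note retrieve(int id) { return notes.get(id); }

    public void update(Note n) { notes.put(n.id, n); }

    public void delete(int id) {
        notes.remove(id);
        links.remove(id);
    }

    public void link(Note parent, Tag child) {
        if (!linked(parent, child)) {
            links.computeIfAbsent(parent.id, k -> new ArrayList<>()).add(child);
        }
    }

    public void unlink(Note parent, Tag child) {
        List<Tag> tags = links.get(parent.id);
        if (tags != null) {
            tags.remove(child);
        }
    }

    public boolean linked(Note parent, Tag child) {
        List<Tag> tags = links.get(parent.id);
        return tags != null && tags.contains(child);
    }
}
```

Having every DAO implement the same small interfaces is what would make it practical to write one set of tests and run it against all implementations.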
=== <code>org.dspace.content.dao.ItemDAOFactory</code> ===
 public class ItemDAOFactory
 {
     public static ItemDAO getInstance(Context context)
     {
         // Eventually, the implementation that is returned will be
         // defined in the configuration.
         return new ItemDAOPostgres(context);
     }
 }
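
One way the "defined in the configuration" idea from the comment above might work is a small registry keyed by a configured name. This is a sketch, not a design decision: <code>WidgetDAO</code>, its two implementations, and the <code>dbName</code> parameter (standing in for a value read from configuration) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical DAO interface with two hypothetical implementations.
interface WidgetDAO {
    String backend();
}

class WidgetDAOPostgres implements WidgetDAO {
    public String backend() { return "postgres"; }
}

class WidgetDAOOracle implements WidgetDAO {
    public String backend() { return "oracle"; }
}

class WidgetDAOFactory {
    // Maps a configured database name to a constructor for the matching DAO.
    private static final Map<String, Supplier<WidgetDAO>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("postgres", WidgetDAOPostgres::new);
        REGISTRY.put("oracle", WidgetDAOOracle::new);
    }

    // dbName is assumed to come from configuration; here it is a parameter.
    static WidgetDAO getInstance(String dbName) {
        Supplier<WidgetDAO> supplier = REGISTRY.get(dbName);
        if (supplier == null) {
            throw new IllegalArgumentException("No DAO registered for: " + dbName);
        }
        return supplier.get();
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        WidgetDAO dao = WidgetDAOFactory.getInstance("postgres");
        System.out.println(dao.backend());
    }
}
```

Callers depend only on the interface and the factory, so adding a new backend means registering one more implementation rather than touching calling code.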
=== <code>org.dspace.content.dao.postgres.ItemDAOPostgres</code> ===
This is a fairly straightforward implementation of the above interface. As much as possible, code from the original <code>Item</code> class will be used. For instance, this is how <code>getItems()</code> is implemented:
 public List<Item> getItems()
 {
     try
     {
         TableRowIterator tri = DatabaseManager.queryTable(context, "item",
                 "SELECT item_id FROM item WHERE in_archive = '1'");
         List<Item> items = new ArrayList<Item>();
         for (TableRow row : tri.toList())
         {
             int id = row.getIntColumn("item_id");
             items.add(retrieve(id));
         }
         return items;
     }
     catch (SQLException sqle)
     {
         // Need to think more carefully about how we deal with SQLExceptions
         throw new RuntimeException(sqle);
     }
 }
Some changes have been made to eliminate <code>ItemIterator</code>s, and to generally make things a little more consistent with the rest of the code (this looks almost identical to, eg, <code>CollectionDAO.getCollections()</code>).
== <code>org.dspace.core</code> ==
=== <code>org.dspace.core.ArchiveManager</code> ===
The idea behind this class came from the realisation that <code>Item.withdraw()</code> and <code>Item.reinstate()</code> don't really make sense. What I'd much rather do is call (eg) <code>ArchiveManager.withdrawItem(Item item)</code>.
I've been thinking that the <code>ArchiveManager</code> could be used for certain maintenance operations as well, such as moving <code>Item</code>s between <code>Collection</code>s, and maybe acting as a wrapper for the <code>CommunityFiliator</code>.
 public class ArchiveManager
 {
     public static void withdrawItem(Context context, Item item)
     {
         // ...
     }
     public static void reinstateItem(Context context, Item item)
     {
         // ...
     }
     public static void moveItem(Context context,
             Item item, Collection source, Collection dest)
     {
         // ...
     }
 }
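
The method bodies are elided above, so the following is only a sketch of what a "move" operation could involve, using toy types. <code>Shelf</code> (standing in for a <code>Collection</code>) and <code>ShelfManager</code> are hypothetical, and the add-then-remove ordering is an assumption rather than the actual implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a Collection: a named container of item IDs.
class Shelf {
    final String name;
    final Set<String> itemIds = new LinkedHashSet<>();

    Shelf(String name) {
        this.name = name;
    }
}

// Hypothetical manager class that owns the "move" operation, so that
// neither the item nor the containers need to know how it is done.
class ShelfManager {
    static void moveItem(String itemId, Shelf source, Shelf dest) {
        if (!source.itemIds.contains(itemId)) {
            throw new IllegalArgumentException(
                    itemId + " is not in " + source.name);
        }
        if (source == dest) {
            return; // nothing to do; also avoids add-then-remove losing the item
        }
        // Add to the destination first so the item is never without a container.
        dest.itemIds.add(itemId);
        source.itemIds.remove(itemId);
    }
}

class MoveDemo {
    public static void main(String[] args) {
        Shelf theses = new Shelf("Theses");
        Shelf reports = new Shelf("Reports");
        theses.itemIds.add("item-1");

        ShelfManager.moveItem("item-1", theses, reports);

        System.out.println(theses.itemIds);
        System.out.println(reports.itemIds);
    }
}
```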
== <code>org.dspace.storage</code> ==
=== <code>org.dspace.storage.dao.GlobalDAO</code> ===
As suggested by Richard Jones, there probably ought to be a top-level general-purpose DAO interface that has implementations for the various storage mechanisms (<code>GlobalDAOPostgres</code> etc). The idea is to capture any implementation-specific details in this single top-level object, rather than in every Postgres DAO implementation. For example, with the current database "abstraction layer", the top-level implementation of <code>GlobalDAO</code> understands the <code>Context</code> object, whereas a Hibernate implementation would know what a <code>SessionFactory</code> is.
 public interface GlobalDAO
 {
     // The following methods actually currently throw SQLExceptions to
     // keep things simple, but in future SQLExceptions should be
     // eliminated from any code that doesn't directly touch a database.
     public void startTransaction() throws GlobalDAOException;
     public void endTransaction() throws GlobalDAOException;
     public void saveTransaction() throws GlobalDAOException;
     public void abortTransaction();
     public boolean transactionOpen();
     @Deprecated Connection getConnection();
 }
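
A caller might use such an interface along the following lines. This is a sketch with several assumptions: the <code>Tx</code> interface is a cut-down, hypothetical version of <code>GlobalDAO</code>, <code>TxException</code> stands in for <code>GlobalDAOException</code>, <code>endTransaction()</code> is assumed to commit and close the transaction, and <code>RecordingTx</code> merely counts calls instead of touching a database.

```java
// Hypothetical stand-in for GlobalDAOException.
class TxException extends Exception {
    TxException(String message) {
        super(message);
    }
}

// Hypothetical cut-down version of the GlobalDAO transaction methods.
interface Tx {
    void startTransaction() throws TxException;
    void endTransaction() throws TxException;   // assumed: commit and close
    void abortTransaction();
    boolean transactionOpen();
}

// Fake implementation that records what happened instead of using a database.
class RecordingTx implements Tx {
    private boolean open = false;
    int commits = 0;
    int rollbacks = 0;

    public void startTransaction() throws TxException {
        if (open) {
            throw new TxException("transaction already open");
        }
        open = true;
    }

    public void endTransaction() throws TxException {
        if (!open) {
            throw new TxException("no open transaction");
        }
        commits++;
        open = false;
    }

    public void abortTransaction() {
        if (open) {
            rollbacks++;
            open = false;
        }
    }

    public boolean transactionOpen() {
        return open;
    }
}

class TxRunner {
    // Runs the work inside a transaction; if the work fails before the
    // commit, the finally block rolls the transaction back.
    static void run(Tx tx, Runnable work) throws TxException {
        tx.startTransaction();
        try {
            work.run();
            tx.endTransaction();
        } finally {
            if (tx.transactionOpen()) {
                tx.abortTransaction();
            }
        }
    }
}
```

Because the calling code only sees the interface, the same pattern would apply unchanged whether the implementation wraps a JDBC connection or a Hibernate session.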
=== <code>org.dspace.storage.dao.GlobalDAOFactory</code> ===
Super-simple <code>GlobalDAO</code> factory.
=== <code>org.dspace.storage.dao.GlobalDAOPostgres</code> ===
Implementation of the <code>GlobalDAO</code> interface for PostgreSQL.
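 public class GlobalDAOPostgres implements GlobalDAO
 {
     private Connection connection;
     // ...
     public void startTransaction() throws GlobalDAOException
     {
         try
         {
             connection = DatabaseManager.getConnection();
             connection.setAutoCommit(false);
         }
         catch (SQLException sqle)
         {
             // Assumes GlobalDAOException can wrap a cause.
             throw new GlobalDAOException(sqle);
         }
     }
     // ...
 }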
[[Category:Refactoring]]