2006.09.04

Comments

"The CIFS protocol is asynchronous, non-transactional and requires random access. Trying to support this protocol on top of a database BLOB would require constant fetching of an entire document from the database and continually save the entire document providing unacceptable performance."

Sorry, but this sentence is wrong. It is perfectly possible to have random access to a Blob field in a database and to stream its content without having to keep any transaction open. We have done it in several projects and get exactly the same advantages you mention.

The java.sql.Blob interface in Java already provides the following methods:

byte[] getBytes(long pos, int length)
int setBytes(long pos, byte[] bytes)
int setBytes(long pos, byte[] bytes, int offset, int len)

With these three methods it is pretty simple to extend java.io.InputStream, java.io.OutputStream and RandomAccessFile so that you can access a Blob as if it were a simple filesystem file.
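As a rough sketch of that idea (not the code from those projects), the class below keeps a file-style position and maps reads and writes onto getBytes and setBytes. The names BlobRandomAccess and BlobRandomAccessDemo are made up for illustration, and javax.sql.rowset.serial.SerialBlob is used only as an in-memory stand-in for a Blob fetched from a real database.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.SQLException;
import javax.sql.rowset.serial.SerialBlob;

/** Hypothetical adapter: file-style seek/read/write on top of java.sql.Blob. */
class BlobRandomAccess {
    private final Blob blob;
    private long pos = 0; // zero-based offset, like a file pointer

    BlobRandomAccess(Blob blob) {
        this.blob = blob;
    }

    /** Moves the position to any zero-based offset. */
    void seek(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        pos = offset;
    }

    /** Reads up to len bytes at the current position; returns -1 at end of blob. */
    int read(byte[] dst, int off, int len) throws SQLException {
        long remaining = blob.length() - pos;
        if (remaining <= 0) {
            return -1;
        }
        // Blob positions are 1-based, hence pos + 1.
        byte[] chunk = blob.getBytes(pos + 1, (int) Math.min(len, remaining));
        System.arraycopy(chunk, 0, dst, off, chunk.length);
        pos += chunk.length;
        return chunk.length;
    }

    /** Overwrites len bytes at the current position. */
    void write(byte[] src, int off, int len) throws SQLException {
        pos += blob.setBytes(pos + 1, src, off, len);
    }
}

class BlobRandomAccessDemo {
    /** Overwrites four bytes in the middle of the blob, then reads it all back. */
    static String overwriteAndRead() {
        try {
            Blob blob = new SerialBlob("hello blob world".getBytes(StandardCharsets.US_ASCII));
            BlobRandomAccess file = new BlobRandomAccess(blob);
            file.seek(6);
            file.write("BLOB".getBytes(StandardCharsets.US_ASCII), 0, 4);
            file.seek(0);
            byte[] out = new byte[16];
            int n = file.read(out, 0, out.length);
            return new String(out, 0, n, StandardCharsets.US_ASCII);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteAndRead());
    }
}
```

Only the four overwritten bytes and the bytes actually read cross the Blob API here; nothing forces the whole document to be fetched or saved. Whether a write beyond the current end extends the blob depends on the driver, so this sketch only overwrites existing bytes.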

As for the need for a transaction, it is as simple as creating a buffered version that, whenever it needs new data, opens a new transaction, reads a chunk of data into the buffer and closes the transaction, all hidden by the stream implementation. In this way it is easy to stream files in and out of a database blob without having to hold open a resource as expensive as a transaction.
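The buffered approach might look roughly like the following. This is a minimal sketch under assumptions, not a definitive implementation: ChunkLoader, ChunkLoaders, ChunkedBlobInputStream and ChunkedBlobDemo are invented names, and the documents table with its id and content columns is a hypothetical schema. The demo replaces the database with a byte array so the refill logic can be run on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.DataSource;

/** Hypothetical callback: returns up to maxLen bytes starting at a zero-based offset. */
interface ChunkLoader {
    byte[] load(long offset, int maxLen) throws IOException;
}

class ChunkLoaders {
    /** One short transaction per chunk; table and column names are assumptions. */
    static ChunkLoader fromDatabase(DataSource ds, long docId) {
        return (offset, maxLen) -> {
            try (Connection c = ds.getConnection();
                 PreparedStatement ps = c.prepareStatement(
                         "SELECT content FROM documents WHERE id = ?")) {
                c.setAutoCommit(false);
                ps.setLong(1, docId);
                byte[] data = new byte[0];
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        Blob blob = rs.getBlob(1);
                        long remaining = blob.length() - offset;
                        if (remaining > 0) {
                            // Blob positions are 1-based, hence offset + 1.
                            data = blob.getBytes(offset + 1, (int) Math.min(maxLen, remaining));
                        }
                    }
                }
                c.commit(); // the transaction ends before the caller sees the bytes
                return data;
            } catch (SQLException e) {
                throw new IOException(e);
            }
        };
    }
}

/** InputStream that refills its buffer one chunk at a time through a ChunkLoader. */
class ChunkedBlobInputStream extends InputStream {
    private final ChunkLoader loader;
    private final int chunkSize;
    private byte[] buffer = new byte[0];
    private int bufPos = 0;
    private long nextOffset = 0;
    private boolean eof = false;

    ChunkedBlobInputStream(ChunkLoader loader, int chunkSize) {
        this.loader = loader;
        this.chunkSize = chunkSize;
    }

    @Override
    public int read() throws IOException {
        if (bufPos == buffer.length && !refill()) {
            return -1;
        }
        return buffer[bufPos++] & 0xFF;
    }

    private boolean refill() throws IOException {
        if (eof) {
            return false;
        }
        buffer = loader.load(nextOffset, chunkSize);
        bufPos = 0;
        nextOffset += buffer.length;
        if (buffer.length < chunkSize) {
            eof = true; // a short chunk means the blob has ended
        }
        return buffer.length > 0;
    }
}

class ChunkedBlobDemo {
    /** Streams text through the chunked stream, with an array standing in for the blob. */
    static String readAll(String text, int chunkSize) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        ChunkLoader inMemory = (offset, maxLen) -> {
            int from = (int) Math.min(offset, data.length);
            int to = Math.min(from + maxLen, data.length);
            return Arrays.copyOfRange(data, from, to);
        };
        try (InputStream in = new ChunkedBlobInputStream(inMemory, chunkSize)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int b = in.read(); b != -1; b = in.read()) {
                out.write(b);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("streamed in chunks", 8));
    }
}
```

The caller only ever sees an ordinary InputStream. The chunk size trades the number of short transactions against memory per open stream, and a real implementation would also need to decide what to do if the blob changes between two chunk reads.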

G'day John,

I understand that clustering Alfresco involves clustering just the application (JVM) and then configuring all nodes to share a single filesystem, thereby requiring that that filesystem be hosted on a SAN / NFS / SMB share.

Does this place any practical limits on how far Alfresco can be clustered before file I/O over the network becomes a bottleneck?

Cheers,
Peter
