
February 2007

How You Read My Blog

I recently wrote a blog post asking how you manage the blogs that you read. Thanks to everyone who contributed advice; it was very helpful. As a result of that advice, I have started publishing my blog through the FeedBurner service, which lets me push content to multiple channels and track how my blog is being subscribed to and read.

I was actually very surprised at how many people subscribe to my blog. FeedBurner shows me which subscription tools people are using and the countries from which they are reading. But don't worry, it doesn't identify who is reading the blog. If you are interested in subscribing, you can subscribe at this subscription link.

The primary tools people use to access Content Log are Google Reader or the Google Personalized Homepage (28%), Bloglines (17%) and, to a lesser extent, NewsGator (8%). I thought most people were arriving mainly through Google search, but many of the links in TypePad do not show the source of the link.

Here is the list of all the feed readers that people are using, with FeedBurner's description of each:

  • Akregator  -- Akregator is an open source aggregator that you can download at SourceForge.
  • Apple CFNetwork Generic Client  -- Most of the time, this is the Apple OS X screensaver, which acts as a desktop feed reader. This user-agent can also represent other independent software that used Apple frameworks to build a feed reader.
  • Attensa for Outlook  -- Attensa for Outlook is an RSS reader that brings up-to-the-minute news and content from Websites, blogs and Podcast sites directly into Microsoft Outlook.
  • BlogBridge  -- BlogBridge is a desktop feed reader for Windows, Linux and Mac. BlogBridge provides a full suite of feed reading tools including feed search, discovery and categorizing capabilities that are unique to BlogBridge.
  • Bloglines  -- Bloglines is a web-based aggregator that makes it easy to keep up with your favorite blogs and newsfeeds. With Bloglines, you can subscribe to the RSS feeds of your favorite blogs, and Bloglines will monitor updates to those sites.
  • FeedDemon -- FeedDemon enables you to quickly read and gather information from hundreds of web sites — without having to visit them. Don't waste any more time checking your favorite web sites for updates. Instead, use FeedDemon and make them come to you. Written by Nick Bradbury, creator of TopStyle and HomeSite, FeedDemon makes RSS/Atom feeds as easy to access as your email.
  • FeedReader  -- FeedReader is a freeware Windows application that reads and displays RSS feeds. It supports the RSS formats 0.9, 0.91, 1.0 and various extensions such as Dublin Core. FeedReader does not support Atom.
  • Firefox Live Bookmarks (version 1) -- Live Bookmarks is a new technology in Firefox that lets you view RSS news and blog headlines in the bookmarks toolbar or bookmarks menu. Because of how the browser operates, version 1 of Firefox might be overstating the number of Live Bookmarks subscribers, and some casual visitors may be counted as subscribers. This is fixed in version 2, which shows up as a separate entry called "Firefox Live Bookmarks".
  • Firefox Live Bookmarks  -- Live Bookmarks lets you view RSS news and blog headlines in the bookmarks toolbar or bookmarks menu. With one glance, quickly see the latest headlines from your favorite sites. Go directly to the articles that interest you - saving you time.
  • Google Desktop  -- Google Desktop doesn't just help you search your computer; it also helps you gather new information from the web with Sidebar, a new desktop feature that shows you your new email, weather and stock information, personalized news and RSS/Atom feeds, and more.
  • Google Feedfetcher  -- Feedfetcher is how Google grabs RSS or Atom feeds when users subscribe to them in Google Reader or the Google Personalized Homepage. Subscriber counts include Google Reader and the Google Personalized Homepage. Feedfetcher collects and periodically refreshes these user-initiated feeds, but does not index them in Blog Search or Google's other search services.
  • GreatNews  -- GreatNews is a downloadable RSS desktop client for Windows. GreatNews is optimized for "full page reading" so that you can scan through a number of articles quickly without having to navigate between feed items.
  • Gregarius -- An open source feed reader for inclusion in web sites.
  • intraVnews  -- Keep up with hundreds of websites and have RSS and Atom news items delivered into your Outlook folders, as easy as email! With the powerful interface of intraVnews you get total control over your feed subscriptions on the Internet and your intranet.
  • Java-based feed reader  -- There are a number of feed readers, some quite sophisticated, some hacked together, that only identify themselves as having been built in Java and provide no further information about their identity. Thus, this particular "category" of feed reader is a catch all for one or more readers that are only partially identifying themselves when accessing your feed. If you know people are accessing your feed through a client that you don't otherwise see in this list of clients and aggregators, it's quite likely that the client is either sending no identifier or using this catch-all java identifier.
  • Liferea  -- Liferea is an open source feed reader for Linux
  • Mozilla/5.0 (compatible; Yoono; http://www.yoono.com/) --
  • My Yahoo  -- A web-based newsreader that allows you to select and manage RSS headlines within a My Yahoo! account.
  • NetNewsWire  -- NetNewsWire is an easy-to-use RSS Web news reader for Mac OS X. Its familiar three-paned interface - similar to Apple Mail and Outlook Express - can fetch and display news from thousands of different websites and weblogs, making it quick and easy to keep up with the latest news.
  • Netvibes  -- Netvibes allows you to create a personalized home page incorporating different kinds of weather, sports, news, product content and RSS/Atom feeds
  • NewsGator Online  -- NewsGator online is a free web-based aggregator with a clean user interface that makes it easy to arrange and track your favorite news feeds. Newsgator supports all the major feed formats and has a number of additional features for ranking feeds, getting recommendations, and permanently saving specific feed items.
  • NewsGator Outlook Edition  -- NewsGator is one of the most popular feed readers. It runs within Microsoft Outlook. The latest versions of NewsGator provide the ability to read feeds on multiple machines running Outlook with full synchronization.
  • Omea Reader  -- Omea Reader is an easy to use, all-in-one RSS/ATOM feed reader, newsgroup reader, and web bookmark manager. But what really makes it unique is the level of information organization and management features including lightning-fast searches, flexible filing, contextual access, and extensibility.
  • Onfolio -- Onfolio 2.0 includes a number of new features including a feed reader, Firefox integration, blogging support, shared collections, improved capture, Outlook integration, EndNote integration, folder publishing, and even more.
  • Outlook 2007 -- Using Microsoft Office Outlook 2007 to subscribe to an RSS Feed is quick and easy and does not involve a registration process or fee. After you subscribe to an RSS Feed, headlines will appear in your RSS folders. RSS items appear similar to mail messages. This number may be inflated, since some requests from Internet Explorer 7 also identify themselves in this manner.
  • Rojo  -- Rojo (pronounced like Mojo with an R) is a web-based service dedicated to helping Internet users efficiently manage online content and information flow.
  • RssReader  -- RssReader is a free news reader for RSS and Atom feeds. Features include support for authenticated feeds and OPML import/export.
  • Safari RSS (OS X Tiger)  -- Safari RSS is the built-in news aggregator found in Apple's Safari web browser, starting with the version found in Mac OS X "Tiger." More information can be found on the Apple Safari RSS page.
  • SharpReader  -- SharpReader is a popular feed reader for Windows that handles all RSS versions and Atom. SharpReader has a number of powerful features like support for HTTP Authentication, making it possible to subscribe to authenticated feeds, drag-and-drop feed subscription, and keyboard navigation between content items.
  • Thunderbird  -- Thunderbird is Mozilla's next generation e-mail client. Thunderbird makes emailing safer, faster, and easier than ever before with the industry's best implementations of features such as intelligent spam filters, a built-in spell checker, extension support, and much more.
  • Topic Blogs  -- Topic Blogs is under construction as of Autumn 2005
  • Vienna  -- Vienna is a freeware, open source RSS/Atom newsreader for the Mac OS X operating system
  • Windows RSS Platform  -- This represents a subscription from the Microsoft Windows RSS platform. The actual client can vary, but your feed is being regularly checked and read.
  • WordPress  -- WordPress bot.

Technology Matters at Alfresco


We have tended to downplay technical innovation in marketing Alfresco, emphasizing instead cost, ease of use and the benefits of open source. In an era of consolidation and commoditization, technology matters less in marketing than ease, convenience and performance. However, a major trade journal recently asked us the following question, and I thought I would elaborate on the answer and post it to the blog. It reminds us how far we have come and what a difference a new architecture and a clean slate can make.

How did the technology you used contribute to Alfresco, and why was it important?

The Enterprise Content Management industry has not innovated in the last several years, as the major vendors try to integrate acquired technologies and repositories into their respective stacks. Alfresco uses technology innovation to deliver the full functionality of ECM, to adapt faster to new standards and customer requirements, and to be easier to use. With the benefit of 15 years of hindsight from the co-founder of Documentum, Alfresco has a clear vision of a full ECM suite with document, records, image and web content management, and has built the system from production-ready open source tools in the span of only two years.

Alfresco was built using the Spring open source application development framework and Aspect Oriented Programming (AOP), which was developed at Xerox PARC to address the reuse of code that "plugs in" to systems and objects. Using Spring and AOP, Alfresco provides simple hook points for adding new functions, features and rules to the repository when applications perform actions such as accessing a document, saving content, or moving or updating information. Using these modular AOP plug points, it is easy to add services such as authentication, permissions, transformation, versioning or retention control. This makes the system much easier to extend without rebuilding it, and it also future-proofs the architecture. It also makes for a faster repository, because components or metadata that an application does not need can simply be unplugged.
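To make the plug-point idea concrete, here is a minimal sketch in plain Java. It is not Alfresco or Spring code: it uses the JDK's own dynamic proxies (the same mechanism Spring AOP uses for interface-based proxies) to wrap a hypothetical ContentService with an audit "aspect", so the cross-cutting behaviour is added without touching the service itself. ContentService, SimpleContentService and AuditInterceptor are illustrative names only.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical repository service interface (illustration only).
interface ContentService {
    String read(String path);
    void save(String path, String content);
}

// Core implementation: no auditing, security or other cross-cutting code here.
class SimpleContentService implements ContentService {
    private final Map<String, String> store = new HashMap<>();
    public String read(String path) { return store.get(path); }
    public void save(String path, String content) { store.put(path, content); }
}

// The "aspect": advice that runs around every call to the wrapped service.
class AuditInterceptor implements InvocationHandler {
    private final Object target;
    final List<String> auditLog = new ArrayList<>();

    AuditInterceptor(Object target) { this.target = target; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        auditLog.add("before " + method.getName());
        try {
            return method.invoke(target, args);   // proceed to the real service
        } catch (InvocationTargetException e) {
            throw e.getCause();                   // rethrow the service's own exception
        } finally {
            auditLog.add("after " + method.getName());
        }
    }
}

class AopDemo {
    // Plug the aspect in by handing callers a proxy instead of the raw service.
    static ContentService withAudit(AuditInterceptor interceptor) {
        return (ContentService) Proxy.newProxyInstance(
                ContentService.class.getClassLoader(),
                new Class<?>[] {ContentService.class},
                interceptor);
    }

    public static void main(String[] args) {
        AuditInterceptor audit = new AuditInterceptor(new SimpleContentService());
        ContentService service = withAudit(audit);
        service.save("/docs/a.txt", "hello");
        System.out.println(service.read("/docs/a.txt")); // hello
        System.out.println(audit.auditLog); // [before save, after save, before read, after read]
    }
}
```

Unplugging the aspect is just a matter of handing out the raw SimpleContentService instead of the proxy; the service code never changes either way.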

Alfresco has evolved very rapidly by using the Spring Framework to plug in dozens of other full-function, production-ready open source projects rather than reinventing the wheel. The Hibernate object-relational mapping system enables Alfresco to create a model-driven architecture based on configuration rather than programming, and hides the complexities of the underlying database. The Lucene full-text search engine indexes content and metadata and can potentially scale much further than pure database solutions. The jBPM business process engine provides simple, modular business processes. The Java-based Rhino JavaScript engine and FreeMarker templating engine provide lightweight programming and user interface extensions that are robust and scalable.

The Alfresco repository makes ECM easier and gets users off uncontrolled shared file drives by emulating shared drives and controlling content with rules. Support for CIFS (Common Internet File System), the protocol used by Microsoft shared file drives, allows all Windows-based applications to access the repository directly, displays additional metadata and thumbnails in Windows Explorer, provides drag and drop from other windows, and allows users to synchronize their content offline. Beneath the CIFS interface, user-definable rules and scripts process new content: they extract metadata, classify content, move content, apply a workflow or retention policy, or render the content in web-ready formats. Users can then search the content with Google-like searches or with searches constrained by metadata, and can aggregate multiple repositories using the OpenSearch protocol.
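A rule of this kind boils down to a condition and an action that fire when content arrives in a folder. The sketch below illustrates that pattern with standard java.util.function types; Doc and FolderRules are hypothetical classes, not Alfresco's rule API, and the example rules are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical document: a name, the folder it lives in, and its metadata.
class Doc {
    final String name;
    String folder;
    final Map<String, String> metadata = new HashMap<>();

    Doc(String name, String folder) {
        this.name = name;
        this.folder = folder;
    }
}

// Hypothetical rule set attached to a folder: each rule is a condition plus an action.
class FolderRules {
    private static final class Rule {
        final Predicate<Doc> condition;
        final Consumer<Doc> action;

        Rule(Predicate<Doc> condition, Consumer<Doc> action) {
            this.condition = condition;
            this.action = action;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    void add(Predicate<Doc> condition, Consumer<Doc> action) {
        rules.add(new Rule(condition, action));
    }

    // Run when new content lands in the folder: every matching rule fires, in order.
    void onNewContent(Doc doc) {
        for (Rule rule : rules) {
            if (rule.condition.test(doc)) {
                rule.action.accept(doc);
            }
        }
    }

    public static void main(String[] args) {
        FolderRules inbox = new FolderRules();
        // Extract metadata: record the format of PDFs.
        inbox.add(d -> d.name.endsWith(".pdf"), d -> d.metadata.put("format", "pdf"));
        // Classify and move: invoices go to the finance folder.
        inbox.add(d -> d.name.startsWith("invoice"), d -> d.folder = "/finance/invoices");

        Doc doc = new Doc("invoice-42.pdf", "/inbox");
        inbox.onNewContent(doc);
        System.out.println(doc.folder + " " + doc.metadata); // /finance/invoices {format=pdf}
    }
}
```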

By conforming to standards, Alfresco ensures that applications built against the system can be migrated and are future-proofed. As a 100% Java system portable to dozens of different platforms, Alfresco is often one of the first ECM systems to implement standards such as CIFS, OpenSearch, Web Services and the JSR-170 Java Content Repository interface. Alfresco exposes its user interface as standard JSR-168 portlets and through the ubiquitous Tomcat application server. Alfresco has integrated the standard JavaScript language for server-side scripting and is adding new Web 2.0 protocols such as JavaScript-based REST-style interfaces, OpenSearch, RSS and Atom. Alfresco is now adding a standard SQL interface and the OpenID single sign-on protocol.

The Alfresco system would not be as full-featured nor as adaptable to requirements without a strong technology base and the commitment to use open source components.

Ian Howells: With Open Source you can't be half-pregnant


My colleague Ian Howells, Chief Marketing Officer at Alfresco, has posted a blog entry on how Geoffrey Moore's marketing models describe the commoditization of the ECM space, and how the network effects of open source are accelerating the commoditization of enterprise software. He quotes Geoff:

"Enabling technologies commoditize extremely well, allowing them to proliferate into markets far afield from the original starting points and generate a high degree of network effects. These in turn put pressure on the overall marketplace to standardize exclusively on a single set of components driving market shares to extraordinary levels …”

Ian cites Alfresco's move to the GPL as embracing this network effect, and argues that companies should embrace open source wholeheartedly. Ian says:

"You can’t be 'half-pregnant' and similarly you can’t be half open source and half proprietary. It is open source, not hybrid models that will drive true disruption, commoditization and benefit most from the network effect. The GNU General Public License (GPL) is the ideal license to drive forward this industry disruption and accelerate the network effect. That is what drove us to move to GPL."

British Space Shuttle - The Whole Story

This is a follow-up to my previous post on the British Space Shuttle on Top Gear. People have commented that you would never see this on American television.

Part I - http://www.youtube.com/watch?v=ks9-8XOtfC4

Part II - http://www.youtube.com/watch?v=ni1Ek_TlTBE

An American Entrepreneur in Europe


I was asked some questions about being an American entrepreneur in Europe by a major international business magazine. I wrote down my answers and thought I would post them on my blog. This is related to my post on the Global Competition Workshop at Davos.

How long have you been based in Europe?

I have been in the UK since 1995 this time. I originally moved to the UK in 1987 to set up Ingres’ European Technical Center, but went back to California to set up a company in 1989. I didn’t believe you could start a company in the UK in those days. This was a hard decision for me, because I enjoyed living and working in Europe. A site visit to one of the capitals of Europe was much more interesting and educational than the site visits I was making in the Midwest of the US.

I then started Documentum in 1990 with Howard Shao. Once Documentum was up and running, I returned to the UK, but still had a large portion of the organization reporting to me, so I ended up going back to the US once a month until 2001. At that point, Documentum had become the leader in Enterprise Content Management and the stock price was at an all-time high, so I felt it was a good time to leave.

When you founded Alfresco in 2005, why did you decide to set up shop in Europe instead of the US or somewhere else?

We set up in the UK because it is hard. The UK is our home, and we believe that all it needs is role models of success. All the elements are here in the UK to create great technology companies. As in Silicon Valley, there are great ideas in universities like Oxford, Cambridge and Imperial College (Britain’s MIT). There are mid-level managers in the European headquarters of American companies who aspire to greater things, have been trained to a world-class level by Americans, and have learned to work independently because of their distance from headquarters in the US. Venture capital is freely available, especially since the substantial changes in capital gains tax in the mid-1990s. All that remains are the success stories and examples to show that it can be done.

Also, John Powell and I found it easier to start a company here than to keep commuting to the Bay Area in order to have the influence that comes from being at headquarters.

Have you recruited other Americans to work for Alfresco?

Yes. Although I am the only American in the European organization, about a quarter of the organization is in the US in sales and marketing roles. This is a reversal of roles, but we are more sympathetic to the role that remote people play and try to find other channels for communication and engagement. We are big users of Skype and use it to communicate freely with each other wherever we are. The nature of open source is that it can happen anywhere and tends not to have a geographic center. Since we often allow people to work from home, there is less of a barrier between people in the US, people at home and people in the UK office.

Is entrepreneurism on the rise in Europe? If so, why?

It’s funny that George Bush claimed that the “French don’t have a word for entrepreneur.” I don’t think he was trying to be ironic, but to a certain extent he is right. I know a lot of people who are dying to start a company in France, but real changes have to happen in employment laws and work rules before they can become successful. Business Objects is the only success story that I can think of and they had to move to Silicon Valley to really succeed. France is burdened with a high level of bureaucracy and the level of bureaucracy has been increasing here in the UK, but that hasn’t stopped ambition. I wouldn’t confuse the politics and industrial sclerosis that infects much of Europe with what is happening at the ground level of entrepreneurs and in areas that are more loosely regulated.

It is clear to everyone that Europe is falling behind in a lot of areas, and people are even more concerned about the emerging threat from Asia than about American entrepreneurism. They know that meeting it requires something different and that start-ups are a powerful competitive force. With a laser-like focus, start-ups can run rings around conglomerates, and people see them as a way to make money as well as to compete. Also, there is no shortage of creativity in Europe. European universities and labs can create as many new ideas as those in America. The trick is productizing and marketing them.

The force of entrepreneurialism is strongest in countries that have strong university traditions connected to industrial development. Essentially, these are the areas whose entrepreneurialism created the Industrial Revolution. There are lots of startups in new devices, telephony and new Web 2.0 properties in the UK, Germany and Scandinavia. The former Soviet bloc has been bursting with activity and is now starting to take a chunk of business that would otherwise be going to India. With strong mathematical and computing heritages in places like Hungary, Poland, the Czech Republic, Belarus (where they used to copy American computers) and Russia, Eastern Europe could potentially become a real software factory.

However, some of the most powerful concepts to be turned into companies are around open source and Web 2.0 internet properties. That is because the internet has become such a leveling force in the delivery of information-based services. When starting Alfresco, we wanted to take what was a disadvantage, being in Europe, and turn it into an advantage. Many of the projects that have been successful in open source have come from Europe, such as MySQL and Linux. Skype came out of Europe as well. These types of companies don't require lots of people, so they are not as affected by labor regulation, and the location of their services is irrelevant, all the more so in a totally global market.

Do you know of other European-based startups that have Americans in top management slots?

There is a model discussed in the European venture capital community called the Israeli model: take the ideas generated in labs and universities, set up sales and marketing operations in the US, and keep development and operations in Europe. Amdocs and Check Point are examples of Israeli companies that have done this. European companies that are following this model are hiring Americans in America. Other than that, most European companies are on their own, relying on their own smarts and the training that they have generally received from American companies.

This is not a whole lot different than the early days of Silicon Valley where a lot of the management was hired from the middle ranks of IBM or other technology firms east of the Mississippi. It took time, but the culture was self-developing after a decade or so.

Why is Alfresco Moving to the GPL?

Alfresco has now moved to the GPL, and it seems to be making news. We have added the FLOSS (Free/Libre Open Source Software) exception, which allows any software using an OSI-approved license to embed Alfresco. We are really excited about the opportunity that this creates.

About a year and a half ago, I wrote a blog post about how Alfresco is an Open Source Laboratory. In it I said that there is no formula for building an open source company. It's still true. We have looked to companies like JBoss (see Roy Russo's blog on the subject), MySQL, SugarCRM and Red Hat, who have paved the way for us but are still experimenting with their models. Look at MySQL's all-you-can-eat enterprise model, or Red Hat reinventing packaging and solutions in the face of Oracle selling support for its product.

We (professional open source companies) are all still fine-tuning the model, but we know it works and we are just optimizing it. However, there is a constant tension between rapidly expanding the pie and monetizing that expansion to get our fair share. As we were building our brand, we sought refuge in the models with which others had experimented. Over time, it has become clear that growth is the most important factor in the development of an open source company. We have moved beyond enterprise extensions, and now we have moved beyond attribution. Are these really necessary to build an open source company? Well, we tried them, and we don't believe they are. We believe that leadership is a much stronger protection than any license restriction.

We made no pretensions to being experienced open source guys when we started Alfresco, but we knew that we were experienced enterprise software guys. We saw a movement that was having a tremendous impact on enterprise software and the internet as a whole, and thought that we could apply those lessons to the area that we know best, content management. We were open to trying new things, and we were willing to admit that we would have to change as things moved along.

Two years later, we have created a strong brand in Alfresco. We have confidence in the product that we are building. It was time to look at what made sense for our license. Matt Asay has always been a big advocate of the GPL and has championed it since joining us in 2005. I was originally nervous about our use of the attribution clause, but I was also nervous about the GPL. Matt has been on the OSI board, where there has been a lot of discussion around attribution and the various licenses that use it. What has become clear in recent months is that GPL systems can incorporate non-GPL/LGPL components, specifically Apache and BSD, and the FLOSS exception has eliminated any concerns that I had about others using the GPL license.

About 74% of all the projects on SourceForge are GPL. Matt anticipates that a lot of open source applications will ultimately want to go this way, but going to the GPL is new in the professional open source applications area. Infrastructure products like MySQL and Java use the GPL, which makes sense for them from a dual-licensing perspective. We too have a good embedding business, and dual licensing makes sense for potential OEMs. However, the timing relative to Java going to the GPL is coincidental, since we have been looking at this for a while. The question was what was best for Alfresco, the out-of-the-box ECM solution. GPL with the FLOSS exception is the answer.

We have made a big investment in integration with PHP as well. We look forward to integrating with a number of GPL projects that need a repository and to use GPL components as part of the Alfresco solution. We look forward to the GPL community adding and extending Alfresco with new functionality as well. The experiment looks like it is coming to a successful conclusion.

Alfresco Releases 2.0 with WCM and OpenSearch under GPL


Alfresco is pleased to announce the availability of Community Version 2.0 of its Open Source Enterprise Content Management System and JSR-170-based repository. This version is now available under the GPL with a FLOSS exception, which allows any open source software using an OSI-approved license to embed the Alfresco repository or applications without a change of license or attribution. You can download it here.

Version 2.0 has combined the core Alfresco repository with the new web content management system and records management system. The web content management system was designed and implemented by the original engineers from Interwoven who joined Alfresco at the beginning of 2006. The WCM system provides:

  • Simple import of existing web sites
  • Simple XForms-based entry of XML data based upon the open source Chiba project with AJAX extensions
  • Templating of XML and HTML based upon XSLT and FreeMarker
  • Virtual sandboxes for staging of web sites without copying files
  • Timeline snapshot of web sites with zero effort
  • Standard web production workflows extending the JBoss jBPM engine with group and queue support
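The idea behind virtual sandboxes, staging a site without copying files, can be sketched as a copy-on-write overlay: each sandbox records only its own changes and reads everything else through from the shared staging area. This is an illustration of that general technique under that assumption, not the actual implementation in Alfresco WCM; Staging and Sandbox are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical shared staging area: site path -> file content.
class Staging {
    final Map<String, String> files = new HashMap<>();
}

// A sandbox layered over staging: unchanged files are read through, never copied.
class Sandbox {
    private final Staging staging;
    private final Map<String, String> changes = new HashMap<>(); // files edited in this sandbox
    private final Set<String> deletions = new HashSet<>();       // files deleted in this sandbox

    Sandbox(Staging staging) { this.staging = staging; }

    Optional<String> read(String path) {
        if (deletions.contains(path)) return Optional.empty();
        if (changes.containsKey(path)) return Optional.of(changes.get(path));
        return Optional.ofNullable(staging.files.get(path)); // fall through to staging
    }

    void write(String path, String content) {
        deletions.remove(path);
        changes.put(path, content);
    }

    void delete(String path) {
        changes.remove(path);
        deletions.add(path);
    }

    // Promote this sandbox's changes into staging, e.g. once a review workflow approves them.
    void submit() {
        staging.files.putAll(changes);
        staging.files.keySet().removeAll(deletions);
        changes.clear();
        deletions.clear();
    }

    public static void main(String[] args) {
        Staging staging = new Staging();
        staging.files.put("/index.html", "v1");
        staging.files.put("/about.html", "about");

        Sandbox mine = new Sandbox(staging);
        mine.write("/index.html", "v2");
        System.out.println(mine.read("/index.html").orElse("-")); // v2
        System.out.println(mine.read("/about.html").orElse("-")); // about (read through)
        System.out.println(staging.files.get("/index.html"));     // v1 (staging untouched)

        mine.submit();
        System.out.println(staging.files.get("/index.html"));     // v2
    }
}
```

Because a sandbox holds only its deltas, many authors can each have a full view of the site at the cost of storing just the files they touched.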

Our new web site is now a production user of the Alfresco Web Content Management System. It certainly makes it easier for ordinary guys like John Powell and me to add new content to the web site.

Version 2.0 adds federated search by implementing the OpenSearch protocol, which I blogged about recently and which is supported by hundreds of search engines. OpenSearch lends itself naturally to federation, combining searches from multiple search engines. Alfresco acts both as an OpenSearch server, participating in a federated search from a web browser or portlet, and as an OpenSearch client, since the Alfresco web client supports searching against multiple Alfresco repositories as well as multiple internet search engines.
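An OpenSearch description document advertises a URL template such as http://example.com/?q={searchTerms}&p={startPage?}; a client substitutes its query into the template, and a parameter marked with "?" is optional and may be replaced with an empty string. The sketch below expands templates that way and fans one query out to several engines, which is the simplest form of federation. It is a simplified illustration, not the Alfresco client: it handles only flat parameter names, leaves merging of the returned results aside, and the example URLs are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OpenSearchTemplates {
    // {name} is required; {name?} is optional. Prefixed names such as {geo:box?} also match.
    private static final Pattern PARAM = Pattern.compile("\\{([A-Za-z0-9_:]+)(\\??)\\}");

    // Substitute URL-encoded values into an OpenSearch-style URL template.
    static String expand(String template, Map<String, String> values) {
        Matcher m = PARAM.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            boolean optional = !m.group(2).isEmpty();
            String value = values.get(name);
            if (value == null) {
                if (!optional) {
                    throw new IllegalArgumentException("missing required parameter: " + name);
                }
                value = ""; // an unsupplied optional parameter becomes the empty string
            }
            // URLEncoder does form encoding (space -> '+'); switch spaces to %20 for URLs.
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Simplest federation: send the same search terms to every engine's template.
    static List<String> federate(List<String> templates, String searchTerms) {
        List<String> urls = new ArrayList<>();
        for (String template : templates) {
            urls.add(expand(template, Map.of("searchTerms", searchTerms)));
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> engines = List.of(
                "http://repo-a.example.com/search?q={searchTerms}&p={startPage?}",
                "http://repo-b.example.com/opensearch?terms={searchTerms}");
        federate(engines, "records management").forEach(System.out::println);
        // http://repo-a.example.com/search?q=records%20management&p=
        // http://repo-b.example.com/opensearch?terms=records%20management
    }
}
```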

In addition, Alfresco Version 2.0 further simplifies extending the core repository without updating the Alfresco server, using the new Alfresco Modular Packaging (AMP) packs. Alfresco Records Management is now provided as an optional AMP download. AMP packs extend existing repositories and can include folder structures, content templates, JavaScript and templating scripts, web site structures or Spring-based Java plug-ins in a standard zip file.
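Because an AMP is a standard zip file, ordinary tooling can build or inspect one. The sketch below uses only java.util.zip to create an archive in memory and group its entries by top-level folder. The layout in the example (module.properties, config/, lib/) and the file contents are assumptions for illustration, not a statement of the AMP format or of how Alfresco installs a pack.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class PackageZip {
    // Write each (path, text) pair as an entry of an in-memory zip archive.
    static byte[] build(Map<String, String> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after close, so the zip directory is complete
    }

    // List entry names grouped by top-level folder ("(root)" for files at the top).
    static Map<String, TreeSet<String>> byFolder(byte[] archive) {
        Map<String, TreeSet<String>> groups = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                int slash = name.indexOf('/');
                String folder = slash < 0 ? "(root)" : name.substring(0, slash);
                groups.computeIfAbsent(folder, k -> new TreeSet<>()).add(name);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical package contents; the jar entry is an empty placeholder.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("module.properties", "module.id=example.tagging");
        files.put("config/scripts/tag-email.js", "// script body");
        files.put("lib/example-tagging.jar", "");
        System.out.println(byFolder(build(files)));
        // {(root)=[module.properties], config=[config/scripts/tag-email.js], lib=[lib/example-tagging.jar]}
    }
}
```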

Alfresco has also added new AJAX controls to the web client for browsing and navigating the repository. The Alfresco repository has added new relationships and metadata to support the translation process and multiple language variants of the same content. This capability was developed for use by multi-national corporations and government agencies.
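One way to picture multiple language variants is a set of translations of one document keyed by locale, with a fallback to the original when no suitable translation exists. The sketch below assumes that model; TranslationSet and its fallback order (exact locale, then bare language, then the original) are hypothetical and are not Alfresco's multilingual API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical translation set: the original ("pivot") document plus variants keyed by locale.
class TranslationSet {
    private final Locale pivotLocale;
    private final Map<Locale, String> variants = new HashMap<>();

    TranslationSet(Locale pivotLocale, String originalContent) {
        this.pivotLocale = pivotLocale;
        variants.put(pivotLocale, originalContent);
    }

    void addTranslation(Locale locale, String content) {
        variants.put(locale, content);
    }

    // Resolve a request: exact locale, then the bare language, then the original document.
    String resolve(Locale requested) {
        String exact = variants.get(requested);
        if (exact != null) return exact;
        String language = variants.get(Locale.forLanguageTag(requested.getLanguage()));
        if (language != null) return language;
        return variants.get(pivotLocale);
    }

    public static void main(String[] args) {
        TranslationSet greeting = new TranslationSet(Locale.ENGLISH, "Hello");
        greeting.addTranslation(Locale.FRENCH, "Bonjour");
        System.out.println(greeting.resolve(Locale.CANADA_FRENCH)); // Bonjour (French fallback)
        System.out.println(greeting.resolve(Locale.GERMAN));        // Hello (original)
    }
}
```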

A lot of people have put a lot of work into this release. The WCM team of Kevin, Britt, Jon and Ariel has been working for almost a year on the web content components of Alfresco. Derek and Andy have helped to integrate into the repository the very sophisticated virtualization capability that WCM relies on. Kev and Gav have been working their magic on the web client to support not just WCM, but also the new OpenSearch interface that allows the client to search multiple repositories, wikis, blogs and the web. Will has been a great guinea pig in building our new web site. Roy has now made it much easier to plug new components into the repository with AMP packs, and it will get even easier as we introduce the AMP Exchange with an iTunes-like download capability. He also did a great job of packaging records management as an AMP, both to test the approach and to make RM optional. And Paul has worked very late nights to make sure it all works together. Congratulations, guys, on a great release.

We have come a long way in two years. Each of the five releases in the last year has been worthy of a dot-zero designation, but we felt that now was the time, with all the pieces of ECM coming together in a single package. Wait till you see 2.1!

Developing a Content Application in Alfresco and JSR-170

By John Newton, David Caruana, and Paul Holmes-Higgin

Keyboard

(Warning: The content in this post is technical and Java in nature. Proceed at your own risk.)

Alfresco is a complete Enterprise Content Management System and 100% open source. It is a comprehensive content management development platform and scalable repository supporting JSR-170, and so is suitable for building complex enterprise-scale content applications. Access to the repository is provided through APIs such as JSR-170 and web services, as well as through file system interfaces: CIFS (Common Internet File System), which emulates a Microsoft shared file system, FTP and WebDAV. The Alfresco system is built upon Spring, taking full advantage of Spring’s dependency injection model to extend repository functionality, and incorporates Hibernate for persistence, Lucene for querying and indexing, jBPM for business process management, and the Mozilla Rhino JavaScript engine.

The Alfresco system has a web-based application that provides document management, web content management and records management capabilities. The web application, based upon the MyFaces implementation of JSF, is extensible, programmable and scriptable. Through Spring configuration it is possible to add new dialogs, views and wizards. The web application also provides dashboards to track repository and workflow activity. These dashboards are programmable either through Java or high-level templating languages, such as the open source FreeMarker templating engine.

As an example of building an enterprise content application, we use the scenario of an email archiving application. For this, we use the JSR-170 interface to add content, add a new action to the repository specifically for email, use JavaScript to process the email, create an RSS feed using a FreeMarker template, and use the templating engine to create a web view of email activity.

Defining a Metadata Aspect in Alfresco

For the purposes of our application, we will add a new metadata aspect to tag incoming emails. A metadata aspect is similar to a type, except that it can be added after the content has been created and more than one aspect can be added to the content object. It is similar to a JSR-170 mixin, except that in Alfresco it can also have behavior attached to it. We will use a predefined aspect in this example, the “cm:emailed” aspect, which includes the following metadata:

  • Originator
  • Addressee
  • Subject Line
  • Sent Date

Alfresco models are defined in XML and can be loaded dynamically. In this example, we create a new aspect called “tagged”, which is defined as follows:

<aspect name="cm:tagged">
   <title>Tagged</title>
   <properties>
       <property name="cm:tag">
           <type>d:text</type>
           <multiple>false</multiple>
       </property>
   </properties>
</aspect>

To display this property in the Alfresco web client, we can extend the property sheet with the following configuration:

<config evaluator="aspect-name" condition="cm:tagged">
   <property-sheet>
      <show-property name="cm:tag" />
   </property-sheet>
</config>

Alfresco also provides metadata aspects for Dublin Core, DoD 5015.2 records management, basic Microsoft Office metadata, automatic counters, workflow process data, classification, auditing and localization, among others.
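To make the difference between a type and an aspect concrete, here is a toy, in-memory model in plain Java. It is not Alfresco's API; `ContentNode` and its methods are hypothetical. A node keeps the single type it was created with, while any number of aspects, each contributing its own properties, can be attached afterwards:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of types and aspects; not the Alfresco API.
class ContentNode
{
    private final String type;                                   // fixed at creation
    private final Set<String> aspects = new LinkedHashSet<>();
    private final Map<String, Object> properties = new LinkedHashMap<>();

    ContentNode(String type)
    {
        this.type = type;
    }

    // Aspects can be added after creation, and several may be applied to one node.
    void addAspect(String aspect, Map<String, Object> aspectProperties)
    {
        aspects.add(aspect);
        properties.putAll(aspectProperties);
    }

    String getType() { return type; }

    boolean hasAspect(String aspect) { return aspects.contains(aspect); }

    Set<String> getAspects() { return aspects; }

    Object getProperty(String name) { return properties.get(name); }
}
```

In the real repository the aspect definitions come from the XML model and property values are checked against it; the sketch skips that and simply merges the property maps.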

Content Storage through JSR-170

Alfresco is a JSR-170-compliant repository and therefore supports the JCR API. For the purposes of this example, we assume that there is an email listener that stores each email as a JCR node using the JSR-170 interface. The node is placed in an “email drop zone”, a well-known path where content is processed according to rules in the repository. The drop zone is itself a node, and the email is added as a child of that node. The actual binary content of the email is then set on this child node.

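import java.io.InputStream;

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

import org.alfresco.util.GUID;

public class EmailListener
{
    // JCR session, obtained elsewhere (for example via Repository.login())
    private Session session;

    // ...

    private Node importEmail(InputStream msg) throws RepositoryException
    {
       // locate the email drop zone folder
       Node rootNode = session.getRootNode();
       Node zone =
          rootNode.getNode("app:company_home/cm:email/cm:dropzone");

       // add the email to the folder (which fires the registered rules)
       Node email =
          zone.addNode(GUID.generate(), "cm:content");
       email.setProperty("cm:content", msg);

       // persist the transient changes made through this session
       session.save();
       return email;
    }

    // ...
}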

Placing content in a drop zone allows the business logic of email filing to be specified independently of the application and, more importantly, by business users. This is done in Alfresco through rules and their associated actions.

Defining Rules to Process Incoming Content

The Alfresco repository organizes information in a hierarchical structure similar to other enterprise content management repositories. Its folders are called spaces; they are similar to folders in a file system, but also contain rules for processing content that is added to, removed from, moved within or updated in that space. Each space also has users associated with it, who may have different roles in interacting with the content in the space.

The rules associated with a space determine the disposition of content in the space. Rules can be used to change the type of the content being added, add metadata aspects, attach behavior such as locking and versioning, and transform content or copy it to other spaces. In the email example, we will use a rule to extract metadata from the content and use that metadata to classify the content and move it to a new space determined by the metadata.

To do this, we use the Alfresco web client to navigate to the email drop zone space and specify rules for content entering the space. Through a set of wizards, we add the following actions for all new items with the email mimetype, “message/rfc822”:

  • Add the emailed aspect - this is the email metadata mentioned previously and can be combined with other aspects such as record data or process data.
  • Add the tagged aspect - this is the aspect we defined earlier.
  • Extract metadata - this is a built-in Alfresco capability that looks inside common file formats to extract information such as author, title and subject. In this example, we will extend the system to extract additional metadata from emails.
  • Execute the “emailtag.js” JavaScript - this is a server-side script that we will show in a later section. Scripts are stored in the repository and can be executed just like Alfresco’s internal actions.

Rules and actions can be combined and chained to create more complex logic. Rules can test the type of content, which aspects are applied and what metadata has been set. Matching rules then fire their actions in sequential order, or the actions can be executed asynchronously for long-running operations. Common actions performed in rules are transformation, copying, moving, and metadata setting and extraction.
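The rule mechanism can be pictured with a small, hypothetical model in plain Java (not the Alfresco rules API; `Space`, `SpaceRule` and `InboundItem` are made up for illustration). Each inbound rule pairs a condition, here a mimetype test, with a list of actions that run in order when matching content is added to the space:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Toy model of a space with inbound rules; not the Alfresco rules API.
class InboundItem
{
    final String name;
    final String mimetype;
    final List<String> aspects = new ArrayList<>();
    final Map<String, String> properties = new HashMap<>();

    InboundItem(String name, String mimetype)
    {
        this.name = name;
        this.mimetype = mimetype;
    }
}

class SpaceRule
{
    final Predicate<InboundItem> condition;
    final List<Consumer<InboundItem>> actions;

    SpaceRule(Predicate<InboundItem> condition, List<Consumer<InboundItem>> actions)
    {
        this.condition = condition;
        this.actions = actions;
    }
}

class Space
{
    private final List<InboundItem> children = new ArrayList<>();
    private final List<SpaceRule> inboundRules = new ArrayList<>();

    void addRule(SpaceRule rule)
    {
        inboundRules.add(rule);
    }

    // Adding a child fires every matching rule; its actions run in sequence.
    void addChild(InboundItem item)
    {
        children.add(item);
        for (SpaceRule rule : inboundRules)
        {
            if (rule.condition.test(item))
            {
                for (Consumer<InboundItem> action : rule.actions)
                {
                    action.accept(item);
                }
            }
        }
    }

    List<InboundItem> getChildren()
    {
        return children;
    }
}
```

The sketch only covers the synchronous, sequential case; asynchronous execution and rules triggered by updates or removals are left out.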

Adding a New Behavior to the Alfresco Repository

Although the Alfresco repository already has an action to extract metadata from email, it is a relatively new extension of the repository, so it is worth showing how it was added. It is also a good example of how Alfresco uses Spring’s dependency injection pattern to add new functionality without rebuilding the repository. In this example, the metadata extractor has a standard Java interface defined as follows:

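import java.io.Serializable;
import java.util.Map;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.namespace.QName;

public interface MetadataExtractor
{
    // how reliably this extractor handles the given mimetype (0.0 = not supported)
    public double getReliability(String sourceMimetype);

    // typical time taken by an extraction, in milliseconds
    public long getExtractionTime();

    // reads the content and adds the extracted properties to the destination map
    public void extract(ContentReader reader, Map<QName, Serializable> destination);
}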

For this example, we add a new implementation of the MetadataExtractor interface that uses the open source POI Java library to read the proprietary Microsoft file format. We first ensure that the file actually is a Microsoft Exchange message or rfc822, and then read the fields identified by the following hex property codes:

  • 0C1F - The message originator
  • 0037 - The message subject
  • 39FE - The message addressee
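In an Outlook message file, each string property is stored as a separate compound-document entry whose name is the prefix `__substg1.0_` followed by the four-hex-digit property id and a four-hex-digit type code (for example `001E` for 8-bit and `001F` for Unicode strings). The stand-alone sketch below decodes such names and maps the three ids above to property names. It does not use POI itself, `MsgEntryName` is a hypothetical helper, and the `cm:` names are assumed to be those of the `cm:emailed` aspect:

```java
// Hypothetical helper that decodes "__substg1.0_XXXXYYYY" entry names.
final class MsgEntryName
{
    static final String PREFIX = "__substg1.0_";

    private MsgEntryName() {}

    // Returns the 4-hex-digit property id, or null if this is not a property entry.
    static String propertyId(String entryName)
    {
        if (!entryName.startsWith(PREFIX) || entryName.length() < PREFIX.length() + 8)
        {
            return null;
        }
        return entryName.substring(PREFIX.length(), PREFIX.length() + 4).toUpperCase();
    }

    // Returns the 4-hex-digit type code that follows the property id, or null.
    static String typeCode(String entryName)
    {
        if (propertyId(entryName) == null)
        {
            return null;
        }
        return entryName.substring(PREFIX.length() + 4, PREFIX.length() + 8).toUpperCase();
    }

    // Maps the property ids used in this example to (assumed) cm:emailed property names.
    static String emailedProperty(String propertyId)
    {
        if ("0C1F".equals(propertyId)) return "cm:originator";
        if ("0037".equals(propertyId)) return "cm:subjectline";
        if ("39FE".equals(propertyId)) return "cm:addressee";
        return null;
    }
}
```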

The following code accesses these fields through POI and sets the corresponding properties in the metadata map. More complex processing or additional metadata fields could easily be added.

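import java.io.IOException;
import java.io.Serializable;
import java.util.Map;

import org.alfresco.repo.content.metadata.MetadataExtracterRegistry;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.namespace.QName;
import org.apache.poi.poifs.eventfilesystem.POIFSReader;
import org.apache.poi.poifs.eventfilesystem.POIFSReaderEvent;
import org.apache.poi.poifs.eventfilesystem.POIFSReaderListener;

public class MailMetadataExtractor implements MetadataExtractor
{
    // Outlook stores each string property in an entry named
    // "__substg1.0_" + 4-digit hex property id + 4-digit hex type
    private static final String PREFIX = "__substg1.0_";
    private static final int PREFIX_LENGTH = PREFIX.length();
    private MetadataExtracterRegistry registry;

    // ... registry setter, register(), getReliability(), getExtractionTime()
    //     and the extractText(event) helper that reads a property stream

    public void extract(ContentReader reader, final Map<QName, Serializable> props)
    {
        POIFSReaderListener listener = new POIFSReaderListener()
        {
            public void processPOIFSReaderEvent(final POIFSReaderEvent event)
            {
                if (event.getName().startsWith(PREFIX))
                {
                    // the four hex digits after the prefix identify the field
                    String type = event.getName();
                    type = type.substring(PREFIX_LENGTH,
                              PREFIX_LENGTH + 4);

                    // PROP_* are the cm:emailed property QNames
                    if (type.equals("0C1F"))
                        props.put(PROP_ORIGINATOR, extractText(event));
                    else if (type.equals("0037"))
                        props.put(PROP_SUBJECT, extractText(event));
                    else if (type.equals("39FE"))
                        props.put(PROP_ADDRESSEE, extractText(event));
                    // ...
                }
            }
        };

        // read the message only after the listener has been fully defined
        POIFSReader poi = new POIFSReader();
        poi.registerListener(listener);
        try
        {
            poi.read(reader.getContentInputStream());
        }
        catch (IOException e)
        {
            throw new RuntimeException("Unable to read mail message", e);
        }
    }
}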

To register this bean, we merely added the following Spring configuration:

<bean class="MailMetadataExtractor" init-method="register">
   <property name="registry">
      <ref bean="metadataExtractorRegistry"/>
   </property>
</bean>
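The `init-method="register"` attribute asks Spring to call `register()` after the `registry` property has been injected, so the extractor adds itself to the shared registry without any change to the repository code. The sketch below mimics that wiring in plain Java and assumes, as the `getReliability` method suggests, that the registry chooses the extractor reporting the highest reliability for a mimetype. `Extractor`, `ExtractorRegistry` and `MailExtractorBean` are simplified, hypothetical stand-ins rather than Alfresco classes:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for MetadataExtractor (reliability only).
interface Extractor
{
    double getReliability(String sourceMimetype);   // 0.0 means "not supported"
}

// Hypothetical registry: extractors register themselves, lookups pick the most reliable.
class ExtractorRegistry
{
    private final List<Extractor> extractors = new ArrayList<>();

    void register(Extractor extractor)
    {
        extractors.add(extractor);
    }

    // Returns the most reliable extractor for the mimetype, or null if none supports it.
    Extractor getExtractor(String mimetype)
    {
        Extractor best = null;
        double bestReliability = 0.0;
        for (Extractor e : extractors)
        {
            double r = e.getReliability(mimetype);
            if (r > bestReliability)
            {
                best = e;
                bestReliability = r;
            }
        }
        return best;
    }
}

// Mirrors the Spring wiring: setRegistry() is the injected property, register() the init-method.
class MailExtractorBean implements Extractor
{
    private ExtractorRegistry registry;

    public void setRegistry(ExtractorRegistry registry) { this.registry = registry; }

    public void register() { registry.register(this); }

    public double getReliability(String mimetype)
    {
        return "message/rfc822".equals(mimetype) ? 1.0 : 0.0;
    }
}
```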

Similar extensions can be added for transformations from one format to another, authentication interfaces, encryption and compression mechanisms on content transfer, and even rules and actions.

Using JavaScript to Add Repository Behavior

Previously, we mentioned the “emailtag.js” JavaScript for using the metadata to classify the emails. We could implement this in Java, but for simple tasks it is often easier, and just as efficient, to use JavaScript. Alfresco incorporates the Mozilla Rhino JavaScript engine. Scripts can use all of the standard functions and classes of ECMAScript, and can also work with the Alfresco content model as well as the JBoss jBPM model for workflow applications. A special Data Dictionary space is provided for storing and managing scripts just as one would any other content, allowing complete versioning, locking, auditing and CIFS access.

In this example, the “emailtag.js” JavaScript is invoked through the rule associated with the email drop zone space. The script takes the subject line extracted by the previous rule action, finds the term delimited by square brackets and stores it in the tagged metadata aspect. All that is required is the following four lines:

  var subject = document.properties.subjectline;
  var tag = subject.substring(subject.indexOf('[') + 1, subject.indexOf(']'));

  document.properties.tag = tag;
  document.save();

The script is atomic: either the whole action occurs or none of it does. The script could also set up complex classifications or relationships to other content objects. Most of the Alfresco processing that can be done in Java can also be done in JavaScript, so the choice becomes one of performance and extensibility rather than capability.
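For comparison, the same tag extraction written in Java might look like the hypothetical helper below (not part of Alfresco). Unlike the four-line script, it returns null when there is no bracketed term, and it only looks for `]` after the `[`, so a subject such as `a ] x [b]` yields `b` rather than a fragment of the subject line:

```java
// Hypothetical Java counterpart of emailtag.js: returns the first [bracketed] term.
final class SubjectTags
{
    private SubjectTags() {}

    static String extractTag(String subject)
    {
        if (subject == null)
        {
            return null;
        }
        int open = subject.indexOf('[');
        if (open < 0)
        {
            return null;                                 // no tag in the subject line
        }
        int close = subject.indexOf(']', open + 1);      // only accept "]" after "["
        if (close < 0)
        {
            return null;                                 // unterminated tag
        }
        String tag = subject.substring(open + 1, close).trim();
        return tag.isEmpty() ? null : tag;
    }
}
```

A caller would then set the tag property only when the result is non-null.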

Building an RSS Feed using FreeMarker

The Alfresco system also includes the FreeMarker templating engine. FreeMarker was chosen for its extensibility to other data models as well as its ability to handle XML. The templating language is particularly suited to the production of HTML and XML. As with other templating approaches such as Velocity, Perl or PHP, directives to access and manipulate data are defined in tags interwoven with the static output to be delivered. The FreeMarker language has constructs for manipulating lists, defining reusable macros, and manipulating strings and variables.

Alfresco has an open templating engine interface into which FreeMarker has been incorporated. FreeMarker has access to the Alfresco data model and can query and access content. FreeMarker can iterate through a folder, walk through a parent-child tree structure, and access properties and content. This makes it a convenient tool for constructing complex content and for re-using content. In addition, FreeMarker has access to the URLs and icons for content, so it can generate query-driven links and provides good report-writing capabilities. Although designed for generating HTML and XML, FreeMarker can be used to generate any type of content, and the generated content is URL-addressable from Alfresco.

For this example, we use a FreeMarker template to generate an RSS feed of specifically tagged emails that have been collected by our email listener over the last seven days. In the template, we set up the normal RSS headers and use references to the Alfresco model to set up the description of the feed. For brevity, we show only the heart of the RSS feed: a list, generated by an XPath query, of all content in the email space that carries the tag passed as the “tag” argument. The template then pulls metadata out of each content node to populate the appropriate RSS elements.

<?xml version="1.0"?>
<rss version="2.0">
<channel>

...

<#assign weekms=1000*60*60*24*7>
<#list space.childrenByXPath[".//*[@cm:tag='${args.tag}']"] as child>
   <#if (dateCompare(child.properties["cm:modified"], date, weekms) == 1)
        || (dateCompare(child.properties["cm:created"], date, weekms) == 1)>
   <item>
      <title>${child.properties.name}</title>
      <link>${hostname}${child.url}</link>
      <description>
         ${"<a href='${hostname}${child.url}'>"?xml}
         ${child.properties.name}
         ${"</a>"?xml}
         <#if child.properties["cm:description"]?exists
              && child.properties["cm:description"] != "">
            ${child.properties["cm:description"]}
         </#if>
      </description>
      <pubDate>${child.properties["cm:modified"]?string(datetimeformat)}</pubDate>
      <guid isPermaLink="false">${hostname}${child.url}</guid>
   </item>
   </#if>
</#list>
...

To invoke this RSS feed, first save the above template in the Presentation Templates space of the Data Dictionary. Then navigate to the email drop zone space and open its properties dialog. There is a tabbed area for RSS feeds: select the above template as the RSS feed template and copy the URL link for the feed. Append the argument “?tag=tag_name” to that URL and add it to your RSS reader.
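The `weekms` variable in the template is seven days in milliseconds: 1000 * 60 * 60 * 24 * 7 = 604,800,000. Assuming `dateCompare(d1, d2, ms)` returns 1 when `d1` is later than `d2` minus `ms`, the `<#if>` keeps items created or modified during the week before the current `date`. The hypothetical Java method below expresses the same filter, which can be handy for checking the window outside the template:

```java
import java.util.Date;

// Hypothetical Java version of the template's seven-day filter.
final class RecentFilter
{
    // 1000 ms * 60 s * 60 min * 24 h * 7 days, as in the template's weekms
    static final long WEEK_MS = 1000L * 60 * 60 * 24 * 7;

    private RecentFilter() {}

    // True when the item was created or modified within WEEK_MS before "now".
    static boolean isRecent(Date modified, Date created, Date now)
    {
        long cutoff = now.getTime() - WEEK_MS;
        return (modified != null && modified.getTime() > cutoff)
            || (created != null && created.getTime() > cutoff);
    }
}
```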

Scalability and Clusterability

This application shows some of the capabilities for storing, managing and accessing content in the Alfresco repository. It can sit side by side with the other applications that Alfresco provides out of the box, and nothing extra is required to make it, or other applications like it, scalable.

The Alfresco system can scale from small organizations to hundreds or even thousands of users on inexpensive off-the-shelf hardware. In benchmarks validated by independent parties, Alfresco running on RHEL 4 and MySQL 5.1 produced the following numbers on a SuperMicro system with two dual-core 3 GHz Opteron processors, 12 GB of memory (4 GB of which was allocated to Java) and six 100 GB drives in a RAID configuration:

  • 10 million objects in total in the repository
  • Bulk load of 60 documents per second into the 10-million-object repository
  • Up to 128 concurrent threads
  • Access by unique ID in under 0.1 seconds
  • Concurrent active mix of reads and writes at 128 per second
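As a rough sense of scale, if the quoted bulk-load rate held constant, loading all 10 million objects would take about two days. This is only a back-of-the-envelope figure; the benchmark does not state how long the initial load actually took:

```latex
t = \frac{10\,000\,000\ \text{documents}}{60\ \text{documents/s}}
  \approx 166\,667\ \text{s}
  \approx 46.3\ \text{h}
  \approx 1.9\ \text{days}
```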

To support even larger systems, the Alfresco system can be clustered on loosely coupled hardware to take advantage of existing hardware resources. This is possible because Alfresco is architected as a stateless system, with all operations performed in the context of transactions coordinated through the underlying database. Using the distributed EHCache open source cache, all clustered systems share a common view of the contents of the cache and their freshness. Combined with a clustered database such as MySQL 5.1, the Alfresco system can be extremely scalable.
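The idea of a shared view of cache freshness can be illustrated with a toy, single-process model of invalidation-based cache coherence: each node keeps a local cache in front of the shared database, and a write on one node evicts that key from its peers, which then re-read the current value on their next access. `ClusterNode` is hypothetical; the sketch ignores transactions and networking and is not how EHCache is implemented or configured:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, single-process model of invalidation-based cache coherence; not EHCache.
class ClusterNode
{
    private final Map<String, String> database;              // shared, authoritative store
    private final Map<String, String> cache = new HashMap<>();
    private final List<ClusterNode> peers = new ArrayList<>();
    int databaseReads = 0;                                     // counts cache misses

    ClusterNode(Map<String, String> database)
    {
        this.database = database;
    }

    void join(ClusterNode other)
    {
        peers.add(other);
        other.peers.add(this);
    }

    String read(String key)
    {
        if (!cache.containsKey(key))
        {
            databaseReads++;
            cache.put(key, database.get(key));
        }
        return cache.get(key);
    }

    // Writes go to the database, then peers drop their now-stale copies.
    void write(String key, String value)
    {
        database.put(key, value);
        cache.put(key, value);
        for (ClusterNode peer : peers)
        {
            peer.cache.remove(key);
        }
    }
}
```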

Conclusion

We have seen an example of how the Alfresco system can be used to build an enterprise-class application, in this case for archiving and retrieving email, and to enhance that storage with rules that extract additional metadata and act upon it. We have seen how the Alfresco system can be used as a web conduit for monitoring and delivering content from an enterprise repository. The system itself can scale to the requirements of the enterprise, using the inherent scalability of the components upon which Alfresco is built and the transactional clustering capability of the system.

If you would like to know more about Alfresco, please visit the developer web site at http://www.alfresco.org.

John Newton is Chief Technology Officer of Alfresco. David Caruana is the Chief Architect of Alfresco. Paul Holmes-Higgin is the Vice President of Engineering for Alfresco.

The British Space Shuttle

The other night on BBC's Top Gear, the boys turned a Reliant Robin, a post-war three-wheeled vehicle, into a space shuttle. My father-in-law had a Reliant Robin, and they looked like they would fall over just sitting at the side of the road, let alone fly.

This was a major mechanical engineering project. The thing has real solid-fuel boosters, and that is real liquid-fuel thrust coming out of the back of the car. The end result, though, is both hilarious and awe-inspiring.

If you can't see it in the blog, find it here: http://www.youtube.com/watch?v=aw5SHleYB5s

See the whole story here.

New Skype 3.0 can revolutionize business meetings

Last summer, I blogged about how Skype can help cut the cost of communication and actually change the culture of how we interact. In it, I talked about how EMC didn't allow Skype for "compliance reasons". We use it extensively here at Alfresco and it definitely saves money. Being open source and having an extremely thrifty CEO like John Powell, what would you expect?

Yesterday, Dave Caruana and I had a call with IBM related to JSR-283. We had a similar problem in that the (new) IBM'er wasn't able to use Skype inside IBM's firewall. Who knows why, but I was at home and able to conference Dave in and call the other party with a couple of clicks. That's already 10x easier than setting up a conference line and, at 1.5c/min, at least 10x cheaper, if not 100x. Dave was amazed it worked, but this is something you could already do with the old Skype. There has also been video calling, which I haven't bothered to use yet. (Done that already, and I can't be bothered to comb my hair.)

However, I have noticed that over the last few months, call quality has been going up. The conversation with Dave was limited only by the mikes and headsets we were using. The line to IBM was about as good as a conference line. Compared to a year ago, quality is way up, and you start to wonder why you would use an old-fashioned line.

But now Skype 3.0 is out and I can see this really changing things. It now has "SkypeCasts" that allow you to hook in up to 100 people. We tried it here in the office and it worked great. You can control who speaks and who just listens. I made the mistake of making it a public SkypeCast, and when a crank caller came on, I was able to just boot him off. Imagine trying to do that on an ordinary call. We can use this as a FREE call-in line for webcasts. That will save hundreds of dollars a call. It can also change the way we run large meetings or management meetings. We are a distributed organization and this will make things so much easier.

Phone number plug-in in Skype 3.0 makes writing down numbers unnecessary.

Skype 3 also now has an open architecture, so third parties can add things like recording, sharing media, sharing my desktop (à la Webex and Live Meeting), drawing pictures or even playing games. I hear there is even a stress meter to tell if someone is lying.

The thing that I really like, though, is the Firefox plug-in. It turns every phone number into a Skype button. It does an amazing job of recognizing phone numbers and the country you are calling. When I took my family out to dinner last weekend, I looked up the restaurant in Google, opened the web page and clicked the Skype button to call for a reservation. How easy is that?

You can see that the disappearance of phone numbers is imminent. I hope the disappearance of the business meeting is as well. Both Ingres and Documentum had strong meeting cultures, and I think it drove everyone nuts. We don't have anywhere near as many meetings at Alfresco, and I think that is because instant communication eliminates a lot of the status meetings and other rubbish. But when the locus of communication moves from physical meeting places to the internet, it will radically change the way we do business.

If only EMC, IBM, etc. could recognize this. One reply to my original post said that companies block the Skype protocol because of a theoretical risk of "Denial of Service" attacks. But if those companies spent a fraction of their phone bill working with eBay/Skype to solve those problems, it would eliminate the problem for everyone and save them huge amounts in the long run. Who knows, it might even eliminate those useless early-morning meetings.
