ECM

Adobe embeds Alfresco Repository

It's been quite a while in the making, but I am very pleased with the news today that Adobe will be embedding Alfresco technology as part of its LiveCycle Suite. A while ago, I wrote a blog about embeddable content repositories. It was clear then and more clear now that the old generation of content repositories is not really designed to be embedded as part of content-oriented applications. Yet, we all know that there is more information in content than there is in databases. Why can't applications use a set of services for managing content the way they manage data in embedded databases?

On this particular news, ComputerWorld reports Raja Hammond, Group Manager for Adobe LiveCycle, as saying, "Alfresco has a fantastic lightweight installation. It is J2EE server-based, so it is very much aligned with our architecture. We're able with this release to totally embed it. We've done extensive customization to the UIs to add additional capabilities to them. We've integrated them tightly with the various solution components within LiveCycle."

At InfoWorld, Brian Wick, Director of Product Marketing at Adobe, said, "It's much easier, much quicker for our customers to build LiveCycle apps with the content services piece built in." This should be the sentiment of any product manager whose product handles content. This is clearly the case for LiveCycle, which handles potentially huge numbers of PDFs and forms.

Over at CMS Watch, Alan Pelz-Sharpe, a long-time ECM observer, blogged on the announcement that, "It's been a while since there was a big product announcement in the ECM world, but today's announcement by Adobe that they will be embedding Alfresco into their LiveCycle Enterprise Suite will doubtless garner a few headlines. Alfresco, the UK-based open source ECM company, has certainly done a great job of marketing themselves since their launch a couple of years back, stealing some limelight from more established and much bigger vendors such as Interwoven, Vignette, and OpenText. The question we have to ask is whether this announcement is another marketing triumph, or whether it suggests something more substantial. First off is the fact that it is a real OEM (Original Equipment Manufacturer) deal, and the technology will actually get embedded into the Adobe offering, so it is more than simply a paper partnership."

It is also significant that the Alfresco platform is open source. Open source allowed Adobe and our dozens of other OEMs to try out Alfresco before even approaching us. Open source also provides a level of comfort and confidence in a platform for services like content services and content repository. It is much better than providing code in escrow. It actually provides a community as well to ensure the long-term success of the platform.

We look forward to a fruitful and symbiotic relationship with Adobe. We believe that this is the beginning of looking at content management as a peer of database management and as an essential component of any enterprise-class application. Congratulations to Adobe on all the hard work and the new release.

A Manifesto for Social Computing in the Enterprise

Investment in the infrastructure of the internet has dramatically increased bandwidth to everyone in the developing world and created home computers that are not only inexpensive, but very powerful. This change has expanded the usage of the internet exponentially and introduced new demographics and generations of users that had not used computing prior to the expansion of the internet. These users have themselves created the content and applications that feed the internet and have set expectations of the applications that we use in web browsers and new mobile devices. The increased bandwidth has made this a much more interactive and visual experience, encompassing video and visual elements. Web properties such as YouTube, Google, Amazon, Facebook, MySpace, and Flickr have set the benchmark for expression, accessibility and social interaction of computing systems.

Dubbed Web 2.0, this revolution in computing has shifted the face of software from a logical, linear, and introverted science to an expressive, graphical and social art. New designers of web sites, unschooled in traditional software techniques, are nonetheless able to create software that scales to millions of users and billions of objects of information and still meld those users into an artistically aware community. The next generation of enterprise employees, who started using the internet in their early teens, have only known this evolving culture of free and creative development of the internet and now demand better of the enterprise software that they meet. Older employees also know that the software that they use on a day-to-day basis can be better. Enterprise 2.0 seeks to emulate the success of Web 2.0 in the creation of new software for the enterprise.

Social Computing

The shift of computing power from business logic and calculation to socialization and people-orientation has been dubbed by some as Social Computing. The term Social Computing has been used interchangeably with Enterprise 2.0 or Enterprise Social Applications; however, IBM and Microsoft have created Social Computing research centers and Forrester has started to use the term in describing next generation enterprise collaboration. Social Computing is the use of technology to support sharing of information and enabling collaboration through social networks and to tap into the value of the “Wisdom of Crowds”, a concept made famous by James Surowiecki in 2004 to explain how many people are smarter than individual experts. Social Computing exploits software oriented toward people and Social Networks, the extended relationships of individuals, to connect to more people and access the Wisdom of Crowds.

To tap into the wisdom and awareness of social networks and empower people to collaborate at any time or place, Social Computing platforms need the following capabilities:

  • People - Support information about people, their preferred communications, their relationships and affiliations, since social networking is all about people rather than just systems, data and objects. The more information available about other users, the more likely they can be found as a source of knowledge.
  • Context of Networks - Social networks organized around projects, teams and departments provide the context of work and relevance of information as it spreads from creation to the people that need that information. Social networks, especially networks extended beyond the enterprise, provide the greatest differentiation of social computing from previous generations of collaboration.
  • Social Collaboration - Provide an environment where people can share ideas, contribute knowledge and solve problems in creative, unstructured socialization as opposed to rigorous workflows that are required for control of information. Next generation tools use techniques developed by Web 2.0, particularly those tools that empower social knowledge, such as social tagging, integration of communication and awareness of changes in social networks.
  • Content as a Service - Content is the container of knowledge and information and is core to the socialization of information. Content needs to be accessible everywhere, not just in large, monolithic applications. Content capabilities need to be accessible as reusable service components. Social computing can happen inside the enterprise or outside and a channel can be a web site, web application, mobile device or even external web platforms such as Facebook or Google applications. Mashups can occur inside the enterprise or outside and the channel will require content as a service that can securely be accessed wherever it is needed or wherever it is contributed.
  • People-centric Tools - As Web 2.0 has spread new paradigms of user interaction, the consumerization of software has created expectations that enterprise software becomes easier and empowers users to contribute, correct and classify content and information within the context of social networks. AJAX and next generation rich internet application interfaces such as Adobe Flex will provide users with a much richer, more intuitive user experience and the ability to scan much more social knowledge to find ideas and solutions. These tools should themselves be componentized and accessible as a service so that they may be mashed up with other sources of social knowledge.

This does not mean that the need for traditional enterprise content technologies such as document and records management goes away. They are still repositories of the truth and verifiable information and thus play an important role in sharing knowledge within social networks. However, these traditional technologies lack the usability, empowerment, and breadth of reach that Web 2.0 sites provide. They lack the collaborative nature that invites in people without barriers and restrictions to contribute to the sharing of knowledge and information. Web content management for creating a richer Web 2.0-style user interface becomes even more important to this collaboration to provide a compelling face to the interaction and to simplify the access and navigation of shared information. Enterprise Content Management cannot become one of the principal platforms of Social Computing unless it addresses the requirements of Social Applications.

Use of Social Computing

The balance is shifting from contained and controlled companies to engaged and empowered collaborative enterprises driven by Web 2.0-inspired social computing. At the center of the shift from old models of computing in the enterprise to new social models are companies that are inspired to innovate or to engage more with their customers. This includes companies not just using their internet or intranet web sites, but engaging in social networking channels such as Yahoo, Google, YouTube, Facebook and MySpace. Those using social computing are interested in engaging people, such as customers, employees or partners. They are using new people-centric tools and facilitating the creation or extension of social networks.

Major ECM vendors are all planning their Social Computing efforts and to a large extent are being dragged in this direction by their more forward-looking customers. Enterprises that have discovered the value of Social Computing are:

  • Consumer-oriented companies that particularly address a younger demographic must engage their customers as part of both the marketing process and the development of new products. For example, games and film companies that engage their viewers in plot and scene development do much better than those that keep everything under wraps until the game or film is ready.
  • Enterprises hiring a new generation of knowledge workers who grew up on the internet must provide tools as empowering as those available from Web 2.0. Turning these tools off, or forcing these workers to use tools that do not meet their expectations of usability and engagement, drives them to seek employment elsewhere.
  • Financial Services firms are leading the shift in usage of these technologies. Financial Services have always been innovators in developing new technologies and investing in providing better service for their clients. Speed in innovation in these services becomes a major competitive advantage where churn of clients can be very high in turbulent times. Internally, competition for talent is intense and providing better support is important for attracting and retaining employees. In particular, young and ambitious brokers and managers are more likely to be sociable themselves and seek out Social Computing inside and outside the enterprise.
  • Government and Non-Profit organizations that provide services and citizen feedback online find increasing their IT budgets much easier than those that are merely arbitrated by a front-line service. It is now inconceivable for an American politician to run for office without an extended internet presence such as Facebook or YouTube.
  • Enterprises that have faster cycles of product innovation, especially high tech, are looking to their customers and partners to participate in the development of new products and services. In previous generations, the field acted as a filtering mechanism of new customer requirements and ideas. However, today technology can provide a frictionless way of getting the entire enterprise to exchange ideas and improvements with the customer communities.

Integrating Social Computing

Because Social Computing is unlikely to come from a single source, especially because of the diversity of sources of knowledge and social networks available on Web 2.0, it is extremely important for the enterprise infrastructure for Social Computing to be integrated with those sources. This means bringing these sources into the enterprise and bringing the enterprise sources out to Web 2.0. No matter where the people collaborating are, the tools they want should be available. To facilitate this, the Social Computing platform should be:

  • Open Source - By being developed through social computing paradigms and sharing best of breed components with the open source community, open source systems have evolved rapidly and encompass social computing capabilities developed by that community. Social tools such as MediaWiki, the wiki that powers Wikipedia, and WordPress, the most popular blogging software, were developed using open source.
  • Integrating the Inside Out - By providing content as a service and enabling light-weight, Web-Oriented scripting development, the Social Computing platform should quickly integrate content services into external channels and web sites, such as Facebook and iGoogle, to allow enterprises to engage customers, partners and home workers.
  • Integrating the Outside In - If the Social Computing platform is modular and supports a Web 2.0-style mashup-oriented architecture, it enables users and teams to integrate external open source tools and social networking web services, such as Facebook, LinkedIn or other Open Social-enabled properties, to tap into the wisdom of crowds available on the internet and to make customers and partners part of team collaboration.
  • REST-style Architecture - A Web 2.0-style or REST-style of architecture using easy, light-weight scripting languages and integrated through internet standards-based APIs can easily mash-up content services into any web-oriented application or web site. These architectures should be scalable, fault-tolerant and high performance to meet any enterprise or internet requirement.
  • Choice - The Social Computing platform should be based upon open interfaces developed by the open source community to provide choice of operating system, database, application server, content authoring tools or APIs.

Over the past year and continuing into the coming year, Alfresco is dedicated to expanding its architecture and applications to enable this vision of Social Computing. We will work with partners and the open source community to provide best of breed open source tools for enabling this architecture. We will integrate with external Social Computing properties such as Facebook and the Open Social alliance to expand the breadth of social networks and the ability to collaborate through those networks. We will be expanding the Alfresco system’s understanding of users as people and facilitating the sharing of information and content through their networks. We will be open in the process and seek and encourage your feedback and participation.

What's Happening at EMC World?

This week is EMC World in Orlando, which encompasses what used to be Documentum Momentum. Unfortunately, I don’t think I’m invited anymore. :-( However, it is definitely worth tracking what’s going on and I am trying to find out what is going on there by reading the news and blogs. I am also trying not to be so hard on Documentum and have acknowledged their leadership and innovation in integrating the ECM stack. At the conference, Balaji said:

"The future of content management is no longer just about securing, accessing and storing content," said Yelamanchili. "It is about driving new levels of collaboration and knowledge management through deriving more intelligence and context with information. It is also about managing all types of business processes and allowing connections to be made within and across different business environments. As a leader in ECM, EMC is in a great position to drive innovation in these key areas."

As far as I can tell, the big news on the Documentum side in terms of innovation seems to be TaskSpace. Based upon the description, it seems to be more focused on scanning and capture and “transactional content”. There is a data sheet on EMC’s Taiwan web site. What is surprising is that there are no screen shots, so it is really hard to tell what is different about this product versus the collection of technologies that they already had. There was this quote from Whitney:

“TaskSpace is designed for high-volume, content-rich task processing, providing sophisticated capture and business process management (BPM) capabilities. A user experience built specifically for managing transactional based content and processes, TaskSpace enables organizations to streamline transactional business processes and improve end user productivity.”

In addition, there is some talk about Web 2.0 streams that will allow developers to mash Documentum content with internal and external web content. This is great news. Again, it would help if there were pictures or more details.

I also noticed that Craig Randall presented the Documentum Foundation Services, which I would suspect contains their long awaited web services interfaces.

It’s a shame that there aren’t more people blogging about EMC World. Given that there are somewhere between 6,000 and 8,000 people there, depending on who is writing, I wish there were more people who blogged. A search on Technorati yields about 136 posts, but I can’t find a lot about Documentum in those posts. Let me know if I have missed anything.

Take AIIM on Security

I am off to the AIIM (Association for Information and Image Management) Conference in Boston right now. This is perhaps the largest Enterprise Content Management conference there is. I will be speaking tomorrow on Web 2.0 and ECM with Wilson D’Souza of MIT. Looking through the agenda of the conference, I was looking for sessions on security.

James McGovern has constantly asked me and other ECM vendors to solve security issues for ECM using the standards that have already been developed for web services, such as XACML. Looking at the agenda at AIIM, it doesn’t look like the vendors are taking it quite as seriously as James. James is an enterprise architect and his role is to look at stuff like this.

Trying to address security in some of the standards groups such as AIIM’s iECM initiative and JSR-283, the successor to JSR-170, has been politically tricky. It is difficult to figure out what a common view of security is given all of the different models of security such as Access Control List, Role-based Security and Policy-based Security used in Records Management, let alone all the different vendors’ implementations of each. However, looking at this problem going forward, without addressing and standardizing security, we are creating huge barriers to interoperability and not meeting the requirements of new models of interaction on the internet.

In looking at how new Web 2.0 companies are starting to mash up and integrate different services, it is hard to see how we can extend these capabilities into more secure and sensitive services such as eCommerce, or bring these services into the enterprise, without a common notion of identity, role, entitlements or membership. As vendors, we either address these issues or, like so many times before, they will be addressed for us by others on the internet and we will be forced to catch up.

I have been doing some background thinking on this and here are some important points that I think ECM vendors need to consider:

  • Common identity. There needs to be a common way of addressing identity between different services whether those services are in the enterprise or outside. As we start to bring customers and partners into the process of serving themselves or helping us design new products and services, we can’t just rely on internal directory services. OpenID is the only standard that I am aware of that provides a neutral way of identifying users and is not dependent on any single vendor.
  • Common Models for Rights Management. The big, looming problem in content is the fact that huge numbers of users are adding, accessing or updating an even larger number of pieces of content. This calls for a model that controls content through definition of context such as time, location, metadata or role. XACML could very well fit this model. However, users need to understand this model as they set up the controls on the content.
  • Distributed Directory Services. Identity is not sufficient for determining roles or entitlements. There needs to be a more open way of integrating multiple directories without revealing sensitive information. This is the same problem we are trying to solve for content and directories need the same mechanism to define access.
  • Mashup Frameworks for Security. Mashups, the integration of different systems at the browser level, represent the fastest-growing and easiest mechanism to weld systems together. Almost all mashups have no notion of security and only work on public systems. In addition, they introduce security holes that require code running on other systems since JavaScript security features in browsers prevent parts in a web page from executing between two domains. Google has introduced a mechanism for Gdata and other mashable components to execute code. This type of mechanism needs to be standardized and a model of trusted sources needs to be integrated.
  • Search and Security.  As search becomes increasingly federated, such as through the OpenSearch API, managing identity and entitlements on content becomes very problematic. The search sources should filter out any content to which the user doesn’t have access. However, that requires some cooperation with the software that is doing the aggregating and the content sources. ECM systems will probably control the most sensitive information, but this will need to be aggregated with public sources as well to create effective search applications for the enterprise.
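
To make the last point a little more concrete, here is a minimal sketch in Python of what "the search sources should filter out any content to which the user doesn’t have access" could look like. Everything in it is an assumption for illustration: the User and Document classes, the role-based can_read rule and the federated_search function are hypothetical names, not the API of Alfresco, OpenSearch or any other product, and a real system would delegate the decision to a proper policy engine (XACML-style or otherwise) rather than a one-line role check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class User:
    identity: str                      # e.g. an OpenID-style identifier (made-up values below)
    roles: frozenset = frozenset()


@dataclass(frozen=True)
class Document:
    title: str
    source: str
    allowed_roles: frozenset = frozenset()   # assumption: empty set means public


def can_read(user: User, doc: Document) -> bool:
    """Public documents are readable by anyone; otherwise the user needs a matching role."""
    return not doc.allowed_roles or bool(user.roles & doc.allowed_roles)


def federated_search(
    user: User,
    query: str,
    sources: Iterable[Callable[[str], Iterable[Document]]],
) -> List[Document]:
    """Query every source in turn and keep only the hits this user is entitled to see."""
    hits: List[Document] = []
    for search in sources:
        hits.extend(doc for doc in search(query) if can_read(user, doc))
    return hits


def make_source(docs: List[Document]) -> Callable[[str], List[Document]]:
    """Build a toy in-memory source that matches the query against document titles."""
    def search(query: str) -> List[Document]:
        q = query.lower()
        return [d for d in docs if q in d.title.lower()]
    return search


# Hypothetical data: one ECM repository with a restricted document, one public source.
ecm = make_source([
    Document("Merger plan", "ecm", frozenset({"finance"})),
    Document("Holiday plan", "ecm"),
])
public_web = make_source([Document("Plan a mashup", "web")])

alice = User("https://alice.example.org/", frozenset({"finance"}))
guest = User("https://guest.example.org/")

alice_titles = [d.title for d in federated_search(alice, "plan", [ecm, public_web])]
guest_titles = [d.title for d in federated_search(guest, "plan", [ecm, public_web])]
print(alice_titles)
print(guest_titles)
```

The point of the sketch is the cooperation problem: the aggregator can only filter correctly if every source exposes who is allowed to see each item and if all of them agree on what the user's identity and roles mean. That is exactly what a common identity and a common rights model would have to supply.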

Help the process by asking your vendor how they expect to address these types of security concerns. If you are at AIIM, bring the issue up in relevant sessions. I don’t have all the answers nor does any vendor. People in the middle of this problem like James can help by bringing up their use cases. If we start asking the questions, then perhaps we can collaboratively answer the questions and solve this problem. If you think standardizing this is hard, try imagining building next generation systems without standardizing these security needs.

Add Salesforce.com to the Enterprise Content Management List

Today, Salesforce.com announced that they will be getting into the ECM business by acquiring Koral, one of our neighbors here in Maidenhead. Some of the guys there are our friends and we wish them luck in this new turn in the ECM market. They scored a real coup showing up at Demo last year and it obviously got the attention of Salesforce. The system has not been around long, but they have added some interesting Web 2.0 twists. It is focused on document management and was born out of the efforts of BuildOnLine, a specialist online content management provider for the construction industry that has recently merged to create CTSpace.

This is a significant shift for both Salesforce.com and for the ECM market. As we all know, although most of the Fortune 1000 have ECM, penetration into those accounts can generally be figured in single-digit percentages and is practically non-existent in smaller organizations. Increasingly, ECM will be delivered either as a software or physical appliance or as software as a service in a utility-like form. Smaller enterprises or organizations that have generic content management requirements will find this service useful. This will allow Salesforce to leverage its brand and start with the sales organizations. It also gives Salesforce a means to expand its business beyond the sales and marketing organization. It also further validates the Software as a Service model for simple utility functions.

Although everyone is looking for simplicity, not everyone will be looking for a utility-like approach to ECM. It is up to the ECM vendors to simplify the installation and set up process to make it as easy as flipping a switch, while keeping the content inside. Organizations that are not comfortable putting their documents outside the firewall, such as financial services and government organizations, are more likely to look for an internal system. Also, as the BuildOnLine guys found out, once you move to the area of specialist content applications, the sale gets much harder and configuring systems becomes even worse. Records management, engineering applications and specialist publishing applications fall into this category. It will be entirely up to each organization to decide which makes sense for them.

Software as a Service is a model that Salesforce.com didn't invent, but the company has become its biggest proponent and greatest success story. This acquisition will raise the profile of SaaS as a model for ECM. Salesforce will not be alone in delivering content management services as others are developing their solutions with systems like Documentum and Alfresco. Likewise, Microsoft is taking Sharepoint on-line with the Office Live offering. Other companies are now looking at providing a SaaS model for web content management and collaborative content management as well. No doubt we will be seeing feature by feature comparisons between these various solutions soon. Depending on the breadth of functionality, integration with internal systems and scope outside of sales and marketing content, we will see how Salesforce does.

How to Discount Your ECM System without Really Trying

If you are a customer of Documentum, FileNet, IBM, OpenText, Vignette or Interwoven, there is a really easy way to get deep discounts in your next license negotiations. Just follow these 3 easy steps:

  1. Download Alfresco here
  2. Install it in less than 5 minutes
  3. Walk through the Alfresco Tutorial here

Even easier, try a hosted trial here.

That’s it. If you say you are using Alfresco and that it is so much easier and cheaper, these ECM guys are offering massive discounts. The steps above will give you all the information that you need to have the upper hand in negotiations. Alfresco is even cheaper than their annual maintenance, because it is based upon CPUs, not users, so you can use it in negotiations for annual maintenance.

If the discount isn't massive, it is because they don't think you are strategic. It really shows what they think of you. In the strategic accounts, they are definitely discounting just to keep the business. If you still want to do business with them, it could be worth some free extra training or some free consulting time.

The secret to negotiation in enterprise software is to be confident and know that you can get a better deal elsewhere, even if you have no intention of doing so. They are often willing to match our support costs, but you can’t beat free. That's all it costs to try Alfresco out. The sales guy may appear to be confident in his negotiation, but he’ll be thinking, “Oh no! Not again.” You’ve got him where you want him.

Oh and by the way, try this at the end of the quarter. Right about now would be perfect for Q1 targets when things are pretty tough for them anyway. Having flushed their pipeline to get their annual commissions in December, they are much more flexible right now.

“The Departed” - Dave DeWalt and Documentum

Being a co-founder of Documentum prior to starting Alfresco, it is only natural that people are asking me what I think of Dave DeWalt’s departure from EMC to become CEO at Security Software firm McAfee. EMC is also one of the largest players in enterprise content management after acquiring Documentum and so we should have an opinion.

I actually owe quite a bit to Dave DeWalt. I worked with and for Dave while at Documentum. He and I are actually very different. He is a tall, aggressive Olympic wrestler who runs triathlons. And me - well, I saw a triathlon or something like that on television once. Documentum’s turn-around after the recession in 2001 has a lot to do with Dave’s determination and competitive nature. Dave is also very smart and very ambitious.

I made a bet with someone in EMC that Dave would be CEO of EMC within two years. Dave’s progression from his initial entry in Documentum to his current position as the CEO of McAfee has been remarkable. Dave began at Documentum to head up the web content management group when Documentum had no solution. The early success transformed the Documentum brand and set up the all-time high market value for Documentum. Jeff Miller, the CEO of Documentum at the time, would frequently say that Dave would make a great CEO one day. Dave took on the role of running product operations, then became COO and then CEO in relatively quick succession. Not bad for something like a three-year run.

It was around the time that Dave was COO that I left Documentum. Six years of travelling to the US once a month to be at the center of power in Pleasanton got to be too much. That was the price that I paid for choosing to live in the UK and still have any influence. I wouldn’t have left Documentum when it was low, but Documentum was on a high and it had been 11 years since starting it with Howard Shao. It wasn’t long after that that the stock market crashed and Documentum stock with it.

Dave, taking the reins as CEO, did a great job in recovering the stock price and for that I am extremely grateful. Dave’s management style was seemingly to do everything and he was at the center of most major projects. You can’t fault the job he did though. Documentum stock went from strength to strength as the rest of the enterprise software world was collapsing. As he moved up, he still held responsibilities for the core product and sales. Howard Shao was behind the acquisitions with good execution by Rob Tarkoff. But Dave glued the pieces together, building a loyal cadre of people to run the business.

I had expected Documentum to be acquired by Oracle in the 2002 to 2003 timeframe. Oracle PowerPoint slides released as part of a DOJ anti-trust investigation indicated that Oracle was thinking the same thing. However, it was EMC that made the winning offer. What I had heard was that Dave, who used to work for Oracle, was not interested in working for Larry Ellison. (Could you blame him?) The price that EMC paid was one that neither Oracle nor IBM was willing to pay. I didn’t hold on to my new EMC stock because I knew nothing about the hardware business.

Dave, coming from the aggressive “only one of you will come out of here alive” culture of Oracle, did very well in EMC, especially since EMC essentially left Documentum alone. The senior managers on the hardware side lacked Dave’s charisma, enthusiasm and customer engagement style. Dave moved from CEO of the Documentum business to co-head of EMC software to President of the software group. EMC was proud to declare that it had one of the top 10 software businesses. Dave taking over the Legato group was a real coup, no really, a *real* coup. The acquisition of Captiva has been particularly lucrative for EMC. As I have said many times before, EMC was busy consolidating the Enterprise Content Management industry through acquisition.

Joe Tucci probably won’t be around forever. I really believed that Dave’s career would continue its trajectory to become CEO of EMC. Taking over World-wide Sales was an indication that it was in the cards. What I have heard was that concern about Dave’s lack of experience in hardware, together with a lack of discipline at various company events, led to him not being taken seriously as a candidate to run a $6 billion per year business. You can see Dave thrive being the boss, and not getting the top job would lead him to look elsewhere.

Anything could have happened to cause Dave to leave. However, there are indications that EMC did not expect it. As of today, March 9th, Dave DeWalt was still on the EMC Executive Team web page. In addition, having two people in charge of the software division, Mike DeCesare and Balaji Yelamanchili, is consistent with Tucci not making a decision on who is in charge or just letting things sort themselves out. However, having both in charge of an important division that must be close to a $1 billion business is a tacit admission on the part of EMC that neither is up to the job. There are a couple of events that Dave should have been at and that EMC just had to cancel, even though it really shouldn’t have. It could be that Dave was asked to look elsewhere or that he was just plucked from the air by McAfee. Knowing Dave, it’s likely that he got frustrated at the fact that he wasn’t going to get the top job and just went and looked elsewhere.

Dave commands a great deal of loyalty because he respects and rewards loyalty. He was more than a hands-on manager as he took on responsibility to sell and drive product strategy himself. Of course, there is the Dave that I knew pre-2001 and people have said that he has grown, but Dave’s personality and drive will only change so much.

I would expect some significant hiccups without Dave at the helm. Ironically, sales for EMC could go more smoothly, although the shift from an extroverted, aggressive, technologically savvy executive to Bill Teuber, EMC’s CFO until last August, is a very stark contrast. The software division will have a big hole in it. The looming threat of SharePoint 2007 is a big one for Documentum. The merged IBM and FileNet will be coming out of their massive integration sometime this year. I won’t even mention open source.

DeWalt and Teuber must have very different styles.

Without the bond of personal loyalty that Dave built up, I really wonder what will happen to that layer of next-generation executives that came on board after I left. Can we expect to see heads move from EMC to McAfee? Did EMC give him a golden goodbye to not take anybody? Now that Howard Shao has retired and Dave DeWalt has left, who will drive the vision of what Documentum and the combined entity that is EMC software will be? Technical decisions will be ably handled by new CTO Razmik Abnous, one of the original engineers of Documentum, but what about the business side?

This will be an interesting turn in the further consolidation and commoditization of the ECM market.

Ian Howells: With Open Source you can't be half-pregnant

My colleague, Ian Howells, Chief Marketing Officer at Alfresco, has posted a blog on how Geoffrey Moore's marketing models describe what is happening with the commoditization of the ECM space and how the network effects of open source are accelerating the pace of commoditization of enterprise software. He quotes Geoff:

"Enabling technologies commoditize extremely well, allowing them to proliferate into markets far afield from the original starting points and generate a high degree of network effects. These in turn put pressure on the overall marketplace to standardize exclusively on a single set of components driving market shares to extraordinary levels …”

Ian cites Alfresco’s move to the GPL as embracing this network effect and argues that companies should embrace open source wholeheartedly. Ian says:

"You can’t be 'half-pregnant' and similarly you can’t be half open source and half proprietary. It is open source, not hybrid models that will drive true disruption, commoditization and benefit most from the network effect. The GNU General Public License (GPL) is the ideal license to drive forward this industry disruption and accelerate the network effect. That is what drove us to move to GPL."

Alfresco Releases 2.0 with WCM and OpenSearch under GPL

Alfresco is pleased to announce the availability of the Community Version 2.0 of its Open Source Enterprise Content Management System and JSR-170-based repository. This version is now available under the GPL license with a FLOSS exception which allows any open source software using an OSI-approved license to embed the Alfresco repository or applications without change of license or attribution. You can download here.

Version 2.0 has combined the core Alfresco repository with the new web content management system and records management system. The web content management system was designed and implemented by the original engineers from Interwoven who joined Alfresco at the beginning of 2006. The WCM system provides:

  • Simple import of existing web sites
  • Simple Xforms-based entry of XML data based upon the open source Chiba project with AJAX extensions
  • Templating of XML and HTML based upon XSLT and Freemarker
  • Virtual sandboxes for staging of web sites without copying files
  • Timeline snapshot of web sites with zero effort
  • Standard web production workflows extending the JBoss jBPM engine with group and queue support

Our new web site is now a production user of the Alfresco Web Content Management System. It certainly makes it easier for ordinary guys like John Powell and me to add new content to the web site.

Version 2.0 has added new federated search by implementing the OpenSearch protocol, which I blogged about recently and is supported by hundreds of search engines. OpenSearch is a naturally federable search protocol combining searches from multiple search engines. Alfresco acts as both an OpenSearch server participating in a federated search from a web browser or portlet and an OpenSearch client as the Alfresco web client supports searching against multiple Alfresco repositories as well as multiple internet search engines.
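
As a rough illustration of why OpenSearch federates so easily: each engine publishes a URL template, and a client only has to substitute the search terms into it. The sketch below fills such a template in plain Java. The class name, the example host and the handling of optional parameters are assumptions made for illustration; this is not Alfresco's client code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Alfresco: fills an OpenSearch-style URL template.
final class OpenSearchTemplateSketch {

    // Matches {name} (required) and {name?} (optional) placeholders.
    private static final Pattern PARAM = Pattern.compile("\\{([A-Za-z:]+)(\\?)?\\}");

    /** Replaces each placeholder with its URL-encoded value; optional ones default to empty. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PARAM.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            boolean optional = m.group(2) != null;
            String value = values.get(name);
            if (value == null) {
                if (!optional) {
                    throw new IllegalArgumentException("Missing required parameter: " + name);
                }
                value = "";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(encode(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

A federated client would call `fill` once per registered engine with the same `searchTerms` value and merge the feeds that come back.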

In addition, Alfresco Version 2.0 has further simplified the extension of the core repository without updating the Alfresco server using the new Alfresco Modular Packaging (AMP) packs. Alfresco Records Management is now provided as an optional AMP pack download. AMP packs extend existing repositories and can include folder structures, content templates, JavaScript and templating scripts, web site structures or Spring-based Java plug-ins in a standard zip file.

Alfresco has also added new AJAX controls for browsing and navigating the repository to the web client. The Alfresco repository has added new relationships and metadata to support the translation process and multiple language variants of the same content. This capability was developed for use by multi-national corporations and government agencies.

A lot of people have put a lot of work into this release. The WCM team with Kevin, Britt, Jon and Ariel have been working almost a year on the web content components of Alfresco. Derek and Andy have helped to integrate some very sophisticated virtualization capability that WCM supports into the repository. Kev and Gav have been working their magic on the web client to support not just WCM, but also the new OpenSearch interface that allows the client to search multiple repositories, wikis, blogs and the web. Will has been a great guinea pig in building our new web site. Roy has now made it much easier to plug new components into the repository with the AMP packs and it will get even easier as we introduce the AMP Exchange with an iTunes like download capability. He also did a great job in packaging the records management into an AMP as a test base and to make RM optional. And Paul has worked very late nights to make sure it all works together. Congratulations guys on a great release.

We have come a long way in two years. Each of the five releases in the last year has been worthy of a dot-zero designation, but we felt now was the time when all the pieces of ECM have come together in a single package. Wait till you see 2.1!

Developing a Content Application in Alfresco and JSR-170

By John Newton, David Caruana, and Paul Holmes-Higgin

(Warning: The content in this blog is technical and Java in nature. Proceed at your own risk.)

Alfresco is a complete Enterprise Content Management System and 100% open source. It is a comprehensive content management development platform and scalable repository supporting JSR-170, so is suitable for building complex enterprise-scale content applications. Access to the repository is provided through APIs, such as JSR-170 and web services, as well as through a virtual file system interface implementing the CIFS (Common Internet File System) protocol to emulate a Microsoft shared file system, FTP and WebDAV. The Alfresco system is built upon Spring taking full advantage of Spring’s dependency injection model to extend repository functionality, as well as incorporating Hibernate for persistence, Lucene for querying and indexing, jBPM for business process management, and the Mozilla Rhino JavaScript engine.

The Alfresco system has a web-based application that provides document management, web content management and records management capabilities. The web application, based upon the MyFaces implementation of JSF, is extensible, programmable and scriptable. Through Spring configuration it is possible to add new dialogs, views and wizards. The web application also provides dashboards to track repository and workflow activity. These dashboards are programmable either through Java or high-level templating languages, such as the open source FreeMarker templating engine.

As an example of building an enterprise content application, we use a scenario of building an email archiving application. For this, we use the JSR-170 interface to add content; add a new action to the repository specifically for email; use JavaScript for processing the email; then create an RSS feed using a Freemarker template, and use the templating engine to create a web view of email activity.

Defining a Metadata Aspect in Alfresco

For the purposes of our application, we will add a new metadata aspect to tag incoming emails. A metadata aspect is similar to a type, except that it can be added after the content has been created and more than one aspect can be added to the content object. It is similar to a JSR-170 mixin, except that in Alfresco it can also have behavior attached to it. We will use a predefined aspect in this example, the “cm:emailed” aspect, which includes the following metadata:

  • Originator
  • Addressee
  • Subject Line
  • Sent Date

Alfresco models are defined in XML and can be loaded dynamically. In this example, we create a new aspect called “tagged”, which is defined as follows:

<aspect name="cm:tagged">
   <title>Tagged</title>
   <properties>
       <property name="cm:tag">
           <type>d:text</type>
           <multiple>false</multiple>
       </property>
   </properties>
</aspect>

In order to view this in the Alfresco web client, we can extend the properties sheet with the following configuration:

<config evaluator="aspect-name" condition="cm:tagged">
   <property-sheet>
      <show-property name="cm:tag" />
   </property-sheet>
</config>

There are metadata aspects available as well for Dublin Core, DOD 5015.2 records management, basic Microsoft Office metadata, automatic counters, workflow process data, classification, auditing and localization among others.
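
To make the difference between a type and an aspect concrete, here is a minimal plain-Java sketch. The `AspectedNodeSketch` class is purely hypothetical and is not the Alfresco or JCR API; it only models the two points made above: the type is fixed at creation, while any number of aspects can be attached later and bring their properties with them.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a repository node; not an Alfresco class.
final class AspectedNodeSketch {
    private final String type;                                   // fixed when the node is created
    private final Set<String> aspects = new LinkedHashSet<>();   // can grow afterwards
    private final Map<String, Object> properties = new HashMap<>();

    AspectedNodeSketch(String type) {
        this.type = type;
    }

    String getType() {
        return type;
    }

    /** Attaches an aspect to an existing node; adding the same aspect twice has no effect. */
    void addAspect(String aspectName) {
        aspects.add(aspectName);
    }

    boolean hasAspect(String aspectName) {
        return aspects.contains(aspectName);
    }

    /** Sets a property that belongs to an aspect, which must already be attached. */
    void setProperty(String aspectName, String propertyName, Object value) {
        if (!aspects.contains(aspectName)) {
            throw new IllegalStateException("Aspect not applied: " + aspectName);
        }
        properties.put(propertyName, value);
    }

    Object getProperty(String propertyName) {
        return properties.get(propertyName);
    }
}
```

In this model an email node would be created with type `cm:content`, then given both `cm:emailed` and `cm:tagged`, after which `cm:tag` can be set on it.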

Content Storage through JSR-170

Alfresco is a JSR-170 compliant repository and therefore supports the JCR API. For the purposes of this example, we assume that there is an email listener that stores the email as a JCR node using the JSR-170 interface. The node is placed in an “email drop zone”, a well-known path for processing the content based upon rules in the repository. The drop zone is identified as a node in its own right and the email is attached as a child of that node. The actual binary content of the email is then added to this child node.

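import java.io.InputStream;

import javax.jcr.Node;
import javax.jcr.RepositoryException;

import org.alfresco.util.GUID;

public class EmailListener
{
    ...

    // "session" is a JCR Session assumed to be set up in the elided code
    private Node importEmail(InputStream msg) throws RepositoryException
    {
       // locate email dropzone folder
       Node rootNode = session.getRootNode();
       Node zone =
          rootNode.getNode("app:company_home/cm:email/cm:dropzone");

       // add email to folder (which fires registered rules)
       Node email =
          zone.addNode(GUID.generate(), "cm:content");

       // store the raw message stream as the content of the new node
       email.setProperty("cm:content", msg);
       return email;
    }

    ...
}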

The purpose of placing content in a drop zone is that it allows the business logic of email filing to be specified independently of the application and more importantly by business users. This is done in Alfresco through rules and associated actions.

Defining Rules to Process Incoming Content

The Alfresco repository organizes information in a hierarchical structure similar to other enterprise content management repositories. These structures are called spaces, which are similar to folders in a file system, but also contain rules for processing content that is added, removed, moved or updated in that folder. They also have users associated with that space that may have different roles in interacting with content in the space.

The rules associated with the space determine the disposition of content in the space. Rules can be used to change the type of the content being added, add aspects of metadata, attach behavior such as locking and versioning, and transform and copy to other spaces. In the email example, we will use a rule to extract metadata from the content and use that metadata to classify and move the content to a new space determined by the metadata.

To do this, we use the Alfresco web client, navigate to the email drop zone space and specify rules for content entering the space. Through a set of wizards, we add the following actions for all new items of mimetype email or “message/rfc822”:

  • Add the email aspect - this is the email metadata mentioned previously and can be combined with other aspects such as record data or process data.
  • Add the tagged aspect - this is the aspect we defined earlier.
  • Extract metadata - this is a standard capability of Alfresco that looks inside standard file formats to extract standard information such as author, title and subject. In this example, we will extend the system to extract additional standard metadata from emails.
  • Execute the “emailtag.js” JavaScript - this is a server-side JavaScript example that we will show in a later section. JavaScript is stored in the repository and can be executed just like Alfresco internal actions.

Rules and actions can be combined and chained to create more complex logic. Rules can include tests of types of content, which aspects are applied and what metadata has been set. These rules in turn fire off the actions in sequential order or can be executed asynchronously for long running operations. Common actions performed in rules are transformation, copying, moving and metadata setting and extraction.
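
The way a rule pairs a condition with an ordered list of actions can be sketched in a few lines of plain Java. Everything in this sketch (`DropZoneRuleSketch`, `Item`, `Rule`, the `subjectline` key) is a hypothetical model for illustration and not Alfresco's rule service; the metadata extraction step is reduced to copying a raw `Subject` header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical model of a space rule; none of these types are Alfresco classes.
final class DropZoneRuleSketch {

    /** A content item with a mimetype, raw headers, aspects and properties. */
    static final class Item {
        final String mimetype;
        final Map<String, String> rawHeaders = new HashMap<>();
        final List<String> aspects = new ArrayList<>();
        final Map<String, String> properties = new HashMap<>();

        Item(String mimetype) {
            this.mimetype = mimetype;
        }
    }

    /** A rule pairs one condition with actions that fire in sequence. */
    static final class Rule {
        private final Predicate<Item> condition;
        private final List<Consumer<Item>> actions;

        Rule(Predicate<Item> condition, List<Consumer<Item>> actions) {
            this.condition = condition;
            this.actions = actions;
        }

        /** Runs every action in order if the condition holds; returns whether the rule fired. */
        boolean apply(Item item) {
            if (!condition.test(item)) {
                return false;
            }
            for (Consumer<Item> action : actions) {
                action.accept(item);
            }
            return true;
        }
    }

    /** Builds the email rule: match message/rfc822, add both aspects, then copy the subject. */
    static Rule emailRule() {
        Consumer<Item> addEmailed = item -> item.aspects.add("cm:emailed");
        Consumer<Item> addTagged = item -> item.aspects.add("cm:tagged");
        Consumer<Item> extractSubject = item ->
            item.properties.put("subjectline", item.rawHeaders.getOrDefault("Subject", ""));
        return new Rule(item -> "message/rfc822".equals(item.mimetype),
                        Arrays.asList(addEmailed, addTagged, extractSubject));
    }
}
```

Because the actions are just an ordered list, a script step such as the tagging script can be appended without touching the code that drops content into the space.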

Adding a New Behavior to the Alfresco Repository

Although the Alfresco repository already has an action to extract metadata from email, since it is a relatively new extension of the existing repository, it is worth showing how it was added. In addition, it is a good example of how Alfresco uses the dependency injection pattern of Spring to add new functionality without requiring rebuilding the repository system. In this example, the metadata extraction action has a standard Java interface defined as follows:

public interface MetadataExtractor
{
    public double getReliability(String sourceMimetype);

    public long getExtractionTime();

    public void extract(ContentReader reader, Map<QName, Serializable> destination);
}

For this example, we will add a new implementation of the MetadataExtractor interface that uses the open source POI Java library to read the proprietary Microsoft file format. We first ensure that the file actually is a Microsoft Exchange message or rfc822 and then we read the fields identified by the following hex codes:

  • 0C1F - The message originator
  • 0037 - The message subject
  • 39FE - The message addressee

The following code accesses these fields through POI and then sets the appropriate content properties on the metadata. Obviously, more complex processing or more metadata fields could be added to the code.

public class MailMetadataExtractor implements MetadataExtractor {
    private static final String PREFIX = "__substg1.0_";
    private MetadataExtracterRegistry registry;
    ...

    // PREFIX_LENGTH, extractText() and the PROP_* names are assumed to be
    // defined in the elided code
    public void extract(ContentReader reader, final Map<QName, Serializable> props)
    {
        POIFSReaderListener listener = new POIFSReaderListener()
        {
            public void processPOIFSReaderEvent(final POIFSReaderEvent event)
            {
                if (event.getName().startsWith(PREFIX))
                {
                    // the four characters after the prefix identify the field
                    String type = event.getName();
                    type = type.substring(PREFIX_LENGTH,
                              PREFIX_LENGTH + 4);

                    if (type.equals("0C1F"))
                        props.put(PROP_ORIGINATOR, extractText());
                    else if (type.equals("0037"))
                        props.put(PROP_SUBJECT, extractText());
                    else if (type.equals("39FE"))
                        props.put(PROP_ADDRESSEE, extractText());
                    ...
                }
            }
        };

        // register the listener, then stream the message through POI
        POIFSReader poi = new POIFSReader();
        poi.registerListener(listener);
        poi.read(reader.getContentInputStream());
    }
}
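
The field lookup at the heart of the listener can be tried in isolation. The following is a minimal sketch in plain Java, with no POI dependency, of mapping a stream name to one of the three fields listed above. The class and method names are hypothetical, and the sample stream names in the usage note are illustrative only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a message stream name to the field it carries.
final class MsgStreamNameSketch {
    static final String PREFIX = "__substg1.0_";

    private static final Map<String, String> FIELDS = new HashMap<>();
    static {
        FIELDS.put("0C1F", "originator");
        FIELDS.put("0037", "subject");
        FIELDS.put("39FE", "addressee");
    }

    /** Returns the field label for a stream name, or null if the stream is not one we track. */
    static String fieldFor(String streamName) {
        if (streamName == null || !streamName.startsWith(PREFIX)) {
            return null;
        }
        int start = PREFIX.length();
        if (streamName.length() < start + 4) {
            return null;   // too short to contain a four-character code
        }
        String code = streamName.substring(start, start + 4).toUpperCase(Locale.ROOT);
        return FIELDS.get(code);
    }
}
```

For instance, `fieldFor("__substg1.0_0037001E")` yields `subject`, while a name without the prefix yields `null`. Deriving the offset from `PREFIX.length()` avoids a hand-counted constant.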

To register this bean, we merely added the following Spring configuration:

<bean class="MailMetadataExtractor" init-method="register">
   <property name="registry">
      <ref bean="metadataExtractorRegistry"/>
   </property>
</bean>

Similar extensions can be added for transformations from one format to another, authentication interfaces, encryption and compression mechanisms on content transfer, and even rules and actions.
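
The registration pattern behind that Spring configuration can be sketched without Spring: the container sets the `registry` property and then calls the `init-method`, at which point the bean adds itself to the registry. The types below are simplified, hypothetical stand-ins rather than Alfresco's classes, and choosing the extractor with the highest reliability is an assumption about how a registry might use `getReliability`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the extractor and its registry.
final class ExtractorRegistrySketch {

    interface Extractor {
        double getReliability(String sourceMimetype);
    }

    static final class Registry {
        private final List<Extractor> extractors = new ArrayList<>();

        void register(Extractor extractor) {
            extractors.add(extractor);
        }

        /** Returns the extractor reporting the highest non-zero reliability, or null if none. */
        Extractor getExtractor(String mimetype) {
            Extractor best = null;
            double bestScore = 0.0;
            for (Extractor candidate : extractors) {
                double score = candidate.getReliability(mimetype);
                if (score > bestScore) {
                    best = candidate;
                    bestScore = score;
                }
            }
            return best;
        }
    }

    static final class MailExtractor implements Extractor {
        private Registry registry;

        // Plays the role of the <property name="registry"> injection.
        void setRegistry(Registry registry) {
            this.registry = registry;
        }

        // Plays the role of init-method="register".
        void register() {
            registry.register(this);
        }

        public double getReliability(String sourceMimetype) {
            return "message/rfc822".equals(sourceMimetype) ? 1.0 : 0.0;
        }
    }
}
```

Nothing in the registry names the mail extractor, which is why a new extractor can be added through configuration alone.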

Using JavaScript to Add Repository Behavior

Previously, we mentioned the “emailtag.js” JavaScript for using the metadata to classify the emails. We could implement this in Java, but for simple tasks, it is often easier and just as efficient to implement them using JavaScript. Alfresco incorporates the Mozilla Rhino JavaScript engine. It includes all of the standard functions and classes of ECMA Script, but also has the ability to work with the Alfresco content model as well as the JBoss jBPM model for workflow applications. A special data dictionary space is provided for storing and managing scripts just as one would for any other content, allowing complete versioning, locking, auditing and CIFS access.

In this example, the “emailtag.js” JavaScript is invoked through the rule associated with the email drop zone space. This script finds a tag in the subject line that has just been extracted by the previous rule action: it searches for any term delimited by square brackets and adds that to the tagged metadata aspect. All that is required is the following four lines.

  var subject = document.properties.subjectline;
  var tag = subject.substring(subject.indexOf('[') + 1, subject.indexOf(']'));

  document.properties.tag = tag;
  document.save();

The script is atomic in that either the whole action occurs or it doesn’t. The script could also set up complex classifications or relationships to other content objects. Most of the Alfresco processing that can be done in Java can also be done in JavaScript, so the choice becomes one of performance and extension rather than capabilities.
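
For comparison, the same tag extraction might look as follows in Java. This is a sketch under assumptions: the class name is hypothetical, and the handling of subjects without a bracketed term (returning an empty string) and the trimming of whitespace are choices made here, not behavior taken from the script.

```java
// Hypothetical Java counterpart of the tagging script; not part of Alfresco.
final class SubjectTagSketch {

    /** Returns the text between the first '[' and the next ']', or "" if there is none. */
    static String extract(String subject) {
        if (subject == null) {
            return "";
        }
        int open = subject.indexOf('[');
        if (open < 0) {
            return "";
        }
        int close = subject.indexOf(']', open + 1);
        if (close < 0) {
            return "";
        }
        return subject.substring(open + 1, close).trim();
    }
}
```

Searching for the closing bracket only after the opening one keeps a stray `]` earlier in the subject from producing an invalid range.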

Building an RSS Feed using FreeMarker

The Alfresco system also includes the FreeMarker templating engine. FreeMarker was chosen for its extensibility to other data models as well as its ability to handle XML. The templating language is particularly suited to production of HTML and XML. Like other templating languages such as Velocity, Perl or PHP, directives to access and manipulate data are defined in tags interwoven with the static output to be delivered. The FreeMarker language has constructs for manipulating lists, defining reusable macros, and string and variable manipulation.

Alfresco has an open templating engine interface into which FreeMarker has been incorporated. FreeMarker has access to the Alfresco data model and can query and access content. FreeMarker can iterate through a folder, walk through a parent-child tree structure, and access properties and content. This ability provides a convenient tool for constructing complex content and enables re-use of content. In addition, FreeMarker has access to the URLs and icons for content to generate query-driven links and good report writing capabilities. Although designed for generating HTML and XML, FreeMarker can be used to generate any type of content, and the content is URL-addressable from Alfresco.

For this example, we use a FreeMarker template to generate an RSS feed for specifically tagged emails that have been collected by our email listener over the last seven days. In the FreeMarker template, we set up the normal RSS headers and use references to the Alfresco model to set up the description of the feed. For brevity, we include only the heart of the RSS feed, which is a list generated by an XPath query of all content in the email space whose tag matches the tag argument. The template then pulls metadata out of the content node to populate the appropriate RSS tags.

<?xml version="1.0"?>
<rss version="2.0">
<channel>

...

<#assign weekms=1000*60*60*24*7>
<#list space.childrenByXPath
      [".//*[@cm:tag='${args.tag}']"] as child>
    <#if (dateCompare(child.properties["cm:modified"], date, weekms) == 1)
|| (dateCompare(child.properties["cm:created"], date, weekms) == 1)>
    <item>
       <title>${child.properties.name}</title>
       <link>${hostname}${child.url}</link>
       <description>
         ${"<a href='${hostname}${child.url}'>"?xml}
         ${child.properties.name}
         ${"</a>"?xml}
         <#if child.properties["cm:description"]?exists
            && child.properties["cm:description"] != "">
            ${child.properties["cm:description"]}
         </#if>
       </description>
       <pubDate>
       ${child.properties["cm:modified"]?string(datetimeformat)}
       </pubDate>
       <guid isPermaLink="false">${hostname}${child.url}</guid>
    </item>
  </#if>
</#list>
...

To invoke this RSS feed, first save the above script in the Presentation Templates space of the data dictionary. Then navigate to the email drop zone space and open the properties dialog. There is a tabbed area for RSS feeds. Apply the above script as the RSS feed and copy the URL link for the RSS feed. Add an argument of “?tag=tag_name” and add this to your RSS reader.
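
The seven-day window that the template expresses with `weekms` can be restated in Java. This is a minimal sketch assuming the intent is "created or modified within the last week"; the class and method names are hypothetical and this is not how the template engine itself evaluates `dateCompare`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical restatement of the feed's recency filter.
final class RecentItemFilterSketch {

    // Same span as the template's 1000*60*60*24*7 milliseconds.
    static final Duration WEEK = Duration.ofDays(7);

    /** True if the item was created or modified after (now minus seven days). */
    static boolean isRecent(Instant created, Instant modified, Instant now) {
        Instant cutoff = now.minus(WEEK);
        return modified.isAfter(cutoff) || created.isAfter(cutoff);
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the method, keeps the filter easy to check against fixed dates.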

Scalability and Clusterability

This application provides an example of the capabilities for storing, managing and accessing content from the Alfresco repository. This application can sit side by side with the other applications that Alfresco provides out of the box. Nothing additional is required to make this application and others scalable.

The Alfresco system can scale from small organizations to hundreds or even thousands of users on inexpensive off-the-shelf hardware. In benchmarks validated by independent parties, Alfresco using RHEL 4 and MySQL 5.1 was able to produce the following numbers on a SuperMicro 3GHz Opteron dual-core, dual-processor system with 12Gbytes of memory, of which 4Gbytes were allocated to Java, and 6 x 100G RAID-configured drives:

  • 10 Million objects total in repository
  • Bulk load 60 documents per second into 10 Million object repository
  • Up to 128 concurrent threads
  • Access via unique id in under 0.1 seconds
  • Concurrent active mix of reads and writes at 128 per second

To support even larger systems, the Alfresco system can be clustered on loosely coupled hardware to take advantage of existing hardware resources. This is possible because Alfresco is architected as a stateless system with all operations performed in the context of transactions coordinated through the underlying database. Using the distributed EHCache open source cache means that all clustered systems share a common view of the contents of the cache and their freshness. Combined with a clustered database such as MySQL 5.1, the Alfresco system can be extremely scalable.

Conclusion

We have seen an example of how the Alfresco system can be used to build an enterprise-class application such as the archival and retrieval of email and enhance that storage with rules that can extract additional metadata and act upon that data. We have seen how the Alfresco system can be used as a web conduit for monitoring and delivering content from an enterprise repository. The system itself can scale to the requirements of the enterprise using the inherent scalability of the components upon which Alfresco has been built and through the transactional clustering capability of the system.

If you would like to know more about Alfresco, please visit the developer web site at http://www.alfresco.org.

John Newton is Chief Technology Officer of Alfresco. David Caruana is the Chief Architect of Alfresco. Paul Holmes-Higgin is the Vice President of Engineering for Alfresco.
