Tony Byrne announced that he is hosting a panel on convergence between enterprise data and content management, and he poses it as a question: will structured and unstructured information management converge? My short answer is no, but that answer has a complicated reason behind it. Much of it has to do with the fact that the larger stack of enterprise software is consolidating around it. Here are some of Apoorv Durga's comments on convergence as well.
I have lived in both worlds, having worked with relational databases since 1977, being one of the founding engineers at Ingres and then co-founding Documentum with Howard Shao. While at Documentum, we explored what content was and how it was different from databases. Over the years my early bigotry in favor of a purely relational view of the world has given way to a more relaxed view of how content is structured, indexed and managed. While starting Alfresco, we had the opportunity to start from scratch but still used some of the concepts that have proven effective in capturing and delivering information to users.
The relationship between relational databases and content management is like that between nuclear physics and organic chemistry. Relational databases provide the mechanics to make data and information happen, and content management builds upon that. Relational databases provide the transaction controls to ensure data integrity, the back-up tools to make sure that information is recoverable, replication to move data from one location to another, and the query, data manipulation and relationship tools to handle much more complex structures. Content management is more like the organic chemistry of information, combining information and relating it to human beings to make it more usable and consumable. The structures, processes, and models of content are different from other classes of information management. However, just like organic chemistry, content management may combine with other classes of application just as relational databases have. We are just missing the standardization and theoretical foundations for content management that have supported relational constructs.
What makes content management different from data management is how close it is to people. To make content useful, the people who create the information need to understand how it will be used. Content needs to be compelling, original, concise and understandable. Content has context that only humans can provide and only humans can use. This means that the services around content are more about change than integrity. Integrity is important, but that’s why the database is there. There is a whole rich set of services there to deal with transformation, change process, classification, publishing, versioning, content to content relationships, links and a whole bunch of other things that databases just don’t "think" about. Search may be yet another system that has no relational database at all, but should use the concepts that have been built up by the content management system. That’s why content management systems are separate systems built upon relational databases and integrated with separate search systems.
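To make that layering concrete, here is a minimal Python sketch of the idea, not how Documentum, Alfresco or any other product actually does it. The database (SQLite, standing in for any relational database) supplies transactions and integrity, a thin repository layer adds versioning on top, and a separate in-memory index stands in for the external search system. The names `ContentRepository`, `content_version`, `check_in`, `history` and `search` are all made up for illustration.

```python
import sqlite3
from collections import defaultdict


class ContentRepository:
    """Toy repository: versioning layered on a relational database,
    with a separate (in-memory) search index alongside it."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE content_version ("
            " doc_id TEXT, version INTEGER, body TEXT,"
            " PRIMARY KEY (doc_id, version))"
        )
        # Stand-in for an external search system: word -> set of doc ids.
        self.index = defaultdict(set)

    def check_in(self, doc_id, body):
        """Store a new version; the database transaction guards integrity."""
        with self.db:  # commits on success, rolls back on error
            row = self.db.execute(
                "SELECT COALESCE(MAX(version), 0) FROM content_version"
                " WHERE doc_id = ?", (doc_id,)
            ).fetchone()
            version = row[0] + 1
            self.db.execute(
                "INSERT INTO content_version VALUES (?, ?, ?)",
                (doc_id, version, body),
            )
        # The search layer only indexes the latest version of each document.
        for doc_ids in self.index.values():
            doc_ids.discard(doc_id)
        for word in body.lower().split():
            self.index[word].add(doc_id)
        return version

    def history(self, doc_id):
        """Return every stored version of a document, oldest first."""
        rows = self.db.execute(
            "SELECT version, body FROM content_version"
            " WHERE doc_id = ? ORDER BY version", (doc_id,)
        )
        return rows.fetchall()

    def search(self, word):
        """Look up documents whose latest version contains the word."""
        return sorted(self.index.get(word.lower(), set()))
```

Checking the same document in twice keeps both versions in the database, while the search index only knows about the latest one. The change-oriented services live in the repository layer, and integrity is left to the database underneath.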
Since the inception of content management, the content management vendors have by and large continued to support the notion of a repository sitting on top of a relational database and integrated with a separate search system. Interestingly, many of the main vendors of ECM are now the database management companies - IBM, Oracle, and Microsoft. This should not be surprising since content management is now one of the fastest growing segments of database applications. Even so, these companies have chosen to layer their content management software on top of their relational database systems. The database groups are then free to focus on data management as their core competency. Databases support not just content management, but transactional systems and analytical business intelligence systems. Internal to these companies, the database groups have not really subsumed the content groups. Microsoft flirted with the idea of combining everything into one server group, but unwound that decision to have SharePoint in the Office group. IBM’s content group reported into the DB2 group, but remained independent, and it remains to be seen where it ends up after the FileNet acquisition. Oracle’s content group has wandered all over the organization since Oracle first attempted to build content systems in the late 1980s.
The non-database vendors of content management - EMC, OpenText, Interwoven, Vignette and Alfresco - still use relational databases in the management of content and layer their services above a database. Interwoven tried to not use databases to improve performance in the early days and took a very XML-based approach to managing, categorizing and controlling content, but this ended up being a losing proposition to companies worried about integrity. EMC sees a future that is independent of all these stack war issues in that people will always need storage and that content management is really about managing storage. They are essentially above (or below) the stack wars, but don’t be surprised to see them try to architect the database out of the equation. OpenText, Interwoven and Vignette look to either get acquired or get out of the way. At Alfresco, we believe that open source is the open alternative to the stack wars, which I will speak about later. The motivation of each is not the convergence of content and data, but the consolidation of the ECM stack at one level and the entire enterprise software stack at another with fewer and fewer players.
From Kathy Sierra's blog
What is happening at the macro business layer is that entire application stacks are consolidating to manage the data of record. IBM, Oracle, Microsoft and SAP are all vying to own the data and make themselves as sticky as possible. Each has a Service-Oriented Architecture to make it possible to surround that data and to integrate it with other stacks when necessary. Data in the case of content management is simply the data about the content and is not a whole lot different from customer data as far as these stacks are concerned. These stacks need the checklist of the big items that enterprise customers are buying in order to build or integrate applications. This includes relational database, content management, business intelligence, build and test environment, system administration, and all sorts of XML stuff. All of these vendors, with the exception of IBM, have gobbled up the top application layer including CRM and ERP. SAP flirted with the database layer in alliance with MySQL, but seems to have abandoned this strategy. It could be, though, that content management becomes a common stack component if SAP goes out and purchases an ECM vendor. Content has become an important part of the data being managed and these SOA stacks will just link it like any other data.
Despite the relentless consolidation of these stacks, sucking in the ECM market with it, total integration of all systems into a single stack is impossible. At best, these stacks are fighting for a bigger piece of the enterprise pie by displacing smaller players. Enterprises are trying to go from a choice of 25 different systems to 3, but not down to one. Microsoft, building SharePoint organically, can exclude databases other than SQL Server, but loses a chunk of the market in the process. Will IBM really limit FileNet to only DB2? Will Oracle lock Stellent only to its database? Well maybe, but a totally integrated stack does not solve all problems of enterprise process or control. Likewise, SOA has not delivered on the promise of interoperability, despite the billions of dollars spent by IBM, Microsoft and major enterprises. Nor does it move far outside of back-office systems and into the front-office systems and web sites where most of the value is presented to an enterprise’s customers. It does not deliver the conversation with customers that enterprises are increasingly demanding.
There is a lot happening out in the world of the Internet that is making this whole notion of data versus content irrelevant. Web 2.0 has moved the conversation away from bits and bytes and toward what matters: the content, the people, and the relationships between people and content. Web 2.0 says that people don’t care about data and structure, but about communicating with each other and building closer relationships. This notion is seeping into the enterprise software space with the class of software known as Enterprise 2.0. It is still early days, but billions of dollars of value have already been built upon the foundations of Web 2.0 and those foundations are at least 90% open source.
Open source has provided an alternative view of the vertical stacks that are being created by IBM, Oracle, Microsoft and SAP. In this view, open source is the stack and is dominated by no one vendor. Each layer of the stack can be substituted with a best-of-breed open source component. These layers have been constantly rising from the operating system to the database to the app server and now the application layers. These application layers look a lot different from the enterprise stacks though. Rather than integrating at the depths of the infrastructure in a structured SOA, they are “mashing up” near the user and making it much easier for more providers to create new services and applications that do not depend on any particular stack. In fact, the stack is irrelevant as long as it is freely available. How many people really know what is behind Amazon, Google, Yahoo or Salesforce.com? The answer is a lot of open source, but which open source doesn’t matter a bit to the end users of those systems. At Alfresco, we are one layer in that open source stack and the user is free to choose that component or any other in the open source stack.
I plan on attending this session and seeing what others think.