Enterprise Software

Adobe embeds Alfresco Repository

It's been quite a while in the making, but I am very pleased with the news today that Adobe will be embedding Alfresco technology as part of its LiveCycle Suite. A while ago, I wrote a blog about embeddable content repositories. It was clear then and more clear now that the old generation of content repositories is not really designed to be embedded as part of content-oriented applications. Yet, we all know that there is more information in content than there is in databases. Why can't applications use a set of services for managing content the way they manage data in embedded databases?

On this particular news, ComputerWorld reports Raja Hammond, Group Manager for Adobe LiveCycle, as saying, "Alfresco has a fantastic lightweight installation. It is J2EE server-based, so it is very much aligned with our architecture. We're able with this release to totally embed it. We've done extensive customization to the UIs to add additional capabilities to them. We've integrated them tightly with the various solution components within LiveCycle."

At InfoWorld, Brian Wick, Director of Product Marketing at Adobe, said, "It's much easier, much quicker for our customers to build LiveCycle apps with the content services piece built in." This should be the sentiment of any product manager whose product handles content. This is clearly the case for LiveCycle, which handles potentially huge numbers of PDFs and forms.

Over at CMS Watch, Alan Pelz-Sharpe, a long-time ECM observer, blogged on the announcement that, "It's been a while since there was a big product announcement in the ECM world, but today's announcement by Adobe that they will be embedding Alfresco into their LiveCycle Enterprise Suite will doubtless garner a few headlines. Alfresco, the UK-based open source ECM company, has certainly done a great job of marketing themselves since their launch a couple of years back, stealing some limelight from more established and much bigger vendors such as Interwoven, Vignette, and OpenText. The question we have to ask is whether this announcement is another marketing triumph, or whether it suggests something more substantial. First off is the fact that it is a real OEM (Original Equipment Manufacturer) deal, and the technology will actually get embedded into the Adobe offering, so it is more than simply a paper partnership."

It is also significant that the Alfresco platform is open source. Open source allowed Adobe and our dozens of other OEMs to try out Alfresco before even approaching us. Open source also provides a level of comfort and confidence in a platform for services like content services and content repository. It is much better than providing code in escrow. It also provides a community to ensure the long-term success of the platform.

We look forward to a fruitful and symbiotic relationship with Adobe. We believe that this is the beginning of looking at content management as a peer of database management and an essential component of any enterprise-class application. Congratulations to Adobe on all the hard work and the new release.

MySQL Acquisition and Enterprise Software

In a software industry that had little innovation and created obstacles for the next class of rising companies, open source is turning enterprise software on its head. Xen Source, Zimbra and JBoss are now part of larger companies acquiring new technologies and new distribution models by leveraging the power of open source. Now we see that MySQL has been acquired by Sun for $1 billion. Sun has been embracing open source more and more under Jonathan Schwartz's watch as CEO and this can be seen as a logical next step in that strategy.

Marten Mickos, a happy man and a really nice guy.

When we started Alfresco, we came in with the assumption that one of the only things that is working in enterprise software is open source. The past year or so has proven this prediction right. Although it wasn't really my prediction. A meeting with Marten Mickos, CEO of MySQL in 2002, helped me understand that, yes, open source really could work. Up to that point, I was of the same opinion as Bill Gates, that open source is equivalent to communism. MySQL helped me understand the power of huge numbers of people using software and the value that support can provide to fund the development of professional software. The fact that the model works means that small open source companies can thrive in an environment of behemoths consolidating stacks and actually create an environment of innovation.

David Axmark and Monty Widenius, founders of MySQL

When a category has been around long enough that customers know what they want, then open source works really well. MySQL provided a simple, cost-effective database system that meant that you didn't have to install a big, hulking Oracle, DB2 or SQL Server and, more importantly, you didn't even have to pay for it. You just pay for support. JBoss did the same thing for app servers, Xen Source for virtualization systems and Zimbra for email. Some people question whether MySQL was really innovating; after all, the set of SQL is the same as Oracle had in the early 90s. In reality, there wouldn't be a Web 2.0 or possibly even a Web 1.0 without MySQL. MySQL pioneered the model of Scale Out rather than Scale Up, allowing web properties like Facebook, Google, Yahoo, etc. to scale to levels that were unthinkable in Oracle back in the 90s. JBoss, Xen and Zimbra were doing the same to their respective industries and bigger companies were willing to pay for that.

From our perspective at Alfresco, Sun is a great company to acquire MySQL. Sun has proven their alliance and cooperation with open source. And this doesn't change our plans to become a public company. We have created public companies in the past and we intend to in the future. Our sales of support and development of our community have exceeded our expectations, and events like this make us even more determined that an IPO can be successful for the development of the Alfresco system and the Alfresco community.

Congratulations to Marten and team and good luck in the future. We are looking forward to more successful collaborations and joint deployments of Alfresco and MySQL.

Scaling Out Like Technorati

My fellow World Economic Forum Technology Pioneer, David Sifry, the founder of Technorati, was also in Dalian, China for the “Meeting of New Champions” or “Summer Davos” as the Chinese like to call it. During Davos in January, I had the great misfortune of pitching Alfresco against Technorati in a competition between tech pioneer companies. As fantastically well as Alfresco is doing, Technorati has the temerity to compete against Google in blog search and win.

I got the chance to talk to Dave during the conference and ask him some questions on the technology and architecture behind Technorati, the internet blog search site. I thought that someone who could take ordinary computer components and build a huge internet architecture could possibly teach something to people running enterprise architectures that are puny in comparison.

Technorati is a web site that tracks blogs, pictures and any user generated content and allows you to search those sites about what people are thinking, seeing and hearing. When a new or urgent situation breaks out, you can do worse than to search Technorati for immediate reaction. Every day, every hour, every second, Technorati is indexing over 100 million blogs with over 10 billion objects. Technorati’s user base is doubling every six months and quick and accurate response is critical for retaining those users.

David Sifry, Founder and Chairman of Technorati

I asked Dave about his architecture and what applicability there might be for enterprise architectures.

John Newton: In building Technorati, what were some of the issues that you had in architecting your systems?

David Sifry: I was looking at just temporal information. I had no idea how big it could get. When I looked at the architecture, instead of architecting it right, I architected it for right now. I had no big budget and I didn’t want to wait six months to build it. Also, I had no idea what the killer app would be.

I focused on data flexibility. At the time, that meant putting everything into a relational database. That was okay while the size of the indexes was less than RAM and about a million blocks of data. That was okay while there were fewer than 20 million blogs.

The next generation took advantage of data parallelism. That meant that, upon an update, we sent a signal to all the other systems. We expanded the data over several “shards” [segments of data partitioned on different databases on separate machines].

What was surprising was that we were writing as much data as we were reading. At this point Technorati was as big as some of the biggest OLTP systems. Even so, maintaining data integrity was important, because you wouldn’t want the link count [count of how many other blogs point to a particular URL] to be out of sync. This put real pressure on the system. At the same time, we realized that time was a more important dimension than URL. People didn’t want to sort or search on URL, they wanted to search on time. [i.e. what are the latest blogs on a particular subject?]

By this point, we understood the application more and more. The app [Technorati] is about real time access. You need to be able to count on finding the latest information on a subject. That’s when we built the third architecture. Scaling was well understood and we built the shards on time rather than on URLs. Instead of putting data into a DBMS, we put it into special purpose databases. It was more of a bus-based architecture. Each database could be scalable and grow as big as we needed.
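[An aside from the interview: the idea of building shards on time rather than on URL can be sketched in a few lines of Python. This is only a toy illustration and not Technorati’s implementation. The class name `TimeShardedIndex`, the one-day shard span and the in-memory lists standing in for separate databases are all assumptions made for the example.]

```python
from collections import defaultdict
from datetime import datetime, timedelta


class TimeShardedIndex:
    """Toy index partitioned by posting time (hypothetical, in-memory only)."""

    def __init__(self, shard_span=timedelta(days=1)):
        self.shard_span = shard_span
        # shard key -> list of (timestamp, url, text); each list stands in
        # for a separate database on a separate machine.
        self.shards = defaultdict(list)

    def shard_key(self, ts):
        # Number of whole shard spans elapsed since a fixed epoch.
        return (ts - datetime(1970, 1, 1)) // self.shard_span

    def add(self, ts, url, text):
        # A write touches exactly one shard, chosen by time, not by URL.
        self.shards[self.shard_key(ts)].append((ts, url, text))

    def latest(self, term, limit=10):
        """Return up to `limit` matching posts, newest first."""
        results = []
        # Visit the newest shards first and stop as soon as we have enough,
        # so old shards are rarely touched by "what is the latest?" queries.
        for key in sorted(self.shards, reverse=True):
            hits = [p for p in self.shards[key] if term.lower() in p[2].lower()]
            hits.sort(key=lambda p: p[0], reverse=True)
            results.extend(hits)
            if len(results) >= limit:
                break
        return results[:limit]


if __name__ == "__main__":
    index = TimeShardedIndex()
    start = datetime(2007, 9, 1, 12, 0)
    index.add(start, "http://a.example/1", "alfresco news")
    index.add(start + timedelta(days=2), "http://b.example/2", "more Alfresco news")
    for ts, url, _ in index.latest("alfresco"):
        print(ts.isoformat(), url)
```

[The point of the design is that a query for the latest posts only has to read the most recent shards, and new capacity is added by opening a new shard for the next time window rather than by rebalancing existing ones.]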

JN: The notion of shards - did you call it that at the time? I have been looking into shards and I was only aware of or heard of them for about the last year.

DS: Back in 2002 when we were pitching this to VCs, I at least explained the theory. I just thought through the problem carefully. Doing it this way, we could add hundreds of systems, lots of cheap CPUs, RAM and disks. It provides inherent parallelism. I can’t believe that I was the first one to think this up.

JN: How big does this architecture scale?

DS: We are loading one terabyte a day into Technorati. That’s 100 million blogs or about 10 billion objects. A lot of it is new types of tagged data. There are about a half billion videos and photos.

With all that data, you have to think about what do you throw away? We can’t really delete anything, because we are potentially losing an asset. We don’t delete anything. So we take data out of the spin cycle. [Transitory data used in preparation.] We take the long-term data and put it into low latency storage.

When data is doubling in size every six months, that means that only one quarter is a year old. We don’t need to worry about old data.
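[An aside from the interview: the "one quarter" figure follows from simple arithmetic. If the total doubles every six months, then a year ago it was a quarter of today’s size, so at most a quarter of today’s data is a year old or more. The short Python sketch below works this out; the function name is made up for the illustration and steady exponential growth is an assumption.]

```python
def fraction_older_than(age_months, doubling_months=6):
    """Share of today's data that already existed `age_months` ago.

    Assumes the total volume doubles steadily every `doubling_months`
    and that nothing is ever deleted.
    """
    return 0.5 ** (age_months / doubling_months)


if __name__ == "__main__":
    for months in (6, 12, 24):
        share = fraction_older_than(months)
        print(f"data at least {months} months old: {share:.2%}")
```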

JN: How do you deal with large number of users with very large data sets?

DS: Any off-the-shelf tool falls over. There is a lot of interesting analysis on old data, but no off-the-shelf tools can handle that much data. It’s only just now that some tools can handle it.

JN: What are those tools?

DS: One is Green Plum by a bunch of O’Reilly guys. If you use ordinary data warehouse tools, they would just scream and shout.

JN: Actually, what I was originally referring to was the fact that you are showing lots of data to users who are not used to enterprise information management tools. How do you present this information to consumer-level users? How do you deal with the user interface and visualization of all this data?

DS: Gotcha. It depends on what the user wants to get out of Technorati. If the user wants search results, then we give it to them. Sometimes they want to browse or discover information. We have spent a lot of time on visual design. Then we give them lots of bright, shiny things for them to click on. Things like metadata, video or other links.

We have used enterprise class web tools to analyze what users are doing. We look at the click stream and see what is successful or not. That helps to make the information contextual.

One of the big mistakes that we made was not doing this [buy click stream analysis tools] sooner. It was only $80K. Up to that point it was so much trial and error. I’m glad we finally did it. Now we can see how much time a user spends on a feature. We can see page views, goals per visitor.

JN: So what do you measure on Technorati?

DS: Measuring a web site is like forecasting the weather. Yesterday it was sunny and today it is cloudy. Why is it cloudy? Sometimes you have no idea. Sometimes you realize that a change in barometric pressure has a lot to do with it.

We look at the number of newbies, number of reports, session lengths and then measure them against prior periods. It’s not always consistent.

I had never built a B2C site before. I just focused on me, on what I wanted. That worked well for a while when I was the target audience. But we have to build for a broader audience.

JN: At Alfresco, we measure conversions. Are you measuring things like performance? Does that affect retention of users?

DS: Of course, but if the system is falling down, then even performance doesn’t matter. So I don’t get too stressed out about it.

JN: When we met at Davos you wanted to move Technorati to be the Internet Now! Is that still the case?

DS: Everything is shifting. I wanted it to be a site that everyone is able to use. We forgot about the core users that just wanted to find out about blogs and any real time information. In an attempt to jump the chasm, we chased after 100 million users and tried to be everything to everyone. Now we try to make blogs and user driven content available for those looking for that.

Also, performance has improved significantly. Now I notice how slow other sites are. This is a total tribute to the engineering team. Everything is easier and faster.

Pretty soon we will have a whole lot of stuff that we have been working on for a year.

JN: Can you say what it is?

DS: I don’t pre-announce.

JN: What does the Technorati brand stand for today?

DS: Good question. What’s popping up now on the internet, especially user generated content? It’s about users tagging user generated content and finding it.

JN: Who are your competitors?

DS: I probably sound like the typical entrepreneur, but nobody really seriously. Google provides blog search, but other than that nobody really. Other people are trying to identify and tag information like Digg and del.icio.us, but they aren’t really competition.

JN: What do you want Technorati to be in two years time? Five years would be ridiculous.

DS: I would like Technorati to be a profitable business that is strongly differentiated. It will be the place that you would go for mobile, RSS or push information. For all that you would come to Technorati.

Open Source and Business Pleasure vs. Business Pain

A European PR firm was pitching my company for business last week and putting out a few ideas on how to generate demand in different countries across Europe. One of the ideas that they presented was a “business pain barometer” to indicate how much pain companies might be feeling using existing enterprise systems. This didn’t exactly resonate as a value proposition for open source, but it is a tried and true campaign strategy for traditional enterprise systems. Selling pain relief has worked for the last three decades to sell enterprise software, but has it run its course?

When Collaboration is an Emergency

Tsunami, Earthquake, Hurricane, Flood - everyone’s nightmare disaster can also create the biggest challenges in collaboration and employing information technology. The same Communications of the ACM that had the 7 Habits article had a whole series of articles on Emergency Management Systems. Surprisingly, the techniques that are required to cope with the flood of information in case of disaster don’t seem all that different from those required by business today. The Indonesian Tsunami and Hurricane Katrina taught the whole planet about the need to invest in preparedness and reaction capabilities regardless of how poor or how rich we are.

One article I found particularly interesting was The Human and Computer as a Team in Emergency Management Information Systems. In fact, nothing in this article seemed to be limited to what is required in a disaster; it applies just as well to what is necessary for coping with daily business pressure and information overload. The primary process of coping with a disaster is Build the Picture, Understand the Picture, and Change the Picture in a Goal-Oriented Fashion. Sounds like good business strategy to me.

The article talks about the people involved in command, control, and analysis during emergencies. Their work is built on trust in others who are also working 14-24 hour shifts, knowing that mistakes can cost lives and that immediate action is essential. As if describing the persona in a use case, these workers:

  • Feel they are exercising control
  • Have total focus on the problem at hand and ignore all that is not relevant
  • Improvise with unconventional solutions to appraise information and formulate decisions
  • Enjoy the challenge and curiosity of the effort
  • Are highly motivated due to the critical nature of the problem

This sounds like a typical Silicon Valley start-up or anyone else in a highly competitive field where people enjoy what they are doing. The design of the emergency response system then creates challenges that are not atypical of other collaborative systems in a highly reactive environment:

  • Obtain accurate and timely perceptions of reality through communication structures that track and facilitate open exchange of information
  • Enhance focus without interruption and require minimum effort to carry out a task
  • Encourage creativity and improvisation of both the individual and the team

Emergency_system

In disaster management systems under development, the emergency manager has the following tools at hand:

  • Information prioritization - rules to prioritize situational information defined by context
  • Decision support and modeling tools - impact analysis and support for decision execution
  • Representation of a common operating picture - visualization of what is happening and where resources are open to everyone

Wouldn’t it be great to have a system like this in any business? It requires a good understanding though of the participation of the people and computers. What is each good at and what is each bad at? People are good at:

  • Perceiving patterns
  • Improvising flexible procedures
  • Exercising judgment
  • Inductive reasoning
  • Detecting small changes by sight and sound
  • Storing large amounts of information for long periods of time and recalling facts at the right time

Machines are good at:

  • Responding quickly to control signals
  • Applying great force smoothly and precisely (like landing a 747)
  • Repetitive, routine tasks such as monitoring
  • Handling highly complex operations and multi-tasking
  • Deductive reasoning and computation
  • Storing information briefly

In other articles, the authors looked at extending these systems with community participation, using open source and mashups to collect information not just from officials but from the public at large. Those that were prepared for the Tsunami were often ready because they were alerted by mobile phones. The internet can also play an important role in collecting intelligence. After all, the internet was originally designed to withstand thermo-nuclear war and breakdowns in individual communication links. Here are some examples of mashups for accessing and collecting information:

Emermap1

Emermap2

Automation and collaboration have a role in emergency management. Just as triage methods were invented in time of war and moved on to ordinary civil use, emergency systems can probably help teach us what is important in collaboration and process automation. The primary lesson that the Human and Computer as a Team article conveys is that we ignore the human role at our peril and that the computer supports people and helps build trust between people by increasing trust in the information that they are sharing.

Convergence of Content and Data Management?

Tony Byrne announced that he is hosting a panel on convergence between enterprise data and content management and poses it as a question - will structured and unstructured information management converge? My short answer is no, but that answer has a complicated reason behind it. Much of it has to do with the fact that the larger stack of enterprise software is consolidating around it. Here are some of Apoorv Durga's comments on convergence as well.

Tonybyrne
"Oh really?"

I have lived in both worlds, having worked with relational databases since 1977, being one of the founding engineers at Ingres and then co-founding Documentum with Howard Shao. While at Documentum, we explored what content was and how it was different from databases. Over the years my early bigotry in favor of a purely relational view of the world has given way toward a more relaxed view of how content is structured, indexed and managed. While starting Alfresco, we had the opportunity to start from scratch but still used some of the concepts that have proven effective in capturing and delivering information to users.

The relationship between relational databases and content management is like nuclear physics and organic chemistry. Relational database provides the mechanics to make data and information happen and content management builds upon that. Relational databases provide the transaction controls to ensure data integrity, the back-up tools to make sure that information is recoverable, replication to move data from one location to another, and the query, data manipulation and relationship tools to handle much more complex structures. Content management is more like the organic chemistry of information, combining information and relating it to human beings to make it more usable and consumable. The structures, processes, and models of content are different from other classes of information management. However, just like organic chemistry, content management may combine with other classes of application just as relational databases have. We are just missing the standardization and theoretical foundations of content management that have supported relational constructs.

Notary_cartoon

What makes content management different from data management is how close it is to people. To make content useful, the people who create the information need to understand how it will be used. Content needs to be compelling, original, concise and understandable. Content has context that only humans can provide and only humans can use. This means that the services around content are more about change than integrity. Integrity is important, but that’s why the database is there. There is a whole rich set of services there to deal with transformation, change process, classification, publishing, versioning, content to content relationships, links and a whole bunch of other things that databases just don’t "think" about. Search may be yet another system that has no relational database at all, but should use the concepts that have been built up by the content management system. That’s why content management systems are separate systems built upon relational databases and integrated with separate search systems.

Since the inception of content management, the content management vendors have by and large continued to support the notion of a repository sitting on top of a relational database and integrated with a separate search system. Interestingly, many of the main vendors of ECM are now the database management companies - IBM, Oracle, and Microsoft. This should not be surprising since content management is now one of the fastest growing segments of database applications. Even so, these companies have chosen to layer their content management software on top of their relational database systems. The database groups are then free to focus on data management as their core competency. Databases support not just content management, but transactional systems and analytical business intelligence systems. Internal to these companies, the database groups have not really subsumed the content groups. Microsoft flirted with the idea of combining everything into one server group, but unwound that decision to have Sharepoint in the Office group. IBM’s content group reported into the DB2 group, but remained independent and it remains to be seen where it ends up after the FileNet acquisition. Oracle’s content group has wandered all over the organization since Oracle first attempted to build content systems in the late 1980s.

The non-database vendors of content management - EMC, OpenText, Interwoven, Vignette and Alfresco - still use relational databases in the management of content and layer their services above a database. Interwoven tried to not use databases to improve performance in the early days and took a very XML-based approach to managing, categorizing and controlling content, but this ended up being a losing proposition to companies worried about integrity. EMC sees a future that is independent of all these stack war issues in that people will always need storage and that content management is really about managing storage. They are essentially above (or below) the stack wars, but don’t be surprised to see them try to architect the database out of the equation. OpenText, Interwoven and Vignette look to either get acquired or get out of the way. At Alfresco, we believe that open source is the open alternative to the stack wars, which I will speak about later. The motivation of each is not the convergence of content and data, but the consolidation of the ECM stack at one level and the entire enterprise software stack at another with fewer and fewer players.

Buythis

From Kathy Sierra's blog

What is happening at the macro business layer is that entire application stacks are consolidating to manage the data of record. IBM, Oracle, Microsoft and SAP are all vying to own the data and make themselves as sticky as possible. Each has a Service-Oriented Architecture to make it possible to surround that data and to integrate it with other stacks when necessary. Data in the case of content management is simply the data about the content and is not a whole lot different than customer data as far as these stacks are concerned. These stacks need the checklist of the big items that enterprise customers are buying in order to build or integrate applications. This includes relational database, content management, business intelligence, build and test environment, system administration, and all sorts of XML stuff. Most of these, with the exception of IBM, have gobbled up the top application layer including CRM and ERP. SAP flirted with the database layer in alliance with MySQL, but seems to have abandoned this strategy. It could be, though, that content management becomes a common stack component if SAP goes out and purchases an ECM vendor. Content has become an important part of the data being managed and these SOA stacks will just link it like any other data.

Despite the relentless consolidation of these stacks, sucking in the ECM market with it, total integration of all systems into a single stack is impossible. At best, these stacks are fighting for a bigger piece of the enterprise pie by displacing smaller players. Enterprises are trying to go from a choice of 25 different systems to 3, but not down to one. Microsoft, building Sharepoint organically, can exclude databases other than SQL Server, but loses a chunk of the market in the process. Will IBM really limit FileNet to only DB2? Will Oracle lock Stellent only to its database? Well maybe, but a totally integrated stack does not solve all problems of enterprise process or control. Likewise, SOA has not delivered on the promise of interoperability, despite the billions of dollars spent by IBM, Microsoft and major enterprises. Nor does it move far outside of back-office systems and into the front-office systems and web sites where most of the value is presented to an enterprise’s customers. It does not deliver the conversation with its customers that enterprises are increasingly demanding.

Userhierarchyofneeds

From Kathy Sierra's blog

There is a lot happening out in the world of the Internet that is making this whole notion of data versus content irrelevant. Web 2.0 has moved the conversation away from bits and bytes and toward what matters: the content, the people, and the relationships between people and content. Web 2.0 says that people don’t care about data and structure, but about communicating with each other and building closer relationships. This notion is seeping into the enterprise software space with the class of software known as Enterprise 2.0. It is still early days, but billions of dollars of value have already been built upon the foundations of Web 2.0 and those foundations are at least 90% open source.

Open source has provided an alternative view of the vertical stacks that are being created by IBM, Oracle, Microsoft and SAP. In this view, open source is the stack and dominated by no one vendor. Each layer of the stack can be substituted with a best of breed open source component. These layers have been constantly rising from the operating system to the database to the app server and now the application layers. These application layers look a lot different than the enterprise stacks though. Rather than integrating at the depths of the infrastructure in a structured SOA, they are “mashing up” near the user and making it much easier for more providers to create new services and applications not depending on any particular stack. In fact, the stack is irrelevant as long as it is freely available. How many people really know what is behind Amazon, Google, Yahoo or Salesforce.com? The answer is a lot of open source, but which open source doesn’t matter a bit to the end users of those systems. At Alfresco, we are one layer in that open source stack and the user is free to choose that component or any other in the open source stack.

I plan on attending this session and seeing what others think.

Open Source Business Models

Venetian

Yesterday, I was on a panel at The Server Side Java Symposium at the Venetian in Las Vegas. It’s been 7 years since I was last at the Venetian, when we held the Documentum user group meeting, Momentum, there. Having since been to the real Venice and been in a real gondola, the effect of seeing indoor canals and trompe l’oeil ceilings is even more surreal. (BTW - Don’t pay 80 euros to ride in a gondola in Venice. Spend one euro to take the traghetto. Don’t spend any money to ride a gondola in Las Vegas.)

This is the first time that I had met Joe Ottinger, the editor of The Server Side, in person. We have spoken on the phone before and he is a nice, funny, unassuming guy. Joe was the moderator of the panel that included Joaquin Ruiz from Spike Source, Brian Kim from Liferay, Bob McWhirter from JBoss.org, and Neelam Choksi from Interface 21 (Spring Framework). Geir Magnusson of Apache was also scheduled to be on there, but had other commitments. I know Joaquin and I were a bit surprised when we found out that the panel was only supposed to last 35 mins. (I got up at 5am and flew all the way to share 35 mins with 5 other people?!) Fortunately, Joe was able to start the session earlier and let the panel run on a bit longer. Most people stayed as well for the post-lunch session.

Most of the questions were about why open source and what license people should use. The reasons were consistent among us around the power of the model, the cost effectiveness of open source and the draw of the community. Joe had a question on how the community is managed and I think it was Dave who answered that it requires a certain level of control and discipline to ensure the quality of the code that is contributed. There were lots of questions around GPL, of which Joe is not a fan nor is he a fan of Richard Stallman. However, I think how people perceive GPL has changed in that you can combine non-GPL components with GPL and the FLOSS exception allows you to embed GPL in non-GPL systems. Everyone wants to know what will happen with GPLv3 and although we are tracking it, I can’t really say what will happen with the restrictions. (See Matt Asay's comments on GPLv3. Matt is our VP, Bus Dev and he knows a lot more about this than I do.) I commented on the concern that some OEMs have with LGPL in that it has not been tested in court. JBoss and RedHat have a very liberal and ISV friendly interpretation of it, but we need to have a court test to prove that it is okay to use in commercial software.

Richard Stallman

I appreciate the time that Joe gave us and the opportunity to speak about Alfresco. The Server Side has been a good source of the downloads that we have had so far. However, the whole area of Open Source Business Models is so much more than licenses. It really is a disruptive model not just for software, but for many other industry sectors. I organized a few thoughts on the flight over to Las Vegas.

Peter Fenton

When we first started to look at starting Alfresco, we interviewed and discussed the concept with some of the best thinkers in open source. One person in particular was very helpful due not just to his exposure to open source, but due to the thinking he had applied to looking at open source in the abstract as an investment thesis. Peter Fenton was an associate at Accel Partners and is now a General Partner at Benchmark Capital. Peter had come up with a set of criteria of what worked and what did not work in open source. At the time we started, there had been 10 years of experimentation in investing in open source, although I think the last couple of years have been particularly instructive. Here are Peter’s criteria:

  1. There must be a large market with millions of users. I would characterize this as having a critical mass. If you are going to try and spread this as far and wide as possible and you are only going to get a couple of percent actually converting to paying customers, then that number must be very large. In this case, horizontal markets are good and vertical markets can be very limiting.
  2. There should be organic adoption of the project. As Peter points out, most projects flame out in the first 2 years. The long, slow adoption of the organic community is a good thing. However, there are other ways to get a community. We have found a natural community in Enterprise Content Management practitioners who find Alfresco familiar and are able to deploy it much more cost-effectively than closed source options. Likewise, companies like IBM and Adobe have ready-made communities as they exploit the open source model. Perhaps Microsoft will one day realize the potentially powerful open source community that they will be able to create with a quick email from Bill.
  3. There should be demand-side economies. It’s hard to know exactly what Peter means here and it might be Stanford MBA speak for scarcity of open source alternatives. He states that there should be a natural monopoly or duopoly. As an investment, you are looking for a company to be big and therefore it needs to be not just a leader in the segment, but dominant. Open source web content management on its own is an example of a market that is too diffuse. Open Source Enterprise Content Management is not.
  4. There needs to be sufficient product complexity or drift. This complexity provides a means to add value-added service over a product that is free. Products like application servers and database management systems are inherently complex. Many of the Apache projects, such as Tomcat, are too simple for someone to make money on them. It is possible that a product can work too well.
  5. The project should address a new frontier of adoption. Innovation happens at the edge. This should not just be old stuff for free. Peter uses the example of the LAMP stack as addressing new web sites and an area that commercial vendors overlooked. This is classic Innovator’s Dilemma. However, I would state this as there being a compelling differentiation other than just being open source. The 10 times factor of performance or new functionality allows the product to sell itself with the open source model being a swift closer. The community can be the source of the innovation that adds this wow factor.
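
The arithmetic behind the first criterion can be sketched in a few lines. This is only an illustration under assumed figures: the user base, the 2% conversion rate and the function names are all hypothetical, not numbers from Peter or from Alfresco.

```python
# Hypothetical funnel arithmetic; none of these figures come from the post.
# Rates are whole percentages so everything stays in integer arithmetic.


def paying_customers(user_base: int, conversion_percent: int) -> int:
    """Users expected to convert to paying customers (rounded down)."""
    return user_base * conversion_percent // 100


def required_user_base(target_customers: int, conversion_percent: int) -> int:
    """Smallest user base that yields at least target_customers."""
    if conversion_percent <= 0:
        raise ValueError("conversion_percent must be positive")
    # Ceiling division without floating point.
    return -(-target_customers * 100 // conversion_percent)


if __name__ == "__main__":
    # One million users converting at an assumed 2%.
    print(paying_customers(1_000_000, 2))
    # Users needed to reach 1,000 paying customers at the same assumed rate.
    print(required_user_base(1_000, 2))
```

With a conversion rate of only a couple of percent, every paying customer implies dozens of non-paying users, which is why the addressable market has to be very large.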

I would add my own corollary to rule number one. Rather than just a large market, it needs to be a commodity market. With the commodity market you get the size needed for critical mass and the market has been educated on the need for that product. This, by extension, means that not all markets are ripe for open source. Lall stated during the panel that not all markets need to be commodities since open source has been a major source of innovation. I agreed that some categories of software were created by open source, but they are still commodities. An example is wikis, of which there are now so many that they have become a commodity.

Looking at these rules, there are a number of industries that fall into these categories. Back in 2005, Peter was evaluating content management, wireless, system management, internet browsers, security, business intelligence, application infrastructure, middleware, databases, build and test environments, and development tools. The only major category that was missing was games, which has 3 of the top 20 software companies. Most of these categories have had investment and new entrants funded by the major VCs, including Peter while at Benchmark. Perhaps you can consider other industries that are commoditized, where you cannot tell one product from another except through very deep analysis.

Tim O'Reilly has said that when the price of a product like open source goes to zero, then the adjacent activities in the value chain gain value. This fits very neatly into Michael Porter's concept of value chains and competitive value. How you make money from open source can be divided into a few different models. Those models are dependent less on how you make money on the software and more on other activities that are adjacent to delivery and use of that software. The most common models are:

  • Support - This is the model that Alfresco uses. This is about providing technical support, maintenance and bug fixes. In this model, the free version usually moves faster than an enterprise version where more time is taken to certify the product against stack components. Bug fixes to this enterprise version are given higher priority, but are eventually folded into the main code line.
  • Professional Services - This is providing professional services in using the software to create new solutions specifically for a customer. Custom engineering can also come under this category although the result can end up benefiting everyone as an enhancement to the product. JBoss and Interface 21 make some of their revenue from this model. At Alfresco, we give almost all the service revenue to our partners. Services is also one of the lowest margin businesses with the greatest risk in terms of building capacity.
  • Insurance - This is a warranty or indemnity on the product. The company supporting the open source product guarantees to support the customer for damages or claims that may arise from the use of the product. This is rarely provided on its own and is usually provided as a value-added service in conjunction with either support or services.
  • Enterprise Features - For those venturing into open source from a closed source background, this is the most comfortable model. Certain features are held back, particularly those related to scalability or system management. Users can try the software out, and if they are getting significant value from it, they will pay for the held-back features. We tried this model, but have decided that we could build a much larger community and hence more customers by providing the product 100% open source. Having enterprise features also encourages the community to replace your enterprise features, which can end up being counterproductive for everyone.
  • Hosting - Hosting is becoming a more popular way to consume open source software. The people who create the product can get value by providing the software as a hosted service.
  • Embedding - Commercial vendors who can use open source software can often license the software to include it in an OEM relationship. This model is often accompanied by a dual license, a license that can be distributed as only open source or as an exception to those who pay for a commercial license. Once a commercial license is purchased, then the licensee is free to use the software however they feel appropriate. This is a model that was pioneered by JBoss, but originally developed by Ghostscript, an open source PostScript interpreter.

Each of these models has different implications for how big a community it creates, the perception it gives to those trying the software and how much pressure there is on users to convert to a paid-for version of the software. Those who have been given trust have often returned it by either contributing back to the project or paying for support when deploying the product into production. The exceptions to this are Google and Yahoo, who have created multi-billion dollar businesses using open source, but have given nowhere near as much back. This is one of the reasons that there is so much controversy over GPLv3, which attempts to set boundaries on fair use in a hosted environment.

The open source model is very powerful and rapidly growing. In our first full year of revenue, we have grown 50% faster than Documentum did in its first full year of revenue. You can use this model regardless of where you are located since it is inherently global. Despite slower uptake of open source in the UK, we have been able to distribute our software to countries where open source has greater acceptance and make a big impact in the US, the largest software market. It’s also a lot of fun, because people who are buying your product are doing it because they have already tried it and like it, rather than being convinced or coerced into using it.

There is still a lot of experimentation in the above dimensions of the model, but I think we will be seeing a repeatable, cookie-cutter model. After all the fuss around GPLv3 dies down and after there is clear precedent on interpretation of LGPL, I think we will worry less about license and more about growing open source faster and seeing the disruption through.

The Future of Enterprise Content Management

The Alfresco executive team and I have been on the east coast at a user advisory panel for our American customers here in New York. (See "Is New York the Center of the Software Universe?") That is one of the reasons that I haven’t been blogging as much. This meeting was graciously hosted by one of our customers, the law firm Davis Polk & Wardwell. The meeting room, a conference room as big as our whole office, was such a contrast to our offices in little old Maidenhead.

We received a lot of great feedback, but it will take me a while to digest all the information. However, I can share the strategy that I presented to this group. In attendance were two large software companies, two major banks, one federal agency, one newspaper and two major games manufacturers. All of them are also customers of other ECM products, but have chosen Alfresco for new content applications. They also said that none of the other ECM products are thinking about what comes next or at least they aren’t communicating it to their customers - probably a combination of both.

I started by discussing what we saw in terms of major business trends. These included:

  • Greater purchase power of IT customers
  • Continued outsourcing of non-core activities and related communications problems
  • More, not less regulation
  • Acceptance of Open Source by major corporations
  • The fruition and fatigue of implementing Service Oriented Architectures
  • Further consolidation of enterprise software
  • Re-emergence of B2B initiatives
  • Major rewrites of corporate web presence based upon Web 2.0 concepts

Additionally, some people mentioned the move toward a more trust-based development of content, particularly things like wikis, but also pointed out the continuing need to demonstrate trust through workflow for regulated content. Others pointed out the trend for the desktop to just go away, with Google taking a lead in this direction.

I also gave my predictions of what is going to happen next in the ECM market as a result of enterprise software consolidation and IBM buying FileNet. Some of this is informed by Geoffrey Moore’s Stack Wars in Orchestrating the Stack, which is a great insight into the maturation of the software industry, but neglects to mention the impact of open source. We believe that open source will provide an alternative stack that will be open to substitution of best of breed parts regardless of whether those parts are open or closed.

My guess as to what will happen to the ECM market is:

  • SAP will buy an ECM vendor further filling out one of the prime stacks in Geoff’s Stack Wars
  • OpenText continues to look for a buyer. Could they hook up with SAP after being jilted by Oracle? OpenText’s iXOS acquisition makes this an attractive pairing.
  • Vignette, partnering with companies like Microsoft, is testing the waters for a possible acquisition
  • Interwoven is testing a niche play by retreating into Marketing applications, but may still opt for being acquired. EMC could do worse than to acquire Interwoven. They could also help Microsoft.
  • The remaining players (other than Alfresco) will retreat into niche areas either around verticals or technical specialization. After the current boom in web redesign, this is a sure path to the living dead.
  • Alfresco may end up being the last independent ECM vendor
  • The introduction of Microsoft Sharepoint 2007 will be the single most disruptive factor in the ECM market
  • Sharepoint 2007 has not really launched yet, but in competitive situations, Microsoft has told customers that a new version will be out (Service Pack 1?) with additional Web 2.0 features. Will this be the time that Sharepoint launches along with all the customers that Microsoft has been giving free consulting to?
  • Continued expansion of Alfresco will be the second most disruptive factor

One person asked about the missing Gorilla - Google. To what extent would we be competing against Google? Looking at the next 3 years, to 2010, I don’t see it happening, but it could in 5 years. Interestingly, at Davos Eric Schmidt made the statement that as far as Google is concerned, web sites, applications, service provision, telcos and devices are all merging together. If that is true, an always connected Google would have a powerful position.

I talked about standards as well. I have been reluctant to blog about these, because I am not sure what the confidentiality rules are around participation in the JCP (the JSR process) or in AIIM’s iECM. However, at a high level, I think that:

  • JSR-170 still only has stealth support as IBM and Oracle have active development around this standard, but don’t really say anything about it
  • JSR-283 is moving along and will have greater acceptance with the greater involvement of IBM, FileNet and Oracle. We’ll see what happens with Documentum
  • iECM is not happening, but something will happen to fill its place

With these current factors and trends, what will happen to technology? Well, I have already written my predictions for what will happen by the year 2010. But I stuck my ill-informed neck out to predict that storage will triple in capacity, network capacity will increase by 5 times and with WiMax emerging, we will see the first notions of constant connectivity. I also predicted that the typical desktop PC will have 4 to 8 cores in this time frame. The only pushback I got was from one customer who believes that my estimate of cores is low and that we will see 64 cores in this time frame.

There are substantial implications for these factors. Desktops are overkill for the applications that people will be working with in 2010. Much more knowledge worker activity will be handled on handheld devices that will be much easier to use due to improvements in user interface and much greater power and storage.

At this point we had a very interesting discussion on licensing. Our customers were concerned about what happens to our per CPU pricing model when cores go up to 64 and beyond. It is a problem that is facing every enterprise software manufacturer that charges based upon CPUs or servers. Those companies that charge per user will face a backlash against charging this way when all a user does is take a look at a single piece of content. The idea of charging for usage also came up, and an access or “click” charge appealed to some people as a way of paying for value. Someone mentioned Google’s per content model, but most seemed to agree that this was hard to measure and enforce. Expect more discussion in this area in the future.
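
To see why the core count worries customers, here is a rough comparison of the three charging models that came up. Everything in it is an assumption for illustration: the prices, the deployment sizes and the names are hypothetical and are not Alfresco's actual pricing.

```python
from dataclasses import dataclass, replace

# Hypothetical licensing comparison; all prices and sizes are made up.


@dataclass(frozen=True)
class Deployment:
    servers: int
    cores_per_server: int
    named_users: int
    content_views_per_year: int


def per_core_cost(d: Deployment, price_per_core: int) -> int:
    """Annual cost when every core on every server is licensed."""
    return d.servers * d.cores_per_server * price_per_core


def per_user_cost(d: Deployment, price_per_user: int) -> int:
    """Annual cost when every named user is licensed, however little they use."""
    return d.named_users * price_per_user


def per_view_cost(d: Deployment, price_per_thousand_views: int) -> int:
    """Annual cost for a usage ("click") charge, billed per thousand views."""
    return d.content_views_per_year * price_per_thousand_views // 1000


if __name__ == "__main__":
    today = Deployment(servers=2, cores_per_server=4,
                       named_users=500, content_views_per_year=1_000_000)
    # Same workload, but on the 64-core machines one customer predicted.
    future = replace(today, cores_per_server=64)

    for label, d in (("4 cores", today), ("64 cores", future)):
        print(label,
              per_core_cost(d, price_per_core=1000),
              per_user_cost(d, price_per_user=100),
              per_view_cost(d, price_per_thousand_views=20))
```

In this sketch only the per-core figure changes between the two deployments, growing sixteenfold for the same users and the same content, which is the concern our customers raised.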

I then discussed some of my thoughts around Web 2.0 and its relationship to right-brained thinking. The nature of right-brained thinking with its specialization of artistic sense, spatial relationships, face recognition, abstract concepts and future orientation explains a lot about the trends happening in Web 2.0. The way to think about this is that the first wave of the web was about us left-brained programmers who built it and now it is about everybody else.

I believe that there are implications as well for enterprise software as the focus moves from back office systems to front office applications and customer facing web sites. The Web 2.0 concepts of conversations and connections between people must factor into these systems, as must improvements in usability. Creativity and engagement will be just as important as the factual accuracy and completeness of information.

The subject then turned to Sharepoint. Many companies represented also have Sharepoint implementations. No one seemed especially pleased with it, but felt that its implementation was inevitable largely due to Sharepoint’s connection to Office. This is consistent with my observation that Sharepoint is an extension of the Office monopoly. I reiterated the point that I made in What the Heck is Sharepoint 2007 that Microsoft is still not clear on what Sharepoint is. The only definition is this picture:

Sharepoint_4

However, I conceded that Sharepoint is addressing a need in the enterprise that was not being met by the other ECM vendors. It is a knowledge worker stack for building knowledge worker applications as long as all the tools, platform and databases are Microsoft. Open Source, I believe, will provide an alternative with best of breed components. Our advisors suggested that we provide out of the box templates that would make it easier for end users to visualize what is possible. Our advisors also suggested that we provide integration with Office as Sharepoint does, which Paul Holmes-Higgin was able to demonstrate later. (Slick demo by the way.)

Prior to going into a detailed review of the Roadmap with Paul Holmes-Higgin and Kevin Cochrane, I talked about the strategy our team has been developing of creating a next generation ECM platform that meets the needs of Web 2.0 and exceeds platforms like Sharepoint. Characteristics of this platform include:

  • Multi-channel distribution to mobile devices as well as PCs
  • Mashup architectures that blur the line between internal and external systems
  • Incorporate best of breed components regardless of the platform upon which they were developed
  • Appliance delivery including soft appliances as virtual machines
  • Highly interactive content as visual, video-oriented and personal. We would really like to see Flex become open source and become the basis of this content.
  • People-oriented with close connections to directory services, presence and instant messaging
  • De-centralized and loosely coupled in the same style that we have integrated OpenSearch
  • Evolved by the community where it is not a single vendor driving its development and it is developed through cooperation, not competition

For Alfresco there are four main use cases that are driving our strategy. These use cases are independent of how the solution is delivered, whether it is a simple download, an additional package, a completely configured virtual machine, embedded in a device or hosted.

  1. Collaborate and Publish. We have seen in many companies a desire to set up an out of the box solution that allows users to collaborate on deliverables, help them track and manage those deliverables and then publish out the result to the web. In another time and place, one would call this knowledge management.
  2. Controlled documents. These are things like contracts and procedures that may be regulated and need to go through version control, review and approval and be audited.
  3. Intranet and Internet sites. This includes sites that are especially targeted toward marketing products.
  4. Records Management. The nice thing about US DoD 5015.2 is that it is a use case and has an entire validation test suite.

Our product strategy is to do the following:

  1. To fulfil the four quadrants of ECM functionality of document, web content, image and records management. Forrester says these four quadrants are the main areas of spending on ECM.
  2. Componentize our user interface to make it more mashable with other web sites and web applications. This is also the basis for our portal and Office integrations.
  3. Focus on Web 2.0 - People and Collaboration. We went through what our plans are for wikis and blogs as well as calendaring and integration with directory information.

Four_quadrants_2
Other things that are important for our product strategy include scalability, where we ultimately want to scale to 1 billion objects with 100 million being the next stopping point. We also plan to do more work on distribution, including replication of content, while still focusing on loosely coupled architectures. Finally, we intend to work on security, particularly in a loosely coupled, mashed-up environment where it is necessary to authenticate not just inside the enterprise, but outside and between web properties.

Overall, we got great feedback from an impressive list of customers. It’s too bad that their legal departments won’t let me mention their names. I look forward to writing up more about what they feel we could be doing better and where we should be investing our time.

Information Technology in 2010

I was asked by a major IT magazine how I thought Information Technology would be different in 2010. Bill Gates said in The Road Ahead, “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next 10.” He’s right, but this is on the cusp of being in between; it’s only 3 years away. I would actually prefer to look further into the future with my promised write-up of the Davos Connected User in 2015 session.

Here are my prognostications:

  1. Storage and the network bandwidth to store and access information are growing much faster than computing, causing an explosion in content creation. This will make content management one of the most important information technologies and new technologies will emerge to automatically find, organize, verify and visualize content.
  2. Content and content management will increasingly be delivered in two main forms - appliances and on-line services. Extremely simple, purpose-built physical appliances for household and office use will capture and organize documents, photos, music and video. Software appliances, configured as virtual machines for specific tasks, will be downloaded from the internet to generic hardware that will come in sizes Small, Medium or Large.
  3. On-line collaborative and content services will extend from Web 2.0 to the community developing sites and user experience with open source accelerating their rate of evolution. Mash-up technology will replace web services and will blur services as it blends internal and external services. Services will start to spill over into the physical world as shops and delivery become more integrated into requests from the internet.
  4. A new revolution in user interface design is just beginning as designers move from physical to soft design. Gesture control will make its way into handheld and notepad devices. User interfaces will move from 2D to 3D as gamers influence work habits and we may see the first holographic interfaces. Avatars will begin to replace dialogs as the request-response metaphor and we may see practical voice recognition and language understanding.
  5. Business computing will shift significantly from PCs to mobile devices as Blackberry-size devices capture more business activities and form factors improve. Ubiquitous internet access and informality espoused by blogs and instant messaging will lead to simpler forms of communication. Content will be consumed on something probably closer to a Playstation Portable and your very thin mobile phone.

Game Theory and Open Source

I have been planning to write up another Work In Progress on how Game Theory relates to Open Source. I have been procrastinating in writing this up, but the notion of linkage between Game Theory and Open Source is important in informing other Works In Progress.

I was inspired to get cracking by a very interesting program on BBC 2 last night called, “The Trap - What Happened to Our Dream of Freedom”. (It was so fascinating that my wife decided to wash the dishes instead.) The program discussed how Game Theory has affected the way that the public perceives public services and the duty of citizens to contribute to the common good. The background on Game Theory really got me going. After the series is over in two weeks’ time, it might be worth its own blog.

I was introduced to the notion of Game Theory at a very early age since my father studied it in the early 1960s while studying for his Master's degree in Operations Research. He would use references to it frequently as I was growing up. I took a greater interest in the last couple of years after reading a Harvard Business Review paper on the relationship between Boeing and Airbus in a potential cooperation on a Super-jumbo. That cooperation devolved into a strategy of go it alone for Airbus and a dismissal by Boeing of the whole idea to go instead with a Hub-oriented Dreamliner. A Harvard Business School case study applies Game Theory to what happens to Microsoft in an environment where all other software is free.

Open source is a disruptive force that involves millions of people acting independently to use a free piece of software. Game Theory as a model is useful for describing these types of interactions. Game Theory was invented during World War II by John von Neumann, after whom the von Neumann architecture, the same architecture upon which most computers are built, is named. It was further developed by John Nash of Beautiful Mind fame, who developed Game Theory in non-cooperative systems.

There are several types of games described by Game Theory that I intend to consider in the context of open source. My thoughts on these are still not fully formed, but they include:

  • Symmetric and Asymmetric - this includes the Prisoner’s Dilemma - You screw me, I screw you. Anyone know if there is any relationship between Innovator’s Dilemma and the Prisoner’s Dilemma?
  • Zero Sum and Non-Zero Sum - Is the market participation of open source a net positive or net negative in value creation?
  • Simultaneous and Sequential - The transparency of the open source model introduces a Sequential element to the competitive model of open vs. closed source competition.
  • Perfect vs. Imperfect Information - Ditto
  • Infinitely Long Games - I have no time for that.
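
For readers who have not met the Prisoner’s Dilemma, it can be written down as a small payoff table. The sketch below uses the usual textbook payoffs (these values are an assumption for illustration, not data about any open source market) and looks for outcomes where neither player gains by changing their move alone, which is Nash’s notion of equilibrium in a non-cooperative game.

```python
from itertools import product

# Textbook Prisoner's Dilemma; the payoff numbers are illustrative only.
COOPERATE, DEFECT = "cooperate", "defect"
STRATEGIES = (COOPERATE, DEFECT)

# PAYOFFS[(row_move, col_move)] = (row_payoff, col_payoff); higher is better.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}


def pure_nash_equilibria(payoffs, strategies):
    """Return move pairs where neither player can do better by switching alone."""
    equilibria = []
    for row, col in product(strategies, repeat=2):
        row_payoff, col_payoff = payoffs[(row, col)]
        # Row player compares alternatives while the column move stays fixed.
        row_is_best = all(payoffs[(alt, col)][0] <= row_payoff for alt in strategies)
        # Column player compares alternatives while the row move stays fixed.
        col_is_best = all(payoffs[(row, alt)][1] <= col_payoff for alt in strategies)
        if row_is_best and col_is_best:
            equilibria.append((row, col))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFFS, STRATEGIES))
```

With these payoffs the only stable outcome is for both players to defect, even though both would be better off cooperating. That tension is the "You screw me, I screw you" in the first bullet.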

I don’t expect Game Theory to be a predictive model, but I do expect it to be a descriptive model explaining why certain things work the way they do in open source. We now have a lot of data from Linux, MySQL, JBoss and the various BitTorrent clients. With this data, I am hoping to be able to understand the following:

  • Why do people download and contribute to open source?
  • What license model is best due to their various freedoms and restrictions on distribution, attribution, contribution, and forking?
  • What is the best pricing model for open source support?
  • How do you balance the rapid expansion of open source with the desire to make money in the creation of that software?
  • What happens to market conditions of the main competitors in an enterprise software market when an open source entrant disrupts the current equilibrium?
  • How does an open source ecosystem stabilize around a duopoly or single dominant player?
  • Does Game Theory provide a mechanism to protect the intellectual property of those who contribute while still maintaining free and open access?
  • How is mass collaboration in and out of the enterprise affected by the principles of Game Theory?

Doing a Google search for “Open Source” and “Game Theory” after the program last night, I was surprised that more has not been written on the subject, and by how little is easily found. Marc Fleury is one of the few who has written anything and he really only made a throwaway comment attacking Oracle over their move to support RHEL. Most other references are to open source software that supports Game Theory.

Have you found any relevant references?

Perhaps Google can make this a project in their Summer of Code, but for economists and mathematicians.
