Web2.0

How Web 2.0 will change the face of business

The following is an article that I wrote recently for a magazine... I'm sure the marketing team will shorten it. :-) Feedback welcome... unless of course you are posting a URL to cheap watches.

Having spoken to the IT architects of multinational corporations, from the largest oil and pharmaceutical companies to global banking and accounting firms, about their plans for using Web 2.0 in the enterprise, I find it clear that most, but not all, intend to use the new technologies that have changed the way we use the internet. Forrester predicts that social software, the application of Web 2.0 for the enterprise, will grow at an annual rate of 43% to $4.5B by 2013. This is quickly becoming the fastest growing sector in the enterprise software industry. However, many people, including those planning to adopt it, are confused about what Web 2.0 is and its significance in the workplace and in culture. The term has been around for about four years now and was coined by technology gurus Tim O'Reilly and John Battelle to describe the resurgence in activity, venture capital and huge audiences that surrounded newly emerging web sites. However, in that time, no concise definition of what Web 2.0 is has emerged.

Web 2.0 is explained more by example than by defining the technologies that make it up. A collection of brands provides the metaphors for what exactly is different in the way we use new web technologies: Google for search, YouTube for video, Flickr for photos, MySpace and Facebook for social networking and Wikipedia for wikis. There are a few more examples, and new sites like Twitter may expand our understanding of Web 2.0, but we are coming close to a complete list. These brands as metaphors become the nouns and verbs that describe Web 2.0 as a new way of socializing, communicating and sharing with each other in huge, consumer-scale markets. By being the first to create critical mass in the internet space, these brands have been able to define the way we will live and the way we will work. After all, it wasn't really a PC until IBM named their desktop computer a Personal Computer.

However, Web 2.0 is not so much a revolution in technology as a revolution in how people use technology and how people interact with each other as a result of that technology. The amazing technological innovations have really been happening behind the scenes, with the huge build-out of inter-networking and the creation of new scalability technologies through open source. The open source sharing of the code used to build these sites has made it possible to build and manage the sites on a modest budget and deliver incredible new content and services to absolutely anyone with great performance. This in turn has allowed a whole new class of people to use technology who wouldn't have had access to it before. Now even your granny is connected to the internet. The internet and desktop technology are no longer the domain of the geeks and nerds. Real people, average people, artistic people, old people, young people are connecting to the internet and discovering each other. The web sites, in turn, have been reacting and evolving at a very rapid rate to adapt to these new users. Sites that appealed to these new users emerged through a Darwinian natural selection, becoming a lot more facile and adaptable in the process, as well as very large and profitable.

As a result of the introduction of the internet, rapid infrastructure build-out and the new generation of Web 2.0 sites, we have seen one of the most dramatic democratizations of technology since at least the PC, if not the telephone. Through universal access, users discovered that computers could be used for far more than information; that they could be used as a medium of expression, sharing and revelation. Although the PC gave access to computing power to most office workers and many home users, it was generally within the constraints of software created by others using information created by others in business domains defined by others. The information that users created and shared tended to be very textual, columnar, organized and very factual. If you wanted to liven this information up at all, you would add in a few sappy clip art pictures to express what you were really trying to say. In short, it was an environment that was invented by geeks and nerds (like me!) and generally appealed to similar personality types.

By broadening access to information and technology and broadening the types of users accessing that information, new and more expressive types of people looked for ways to use the technology that suited their personality. Users voted with their mouse for sites that appealed to their personal sense of expression. If you were musical, visual, artistic, auditory, adventurous, sympathetic or caring, you could share art, prose, music, visuals or images with those who shared your interest, rather than with a production studio in the State of Washington. The places where these people met were generally sites that catered to introducing one friend to another - first Friendster, then MySpace and slightly more recently Facebook. Depending on your mode of expression, you might end up going to Flickr for photos, YouTube for video, or Last.fm for music and then linking it all into your personal page on your favorite social networking site. The socialization and communication that came with this expression and sharing created a truly different feeling that no spreadsheet or presentation could ever possibly provide. You were Flickring, YouTubing, Googling, Twittering on your MySpace or Facebook, sites that almost feel like real geographic locations. For many people, these activities have become compulsive and addictive in the process. This is why the average age of television watchers in the US has now risen to 50 years old, and a new generation of users will not be willing to go back.

These compelling experiences have attracted huge numbers of people, which in turn makes the experience even more compelling in a positive feedback loop. Just as in any revolution, once critical mass is achieved, the revolution takes on its own momentum and is self-adapting. A critical mass of people is dictating what is interesting and what is not. What is acceptable is far more likely to be those sites and capabilities that minimize constraints and empower people. Those that put constraints on reasonable behavior were quickly discarded. Sites that allowed individuals to write, like WordPress blogs, to edit, like Wikipedia, or to tag interesting content, like Digg, rose quickly up the internet charts of popularity. Once Facebook removed restrictions on who could join or who your friends could be, it grew exponentially to over 60 million users. Web 2.0 became a very democratic revolution with core principles of freedom of speech and freedom of assembly on the internet. Controls are determined not by central authorities but by the users themselves, through the sites they choose to access.

By lowering the barriers to participation, sites like Wikipedia and YouTube provide platforms to which anyone can contribute. The share of users who actively contribute content can be relatively small: less than 1% of all users according to an article in Time on the 25th April 2007, much lower for YouTube, but as high as 4.5% for Wikipedia visitors. However, with a critical mass of large numbers of people, this can still represent hundreds of thousands of authors and contributors. It also doesn't take into account social networking sites, where everyone is a contributor by simply creating their home page and being compelled to enhance and adorn it in response to their friends doing the same thing. I have seen several large corporations that have skills or profile pages for their users, but even senior executives may be more likely to update their Facebook or LinkedIn profiles than their corporate skills page. In addition, this doesn't include what might be the most powerful aspect of this participation - feedback of the masses. Many people comment on blogs when they feel passionate about a particular subject and are even more likely to rate information such as reviews or products available on-line. In a world swamped in information, feedback on popularity and rating, hallmarks of Web 2.0 sites, provides a valuable indicator of what is important and what the leading trends are.

The idea, dubbed "Wisdom of Crowds" by James Surowiecki in 2004, is that a mass of individuals is on average smarter than any one person or expert could ever be. At some point excesses of disclosure and impropriety go off in their own direction and those seeking refuge of appropriateness find it in communities of like-minded people. Those looking for accurate or appropriate information just move off to more appropriate communities. This provides the feedback loop of control through democracy of participation. Wikipedia is a good example of this, as it has demonstrated that it can be just as accurate as Encyclopaedia Britannica, but much faster at correcting mistakes. "Lord of the Rings" provided feedback to the community on its website on how the story was evolving prior to filming and ultimately became one of the most popular series of films. "The Da Vinci Code" kept everything secret and, despite the popularity of the book, did not fare well at the box office.
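The averaging effect behind this idea can be illustrated with a few lines of arithmetic. The following is only a minimal sketch: the guesses and the true value are made-up numbers, and the function name `crowd_vs_individuals` is hypothetical. It shows that the error of the crowd's average estimate is never larger than the average error of its members, which is one narrow, measurable sense in which a crowd can be "smarter" than a typical individual.

```python
from statistics import mean


def crowd_vs_individuals(estimates, truth):
    """Return (error of the crowd average, average error of the individuals)."""
    crowd_error = abs(mean(estimates) - truth)
    individual_errors = [abs(estimate - truth) for estimate in estimates]
    return crowd_error, mean(individual_errors)


# Hypothetical guesses at a quantity whose true value is 100 (invented data).
guesses = [70, 85, 95, 110, 120, 140]
crowd_error, typical_error = crowd_vs_individuals(guesses, truth=100)

# Over-estimates and under-estimates partly cancel in the average,
# so the crowd's error cannot exceed the typical individual error.
print(f"crowd error: {crowd_error:.2f}, typical individual error: {typical_error:.2f}")
```

The cancellation only helps when individual errors point in different directions; a crowd that shares the same bias gains nothing from averaging.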

Michael Lynch, CEO of Autonomy, suggested in the Financial Times on the 30th of June that Web 2.0 was something that must be tamed. Perhaps this is missing the point. Web 2.0 is not anarchic, nor is it necessarily bad for business. To try to control Web 2.0 is like trying to put one's finger in the dyke. It is happening and there is nothing that business can do to prevent it. In fact, when companies try to restrict access to Web 2.0, they find either that the roadblocks have been circumvented or that potential employees will go elsewhere. Generation Y, the generation born between 1978 and now, is expected to grow from 25% of the US workforce to 47% by 2014. This is a generation that has only known the empowerment of the internet and has become accustomed to its vote counting. To try and control it now would only disenfranchise them. To empower them would not only yield an optimistic workforce, but also provide an engaging conversation between employees, their customers and their partners in a participatory and enlightened collaboration. In addition, my generation, the Baby Boomers, is starting to retire now, taking with it some of the most valuable knowledge ever accumulated, in some of the biggest numbers ever. Knowledge management programs over the last two decades have failed to capture that knowledge. Is Web 2.0 our last hope of retaining it? Interestingly, the Time article suggests that those over 35 are more likely to contribute content to Web 2.0 sites, so this may be an indicator that older generations want to contribute their knowledge from experience and are willing to do this through Web 2.0.

Software vendors, smelling a big opportunity, are now jumping on the bandwagon with social software and collaborative features. Many of these are repackaged capabilities from another era of enterprise software. Some vendors are looking at their portfolios and asking whether this is what they were doing all along. This misses the point. Web 2.0 has so far outstripped enterprise software as we know it in usability, accessibility and empowerment that the mere sight of enterprise software causes mass rolling of eyeballs, not just among the new generation, but among most others as well. Those who are familiar with the ease of use and empowerment of Web 2.0 sites like YouTube, Wikipedia and Facebook are aware of what is possible and have much higher expectations. I believe that the enterprise software vendors will get there, but only with much coaxing and coaching from a new generation. It will take a few years, but eventually they will figure out that Web 2.0 is not just a few new collaboration features and highly interactive web technologies, but empowerment of their users and the ability to draw in a critical mass of users from outside the trusted circle.

Enterprise systems won't change immediately, but they will probably change faster than people expect. Care should be taken in what is opened up. However, rather than treating Web 2.0 content and technology suspiciously, corporations should ring-fence the information that must be controlled and open up the rest to participation. At the Enterprise 2.0 conference in Boston this June, Pfizer presented how they were using open source technology to enable Web 2.0 collaboration. This is a brave move in the highly regulated world of pharmaceuticals, but they have recognized clear boundaries between what must be regulated in content, particularly in manufacturing and research practices, and what can be opened up, such as redefining process or identifying new potential areas of research. The scenarios of "Doctor 2.0" are available on the internet, but they have created a vision and a reality that uses the same technology as Wikipedia to create Pfizerpedia, a wiki of process and ideas that feed into the main areas of research and manufacturing.

Change in the Enterprise is more likely to come from outside as well. In all likelihood, if you are an information worker, you use Google more than any of your internal IT systems. You may even rely on blogs to track what is happening in your industry more than you rely on industry press. You and your co-workers are likely to use these and other Web 2.0 technologies to track what is happening in your business world as well as your social world. These web sites will set further expectations of the internal systems you use and create a requirement to integrate internal information with these external sources of information. Web 2.0 has an answer for this as well with an integration technique known as the "mash up", the ability to mix information from multiple sources using the web browser itself as the point of integration. These external sources of information also provide something that our internal information systems could never provide: a critical mass of opinion utilizing the Wisdom of Crowds. We will ultimately need to combine external opinion with our internal opinion, and with our own unique insights inside the enterprise, to get more accurate predictive decision making.
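The joining step at the heart of a mash up can be sketched in a few lines. This is only an illustration under stated assumptions: a real mash up would typically do this in the browser with data fetched from live services, whereas the sketch below uses Python with two small in-memory data sets. The function `mash_up`, the field names and the sample records are all hypothetical.

```python
def mash_up(internal_products, external_ratings):
    """Join internal product records with external crowd ratings by product id."""
    merged = []
    for product in internal_products:
        ratings = external_ratings.get(product["id"], [])
        merged.append({
            **product,  # keep every internal field as-is
            "rating_count": len(ratings),
            "average_rating": sum(ratings) / len(ratings) if ratings else None,
        })
    return merged


# Invented internal data (e.g. from a sales system) ...
internal_products = [
    {"id": "A1", "name": "Widget", "units_sold": 1200},
    {"id": "B2", "name": "Gadget", "units_sold": 300},
]
# ... and invented external opinion (e.g. ratings gathered from a public site).
external_ratings = {"A1": [5, 4, 4], "C3": [2]}

combined = mash_up(internal_products, external_ratings)
for row in combined:
    print(row)
```

The internal system stays the source of record; the external data only annotates it, and products with no outside opinion are kept with an empty rating rather than dropped.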

Ultimately, the most profound effect Web 2.0 will have is on the way we do business rather than just the technology we use. Employees will use this freedom of speech to provide valuable and fearless feedback for the business. Employees will have the freedom to assemble teams with customers without interference. Customers will become part of the decision making process and allow us to design the most imaginative products and services. Control will be limited to those areas that ultimately must have control, freeing up the creative process to speed and enhance business. Empowered employees will build more productive businesses and become more fulfilled participants in the business. With any opportunity comes risk, and embracing Web 2.0 is not without its risks. However, smart businesses can already see the opportunities and are willing to take those risks.

Impressions of Enterprise 2.0 in Boston

Boston at five in the morning before I had to take off on Friday.

This past week, I was on a panel at Enterprise 2.0 in Boston with Bob Bickel of Ringside Networks and Jeff Whatcott of Acquia, the commercializer of Drupal. The topic was open source options for delivering an Enterprise 2.0 Experience. Both Bob and Jeff are excellent speakers and bring a wealth of experience behind new companies. I think that Kathleen Reidy at the 451 Group did a very good job of covering the panel, so I will move on to my impressions of the conference.

On the panel, we spent much more time talking about open source and less about Enterprise 2.0. However, this doesn't mean that there was a lot of clarity on the meaning of the term Enterprise 2.0 at the conference. Although Web 2.0 had no less than Tim O'Reilly and John Battelle to define what that term means (barely), Enterprise 2.0 has no such authority. Consensus says that it is just Web 2.0 for the enterprise. However, from my research into the concept a couple of years ago, E2.0 is about taking the social aspects of Web 2.0 (collaboration, social networks, user contribution, wisdom of crowds, and social tagging and voting) and applying them to information, documents and content in the enterprise. There are no fixed patterns for how to do this, although popular Web 2.0 sites, such as Facebook, Google Maps, Digg, YouTube and Wikipedia, provide at least paradigms for how this can be accomplished in the enterprise. It is very difficult to describe Enterprise 2.0 without drawing analogies to these web properties.

On Wikipedia, the topic "Enterprise 2.0" redirects to Enterprise Social Software. In August 2007, a "Ruud Koot" permanently redirected it from Enterprise 2.0. The last direct version of an Enterprise 2.0 article in Wikipedia extols an Alan Wurms as the person who apparently coined the term in 2001. That is the power of Wikipedia: it can get rid of the rubbish. Most people's problem with the term is that it does not describe what it does and it sounds like it is just riding on the coattails of the Web 2.0 phenomenon. Is this really a new version of the Enterprise? I sat in a session with Carl Frappaolo where he equated Enterprise 2.0 with the evolution of Knowledge Management, but made the point that enterprises have not fundamentally changed as a result.

Some people believe that Enterprise 2.0, like Web 2.0, must be delivered as a whole hosted platform on the internet in order to be Enterprise 2.0. For some people this is absolutely true, but a majority still look to keep this information under enterprise control for bandwidth and security reasons. The majority of vendors in the exhibit area provided Software as a Service solutions. Most likely they used open source in creating those solutions.

I apparently missed the highlight of the show, which actually occurred on Monday before the opening. This was a shoot-out between Microsoft SharePoint and IBM's counter to SharePoint, Connections. As an IBM product manager at the IBM booth said, "We don't fuck up demos." Everyone seemed to agree with him. Poor Lawrence Liu of Microsoft was not so lucky. The Microsoft demo did not have the business process coherence in which IBM is very well versed. There was a lot of hand-waving about how various Enterprise 2.0 features were supplied by partners. The performance issues that Lawrence faced may very well be related to the terrible internet connectivity provided by the Westin Hotel. Imagine an Enterprise 2.0 conference where no one is connected. Both companies are talking about Social Software, Social Computing and Social Networking more than Enterprise 2.0, so my feeling is that this emerging market will be named along those lines rather than E2.0.

Peter Fields of Wachovia

There were three customer presentations on their usage of Enterprise 2.0, and these probably present the best understanding of what these collections of technologies are, what they are trying to accomplish and what market is forming as a result. Despite the fact that he was using Microsoft SharePoint, I really liked the presentation from Peter Fields of Wachovia. Peter seems to think about the business problems and technology solutions the way I do. (Or probably the other way around.) He described the need to empower employees as a way of tapping into the intuitive sense of employees, and he is the only other person I have ever seen who uses Myers-Briggs to describe this paradigm shift. In a session just before, I got shot down in flames for daring to suggest that the change in enterprise software is the result of shifting demographics and a new, incoming generation of worker - the Millennials. Here Peter was backing it up with data that suggests that in less than five years, this generation will move from 25% of the working population of the US to 41% of the working population. He discussed an imperative that I had not really considered as well, which is that the baby boomers are retiring and this will represent the single largest loss of implicit knowledge in industrial society. Enterprises MUST facilitate capturing what the baby boomers know now and dramatically lower the barriers to capturing that information.

Simon Revell from Pfizer

Peter is roughly my age, but Simon Revell from Pfizer, who looks a lot younger, presented a view of what the new generation wants - seemingly both Generation X and Generation Y. Pfizer has created a couple of sets of slides describing life in a networked world. Pfizer does use the 2.0 word and even describes a "Doctor 2.0" as a female researcher who also seems to spend a lot of time on Facebook. But rather than trivializing what that means, Simon presented a set of tools that Pfizer is using (open source by the way) that allow researchers to collaborate. Pfizerpedia is a MediaWiki implementation modeled on Wikipedia and used as a single instance. Its primary purpose is to capture best practice in an informal way, which Pfizer codifies and controls after the process is discovered or developed. The result is actually a knowledge base of information that can be used for many other purposes. My sense has been that wikis work best as single, highly interconnected instances rather than as many team or project wikis. Bob talks about one wiki; Peter is hoping for 10,000 wikis. My take is that we need two words for what is now described as a wiki.

I wish I could have seen more of the conference, although many of the break-out sessions didn't add a lot. The subject of wikis and blogs has been covered better at Web 2.0 conferences. Some of the sessions on community didn't really say anything at all. Neither did a session by Mark Woollen from Oracle CRM. Mark is a good speaker and there was some good content at the end. Too bad that the beginning didn't say much except that social networking will probably be important in the future.

The direction that the presentation by Oracle's Mark Woollen took

Enterprise 2.0 is not just about wikis, blogs and forums. These do not make communities. A whole bunch of the vendors in the exhibit area are likely to be gone in a couple of years, if not sooner. Open source, although played down at this conference, will likely be one, if not the, major driver of this new market. Companies are actually using this, particularly those that hope to attract and retain not just a younger generation of employees, but also of customers. The smart ones are also recognizing that they need this stuff to make sure that critical knowledge is not retired when their baby boomer employees are.

A Manifesto for Social Computing in the Enterprise

Investment in the infrastructure of the internet has dramatically increased bandwidth to everyone in the developing world and created home computers that are not only inexpensive, but very powerful. This change has expanded the usage of the internet exponentially and introduced new demographics and generations of users that had not used computing prior to the expansion of the internet. These users have themselves created the content and applications that feed the internet and have set expectations of the applications that we use in web browsers and new mobile devices. The increased bandwidth has made this a much more interactive and visual experience, encompassing video and other visual elements. Web properties such as YouTube, Google, Amazon, Facebook, MySpace, and Flickr have set the benchmark for expression, accessibility and social interaction of computing systems.

Dubbed Web 2.0, this revolution in computing has shifted the face of software from a logical, linear, and introverted science to an expressive, graphical and social art. New designers of web sites, unschooled in traditional software techniques, are nonetheless able to create software that scales to millions of users and billions of objects of information and still meld those users into an artistically aware community. The next generation of enterprise employees, who started using the internet in their early teens, have only known this evolving culture of free and creative development of the internet and now demand better of the enterprise software that they meet. Older employees also know that the software that they use on a day-to-day basis can be better. Enterprise 2.0 seeks to emulate the success of Web 2.0 in the creation of new software for the enterprise.

Social Computing

The shift of computing power from business logic and calculation to socialization and people-orientation has been dubbed by some as Social Computing. The term Social Computing has been used interchangeably with Enterprise 2.0 or Enterprise Social Applications. However, IBM and Microsoft have created Social Computing research centers and Forrester has started to use the term in describing next generation enterprise collaboration. Social Computing is the use of technology to support sharing of information and enable collaboration through social networks, and to tap into the value of the “Wisdom of Crowds”, a concept made famous by James Surowiecki in 2004 to explain how many people together are smarter than individual experts. Social Computing exploits software oriented toward people and Social Networks, the extended relationships of individuals, to connect to more people and access the Wisdom of Crowds.

To tap into the wisdom and awareness of social networks and empower people to collaborate at any time or place, Social Computing platforms need the following capabilities:

  • People - Support information about people, their preferred communications, their relationships and affiliations, since social networking is all about people rather than just systems, data and objects. The more information available about other users, the more likely they can be found as a source of knowledge.
  • Context of Networks - Social networks organized around projects, teams and departments provide the context of work and relevance of information as it spreads from creation to the people that need that information. Social networks, especially networks extended beyond the enterprise, provide the greatest differentiation of social computing from previous generations of collaboration.
  • Social Collaboration - Provide an environment where people can share ideas, contribute knowledge and solve problems in creative, unstructured socialization as opposed to rigorous workflows that are required for control of information. Next generation tools use techniques developed by Web 2.0, particularly those tools that empower social knowledge, such as social tagging, integration of communication and awareness of changes in social networks.
  • Content as a Service - Content is the container of knowledge and information and is core to the socialization of information. Content needs to be accessible everywhere, not just in large, monolithic applications. Content capabilities need to be accessible as reusable service components. Social computing can happen inside the enterprise or outside and a channel can be a web site, web application, mobile device or even external web platforms such as Facebook or Google applications. Mashups can occur inside the enterprise or outside and the channel will require content as a service that can securely be accessed wherever it is needed or wherever it is contributed.
  • People-centric Tools - As Web 2.0 has spread new paradigms of user interaction, the consumerization of software has created expectations that enterprise software become easier and empower users to contribute, correct and classify content and information within the context of social networks. AJAX and next generation rich internet application interfaces such as Adobe Flex will provide users with a much richer, more intuitive user experience and the ability to scan much more social knowledge to find ideas and solutions. These tools should themselves be componentized and accessible as a service so that they may be mashed up with other sources of social knowledge.
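To make a few of these capabilities concrete, here is a minimal sketch of how content, social tagging and network context might fit together behind a reusable service interface. It is an illustration only, not a description of any vendor's product: the class `ContentService`, its method names and the sample data are all hypothetical, and a real platform would add persistence, authentication and a web API on top.

```python
from collections import defaultdict


class ContentService:
    """Hypothetical in-memory content store exposed as small reusable operations."""

    def __init__(self):
        self._items = {}                # item id -> content record
        self._tags = defaultdict(set)   # tag -> ids of items carrying that tag

    def contribute(self, item_id, author, body, network):
        """Store a piece of content in the context of one social network."""
        self._items[item_id] = {"author": author, "body": body, "network": network}

    def tag(self, item_id, tag):
        """Let any user classify existing content (social tagging)."""
        if item_id not in self._items:
            raise KeyError(item_id)
        self._tags[tag].add(item_id)

    def find(self, tag, networks):
        """Return ids of tagged items that belong to one of the caller's networks."""
        return sorted(
            item_id
            for item_id in self._tags.get(tag, set())
            if self._items[item_id]["network"] in networks
        )


service = ContentService()
service.contribute("doc-1", "alice", "Draft process for lab onboarding", network="research")
service.contribute("doc-2", "bob", "Pricing notes for key accounts", network="sales")
service.tag("doc-1", "process")
service.tag("doc-2", "process")

# A member of the research network sees only the content shared in that context.
visible = service.find("process", networks={"research"})
print(visible)
```

Because the operations are small and independent of any one user interface, the same service could in principle sit behind a web site, a mobile client or an external channel.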

This does not mean that the need for traditional enterprise content technologies such as document and records management goes away. They are still repositories of the truth and verifiable information and thus play an important role in sharing knowledge within social networks. However, these traditional technologies lack the usability, empowerment, and breadth of reach that Web 2.0 sites provide. They lack the collaborative nature that invites people in, without barriers and restrictions, to contribute to the sharing of knowledge and information. Web content management for creating a richer Web 2.0-style user interface becomes even more important to this collaboration, providing a compelling face to the interaction and simplifying the access and navigation of shared information. Enterprise Content Management cannot become one of the principal platforms of Social Computing unless it addresses the requirements of Social Applications.

Use of Social Computing

The balance is shifting from contained and controlled companies to engaged and empowered collaborative enterprises driven by Web 2.0-inspired social computing. At the center of the shift from old models of computing in the enterprise to new social models are companies that are inspired to innovate or to engage more with their customers. This includes companies not just using their internet or intranet web sites, but engaging in social networking channels such as Yahoo, Google, YouTube, Facebook and MySpace. Those using social computing are interested in engaging people, such as customers, employees or partners. They are using new people-centric tools and facilitating the creation of new social networks or the extension of existing ones.

Major ECM vendors are all planning their Social Computing efforts and to a large extent are being dragged in this direction by their more forward-looking customers. Enterprises that have discovered the value of Social Computing are:

  • Consumer-oriented companies that particularly address a younger demographic must engage their customers as part of both the marketing process as well as the development of new products. For example, games and film companies that engage their viewers in plot and scene development do much better than those that keep everything under wraps until the game or film is ready.
  • Enterprises hiring a new generation of knowledge workers who grew up on the internet must provide tools as empowering as those available from Web 2.0. Turning these tools off forces these workers to seek employment elsewhere, as does forcing them to use tools that do not meet their expectations of usability and engagement.
  • Financial Services firms are leading the shift in usage of these technologies. Financial Services have always been innovators in developing new technologies and investing in providing better service for their clients. Speed in innovation in these services becomes a major competitive advantage where churn of clients can be very high in turbulent times. Internally, competition for talent is intense and providing better support is important for attracting and retaining employees. In particular, young and ambitious brokers and managers are more likely to be sociable themselves and seek out Social Computing inside and outside the enterprise.
  • Government and Non-Profit organizations that provide services and citizen feedback online find increasing their IT budgets much easier than those that are merely arbitrated by a front-line service. It is now inconceivable for an American politician to run for office without an extended internet presence such as Facebook or YouTube.
  • Enterprises that have faster cycles of product innovation, especially high tech, are looking to their customers and partners to participate in the development of new products and services. In previous generations, the field acted as a filtering mechanism of new customer requirements and ideas. However, today technology can provide a frictionless way of getting the entire enterprise to exchange ideas and improvements with the customer communities.

Integrating Social Computing

Because Social Computing is unlikely to come from a single source, especially because of the diversity of sources of knowledge and social networks available on Web 2.0, it is extremely important for the enterprise infrastructure for Social Computing to be integrated with those sources. This means bringing these sources into the enterprise and bringing the enterprise sources out to Web 2.0. No matter where the people collaborating are, the tools they want should be available. To facilitate this, the Social Computing platform should be:

  • Open Source - Because they are developed through social computing paradigms and share best of breed components with the open source community, open source systems have evolved rapidly and encompass social computing capabilities developed by that community. Social tools such as MediaWiki, the wiki that powers Wikipedia, and WordPress, the most popular blogging software, were developed as open source.
  • Integrating the Inside Out - By providing content as a service and enabling light-weight, Web-Oriented scripting development, the Social Computing platform should quickly integrate content services into external channels and web sites, such as Facebook and iGoogle, to allow enterprises to engage customers, partners and home workers.
  • Integrating the Outside In - If the Social Computing platform is modular and supports a Web 2.0-style mashup-oriented architecture, it enables users and teams to integrate external open source tools and social networking web services, such as Facebook, LinkedIn or other Open Social-enabled properties, to tap into the wisdom of crowds available on the internet and to make customers and partners part of team collaboration.
  • REST-style Architecture - A Web 2.0-style or REST-style of architecture using easy, light-weight scripting languages and integrated through internet standards-based APIs can easily mash-up content services into any web-oriented application or web site. These architectures should be scalable, fault-tolerant and high performance to meet any enterprise or internet requirement.
  • Choice - The Social Computing platform should be based upon open interfaces developed by the open source community to provide choice of operating system, database, application server, content authoring tools or APIs.
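To make the "content as a service" and REST-style points above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the host content.example.com, the /docs/recent path, the JSON fields title and url, and the helper names content_url and render_widget are hypothetical and are not part of any Alfresco API. It only shows the shape of the idea: a plain URL identifies the content, and a few lines of script turn the response into a fragment that can be dropped into an external page.

```python
import json
from html import escape
from urllib.parse import urlencode


def content_url(base, path, **params):
    """Build a REST-style URL for a (hypothetical) content service."""
    query = urlencode(sorted(params.items()))
    url = f"{base.rstrip('/')}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def render_widget(json_text):
    """Turn a JSON list of documents into an HTML list for embedding elsewhere.

    Assumes each document is an object with "title" and "url" fields.
    """
    docs = json.loads(json_text)
    items = "".join(
        f'<li><a href="{escape(d["url"], quote=True)}">{escape(d["title"])}</a></li>'
        for d in docs
    )
    return f"<ul>{items}</ul>"


# Example: the URL a mashup would call, and the fragment it would embed.
url = content_url("http://content.example.com/service/", "/docs/recent",
                  format="json", max=5)
sample_response = '[{"title": "Q&A", "url": "http://example.com/1"}]'
fragment = render_widget(sample_response)
```

The sample response is hard-coded so the sketch runs without a network connection; in a real mashup it would be the body returned by fetching the URL.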

Over the past year and continuing into the coming year, Alfresco is dedicated to expanding its architecture and applications to enable this vision of Social Computing. We will work with partners and the open source community to provide best of breed open source tools for enabling this architecture. We will integrate with external Social Computing properties such as Facebook and the Open Social alliance to expand the breadth of social networks and the ability to collaborate through those networks. We will be expanding the Alfresco system’s understanding of users as people and facilitating the sharing of information and content through their networks. We will be open in the process and seek and encourage your feedback and participation.

Scaling Out Like Technorati

My fellow World Economic Forum Technology Pioneer, David Sifry, the founder of Technorati, was also in Dalian, China for the “Meeting of New Champions” or “Summer Davos” as the Chinese like to call it. During Davos in January, I had the great misfortune of pitching Alfresco against Technorati in a competition between tech pioneer companies. As fantastically well as Alfresco is doing, Technorati has the temerity to compete against Google in blog search and win.

I got the chance to talk to Dave during the conference and ask him some questions on the technology and architecture behind Technorati, the internet blog search site. I thought that someone who could take ordinary computer components and build a huge internet architecture could possibly teach something to people running enterprise architectures that are puny in comparison.

Technorati is a web site that tracks blogs, pictures and any user generated content and allows you to search those sites for what people are thinking, seeing and hearing. When a new or urgent situation breaks out, you can do worse than to search Technorati for immediate reaction. Every day, every hour, every second, Technorati is indexing over 10 million blogs with over 10 billion objects. Technorati’s user base is doubling every six months and quick and accurate response is critical for retaining those users.

David Sifry, Founder and Chairman of Technorati

I asked Dave about his architecture and what applicability there might be for enterprise architectures.

John Newton: In building Technorati, what were some of the issues that you had in architecting your systems?

David Sifry: I was looking at just temporal information. I had no idea how big it could get. When I looked at the architecture, instead of architecting it right, I architected it for right now. I had no big budget and I didn’t want to wait six months to build it. Also, I had no idea what the killer app would be.

I focused on data flexibility. At the time, that meant putting everything into a relational database. That was okay while the size of the indexes was less than RAM and about a million blocks of data. That was okay while there were less than 20 million blogs.

The next generation took advantage of data parallelism. That meant that, upon update, we sent a signal to all the other systems. We expanded the data over several “shards” [segments of data partitioned on different databases on separate machines].

What was surprising was that we were writing as much data as we were reading. At this point Technorati was as big as some of the biggest OLTP systems. Even so, maintaining data integrity was important, because you wouldn’t want the link count [count of how many other blogs point to a particular URL] to be out of sync. This put real pressure on the system. At the same time, we realized that time was a more important dimension than URL. People didn’t want to sort or search on URL, they wanted to search on time. [i.e. what are the latest blogs on a particular subject?]

By this point, we understood the application more and more. The app [Technorati] is about real time access. You need to be able to count on finding the latest information on a subject. That’s when we built the third architecture. Scaling was well understood and we built the shards on time rather than on URLs. Instead of putting data into a DBMS, we put it into special purpose databases. It was more of a bus-based architecture. Each database could be scalable and grow as big as we needed.
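[An aside from me rather than Dave: the difference between sharding on URL and sharding on time is easier to see in a small sketch. This is my own illustration in Python and not Technorati's code. The function names, the one-day bucket size and the use of an MD5 hash are all assumptions I made for the example.]

```python
import hashlib
from datetime import datetime, timedelta, timezone


def shard_by_url(url, shard_count):
    """URL-based sharding: hash the URL to pick one of shard_count shards."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def shard_by_time(posted_at, bucket_hours=24):
    """Time-based sharding: each shard holds one fixed-size window of time."""
    bucket_seconds = bucket_hours * 3600
    return int(posted_at.timestamp()) // bucket_seconds


def shards_for_latest(now, lookback_hours, bucket_hours=24):
    """Shards a 'what is the latest on X?' query has to visit, newest first."""
    newest = shard_by_time(now, bucket_hours)
    oldest = shard_by_time(now - timedelta(hours=lookback_hours), bucket_hours)
    return list(range(newest, oldest - 1, -1))


# A query over the last 36 hours touches only the newest time shards,
# whereas with URL hashing recent posts are scattered over every shard.
now = datetime(2007, 9, 10, 12, 0, tzinfo=timezone.utc)
recent_shards = shards_for_latest(now, lookback_hours=36)
```

[With time-based shards, new capacity is added simply by starting the next bucket on new hardware, and old buckets stop receiving writes. That is my reading of why the third architecture suited a real-time application.]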

JN: The notion of shards - did you call it that at the time? I have been looking into shards and I was only aware of or heard of them for about the last year.

DS: Back in 2002 when we were pitching this to VCs, I at least explained the theory. I just thought through the problem carefully. Doing it this way, we could add hundreds of systems, lots of cheap CPUs, RAM and disks. It provides inherent parallelism. I can’t believe that I was the first one to think this up.

JN: How big does this architecture scale?

DS: We are loading one terabyte a day into Technorati. That’s 100 million blogs or about 10 billion objects. A lot of it is new types of tagged data. There are about a half billion videos and photos.

With all that data, you have to think about what do you throw away?  We can’t really delete anything, because we are potentially losing an asset. We don’t delete anything. So we take data out of the spin cycle. [Transitory data used in preparation.] We take the long-term data and put it into low latency storage.

When data is doubling in size every six months, that means that only one quarter is a year old. We don’t need to worry about old data.
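[The arithmetic behind Dave's "one quarter" is worth spelling out. If the total doubles every six months, then a year ago it was a quarter of today's size, so everything a year old or older is at most a quarter of the store. The sketch below is mine, and it assumes steady exponential growth and that nothing is deleted.]

```python
def fraction_older_than(age_months, doubling_months=6.0):
    """Share of today's data that already existed age_months ago.

    Assumes the total doubles every doubling_months and nothing is deleted.
    """
    return 0.5 ** (age_months / doubling_months)


one_year = fraction_older_than(12)    # two doublings ago
six_months = fraction_older_than(6)   # one doubling ago
```

[Under those assumptions, half of everything stored arrived in the last six months and three quarters in the last year.]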

JN: How do you deal with large number of users with very large data sets?

DS: Any off-the-shelf tool falls over. There is a lot of interesting analysis on old data, but no off-the-shelf tools can handle that much data. It’s only just now that some tools can handle it.

JN: What are those tools?

DS: One is Greenplum by a bunch of O’Reilly guys. If you use ordinary data warehouse tools, they would just scream and shout.

JN: Actually what I was originally referring to was the fact that you are showing lots of data to users who are not used to enterprise information management tools. How do you present this information to consumer-level users? How do you deal with the user interface and visualization of all this data?

DS: Gotcha. It depends on what the user wants to get out of Technorati. If the user wants search results, then we give it to them. Sometimes they want to browse or discover information. We have spent a lot of time on visual design. Then we give them lots of bright, shiny things for them to click on.  Things like metadata, video or other links.

We have used enterprise class web tools to analyze what users are doing. We look at the click stream and see what is successful or not. That helps to make the information contextual.

One of the big mistakes that we made was not doing this [buying click stream analysis tools] sooner. It was only $80K. Up to that point it was so much trial and error. I’m glad we finally did it. Now we can see how much time a user spends on a feature. We can see page views, goals per visitor.

JN: So what do you measure on Technorati?

DS: Measuring a web site is like forecasting the weather. Yesterday it was sunny and today it is cloudy. Why is it cloudy? Sometimes you have no idea. Sometimes you realize that a change in barometric pressure has a lot to do with it.

We look at the number of newbies, number of reports, session lengths and then measure them against prior periods. It’s not always consistent.

I had never built a B2C site before. I just focused on me, on what I wanted. That worked well for a while when I was the target audience. But we have to build for a broader audience.

JN: At Alfresco, we measure conversions. Are you measuring things like performance? Does that affect retention of users?

DS: Of course, but if the system is falling down, then even performance doesn’t matter. So I don’t get too stressed out about it.

JN: When we met at Davos you wanted to move Technorati to be the Internet Now! Is that still the case?

DS: Everything is shifting. I wanted it to be a site that everyone is able to use. We forgot about the core users that just wanted to find out about blogs and any real time information. In an attempt to jump the chasm, we chased after 100 million users and tried to be everything to everyone. Now we try to make blogs and user driven content available for those looking for that.

Also performance is improved significantly. Now I notice how slow other sites are. This is a total tribute to the engineering team. Everything is easier and faster.

Pretty soon we will have a whole lot of stuff that we have been working on for a year.

JN: Can you say what it is?

DS: I don’t pre-announce.

JN: What does the Technorati brand stand for today?

DS: Good question. What’s popping up now on the internet, especially user generated content? It’s about users tagging user generated content and finding it.

JN: Who are your competitors?

DS: I probably sound like the typical entrepreneur, but nobody really seriously. Google provides blog search, but other than that nobody really. Other people are trying to identify and tag information like Digg and del.icio.us, but they aren’t really competition.

JN: What do you want Technorati to be in two years’ time? Five years would be ridiculous.

DS: I would like Technorati to be a profitable business that is strongly differentiated. It will be the place that you would go for mobile, RSS or push information. For all that you would come to Technorati.

IBM's Many Eyes

I found this reference to IBM's Many Eyes. Applying some of the concepts of Wisdom of Crowds and Web 2.0 to visualization, this site encourages anyone to create graphs, data visualizations and tag clouds by uploading their data sets. You can try it out here.

Imagine if this used Flex.

I want my Joost TV

While I was in Davos, I got to meet Niklas Zennstrom, the founder of Skype. Somehow he neglected to mention that he was about to revolutionize TV. Later on in the conference, I heard rumors about Joost, his new service, but unfortunately I couldn't ask Niklas for a beta invite. It would have helped if he remembered who I was among the couple of thousand other people clambering around Davos.

The buzz is that Joost is the future of TV. I signed up for the Joost beta a couple of weeks after I returned and I got the download notice in my inbox two weeks ago. I finally have managed to download the beta today.

After a couple of sputtering starts on our home 5Mb DSL line, the thing finally got going. At first I thought this is not going to work as the image started and stopped, but when it did get going it was presenting me with near DVD quality images on my PC. I hear they have a new proprietary compression algorithm, but the image and sound are pretty good.

You are first confronted with a user interface for selecting channels of information. Since I live in the UK, I don't think I have heard of any of the channels or the programs on the channels, but I understand that they are signing up some pretty amazing media deals.

From these channels, you can select specific programs to watch on demand.

Once you start watching, you get a full screen viewing experience by default. It is absolutely nothing like YouTube. It is full screen with all the normal controls that you would expect.

While you are watching, you can access a set of widgets for adding comments or instant messaging a friend. If they open up the platform the way that Skype has, you can imagine all sorts of widgets being created for searching for related shows, looking up references to and from the show, historical references or simulcasting your own commentary and voice over.

One thing you should watch out for. As soon as I downloaded this, I started taking screen shots. I live in the UK, so I don't know anything about a series called Total Recall 2070, but it looked interesting. After taking this screen shot, the couple take their clothes off and start getting it on when my wife walks in. I am only talking about the first few minutes of the program. I have to explain that I am doing this in the name of research! It could probably use some sort of rating on the programs, at least while I am trying it out at home and not in the office.

Anyway, I have no more invitations for 3 friends to invite to the Joost beta. Please let me know if you are interested.

Add Salesforce.com to the Enterprise Content Management List

Today, Salesforce.com announced that they will be getting into the ECM business by acquiring Koral, one of our neighbors here in Maidenhead. Some of the guys there are our friends and we wish them luck in this new turn in the ECM market. They scored a real coup showing up at Demo last year and it obviously got the attention of Salesforce. The system has not been around long, but they have added some interesting Web 2.0 twists. It is focused on document management and was born out of the efforts of BuildOnLine, a specialist online content management provider for the construction industry that has recently merged to create CTSpace.

This is a significant shift for both Salesforce.com and for the ECM market. As we all know, although most of the Fortune 1000 have ECM, penetration into those accounts can generally be figured in single-digit percentages and is practically non-existent in smaller organizations. Increasingly, ECM will be delivered either as a software or physical appliance or as software as a service in a utility-like form. Smaller enterprises or organizations that have generic content management requirements will find this service useful. This will allow Salesforce to leverage its brand and start with the sales organizations. It also gives Salesforce a means to expand its business beyond the sales and marketing organization. It also further validates the Software as a Service model for simple utility functions.

Although everyone is looking for simplicity, not everyone will be looking for a utility-like approach to ECM. It is up to the ECM vendors to simplify the installation and set up process to make it as easy as flipping a switch, while keeping the content inside. Organizations that are not comfortable putting their documents outside the firewall, such as financial services and government organizations, are more likely to look for an internal system. Also, as the BuildOnLine guys found out, once you move to the area of specialist content applications, the sale gets much harder and configuring systems becomes even worse. Records management, engineering applications and specialist publishing applications fall into this category. It will be entirely up to each organization which makes sense for them.

Software as a Service is a model that Salesforce.com didn't invent, but the company has become its biggest proponent and greatest success story. This acquisition will raise the profile of SaaS as a model for ECM. Salesforce will not be alone in delivering content management services as others are developing their solutions with systems like Documentum and Alfresco. Likewise, Microsoft is taking Sharepoint on-line with the Office Live offering. Other companies are now looking at providing a SaaS model for web content management and collaborative content management as well. No doubt we will be seeing feature by feature comparisons between these various solutions soon. Depending on the breadth of functionality, integration with internal systems and scope outside of sales and marketing content, we will see how Salesforce does.

Convergence of Content and Data Management?

Tony Byrne announced that he is hosting a panel on convergence between enterprise data and content management and poses it as a question - will structured and unstructured information management converge? My short answer is no, but that answer has a complicated reason behind it. Much of it has to do with the fact that the larger stack of enterprise software is consolidating around it. Here are some of Apoorv Durga's comments on convergence as well.

"Oh really?"

I have lived in both worlds having worked with relational databases since 1977, being one of the founding engineers at Ingres and then co-founding Documentum with Howard Shao. While at Documentum, we explored what content was and how it was different from databases. Over the years my early bigotry in favor of a purely relational view of the world has given way toward a more relaxed view of how content is structured, indexed and managed. While starting Alfresco, we had the opportunity to start from scratch but still used some of the concepts that have proven effective in capturing and delivering information to users.

The relationship between relational databases and content management is like that between nuclear physics and organic chemistry. The relational database provides the mechanics to make data and information happen and content management builds upon that. Relational databases provide the transaction controls to ensure data integrity, the back-up tools to make sure that information is recoverable, replication to move data from one location to another, and the query, data manipulation and relationship tools to handle much more complex structures. Content management is more like the organic chemistry of information, combining information and relating it to human beings to make it more usable and consumable. The structures, processes, and models of content are different from other classes of information management. However, just like organic chemistry, content management may combine with other classes of application just as relational databases have. We are just missing the standardization and theoretical foundations of content management that have supported relational constructs.

What makes content management different from data management is how close it is to people. To make content useful, the people who create the information need to understand how it will be used. Content needs to be compelling, original, concise and understandable. Content has context that only humans can provide and only humans can use. This means that the services around content are more about change than integrity. Integrity is important, but that’s why the database is there. There is a whole rich set of services there to deal with transformation, change process, classification, publishing, versioning, content to content relationships, links and a whole bunch of other things that databases just don’t "think" about. Search may be yet another system that has no relational database at all, but should use the concepts that have been built up by the content management system. That’s why content management systems are separate systems built upon relational databases and integrated with separate search systems.

Since the inception of content management, the content management vendors have by and large continued to support the notion of a repository sitting on top of a relational database and integrated with a separate search system. Interestingly, many of the main vendors of ECM are now the database management companies - IBM, Oracle, and Microsoft. This should not be surprising since content management is now one of the fastest growing segments of database applications. Even so, these companies have chosen to layer their content management software on top of their relational database systems. The database groups are then free to focus on data management as their core competency. Databases support not just content management, but transactional systems and analytical business intelligence systems. Internal to these companies, the database groups have not really subsumed the content groups. Microsoft flirted with the idea of combining everything into one server group, but unwound that decision to have Sharepoint in the Office group. IBM’s content group reported into the DB2 group, but remained independent and it remains to be seen where it ends up after the FileNet acquisition. Oracle’s content group has wandered all over the organization since Oracle first attempted to build content systems in the late 1980s.

The non-database vendors of content management - EMC, OpenText, Interwoven, Vignette and Alfresco - still use relational databases in the management of content and layer their services above a database. Interwoven tried to not use databases to improve performance in the early days and took a very XML-based approach to managing, categorizing and controlling content, but this ended up being a losing proposition to companies worried about integrity. EMC sees a future that is independent of all these stack war issues in that people will always need storage and that content management is really about managing storage. They are essentially above (or below) the stack wars, but don’t be surprised to see them try to architect the database out of the equation. OpenText, Interwoven and Vignette look to either get acquired or get out of the way. At Alfresco, we believe that open source is the open alternative to the stack wars, which I will speak about later. The motivation of each is not the convergence of content and data, but the consolidation of the ECM stack at one level and the entire enterprise software stack at another with fewer and fewer players.

From Kathy Sierra's blog

What is happening at the macro business layer is that entire application stacks are consolidating to manage the data of record. IBM, Oracle, Microsoft and SAP are all vying to own the data and make themselves as sticky as possible. Each has a Service-Oriented Architecture to make it possible to surround that data and to integrate it with other stacks when necessary. Data in the case of content management is simply the data about the content and is not a whole lot different than customer data as far as these stacks are concerned. These stacks need the checklist of the big items that enterprise customers are buying in order to build or integrate applications. This includes relational database, content management, business intelligence, build and test environment, system administration, and all sorts of XML stuff. Most of these, with the exception of IBM, have gobbled up the top application layer including CRM and ERP. SAP flirted with the database layer in alliance with MySQL, but seems to have abandoned this strategy. It could be, though, that content management becomes a common stack component if SAP goes out and purchases an ECM vendor. Content has become an important part of the data being managed and these SOA stacks will just link it like any other data.

Despite the relentless consolidation of these stacks, sucking in the ECM market with it, total integration of all systems into a single stack is impossible. At best, these stacks are fighting for a bigger piece of the enterprise pie by displacing smaller players. Enterprises are trying to go from a choice of 25 different systems to 3, but not down to one. Microsoft, having built Sharepoint organically, can exclude databases other than SQL Server, but it loses a chunk of the market in the process. Will IBM really limit FileNet to only DB2? Will Oracle lock Stellent only to its database? Well maybe, but a totally integrated stack does not solve all problems of enterprise process or control. Likewise, SOA has not delivered on the promise of interoperability, despite the billions of dollars spent by IBM, Microsoft and major enterprises. Nor does it move far outside of back-office systems and into the front-office systems and web sites where most of the value is presented to an enterprise’s customers. It does not deliver the conversation with its customers that enterprises are increasingly demanding.

From Kathy Sierra's blog

There is a lot happening out in the world of the Internet that is making this whole notion of data versus content irrelevant. Web 2.0 has moved the conversation from the whole notion of bits and bytes to what really matters: the content, the people and the relationships between people and content. Web 2.0 says that people don’t care about data and structure, but about communicating with each other and building closer relationships. This notion is seeping into the enterprise software space with the class of software known as Enterprise 2.0. It is still early days, but billions of dollars of value have already been built upon the foundations of Web 2.0 and those foundations are at least 90% open source.

Open source has provided an alternative view of the vertical stacks that are being created by IBM, Oracle, Microsoft and SAP. In this view, open source is the stack and is dominated by no one vendor. Each layer of the stack can be substituted with a best of breed open source component. These layers have been constantly rising from the operating system to the database to the app server and now the application layers. These application layers look a lot different than the enterprise stacks though. Rather than integrating at the depths of the infrastructure in a structured SOA, they are “mashing up” near the user and making it much easier for more providers to create new services and applications not depending on any particular stack. In fact, the stack is irrelevant as long as it is freely available. How many people really know what is behind Amazon, Google, Yahoo or Salesforce.com? The answer is a lot of open source, but which open source doesn’t matter a bit to the end users of those systems. At Alfresco, we are one layer in that open source stack and the user is free to choose that component or any other in the open source stack.

I plan on attending this session and seeing what others think.

AIIM - Web 2.0 and Enterprise Content Management

Next week, I'll be in Boston speaking at the AIIM conference there. My topic of discussion is Web 2.0 and the next generation of Enterprise content management. Joining me is Wilson D'Souza from MIT. We will be speaking at 2pm on Tuesday, the first day of the conference. I hope you can come.

As usual, I am putting the finishing touches on my presentation. However, I found some interesting material for the presentation on Kathy Sierra's blog. Kathy has been the subject of a lot of blogging lately because of death threats aimed at her by some thugs posting gruesome images in other blogs. This is her first post-trauma blog and includes some interesting charts that illustrate what is going on with Web 2.0. Here is one of my favorites from the post because it really hits home what is different for the ECM world. Click on Web 2.0 to see some of my thoughts.

OpenSearch in Alfresco 2.0 Federates the Distributed Organization

Alfresco's Chief Architect Dave Caruana’s blog post on Alfresco’s support for OpenSearch in our version 2.0 should not go unnoticed. Dave's latest contribution to Alfresco is a big deal and I think it is a first in the Enterprise Content Management space. OpenSearch was originally created by A9 to provide a mechanism to aggregate search results from multiple search sources. Now it is supported by literally hundreds of search engines. Alfresco is one of the latest pieces of software to support OpenSearch.

Alfresco supports OpenSearch both as a client and as a server. This means that you can include one or more Alfresco repositories as well as the internet or any other search engine in your web browser, a portlet or other type of search tool. Firefox 2.0 and IE7 support OpenSearch from their built-in search tools and it is now easy to add Alfresco searches. In addition, you can search multiple Alfresco repositories and the internet from the Alfresco web client. This is a powerful tool to bring content into a repository and to aid in collaboration. We have been able to use both tools in customer implementations that include multiple Alfresco repositories with blogs, wikis and external search engines.

This brings a whole new definition to Federated Repositories. Alfresco can now join federations of search engines and collections of repositories. Existing tools like IBM’s Venetica and EMC’s AskOnce use proprietary connector technology that relies on centralized topologies of integration. Because these searches rely on proprietary interfaces, which repositories and sources are supported depends on the vendors or after-market suppliers providing the connectors. If these engines want to include the list of information sources shown above, then they will eventually have to support OpenSearch.

The Alfresco approach is different in that it uses a standard protocol and allows departments and individuals to configure how they want to federate their searches. It allows for a loosely coupled topology of repositories that can grow and fuse repositories as business needs require. Departments can try Alfresco at their leisure and then merge results through federated search, even if the collaborating organizations are outside the enterprise. Also, searches do not need to be limited to what Alfresco supports. The user can include searches to information sources that are important to the task at hand. This is a leap away from the repository-centric view where all activities revolve around the repository to a holistic view of information in which the repository plays a supporting role.
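To give a feel for what "merging results through federated search" means in practice, here is a minimal sketch in Python. It is not the Alfresco implementation, which uses a Java aggregator. The source names, the sample RSS feeds and the simple round-robin interleaving are all assumptions chosen for illustration. The only thing it relies on is that each source returns its results as an RSS feed.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest


def parse_rss(source_name, rss_text):
    """Extract title and link from each <item> of one source's RSS results."""
    root = ET.fromstring(rss_text)
    return [
        {
            "source": source_name,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def federate(feeds):
    """Interleave hits from every source, keeping each source's own ranking."""
    per_source = [parse_rss(name, text) for name, text in feeds.items()]
    merged = []
    for row in zip_longest(*per_source):
        merged.extend(hit for hit in row if hit is not None)
    return merged


# Hypothetical results from a departmental repository and a team wiki.
FINANCE = """<rss version="2.0"><channel>
<item><title>Q3 budget</title><link>http://finance.example.com/1</link></item>
<item><title>Q4 forecast</title><link>http://finance.example.com/2</link></item>
</channel></rss>"""

WIKI = """<rss version="2.0"><channel>
<item><title>Budget process</title><link>http://wiki.example.com/a</link></item>
</channel></rss>"""

results = federate({"finance-repo": FINANCE, "team-wiki": WIKI})
```

Because every source speaks the same result format, adding another repository or an external search engine is a matter of adding one more feed to the dictionary rather than writing a new connector.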

OpenSearch is really a collection of technologies based upon relatively simple, standardized protocols (see http://en.wikipedia.org/wiki/OpenSearch). The search engines are described using XML that is extensible and therefore expandable to support more sophisticated ECM types of searches. The search itself is invoked using a URL and therefore fits the web and REST model very neatly. The search is usually described in a template form that is designed to include metadata and can be extended with additional metadata types of searches. Page results are returned in the form of Atom, RSS or HTML and are well suited to work within a browser or to be aggregated in an aggregation tool. We have been able to use an open source Java aggregator as part of our web client implementation. (That’s the beauty of open source - we don’t have to reinvent it every time.)
Please give it a try. You can access the latest version of 2.0 at http://www.alfresco.com/products/docs/releases/2.0/
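
For anyone curious about the template form mentioned above, here is a minimal sketch in Python of how a client fills in an OpenSearch URL template. The {searchTerms}, {startPage} and {count} parameter names and the trailing "?" for optional parameters come from the OpenSearch 1.1 specification. The host repo.example.com, the query-string layout and the fill_template helper are hypothetical and are not the actual Alfresco URL.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" marks an optional parameter.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\??)\}")


def fill_template(template, **values):
    """Substitute search parameters into an OpenSearch URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")
    return _PARAM.sub(substitute, template)


# A hypothetical template, as it might appear in a description document.
template = ("http://repo.example.com/search"
            "?q={searchTerms}&p={startPage?}&c={count?}&format=atom")
url = fill_template(template, searchTerms="web 2.0", count=10)
```

Since the whole search is just a URL, any browser, portlet or aggregator that understands the template can query the repository without a proprietary connector.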
