Off to Davos and the World Economic Forum

I am currently writing this on the train from Zurich Airport to Davos journeying through amazing Swiss scenery. The flight from Heathrow was completely full and Davos was probably the reason. I saw a few faces that were vaguely familiar, probably because they won the Nobel prize. In the seat behind me was Emma Thompson, which was a thrill for me and looking at all the faces of the guys behind me queuing to get out of the airplane, probably a thrill for them too. Our investor from Accel, Kevin Comolli, was in the front of the plane.

Switzerland
I am cheating and using a picture from last year's trip.

Lakes as large as seas, snow covered mountains, rushing rivers and cows grazing on pastoral land have been rushing by me for nearly an hour. It's hard to believe it has been a year since the last time I saw this gorgeous scenery, but what an amazing year it has been. The insights that I gained from Davos last year helped me understand what was going on in the world and how it affected Alfresco. Davos was where I finally understood Web 2.0 and it is core to where we are taking the company.

This year's theme for Davos is the Power of Collaborative Innovation, a topic near and dear to my heart. Although there is less tech content than last year, I will be particularly interested in how innovation and collaboration can be applied to some of the world's most difficult problems. I am also keenly interested to see where the economy is heading. If this group of people doesn't know, we're in a heap of trouble. As always, I look forward to the workshop sessions and the one-on-one interaction that is unique to Davos.

I am going to try to keep up my blogging while I am here and try a Twitter or three, as well as keeping my Facebook page up to date.

Ooooh. It's starting to snow now...

MySQL Acquisition and Enterprise Software

In a software industry that had little innovation and created obstacles for the next class of rising companies, open source is turning enterprise software on its head. Xen Source, Zimbra and JBoss are now part of larger companies acquiring new technologies and new distribution models by leveraging the power of open source. Now we see that MySQL has been acquired by Sun for $1 billion. Sun has been embracing open source more and more under Jonathan Schwartz's watch as CEO and this can be seen as a logical next step in that strategy.

Marten Mickos, a happy man and a really nice guy.

When we started Alfresco, we came in with the assumption that one of the only things that is working in enterprise software is open source. The past year or so has proven this prediction right. Although it wasn't really my prediction. A meeting with Marten Mickos, CEO of MySQL, in 2002 helped me understand that, yes, open source really could work. Up to that point, I was of the same opinion as Bill Gates, that open source is equivalent to communism. MySQL helped me understand the power of huge numbers of people using software and the value that support can provide to fund the development of professional software. The fact that the model works means that small open source companies can thrive in an environment of behemoths consolidating stacks and actually create an environment of innovation.

David Axmark and Monty Widenius, founders of MySQL

When a category has been around long enough that customers know what they want, then open source works really well. MySQL provided a simple, cost-effective database system that meant that you didn't have to install a big, hulking Oracle, DB2 or SQL Server and, more importantly, you didn't even have to pay for it. You just pay for support. JBoss did the same thing for app servers, Xen Source for virtualization systems and Zimbra for email. Some people question whether MySQL was really innovating. After all, the set of SQL is the same as Oracle had in the early 90s. In reality, there wouldn't be a Web 2.0 or possibly even a Web 1.0 without MySQL. MySQL pioneered the model of Scale Out rather than Scale Up, allowing web properties like Facebook, Google, Yahoo, etc. to scale to levels that were unthinkable in Oracle back in the 90s. JBoss, Xen and Zimbra were doing the same to their respective industries and bigger companies were willing to pay for that.

From our perspective at Alfresco, Sun is a great company to acquire MySQL. Sun has proven their alliance and cooperation with open source. And this doesn't change our plans to become a public company. We have created public companies in the past and we intend to in the future. Our sales of support and development of our community have exceeded our expectations and events like this make us even more determined that an IPO can be successful for the development of the Alfresco system and the Alfresco community.

Congratulations to Marten and team and good luck in the future. We are looking forward to more successful collaborations and joint deployments of Alfresco and MySQL.

Open Source IPOs in 2008

Matthew Aslett at the 451 Group blogged on a Fortune article on IPOs in 2008, noting that 3 of the top 5 are open source including MySQL, Ingres and SugarCRM, all of which are partners of Alfresco and good friends. The 4th is an open source project sponsor, Parallels. The Fortune article noted that even if a recession is coming, that doesn't mean that businesses don't still need to innovate or cut costs.

On MySQL, the company most likely to go first, Matt says:

"MySQL has been talking up its IPO credentials for some time, and a 2008 offering was always more likely than 2007. The plan has not changed as far as The 451 Group is aware. What has changed is that the company has stopped being so open about its financial performance, which is typical of a company preparing to go public. Previously the company publicly claimed revenue of $50m in 2006 and $34m in 2005. Expect an IPO sooner rather than later."

The disruptive nature of open source and its low cost development and distribution model will ultimately thrive in a recessionary environment. The result could be very similar to previous recessions that created much bigger markets for mini-computers, relational databases, PCs, client/server and web-based technologies that ultimately cut costs of using older technology and made people more productive. The same will be true for the current generation of open source tools, applications and technologies. Companies just won't be able to afford using the old technology just because it is there.

That was FAST!

After the Microsoft / FAST acquisition...

Autonomy sign OEM Agreement with EMC

They must have known it was coming. You just don't do OEM deals that fast. So why didn't they announce it the day before yesterday?

Made the Move to Mac

I have been using Windows now for nearly 20 years and PCs for over 25. This October my Dell refused to come out of standby mode, which forced me to reboot every single time I left the building with my laptop. After all those years of blue screens, hanging on large PowerPoint presentations, hanging on network connections, waiting for the laptop to come up when I press the On button, I finally gave up. I ordered a Mac.

I would say half of Alfresco now have Macs. Matt Asay must own Apple shares as he has been the key sales person for all those Macs. The sales organization in the US all have Macs and a lot of the developers are now transitioning to Macs. A lot of our customers are also using Macs. When you look at the pain of transitioning to Vista versus just leaving Microsoft behind, it becomes a much easier decision.

I must say the transition hasn't been too difficult. The first thing you notice is how much faster the Mac is for doing all sorts of things. Coming out of sleep is so instantaneous that it seems like it was on all the time. The user interface takes a little getting used to, but it doesn't look as bad as moving to Vista. Transferring files is much faster. Upgrading to Firefox 3 beta at the same time has made web browsing much faster than before. I am using ChronoSync for synchronizing backups and Vienna as my RSS reader. I haven't decided yet between MS Office for Mac, Apple’s iWork or NeoOffice (Open Office). This is my first blog using my new Mac.

During the last 25 odd years, I used Macs and Unix systems in addition to my PCs and laptops. I have been using Unix for over 30 years now and still can use Vi and write amazing shell scripts. When we started Documentum, my desktop machine was a Mac for writing and formatting the business plan and I owned a Mac SE for home use. I have also used Unix systems, mainly Suns and HP, side by side with my PCs and Macs for a very long time. Before that, we all multitasked on Vaxes and even PDP-11s while I was at Berkeley. I still find that I can do more with c-shell, sed, grep and awk for managing and finding information than I ever could with a drag / drop interface. It's nice to get some of that back.

I actually heard the CIO of a major US government agency say they were considering moving to Macs or Linux. The lock-in of new file formats and features in Office Vista was a concern for them. Between that and the user interface and file format issues of the new Microsoft systems, won’t a lot of people be looking back at the last couple of decades and saying "Why?"

Going Nowhere FAST

I was a little surprised by the announcement today that Microsoft offered to buy FAST, the search engine maker. Surprised because Microsoft claims that they have the whole search thing sorted in SharePoint after hiring a lot of information retrieval talent. And surprised that OEMs dependent on FAST and who compete against Microsoft let it happen. Most notable are EMC with Documentum and Oracle with Stellent.

The press release implies that the purpose of the acquisition is to bolster enterprise sales. In their overview of SharePoint enterprise search, Microsoft states that MOSS enterprise search capabilities provide “enterprise-grade scalability, extensibility, and manageability meet the needs of even the largest organizations.” Jeff Raikes implies that FAST is there to provide the high-end solution, contrary to previous claims. SharePoint could definitely use the performance boost.

Is Microsoft just trying to target the general search industry? Are they trying to block any in-roads that Google is making with Google Appliance? Although Google Appliance is only a side show for Google, it is still one of the largest enterprise search vendors, but it has a limit of 30 million documents on its high end system. FAST originally made their name in internet search, so is Microsoft trying to bolster its Microsoft Live Search, which few seem to like or use out of choice? Are they trying to undermine the ECM industry and their reliance on vendors like FAST for full-text search? My guess is that they are just trying to bridge one of the weak links in their product functionality. This technical note from Microsoft indicates that they have some real issues scaling, with an upper bound of 50 million documents for its index server, and require complex configurations to go beyond that.

So what do the OEM vendors and customers who are competitors of Microsoft do? Ironically, it puts Oracle in a position similar to the one in which it put MySQL when it purchased Innodb. These vendors could do what we have done and use the Lucene open source search engine. We recently performed a benchmark with Unisys demonstrating linear scalability beyond 100 million documents with no inherent blocks to scaling to 1 billion and beyond. Lucene also has related projects such as Solr, Nutch and Hadoop that provide infrastructure for scaling, crawling and distribution. Being open source, it is probably the full-text solution of choice for most people building systems from scratch.

The alternative is to go to Mike Lynch over at Autonomy, who purchased Verity, the engine that software vendors left to go to FAST, especially after EMC Documentum’s decision to OEM the search engine in 2005. Autonomy/Verity still powers the search of a number of other ECM systems. Some are looking at Endeca to provide alternative styles of search that are more aligned with taxonomic search.

Regardless, it would be prudent for FAST’s OEM customers to get off FAST fast. Microsoft is already in a position of locking in a number of the layers that users access in the office environment, from the proprietary hooks in Office to SharePoint to bundled services in the operating system. For the sake of innovation in the future, we should have alternatives to Microsoft for search.

Happy New Year and Happy 3rd Birthday Alfresco

This is more or less Alfresco’s third birthday. More or less because we started Alfresco in earnest in the new year as people were coming back from the holidays. Early 2005 was an exciting time, since we knew we wanted to create an open source enterprise content management system, but we didn’t know exactly who was going to buy it or how the open source model would work. With 2007 just completed, we have learned a lot and the future looks to be just as exciting as our first year. Alfresco is in its third year of exponential growth thanks to all of you who not only downloaded the software but deployed it in tens of thousands of live systems and participated actively in the community.

Every company needs to start with table football.
L-R: Dave, Kev, Derek and Roy in early 2005

The year started by focusing on our community and nothing could have been more important than our decision to move to the GPL license from our previous modified MPL license. With this we made the entire system open source with an OSI approved license and decided not to withhold any features or bug fixes. We would encourage the community with a full feature set and encourage enterprise customers with support and more testing and certification on different platforms, a model that most open source companies are adopting, including MySQL and RedHat. CMO Ian Howells and his team are responsible for getting the world to know about Alfresco with a budget that is a tiny fraction of what anyone else in the ECM industry spends, by building on an open source foundation and helping community development. Ian has hired Nancy Garrity as a community manager and we are in the process of revamping the whole community infrastructure. The result has been a dramatic growth in the community, over a hundred contributions, and our first user community meetings in New York and Paris.

Kevin Cochrane and Paul Holmes-Higgin presenting at the Paris User Conference

Our engineering group, led by VP of Engineering Paul Holmes-Higgin and Chief Architect David Caruana, expanded functionality of our ECM capabilities while providing excellent support for customers and increasing robustness and scalability of the Alfresco system. During 2007, Kevin Cochrane, Britt Park and Jon Cox led the release of our web content management product, although almost all of engineering was involved in the WCM application, runtime or deployment services. WCM has already had a significant impact on the product, the community and our customer base. During 2007, Activision, EA Sports, Harvard Business School Publishing, Kaplan Educational Services and Swisscom launched internet websites on Alfresco. Web Scripts, the brainchild of Chief Architect David Caruana, uses REST as a web-oriented architecture to make it easy to create both mashable user interface components and new data APIs. Web Scripts enabled us to quickly create Microsoft Office extensions and integrate Alfresco into all sorts of environments such as Facebook and iGoogle as well as standard portals. The simplicity of web scripts has also led to a lot more contributions of new functionality to the community, such as the new calendaring functions provided by the London Boroughs of Islington and Camden.

Facebook
Dave Caruana's Facebook enhanced with Alfresco content thanks to Web Scripts

Enterprise sales and support grew dramatically, allowing us to make the Alfresco system available free and open source. Matt Asay finds time between blogs on CNet to sell and hire the rapidly expanding US team. Denis Dorval, previously from FileNet, was promoted to VP of European sales and is expanding a strong partner network here in Europe. The speed with which companies are adopting the enterprise system has surprised even us. I normally find out which new companies have bought an enterprise license, often to my surprise, during our end of quarter review. This meant that we added hundreds of paying customers in 2007 and Helen Dann has been furiously hiring both here in the UK and in Austin, Texas to support them. In addition, our OEM business has been growing very strongly with more companies, such as Ricoh and Quark, incorporating either our lightweight repository or our CIFS capability with the newly GPL’ed JLan engine developed by Gary Spencer.

The coming year is shaping up to take Alfresco into the realm of greater collaboration and social computing as a natural extension of our Enterprise Content Management business. In 2008, we will develop enhanced collaboration features, integrate Web 2.0 and social networking services into our applications, and take Alfresco services to the outside world as “Content as a Service”. The idea behind this is that ECM is no longer about application suites, but about accessing and contributing content wherever it is needed, inside or outside the enterprise. Briana Wherry and her growing team are developing new documentation and training to help you learn more about these new and existing capabilities. We will be expanding our footprint into Europe with more support, marketing and sales in more countries and increasing the depth and breadth of experience in the US.

On this third birthday, I would like to thank all the people of Alfresco, who are now becoming too numerous to name, for their efforts. We are now getting close to seven times the number of people we had at the start. I would like to also thank all the people who have been active in the community and spreading the word about Alfresco and actively contributing to its success, especially people like Russ Danner, Jeff Potts and Ray Gauss. I would especially like to thank the original team that came together in that small room in Maidenhead in January 2005 - John Powell, Andy Hind, Dave Caruana, Derek Hulley, Gavin Cornwell, Kevin Roast, Linton Baddeley, Paul Holmes-Higgin, Roy Weatherall, and Steve Rigby. Thanks for believing.

A Manifesto for Social Computing in the Enterprise

Investment in the infrastructure of the internet has dramatically increased bandwidth to everyone in the developing world and created home computers that are not only inexpensive, but very powerful. This change has expanded the usage of the internet exponentially and introduced new demographics and generations of users that had not used computing prior to the expansion of the internet. These users have themselves created the content and applications that feed the internet and have set expectations of the applications that we use in web browsers and new mobile devices. The increased bandwidth has made this a much more interactive and visual experience encompassing video and visual elements. Web properties such as YouTube, Google, Amazon, Facebook, MySpace, and Flickr have set the benchmark for expression, accessibility and social interaction of computing systems.

Dubbed Web 2.0, this revolution in computing has shifted the face of software from a logical, linear, and introverted science to an expressive, graphical and social art. New designers of web sites, unschooled in traditional software techniques, are nonetheless able to create software that scales to millions of users and billions of objects of information and still meld those users into an artistically aware community. The next generation of enterprise employees, who started using the internet in their early teens, have only known this evolving culture of free and creative development of the internet and now demand better of the enterprise software that they meet. Older employees also know that the software that they use on a day to day basis can be better. Enterprise 2.0 seeks to emulate the success of Web 2.0 in the creation of new software for the enterprise.

Social Computing

The shift of computing power from business logic and calculation to socialization and people-orientation has been dubbed by some as Social Computing. The term Social Computing has been used interchangeably with Enterprise 2.0 or Enterprise Social Applications; however, IBM and Microsoft have created Social Computing research centers and Forrester has started to use the term in describing next generation enterprise collaboration. Social Computing is the use of technology to support sharing of information and enabling collaboration through social networks and to tap into the value of the “Wisdom of Crowds”, a concept made famous by James Surowiecki in 2004 to explain how many people are smarter than individual experts. Social Computing exploits software oriented toward people and Social Networks, the extended relationships of individuals, to connect to more people and access the Wisdom of Crowds.

To tap into the wisdom and awareness of social networks and empower people to collaborate at any time or place, Social Computing platforms need the following capabilities:

  • People - Support information about people, their preferred communications, their relationships and affiliations, since social networking is all about people rather than just systems, data and objects. The more information available about other users, the more likely they can be found as a source of knowledge.
  • Context of Networks - Social networks organized around projects, teams and departments provide the context of work and relevance of information as it spreads from creation to the people that need that information. Social networks, especially networks extended beyond the enterprise, provide the greatest differentiation of social computing from previous generations of collaboration.
  • Social Collaboration - Provide an environment where people can share ideas, contribute knowledge and solve problems in creative, unstructured socialization as opposed to rigorous workflows that are required for control of information. Next generation tools use techniques developed by Web 2.0, particularly those tools that empower social knowledge, such as social tagging, integration of communication and awareness of changes in social networks.
  • Content as a Service - Content is the container of knowledge and information and is core to the socialization of information. Content needs to be accessible everywhere, not just in large, monolithic applications. Content capabilities need to be accessible as reusable service components. Social computing can happen inside the enterprise or outside and a channel can be a web site, web application, mobile device or even external web platforms such as Facebook or Google applications. Mashups can occur inside the enterprise or outside and the channel will require content as a service that can securely be accessed wherever it is needed or wherever it is contributed.
  • People-centric Tools - As Web 2.0 has spread new paradigms of user interaction, the consumerization of software has created expectations that enterprise software becomes easier and empowers users to contribute, correct and classify content and information within the context of social networks. AJAX and next generation rich internet application interfaces such as Adobe Flex will provide users with a much richer, more intuitive user experience and the ability to scan much more social knowledge to find ideas and solutions. These tools should themselves be componentized and accessible as a service so that they may be mashed up with other sources of social knowledge.

This does not mean that the need for traditional enterprise content technologies such as document and records management goes away. They are still repositories of the truth and verifiable information and thus play an important role in sharing knowledge within social networks. However, these traditional technologies lack the usability, empowerment, and breadth of reach that Web 2.0 sites provide. They lack the collaborative nature that invites in people without barriers and restrictions to contribute to the sharing of knowledge and information. Web content management for creating a richer Web 2.0-style user interface becomes even more important to this collaboration to provide a compelling face to the interaction and to simplify the access and navigation of shared information. Enterprise Content Management cannot become one of the principal platforms of Social Computing unless it addresses the requirements of Social Applications.

Use of Social Computing

The balance is shifting from contained and controlled companies to engaged and empowered collaborative enterprises driven by Web 2.0-inspired social computing. At the center of the shift from old models of computing in the enterprise to new social models are companies that are inspired to innovate or to engage more with their customers. This includes companies not just using their internet or intranet web sites, but engaging in social networking channels such as Yahoo, Google, YouTube, Facebook and MySpace. Those using social computing are interested in engaging people, such as customers, employees or partners. They are using new people-centric tools and facilitating the creation or extension of social networks.

Major ECM vendors are all planning their Social Computing efforts and to a large extent are being dragged in this direction by their more forward-looking customers. Enterprises that have discovered the value of Social Computing are:

  • Consumer-oriented companies that particularly address a younger demographic must engage their customers as part of both the marketing process and the development of new products. For example, games and film companies that engage their viewers in plot and scene development do much better than those that keep everything under wraps until the game or film is ready.
  • Enterprises hiring a new generation of knowledge workers who grew up on the internet must provide tools as empowering as those available from Web 2.0. Turning these tools off, or forcing these workers to use tools that do not meet their expectations of usability and engagement, drives them to seek employment elsewhere.
  • Financial Services firms are leading the shift in usage of these technologies. Financial Services have always been innovators in developing new technologies and investing in providing better service for their clients. Speed in innovation in these services becomes a major competitive advantage where churn of clients can be very high in turbulent times. Internally, competition for talent is intense and providing better support is important for attracting and retaining employees. In particular, young and ambitious brokers and managers are more likely to be sociable themselves and seek out Social Computing inside and outside the enterprise.
  • Government and Non-Profit organizations that provide services and citizen feedback online find increasing their IT budgets much easier than those that are merely arbitrated by a front-line service. It is now inconceivable for an American politician to run for office without an extended internet presence such as Facebook or YouTube.
  • Enterprises that have faster cycles of product innovation, especially high tech, are looking to their customers and partners to participate in the development of new products and services. In previous generations, the field acted as a filtering mechanism of new customer requirements and ideas. However, today technology can provide a frictionless way of getting the entire enterprise to exchange ideas and improvements with the customer communities.

Integrating Social Computing

Because Social Computing is unlikely to come from a single source, especially because of the diversity of sources of knowledge and social networks available on Web 2.0, it is extremely important for the enterprise infrastructure for Social Computing to be integrated with those sources. This means bringing these sources into the enterprise and bringing the enterprise sources out to Web 2.0. No matter where the people collaborating are, the tools they want should be available. To facilitate this, the Social Computing platform should be:

  • Open Source - By being developed through social computing paradigms and sharing best of breed components with the open source community, open source systems have evolved rapidly and encompass social computing capabilities developed by that community. Social tools such as MediaWiki, the wiki that powers Wikipedia, and WordPress, the most popular blogging software, were developed using open source.
  • Integrating the Inside Out - By providing content as a service and enabling light-weight, Web-Oriented scripting development, the Social Computing platform should quickly integrate content services into external channels and web sites, such as Facebook and iGoogle, to allow enterprises to engage customers, partners and home workers.
  • Integrating the Outside In - If the Social Computing platform is modular and supports a Web 2.0-style mashup-oriented architecture, it enables users and teams to integrate external open source tools and social networking web services, such as Facebook, LinkedIn or other Open Social-enabled properties, to tap into the wisdom of crowds available on the internet and to make customers and partners part of team collaboration.
  • REST-style Architecture - A Web 2.0-style or REST-style of architecture using easy, light-weight scripting languages and integrated through internet standards-based APIs can easily mash-up content services into any web-oriented application or web site. These architectures should be scalable, fault-tolerant and high performance to meet any enterprise or internet requirement.
  • Choice - The Social Computing platform should be based upon open interfaces developed by the open source community to provide choice of operating system, database, application server, content authoring tools or APIs.
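To make the REST-style point above a little more concrete, here is a minimal sketch in Python of a content service that exposes each piece of content as a resource at its own URL and returns it as JSON. This is not Alfresco's Web Scripts API. The CONTENT store, the /content/<id> path and the get_content and serve names are all hypothetical, chosen only for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store standing in for a real content repository.
CONTENT = {
    "42": {"title": "Q1 plan", "tags": ["planning", "finance"]},
}


def get_content(path):
    """Map a resource path such as /content/42 to (HTTP status, JSON body)."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0] == "content" and parts[1] in CONTENT:
        return 200, json.dumps(CONTENT[parts[1]])
    return 404, json.dumps({"error": "not found"})


class ContentHandler(BaseHTTPRequestHandler):
    """Answers GET requests by delegating to get_content."""

    def do_GET(self):
        status, body = get_content(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the service on localhost (blocks until interrupted)."""
    HTTPServer(("localhost", port), ContentHandler).serve_forever()
```

Because the resource is just a URL returning JSON, any web page, portal or external application that can issue an HTTP GET can mash the content into its own interface, which is the property the list above asks for.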

Over the past year and continuing into the coming year, Alfresco is dedicated to expanding its architecture and applications to enable this vision of Social Computing. We will work with partners and the open source community to provide best of breed open source tools for enabling this architecture. We will integrate with external Social Computing properties such as Facebook and the Open Social alliance to expand the breadth of social networks and the ability to collaborate through those networks. We will be expanding the Alfresco system’s understanding of users as people and facilitate sharing of information and content through their networks. We will be open in the process and seek and encourage your feedback and participation.

Access to the Alfresco web site

This weekend there was a problem with our domain registrar, Reg-123, and the sites for which they provide domain services. This means that anyone trying to access our web site via web access or email cannot find our servers. The Alfresco servers are fine and running, but the alfresco.com domain is not.

If you want to reach anyone in Alfresco by email, you can do so by using <email address>@alfresco.org instead of <email address>@alfresco.com.

To access the web site, please use the following URLs:

We will try to get this resolved as quickly as possible. Rest assured we will take steps to make sure this never happens again.

Scaling Out Like Technorati

My fellow World Economic Forum Technology Pioneer, David Sifry, the founder of Technorati, was also in Dalian, China for the “Meeting of New Champions” or “Summer Davos” as the Chinese like to call it. During Davos in January, I had the great misfortune of pitching Alfresco against Technorati in a competition between tech pioneer companies. As fantastically well as Alfresco is doing, Technorati has the temerity to compete against Google in blog search and win.

I got the chance to talk to Dave during the conference and ask him some questions on the technology and architecture behind Technorati, the internet blog search site. I thought that someone who could take ordinary computer components and build a huge internet architecture could possibly teach something to people running enterprise architectures that are puny in comparison.

Technorati is a web site that tracks blogs, pictures and any user generated content and allows you to search those sites about what people are thinking, seeing and hearing. When a new or urgent situation breaks out, you can do worse than to search Technorati for immediate reaction. Every day, every hour, every second, Technorati is indexing over 10 million blogs with over 10 billion objects. Technorati’s user base is doubling every six months and quick and accurate response is critical for retaining those users.

David Sifry, Founder and Chairman of Technorati

I asked Dave about his architecture and what applicability there might be for enterprise architectures.

John Newton: In building Technorati, what were some of the issues that you had in architecting your systems?

David Sifry: I was looking at just temporal information. I had no idea how big it could get. When I looked at the architecture, instead of architecting it right, I architected it for right now. I had no big budget and I didn’t want to wait six months to build it. Also, I had no idea what the killer app would be.

I focused on data flexibility. At the time, that meant putting everything into a relational database. That was okay while the size of the indexes was less than RAM and about a million blocks of data. That was okay while there were fewer than 20 million blogs.

The next generation took advantage of data parallelism. That meant upon update send a signal to all the other systems. We expanded the data over several “shards” [segments of data partitioned on different databases on separate machines].

What was surprising was that we were writing as much data as we were reading. At this point Technorati was as big as some of the biggest OLTP systems. Even so, maintaining data integrity was important, because you wouldn’t want the link count [count of how many other blogs point to a particular URL] to be out of sync. This put real pressure on the system. At the same time, we realized that time was a more important dimension than URL. People didn’t want to sort or search on URL, they wanted to search on time. [i.e. what are the latest blogs on a particular subject?]

By this point, we understood the application more and more. The app [Technorati] is about real time access. You need to be able to count on finding the latest information on a subject. That’s when we built the third architecture. Scaling was well understood and we built the shards on time rather than on URLs. Instead of putting data into a DBMS, we put it into special purpose databases. It was more of a bus-based architecture. Each database could be scalable and grow as big as we needed.
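To make the difference between the second and third architectures concrete, here is a minimal sketch of the two ways of choosing a shard that Dave describes. It is my own illustration, not Technorati's code: the function names, the shard count and the choice of one shard per day are all assumptions.

```python
import hashlib
from datetime import datetime, timedelta, timezone

NUM_URL_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def url_shard(url, num_shards=NUM_URL_SHARDS):
    """Shard on URL: a stable hash of the URL picks one of a fixed set of shards."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def time_shard(posted_at):
    """Shard on time: every post from the same UTC day lands in the same shard."""
    return posted_at.astimezone(timezone.utc).strftime("%Y-%m-%d")


def shards_for_latest(now, days):
    """Names of the day-shards that a 'what is the latest?' query has to read."""
    today = now.astimezone(timezone.utc).date()
    return [(today - timedelta(days=i)).isoformat() for i in range(days)]


if __name__ == "__main__":
    posted = datetime(2007, 9, 8, 12, 0, tzinfo=timezone.utc)
    print(url_shard("http://example.com/post"))  # some shard number from 0 to 3
    print(time_shard(posted))
    print(shards_for_latest(posted, 3))
```

With URL-based shards, a query for the newest posts on a subject has to ask every shard, because recent posts are scattered across all of them. With time-based shards, the same query only touches the few most recent shards, and growth is handled by adding new shards at the front.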

JN: The notion of shards - did you call it that at the time? I have been looking into shards and I was only aware of or heard of them for about the last year.

DS: Back in 2002 when we were pitching this to VCs, I at least explained the theory. I just thought through the problem carefully. Doing it this way, we could add hundreds of systems, lots of cheap CPUs, RAM and disks. It provides inherent parallelism. I can’t believe that I was the first one to think this up.

JN: How big does this architecture scale?

DS: We are loading one terabyte a day into Technorati. That’s 100 million blogs or about 10 billion objects. A lot of it is new types of tagged data. There are about a half billion videos and photos.

With all that data, you have to think about what do you throw away?  We can’t really delete anything, because we are potentially losing an asset. We don’t delete anything. So we take data out of the spin cycle. [Transitory data used in preparation.] We take the long-term data and put it into low latency storage.

When data is doubling in size every six months, that means that only one quarter is a year old. We don’t need to worry about old data.
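The one-quarter figure follows from simple arithmetic: two doublings in a year means the total is four times what it was a year ago, so everything that old makes up one quarter of today's data. A small sketch of that calculation, assuming steady exponential growth and that nothing is deleted (the function name is mine):

```python
def fraction_older_than(age_months, doubling_months=6.0):
    """Share of today's data that is at least age_months old.

    Assumes the total doubles every doubling_months and nothing is deleted,
    so the total age_months ago was today's total times 2 ** (-age / doubling).
    """
    return 2.0 ** (-age_months / doubling_months)


if __name__ == "__main__":
    print(fraction_older_than(6))   # half the data is at least six months old
    print(fraction_older_than(12))  # a quarter is at least a year old
    print(fraction_older_than(24))  # a sixteenth is at least two years old
```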

JN: How do you deal with large number of users with very large data sets?

DS: Any off-the-shelf tool falls over. There is a lot of interesting analysis on old data, but no off-the-shelf tools can handle that much data. It’s only just now that some tools can handle it.

JN: What are those tools?

DS: One is Greenplum by a bunch of O’Reilly guys. If you use ordinary data warehouse tools, they would just scream and shout.

JN: Actually what I was originally referring to was the fact that you are showing lots of data to users who are not used to enterprise information management tools. How do you present this information to consumer-level users? How do you deal with the user interface and visualization of all this data?

DS: Gotcha. It depends on what the user wants to get out of Technorati. If the user wants search results, then we give it to them. Sometimes they want to browse or discover information. We have spent a lot of time on visual design. Then we give them lots of bright, shiny things for them to click on.  Things like metadata, video or other links.

We have used enterprise class web tools to analyze what users are doing. We look at the click stream and see what is successful or not. That helps to make the information contextual.

One of the big mistakes that we made was not doing this [buy click stream analysis tools] sooner. It was only $80K. Up to that point it was so much trial and error. I’m glad we finally did it. Now we can see how much time a user spends on a feature. We can see page views, goals per visitor.

JN: So what do you measure on Technorati?

DS: Measuring a web site is like forecasting the weather. Yesterday it was sunny and today it is cloudy. Why is it cloudy? Sometimes you have no idea. Sometimes you realize that a change in barometric pressure has a lot to do with it.

We look at the number of newbies, number of reports, session lengths and then measure them against prior periods. It’s not always consistent.
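As a rough sketch of what measuring against prior periods can look like, the snippet below computes the percentage change of each metric between two periods. The metric names and figures are invented for illustration and are not Technorati's numbers or tooling.

```python
def percent_change(current, prior):
    """Percentage change from the prior period to the current one."""
    if prior == 0:
        raise ValueError("prior period value must be non-zero")
    return (current - prior) * 100.0 / prior


def compare_periods(current, prior):
    """Percentage change for every metric present in both periods."""
    return {
        name: percent_change(value, prior[name])
        for name, value in current.items()
        if name in prior
    }


if __name__ == "__main__":
    # Hypothetical weekly figures, made up for this example.
    last_week = {"new_visitors": 1000, "avg_session_seconds": 100}
    this_week = {"new_visitors": 1200, "avg_session_seconds": 90}
    for name, change in compare_periods(this_week, last_week).items():
        print(f"{name}: {change:+.1f}%")
```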

I had never built a B2C site before. I just focused on me, on what I wanted. That worked well for a while when I was the target audience. But we have to build for a broader audience.

JN: At Alfresco, we measure conversions. Are you measuring things like performance? Does that affect retention of users?

DS: Of course, but if the system is falling down, then even performance doesn’t matter. So I don’t get too stressed out about it.

JN: When we met at Davos you wanted to move Technorati to be the Internet Now! Is that still the case?

DS: Everything is shifting. I wanted it to be a site that everyone is able to use. We forgot about the core users that just wanted to find out about blogs and any real time information. In an attempt to jump the chasm, we chased after 100 million users and tried to be everything to everyone. Now we try to make blogs and user driven content available for those looking for that.

Also performance has improved significantly. Now I notice how slow other sites are. This is a total tribute to the engineering team. Everything is easier and faster.

Pretty soon we will have a whole lot of stuff that we have been working on for a year.

JN: Can you say what it is?

DS: I don’t pre-announce.

JN: What does the Technorati brand stand for today?

DS: Good question. What’s popping up now on the internet, especially user generated content? It’s about users tagging user generated content and finding it.

JN: Who are your competitors?

DS: I probably sound like the typical entrepreneur, but nobody really seriously. Google provides blog search, but other than that nobody really. Other people are trying to identify and tag information like Digg and del.icio.us, but they aren’t really competition.

JN: What do you want Technorati to be in two years’ time? Five years would be ridiculous.

DS: I would like Technorati to be a profitable business that is strongly differentiated. It will be the place that you would go for mobile, RSS or push information. For all that you would come to Technorati.
