I was a little surprised by the announcement today that Microsoft has offered to buy FAST, the search engine maker. Surprised because Microsoft claims to have the whole search thing sorted in SharePoint after hiring a lot of information retrieval talent. And surprised that the OEMs that depend on FAST and compete against Microsoft let it happen. Most notable are EMC with Documentum and Oracle with Stellent.
The press release implies that the purpose of the acquisition is to bolster enterprise sales. In their overview of SharePoint enterprise search, Microsoft states that MOSS enterprise search capabilities provide “enterprise-grade scalability, extensibility, and manageability meet the needs of even the largest organizations.” Jeff Raikes implies that FAST is there to provide the high-end solution, contrary to previous claims. SharePoint could definitely use the performance boost.
Is Microsoft just trying to target the general search industry? Are they trying to block any inroads that Google is making with Google Appliance? Although Google Appliance is only a side show for Google, Google is still one of the largest enterprise search vendors, but its high-end system has a limit of 30 million documents. FAST originally made its name in internet search, so is Microsoft trying to bolster Microsoft Live Search, which few seem to like or use out of choice? Are they trying to undermine the ECM industry and its reliance on vendors like FAST for full-text search? My guess is that they are just trying to shore up one of the weak links in their product functionality. This technical note from Microsoft indicates that they have some real issues scaling: the index server has an upper bound of 50 million documents, and going beyond that requires complex configurations.
So what do the OEM vendors and customers who are competitors of Microsoft do? Ironically, it puts Oracle in a position similar to the one that it put MySQL in when it purchased InnoDB. These vendors could do what we have done and use the Lucene open source search engine. We recently performed a benchmark with Unisys demonstrating linear scalability beyond 100 million documents, with no inherent blocks to scaling to 1 billion and beyond. Lucene also has related projects such as Solr, Nutch and Hadoop that provide infrastructure for scaling, crawling and distribution. Being open source, it is probably the full-text solution of choice for most people building systems from scratch.
The alternative is to go to Mike Lynch over at Autonomy, which purchased Verity, the engine that software vendors left in order to go to FAST, especially after EMC Documentum’s decision to OEM the FAST search engine in 2005. Autonomy/Verity still powers the search of a number of other ECM systems. Some are looking at Endeca to provide alternative styles of search that are more aligned with taxonomic search.
Regardless, it would be prudent for FAST’s OEM customers to get off FAST fast. Microsoft is already in a position to lock in a number of the layers that users access in the office environment, from the proprietary hooks in Office to SharePoint to bundled services in the operating system. For the sake of innovation in the future, we should have alternatives to Microsoft for search.