The dreaded question plagues discovery vendors and IT, and even industry experts shy away from tackling the costs and complexities created by emerging unified communications systems. Office Communications Server 2007 and other communication systems feed divergent media streams into enterprise archives, corporate legal hold repositories and litigation collections. This ‘simplification’ for the users actually poses serious challenges for search technologies that have traditionally focused exclusively on text.
Finding key terms and phrases buried inside mountains of recorded phone conversations, voice mails and IM chats can devour discovery budgets and send counsel crying ‘undue burden’ to the court. There seem to be two dominant speech analytics methods: phonetic indexing (first brought to eDiscovery by Nexidia) and transcription or speech-to-text (long dominated by Autonomy’s engine, which supports both methods). Phonetic search renders sound waveforms into simplified strings of phonemes that can be indexed and searched. This makes the technology effectively content agnostic, but it also makes it challenging to integrate with text-based search. Speech-to-text has been the foundation for automated conceptual search, and improvements in speaker recognition technologies have also increased the value of what was effectively raw dialogue.
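To make the phonetic indexing idea concrete, here is a toy sketch in Python. It assumes the audio has already been rendered into phoneme strings (the ARPAbet-style symbols, recording names and function names below are all made up for illustration), and it is not how Nexidia or Autonomy actually do it. Commercial engines score approximate matches against the audio itself, while this sketch only shows why a string of phonemes can be indexed and searched like any other token stream.

```python
from collections import defaultdict


def ngrams(phonemes, n=3):
    """Return every run of n consecutive phonemes as a tuple."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def build_index(recordings, n=3):
    """Map each phoneme n-gram to the set of recording ids that contain it."""
    index = defaultdict(set)
    for rec_id, phonemes in recordings.items():
        for gram in ngrams(phonemes, n):
            index[gram].add(rec_id)
    return index


def search(index, query_phonemes, n=3):
    """Return ids of recordings that contain every n-gram of the query.

    This is a candidate filter only: it does not check that the n-grams
    appear next to each other, and it does no fuzzy matching.
    """
    grams = ngrams(query_phonemes, n)
    if not grams:
        return set()
    hits = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        hits &= index.get(gram, set())
    return hits


# Hypothetical phoneme strings; a real system would derive these from audio.
recordings = {
    "voicemail_01": "DH AH M ER JH ER K L OW Z D".split(),  # "the merger closed"
    "call_02": "S EH N D DH AH R IY P AO R T".split(),      # "send the report"
}

index = build_index(recordings)
print(search(index, "M ER JH ER".split()))  # query for "merger"
```

Notice that the query never touches a dictionary or a transcript, which is what makes the approach content agnostic. It is also why blending these hits with ordinary keyword results takes extra work: the text index and the phoneme index do not share a vocabulary.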
Knowing that one can search digitized conversations, the next question is whether users can effectively search everything within the enterprise system from a unified federated search. There is little doubt that the archiving vendors are aggressively pursuing acquisitions, partnerships and development to enable ingestion and indexing of every conceivable data stream. All of them started with email back in the late 1990s. For example, Symantec doesn’t have audio, but jumped ahead with early products to handle IM, file shares and SharePoint through merger and acquisition.
Mergers inject complexity by requiring integration of technology, services and cultures. Having experienced the Symantec-Veritas integration firsthand, I am always surprised to see pundits and bloggers jumping all over the expected personnel departures and other restructurings that occur in the wake of big M&A moves. Instead of looking at who departed, which products are End-of-Lifed and which partners jump ship, I think that we should look at who stays, fundamental IP integrations and new solution offerings to get a better idea of where the new joint entity is headed.
As a former customer of Zantaz’s Introspect, I watched press releases closely after the Autonomy acquisition last July. I was surprised to see how quickly all of Zantaz’s products were integrated into the IDOL search architecture. A two to three month integration tells me that the business units got the resources and investment needed to make necessary changes under the hood. The back channel continues to be positive. For example, after speaking with two big Introspect customers at a conference last week, it was clear to me that Autonomy had greatly improved search speed and performance. The back end administration and database complexity still seem to be an issue with some customers, but systems that offer enterprise scale will require enterprise level investments in architecture, support and consistent project management to succeed. Perhaps that is at the root of the shift within the Zantaz channel to go up market and target larger sales?
If we are going to demand that large public corporations make all of their communication data streams accessible and manageable to respond to the new FRCP requirements, then we have to expect further consolidation in the maze of different applications used today. Discovery, ILM, Retention Management, Enterprise Content Management and all the other flavors of alphabet soup are just ways of saying that companies are responsible for administering their information assets, including formerly transient forms such as audio, IM, PIN-2-PIN, etc.