Archive for May 2008

Data Warehouse Appliances - Apples to Apples (update)

I’ve updated the data warehouse appliance spreadsheet, adding two new solutions (SAND & Calpont).  I’ve also updated information on several existing vendor solutions.

Buzzword: “Single Version of the Truth”

I was once a firm believer in the SVT concept, and I still believe in the fundamental principles.  But it’s not as cut and dried as people make it out to be.

On its face, an SVT seems to be a noble goal.  Who doesn’t support the truth, right?  The problem is, who defines what the truth really is?  In this case “the truth” may be different depending on your audience.  I’ll give you a simplistic example: who is your customer, where are they located, and how valuable are they to your company? If you’re in sales, the customer is the one who made the decision to buy your company’s product and pays the bill.  If you’re in engineering or customer support, the customer is the person using your product and requesting service enhancements or technical support.  That’s the easy part.  Determining the value of the customer is more difficult, and involves a number of variables that may or may not apply depending on where you sit within the organization.

These are certainly not insurmountable issues by any stretch, but they underscore the nuance required in getting to the “truth”.

Analytics in the Cloud

I attended a webinar today sponsored by Amazon and Vertica called “Data Analytics in the Cloud”.  The Vertica portion was mostly a duplicate of a prior webcast, but the Amazon portion on the Cloud concept was very interesting.  The key points of the cloud concept as I see it are:

  • Pay as you go model - you only pay for the disk space and processing you consume. No start-up costs, but you have to sign a contract. (They claimed the cost was 1/2 that of an in-house solution)
  • Time to market - hours instead of weeks to turn up a terabyte sized system, including hardware, OS, and the Vertica column based database
  • On-demand scalability - seamlessly scales to meet your demand
  • Proven platform - hosted by Amazon on the same platform that hosts the Amazon.com site

I think the benefits of this approach are obvious, especially for a small but rapidly growing operation.  The infrastructure and software license costs alone would be prohibitive, and time to market is critical especially when launching a new idea.

The downsides include:

  • Security concerns, especially for highly sensitive customer data
  • Performance - both in terms of loading large amounts of data and in real-time queries
  • Long-term cost - as with any usage based cost model, the upfront savings could be surpassed by subsequent usage fees
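
To make the long-term cost point concrete, here is a rough break-even sketch.  The function name and every dollar figure are my own illustrative assumptions, not numbers from the webinar; the idea is simply to find the month in which cumulative usage fees overtake the cumulative cost of an in-house build.

```python
def break_even_month(upfront_cost, in_house_monthly, cloud_monthly):
    """Return the first month in which cumulative cloud spend exceeds
    cumulative in-house spend, or None if it never does.

    All inputs are hypothetical planning figures in the same currency.
    """
    if cloud_monthly <= in_house_monthly:
        # Usage fees never catch up with the up-front investment.
        return None
    month = 0
    in_house_total = float(upfront_cost)
    cloud_total = 0.0
    while cloud_total <= in_house_total:
        month += 1
        in_house_total += in_house_monthly
        cloud_total += cloud_monthly
    return month


# Hypothetical figures: $120,000 of hardware and licenses up front plus
# $2,000/month to run in-house, versus $7,000/month in usage fees.
print(break_even_month(120_000, 2_000, 7_000))
```

If the break-even month lands well beyond the expected life of the system, the usage-based model wins; if it lands inside the first couple of years, the upfront savings deserve a second look.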

Buzzword: “ODS”

The Operational Data Store (or ODS) is classically defined as a physically integrated view of all or part of the transactional data environment.  The term is generally used in conjunction with a Data Warehouse (DW) and Data Marts (DM) to form the analytical data architecture triumvirate.

The ODS typically distinguishes itself from the DW and DM in two ways:

  1. Latency - the ODS is generally populated more frequently than a DW or DM, and newer systems offer near-real time access to underlying transactional data, either via virtual data integration or trickle feeds that populate the ODS on a continuous basis.
  2. Data structure - the ODS is typically a more normalized model than a DW or DM.  This facilitates lower latency refreshes as the model more closely matches the transactional system.  This also supports the type of reporting and data distribution methods typically seen with an ODS, e.g., spreadsheet-like operational reports, or data feeds to a customer care application.
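
As a minimal sketch of the trickle-feed idea, the snippet below applies small batches of change records to an in-memory table keyed by record id.  The change-record layout (`op`, `id`, `data`) and the function name are assumptions made for illustration; a real feed would read from a message queue or change-data-capture log and write to a database.

```python
def apply_changes(ods_table, changes):
    """Apply change records to an ODS table held as {id: row_dict}.

    Each change is a dict with hypothetical keys:
      op   - "upsert" or "delete"
      id   - key of the transactional record
      data - column values to merge in (upserts only)
    """
    for change in changes:
        if change["op"] == "delete":
            ods_table.pop(change["id"], None)
        else:
            # Insert the row if it is new, otherwise merge the changed columns.
            ods_table.setdefault(change["id"], {}).update(change["data"])
    return ods_table


ods = {}
# Two small batches arriving minutes apart, rather than one nightly load.
apply_changes(ods, [{"op": "upsert", "id": 1, "data": {"status": "open"}}])
apply_changes(ods, [
    {"op": "upsert", "id": 1, "data": {"status": "closed"}},
    {"op": "upsert", "id": 2, "data": {"status": "open"}},
])
print(ods)
```

Because the ODS model stays close to the transactional model, each change maps onto a single row, which is what keeps this kind of continuous refresh cheap.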

In addition to providing the DW/DM with a clean integrated view of the transactional environment, the ODS directly supports business groups such as Customer Care.  Integrating care applications with the ODS allows for richer customer data for screen pops, real-time insight into multiple communication channels, and access to all products and services for a customer.

That being said, the ODS as it was defined 10 years ago is dying, and is being replaced by EII technology that combines virtual and physical data integration with a meta data layer providing end users with a deeper understanding of the data.

Buzzword: “EII”

Enterprise Information Integration is a hybrid of service oriented architecture (SOA), enterprise application integration (EAI), virtual data integration, and physical data integration, with a little meta-data management thrown in for kicks.  The concept is to throw a layer on top of physical data storage, in order to provide a single interface to end users or other applications. (Wikipedia entry)

This layer generally includes interfaces into physical data stores, message buses, and other sources of data, with a meta data component to tie it all together.  A conceptual data layer is then defined which is modeled based on the consumer desired view of the world.  The final piece is the interface methods for end users or applications to access the conceptual information view.

That’s the technical view of EII, but the real business benefit is in providing real or near-real-time access to business information without having to navigate the underlying data stores.  The value is realized on two levels:

  1. Integration of data elements across disparate systems - this is the grunt work associated with mapping a customer name between two systems when one stores the name in a single field and the other stores it in three separate fields (First, Middle, Last)
  2. Providing contextual understanding of information - this is where the meta-data comes into play, by providing the end user with background and additional meaning to the information
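
The name-mapping grunt work in the first point might look something like the sketch below.  The record layouts (`customer_name` in one system; `first`, `middle`, `last` in the other) and the function names are hypothetical, and the split logic naively assumes whitespace-separated names, so treat it as an illustration of the problem rather than a matching algorithm.

```python
def split_full_name(full_name):
    """Split a single-field name into (first, middle, last).

    Naive assumption: the first token is the first name, the last token
    is the last name, and anything in between is the middle name.
    """
    parts = full_name.split()
    if not parts:
        return ("", "", "")
    if len(parts) == 1:
        return (parts[0], "", "")
    return (parts[0], " ".join(parts[1:-1]), parts[-1])


def join_name(first, middle, last):
    """Rebuild a single-field name, skipping empty components."""
    return " ".join(p for p in (first, middle, last) if p)


def same_customer(billing_record, support_record):
    """Compare a single-field record with a three-field record."""
    a = join_name(*split_full_name(billing_record["customer_name"]))
    b = join_name(support_record["first"],
                  support_record["middle"],
                  support_record["last"])
    return a.lower() == b.lower()


print(split_full_name("Mary Ann Smith"))
print(same_customer({"customer_name": "jane  q. doe"},
                    {"first": "Jane", "middle": "Q.", "last": "Doe"}))
```

Real names break these assumptions quickly (suffixes, multi-word surnames, initials), which is exactly why this mapping is grunt work.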

Of course there are a number of companies claiming to provide a complete EII solution, but in my mind a true EII solution is too broad for any one product.  It should be treated as a business solution by starting with the benefits and working back to the appropriate technologies required to deliver those benefits in the most cost effective manner possible.

Buzzword: “Data Warehouse”

I almost didn’t bother with this one, since it’s almost too generic now to be useful, but it does deserve a few sentences if nothing more than to pay respects.  (Wikipedia has a good definition and some of the history around the term if you want background.)

Nowadays, I find that the term is hardly used anymore, probably because of the proliferation of more specific terms that describe the individual components (e.g., ETL, data quality, EII).  I think another reason is the move to more real-time analytics, and the term “data warehouse” conjures up visions of static information sitting in an Oracle (or Teradata) database. 

Ten years ago, all you had to do was say I’m building a “data warehouse” and most people knew what you were talking about.  Now, it could mean a dozen different things, which makes communicating more difficult.  It would be nice to have all of this wrapped up into one nice term that everyone can agree upon, but I doubt that’s going to happen, which is actually a good thing.  It means that people (both business and technical) realize that data driven solutions are not one-size-fits-all, and that there are a myriad of implementation options available.

The “Data Warehouse” isn’t dead, it just lives on in its numerous children and grandchildren.

Buzzword: “Data Quality”

Data Quality - everyone wants it, and everyone complains that they don’t have “good quality data”.  But how do you define data quality? What are the business benefits associated with the investment required to improve the quality of corporate data? Those are the questions you should be asking when approached by an angry business user complaining they can’t do their job because their data source(s) stink.

I think the most common misconception around DQ is that it’s an all or nothing proposition.  In reality there’s a cost-benefit analysis required to determine the payback associated with improving data quality.  Raising the data quality bar has a cost, and unless you can justify the expenditure you’re wasting corporate resources.

At one end of the spectrum, the business case is a simple exercise in comparing the cost of automation with the current cost of the manual labor required to fix or work around incorrect data elements.  For example, it doesn’t make sense to spend a half million dollars implementing a data quality technology solution to save a couple of hours a week of a business or data analyst’s time.  On the other end of the spectrum are strategic implications such as financial reporting and risk management, where the reputation of the company is at stake (just ask Fannie Mae).
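
The simple end of that spectrum is back-of-the-envelope arithmetic.  In the sketch below, the half million dollars and the two hours a week come from the example above; the $75 fully loaded hourly rate and the function name are my assumptions.

```python
def payback_years(solution_cost, hours_saved_per_week, hourly_rate,
                  weeks_per_year=52):
    """Years needed for labor savings to repay the cost of a solution."""
    annual_savings = hours_saved_per_week * hourly_rate * weeks_per_year
    if annual_savings <= 0:
        return float("inf")  # no savings means the cost is never recovered
    return solution_cost / annual_savings


# $500,000 solution, 2 analyst hours saved per week, assumed $75/hour.
years = payback_years(500_000, 2, 75)
print(f"Payback period: {years:.1f} years")  # Payback period: 64.1 years
```

A payback period measured in decades answers the question on its own.  The strategic cases are harder because the benefit is avoided risk rather than saved hours.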

Look at data quality as a bar that you raise and lower based on cost, business benefit, risk tolerance, and other factors that are important to the corporation.

Buzzword: “Business Intelligence”

The term “Business Intelligence”, or just BI, has been used and abused so much that it has nearly as many personalities as Herschel Walker.  And that’s just within the context of the analytics and data management community, never mind the legions of people who associate it with corporate espionage.

Within the analytics world, BI has taken on (at least) the following definitions:

  • The process of utilizing data for making better business decisions.  Sometimes used interchangeably with business or corporate performance management (originally coined by Gartner analyst Howard Dresner) 
  • Reporting and dissemination of data from a data warehouse
  • All systems required for collecting, integrating, cleansing, and reporting of data
  • Software tools that extract data from a repository (database or otherwise) and present to a user in various formats
  • Metrics used to measure business performance

So when someone uses the term BI, make sure you understand the context of the discussion (and the person’s background) so you’ll know which alter ego you’re conversing with.

Data Integration Architecture

In the “causes of failure” department, the lack of a sound architecture is a close second to the lack of a sound strategy.  And it is second only because the go/no-go decision should be made in the strategy phase, where a number of failed projects should have been weeded out before they ever got started.

I won’t bore you with the details of a data integration (or data management, data warehouse, etc…) architecture - you can download my white paper if you want to dive deeper. First I’ll touch on the key elements of a good architecture, then list the top 3 mistakes:

Key Elements:

  1. Data acquisition - this includes identification of sources, integration approach (virtual vs physical), data quality processing, and any other step needed to gather data and prepare it for use in an analytical capacity.
  2. Data storage - Physical storage includes the old standbys (operational data stores, data warehouse, data marts, etc…) although these distinctions are blurring.  Virtual storage includes enterprise information integration (EII) and other methods that provide a virtual view of data across disparate underlying physical stores.
  3. Delivery - encompasses all forms of information dissemination, including traditional forms of business intelligence such as reporting, OLAP, and dashboards.  Also included is integration into other applications such as a marketing automation system.  Often overlooked is the integration back into transactional systems to provide real-time analytics - read The New Age of Innovation (if you haven’t already) to get a view of the value of analytics through the business lens.
  4. Meta data - this is the glue that holds the whole thing together.  When done properly, the first three categories are driven off an integrated meta data repository that supports the development, operations, and end user communities.

Top 3 mistakes:

  1. Not defining the architecture - just like building a house, you must plan out the solution end-to-end if you want all the pieces to fit together.
  2. Taking a product vendor driven approach - buying a tool does not translate into defining an architecture.  The technical architecture should precede and transcend the toolset.  Buy tools that fit into your architecture, not the other way around.
  3. Technology for technology’s sake - the term “business alignment” is overused and often misunderstood, but the technical implementation should operate within the parameters set during the strategy process.

360DegreeVendor.com Site Launch

I’ve launched another site, this one focused on vendor specific information.  The site will provide information in three areas:

  1. Indices - includes BI25 market index, number of vendors in the space, and venture capital metrics (total Venture Capital investment, number of VCs actively invested, number of vendors with VC backing)
  2. Reports - includes lists of vendors broken out by corporate type, offering categories, and venture backing, list of venture capital firms and their vendor recipients
  3. Search - capability to search by vendor name, venture capital name, and offering name to get a specific subset of data (to be available in the next 30 days)

There is some overlap with 360DegreeIndex, particularly with the first category.  But the focus of this site will be on the vendors and the information I’ve captured around each vendor (corporate type, offerings, venture backing).

Please e-mail me with any comments, questions, or additions to the information provided.