Author Archive

Green Data Warehouse Top 10
23. February 2009 by Rick Abbott.
I just published the second in the series of “Green Data Warehouse” articles in BeyeNetwork. This article, “Top 10 Things You Can Do to Improve Energy Efficiency”, provides a pragmatic list of 10 things you can do immediately to reduce energy consumption at the system level.

Joe Foley (CTO of illuminate) left a comment on the benefits of “overloading processors rather than I/O”. The EPA study cited in my article found that CPUs consumed about 31% of the energy in an average system, more than any other component and more than 6 times the amount consumed by disks. So I think making more efficient use of CPU (and therefore reducing the number of CPUs) would be more beneficial than reducing disk.

Also, I want to thank Scott Humphrey (Humphrey Strategic Communications) for helping pitch my article series to BeyeNetwork.

Posted in Articles

New to the list - InfoBright & Aster Data
19. February 2009 by Rick Abbott.
I’ve added two new (to me) database appliance vendors this past week, InfoBright and Aster Data. The revised vendor matrix is attached. I spoke with InfoBright CEO Miriam Tuerk about their product, and my initial reaction was that this was just another run-of-the-mill column-oriented database built on an open source platform. After the discussion, three key differentiators stood out:
Their business model is structured to mirror that of MySQL, with a revenue stream tied to support, training, and some consulting. But the secret sauce is the combination of open source availability with innovations in the architecture. The Knowledge Grid, combined with their Data Pack storage method, provides linear scalability, massive compression, and query acceleration. Miriam provided several case studies that showed both rapid deployment (under 24 hours in one case) and extreme compression (over 30x). Under the covers it’s a column store database built on Red Hat Linux. They currently run on an Intel or AMD platform, but are planning a Windows and Solaris version this calendar year. I’m looking forward to continuing discussions with Miriam this week, and may co-author an article with her.

I also spoke with Steve Wooledge from Aster Data this week. He gave me an overview of their nCluster database. Built on top of the PostgreSQL database, this MPP platform offers extreme scalability through a clustered architecture. MySpace uses nCluster to collect large amounts of information every hour for analytics purposes, requiring only half a resource to maintain the system - a testament to their claim of “hands-off” system management. They also run on standard Intel x86 machines, and have recently launched a “green” initiative whereby they give customers credits for each piece of existing hardware they reuse. They have also launched a cloud version of their software, nCluster Cloud Edition, that runs on Amazon Web Services. The only concern I have is that Steve didn’t have a good answer to the question around long-term management of hot-spots in the MPP environment, although the MySpace example seems to show they have a solution in place.

Posted in Vendors

illuminate
8. February 2009 by Rick Abbott.
I’m getting a lot of interest recently in the data warehouse appliance chart I’ve been maintaining. I just spoke with Joe Foley, CTO of illuminate, and added them to the spreadsheet. Their flagship product, iLuminate, stores data in a “value based storage” methodology that is neither row nor column. Essentially each unique data element in the database is stored once, with all relationships (forward and reverse) captured in a pointer fashion. According to Joe, this enables the database to realize significant compression of greater than 50%, while being able to scale in a linear manner without bound (except for the 64-bit addressing limit in the hardware). They only run on a Windows platform, but are planning on rolling out a Linux-based system in 2009. illuminate also has an analytical package called iCorrelate, which provides ad hoc reporting and analysis capability.

I have calls scheduled this week with Infobright’s CEO Miriam Tuerk and Aster Data. I will post a new sheet at the end of the week.

Posted in Vendors

Data Warehouse Appliance Vendors
23. January 2009 by Rick Abbott.
I had a briefing from Kim Stanick, VP Marketing for ParAccel, yesterday, and made a few updates to the DW appliance vendor chart I’m maintaining. The chart lists vendor solutions grouped by full stack, database-only, and hardware, and lists key features for each vendor such as architecture (MPP vs SMP, row vs column orientation), DBMS and OS platforms, and key integration partners. If you would like to add your solution to the list or update an existing entry, please send me an e-mail at rick@360degreeview.com.

Posted in Vendors

The Green Data Warehouse
19. January 2009 by Rick Abbott.
I just published the first in a series of articles on green computing. The article was released on Thursday, and provides an approach for measuring the energy consumption of servers and other hardware components typically used to support a data warehouse. I also discuss how this consumption links to energy usage in the data center. Next month’s article will be a “top 10” list of techniques for improving energy efficiency in the data warehouse.

Future articles will explore the energy ecosystem - how managing usage at the component level (e.g., servers, disk drives) impacts the public energy grid. It’s not enough for us to make incremental improvements in our usage; we need to make exponential reductions in our grid capacity. To do that we need to employ smart technology that balances usage against capacity. I believe that technology is already present in today’s mainstream data warehouse and business intelligence toolbox, such as business activity monitoring, which facilitates adjustments in real time based on historical data.

Posted in Articles

Green Data Warehouse
17. November 2008 by Rick Abbott.
I just finished writing an article on Green Computing in the data warehouse arena, and am following that up with a “Top 10” list of approaches to make your data warehouse more environmentally friendly. I’ll post a draft list here in a few days, but if anyone has suggestions please shoot me an e-mail (rick@360degreeview.com). I’ll also be looking for quotes or interviews once I’ve compiled my list.

On a related side note, I’ve been reading Tom Friedman’s “Hot, Flat, and Crowded”, which has given me some perspective on the global energy consumption problem. Good reading for anyone interested in this topic.

Posted in Articles

Kickfire Overview
31. July 2008 by Rick Abbott.
I spoke with Karl Van Den Bergh, VP Business Development from Kickfire (founded June 2006), today and wanted to share my impressions of their company and offering, the Kickfire Database Appliance. Kickfire is venture backed (Accel, Greylock, Mayfield Fund, and Pinnacle Ventures), and is based in The Kickfire Database Appliance has been in beta testing since April, and is scheduled to be launched commercially sometime in Q4 2008.

The two key differentiators of the Kickfire platform are the Query Processing Module (QPM) and the Kickfire Storage Plug-in for MySQL. QPM is a SQL accelerator chip, akin to a graphics chip. QPM plugs into a motherboard alongside a standard Intel-based quad processor and other off-the-shelf components. By processing SQL statements on the chip, they are able to achieve significant performance gains, resulting in impressive price/performance and raw performance numbers. Kickfire recently released TPC-H numbers for the 100GB and 300GB classes, and set records in those categories for both performance (non-clustered category) and price/performance. They plan to run tests on larger datasets, and feel the existing numbers will scale to these larger sizes.

The storage plug-in sits under native MySQL and on top of Linux CentOS. The plug-in provides modern data warehouse features such as column store and compression. The big lift comes from deploying out-of-the-box MySQL: access to the approximately 11 million (and growing) installations of MySQL. By going this route, Kickfire will not have to certify their platform with the myriad of business intelligence and data integration vendors. As long as those vendors work with MySQL, in theory they should work with Kickfire.

Kickfire has a small consulting group focused on installation and configuration of their product, but is putting partnerships in place with larger systems integrators to support full life-cycle implementations.
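For readers who haven’t worked with a column store, here is a minimal sketch in Python of what “column store and compression” means in general terms. This is not Kickfire’s implementation (the briefing didn’t cover internals); the sample table, the column names, and the choice of run-length encoding are all assumptions made purely for illustration.

```python
from itertools import groupby

# Hypothetical fact table, held row by row as a transactional engine would.
rows = [
    ("2008-07-01", "east", 100),
    ("2008-07-01", "east", 250),
    ("2008-07-01", "west", 75),
    ("2008-07-02", "west", 300),
    ("2008-07-02", "west", 125),
]


def to_columns(rows, names):
    """Pivot row-oriented tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, ["sale_date", "region", "amount"])

# Repetitive columns collapse into a handful of runs.
encoded_region = rle_encode(columns["region"])

# An aggregate only has to read the one column it needs.
total_amount = sum(columns["amount"])
```

The two points to take away: values from the same column tend to repeat, so they compress well when stored together, and an analytic query such as a sum touches only the columns it references rather than every full row.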
If you’re running, or planning on running, an analytics solution on MySQL, I think you have to give this product serious consideration. At a starting cost of about $20,000, you’ll be hard pressed to find a better price point on a system in this category. Even if you have another platform for your enterprise solution, it’s worth investigating using Kickfire to support data marts or other departmental level systems.

If you’re a Microsoft shop, you’re probably best to avoid this system, unless you’re making a strategic decision to migrate part or all of your infrastructure to open source. In most cases, the cost savings won’t justify the added cost and complexity of introducing one MySQL instance into your environment.

The big caveat to all of this is the production readiness of the system. Assuming they go production in Q4, they will have had less than 9 months of beta testing feedback. Any early adopters (read: anyone buying this before next Spring) should bake plenty of internal testing into their deployment schedule, or better yet set this up in a sandbox environment until the 1.0 bugs have shaken out.

Posted in Technical Focus, Vendors

Dashboards
28. July 2008 by Rick Abbott.
The term “dashboard” brings up a number of responses, including the housing for airplane controls, a place to mount your GPS in the car, and surprisingly (to me anyway) the wiki definition of an “application for Apple’s Mac OS X v10.4 Tiger and Mac OS X v10.5 Leopard operating systems” (who knew? - I think Wikipedia needs to work on that one).

In the business intelligence community, a dashboard is generally defined as a reporting tool or application that presents metrics or KPIs to an end user. It is meant to mimic the plane reference above, presumably whereby corporate executives could sit in the “cockpit” and watch the dashboard while driving the company. In reality, this rarely if ever happens.

The most effective dashboard implementations I’ve seen are targeted at an operations group (say customer care), and are used in a more tactical role. The group leader has the top level view, which displays key metrics for that group along with target values. When she notices a metric that is off base by a certain tolerance (good or bad), she can discuss the delta with the person in charge of that area. The value add for a dashboard is the ability to drill down from the top level metrics, and decompose those numbers into lower level supporting metrics. This fosters communication throughout the group, and allows for quick identification of problem areas or areas of opportunity.

One common misconception is to confuse dashboards with corporate performance management (CPM). CPM is a process for utilizing technology to define measures that drive the business, and then managing to those measures. A dashboard is usually an important component of a CPM initiative, but they are not one and the same. Be particularly wary of a dashboard vendor trying to sell you a CPM solution.

So what are the key takeaways when considering a dashboard?
Posted in Industry Buzzwords, Technical Focus

Semantic Web
24. July 2008 by Rick Abbott.
I’ve been trying to get my arms around the semantic web movement, and finally decided to devote some time to the topic this morning. First, let’s break down this phrase by defining the two words (courtesy of Websters.com):

semantic - “of, pertaining to, or arising from the different meanings of words or other symbols…”
web - “something formed by or as if by weaving or interweaving.”

So we have a weaving together of the different meanings of words or symbols, and presumably other objects such as video clips and files. So how is that different from the version of the “web” we’ve woven today? The answer comes from an old Twilight Zone episode - it’s another dimension. The semantic web concept boils down to providing context (or dimensions) to the words, phrases, files, and other detritus that’s floating around out there now.

The “Web 2.0” movement is attempting to address this issue by building a community that comments on subject areas, thereby giving others context on that subject area. The semantic web concept goes beyond this, by embedding this extra dimension into the structure in which content is stored. This highlights another important difference: who gets to define content? The author, or the viewers? Ideally it would be both, with the ability to determine gaps in the definitions. So where “Web 2.0” supports the viewer definition, the “Semantic Web” as advertised today encompasses technologies that support the author definition. But going back to our original breakdown of this phrase, in particular the piece about “weaving together of the different meanings of words and symbols” - doesn’t that mean capturing both author and viewer definitions?

Leaving all the philosophical discussions aside, how do you implement a “semantic web” solution? And what are the benefits and drawbacks? The implementation starts with how the data is stored. A 1.0/2.0 generation website stores information in HTML files that are then directly translated and presented via a browser.
A semantic website stores information in a structured format (a database, a Resource Description Framework file, or an XML file) that supports a metadata layer. The metadata layer provides this extra dimension, by allowing descriptors to be stored on the content itself. This also decouples the storage from the presentation, which provides flexibility at a cost of presentation speed. The content can still be translated for web page viewing, but more importantly other applications can accurately integrate the data by using the metadata as a roadmap. The result is a web within a web, where applications (calendaring systems, for example) talk to one another without human intervention.

The benefit - all the data on this new web becomes much more valuable because of the leverage you get by combining content across multiple sites. The downside is the enormous cost and effort to implement a semantic web solution. There is an order of magnitude difference between putting content in an HTML file and storing data in a structured format with associated metadata.

What does the future hold for the semantic web? Data that has significant value to the author or publisher will migrate towards a structured solution. These semantic enabled sites will then link up on an opportunistic basis, forming informal networks based on common interests. As these networks grow, the value proposition (and technological capabilities) will allow more sites to migrate.
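To make the storage-versus-presentation split concrete, here is a rough sketch in Python. It is an illustration only, not RDF or any real semantic web toolkit: the record layout, the field names, and both functions are hypothetical. The same stored record feeds a browser view and a calendaring application, and the calendaring side relies on the metadata rather than on scraping the page.

```python
from html import escape

# Hypothetical content record: the data itself plus a metadata layer
# that describes what the record is and what each field means.
event = {
    "data": {
        "title": "Quarterly BI Review",
        "start": "2008-09-15T10:00",
        "location": "Room 4",
    },
    "metadata": {
        "type": "Event",
        "fields": {"title": "name", "start": "startDate", "location": "place"},
    },
}


def render_html(record):
    """Presentation path: translate the stored content into markup."""
    d = record["data"]
    return "<h1>{}</h1><p>{} at {}</p>".format(
        escape(d["title"]), escape(d["start"]), escape(d["location"])
    )


def to_calendar_entry(record):
    """Integration path: another application uses the metadata as a roadmap."""
    meta = record["metadata"]
    if meta["type"] != "Event":
        raise ValueError("record is not described as an Event")
    # Rename each stored field to the shared vocabulary named in the metadata.
    return {meta["fields"][key]: value for key, value in record["data"].items()}


page = render_html(event)
entry = to_calendar_entry(event)
```

Even in a toy like this you can see where the cost comes from: someone has to define and maintain the metadata for every piece of content, which is exactly the effort a plain HTML file lets you skip.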
As a side note, the semantic web is a subset of Web 3.0, but I’m out of breath and will save that for another posting.

Posted in Industry Buzzwords, Technical Focus

Buzzword: “Knowledge Management”?
14. July 2008 by Rick Abbott.
Knowledge Management - is there really such a thing as managing your knowledge? Isn’t it more accurate to call it “Knowledge Capitalization”? Let’s break it down by pulling the most appropriate definitions from Webster for these terms:
Seems to me that the primary objective is to “take advantage of” the “body of truths or facts accumulated in the course of time”, as opposed to just “handling or controlling” this information. It’s no accident that business users have become gun-shy about the whole “Knowledge Management” concept. This has become an IT-driven endeavor, and as a result the focus has been put on “handling” and “controlling”, task-oriented words, as opposed to end goals such as “capitalize”. Too many “Knowledge Management” systems today place a disproportionate emphasis on the collection and storage of knowledge, and not enough on the end results. This makes it prohibitively expensive for users to add information, which dooms the system to mediocrity.

All of us involved in delivering technology solutions should be focused on the end benefit of our work. In the case of “Knowledge Management”, put the focus on finding ways to capitalize on the “body of truths or facts” that are part of the corporate history. How do we do this?
Knowledge Capitalization should be, like learning itself, an iterative process.

Posted in Industry Buzzwords