FASTEN YOUR SEAT-BELTS
Back in August, I concluded a post titled "On the merits of online storage" with a framework borrowed from Wired Magazine:
"TIRED: Storing all Your Stuff on Individual PCs/laptops/PDAs/Phones and syncing 'em.
WIRED: Storing all your Stuff online with infinite storage, access, security, and share-ability...for free or near free.
P.S. There are economic and bandwidth limitation issues to unlimited storage and uploading/downloading that we do need to be mindful of, but they are manageable both technically and from a business model perspective over time...for more on that, see my April post titled "ON TODAY'S BROADBAND LIMITATIONS"."
It's all very easy for armchair blogging strategists to say that Google, Yahoo! et al will unseat Microsoft for desktop applications because they'll provide all these applications and more to hundreds of millions of consumers, right off the network. And for free, because after all, it'll be supported by booming online ads.
The devil, as they say, is in the details, and the details here have to do with economic and bandwidth issues, just to start.
As tech columnist Robert X. Cringely points out, the details also involve heat and power requirements. He has a great article outlining some of the numbers, which I highly recommend reading in its entirety. An excerpted summary, for reader convenience:
"Every one of these service providers, if they really do intend to be there when we need them with all our pictures, videos, love letters, and construction blueprints, are going to have to keep all that data available online 24/7, which is an unprecedented storage challenge, and one that the storage industry has NOT been working on."
..."In the U.S. alone, according to Nielsen/Netratings, we have approximately 202 million Internet users, each of whom is eligible for a free Gmail account with two gigabytes of storage. Since my mother uses less than two gigs and I use more, let's do our rule-of-thumb estimate with that number, making the potential Gmail storage obligation 404 million gigabytes or about 400 petabytes. That's 400 times the current capacity of the Internet Archive, but it is also probably a tenth or less the total capacity of our PC and DVR hard drives today, so I think it is a very fair number to play with.
Of course, all that storage won't be required just for Gmail, unless Microsoft decides to create phantom users and take down its competitors through overwork. (Would that be legal? Maybe.) Rather, the 400 [petabytes] will be shared among many competitors. But for this exercise it doesn't really matter because the issue is TOTAL cost, not who is bearing that cost.
Probably 80 percent of this capacity will be borne by the major players, with each of those taking a roughly equal share. That's MSN, Yahoo and Google, assuming that AOL will be somehow distributed between them, with each having about 100 petabytes of storage.
How much storage IS that, really? Well, the biggest enterprise hard drives available today hold 400 gigabytes each, which means each of these companies is going to need AT LEAST 250,000 drives, making Seagate, Hitachi, Maxtor, and Western Digital all very happy. Though with volume discounts that's really only about $25 million in disk drives -- far less than Microsoft's legal bills.
Now let's build a data center using those 250,000 drives. A disk array can hold about 32 drives in a 3U space. In a typical cabinet you can store about 12 arrays or a total of 384 drives. That cabinet sits on a 2' x 2' floor tile, plus some aisle space, or about 10 square feet of floor space for planning purposes. 250,000/384=651 cabinets or about 6,500 square feet. Heck, that's nothing when you read about all the hosting companies, with their 20,000 square foot data centers containing 20,000 servers each.
But just how many of those 20,000 square foot data centers are there, really? Do a little investigating and you'll find many hosting companies share the same building and claim the same 20,000 square feet.
The problem comes when you start to think about power consumption. It's not that disk drives consume so much power or that they haven't come down in consumption over the years, but each of those cabinets will require using modern drives about 3,300 watts to run while the full 100 petabytes will require 2.148 MEGAwatts. And all that heat has to go somewhere, so the building will typically use three to four times as much power for air conditioning as it does to run the drives, taking our total power consumption up to just under 10 megawatts, which at typical U.S. industrial power rates will cost about $5 million per year.
NOW we know why Google bought those 30 acres on the Columbia River in Oregon right next to a generating station from the Bonneville Power Administration. It's a source of cheap, uninterruptible power.
Of course nobody would build such a data center today because it would require 330 watts per square foot, and even the most modern facilities are provisioned only with 200 watts per square foot. Most are designed for 100 watts. So chances are any of these companies would spread their storage over two to three facilities.
This is the kind of planning and provisioning required to support FREE services. Add pictures and especially video and the total data storage requirements go up by another two orders of magnitude, much of that supposedly still supported by ads.
That's a heck of a lot of ads."
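For readers who want to check the numbers, here is a rough Python sketch that reruns the arithmetic in the excerpt above using only its own inputs. The function name `estimate` and the constant names are made up for illustration. The cooling multiplier of 3.5 is an assumption (simply the midpoint of "three to four times"), and no electricity price is assumed: the sketch backs the rate out of the quoted $5 million figure instead. It checks the math, not whether the 2005-era assumptions hold.

```python
# A rough re-run of the back-of-envelope numbers quoted above.
# Every input comes from the excerpt except cooling_multiplier (an assumption).

US_INTERNET_USERS = 202_000_000    # Nielsen/Netratings figure cited
GB_PER_ACCOUNT = 2                 # free Gmail quota at the time
BIG_PLAYER_SHARE = 0.80            # share borne by MSN, Yahoo and Google
BIG_PLAYERS = 3
PLANNING_PB_PER_PLAYER = 100       # the excerpt rounds each player's share to 100 PB
DRIVE_GB = 400                     # biggest enterprise drive cited
DRIVE_SPEND_USD = 25_000_000       # "about $25 million in disk drives"
DRIVES_PER_CABINET = 32 * 12       # 32 drives per 3U array, 12 arrays per cabinet
SQFT_PER_CABINET = 10              # 2' x 2' tile plus aisle space
WATTS_PER_CABINET = 3_300
ANNUAL_POWER_BILL_USD = 5_000_000  # "about $5 million per year"
FACILITY_WATTS_PER_SQFT = 200      # "even the most modern facilities"
HOURS_PER_YEAR = 24 * 365


def estimate(cooling_multiplier: float = 3.5) -> dict:
    """Return the excerpt's derived figures for one of the three big players.

    cooling_multiplier is an assumption: air conditioning is said to draw
    "three to four times" the drive load, so 3.5 is simply the midpoint.
    """
    # 202M users x 2 GB = 404M GB; divide by 1e6 to get decimal petabytes.
    total_pb = US_INTERNET_USERS * GB_PER_ACCOUNT / 1_000_000
    per_player_pb = total_pb * BIG_PLAYER_SHARE / BIG_PLAYERS

    # 100 PB expressed in GB, divided by 400 GB per drive.
    drives = PLANNING_PB_PER_PLAYER * 1_000_000 // DRIVE_GB
    # Kept fractional: the excerpt's 651 is this quotient rounded down.
    cabinets = drives / DRIVES_PER_CABINET
    floor_sqft = cabinets * SQFT_PER_CABINET

    drive_watts = cabinets * WATTS_PER_CABINET
    total_watts = drive_watts * (1 + cooling_multiplier)  # drives plus cooling
    kwh_per_year = total_watts / 1_000 * HOURS_PER_YEAR
    watts_per_sqft = drive_watts / floor_sqft             # drive load only

    return {
        "total_pb": total_pb,
        "per_player_pb": per_player_pb,
        "drives": drives,
        "usd_per_drive": DRIVE_SPEND_USD / drives,
        "cabinets": cabinets,
        "floor_sqft": floor_sqft,
        "drive_mw": drive_watts / 1e6,
        "total_mw": total_watts / 1e6,
        # Electricity price implied by the quoted $5M/year bill.
        "implied_usd_per_kwh": ANNUAL_POWER_BILL_USD / kwh_per_year,
        "watts_per_sqft": watts_per_sqft,
        "times_modern_provisioning": watts_per_sqft / FACILITY_WATTS_PER_SQFT,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>26}: {value:,.3f}")
```

Setting cooling_multiplier to 3 gives roughly 8.6 MW in total and setting it to 4 gives roughly 10.7 MW, so "just under 10 megawatts" sits inside that band, and the implied electricity price comes out near 6 cents per kWh. The 330 watts per square foot figure is just 3,300 W spread over 10 square feet per cabinet, which is why it doesn't depend on the drive count at all.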
Robert goes on to add:
"My point here is that we're entering another period of Internet exuberance. Yes, a lot has changed since 1999, but it's amazing how many of the ideas being pushed are the SAME ideas, just empowered now by dark fiber, cheap broadband, and six years of Moore's Law. And this time I think it will actually work and the Internet will change even more than it has the ways we live and work. But it isn't going to come easy and it isn't going to come cheap."
What's truly different since 1999 is that we don't have a whole horizontal layer of infrastructure companies this time around, funded by the same VC and investor exuberance ready and waiting to provide the storage and ISP (Internet Service Provider) infrastructure to the consumer online services companies.
And we're just barely moving from text to pictures, with video, network-based productivity computing, internet telephony, wireless services and several other exciting things still around the corner.
In Web 1.0, we had an array of companies that built the early layers of infrastructure that made it possible for companies like AOL to go from the number four to the number one consumer online service company in the US in less than five years (1994 to 1999). Companies like UUNET, PSI, et al. were there to build out huge networks of dial-up modem points of presence across the country, linked into nationwide networks that in turn made possible consumer ISPs like AOL, MSN, Earthlink et al.
Those ISPs in turn made it possible for companies like Yahoo! to gain global web portal status, along with folks like Netscape, Excite, Lycos and many others, WITHOUT having to spend a dime on the underlying infrastructure: essentially a "free ride". They had high and growing margins that set investors' hearts thumping, matched later by their stock prices.
Later in Web 1.0, an array of hosting companies emerged, led by Exodus, that again made it possible for consumer companies to offer the early versions of consumer online services around content, community and commerce.
Of course, behind all of them was the parallel boom in telecom where billions were being invested in dark fiber around the globe that ultimately went unused for the most part in that cycle. It's coming in handy in this second cycle.
But the point is that the infrastructure layer CONTINUES to need massive investment in money, people, research and development, and there is NOT as robust a horizontal layer of third party companies (funded by "exuberant" VCs and public investors) that consumer services like Google, Microsoft, Yahoo! and AOL can turn to this time around. And of course, no telecom and/or wireless infrastructure boom. In fact most of those companies are consolidating, merging and shrinking.
Which means that the consumer Internet companies are going to have to get their hands dirty this time. The effort, R&D, and the expense of the infrastructure build this time around will have to be borne on the financial models and statements of the consumer companies themselves, with or without the concurrent understanding, support and enthusiasm of their public shareholders. And these companies will need to do this globally given the much greater maturity of international markets in Web 2.0 vs. Web 1.0.
Which means that today's public investors in Internet companies need to brace themselves for rising infrastructure spend by the consumer online companies, and compressed margins, even though the ad-based revenues and profits are going through the roof (see my post from yesterday).
And this time, the "free ride" will go to the hundreds of emerging third-party Web 2.0 companies that'll be able to put out new services as "proofs of concept" in photo sharing, podcasting, video blogging and dozens of other categories.
The lucky ones will then be acquired by the big Internet and/or big media incumbents, who will take on the heavy-duty infrastructure spend for the REAL roll-out. A handful may even exit with successful IPOs.
So fasten your seat-belts, there's some turbulent infrastructure spend just around the corner. But the destination is still exciting, at least for users and consumers.
(DISCLOSURE: I was the lead research analyst on a number of the above-mentioned infrastructure company IPOs and secondaries including UUNET and Exodus, along with the consumer companies like Yahoo!, eBay and AOL in the 1990s.)
"In the U.S. alone, according to Nielsen/Netratings, we have approximately 202 million Internet users, each of whom is eligible for a free Gmail account with two gigabytes of storage. Since my mother uses less than two gigs and I use more, let's do our rule-of-thumb estimate with that number, making the potential Gmail storage obligation 404 million gigabytes or about 400 petabytes."
You left out one major part. Although your Gmail account may say you have 2+ gigs at your disposal, that does not mean Google has set aside that much space for you yet. They only do so as you need it. And the most important thing is that Gmail uses COMPRESSED DISK SPACE. Since most emails are text only, and text has an amazing compression rate, Google can compress 2+ gigs (assuming a full text-only Gmail account) down to 100 megs or so. Google wasn't playing dumb when they created Gmail... they researched the most advanced compression methods to make it as fast and efficient as possible.
Posted by: Paul Stamatiou | Saturday, October 22, 2005 at 05:22 PM
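Paul's point about compression is easy to try at home. The Python sketch below uses the standard-library zlib module (DEFLATE) purely as a stand-in; nothing in this thread says which method Gmail actually uses. The sample message and the `compression_ratio` helper are hypothetical. Because the "mailbox" is one short message repeated, it compresses far better than real mail would, so treat the printed ratio as an illustration of why redundant text shrinks, not as a measurement.

```python
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Original size divided by zlib-compressed size for UTF-8 text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    # DEFLATE is lossless: decompressing must reproduce the input exactly.
    if zlib.decompress(packed) != raw:
        raise ValueError("round-trip failed")
    return len(raw) / len(packed)


# Hypothetical message body, repeated to mimic a mailbox full of similar mail.
sample_email = (
    "Hi team,\n"
    "Please find the minutes from Tuesday's meeting below. "
    "Action items are listed at the end.\n"
    "Thanks,\nAlex\n"
)
mailbox = sample_email * 1_000

# The comment's figures: 2+ GB down to ~100 MB is roughly a 20x reduction.
implied_ratio = 2_000 / 100

print(f"one message:      {compression_ratio(sample_email):.2f}x")
print(f"repeated mailbox: {compression_ratio(mailbox):.1f}x")
print(f"ratio implied by the comment: {implied_ratio:.0f}x")
```

The gap between the single-message and repeated-mailbox numbers is the useful part: the ratio depends on how much redundancy the compressor can find, so a storage estimate that leans on compression is only as good as its assumption about the data.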
Paul,
Thanks for clarifying on Google's use of compression. One should assume that all the players will have to use all technical capabilities, especially compression, to keep file sizes as small as possible in a world where consumers are going to expect 24/7 access to any data stored online.
Today's 2gb email accounts are only where the hockey puck is today. The post tries to explain the infrastructure challenges for where the hockey puck is going. In a world where every major portal is scrambling to provide both user-created and commercial videos to hundreds of millions of customers on demand, we're entering an environment that is going to test the boundaries of our capabilities in storage, bandwidth and network management, as never before.
Today's free 2gb email accounts will expand to hundreds of gigabytes of online storage, and on top of that consumers will need both upstream and downstream broadband speeds much, much higher than today's.
Again the basic point is in the last internet cycle there were dozens if not hundreds of infrastructure companies both big and small focusing on being where the hockey puck was going. Today's picture is far different, and the consumer companies on the front-line are going to have to face and pay for much of that effort.
Thanks for the thoughtful comment.
Posted by: Michael Parekh | Saturday, October 22, 2005 at 10:39 PM
Cringely's numbers regarding electricity utilization and provisioning for data centers are inaccurate. I design data centers and wrote about some of his assumptions here:
http://www.bradfordgibson.net/node/224
Brad Gibson
Posted by: Brad Gibson | Sunday, October 23, 2005 at 01:35 AM