Mutterings on Tech

Greg Smithies: Big Data, Machine Learning, Predictive Analytics, and General Geekery; Across Silicon Valley, Seattle, and NY

Apr 8, 2014

In my previous post on Containerization I covered some of the substantial advantages of using containerization (e.g. Docker) in your infrastructure. However, there are tradeoffs - many of them already covered, such as the lack of true (secure) isolation between containers.

This post will skip some of those technical drawbacks, which inevitably will need to be handled in the core technology. Instead, we'll be looking at other pieces of the management and orchestration ecosystem that may need to be updated, changed, or replaced.

What is the opportunity?

As an investor, I'm always looking for gaps that can be filled by new technologies, and preferably, new companies. Any time a new infrastructure paradigm is introduced, it generally breaks many of the tools built around the old paradigm. Each time something breaks, there's an opportunity for a new technology, and hence a new company.

As I look at the Docker ecosystem, the question then becomes: what other tools around this ecosystem need to be built, and which gaps are large enough to spawn new standalone companies?

I like to think of these tools in concentric circles, and the VMware ecosystem is a good analogy for the opportunities we may see in the Docker ecosystem:

VMWare vs Docker ecosystem

What is clear is that, while VMware started with just the hypervisor, the lack of tools to manage virtualized environments meant they had to build out the second layer of management and orchestration tools around the hypervisor themselves. This means that, while the virtualization ecosystem is massive, most of the 3rd-party companies built around it were created in the 3rd tier: the tools such as monitoring and security around the core operations, orchestration, and management of the environment. The core operational tools, however, were subsumed and built over time by VMware.

Now let's juxtapose that against the KVM ecosystem. It's a closer analogy to Docker given the open source roots of the product. It turns out that the KVM ecosystem is similar to VMware's in that the tier-2 tools are also mainly provided with the core product.

Aside: This image is still not quite granular enough. There are key portions of the Tier 2 layer that are provided by 3rd parties, such as job scheduling and placement automation (which is slowly being improved by the core tools) and devops automation (see Chef, Puppet, Jenkins...). So it's not to say that there aren't opportunities in Tier 2; rather that they are often at risk of being subsumed by the core technology.

So let's extend the analogy to the Docker ecosystem. Currently Docker provides the core Tier 1 container technology, as well as a few key pieces of the management systems in Tier 2 (e.g. the container image registry). But for now, the tools outside of the core Docker engine are only provided by Docker because they're required to make the whole system work (e.g. you can't have sparse containers unless you have a core registry of standard container images, so they have to provide the image server):

Docker tools architecture

Who Dares to Live in Tier 2?

So what are people actually using right now to fill out Tier 2 in containerized infrastructure? There are a few options, depending on the type of company:

  • Build your own: This is the way the big guys (Google, FB, Twitter) are doing things. They have the ability to build these tools internally, with sophistication ranging from simple collections of scripts to fully fledged tools that could likely be open-sourced.
  • Use specialized tools: There is a growing cadre of containerization-first management and orchestration tools. Most of them grew out of the aforementioned internal projects at the large web companies. Think Mesos, CoreOS.
  • Retool legacy tools: Many of the legacy orchestration tools are adding support for containers. These tools will have varying capabilities and trade-offs, depending on how tied they are to the assumptions of virtualization. The progressive ones (Chef, Puppet...) already have pretty good support; but don't hold your breath for tools from the true legacy providers (CA, BMC, Software AG...)

So here's the question: how many of these core features will be subsumed by Docker as it expands its toolset, and how many will actually be long-term gaps that companies can grow into?

Frankly - it's a little early to tell; but as a betting man, I'd expect things to go along the lines of the VMware / virtualization ecosystem: these features will eventually become part of the core Docker toolset.

The more interesting question as an investor is: how long will it take Docker to build them out? If it's a matter of 1-2 years, there likely won't be any chance for companies to grow up in this space; but if there is so much work left to do on the core container system that it takes much longer than ~2 years to roll out these tools, then there may actually be time for other companies to get a big enough foothold to reach thermal runaway as standalone businesses.

I'd keep my eye on CoreOS and Mesos in this category.

What about Tier 3?

Tier 3, in my mind, is a lower-risk (but probably lower-reward) place to look for investment opportunities.

The lower risk is because you eliminate the risk of Docker subsuming your features. Instead, you are saddled only with the underlying "market" risk of whether or not Docker takes off. Juxtapose this against tools in Tier 2 where you have both the market risk of needing Docker to take off, AND you run the risk of Docker eating your lunch.

Lower reward in this tier is a little more tenuous to prove. I'd have to point to the market cap of VMware itself at $42.5B, vs the (reported) valuations of virtualization monitoring companies AppDynamics and NewRelic in the $1.5B to $3B range.

In other words, the potential market size of running the core infrastructure appears to be orders of magnitude larger than the size of any particular tool category around it in Tier 3. All of the tools combined in Tier 3 may have a larger market size, but any particular category of tool will likely be much smaller.

So where are the areas of opportunity in Tier 3? Let's look back to the virtualization world for inspiration. Historically, the Tier 3 tools that have built interesting, growing, or large companies have been Monitoring (application layer and infrastructure), Security, and Inventory Management (in the large Enterprise). There are probably another 3-5 categories below these on the list, and arguably, these are very broad categories; but as we look for opportunities that will be buoyed by the rising tide of Docker, these are the first 3 areas we'll be looking at.

Bottom Line

Any fundamental change in core architecture forces all of the tools around that infrastructure to adapt. Some can, but many simply cannot and must be replaced - they are obsoleted. Any time a technology is obsoleted, there is an opportunity for a new company to be built in its wake.

If Docker truly takes off, there are going to be hundreds of technologies that are obsoleted and hence hundreds of opportunities to build new companies.

We're seeing just the start of this shift, and I'm excited to spend the next few years exploring all of the opportunities that will arise.

Mar 23, 2014

Containerization is something that's actually been around for years, but has seemingly burst onto the scene in a massive way over the last few months. Sun had an implementation of the technology (Solaris Zones), and Linux has long had LXC. So what's changed that you can't seem to talk to anyone in next-gen infrastructure without the topic of containers coming up?

In a word - Docker happened. Docker was an offshoot of a company looking to build a next-generation PaaS - dotCloud. While the PaaS was impressive, it was enabled by a core technology they had developed - Docker. It turns out that Docker has become far more popular than the PaaS ever was. Docker itself is a set of technologies and wrappers around the core LXC containerization capabilities that make them DEAD easy to use. By most accounts the LXC container technology was significantly hamstrung by its incredible complexity. Docker, on the other hand, makes LXC dead-simple to use for even a plebeian such as me.

Companies like Google and Facebook are building MASSIVE systems using Docker, so it seems that others would agree.

What is Containerization?

I'll skip some of the nuances of Docker vs LXC vs Sun's implementation and focus on Docker vs more traditional virtualization.

The key drawback with virtualization is that it requires the system to run multiple copies of the guest OS. Boot up Parallels on your desktop Mac and you'll immediately see the performance hit and memory overhead of having to run an entire separate OS.

Docker vs. Virtualization

There are ways around this - for example using an extremely lightweight host OS, or highly cut-down guest OSes, to lessen the overhead of the system - but there's only so far these optimizations can take you.

Containerization, on the other hand, shares a single host OS and a single kernel, but creates "containers" inside that OS that hold all of the software, libraries, and environment variables for the particular software you would like to run.

Docker Efficiency

Not only does each container avoid the overhead of an entire guest OS (and the memory + CPU footprint that entails), but each container is also a sparse diff against a standard container image. So if a new container only needs one library that differs from the standard image, the container will only contain that one library. Hence it's not uncommon to see containers for highly complex application installs weighing in at just a few megabytes.
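
To make the "sparse diff" idea concrete, here's a toy Python model of layered images. To be clear, this is not Docker's actual storage driver - just a sketch of the bookkeeping: a derived image stores only the files that differ from its parent, and the filesystem a container sees is the union of its layers.

```python
# Toy model of image layering (NOT Docker's real storage driver).
# A derived image stores only the files that differ from its parent.
base_image = {
    "/usr/bin/python": "python 2.7.6",
    "/usr/lib/libssl.so": "openssl 1.0.1e",
    "/etc/hosts": "default hosts file",
}

# The "new" image swaps out a single library, so this diff is all that needs
# to be stored (or shipped) for it.
app_layer = {"/usr/lib/libssl.so": "openssl 1.0.1f"}

def materialize(*layers):
    """Flatten a stack of layers into the filesystem a running container sees."""
    fs = {}
    for layer in layers:
        fs.update(layer)  # later layers shadow earlier ones, like a union mount
    return fs

container_fs = materialize(base_image, app_layer)
print(container_fs["/usr/lib/libssl.so"])  # openssl 1.0.1f
print(len(app_layer), "file(s) actually stored for the new image")
```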

It's still unclear just how much memory + CPU + storage overhead a docker system will have vs. virtualization. And I'm sure that it depends on the particular workload. I have not yet seen any sizable studies comparing the two technologies in real-world workloads; but I expect them to be done soon.

Pros and Cons?

Aside from the obvious storage, memory and CPU efficiency gains described above (do you really need anything more?), there are a few other key pros that you get for "free" using Docker.

  • Portability: That same Docker container will run anywhere, on any Linux system, on any cloud, without any changes. It brings with it all of the requisite libraries, file versions and environment variables needed to set up its environment - so it truly is entirely portable.
  • Run Anything: If the software can run on the host, it can run in a container; if the Linux kernel can execute the code, it can run in a container. To the software itself it looks as if it is running on bare metal - no tweaks for assumptions brought in through being in a guest OS. (A quick sketch of both points follows this list.)
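
Here's that sketch: the commands below are identical whether they run on a laptop, a bare-metal server, or a cloud VM (assuming Docker is installed and its daemon is running; ubuntu:14.04 is just an example image). Note that the kernel reported from inside the container is the host's own kernel - the shared-kernel model at work.

```python
# Run the same image, unchanged, on any Docker host.
import subprocess

IMAGE = "ubuntu:14.04"  # example image; anything in the registry works the same way

subprocess.run(["docker", "pull", IMAGE], check=True)
result = subprocess.run(
    ["docker", "run", "--rm", IMAGE, "uname", "-a"],
    check=True, capture_output=True, text=True,
)
# The kernel string printed here is the host's kernel; containers share it.
print(result.stdout.strip())
```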

Just these two features together may be reason enough to use containers, given that they prevent the absolute nightmare scenario of dependency hell that I've seen before. Even when using a tool such as Chef or Puppet, debugging environment variable initializations and library dependencies can become essentially impossible.

BUT there are some key cons to using the system that have to be kept in mind. The primary one is isolation. In a virtualized environment there is a fair assumption of process and data isolation between the virtual machines. There are theoretical attacks that can hop through the hypervisor, but by and large what is executing in one virtual machine is well isolated from the host OS and the other guest OSes. We even see some large enterprises running PCI and non-PCI workloads on the same physical boxes, given the strong isolation that virtualization brings.

This isolation is not present in containerization. Given that all code in any container is still sharing the same memory and the same kernel, cross application communication is still possible, and malware or an attacker in one container should be assumed to have full access to all data and applications in the other containers.

I'm sure that improved isolation is a key item on the Docker roadmap, but for now it is not a technology that should be used in sensitive environments.

Bottom Line

Containers offer a tantalizing amount of opportunity for improved hardware utilization and devops efficiency; and hence could deliver massive bottom-line savings. There's still a lot of work to be done to make the system truly enterprise class with the full security and administration tools around them; but those will come in time.

For now - this is a technology you should at least be kicking the tires on in your bleeding-edge R&D teams. There's too much potential upside here to miss this wave.

Mar 8, 2014

In my previous post on cloud security I spoke about the idea that even in a world where all infrastructure is cloud-based and most software is delivered as SaaS, there are still pieces of the security puzzle that remain the responsibility of the enterprise and will not be handled by the Cloud or SaaS provider. The majority of these fall into the bucket of user-related behaviors, permissions and insider threat.

What is the Problem?

In that past post I focused on RedOwl Analytics - they provide big-data and predictive tools to monitor users' actions. However, how does one even get hold of the user-action data in a Cloud / SaaS world? You have a bit of a problem because the services are:

  • Not running in your data center
  • Not running on servers in your control
  • Not running on your network

And in the case of SaaS software accessed by your employees at home or on the road, it's:

  • Not going through your perimeter firewalls or choke-points
  • Going over public networks
  • Being accessed from devices (smartphones, tablets...) not controlled by you

In this world there is no perimeter, and there is no logical place to catch all of the traffic. And believe me - old-school solutions like back-hauling everything through one of your data centers via VPN just don't cut it for user experience these days; not to mention they're HIGHLY inefficient.

Shadow IT

There's a related problem too: as more of your employees access and work from their mobile and personal devices, the enterprise's ability to lock down the 3rd-party services (e.g. Dropbox, Evernote...) that get used goes out the window. For example - I use Evernote every single day to take notes on the job. That data and information is technically the property of my employer, but it's sitting on my iPad and in Evernote's cloud - for all intents and purposes, Battery Ventures has no idea that it exists.

Evernote and Dropbox are frankly just the tip of the iceberg. Most companies intuitively know that their employees are using a few big-name services like that, but it's only a few, right? Unfortunately that's not the case - the average company is using 600 cloud services, 30 of which are for file-sharing alone; and I assure you their CIO / CISO would be flabbergasted by that number. This is likely an issue that will need its "Snowden" moment for the average CISO to take it seriously; but awareness is thankfully growing.

What is the Solution?

This is an early space, and so there are still quite a few companies jockeying for position - with subtly different solutions. The solutions broadly fall into two categories or mantras:

  1. In Cloud - Secure the SaaS services directly. These tools integrate into the SaaS service (e.g. Salesforce) and use the service's APIs to report and enforce behaviors.
  2. Reverse Proxy - Sit in front of the Cloud Services as a reverse proxy. These tools route traffic through themselves before it reaches the SaaS service. So in essence they see all of the traffic flowing by and can use that to surmise user behaviors (e.g. they know what a data dump from Salesforce looks like on the wire). They can then also enforce policies on the traffic. (A toy sketch of this pattern follows the list.)
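
Here is that toy sketch in Python. The upstream URL is a hypothetical stand-in, and a real product would also terminate TLS, handle every HTTP verb, rewrite SAML redirects, and actually classify the traffic; the point is simply that sitting in the middle gives you both visibility and a control point, without any per-SaaS API integration.

```python
# Toy auditing reverse proxy: log each request, then forward it upstream.
import http.server
import urllib.request

UPSTREAM = "https://app.example-saas.com"  # hypothetical SaaS endpoint

class AuditingProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Visibility: every request passes through here, so we can record
        # who asked for what before the SaaS service ever sees it.
        print(f"AUDIT {self.client_address[0]} GET {self.path}")

        # Control: a policy check could refuse to forward the request at all.
        with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
            status = upstream.status
            body = upstream.read()
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8080), AuditingProxy).serve_forever()
```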

Pros, Cons and Companies

In Cloud Solutions

Pros

These solutions integrate deeply into the platforms they support (e.g. Google Apps, Salesforce, Netsuite...) and hence tend to offer impressive CONTROLS for user behaviors. Their reporting and analysis of user actions, however, are generally middling, given they are hampered by the data that is surfaced to them through the APIs.

Cons

Due to the deep integration work required, they often only support a few SaaS services (mainly Salesforce and Google Apps, with a few supporting Workday and Netsuite). A notable exception here is CirroSecure, who have somehow cracked that code and are able to offer integrations into tens of the top services; but each SaaS service still takes some work to support.

Secondly, these solutions are hampered by the APIs provided by the SaaS platforms - hence their analytics and reporting capabilities are starved of data. It also means that functionality across different platforms may be spotty. For example, a certain level of granular user access control may work on Salesforce but not be available on Google Apps, because that API doesn't support it.
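
For contrast, the "in cloud" pattern boils down to polling whatever audit API the platform exposes and flagging suspicious behavior. A hedged sketch follows; the endpoint and fields are hypothetical stand-ins, which is exactly the limitation: you can only analyze what the API chooses to surface.

```python
# Sketch of the API-driven approach: pull audit events and flag bulk exports.
import json
import urllib.request

AUDIT_URL = "https://api.example-saas.com/v1/audit-events"  # hypothetical endpoint

with urllib.request.urlopen(AUDIT_URL) as resp:
    events = json.load(resp)

for event in events:
    # Only behaviors the platform's API reports can be analyzed at all.
    if event.get("action") == "export" and event.get("rows", 0) > 10000:
        print(f"ALERT: {event['user']} exported {event['rows']} rows")
```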

Companies

In no particular order:

Reverse Proxy Solutions

Pros

These solutions do not require integration into the SaaS platforms they support and hence can support orders of magnitude more services than the In Cloud solutions. They see all of the data flowing by, and hence are not limited by API access when it comes to reporting or control of user actions. They can report on everything and they can control everything.

Cons

However, given the broad number of SaaS platforms they support, the level of support on each of those platforms can be a bit hit-or-miss. E.g. they know what a data download from Salesforce looks like, but may not know what a similar download out of Workday looks like (note: that's an overly simplistic example). Of course, building out their toolkit to understand more services is simply a matter of time, and I'd expect all of the big-name SaaS services to be pretty well supported out of the gate.

Secondly - these are implemented as reverse proxies, hence the end user could technically bypass them and take them out of the loop. Most of them are implemented to integrate into the SAML sign-on process, so that it is difficult to log into the SaaS service using your business account without being forced to go through the reverse proxy; but it's not impossible.

Companies

Again, in no particular order:

Bottom Line

In the same way that Palo Alto Networks reinvented a stale category of firewalls by making them application- and user-aware, so too I believe one of these companies will be the next era's Palo Alto. They will bring user and application awareness to a SaaS and cloud world regardless of where your employees are accessing services from. They're all still quite far away from realizing this vision - but the writing is on the wall.

Mar 6, 2014

Clouds and Sky

Wrapping up RSA last week, I found myself thinking more and more about cloud security. When a new infrastructure paradigm arrives, there is a wave of other tools around it that need to mature, and they take a while to catch up. For example, it took a while for orchestration tools like Chef and Puppet to arrive and take advantage of virtualized infrastructure.

The way the cycle tends to work is:

  1. New infrastructure paradigm released (e.g. virtualization)
  2. Traditional management tools realize that they no longer work and struggle to bolt on support / functionality for it (e.g. BladeLogic and BMC tools struggled to break into the virtualized world)
  3. A multitude of new tools designed from the ground up for the new paradigm are released (e.g. Chef and Puppet and a myriad other things) - lots of jostling for position
  4. One or two of the new tools supplant the incumbents from (2) and become sizable new companies for this new era

We're currently in stage 2 or 3 of this process when it comes to tools to secure and manage cloud infrastructure. There are a lot of interesting companies tackling the problem of "how do I secure and manage my servers when they are not in my datacenter, and I don't control the network, or even any part of them underneath the hypervisor."

So, if Cloud Infrastructure is taken care of, as early-stage investors we need to start looking at what new problems are kicking off a new cycle at stage 1.

What is the problem?

Just as a lot of infrastructure has moved from data centers to the cloud, so too has a lot of software shifted to being delivered as SaaS. Managing the infrastructure is completely out of the equation for people using Salesforce.com, for example.

The problem is that, when companies use SaaS software, they erroneously believe that they no longer have to worry about securing that software - the SaaS provider will worry about security, surely. Well - that's true and false.

In truth, the responsibility for security has just shifted - the customer is no longer responsible for certain parts of the low-level infrastructure security, but they are still responsible for higher-level application and data security. In fact - the responsibility for security roughly aligns with the responsibility for managing the particular layer of the stack:

| Layer | Party Responsible: On Prem | Party Responsible: Cloud | Party Responsible: SaaS |
| --- | --- | --- | --- |
| 1. Physical storage, networking, compute security - e.g. who has access to these PHYSICAL boxes, who has admin rights, who can change them? | Enterprise | Cloud Provider | SaaS Provider |
| 2. Logical storage, networking, compute security - e.g. who has access to these LOGICAL servers, who has admin rights, who can change them? | Enterprise | Enterprise | SaaS Provider |
| 3. Application security - e.g. who is responsible for the application being bug-free, and unhackable from the outside? | Enterprise | Enterprise | SaaS Provider |
| 4. Application-level user permissions - e.g. which users are allowed into certain parts of the application, and what rights do they have there? | Enterprise | Enterprise | Enterprise |
| 5. User actions - e.g. which users are doing strange things with the data - maybe they've been compromised or are a bad actor (insider threat?) | Enterprise | Enterprise | Enterprise |

Note: This is grossly simplified!

The key here is the right-hand column. SaaS providers take care of so much of the security, that enterprises forget that they are still responsible for items 4 and 5.

It's this realization, and of course our dear Mr. Snowden, that had RSA come alive with chatter about "insider threat mitigation." RSA has a long history of launching companies in network security, end-point security, and (more recently) big data security. The general attitude has historically been to ignore people and focus on the hardware. Hence, I found it fascinating that this year, the company named Most Innovative Company at RSA 2014 was a human-focused, behavioral insider threat mitigation platform - RedOwl Analytics.

RedOwl Analytics

The team from RedOwl is impressive, hailing from several three-letter agencies (that shall not be named) where they have dealt with the issues of "insider threat" at length. Their vision is large and likely justified. We joked with them that the TAM for their software was something akin to Ali G's Ice Cream Glove™ - given that all companies have employees, and all companies have data.

I'm looking forward to hearing a lot more about RedOwl over the next few years - I really think they are set up in exactly the right place to win the hearts and minds of Enterprise CISOs as they focus more on their employees, and move more of their applications to SaaS.

There are a lot of other tools that need to be built out around security for SaaS - e.g. companies such as Netskope, Skyhigh Networks, Cloudlock, Ionic Security, Veradocs, AlephCloud; but they're going to have to be the topic of another post.

Mar 2, 2014

The Reality Distortion Field

Silicon Valley is a fantastic place to be if you work in tech, or by extension, Venture Capital. There are not many places on earth where engineers can hold near rock-star status, data-scientists are treated like royalty, and where you're as likely to overhear people arguing about scalable storage systems as you are sports in any given bar.

However, there is a significant downside to living and working here. If you're not careful, you can end up with a myopic world-view, horribly warped to think that the "norms" of this small stretch of peninsula are generalizable to the rest of the world. If you ran your worldview off of Northern California, you might think that the most common computers in the world are Apples, the most common smartphones are iPhones, the most common cars are Priuses (Prii?), and that it's alright to wear Google Glass in public (without being a glasshole).

To really bake your noodle on that car point, in California, drivers of the $85,000 Tesla Model S are most likely to have owned a Prius as their previous car...

Tesla vs Prius

While these examples are of course flippant, it's unfortunate that we all too often fall into the same trap when trying to apply SV technology norms to the rest of the world. Through the SV lens it would be easy to think that cloud penetration in the Enterprise is near 100%, that every employee works on the device of their choice, and that agile development methodologies are the norm in Fortune 500 companies.

This just isn't the case; and for any disbelievers, one only has to look to the fact that IBM still has a System Z division, which still brings in over $2B in revenue per year; AND, as recently as 2008, 95% of the Fortune 1000 were still running System Z.

Climbing out of the Rabbit Hole

How do you keep your eye on what the majority of companies are focused on without losing touch with the next-generation trends coming out of the valley?

We hold a lot of events with buyers in the tech industry (CIOs, CTOs, CISOs...). These range from large cocktail events and general networking, to smaller, focused dinners around specific verticals and themes. They help keep us connected to the buy-side of the industry and the problems they are looking to solve (and hence spend money on).

We specifically spend a lot of time trying to talk to non-Silicon Valley companies. SV has some great, large companies; but they are not representative of the "average" company. Yes, we can learn a lot from companies in SV on the bleeding edge of tech (with 100% cloud-based infrastructure, and 100% BYOD penetration) - they tell us where the puck is heading. But we can't get too focused on those buyers because most companies in the US (and the rest of the world) simply aren't anywhere close to those levels of penetration of "next-gen" technologies.

Rather - most companies have their work cut out for them having to deal with large legacy infrastructure, and hence are only able to dabble in next-gen technologies for single projects, or after significant work is done to allow a forklift upgrade of an entire system to a new infrastructure paradigm. Much of the world's core infrastructure still runs on Mainframe technology, and / or is not even virtualized yet; but if you only talk to Silicon Valley companies, you would never get this impression.

For example, Windows still has almost 33% market share in the server space - and you can almost guarantee that those servers are not on AWS!

Server Market Share

What does it all mean?

The bottom line is that when you spend each and every day surrounded by people thinking about what the next- next- next-generation architecture or technology paradigm will be, it can be easy to forget that all startups need customers. And sometimes it may take many years for your customers to catch up.

So while, as an investor, one always wants to have a few bets around massive, longer term categorical technology shifts (such as cloud, containers, BYOD...), those bets shouldn't be at the expense of backing companies that may not be redefining the entire playing field, but could be selling exactly what the majority of buyers want today and tomorrow.