Big Data: Still Like Teen Sex

Don’t worry if you are confused. The meme that big data is like teen sex rings as true as ever:

  • Everyone talks about it
  • Nobody really knows how to do it
  • Everyone thinks everyone else is doing it
  • So everyone claims they are doing it…

A recent FT article gives some nice color to the realities of the “big data revolution”. More to come on ClearlyTech regarding big data, but now I finally have a place to link when I need to remind people of this apt analogy.

The Cloud – Elasticity


This article is part of a series on the defining value propositions of cloud computing platforms.

The Stretchy Cloud

Another core tenet of cloud computing is its inherent elasticity, i.e. the ability to consume more or fewer resources on demand through a pay-per-usage model.

Elasticity is a critical enough differentiator for cloud computing that Amazon even named their IaaS offering “EC2”, for “Elastic Compute Cloud”.

Autoscaling

There are two ways to take advantage of elastic computing resources.

  1. Exploit the ability to rapidly provision new resources whenever you notice demand getting high, expect an increase in capacity needs, or have a one-off task that would benefit from increased horsepower.
  2. Get a computer to do #1 for you based on some predetermined thresholds. For example, CPU load getting above 80% on your current servers.

Amazon Web Services, Rackspace, Microsoft Azure and others all offer #2, known as autoscaling.
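As a rough illustration of option #2, here is a minimal Python sketch of a threshold-based scaling decision. The 80% scale-up threshold comes from the example above; the 30% scale-down threshold, the cluster size limits, and the function name `desired_server_count` are assumptions made for illustration, not any provider's actual API.

```python
def desired_server_count(current, avg_cpu_percent,
                         scale_up_at=80.0, scale_down_at=30.0,
                         min_servers=1, max_servers=20):
    """Return how many servers to run after one autoscaling check.

    Hypothetical rule: add a server when average CPU load is above
    scale_up_at, remove one when it is below scale_down_at, and
    otherwise leave the cluster alone.
    """
    if avg_cpu_percent > scale_up_at:
        target = current + 1
    elif avg_cpu_percent < scale_down_at:
        target = current - 1
    else:
        target = current
    # Never go outside the configured cluster size limits.
    return max(min_servers, min(max_servers, target))


if __name__ == "__main__":
    for load in (85.0, 50.0, 10.0):
        print(load, "->", desired_server_count(4, load))
```

The rule itself is tiny. The hard part in practice is choosing thresholds and limits that match how your application actually behaves under load.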

I’ve found the real usefulness of autoscaling to be limited, and know very few companies that do it in practice. A now ancient (2008!) article by George Reese sums up some of the arguments against autoscaling, which still ring true.

If you have very specific application services that have predictable load patterns, or if you have so many servers that it’s worth a few engineers’ worth of salary to manage the complexities and assumptions of an autoscaling cluster, go for it. Until then, stick to scaling your infrastructure on the fly, but manually, to take advantage of your elastic infrastructure.

An Underutilized Feature

The elastic nature of the cloud may be its single most under-utilized feature. Most companies move to the cloud to save on infrastructure costs or to take advantage of managed services. And many use the elastic nature of the cloud to easily provision servers. But disappointingly few automatically scale their resources up and down to meet demand, something I suspect will change as the tooling for cloud platforms matures.

The best dollar value for cloud computing comes when you can exploit the elasticity to only pay for resources you need, when you need them. You win because you aren’t paying for what you don’t need, and IaaS providers win because they can rent the same resources to someone else during the hours you aren’t using them.

Case Study: NYTimes

Although many tech companies ignore elasticity because they just want the cloud as a low-cost way to deploy an application, there are still hundreds (maybe thousands?) of companies making liberal use of the elasticity of Amazon and other big cloud providers.

The New York Times technology team has long been at the cutting edge of both front-end and architectural strategies. When it came time for them to put all their 11 million public domain articles online as PDFs, they took advantage of EC2 by spinning up 100 machines for 24 hours to crunch all the data. Let’s say they used m1.large instance types (just guessing), at a cost of $0.24/hour. That’s only $576 for 100 instances to run a whole day. A veritable super-computer for the cost of a low-end iPad!
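The back-of-the-envelope math is easy to check. In this short Python sketch, the $0.24/hour rate is the guess from above, not a confirmed price, and the variable names are purely illustrative.

```python
# Back-of-the-envelope cost of the burst described above.
INSTANCES = 100        # machines spun up
HOURS = 24             # how long each one ran
HOURLY_RATE = 0.24     # guessed m1.large price in dollars per hour

total_cost = INSTANCES * HOURS * HOURLY_RATE
print(f"${total_cost:,.2f}")  # 100 * 24 * 0.24 = 576
```

Swap in a different instance count, duration, or rate to see how cheaply a short burst of heavy computing can be had.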

I went to a talk a number of years back where the New York Times discussed the process of updating data on election night as poll results get uploaded to the Associated Press FTP server for consumption by all the media outlets. New poll results flow in once per hour, and time is of the essence when reporting breaking election results to your readers. So the NYT would spin up lots of cloud instances on election night to parallelize the process of slurping down election results every hour and updating them on their site. This was part of a much larger strategy for serving high traffic on election night, all made possible by the elasticity of the cloud.

Case Study: You

Take a look at how your business uses infrastructure, and ask two questions.

  1. How variable is my utilization? How much money could I save if I turned machines on and off in sync with demand?
  2. What cool stuff could I do if cost were no barrier to having hundreds of machines at my disposal for quick bursts? Is there a way to leapfrog my competitors by throwing computing horsepower at the problem?
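One hedged way to put a number on the first question is to compare the cost of running peak capacity around the clock against running only what each hour needs. Everything in this Python sketch is hypothetical: the hourly demand profile, the $0.24/hour rate borrowed from the guess in the NYTimes example, and the function names.

```python
def always_on_cost(servers_needed_per_hour, hourly_rate):
    """Cost of provisioning for peak demand and never scaling down."""
    peak = max(servers_needed_per_hour)
    return peak * len(servers_needed_per_hour) * hourly_rate


def elastic_cost(servers_needed_per_hour, hourly_rate):
    """Cost of running exactly the servers each hour requires."""
    return sum(servers_needed_per_hour) * hourly_rate


# Hypothetical day: 2 servers for 16 quiet hours, 10 for an 8-hour peak.
demand = [2] * 16 + [10] * 8
rate = 0.24  # assumed dollars per server-hour

fixed = always_on_cost(demand, rate)
elastic = elastic_cost(demand, rate)
print(f"always on: ${fixed:.2f}, elastic: ${elastic:.2f}, "
      f"saved: ${fixed - elastic:.2f}")
```

Replace the demand list with your own hourly server counts to get a first-pass estimate. The spikier your usage, the bigger the gap between the two numbers.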

Keep in mind that it’ll take a bunch of work for your ops team to figure out how to best exploit elasticity on your particular deployment. Make sure it’s a big enough lever for you before you send them on a wild goose chase. But know that the capability is there, and is one of the great benefits of the Cloud.

Do More Possibility-Driven Development

A top restaurateur spends countless hours discovering new ingredients, educating themselves on cutting-edge cooking techniques, and dining at other top restaurants for inspiration.

With so many great libraries, services, frameworks, and templates out there, it makes more and more sense to allow all those possible tools to inform your design and build decisions.

I call this “possibility-driven development”.

Just as that restaurateur works with their chef on the best dining experience, you should be constantly familiarizing yourself with the latest trends in web development and tools, and working with your development teams to creatively apply them to your business.

  • When you see a site or app that behaves in a unique way, ask yourself “how’d they do that?”
  • When you read about a new service on TechCrunch, figure out how your product would be different if you were an early customer of theirs.
  • Check out cool front-end techniques on sites like WebAppers or Smashing Magazine, and modernize your app a bit.
  • Keep an eye on trending technologies and play with the sites that use them.
  • And of course, read ClearlyTech, where we’ll be highlighting great tools and trends all the time in plain English.

As a non-developer, you have a unique ability to look at new enabling technologies in an unbiased way. Take advantage of that, and open a dialog with your tech team. Find ways to integrate new technologies and services to make your product shine.

Thousands of open-source and commercial projects are finding possible ways to make your product great. It’s up to you to seize the opportunities.

How To Work with your Technical Advisor

A strong relationship with a technical advisor can benefit all early-stage founders who are building a technology product, whether you are using a contractor, an off-shore team, or full-time staff. And in later stages, a technical advisor can continue to be a trusted ear for you, and a useful resource for your own technical team.

Through my work as an independent advisor, and with organizations like Mass Challenge, Founder Mentors, and NEVCA’s Critical Mass, I’ve been a compensated advisor (cash and/or equity) to at least a half-dozen startups, a pro-bono advisor to a few dozen, and an occasional advisor to hundreds.

Finding A Technical Advisor

You want to find an advisor who has deep technical experience, especially in areas you care about (big data? user-generated content? mobile applications?). But skip techies who are coming to you with an agenda. You don’t want them pitching their pet technologies, but rather offering the best solution for your unique business.

Often, you can reach out to strong technical leaders (CTOs usually) at other startups in your area (or remote, doesn’t much matter if the communication skills are strong). Pitch them on your idea, let them know why you need their help. A genuine interest in cultivating the best technical operation you can goes a long way towards making us want to help you.

Also, take a look at communities like Founder Dating to find strong tech leaders who are looking to engage full or part-time with new ventures. Most of my full-time roles began as technical advisor relationships. You never know, but you might even find your next great co-founder if things work out.

Make The Most of an Advisor

No matter how you find an advisor and choose to structure an advisory arrangement, here’s some advice from an advisor’s perspective on how you can get the best out of us.

  1. Show a genuine interest in the tech and execution side of your business. We get excited by your entrepreneurial curiosity, and we’re proud of the tech and product experience we bring to the table. Ask us questions, dig in deep. We’re here to discuss with you, not to lecture at you. It’s not useful for anyone if we are spoon-feeding generic answers across the table.

  2. Do your homework. It’s demoralizing to be asked questions that Google can answer faster and more completely than us. Don’t ask “How do I set up a Google Apps mailing list?” until you’ve taken a swing at it. If you come prepared into a conversation, it’s more productive and enjoyable for everyone.

  3. Ask “Why.” Since you’ve done your homework and have a sense of what your options are, we’re happy to tell you which option is right for you. Unfortunately, not nearly enough of my advisees follow up with “why?” You should want to know why that’s the best option for your business or for your team. Why should you want to know?[1] Because of “teach a man to fish…”, and because your tech team will respect you more if you understand the why.

  4. Teach Us Back. We’re not working with you for our health, or because of a community service court order. We think you are smart, interesting, and have something to teach us too. Make sure this relationship is a two-way street, and don’t be surprised when we turn the tables and start picking your brain in return.

Stuff to Avoid

Here are some lessons founders and I have learned the hard way. Heed these pitfalls and everyone will get more from the relationship.

  1. Stop Pitching Us. You wouldn’t believe how many times I sit down for coffee with an advisee, only to have them spend 45 minutes telling me all about how the business is progressing, what the latest product ideas are, how much money they are definitely going to raise, and why it’s all going to change the world. There’s a time and an audience for that, but you’ve just wasted a whole meeting with someone who was there to help you, and you’ve gotten no value out of it.

  2. Don’t get defensive. We are spending our valuable time helping you out because we want you to succeed. We aren’t a competitor, we aren’t your boss, and we don’t have a hidden agenda to sink your idea. If you succeed, we look good and we’re proud to be associated with your success. Take our advice for what it’s worth, and then implement it or not. You are running the show here, no need to get defensive.

  3. Don’t be excessively legal. Okay, so that doesn’t mean be illegal, of course. Rather, don’t start shoving NDAs and IP Assignment agreements at a technical advisor early in the relationship, or ever for that matter. Tech people are the most skeptical bunch when it comes to politics and legal mumbo-jumbo (we incorrectly believe we would never use such litigious instruments if we were in your shoes). If you do want to stamp an agreement to protect both parties, keep it simple. Check out the Founder Institute’s FAST for a good starting place.

  4. We’re not your code monkey. If you are secretly hoping that you can get your technical advisor to fix your team’s code, or to build some part of your app in their advisory hours, put that thought aside. I had an advisee who actually asked me to re-install Windows on a laptop in their office, a task I thankfully managed to side-step. Many advisors will be happy to establish a separate consulting agreement if you want to pay us by the hour to get more hands-on. Keep those two parts of the relationship separate and everyone will be happier.


  1. See what I did there?

Foolishly Pretending to Know Tech

An entrepreneur asked me this week (paraphrasing), “I’m a non-technical person, but I do have some experience a ways back with HTML, CSS, CMS systems, and QA. Should I point that out when courting technical talent? Or will mentioning it seem irrelevant or worse?”

It’s a good question because I see people mishandle this all the time. I see business people overplaying their technical background, saying things like “Well, actually, I used to be a developer”, or joking “I know enough about web development and databases to be dangerous”.

I know you mean it in good faith, as a geeky fig-leaf, a shared experience you hope will break the ice. But if you catch yourself saying these things, quit it!

Show, Don’t Tell

Unfortunately, mentioning your brief (or even deep but outdated) brushes with technology when you are first meeting technical talent usually makes you come off as some combination of desperate, self-congratulatory, insecure, or obnoxious.

Imagine if the tables were turned. A technical developer announces in an interview setting, unprovoked, “I have actually messed around a bit with marketing copy and branding for a friend’s retail project, so I know enough about marketing to be dangerous”. Sounds stupid, right?

One of two things is true about your technical experience:

  1. It’s irrelevant, and you’ll be happy you kept your mouth shut.
  2. It makes you a better communicator, leader, and collaborator with your technical team. And that will be apparent from day one!

Tech talent is very adept at sniffing out BS. Rather than come across as trying too hard, you have an opportunity to pleasantly surprise us. A founder I worked with recently (an Ivy League MBA, all-business guy) was showing me the latest changes to their blog design, and I asked him who was working on the CSS. “I just did it myself,” he replied, as if it was the least surprising thing in the world.

He had never given me any indication he had any experience (or even interest) in web development in the many weeks we’d worked together. Instead, he showed me, and rather than rolling my eyes at his empty statements, I found myself genuinely impressed at his depth and scrappy resourcefulness.

Trust Your Talents

You are a well-rounded individual. A founder with a wide array of talents and experience. Just like having read a book about sales will make you a better sales manager, any background you have in technology will be a valuable asset as you build a tech organization. Heck, equipping founders to work better with tech teams is why I created ClearlyTech!

Trust your talents, and show a genuine interest in extending your knowledge to make you a better collaborator. Anyone pretending to bring more to the table than they do is a turn-off, and your tech savvy is no exception to that.

Building a Simple Olympic Medals API


I’m shamelessly excited about the upcoming Olympic Games. I’m a sucker for both the competition and the cheesy human-interest stories… I thought the games would make a good excuse to show how a simple API can be built and launched from scratch with modern tools.

Put on your propeller beanie and let’s take a gentle geeky look at how I built it.

Olympics Medals API

The project was to launch a web-based API that returns JSON data on the current medal count for the Sochi 2014 games. In plain English, that means a URL:

which returns raw data that can easily be consumed by another computer program.

Why would we want this? This is very similar to almost every API that powers mobile apps today. Most iPhone and Android applications are constantly visiting URLs like this to get the data they need to update views in response to user input, loading a new screen, etc. These things power nearly every interaction you do on mobile, and a good chunk of the web too.

In our specific case, we return JSON text as seen below, with the latest medal counts for all the Olympic countries. You’ll get the full data if you click the link above.[1]

[
    {
      "country_id": "united-states",
      "country_name": "United States",
      "rank": 1,
      "gold_count": 12,
      "silver_count": 14,
      "bronze_count": 6,
      "medal_count": 32
    },
    {
      "country_id": "germany",
      "country_name": "Germany",
      "rank": 2,
      "gold_count": 8,
      "silver_count": 16,
      "bronze_count": 1,
      "medal_count": 25
    },
    and so on, for all 94 countries represented.
]  
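To make “consumed by another computer program” concrete, here is a minimal Python sketch that parses a response shaped like the one above and works with it. It is not the app’s own code (which is Ruby); the sample text is the two-country excerpt from above, and the helper name `total_medals` is made up for illustration.

```python
import json

# Two-country excerpt of the response shown above, as the raw text
# a client program would receive.
RESPONSE_TEXT = """
[
  {"country_id": "united-states", "country_name": "United States",
   "rank": 1, "gold_count": 12, "silver_count": 14, "bronze_count": 6,
   "medal_count": 32},
  {"country_id": "germany", "country_name": "Germany",
   "rank": 2, "gold_count": 8, "silver_count": 16, "bronze_count": 1,
   "medal_count": 25}
]
"""


def total_medals(countries):
    """Add up medal_count across every country in the parsed list."""
    return sum(country["medal_count"] for country in countries)


countries = json.loads(RESPONSE_TEXT)   # JSON text -> list of dicts
leader = min(countries, key=lambda c: c["rank"])
print(leader["country_name"], total_medals(countries))
```

In a real client the text would come from an HTTP request to the API’s URL rather than a hard-coded string, but the parsing step would look the same.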

There’s also another URL for retrieving the medal counts for a particular country:

That one returns just a little bit of text:

{
    "country_name": "United States",
    "rank": 1,
    "gold_count": 12,
    "silver_count": 14,
    "bronze_count": 6,
    "medal_count": 32
}  
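A hedged guess at what a per-country endpoint like this does behind the scenes: find the entry whose `country_id` matches the requested country and return it without the id. This Python sketch is illustrative only; the function name `country_summary` and the sample list are assumptions, not taken from the app’s Ruby source.

```python
def country_summary(countries, country_id):
    """Return one country's medal data without its country_id, or None.

    countries is a list of dicts shaped like the full response above.
    """
    for country in countries:
        if country["country_id"] == country_id:
            # Copy every field except the id, matching the shorter response.
            return {k: v for k, v in country.items() if k != "country_id"}
    return None


sample = [
    {"country_id": "united-states", "country_name": "United States",
     "rank": 1, "gold_count": 12, "silver_count": 14, "bronze_count": 6,
     "medal_count": 32},
]
print(country_summary(sample, "united-states"))
print(country_summary(sample, "atlantis"))  # unknown id -> None
```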

Getting the Data

This app was a fun reason to try out a newly launched tool called Kimono. They offer a service which scrapes structured data off web pages for you. I created a Kimono scraper in only a few clicks which retrieves the raw data directly from Sochi2014.com. Wouldn’t have been hard to do myself, but developers love shortcuts wherever we can find them.

It’s worth noting here that my API is a wrapper for a Kimono API, which is scraping the official Sochi website, which is displaying raw data from the International Olympic Committee medal standings API. These kinds of services-built-on-services are what make the modern web so exciting and powerful, while simultaneously confusing and often fragile. If I were building a real production-quality API for Olympic medal standings, I’d almost certainly try to license the raw data source to make my app faster and more reliable. But this approach will work for our purposes, and allowed me to get the whole API built and deployed in only a couple hours.

Building the App

I chose the lightweight Ruby Padrino framework for this app. It doesn’t have as many advanced features and support as something like Ruby on Rails, but it’s fast and easy to work with for a tight small project that doesn’t need a fancy front-end or even a database (though you can do all that with Padrino too).

You can find all the source code for this application open-sourced on GitHub. If you haven’t poked around at an app like this before, indulge yourself, and go take a look at just three files:

  1. The main application file shows three simple URLs: our two API endpoints, and the root, which redirects to our documentation.
  2. The MedalData class, which does the work of grabbing the raw data and arranging it to match what we return via JSON.
  3. A simple automated test for MedalData that makes sure future changes to my code or the Kimono scraper don’t break the behavior I’m expecting. This is a great example of how simple an automated test can be.
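To give a flavor of items 2 and 3 without opening the repository, here is a small Python sketch of the same idea: reshape raw scraped rows into the output format, then check the result with a tiny automated test. The real project is Ruby; the raw row format, the function names `build_standings` and `test_build_standings`, and the sort order are assumptions for illustration only.

```python
def build_standings(raw_rows):
    """Turn scraped rows (all strings) into ranked medal records.

    Each raw row is assumed to look like:
    {"country": "Germany", "gold": "8", "silver": "16", "bronze": "1"}
    """
    records = []
    for row in raw_rows:
        gold, silver, bronze = (int(row[k]) for k in ("gold", "silver", "bronze"))
        records.append({
            "country_id": row["country"].lower().replace(" ", "-"),
            "country_name": row["country"],
            "gold_count": gold,
            "silver_count": silver,
            "bronze_count": bronze,
            "medal_count": gold + silver + bronze,
        })
    # Assumed ordering: most golds first, then silvers, then bronzes.
    records.sort(key=lambda r: (r["gold_count"], r["silver_count"],
                                r["bronze_count"]), reverse=True)
    for position, record in enumerate(records, start=1):
        record["rank"] = position
    return records


def test_build_standings():
    """A test in the spirit of item 3: fail loudly if the shape changes."""
    raw = [
        {"country": "Germany", "gold": "8", "silver": "16", "bronze": "1"},
        {"country": "United States", "gold": "12", "silver": "14", "bronze": "6"},
    ]
    standings = build_standings(raw)
    assert [r["country_id"] for r in standings] == ["united-states", "germany"]
    assert standings[0]["rank"] == 1
    assert standings[1]["medal_count"] == 25


test_build_standings()
print("ok")
```

The test is only a handful of lines, yet it would catch a renamed field or a change in the scraper’s output before any API consumer noticed.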

All the rest of the files in the project are just decoration, configuration, documentation, the boilerplate plumbing that Ruby and Padrino require to do the work. Not that hard, right?

Documenting the API

Developer tool Apiary maintains an open standard for documenting APIs like this one, called the API Blueprint.

I wrote up a description similar to the one above, but in their specified format, which is shown when a user visits http://olympics.clearlytech.com/.

Simple documentation like this goes a really long way towards convincing others to consume your API. Developers love this stuff.

Deploying It To The World

I decided to launch it on the mind-bogglingly easy Heroku platform. I created a new app, ran some git commands (Heroku manages your code by using the git source control tool that your developers are probably using anyway), and voilà! Instant public application.

Technically, the Heroku app runs at http://olympics-api.herokuapp.com/, but I told it to answer to http://olympics.clearlytech.com/ as well, by putting an entry in my DNS zone, managed by Amazon Route53. This may seem like a lot of moving parts, but wiring this kind of thing up is second-nature stuff to any full-stack developer worth her salt.

The whole process of setting this up on Heroku (including signing up for the service, setting up the app, deploying it, and changing my DNS) took about 10 minutes. There isn’t a faster way right now to deploy a low-volume application for public consumption.


  1. The code at the raw URL is not nicely formatted like our example, but another piece of code consuming this service doesn’t care how pretty it looks.