I haven't posted anything in a very, very long time (as one can see), because I have been concentrating on getting a company,
Numerate, Inc., going with an inspiring
group of people. I have been meaning to get back to writing, not because I have any more time now, but because I am starting to feel the need to write again. Unfortunately, most of the things I would like to write about are things I am working on and can't talk about. However, I just read an article that touches on many things I have been working on over the past two years that I can talk about, and that I think needs to be put into perspective. It touches on a number of issues I have been dealing with, both within the company and outside, and it has already inspired the beginnings of several blog posts in my head.
Before I tell you which article it is, let me step back a second and describe a bit (not too much) about what it is we at Numerate do. Numerate is a pharmaceutical company that specializes in very early stage small molecule drug development. Our entire process is an
in silico process and we don't have any wet chemistry. We help biotechnology and pharmaceutical companies fill their pipelines by solving the problems that the traditional drug development process fails to solve. By doing this we can make the drug engineering (TM) process, as we like to refer to it, cheaper, faster, and better able to solve difficult problems. A key component of the cheaper and faster bit is
enterprise cloud computing (for lack of a better term). We have been using Amazon's Elastic Compute Cloud (EC2) since March of 2007. We developed an on-demand enterprise cloud computing system on EC2 using Sun Grid Engine glued together with a number of custom scripts in late 2007. We use as many as 1000 virtual machines at a time on EC2. We have been too busy to write about it and I personally didn't realize how novel it was until recently.
Today, I read the
cover article of
Chemical & Engineering News entitled
New Computing Pioneers and was overwhelmed by how much the article touched on many of the things I have been working on and how much credit it pays to (in my honest opinion) an undeserving Big Pharma. The author tries to paint a picture of how slow-to-innovate Big Pharma is defying its stereotype by embracing cloud computing. From my perspective, they are just the next big industry to realize
some of the potential of cloud computing and are now dipping their proverbial toe in the cloud. What they claim to have done in the article isn't all that innovative. For the most part the article is a big puff piece to make Big Pharma look good. First off, I found it quite odd that J&J and Genentech were given props at the beginning of the article and then were never mentioned again. Why? Lilly, on the other hand, talked about a 64-node virtual cluster working on "bioinformatics sequence information." That is tiny! We have been using over 500 nodes on EC2 since mid-2007. Not only that, Dave Powers is quoted as saying that they "complete[d] the work, and shut it down in 20 minutes" and that "[i]t cost $6.40", and he seemed pleased with this. Well, if you are familiar with EC2 you will know that he got charged for using all 64 machines for a full hour. He only utilized those machines for one-third of the time he was charged for. That seems like bad utilization to me. The article also points to a paper written by researchers at the Biotechnology & Bioengineering Center at the Medical College of Wisconsin on the viability of using Amazon's cloud-computing service for low-cost, scalable proteomics data processing. First, the paper is published by a
gray publisher, meaning it is closed access and the general public can't read it without paying at least $30. The publisher also publishes Chemical & Engineering News. Open access is something for which the folks at the
Science Commons, who were interviewed for the article, and I are fighting. That discussion, however, is for another post. Second, why are they publishing about
the viability? Academically, it seems like a weak topic to publish in a peer-reviewed journal, unless the purpose of the paper is for it to be placed on the stack of papers on which all academics need to stand in order to be seen. Again, this is a topic for a later post.
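For anyone who wants to check the utilization arithmetic on the 64-node example, here is a minimal sketch. It assumes, as I do above, that each instance is billed in whole hours; the per-instance rate is inferred from the quoted $6.40 total rather than taken from a price list, and the function names are my own.

```python
import math

def billed_hours(minutes_used: float) -> int:
    """Hours charged per instance, assuming partial hours round up to a full hour."""
    return max(1, math.ceil(minutes_used / 60))

def utilization(minutes_used: float) -> float:
    """Fraction of the billed time during which the instance was actually in use."""
    return minutes_used / (billed_hours(minutes_used) * 60)

# Figures quoted in the article.
nodes = 64
minutes_used = 20
total_cost = 6.40

hours = billed_hours(minutes_used)

# Rate implied by the quoted total (an inference, not a published price).
implied_rate = total_cost / (nodes * hours)

# What each instance-hour of real work ended up costing.
effective_rate = total_cost / (nodes * minutes_used / 60)

print(f"billed hours per instance: {hours}")
print(f"utilization: {utilization(minutes_used):.0%}")
print(f"implied rate per instance-hour: ${implied_rate:.2f}")
print(f"effective rate per utilized instance-hour: ${effective_rate:.2f}")
```

Under these assumptions the work cost three times as much per useful instance-hour as the implied rate; packing three 20-minute jobs into the same billed hour would have cost the same $6.40.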
In contrast to this bit of foolishness about innovation, the article brings up some very interesting points and highlights some interesting companies and efforts. Some of the interesting ideas and points brought up by the article, all of which I hope to address in more detail in later posts, are:
- how does a company deal with securing data in the cloud?
- how does a company control, monitor, and audit the costs of the use of the cloud by its researchers?
- what are the best tools for operating a research cluster in the cloud?
- how does one set up a collaborative environment in the cloud under these other considerations?
- how can we as researchers contribute to the ultimate commodification of compute cycles?
There are many people and companies trying to answer these questions and quickly fill these needs. One of these companies is mentioned in the article.
BioTeam is partnering with Pfizer "on connecting its work to the cloud." Based on the results of my Google stalking, BioTeam was developing cloud infrastructure very similar to our own, and it seems to have been doing so at about the same time. They are certainly a group to watch in the space of biotech computing. On the other hand, the fact that Pfizer is working with them reinforces my belief that Pfizer lacks the computational skills internally to really be called an innovator.
Another highlight of the article was the mention of the
Science Commons. The comment from John Wilbanks, the executive director, about "[w]here the data reside in the discovery process" and how that "will dramatically affect the likelihood that they're ever going to be part of a cloud," is an interesting perspective. The mention of Merck's efforts, assisted by the Science Commons (which was not pointed out in the article), to open much of the data generated by Rosetta through
Sage is also an interesting move in getting the data out there. You can hear more about this from both John Wilbanks and Stephen Friend, senior vice president and oncology franchise head at Merck Research Laboratories (soon to be at Sage), in a
panel discussion at the Commonwealth Club of California entitled "Making the Web Work for Science" of which, I admit, I am an organizer.
All in all, the article and its title are misleading, but it brings up many interesting points and actually does mention innovators (BioTeam & Science Commons); they are just not the article's purported innovators.
All views expressed in this article are those of the author and do not necessarily represent the views of, and should not be attributed to, Numerate, Inc., the Commonwealth Club of California, or anyone else with whom the author may have expressly or implicitly associated himself.
Labels: distributed computing, industry