A couple of years ago I had the opportunity to attend a talk by John Wilkes. Before his talk on Borg, Google’s cluster scheduler, he gave an unscheduled lecture on how to sell your research.
The night before, John had attended the PhD poster session at Google and told us (in his own way) that he found us lacking in the art of the elevator pitch. The line that stuck with me was: you need to be able to summarize your research and your work in under 30 seconds, in a way that leaves someone interested in asking you more questions. Citing a panel discussion from OOPSLA ‘93, specifically Kent Beck’s comments on how to write an abstract, John proceeded to outline for us a simple formula for how to summarize and engage audiences in 4 sentences.
Sentence ❶: State the problem
Sentence ❷: Why is the problem a problem?
Sentence ❸: A “startling” sentence
Sentence ❹: Implications of the startling sentence
This was Kent’s example from the panel:
The rejection rate for OOPSLA papers is near 90% ❶. Most papers are rejected not because of a lack of good ideas, but because they are poorly structured ❷. Following four simple steps in writing a paper will dramatically increase your chances of acceptance ❸. If everyone followed these steps, the amount of communication in the object community would increase, improving the rate of progress ❹.
That’s a pretty good synthetic example, but I wanted to see if John followed his own advice. Let’s take a look at Large-scale cluster management at Google with Borg which he presented himself at EuroSys ‘15:
The first sentence states the problem that Google faces: cluster management for a large datacenter spanning a diverse set of both applications and machines. ❶
Google’s Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
The problem is a problem because efficient task packing, over-commitment, performance isolation, and admission control are needed to achieve high utilization.❷
It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation.
Startling sentence? It works.❸
It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.
The implications are a set of lessons learned about job and cluster scheduling at global scale.❹
We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
This paper is unconventional in that it is a retrospective of a production system, but it still loosely adheres to the outline format. It is not the best example, but it motivates why the problem is hard and why it is important.
Much of the security work done at UC San Diego involves measurement studies, such as reverse engineering emissions-defeat devices in Volkswagen and Fiat cars, or mapping the internals of cloud infrastructure based on VM side-channel attacks. Let’s apply the formula to the abstract of one of those papers.
The problem (opportunity) is people outsourcing compute to public clouds. ❶
Third-party cloud computing represents the promise of outsourcing as applied to computation.
This is a problem because the business model of cloud providers requires users to run their jobs in virtual machines, and those virtual machines share physical infrastructure. ❷
Services, such as Microsoft’s Azure and Amazon’s EC2, allow users to instantiate virtual machines (VMs) on demand and thus purchase precisely the capacity they require when they require it. In turn, the use of virtualization allows third-party cloud providers to maximize the utilization of their sunk capital costs by multiplexing many customer VMs across a shared physical infrastructure.
The shared physical infrastructure is vulnerable to new VM-based side-channel attacks that can leak data, both to map the internal cloud infrastructure and to disrupt user performance. ❸
However, in this paper, we show that this approach can also introduce new vulnerabilities. Using the Amazon EC2 service as a case study, we show that it is possible to map the internal cloud infrastructure, identify where a particular target VM is likely to reside, and then instantiate new VMs until one is placed co-resident with the target.
The implication is that it is possible to mount cross-VM side channel attacks that can extract information from a VM. ❹
We explore how such placement can then be used to mount cross-VM side-channel attacks to extract information from a target VM on the same machine.
Of course, it is not as easy as this. One has to have the right starting point.
When I was a first-year PhD student, I found writing the abstract and introduction of my own papers the most daunting part of the process. I was consoled to hear from my fellow graduate students: “don’t worry, the professor will take care of it.”
This seemed to be the common theme in most labs. The student writes the background, design, and evaluation, and spends a few cycles working on an abstract and introduction that is eventually scrapped completely and rewritten by the advisor.
As a more senior PhD student now, I understand why the professors always seemed to have an infallible talent for abstract writing: they understand the big picture. There is a tendency to lose sight of the forest for the trees when working long hours in the weeds of a narrow research thread. The thread can seem magnified, and the abstract starts to be all about the details of the contribution rather than the insight that the contribution enables.
I find that the earlier I am involved in a project, the more comfortable I am with writing the abstract. That being said, not everyone joins a project at the beginning. I believe it is the responsibility of both the PI and the student to sync up and realign on the big picture of the project every now and again. Not only does it focus the efforts of the group, but I find it to be a huge morale boost to remember that the scope of the work is far bigger than the bug I was trying to debug this week.