Project Ideas for 15-849 Spring 2009
Datacenters and the Cloud
- The DCO has a large amount of power instrumentation that's not being heavily used. Monitor, analyze, recommend?
- Dave's Emulab cluster (CMULab) has no power management to turn nodes off when not in use. What the heck? Fix this -- and we'll get the patches applied to Emulab proper, where it can happily power-off hundreds of nodes at a time.
- Ask for Emulab use logs to evaluate the potential power savings. (Emulab is a several-hundred node research cluster at the University of Utah used by lots of network and systems researchers. All of the nodes are independently power-controllable.)
- Integrate with topology management to see if you can do it in a way that leaves the right nodes off most of the time.
- Use past logs to do this power management without substantially increasing the time it takes for experimenters to swap in experiments. Keep a reserve pool that's properly sized, etc. What policies would you use? Analysis of boot times, etc.
- Workload placement in the cloud: Certain applications could/should be placed on "slower" cores, since they're I/O or memory bound. Others have all their data in L1 cache and run fast. (See Merkel08 at HotPower 2008). While people have looked at optimizing these workloads assuming heterogeneous cores, current hardware is homogeneous. But perhaps you can do this using VM migration. Determine the right mapping between VMs (applications) and heterogeneous cores. - Does anyone else want to do this project? If so, I'd be interested (firstname.lastname@example.org)
- Storage + power: If one looks only at magnetic drives, are the schemes from "A Spin-Up Saved is Energy Earned: Achieving Power-Efficient, Erasure Coded Storage" (HotDep 2008) feasible? Can you use erasure coding to lay out data so that the power used by the storage system is actually a function of offered load instead of a function of the number of disks? Ensure both that idle disks can be turned off and that data can be reconstructed. How would placement algorithms change for maximal power savings? How do you account for the CPU costs of data reconstruction? What is the tradeoff between extra space used and extra power saved (coverage is easier with more replication)?
- Are VM migration schemes too optimistic? Account for the cost of VM migration (network, CPU, etc.) in these schemes. Most optimization algorithms and Hot-* papers hand-wave around this issue.
- FAWN nodes as Web servers. If people must use their own machines as a web server, use a wimpy. How many web sites out there could run effectively on a FAWN node? Benchmark, compare.
- FAWN node + web servers: Can you turn off the main webserver most of the time and cache data at the FAWN front-end cache, letting the back-end be in deep sleep except when weird requests come in? 3W front-end to handle on-all-the-time tasks; 200W back-end to do the heavy lifting.
- Which takes more power: GMail or Outlook? Viewpoint 1: From the user's machine, being greedy. Viewpoint 2: The global view; include server power. (GMail and Outlook being sample applications of the cloud-based and host-based variety, respectively, of course...)
- The Power of Caching: Is it worth spending power to cache (e.g., web caches) data locally or should you fetch it via the network? Ignore the performance benefits. Storage size vs. power vs. network cost? Joules per byte? What about distance to web server or the efficiency of the web server? If browsing the web, how often would your web cache cause the drive to spin up vs. otherwise?
- Same question about compression: Where's the tradeoff worth it? When? trivial-fast compression vs gzip vs bzip2 vs lzma? What do network byte transfer costs look like vs CPU costs? What's the trend over time?
- C-states vs. P-states: What's the real power savings? Can machines dynamically figure out what their power curve looks like (perhaps with small amounts of hardware support, e.g., to measure their own consumption) to optimize scheduling? Can't have a hardcoded policy, and lots of this information depends on motherboard, too - and amount of memory, etc.
- Does load skewing in web clusters, even without shutting the machines down, let you save power by having the rest of the machines in deeper sleep? Assuming they support deeper sleep.
- Can you batch requests into a machine (e.g., Web server requests) by a small amount -- microseconds to milliseconds -- to let the node spend more time in deeper C states, without substantially increasing latency? See the Nedevschi sleeping NSDI paper, but apply to nodes. (contact on idea: Niraj Tolia, HP Labs.)
- Power profiler that possibly uses external hardware -- what's out there? What can you do better? Trace applications?
- Record everything you can about your workload and power draw and fan state and battery state etc. on your laptop, in as much detail as possible. What can we learn from a week of such traces? Note that some platforms have very detailed power consumption stats available, e.g., for the Mac, you could use Hardware Monitor (commercial; I'll buy a copy if you need it).
- GPGPU power projects??
- Graphics: When's it better to render in software vs. hardware? Speed vs. power tradeoff? E.g., new MacBook Pro computers have two GPUs, a wimpy one and a fast one. They can't switch dynamically between them (requires logout), but if they COULD... how would you use them?
- Power-efficient code 1: PowerTOP/kernelTop/LatencyTOP are a start at measuring some of the power impact of code, but they're clearly not the end. The increasing popularity of cloud computing and low-power computing is going to increase the incentive for developers to write low-power code. How does one measure power-efficient code?
- Note that some OSes (Mac OS X, Solaris, BSD, maybe Linux?) support the DTrace kernel + app profiler, which is ridiculously powerful. Can you use it for power profiling in some way?
- Power-efficient code 2: What would -Opower in a compiler actually do? Where and how much would it be worthwhile trying to write code where you could explicitly, during compilation or some other phase, trade between speed and power? space and power? etc.
- Something based on Tweet-a-Watt or Sensor Andrew's power monitors?
- Built-in, system-wide power measurement capabilities. How much would it cost? How much benefit could it provide? From the outlet in...
- Clever ways of measuring power. PowerScope is a starting point for this - how can you improve? Different apps? Cheaper? More accurate? See also Padmanabhan Pillai, Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems. (Babu is now at Intel, Pittsburgh...)
- Batteries - if you really use the battery instead of running the test laptop off of AC power, can you do things like creating a scheduler to optimize battery lifetime vs. just overall power draw? See, e.g., Thomas L. Martin Ph.D. thesis: Balancing Batteries, Power, and Performance: System Issues in CPU Speed-Setting for Mobile Computing. (CMU thesis)
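As a possible starting point for the compression question above (trivial-fast vs gzip vs bzip2 vs lzma), here is a minimal sketch of the break-even arithmetic: compressing before sending pays off only if the network energy saved exceeds the CPU energy spent. The two power constants are made-up placeholders, not measurements, and the function names are hypothetical. The sketch counts only sender-side compression and ignores decompression at the receiver.

```python
import bz2
import lzma
import time
import zlib

# ASSUMED placeholder figures, not measurements; replace with your own.
CPU_ACTIVE_WATTS = 20.0          # extra power drawn while compressing
NETWORK_JOULES_PER_BYTE = 5e-6   # energy to move one byte over the network

# Codecs from the Python standard library, roughly matching the list above.
CODECS = {
    "zlib-1 (fast)": lambda data: zlib.compress(data, 1),
    "zlib-6 (gzip-like)": lambda data: zlib.compress(data, 6),
    "bzip2": bz2.compress,
    "lzma": lzma.compress,
}


def net_energy_saved(original_len, compressed_len, cpu_seconds,
                     cpu_watts=CPU_ACTIVE_WATTS,
                     joules_per_byte=NETWORK_JOULES_PER_BYTE):
    """Joules saved by compressing before sending (negative means a loss)."""
    network_saving = (original_len - compressed_len) * joules_per_byte
    cpu_cost = cpu_seconds * cpu_watts
    return network_saving - cpu_cost


def evaluate(data):
    """Return {codec name: (compressed size, CPU seconds, joules saved)}."""
    results = {}
    for name, compress in CODECS.items():
        start = time.process_time()
        compressed = compress(data)
        cpu_seconds = time.process_time() - start
        saved = net_energy_saved(len(data), len(compressed), cpu_seconds)
        results[name] = (len(compressed), cpu_seconds, saved)
    return results


if __name__ == "__main__":
    sample = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 20000
    for name, (size, secs, saved) in evaluate(sample).items():
        print(f"{name:20s} {size:9d} bytes  {secs:.4f} s CPU  {saved:+.4f} J")
```

A real project would replace the constants with measured values (e.g., from an external power meter) and rerun the comparison across hardware generations to look at the trend over time.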
Really low power (sensors, etc)
- Do SSDs really save power on laptops? Test this in a rigorous way. We have multiple SSDs available. See, e.g., this grouchy blog post about SSD power consumption: StorageMojo, "Notebook SSDs are dead".
Unedited idea pile
- See Jeff Mogul HotPower keynote talk / panel talk