712Projects-fall08

  • Group 1 - Michael Stevens, Michael Stroucken, Spencer Whitman
  • Group 2 - Bin Fan, Wittawat Tantisiriroj, Lin Xiao
  • Group 3 - Jeffrey Dunn, Brendan Meeder, Matthew Stanton
  • Group 4 - Daniel Mcfarlin, Iulian Moraru
  • Group 5 - Hormoz Zarnani, Vinod Chandrasekaran
  • Group 6 - Rui Meireles, Swapnil Patil, Harshavardhan Simhadri
  • Group 7 - Ravishankar Krishnaswamy, B. Aditya Prakash, Ali Kemal Sinop
  • Group 8 - Keith Bare, Michael Kasick, Eugene Marinelli, Jiaqi Tan


Design Document Meeting Schedule

Wed (29th Oct 2008)

  • 7:00 - 7:30 pm: Group 4 - Daniel Mcfarlin, Iulian Moraru
  • 7:30 - 8:00 pm: Group 3 - Jeffrey Dunn, Brendan Meeder, Matthew Stanton
  • 8:00 - 8:30 pm: Group 6 - Rui Meireles, Swapnil Patil, Harshavardhan Simhadri
  • 8:30 - 9:00 pm: Group 5 - Hormoz Zarnani, Vinod Chandrasekaran

Thursday (30th Oct 2008)

  • 9:00 - 9:30 am: (open)
  • 9:30 - 10:00 am: Group 8 - Keith Bare, Michael Kasick, Eugene Marinelli, Jiaqi Tan
  • 10:00 - 10:30 am: Group 1 - Michael Stevens, Michael Stroucken, Spencer Whitman
  • 10:30 - 11:00 am: Group 2 - Bin Fan, Wittawat Tantisiriroj, Lin Xiao
  • 11:00 - 11:30 am: (open)
  • 11:30 am - 12:00 pm: Group 7 - Ravishankar Krishnaswamy, B. Aditya Prakash, Ali Kemal Sinop

Looking for project partners? Want to discuss ideas? Here's your place.

Name (email address), topics of interest, etc.

Example:

  • Joe (joe@...) - My research interests are in file systems. Specifically, I am interested in looking at file system support for flash-based storage.

Or a more detailed one (if you already have a specific project idea and want to recruit more people):

  • Amar Phanishayee (amarp@cs. ) - TCP Throughput Collapse in Cluster-based Storage Systems. When data is striped over multiple networked storage nodes, a client can experience a TCP throughput collapse that results in much lower read bandwidth than the available network links should provide. Conceptually, this problem arises because the client simultaneously reads fragments of a data block from multiple sources that together send enough data to overload the switch buffers on the client's link. I'd like to analyze this problem and explore solutions to it.
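To make the buffer-overflow condition in this example concrete, here is a minimal back-of-the-envelope sketch in C. The stripe width, fragment size, and switch-buffer size are made-up numbers for illustration only, not measurements from any real cluster.

    #include <stdio.h>

    int main(void) {
        /* All numbers below are illustrative assumptions, not measurements. */
        long servers = 8;                       /* stripe width: servers per block    */
        long fragment_bytes = 256 * 1024;       /* data each server sends per request */
        long switch_buffer_bytes = 512 * 1024;  /* output buffer on the client's port */

        /* Because the client requests all fragments at once, the servers'
         * responses arrive as one synchronized burst at the same switch port. */
        long burst = servers * fragment_bytes;

        printf("synchronized burst = %ld bytes, port buffer = %ld bytes\n",
               burst, switch_buffer_bytes);
        if (burst > switch_buffer_bytes)
            printf("burst exceeds the buffer: packet drops, TCP timeouts, "
                   "and the observed throughput collapse\n");
        return 0;
    }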

Your ads here

  • Hormoz Zarnani (hzarnani@cs) - I am working on a project with Intel Research Pittsburgh to map BDDs (Binary Decision Diagrams) onto a large-scale computer cluster. More specifically, we are implementing a distributed, out-of-core BDD package. (BDDs are graph-based data structures used to represent Boolean functions; they have many applications, in particular in formal verification. A toy sketch of the core data structure appears after this list.) Over the past nine months, I have developed a sequential, out-of-core version of the system. As my project for this course, I would like to do the next phase of the project: parallelizing and distributing this system to run on multiple compute nodes. I was hoping to recruit some of you to partner with me on this project. I should note that it is unlikely that the entire task of parallelizing and distributing can be completed this semester, but we can identify a subset of it to work on.
  • Iulian Moraru (iulian@cs) - The topic would be: making applications written for uniprocessors run faster on multicore machines, without changing the code. I was thinking about speculative execution: prefetching disk data, trying to minimize the number of cache misses, maybe even more radical stuff. (A minimal prefetching sketch appears after this list.)
  • Wittawat Tantisiriroj (wtantisi@cs) - My research interests are in distributed/parallel file systems. Topics I am interested in include cross-server redundancy techniques, for example mirroring, multi-replication, or parity, for parallel file systems such as the Parallel Virtual File System (PVFS); an analysis of the triple-replica, slow-recovery model in the Google File System (GFS); or using a distributed/parallel file system as a lightweight distributed database system. (A toy parity sketch appears after this list.)
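For the BDD ad above (Hormoz): a minimal sketch of the core data structure, assuming a hash-consed unique table so that each (variable, low child, high child) triple is built only once. The table size and hash function are arbitrary illustrative choices, not part of the actual project code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    /* A BDD node decides on one variable: `lo` is the function when the
     * variable is 0, `hi` when it is 1.  The terminals 0 and 1 are special. */
    typedef struct Node {
        int var;
        struct Node *lo, *hi;
        struct Node *next;              /* hash chain in the unique table */
    } Node;

    #define TABLE_SIZE 1021
    static Node *unique_table[TABLE_SIZE];
    static Node terminal0, terminal1;   /* the constant functions 0 and 1 */

    static unsigned hash(int var, Node *lo, Node *hi) {
        uintptr_t h = (uintptr_t)var * 31u + (uintptr_t)lo * 17u + (uintptr_t)hi;
        return (unsigned)(h % TABLE_SIZE);
    }

    /* Return the canonical node for (var, lo, hi), creating it only if it
     * does not exist yet.  lo == hi means the variable is irrelevant. */
    static Node *mk(int var, Node *lo, Node *hi) {
        if (lo == hi)
            return lo;
        unsigned h = hash(var, lo, hi);
        for (Node *n = unique_table[h]; n; n = n->next)
            if (n->var == var && n->lo == lo && n->hi == hi)
                return n;               /* subgraph already exists: share it */
        Node *n = malloc(sizeof *n);
        n->var = var;
        n->lo = lo;
        n->hi = hi;
        n->next = unique_table[h];
        unique_table[h] = n;
        return n;
    }

    int main(void) {
        /* Build x1 AND x2 by hand: if x1 then (if x2 then 1 else 0) else 0. */
        Node *x2 = mk(2, &terminal0, &terminal1);
        Node *x1_and_x2 = mk(1, &terminal0, x2);
        printf("root tests variable %d\n", x1_and_x2->var);
        return 0;
    }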
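For the speculative-execution ad above (Iulian): one deliberately simple form of the disk-prefetching idea is a helper running on a spare core that warms the page cache just ahead of the unmodified application. The sketch below assumes Linux and the posix_fadvise interface; the file name and sizes are placeholders.

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Ask the kernel to start reading `len` bytes at offset `off` into the
     * page cache.  A speculative helper on a spare core could issue these
     * hints just before the unmodified application performs its own reads. */
    static int prefetch_range(const char *path, off_t off, off_t len) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        int rc = posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
        close(fd);
        return rc;
    }

    int main(void) {
        /* "input.dat" is a placeholder for whatever file a profile or a
           speculative run predicts the real application will read next. */
        if (prefetch_range("input.dat", 0, 64L * 1024 * 1024) != 0)
            perror("prefetch_range");
        return 0;
    }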
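For the cross-server redundancy ad above (Wittawat): the parity option boils down to RAID-5 style XOR across a stripe. A toy sketch, with tiny in-memory chunks standing in for data on real PVFS I/O servers:

    #include <stdio.h>
    #include <string.h>

    #define STRIPE_WIDTH 4          /* number of data servers in a stripe  */
    #define CHUNK 8                 /* bytes per chunk; tiny for illustration */

    /* The parity chunk is the XOR of all data chunks, so any single lost
     * chunk can be rebuilt from the parity plus the surviving chunks. */
    static void make_parity(unsigned char data[STRIPE_WIDTH][CHUNK],
                            unsigned char parity[CHUNK]) {
        memset(parity, 0, CHUNK);
        for (int s = 0; s < STRIPE_WIDTH; s++)
            for (int i = 0; i < CHUNK; i++)
                parity[i] ^= data[s][i];
    }

    /* Rebuild the chunk that lived on server `lost`. */
    static void rebuild(unsigned char data[STRIPE_WIDTH][CHUNK],
                        unsigned char parity[CHUNK], int lost) {
        memcpy(data[lost], parity, CHUNK);
        for (int s = 0; s < STRIPE_WIDTH; s++)
            if (s != lost)
                for (int i = 0; i < CHUNK; i++)
                    data[lost][i] ^= data[s][i];
    }

    int main(void) {
        unsigned char data[STRIPE_WIDTH][CHUNK] = { "chunk-0", "chunk-1",
                                                    "chunk-2", "chunk-3" };
        unsigned char parity[CHUNK];
        make_parity(data, parity);

        memset(data[2], 0, CHUNK);                  /* pretend server 2 failed */
        rebuild(data, parity, 2);
        printf("recovered: %s\n", (char *)data[2]); /* prints "chunk-2" */
        return 0;
    }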


  • Daniel McFarlin (dmcfarli@ece) - I'm interested in all aspects of High Performance Computing (HPC). Some possible project ideas I'm considering:

1. Reducing Linux kernel re-compilation times through binary constant propagation. Kernel configuration parameters are generally incorporated directly into the kernel binary image through #define's. Often, we only want to change a single configuration parameter when we are performance-tuning a kernel. Unfortunately, even if the corresponding #define is only used by a handful of files, we are at a minimum forced to undertake a time-consuming relink of the entire kernel. Would it be possible to merely modify the current kernel binary in place (offline, as currently envisioned), updating the #define and its subsequent uses with the new value? (A deliberately naive sketch of the patching step appears after this list.)

2. De-parallelizing server applications. Given that state machines can exhibit better performance than their threaded counterparts, is there a way to auto-magically convert pthread/OpenMP code into state-machine/event-driven code? In the process, can we still achieve some form of parallelism to take advantage of SMT/SMP/CMP?

3. Optimizing streaming programs for the entire memory hierarchy. x86 CPUs (and their PowerPC counterparts) expose software streaming instructions (prefetching, non-temporal stores, etc.). When used properly, these instructions can provide significant (> 20%) speedups for streaming programs. The challenge is in using them properly, particularly knowing how far in advance to issue them and in what quantity to batch them. Also, memory bandwidth becomes a significant bottleneck, so giving the programmer some high-level constructs to optimize the memory controller would be desirable as well. Can we use machine learning to aid the programmer (either through transparent program transformation or hints) here? (A small intrinsics sketch appears after this list.)

4. Software Virtual Memory. With multicore processors, can we achieve greater flexibility (arbitrary page sizes, page replacement policies) and greater performance by dedicating a core to virtual memory management? This might entail implementing software TLBs and other fixed CPU functionality as general-purpose modules in software. (A minimal software-TLB sketch appears after this list.)
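For idea 1, a deliberately naive sketch of the in-place editing step: scan an image for the 32-bit value the old #define expanded to and overwrite it. A real tool would need relocation or debug information to distinguish genuine uses of the parameter from coincidental matches, and would handle unaligned and differently sized encodings; the file name and values below are placeholders.

    #include <stdint.h>
    #include <stdio.h>

    /* Naive in-place patch: replace every aligned 32-bit word equal to
     * old_val with new_val.  Returns the number of words patched, or -1. */
    int patch_constant(const char *path, uint32_t old_val, uint32_t new_val) {
        FILE *f = fopen(path, "r+b");
        if (!f)
            return -1;

        uint32_t word;
        long off = 0;
        int patched = 0;

        while (fread(&word, sizeof word, 1, f) == 1) {
            if (word == old_val) {
                fseek(f, off, SEEK_SET);                     /* back up to the word */
                fwrite(&new_val, sizeof new_val, 1, f);
                fseek(f, off + (long)sizeof word, SEEK_SET); /* resume scanning */
                patched++;
            }
            off += (long)sizeof word;
        }
        fclose(f);
        return patched;
    }

    int main(void) {
        /* "vmlinux.img", 100, and 250 are placeholders, e.g. retuning an
           HZ-like configuration value without relinking the kernel. */
        int n = patch_constant("vmlinux.img", 100, 250);
        printf("patched %d occurrences\n", n);
        return 0;
    }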
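For idea 3, a small sketch of the two instruction families involved, using SSE intrinsics: a streaming copy that prefetches the source a fixed distance ahead and writes the destination with non-temporal stores. The prefetch distance is a guessed constant (exactly the kind of parameter this idea proposes to tune automatically), and dst is assumed to be 16-byte aligned, as the streaming store requires.

    #include <xmmintrin.h>   /* SSE intrinsics: _mm_prefetch, _mm_stream_ps, ... */
    #include <stddef.h>

    #define PREFETCH_DISTANCE 512      /* bytes ahead; a guess, would be tuned */

    /* Copy n floats from src to dst.  dst must be 16-byte aligned because
     * _mm_stream_ps is a non-temporal (cache-bypassing) aligned store. */
    void stream_copy(float *dst, const float *src, size_t n) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4) {
            /* Hint: start fetching data we will need PREFETCH_DISTANCE bytes
             * from now, so it arrives before the load below asks for it. */
            _mm_prefetch((const char *)(src + i) + PREFETCH_DISTANCE, _MM_HINT_T0);
            __m128 v = _mm_loadu_ps(src + i);
            _mm_stream_ps(dst + i, v);      /* write around the cache */
        }
        for (; i < n; i++)                  /* scalar tail */
            dst[i] = src[i];
        _mm_sfence();                       /* flush the write-combining buffers */
    }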
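For idea 4, the kind of structure a dedicated memory-management core might own: a trivially small, direct-mapped software TLB. The sizes and the single flat page size are illustrative assumptions only.

    #include <stdio.h>

    #define PAGE_SIZE   4096UL
    #define TLB_ENTRIES 64

    /* One entry of a direct-mapped software TLB: virtual page number ->
     * physical frame number.  A core dedicated to memory management could
     * keep this (and the page tables behind it) hot in its own cache. */
    struct stlb_entry {
        unsigned long vpn;
        unsigned long pfn;
        int valid;
    };

    static struct stlb_entry stlb[TLB_ENTRIES];

    /* Translate a virtual address; returns the physical frame number or -1
     * on a miss, where the caller would fall back to a software page walk. */
    long stlb_lookup(unsigned long vaddr) {
        unsigned long vpn = vaddr / PAGE_SIZE;
        struct stlb_entry *e = &stlb[vpn % TLB_ENTRIES];
        if (e->valid && e->vpn == vpn)
            return (long)e->pfn;
        return -1;
    }

    void stlb_insert(unsigned long vaddr, unsigned long pfn) {
        unsigned long vpn = vaddr / PAGE_SIZE;
        struct stlb_entry *e = &stlb[vpn % TLB_ENTRIES];
        e->vpn = vpn;
        e->pfn = pfn;
        e->valid = 1;
    }

    int main(void) {
        stlb_insert(0x400000UL, 1234);
        printf("0x400000 -> frame %ld\n", stlb_lookup(0x400000UL));
        printf("0x800000 -> frame %ld (miss)\n", stlb_lookup(0x800000UL));
        return 0;
    }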

  • Vinod Chandrasekaran (vinodc@andrew.cmu.edu)

1. I am interested in exploring flash-based storage systems. They have some unique physical characteristics which mean that they do not perform well for random writes, as the memory must be erased before it can be rewritten. The unit of the erase operation is typically a block composed of multiple pages. To manage this, SSDs use a flash translation layer (FTL) whose function is to map the storage interface's logical blocks to physical pages within the device. SSD random-write performance is highly dependent on the effectiveness of the FTL algorithm. It would be interesting to enhance FTL algorithms with some knowledge of the application's access pattern, so that some locality could be exploited.
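As a point of reference for the FTL discussion, here is a minimal sketch of page-level mapping with a naive log-structured allocator; garbage collection and wear leveling are ignored, and the geometry constants are arbitrary. An access-pattern-aware FTL of the kind proposed above would mainly change where ftl_write chooses to place data (for example, grouping pages the application tends to overwrite together so they can later be erased together).

    #include <stdio.h>

    #define PAGES_PER_BLOCK 64
    #define NUM_BLOCKS      256
    #define NUM_PAGES       (PAGES_PER_BLOCK * NUM_BLOCKS)

    /* Page-level mapping table: logical page -> physical page.  Overwrites
     * are redirected to a fresh physical page and the old copy is marked
     * invalid, so the expensive block erase is deferred until a later
     * garbage-collection pass reclaims mostly-invalid blocks. */
    static int  l2p[NUM_PAGES];         /* -1 means the page was never written */
    static char invalid[NUM_PAGES];     /* physical pages waiting to be erased */
    static int  next_free = 0;          /* naive log-structured write pointer  */

    void ftl_init(void) {
        for (int i = 0; i < NUM_PAGES; i++)
            l2p[i] = -1;
    }

    /* Service a logical-page write; returns the physical page it landed on. */
    int ftl_write(int logical_page) {
        if (l2p[logical_page] >= 0)
            invalid[l2p[logical_page]] = 1;   /* the old copy becomes garbage */
        l2p[logical_page] = next_free++;      /* append at the write frontier */
        return l2p[logical_page];
    }

    int main(void) {
        ftl_init();
        ftl_write(7);
        int phys = ftl_write(7);   /* a "random" overwrite needs no erase here */
        printf("logical page 7 now maps to physical page %d\n", phys);
        return 0;
    }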
