Tools
Suggestion: Don't try to master everything. Many tools do the same things; if you already have a favorite that overlaps one of these, don't worry about it. But pick the few tools that are most useful to you and learn to use them well. Learning or playing with at least one new tool, or a deeper feature of an existing tool, every few months to a year is probably a good idea for keeping your brain fresh.
Emulab Tools
Jeff's Method
- Node selection: http://sword.cs.williams.edu/ - give it a query about type of nodes, it'll give you a list
- Can now do that with CoMoN, too
Ashwin and I have a fairly mature remote deployment/script execution/management library that we've been using for a couple of years for Emulab/Planetlab experiments that might be useful. Unfortunately it's in Perl. But it is documented. :)
http://www.cs.cmu.edu/~jeffpang/software/travertine-20070315.tar.gz
- Deployment scripts and management "stuff"
- Lets you run scripts/daemons on remote machines
- Utilities both for running scripts everywhere and for writing the scripts that run remotely
- Detached processes on remote machines that you can attach to using screen
- Provides threads for perl too.
- Some scripts for deploying software using BitTorrent (if not in tarball, ask Jeff)
I also have some misc. kernel modules for emulating things that generally were not possible/difficult for emulab's default setup. See the software part of my webpage:
http://www.cs.cmu.edu/~jeffpang/research.shtml
- Can do traffic shaping on each node; used to overlay planetlab RTT matrix onto 1000s of virtual machines
Bindu's method: Run the same scripts on Emulab and Planetlab
- Phase 1: Get a slice and know it's up to speed (Planetlab)
- Step 1: Add nodes to slice (add_all) via xmlrpc PLC API
- Add _all_ nodes to your slice
- Set up your slice well in advance of deadlines, can take a long time
- requires some packages (ssl, etc.)
- Step 2: Get a list of nodes
- Refresh script using CoMoN produces current.txt (sort by uptime over last 10-15min)
- Does an SSH to each node to make sure they're working
- Todo: Will check for available disk space
- Phase 2: The actual experiment
- Push code to the nodes
- Using Dave's RON testbed copy/run scripts (keeps N SSH sessions running at a time, N=5...100)
- "copy-stuff.pl"
- Run
- "do-perhost.pl" runs a command on every node in current.txt
- monitor.pl does ps -c on each node (runs on our end, not PL, ssh's to nodes)
- Get results back
- "get-stuff.pl"
- Build a "master" script composed of these scripts that does everything
- Phase 1 Emulab: create an experiment, swap it in.
- For a big experiment, topogen.pl creates the ns file
- You -could- do this in TCL, but that's kind of painful. :)
- Upload ns file, etc. (See Emulab documentation)
- Either submit for batch
- Requires that your entire experiment is automated: Starts, runs, collects data back to some safe place like NFS
- Recommended for large, long-running experiments
- Takes a lot of ahead-of-time work to get them to work
- Or click and pray (swap in immediately, use interactively)
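The push/run/collect steps in Phase 2 all come down to running something on every node in current.txt while keeping a bounded number of SSH sessions open. A minimal Python sketch of that idea follows. It is not Dave's copy/run scripts or do-perhost.pl: the names `run_on_hosts`, `ssh_runner`, `read_hosts`, and `max_sessions`, and the ssh options used, are assumptions made for illustration. Only the host list file current.txt and the "N sessions at a time" idea come from the notes above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh; return (exit code, stdout).

    Assumes key-based login already works; BatchMode stops ssh from
    prompting for a password.
    """
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return 255, ""  # treat a hung node as a failure


def run_on_hosts(hosts, command, max_sessions=5, runner=ssh_runner):
    """Run `command` on every host, at most `max_sessions` at a time.

    Returns a dict mapping host -> (exit code, stdout).
    """
    hosts = list(hosts)  # we iterate twice, so materialize first
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))


def read_hosts(path="current.txt"):
    """Read one hostname per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

For example, `[h for h, (rc, _) in run_on_hosts(read_hosts(), "true").items() if rc == 0]` keeps only the nodes that answered, similar in spirit to the SSH check in Step 2. The `runner` argument exists so the scheduling logic can be tried without touching any real node.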
Tricks for experiments
- For scalability, try make-lan? Has some restrictions about being able to traffic shape links independently.
- Try virtual nodes? No experience with this yet...
- Run multiple clients on each node
- Also modelnet (possibly on the CMCL Cluster)
Emulab
- Bindu's way: Grab a node, create local partition, compile, push out to other nodes (treats that node as a control point)
- Superscript then rsyncs results back to moo,oink,something. (ran into space problems on /proj)
- Dave's way: Create a tarball, have Emulab install it automatically (tb-set-node-tarfiles in the ns file)
- Vyas/Amar/Someone's way: Run binaries from NFS
- Beware this way: Can heavily load the ops node if you have 100s of nodes in your experiment, resulting in icky slowdowns in your experiment and nasty emails from testbed-ops. Works nicely for a small # of nodes.
- Jeff has some Emulab images that work with Ubuntu and Fedora so that you can compile locally
- It auto-mounted the rest of the large disk as a separate partition
- Note: If you log a lot of data, NFS _will_ fail!
- Option 1: Log to local disk, rsync back to ops (or elsewhere) (create extra disk using mkextrafs or equivalent)
- Option 2: Use loghole
Planetlab Tools
Dave's Favorites
bc: a calculator
522 bark:~> bc -l      (-l means "use math library" - e.g., floating point, logs, etc.)
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
1+1      <-- basic math stuff works as expected
2
l(512)/l(2)      <-- natural log
9.00000000000000000008      <-- not perfect. :)
x=500      <-- can create arbitrary variables to use later
x+50
550
ibase=16      <-- change input base to base 16
1A
26      <-- woot, it'll convert for us!
units: unit conversion
units is standard on BSD-derived systems, but must be installed separately on most Linux systems. ("aptitude install units" does the trick well.) The Linux version can handle non-linear units such as degree conversions, which is nice.
458 sn001:~> units
2438 units, 71 prefixes, 32 nonlinear units
You have: 20 miles
You want: km
* 32.18688
/ 0.03106856
Google does unit conversions/calculator functions pretty well, too.
double.pl: catch doubled words in latex documents
The bane of editing: accidentally introducing duplicate words when moving text around. I can't count the number of times I've done it and/or caught it in other people's papers, including submitted and published papers. To my knowledge, double.pl was originally written by Kevin Foo. Always run this script before submitting your papers!
- You can find it on moo at: ~dga/bin/scripts/double.pl
#!/usr/bin/perl
# Detects duplicated words even when they are
# are repeated between lines.
# Taken from the ORA regex book
$/ = ".\n";
while (<>) {
    next if !s/\b([a-z]+)((\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$4\e[m/ig;
    s/^([^\e]*\n)+//mg;
    s/^/$ARGV: /mg;
    print;
}
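For anyone who would rather not run Perl, the same idea can be sketched in Python. This is a simplified illustration rather than a port of double.pl: it reports line numbers instead of highlighting, it does not skip markup tags between the two words, and the name `find_doubled_words` is made up for this example.

```python
import re

# A word, then whitespace (possibly including newlines), then the same
# word again. Unlike double.pl, this sketch does not skip markup tags.
DOUBLED = re.compile(r"\b([a-z]+)(\s+)(\1)\b", re.IGNORECASE)


def find_doubled_words(text):
    """Return (line number, word) for each doubled word in `text`."""
    hits = []
    for m in DOUBLED.finditer(text):
        # Line on which the first copy of the word starts (1-based).
        line = text.count("\n", 0, m.start()) + 1
        hits.append((line, m.group(1)))
    return hits
```

Because `\s+` also matches newlines, a word repeated across a line break is caught as long as the whole file is passed in as one string.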
analyze and stats.pl: quick summary stats
501 bark:~> cat > example
1
2
3
4
5
6
7
8
9
10
502 bark:~> cat example | analyze
Mean: 5.500000
Trimmed mean: 5.500000
Median: 5.000000
Min/Max: 1.000000 / 10.000000
Stddev: 3.027650
Trimmed Stddev: 2.449490
95% CI: 3.633017 - 7.366983
Analyze, and the stats.pl library it uses, can be found in ~dga/bin/scripts on moo
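If you are away from moo, a rough stand-in is easy to write with Python's standard library. The sketch below is not how analyze or stats.pl work internally: the function name `summarize` is made up, the confidence interval uses a plain normal approximation (an assumption), and trimmed statistics are left out. Expect small differences from analyze's output, for instance in how the median of an even-sized sample is chosen.

```python
import math
import statistics


def summarize(values):
    """Return mean, median, min, max, sample stddev and a rough 95% CI."""
    n = len(values)
    mean = statistics.mean(values)
    stddev = statistics.stdev(values) if n > 1 else 0.0
    # Normal-approximation interval; small samples really want a t value.
    half_width = 1.96 * stddev / math.sqrt(n)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stddev": stddev,
        "ci95": (mean - half_width, mean + half_width),
    }
```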
make-cdf
507 bark:~> cat > example
1
1
1
2
2
2
3
3
3
4
4
5
5
6
7
508 bark:~> cat example | make-cdf
1.00000 0.20000
2.00000 0.20000
2.00000 0.40000
3.00000 0.40000
3.00000 0.60000
4.00000 0.60000
4.00000 0.73333
5.00000 0.73333
5.00000 0.86667
6.00000 0.86667
6.00000 0.93333
7.00000 0.93333
7.10000 1.00000
You can then feed the output of make-cdf into gnuplot and get a nice pretty cdf. Note that ploticus can do cdf analysis internally, so this script may be less relevant to you if you use ploticus.
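The computation behind a CDF file like this is small enough to sketch. The Python below is an illustration of the general technique, not the make-cdf source: the function names are made up, and the staircase it emits starts each riser at the value itself, so the point layout differs slightly from make-cdf's output.

```python
from collections import Counter


def empirical_cdf(values):
    """Return [(value, fraction of samples <= value)] in increasing order."""
    counts = Counter(values)
    total = len(values)
    points, seen = [], 0
    for value in sorted(counts):
        seen += counts[value]
        points.append((value, seen / total))
    return points


def step_points(cdf):
    """Expand CDF points into a staircase suitable for a line plot."""
    steps, prev = [], 0.0
    for value, frac in cdf:
        steps.append((value, prev))   # bottom of the riser
        steps.append((value, frac))   # top of the riser
        prev = frac
    return steps
```

Printing each pair from `step_points(empirical_cdf(data))` as two columns gives a file that gnuplot can draw with lines.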
There are also its cousins, make-hist and text-hist:
512 bark:~> cat example | text-hist 2
0- 2 1 1 1
2- 4 2 2 2 3 3 3
4- 6 4 4 5 5
6- 8 6 7
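A text histogram of this kind is just fixed-width binning. The Python sketch below shows the idea, assuming the argument to text-hist is the bin width and that bins are half-open (a value of 2 goes in the 2-4 bin); both assumptions are read off the sample output, not taken from the script itself, and the name `text_hist` here refers to this sketch only.

```python
from collections import defaultdict


def text_hist(values, width):
    """Group values into [k*width, (k+1)*width) bins; return printable rows."""
    bins = defaultdict(list)
    for v in sorted(values):
        bins[int(v // width)].append(v)
    rows = []
    for k in sorted(bins):
        members = " ".join(str(v) for v in bins[k])
        rows.append(f"{k * width}- {(k + 1) * width} {members}")
    return rows
```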
Vijay's Favorites
rubber - LaTeX Builder
When writing papers, you might have experienced situations where references in TeX files came up as [?] no matter how you cited them, even though you ran BibTeX and LaTeX and all that goodness. Enter rubber, which can be found at http://www.pps.jussieu.fr/~beffara/soft/rubber/. It is installed on moo if you do your editing there.
Simply add 'rubber source.tex' to your makefile instead of all the BibTeX, dvips, etc commands and it will *automatically* figure out how many times to run LaTeX and BibTeX so that the paper compiles, all references show up properly, etc.
To ensure that proper warnings appear and that the PDF is generated with embedded fonts, your makefile should include:
GS_OPTIONS=-dPDFSETTINGS=/prepress rubber --pdf -Wrefs -Wmisc $(PAPER)
This will output a pdf document using pdfLaTeX and print any missing references or other warnings that you should address before submitting a paper. Highly recommended! It does a bit more than what I just said: use the command line --help flag to see all the options.
Note that to use Rubber with the biblio files below, you want to use the modified bibtex.py file. Otherwise, rubber doesn't compile the biblio files with --min-crossrefs=1000. The modified bibtex.py file can be found in the Rubber source tarball.
Dave's Biblio File
Instead of having to manage separate biblio files for all references in a given paper, use this combined biblio file that Dave has provided us: if you have svn access to the moo repository, you can download the files from https://moo.cmcl.cs.cmu.edu/svn/biblio.
Most of the references are networking papers; there's a good chance that a networking paper in the previous 5 years is in this document, except for very recent ones.
How do you use this? First, grab the files from svn and store them in some paper-agnostic directory, e.g.,
svn co https://moo.cmcl.cs.cmu.edu/svn/biblio ~/biblio/.
Next, simply create a symbolic link to the ref.bib and rfc.bib files in the directory with the paper(s) you are working on.
ln -s ~/biblio/ref.bib
ln -s ~/biblio/rfc.bib
Finally, in your paper, make sure to put
\bibliography{ref,rfc}
If you want to cite a paper, search through the document for the author and/or title of the paper and see if it is there already. If it is not, you will need to add it.
When adding a paper, try to follow the format of the other citations in the document. Note the use of "crossref" for making citations a bit easier and consistent. Also, try to add the citation in the proper location in the document alphabetically (Dave, is there a way to automate this?).
VERY IMPORTANT -- Dave has included files in the "test" directory to validate the edited biblio files, ensuring that you won't break the biblio file format for others. You MUST run these validation scripts before committing changes to the biblio file. Simply run:
make validate
make test
If any of the errors printed are a result of references you have added, be sure to fix them before committing. When working on a paper, make sure to commit the changes in the biblio repository explicitly; simply committing the contents in your paper directory will not update the biblio repository versions.
When using these biblio files, you should run bibtex with --min-crossrefs=1000 to prevent BibTeX from pulling the cross-referenced conference entry out into a separate citation of its own.
Bindu's Favorites
Downsize those scatter plots
When writing papers with loads of measurement data, I frequently ran into scatterplots that produced immensely big eps files. This not only increases the paper's size but also makes it take ages to view in xpdf/gv and to print. The same can happen when you use an image, such as a map of a testbed. The following commands reduce the size without sacrificing much quality, so your final papers load quickly and stay lean and mean.
If the obese figure is fig.eps, do the following:
gs -r300 -dEPSCrop -dTextAlphaBits=4 -sDEVICE=png16m -sOutputFile=fig.png -dBATCH -dNOPAUSE fig.eps
convert fig.png eps3:figsmall.eps
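With many figures, the two commands can be wrapped in a small script. The Python sketch below only assembles and runs the same gs and convert command lines shown above; the helper names `downsize_commands` and `downsize`, the `dpi` parameter, and the `<name>small.eps` output naming (copied from the figsmall.eps example) are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def downsize_commands(eps_path, dpi=300):
    """Build the gs and convert command lines for one EPS file."""
    eps = Path(eps_path)
    png = eps.with_suffix(".png")
    small = eps.with_name(eps.stem + "small.eps")
    gs_cmd = [
        "gs", f"-r{dpi}", "-dEPSCrop", "-dTextAlphaBits=4",
        "-sDEVICE=png16m", f"-sOutputFile={png}",
        "-dBATCH", "-dNOPAUSE", str(eps),
    ]
    convert_cmd = ["convert", str(png), f"eps3:{small}"]
    return gs_cmd, convert_cmd


def downsize(eps_path, dpi=300):
    """Rasterize the EPS to PNG, then wrap the PNG in a compact EPS."""
    for cmd in downsize_commands(eps_path, dpi):
        subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling `downsize(p)` for each `p` in `Path("figs").glob("*.eps")` would process a whole directory, where `figs` is a placeholder directory name.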
eps files from powerpoint, excel and visio diagrams
Sometimes you need to get diagrams from these programs into nice eps files for papers. Say you made a ppt with a nice diagram in it before writing the paper, or your maps and testbed schematics are in visio format, as they frequently are. The normal way of printing to pdf and then saving to eps, or using acrobat distiller, produces large files with bad quality. WMF2EPS is a nice free tool that will do this for you and generate small, good-looking .eps files for the paper.
Website: http://www.wmf2eps.de.vu/
Managing large experiments
Frequently you have a bunch of machines and you are running some distributed experiment on them (Emulab/Planetlab/wireless testbed). In this case, it is nice to have a suite of parallel-* tools to execute commands on all machines. The following set works well:
- Parallel ssh (pssh)
- Parallel scp (pscp)
- Parallel rsync (prsync)
- Parallel nuke (pnuke)
- Parallel slurp (pslurp)
Example: pssh -h ips.txt -l irb2 -o /tmp/foo uptime
Example: pscp -h hosts.txt -l irb2 foo.txt /home/irb2/foo.txt
Demo:
- cat ips.txt
128.112.152.122
18.31.0.190
128.232.103.201
- pssh -h ips.txt -l irb2 -o /tmp/foo hostname
Success on 128.112.152.122:22
Success on 18.31.0.190:22
Success on 128.232.103.201:22
- ls /tmp/foo
128.112.152.122 128.232.103.201 18.31.0.190
- cat /tmp/foo/*
planetlab-1.cs.princeton.edu
planetlab1.xeno.cl.cam.ac.uk
planetlab1.lcs.mit.edu
colormake
This is a simple wrapper around "make" to make its output more readable. It is pretty neat. (Thanks Michael!)
Homepage: http://bre.klaki.net/programs/colormake/
Another cdfgen
This is a self-written C program to generate cdf output that can be then fed into gnuplot. You can specify the column number of the data which needs to be plotted.
usage: ./cdfgen <filename> <bin size> <column> <output file>
There is also gridgen, which bins both the x-axis and the y-axis and computes the mass in each cell.
usage: ./gridgen <filename> <bin size1> <column (in which X axis is)> <bin size2> <column (in which Y axis is)> <output file>
Must admit, it is used much less often.
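The two-dimensional binning that gridgen performs can be sketched in a few lines of Python. This is an illustration of the technique, not the C program: the name `grid_mass` is made up, and "mass" is assumed here to mean the fraction of all points that land in a cell.

```python
import math
from collections import Counter


def grid_mass(points, x_bin, y_bin):
    """Return the fraction of points falling in each x_bin by y_bin cell.

    Keys are (x cell start, y cell start); cells with no points are omitted.
    """
    cells = Counter(
        (math.floor(x / x_bin) * x_bin, math.floor(y / y_bin) * y_bin)
        for x, y in points
    )
    total = sum(cells.values())
    return {cell: n / total for cell, n in cells.items()}
```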
Dan's Favorites
SSH with probing on multiple paths
See the external site for: Multi-Path Probing for Secure SSH Authentication
Graphing with Grace
Dave may hate it, but by my count it beats having to memorize a lot of commands right off the bat. It's scriptable too, of course.
http://plasma-gate.weizmann.ac.il/Grace/
Your mileage may vary, but the last thing I want to be doing near a paper deadline is googling for some gnuplot syntax I don't remember.
Amar's Recommendations
- unix stats - very cool set of applications that will help in data analysis / graphing (see http://oldwww.acm.org/perlman/stat/)
- beamer - presentations in latex (ask me for an example file to get started if you are interested)
- Inkscape / dia - for diagrams to be used in presentations, posters, papers
- flyspell mode for emacs (helps catch spelling mistakes on the fly and also catches mistakes like duplicate words - like "and and")
- gobby - collaborative document editing
- SamePlace plugin for Firefox - shared whiteboard!
- Xournal - kickass tool for annotating PDF files (time to save some trees)
Jeff's Favorites
Yet another set of plotting/analysis scripts
To obtain:
cvs -d humpback.cmcl.cs.cmu.edu:/usr0/cvs co netmap/util
- cdf.pl, histo.pl, rank.pl, bucketize.pl - plots/generates cdfs, ccdfs, histograms, rank-rank plots, and bucketized time series (with a bazillion different options).
- cut++ - Like 'cut' but supports arithmetic. Example: cut++ -f1,2+3 => prints field 1 and the sum of values in field 2 and 3
- stats.pl, max.pl, mean.pl, median.pl, sum.pl, etc. - quick summaries of a distribution
- randomize.pl, partition.pl, subsample.pl: permute, partition, and subsample file lines
- union.pl, intersection.pl, difference.pl: set operations on files (represented as sets of file lines)
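To make the cut++ example concrete, here is a Python sketch of what `-f1,2+3` is described as doing. It is not the cut++ implementation: the function name `cut_sum` is made up, only addition is handled, and whitespace-separated fields are assumed.

```python
def cut_sum(line, spec):
    """Select fields from a whitespace-separated line.

    `spec` looks like "1,2+3": comma-separated output columns, each a
    '+'-joined list of 1-based field numbers whose values are summed.
    """
    fields = line.split()
    out = []
    for column in spec.split(","):
        indexes = [int(i) - 1 for i in column.split("+")]
        if len(indexes) == 1:
            out.append(fields[indexes[0]])      # plain field, kept as text
        else:
            out.append(str(sum(float(fields[i]) for i in indexes)))
    return " ".join(out)
```

For the line `a 2 3`, the spec `1,2+3` yields field 1 unchanged followed by the sum of fields 2 and 3.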
R
An open-source implementation of the S language. A stats package that is more scriptable than MATLAB. It is also free, has a large library of stats functions (also free), and can produce really beautiful graphs.
Ask Jeff for some simple Perl bindings. Python bindings are also available.
Ask Jeff for a couple books on the language.
(note that if you want something more matlab-ish, try GNU Octave, which is also free)
