<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>All The Naughty Bits &#187; Linux</title>
	<atom:link href="http://rogerbush.wordpress.com/category/linux/feed/" rel="self" type="application/rss+xml" />
	<link>http://rogerbush.wordpress.com</link>
	<description>Programming Jiu-Jitsu for the Aspiring Linux Hacker</description>
	<lastBuildDate>Fri, 15 Aug 2008 18:07:19 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='rogerbush.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/584d8337c14375b7eec37c4fcfcb8b56?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>All The Naughty Bits &#187; Linux</title>
		<link>http://rogerbush.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://rogerbush.wordpress.com/osd.xml" title="All The Naughty Bits" />
		<item>
		<title>Cloud Computing For Personal Backups &#8211; Amazon S3 Versus Dreamhost?</title>
		<link>http://rogerbush.wordpress.com/2008/08/14/cloud-computing-for-personal-backups-amazon-s3-versus-dreamhost/</link>
		<comments>http://rogerbush.wordpress.com/2008/08/14/cloud-computing-for-personal-backups-amazon-s3-versus-dreamhost/#comments</comments>
		<pubDate>Thu, 14 Aug 2008 21:03:20 +0000</pubDate>
		<dc:creator>rogerbush</dc:creator>
				<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Amazon S3]]></category>
		<category><![CDATA[Amazon Web Services]]></category>
		<category><![CDATA[AWS Availability]]></category>
		<category><![CDATA[dreamhost]]></category>
		<category><![CDATA[Grid Computing]]></category>
		<category><![CDATA[iPhone Outage]]></category>
		<category><![CDATA[Jeremy Zawodny]]></category>
		<category><![CDATA[Offsite Backups]]></category>
		<category><![CDATA[rsnapshot]]></category>
		<category><![CDATA[rsync]]></category>
		<category><![CDATA[S3]]></category>
		<category><![CDATA[Scale on Demand]]></category>
		<category><![CDATA[Thecus 5200]]></category>
		<category><![CDATA[Utility Computing]]></category>
		<category><![CDATA[Web Hosting Availability]]></category>

		<guid isPermaLink="false">http://rogerbush.wordpress.com/?p=60</guid>
		<description><![CDATA[



Utility Computing Meets Personal Backups

Short Version
A drive died in my fileserver, providing the impetus to use &#8220;Cloud Computing&#8221; to do my personal backups.
S3 is Amazon&#8217;s Simple Storage Service, one of the components of Amazon Web Services (AWS).  AWS is a set of Cloud Computing resources for creating scalable Internet businesses which can benefit from [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'>
<p><strong>Utility Computing Meets Personal Backups</strong></p>
<p><strong>Short Version</strong></p>
<p><em>A drive died in my fileserver, providing the impetus to use &#8220;Cloud Computing&#8221; for my personal backups.</em></p>
<p><em>S3 is Amazon&#8217;s Simple Storage Service, one of the components of Amazon Web Services (AWS).  AWS is a set of Cloud Computing resources for building scalable Internet businesses that can benefit from a scale-on-demand scheme.  In particular, if your Internet business has large spikes in traffic, AWS can lower costs dramatically, since you pay only for the resources you use.</em></p>
<p><em>However, back-of-the-envelope calculations show that for applications which don&#8217;t require scaling (e.g. home backups), AWS can be fairly pricey compared to simple alternatives like web hosting services, even when availability is considered.  Using rsync to back up to a web hosting service costs roughly 1/7th the price of Amazon&#8217;s S3 for 500 GB (and far less for larger backups), with very little setup involved.  Availability can easily be raised beyond what S3 provides by adding a second hosted server, still at less than 1/3rd the price.</em></p>
<p><em>AWS is designed for applications which can benefit from a scale-on-demand scheme.  If your application doesn&#8217;t need this, AWS might not make sense economically.</em></p>
<p><strong>Long Version</strong></p>
<p><strong>The Joys and Sorrows Of Owning Shiny Metal Boxes</strong></p>
<p>One of the SATA drives on my home file server died a horrible death this week.  The server had been acting funny all week, giving me the occasional strange I/O error.  Finally, the system would not boot, instead showing the familiar &#8220;Insert CD:&#8221; error that appears when the boot drive fails.  Asking the BIOS to auto-detect the first SATA drive made the computer think for a while, and then come back without finding anything.  Unfortunately, this box was also running SlimServer, which serves my music collection to my Squeezebox.  No streaming music for the BBQ I&#8217;m hosting this weekend (we&#8217;ll have to fall back to the redundant iPod + boombox).</p>
<p>I originally built the computer as a tiny fileserver-on-the-cheap for things like my videos, music collection, and family photo album.  At the time, 2 years ago, I had been looking at SOHO NAS systems for a while, but just couldn&#8217;t get over how much they cost.  Also, for the price I thought I deserved unfettered access to the underlying server to put whatever I wanted on it.  Whenever I see some new device with Linux on it, the hacker in me wants access to bend it to my will.  For the server, I used the tiniest enclosure I could find that fit microATX, which at the time was the Antec 1380, a small cube about the size of 2 shoe boxes.  One slight design flaw was putting a powerful (but inexpensive) CPU in such a small box: an AMD dual-core 3800.  Yes, I succumbed to market forces and, in the popular vernacular, &#8220;biggie sized it.&#8221;  While I got the CPU at a good price, it ran a bit hot.  In retrospect I wonder if this extra heat caused my drive to fail.</p>
<p>If I had to buy my own equipment today, I&#8217;d probably go with the Thecus 5200 (dedicated SOHO NAS).  Let&#8217;s face it, owning tiny, shiny boxes packed with technology is fun.  But it&#8217;s also expensive, and eventually, everything breaks.</p>
<p><strong>Cloud Computing To The Rescue?</strong></p>
<p>A natural response to owning things that break is to not own them anymore, which brings us to Utility or Cloud Computing.  I prefer the earlier term &#8220;Utility Computing,&#8221; which I think is more accurate, although &#8220;Cloud Computing&#8221; does sound cooler.  Eventually the coolest sounding name wins mindshare and we are forced to use it.</p>
<p>Another problem with the term Utility Computing is that the analogy of &#8220;computing power as electricity&#8221; is limited.  Utility Computing is less like electricity and more like LEGO bricks.  Utility (er, rather, Cloud) Computing already has a variety of payment models and service types, each with features suited to different types of problems.  One important feature to emerge is the ability to scale on demand, which is being pioneered by companies like Amazon with Amazon Web Services (AWS).</p>
<p>Scale-on-demand (my own term) is a really cool feature, and it is central to the strategy behind Amazon Web Services.  The basic argument is that if one must own or rent enough equipment to cover peak use, then that equipment will sit underutilized most of the time.  Consider a company whose normal load is 1/10th of a periodic spike in activity, say around noon.  Under an own/rent model, the company would need 10 times the baseline equipment just to handle the spike, even if the spike lasted only 1 hour.  Measured in machine-hours, the actual demand is far lower: the baseline needs 1 machine for 24 hours (24 machine-hours), and the spike needs 9 extra machines for 1 hour (9 machine-hours), for a total of 33 machine-hours per day.  Averaged over the day, that is 33/24 = 1.375 machines.  So if we could spread our load out evenly over time, we would need 1.375 computers instead of 10; the 10 owned machines run at under 14% average utilization.</p>
<p>This shows that there is room for a middleman who aggregates computing resources to make a profit.  If the middleman serves many companies whose spikes occur at different times, their combined load is much smoother than any single company&#8217;s and approaches the average utilization, so the middleman&#8217;s machines can run much closer to 100% utilization.  This is exactly the approach that Amazon is taking with the various services which make up AWS.</p>
<p>AWS is made up of many component services, each with different applications.  Those of you who work in the Internet space or in distributed computing will recognize many of the pieces of scalable Internet infrastructure: entity storage, MySQL database instances, key-value persistent storage, virtual instances (à la Xen), and durable queues.  Many startup companies have already built their entire infrastructure on AWS.</p>
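<p>The machine-hour arithmetic above is easy to check with a few lines of Perl.  This is only a sketch of the calculation in this paragraph (1 machine of baseline load, a peak of 10 machines, and a 1-hour daily spike); the function name is made up for illustration.</p>
<div class="sc">
<pre>
#!/usr/bin/perl
# Back-of-the-envelope utilization for a daily load spike.
use strict;
use warnings;

# Average number of machines needed over a day: the baseline load
# runs all 24 hours, and the extra (peak - baseline) machines are
# needed only for the length of the spike.
sub average_machines
{
    my ($baseline, $peak, $spike_hours) = @_;
    my $machine_hours = $baseline * 24 + ($peak - $baseline) * $spike_hours;
    return $machine_hours / 24;
}

my $peak = 10;
my $avg  = average_machines(1, $peak, 1);    # (24 + 9) / 24
printf "average machines: %.3f\n", $avg;
printf "utilization of %d owned machines: %.2f%%\n",
    $peak, 100 * $avg / $peak;
</pre>
</div>
<p>Run as-is, it reports an average of 1.375 machines and a utilization of 13.75% for the 10 owned machines.</p>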
<p><strong>Are Personal Backups Using S3 Cost Effective?</strong></p>
<p>I&#8217;ve seen a lot of blog posts about doing personal backups to S3, which, one must admit, sounds extremely cool at first glance.  S3 is Amazon&#8217;s Simple Storage Service, which lets you write, read and delete objects of up to 5 GB.  Objects are stored in buckets you define and are retrieved by bucket name and object name.</p>
<p>The best way to do incremental backups is to compare what you have against what is already backed up, and then produce a delta of what has changed.  That way, we only send the portions of files that have changed, which works extremely well for many types of files.  It turns out there is a well-known, dependable tool which does exactly this: rsync.  Rsync calculates the delta against the remote copy and sends only the minimal set of file updates over ssh.  There are also some clever backup packages that use Unix hard links to provide &#8220;minimal space snapshots&#8221;: each snapshot looks like a full copy, but files that haven&#8217;t changed are hard links to the same data as the previous snapshot, so they take no extra space.  This technique is cool and simple, but it relies on hard links, which are not available on every OS or filesystem (though all *nix systems have them).  If hard links aren&#8217;t available, rsync can trivially maintain a single up-to-date snapshot instead (a cron job invoking rsync is all it takes).</p>
<p>Note that rsync is usually already installed on Linux hosts at web hosting companies, so if you have ssh access, you can use this backup technique with almost no setup.  Here are some links:</p>
<ol>
<li><a href="http://rsync.samba.org">Rsync</a> &#8211; file copy utility which computes and transmits minimal deltas to update.</li>
<li><a href="http://rsnapshot.org">Rsnapshot</a> &#8211; a backup program based on rsync and written in perl which implements the minimal space snapshot idea with hard links.</li>
</ol>
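<p>To make the hard-link snapshot idea concrete, here is a minimal Perl sketch that builds an rsync command for one dated snapshot.  It is not how rsnapshot is implemented; it relies on rsync&#8217;s --link-dest option, which hard-links files that are unchanged relative to a previous snapshot instead of copying them.  The source directory, the account user@backup.example.com and the backups/ layout are hypothetical placeholders, and the sketch only prints the command.</p>
<div class="sc">
<pre>
#!/usr/bin/perl
# Sketch: one hard-link snapshot via rsync --link-dest.
# All hosts and paths here are hypothetical placeholders.
use strict;
use warnings;
use POSIX qw(strftime);

# Build the rsync argument list for one snapshot.  Files unchanged
# since the $previous snapshot are hard-linked rather than copied,
# so a new snapshot only uses space for what actually changed.
sub snapshot_command
{
    my ($source, $remote, $previous, $snapshot) = @_;
    my @cmd = ('rsync', '-a', '--delete', '-e', 'ssh');
    # A relative --link-dest path is resolved relative to the
    # destination directory on the receiving side.
    push @cmd, "--link-dest=$previous" if defined $previous;
    push @cmd, "$source/", "$remote:$snapshot/";
    return @cmd;
}

my $today = strftime('%Y-%m-%d', localtime);
my @cmd = snapshot_command('/home/me', 'user@backup.example.com',
                           '../2008-08-13', "backups/$today");
print join(' ', @cmd), "\n";

# To run it for real (e.g. nightly from cron):
# system(@cmd) == 0 or die "rsync failed: $?\n";
</pre>
</div>
<p>This assumes the backups/ directory already exists on the server.  For the very first snapshot, pass undef as the previous snapshot and rsync simply makes a full copy.</p>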
<p>Next, let&#8217;s look at the cost analysis.  Remember that Amazon Web Services can help us save money, but they are designed around providing scale-on-demand rather than purchasing for peak utilization.  We are essentially &#8220;buying insurance&#8221; against usage spikes.  But our simple backup application is completely predictable, and will never have any usage spikes.  This is for our home backups, not scalable backups for all the consumers out there on the Web.  Backups are easy to schedule so that spikes never happen, which makes our usage completely predictable.  Not only that, but backups are not latency sensitive: I just want to make sure they happen every night.  I don&#8217;t really care if they take an extra 10 minutes.</p>
<p>S3 costs $0.15 per GB per month for storage.  For 500 GB, that comes to $75 a month.  Web hosting services offer a variety of monthly plans starting at about $11, with between 500 and 2500 GB of storage.  In fact, some services offer &#8220;unlimited storage.&#8221;  Unlimited storage is simply another play on the averages: if you have enough customers, you can give everyone unlimited storage, because the average customer stores very little.  It&#8217;s like an all-you-can-eat buffet:  the average person is paying for the people loading their plates.</p>
<p>Thus, purely from a cost perspective, backup storage at a web hosting company should cost roughly 1/7th to 1/34th as much as Amazon S3 ($11 versus $75 for 500 GB, and $11 versus $375 for 2500 GB).</p>
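<p>The storage-only comparison fits in a small Perl sketch.  It uses the 2008 prices quoted above ($0.15 per GB per month for S3, about $11 a month for a hosting plan), assumes the hosting price stays flat up to 2500 GB, and ignores S3&#8217;s transfer and request charges.</p>
<div class="sc">
<pre>
#!/usr/bin/perl
# Storage-only monthly cost: S3 versus a flat-rate hosting plan.
use strict;
use warnings;

my $S3_PER_GB_MONTH   = 0.15;   # S3 storage, dollars per GB per month
my $HOSTING_PER_MONTH = 11;     # hosting plan, dollars per month (flat)

# Monthly S3 storage cost; transfer and request charges are ignored.
sub s3_monthly_cost
{
    my ($gb) = @_;
    return $gb * $S3_PER_GB_MONTH;
}

foreach my $gb (500, 2500)
{
    my $s3 = s3_monthly_cost($gb);
    printf "%4d GB: S3 \$%.2f/month, hosting \$%.2f/month (1/%.1f the cost)\n",
        $gb, $s3, $HOSTING_PER_MONTH, $s3 / $HOSTING_PER_MONTH;
}
</pre>
</div>
<p>For 500 GB it reports $75.00 versus $11.00 (about 1/6.8 the cost), and for 2500 GB $375.00 versus $11.00 (about 1/34.1).</p>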
<p>One final issue to look at is availability.  Amazon has an actual SLA for S3 which guarantees 99.9% uptime; if they miss it, you get a service credit of 10% to 25% of that month&#8217;s bill, depending on how far uptime falls.  This is actually pretty cool.  It&#8217;s good, since the incentives push Amazon to keep getting better, but it&#8217;s no guarantee of stability.  Once a month has gone &#8220;really bad&#8221; and the maximum credit is already owed, further downtime that month costs Amazon nothing extra, so the incentive to keep that month good goes away.  After all, under the <a href="http://www.amazon.com/gp/browse.html?node=379654011">Amazon S3 Service Level Agreement</a> they could be down the entire month (0% uptime) and still charge you 75% of your storage fees.  In extremis this sounds ridiculous, but it is a problem in the structure of most SLAs, not just Amazon&#8217;s.</p>
<p>In July, Amazon S3 had an 8-hour outage due to a data corruption problem.  Amazon CTO Werner Vogels explains in his blog that the root cause <a href="http://www.allthingsdistributed.com/2008/07/root_cause.html">was single-bit corruption of internal state messages that are distributed via Gossip techniques.</a>  It&#8217;s good to keep in mind that the core technologies for AWS are new and still have a few kinks to work out.  While Amazon does not have to rely on software upgrades to existing services for revenue, they still have to provide missing features for their customers.  So I would expect errors like this to abate over time, but not completely vanish.  In a distributed system, any error, regardless of how small, can become serious through amplification.</p>
<p>That event alone puts them at roughly 98.9% uptime for July (8 hours out of the 744 hours in the month).  To their credit, the response was very professional, swift and public.  CEO Jeff Bezos <a href="http://www.techcrunch.com/2008/07/21/bezos/">even talked about it.</a>  To put this in perspective, service companies usually apologize in private, and try to make it up when the monthly SLA reports are given to customers.  In my mind this is one of the things, besides the technology, which puts Amazon way out in front in the race to define this industry.  I especially like the <a href="http://status.aws.amazon.com/">Amazon AWS dashboard</a>, which shows their history of outages.</p>
<p>The web hosting companies have no SLAs for most self-serve customers (although upscale web hosting, which costs more, may offer SLAs for businesses).  Uptime is measured by third-party companies.  It&#8217;s not clear how the measurements are taken, or what relationships the measuring companies have with the hosting companies.  I have to believe the hosting companies provide advertising money to these uptime companies, which doesn&#8217;t bode well for the objectivity of the measurements.  Still, it is not hard to find companies with uptime of 99.5%.  Over a 30-day month (720 hours), that equates to about 3.6 hours of downtime, which for a backup solution seems completely acceptable, especially at roughly 1/7th the price.  This also leaves out data transfer and other usage charges, which are included in a web hosting plan but billed separately by S3, making S3 even more expensive.</p>
<p>Also note you can add a secondary web hosting service, or even a second server with the same service, to get much better availability through trivial redundancy.  If two servers at 99.5% each fail independently, both are down at the same time only 0.5% &#215; 0.5% = 0.0025% of the time, for an availability of 99.9975%.  That is about 1 minute of downtime a month, far better than Amazon&#8217;s 99.9%, at less than 1/3rd the cost ($22 versus $75 a month for 500 GB).  Two servers at the same company are less likely to fail independently, so a second provider is the safer choice.</p>
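<p>Here is the availability arithmetic from the last few paragraphs as a Perl sketch.  It assumes a 30-day (720-hour) month and, for the redundancy calculation, that the servers fail independently; both are simplifications, and the function names are made up for illustration.</p>
<div class="sc">
<pre>
#!/usr/bin/perl
# Availability arithmetic for an S3 outage and for hosted servers.
use strict;
use warnings;

my $HOURS_PER_MONTH = 30 * 24;    # simplification: a 30-day month

# Expected hours of downtime per month at a given availability,
# expressed as a fraction (0.995 means 99.5%).
sub downtime_hours
{
    my ($availability) = @_;
    return (1 - $availability) * $HOURS_PER_MONTH;
}

# Availability of a service that is up whenever ANY of $copies
# servers is up.  Assumes the servers fail independently.
sub redundant_availability
{
    my ($availability, $copies) = @_;
    return 1 - (1 - $availability) ** $copies;
}

printf "8-hour outage: %.2f%% uptime\n",
    100 * (1 - 8 / $HOURS_PER_MONTH);
printf "one host:  %.1f hours of downtime a month\n",
    downtime_hours(0.995);

my $two = redundant_availability(0.995, 2);
printf "two hosts: %.4f%% uptime, %.1f minutes of downtime a month\n",
    100 * $two, 60 * downtime_hours($two);
</pre>
</div>
<p>It reports about 98.89% uptime for an 8-hour outage in a 30-day month, about 3.6 hours of monthly downtime at 99.5%, and 99.9975% (roughly a minute of downtime a month) for two independent 99.5% servers.</p>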
<p>Note that either of these solutions is superior to owning your own hardware, as <a href="http://jeremy.zawodny.com/blog/archives/007624.html">Jeremy Zawodny calculates.</a>  Also, if you are interested in software which backs up to S3, <a href="http://jeremy.zawodny.com/blog/archives/007641.html">Jeremy has done the legwork for you.</a></p>
<p>Amazon S3 (and AWS) really shines as a model for scale-on-demand.  This makes a lot of sense for things like a website with a video that becomes popular overnight and needs to scale up to meet a rush of users.  But the scale-on-demand approach loses its pricing advantage when peak utilization is close to average utilization.  For long-term storage this tends to be true, as it grows very slowly.  I suspect S3 is designed more for scaling I/O reads and writes, which are costly features our backup solution won&#8217;t use.</p>
<p>Don&#8217;t get me wrong, I think S3 and AWS in general are the coolest thing since sliced bread, but they don&#8217;t make sense for every project (especially where scalability isn&#8217;t important).  We&#8217;ll come back to some cool uses of AWS services in future articles where the model does make sense.</p>
<p><strong>Giving Dreamhost A Try</strong></p>
<p>I&#8217;d heard good things about Dreamhost, so I&#8217;m giving them my business.  Setting up backups was trivial: rsync was already installed, so my very first rsync command worked.  We&#8217;ll see how the availability holds up over the coming months.</p>
<img alt="" border="0" src="http://feeds.wordpress.com/1.0/categories/rogerbush.wordpress.com/60/" /> <img alt="" border="0" src="http://feeds.wordpress.com/1.0/tags/rogerbush.wordpress.com/60/" /> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/rogerbush.wordpress.com/60/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/rogerbush.wordpress.com/60/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/rogerbush.wordpress.com/60/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/rogerbush.wordpress.com/60/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/rogerbush.wordpress.com/60/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/rogerbush.wordpress.com/60/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/rogerbush.wordpress.com/60/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/rogerbush.wordpress.com/60/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/rogerbush.wordpress.com/60/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/rogerbush.wordpress.com/60/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=rogerbush.wordpress.com&blog=2869039&post=60&subd=rogerbush&ref=&feed=1" /></div>]]></content:encoded>
			<wfw:commentRss>http://rogerbush.wordpress.com/2008/08/14/cloud-computing-for-personal-backups-amazon-s3-versus-dreamhost/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e9fee4f625317e26721c36555f2002da?s=96&#38;d=identicon" medium="image">
			<media:title type="html">rogerbush</media:title>
		</media:content>

		<media:content url="http://s9.addthis.com/button1-share.gif" medium="image">
			<media:title type="html">Bookmark and Share</media:title>
		</media:content>

		<media:content url="http://s9.addthis.com/button1-share.gif" medium="image">
			<media:title type="html">Bookmark and Share</media:title>
		</media:content>
	</item>
		<item>
		<title>Command line programs and perl Getopt::Long</title>
		<link>http://rogerbush.wordpress.com/2008/02/19/command-line-programs-and-perl-getoptlong/</link>
		<comments>http://rogerbush.wordpress.com/2008/02/19/command-line-programs-and-perl-getoptlong/#comments</comments>
		<pubDate>Tue, 19 Feb 2008 00:22:30 +0000</pubDate>
		<dc:creator>rogerbush</dc:creator>
				<category><![CDATA[Command Line]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[command line option parsing]]></category>
		<category><![CDATA[linux command line utility]]></category>
		<category><![CDATA[linux perl scripting]]></category>
		<category><![CDATA[perl command line option parsing]]></category>
		<category><![CDATA[perl Getopt Long]]></category>
		<category><![CDATA[perl plain old documentation]]></category>
		<category><![CDATA[perl self documenting code]]></category>

		<guid isPermaLink="false">http://rogerbush.wordpress.com/?p=5</guid>
		<description><![CDATA[In Today&#8217;s Post
I provide a simple template I use to start every command-line utility I write in Perl.  We&#8217;ll also make command-line option parsing easy using Getopt::Long, as well as make encapsulated documentation using POD.
Perl, the Swiss Army Chainsaw 
Perl is an excellent language for system scripting. There are those who criticize Perl for [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p><b>In Today&#8217;s Post</b></p>
<p>I provide a simple template I use to start every command-line utility I write in Perl.  We&#8217;ll also make command-line option parsing easy using Getopt::Long, as well as make encapsulated documentation using POD.</p>
<p><b>Perl, the Swiss Army Chainsaw </b></p>
<p>Perl is an excellent language for system scripting. There are those who criticize Perl for being &#8220;syntactically excessive.&#8221;  However, as languages gain expressive power they often become more complicated, and remaining backward compatible requires carrying around a certain amount of syntactic baggage.</p>
<p>I&#8217;m pragmatic in that I&#8217;d choose a language that was powerful with a few eccentricities over one that was perfect and less useful, or one that had stopped evolving.  Languages are a living thing &#8211; they need to adapt to changes in the environment or they die.</p>
<p>Perl will always be alive and well because it&#8217;s just too damn useful. Larry Wall, the creator of Perl, has described his creation variously as &#8220;duct tape for the Web&#8221; and a &#8220;Swiss Army chainsaw&#8221;.  Perl&#8217;s hallmark is flexibility, which is just what&#8217;s required to glue things together that weren&#8217;t necessarily designed to be glued together.</p>
<p>In a future post I&#8217;ll show you a style of Perl programming that works well for me and provides just the right amount of OOP without going overboard.  This is actually simpler to do than you might think.</p>
<p><span id="more-5"></span></p>
<p><b>Unix Command Line Utilities Are Reusable, System-Level Building Blocks</b></p>
<ol>
<li>One of the reasons Unix is so successful is the philosophy of &#8220;doing one thing really well.&#8221;  Command-line utilities are designed to address a single problem well.</li>
<li>A second reason Unix is so successful is the &#8220;building block approach&#8221; &#8211;  simple, well designed components can be connected together in various ways to solve a larger problem.</li>
<li>Unix command-line utilities should be thought of as reusable, system-level building blocks.</li>
<li>Write a command-line utility rather than a script.  This is a conceptual task, which requires a few simple steps.  One can often alter a script to turn it into a command-line utility.</li>
</ol>
<p>Note, when I say Unix here, I mean &#8220;Unix and all its related descendants&#8221;, such as Linux (I&#8217;m old school, deal).  I typically work with Linux (RHEL4, FC8) and FreeBSD (4 and 6).</p>
<p>I differentiate between scripts and command-line utilities.  I think of a script as being designed for a single purpose, whereas a command-line utility is (or should be) a building block.  This is largely conceptual, and I realize it&#8217;s not a hard and fast rule, but it tends to be true.</p>
<p>I can&#8217;t count the number of times I&#8217;ve had to rewrite something someone wrote, just because they didn&#8217;t take the time to think of things from the perspective of reusability.  The template below solves a few problems in this area.</p>
<p><b>Parsing Options</b></p>
<p>Command-line utilities often have a plethora of options that fine tune their behavior.  For example the &#8220;ls&#8221; command has 36 &#8220;single character&#8221; options on OS X.  The reason for having so many options is related to the design principles I&#8217;ve already mentioned:  do one thing really well and make it so no one ever has to do it again (i.e. make it a building block).  &#8220;ls&#8221; is designed to be used interactively as well as to produce output that is consumed by other programs, so being able to modify its output via options is important (less so today, since languages like Perl have their own internal methods of listing files, like globbing).</p>
<p>One of the annoying parts of writing a command-line utility is parsing options.  Options may take arguments of different types, have valid and invalid ranges, and need error checking.  It&#8217;s also useful to be able to print a list of them from the program (using -h or --help).</p>
<p>The Perl module Getopt::Long solves the problems I&#8217;ve just described in an elegant way.  In addition, it has lots of extra bells and whistles for things like &#8220;repeated arguments&#8221;.  The template below uses Getopt::Long and demonstrates its power.</p>
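<p>As a taste of some of those bells and whistles, here is a small sketch using option specifications described in the Getopt::Long documentation: ! (negatable flag), + (incremental counter), s% (key=value pairs) and f (floating point).  It calls GetOptionsFromArray, which parses a list instead of @ARGV and is exported on request by reasonably recent versions of Getopt::Long; on an older installation you could use GetOptions with a localized @ARGV instead.  The option names verbose, debug, define and ratio are made up for illustration.</p>
<div class="sc">
<pre>
#!/usr/bin/perl
# A few more Getopt::Long option specifications, parsed from a list
# instead of @ARGV so the example is easy to run and test.
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

sub parse
{
    my @args = @_;
    my %opts;
    GetOptionsFromArray(\@args, \%opts,
        'verbose!',     # negatable flag: --verbose / --noverbose
        'debug+',       # incremental: each --debug adds 1
        'define=s%',    # key=value pairs, stored as a hash reference
        'ratio=f',      # floating-point value
    ) or die "bad options\n";
    return (\%opts, \@args);    # @args keeps the non-option words
}

my ($opts, $rest) = parse(qw(--debug --debug --noverbose
                             --define host=example.com --ratio 0.5
                             notes.txt));
print 'debug=', $$opts{debug}, ' verbose=', $$opts{verbose}, "\n";
print 'host=', ${ $$opts{define} }{host}, ' ratio=', $$opts{ratio}, "\n";
print 'rest=', join(',', @$rest), "\n";
</pre>
</div>
<p>Here --debug given twice counts up to 2, --noverbose stores 0, the key=value pair lands in a hash reference, and the leftover word notes.txt stays behind in the argument list.</p>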
<p><b>What does this code do?</b></p>
<p>The code below is a &#8220;starting template&#8221;.  I keep a directory called &#8220;templates&#8221; full of simple starting points for different kinds of programs.  When I want to write a command-line utility, I copy the template and make changes.</p>
<ol>
<li>Command-line option parsing through Getopt::Long.</li>
<li>Internal documentation which can be programmatically displayed, or formatted by other programs.  This is done with Perl&#8217;s POD format (Plain Old Documentation).</li>
<li>Basic error handling.</li>
<li>As a template, its purpose is to jog my memory so I don&#8217;t have to try to remember how to use Getopt::Long, or the correct format for POD.  Thus the code as-is doesn&#8217;t do too much (we&#8217;ll see how to use it below).</li>
</ol>
<p><a title="code" name="code"></a><b>Starting Template:  Unix Command Line Utility</b></p>
<div class="sc"> The Starting Template</p>
<pre>
<span class="comment-delimiter">#</span><span class="comment">!/usr/bin/perl
</span>
<span class="comment-delimiter"># </span><span class="comment">Use this "starter template" as a starting point for your
</span><span class="comment-delimiter"># </span><span class="comment">own cmdline utilities.  Shows how to use Getopt::Long for
</span><span class="comment-delimiter"># </span><span class="comment">option processing and POD for documentation.
</span><span class="comment-delimiter">#</span><span class="comment">
</span><span class="comment-delimiter"># </span><span class="comment">http://rogerbush.wordpress.com/2008/02/19/</span>
<span class="comment-delimiter"># </span><span class="comment">command-line-programs-and-perl-getoptlong/</span>

<span class="keyword">use</span> <span class="constant">strict</span>;
<span class="keyword">use</span> <span class="constant">warnings</span>;

<span class="keyword">use</span> <span class="constant">Getopt</span>::Long;
<span class="keyword">use</span> <span class="constant">Pod</span>::Usage;

<span class="keyword">sub</span> <span class="function-name">main</span>
{
    <span class="type">my</span> %<span class="underline"><span class="variable-name">options</span></span>;

    GetOptions
        (\%<span class="underline"><span class="variable-name">options</span></span>,
         <span class="string">'help|?'</span>,
         <span class="string">'length=i'</span>,
         <span class="string">'filename=s'</span>,
         <span class="string">'addr=s@'</span>,
         <span class="string">'man'</span>) or pod2usage (2);

    pod2usage (1) <span class="keyword">if</span> $<span class="variable-name">options</span>{help};
    pod2usage (-exitstatus =&gt; 0, -verbose =&gt; 2)
        <span class="keyword">if</span> $<span class="variable-name">options</span>{man};

    <span class="keyword">if</span> ($<span class="variable-name">options</span>{length})
    {
        pod2usage (<span class="string">"--length must be &gt; 0."</span>)
            <span class="keyword">unless</span> $<span class="variable-name">options</span>{length} &gt; 0;

        print <span class="string">"length is $options{length}\n"</span>;
    }

    print <span class="string">"filename is $options{filename}\n"</span>
        <span class="keyword">if</span> $<span class="variable-name">options</span>{filename};

    <span class="keyword">if</span> ($<span class="variable-name">options</span>{addr})
    {
        <span class="keyword">foreach</span> <span class="type">my</span> $<span class="variable-name">a</span> (@{${<span class="variable-name">options</span>{addr}}})
        {
            print <span class="string">"addr: $a\n"</span>;
        }
    }
}

main ();
__END__

<span class="comment">=head1 NAME

perlcmdline - A do-nothing starter template for implementing
command-line utilities.

=head1 SYNOPSIS

perlcmdline [options]

"perlcmdline --help" will list options.  "perlcmdline --man"
will show docs.

=head1 OPTIONS

=over 4

=item B&lt;--help&gt;

Prints a brief help message and exits.

=item B&lt;--man&gt;

Prints the manual page and exits.

=item B&lt;--length positive-integer&gt;

A "do nothing" length value which must be an int &gt; 0.

=item B&lt;--filename input-filename&gt;

A "do nothing" string value.

=item B&lt;--addr hostname-or-ip&gt;

An internet address.  This option may be used multiple
times in a single command to specify multiple addresses.

=back

=head1 AUTHOR

Roger Bush, rogerbush8 at yahoo dot com

=head1 COPYRIGHT

Copyright (c) 2007 Roger Bush. All rights reserved. This
program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

=cut

</span></pre>
</div>
<div class="cl"> The Naughty Bits</p>
<pre>
1. GetOptions can be called in several forms.  In the form
   I'm using, I pass in a hash to hold all of the values
   rather than assigning them to loose variables.
2. GetOptions supports other kinds of options as well, such
   as hash-valued options, negatable flags, and counters.  For
   a full list see the reference at the end of this post.
3. If an option is not given on the command line, it has no
   entry in the options hash, so test for it with defined
   (e.g. if ( ! defined $options{foo})).  A plain truth test
   would also treat a legitimate value like 0 as missing.
4. The expression @{$options{addr}} evaluates to an array
   (it dereferences the array ref at $options{addr}).
   Similarly, %{$hash_ref} would dereference a hash.
5. __END__ tells perl where the code has ended.  Here it
   precedes the docs at the end, which are called "POD"
   (Plain Old Documentation).  POD directives start a new
   line with "=name-of-directive".  See the ref on POD at
   the end of this post for more.
6. =over and =back are POD indenting commands.  =over
   indents by the given amount, and =back returns to the
   indentation in effect before the matching =over.
7. With a single numeric argument, pod2usage uses it as the
   exit status.  pod2usage(1) prints the SYNOPSIS plus the
   OPTIONS section and exits with status 1; pod2usage(2)
   prints only the SYNOPSIS and exits with status 2.  It can
   also take a message string, or options such as
   -exitstatus and -verbose that control its behavior.</pre>
</div>
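<p>Points 1 through 4 are easier to see in isolation.  The sketch below uses GetOptionsFromArray, which Getopt::Long also exports and which parses a list you hand it instead of @ARGV, so nothing has to be typed on a command line.  The verbose! flag and the define=s% hash option are hypothetical additions for illustration, not part of the template.</p>

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments into a fresh hash, the same way the
# template passes a hash to GetOptions. 'verbose!' and 'define=s%'
# are hypothetical extras, not part of the template.
sub parse_args {
    my @args = @_;
    my %options;
    GetOptionsFromArray(\@args, \%options,
        'length=i',
        'filename=s',
        'addr=s@',     # repeatable: values collect in an array ref
        'verbose!',    # flag: --verbose stores 1, --noverbose stores 0
        'define=s%',   # --define key=value pairs collect in a hash ref
    ) or return;
    return \%options;
}

my $opts = parse_args('--addr', '127.0.0.1', '--addr', '192.168.0.1',
                      '--define', 'mode=fast', '--verbose');

# An option that was never given is simply not defined.
print "length not given\n" if !defined $opts->{length};

# @{ ... } dereferences the array ref; %{ ... } dereferences the hash ref.
foreach my $addr (@{ $opts->{addr} }) {
    print "addr: $addr\n";
}
foreach my $key (sort keys %{ $opts->{define} }) {
    print "define: $key=$opts->{define}{$key}\n";
}
```

<p>Run as-is, it prints &#8220;length not given&#8221;, the two addresses, and &#8220;define: mode=fast&#8221;.</p>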
<p><a title="demo" name="demo"></a>Time to give the program a try.  Let&#8217;s try an unknown option to see how it handles it:</p>
<div class="cl">
<pre>
$ ./perlcmdline --foo
Unknown option: foo
Usage:
    perlcmdline [options]

    "perlcmdline --help" will list options. "perlcmdline --man"
    will show docs.</pre>
</div>
<p>It told us that &#8220;foo&#8221; was an &#8220;Unknown option&#8221; and then printed the usage text from the SYNOPSIS section of the POD.  This is the behavior of pod2usage(2).  Note the helpful &#8220;--help&#8221; and &#8220;--man&#8221; hints we added to the SYNOPSIS.</p>
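<p>Behind this output, Getopt::Long prints the &#8220;Unknown option&#8221; warning itself and returns false, and the template answers that false return with pod2usage(2).  The sketch below uses GetOptionsFromArray so it can run without a command line, and captures the warning through $SIG{__WARN__} instead of letting it reach STDERR.  The try_parse name is made up for this example.</p>

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Run the template's option specs over @args, capturing the warnings
# Getopt::Long emits (e.g. "Unknown option: foo") instead of letting
# them reach STDERR. Returns the success flag followed by the messages.
sub try_parse {
    my @args = @_;
    my (%options, @problems);
    local $SIG{__WARN__} = sub { push @problems, $_[0] };
    my $ok = GetOptionsFromArray(\@args, \%options,
                                 'length=i', 'filename=s', 'addr=s@');
    return ($ok, @problems);
}

my ($ok, @problems) = try_parse('--foo');
# A script would call pod2usage(2) at this point instead of printing.
print $ok ? "parsed\n" : "parse failed: @problems";
```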
<p>Now let&#8217;s try the &#8220;--length&#8221; option without its argument (it requires a number):</p>
<div class="cl">
<pre>
$ ./perlcmdline --length
Option length requires an argument
Usage:
    perlcmdline [options]

    "perlcmdline --help" will list options. "perlcmdline --man"
    will show docs.</pre>
</div>
<p>Now let&#8217;s try a length of zero.  Getopt::Long accepts 0 as a valid integer, so the template checks for this case explicitly and exits with a usage message:</p>
<div class="cl">
<pre>
$ ./perlcmdline --length 0
--length must be &gt; 0.
Usage:
    perlcmdline [options]

    "perlcmdline --help" will list options. "perlcmdline --man"
    will show docs.</pre>
</div>
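<p>Both of these cases are easy to check in isolation.  Getopt::Long rejects a missing or non-numeric value for an &#8220;=i&#8221; option on its own, but 0 is a perfectly good integer, so the &#8220;&gt; 0&#8221; rule has to be applied by hand after parsing.  This is a minimal sketch under that assumption; check_length is a hypothetical helper, and in a real script its message could be handed to pod2usage through the -message option.</p>

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Returns an error message, or undef when the arguments are acceptable.
# Getopt::Long rejects a missing or non-numeric value for 'length=i'
# by itself, but 0 is a valid integer, so "> 0" is checked by hand.
sub check_length {
    my @args = @_;
    my %options;
    local $SIG{__WARN__} = sub { };    # hide Getopt::Long's own messages
    GetOptionsFromArray(\@args, \%options, 'length=i')
        or return 'invalid options';
    if (defined $options{length} and not $options{length} > 0) {
        return '--length must be > 0.';
    }
    return;
}

foreach my $case (['--length'], ['--length', 'abc'], ['--length', '0'], ['--length', '5']) {
    my $error = check_length(@$case);
    print "@$case: ", (defined $error ? $error : 'ok'), "\n";
}
```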
<p>Now let&#8217;s try misspelling length by truncating it to &#8220;len&#8221;:</p>
<div class="cl">
<pre>
$ ./perlcmdline --len 5
length is 5</pre>
</div>
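<p>Whoa, what happened there?  It turns out that Getopt::Long accepts abbreviated option names by default: any prefix that matches exactly one option name is treated as that option, so &#8220;--len&#8221; means &#8220;--length&#8221;.  A prefix that matches more than one option is rejected as ambiguous, and abbreviation can be turned off with Getopt::Long::Configure("no_auto_abbrev").</p>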
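<p>To see where abbreviation stops working, the sketch below adds a hypothetical second option, level, so that &#8220;--le&#8221; matches two names.  It uses GetOptionsFromArray so the arguments can be supplied directly.  Abbreviation is on by default unless the POSIXLY_CORRECT environment variable is set.</p>

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Two option names that share the prefix "le". 'level=i' is a
# hypothetical second option, added only to show an ambiguous prefix.
sub parse_args {
    my @args = @_;
    my %options;
    local $SIG{__WARN__} = sub { print "warning: $_[0]" };
    GetOptionsFromArray(\@args, \%options, 'length=i', 'level=i')
        or return;
    return \%options;
}

my $len = parse_args('--len', '5');   # unique prefix of "length"
my $lev = parse_args('--lev', '3');   # unique prefix of "level"
my $amb = parse_args('--le', '7');    # prefix of both, so parsing fails
print "len sets length to $len->{length}\n";
print "lev sets level to $lev->{level}\n";
print "le was rejected\n" unless $amb;
```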
<p>Let&#8217;s try &#8220;--help&#8221;:</p>
<div class="cl">
<pre>
$ ./perlcmdline --help
Usage:
    perlcmdline [options]

    "perlcmdline --help" will list options. "perlcmdline --man"
    will show docs.

Options:
    --help
        Prints a brief help message and exits.

    --man
        Prints the manual page and exits.

    --length positive-integer
        A "do nothing" length value which must be an int &gt; 0.

    --filename input-filename
        A "do nothing" string value.

    --addr hostname-or-ip
        An internet address.  This option may be used multiple
        times in a single command to specify multiple addresses.</pre>
</div>
<p>Very cool.  This time the entries under the POD OPTIONS section were printed as well, due to pod2usage(1).  Try &#8220;--man&#8221; on your own.</p>
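<p>The difference between pod2usage(1) and pod2usage(2) comes down to the -verbose level each implies: 1 when the exit status is below 2, otherwise 0.  Level 0 prints only SYNOPSIS, level 1 adds OPTIONS, and level 2 prints the whole page, which is the usual way to implement --man.  The sketch below renders a cut-down copy of the template&#8217;s POD at levels 0 and 1 into strings, using -exitval =&gt; 'NOEXIT' so pod2usage returns instead of exiting.  It writes the POD to a temporary file because pod2usage normally reads the running script ($0); usage_text is a hypothetical helper.</p>

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Pod::Usage;

# A cut-down copy of the template's POD. pod2usage normally reads the
# running script ($0); here it reads this text from a temporary file.
my $pod = join "\n",
    '=head1 SYNOPSIS', '', 'perlcmdline [options]', '',
    '=head1 OPTIONS', '', '=over 4', '',
    '=item --help', '', 'Print a brief help message and exit.', '',
    '=back', '', '=cut', '';

my ($pod_fh, $pod_file) = tempfile(UNLINK => 1);
print {$pod_fh} $pod;
close $pod_fh;

# Render the usage text for one verbose level into a string,
# without exiting (-exitval => 'NOEXIT').
sub usage_text {
    my ($verbose) = @_;
    open my $out, '>', \my $text or die "cannot open in-memory handle: $!";
    pod2usage(-input => $pod_file, -output => $out,
              -verbose => $verbose, -exitval => 'NOEXIT');
    close $out;
    return $text;
}

print usage_text(0);   # what pod2usage(2) shows: SYNOPSIS only
print usage_text(1);   # what pod2usage(1) shows: SYNOPSIS plus OPTIONS
```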
<p>Now, let&#8217;s try some legitimate options.</p>
<div class="cl">
<pre>
$ ./perlcmdline --filename bar --length 2 --addr 127.0.0.1 \
    --addr 192.168.0.1
length is 2
filename is bar
addr: 127.0.0.1
addr: 192.168.0.1</pre>
</div>
<p>Note that repeating --addr collected both addresses, and the other options were handled properly too.</p>
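<p>If you would also like to accept a comma-separated list such as &#8220;--addr 127.0.0.1,192.168.0.1&#8221;, a common idiom is to join the collected values and split them again on commas.  This is an assumed extension, not something the template does; parse_addrs is a hypothetical helper, and GetOptionsFromArray stands in for GetOptions so the arguments can be passed directly.</p>

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Collect --addr values, also splitting comma-separated lists such as
# "--addr a,b". The comma form is an assumed extension, not something
# the template supports.
sub parse_addrs {
    my @args = @_;
    my %options;
    GetOptionsFromArray(\@args, \%options, 'addr=s@') or return;
    # join/split flattens ("a", "b,c") into ("a", "b", "c").
    return split /,/, join(',', @{ $options{addr} || [] });
}

my @addrs = parse_addrs('--addr', '127.0.0.1', '--addr', '10.0.0.1,10.0.0.2');
print "addr: $_\n" foreach @addrs;
```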
<p><b>Conclusion</b></p>
<p>Now you&#8217;ve got a template for creating command-line utilities that have embedded documentation and handle options with ease.  You should copy it and use it as a starting point whenever you want to write a script or command-line utility.  We&#8217;ve also seen the basics of Getopt::Long, POD, and pod2usage.</p>
<p>There are a few more important aspects to writing good command-line utilities that I&#8217;ll save for other posts.</p>
<p><b>References</b></p>
<ol>
<li><a href="http://perldoc.perl.org/Getopt/Long.html">Getopt::Long</a> docs at perldoc.perl.org</li>
<li><a href="http://perldoc.perl.org/pod2usage.html">pod2usage</a> docs at perldoc.perl.org</li>
<li><a href="http://perldoc.perl.org/perlpod.html">POD</a> docs at perldoc.perl.org</li>
</ol>
</div>]]></content:encoded>
			<wfw:commentRss>http://rogerbush.wordpress.com/2008/02/19/command-line-programs-and-perl-getoptlong/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/e9fee4f625317e26721c36555f2002da?s=96&#38;d=identicon" medium="image">
			<media:title type="html">rogerbush</media:title>
		</media:content>
	</item>
	</channel>
</rss>