<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5098591171030974653</id><updated>2011-07-07T18:44:13.113-04:00</updated><category term='ruby'/><category term='katastrophe'/><category term='metamorphoses'/><category term='urls'/><category term='de usu fructuque lingarum'/><category term='debugging'/><category term='copyrights'/><category term='poesie'/><category term='eval'/><category term='lzma'/><category term='amazingly bad apis'/><category term='irb'/><category term='picasso'/><category term='de origine nominis'/><category term='hash tableaux'/><category term='compression'/><category term='turpentine'/><category term='bargains'/><category term='sussman'/><category term='verité'/><category term='pontification'/><category term='drm'/><category term='amazingly good small pieces loosely joined'/><category term='microsoft'/><category term='fark'/><category term='first-class functions'/><category term='09 something something c0'/><category term='memcache-client'/><category term='monkeypatching'/><category term='nyc'/><category term='bzip'/><category term='dirty hacks'/><category term='faux currying'/><category term='zlib'/><category term='awesomely dirty hacks'/><category term='json'/><category term='tinyurl'/><title type='text'>slightly new.</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/'/><link 
rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>16</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-417688715158821928</id><published>2011-05-30T08:24:00.006-04:00</published><updated>2011-06-02T02:58:50.726-04:00</updated><title type='text'>Vandalism, second chances, and bots</title><content type='html'>Three interesting observations about the English Wikipedia:&lt;div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Almost half of all the text added gets reverted.&lt;/li&gt;&lt;li&gt;Over three quarters of contributions from registered users are from someone who's had a contribution reverted.&lt;/li&gt;&lt;li&gt;Half of the top 50 contributors (by amount contributed) are bots.&lt;/li&gt;&lt;/ol&gt;&lt;/div&gt;&lt;div&gt;(This measures the size of contribution as the number of bits of entropy of a revision when compressed with a model primed with the prior five unreverted revisions of the page.  
It's imperfect, especially for large deletions that don't cause a page to get reverted, but it's a first approximation.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia-information.html"&gt;Source&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-417688715158821928?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/417688715158821928/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=417688715158821928' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/417688715158821928'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/417688715158821928'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2011/05/vandalism-second-chances-and-bots.html' title='Vandalism, second chances, and bots'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-4289461467350203744</id><published>2011-05-26T21:42:00.035-04:00</published><updated>2011-05-28T14:48:57.703-04:00</updated><title type='text'>Who writes Wikipedia? 
An information-theoretic analysis of anonymity and vandalism in user-generated content</title><content type='html'>Who writes Wikipedia?&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;It's ten and a half years old, and anyone &lt;i&gt;can&lt;/i&gt; contribute, but who does?  Who actually writes it? How much do registered users contribute versus anonymous users?  How does anonymity correlate with the information-theoretic content of their contributions?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And, also, who un-writes Wikipedia?  Who changes a page so unhelpfully that it gets reverted? And how does anonymity factor into that?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are a lot of ways of adding unhelpful text; for this analysis, we'll only look at the most egregious case, where the page has to be reverted rather than the edit being incorporated (with hedges or provisos).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Jimmy Wales, co-founder and promoter of Wikipedia, has asserted that it's a small core of devoted contributors that adds the majority of new content; specifically, that over 50% of the edits are done by just 524 users, 0.7% of the user base, and 73.4% of the edits are done by 1400 users, just 2% of the user base.  Aaron Swartz did a &lt;a href="http://www.aaronsw.com/weblog/whowriteswikipedia"&gt;study&lt;/a&gt; in 2006 that analyzed how much the current version of each page was based on the new content of each page revision, and found that large numbers of anonymous contributors added much more text than a small registered core; specifically that 8 of the top 10 contributors (for a specific page) are unregistered, and that 6 out of the top 10 have made fewer than 25 edits to the entire site.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Counting sheer number of edits is an imperfect measure of value added to a text, for obvious reasons.  
The textual size of the diff between a page's new content and the current version of the page is closer, but still imperfect.  Firstly, you'd need to compress the diffs to be accurate in sizing, to account for pages with long domain-specific vocabulary; ideally, you'd compress literal text differently than text-copying metadata.  Secondly, this penalizes anyone who tightens a loosely-written paragraph into a dense sentence.  For this analysis, I'm just going to look at information-theoretic gain in each revision, and not focus on impact on the current version of the page.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When stated in those terms, compressing the literal text and text-copying metadata differently, LZMA comes to mind.  It does Lempel-Ziv phrase matching, but has a different statistical model for literals and phrase pointers.  Because this analysis doesn't use the impact on the final page, our entropy model for a revision will be its compressed size when the compression model has been primed on the last 5 revisions.  (More formally, the entropy for a revision R_n of a page is bytelength(LZMA(R_(n-5) .. R_n)) - bytelength(LZMA(R_(n-5) .. R_(n-1))).)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;(Applying compression to estimate text entropy has been done many times before, often with interesting results.  
One fascinating poster I saw at the 2003 Data Compression Conference was about comparability of translations: &lt;a href="http://www.eecs.harvard.edu/~michaelm/ListByYear.html"&gt;Behr, Fossum, Mitzenmacher, and Xiao&lt;/a&gt;, at Harvard, compared different languages' translations of the Bible (one of the most translated books) and several parallel texts from the UN's corpus, finding that even though a language like Chinese was much shorter in bytes than a language like French, they both compressed to similar amounts by PPM (one of the best statistical compressors).)&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;So.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Who writes Wikipedia?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's quite complex, and it would be too simplistic to say that registered folks have contributed the majority of the content that has shaped Wikipedia to become what it is, and anonymous folks mostly vandalize.  Wikipedia has grown up over the past decade, and its current trajectory mirrors several natural processes for life and growth.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both the &lt;a href="https://github.com/lsb/ugc-contributors"&gt;code&lt;/a&gt; to parse Wikipedia and &lt;a href="http://www.infochimps.com/datasets/entropy-per-revision-of-wikipedia-pages-beginning-with-m"&gt;the resulting database&lt;/a&gt; are (CC) BY-SA.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;1) The code to churn through Wikipedia.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Wikipedia publishes &lt;a href="http://dumps.wikimedia.org/enwiki/20110317/"&gt;data dumps&lt;/a&gt;.  The XML dump with the complete text of every revision is the pages-meta-history, every revision of every page.  It's 40GB of XML compressed, split across 15 files, and 3TB uncompressed.  
(The compression used, incidentally, is also LZMA, used by the 7zip format.)  We'll reduce the problem to something manageable, and only consider pages beginning with the letter M.  On &lt;a href="http://aws.amazon.com/ec2/"&gt;EC2&lt;/a&gt;, a 2GHz core-hour is $0.03 and compresses roughly 50k revisions running this code, so if you get an 8-core machine with 7GB of RAM, it'll cost you about twenty dollars for a few days' work.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Each pages-meta-history XML file is composed of {page} elements, each containing their non-chronological {revision}s.  Revisions are highly compressible even with a fast gzip, are relatively short, and are a few thousand in number at most per page.  We'll parse the 15 different XML files in parallel and produce work units a line at a time (as JSON objects), splitting into larger work units, and then rely on xargs to use all available cores to consume these work units.  It's slightly underengineered, but it's straightforward and gets the job done in an appropriate amount of time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;2) Results.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The data is in a sqlite database, so follow along if you like! I'll give a quick overview of the schema, and then we'll dive in.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When running through the full meta-history, we'll only be looking at revision ids and doing LZMA compression, ripping apart the XML with regexps; for the metadata about revisions and pages and users, we'll be using proper XML parsing on the &lt;i&gt;stub&lt;/i&gt;-meta-history.  This leads to the tables raw_revisions, pages, and users coming in from parse-stubs.rb, and the table diffs coming in from mh-diff-consumer.rb. 
The view of the revisions is just the natural join of all these tables, with a few convenience columns thrown in.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;So let's start with how much content there is in the revisions of Wikipedia, unreverted (good_contrib) and reverted (bad_contrib).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;R1.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;i&gt;select sum(good_contrib), sum(bad_contrib) from revisions;&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;1028184338, 785454552&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;There's a lot of user-generated content, and surprisingly, almost as much vandalism as content, 800MB versus 1000MB. Let's break it down by year and anonymity.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;R2.&lt;/div&gt;&lt;div&gt;&lt;i&gt;select year, is_registered, sum(good_contrib), sum(bad_contrib) from revisions group by year, is_registered;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://2.bp.blogspot.com/-2a3l9rjdgCk/Td-csa3DVgI/AAAAAAAAAAs/DoBFqCWvN9A/s1600/anonymity-reversion-totals.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;/a&gt;&lt;div&gt;&lt;a href="http://2.bp.blogspot.com/-2a3l9rjdgCk/Td-csa3DVgI/AAAAAAAAAAs/DoBFqCWvN9A/s1600/anonymity-reversion-totals.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 164px;" src="http://2.bp.blogspot.com/-2a3l9rjdgCk/Td-csa3DVgI/AAAAAAAAAAs/DoBFqCWvN9A/s400/anonymity-reversion-totals.gif" border="0" 
alt="" id="BLOGGER_PHOTO_ID_5611375947665331714" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Look at the ratio of good contributions to bad contributions. At the beginning, it's all groundwork that stays. 2005 is the year at which anonymous contributions start to be more bad than good, and 2007 is the year at which registered contributions spike in badness. Let's get a time series of good-to-bad content ratio for registered users, good-to-bad content ratio for anonymous users, registration-to-anonymous ratio for good contributions, and registration-to-anonymous ratio for bad contributions.&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R3. &lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/-NlJdX3nmgG8/Td-TkqLjwII/AAAAAAAAAAc/yMkCAsAVgls/s1600/anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 222px;" src="http://3.bp.blogspot.com/-NlJdX3nmgG8/Td-TkqLjwII/AAAAAAAAAAc/yMkCAsAVgls/s400/anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611365918734270594" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And here, it is apparent.  Registered users dipped to contributing as much vandalism as content in 2007, and have taken an upswing to over three times as much good content.  Anonymous users dipped to contributing as much vandalism as content in 2005, and through 2010 are contributing roughly twice as much vandalism as content (2011 only goes up to week 11).  The total good content has been coming much more from registered users, outpacing anonymous users' contributions 4 to 1, and keeps growing.  
The total bad content has been coming slightly more from anonymous users than registered users.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, how should a collaborative community shield against destructive behavior?  Imagine a community is trying to collect an "interesting events board".  One model is to carefully vet incoming members to ensure they're of the sort to do good work, like a newspaper editorial board.  Another model is to allow completely anonymous text, like Craigslist or even 4chan.  Another model is to allow differing amounts of anonymity, but to foster a sense of community where people can immediately see other people's contributions and give praise or blame based on the contribution, like an urban cafe's pinboard.  Wikipedia, and many social news sites, follow this third way.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's easy to see that Wikipedia has enough anonymous vandalism as is, and anonymizing more would be unhelpful.  But how about the other direction?  What would happen if Wikipedia were registration-and-invite-only?  
What would Wikipedia look like with only contributions from pure contributors, who never made an edit that got reverted?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;R4.&lt;/div&gt;&lt;div&gt;&lt;i&gt;create temp table got_reverted (user_id integer primary key);&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;insert into got_reverted select distinct user_id from revisions where bad_contrib &amp;gt; 0;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;select got_reverted.user_id is null as pure, is_registered, sum(good_contrib), sum(bad_contrib) from revisions left join got_reverted using (user_id) group by pure, is_registered;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;pure, is_registered, sum(good_contrib), sum(bad_contrib)&lt;/div&gt;&lt;div&gt;0, 0, 59251203, 356581309&lt;/div&gt;&lt;div&gt;0, 1, 623377449, 428873513&lt;/div&gt;&lt;div&gt;1, 0, 154674459, 0&lt;/div&gt;&lt;div&gt;1, 1, 190881227, 0&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It's optimistic to say Wikipedia would only have a quarter of its current content counting pure users.  But it's certainly interesting to see that anonymous vandals mostly vandalize, whereas registered vandals add more value than not.  
But let's break this down by year, to see the ratio of good-to-bad contributions from impure registered users, the ratio of good-to-bad contributions from impure anonymous users, the ratio of impurity-to-purity in good registered contributions, and the ratio of impurity-to-purity in good anonymous contributions.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R5.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;select year, got_reverted.user_id is null as pure, is_registered, sum(good_contrib), sum(bad_contrib) from revisions left join got_reverted using (user_id) group by year, pure, is_registered;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;a href="http://1.bp.blogspot.com/-w90aL6CCF9Q/Td-V2IjT4dI/AAAAAAAAAAk/461WM3qdpUA/s1600/impurity-anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;br /&gt;&lt;img src="http://1.bp.blogspot.com/-w90aL6CCF9Q/Td-V2IjT4dI/AAAAAAAAAAk/461WM3qdpUA/s400/impurity-anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611368417967989202" style="cursor: pointer; width: 400px; height: 218px; " /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Impure registered users have been adding three times as much content as vandalism.  Impure anonymous users have been adding a tenth as much content as vandalism.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Good registered users' contributions are about three times as much from users who have made an unhelpful edit as not.  Good anonymous users' contributions are about a third as much from users who have made an unhelpful edit as not.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is most likely shaping, and shaped by, the Wikipedia policy that if you anonymously make an edit that gets reverted, another unhelpful edit will cause you to be banned.  
And similarly, sites that allow anonymous content and are not young and small (like Wikipedia in its first dozen months), like Craigslist and social news sites, need to teach newcomers what is valuable and need to ensure that newcomers listen to their advice.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Three folks at Dartmouth measured the good or bad quality of a contribution in the same way as Aaron Swartz, how much of a revision's diff against the previous version of the page appears in the final version, and found that the highest quality contributions come from highly-contributing registered users and single-time anonymous users ("zealots" and "Good Samaritans", in their terminology).  Their paper was published in 2007, and a preliminary version was published in 2005; the community has aged significantly since then.  Note that 2005 was when anonymous users started vandalizing more than contributing, and 2007 was the year of the minimum content-to-vandalism for registered users.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So let's compare how much good content vs vandalism comes from the population, in total, split by whether they've registered, and then split by the number of edits, whether over 40 or under 4.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R6.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;i&gt;create temp table user_edit_count as select user_id, count(*) as edit_count from revisions group by user_id;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;select is_registered, edit_count &amp;gt; 10 as high_edit_count, round(sum(good_contrib * 1.0) / sum(bad_contrib),3) as good_ratio from revisions natural join user_edit_count where edit_count not between 4 and 40 group by is_registered, high_edit_count;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;is_registered, high_edit_count, good_ratio&lt;/div&gt;&lt;div&gt;0, 0, 0.583&lt;/div&gt;&lt;div&gt;0, 1, 0.801&lt;/div&gt;&lt;div&gt;1, 0, 
1.017&lt;/div&gt;&lt;div&gt;1, 1, 2.134&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;The more you edit Wikipedia, the more your edits are valuable to Wikipedia.  There might indeed be Good Samaritans, but their efforts seem to be outweighed by fly-by vandals.  Let's break this apart by year, to see if the numbers might have changed since 5 years ago.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R7.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://4.bp.blogspot.com/-m9H4moAUUIc/Td-cs5YrMJI/AAAAAAAAAA8/hZcI6ANWMck/s1600/edit-amount-anonymity-reversion-amounts.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 239px;" src="http://4.bp.blogspot.com/-m9H4moAUUIc/Td-cs5YrMJI/AAAAAAAAAA8/hZcI6ANWMck/s400/edit-amount-anonymity-reversion-amounts.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611375955859419282" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There it is, the green line of anonymous Good Samaritans, plunging below the 1-good-to-1-bad line in 2006.  I'd guess that the study was done on 2005 data, for which that held true; presumably some of the anonymous Good Samaritans created accounts, and lots of the vandalism swamped their good intentions over the second half of the last decade.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What size pages do anonymous and registered users contribute to, both reverted and not?  
How much does each add per revision, reverted and not?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R8.&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;i&gt;create temp table average_page_sizes as select year, avg(page_size) as average_page_size from revisions group by year;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;create temp table average_diff_sizes as select year, avg(diff_size) as average_diff_size from revisions group by year;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;i&gt;select revisions.year, is_registered, is_reverted, round(avg(page_size)) as average_page_size, round(avg(page_size) / average_page_size,3) as comparative_page_size, round(avg(diff_size)) as average_diff_size, round(avg(diff_size) / average_diff_size,3) as comparative_diff_size from revisions natural join average_page_sizes natural join average_diff_sizes group by revisions.year, is_registered, is_reverted;&lt;/i&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, first we'll look at the yearly average diff sizes, for registered and anonymous users, and for good and bad contributions, and then we'll compare those segmented diff sizes against the year's average diff size.  
Next, the same for page sizes that registered/anonymous users gave good and bad contributions to.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/-kvTeepK6pRQ/Td-jWVPM51I/AAAAAAAAABE/NOqSgs3KpzE/s1600/average-diff-size-anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 224px;" src="http://3.bp.blogspot.com/-kvTeepK6pRQ/Td-jWVPM51I/AAAAAAAAABE/NOqSgs3KpzE/s400/average-diff-size-anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611383264780281682" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/-m4W9XyocBII/Td-jWVXY7NI/AAAAAAAAABM/Q0UeOVGrvSM/s1600/average-diff-size-ratio-anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 227px;" src="http://3.bp.blogspot.com/-m4W9XyocBII/Td-jWVXY7NI/AAAAAAAAABM/Q0UeOVGrvSM/s400/average-diff-size-ratio-anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611383264814623954" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And pages:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://1.bp.blogspot.com/-yc8Ny3dQ_DQ/Td-jWjP5CRI/AAAAAAAAABU/z7SqpwaZVaI/s1600/average-page-size-anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 190px;" src="http://1.bp.blogspot.com/-yc8Ny3dQ_DQ/Td-jWjP5CRI/AAAAAAAAABU/z7SqpwaZVaI/s400/average-page-size-anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611383268541270290" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a 
href="http://4.bp.blogspot.com/-fkP6jqHGXRE/Td-jW1ZbYXI/AAAAAAAAABc/NPX5hqXeuiE/s1600/average-page-size-ratio-anonymity-reversions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 242px;" src="http://4.bp.blogspot.com/-fkP6jqHGXRE/Td-jW1ZbYXI/AAAAAAAAABc/NPX5hqXeuiE/s400/average-page-size-ratio-anonymity-reversions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611383273413108082" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Registered contributors' contributions get reverted more on larger pages, whereas anonymous contributors' contributions get reverted more on smaller pages.  While the average page size is going up, the average size where one can make a productive contribution is rising more slowly.  Anonymous users who make small contributions fare better on larger pages than registered users.  Interestingly, registered users' bad contributions are significantly greater, information-theoretically, than registered users' good contributions, and anonymous users' good contributions are slightly greater than anonymous users' bad contributions.  These numbers have stayed relatively constant over time, suggesting &lt;i&gt;it might be possible to classify revisions as vandalism partly by the information-theoretic content&lt;/i&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;From R2, we've seen that registered users add more, in total, than anonymous users.  
What's the dropoff?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;R9.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;create temp table user_edit_amounts as select user_id, sum(good_contrib) as goods, sum(bad_contrib) as bads, is_registered from revisions group by user_id;&lt;/div&gt;&lt;div&gt;select goods from user_edit_amounts where is_registered order by goods desc limit 1000;&lt;/div&gt;&lt;div&gt;select goods from user_edit_amounts where not is_registered order by goods desc limit 1000;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://1.bp.blogspot.com/-Rx8AajsKWN4/Td-vDWmqtWI/AAAAAAAAABk/ZlWbu-TMuLs/s1600/top-25-contributors-contributions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 202px;" src="http://1.bp.blogspot.com/-Rx8AajsKWN4/Td-vDWmqtWI/AAAAAAAAABk/ZlWbu-TMuLs/s400/top-25-contributors-contributions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611396132869158242" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://1.bp.blogspot.com/-RZ2iw5v6brY/Td-vDXGe9qI/AAAAAAAAABs/5QQ6N2dPXV8/s1600/top-1000-contributors-contributions.gif" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 400px; height: 207px;" src="http://1.bp.blogspot.com/-RZ2iw5v6brY/Td-vDXGe9qI/AAAAAAAAABs/5QQ6N2dPXV8/s400/top-1000-contributors-contributions.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5611396133002606242" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;The top registered editors give much more good content than the top anonymous editors.  Who are these fantastically productive registered users?  
Those people have contributed enormous amounts, they must work like machines!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;R10.&lt;/div&gt;&lt;div&gt;&lt;i&gt;select name, goods from user_edit_amounts natural join users order by goods desc limit 50;&lt;/i&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;SmackBot, 9227561 *&lt;/div&gt;&lt;div&gt;RjwilmsiBot, 6828002 *&lt;/div&gt;&lt;div&gt;Dr. Blofeld, 3936971&lt;/div&gt;&lt;div&gt;Ram-Man, 3050752&lt;/div&gt;&lt;div&gt;Kotbot, 2885893 *&lt;/div&gt;&lt;div&gt;Leszek Jańczuk, 2496383&lt;/div&gt;&lt;div&gt;Yobot, 2269891 *&lt;/div&gt;&lt;div&gt;Rich Farmbrough, 2152852&lt;/div&gt;&lt;div&gt;AlbertHerring, 2011080&lt;/div&gt;&lt;div&gt;Ser Amantio di Nicolao, 2011080&lt;/div&gt;&lt;div&gt;Polbot, 1727872 *&lt;/div&gt;&lt;div&gt;Cydebot, 1689522 *&lt;/div&gt;&lt;div&gt;CapitalBot, 1643092 *&lt;/div&gt;&lt;div&gt;The Anomebot2, 1607194 *&lt;/div&gt;&lt;div&gt;Lightbot, 1574000 *&lt;/div&gt;&lt;div&gt;TUF-KAT, 1391214&lt;/div&gt;&lt;div&gt;Alansohn, 1266745&lt;/div&gt;&lt;div&gt;Starzynka, 1256326&lt;/div&gt;&lt;div&gt;Bearcat, 1194658 *&lt;/div&gt;&lt;div&gt;Rjwilmsi, 1186757&lt;/div&gt;&lt;div&gt;Rambot, 1133351 *&lt;/div&gt;&lt;div&gt;RussBot, 1124530 *&lt;/div&gt;&lt;div&gt;Geo Swan, 1106468&lt;/div&gt;&lt;div&gt;DumZiBoT, 981678 *&lt;/div&gt;&lt;div&gt;ProteinBoxBot, 960781 *&lt;/div&gt;&lt;div&gt;Ganeshbot, 960205 *&lt;/div&gt;&lt;div&gt;WhisperToMe, 939078&lt;/div&gt;&lt;div&gt;Darius Dhlomo, 920829&lt;/div&gt;&lt;div&gt;Eubot, 920828 *&lt;/div&gt;&lt;div&gt;Viridiscalculus, 911794&lt;/div&gt;&lt;div&gt;D6, 861183 *&lt;/div&gt;&lt;div&gt;Full-date unlinking bot, 851573 *&lt;/div&gt;&lt;div&gt;Arcadian, 829771&lt;/div&gt;&lt;div&gt;BOTijo, 827633 *&lt;/div&gt;&lt;div&gt;Orderinchaos, 823965&lt;/div&gt;&lt;div&gt;Lugnuts, 818752&lt;/div&gt;&lt;div&gt;Plasticspork, 809442 *&lt;/div&gt;&lt;div&gt;Plastikspork, 779035&lt;/div&gt;&lt;div&gt;Carlossuarez46, 
773060&lt;/div&gt;&lt;div&gt;Geschichte, 752836&lt;/div&gt;&lt;div&gt;Imzadi1979, 735127&lt;/div&gt;&lt;div&gt;Wetman, 726056&lt;/div&gt;&lt;div&gt;Luckas-bot, 722364 *&lt;/div&gt;&lt;div&gt;PageantUpdater, 713171 *&lt;/div&gt;&lt;div&gt;Attilios, 700275&lt;/div&gt;&lt;div&gt;Charles Matthews, 664671&lt;/div&gt;&lt;div&gt;Droll, 645061&lt;/div&gt;&lt;div&gt;Thijs!bot, 636961 *&lt;/div&gt;&lt;div&gt;CJCurrie, 634663&lt;/div&gt;&lt;div&gt;Colonies Chris, 608999&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Indeed they are machines.  About half of the top 50 contributors are bots.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Two interesting observations:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1.  &lt;a href="http://www.treehugger.com/files/2010/12/ten-percent-human-90-bacteria.php"&gt;Our DNA only controls about 10% to 1% of who we are day-to-day.&lt;/a&gt;  There's a large number of bacteria that've co-evolved with us to help us in our daily toil.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2.  Scalable automation is quite powerful. Quoth Paul Graham: &lt;a href="http://www.paulgraham.com/vcsqueeze.html"&gt;&lt;i&gt;"At most startups ten years ago, software development meant ten programmers writing code in C++. Now the same work might be done by one or two using Python or Ruby.  During the Bubble, a lot of people predicted that startups would outsource their development to India. I think a better model for the future is David Heinemeier Hansson, who outsourced his development to a more powerful language instead."&lt;/i&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In ten short years, looking at the information-theoretic contributions of users, it's possible to see Wikipedia grow up and become an enormous ecosystem of a wide variety of contributors.  
But this analysis is just scratching the surface: take the code on Github, the lightweight dataset on Infochimps, and let me know what you come up with!&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-4289461467350203744?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/4289461467350203744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=4289461467350203744' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/4289461467350203744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/4289461467350203744'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2011/05/who-writes-wikipedia-information.html' title='Who writes Wikipedia? 
An information-theoretic analysis of anonymity and vandalism in user-generated content'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-2a3l9rjdgCk/Td-csa3DVgI/AAAAAAAAAAs/DoBFqCWvN9A/s72-c/anonymity-reversion-totals.gif' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-842690722689212164</id><published>2010-03-21T12:57:00.000-04:00</published><updated>2010-03-21T13:36:52.993-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='dirty hacks'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><category scheme='http://www.blogger.com/atom/ns#' term='first-class functions'/><category scheme='http://www.blogger.com/atom/ns#' term='awesomely dirty hacks'/><title type='text'>First-class functions make printf-debugging obsolete</title><content type='html'>&lt;blockquote&gt;&lt;/blockquote&gt;&lt;b&gt;&lt;i&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;tl;dr: Instead of looking through lines of text in an error log, store the variable bindings of your error cases and debug in a REPL in the context of your error.&lt;/span&gt;&lt;/i&gt;&lt;/b&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;One of the most productive features of a language is a read-eval-print loop: type in an expression, and see its value.  A standard RDBMS has one for SQL, and a standard browser has one for Javascript.  
Good languages like Ruby, Python, Haskell and Lisp each have one.  They're incredibly useful for exploratory programming.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;If you're playing with library functions, or doing a join on a new set of tables, or trying to do complex Ajax calls, you'll have an easier time if you see the results of your code immediately.  The faster the results come back, the more of your train of thought you keep, and the easier it is to debug your code and write more code based on it until you've solved your problem.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;In an error case, you'd ideally like to be in that read-eval-print loop, poking around to see where your assumptions didn't hold true, but often there are only a few print statements logging a few values.  Here is one way of getting back into that read-eval-print loop in Ruby.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;As a motivating example, let's say you're trying to get the average number of letters per vowel in English words.  
Assuming that &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;/usr/share/dict/words&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; is sufficient, we can loop over every word, count the characters and count the vowels, divide, and then do the average.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;As a first guess, let's try&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words = File.readlines("/usr/share/dict/words")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words.map {|word| word.length / word.count("aeiouy") }.average&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and bam, a divide-by-zero error.  Ruby doesn't tell us the local variable bindings, what "word" was set to when it broke.  
It'd be great if, instead of doing&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words.map {|word| &lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;print word;&lt;/span&gt;&lt;/b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; word.length / word.count("aeiouy") }.average&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span"  style="font-size:medium;"&gt;&lt;span class="Apple-style-span"   style="  ;font-family:Georgia, serif;font-size:16px;"&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;we could make a read-eval-print loop that operated in the context of the function.  We already have the print part, and for the read part we can pass in a string, so all we need to do is evaluate a string in the context of the function, and store that closure over the function context outside its scope.  
The function &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;eval&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; evaluates a string, we can make a function via &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;lambda&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;, and we can store our function in a global variable, as follows:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words.map {|word| &lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$repl = lambda {|string| eval string };&lt;/span&gt;&lt;/b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; word.length / word.count("aeiouy") }.average&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and now, when the divide-by-zero error happens, &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$repl&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; is a function that evaluates its argument in the context of the loop iteration for the failing word, with all of its local variables set.  
In my case,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt;$repl.call("word")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; "A\n"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Voila!  We didn't account for uppercase letters, and there's also some trailing whitespace coming through.  Map each word to its lowercase, whitespace-stripped version, and try again:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words=File.readlines("/usr/share/dict/words").map {|word| word.strip.downcase }&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words.map {|word| $repl = lambda {|string| eval string}; word.length / word.count("aeiouy") }.average&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and this time&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  
style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; $repl.call("word")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; "b"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;we see another case to watch out for.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;But why should we junk up our mapping function?  Is there some way to package up variable bindings in a library?  One of Ruby's core classes, actually, is Binding.  An instance of Binding is basically the lookup table that the interpreter uses to look up the value of a variable, which you can then pass to &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;eval&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;.  You can get the current binding at any point by calling&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Kernel.binding&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and you can call .binding on a block or a lambda, to get their bindings.  
Assuming the file "save_bindings.rb" had the following code:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;$bindings = {}&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;def save_binding(key, &amp;amp;block)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  $bindings[key] = block.binding&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  block.call&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;def use_binding_to_eval(key, string)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  b = $bindings[key]&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  b ? 
eval(string,b) : "invalid key"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;end&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;then we could do something refreshing like&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "save_bindings"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;words.map {|word| &lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;save_binding("counting") {&lt;/span&gt;&lt;/b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; word.length / word.count("aeiouy") &lt;/span&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/b&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; }.average&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;so instead of doing a &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;printf&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;, we call &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier 
new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;save_binding&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;, and then pass in a block that contains the code we want to run.  So &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;save_binding&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; stores the binding in a hash table based on a key (the same key every time), and runs the code, and then, when our divide-by-zero gets thrown, we can see what the problem is by calling&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; use_binding_to_eval("counting", "word")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; "b"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;or, equally easily, anything more complex:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; use_binding_to_eval("counting", "word.length")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: 
small;"&gt;=&gt; 1&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; use_binding_to_eval("counting", "word == 'b'")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; true&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; use_binding_to_eval("counting", "word.upcase!")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; "B"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&gt;&gt; use_binding_to_eval("counting", "word")&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;=&gt; "B"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and the environment variables all point to live values, as we can see with the destructively-updating &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;String#upcase!&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Imagine if this were in a webapp.  Take the following Sinatra code, in "letters-per-vowel.rb":&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "rubygems"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "sinatra"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;get("/letters-per-vowel") {&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  word = params["word"]&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  (word.length / word.count("aeiouy")).to_s&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and run it in production mode to 
avoid Sinatra's helpful error messages:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;ruby letters-per-vowel.rb -e production -p 4567&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;and go to &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;localhost:4567/letters-per-vowel?word=aeiou&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; or whatever vowelful word you want, and you get back what you expect: the length of the word divided by the number of vowels, as a string.  But what do you do when you hit &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;localhost:4567/letters-per-vowel?word=grrr&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; or something like that?  
Use "save_bindings.rb" from earlier and make yourself a browser-based read-eval-print loop:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "rubygems"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "sinatra"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;require "save_bindings"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;SECRET = "password1"&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;get("/letters-per-vowel") {&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;  word = params["word"]&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" 
style="font-size: small;"&gt;&lt;b&gt;  save_binding("LPV") {&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;    (word.length / word.count("aeiouy")).to_s&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;b&gt;  }&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;b&gt;get("/debug") {&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;b&gt;  use_binding_to_eval(params["key"], params["string"]).to_s if params["secret"] == SECRET&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;b&gt;}&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;We need a secret, to avoid people evaluating arbitrary code on the server.  
And now, if you hit &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;localhost:4567/letters-per-vowel?word=grrr&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; you'll still get that error message, but then go to &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;localhost:4567/debug?secret=password1&amp;amp;key=LPV&amp;amp;string=request.env['HTTP_USER_AGENT']&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; and you can see the user agent from the environment of the request that caused the error, or go to &lt;/span&gt;&lt;span class="Apple-style-span"  style="font-family:'courier new';"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;localhost:4567/debug?secret=password1&amp;amp;key=LPV&amp;amp;string=word.size&lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; to see how long the word was.  
It's a bit of a security hole, but you could probably get even more exciting results if you put it behind the slick front-end of &lt;/span&gt;&lt;a href="http://tryruby.org/"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;http://tryruby.org&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Check it out at &lt;/span&gt;&lt;a href="http://github.com/lsb/save-bindings"&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;http://github.com/lsb/save-bindings&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt; and let me know what you think!&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-size: small;"&gt;Lee&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-842690722689212164?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/842690722689212164/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=842690722689212164' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/842690722689212164'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/842690722689212164'/><link rel='alternate' 
type='text/html' href='http://slightlynew.blogspot.com/2010/03/first-class-functions-make-printf.html' title='First-class functions make printf-debugging obsolete'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-2679639276094400921</id><published>2009-08-31T12:05:00.000-04:00</published><updated>2010-02-16T03:48:28.786-05:00</updated><title type='text'>DB indices</title><content type='html'>I think it'd be pretty nifty to get a daily summary of which indices lead to the fastest replay of yesterday's logs, based on IO and RAM requirements.  Anyone know of any tools for doing this?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-2679639276094400921?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/2679639276094400921/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=2679639276094400921' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/2679639276094400921'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/2679639276094400921'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2009/08/db-indices.html' title='DB indices'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-7198612634669688799</id><published>2009-05-27T10:43:00.000-04:00</published><updated>2009-05-27T21:11:17.076-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='zlib'/><category scheme='http://www.blogger.com/atom/ns#' term='ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='bzip'/><category scheme='http://www.blogger.com/atom/ns#' term='memcache-client'/><category scheme='http://www.blogger.com/atom/ns#' term='monkeypatching'/><category scheme='http://www.blogger.com/atom/ns#' term='dirty hacks'/><category scheme='http://www.blogger.com/atom/ns#' term='lzma'/><category scheme='http://www.blogger.com/atom/ns#' term='awesomely dirty hacks'/><category scheme='http://www.blogger.com/atom/ns#' term='compression'/><title type='text'>Ruby's memcache-client 1.7.2 does NOT support compression; patch it in with 7 lines of code</title><content type='html'>The Ruby memcache-client 1.7.2 code seems like the most popular memcached client.  
Alas, &lt;span class="Apple-style-span" style=""&gt;&lt;span class="Apple-style-span" style="font-style: italic; "&gt;&lt;a href="http://github.com/mperham/memcache-client/blob/1ab648c4444d92a4d2c8882004622a2ad08687cc/lib/memcache.rb#L103"&gt;memcache-client 1.7.2 does not recognize :compress =&gt; true&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;, &lt;a href="http://www.google.com/search?q=ruby+memcache-client+%22%3Acompression+%3D%3E+true%22"&gt;all the demos out there notwithstanding&lt;/a&gt;.&lt;div&gt;Let's fix this by monkeypatching Zlib compression into Marshal.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;br /&gt;require 'zlib'&lt;br /&gt;&lt;br /&gt;module Marshal&lt;br /&gt;  @@load_uc = method :load&lt;br /&gt;  @@dump_uc = method :dump&lt;br /&gt;  def self.load(v) @@load_uc[Zlib::Inflate.inflate(v)] end&lt;br /&gt;  def self.dump(v) Zlib::Deflate.deflate(@@dump_uc[v]) end&lt;br /&gt;end&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And there we go!  Four lines of patching Marshal.load for a better memory footprint.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To dissect each phrase of that sentence:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;"Four lines": I wanted just to try how well zlib would work on my Marshalled ActiveRecord objects and html fragments, and it did so handily, almost 3:1.  Indeed, the only reason I poked around at the source code is because one of my largest but still highly-compressible HTML fragments was 1.2MB, over the size limit.  I've since gone back to storing large HTML fragments on disk (uncompressed), having found many more values to store in Memcached.&lt;/li&gt;&lt;li&gt;"patching Marshal.load": monkeypatching Marshal is not as bad as String.  Chances are, you use the Marshal format as a blob, and you keep your Marshal files to yourself (and leave external serialization to friendlier fare like JSON).  
So, all in all, it's much easier to change the Marshal format than mucking through the memcache-client code.&lt;/li&gt;&lt;li&gt;"better memory footprint": instead of Zlib, try LZMA, with slightly smaller compressed sizes than BZIP and faster decompression times, good properties for cache compression.  But Zlib is already in the standard library, so it's a good first approximation.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;The ersatz alias_method_chaining feels kludgy, as does Ruby's distinction between methods and lambdae.  Ah well.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thoughts?&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-7198612634669688799?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/7198612634669688799/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=7198612634669688799' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/7198612634669688799'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/7198612634669688799'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2009/05/rubys-memcache-client-172-does-not.html' title='Ruby&apos;s memcache-client 1.7.2 does NOT support compression; patch it in with 7 lines of code'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-7892634361222614135</id><published>2009-03-29T17:21:00.000-04:00</published><updated>2009-03-29T17:23:05.333-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='verité'/><category scheme='http://www.blogger.com/atom/ns#' term='picasso'/><category scheme='http://www.blogger.com/atom/ns#' term='bargains'/><category scheme='http://www.blogger.com/atom/ns#' term='turpentine'/><title type='text'>Reality</title><content type='html'>&lt;blockquote&gt;When art critics get together they talk about Form and Structure and Meaning. When artists get together they talk about where you can buy cheap turpentine. -- Picasso&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-7892634361222614135?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/7892634361222614135/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=7892634361222614135' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/7892634361222614135'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/7892634361222614135'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2009/03/reality.html' title='Reality'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-5662696445329819641</id><published>2009-02-24T12:38:00.000-05:00</published><updated>2009-02-24T17:51:26.273-05:00</updated><title type='text'>A Full Web Service with HTTP caching in 7 lines</title><content type='html'>Two lovely gems, sinatra and rack-cache.  Sinatra is pretty easy web-service-creation, and Rack::Cache is pretty easy http caching.  Together?  Jubilation.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;require 'rubygems'&lt;br /&gt;require 'sinatra'&lt;br /&gt;require 'rack/cache'&lt;br /&gt;&lt;br /&gt;use Rack::Cache&lt;br /&gt;&lt;br /&gt;get('/quadruple/:n') {&lt;br /&gt;  sleep 1&lt;br /&gt;  response.headers['Cache-Control'] = 'max-age=1000000'&lt;br /&gt;  (params[:n].to_i * 4).to_s&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and then &lt;pre&gt;ruby sinatra-add.rb -e production&lt;/pre&gt; and you're done.&lt;br /&gt;&lt;br /&gt;There are, of course, many other fiddly bits to configure Rack::Cache with, like &lt;code&gt;use Rack::Cache, :entitystore =&gt; 'file:/tmp/'&lt;/code&gt; if you don't want to keep it all in a hash in memory, and &lt;code&gt;:verbose =&gt; false&lt;/code&gt; if you don't want that in your logs, but that's basically it.&lt;br /&gt;&lt;br /&gt;It's pretty amazing: that's a real live web service, with HTTP caching, in 7 gentle lines.  (I don't count the closing brace, nor the sleep 1, which is purely for effect.)&lt;br /&gt;&lt;br /&gt;Anyone know an easier way in any other language?  
Either in number of lines, or in directness of code?&lt;br /&gt;&lt;br /&gt;Also, how bad would it be to authbind that to port 80, and just let it go open to the world?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Update: It's interesting to me for the same reason PHP is interesting, both as a social commentary on the bits of plumbing we've agreed upon as useful, and as an aid to wrapping an HTTP interface around some Ada code with its own homegrown database.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-5662696445329819641?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/5662696445329819641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=5662696445329819641' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5662696445329819641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5662696445329819641'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2009/02/full-web-service-with-http-caching-in-7.html' title='A Full Web Service with HTTP caching in 7 lines'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-890170100843611887</id><published>2007-12-01T18:18:00.000-05:00</published><updated>2007-12-02T14:49:49.187-05:00</updated><title type='text'>Top five contradictory positions of Ron Paul 
Graham.</title><content type='html'>1.  &lt;a href="http://en.wikipedia.org/wiki/Political_positions_of_Ron_Paul#Secure_borders_and_legal_immigration"&gt;Supports secure borders&lt;/a&gt;, but &lt;a href="http://www.bookshelf.jp/texi/onlisp/onlisp_15.html"&gt;writes macros for variable capture&lt;/a&gt;.&lt;br /&gt;2.  &lt;a href="http://en.wikipedia.org/wiki/Political_positions_of_Ron_Paul#Rejection_of_conspiracy_theory"&gt;Rejects conspiracy theories of 9/11&lt;/a&gt;, but funded &lt;a href="http://reddit.com/"&gt;a social news website&lt;/a&gt; &lt;a href="http://google.com/search?q=9%2F11+conspiracy+theory+site%3Areddit.com"&gt;full of 9/11 conspiracy theories&lt;/a&gt;.&lt;br /&gt;3. &lt;a href="http://ronpaulblimp.com/"&gt;Will launch an ad blimp in two weeks&lt;/a&gt;, but &lt;a href="http://paulgraham.com/submarine.html"&gt;remains cynical about submarine-style PR campaigns&lt;/a&gt;.&lt;br /&gt;4.  &lt;a href="http://en.wikipedia.org/wiki/Ron_Paul#Internet_popularity"&gt;Says he doesn't organize his supporters on the internet&lt;/a&gt;, but &lt;a href="http://google.com/search?q=ask+pg+site%3Anews.ycombinator.com"&gt;provides a Q&amp;amp;A internet forum&lt;/a&gt;.&lt;br /&gt;5.  
&lt;a href="http://en.wikipedia.org/wiki/Political_positions_of_Ron_Paul#Opposition_to_inflation_and_the_Federal_Reserve"&gt;Opposes inflation&lt;/a&gt;, but &lt;a href="http://www.paulgraham.com/accgen.html"&gt;supports accumulator generators&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Leave more in the comments...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-890170100843611887?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/890170100843611887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=890170100843611887' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/890170100843611887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/890170100843611887'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/12/top-five-contradictory-positions-of-ron.html' title='Top five contradictory positions of Ron Paul Graham.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-5769308924652937529</id><published>2007-11-24T19:23:00.000-05:00</published><updated>2007-11-24T20:41:43.115-05:00</updated><title type='text'>Anti-Features?, or, Never attribute to malice what can be explained by math.</title><content type='html'>Apparently, if you buy a cheap camera, and you can't save an uncompressed picture, that's 
treacherous computing, that's an &lt;a href="http://www.fsf.org/blogs/community/antifeatures"&gt;anti-feature&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It's not.  The camera won't stop working if you correctly patch in a different chip set.&lt;br /&gt;&lt;br /&gt;And assuming the worst, assuming that the chips are exactly the same in an expensive camera, it still doesn't matter.  The native resolution is still small on the cheap camera--that's mostly what makes it cheap--so you'll be better off buying the high-resolution expensive camera.&lt;br /&gt;&lt;br /&gt;Of course, if you really want a camera that you can program, you might like &lt;a href="http://gizmodo.com/gadgets/hack-attack/xo-laptop-hacked-to-remotely-run-roomba-round-rooms-322343.php"&gt;an XO that drives a Roomba&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-5769308924652937529?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/5769308924652937529/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=5769308924652937529' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5769308924652937529'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5769308924652937529'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/11/anti-features-or-never-attribute-to.html' title='Anti-Features?, or, Never attribute to malice what can be explained by math.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-86675774648474401</id><published>2007-06-06T22:42:00.000-04:00</published><updated>2007-06-07T14:36:42.371-04:00</updated><title type='text'>cheap illustrations.</title><content type='html'>I'm a cheap nerd, so I printed myself an enormous wall decoration in August 2006:  it's Pi, in a textual fractal.  (&lt;a href="http://poetaexmachina.net/pi2.gif"&gt;This&lt;/a&gt; is it to two levels, and the one in my room is to five.)  3'x5', $10 from Kinko's.  Nice conversation piece.&lt;br /&gt;&lt;br /&gt;I've been &lt;a href="http://poetaexmachina.net/books"&gt;printing&lt;/a&gt; more things recently and thinking about combining text and images, and an &lt;a href="http://bestlatin.net/"&gt;author&lt;/a&gt; of &lt;a href="http://www.lulu.com/content/370912"&gt;two&lt;/a&gt; &lt;a href="http://www.lulu.com/content/431684"&gt;books&lt;/a&gt; on classical themes bemoaned the expense of art in an email today.  (Illustrating a book on Aesop's Fables, specifically.)&lt;br /&gt;&lt;br /&gt;Most art schools have &lt;a href="http://intranet.risd.edu/departments/workstudy/search/default.asp"&gt;job boards&lt;/a&gt;; the artists can be of varying quality, but sketches in pen are quick to do, and hard to do badly.  (Lots of approximate lines.)  Twenty seconds a sketch, which is a not-too-fast speed, scan one as you're doing the next one, and you've got a hundred in an hour.&lt;br /&gt;&lt;br /&gt;Has anyone used Flickr for (cc)by-sa images?  
I wonder if that's cheaper.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-86675774648474401?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/86675774648474401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=86675774648474401' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/86675774648474401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/86675774648474401'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/06/cheap-illustrations.html' title='cheap illustrations.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-1578356783587769465</id><published>2007-05-21T09:46:00.000-04:00</published><updated>2007-05-21T10:07:40.703-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='amazingly bad apis'/><category scheme='http://www.blogger.com/atom/ns#' term='pontification'/><category scheme='http://www.blogger.com/atom/ns#' term='amazingly good small pieces loosely joined'/><category scheme='http://www.blogger.com/atom/ns#' term='metamorphoses'/><title type='text'>Amazingly bad APIs?</title><content type='html'>There are some &lt;a href="http://paulbuchheit.blogspot.com/2007/05/amazingly-bad-apis.html"&gt;amazingly bad APIs in Java&lt;/a&gt;, so Paul Buchheit says.  
The best API is system(), or backticks.&lt;br /&gt;&lt;br /&gt;The end result is basically an ImageMagick conversion (kudos to a &lt;a href="http://linear1.org/gm/archives/00000162.php"&gt;useful incantation&lt;/a&gt; for the greater-than sign, which shrinks the image only if it is larger than the given size):&lt;br /&gt;&lt;br /&gt;convert /tmp/c.jpg -geometry "220x133&gt;" -antialias -quality 90 /tmp/c-thumb.jpg&lt;br /&gt;&lt;br /&gt;You wouldn't write an encyclopedia in limericks (though &lt;a href="http://en.wikipedia.org/wiki/Metamorphoses"&gt;it's been done before&lt;/a&gt;), so there's no need to use a general-purpose programming language to manipulate images.  Small Pieces Loosely Joined is pretty popular for Unix, CGI/REST, Erlang, Messenger RNA, etc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-1578356783587769465?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/1578356783587769465/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=1578356783587769465' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/1578356783587769465'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/1578356783587769465'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/05/amazingly-bad-apis.html' title='Amazingly bad APIs?'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-5789871590573052195</id><published>2007-05-02T20:31:00.000-04:00</published><updated>2007-05-02T21:03:48.125-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='drm'/><category scheme='http://www.blogger.com/atom/ns#' term='09 something something c0'/><category scheme='http://www.blogger.com/atom/ns#' term='microsoft'/><title type='text'>microsoft knows drm.</title><content type='html'>&lt;span style="font-family: lucida grande;"&gt;I've been hearing a lot about this string of hexadecimal numbers.  It starts with 09, ends with c0, and I think it has f9 and 88 somewhere inside.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: lucida grande;"&gt;Let's ask Microsoft search what it is!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: lucida grande;"&gt;require('open-uri') &amp;amp;&amp;amp; puts(open('http://search.msn.com/results.aspx?q=09+f9+88+c0') {|f| f.read}.downcase.gsub(/&amp;lt;[^&amp;gt;]+&amp;gt;/,'').tr('^0-9a-f','').scan(/09.+?c0/).inject(Hash.new(0)) {|h,nu| h[nu]+=1;h}.sort_by {|str,freq| freq}.last.first)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: lucida grande;"&gt;Whew!&lt;br /&gt;&lt;br /&gt;For non-Rubyists:&lt;/span&gt;&lt;span style="font-family: lucida grande;"&gt; require we can open urls like files, and put this string:  open the msn search page for all pages that have 09, f9, 88, and c0, read it in one gulp into lowercase, regexp out all html tags, translate out any character that's not a hex digit, scan for all substrings that start with 09 and end with c0, make a histogram* of the array, sort by most popular, and take the most popular string.&lt;br /&gt;(* Inject a hash table through the array of scanned substrings; the strings are the keys, the frequencies are the 
values; add one to the value every time you see any string, starting at zero.)&lt;br /&gt;&lt;br /&gt;Remember folks, this is Microsoft's suggested answer to the &lt;a href="http://weblog.raganwald.com/2007/05/128-bit-programming-challenge.html"&gt;128-bit programming challenge&lt;/a&gt;&lt;/span&gt;&lt;a href="http://weblog.raganwald.com/2007/05/128-bit-programming-challenge.html"&gt;&lt;span style="font-family: lucida grande;"&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: lucida grande;"&gt; posed earlier today, so like love, and Cambridge weather, it's just temporary.&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-5789871590573052195?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/5789871590573052195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=5789871590573052195' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5789871590573052195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5789871590573052195'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/05/microsoft-knows-drm.html' title='microsoft knows drm.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-2807520186048915414</id><published>2007-04-26T11:42:00.000-04:00</published><updated>2007-04-26T11:47:30.500-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='urls'/><category scheme='http://www.blogger.com/atom/ns#' term='fark'/><category scheme='http://www.blogger.com/atom/ns#' term='tinyurl'/><category scheme='http://www.blogger.com/atom/ns#' term='copyrights'/><title type='text'>fark is grabbing copyrights?</title><content type='html'>&lt;span style="font-family: lucida grande;"&gt;Some would think so.  But if you read really closely, you're not submitting anything meaningful, you're submitting a web link.  And if they want the copyright of that web link, then, ok, TinyURL it.  They'll have the copyright to a TinyURL uid.  And the copyright to any images on the page stay where they were.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: lucida grande;"&gt;Pretty simple.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-2807520186048915414?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/2807520186048915414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=2807520186048915414' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/2807520186048915414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/2807520186048915414'/><link rel='alternate' type='text/html' 
href='http://slightlynew.blogspot.com/2007/04/fark-is-grabbing-copyrights.html' title='fark is grabbing copyrights?'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-1320965996795289445</id><published>2007-04-08T19:28:00.000-04:00</published><updated>2007-04-08T19:54:46.164-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='faux currying'/><category scheme='http://www.blogger.com/atom/ns#' term='hash tableaux'/><category scheme='http://www.blogger.com/atom/ns#' term='katastrophe'/><category scheme='http://www.blogger.com/atom/ns#' term='irb'/><title type='text'>kata 6.</title><content type='html'>prag dave's &lt;a href="http://codekata.pragprog.com/2007/01/kata_six_anagra.html"&gt;anagrams&lt;/a&gt; resonated with me, because i'm working on hashing text down.&lt;br /&gt;&lt;br /&gt;so follow along in irb, if you have /usr/share/dict/words:&lt;br /&gt;&lt;br /&gt;class Symbol&lt;br /&gt;  def to_proc(*args) lambda {|*a| a.first.send self, *(args+a[1..-1])} end&lt;br /&gt;  alias [] to_proc&lt;br /&gt;end  # for faux currying&lt;br /&gt;&lt;br /&gt;w = File.readlines('/usr/share/dict/words').map {|w| w.strip.downcase}.uniq ;:done&lt;br /&gt;h = Hash.new([])&lt;br /&gt;w.each {|word| nu = word.split('').sort.join; h[nu]+=[word]} ;:done&lt;br /&gt;anas = h.values.find_all {|v| v.size &gt; 1} ;:done&lt;br /&gt;puts anas.map(&amp;amp;:join[',']) # all anagram n-tuples&lt;br /&gt;puts "---"&lt;br /&gt;puts anas.sort_by(&amp;amp;:size)[-30..-1].map(&amp;amp;:join[',']) # the top by set size&lt;br /&gt;puts "---"&lt;br /&gt;puts anas.sort_by {|a| a.first.size}[-30..-1].map(&amp;amp;:join[',']) # 
the top by word size&lt;br /&gt;&lt;br /&gt;in fairness, the symbol-currying is in "sym2proc.r", so it's really just 10 lines of code, but that's the general idea.&lt;br /&gt;(lots of the library functions look haskelly, but ruby just felt better for string processing)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-1320965996795289445?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/1320965996795289445/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=1320965996795289445' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/1320965996795289445'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/1320965996795289445'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/04/kata-6.html' title='kata 6.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-3456408758717909787</id><published>2007-03-08T18:41:00.000-05:00</published><updated>2007-03-09T05:25:05.307-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='de usu fructuque lingarum'/><category scheme='http://www.blogger.com/atom/ns#' term='sussman'/><category scheme='http://www.blogger.com/atom/ns#' term='json'/><category scheme='http://www.blogger.com/atom/ns#' 
term='eval'/><title type='text'>eval is amazing.</title><content type='html'>&lt;span style="font-family:lucida grande;"&gt;As ideas go, eval is much bigger than JSON callbacks, even bigger than Lisp itself.  It's big like the ribosome --- eval is how things come alive.&lt;br /&gt;&lt;br /&gt;For example, I'm rendering URLs and associated metadata in the browser from a Ruby CGI script.  Instead of some big XML specification, I'm just passing an array of strings from server to client.  It's &lt;span style="font-style: italic;"&gt;big_array&lt;/span&gt;.inspect in Ruby, and eval(&lt;span style="font-style: italic;"&gt;inspectedBigArray&lt;/span&gt;) in JavaScript. No monadic parsers, no macro magic; inspect and eval, code I don't even have to write myself.  (And if the inspected big array is too big, the browser can start lysis; so it goes.)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:lucida grande;"&gt;&lt;br /&gt;G J Sussman: Programming is a good medium for expressing poorly-understood and sloppily-formulated ideas: exactly the opposite of people who'd want to plague me with type theory.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-3456408758717909787?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/3456408758717909787/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=3456408758717909787' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/3456408758717909787'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/3456408758717909787'/><link rel='alternate' type='text/html' 
href='http://slightlynew.blogspot.com/2007/03/eval-is-amazing.html' title='eval is amazing.'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5098591171030974653.post-5630717285948719973</id><published>2007-02-23T09:16:00.001-05:00</published><updated>2011-06-02T17:46:14.980-04:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='poesie'/><category scheme='http://www.blogger.com/atom/ns#' term='nyc'/><category scheme='http://www.blogger.com/atom/ns#' term='de origine nominis'/><title type='text'>origins</title><content type='html'>&lt;span style="font-family: lucida grande;"&gt;Some people call it disclaimers, some call it biscuiting, some youthful naïveté.  When I was 16, my &lt;a href="http://www.janestreet.org/"&gt;poetry teacher&lt;/a&gt; told us that the only unwritten preface we were allowed to give to any piece was "It's slightly new, but it smacks of genius."&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5098591171030974653-5630717285948719973?l=slightlynew.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://slightlynew.blogspot.com/feeds/5630717285948719973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5098591171030974653&amp;postID=5630717285948719973' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5630717285948719973'/><link rel='self' 
type='application/atom+xml' href='http://www.blogger.com/feeds/5098591171030974653/posts/default/5630717285948719973'/><link rel='alternate' type='text/html' href='http://slightlynew.blogspot.com/2007/02/origins.html' title='origins'/><author><name>lee</name><uri>http://www.blogger.com/profile/16737006640455843661</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
