Cameron Laird's personal notes on "*-URL!" mechanics

One reader called this "a worthwhile handbook for possibly-soon-to-be-news-editors".

You're probably reading this page because Cameron Laird invited you to help with the weekly digests. Why did he do that? Most likely is that your Usenet postings convinced him you write accurately and knowledgeably about the subject at hand, you're helpful to the community at large, and you can understand what others to say. You're a good leader. Drafting "*-URL!" digests is a different way to volunteer those virtues. Perhaps later we'll make more explicit "what's in it for me?" (or at least why we previous editors have enjoyed our stints).

How do the editors of the weekly "Python-URL!" and "Tcl-URL!" digests of language news go about their work? Here's what Leam Hall wrote on the subject in 2000 (a different redaction of which is available through the Tcl-ers Wiki):

How to be a Tcl-URL! Editor.

The Tcl-URL! is a weekly update on Tcl, its parts, and its community. It is part update, part synopsis, and part evangelism. The target audience may or may not know a lot of Tcl, nor may they have yet committed a lot of time investigating it. Probably the most important part of this process is the human factor; picking a few significant highlights, and introducing them in a single sentence that draws the reader. If you can pique the interest of the person, you have done a great job.

The post goes out early Monday afternoon. Cameron Laird does the final editing and verification Monday morning, your draft to him needs to be in his mailbox sometime early Monday morning. (Early for Cameron is 5 am CST--though I think he would let you be a bit later than that.) Because post volume generally goes down on the weekend, you may be able to do this Sunday afternoon or evening.

The first step is to pick good posts. Read the posts from the past week. A good candidate post provides two answers. First, it answers some question that is not answered in the basic documentation. For example, a good post might be: "Strosberg hits on why you don't want to leave 'puts' calls in your Tcl code". A less than ideal example would be an individual having install issues on a known good platform. A good example would be instructions on porting Tcl to a totally new platform. Posts can come from the comp.lang.tcl newsgroup, or anywhere else you find good material.

The next step is to find the post in the www.deja.com archives. Spend a few minutes with http://phaseit.net/claird/comp.infosystems.search/dejanews.html for a good understanding of how to do this. Once you understand it, build the draft e-mail. A good target number is 6-8 significant posts. Put yourself in the place of the potential reader. If you only got one e-mail about Tcl each week, what would you want them to know about our community?

Third, send yourself the e-mail, and verify that the links work. Few things are as embarrassing as sending a broken link.

Fourth, write a one line introduction to the thread that encapsulates its intent. This is a key issue. In your phrase you want to be brief, yet give enough information so the reader can wisely judge whether or not he wants to pursue that thread.

Fifth, and last, send it to Cameron. He will give it the final polish and distribute it. And he'll probably also send you a big thanks as well.

Notice that second step. It calls for a function which maps from the space of interesting Usenet postings to URLs which can be shared with others. Realization of that function is indeed one of the purposes of "Cameron Laird's personal notes on DejaNews".

In spring 2001, Andrew Kuchling described his work in summarizing the ???? mailing list this way:

So, to rescuing the summaries... For the information of potential volunteers, I'll summarize the process. Writing them isn't difficult and can easily be done while watching TV -- I usually pull the mail archive on my laptop and do just that. They're just time-consuming, taking up two evenings at a time.
  1. Grab a copy of the mailbox archive for the month.
  2. Load it up into mutt (or other MUA of choice), and delete messages outside of the two-week period being summarized.
  3. Sort the remaining messages by thread, and go through finding the interesting threads. What makes a thread interesting?
    • Threads discussing how to fix a particular bug, or tracing down a bug's root cause, can be quite lengthy but usually aren't interesting.
    • Minute discussions of language syntax are rarely interesting, and I think even Guido tunes them out after a while.
    • New proposals that spark off a discussion are interesting.
    • Discussions surrounding a PEP are interesting.
  4. <time consuming but fun step> Read through each interesting thread, pulling out interesting quotes and summarizing the flow of the argument. This is fun because you can introduce your own biases; I think this is why people become journalists.
  5. <time consuming and tedious step> For each quote, find the URL for that message in the archives on mail.python.org, and link to it. This is painful; recently I've started spidering the archives with wget and grepping them in order to avoid ferreting through the indexes over my modem connection.

    Thinking about it now, it would have been much easier on me to just do what Linux Weekly News does, and make a local copy of each message. That means you wouldn't have to worry about links breaking (moving Mailman to mail.python.org required regenerating the archives, and the links in the October & November summaries are now all broken), and the author wouldn't have to hunt for the right message URL. You'd lose the easy ability to follow the thread, of course.

  6. The summary is written as plain ASCII, and it gets posted, and mailed to LWN and LinuxToday. The HTML version is then just a matter of escaping <&>, turning the URLs into <A> tags with an Emacs macro, and putting <pre> tags around it; no fancy XML DTD...

Kragen Sitaker describes his "procedure to write Python-URL! in about 16-20 hours per week" in the tips that follow, which I reproduce with his permission:
    # this first loop takes about 3/4 of the time
    for message in messages('comp.lang.python'):  # about 700 per week
        if interesting(message):                  # about 100-150 per week
            save_to_file('News/purl-%d' % serialnum, message)
    purlfile = new_python_url_file_called('summary')
    for message in filemessages('News/purl-%d' % serialnum):
        if really_interesting(message):           # about 50 per week
            summary = message.summarize()
            purlfile.append(summary)
            purlfile.append(message.message_id)
    for update in software_updates(freshmeat) + random_other_web_stuff():
        if interesting(update):
            summary = update.summarize()
            purlfile.append(summary)
            purlfile.append(update.url)
    purlfile.organize_into_categories()
    for category in purlfile.categories():
        category.sort_summaries_by_interestingness()
    for category in purlfile.categories():
        category.remove_least_interesting_summaries()  # try to cut down to 25-30
    purlfile.proofread_and_correct()
    purlfile.convert_message_ids_to_links()
    check_links(purlfile.converted_to_html())
    purlfile.mail_to_cameron()
    
See http://freshmeat.net/browse/178/?filter=&orderby=date_updated_unix_DESC&topic_id=178 for new Python software. My own estimate is that a weekly editor should spend under two (!) hours, rather than the twenty Kragen mentions.

See http://pythonowns.blogspot.com/ for a Python-oriented blog.

Mygale has good, if occasional, news; it is posted here: http://www.awaretek.com/output.html.

query-replace-regexp \([^ ]*\)@\([^ ]*\)
message-IDs with 
http://groups.google.com/groups?selm=\1%40\2
to make them into links.  That is:
(defun link-message-ids ()
  "Turn Usenet message-IDs into URLs on groups.google.com."
  (interactive)
  (goto-char (point-min))
  (query-replace-regexp "\\([^ ]*\\)@\\([^ ]*\\)" 
			"http://groups.google.com/groups?selm=\\1%40\\2"))
agrify is handy for making URLs into links so you can check them.

For putting links to articles into the summary buffer, I have this function, which is really a keyboard macro transliterated into elisp:

(defun copy-message-id-to-other-buffer ()
  "Find a Message-ID and put it in the other buffer."
  (interactive)
  (save-excursion
    (search-backward "\nMessage-ID: ")
    (search-forward "<")
    (let ((start (point)))
      (search-forward ">") 
      (backward-char)
      (copy-region-as-kill start (point)))
    (switch-to-other-buffer 1)
    (insert "\n        ")
    (yank)
    (insert "\n")
    (message "Copied msgid to %s" (buffer-name (current-buffer)))
    (switch-to-other-buffer 1)))
(global-set-key [f5] 'copy-message-id-to-other-buffer)
I use the following skeleton (with outline-mode) to organize the content as I write it. It's easy to move a paragraph into one of these categories and then hide it with C-c C-d. When I'm done, I delete the asterisks and the 'Unsorted' category. Here's 'agrify', so-called because it converts Phil Agre output (or, generally, arbitrary text) to HTML:
#!/usr/bin/perl -w
use strict;

# Philip Agre likes to write documents in ASCII.  This program does
# some heuristics to try to convert his ASCII documents to HTML.
# It's mostly a collection of crude kludges, and it's possible for it to
# produce stuff that's not valid HTML.  (Normally it produces valid HTML 
# 4.0 Transitional.)

my $between_paras = 1;
my $in_ul = 0;
my $in_ol = 0;
my $in_bq = 0;  # blockquote
my $did_title = 0;
my $was_short_line = 0;
my $length = undef;
sub end_ul {
    if ($in_ul) {
        print "</ul>\n";
        $in_ul = 0;
    }
}

sub end_ol {
    if ($in_ol) {
        print "</ol>\n";
        $in_ol = 0;
    }
}

while (<>) {
    if (/^$/) {
        $between_paras = 1;
    } elsif (/^\s*\*\*\s*$/) { # Phil's new '**' section divider
        end_ul; end_ol; print "<hr>\n"; $between_paras = 1;
    } else {
        # first nonblank line is the title; not perfect, but right sometimes
        if (not $did_title) {
            print '<!DOCTYPE HTML PUBLIC ',
                '"-//W3C//DTD HTML 4.0 Transitional//EN" ',
                '"http://www.w3.org/TR/REC-html40/loose.dtd">', "\n",
                "<html><head><title>$_</title></head>\n";
            print '<body bgcolor="white"><h1>', $_, "</h1>";
            $did_title = 1;
        } else {
	    # get length before munging
	    my $last_line_length = $length;
            $length = length();
            if ($between_paras) {
                # now we start a paragraph
		if (s/^\s*\*\s*//) {
                    end_ol;
                    if (not $in_ul) {
                        print "<ul>\n";
                        $in_ul = 1;
                    }
                    print "<li> ";
		    $between_paras = 0;
                } elsif (s/^\s*\((\d+)\)\s*//) {
                    end_ul;
                    if (not $in_ol) {
                        print "<ol>\n";
                        $in_ol = 1;
                    }
                    print "<li> ";
		    $between_paras = 0;
		} elsif (s|^\s*//||) {  # Phil uses // to start section titles
		    end_ol;
		    end_ul;
		    print "</blockquote>" if $in_bq;
		    $in_bq = 0;
		    $_ = "<h2>$_</h2>\n";
		    $between_paras = 1;  # the title is one line long
                } else {
                    end_ol;
                    end_ul;
		    # If Phil used multiple levels of indents, or if he
		    # indented the first lines of unindented paragraphs,
		    # this would work badly.  As it is, it sometimes 
		    # identifies things as blockquotes that are just indented
		    # paragraphs, but that's OK --- browsers render 
		    # blockquotes as indented paragraphs.
		    if (/^\s{2,}/) {
			print "<blockquote>" if not $in_bq;
			$in_bq = 1;
		    } else {
			print "</blockquote>" if $in_bq;
			$in_bq = 0;
		    }
                    print "<p>\n";
                    $between_paras = 0;
                }
            } else {
                # inside a paragraph, if the previous line
                # was "short", print a line break
                if ($was_short_line) {
                    print qq(<br len="$last_line_length">);  # invalid HTML
                    # print '<br>';
                }
            }
            # 57 seems to work OK for Phil's text; there really
            # are 57-char-wide lines in Phil's text sometimes.
            # this includes the \n.  Increasing this to 59 reduces
	    # the number of unintentional joins, but adds some accidental
	    # splits.  Perhaps we should look at text a paragraph at a time
	    # rather than a line at a time?
            $was_short_line = ($length < 57);
            s/\*(.*?)\*/<i>$1<\/i>/g;  # *this* is italics
            s/^>(>*From)/$1/;          # mailers mangle things
            # URLs become links; we assume they don't end with 
            # . or ,  --- although it's legal for them to do so, it
            # is unusual, but it is common for unnecessary , and .
            # characters to be appended.
            s|(http://[^ \n>")]*[^ \n.,>")])|<a href="$1">$1</a>|g;
            print;
        }
    }
}

end_ul;
end_ol;
print "</body></html>\n";
    
One question those on the Python side ask: is the daily Python-URL digest separate from "Python-URL!"? Yes, there's no direct connection; we're just friends.

[New in 2005: explain Sheila's leadership with del.icio.us ...]


Cameron Laird's personal notes on "*-URL!" mechanics/claird@phaseit.net