Programming

Git patch workflow

Probably most people who use git know about patch management and whatnot, so I’m writing this largely for myself as I keep forgetting (mostly because I don’t have to do it very often).

Some code that I’m working on has a development branch (we’ll call it “feat/X”) that I work on while other fixes are being made to the master branch. I’m the only one working on feat/X, but others add fixes or necessary changes to master, so I need to apply every change made to master to feat/X as well. Some changes will apply cleanly; others need to be modified because one part of the code is receiving a major overhaul.

There is probably a better way to do this and if so, please do let me know in the comments. Even though I use git almost daily, I am by no means a power user and would love more “insider tips”.

So I have my working copy, and I need it to know about both branches. By default, if you clone the repository you’ll just be on the master branch. So in this case I would have to git checkout feat/X. Once I do that, I will see:

% git branch
* feat/X
  master

So do a git checkout master to switch back to the master branch. From here we can make a series of patches that represent code changes made to master after feat/X was branched:

% git format-patch feat/X

The first time you do this, all the patches will be new. But as you move along, you’ll continue to see these patches show up so it’s useful to keep them kicking around so you know which are new. You’ll see patches in the format “0001-The-commit-text.patch” and “0002-Other-commit-text.patch”.
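One way to tell which patches are new is to compare the fresh filenames against the ones already applied, ignoring the numeric prefix. A minimal Python sketch, assuming the old patches were kept around; the function names and sample filenames are made up for illustration, and two commits with identical subject lines would be treated as the same patch:

```python
import re

def patch_subject(filename):
    """Strip the leading sequence number (e.g. '0001-') from a format-patch filename."""
    return re.sub(r"^\d+-", "", filename)

def new_patches(already_applied, current):
    """Return the filenames in `current` whose subject is not in `already_applied`."""
    seen = {patch_subject(name) for name in already_applied}
    return [name for name in current if patch_subject(name) not in seen]

# Hypothetical filenames: one patch applied earlier, two produced by the latest run.
applied = ["0001-The-commit-text.patch"]
current = ["0001-The-commit-text.patch", "0002-Other-commit-text.patch"]
print(new_patches(applied, current))
```

Matching on the subject part rather than the full filename keeps the comparison working even if the numbering ever changes between runs.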

I move the patches out of the way (I have horrid file management, so I just mv *patch ~/). Then git checkout feat/X to switch to the feat/X branch. Now we apply the patches one-by-one:

% git am <~/0001-The-commit-text.patch

If the patch is successful, you can do the same to the next. If not, however, you need to fiddle with things a bit. You'll see an error like this:

% git am <~/0001-The-commit-text.patch
Applying: The commit text
error: patch failed: something/else.py:559
error: something/else.py: patch does not apply
Patch failed at 0001 The commit text
The copy of the patch that failed is found in:
   /home/vdanen/my-git-repo/.git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

You can run git am --abort, which reverts the attempt and restores the branch so that you can apply the patch manually and commit it yourself. When you feed git one patch at a time, git am --skip has the same effect. If you handed git several patches at once (something like git am ~/*.patch), the two would differ: --abort stops the whole session, while --skip drops only the failing patch and moves on to the next.

You can also check the log output with git log master..HEAD --stat which will show that our branch contains the update, along with the appropriate author information (might be you, might be me, might be someone else). I tend to skip the log bit and just apply one patch at a time. Rinse and repeat.

Finally, if you don't move your patches like I did, you can use git clean to remove the patches from your working directory. Use '-n' first though, so it runs in dry-run mode.

Like I indicated earlier, there are probably simpler ways to do this, but this works for me. I got all of this from Ry's Git Tutorial on Patch Workflows. Excellent tutorial.

Review of O’Reilly’s Learning Python, 5th Edition

I’ve been programming in Python for a few years now (I can pretty much mark the beginning because of starting at Red Hat). Since picking it up, I’ve fallen in love with the language and have a few books on the subject. One of the most indispensable books I have is Learning Python 3rd Edition published by O’Reilly. Recently I received a copy of the 5th Edition for review.

My first reaction was that it was no mere book, but a tome. The 3rd Edition was no slouch, weighing in at 700 pages, but the 5th Edition is a hefty 1540 pages, over twice the size! This edition was released in June 2013 (ISBN 978-1-449-35573-9); the previous edition that I owned (the 3rd) was published in October 2007. This new edition was updated to cover Python versions 3.3 and 2.7 (the 4th Edition covered 2.6 and 3.0/3.1, so I imagine it too had a hefty size increase over the 3rd Edition).

Before I go further with the review I have to note that Learning Python 3rd Edition is nearly always on my desk, and I reference it all the time. Despite the number of alternatives out there (searching on the internet or looking at other books that I have, such as Programming Python, the Python Cookbook, and Python in a Nutshell), this is the one book I keep coming back to. It is worn, crinkled, dirty, and probably sticky in a few spots as well. However, I’m not one to read books like this from front to back; I use them as reference material for areas that interest me or that I need help with (or need to brush up on).

The first section of the book, “Getting Started”, goes into the basics of Python: what it is, what it’s used for, how to use it, why you would use it, and so forth. The 5th Edition expands on this, particularly with regard to version 3.3 and its new options in Windows. This is all the really basic stuff, explained quite well and great for those interested in getting into Python without a lot of knowledge of the language. Those more experienced with Python will likely skip this section for the most part, but there are some good bits in here.

The second section of the book, “Types and Operations”, gets to the meat of writing code. This is the section that talks about Python object types (lists, dictionaries, tuples), how Python handles numbers, dynamic typing, manipulating and using strings, and handling file operations. One thing I noticed immediately is that the section dealing with numbers is greatly expanded from the 3rd Edition and this is largely due to the changes of how these are handled in Python 2.x vs 3.x. It is in this chapter that you begin to see why the book is so hefty — instead of focusing on just one major version of the language, it provides the necessary information for both 2.x and 3.x and the differences between the two. Because Python 2.x is still so widely used, it would have been impossible to ignore it unless they decided to write two books, one for each major version. The section on handling strings has likewise been expanded, enhanced, and re-organized with quite a bit of extra content. Again, quite a bit of this is due to the coverage of both Python 2.x and 3.x.

The third section of the book, “Statements and Syntax”, gets into the fundamentals of handling your code: typical statements (if/elif/else, variable assignments, loop handling, creating functions, namespaces, module handling, and exception handling). Here again a lot of content is devoted to the differences between 2.x and 3.x, and even between different versions of 2.x. There is a lot of content here — chapters devoted to topics about looping, if statements, iteration and comprehension, the 2.x print statement versus the 3.x print function, and even how to generate documentation for your code using PyDoc. For those learning Python, and even for those who choose to use this book as a reference, this section will be greatly used.

Perhaps one of the most significantly changed sections between the two editions of Learning Python is the one on functions: what was “Functions” in the 3rd Edition is now “Functions and Generators” in the 5th (again, I’ve never looked at the 4th Edition, so I don’t know how much changed between the 4th and 5th). The overhaul is significant: where “Scopes and Arguments” was a single 33-page chapter, there are now two chapters, one devoted to each topic, spanning 67 pages. The section also goes into great detail about generator functions and expressions (something I have yet to fully explore). Suffice it to say, there is a good 45 pages of new content here that will prove interesting to read (list comprehensions, generators, etc.).

The fifth section, “Modules and Packages”, likewise contains greatly expanded content covering the use of modules, how to create them, why (and how) you should use them, the differences between importing and reloading modules, full coverage of Python 3.3 namespace packages, problems you can encounter with module use, and so forth. It talks about the features common to both major versions of Python, and has sections that are more specific, such as how byte code is handled (.pyc in versions prior to 3.2 and the __pycache__ directory in 3.2+) and the namespace packages that were introduced in 3.3 (the book also shows the differences between regular packages and namespace packages). This once again shows the level of detail the book provides for users of any version of Python.

Section six is all about Object-Oriented Programming and classes, and gives great detail and examples on topics such as polymorphism, classes and subclasses, object handling, operator overloading, and more.

Section seven goes into the hows and whys of exceptions and how to use them to write good error handling in your code. This is the last section that appears in both editions and, as with every other section, it is greatly expanded.

The new section in the 5th Edition, compared to the 3rd, is the “Advanced Topics” section and its five chapters. This section goes into Unicode and byte strings (an area that has seen a lot of changes between Python 2.x and 3.x), as well as managed attributes, decorators for functions and classes, metaclasses, and the differences between them.

Obviously I’ve not yet read the entire book and chances are I never will. Most of these programming books I never do read end-to-end, but I do use them as functional references and to learn new things from (pretty much the entire “Advanced Topics” section is new to me), as well as to brush up on older things. From the parts I’ve read, and the comparisons I’ve made to the 3rd edition, this is most definitely a worthwhile “upgrade”. One thing I appreciate as well is the consistent back-and-forth regarding Python 2.x versus 3.x. As a Python coder who has never yet touched Python 3.x, this book will help me understand the subtle, and sometimes not-so-subtle, nuances between both major versions — when the time comes (aka: when the time is available). As such, this book is valuable to me now (writing code in Python 2.x) and will be valuable to me in the future.

I certainly appreciate the expanded content, as it’s already twigged a few things in my mind about the code I’ve written so far, and has given me some new ideas on how to handle certain issues or improve performance/handling of my code. This is both good and dangerous! Having said that, this is most definitely a book worth getting for any Python programmer who either is new to the language or is a veteran of the language — the first will appreciate the no-nonsense easy-to-understand approach to introducing both the basics and some quite advanced topics, while the latter will appreciate it as a reference book or to expand on their understanding of certain topics. For myself, I’m somewhere in the middle of the two and this book is introducing me to a lot of stuff I do not yet know, and is a very handy reference for the stuff that I do know (and need reminders of every once in a while).

I’ve highly recommended the 3rd edition of this book to anyone asking about good books on Python, and the 5th Edition is no different. I highly recommend this book to anyone who wants more than just a passing understanding of Python.

And as a replacement for my well-worn and well-loved 3rd Edition, I know this book will see much use from me.

Of bugzilla, python, fetchmail, and procmail

I’ve been working with bugzilla for years. At Mandriva I was the primary bugzilla caretaker for a few years, and now at Red Hat I do a lot of work on some internal tools that interface with bugzilla and that enhance and direct the workflow of day-to-day work (we work with bugzilla all the time). I also run my own bugzilla instance (a) so I can keep up to date with the goings-on of bugzilla and (b) so I can track various things in some of the scripts I write and other stuff. I also use it as a way of logging issues that may come up with the various bits of web hosting that I do.

So what I needed was a way to take incoming emails generated by lfd and file them as bugs. My primary concern right now is high load average warnings; I wanted to log them to bugzilla so that if I’m unavailable my dad could also see them (he helps maintain the server and hosts a bunch of his stuff there as well) and perhaps deal with them, or at the very least comment on them via bugzilla, and so that I can also make note of resolutions, causes, etc. Yes, I’m turning bugzilla into a poor man’s “RT” ticketing system (I have no interest in setting something like that up, I’m already using bugzilla, and this is the best place for me to stuff these sorts of notes). I’ve tried the email_in.pl method and while it works, it only works if the email has a specific format so that it can be assigned to the right component and product, which is not something that will work with these lfd-generated emails.

Being that a lot of the work that I’ve been doing has to do with using python and xmlrpc to manipulate bugs in the Red Hat bugzilla, it seemed like a reasonable approach to take to deal with my own bug mangling. The problem is that these emails were being sent to root, which in turn forwards directly to me. I also wanted to keep a copy of those mails in my own mailbox in case bugzilla, or anything in between, did something funky, so I opted to do a few things:

  • use a gmail filter to forward those emails to another email account specifically for bugzilla mails
  • setup fetchmail to pull (via pop3) those emails
  • setup procmail to filter those emails and send them to a helper script
  • write a helper script that will then call the python-bugzilla tool to file the bug

The first step was easy. Fetchmail was pretty easy too (although it’s been a few years). Procmail was easy, particularly now since I’m only concerned with one particular type of email and my gmail filter is quite specific. The helper script was initially a shell script that was going to call the bugzilla script, but I quickly found limitations to that, particularly since lfd’s email also has a few attachments and I was having issues with getting it to file the bugs properly. So instead of using uudeview and a shell script, I opted to write something in python.

Instead of procmail feeding uudeview and then feeding my script, I made use of some of the features of python that allow for manipulating email messages (something I’ve never done before). I also found that passing stdin to the python script was somehow also passing stdin to python-bugzilla when I was calling it, which was causing all kinds of grief.

So with this script I learned all kinds of new things: how to manipulate an email message in python and reduce an email with attachments to a message body with individual attachments as objects and how to use subprocess (yes, I’m still using the commands stuff by and large, but that was really problematic with stdin being persistent).
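A child process inherits the parent’s stdin unless told otherwise, which is one plausible explanation for the grief described above. A minimal sketch of cutting that link with subprocess, written for Python 3.5+ rather than the Python 2 used in the script below; the function name is made up, and the child command is only a stand-in for the real bugzilla call:

```python
import subprocess
import sys

def run_detached_from_stdin(cmd):
    """Run cmd with an empty stdin and return its stdout as text."""
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # child sees immediate EOF instead of our stdin
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout

# Stand-in for the bugzilla command: a child that reports how much stdin it received.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
print(run_detached_from_stdin(child).strip())
```

Passing an explicit stdin (DEVNULL here, or an open file when the child is meant to read an attachment) makes it obvious at the call site what the child will see.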

All in all it works quite well. However, I do still have one problem that I’ve not yet licked, and that is with binary files. I’m not sure where the issue is coming from, but when I call python-bugzilla myself, in a shell, and feed it the file to attach to the bug, it works fine. When I call it from my script (so no shell), it uses the --file argument as the name of the file and wants stdin as the contents of the file. This is all fine and dandy, but somewhere along the line something is rendering that binary file (I was testing with a jpeg) into text, and when it’s attached to the bug it has the right mime type, but the contents are wrong and no image is displayed. So, dear lazy web, if there are any python folks out there who want to look at my script and tell me what I’m doing wrong, I’d be much obliged… =)

Anyways, since no post like this is really any good without the files involved, what follows is the script (process-mail) and the .procmailrc file (which is pretty bare bones and doesn’t filter much of anything):

# .procmailrc
#

HOME="/home/mailer"
SHELL=/bin/sh
VERBOSE=off
LOGFILE=$HOME/.procmail/procmail.log
# inserts a blank line between log entries
LOG="
"

:0
*^content-Type:
{
    :0fw
    | /usr/bin/python $HOME/bin/process-mail
}

All this does is call the process-mail script. It will/may eventually filter on subject and sender if I find that unwanted emails are triggering new bugs. For the moment I don’t particularly care.

And the process-mail python script:

#!/usr/bin/env python

import commands
import email
import os
import subprocess
import sys
import tempfile

# email comes from stdin due to procmail
raw_msg    = sys.stdin.readlines()

log        = open(os.environ['HOME'] + '/tmp/bugzilla-email.log', 'a')
bz_prog    = '/usr/bin/bugzilla'
bz_dest    = '--bugzilla=https://bugzilla.annvix.com/xmlrpc.cgi'
directory  = tempfile.mkdtemp()

f = tempfile.NamedTemporaryFile(delete=False)
f.write(''.join(raw_msg))
f.close()

msg = email.message_from_file(open(f.name))

attachments = {}

for part in msg.walk():
    a_payload = part.get_payload()
    a_name    = part.get_filename()
    a_type    = part.get_content_type()
    if a_name is None and a_type == 'text/plain':
        email_body = part.get_payload()
    elif a_name is not None:
        tf_name = '%s/%s' % (directory, a_name)
        tf_file = open(tf_name, 'wb')
        tf_file.write(a_payload)
        tf_file.close()
        attachments[a_name] = a_type

os.unlink(f.name)

email_to   = msg['to']
email_from = msg['from']
email_sub  = msg['subject']

log.write('Email received from %s to %s with subject "%s"\n' % (email_from, email_to, email_sub))

cmd = [bz_prog, bz_dest, 'new', '-i', '-p', 'Web Hosting', '-v', 'none', '-c', 'Availability', '-s', email_sub, '-l', email_body]
bug = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0]
if bug != '':
    log.write('Filed bug %s\n' % bug)

if len(attachments) > 0:
    for x in attachments:
        attachment = os.path.join(directory, x)
        log.write('Found attachment: %s\n' % attachment)
        cmd = [bz_prog, bz_dest, 'attach', '--file=%s' % x, '--type=%s' % attachments[x], '--desc=mail attachment: %s' % x, bug.strip('\n')]
        if 'text' in attachments[x]:
            a_file = open(attachment, 'r')
        else:
            # this does not work.. attaching a jpg results in mangled text and I'm not sure why...
            a_file = open(attachment, 'rb')
        foo = subprocess.Popen(cmd, stdin=a_file, stdout=subprocess.PIPE).communicate()[0]
        a_file.close()

        if foo != '':
            log.write(foo + '\n')
        else:
            log.write('Failed to attach bug to bugzilla!\n')

log.close()

sys.exit(0)

I’m not sure why this isn’t working for binary attachments though… it’s probably something simple, but I’ve not had a chance to figure out what the problem is. Dear lazy web…. any advice? =)
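One possible culprit, offered as a guess rather than a confirmed diagnosis: part.get_payload() returns the payload still in its transfer encoding, so a base64-encoded jpeg would be written to disk as base64 text. Passing decode=True asks the email module to undo the Content-Transfer-Encoding first. A minimal, self-contained sketch, written for Python 3 (the script above is Python 2, where get_payload also accepts decode=True); the helper name extract_attachments and the sample message are made up for illustration:

```python
import email
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def extract_attachments(raw_bytes):
    """Return (body_text, {filename: (content_type, decoded_bytes)})."""
    msg = email.message_from_bytes(raw_bytes)
    body = None
    attachments = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        name = part.get_filename()
        ctype = part.get_content_type()
        # decode=True undoes base64/quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        if name is None and ctype == "text/plain" and body is None:
            charset = part.get_content_charset() or "utf-8"
            body = payload.decode(charset, errors="replace")
        elif name is not None:
            attachments[name] = (ctype, payload)
    return body, attachments

# Build a small sample message with a binary attachment (hypothetical content).
binary = bytes(range(256))
outer = MIMEMultipart()
outer["Subject"] = "High load average"
outer.attach(MIMEText("load is high\n"))
blob = MIMEApplication(binary, _subtype="octet-stream")
blob.add_header("Content-Disposition", "attachment", filename="graph.bin")
outer.attach(blob)

body, found = extract_attachments(outer.as_bytes())
print(found["graph.bin"][1] == binary)
```

If the decoded bytes are then written with open(name, 'wb') and handed to the attach command on stdin, the file should reach bugzilla byte for byte, assuming nothing further down the line re-encodes it.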

Converting subversion to git redux

I know I’ve written about this in the past (here and here), but I needed to do another conversion the other day that was similar, yet different. Previous posts talked about pulling parts of a subversion repository into a git repo — effectively taking one svn repo apart into multiple git repos. This time I just needed to do a straight conversion, however I needed to exclude one single directory from ever being a part of the history of the repo.

Since this was a fairly important repo to convert, I did a few trial runs first and ended up scripting it since there isn’t just a single command to do what I needed. Essentially, we are doing a git clone from a subversion repository (a standard one with trunk/, tags/, branches/ this time), but excluding one directory (we’ll call it private). I also wanted to convert the svn branches to tags since that’s effectively what they were. Also, since the git repository was not local, and for the sake of expediency I didn’t want to tar something up and email it, we’re taking our converted-and-cleaned-up new git repo, changing the upstream, and then pushing the whole thing to a remote bare repository.

Ready? (Note: a few lines are manually wrapped with ‘\’ below)


#!/bin/bash
# bash rather than sh: pushd is a bash builtin
WORKDIR="/srv/svn2git/git"
REMOTE="git+ssh://remote.git.host/myrepo.git"
mkdir -p ${WORKDIR}

pushd ${WORKDIR}
git svn clone https://remote.svn.host/repos/myrepo --no-metadata \
-A /srv/svn2git/authors-transform.txt --stdlayout \
--ignore-paths="^trunk/private" ${WORKDIR}/from-svn
cd from-svn
git init --bare ../bare.git
cd ../bare.git
git symbolic-ref HEAD refs/heads/trunk
cd ../from-svn
git remote add bare ../bare.git
git config remote.bare.push 'refs/remotes/*:refs/heads/*'
git push bare
cd ../bare.git
git branch -m trunk master
for x in branch_one branch_two branch_three; do
git tag "${x}" refs/heads/${x}
git branch -D ${x}
done

cd ..
git clone bare.git myrepo
cd myrepo
git remote rm origin
git remote add origin ${REMOTE}
git config remote.origin.push 'refs/remotes/*:refs/heads/*'
git config branch.master.remote origin
git config branch.master.merge refs/heads/master
git push --set-upstream origin master

And that is all there was to it. The svn authors file was created by using:


$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); \
print $2" = "$2" <"$2">"}' | sort -u > authors-transform.txt

in the existing copy of the subversion repository that I had (and then mangling it to suit my needs, particularly changing it to add the committers’ real names and email addresses as well).
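The same transformation can be sketched in Python for anyone who finds the awk one-liner hard to read. This is an illustrative equivalent only; the function name and the sample log text are made up, and it assumes the usual svn log -q layout of "r<rev> | <author> | <date>" lines separated by dashed rules:

```python
def authors_from_svn_log(log_text):
    """Build sorted, unique 'user = user <user>' lines from `svn log -q` output."""
    authors = set()
    for line in log_text.splitlines():
        if not line.startswith("r"):
            continue  # skip the dashed separator lines
        fields = line.split("|")
        if len(fields) < 2:
            continue  # not a revision line after all
        authors.add(fields[1].strip())
    return ["%s = %s <%s>" % (a, a, a) for a in sorted(authors)]

# Hypothetical sample of `svn log -q` output.
sample = """\
------------------------------------------------------------------------
r2 | vdanen | 2012-09-21 10:04:21 -0600 (Fri, 21 Sep 2012)
------------------------------------------------------------------------
r1 | alice | 2012-09-20 09:00:00 -0600 (Thu, 20 Sep 2012)
------------------------------------------------------------------------
"""
print("\n".join(authors_from_svn_log(sample)))
```

As with the awk version, the output still needs hand editing to fill in real names and email addresses.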

Git commit hook to bugzilla using git-notifier

I’m a big fan of the git-notifier script, which acts as a hook in git to send you nice emails about things that have changed in your git repos. I’m also a bugzilla user, so I wanted to be able to put git commit notifications, automatically, in bugzilla if “bug #X” is in the commit log. Initially I was intending to use gitzilla for this, but I didn’t feel like attempting to make it work with the latest release of pybugz (it says it is tested with version 0.8.0 but the current version is 0.10.1). In retrospect, that might have actually been easier. =)

Anyways, I decided to use git-notifier to send an email to bugzilla (while gitzilla uses XMLRPC, which would have been preferable, I also have incoming email support enabled in bugzilla). It took some trial and error, but I got it working (although I suspect there are easier ways to do it).

The first thing I had to do was patch git-notifier to accept a bug id, because bugzilla needs to know what bug to route the incoming email to. I’ll be sending this upstream to see if they want to include it, but I may also change it to pull more info from the git config so that the post-receive hook doesn’t have to be so obscene. The patch itself was very easy to write:

--- git-notifier.orig	2012-09-21 10:04:21.283442085 -0600
+++ git-notifier	2012-09-21 10:25:04.811307023 -0600
@@ -37,6 +37,7 @@
     ("link", True, None, "Link to insert into mail, %s will be replaced with revision"),
     ("updateonly", False, False, "update state file only, no mails"),
     ("users", True, None, "location of a user-to-email mapping file"),
+    ("bug_id", True, False, "bug ID (for sending email to bugzilla)"),
     ]
 
 class State:
@@ -250,6 +251,11 @@
 
     repo = Config.repouri
 
+    if Config.bug_id:
+        bzid = "@bug_id = %s\n\n" % Config.bug_id
+    else:
+        bzid = ""
+
     if not repo:
 
         if gitolite:
@@ -269,10 +275,10 @@
 X-Git-Repository: %s
 X-Mailer: %s %s
 
-%s
+%s%s
 
 """ % (Config.sender, Config.mailinglist, Config.emailprefix, subject, repo,
-       Name, Version, mailTag("Repository", repo)),
+       Name, Version, bzid, mailTag("Repository", repo)),
 
     return (out, fname)

This works, and works well, but the post-receive hook is messy. What used to just be:

#!/bin/sh
/srv/git/hooks/git-notifier $@ --link="http://[url];a=commitdiff;h=%s" \
  --emailprefix="[git: [repo]]"

Has now turned into this monstrosity:

#!/bin/sh
while read oldrev newrev refname
do
    commit=$(git rev-parse $newrev)
done

bzemail="bugzilla-daemon@bugzilla.me.com"

/srv/git/hooks/git-notifier $@ --link="http://[url];a=commitdiff;h=%s" \
  --emailprefix="[git: [repo]]"

for BUG in $(git log ${commit} -n 1 | sed 's/bug #/bug#/g' | \
  egrep -i -o 'bug#[0-9]*'); do
    BUGID=$(echo "${BUG}" | sed 's/bug#//i')
    EMAIL=$(git log ${commit} -n 1 --pretty=format:"%ae")
    test=$(echo "$BUGID" | sed 's/[0-9]*//g')
    if [ "${test}x" = "x" ]; then
    # make sure it is a digit
        /srv/git/hooks/git-notifier $@ --link="http://[url];a=commitdiff;h=%s" \
  --emailprefix="[git: [repo]]" --bug_id=${BUGID} --mailinglist=${bzemail} \
  --sender=${EMAIL} --manual=${commit}
    fi
done

(lines wrapped for readability)

So while it’s fugly, there’s quite a bit of magic to it. Seems that when you call git-notifier again, it won’t send an email because it knows it’s already been sent, which is why we need the commit hash, and feed it to it with the --manual option. The --mailinglist option is used to point to bugzilla (again, the git-notifier config is pointing to another email address to receive the commits already, so we need to override it). The --sender option takes the committer’s email address as the value (the $EMAIL variable), which also overrides the default git-notifier sender (which is the local user on the system unless you’re using gitolite (which I’m not)). The --bug_id is a digit to reference the bug in the commit (this should also send multiple mails if more than one bug is referenced in the commit, but I’ve not tested that yet). The end result is you get a copy of the git commit directly into bugzilla, in the same format that you would get it via email.
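The bug-number matching is the part of the hook most likely to trip over odd commit messages, so here is how the same idea could be sketched in Python. It is an illustration only (the hook above stays shell); the function name is made up, and unlike the sed/egrep pipeline it also accepts “Bug #12” with a capital B and a space, requires at least one digit, and drops repeated references:

```python
import re

# 'bug', optional whitespace, '#', then one or more digits; case-insensitive.
BUG_RE = re.compile(r"bug\s*#(\d+)", re.IGNORECASE)

def bug_ids(commit_message):
    """Return the distinct bug numbers mentioned as 'bug #N', in order of appearance."""
    seen = []
    for number in BUG_RE.findall(commit_message):
        if number not in seen:
            seen.append(number)
    return seen

# Hypothetical commit message referencing two bugs, one of them twice.
print(bug_ids("Fix crash on login (bug #12), see also Bug#34 and bug #12"))
```

Each returned number would then drive one git-notifier call with --bug_id, which mirrors the for loop in the hook.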

I may spend some time later on trying to make gitzilla play nicely with the newer version of pybugz, but for now this scratches my itch. Like I said, not the prettiest solution, but it works as a quick-n-dirty hack. The inspiration for using email to send this to bugzilla was from Gentoo’s Bugzilla Email wiki entry (the subversion integration part in particular).

Note that since the email is being sent as the committer, the committer needs to have an account with that email address in your bugzilla. If not, bugzilla’s email_in.pl will bounce it back. So you may want to have a bugzilla “commits” account as a dummy account from which you can email these things if not all of your committers have bugzilla access or use the same email address in bugzilla that they do in git.

If anyone has any suggestions on a better way to do this (particularly via XMLRPC which I think would be a nicer way to go), I’m all ears. (Short of writing my own — I could do this, having worked with bugzilla and lately with XMLRPC access, quite a bit — I’m too lazy to write something from scratch)