Search-oriented tools for Unix-style mail

A brief comparison of mu and notmuch

In a traditional Unix mail setup, users receive mail on a remote server, which they log into through a terminal to read. This would once commonly have been a server run by their university or employer. The mail is delivered to their home directory (or possibly a spool file), after which it can be read or otherwise manipulated as the user would like. More on that part in a minute.

This setup has since been largely supplanted by "ISP-style" mail, except at a dwindling number of university departments. Nowadays it is still common that some other organization receives and stores users' mail on a remote server on their behalf (the organization might be a university, employer, or internet service provider, or a dedicated email provider like Gmail). But rather than interacting with remote mail by logging in to a Unix terminal session, users either use a desktop mail client (Outlook, Thunderbird, etc.) that interacts with the remote mail store over IMAP, or a webmail client that functions in some provider-specific manner.

With the rise of cheap servers in The Cloud, however, it's now (relatively) easy for Unixophiles to receive their own mail on a remote Unix server, and then log in to said server to interact with it. This can be done either by running your own full-on mailserver (I use postfix), or by using an email provider but syncing mail to your server using a tool like OfflineIMAP or mbsync.

Storing your own email on your own server doesn't by itself necessitate any change in reading mail: one can run a mail server and still read email over IMAP with a conventional mail client, or through webmail. But for some people, including me and maybe you, it's appealing to at least have the option to manipulate mail directly on the remote server in a Unix-style terminal manner, especially in the case where we are already doing other things on this cloud Unix server, so frequently have a screen or tmux session open over ssh anyway.

notmuch and mu

Thus we return to the original situation: you have a bunch of mail in your home directory on a remote Unix server. What do you do with it? The fact that some decades have elapsed and we still like text and command lines doesn't oblige everything about this setup to stand still. One way people now commonly interact with data is by using search as the main interface. The new crop of Unix-style mail tools likewise use search as the main interface to the mail store. Full-text search on even very large mail archives is quite feasible nowadays, so this is the starting point of the two tools I'll look at here, notmuch and mu.

Both have a broadly similar approach. You point them to a directory that has your email in maildir format. They index it with Xapian, and provide some utilities that use the index to query and manipulate mail. I'll outline the main differences as I see them. Keep in mind in what follows that I've used mu for a year or so but have never used notmuch as my daily email interface, though I've tried to highlight the advantages of each regardless of my own choice.
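
As a rough sketch of that first step: a maildir is just a directory containing cur, new, and tmp subdirectories, so it's easy to sanity-check before indexing. The names MAIL_ROOT, is_maildir, and index_mail are made up for illustration, and I'm assuming the tool has already been told where the maildir lives (how that is configured varies between versions, so check the man page for yours).

```shell
# Assumed location of the mail store; override by setting MAIL_ROOT.
MAIL_ROOT="${MAIL_ROOT:-$HOME/Maildir}"

# Succeeds if the directory has the three subdirectories every maildir contains.
is_maildir() {
    [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]
}

# Update the search index with whichever tool is named ("mu" or "notmuch").
index_mail() {
    if ! is_maildir "$MAIL_ROOT"; then
        echo "not a maildir: $MAIL_ROOT" >&2
        return 1
    fi
    case "$1" in
        mu)      mu index ;;       # build or update mu's Xapian index
        notmuch) notmuch new ;;    # pick up new messages in notmuch's database
        *)       echo "usage: index_mail mu|notmuch" >&2; return 2 ;;
    esac
}
```

Running 'index_mail mu' or 'index_mail notmuch' after new mail arrives is then enough to make it searchable.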

Data models

While the tools have a lot of minor differences, the big one, which will probably be decisive for most people when choosing between them, is that they have quite different data models. Mu uses maildirs for message filing, while notmuch uses the Xapian db for message tagging. This has implications for both how messages are sorted, and for where the canonical metadata resides. The more traditional of these is mu, which uses a file-mail-into-folders paradigm. The folders are on-disk maildir directories, and you file messages by moving the mail files between directories. Notmuch on the other hand uses a Gmail-style paradigm of adding one or more tags to messages. Tags are stored in the Xapian database, and the original maildir files are treated as immutable. Messages on disk are never touched or moved; all changes to message state (even "deleting") are done by just changing tags in the db.
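
To make the contrast concrete, here is a minimal sketch of "archiving" a message under each model. The function names and the 'archive' folder/tag are hypothetical; the only real commands involved are 'mv' and 'notmuch tag'.

```shell
# mu-style filing: move the message file into another maildir folder,
# keeping it in the same subdirectory (cur or new) it came from.
# Re-run 'mu index' afterwards so the index notices the move.
file_message() {
    msg=$1
    folder=$2
    sub=$(basename "$(dirname "$msg")")
    mkdir -p "$folder/cur" "$folder/new" "$folder/tmp"
    mv "$msg" "$folder/$sub/"
}

# notmuch-style filing: the file stays where it is; only the tags
# recorded in the database change for messages matching the query.
tag_archived() {
    notmuch tag +archive -inbox -- "$1"
}
```

For example, 'file_message ~/Maildir/cur/somefile ~/Maildir/archive' versus 'tag_archived id:some-message-id', where both arguments are placeholders.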

This clearly makes notmuch more flexible: a tag can be anything, with user-defined tags and "built-in" tags like sender and subject being treated equivalently. Mu on the other hand treats the maildir files as the canonical database, so its Xapian index includes only data from there: the message headers and content, the folder, etc., with no user-defined tags. But mu's approach does have the advantage of being more interoperable, since other clients can read the mail store as just a plain maildir, without having to know about mu. Since the important metadata for notmuch is all in the Xapian database, pointing other tools at your email requires interfacing them with notmuch.

Depending on your preferences, one or the other approach might also feel like a "safer" way of handling mail. Mu personally feels safer to me, since the Xapian database is purely a search index that can be regenerated at any time from the maildir files with no data loss. I feel more comfortable backing up the simple structure of a maildir directory (plaintext files in regular Unix directories) and relying on that for long-term archival, than I do in relying on the integrity of a binary database file. On the other hand, notmuch's choice to never modify/delete/touch your original mail files is safer in a different way; even if things go haywire, you may lose metadata but the messages themselves will still be there somewhere.
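
In backup terms, the difference looks roughly like this. It's a sketch with hypothetical function names and paths: with mu, archiving the maildir captures everything; with notmuch you'd also want a dump of the tags, since those exist only in the database.

```shell
# Archive a maildir directory into a compressed tarball.
# Usage: backup_maildir ~/Maildir /some/backup/maildir.tar.gz
backup_maildir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Save notmuch's tags as a plain-text dump, which can later be
# loaded back with: notmuch restore < tags.dump
backup_notmuch_tags() {
    notmuch dump > "$1"
}
```

The tag dump is a plain text file, so it can sit next to the maildir tarball in the same backup.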

'Doing email'

Beyond the differences in data model, the two tools share a similar "toolbox" approach, providing a variety of command-line tools that can either be used directly, or used by higher-level mail clients. For example, using mu, to find all messages Julian Togelius sent to me about our book:

mu find from:Togelius subject:book

To just list unread mail in my inbox:

mu find maildir:/ flag:unread

Search terms are joined by an implicit AND, but you can also use explicit boolean operators according to Xapian syntax, plus additional things like date ranges. Search terms prefixed by a field such as 'from:', 'subject:', 'date:', etc. find emails matching the relevant field; anything else searches the fulltext of all emails. Notmuch has a similar search command. If you're used to the often slow performance of fulltext search on a large mail store over IMAP, you'll be pleasantly surprised at how fast this is with either mu or notmuch.
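
A few more queries in the same vein, wrapped in small shell functions so they can be reused. The function names are mine, and the specific searches are only illustrations of the syntax described above.

```shell
# Explicit boolean operators: mail from Togelius that is NOT about the book.
not_about_book() {
    mu find from:Togelius AND NOT subject:book
}

# Date range: unread mail from the last two weeks.
recent_unread() {
    mu find flag:unread date:2w..now
}

# A bare term with no field prefix is matched against the message text.
mentions() {
    mu find "$1"
}

# The notmuch counterpart of the first mu example in this section;
# 'notmuch search' lists matching threads rather than single messages.
notmuch_book() {
    notmuch search from:Togelius subject:book
}
```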

There are a variety of other commands for both mu and notmuch, letting you do your entire email usage from the command line, if you'd like. (This toolbox style may remind some old-timers of the MH approach to email, but rebuilt on top of a modern database.)

Of course, you don't have to do your email entirely from the command line, and both ship with several choices of email clients that build on top of the low-level functionality. These can be used instead of the command-line tools or (as I prefer) intermixed with them, doing some things in a full email client and some from the command line. I can't properly review the various mail clients here, because I've only really used mu4e, an Emacs-based client for mu. It has a very nice manual, and I like it despite not really being an Emacs power user. But you can use a variety of other clients, including integrating general-purpose clients like mutt with either mu or notmuch. That's an overview for another article.

Initial mail sorting

One final point to mention is initial mail sorting, the processing of new mail as it comes into your inbox, to do things like separate mailing list mails from the rest of your email, filter out spam, etc. This is traditionally (in the world of Unix email) performed by a tool like procmail that gets new mail handed off from the mail transfer agent as it comes in, and processes/sorts it according to a set of recipes set up by the user. Since mu uses regular maildir folders for its sorting, it doesn't require mu-aware tools to do initial mail processing, and many mu users therefore continue to use procmail or a replacement like maildrop.

While some view this as an advantage of mu's approach, I personally don't like this solution, because it means that you have to deal with two syntaxes: one using procmail/maildrop for initial sorting, and a different syntax when searching the mail store for any other reason. I'd rather just use the one syntax for both. Therefore I do initial mail sorting using a simple shell script on top of 'mu find'. The --fields=l argument to 'mu find' tells it to return pathnames as search results, which can then be used to refile messages by just moving them with 'mv'. Sample script (note that this example uses GNU extensions to 'xargs' and 'mv'):

# update index with any new mail
mu index --quiet

# refile new mailing list mail somewhere other than my inbox
mu find maildir:/ AND flag:new AND list:members.sigcis.org --fields=l \
| xargs -r mv -t ~/Maildir/lists/sigcis/new

# update index to reflect any moved mail
mu index --quiet
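
If GNU 'xargs' and 'mv' aren't available, the refiling step can be written portably with a read loop instead. This is a sketch rather than what I actually run; refile_to is a hypothetical helper name.

```shell
# Read message paths from stdin, one per line, and move each one into
# the destination directory. Paths that no longer exist are skipped.
refile_to() {
    dest=$1
    while IFS= read -r path; do
        if [ -f "$path" ]; then
            mv "$path" "$dest/"
        fi
    done
}

# Intended use, replacing the xargs line in the script above:
#   mu find maildir:/ AND flag:new AND list:members.sigcis.org --fields=l \
#   | refile_to ~/Maildir/lists/sigcis/new
```

Wrapping the move in a function also makes it easy to add more rules: each is one 'mu find' query piped into refile_to with a different destination.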

Since notmuch uses its own tagging system, it requires notmuch-aware initial mail processing, so the choice of whether to keep using procmail-like tools or not doesn't exist. You can use a shell-script type solution like the one I use for mu (see here), but there's also a more full-featured filtering/sorting tool available, called afew.
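
For comparison, a shell-script approach for notmuch might look something like the sketch below. I haven't run this as part of a daily setup, so treat it as an illustration: the list address is a placeholder, the 'sigcis' tag name is my own choice, and it assumes notmuch's default behavior of tagging newly imported mail with 'inbox' and 'unread'.

```shell
# Placeholder address; substitute the address the list's mail is sent to.
LIST_ADDR="list@example.org"

sort_new_mail() {
    # import any newly delivered messages into the notmuch database
    notmuch new

    # "refile" unread list mail out of the inbox by swapping tags;
    # the files on disk are not moved
    notmuch tag +sigcis -inbox -- tag:inbox and tag:unread and "to:$LIST_ADDR"
}
```

Note that nothing is moved on disk here, so there's no need for a second indexing pass as in the mu version.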

* * *

In summary: reading your mail on a remote Unix server is hot again (ok, sort-of hot), and mu and notmuch are two tools to help you do it. Their most important difference is in the data model: mu uses maildir files and directories as its canonical storage, with a Xapian database as a fast index into it, while notmuch has a tag-oriented model where the underlying maildir is immutable storage and all metadata is stored in the Xapian database. Both then have a toolbox of utilities to search and interact with mail from the command line, as well as the ability to interface them with a choice of mail clients.