How to slugify text in Vim (properly)

Recently I had to write a lot of attributes (titles and matching URL slugs) for a bunch of links on a simple Hungarian web page I was building. Since I was editing the HTML template and the associated URL configuration in Vim, I figured I’d quickly run a macro to generate the slugs from the page titles, so that I wouldn’t have to write them one by one. It turned out that none of the existing solutions did quite what I needed, so I developed my own (shown below). But first: what is a slug?

Define: slug

Slugifying is a step up from ascii-fication. If we take the latter to mean “removing all non-ASCII characters from a string” then slugifying simplifies it even more. The point of slugifying is to generate (usually from a link or post title) a string good for use as a URL, without the characters getting garbled up into non-human-readable URL-encoded rubbish like this:

Slugify%20text%20in%20Vim%2C%20for%20example%20%E1rv%EDzt%171r%151t%FCk%F6rf%FAr%F3g%E9p%0A

when what you really want is something like this:

slugify-text-in-vim-for-example-arvizturotukorfurogep

Existing solutions and the problem of OSX

I based my solution on xolox’s slug function from his str collection, but made it even more hardcore, since his doesn’t handle accented characters well.

Mine shells out to iconv, like the Diacritic plugin does.

This doesn’t work so well on OSX because its transliteration is rubbish; my workaround is to do a second pass and remove OSX’s garbage. I later found out that this is because OSX uses the BSD libiconv, which is much leaner, simpler and lighter than the iconv in GNU libc (this can be a good thing), but apparently doesn’t put much effort into transliterating strings in locales other than English. For example, if I convert a German word like “grün” in a German locale, I expect to get “gruen”, and if I convert it to plain ASCII, which has no accented characters, I expect “grun”, with no accents.

The iconv command on OSX would give you gr"un. IMO this is not useful in any language, and it also doesn’t get me any closer to removing the accents to form slugs. A Hungarian example with a typical test word:

  • Árvíztűrőtükörfúrógép (input text)
  • 'Arv'izt"ur"ot"uk"orf'ur'og'ep (libiconv on OSX)
  • Arvizturotukorfurogep (glibc; this is what I want)
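
If you want to see which of these behaviours your own machine has, a rough sketch like the following should do. ShowTranslit is a made-up name, not something from a plugin, and I’m assuming your Vim was built with the +iconv feature and that the script file is saved as UTF-8. What gets printed depends on the iconv library Vim is linked against, so I won’t promise any particular output.

```vim
" Hypothetical helper (the name ShowTranslit is only for illustration).
" Echoes each word next to whatever iconv() turns it into on this system.
scriptencoding utf-8

function! ShowTranslit(words) abort
  for l:word in a:words
    " The result is platform-dependent: glibc and BSD libiconv differ here.
    echo l:word . ' -> ' . iconv(l:word, 'utf-8', 'ascii//TRANSLIT')
  endfor
endfunction

call ShowTranslit(['grün', 'Árvíztűrőtükörfúrógép'])
```

If stray apostrophes and double quotes show up in front of the vowels, you are on the libiconv side of the fence.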

The solution

Since I can’t expect iconv to behave consistently on Mac and Linux, and I often switch between the two, I decided to brute-force it: use iconv, then strip any leftover apostrophes and quotes from the result to handle the OSX case:

command! Slugify call setline('.', join(split(tolower(substitute(iconv(getline('.'), 'utf8', 'ascii//TRANSLIT'), "[\"']", '', 'g')), '\W\+'), '-'))

Probably not the most elegant solution, but at least it works for me, consistently.
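
The one-liner is a bit dense, so here is the same pipeline unrolled into steps. This is only a sketch: the names SlugifyString, SlugifyRange and SlugifyLines are mine and purely illustrative, and the range support is an extra that the original command doesn’t have.

```vim
" Hypothetical helper: same steps as the one-liner, one per line.
function! SlugifyString(text) abort
  " 1. Transliterate to ASCII; the exact result depends on the system iconv.
  let l:ascii = iconv(a:text, 'utf8', 'ascii//TRANSLIT')
  " 2. Drop the apostrophes and double quotes that libiconv leaves behind.
  let l:clean = substitute(l:ascii, "[\"']", '', 'g')
  " 3. Lowercase, split on runs of non-word characters, join with hyphens.
  return join(split(tolower(l:clean), '\W\+'), '-')
endfunction

" Hypothetical helper: slugify every line from a:first to a:last in place.
function! SlugifyRange(first, last) abort
  for l:lnum in range(a:first, a:last)
    call setline(l:lnum, SlugifyString(getline(l:lnum)))
  endfor
endfunction

" Without a range this acts on the current line, like the original command.
command! -range SlugifyLines call SlugifyRange(<line1>, <line2>)
```

With this, :SlugifyLines handles the current line and :'<,'>SlugifyLines handles a visual selection. Note that \W treats the underscore as a word character, so underscores survive into the slug.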

Edit from the future: I have now used this so much that I’ve committed it to my vimrc.

All Hallows Day

Candles lighting up the cemetery on Halloween in Vaszar.


We didn’t dress up and go trick-or-treating for Halloween in Namibia, but Jack-o-Lanterns and spooky costume parties are what come to mind when I think of the time around the transition from October to November. In Hungary, and it turns out in many countries in this area, it’s celebrated a bit differently, and spooky costume Halloween parties have only started to become popular in the 21st century.

Hungarians travel to the countryside for All Saints’ Day on 1 November and All Souls’ Day on 2 November. They visit the cemeteries where their ancestors are buried and decorate the graves with flowers and candles. It looks really beautiful at night, and I took some photos when we were in Vaszar on the weekend. It was a dark night and the pictures are mostly out of focus, but I think they’re still pretty.

It turns out Halloween is a combination of the words Hallow (meaning holy or saint) and e’en (a contraction of even, which is the Scots spelling of eve or evening), and the celebration has a long Christian and Celtic folk history.