Recently I had to write a lot of attributes, namely titles and matching slugs (in the URL), for a bunch of links on a simple Hungarian web page I was building. Since I was editing the HTML template and the associated URL configuration in Vim, I figured I’d quickly run a macro to generate the slugs from the page titles, so that I wouldn’t have to write them one by one. It turned out that none of the existing solutions did quite what I needed, so I developed my own (shown below). But first: what is a slug?
Slugifying is a step up from ascii-fication. If we take the latter to mean “removing all non-ASCII characters from a string”, then slugifying simplifies the string even more. The point of slugifying is to generate (usually from a link or post title) a string good for use as a URL, without the characters getting garbled into non-human-readable, percent-encoded rubbish (a single á, for instance, becomes %C3%A1), when what you really want is a short, readable string of plain lowercase letters and hyphens.
Existing solutions and the problem of OSX
I based my solution on the slug function from xolox’s str collection, but mine is even more hardcore, because his doesn’t handle accented characters well.
Mine relies on iconv, like the Diacritic plugin does.
This doesn’t work so well on OSX because its transliteration is apparently rubbish; my workaround is to do a second pass and remove OSX’s garbage. I later found out that this is because OSX uses the BSD libiconv, which is much leaner and simpler than the implementation in GNU libc (this can be a good thing) but apparently doesn’t put much effort into transliterating strings in locales other than English. For example, if I convert a German word like “grün” in a German locale, I expect to get “gruen”, and if I convert it to ASCII, which has no accented characters, I expect “grun”, with no accents.
The iconv command on OSX would give you gr"un. IMO this is not useful in any language, and it also doesn’t get me any closer to removing the accents to form slugs. A Hungarian example with a typical test word:
- Input text: Árvíztűrőtükörfúrógép
- libiconv (OSX): 'Arv'izt"ur"ot"uk"orf'ur'og'ep
- glibc (this is what I want): Arvizturotukorfurogep
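To check which of the two behaviours a particular Vim build shows, something along these lines can be run. It is only a sketch: TranslitReport is a hypothetical helper name, it assumes 'encoding' is utf-8, and the result depends on the iconv library Vim was linked against (and possibly on the current locale), so no expected output is given for the accented words.

```vim
scriptencoding utf-8

" Hypothetical helper: returns one 'input -> transliteration' string per word.
" Assumes 'encoding' is utf-8; returns an empty list if Vim lacks +iconv.
function! TranslitReport(words) abort
  if !has('iconv')
    return []
  endif
  return map(copy(a:words),
        \ 'v:val . " -> " . iconv(v:val, "utf-8", "ascii//TRANSLIT")')
endfunction

" Print the report for the two test words used in this post.
echo join(TranslitReport(['grün', 'Árvíztűrőtükörfúrógép']), "\n")
```

If the right-hand side contains stray apostrophes and double quotes, the build is showing the libiconv behaviour described above.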
Since I can’t expect this to work consistently on Mac and Linux, and I often switch between the two myself, I decided to brute-force it: use iconv, then strip any leftover apostrophes and quotes from the result to handle the OSX case:
command! Slugify call setline('.', join(split(tolower(substitute(iconv(getline('.'), 'utf8', 'ascii//TRANSLIT'), "[\"']", '', 'g')), '\W\+'), '-'))
Probably not the most elegant solution, but at least it works for me… consistently.
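For anyone who finds the one-liner hard to read, here is the same pipeline split into named steps. This is a sketch rather than what I actually use: SlugifyString and SlugifyLines are hypothetical names, and it assumes a Vim with +iconv and 'encoding' set to utf-8.

```vim
" Hypothetical helper: the :Slugify one-liner, one step per line.
function! SlugifyString(text) abort
  " 1. Transliterate to ASCII; the exact result depends on the iconv library.
  let l:ascii = iconv(a:text, 'utf-8', 'ascii//TRANSLIT')
  " 2. Strip the apostrophes and quotes that libiconv leaves in place of accents.
  let l:clean = substitute(l:ascii, "[\"']", '', 'g')
  " 3. Lowercase, split on runs of non-word characters, join with hyphens.
  return join(split(tolower(l:clean), '\W\+'), '-')
endfunction

" Hypothetical range-aware variant: slugifies every line in the given range
" (defaults to the current line), e.g. :'<,'>SlugifyLines
command! -range SlugifyLines
      \ call setline(<line1>, map(getline(<line1>, <line2>), 'SlugifyString(v:val)'))
```

With plain ASCII input the result does not depend on the platform: :echo SlugifyString('Hello, World!') prints hello-world. The range variant is handy when a whole block of titles needs converting in one go.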
Edit from the future: I have now used this so much that I’ve committed it to my vimrc.