A “Hello, World!” program is generally a computer program that ignores any input and outputs or displays a message similar to “Hello, World!”. A small piece of code in most general-purpose programming languages, this program is used to illustrate a language’s basic syntax. “Hello, World!” programs are often the first a student learns to write in a given language,[1] and they can also be used as a sanity check to ensure computer software intended to compile or run source code is correctly installed, and that its operator understands how to use it. -Wikipedia
Hello World! Since this is my first post on this blog, I thought I’d follow the tradition and greet the world. But I also wanted to give the world something, so I thought I’d write about a handy trick you can do with sed, one that I see many people being unaware of.
~$ whatis sed
If you type whatis sed on a *nix terminal you will see that sed is a stream editor for filtering and transforming text. To see the full functionality you can of course, as always, type man sed and read the manual. RTFM is always the right thing to do, but manuals can be overwhelming, especially if you’re a novice user and you want to get something done fast. sed in particular has a lot of commands that manipulate text in different ways. Undoubtedly though, the most common one used in scripts or in a terminal is the s command.
The s command
An example s command of sed can be seen in the title of this blog post. We echo a string, pipe the output of echo to sed with the ‘|’ character and tell sed to substitute the first occurrence of the character ‘k’ with ‘l’. The result is Hello World. (If we wanted to replace all occurrences we would add a g, which stands for global, after the last /.)
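To see the difference the g flag makes, here is a minimal sketch. The input string "Hekko World" is made up for this illustration (it is not necessarily the one from the title); it just needs more than one ‘k’ in it.

```shell
# The input string is made up for this illustration.
input="Hekko World"

# Without the g flag only the first match on the line is replaced.
first_only=$(echo "$input" | sed 's/k/l/')

# With the g flag every match on the line is replaced.
all_matches=$(echo "$input" | sed 's/k/l/g')

echo "$first_only"    # Helko World
echo "$all_matches"   # Hello World
```

The first command leaves the second ‘k’ alone, the second one replaces both.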
The “problem”
Imagine you have a file full of domains, one per line, and you want to add the protocol (https://) at the start of each line. This can be done in a lot of ways of course, but one of them is sed. For brevity let’s say we want to do it with one domain, for example gnu.org. We could write a sed command like so:
echo "gnu.org" | sed 's/^/https:\/\//g'
and get the result https://gnu.org.
Damn that’s ugly, right? Let’s break it down. First of all, the ^ symbol is a regular expression anchor that matches the beginning of the line. So we’re basically adding what comes after the separator / to the beginning of the line. The string to be added is https://. But since / is the separator, we need to escape it. That’s why we put a \ before each one of the two /’s. The last / is another separator. The trailing g tells sed to replace every match within a line rather than only the first one; since ^ matches just once per line it is not strictly needed here, because sed already applies the command to each line of our hypothetical text file.
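Here is a small sketch of the same command run on several lines at once, with and without the trailing g, so the two outputs can be compared. The three domains are sample values picked only for illustration, and a shell variable stands in for the hypothetical text file.

```shell
# Three sample domains, one per line (chosen only for illustration).
domains=$(printf '%s\n' gnu.org kernel.org debian.org)

# The same substitution with and without the trailing g.
with_g=$(printf '%s\n' "$domains" | sed 's/^/https:\/\//g')
without_g=$(printf '%s\n' "$domains" | sed 's/^/https:\/\//')

# Print one of the results and report whether the two differ.
printf '%s\n' "$with_g"
if [ "$with_g" = "$without_g" ]; then
    echo "same output with and without g"
fi
```

Both versions prefix every line, because sed runs its script on each input line and ^ can only match once per line.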
So there is no problem here, this just werks. However, it looks really bad, and when you type it you have to be very careful not to mess up the /’s and \’s. Trust me, you WILL forget one or the other and get frustrated. While using computers, one of our goals is to minimize the distance between mind and matter as much as possible. Struggling mentally with an ugly syntax is the opposite: we fall victim to the whims of the machine. Thankfully, the people who wrote sed (Jay Fenlason, Tom Lord, Ken Pizzini, Paolo Bonzini, Jim Meyering, and Assaf Gordon for GNU sed, the one which I am using and talking about here) understand that, so there’s a more elegant way to do this:
The solution:
As we’ve seen, / is the separator in sed commands, right? BZZT! Wrong! The separator can be almost ANY single character (anything other than a backslash or a newline). Yet, from what I’ve seen in scripts and commands that people share, few people seem to know that. Thus we could have the same result with:
echo "gnu.org" | sed 's#^#https://#g'
echo "gnu.org" | sed 'sA^Ahttps://Ag'
echo "gnu.org" | sed 's7^7https://7g'
echo "gnu.org" | sed 's:^:https\://:g'
and so on. Note that in the last example I am using : as a separator, so I escaped the : in the protocol, just to show that it can work.
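If you want to convince yourself that these variants really are equivalent, a rough sketch like the following compares each one against the escaped-slash version. The variable names are just for this example, and the g flag is left out because ^ matches only once per line anyway.

```shell
# Reference output from the escaped-slash form.
expected=$(echo "gnu.org" | sed 's/^/https:\/\//')

# Each script below uses a different separator character.
mismatches=0
for script in 's#^#https://#' 'sA^Ahttps://A' 's7^7https://7' 's:^:https\://:'; do
    actual=$(echo "gnu.org" | sed "$script")
    if [ "$actual" = "$expected" ]; then
        printf 'ok:   %s\n' "$script"
    else
        printf 'FAIL: %s\n' "$script"
        mismatches=$((mismatches + 1))
    fi
done
```

printf is used for the report instead of echo because some shells' echo interprets backslashes, which would mangle the last script when printing it.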
Disclaimer:
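I am talking about GNU sed here, the version of sed that exists in GNU/Linux systems, which adds its own extensions on top of what POSIX specifies. Choosing a different separator for the s command is itself described by POSIX, so it should work elsewhere too, but if you’re using a BSD, OSX, Plan 9 or other version of sed there’s a chance that your mileage may vary.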