1. STRIP ALL LEADING BLANKS FROM EVERY LINE OF A FILE
    sed 's/^ *//' testfile.1 > testfile.new

BEFORE:

    This is testfile.1 This is a test file with enough amazing variety in it .ce .bf hpss12 so that we can play with the amazing noninteractive stream editor. .ju It has some amazing lines beginning with amazing dots (.) that might represent amazing formatting commands. .bf gothic Here are some amazing lines with lots of amazing extra spaces to play with amazingly .

AFTER:

    This is testfile.1 This is a test file with enough amazing variety in it .ce .bf hpss12 so that we can play with the amazing noninteractive stream editor. .ju It has some amazing lines beginning with amazing dots (.) that might represent amazing formatting commands. .bf gothic Here are some amazing lines with lots of amazing extra spaces to play with amazingly .
2. DELETE EVERY LINE THAT BEGINS WITH A DOT
    sed '/^\./d' testfile.1 > testfile.new

AFTER:

    This is testfile.1 This is a test file with enough amazing variety in it so that we can play with the amazing noninteractive stream editor. It has some amazing lines beginning with amazing dots (.) that might represent amazing formatting commands. Here are some amazing lines with lots of amazing extra spaces to play with amazingly .
3. REPLACE ALL STRINGS OF BLANKS BY SINGLE BLANKS.
    sed 's/  */ /g' testfile.1 > testfile.new     # two blanks before the *

AFTER:

    This is testfile.1 This is a test file with enough amazing variety in it .ce .bf hpss12 so that we can play with the amazing noninteractive stream editor. .ju It has some amazing lines beginning with amazing dots (.) that might represent amazing formatting commands. .bf gothic Here are some amazing lines with lots of amazing extra spaces to play with amazingly .
4. DO #3 FOR EVERY FILE IN A COLLECTION AND PLACE THE RESULTS IN A SUBDIRECTORY:
    mkdir lower
    for FILE in testfile.*
    do
        # two blanks before the *, as in #3
        sed 's/  */ /g' "$FILE" > "lower/$FILE"
    done
Regular expressions for grep and egrep; sed has the same matching capabilities as grep.
The elements are listed in decreasing order of precedence.
 1. c        Any non-special character matches itself.
    Example:
        grep '^ab' file
    will find all lines that start with the letters ab.

 2. \c       Turn off special meaning of character c.
    Example:
        grep '\*' file
    will print all lines with the '*' character in them. Since '*' is a character with special meaning to grep, it is necessary to escape it.

 3. ^        Beginning of line
    Example:
        sed '/^\/\*/d' file
    will delete from file all lines that start with the string "/*". Note that since '/' and '*' both have special meaning to sed, they are both escaped.

 4. $        End of line
    Example:
        sed 's/$/./' file
    will stick a period at the end of every line - even blank lines.

 5. .        Any single character
    Example:
        sed 's/.$/./' file
    will not stick a period on empty lines (lines that contain only a newline). Unfortunately it also overwrites the last character of every other line with the period. For the cure, see \n below.

 6. [...]    Any single character in the range
    Example:
        grep -n '[0-9]*\.[0-9][0-9]*' file
    will match any fixed point number like 4.566 . Note the escape of '.'.

 7. [^...]   Any single character not in the range.
    Example:
        grep -n 'int *[a-zA-Z_][^=]*(' nim.c
    should find all functions of type int in the file nim.c

 8. \n       What the nth \(...\) matched - grep and sed only.
    Example:
        grep '\([a-z]\)\1' file
    will find all doubled lower case letters (like 'tt') in file.

 9. r*       0 or more occurrences of expression r

10. r+       1 or more occurrences of expression r (egrep only)

11. r?       0 or 1 occurrences of r (egrep only)

12. r1r2     regular expression 1 followed by regular expression 2

13. r1|r2    regular expression 1 or regular expression 2 (egrep only)

14. \(...\)  Tagged regular expression (see \n above) (grep and sed only)

15. (r)      Parentheses can be used for grouping (egrep only)
    Example:
        egrep -n '(yes|no)$' file
    will find any "yes" or "no" at the end of a line. How would 'yes|no$' be different?
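The closing question about grouping can be checked by counting matching lines. This is only a minimal sketch: it assumes a grep that accepts -E (the POSIX spelling of egrep), and the three sample lines are made up for illustration.

```shell
# Hypothetical sample lines, not taken from any file in these notes.
sample=$(printf '%s\n' 'yes, maybe' 'I said no' 'oh yes')

# Grouped: either word must sit at the end of the line.
grouped=$(printf '%s\n' "$sample" | grep -E -c '(yes|no)$')

# Ungrouped: | binds loosest, so this reads as
# "yes anywhere on the line, or no at the end of the line".
ungrouped=$(printf '%s\n' "$sample" | grep -E -c 'yes|no$')

echo "grouped: $grouped  ungrouped: $ungrouped"
```

The line 'yes, maybe' is counted only by the ungrouped pattern, because there the $ anchor applies to "no" alone.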
1. COMMON ERROR - c* means 0 or more c's, not 1 or more.

EXAMPLE:

    sed 's/ */ /g' testfile.1     #One blank instead of two

OUTPUT:

    T h i s i s t e s t f i l e . 1 T h i s i s a t e s t f i l e w i t h e n o u g h a m a z i n g v a r i e t y i n i t . c e . b f h p s s 1 2 s o t h a t w e c a n p l a y w i t h t h e a m a z i n g n o n i n t e r a c t i v e s t r e a m e d i t o r . . j u I t h a s s o m e a m a z i n g l i n e s b e g i n n i n g w i t h a m a z i n g d o t s ( . ) t h a t m i g h t r e p r e s e n t a m a z i n g f o r m a t t i n g c o m m a n d s . . b f g o t h i c H e r e a r e s o m e a m a z i n g l i n e s w i t h l o t s o f a m a z i n g e x t r a s p a c e s t o p l a y w i t h a m a z i n g l y .

2. EXTENT OF THE MATCH

The match is always maximal, that is, if string1 and string2 both match, and string1 is a prefix of string2, then any match or replacement affects all of string2.

Example:

    This is testfile.2 It, too, has some lines with leading blanks and some "quoted strings" "of various" kinds and "flavors" "" including "empty" and ""!!! "I'm dying", he croaked.

Now we might think that the following command would delete all the quoted strings:

    sed 's/".*"//g' testfile.2

Here is the result:

    This is testfile.2 It, too, has some lines with leading blanks and some including !!! , he croaked.

For the effect that we want, we should say

    sed 's/"[^"]*"//g' testfile.2

to limit the extent of the match. Here is the output:

    This is testfile.2 It, too, has some lines with leading blanks and some kinds and including and !!! , he croaked.
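Both pitfalls are easy to reproduce on a single line. The sketch below uses made-up one-line inputs rather than testfile.1 and testfile.2, and it deliberately does not predict the exact output of the one-blank pattern, only that it differs from the intended result.

```shell
# Hypothetical input with runs of two and three blanks.
line='a  b   c'

# One blank before the *: the pattern also matches the empty string.
wrong=$(printf '%s\n' "$line" | sed 's/ */ /g')

# Two blanks before the *: at least one blank is required.
right=$(printf '%s\n' "$line" | sed 's/  */ /g')

# Hypothetical input with two quoted strings.
quoted='say "one" and "two" now'

# Maximal match: runs from the first quote to the last quote on the line.
greedy=$(printf '%s\n' "$quoted" | sed 's/".*"//g')

# Bounded match: a quoted string may not contain a quote.
bounded=$(printf '%s\n' "$quoted" | sed 's/"[^"]*"//g')

printf '[%s]\n[%s]\n[%s]\n[%s]\n' "$wrong" "$right" "$greedy" "$bounded"
```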
I will try to show useful examples of every element, but I will not try to keep them 'pure'.
EXAMPLE 1. List all files in one of my directories that are readable by all system users. (Illustrates ^ and .)

To do this, we first need to look at the output of a typical ls -l. This one lists one of my directories, namely polish:

    ls -l polish

The output is:

    total 102
    -rw------- 1 rossa group 697 Sep 17 07:53 func.ex
    -rw-r--r-- 18 rossa group 1424 Sep 16 13:00 getop.c
    -rw------- 1 rossa group 1204 Sep 17 07:26 getop.o
    -rw-r--r-- 17 rossa group 296 Sep 17 07:13 makefile
    -rwx------ 1 rossa group 35711 Sep 17 07:27 polish
    -rw-r--r-- 1 rossa group 2807 Sep 16 13:06 polish.c
    -rw-r--r-- 18 rossa group 137 Sep 17 07:26 polish.h
    -rw------- 1 rossa group 1540 Sep 17 07:26 polish.o
    -rw-r--r-- 18 rossa group 877 Sep 17 07:27 pushpop.c
    -rw------- 1 rossa group 1112 Sep 17 07:27 pushpop.o

We are looking for 'r' in the 8th character position, so we just say

    ls -l polish | grep '^.......r'

We anchor at the far left of the line, count 7 arbitrary characters, and then ask for an 'r'. Here is the output:

    -rw-r--r-- 18 rossa group 1424 Sep 16 13:00 getop.c
    -rw-r--r-- 17 rossa group 296 Sep 17 07:13 makefile
    -rw-r--r-- 1 rossa group 2807 Sep 16 13:06 polish.c
    -rw-r--r-- 18 rossa group 137 Sep 17 07:26 polish.h
    -rw-r--r-- 18 rossa group 877 Sep 17 07:27 pushpop.c

EXAMPLE 2. Well, it's somebody named John ... (Illustrates [^...] and *)

I ran the following command on quapaw:

    grep '[^:]*:[^:]*:[^:]*:[^:]*:[^:]*John[^:]*' /etc/passwd

and here is the result:

    jroberts:9pagMBuiO.pZ6:281:79:Johnnie C. Roberts:/usr/facstaff/jroberts:/bin/csh
    mg96fn:39AzGSMvUeHjU:310:99:Sheets, John R.:/usr/students/mg96fn:/bin/csh
    jme:rlOoe8FXdD4LE:332:144:John M Enger:/usr/facstaff/jme:/bin/csh
    twzcb4:oosOkbghaM1Jc:359:99:Johnson Patrick W:/usr/students/twzcb4:/bin/csh
    smj:H2gk.4YzqZnf.:375:149:Stephen M Johnson:/usr/facstaff/smj:/bin/csh
    jherb:YvNfFL9F2JqiE:434:99:John W Herbold Jr:/usr/students/jherb:/bin/csh

REMARKS:

- This is overkill - grep John /etc/passwd would usually give the same results.
- However, we have shown an example of matching in a particular subfield. We skip to the correct field by repeating a 'no colons, then colon' construct.
- Strictly, the pattern needs a leading ^ to pin the match to the fifth field; without the anchor, a John in any later field would also match.
- But, even so, this is still inelegant; an awk solution is better:

    awk -F: '$5 ~ /John/ {print $1 ": " $5}' /etc/passwd

  produces the following:

    jroberts: Johnnie C. Roberts
    mg96fn: Sheets, John R.
    jme: John M Enger
    twzcb4: Johnson Patrick W
    smj: Stephen M Johnson
    jherb: John W Herbold Jr

  There is still a regular expression in the mix, but it is simply /John/ .

EXAMPLE 3. Get rid of all blank lines in a file. (Illustrates ^, $, [...] and *)

    grep -v '^[ ]*$' file

The elements of the command are as follows:

- The -v option tells grep to print only lines that do not match.
- ^ anchors to the beginning of the line.
- [ ] has a blank and a tab in it.
- *, so match any number of blanks and tabs and
- $, we must be at the end of the line.
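Because a literal tab inside the brackets is invisible on the page, Example 3 is easy to mistype. One way around that, sketched here with made-up input, is to build the tab with printf and splice it into the pattern; the variable name tab is an assumption of this sketch.

```shell
# Hold a real tab character in a variable so it is visible in the script.
tab=$(printf '\t')

# Hypothetical input: a text line, an empty line, a line holding only
# blanks and a tab, and another text line.
kept=$(printf 'first\n\n \t \nsecond\n' | grep -v "^[ ${tab}]*\$")

echo "$kept"
```

The pattern is in double quotes so that the shell expands ${tab}; the \$ keeps the end-of-line anchor literal.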
A. THE PROBLEM
Sometimes you create numbered lists, but later you need to insert a new item somewhere in the middle of the list. In the example below, we have left out William Henry (Harrison), who should be number 9. The problem is to increment the number of every item > 8. Here is the file:
    This is testawk.1; it contains some numbered items.
    Presidents:
    1. George
    2. John
    3. Thomas
    4. James
    5. James
    6. John Quincy
    7. Andrew
    8. Martin
    9. John
    10. James K
    11. Zachary
    12. Millard
    13. Franklin
    14. James
    15. Abraham
    Oops! We left out William Henry!
B. THE SOLUTION
The command

    sed 's/[0-9][0-9]*/:&:/' testawk.1 |
    awk -F: '$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 } \
    { print $1 $2 $3 }' >> talk.5

added the following result to this file (talk.5):

    This is testawk.1; it contains some numbered items.
    Presidents
    1. George
    2. John
    3. Thomas
    4. James
    5. James
    6. John Quincy
    7. Andrew
    8. Martin
    10. John
    11. James K
    12. Zachary
    13. Millard
    14. Franklin
    15. James
    16. Abraham
    Oops! We left out William Henry!
C. THE sed COMMAND
    sed 's/[0-9][0-9]*/:&:/' testawk.1

- finds the FIRST occurrence of a number in each line, matching the pattern [0-9][0-9]* ;
- substitutes by surrounding the entire pattern that was matched (this is what the & means) with colons.

The purpose is to fence the number off in what awk will believe is its own subfield. The output of the sed command by itself is shown below:

    This is testawk.:1:; it contains some numbered items.
    Presidents:
    :1:. George
    :2:. John
    :3:. Thomas
    :4:. James
    :5:. James
    :6:. John Quincy
    :7:. Andrew
    :8:. Martin
    :9:. John
    :10:. James K
    :11:. Zachary
    :12:. Millard
    :13:. Franklin
    :14:. James
    :15:. Abraham

You should notice the potential for mistakes implied by the colons in the first line. IT IS POSSIBLE, HOWEVER, TO RESTRICT THE CHANGES TO AN ARBITRARY RANGE OF LINES.
D. THE awk COMMAND
    awk -F: '$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 } \
    { print $1 $2 $3 }'

Awk is a pattern-action language. An awk program is a sequence of lines of the form

    $2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 }
    { print $1 $2 $3 }

For each line of its input, awk runs through the patterns in sequence. If the line matches a pattern, awk performs the corresponding action. Thus in plain English, the program above says
If the second field is a number and it's > 8, then increment the second field. Print the first, second and third fields of every line.
The \ at the end of the first line is immediately followed by a carriage return. In other words, it indicates an escaped return. This indicates to the awk command that there is more to come...
How does awk know where the fields begin and end? We provided, as a field separator, the character ':', and we told awk about it by using the -F option.
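The whole pipeline can be tried on a few items without touching testawk.1. This is a small sketch on four made-up list lines; the variable name renumbered is an assumption of the sketch, and the two awk rules are separated by a newline instead of the escaped return used above.

```shell
# Hypothetical excerpt of a numbered list, fed through the same sed and awk.
renumbered=$(printf '%s\n' '7. Andrew' '8. Martin' '9. John' '10. James K' |
  sed 's/[0-9][0-9]*/:&:/' |
  awk -F: '$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 }
           { print $1 $2 $3 }')

echo "$renumbered"
```

Items 7 and 8 pass through unchanged, while 9 and 10 become 10 and 11, leaving a gap for the missing entry.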
E. AUTOMATING THE COMMAND
If this happens a lot to us, we would want to place a general form of the complex command in an executable file. Here is the file (awk.1)
    sed 's/[0-9][0-9]*/:&:/' $1 |
    awk -F: 'BEGIN { }\
    $2 ~ /[0-9][0-9]*/ && $2 > LOW { $2 = $2 + 1} \
    {print $1 $2 $3 }' LOW=$2

and here is the result of executing

    awk.1 testawk.1 5

    This is testawk.1; it contains some numbered items.
    Presidents
    1. George
    2. John
    3. Thomas
    4. James
    5. James
    7. John Quincy
    8. Andrew
    9. Martin
    10. John
    11. James K
    12. Zachary
    13. Millard
    14. Franklin
    15. James
    16. Abraham
    Oops! We left out William Henry!
The BEGIN pattern is there for an awk technical reason (later!). The LOW= on the command line sets a variable named LOW to the second argument of the awk.1 command, and then LOW is available for use within the awk program, but (here is the technical reason) only after any BEGIN action has run, that is, from the moment awk starts reading its input.
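The timing of the LOW= assignment can be observed directly. The sketch below compares the operand form used in awk.1 with the -v option, which is not used in these notes but is part of POSIX awk and sets the variable before BEGIN runs. The variable names operand and option are assumptions of the sketch.

```shell
# Assignment given as an operand after the program: not yet set in BEGIN.
operand=$(echo x | awk 'BEGIN { printf "BEGIN sees [%s]; ", LOW }
                        { print "main sees [" LOW "]" }' LOW=5)

# Assignment given with -v: already set when BEGIN runs.
option=$(echo x | awk -v LOW=5 'BEGIN { printf "BEGIN sees [%s]; ", LOW }
                                { print "main sees [" LOW "]" }')

echo "$operand"
echo "$option"
```

Since the renumbering program only uses LOW in its main rule, the operand form is sufficient there.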
THE PROBLEM
Here is a file (testfile.4):
    testfile.4
    Is this the Isis that is? Does
    is come before was?
    What is the is-ness of isinglass?
    We say "is" when we mean is?
Now, I would like to change each occurrence of the word "is" to "was". Of course, I do not want to change Isis to Waswas. So a first effort misses in several ways:
    sed 's/is/was/g' testfile.4

    testfile.4
    Is thwas the Iswas that was? Does
    was come before was?
    What was the was-ness of wasinglass?
    We say "was" when we mean was?
CATCH THE WORD "is", BUT NOT WORDS THAT "is" IS A SUBSTRING OF
We need to fence off the word "is" in our patterns, so we might try
    sed 's/[^A-Za-z]is[^A-Za-z]/was/g' testfile.4

where we are looking for characters that might precede and follow "is". The results are not satisfactory:

    testfile.4
    Is this the Isis thatwas Does
    is come before was?
    Whatwasthewasness of isinglass?
    We say was when we meanwas

The trouble is that we did not remember the characters that preceded and followed "is", so we correct the command to

    sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' testfile.4

    testfile.4
    Is this the Isis that was? Does
    is come before was?
    What was the was-ness of isinglass?
    We say "was" when we mean was?
BEGINNINGS AND ENDINGS
Our command will not catch the word "is" at either the beginning or the ending of a line, because there are no characters there to match part of the pattern. The easiest way to fix this is to use pipes to add blanks at the beginning and end of each line, then do the substitution, and then strip the beginning and ending blanks.
    sed 's/^/ /' testfile.4 |
    sed 's/$/ /' |
    sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' |
    sed 's/^ //' |
    sed 's/ $//'

    testfile.4
    Is this the Isis that was? Does
    was come before was?
    What was the was-ness of isinglass?
    We say "was" when we mean was?
TWO FORMS OF THE WORD
It should now be clear that to catch the form "Is", we need only add one more pipe:
    sed 's/^/ /' testfile.4 |
    sed 's/$/ /' |
    sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' |
    sed 's/\([^A-Za-z]\)Is\([^A-Za-z]\)/\1Was\2/g' |
    sed 's/^ //' |
    sed 's/ $//'
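The same sequence of edits can also be given to a single sed process with repeated -e options, since sed applies its commands in order to each line. This is a sketch of that variant, not the form used above; it runs on two lines copied from testfile.4, and the variable name changed is an assumption.

```shell
# Pad each line with a blank at both ends, replace both forms of the word,
# then remove the padding again, all in one sed invocation.
changed=$(printf '%s\n' 'Is this the Isis that is? Does' 'is come before was?' |
  sed -e 's/^/ /' -e 's/$/ /' \
      -e 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' \
      -e 's/\([^A-Za-z]\)Is\([^A-Za-z]\)/\1Was\2/g' \
      -e 's/^ //' -e 's/ $//')

echo "$changed"
```

One limitation carries over from the pipe version: in a run such as "is is", the blank between the two words is consumed by the first match, so the second "is" is not replaced.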
PATTERNS AND vi
Probably it is easier to do this kind of thing from vi, which also knows about patterns. While in vi, try something like
    :g/^/s/\([^a-z]\)is\([^a-z]\)/\1was\2/g

This does a global command g on every line (/^/ matches every line), matching and replacing as in sed.

In fact, vi possesses a special way to force matches of words. \< forces the match to begin only at the beginning of a word, that is, on a letter, digit or underline character that is either at the beginning of a line or follows a character that is not one of those. \> likewise forces the match to end at the end of a word. So the following, within vi, is certainly better than the long pipe above:
:g/^/s/\<is\>/was/g
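Some versions of sed accept the same word anchors. The sketch below assumes GNU sed, where \< and \> are a documented extension; they are not part of POSIX sed, so other implementations may reject or ignore them. The input line is made up.

```shell
# Assumes GNU sed: \< and \> match at the start and end of a word.
words=$(printf '%s\n' 'is this the Isis that is?' | sed 's/\<is\>/was/g')

echo "$words"
```

With these anchors no padding blanks are needed, and "this" and "Isis" are left alone.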
sed COMMAND LINE SYNTAX
The syntax for invoking sed has the following two forms:
    sed [options] 'command' files
    sed [options] -f scriptfile files
THE -n OPTION
With the -n option, only lines specified with the p command or p flag of an s command are output. Here is an example:
    # Determine the number of occurrences of the string 'pipe' in file pipe2.c :
    sed -n 's/pipe/pipe\
    /gp' pipe2.c | sed -n '/pipe/p' | wc -l
    # In the first line, the \ is an escape for the newline character, that is,
    # the \ must be the last character on the line.

THE -f OPTION
The argument following -f is a script file, where sed will find a sequence of editing commands. All of the editing commands in a script are applied in sequence to each line of input. Addresses embedded in the commands may restrict the lines to which a command is applied.
Example: the following script will

(1) Remove all leading blanks from each line
(2) Remove all completely blank lines
(3) Reduce strings of multiple blanks and white space to single blanks.

    s/^[ ]*//
    /^$/d
    s/[ ][ ]*/ /g

Each bracket contains a blank and a tab character.

Example: The following script will change strings representing integers so that commas are inserted in the expected places.

    :loop
    s/\([0-9]\)\([0-9][0-9][0-9][^0-9]\)/\1,\2/g
    s/\([0-9]\)\([0-9][0-9][0-9]\)$/\1,\2/
    /[0-9][0-9][0-9][0-9]/b loop

Notes:

(1) If this script resides in a file commize, and we wish to edit file letter, the invocation would be

    sed -f commize letter
(2) :loop is a label, which can be branched to, using the b command. Sed is a programming language.
(3) The logic is as follows. If there are four digits in a row, followed by either a nondigit or end-of-line, then insert a comma after the first digit. If there are still any sequences of 4 digits on the line, branch back to :loop. Note that the reason for the loop is that commas must be inserted from right to left.
(4) The next line is not read until we reach the end of the script. The substitutions continue on the pattern space, which is the current line as altered by previous sed commands.

Exercise: If the script is

    s/dog/cat/g
    s/cat/horse/g

will there be any cats in the output?

GENERAL SYNTAX OF sed COMMANDS

    [address][,address]command

ADDRESSES
An address may be either a line number, a pattern, or $. Line numbers refer to the whole of the input, not to individual files. That is, if you are editing a number of files at one sed command, there is only one line 1.
Example: The following script leaves lines 1-5 alone, but deletes leading blanks on all remaining lines:
    6,$ s/^[ ]*//

Example: Let us consider the effect of two variant commands on a file. Here is the file:

    testfile.4
    Is this the Isis that is? Does
    was come before was?
    What is the is-ness of isinglass?
    We say "is" when we mean is?

First, here is the output of the command

    sed -n '2,3 p' testfile.4

    Is this the Isis that is? Does
    was come before was?

You will note the effect of the -n option, which causes only matching lines to be printed. Now we show the output of

    sed '2,3 p' testfile.4

    testfile.4
    Is this the Isis that is? Does
    Is this the Isis that is? Does
    was come before was?
    was come before was?
    What is the is-ness of isinglass?
    We say "is" when we mean is?

Example: Here is another file

    testfile.6 go 1 2 3 stop 4 5 go 6 7 8 9 stop 10 11 go 12 13

And here is the effect of the command

    sed '/go/,/stop/ d' testfile.6

    testfile.6 4 5 10 11

Thus any range of lines can be selected either by the line numbers or by pattern matching.
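A pattern range can be tried on a short made-up input with one token per line. This sketch is not testfile.6 itself; the variable names are assumptions. It also shows that a range whose closing pattern never appears runs to the end of the input.

```shell
# Hypothetical input: the second "go" has no matching "stop".
input=$(printf '%s\n' header go 1 2 stop 3 go 4)

# Delete every go..stop range.
deleted=$(printf '%s\n' "$input" | sed '/go/,/stop/d')

# Print only the go..stop ranges.
printed=$(printf '%s\n' "$input" | sed -n '/go/,/stop/p')

printf 'deleted:\n%s\n\nprinted:\n%s\n' "$deleted" "$printed"
```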
Example: Delete up to the first blank line (e.g. get rid of the mail header in a message saved in a file):
    sed '1,/^$/ d' file

REVERSING THE SENSE OF A MATCH
An exclamation mark following an address reverses the sense of a match. Here is the output from

    sed '/go/,/stop/!d' testfile.6

    go 1 2 3 stop go 6 7 8 9 stop go 12 13

Exercise: Indent all lines that do not begin with a dot.
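One possible answer to the exercise, offered as a sketch rather than the intended solution: negate the address that matches a leading dot, and substitute four blanks for the beginning of every other line. The indent width and the sample lines are assumptions.

```shell
# Lines starting with a dot are skipped by the ! ; all others get indented.
indented=$(printf '%s\n' '.ce' 'plain text' '.bf gothic' | sed '/^\./!s/^/    /')

echo "$indented"
```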
sed COMMANDS
SUBSTITUTION
The command syntax is
    [address]s/pattern/replacement/flags

where the flags that modify the substitution are

    nnn     a number from 1 to 512 indicates that only the nnnth instance of the pattern should be replaced.
    g       change all occurrences in the pattern space
    p       print the contents of the pattern space
    w file  write the contents of the pattern space to the indicated file.

Example: Replace the second tab character on each line with a newline:

    sed 's/ /\
    /2' file

Here the / / contain an embedded tab character, and the newline must be escaped.

Example: Here is the file sed.2
    /is/ {
    w changes
    s/is/boo/w changes
    }
    s/.*://

Here is an input file:

    testfile.4
    Is this the Isis that is? Does
    was come before was?
    What is the is-ness of isinglass?
    We say "is" when we mean is?

We now do the command

    grep -n $ testfile.4 | sed -f sed.2

The command changes the first is on any line:

    testfile.4
    Is thboo the Isis that is? Does
    was come before was?
    What boo the is-ness of isinglass?
    We say "boo" when we mean is?

The file changes is very interesting, for it contains the before and after states of each changed line:

    2:Is this the Isis that is? Does
    2:Is thboo the Isis that is? Does
    4:What is the is-ness of isinglass?
    4:What boo the is-ness of isinglass?
    5:We say "is" when we mean is?
    5:We say "boo" when we mean is?

Now here are some details. First, the grep command just adds line numbers to each line of the input file:

    grep -n $ testfile.4

    1:testfile.4
    2:Is this the Isis that is? Does
    3:was come before was?
    4:What is the is-ness of isinglass?
    5:We say "is" when we mean is?

Then the commands in sed.2 act like this. The braces group commands, so that for any line matching /is/, all the commands in braces will be executed. So we write out the before state of each matching line, and then make the substitution and write the after state. Finally, for all lines, we strip off the line numbers added by grep.
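The before-and-after log can be reproduced in a scratch directory. In this sketch the file names input.txt and edit.sed and the two input lines are made up; the script body is the one shown for sed.2.

```shell
# Work in a scratch directory so the "changes" file lands somewhere disposable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-line input; only the first line contains "is".
printf '%s\n' 'this one' 'no match' > input.txt

cat > edit.sed <<'EOF'
/is/ {
w changes
s/is/boo/w changes
}
s/.*://
EOF

# Number the lines with grep, then edit and log with sed.
result=$(grep -n '$' input.txt | sed -f edit.sed)
changes=$(cat changes)

printf 'output:\n%s\n\nchanges file:\n%s\n' "$result" "$changes"
```

Quoting the $ in the grep pattern is not required here, but it keeps the shell from ever treating it as the start of a variable name.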
REMARK. Of course, it's easier to use the diff utility to compare files after an editing session. If we do
    sed 's/is/boo/' testfile.4 > testfile.4a

then the command

    diff testfile.4 testfile.4a

produces a set of actions that would transform the first file into the second. Here is the output of the diff command:

    2c2
    < Is this the Isis that is? Does
    ---
    > Is thboo the Isis that is? Does
    4,5c4,5
    < What is the is-ness of isinglass?
    < We say "is" when we mean is?
    ---
    > What boo the is-ness of isinglass?
    > We say "boo" when we mean is?
OTHER sed FEATURES
APPEND AND INSERT
    [line address]a\
    text line 1\
    text line 2\
    text line 3\
    ...
    text line n

will append the text lines to the output after the addressed line.

Example: double spacing

    sed 'a\
    ' testfile.4

produces this output:

    testfile.4

    Is this the Isis that is? Does

    was come before was?

    What is the is-ness of isinglass?

    We say "is" when we mean is?

Insertion (the i command) is similar and places the text before the addressed line.
CHANGE
    [address]c\
    text

replaces the addressed line range by the supplied text.

TRANSFORM (y)

    [address]y/ab..c/xy..z/

replaces character a by character x, character b by character y, ... , character c by character z. Alas, y has no knowledge of character ranges, so conversion from lower to upper case is accomplished with two strings of 26 characters.

PRINT (p)
Outputs the contents of the pattern space.
PRINT LINE NUMBER
    [line address] =

prints the line number of the matching line.

NEXT (n)

    [address]n

outputs the pattern space and then reads the next line of input without returning to the top of the script.

Example: Delete one blank line following any line that begins with

    Roman-numeral.

but delete no other blank lines in the file. The script is:

    /^[IVXLCMD][IVXLCMD]*\./{
    n
    /^$/d
    }

READING AND WRITING FILES
    [line-address]r file

reads the contents of file into the pattern space after the addressed line.

    [address]w file

writes the pattern space to file.

Example: Replace the line
throughout a file by the contents of a file common.for - the script is

    /^/r common.for
    /^ /d

Example: Suppose a data file contains names of students and grades, like this:
    Apt Abner:100
    Bright Betty:98
    Churl Charles:45
    Dull Dora:67

We extract A's, B's, C's, D's and F's to separate files. The script is

    /[^:]*:100/w grade.a
    /[^:]*:9[0-9]/w grade.a
    /[^:]*:8[0-9]/w grade.b
    /[^:]*:7[0-9]/w grade.c
    /[^:]*:6[0-9]/w grade.d
    /[^:]*:[^6-9][0-9]$/w grade.f
    /[^:]*:[0-9]$/w grade.f

QUIT (q)
    [line-address]q

will terminate the script when a matching line is reached.

Example: Copy a file down to Roman numeral XIII :

    sed '/XIII/q' file

This is a little tricky, because the line with XIII in it will be printed before sed quits. If you don't want it, you need to do a little more work. With -n nothing is printed automatically, so we quit first and print only the lines that get past the q (the two commands go on separate lines):

    sed -n '/XIII/q
    p' file
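The difference between the two forms of the quit example shows up on a four-line input. This sketch uses made-up lines and passes the two commands with separate -e options, which is equivalent to putting them on separate lines of a script; the variable names are assumptions.

```shell
# Hypothetical input: one Roman numeral per line.
numerals=$(printf '%s\n' XI XII XIII XIV)

# Default printing: the matching line is printed before sed quits.
with_match=$(printf '%s\n' "$numerals" | sed '/XIII/q')

# With -n, quit comes before the explicit p, so the matching line is dropped.
without_match=$(printf '%s\n' "$numerals" | sed -n -e '/XIII/q' -e 'p')

printf 'with:\n%s\n\nwithout:\n%s\n' "$with_match" "$without_match"
```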