Regular expressions and commands that use them

SOME ILLUSTRATIVE USES OF sed

1. STRIP ALL LEADING BLANKS FROM EVERY LINE OF A FILE

   sed  's/^ *//' testfile.1 > testfile.new

BEFORE:

This is testfile.1
      This is a test file with enough amazing variety in it
.ce
.bf hpss12
     so that we can play with the amazing noninteractive 
                        stream editor.
.ju
 It has some amazing lines beginning with amazing dots (.) that might represent
       amazing formatting commands.
.bf gothic
   Here       are some     amazing lines
     with    lots    of amazing   extra     spaces
      to   play   with  amazingly       .


AFTER:


This is testfile.1
This is a test file with enough amazing variety in it
.ce
.bf hpss12
so that we can play with the amazing noninteractive 
stream editor.
.ju
It has some amazing lines beginning with amazing dots (.) that might represent
amazing formatting commands.
.bf gothic
Here       are some     amazing lines
with    lots    of amazing   extra     spaces
to   play   with  amazingly       .

2. DELETE EVERY LINE THAT BEGINS WITH A DOT

sed '/^\./d' testfile.1 > testfile.new

AFTER:


This is testfile.1
      This is a test file with enough amazing variety in it
     so that we can play with the amazing noninteractive 
                        stream editor.
 It has some amazing lines beginning with amazing dots (.) that might represent
       amazing formatting commands.
   Here       are some     amazing lines
     with    lots    of amazing   extra     spaces
      to   play   with  amazingly       .

3. REPLACE ALL STRINGS OF BLANKS BY SINGLE BLANKS.

sed 's/  */ /g' testfile.1 > testfile.new

AFTER:
 
This is testfile.1
 This is a test file with enough amazing variety in it
.ce
.bf hpss12
 so that we can play with the amazing noninteractive 
 stream editor.
.ju
 It has some amazing lines beginning with amazing dots (.) that might represent
 amazing formatting commands.
.bf gothic
 Here are some amazing lines
 with lots of amazing extra spaces
 to play with amazingly .

4. DO #3 FOR EVERY FILE IN A COLLECTION AND PLACE THE RESULTS IN A SUBDIRECTORY:

mkdir lower
for FILE in testfile.*
do
sed 's/  */ /g' "$FILE" > "lower/$FILE"
done

REGULAR EXPRESSIONS

The elements of grep and egrep regular expressions are listed below in
decreasing order of precedence. sed has the same matching capabilities as grep.

1. c           Any non-special character matches itself.

Example: grep '^ab' file
will find all lines that start with the letters ab.

2. \c         Turn off special meaning of character c.

Example: grep '\*' file
will print all lines with the '*' character in them. Since '*' is a character
with special meaning to grep, it is necessary to escape it.

3. ^         Beginning of line

Example: 
sed '/^\/\*/d' file
will delete from file all lines that start with the string "/*". Note that
since '/' and '*' both have special meaning to sed, they are both escaped.

4. $         End of line

Example:
sed 's/$/./' file
will put a period at the end of every line, even blank lines.

5. .         Any single character

Example:
sed 's/.$/./' file
will not put a period on empty lines, because . must match a character.
Unfortunately it also replaces the last character of every other line with
the period. For the cure, see \n below. 
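
The cure can be sketched with a tagged expression: remember the last character with \(.\) and put it back with \1 in front of the period. The three-line input below is made up for illustration.

```shell
# Append a period to every non-empty line without losing its last character.
# \(.\)$ tags the final character; \1 puts it back before the new period.
result=$(printf 'first line\n\nlast line\n' | sed 's/\(.\)$/\1./')
printf '%s\n' "$result"
```

The empty middle line is left alone, because . still has to match one character.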

6. [...]    Any single character in the range

Example:
grep -n '[0-9]*\.[0-9][0-9]*' file
will match any fixed point number like 4.566 .
Note the escape of '.'.

7. [^...]   Any single character not in the range.

Example:
grep -n 'int  *[a-zA-Z_][^=]*(' nim.c
should find all functions of type int in the file nim.c

8. \n      What the nth \(...\) matched  - grep and sed only.
Example: grep '\([a-z]\)\1' file
will find all doubled lower case letters (like 'tt') in file.

9. r*     0 or more occurrences of expression r

10. r+    1 or more occurrences of expression r (egrep only)

11. r?    0 or 1 occurrences of r (egrep only)

12. r1r2  regular expression 1 followed by regular expression 2

13. r1|r2  regular expression 1 or regular expression 2 (egrep only)

14. \(...\) Tagged regular expression (see \n above) (grep and sed only)

15. (r)     Parentheses can be used for grouping (egrep only)

Example:
egrep -n '(yes|no)$' file
will find any "yes" or "no" at the end of a line. How would
'yes|no$' be different?

REGULAR EXPRESSION REMARKS

1. COMMON ERROR   -  c* means 0 or more c's, not 1 or more.

EXAMPLE:

sed 's/ */ /g' testfile.1       #One blank instead of two

OUTPUT:


 T h i s i s t e s t f i l e . 1 
 T h i s i s a t e s t f i l e w i t h e n o u g h a m a z i n g v a r i e t y i n i t 
 . c e 
 . b f h p s s 1 2 
 s o t h a t w e c a n p l a y w i t h t h e a m a z i n g n o n i n t e r a c t i v e 
 s t r e a m e d i t o r . 
 . j u 
 I t h a s s o m e a m a z i n g l i n e s b e g i n n i n g w i t h a m a z i n g d o t s ( . ) t h a t m i g h t r e p r e s e n t 
 a m a z i n g f o r m a t t i n g c o m m a n d s . 
 . b f g o t h i c 
 H e r e a r e s o m e a m a z i n g l i n e s 
 w i t h l o t s o f a m a z i n g e x t r a s p a c e s 
 t o p l a y w i t h a m a z i n g l y . 
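
The same mistake can be seen on a single line. This is a small sketch with a made-up input line; the variable names wrong and right are only for illustration.

```shell
# Hypothetical one-line input with runs of two and three blanks.
line='so  many   blanks'

# One blank before the star: ' *' also matches the empty string,
# so a blank is inserted at places where there was none.
wrong=$(printf '%s\n' "$line" | sed 's/ */ /g')

# Two blanks before the star: at least one blank is required.
right=$(printf '%s\n' "$line" | sed 's/  */ /g')

printf 'wrong: [%s]\nright: [%s]\n' "$wrong" "$right"
```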
 
2.  EXTENT OF THE MATCH
    
    The match is always maximal, that is, if string1 and string2 both match,
and string1 is a prefix of string2, then any match or replacement affects
all of string2.
   Example: 

This is testfile.2
It, too, has some 
      lines with leading blanks and some "quoted strings"
"of various"  kinds and "flavors"
""
including "empty" and  ""!!!
"I'm dying", he croaked.

Now we might think that the following command would delete all the quoted 
strings:

sed 's/".*"//g' testfile.2

Here is the result:

This is testfile.2
It, too, has some 
      lines with leading blanks and some 


including !!!
, he croaked.

For the effect that we want, we should say

sed 's/"[^"]*"//g' testfile.2

to limit the extent of the match. Here is the output:


This is testfile.2
It, too, has some 
      lines with leading blanks and some 
  kinds and 

including  and  !!!
, he croaked.

REGULAR EXPRESSION EXAMPLES

I will try to show useful examples of every element, but I will not try to keep them 'pure'.

EXAMPLE 1. List all files in one of my directories that are readable by all 
system users. (Illustrates ^ and .)

To do this, we first need to look at the output of a typical ls -l. This one
lists one of my directories, namely polish:

ls -l polish

The output is:

total 102
-rw-------   1 rossa    group        697 Sep 17 07:53 func.ex
-rw-r--r--  18 rossa    group       1424 Sep 16 13:00 getop.c
-rw-------   1 rossa    group       1204 Sep 17 07:26 getop.o
-rw-r--r--  17 rossa    group        296 Sep 17 07:13 makefile
-rwx------   1 rossa    group      35711 Sep 17 07:27 polish
-rw-r--r--   1 rossa    group       2807 Sep 16 13:06 polish.c
-rw-r--r--  18 rossa    group        137 Sep 17 07:26 polish.h
-rw-------   1 rossa    group       1540 Sep 17 07:26 polish.o
-rw-r--r--  18 rossa    group        877 Sep 17 07:27 pushpop.c
-rw-------   1 rossa    group       1112 Sep 17 07:27 pushpop.o

We are looking for 'r' in the 8th character position, so we just say

ls -l polish | grep '^.......r'

We anchor at the far left of the line, skip any 7 characters, and then ask 
for an 'r'. Here is the output:

-rw-r--r--  18 rossa    group       1424 Sep 16 13:00 getop.c
-rw-r--r--  17 rossa    group        296 Sep 17 07:13 makefile
-rw-r--r--   1 rossa    group       2807 Sep 16 13:06 polish.c
-rw-r--r--  18 rossa    group        137 Sep 17 07:26 polish.h
-rw-r--r--  18 rossa    group        877 Sep 17 07:27 pushpop.c
 
EXAMPLE 2. Well, it's somebody named John ...
(Illustrates [^...] and *)

I ran the following command on quapaw:

grep '^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*John[^:]*' /etc/passwd

and here is the result:


jroberts:9pagMBuiO.pZ6:281:79:Johnnie C. Roberts:/usr/facstaff/jroberts:/bin/csh
mg96fn:39AzGSMvUeHjU:310:99:Sheets, John R.:/usr/students/mg96fn:/bin/csh
jme:rlOoe8FXdD4LE:332:144:John M Enger:/usr/facstaff/jme:/bin/csh
twzcb4:oosOkbghaM1Jc:359:99:Johnson Patrick W:/usr/students/twzcb4:/bin/csh
smj:H2gk.4YzqZnf.:375:149:Stephen M Johnson:/usr/facstaff/smj:/bin/csh
jherb:YvNfFL9F2JqiE:434:99:John W Herbold Jr:/usr/students/jherb:/bin/csh

REMARKS:
   - This is overkill - grep John /etc/passwd would usually give the same 
     results. 
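   - However, we have shown an example of matching in a particular subfield.
     We skip to the correct field by repeating a 'no colons, then colon'
     construct. For the fields to be counted from the start of the line, the
     pattern should begin with ^; without it, the match may start at any
     field, so John in a later field would also be reported.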
   - But, even so, this is still inelegant; an awk solution is better:
     
     awk -F: '$5 ~ /John/ {print $1 ": " $5}' /etc/passwd

     produces the following:

jroberts: Johnnie C. Roberts
mg96fn: Sheets, John R.
jme: John M Enger
twzcb4: Johnson Patrick W
smj: Stephen M Johnson
jherb: John W Herbold Jr

     There is still a regular expression in the mix, but it is simply /John/ .
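
The 'no colons, then colon' construct counts fields from the start of the line only when the pattern is anchored with ^. Here is a small sketch of the difference, using two made-up passwd-style lines (the user names and paths are invented):

```shell
# John appears in field 6 of the first line and in field 5 of the second.
sample='msmith:x:101:99:Mary Smith:/home/John:/bin/sh
jdoe:x:102:99:John Doe:/home/jdoe:/bin/sh'

# Unanchored: the pattern may start counting at any field,
# so both lines are counted.
loose=$(printf '%s\n' "$sample" | grep -c '[^:]*:[^:]*:[^:]*:[^:]*:[^:]*John')

# Anchored with ^: the four colons are counted from the start of the line,
# so only a John in field 5 is found.
strict=$(printf '%s\n' "$sample" | grep '^[^:]*:[^:]*:[^:]*:[^:]*:[^:]*John')

printf 'unanchored matches: %s\nanchored match: %s\n' "$loose" "$strict"
```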

EXAMPLE 3. Get rid of all blank lines in a file.
(Illustrates ^, $, [...] and *)

grep -v '^[ 	]*$' file

The elements of the command are as follows:
   -   The -v option tells grep to print only lines that do not match.
   -   ^ anchors to the beginning of the line.
   -   [ 	] has a blank and a tab in it.
   -   *, so match any number of blanks and tabs and
   -   $, we must be at the end of the line.
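
The same pattern works as a sed address with the d command. The sketch below builds the tab with printf so that it survives copy and paste; the input is made up.

```shell
# A literal tab character.
tab=$(printf '\t')

# Made-up input: line 2 is empty, line 3 holds a blank and a tab.
input=$(printf 'one\n\n \t\ntwo')

# grep -v prints the lines that do not match; sed deletes the ones that do.
with_grep=$(printf '%s\n' "$input" | grep -v "^[ $tab]*\$")
with_sed=$(printf '%s\n' "$input" | sed "/^[ $tab]*\$/d")

printf '%s\n' "$with_grep"
```

Both commands should leave only the lines one and two.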

AN EXAMPLE SHOWING THE POWER OF sed AND awk.

A. THE PROBLEM

Sometimes you create numbered lists, but later you need to insert a new item somewhere in the middle of the list. In the example below, we have left out William Henry (Harrison), who should be number 9. The problem is to increment the number of every item > 8. Here is the file:

This is testawk.1; it contains some numbered items.

Presidents:

   1. George
   2. John
   3. Thomas
   4. James
   5. James
   6. John Quincy
   7. Andrew
   8. Martin
   9. John
   10. James K
   11. Zachary
   12. Millard
   13. Franklin
   14. James
   15. Abraham


Oops! We left out William Henry!

B. THE SOLUTION

The command

sed 's/[0-9][0-9]*/:&:/' testawk.1 |
awk -F: '$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 } \
   { print $1 $2 $3 }'  >> talk.5
added the following result to this file (talk.5):
This is testawk.1; it contains some numbered items.

Presidents

   1. George
   2. John
   3. Thomas
   4. James
   5. James
   6. John Quincy
   7. Andrew
   8. Martin
   10. John
   11. James K
   12. Zachary
   13. Millard
   14. Franklin
   15. James
   16. Abraham


Oops! We left out William Henry!

C. THE sed COMMAND

sed 's/[0-9][0-9]*/:&:/' testawk.1 

   - finds the FIRST occurrence of a number in each line, matching
     the pattern [0-9][0-9]* ;
   
   - substitutes by surrounding the entire pattern that was matched 
     (this is what the & means) with colons.
 
The purpose is to fence the number off in what awk will believe is its
own subfield.

The output of the sed command by itself is shown below:


This is testawk.:1:; it contains some numbered items.

Presidents:

   :1:. George
   :2:. John
   :3:. Thomas
   :4:. James
   :5:. James
   :6:. John Quincy
   :7:. Andrew
   :8:. Martin
   :9:. John
   :10:. James K
   :11:. Zachary
   :12:. Millard
   :13:. Franklin
   :14:. James
   :15:. Abraham

You should notice the potential for mistakes implied by the colons in the first line. IT IS POSSIBLE, HOWEVER, TO RESTRICT THE CHANGES TO AN ARBITRARY RANGE OF LINES.
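
One way to restrict the changes is to put a line-number range in front of the s command. This is a sketch on a made-up four-line file; the range 3,$ is an assumption that fits this sample, not the right range for testawk.1.

```shell
# Made-up file: a title containing a digit, a blank line, then a numbered list.
input=$(printf 'This is list.1\n\n   1. George\n   2. John')

# The address 3,$ limits the substitution to line 3 through the last line,
# so the digit in the title is left alone.
fenced=$(printf '%s\n' "$input" | sed '3,$s/[0-9][0-9]*/:&:/')
printf '%s\n' "$fenced"
```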

D. THE awk COMMAND

awk -F: '$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 } \
   { print $1 $2 $3 }'
Awk is a pattern-action language. An awk program is a sequence of lines of the form
pattern { actions }
so that we could indeed place the program part of the above in a file:
$2 ~ /[0-9][0-9]*/ && $2 > 8 { $2 = $2 + 1 }
   { print $1 $2 $3 }
For each line of its input, awk runs through the patterns in sequence. Whenever the line matches a pattern, awk performs the corresponding action; an action with no pattern is performed for every line. Thus, in plain English, the program above says

If the second field is a number and it's > 8, then increment the second field. Print the first, second and third fields of every line.

The \ at the end of the first line is immediately followed by a newline. In other words, it is an escaped newline, which indicates that there is more of the awk program to come on the next line.

How does awk know where the fields begin and end? We provided, as a field separator, the character ':', and we told awk about it by using the -F option.

E. AUTOMATING THE COMMAND

If this happens a lot to us, we would want to place a general form of the complex command in an executable file. Here is the file (awk.1)

sed 's/[0-9][0-9]*/:&:/' $1 |
awk -F: 'BEGIN  { }\
$2 ~ /[0-9][0-9]*/ && $2 > LOW { $2 = $2 + 1} \
   {print $1 $2 $3 }' LOW=$2
and here is the result of executing
awk.1 testawk.1 5


This is testawk.1; it contains some numbered items.

Presidents

   1. George
   2. John
   3. Thomas
   4. James
   5. James
   7. John Quincy
   8. Andrew
   9. Martin
   10. John
   11. James K
   12. Zachary
   13. Millard
   14. Franklin
   15. James
   16. Abraham


Oops! We left out William Henry!

The BEGIN pattern is there for an awk technical reason (later!). The LOW= on the command line sets a variable named LOW to the second argument of the awk.1 command, and LOW is then available for use within the awk program. The timing matters: a command-line assignment takes effect only after any BEGIN action has run, just before awk begins reading its input, so LOW can be used in the main rules but not inside BEGIN.
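
A variant of the same idea can be sketched as a shell function that passes the limit with awk's -v option, which sets the variable before the program starts. The function name renumber is hypothetical, and the test regular expression is anchored here (a small tightening of the one used above) so that $2 must consist of digits only.

```shell
# renumber FILE LOW (hypothetical helper): add 1 to the first number on each
# line of FILE when that number is greater than LOW.
renumber() {
    sed 's/[0-9][0-9]*/:&:/' "$1" |
    awk -F: -v LOW="$2" '
        $2 ~ /^[0-9][0-9]*$/ && $2 > LOW { $2 = $2 + 1 }
        { print $1 $2 $3 }'
}

# Try it on a made-up three-item list.
tmp=$(mktemp)
printf '   8. Martin\n   9. John\n   10. James K\n' > "$tmp"
renumbered=$(renumber "$tmp" 8)
rm -f "$tmp"
printf '%s\n' "$renumbered"
```

As with awk.1, any colons already present in the input are lost, because only the first three fields are printed.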

AN sed EXAMPLE

THE PROBLEM

Here is a file (testfile.4):


testfile.4
Is this the Isis that is? Does
is come before was? 
What is the is-ness of isinglass?
We say "is" when we mean is?

Now, I would like to change each occurrence of the word "is" to "was". Of course, I do not want to change Isis to Waswas. So a first effort misses in several ways:


sed 's/is/was/g' testfile.4


testfile.4
Is thwas the Iswas that was? Does
was come before was? 
What was the was-ness of wasinglass?
We say "was" when we mean was?

CATCH THE WORD "is", BUT NOT WORDS THAT "is" IS A SUBSTRING OF
We need to fence off the word "is" in our patterns, so we might try

sed 's/[^A-Za-z]is[^A-Za-z]/was/g' testfile.4
where we are looking for characters that might precede and follow "is". The results are not satisfactory:

testfile.4
Is this the Isis thatwas Does
is come before was? 
Whatwasthewasness of isinglass?
We say was when we meanwas
The trouble is that we did not remember the characters that preceded and followed "is", so we correct the command to
sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' testfile.4


testfile.4
Is this the Isis that was? Does
is come before was? 
What was the was-ness of isinglass?
We say "was" when we mean was?

BEGINNINGS AND ENDINGS

Our command will not catch the word "is" at either the beginning or the ending of a line, because there are no characters there to match part of the pattern. The easiest way to fix this is to use pipes to add blanks at the beginning and end of each line, then do the substitution, and then strip the beginning and ending blanks.

sed 's/^/ /' testfile.4 |
sed 's/$/ /' |
sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' |
sed 's/^ //'|
sed 's/ $//'


testfile.4
Is this the Isis that was? Does
was come before was? 
What was the was-ness of isinglass?
We say "was" when we mean was?
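
The five steps do not need five processes. As a sketch, the same commands can be given to one sed with repeated -e options; they are applied in order to each line. The sample line is made up, with "is" at both ends.

```shell
# Same five steps as the pipeline, run by a single sed process.
result=$(printf 'is this the Isis that is\n' |
    sed -e 's/^/ /' \
        -e 's/$/ /' \
        -e 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' \
        -e 's/^ //' \
        -e 's/ $//')
printf '%s\n' "$result"
```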

TWO FORMS OF THE WORD

It should now be clear that to catch the form "Is", we need only add one more pipe:


sed 's/^/ /' testfile.4 |
sed 's/$/ /' |
sed 's/\([^A-Za-z]\)is\([^A-Za-z]\)/\1was\2/g' |
sed 's/\([^A-Za-z]\)Is\([^A-Za-z]\)/\1Was\2/g' |
sed 's/^ //'|
sed 's/ $//'

PATTERNS AND vi

Probably it is easier to do this kind of thing from vi, which also knows about patterns. While in vi, try something like

:g/^/s/\([^a-z]\)is\([^a-z]\)/\1was\2/g

This does a global command                  g
on every line                               /^/
matching and replacing as in sed.           

In fact, vi possesses a special way to force matches of words. \< forces the match to begin only at the beginning of a word, that is, at a letter, digit or underscore that either starts the line or follows a character that is not one of those. \> likewise forces the match to end at the end of a word. So the following, within vi, is certainly better than the long pipe above:
:g/^/s/\<is\>/was/g

sed GENERALITIES

sed COMMAND LINE SYNTAX

The syntax for invoking sed has the following two forms:

sed [options] 'command' files
sed [options] -f scriptfile files

THE -n OPTION

With the -n option, only lines specified with the p command or p flag of an s command are output. Here is an example:

# Determine the number of occurrences of the string 'pipe' in file pipe2.c :

sed -n 's/pipe/pipe\
/gp' pipe2.c | sed -n '/pipe/p' | wc -l

# In the first line, the \ is an escape for the newline character, that is,
# the \ must be the last character on the line.
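
Here is a sketch of the same counting technique on made-up input standing in for pipe2.c. The final count uses grep -c in place of the second sed and wc -l; that is a substitution for convenience, not a different method.

```shell
# Made-up stand-in for pipe2.c: four occurrences of "pipe" on three lines.
# Step 1: break the line after every "pipe", printing only changed lines.
# Step 2: count the resulting lines that contain "pipe".
count=$(printf 'pipe(fd); /* pipe to pipe */\nint n;\nclose_pipe();\n' |
    sed -n 's/pipe/pipe\
/gp' | grep -c 'pipe')
printf '%s\n' "$count"
```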

THE -f OPTION

The argument following -f is a script file, where sed will find a sequence of editing commands. All of the editing commands in a script are applied in sequence to each line of input. Addresses embedded in the commands may restrict the lines to which a command is applied.

Example: the following script will
(1) Remove all leading blanks from each line
(2) Remove all completely blank lines
(3) Reduce strings of multiple blanks and white space to single blanks.

s/^[ 	]*//
/^$/d
s/[ 	][ 	]*/ /g
Each bracket contains a blank and a tab character.

Example: The following script will change strings representing integers so that commas are inserted in the expected places.


:loop
s/\([0-9]\)\([0-9][0-9][0-9][^0-9]\)/\1,\2/g
s/\([0-9]\)\([0-9][0-9][0-9]\)$/\1,\2/
/[0-9][0-9][0-9][0-9]/b loop

Notes:
(1) If this script resides in a file commize, and we wish to edit file letter, the invocation would be

sed -f commize letter

(2) :loop is a label, which can be branched to, using the b command. Sed is a programming language.
(3) The logic is as follows. If there are four digits in a row, followed by either a nondigit or end-of-line, then insert a comma after the first digit. If there are still any sequences of 4 digits on the line, branch back to :loop. Note that the reason for the loop is that commas must be inserted from right to left.
(4) The next line is not read until we reach the end of the script. The substitutions continue on the pattern space, which is the current line as altered by previous sed commands.
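
To try the script, a sketch like the following writes it to a temporary file named commize and runs it on one made-up line. The temporary directory and the input text are assumptions for the demonstration.

```shell
# Write the script to a scratch file (the name commize follows note 1).
dir=$(mktemp -d)
cat > "$dir/commize" <<'EOF'
:loop
s/\([0-9]\)\([0-9][0-9][0-9][^0-9]\)/\1,\2/g
s/\([0-9]\)\([0-9][0-9][0-9]\)$/\1,\2/
/[0-9][0-9][0-9][0-9]/b loop
EOF

# Made-up input line with a seven-digit and a four-digit integer.
commized=$(printf 'Pay 1234567 now and 1000 later\n' | sed -f "$dir/commize")
rm -rf "$dir"
printf '%s\n' "$commized"
```

The seven-digit number needs two trips through the loop: the first pass inserts the rightmost comma, the second the one to its left.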

Exercise: If the script is

s/dog/cat/g
s/cat/horse/g
will there be any cats in the output?

GENERAL SYNTAX OF sed COMMANDS

[address][,address]command

ADDRESSES

An address may be either a line number, a pattern, or $ (the last line). Line numbers refer to the whole of the input, not to individual files. That is, if you are editing a number of files with one sed command, there is only one line 1.

Example: The following script leaves lines 1-5 alone, but deletes leading blanks on all remaining lines:

6,$ s/^[ 	]*//

Example: Let us consider the effect of two variant commands on a file. Here is the file:

testfile.4
Is this the Isis that is? Does
was come before was? 
What is the is-ness of isinglass?
We say "is" when we mean is?
First, here is the output of the command
sed -n '2,3 p' testfile.4


Is this the Isis that is? Does
was come before was? 

You will note the effect of the -n option: it suppresses the automatic printing of every line, so that only the lines selected by the p command are printed. Now we show the output of
sed '2,3 p' testfile.4


testfile.4
Is this the Isis that is? Does
Is this the Isis that is? Does
was come before was? 
was come before was? 
What is the is-ness of isinglass?
We say "is" when we mean is?

Example: Here is another file

testfile.6
go
1
2
3
stop
4
5
go
6
7
8
9
stop
10
11
go
12
13
And here is the effect of the command
sed '/go/,/stop/ d' testfile.6

testfile.6
4
5
10
11

Thus any range of lines can be selected either by the line numbers or by pattern matching.

Example: Delete up to the first blank line (e.g. get rid of the mail header in a message saved in a file):

sed '1,/^$/ d' file
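
A sketch of this command on a made-up saved message, with two header lines, a blank line, and a body that itself contains a blank line:

```shell
# Made-up message; the header names and text are invented.
message=$(printf 'From: rossa\nSubject: sed\n\nFirst body line\n\nSecond body line')

# Delete from line 1 through the first empty line; later blank lines survive.
body=$(printf '%s\n' "$message" | sed '1,/^$/d')
printf '%s\n' "$body"
```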

REVERSING THE SENSE OF A MATCH

An exclamation mark following an address reverses the sense of a match: Here is the output from sed '/go/,/stop/!d' testfile.6


go
1
2
3
stop
go
6
7
8
9
stop
go
12
13

Exercise: Indent all lines that do not begin with a dot.

sed COMMANDS

SUBSTITUTION

The command syntax is

[address]s/pattern/replacement/flags
where the flags that modify the substitution are
nnn	a number from 1 to 512 indicates that only the nnnth instance
        of the pattern should be replaced.

g 	change all occurrences in the pattern space

p 	print the contents of the pattern space

w file	write the contents of the pattern space to the indicated file.

Example: Replace the second tab character on each line with a newline:

sed 's/	/\
/2' file
Here the / / contain an embedded tab character, and the newline must be escaped.
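
Tabs and newlines are hard to see on the page, so here is a sketch of the numeric flag with visible characters instead. The input string is made up.

```shell
# Replace only the second hyphen on the line.
second=$(printf 'a-b-c-d\n' | sed 's/-/+/2')

# For comparison, the g flag replaces every hyphen.
all=$(printf 'a-b-c-d\n' | sed 's/-/+/g')

printf '%s\n%s\n' "$second" "$all"
```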

Example: Here is the file sed.2


/is/ {
w changes
s/is/boo/w changes
}
s/.*://

Here is an input file:

testfile.4
Is this the Isis that is? Does
was come before was? 
What is the is-ness of isinglass?
We say "is" when we mean is?

We now do the command

grep -n '$' testfile.4 | sed -f sed.2
The command changes the first "is" on each line:
testfile.4
Is thboo the Isis that is? Does
was come before was? 
What boo the is-ness of isinglass?
We say "boo" when we mean is?

The file named changes is very interesting, for it contains the before and after states of each changed line:

2:Is this the Isis that is? Does
2:Is thboo the Isis that is? Does
4:What is the is-ness of isinglass?
4:What boo the is-ness of isinglass?
5:We say "is" when we mean is?
5:We say "boo" when we mean is?
Now here are some details. First, the grep command just adds line numbers to each line of the input file:
grep -n '$' testfile.4

1:testfile.4
2:Is this the Isis that is? Does
3:was come before was? 
4:What is the is-ness of isinglass?
5:We say "is" when we mean is?

Then the commands in sed.2 act like this. The braces group commands, so that for any line matching /is/, all the commands in braces will be executed. So we write out the before state of each matching line, and then make the substitution and write the after state. Finally, for all lines, we strip off the line numbers added by grep.

REMARK. Of course, it's easier to use the diff utility to compare files after an editing session. If we do

sed 's/is/boo/' testfile.4 > testfile.4a
then the command
diff testfile.4 testfile.4a
produces a set of actions that would transform the first file into the second. Here is the output of the diff command:
2c2
< Is this the Isis that is? Does
---
> Is thboo the Isis that is? Does
4,5c4,5
< What is the is-ness of isinglass?
< We say "is" when we mean is?
---
> What boo the is-ness of isinglass?
> We say "boo" when we mean is?

OTHER sed FEATURES

APPEND AND INSERT

[line address]a\
text line 1\
text line 2\
text line 3\
...
text line n
will append the text lines to the output after the addressed line. Example: double spacing
sed 'a\
' testfile.4
produces this output:
testfile.4

Is this the Isis that is? Does

was come before was? 

What is the is-ness of isinglass?

We say "is" when we mean is?

Insertion (the i command) is similar and places the text before the addressed line.
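
The append command with a pattern address can be sketched on a made-up two-line list with a missing item. The text line is written without leading blanks here, because sed implementations differ in how they treat leading white space in appended text.

```shell
# Made-up list with one item missing after "8. Martin".
list=$(printf '8. Martin\n10. John')

# a\ appends the following text line after every line matching /Martin/.
fixed=$(printf '%s\n' "$list" | sed '/Martin/a\
9. William Henry')
printf '%s\n' "$fixed"
```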

CHANGE

[address]c\
text
replaces the addressed line range by the supplied text.

TRANSFORM (y)

[address]y/ab..c/xy..z/
replaces character a by character x, character b by character y, ... , character c by character z. Alas, y has no knowledge of character ranges, so conversion from lower to upper case is accomplished with two strings of 26 characters.
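
The lower-to-upper conversion mentioned above looks like this as a sketch; the input text is made up.

```shell
# y maps each character of the first string to the character in the same
# position of the second string; everything else is left unchanged.
upper=$(printf 'Hello, sed\n' |
    sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
printf '%s\n' "$upper"
```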

PRINT (p)

Outputs the contents of the pattern space.

PRINT LINE NUMBER

[line address] =
prints the line number of the matching line.

NEXT (n)

[address]n 
outputs the pattern space and then reads the next line of input without returning to the top of the script.

Example: Delete one blank line following any line that begins with a Roman numeral followed by a period, but delete no other blank lines in the file. The script is:
/^[IVXLCMD][IVXLCMD]*\./{
n
/^$/d
}
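
The sketch below runs this script on made-up input in which the heading is followed by two blank lines. The dot is escaped as \. so that it matches only a literal period.

```shell
# Made-up outline: the heading "IV." is followed by two blank lines.
outline=$(printf 'IV. Methods\n\n\ntext\n\nmore text')

# n prints the heading and reads the next line; d removes it if it is empty.
trimmed=$(printf '%s\n' "$outline" | sed '/^[IVXLCMD][IVXLCMD]*\./{
n
/^$/d
}')
printf '%s\n' "$trimmed"
```

Only the first blank line after the heading is removed; the second one and the blank line between the text lines remain.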

READING AND WRITING FILES

[line-address]r file
reads the contents of file and places them in the output after the addressed line.

[address]w file
writes the pattern space to file.

Example: Replace the line throughout a file by the contents of a file common.for - the script is

/^/r common.for
/^/d

Example: Suppose a data file contains names of students and grades, like this:

Apt Abner:100
Bright Betty:98
Churl Charles:45
Dull Dora:67
We extract A's, B's, C's, D's and F's to separate files. The script is
/[^:]*:100/w grade.a
/[^:]*:9[0-9]/w grade.a
/[^:]*:8[0-9]/w grade.b
/[^:]*:7[0-9]/w grade.c
/[^:]*:6[0-9]/w grade.d
/[^:]*:[^6-9][0-9]$/w grade.f
/[^:]*:[0-9]$/w grade.f
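
To try the script, a sketch like the following runs it in a scratch directory, since the w command creates the grade.* files in the current directory. The script file name grades.sed is hypothetical; -n is added so that sed writes only to the files.

```shell
# Work in a scratch directory, because w writes grade.* into the current one.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > grades.sed <<'EOF'
/[^:]*:100/w grade.a
/[^:]*:9[0-9]/w grade.a
/[^:]*:8[0-9]/w grade.b
/[^:]*:7[0-9]/w grade.c
/[^:]*:6[0-9]/w grade.d
/[^:]*:[^6-9][0-9]$/w grade.f
/[^:]*:[0-9]$/w grade.f
EOF

printf 'Apt Abner:100\nBright Betty:98\nChurl Charles:45\nDull Dora:67\n' |
    sed -n -f grades.sed

grade_a=$(cat grade.a)
grade_d=$(cat grade.d)
grade_f=$(cat grade.f)
cd / && rm -rf "$dir"
printf 'A:\n%s\nD:\n%s\nF:\n%s\n' "$grade_a" "$grade_d" "$grade_f"
```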

QUIT (q)

[line-address]q
will terminate the script when a matching line is reached.

Example: Copy a file down to Roman numeral XIII :

sed '/XIII/q' file
This is a little tricky, because the line with XIII in it will be printed before sed quits. If you don't want it, you need to do a little more work:
sed -n '/XIII/q
p' file
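
Both forms can be compared on a made-up four-line outline:

```shell
# Made-up outline with Roman-numeral headings.
outline=$(printf 'XI\nXII\nXIII\nXIV')

# q prints the current line before quitting, so XIII is included.
with_match=$(printf '%s\n' "$outline" | sed '/XIII/q')

# With -n, q prints nothing; p prints only the lines that came before.
without_match=$(printf '%s\n' "$outline" | sed -n '/XIII/q
p')

printf '%s\n--\n%s\n' "$with_match" "$without_match"
```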