Easiest way to remove unwanted lines from a huge text file


QUESTION :

I have a large text file with a size of more than 30 megabytes. I want to remove all the lines which don’t match some specific criteria, e.g. lines that don’t have the string ‘START’.

What’s the easiest way to do this?

ANSWER :

If the pattern is really that simple, grep -v will work:

grep -v START bigfile.txt > newfile.txt

newfile.txt will have everything from bigfile.txt except lines with “START”.

(In case it isn’t obvious, this is something you’ll run in Terminal or another command-line tool.)

The original question asked how to remove the lines that didn’t match a pattern. In other words, how to keep the lines that do match the pattern. Thus, no need for -v.

grep START infile.txt > outfile.txt
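
To see the difference between the two commands, here is a minimal sketch on a throwaway file. The file names (sample.txt, kept.txt, dropped.txt) and the sample lines are made up for illustration; they are not from the question.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Build a tiny sample file (hypothetical content).
printf 'START one\nmiddle\nSTART two\nend\n' > "$workdir/sample.txt"

# Keep only the lines that contain START (what the question asks for).
grep START "$workdir/sample.txt" > "$workdir/kept.txt"

# Keep only the lines that do NOT contain START (the inverse, via -v).
grep -v START "$workdir/sample.txt" > "$workdir/dropped.txt"
```

Both commands leave the input file untouched and write the result to a new file.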

Note that grep also accepts regular expressions, which allow much more powerful pattern matching. The syntax can be hard to read, though.
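
As a rough illustration of the regular-expression point, the sketch below uses grep -E (extended regular expressions) to keep only lines that begin with START or BEGIN followed by a space. The BEGIN keyword, the file names and the sample lines are assumptions made up for this example.

```shell
# Scratch directory for the example files.
workdir=$(mktemp -d)

# Hypothetical sample input.
printf 'START job 1\nrestart later\nBEGIN job 2\nnote: START here\n' > "$workdir/infile.txt"

# ^ anchors the match to the start of the line;
# (START|BEGIN) matches either word; the trailing space must follow it.
grep -E '^(START|BEGIN) ' "$workdir/infile.txt" > "$workdir/outfile.txt"
```

Here the line "note: START here" is dropped even though it contains START, because the pattern only matches at the beginning of a line.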

Use GNU sed with the -i option, which edits the file in place instead of writing the result to a new file.
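
A minimal sketch of the in-place approach, assuming the same "keep lines containing START" criterion as the question. The sed expression /START/!d deletes every line that does not match. The .bak suffix attached to -i is an addition here, not part of the original suggestion: it makes sed keep a backup copy of the original file. The file name and sample lines are hypothetical.

```shell
# Scratch directory so the example never touches a real file.
workdir=$(mktemp -d)

# Hypothetical sample input.
printf 'START one\nmiddle\nSTART two\nend\n' > "$workdir/bigfile.txt"

# Edit bigfile.txt in place: delete (d) every line that does not (!) match START.
# -i.bak saves the original contents as bigfile.txt.bak first.
sed -i.bak '/START/!d' "$workdir/bigfile.txt"
```

Plain -i with no suffix is the GNU form; the sed shipped with macOS expects a suffix argument after -i, so attaching one as above is a way to keep the command working on both.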

grep -v START inputfile

should work. grep is standard on both macOS and Linux/Unix, and it can be installed on Windows.

Option -v is for inverting the match – only output lines that do not contain the pattern (the inverse of the usual grep behaviour).