I have a large text file with a size of more than 30 megabytes. I want to remove all the lines which don’t match some specific criteria, e.g. lines that don’t have the string ‘START’.
What’s the easiest way to do this?
If the pattern is really that simple, grep -v will work:
grep -v START bigfile.txt > newfile.txt
newfile.txt will have everything from bigfile.txt except lines with “START”.
(In case it isn’t obvious, this is something you’ll do in Terminal or another command-line tool.)
The original question asked how to remove the lines that didn’t match a pattern. In other words, how to keep the lines that do match the pattern. Thus, no need for -v:
grep START infile.txt > outfile.txt
Note that grep can use regular expressions to do much more powerful pattern matching. The syntax is a bit obscure, though.
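To make the regular-expression point concrete, here is a minimal sketch. The file names (sample.txt, kept.txt, kept_either.txt) and the sample contents are made up for illustration, and it assumes you only want lines that begin with the keyword rather than lines that contain it anywhere:

```shell
# Build a small, hypothetical sample file (names and contents are illustrative).
printf 'START one\nmiddle START\nEND two\nSTART three\n' > sample.txt

# ^ anchors the pattern to the beginning of the line, so "middle START" is dropped.
grep '^START' sample.txt > kept.txt

# -E switches to extended regular expressions; keep lines starting with START or END.
grep -E '^(START|END)' sample.txt > kept_either.txt
```

Quoting the pattern in single quotes keeps the shell from interpreting characters such as ^, | or parentheses before grep sees them.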
Use GNU sed with the
grep -v START inputfile
grep is standard on both macOS and Linux/Unix, and can be installed on MS Windows.
-v is for inverting the match – only output lines that do not contain the pattern (the inverse of the usual grep behaviour).
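Since the two forms are easy to mix up, a small side-by-side sketch may help. The file names here (demo.txt, with_start.txt, without_start.txt) are hypothetical and the input is invented for the example:

```shell
# Hypothetical four-line input; two lines contain START, two do not.
printf 'START a\nfoo\nSTART b\nbar\n' > demo.txt

# Plain grep: keep only the lines that contain START.
grep START demo.txt > with_start.txt

# grep -v: keep only the lines that do NOT contain START.
grep -v START demo.txt > without_start.txt
```

Together the two output files hold every line of the input exactly once, so which command you want depends on whether the matching lines are the ones to keep or the ones to discard.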