I need to extract the second field of selected lines in a GEDCOM file. These lines are all of the following format:
% grep @ /tmp/XYZ | tail -5
0 @X701@ OBJE
0 @X702@ OBJE
0 @X750@ OBJE
0 @X765@ OBJE
0 @X766@ OBJE
But in the following,
% egrep "0 @[^@]@" /tmp/XYZ
% perl -CSD -p -i -e 's:0 @([^@])@ .*:ZYX \1:g;' /tmp/XYZ
the first finds nothing and the second changes nothing;
I don’t understand why.
-CSD is there because, although the file is mostly ASCII, it contains some French, Polish, and Chinese, and is encoded in UTF-8.
As far as I am aware, @ is not a special character in regular expressions.
Update: I am looking for the field that has the function of a primary key. It is always delimited by @ and therefore cannot contain an @. Some lines might reference such a key, but it is only primary when the line starts with 0 . I must not match lines that contain other @, but that should be ensured by putting in a string-begin ^. I must also not hit on lines of other formats. I used grep to show the format of the target lines, and tail to limit the size to less than five thousand.
- If you might have lines that look like
60 @FOO@ blah
42.0 @记鬼四七@ quux
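(and you don’t want to match them), you should begin your regex with a ^.
- [^@] without a quantifier matches exactly one character, which is why your pattern finds nothing: the keys are several characters long. To match any number of non-@ characters between the two @ characters, you need *: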
% egrep '^0 @[^@]*@' /tmp/XYZ
% perl -CSD -p -i -e 's:^0 @([^@]*)@ .*:ZYX \1:g;' /tmp/XYZ
Use + instead of * if you must have at least one non-@ between the two @s.
- To avoid matching lines that have a third @, use another [^@]* to specify that the rest of the line consists of characters other than @:
% egrep '^0 @[^@]*@ [^@]*$' /tmp/XYZ
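
To see how the variants above differ, here is a minimal sketch that runs them against a small made-up sample. The sample lines, the temporary file, and the variable names are assumptions for illustration only, not taken from the real GEDCOM file. It uses grep -E rather than egrep because recent GNU grep releases warn that egrep is obsolescent, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# Hypothetical sample data modelled on the line formats discussed above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0 @X701@ OBJE
0 @X702@ OBJE
1 OBJE @X701@
60 @FOO@ blah
42.0 @记鬼四七@ quux
0 @@ OBJE
0 @X750@ NOTE see @X701@
EOF

# [^@] with no quantifier matches exactly one character,
# so multi-character keys never match. (grep -c exits 1 on zero matches.)
one=$(grep -cE '^0 @[^@]@' "$sample" || true)

# [^@]* allows any number of non-@ characters, including none.
star=$(grep -cE '^0 @[^@]*@' "$sample" || true)

# [^@]+ requires at least one, so the empty key "0 @@" drops out.
plus=$(grep -cE '^0 @[^@]+@' "$sample" || true)

# Requiring an @-free remainder also drops the line with a third @.
strict=$(grep -cE '^0 @[^@]+@ [^@]*$' "$sample" || true)

# Print only the key: -n suppresses default output, and the p flag
# prints just the lines where the substitution succeeded.
keys=$(sed -nE 's/^0 @([^@]+)@ [^@]*$/\1/p' "$sample")

echo "one=$one star=$star plus=$plus strict=$strict"
echo "$keys"
rm -f "$sample"
```

On this sample the counts come out as one=0, star=4, plus=3, strict=2, and the extracted keys are X701 and X702. The sed line is one way to get just the second field without rewriting the file in place. The perl -i command does the same substitution but modifies /tmp/XYZ itself.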