Why does this regexp — which attempts to match characters between “@” characters — fail?


I need to extract the second field of selected lines in a GEDCOM file. These lines are all of the following format:

% grep @ /tmp/XYZ | tail -5
0 @X701@ OBJE
0 @X702@ OBJE
0 @X750@ OBJE
0 @X765@ OBJE
0 @X766@ OBJE

But in the following,

% egrep "0 @[^@]@" /tmp/XYZ
% perl -CSD -p -i -e 's:0 @([^@])@ .*:ZYX \1:g;' /tmp/XYZ

the first finds nothing and the second changes nothing;
I don’t understand why.

The CSD is because although the file is mostly ASCII, it contains some French, Polish, and Chinese, and is encoded UTF-8.

As far as I am aware, @ is not a special character for regular expressions.

Update: I am looking for the field that has the function of a primary key.  It is always delimited by @ and therefore cannot contain an @.  Some lines might reference such a key, but it is only primary when the line starts with 0 .  I must not match lines that contain other @ but that should be ensured by putting in a string-begin ^.  I must also not hit on lines of other formats—I used grep to show the format of the target lines, and tail to limit the size to less than five thousand.


  1. If you might have lines that look like

    60 @FOO@ blah
    42.0 @记鬼四七@ quux

    (and you don’t want to match them),
    you should begin your regex with a ^; e.g., ^0 @….

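  2. [^@] matches exactly one character that is not @, such as X or 7.
    Your pattern @[^@]@ therefore only matches a key that is a single
    character long, which is why it finds nothing in lines like 0 @X701@ OBJE.
    To match any number of non-@ characters (e.g., X701)
    between the two @ characters, you need [^@]* or [^@]+; e.g.,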

    % egrep '^0 @[^@]*@' /tmp/XYZ
    % perl -CSD -p -i -e 's:^0 @([^@]*)@ .*:ZYX \1:g;' /tmp/XYZ

    Use + if you must have at least one non-@ character
    between the two @ characters. 
    Don’t use \@ unless plain @ fails.

  3. To avoid matching lines that have a third @, use another [^@]*
    to specify that the rest of the line is characters other than @.

    % egrep '^0 @[^@]*@ [^@]*$' /tmp/XYZ
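
To see the difference between the three patterns side by side, here is a small sketch that runs them against a throwaway sample file. The sample lines are made up for illustration (they are not from your GEDCOM file), it uses grep -E, which is the same thing as egrep, and it uses sed rather than perl for the extraction step just to keep the example dependency-free.

```shell
#!/bin/sh
# Hypothetical sample data in a temp file; /tmp/XYZ is not touched.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0 @X701@ OBJE
0 @X@ OBJE
60 @FOO@ blah
1 OBJE @X701@
0 @X702@ OBJE @X1@
EOF

# Exactly one non-@ character between the @ signs:
# only the line with the one-character key "X" matches.
one_char=$(grep -E '^0 @[^@]@' "$sample")

# Any run of non-@ characters between the @ signs:
# every line that starts with "0 @...@" matches (three lines here).
any_len=$(grep -E '^0 @[^@]*@' "$sample")

# Same, but the rest of the line must not contain another @:
# the "0 @X702@ OBJE @X1@" line is now excluded.
strict=$(grep -E '^0 @[^@]*@ [^@]*$' "$sample")

# Print only the key itself (the text between the two @ signs).
keys=$(sed -n -E 's/^0 @([^@]+)@ [^@]*$/\1/p' "$sample")

printf 'one character:\n%s\n\n' "$one_char"
printf 'any length:\n%s\n\n' "$any_len"
printf 'no third @:\n%s\n\n' "$strict"
printf 'keys only:\n%s\n' "$keys"

rm -f "$sample"
```

With this sample, the last step prints X701 and X, one per line. The 60 @FOO@ line is rejected by the leading ^, and the line with a third @ is rejected by the trailing [^@]*$.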
