Functional Ecological Genomics 2017

Materials for Stephanie Spielman's sessions

Regular Expressions Cheatsheet

Regular expressions are the superheroes of text search and/or replace. With great regular expressions comes great text file manipulation.

Caution: Regular expressions come in different flavors, depending on the language you are using. Always look up your language’s syntax when you encounter issues! This is a non-exhaustive list of Perl- and Python-style regular expressions. Note that some of these symbols will need to be modified for use with grep.

Basic regular expressions

Regular Expression    Meaning
\w                    Letter, digit, or underscore
\W                    Anything that is not a letter, digit, or underscore
\d                    Digit
\D                    Anything that is not a digit
[]                    Custom character set. E.g., [ACGT] matches any occurrence of A, C, G, or T only
[^]                   Exclude custom character set. E.g., [^ACGT] matches any character that isn’t A, C, G, or T (case sensitive!)
\t                    Tab symbol
\n                    New line. Note: Your system might use \r.
\s                    Any type of whitespace
\S                    Anything that is not whitespace
.                     Wildcard (matches any single character; by default, not a new line)
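
As a minimal sketch of how the character classes above behave, the following uses Python's built-in re module. The example strings (record, sequence) are hypothetical and chosen only for illustration; other regex flavors may differ in small details.

```python
import re

# Hypothetical example strings, not taken from the course materials.
record = "sample_12\tgene-A"
sequence = "ACGTNN-acg"

words = re.findall(r"\w+", record)         # runs of letters, digits, or underscores
digits = re.findall(r"\d", record)         # individual digits
fields = re.split(r"\s", record)           # split on any whitespace (here, a tab)
bases = re.findall(r"[ACGT]", sequence)    # only upper-case A, C, G, or T
others = re.findall(r"[^ACGT]", sequence)  # everything else, including lower-case acg
dot_hits = re.findall(r".", "a\nb")        # the wildcard skips the newline by default

print(words)     # ['sample_12', 'gene', 'A']
print(digits)    # ['1', '2']
print(fields)    # ['sample_12', 'gene-A']
print(bases)     # ['A', 'C', 'G', 'T']
print(others)    # ['N', 'N', '-', 'a', 'c', 'g']
print(dot_hits)  # ['a', 'b']
```

Note that [^ACGT] also catches the lower-case bases, which is the case sensitivity the table warns about.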

Symbols and quantifiers

Regular Expression add-ons    Meaning                                                      Example
\                             Escape symbol to search for a literal character              \. matches an actual period
^                             Match the start of the line only                             ^> matches any > that begins a line
$                             Match the end of the line only                               t$ matches any lower-case t that ends a line
+                             Quantifier: Match 1 or more occurrences                      \w+ matches 1 or more letters, digits, or underscores
*                             Quantifier: Match 0 or more occurrences                      \w* matches 0 or more letters, digits, or underscores
{}                            Quantifier: Match a specified number of times (in a row!)    \d{2} matches exactly 2 digits; \d{1,4} matches between 1 and 4 digits (inclusive); \d{5,} matches 5 or more digits; \d{,3} matches 3 or fewer digits (accepted by Python, but not by every flavor)
?                             Quantifier: Make the previous character optional             colou?r matches either color or colour
()                            Capture text inside parentheses for subsequent literal replacement (see next section)
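
The anchors and quantifiers above can be tried out with a short sketch in Python's re module. The FASTA-like string and the other inputs are hypothetical. One Python-specific detail: ^ and $ only match at every line (rather than only at the start and end of the whole string) when the re.MULTILINE flag is set.

```python
import re

# Hypothetical two-record FASTA-like text, for illustration only.
fasta = ">seq1\nACGT\n>seq2\nGGAA\n"

# ^ anchors to the start of a line, $ to the end (with re.MULTILINE).
headers = re.findall(r"^>\w+", fasta, flags=re.MULTILINE)
ends_in_t = re.findall(r"^\w*T$", fasta, flags=re.MULTILINE)

# \. is a literal period; an unescaped . would match any character.
periods = re.findall(r"\.", "a.b.c")

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour colr")

# {} quantifiers count consecutive repeats.
short_runs = re.findall(r"\d{1,4}", "7 12345")
long_runs = re.findall(r"\d{5,}", "1234 123456")
exactly_two = bool(re.fullmatch(r"\d{2}", "42"))
not_two = bool(re.fullmatch(r"\d{2}", "421"))

print(headers)      # ['>seq1', '>seq2']
print(ends_in_t)    # ['ACGT']
print(periods)      # ['.', '.']
print(spellings)    # ['color', 'colour']
print(short_runs)   # ['7', '1234', '5']
print(long_runs)    # ['123456']
print(exactly_two)  # True
print(not_two)      # False
```

Quantifiers are greedy, which is why \d{1,4} splits 12345 into 1234 and 5 rather than into smaller pieces.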

Replacing text

When performing search/replace, you often want to save some elements of the “searched text” to incorporate into the “replace text”. To this end, text can be captured with parentheses and re-inserted with $1, $2, etc., one for each captured group. Some tools, such as sed and Python, use \1, \2, etc. instead (typed as \\1, \\2 wherever the backslash itself must be escaped inside a string).

Some examples:

Original text              Search Term                 Replace with            New text
I would like a new dog.    (.+ )dog\.                  $1cat.                  I would like a new cat.
I am 75 years old.         (I am) (\d+) years old\.    $1 28, not $2.          I am 28, not 75.
AC-GTT---AGANN??GCTA?      ([N\?-])                    (replace w/ nothing)    ACGTTAGAGCTA
AC-GTT---AGANN??GCTA?      [^ACGT]                     (replace w/ nothing)    ACGTTAGAGCTA
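
The replacements in the table can be reproduced with a short sketch using Python's re.sub. This assumes Python's flavor, where captured groups are re-inserted with \1 and \2 rather than $1 and $2. The variable names are hypothetical.

```python
import re

# Capture everything before "dog." and reuse it in the replacement.
pet = re.sub(r"(.+ )dog\.", r"\1cat.", "I would like a new dog.")

# Two captured groups: \1 is "I am", \2 is the original number.
age = re.sub(r"(I am) (\d+) years old\.", r"\1 28, not \2.", "I am 75 years old.")

# Replace with nothing (an empty string) to delete matching characters.
raw = "AC-GTT---AGANN??GCTA?"
no_ambiguous = re.sub(r"([N\?-])", "", raw)  # remove N, ?, and - explicitly
only_bases = re.sub(r"[^ACGT]", "", raw)     # remove anything that is not A, C, G, or T

print(pet)           # I would like a new cat.
print(age)           # I am 28, not 75.
print(no_ambiguous)  # ACGTTAGAGCTA
print(only_bases)    # ACGTTAGAGCTA
```

Raw strings (the r prefix) keep Python from interpreting the backslashes before the regex engine sees them, so \1 can be written with a single backslash.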