Regular expressions, Parsing

Patterns

{ } (Curly Braces) — Matches a specific number of times

{m}      The preceding element or subexpression must occur exactly m times.
{m,n}    The preceding element or subexpression must occur between m and n times, inclusive.
{m,}     The preceding element or subexpression must occur at least m times.

ref: https://www.oreilly.com/library/view/oracle-regular-expressions/0596006012/re13.html

ref: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch06s01.html
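The three quantifier forms above can be tried quickly with grep -E. This is only a sketch on made-up input; the variable names are illustrative.

```shell
# Made-up input: one candidate per line.
words=$(printf 'a\naa\naaa\naaaa\n')

# {m}: exactly two "a" characters on the whole line.
exactly_two=$(printf '%s\n' "$words" | grep -E '^a{2}$')

# {m,n}: between two and three, inclusive.
two_to_three=$(printf '%s\n' "$words" | grep -E '^a{2,3}$')

# {m,}: at least three.
at_least_three=$(printf '%s\n' "$words" | grep -E '^a{3,}$')

printf '%s\n' "$exactly_two"      # aa
printf '%s\n' "$two_to_three"     # aa and aaa, one per line
printf '%s\n' "$at_least_three"   # aaa and aaaa, one per line
```

The ^ and $ anchors matter here: without them, a{2} would also match inside "aaa".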

Metacharacter   Description
^        Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
.        Matches any single character (many applications exclude newlines; exactly which characters count as newlines depends on the flavor, character encoding, and platform, but the line feed character can safely be assumed to be one of them). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches “abc”, etc., but [a.c] matches only “a”, “.”, or “c”.
[ ]      A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches “a”, “b”, or “c”. [a-z] specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: [abcx-z] matches “a”, “b”, “c”, “x”, “y”, or “z”, as does [a-cx-z]. The - character is treated as a literal character if it is the last or the first (after the ^, if present) character within the brackets: [abc-], [-abc]. Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^) character: []abc].
[^ ]     Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than “a”, “b”, or “c”. [^a-z] matches any single character that is not a lowercase letter from “a” to “z”. Likewise, literal characters and ranges can be mixed.
$        Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
( )      Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group. BRE mode requires \( \).
\n       Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups. Also known as a backreference. POSIX defines backreferences only for BRE mode; some ERE implementations, such as GNU grep -E, accept them as an extension.
*        Matches the preceding element zero or more times. For example, ab*c matches “ac”, “abc”, “abbbc”, etc. [xyz]* matches “”, “x”, “y”, “z”, “zx”, “zyx”, “xyzzy”, and so on. (ab)* matches “”, “ab”, “abab”, “ababab”, and so on.
{m,n}    Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only “aaa”, “aaaa”, and “aaaaa”. A few older regex implementations do not support it. BRE mode requires \{m,n\}.

Src: https://en.wikipedia.org/wiki/Regular_expression
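A few of the metacharacters above, tried with grep in its default BRE mode. The sample lines are made up and the variable names are illustrative; treat this as a sketch rather than a reference.

```shell
# Made-up sample lines.
sample=$(printf 'abc\na.c\nabab\nxyz\n')

# "." matches any single character, so both "abc" and "a.c" match.
dot_any=$(printf '%s\n' "$sample" | grep '^a.c$')

# Inside a bracket expression the dot is literal: only "a.c" matches.
dot_literal=$(printf '%s\n' "$sample" | grep '^a[.]c$')

# [^a-c]: lines whose first character is not a, b, or c.
negated=$(printf '%s\n' "$sample" | grep '^[^a-c]')

# BRE group \( \) with backreference \1: a two-character block repeated once.
repeated=$(printf '%s\n' "$sample" | grep '^\(..\)\1$')

printf '%s\n' "$dot_any" "$dot_literal" "$negated" "$repeated"
```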

Logical “or”

You can provide as many terms as desired, as long as they are separated with the pipe character: |. This character separates terms contained within each (...) group.

^I like (dogs|penguins), but not (lions|tigers).$

src: https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
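The alternation pattern above can be checked with grep -E. The three test sentences are invented for illustration; note that the final unescaped dot in the pattern matches any character, not only a full stop.

```shell
# Pattern copied from the example above.
pattern='^I like (dogs|penguins), but not (lions|tigers).$'

# Three made-up sentences; the third uses a term that is not in the first group.
matches=$(printf '%s\n' \
  'I like dogs, but not lions.' \
  'I like penguins, but not tigers.' \
  'I like cats, but not lions.' | grep -E "$pattern")

# Count the matching lines.
match_count=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n' "$matches"
printf 'matching lines: %s\n' "$match_count"
```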

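Rename using a captured group of 3 digits

Search pattern, then replacement ($1 is the text captured by the parentheses):

Intro01_0(\d{3})
Intro01_0$1.exr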
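The note does not say which tool this search/replace pair is meant for. As one possible shell equivalent, here is a sketch with sed -E, assuming a hypothetical file name. sed has no \d, so [0-9] is used, and the captured group is written \1 instead of $1.

```shell
# Hypothetical file name, used only for illustration.
old_name='Intro01_0123'

# Capture the three digits and reuse them in the replacement.
new_name=$(printf '%s\n' "$old_name" | sed -E 's/^Intro01_0([0-9]{3})$/Intro01_0\1.exr/')

printf '%s -> %s\n' "$old_name" "$new_name"

# An actual rename would then be: mv -- "$old_name" "$new_name"
```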

Utilities

grep, egrep, fgrep, rgrep – print lines that match patterns

-E, --extended-regexp            Interpret PATTERNS as extended regular expressions (EREs, see below).
-G, --basic-regexp               Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default.
-A NUM, --after-context=NUM      Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-B NUM, --before-context=NUM     Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
-v, --invert-match               Invert the sense of matching, to select non-matching lines.
-i, --ignore-case                Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
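The difference between the two modes shows up on a tiny made-up input. This is a sketch; the variable names are illustrative.

```shell
# Two made-up lines: three letters, and the letter "a" followed by a plus sign.
sample=$(printf 'aaa\na+\n')

# BRE (default): "+" is an ordinary character, so only the literal "a+" matches.
bre_literal=$(printf '%s\n' "$sample" | grep '^a+$')

# ERE: "+" means "one or more", so only "aaa" matches.
ere_plus=$(printf '%s\n' "$sample" | grep -E '^a+$')

# The same interval written for each mode.
bre_interval=$(printf '%s\n' "$sample" | grep '^a\{3\}$')
ere_interval=$(printf '%s\n' "$sample" | grep -E '^a{3}$')

printf '%s\n' "$bre_literal" "$ere_plus" "$bre_interval" "$ere_interval"
```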

Output the 9 lines that follow each match

hdparm -I /dev/disk/by-id/scsi-SATA_45v3f5fc35 | grep -A 9 Security:
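To see what -A and -B do without a real disk, the same options can be run on a made-up report. The report text below is invented and is not real hdparm output.

```shell
# Invented five-line report standing in for real command output.
report=$(printf 'Model: X\nSecurity:\n  supported\n  not enabled\nChecksum: ok\n')

# Two lines of trailing context after the match.
after=$(printf '%s\n' "$report" | grep -A 2 'Security:')

# One line before and one line after the match.
both=$(printf '%s\n' "$report" | grep -B 1 -A 1 'Security:')

printf '%s\n' "$after"
printf '%s\n' "$both"
```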

Search for a pattern in the files of a folder

grep pattern1 ./*

Search for two different patterns

mycommand | grep 'pattern1\|pattern2'

or

mycommand | grep -e "pattern1" -e "pattern2"
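A third spelling uses -E with an unescaped pipe. The sketch below compares it with the -e form on a made-up log; both select the same lines.

```shell
# Made-up log lines.
log=$(printf 'error: disk\nwarning: temp\ninfo: ok\n')

# One -e option per pattern.
with_e=$(printf '%s\n' "$log" | grep -e 'error' -e 'warning')

# Extended regular expression with alternation.
with_ere=$(printf '%s\n' "$log" | grep -E 'error|warning')

printf '%s\n' "$with_e"
```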

Get lines where the pattern is preceded only by leading whitespace

echo "llist jobs client=san-fd jobstatus=A" | bconsole | grep -E "^\s+jobid"
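\s is a GNU extension. Where portability matters, the POSIX character class [[:space:]] can be used instead. A sketch on made-up lines (not real bconsole output):

```shell
# Made-up listing: indented with spaces, not indented, indented with a tab.
listing=$(printf '  jobid: 1\njobid: 2\n\tjobid: 3\n')

# Keep only the lines where "jobid" follows one or more whitespace characters.
indented=$(printf '%s\n' "$listing" | grep -E '^[[:space:]]+jobid')

printf '%s\n' "$indented"
```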

Get lines not containing a pattern

zfs list -r -t snapshot -o name | grep -v autosnap
zfs list -r -t snapshot -o name | grep -v -e autosnap -e syncoid

https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo

Destroy all zfs snapshots with a pattern

-o name selects only the “name” property of each snapshot; -H omits the header line.

zfs list -H -r -t snapshot -o name mydataset | grep mypattern | xargs -n1 sudo zfs destroy

Destroy all zfs snapshots that contain neither of two patterns

zfs list -H -r -t snapshot -o name | grep -v -e autosnap -e syncoid | xargs -n1 sudo zfs destroy
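Since the command above is destructive, it can help to preview it first. One way, sketched here on made-up snapshot names, is to put echo in front of the command that xargs runs, so each command is printed instead of executed.

```shell
# Made-up snapshot names standing in for the output of "zfs list".
names=$(printf 'tank/data@autosnap_2022-05-01\ntank/data@syncoid_host_2022-05-02\ntank/data@manual_backup\n')

# Keep only the names that contain neither pattern.
kept=$(printf '%s\n' "$names" | grep -v -e autosnap -e syncoid)

# Preview: echo prints each command line instead of running it.
preview=$(printf '%s\n' "$kept" | xargs -n1 echo sudo zfs destroy)

printf '%s\n' "$preview"
```

Once the printed commands look right, the echo can be removed.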

Destroy all snapshots that match several date patterns

Here we want to remove the snapshots that contain these dates in their name:

  • 2022-04
  • 2022-05-01 to 09
  • 2022-05-11 to 13

Make sure to test the output before running this line (that is, run it first without the last pipe to zfs destroy)!

zfs list -H -t snapshot SAN300/projects -o name | grep -E "(^SAN300/projects@autosnap_2022-04-|^SAN300/projects@autosnap_2022-05-0[1-9]|^SAN300/projects@autosnap_2022-05-1[1-3])" | xargs -n1 sudo zfs destroy
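The date filter can be tested on its own, without touching any pool. The sketch below uses a more compact regex that follows the date list above (2022-04, 2022-05-01 to 09, 2022-05-11 to 13). The snapshot names, including the _daily suffix, are invented for illustration.

```shell
# Invented snapshot names covering dates inside and outside the wanted ranges.
names=$(printf '%s\n' \
  'SAN300/projects@autosnap_2022-03-31_daily' \
  'SAN300/projects@autosnap_2022-04-15_daily' \
  'SAN300/projects@autosnap_2022-05-09_daily' \
  'SAN300/projects@autosnap_2022-05-10_daily' \
  'SAN300/projects@autosnap_2022-05-12_daily' \
  'SAN300/projects@autosnap_2022-05-14_daily')

# Common prefix written once, date alternatives grouped in parentheses.
date_re='^SAN300/projects@autosnap_2022-(04-|05-0[1-9]|05-1[1-3])'

selected=$(printf '%s\n' "$names" | grep -E "$date_re")

# Expect the 04-15, 05-09 and 05-12 entries only.
printf '%s\n' "$selected"
```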

sed – stream editor for filtering and transforming text

Documentation

d                               Delete the pattern space; immediately start next cycle.
y/source-chars/dest-chars/      Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars.
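Both commands can be tried on inline input. A small sketch with made-up text:

```shell
# y: map each lowercase letter to the uppercase letter at the same position.
upper=$(printf 'hello world\n' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# d: delete every line that matches the address /drop/.
trimmed=$(printf 'keep\ndrop me\nkeep too\n' | sed '/drop/d')

printf '%s\n' "$upper"     # HELLO WORLD
printf '%s\n' "$trimmed"   # keep, keep too (one per line)
```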

Remove line 664 from the file ~/.ssh/known_hosts (-i stands for --in-place)

sed -i '664d' ~/.ssh/known_hosts

Delete lines 5 through 10 and line 12

sed -e '5,10d;12d' file
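The effect is easy to check on numbered input from seq, where each line's content equals its line number:

```shell
# Twelve numbered lines; delete lines 5 to 10 and line 12.
remaining=$(seq 1 12 | sed -e '5,10d;12d')

# Lines 1 to 4 and line 11 are left.
printf '%s\n' "$remaining"
```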

Replace a string that contains / (like a URL)

Use # as the delimiter of the s command instead of /, so that the slashes in the string need no escaping

sed -i 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g' sources.list
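For comparison, a sketch that runs the same substitution on one made-up sources.list line, once with # as the delimiter and once with / and escaped slashes. Both give the same result.

```shell
# Made-up sources.list entry.
line='deb http://deb.debian.org/debian bookworm main'

# "#" as delimiter: the slash in the path needs no escaping.
with_hash=$(printf '%s\n' "$line" | sed 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g')

# "/" as delimiter: every slash inside the strings must be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/deb.debian.org\/debian/ftp2.de.debian.org\/debian/g')

printf '%s\n' "$with_hash"
```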

https://www.systutorials.com/how-to-delete-a-specific-line-from-a-text-file-in-command-line-on-linux/

Replace spaces with underscores

sed 's/ /_/g' /tmp/archive_dirs

Mac OS X – Replace spaces with newlines

sed 's/ /\'$'\n/g'
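When only single characters are swapped, tr is a possible alternative that behaves the same on Linux and Mac OS X. This is a swapped-in technique rather than the sed approach above; a sketch on made-up input:

```shell
# Replace every space with a newline.
one_per_line=$(printf 'alpha beta gamma\n' | tr ' ' '\n')

printf '%s\n' "$one_per_line"
```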

Remove the last line

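$ addresses the last line and d deletes it. The $ must be protected from the shell, either with single quotes or with a backslash.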

seq 1 10 | sed '$d'
seq 1 10 | sed \$d

iconv – convert text from one character encoding to another

Clean localized strings

$ echo ßüäö | iconv -f UTF-8 -t ASCII//TRANSLIT
ssuao

xargs – build and execute command lines from standard input

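xargs [options] [command [initial-arguments]]

-0, --null       Input items are terminated by a null character instead of by whitespace, and quotes and backslashes are not special (every character is taken literally).
-t, --verbose    Print the command line on the standard error output before executing it.

Find files named core in or below /tmp and delete them. This works incorrectly if any file names contain newlines or spaces:

find /tmp -name core -type f -print | xargs /bin/rm -f

Same, but file and directory names containing spaces or newlines are handled correctly:

find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f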
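The null-terminated form can be tried safely in a throwaway directory. The sketch below creates a path containing a space, which is exactly the case where plain whitespace splitting would go wrong. The directory and file names are made up.

```shell
# Scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)

# Two files named "core", one of them below a directory with a space in its name.
touch "$workdir/core"
mkdir "$workdir/my dir"
touch "$workdir/my dir/core"

# Null-terminated names survive the space in "my dir".
find "$workdir" -name core -type f -print0 | xargs -0 rm -f

# Count what is left; 0 is expected.
left=$(find "$workdir" -name core -type f | wc -l | tr -d ' ')
printf 'core files left: %s\n' "$left"

rm -rf "$workdir"
```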

Generates a compact listing of all the users on the system

cut -d: -f1 < /etc/passwd | sort | xargs echo

gawk – pattern scanning and processing language

Misc

https://www.baeldung.com/linux/grep-sed-awk-differences