Regular expressions, Parsing

Patterns

{ } (Curly Braces) — Matches a specific number of times

`{m}`	The preceding element or subexpression must occur exactly `m` times.
`{m,n}`	The preceding element or subexpression must occur between `m` and `n` times, inclusive.
`{m,}`	The preceding element or subexpression must occur at least `m` times.
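A minimal sketch of the three interval forms, using `grep -E` on made-up sample lines. The `^` and `$` anchors are added here so the count applies to the whole line; the variable names are illustrative only.

```shell
# Sample input: one candidate string per line (made up for this example).
samples=$(printf '%s\n' a aa aaa aaaa)

# {m}: exactly two "a" characters on the line.
exactly_two=$(printf '%s\n' "$samples" | grep -E '^a{2}$' | paste -sd, -)

# {m,n}: between two and three, inclusive.
two_to_three=$(printf '%s\n' "$samples" | grep -E '^a{2,3}$' | paste -sd, -)

# {m,}: at least three.
at_least_three=$(printf '%s\n' "$samples" | grep -E '^a{3,}$' | paste -sd, -)

printf '{2}   -> %s\n' "$exactly_two"      # aa
printf '{2,3} -> %s\n' "$two_to_three"     # aa,aaa
printf '{3,}  -> %s\n' "$at_least_three"   # aaa,aaaa
```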

ref: https://www.oreilly.com/library/view/oracle-regular-expressions/0596006012/re13.html

ref: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch06s01.html

Metacharacter	Description
`^`	Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
`.`	Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, `a.c` matches “abc”, etc., but `[a.c]` matches only “a”, “.”, or “c”.
`[ ]`	A bracket expression. Matches a single character that is contained within the brackets. For example, `[abc]` matches “a”, “b”, or “c”. `[a-z]` specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: `[abcx-z]` matches “a”, “b”, “c”, “x”, “y”, or “z”, as does `[a-cx-z]`. The `-` character is treated as a literal character if it is the last or the first (after the `^`, if present) character within the brackets: `[abc-]`, `[-abc]`. Note that backslash escapes are not allowed. The `]` character can be included in a bracket expression if it is the first (after the `^`) character: `[]abc]`.
`[^ ]`	Matches a single character that is not contained within the brackets. For example, `[^abc]` matches any character other than “a”, “b”, or “c”. `[^a-z]` matches any single character that is not a lowercase letter from “a” to “z”. Likewise, literal characters and ranges can be mixed.
`$`	Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
`( )`	Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, `\n`). A marked subexpression is also called a block or capturing group. BRE mode requires `\( \)`.
`\n`	Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups. Also known as a backreference. POSIX defines backreferences only for BRE mode; some ERE implementations, such as GNU `grep -E`, support them as an extension.
`*`	Matches the preceding element zero or more times. For example, `ab*c` matches “ac”, “abc”, “abbbc”, etc. `[xyz]*` matches “”, “x”, “y”, “z”, “zx”, “zyx”, “xyzzy”, and so on. `(ab)*` matches “”, “ab”, “abab”, “ababab”, and so on.
`{m,n}`	Matches the preceding element at least m and not more than n times. For example, `a{3,5}` matches only “aaa”, “aaaa”, and “aaaaa”. This is not found in a few older instances of regexes. BRE mode requires `\{m,n\}`.
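A few of the entries above can be checked with `grep` on made-up words. This is only a sketch: the sample words and variable names are assumptions, and the backreference example uses plain `grep` (BRE mode), where groups are written `\( \)`.

```shell
# Made-up sample words, one per line.
words=$(printf '%s\n' abc a.c axc abab abba)

# Outside brackets, the dot matches any single character.
dot_any=$(printf '%s\n' "$words" | grep -E '^a.c$' | paste -sd, -)

# Inside a bracket expression, the dot is a literal dot.
dot_literal=$(printf '%s\n' "$words" | grep -E '^a[.]c$' | paste -sd, -)

# BRE backreference: \1 must repeat exactly what \(ab\) matched.
backref=$(printf '%s\n' "$words" | grep '^\(ab\)\1$' | paste -sd, -)

printf 'a.c      -> %s\n' "$dot_any"       # abc,a.c,axc
printf 'a[.]c    -> %s\n' "$dot_literal"   # a.c
printf '\\(ab\\)\\1 -> %s\n' "$backref"    # abab
```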

Src: https://en.wikipedia.org/wiki/Regular_expression

Logical “or”

You can provide as many terms as desired, as long as they are separated with the pipe character: |. This character separates terms contained within each (...) group.

^I like (dogs|penguins), but not (lions|tigers).$
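As a quick check, the pattern above can be tried with `grep -E`, since alternation with a bare `|` needs extended syntax. The helper name `matches` and the test sentences are made up for this sketch.

```shell
pattern='^I like (dogs|penguins), but not (lions|tigers).$'

# Hypothetical helper: succeeds when the sentence matches the pattern.
matches() { printf '%s\n' "$1" | grep -Eq "$pattern"; }

# Both groups use one of their listed alternatives: match.
r_match=$(matches 'I like penguins, but not tigers.' && echo yes || echo no)

# "cats" is not one of the alternatives in the first group: no match.
r_nomatch=$(matches 'I like cats, but not lions.' && echo yes || echo no)

printf 'penguins/tigers: %s\n' "$r_match"
printf 'cats/lions:      %s\n' "$r_nomatch"
```

Note that the final `.` in the pattern is unescaped, so it matches any character, not only a full stop.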

src: https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/

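Rename: capture 3 digits in the search pattern (first line) and reuse them as `$1` in the replacement (second line)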

Intro01_0(\d{3})

Intro01_0$1.exr
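The note above does not say which tool the search/replace pair is meant for. As a sketch only, here is how the same idea might look with `sed -E`, where `\d` becomes `[0-9]` and `$1` becomes `\1`. The file name is hypothetical.

```shell
# Hypothetical name, used only for illustration.
name='Intro01_0123'

# Capture the three digits and re-emit them, followed by the .exr extension.
new_name=$(printf '%s\n' "$name" | sed -E 's/Intro01_0([0-9]{3})/Intro01_0\1.exr/')

printf '%s -> %s\n' "$name" "$new_name"   # Intro01_0123 -> Intro01_0123.exr
```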

Utilities

grep, egrep, fgrep, rgrep – print lines that match patterns

`-E, --extended-regexp`	Interpret PATTERNS as extended regular expressions (EREs, see below).
`-G, --basic-regexp`	Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default.
`-A NUM, --after-context=NUM`	Print NUM lines of trailing context after matching lines. Places a line containing a group separator (`--`) between contiguous groups of matches. With the `-o` or `--only-matching` option, this has no effect and a warning is given.
`-B NUM, --before-context=NUM`	Print NUM lines of leading context before matching lines. Places a line containing a group separator (`--`) between contiguous groups of matches. With the `-o` or `--only-matching` option, this has no effect and a warning is given.
`-v, --invert-match`	Invert the sense of matching, to select non-matching lines.
`-i, --ignore-case`	Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.

Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
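A small sketch of the difference, counting matching lines with `grep -c`. The BRE interval `\{1,\}` is used here instead of `\+` because it is part of POSIX; the sample line and variable names are assumptions.

```shell
line='aaa'

# ERE: + is a quantifier, so the line matches.
ere=$(printf '%s\n' "$line" | grep -E -c '^a+$' || true)

# BRE: an unescaped + is a literal plus sign, so the line does not match.
bre_plain=$(printf '%s\n' "$line" | grep -c '^a+$' || true)

# BRE: the backslashed interval \{1,\} means "one or more".
bre_interval=$(printf '%s\n' "$line" | grep -c '^a\{1,\}$' || true)

printf 'ERE a+        : %s\n' "$ere"            # 1
printf 'BRE a+        : %s\n' "$bre_plain"      # 0
printf 'BRE a\\{1,\\}   : %s\n' "$bre_interval" # 1
```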

Output the 9 lines that follow each match (use `-B` for lines before the match)

hdparm -I /dev/disk/by-id/scsi-SATA_45v3f5fc35 | grep -A 9 Security:
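To see what `-A` and `-B` do without a disk at hand, the output of `seq` can stand in for real command output. A sketch, with illustrative variable names:

```shell
# Two lines of trailing context after the line "5".
after=$(seq 1 10 | grep -A 2 '^5$' | paste -sd, -)

# Two lines of leading context before the line "5".
before=$(seq 1 10 | grep -B 2 '^5$' | paste -sd, -)

printf -- '-A 2: %s\n' "$after"    # 5,6,7
printf -- '-B 2: %s\n' "$before"   # 3,4,5
```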

Search for a pattern in the files of a folder

grep pattern1 ./*

Search for two different patterns

mycommand | grep 'pattern1\|pattern2'

mycommand | grep -e "pattern1" -e "pattern2"
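A sketch comparing two ways of searching for either pattern on made-up input, where `printf` stands in for `mycommand`. The `-E` form is the extended-syntax equivalent of the `\|` form shown above.

```shell
# Made-up input lines standing in for the output of mycommand.
input=$(printf '%s\n' 'alpha pattern1' 'beta' 'gamma pattern2')

# Several -e options: a line is selected if it matches any of them.
with_e=$(printf '%s\n' "$input" | grep -e 'pattern1' -e 'pattern2' | paste -sd, -)

# Extended syntax: a bare | separates the alternatives.
with_ere=$(printf '%s\n' "$input" | grep -E 'pattern1|pattern2' | paste -sd, -)

printf '%s\n' "$with_e"     # alpha pattern1,gamma pattern2
printf '%s\n' "$with_ere"   # alpha pattern1,gamma pattern2
```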

Get lines where the pattern is preceded only by leading whitespace

echo "llist jobs client=san-fd jobstatus=A" | bconsole | grep -E "^\s+jobid"

Get lines not containing a pattern

zfs list -r -t snapshot -o name | grep -v autosnap
zfs list -r -t snapshot -o name | grep -v -e autosnap -e syncoid

https://stackoverflow.com/questions/3548453/negative-matching-using-grep-match-lines-that-do-not-contain-foo

Destroy all zfs snapshots with a pattern

`-o name` selects only the `name` property of each snapshot; `-H` omits the header line so the output can be piped to other commands

zfs list -H -r -t snapshot -o name mydataset | grep mypattern | xargs -n1 sudo zfs destroy

Destroy all zfs snapshots whose name contains neither of two patterns

zfs list -H -r -t snapshot -o name | grep -v -e autosnap -e syncoid | xargs -n1 sudo zfs destroy

Destroy all snapshots that match several date patterns

Here we want to remove the snapshots that contain these dates in their name:

2022-04
2022-05-01 to 09
2022-05-11 to 13

Make sure to check the output before running this line (that is, run it first without the last pipe to zfs destroy).

zfs list -H -t snapshot SAN300/projects -o name | grep -E "(^SAN300/projects@autosnap_2022-04-|^SAN300/projects@autosnap_2022-05-0[1-9]|^SAN300/projects@autosnap_2022-05-1[1-3])" | xargs -n1 sudo zfs destroy
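One way to rehearse the filter without touching any pool is to feed `grep` a list of made-up snapshot names. This is a sketch only: the names (including the `_daily` suffix) are hypothetical, and the pattern is a condensed form covering the dates listed above (April 2022, May 1 to 9, May 11 to 13).

```shell
# Hypothetical names standing in for `zfs list -H -t snapshot -o name` output.
snapshots=$(printf '%s\n' \
  'SAN300/projects@autosnap_2022-03-31_daily' \
  'SAN300/projects@autosnap_2022-04-15_daily' \
  'SAN300/projects@autosnap_2022-05-09_daily' \
  'SAN300/projects@autosnap_2022-05-10_daily' \
  'SAN300/projects@autosnap_2022-05-12_daily' \
  'SAN300/projects@autosnap_2022-05-14_daily')

# Common prefix factored out; only the date part is inside the alternation.
pattern='^SAN300/projects@autosnap_2022-(04-|05-0[1-9]|05-1[1-3])'

# Dry run: print what would be selected, with no pipe to zfs destroy.
selected=$(printf '%s\n' "$snapshots" | grep -E "$pattern")
selected_count=$(printf '%s\n' "$selected" | grep -c .)

printf '%s\n' "$selected"
printf 'selected: %s\n' "$selected_count"   # 3 (04-15, 05-09, 05-12)
```

The 2022-05-10 name is left out of the selection because `1[1-3]` starts at 11.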

sed – stream editor for filtering and transforming text

Documentation

`d`	Delete the pattern space; immediately start next cycle.
`y/source-chars/dest-chars/`	Transliterate any characters in the pattern space which match any of the `source-chars` with the corresponding character in `dest-chars`.
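A short sketch of both commands on made-up input. The `d` example uses a regex address, which is an addition to what the table shows; the variable names are illustrative.

```shell
# y: map each "l" to "0" and each "o" to "1", character by character.
translit=$(printf '%s\n' 'hello world' | sed 'y/lo/01/')

# d: delete every line that matches the address /3/.
deleted=$(seq 1 5 | sed '/3/d' | paste -sd, -)

printf '%s\n' "$translit"   # he001 w1r0d
printf '%s\n' "$deleted"    # 1,2,4,5
```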

Remove line 664 from file ~/.ssh/known_hosts (`-i` is short for `--in-place`)

sed -i '664d' ~/.ssh/known_hosts

Delete lines 5 through 10 and line 12

sed -e '5,10d;12d' file
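The same script can be tried on numbered input from `seq` instead of a real file, which makes the deleted lines easy to spot. A sketch:

```shell
# Lines 5-10 and line 12 are removed from the numbers 1..14.
kept=$(seq 1 14 | sed -e '5,10d;12d' | paste -sd, -)

printf '%s\n' "$kept"   # 1,2,3,4,11,13,14
```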

Replace a string that contains / (like a URL)

Use # as the delimiter of the `s` command instead of /, so the slashes in the URL need no escaping

sed -i 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g' sources.list
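For comparison, here is a sketch of the same substitution written both ways on a single made-up `sources.list` line (the release name `bookworm` is an assumption). With `/` as the delimiter every slash in the URL has to be escaped.

```shell
# Made-up sources.list entry.
line='deb http://deb.debian.org/debian bookworm main'

# "#" as delimiter: slashes can be written as-is.
with_hash=$(printf '%s\n' "$line" | sed 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g')

# "/" as delimiter: each slash in the pattern and replacement needs a backslash.
with_slash=$(printf '%s\n' "$line" | sed 's/deb\.debian\.org\/debian/ftp2.de.debian.org\/debian/g')

printf '%s\n' "$with_hash"    # deb http://ftp2.de.debian.org/debian bookworm main
printf '%s\n' "$with_slash"   # same result
```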

https://www.systutorials.com/how-to-delete-a-specific-line-from-a-text-file-in-command-line-on-linux/

Replace spaces with underscores

sed 's/ /_/g' /tmp/archive_dirs

Mac OS X – Replace spaces with newlines

sed 's/ /\'$'\n/g'

Remove the last line

`$` addresses the last line; `d` deletes it

seq 1 10 | sed '$d'
seq 1 10 | sed \$d

iconv – convert text from one character encoding to another

Clean localized strings

$ echo ßüäö | iconv -f UTF-8 -t ASCII//TRANSLIT
ssuao

xargs – build and execute command lines from standard input

xargs [options] [command [initial-arguments]]

`-0, --null`	Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally)
`-t, --verbose`	Print the command line on the standard error output before executing it

find /tmp -name core -type f -print | xargs /bin/rm -f
find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
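The two commands above differ in how file names with spaces are handled. A sketch in a scratch directory (created with `mktemp -d`; the directory name `my dir` is made up, and the scratch path itself is assumed to contain no spaces or quotes):

```shell
# Scratch directory with two "core" files, one below a path containing a space.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/my dir"
touch "$tmpdir/core" "$tmpdir/my dir/core"

# Newline-delimited names: xargs splits "my dir" at the space,
# so echo is called with more arguments than there are files.
split_count=$(find "$tmpdir" -name core -type f -print | xargs -n1 echo | grep -c .)

# NUL-delimited names: exactly one argument per file.
null_count=$(find "$tmpdir" -name core -type f -print0 | xargs -0 -n1 echo | grep -c .)

# Remove the files safely, then count what is left.
find "$tmpdir" -name core -type f -print0 | xargs -0 rm -f
remaining=$(find "$tmpdir" -name core -type f | grep -c . || true)
rm -rf "$tmpdir"

printf 'arguments with -print : %s\n' "$split_count"
printf 'arguments with -print0: %s\n' "$null_count"   # 2
printf 'files remaining       : %s\n' "$remaining"    # 0
```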

Generates a compact listing of all the users on the system

cut -d: -f1 < /etc/passwd | sort | xargs echo

gawk – pattern scanning and processing language

Misc

https://www.baeldung.com/linux/grep-sed-awk-differences