Regular expressions, Parsing
- Patterns
- Utilities
- grep, egrep, fgrep, rgrep – print lines that match patterns
- Outputs some lines before and after the match(es)
- Search for a pattern in files within a folder
- Search for two different patterns
- Get lines with a pattern preceded by white spaces
- Get lines not containing a pattern
- Destroy all zfs snapshots with a pattern
- Destroy all zfs snapshots NOT containing two different patterns
- Destroy all datasets that match several date patterns
- sed – stream editor for filtering and transforming text
- iconv – convert text from one character encoding to another
- xargs – build and execute command lines from standard input
- gawk – pattern scanning and processing language
- Misc
Patterns
{ } (Curly Braces) — Matches a specific number of times
{m} | The preceding element or subexpression must occur exactly m times. |
{m,n} | The preceding element or subexpression must occur between m and n times, inclusive. |
{m,} | The preceding element or subexpression must occur at least m times. |
ref: https://www.oreilly.com/library/view/oracle-regular-expressions/0596006012/re13.html
ref: https://www.oreilly.com/library/view/regular-expressions-cookbook/9781449327453/ch06s01.html
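The three quantifier forms above can be tried with grep in extended mode. This is a minimal sketch: the sample strings and variable names are made up for illustration, and `-x` is used so that only whole-line matches count.

```shell
# Hypothetical sample input: one candidate string per line.
samples=$(printf '%s\n' a aa aaa aaaa aaaaa)

# -x makes grep match the whole line, so the repetition counts are exact.
# paste joins the matching lines with spaces for compact display.
exactly_three=$(printf '%s\n' "$samples" | grep -xE 'a{3}' | paste -sd ' ' -)
two_to_four=$(printf '%s\n' "$samples" | grep -xE 'a{2,4}' | paste -sd ' ' -)
at_least_four=$(printf '%s\n' "$samples" | grep -xE 'a{4,}' | paste -sd ' ' -)

printf 'a{3}:   %s\n' "$exactly_three"
printf 'a{2,4}: %s\n' "$two_to_four"
printf 'a{4,}:  %s\n' "$at_least_four"
```

Without `-x`, `a{3}` would also select the longer lines, because grep only needs to find the pattern somewhere in the line.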
Metacharacter | Description |
---|---|
^ | Matches the starting position within the string. In line-based tools, it matches the starting position of any line. |
. | Matches any single character (many applications exclude newlines, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, a.c matches “abc”, etc., but [a.c] matches only “a”, “.”, or “c”. |
[ ] | A bracket expression. Matches a single character that is contained within the brackets. For example, [abc] matches “a”, “b”, or “c”. [a-z] specifies a range which matches any lowercase letter from “a” to “z”. These forms can be mixed: [abcx-z] matches “a”, “b”, “c”, “x”, “y”, or “z”, as does [a-cx-z] . The - character is treated as a literal character if it is the last or the first (after the ^ , if present) character within the brackets: [abc-] , [-abc] . Note that backslash escapes are not allowed. The ] character can be included in a bracket expression if it is the first (after the ^ ) character: []abc] . |
[^ ] | Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than “a”, “b”, or “c”. [^a-z] matches any single character that is not a lowercase letter from “a” to “z”. Likewise, literal characters and ranges can be mixed. |
$ | Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line. |
( ) | Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n ). A marked subexpression is also called a block or capturing group. BRE mode requires \( \) . |
\n | Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is vaguely defined in the POSIX.2 standard. Some tools allow referencing more than nine capturing groups. Also known as a backreference. POSIX defines backreferences only for BRE mode; some tools, such as GNU grep and GNU sed, also accept them in ERE mode as an extension. |
* | Matches the preceding element zero or more times. For example, ab*c matches “ac”, “abc”, “abbbc”, etc. [xyz]* matches “”, “x”, “y”, “z”, “zx”, “zyx”, “xyzzy”, and so on. (ab)* matches “”, “ab”, “abab”, “ababab”, and so on. |
{m,n} | Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only “aaa”, “aaaa”, and “aaaaa”. This is not found in a few older instances of regexes. BRE mode requires \{m,n\} . |
Src: https://en.wikipedia.org/wiki/Regular_expression
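A few rows of the table can be checked with grep in its default BRE mode. This is only a sketch: the word list and variable names are invented for the example, and `-x` restricts matches to whole lines.

```shell
# Hypothetical sample input, one candidate per line.
words=$(printf '%s\n' abc a.c ac abbbc xyzzy cat 'cat cat')

# . matches any single character.
dot=$(printf '%s\n' "$words" | grep -x 'a.c' | paste -sd ' ' -)

# Inside a bracket expression the dot is a literal dot.
literal_dot=$(printf '%s\n' "$words" | grep -x 'a[.]c' | paste -sd ' ' -)

# * repeats the preceding element zero or more times.
star=$(printf '%s\n' "$words" | grep -x 'ab*c' | paste -sd ' ' -)

# [^a]* accepts only lines made of characters other than "a".
no_a=$(printf '%s\n' "$words" | grep -x '[^a]*' | paste -sd ' ' -)

# BRE backreference: \1 must repeat exactly what \(...\) captured.
repeated=$(printf '%s\n' "$words" | grep -x '\([a-z]*\) \1' | paste -sd ' ' -)

printf '%s\n' "$dot" "$literal_dot" "$star" "$no_a" "$repeated"
```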
Logical “or”
You can provide as many terms as desired, as long as they are separated with the pipe character: |. This character separates the terms contained within each (...) group.
^I like (dogs|penguins), but not (lions|tigers).$
src: https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
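To see which sentences the pattern accepts, it can be passed to grep in extended mode, where `|` and `( )` need no backslash. The test sentences below are made up for illustration.

```shell
# Pattern copied from the text above; the final "." is unescaped,
# so it matches any single character, including a literal dot.
pattern='^I like (dogs|penguins), but not (lions|tigers).$'

# -c prints the number of matching lines instead of the lines themselves.
matches=$(printf '%s\n' \
  'I like dogs, but not lions.' \
  'I like penguins, but not tigers.' \
  'I like cats, but not lions.' \
  'I like dogs, but not bears.' | grep -cE "$pattern")

echo "matching lines: $matches"
```

Only the first two sentences match, because each group must pick one of its listed alternatives.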
Rename with a 3-digit capture group (search pattern, then replacement)
Intro01_0(\d{3})
Intro01_0$1.exr
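The same search and replace can be sketched with sed, with two differences to keep in mind: POSIX extended regular expressions have no `\d`, so `[0-9]` is used instead, and sed refers to the captured group as `\1` rather than `$1`. The file name below is a hypothetical example.

```shell
# Hypothetical input name; the three digits are captured and reused.
old_name='Intro01_0123'

# -E enables extended regular expressions, so ( ) and { } need no backslash.
new_name=$(printf '%s\n' "$old_name" | sed -E 's/^Intro01_0([0-9]{3})$/Intro01_0\1.exr/')

echo "$old_name -> $new_name"
```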
Utilities
grep, egrep, fgrep, rgrep – print lines that match patterns
-E, --extended-regexp | Interpret PATTERNS as extended regular expressions (EREs, see below). |
-G, --basic-regexp | Interpret PATTERNS as basic regular expressions (BREs, see below). This is the default. |
-A NUM, --after-context=NUM | Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given. |
-B NUM, --before-context=NUM | Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given. |
-v, --invert-match | Invert the sense of matching, to select non-matching lines. |
-i, --ignore-case | Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other. |
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
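A small comparison may help here. The sketch below assumes GNU grep, since the backslashed `\?` is a GNU extension to BRE; the sample words are arbitrary.

```shell
input=$(printf '%s\n' color colour colouur)

# ERE (-E): ? is a quantifier as written.
ere=$(printf '%s\n' "$input" | grep -xE 'colou?r' | paste -sd ' ' -)

# BRE (default): the same quantifier is written \? (assumes GNU grep).
bre=$(printf '%s\n' "$input" | grep -x 'colou\?r' | paste -sd ' ' -)

# BRE without the backslash: ? is an ordinary character,
# so the pattern only matches a line containing a literal question mark.
literal=$(printf '%s\n' 'colou?r' | grep -x 'colou?r')

printf 'ERE: %s\nBRE: %s\nliteral: %s\n' "$ere" "$bre" "$literal"
```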
Outputs some lines before and after the match(es)
hdparm -I /dev/disk/by-id/scsi-SATA_45v3f5fc35 | grep -A 9 Security:
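The context options can be tried without a disk at hand. The report text below is invented and only loosely imitates a section of tool output; `-A` adds trailing lines and `-B` adds leading lines.

```shell
# Hypothetical multi-line report standing in for real command output.
report=$(printf '%s\n' 'Model: X' 'Security:' 'supported' 'not enabled' 'Checksum: ok')

# The matching line plus the 2 lines after it.
after=$(printf '%s\n' "$report" | grep -A 2 'Security:')

# The matching line plus the 1 line before it.
before=$(printf '%s\n' "$report" | grep -B 1 'Security:')

printf '%s\n' '--- after ---' "$after" '--- before ---' "$before"
```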
Search for a pattern in files within a folder
grep pattern1 ./*
Search for two different patterns
mycommand | grep 'pattern1\|pattern2'
or
mycommand | grep -e "pattern1" -e "pattern2"
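As a quick check that these forms are interchangeable, the sketch below runs them, plus the `-E` variant, on a made-up log. It assumes GNU grep, because `\|` in a basic regular expression is a GNU extension.

```shell
# Hypothetical log lines.
log=$(printf '%s\n' 'error: disk full' 'info: started' 'warning: low memory')

with_bre=$(printf '%s\n' "$log" | grep 'error\|warning')
with_e_flags=$(printf '%s\n' "$log" | grep -e 'error' -e 'warning')
with_ere=$(printf '%s\n' "$log" | grep -E 'error|warning')

if [ "$with_bre" = "$with_e_flags" ] && [ "$with_e_flags" = "$with_ere" ]; then
  echo "all three forms select the same lines:"
  printf '%s\n' "$with_ere"
fi
```

The `-e` form is the most portable of the three, since it does not rely on alternation at all.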
Get lines with a pattern preceded by white spaces
echo "llist jobs client=san-fd jobstatus=A" | bconsole | grep -E "^\s+jobid"
Get lines not containing a pattern
zfs list -r -t snapshot -o name | grep -v autosnap
zfs list -r -t snapshot -o name | grep -v -e autosnap -e syncoid
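The inverted match can be rehearsed on a plain list of names before pointing it at a real pool. The snapshot names below are hypothetical.

```shell
# Hypothetical snapshot names standing in for the output of zfs list.
snapshots=$(printf '%s\n' \
  'tank/data@autosnap_2022-05-01_daily' \
  'tank/data@syncoid_host_2022-05-01' \
  'tank/data@manual-before-upgrade')

# -v keeps only the lines that match none of the -e patterns.
kept=$(printf '%s\n' "$snapshots" | grep -v -e autosnap -e syncoid)

echo "$kept"
```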
Destroy all zfs snapshots with a pattern
-o name prints only the “name” property of each snapshot, and -H omits the header line so that the output can be piped safely.
zfs list -H -r -t snapshot -o name mydataset | grep mypattern | xargs -n1 sudo zfs destroy
Destroy all zfs snapshots NOT containing two different patterns
zfs list -H -r -t snapshot -o name | grep -v -e autosnap -e syncoid | xargs -n1 sudo zfs destroy
Destroy all datasets that match several date patterns
Here we want to remove the snapshots whose names contain any of these dates:
- 2022-04
- 2022-05-01 to 09
- 2022-05-11 to 13
Make sure to test the output before running this line (that is, run it first without the last pipe to zfs destroy).
zfs list -H -t snapshot SAN300/projects -o name | grep -E "(^SAN300/projects@autosnap_2022-04-|^SAN300/projects@autosnap_2022-05-0[1-9]|^SAN300/projects@autosnap_2022-05-1[1-3])" | xargs -n1 sudo zfs destroy
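One way to rehearse the date selection is to feed the pattern a list of sample names and let xargs print the commands instead of running them. In this sketch the snapshot names are hypothetical, the pattern is a shortened form covering the three date ranges listed above, and `echo` in front of `sudo` is what turns it into a dry run.

```shell
# Hypothetical snapshot names standing in for the output of zfs list.
names=$(printf '%s\n' \
  'SAN300/projects@autosnap_2022-04-15_daily' \
  'SAN300/projects@autosnap_2022-05-03_daily' \
  'SAN300/projects@autosnap_2022-05-10_daily' \
  'SAN300/projects@autosnap_2022-05-12_daily' \
  'SAN300/projects@autosnap_2022-05-14_daily' \
  'SAN300/projects@autosnap_2022-06-01_daily')

# Common prefix factored out; the alternation lists only the date ranges.
pattern='^SAN300/projects@autosnap_2022-(04-|05-0[1-9]|05-1[1-3])'

selected=$(printf '%s\n' "$names" | grep -E "$pattern")

# Dry run: echo prints each command line instead of executing it.
printf '%s\n' "$selected" | xargs -n1 echo sudo zfs destroy
```

Once the printed commands look right, removing the `echo` would run them for real.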
sed – stream editor for filtering and transforming text
Documentation
- https://www.gnu.org/software/sed/manual/sed.html
- https://www.gnu.org/software/sed/manual/sed.html#Common-Commands
d | Delete the pattern space; immediately start next cycle. |
y/source-chars/dest-chars/ | Transliterate any characters in the pattern space which match any of the source-chars with the corresponding character in dest-chars. |
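Both commands can be tried on throwaway input. The strings below are arbitrary examples: `y` maps characters one to one by position, and `d` is shown here with a regex address rather than a line number.

```shell
# y replaces a with A, b with B, and c with C, character by character.
upper=$(echo 'aabbcc abc' | sed 'y/abc/ABC/')

# d with the address /^#/ deletes every line that starts with "#".
# paste joins the surviving lines with commas for compact display.
no_comments=$(printf '%s\n' 'keep' '# comment' 'keep too' | sed '/^#/d' | paste -sd ',' -)

printf '%s\n' "$upper" "$no_comments"
```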
Remove line 664 from file ~/.ssh/known_hosts (-i is for --in-place)
sed -i '664d' ~/.ssh/known_hosts
Delete lines 5 through 10 and line 12
sed -e '5,10d;12d' file
Replace a string that contains / (like a URL)
Use # as the delimiter of the s command instead of /, so the slashes in the string do not need to be escaped
sed -i 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g' sources.list
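For comparison, here is the same substitution written with both delimiters, applied to a sample line rather than to a file. The sample line, including the release name, is made up for the example.

```shell
# Hypothetical sources.list entry.
line='deb http://deb.debian.org/debian bookworm main'

# With # as the delimiter the slashes in the URL are ordinary characters.
with_hash=$(echo "$line" | sed 's#deb.debian.org/debian#ftp2.de.debian.org/debian#g')

# With / as the delimiter every slash inside the pattern must be escaped.
with_slash=$(echo "$line" | sed 's/deb\.debian\.org\/debian/ftp2.de.debian.org\/debian/g')

printf '%s\n' "$with_hash" "$with_slash"
```

Any character that does not appear in the pattern or replacement can serve as the delimiter; `#`, `|`, and `,` are common choices.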
Replace spaces with underscores
sed 's/ /_/g' /tmp/archive_dirs
Mac OS X – Replace spaces with newlines
sed 's/ /\'$'\n/g'
Remove the last line
$ addresses the last line and d deletes it. Quote or escape the $ so that the shell does not try to expand it.
seq 1 10 | sed '$d'
seq 1 10 | sed \$d
iconv – convert text from one character encoding to another
Clean localized strings
$ echo ßüäö | iconv -f UTF-8 -t ASCII//TRANSLIT
ssuao
xargs – build and execute command lines from standard input
xargs [options] [command [initial-arguments]]
-0, --null | Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally) |
-t, --verbose | Print the command line on the standard error output before executing it |
find /tmp -name core -type f -print | xargs /bin/rm -f
find /tmp -name core -type f -print0 | xargs -0 /bin/rm -f
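The difference between the two commands shows up as soon as a path contains a space. The sketch below works in a throwaway directory created with mktemp; the directory and file names are invented, and `xargs -n1 echo` is used only to count how many arguments xargs builds from each kind of input.

```shell
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two files named core, one of them under a directory with a space in its name.
mkdir 'my dir'
touch core 'my dir/core'

# Whitespace-delimited input: "./my dir/core" is split into two arguments.
unsafe_args=$(find . -name core -type f -print | xargs -n1 echo | grep -c '')

# NUL-delimited input: each path stays in one piece.
safe_args=$(find . -name core -type f -print0 | xargs -0 -n1 echo | grep -c '')

echo "whitespace-delimited: $unsafe_args arguments, NUL-delimited: $safe_args arguments"

# The NUL-delimited form removes both files, including the one under "my dir".
find . -name core -type f -print0 | xargs -0 rm -f
remaining=$(find . -name core -type f)

# Clean up the scratch directory.
cd "$start_dir" && rm -rf "$workdir"
```

With two files on disk, the whitespace-delimited form hands xargs three arguments, none of which is the full path of the second file, so that file would be left behind.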
Generates a compact listing of all the users on the system
cut -d: -f1 < /etc/passwd | sort | xargs echo