
Text Processing

Linux text processing tools: grep (search), awk (fields and programs), sed (stream editor), and cut / sort / uniq / xargs, the building blocks of any shell pipeline.

grep

grep flags
Flag                  Description
-i                    Case-insensitive match
-r / -R               Recursive search (-R follows symlinks)
-l                    Print only filenames with matches
-L                    Print only filenames without matches
-n                    Show line numbers
-c                    Count matching lines
-v                    Invert match
-w                    Match whole word
-x                    Match whole line
-E                    Extended regex (egrep)
-P                    Perl-compatible regex (PCRE)
-F                    Fixed string (no regex)
-o                    Print only the matching part of the line
-A N                  N lines after each match
-B N                  N lines before each match
-C N                  N lines around each match
-m N                  Stop after N matches
--include="*.py"      Search only in .py files
--exclude-dir=".git"  Exclude a directory
grep examples
Command                                            Description
grep -rn "TODO" src/ --include="*.py"              Find TODO in Python files
grep -E "^(ERROR|WARN)" app.log                    Lines starting with ERROR or WARN
grep -oP "(?<=Host: )\S+" access.log               Extract Host headers
grep -v "^#" /etc/ssh/sshd_config | grep -v "^$"   Config without comments and blank lines
grep -c "ERROR" app.log                            Count error lines
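A quick sketch of two of the flags above working together, using a few hypothetical log lines piped in directly:

```shell
# Hypothetical log lines fed straight through a pipe.
# -E enables alternation, -c counts matching lines.
printf 'ERROR disk full\nINFO ok\nWARN low memory\nERROR timeout\n' |
  grep -Ec '^(ERROR|WARN)'
# 3

# -o prints only the matched severity word, one per line.
printf 'ERROR disk full\nINFO ok\nWARN low memory\nERROR timeout\n' |
  grep -oE '^(ERROR|WARN)'
# ERROR
# WARN
# ERROR
```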

awk

Core constructs
Expression                                          Description
awk '{print $1}'                                    First field (whitespace delimiter)
awk '{print $NF}'                                   Last field
awk '{print $1, $3}'                                Fields 1 and 3 separated by a space
awk -F: '{print $1}' /etc/passwd                    Use : as delimiter
awk 'NR==5'                                         Print line 5
awk 'NR>=3 && NR<=7'                                Lines 3–7
awk '/pattern/'                                     Lines matching pattern
awk '!/pattern/'                                    Lines NOT matching
awk '$3 > 100 {print}'                              Lines where field 3 > 100
awk '{sum+=$1} END{print sum}'                      Sum first column
awk 'END{print NR}'                                 Line count (like wc -l)
awk '{gsub(/old/,"new"); print}'                    Global substitution on each line
awk '!seen[$0]++'                                   Remove duplicates (preserve order)
awk 'BEGIN{FS=":"; OFS="\t"} {print $1,$3}'         Input and output field separators
awk '{a[$1]+=$2} END{for(k in a) print k,a[k]}'     Group by key with sum

Built-in awk variables: NR (line number) · NF (field count) · FS (input separator) · OFS (output separator) · RS (record separator) · ORS (output record separator) · FILENAME
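The group-by idiom from the table, sketched with hypothetical user/bytes input (sort at the end because `for (k in a)` iterates in unspecified order):

```shell
# Hypothetical input: a user name and a byte count per line.
# a[] accumulates bytes per user; END prints the totals.
printf 'alice 10\nbob 5\nalice 7\n' |
  awk '{a[$1]+=$2} END{for(k in a) print k, a[k]}' |
  sort
# alice 17
# bob 5
```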

sed

sed commands
Expression                      Description
sed 's/old/new/'                Replace first occurrence per line
sed 's/old/new/g'               Replace all occurrences
sed 's/old/new/gi'              Case-insensitive replacement
sed -i 's/old/new/g' file       In-place replacement
sed -i.bak 's/.../.../' file    In-place with .bak backup
sed -n '5p'                     Print only line 5
sed -n '3,7p'                   Lines 3–7
sed -n '/pattern/p'             Lines matching pattern
sed -n '/start/,/end/p'         Block between two patterns
sed '3d'                        Delete line 3
sed '/pattern/d'                Delete lines matching pattern
sed '/^#/d; /^$/d'              Remove comments and blank lines
sed '5a\new line'               Append line after line 5
sed '5i\new line'               Insert line before line 5
sed 'y/abc/ABC/'                Transliterate characters
sed 'G'                         Add blank line after every line
sed -e 's/a/b/' -e 's/c/d/'     Multiple commands
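Chaining two of the commands above on a hypothetical sshd_config fragment, comment/blank stripping followed by a substitution:

```shell
# Hypothetical config fragment: drop comments and blank lines,
# then rewrite the Port value in the same pipeline.
printf '# comment\n\nPort 22\nPermitRootLogin yes\n' |
  sed '/^#/d; /^$/d' |
  sed 's/^Port .*/Port 2222/'
# Port 2222
# PermitRootLogin yes
```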

cut, sort, uniq, xargs

cut — field extraction
Command                      Description
cut -d: -f1 /etc/passwd      Field 1 with : delimiter
cut -d, -f2-4                Fields 2, 3, 4
cut -d: -f1,3                Fields 1 and 3
cut -c1-10                   Characters 1–10
cut -c-5                     First 5 characters
cut -c10-                    From character 10 to end
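For example, pulling login and UID from a passwd-style line; note that cut re-joins the selected fields with the same delimiter:

```shell
# Field 1 (login) and field 3 (UID), : as delimiter.
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f1,3
# root:0
```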
sort
Command                        Description
sort                           Alphabetical sort
sort -n                        Numeric sort
sort -rn                       Reverse numeric sort
sort -u                        Unique lines
sort -k2,2n                    Sort by field 2 numerically
sort -t: -k3,3n /etc/passwd    Sort passwd by UID
sort -h                        Human-readable numbers (1K, 2M)
sort -R                        Random shuffle
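The `-k2,2n` form deserves a quick demo; without the closing `,2` the key would extend from field 2 to the end of the line:

```shell
# Sort by the second column, compared numerically.
printf 'b 3\na 10\nc 2\n' | sort -k2,2n
# c 2
# b 3
# a 10
```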
uniq
Command                      Description
uniq                         Remove adjacent duplicates (input is usually sorted first)
uniq -c                      Count occurrences
uniq -d                      Duplicates only
uniq -u                      Unique lines only
sort | uniq -c | sort -rn    Top frequent lines
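The frequency pipeline from the last row, on made-up input:

```shell
# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the most frequent first.
printf 'cat\ndog\ncat\ncat\nbird\n' | sort | uniq -c | sort -rn
```

The top line here is the count 3 for `cat` (uniq -c left-pads the count with spaces).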
xargs — argument passing
Command                                    Description
find . -name "*.log" | xargs rm            Delete all .log files
find . -name "*.py" | xargs grep "TODO"    grep across found files
cat hosts.txt | xargs -I{} ping -c1 {}     Ping each host
echo "a b c" | xargs -n1                   One argument at a time
xargs -P4 -I{} cmd {}                      4 parallel processes
find . -print0 | xargs -0 rm               Null delimiters (files with spaces)
xargs -n3 echo                             3 arguments per invocation
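A minimal sketch of the `-nN` batching behaviour:

```shell
# -n2 hands echo two arguments per invocation, so four
# inputs become two echo calls and two output lines.
printf 'a\nb\nc\nd\n' | xargs -n2 echo
# a b
# c d
```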