
Text Processing

Linux text processing tools: grep (search), awk (fields and programs), sed (stream editor), and cut / sort / uniq / xargs, the building blocks of any shell pipeline.

grep

grep flags
Flag                  Description
-i                    Case-insensitive match
-r / -R               Recursive search (-R follows symlinks)
-l                    Print only filenames with matches
-L                    Print only filenames without matches
-n                    Show line numbers
-c                    Count matching lines
-v                    Invert match
-w                    Match whole word
-x                    Match whole line
-E                    Extended regex (egrep)
-P                    Perl-compatible regex (PCRE)
-F                    Fixed string (no regex)
-o                    Print only the matching part of the line
-A N                  N lines after each match
-B N                  N lines before each match
-C N                  N lines around each match
-m N                  Stop after N matches
--include="*.py"      Search only in .py files
--exclude-dir=".git"  Exclude a directory
grep examples
Command                                            Description
grep -rn "TODO" src/ --include="*.py"              Find TODO in Python files
grep -E "^(ERROR|WARN)" app.log                    Lines starting with ERROR or WARN
grep -oP "(?<=Host: )\S+" access.log               Extract Host headers
grep -v "^#" /etc/ssh/sshd_config | grep -v "^$"   Config without comments and blank lines
grep -c "ERROR" app.log                            Count error lines
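A quick sketch of two of the flags above working together, using a few hypothetical log lines piped in directly:

```shell
# Hypothetical log lines fed straight through a pipe.
# -E enables alternation, -c counts matching lines.
printf 'ERROR disk full\nINFO ok\nWARN low memory\nERROR timeout\n' |
  grep -Ec '^(ERROR|WARN)'
# 3

# -o prints only the matched severity word, one per line.
printf 'ERROR disk full\nINFO ok\nWARN low memory\nERROR timeout\n' |
  grep -oE '^(ERROR|WARN)'
# ERROR
# WARN
# ERROR
```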

awk

Core constructs
Expression                                          Description
awk '{print $1}'                                    First field (whitespace delimiter)
awk '{print $NF}'                                   Last field
awk '{print $1, $3}'                                Fields 1 and 3 separated by a space
awk -F: '{print $1}' /etc/passwd                    Use : as delimiter
awk 'NR==5'                                         Print line 5
awk 'NR>=3 && NR<=7'                                Lines 3–7
awk '/pattern/'                                     Lines matching pattern
awk '!/pattern/'                                    Lines NOT matching
awk '$3 > 100 {print}'                              Lines where field 3 > 100
awk '{sum+=$1} END{print sum}'                      Sum first column
awk 'END{print NR}'                                 Line count (like wc -l)
awk '{gsub(/old/,"new"); print}'                    Global substitution on each line
awk '!seen[$0]++'                                   Remove duplicates (preserve order)
awk 'BEGIN{FS=":"; OFS="\t"} {print $1,$3}'         Input and output field separators
awk '{a[$1]+=$2} END{for(k in a) print k,a[k]}'     Group by key with sum

Built-in awk variables: NR (line number) · NF (field count) · FS (input separator) · OFS (output separator) · RS (record separator) · ORS (output record separator) · FILENAME
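The group-by idiom from the table, sketched with hypothetical user/bytes input (sort at the end because `for (k in a)` iterates in unspecified order):

```shell
# Hypothetical input: a user name and a byte count per line.
# a[] accumulates bytes per user; END prints the totals.
printf 'alice 10\nbob 5\nalice 7\n' |
  awk '{a[$1]+=$2} END{for(k in a) print k, a[k]}' |
  sort
# alice 17
# bob 5
```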

sed

sed commands
Expression                      Description
sed 's/old/new/'                Replace first occurrence per line
sed 's/old/new/g'               Replace all occurrences
sed 's/old/new/gi'              Case-insensitive replacement
sed -i 's/old/new/g' file       In-place replacement
sed -i.bak 's/.../.../' file    In-place with .bak backup
sed -n '5p'                     Print only line 5
sed -n '3,7p'                   Lines 3–7
sed -n '/pattern/p'             Lines matching pattern
sed -n '/start/,/end/p'         Block between two patterns
sed '3d'                        Delete line 3
sed '/pattern/d'                Delete lines matching pattern
sed '/^#/d; /^$/d'              Remove comments and blank lines
sed '5a\new line'               Append line after line 5
sed '5i\new line'               Insert line before line 5
sed 'y/abc/ABC/'                Transliterate characters
sed 'G'                         Add blank line after every line
sed -e 's/a/b/' -e 's/c/d/'     Multiple commands
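Chaining two of the commands above on a hypothetical sshd_config fragment, comment/blank stripping followed by a substitution:

```shell
# Hypothetical config fragment: drop comments and blank lines,
# then rewrite the Port value in the same pipeline.
printf '# comment\n\nPort 22\nPermitRootLogin yes\n' |
  sed '/^#/d; /^$/d' |
  sed 's/^Port .*/Port 2222/'
# Port 2222
# PermitRootLogin yes
```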

cut, sort, uniq, xargs

cut — field extraction
Command                      Description
cut -d: -f1 /etc/passwd      Field 1 with : delimiter
cut -d, -f2-4                Fields 2, 3, 4
cut -d: -f1,3                Fields 1 and 3
cut -c1-10                   Characters 1–10
cut -c-5                     First 5 characters
cut -c10-                    From character 10 to end
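For example, pulling login and UID from a passwd-style line; note that cut re-joins the selected fields with the same delimiter:

```shell
# Field 1 (login) and field 3 (UID), : as delimiter.
echo 'root:x:0:0:root:/root:/bin/bash' | cut -d: -f1,3
# root:0
```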
sort
Command                        Description
sort                           Alphabetical sort
sort -n                        Numeric sort
sort -rn                       Reverse numeric sort
sort -u                        Unique lines
sort -k2,2n                    Sort by field 2 numerically
sort -t: -k3,3n /etc/passwd    Sort passwd by UID
sort -h                        Human-readable numbers (1K, 2M)
sort -R                        Random shuffle
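The `-k2,2n` form deserves a quick demo; without the closing `,2` the key would extend from field 2 to the end of the line:

```shell
# Sort by the second column, compared numerically.
printf 'b 3\na 10\nc 2\n' | sort -k2,2n
# c 2
# b 3
# a 10
```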
uniq
Command                      Description
uniq                         Remove adjacent duplicates (input is usually sorted first)
uniq -c                      Count occurrences
uniq -d                      Duplicates only
uniq -u                      Unique lines only
sort | uniq -c | sort -rn    Top frequent lines
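The frequency pipeline from the last row, on made-up input:

```shell
# sort groups identical lines, uniq -c counts each group,
# sort -rn puts the most frequent first.
printf 'cat\ndog\ncat\ncat\nbird\n' | sort | uniq -c | sort -rn
```

The top line here is the count 3 for `cat` (uniq -c left-pads the count with spaces).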
xargs — argument passing
Command                                    Description
find . -name "*.log" | xargs rm            Delete all .log files
find . -name "*.py" | xargs grep "TODO"    grep across found files
cat hosts.txt | xargs -I{} ping -c1 {}     Ping each host
echo "a b c" | xargs -n1                   One argument at a time
xargs -P4 -I{} cmd {}                      4 parallel processes
find . -print0 | xargs -0 rm               Null delimiters (files with spaces)
xargs -n3 echo                             3 arguments per invocation
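A minimal sketch of the `-nN` batching behaviour:

```shell
# -n2 hands echo two arguments per invocation, so four
# inputs become two echo calls and two output lines.
printf 'a\nb\nc\nd\n' | xargs -n2 echo
# a b
# c d
```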