1. Let's assume you have a very long CSV file and want to find all non-empty values in column #25:
cut -d ',' -f 25 data.csv | grep -v '^[[:space:]]*$'
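If the file starts with a header row, the header's own text would also show up in the results. Here is a minimal sketch of a reusable helper. It assumes the first line is a header, and the function name `nonempty_column` is hypothetical:

```shell
# Hypothetical helper (name and arguments are illustrative):
# print the non-empty values of column COL of a comma-separated FILE,
# skipping the first (header) row.
# Usage: nonempty_column FILE COL
nonempty_column() {
  # tail -n +2 starts output at line 2, so the header row is dropped;
  # tail -n +1 would start at line 1 and pass the whole file through.
  tail -n +2 "$1" | cut -d ',' -f "$2" | grep -v '^[[:space:]]*$'
}
```

For example, `nonempty_column data.csv 25`. Note that `cut` knows nothing about CSV quoting. A quoted field containing a comma, such as "Sacramento, CA", is split in two, which shifts every later column.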
2. In the same file, find the numbers of all rows that have the value 'CA' in column 25:
cut -d ',' -f 25 data.csv | grep -n '^CA$'
In the output you will get the row number followed by the column value, e.g. 17:CA
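awk can test the column and report the row number in a single step. It can also print the whole matching row instead of just the value. The following is a sketch, and the helper name `rows_with_value` is hypothetical:

```shell
# Hypothetical helper: print "row_number: row" for every row of a
# comma-separated FILE whose column COL is exactly VALUE.
# Usage: rows_with_value FILE COL VALUE
rows_with_value() {
  # awk splits each line on commas itself; NR is the current line number,
  # and $col is the field whose index is stored in the variable col.
  awk -F ',' -v col="$2" -v val="$3" '$col == val { print NR ": " $0 }' "$1"
}
```

For example, `rows_with_value data.csv 25 CA`. Append `| wc -l` to count the matching rows instead. Like `cut`, awk's `-F ','` does not handle quoted fields that contain commas.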
3. To copy row #250007 of a huge file into a separate file (q makes sed stop reading right after that row):
sed -n '250007{p;q;}' data.csv > line_250007.csv
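The same idea extends to a range of rows. Here is a sketch with a hypothetical `get_lines` helper:

```shell
# Hypothetical helper: print lines FROM..TO (inclusive) of FILE.
# Usage: get_lines FILE FROM TO
get_lines() {
  # "FROM,TOp" prints the range; "TOq" makes sed quit right after line TO,
  # so the rest of a huge file is never read.
  sed -n "$2,$3p;$3q" "$1"
}
```

For example, to keep the CSV header together with the extracted row: `{ head -n 1 data.csv; get_lines data.csv 250007 250007; } > line_250007.csv`.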
4. To count the tab characters in a line:
tr -cd '\t' < line_250007.csv | wc -c
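`tr` counts every tab in its input. It gives a per-line figure here only because the extracted file holds a single line. To get the count for every line of a larger file, you can use awk. The following is a sketch, and the helper name is hypothetical:

```shell
# Hypothetical helper: print "line_number: tab_count" for every line of FILE.
# gsub() returns how many matches it replaced; replacing each tab with "&"
# (the matched text itself) counts tabs without changing the line.
count_tabs_per_line() {
  awk '{ n = gsub(/\t/, "&"); print NR ": " n }' "$1"
}
```

In a well-formed tab-separated file, every line should report the same count (the number of columns minus one). This makes the helper a quick way to spot malformed rows.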
5. Before processing a file, make sure its copy is complete, i.e. no other process still has the file open:
if lsof "$file" > /dev/null 2>&1; then
  echo "file $file still loading, skipping it"
else
  echo "file $file completed upload, processing it"
fi
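The check above is a single snapshot. Below is a sketch of a polling variant that waits for the file to become free. The helper name and its defaults are assumptions, and `lsof` is assumed to be installed:

```shell
# Hypothetical helper: wait until no process has FILE open.
# Checks up to TRIES times (default 10), DELAY seconds apart (default 5).
# Returns 0 as soon as the file is free, 1 if it is still open after the last check.
# Assumption: lsof is installed. lsof exits non-zero when it finds no open
# instance of the file, but also on errors (including "command not found"),
# so those cases are treated as "free" too.
wait_until_free() {
  local file=$1 tries=${2:-10} delay=${3:-5}
  while :; do
    lsof "$file" > /dev/null 2>&1 || return 0
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || return 1
    sleep "$delay"
  done
}
```

For example, `wait_until_free "$file" 12 10 && echo "processing $file"` waits up to about two minutes. Keep in mind that `lsof` only sees processes on the local machine, and without root it may not see other users' processes. A file that is still being written over a network share will therefore not be detected.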
6. Color script output for better readability:
NORMAL=$(tput sgr0)
GREEN=$(tput setaf 2; tput bold)
RED=$(tput setaf 1)
function red() {
echo -e "$RED$*$NORMAL"
}
function green() {
echo -e "$GREEN$*$NORMAL"
}
red "This is an error message"
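The escape codes above are emitted even when the output goes to a log file or a pipe. In addition, `tput` prints errors when `TERM` is not set, for example under cron. The sketch below turns colors off in both cases:

```shell
# Sketch: enable colors only when stdout is a terminal, so output that is
# redirected to a log file or a pipe stays free of escape codes.
# Errors from tput (e.g. TERM unset under cron) are discarded.
if [ -t 1 ]; then
  NORMAL=$(tput sgr0 2>/dev/null)
  GREEN=$(tput setaf 2 2>/dev/null; tput bold 2>/dev/null)
  RED=$(tput setaf 1 2>/dev/null)
else
  NORMAL='' GREEN='' RED=''
fi

# printf prints the message verbatim; echo -e would also expand
# backslash sequences that happen to appear in the message.
red()   { printf '%s\n' "$RED$*$NORMAL"; }
green() { printf '%s\n' "$GREEN$*$NORMAL"; }
```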
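7. Loop over all files in a directory whose names match a pattern. A shell glob avoids parsing ls output and copes with spaces in names; FILE holds the full path:
for FILE in "$DIR"/report_*csv
do
  [ -e "$FILE" ] || continue  # no match: the pattern stays unexpanded
  ...
done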
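When the names must match a real regular expression rather than a shell pattern, `find` can do the matching without parsing `ls` output. The following is a sketch. `list_matching` is a hypothetical helper, and `-maxdepth`/`-regex` are GNU and BSD extensions rather than POSIX:

```shell
# Sketch: list regular files directly inside DIR whose names match the
# regular expression REGEX, sorted by name.
# Usage: list_matching DIR REGEX
# Assumption: find supports -maxdepth and -regex (GNU and BSD find do).
# -regex is matched against the whole path, hence the leading ".*/".
list_matching() {
  find "$1" -maxdepth 1 -type f -regex ".*/$2" | sort
}
```

For example: `list_matching "$DIR" 'report_.*csv' | while IFS= read -r FILE; do ...; done`. In bash the `while` loop runs in a subshell, so variables set inside it are lost afterwards. GNU find reads the pattern as an Emacs-style regex by default, while BSD find reads it as a basic regex, so keep patterns simple.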
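8. Check the result of the previous command:
mkdir -p "$WRK_DIR"
if [ $? -ne 0 ]; then
  # do something, e.g. report the error and stop
  echo "cannot create $WRK_DIR" >&2
  exit 1
fi
The same check can be written directly as: if ! mkdir -p "$WRK_DIR"; then ... fi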
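`$?` holds the status of the most recent command only, so it must be saved before anything else runs. The sketch below shows a wrapper that does this; the name `run_checked` is hypothetical:

```shell
# Hypothetical helper: run a command and, if it fails, report its exit
# status on stderr; the status is passed back to the caller unchanged.
run_checked() {
  "$@"
  # Save $? immediately: the next command, even a [ ... ] test, overwrites it.
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "command failed with status $rc: $*" >&2
  fi
  return "$rc"
}
```

For example, `run_checked mkdir -p "$WRK_DIR" || exit 1`.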
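9. Get the time 12 hours from now in a specific format (GNU date):
date -d '+12 hours' '+%Y%m%d%H'
Inside a crontab each % must be escaped as \%, because cron turns an unescaped % into a newline. That is why the form +\%Y\%m\%d\%H is often seen.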
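`date -d` is a GNU extension; BSD and macOS `date` use `-v` for the same adjustment. The sketch below shows a helper that tries one and then the other. The name `hours_from_now` is hypothetical:

```shell
# Hypothetical helper: print the time HOURS from now (negative for the past)
# in FORMAT, e.g. hours_from_now 12 '%Y%m%d%H'. Pass HOURS as a plain integer.
# Tries GNU date's -d first and falls back to BSD/macOS date's -v.
hours_from_now() {
  case $1 in
    -*) adj=$1 ;;   # already signed, e.g. -12
    *)  adj=+$1 ;;  # BSD -v needs an explicit sign to adjust rather than set
  esac
  date -d "$adj hours" "+$2" 2>/dev/null || date -v "${adj}H" "+$2"
}
```

For example, `hours_from_now 12 '%Y%m%d%H'`, or `hours_from_now -12 '%Y%m%d%H'` for twelve hours ago.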