" /> Taming the Beast, How I Handled a 1GB CSV File with a Simple Bash Script | Dinusha Tharindu
Post

Taming the Beast, How I Handled a 1GB CSV File with a Simple Bash Script

Taming the Beast, How I Handled a 1GB CSV File with a Simple Bash Script

👇 The Problem

Not long ago, I was deep into an access review project for a client. The task was straightforward on paper: pull a list of who had access to what across a large repository of internal documents.

After aggregating logs and generating the report, I exported the data into a CSV file. But there was a catch.

The file was almost 1GB in size.

I quickly realized I had a problem when my usual tools — Excel, VS Code, Notepad++ — all failed to open the file. Excel froze, editors crashed, and I couldn’t even scroll through a few lines to verify what I had. I was effectively locked out of my own data.


🔍 The Discovery

I knew there had to be a better way, so I turned to something I hadn’t used in a while: the terminal.

After a bit of research and experimentation, I found a reliable method using classic Unix tools like split and mv to break the massive CSV file into smaller, more manageable pieces. Then I packaged the whole process into a simple Bash script for future use.


🛠️ The Solution — A Bash Script to Split Large CSVs

Here’s the script that saved the day — and continues to serve me well:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
bash
#!/bin/bash

# ----------------------------
# Split a large CSV file into smaller chunks
# Usage: ./split_csv.sh filename.csv [size|lines] [value]
# Example: ./split_csv.sh bigfile.csv size 100m
#          ./split_csv.sh bigfile.csv lines 415000
# ----------------------------

set -e

if [ "$#" -ne 3 ]; then
    echo "Usage: $0 filename.csv [size|lines] [value]"
    echo "Example: $0 bigfile.csv size 100m"
    echo "         $0 bigfile.csv lines 415000"
    exit 1
fi

FILENAME="$1"
MODE="$2"
VALUE="$3"

if [ ! -f "$FILENAME" ]; then
    echo "Error: File '$FILENAME' not found."
    exit 1
fi

echo "Cleaning old split files..."
rm -f x* *.csv.part

if [ "$MODE" = "size" ]; then
    echo "Splitting '$FILENAME' into chunks of size $VALUE..."
    split -b "$VALUE" "$FILENAME"
elif [ "$MODE" = "lines" ]; then
    echo "Splitting '$FILENAME' into chunks of $VALUE lines each..."
    split -l "$VALUE" "$FILENAME"
else
    echo "Error: Unknown mode '$MODE'. Use 'size' or 'lines'."
    exit 1
fi

echo "Renaming split files..."
for f in x*; do
    mv "$f" "$f.csv.part"
done

echo "Done! Split files saved as *.csv.part"


This post is licensed under CC BY 4.0 by the author.