diegoporto10/data-cleaning-python

Python Data Cleaning Script

Reusable Pandas pipeline for fast data cleaning:

  • Standardizes column names to snake_case
  • Trims whitespace from strings and converts "nan" and empty strings ("") to missing values
  • Drops duplicate rows
  • Coerces numeric columns to nullable types: Int64 only if all values are whole numbers, otherwise Float64
  • Adds age_group and tenure_group helper bands
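
The steps above can be sketched in a few small functions. This is an illustrative sketch, not the contents of src\clean.py: the function names (to_snake_case, clean_strings, coerce_numeric, add_band, clean), the band edges and labels, the source column names age and tenure_months, and the choice to coerce unparseable values to missing are all assumptions made for the example.

```python
import re

import numpy as np
import pandas as pd


def to_snake_case(name):
    """Convert a label such as 'Customer ID' or 'TenureMonths' to snake_case."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(name).strip())
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


def _clean_cell(value):
    # Trim strings; treat "" and the literal text "nan" as missing.
    if isinstance(value, str):
        value = value.strip()
        if value in ("", "nan"):
            return np.nan
    return value


def clean_strings(df):
    """Apply _clean_cell to every non-numeric column."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].map(_clean_cell)
    return out


def coerce_numeric(series):
    """Nullable Int64 if all non-missing values are whole, else Float64."""
    # Assumption: values that cannot be parsed become missing (errors="coerce").
    numeric = pd.to_numeric(series, errors="coerce")
    present = numeric.dropna()
    if len(present) > 0 and (present % 1 == 0).all():
        return numeric.astype("Int64")
    return numeric.astype("Float64")


def add_band(df, source, target, bins, labels):
    """Add a categorical band column; intervals are closed on the left."""
    out = df.copy()
    values = out[source].astype("float64")  # missing values become NaN
    out[target] = pd.cut(values, bins=bins, labels=labels, right=False)
    return out


def clean(df, numeric_cols=()):
    out = df.rename(columns=to_snake_case)
    out = clean_strings(out)
    out = out.drop_duplicates().reset_index(drop=True)
    for col in numeric_cols:
        out[col] = coerce_numeric(out[col])
    # Hypothetical band edges; the real script may use different ones.
    if "age" in out.columns:
        out = add_band(out, "age", "age_group",
                       [0, 18, 35, 50, 65, np.inf],
                       ["<18", "18-34", "35-49", "50-64", "65+"])
    if "tenure_months" in out.columns:
        out = add_band(out, "tenure_months", "tenure_group",
                       [0, 6, 12, 24, np.inf],
                       ["<6", "6-11", "12-23", "24+"])
    return out


# Made-up sample data for illustration only.
raw = pd.DataFrame({
    "Customer ID": [" a1 ", "b2", "b2", "c3"],
    " Age ": ["34", "51", "51", "nan"],
    "TenureMonths": ["3.5", "12", "12", ""],
})
cleaned = clean(raw, numeric_cols=["age", "tenure_months"])
print(cleaned)
print(cleaned.dtypes)
```

Strings are trimmed before duplicates are dropped, so rows that differ only by stray whitespace collapse into one. The nullable Int64 and Float64 dtypes keep missing values as missing rather than forcing a whole-number column to plain float.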

Quickstart

Windows PowerShell

py -m venv .venv
.\.venv\Scripts\Activate.ps1   # if blocked: Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
python -m pip install -r requirements.txt

python src\clean.py --input data\raw\sample.csv --output data\processed\clean.csv --int-cols age
start data\processed\clean.csv
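
For orientation, the flags used in the command above could be parsed with argparse roughly as follows. This is a sketch under assumptions: only the flag names --input, --output and --int-cols come from the command; the build_parser helper, the help texts, and the choice to accept space-separated column names for --int-cols are hypothetical and may differ from the actual script.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the Quickstart."""
    parser = argparse.ArgumentParser(description="Clean a CSV file with pandas.")
    parser.add_argument("--input", required=True, help="path to the raw CSV")
    parser.add_argument("--output", required=True, help="path for the cleaned CSV")
    # Assumption: column names are passed space-separated, e.g. --int-cols age tenure
    parser.add_argument("--int-cols", nargs="*", default=[],
                        help="columns to coerce to numeric types")
    return parser


# Parse the same arguments as the Quickstart command.
args = build_parser().parse_args([
    "--input", r"data\raw\sample.csv",
    "--output", r"data\processed\clean.csv",
    "--int-cols", "age",
])
print(args.input, args.output, args.int_cols)
```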
