
Data Manipulation

Data manipulation involves cleaning, transforming, and reshaping data to prepare it for analysis and make it suitable for specific tasks.

Overview

Real-world data is often messy, with missing values, duplicate records, and errors, so it usually has to be cleaned and restructured before it can be analyzed.

Common tasks include filtering, sorting, aggregating, merging, and reshaping data, typically with tools such as pandas, dplyr, or SQL.
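As a minimal illustration of filtering and sorting, the sketch below uses plain Python lists of dictionaries so it runs without extra dependencies. The `orders` records and their fields are hypothetical. In pandas the same step would be roughly `df[df["amount"] >= 50].sort_values("amount", ascending=False)`, and in SQL a `WHERE` clause followed by `ORDER BY`.

```python
# Hypothetical sample records; in pandas each dict would be a DataFrame row.
orders = [
    {"id": 1, "customer": "ana", "amount": 120.0},
    {"id": 2, "customer": "ben", "amount": 35.5},
    {"id": 3, "customer": "ana", "amount": 80.0},
    {"id": 4, "customer": "cy", "amount": 210.0},
]

# Filter: keep only orders with an amount of at least 50.
large = [row for row in orders if row["amount"] >= 50]

# Sort: largest amount first.
large_sorted = sorted(large, key=lambda row: row["amount"], reverse=True)

print([row["id"] for row in large_sorted])  # [4, 1, 3]
```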

Key Technologies

Python Libraries

pandas
Polars
Dask

R Packages

dplyr
tidyr
data.table
tidyverse

Tools

Excel
OpenRefine
Data Wrangling Tools

Key Concepts

Data Cleaning

Clean data by handling missing values, removing duplicates, and correcting errors.
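The three cleaning steps can be sketched in plain Python as follows. The records, the "unknown" placeholder, and the choice to treat an unparseable age as missing are all illustrative assumptions, not a prescribed policy. In pandas the rough equivalents are `Series.str.strip()`, `fillna()`, `pd.to_numeric(..., errors="coerce")`, and `drop_duplicates()`.

```python
# Hypothetical raw records with formatting errors, a missing value,
# a duplicate, and an unparseable number.
raw = [
    {"name": " Ana ", "city": "Lisbon", "age": "34"},
    {"name": "ben", "city": None, "age": "29"},
    {"name": "Ana", "city": "Lisbon", "age": "34"},  # duplicate of row 0 once normalized
    {"name": "Cy", "city": "Porto", "age": "n/a"},   # age cannot be parsed
]


def normalize(row):
    # Correct formatting errors: trim whitespace and standardize capitalization.
    name = row["name"].strip().title()
    # Handle missing values: fill a missing city with a placeholder.
    city = row["city"] if row["city"] is not None else "unknown"
    # Convert age to int; treat unparseable values as missing (None).
    try:
        age = int(row["age"])
    except ValueError:
        age = None
    return {"name": name, "city": city, "age": age}


cleaned = []
seen = set()
for row in map(normalize, raw):
    key = (row["name"], row["city"], row["age"])
    if key not in seen:  # remove exact duplicates, keeping the first occurrence
        seen.add(key)
        cleaned.append(row)

print(len(cleaned))  # 3
```

Normalizing before deduplicating matters here: " Ana " and "Ana" only compare equal after the whitespace is trimmed.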

Data Transformation

Transform data by reshaping, aggregating, and creating derived variables.
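A small sketch of all three transformations on hypothetical long-format sales data, again in plain Python. In pandas these would map roughly to column arithmetic (`df["revenue"] = df["units"] * df["price"]`), `groupby("region")["revenue"].sum()`, and `pivot(index="region", columns="month", values="units")`; in R, to dplyr's `mutate()` and `group_by()` with `summarise()`, and tidyr's `pivot_wider()`.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (region, month).
sales = [
    {"region": "north", "month": "jan", "units": 10, "price": 2.5},
    {"region": "north", "month": "feb", "units": 4, "price": 2.5},
    {"region": "south", "month": "jan", "units": 7, "price": 3.0},
    {"region": "south", "month": "feb", "units": 9, "price": 3.0},
]

# Derived variable: revenue = units * price.
for row in sales:
    row["revenue"] = row["units"] * row["price"]

# Aggregate: total revenue per region.
revenue_by_region = defaultdict(float)
for row in sales:
    revenue_by_region[row["region"]] += row["revenue"]

# Reshape long -> wide: one entry per region, one key per month.
wide = defaultdict(dict)
for row in sales:
    wide[row["region"]][row["month"]] = row["units"]

print(dict(revenue_by_region))  # {'north': 35.0, 'south': 48.0}
print(dict(wide))  # {'north': {'jan': 10, 'feb': 4}, 'south': {'jan': 7, 'feb': 9}}
```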

Data Merging

Combine data from multiple sources using joins, merges, and concatenation.
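The sketch below shows an inner join, a left join, and concatenation on two hypothetical tables. It assumes `customer_id` is unique in `customers` (a one-to-many join), which is what makes the dictionary lookup valid. In pandas this corresponds roughly to `orders.merge(customers, on="customer_id", how="inner")` (or `how="left"`) and `pd.concat([...])`; in SQL, to `INNER JOIN`, `LEFT JOIN`, and `UNION ALL`.

```python
# Hypothetical tables.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 50},
    {"order_id": 11, "customer_id": 1, "amount": 20},
    {"order_id": 12, "customer_id": 3, "amount": 75},  # no matching customer
]

# Index the customer table by its (assumed unique) key.
by_id = {c["customer_id"]: c for c in customers}

# Inner join: keep only orders whose customer_id exists in customers.
inner = [
    {**order, "name": by_id[order["customer_id"]]["name"]}
    for order in orders
    if order["customer_id"] in by_id
]

# Left join: keep every order, with name=None where there is no match.
left = [
    {**order, "name": by_id.get(order["customer_id"], {}).get("name")}
    for order in orders
]

# Concatenation: stack rows from two tables with the same columns.
more_orders = [{"order_id": 13, "customer_id": 2, "amount": 5}]
all_orders = orders + more_orders

print(len(inner), len(left), len(all_orders))  # 2 3 4
```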

Efficient Processing

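Process large datasets efficiently by using optimized libraries and vectorized operations, which apply a computation to a whole column or array at once instead of looping over rows in interpreted code.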
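To make the idea of vectorization concrete, here is a minimal sketch assuming NumPy is installed. It compares an element-by-element Python loop with a single array expression, where the looping happens inside NumPy's compiled code. The arrays are made-up values. Column arithmetic in pandas and expressions in Polars follow the same whole-column idea.

```python
import numpy as np

# Hypothetical columns of data.
prices = np.array([2.5, 3.0, 4.0, 1.5])
quantities = np.array([10, 4, 7, 2])

# Loop version: Python multiplies one pair of elements at a time.
totals_loop = [p * q for p, q in zip(prices, quantities)]

# Vectorized version: one expression over whole arrays.
totals_vec = prices * quantities

# Both approaches give the same values; the vectorized one scales far better.
assert np.allclose(totals_loop, totals_vec)

# Vectorized filtering with a boolean mask.
big = totals_vec[totals_vec > 20]

print(totals_vec.sum())  # 68.0
```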
