Efficient CSV File Handling in Python with pandas

The pandas library provides powerful tools for processing, analyzing, and manipulating data in CSV format. Let's explore basic operations and some advanced techniques.

Library Installation

Install pandas using pip:

pip install pandas

Basic CSV Operations

Reading and Writing CSV Files

import pandas as pd

# Reading a CSV file
df = pd.read_csv('data.csv')

# Displaying the first 5 rows
print(df.head())

# Writing data back to a CSV file
df.to_csv('output.csv', index=False)

This script reads a CSV file into a DataFrame, prints the first five rows, and saves the data to a new file. The index=False argument stops pandas from writing the DataFrame's row index as an extra column.
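
To try the round trip without a data.csv on disk, the sketch below uses an in-memory buffer (io.StringIO) in place of files. The column names and values are made up for illustration. It also shows why index=False matters. Without it, the row index is written as an unnamed extra column, and that column comes back on the next read.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = "name,age\nAlice,34\nBob,28\nCara,41\n"
df = pd.read_csv(io.StringIO(csv_text))

# index=False: only the original columns are written
clean = io.StringIO()
df.to_csv(clean, index=False)
clean.seek(0)  # rewind the buffer before reading it back
print(pd.read_csv(clean).columns.tolist())  # ['name', 'age']

# Default index=True: the row index is written as an extra, unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())  # ['Unnamed: 0', 'name', 'age']
```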

Data Filtering

# Filtering rows where the value in the 'age' column is greater than 30
filtered_df = df[df['age'] > 30]
print(filtered_df)
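
You can combine conditions with & (and), | (or) and ~ (not). Each comparison needs its own parentheses because these operators bind more tightly than > or ==. The sketch below uses made-up name, age and city columns and also shows the same filter written with DataFrame.query.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara", "Dan"],
    "age": [34, 28, 41, 25],
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
})

# Rows where age > 30 AND city is Oslo
older_in_oslo = df[(df["age"] > 30) & (df["city"] == "Oslo")]
print(older_in_oslo["name"].tolist())  # ['Alice']

# The same filter expressed as a query string
print(df.query("age > 30 and city == 'Oslo'")["name"].tolist())  # ['Alice']

# Rows where age > 30 OR city is Lima
either = df[(df["age"] > 30) | (df["city"] == "Lima")]
print(either["name"].tolist())  # ['Alice', 'Cara', 'Dan']
```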

Merging Multiple CSV Files

import glob

import pandas as pd

# Getting a list of all CSV files in the current directory
csv_files = glob.glob('*.csv')

# Reading all CSV files and combining them into one DataFrame
df_list = [pd.read_csv(file) for file in csv_files]
combined_df = pd.concat(df_list, ignore_index=True)
print(combined_df)
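
Here is a self-contained sketch of the same pattern. It writes two small CSV files into a temporary directory, collects them with glob and concatenates them. The file names and columns are hypothetical. Sorting the file list makes the row order predictable, because glob returns files in arbitrary order. Using a folder-specific pattern keeps unrelated CSV files, such as an earlier output.csv, out of the result. The source column, also an illustrative addition, records which file each row came from. pd.concat raises a ValueError when given an empty list, so the sketch checks for that case first.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Create two hypothetical input files
    pd.DataFrame({"id": [1, 2], "value": [10, 20]}).to_csv(
        os.path.join(folder, "part1.csv"), index=False)
    pd.DataFrame({"id": [3], "value": [30]}).to_csv(
        os.path.join(folder, "part2.csv"), index=False)

    # Sorted list of CSV files in this folder only
    csv_files = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not csv_files:
        raise FileNotFoundError(f"No CSV files found in {folder}")

    # Read each file and tag its rows with the file name
    df_list = [
        pd.read_csv(path).assign(source=os.path.basename(path))
        for path in csv_files
    ]
    combined_df = pd.concat(df_list, ignore_index=True)

# combined_df lives in memory, so it is still usable after the folder is deleted
print(combined_df)
```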

Handling Missing Values

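# Filling missing values with the mean of the column.
# The result is assigned back to the column: calling fillna(..., inplace=True)
# on df['column_name'] is chained assignment, which recent pandas versions warn
# about and which does not update df under Copy-on-Write.
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

# Removing rows with missing values
df.dropna(inplace=True)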
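
The small worked example below uses made-up data to show what each step does. The mean is computed from the non-missing values only, so for [1.0, NaN, 3.0] the gap is filled with 2.0. With no arguments, dropna() drops a row if any column is missing, so the order of the two steps matters. Filling first keeps rows that would otherwise be discarded. The subset parameter limits dropna to specific columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns
df = pd.DataFrame({
    "score": [1.0, np.nan, 3.0],
    "label": ["a", "b", None],
})

# Fill gaps in 'score' with the column mean (NaN is skipped, so the mean is 2.0)
df["score"] = df["score"].fillna(df["score"].mean())
print(df["score"].tolist())  # [1.0, 2.0, 3.0]

# Drop rows that still have a missing value in any column
cleaned = df.dropna()
print(len(cleaned))  # 2

# Drop only rows where 'score' is missing (none are left after filling)
print(len(df.dropna(subset=["score"])))  # 3
```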

Data Grouping and Aggregation

# Grouping by 'category' column and calculating the mean of 'value'
grouped = df.groupby('category')['value'].mean()
print(grouped)
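
The sketch below uses made-up category and value columns to show what .mean() returns. It also shows how agg computes several statistics per group in one call. The result of groupby(...).mean() is a Series indexed by category. reset_index() turns it back into an ordinary DataFrame, which is convenient before writing it out with to_csv.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "category": ["A", "B", "A", "B", "A"],
    "value": [10, 4, 20, 6, 30],
})

# Mean value per category: A -> (10 + 20 + 30) / 3, B -> (4 + 6) / 2
grouped = df.groupby("category")["value"].mean()
print(grouped.to_dict())  # {'A': 20.0, 'B': 5.0}

# Several statistics at once, one column per statistic
stats = df.groupby("category")["value"].agg(["mean", "sum", "count"])
print(stats)

# Back to a flat DataFrame with 'category' as a regular column
flat = grouped.reset_index()
print(flat.columns.tolist())  # ['category', 'value']
```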

Applying Functions to Columns

# Applying a custom function to a column
df['new_column'] = df['old_column'].apply(lambda x: x * 2)
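
apply calls the Python function once per value. That makes it flexible, but slow on large columns. When the operation is plain arithmetic, as in the doubling above, the vectorized expression gives the same result and is usually much faster. apply remains the natural choice for logic that has no vectorized form. The column names and the size_label helper below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"old_column": [1, 5, 12]})

# Per-element function call
df["doubled_apply"] = df["old_column"].apply(lambda x: x * 2)

# Vectorized equivalent: one operation on the whole column
df["doubled_vec"] = df["old_column"] * 2
print((df["doubled_apply"] == df["doubled_vec"]).all())  # True

# A custom function with branching logic (hypothetical helper)
def size_label(x):
    return "large" if x > 10 else "small"

df["size"] = df["old_column"].apply(size_label)
print(df["size"].tolist())  # ['small', 'small', 'large']
```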

Conclusion

pandas greatly simplifies working with CSV files in Python and offers a wide range of functions for data processing and analysis. It covers everything from reading and writing files to filtering, grouping and transforming data, which makes it an indispensable tool for tabular data.

Experiment with other pandas functions to make your data workflows more efficient!
