Introduction to Data Analysis with Pandas
Posted on June 1, 2024 (Last modified on June 8, 2024)
Get started with data analysis in Python using the Pandas library, including data manipulation, aggregation, and visualization techniques.
Pandas is a widely used Python library for working with tabular data. This guide walks through loading a CSV file, selecting and filtering rows and columns, adding columns, grouping and aggregating, building pivot tables, and plotting.
First, install the Pandas library. The plotting examples later in this guide also require Matplotlib.
pip install pandas matplotlib
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
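After loading a file, it helps to check its shape and inferred column types before doing anything else. The sketch below is a minimal, self-contained variant of the loading step: it reads from an in-memory string instead of data.csv, and the column names (name, score, city) and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for "data.csv".
csv_text = """name,score,city
Ana,72,Lisbon
Ben,45,Oslo
Cho,88,Lisbon
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # type pandas inferred for each column
print(df.head())  # first rows (at most five)
```

If a column that should be numeric shows up with the object dtype, the file probably contains stray text in that column.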
# Select a single column
print(df["column_name"])
# Select multiple columns
print(df[["column1", "column2"]])
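The two selection forms above return different types, which matters when the result is passed on to other code. A small sketch, using a made-up DataFrame with the same placeholder column names:

```python
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({"column1": [10, 60, 90], "column2": [5, 150, 80]})

single = df["column1"]      # single brackets return a Series
as_frame = df[["column1"]]  # a list of names returns a DataFrame, even for one column

print(type(single).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```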
# Filter rows based on a condition
filtered_df = df[df["column_name"] > 50]
print(filtered_df)
Conditions can be combined with the element-wise logical operators & (and), | (or), and ~ (not). Wrap each condition in parentheses.
filtered_df = df[(df["column1"] > 50) & (df["column2"] < 100)]
print(filtered_df)
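The following sketch shows the same filtering pattern on a small made-up DataFrame so the effect of each operator is visible. The data values are assumptions chosen for illustration; the .loc form at the end is one common way to filter rows and pick columns in a single step.

```python
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({
    "column1": [10, 60, 90, 55],
    "column2": [5, 150, 80, 20],
})

# Each comparison yields a boolean Series. The parentheses are required
# because & and | bind more tightly than > and <.
both = df[(df["column1"] > 50) & (df["column2"] < 100)]    # both conditions hold
either = df[(df["column1"] > 50) | (df["column2"] < 100)]  # at least one holds
negated = df[~(df["column1"] > 50)]                        # condition does not hold

# .loc filters rows and selects columns at the same time.
subset = df.loc[df["column1"] > 50, ["column2"]]

print(both)
print(either)
print(negated)
print(subset)
```

Python's own and, or, and not keywords do not work element-wise on a Series and raise a ValueError when used in their place.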
# Add a new column based on existing columns
df["new_column"] = df["column1"] + df["column2"]
print(df.head())
A new column can also be computed from a single existing column, for example by scaling it.
df["double_column"] = df["column1"] * 2
print(df.head())
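The assignments above modify df in place. When the original DataFrame should stay untouched, DataFrame.assign returns a new DataFrame with the extra columns instead. A minimal sketch with made-up data; the column names total and share are hypothetical:

```python
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [10, 20, 30]})

result = df.assign(
    # A column computed from two existing columns.
    total=df["column1"] + df["column2"],
    # A callable receives the DataFrame built so far, so it can
    # use the "total" column created on the line above.
    share=lambda d: d["column1"] / d["total"],
)

print(result)
print(list(df.columns))  # the original DataFrame is unchanged
```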
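# Group by a column and calculate the mean of the numeric columns
# (numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x)
grouped_df = df.groupby("column_name").mean(numeric_only=True)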
print(grouped_df)
A single groupby can also apply a different aggregation function to each column.
grouped_df = df.groupby("column_name").agg({"column1": "mean", "column2": "sum"})
print(grouped_df)
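As an alternative to the dictionary form, pandas also supports named aggregation, where each keyword names an output column and maps it to a (source column, function) pair. The sketch below uses made-up data, and the output names (column1_mean, column2_sum, rows) are hypothetical.

```python
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({
    "column_name": ["a", "a", "b"],
    "column1": [10, 20, 30],
    "column2": [1, 2, 3],
})

summary = df.groupby("column_name").agg(
    column1_mean=("column1", "mean"),  # average of column1 per group
    column2_sum=("column2", "sum"),    # total of column2 per group
    rows=("column1", "count"),         # non-null values of column1 per group
)

print(summary)
```

Named aggregation makes it possible to compute several statistics from the same source column without ending up with multi-level column labels.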
# Create a pivot table
pivot_table = df.pivot_table(values="value_column", index="index_column", columns="columns_column", aggfunc="mean")
print(pivot_table)
Passing a list to index produces a pivot table with a multi-level (hierarchical) row index.
pivot_table = df.pivot_table(values="value_column", index=["index1", "index2"], columns="columns_column", aggfunc="mean")
print(pivot_table)
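To make the reshaping concrete, here is a small sketch on made-up data that reuses the placeholder column names from above. It also passes fill_value=0, an optional argument that replaces combinations with no data (which would otherwise appear as NaN).

```python
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({
    "index_column": ["x", "x", "y", "y"],
    "columns_column": ["p", "q", "p", "p"],
    "value_column": [1.0, 2.0, 3.0, 5.0],
})

pivot = df.pivot_table(
    values="value_column",
    index="index_column",      # one row per distinct value
    columns="columns_column",  # one column per distinct value
    aggfunc="mean",            # how to combine duplicate (row, column) pairs
    fill_value=0,              # used where a (row, column) pair has no data
)

print(pivot)
```

Here the two ("y", "p") rows are averaged into one cell, and the ("y", "q") cell, which has no source rows, is filled with 0.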
import matplotlib.pyplot as plt
# Plot a line chart
df.plot(x="column1", y="column2")
plt.show()
# Plot a bar chart
df["column_name"].value_counts().plot(kind="bar")
plt.show()
Plots can be customized through the Matplotlib Axes object that df.plot returns, for example to set a title and axis labels.
ax = df.plot(x="column1", y="column2", kind="scatter")
ax.set_title("Scatter Plot")
ax.set_xlabel("Column 1")
ax.set_ylabel("Column 2")
plt.show()
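In scripts or on servers without a display, plt.show() has nothing to open, so a plot is usually written out as an image instead. The following is a minimal sketch of that approach under a few assumptions: the data is made up, the non-interactive Agg backend is selected explicitly, and the image is written to an in-memory buffer (a filename such as "scatter.png" can be passed to savefig instead).

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data; the values are arbitrary.
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [2, 4, 8]})

# Create the figure explicitly and ask pandas to draw on its Axes.
fig, ax = plt.subplots(figsize=(6, 4))
df.plot(x="column1", y="column2", kind="scatter", ax=ax)
ax.set_title("Scatter Plot")
ax.set_xlabel("Column 1")
ax.set_ylabel("Column 2")

# Render the figure as PNG bytes, then release the figure's memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
plt.close(fig)

print(len(buffer.getvalue()), "bytes written")
```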
Pandas is an essential tool for data analysis in Python. Practice loading, manipulating, aggregating, and visualizing data using Pandas to enhance your data analysis skills.