Introduction to Data Analysis with Pandas
Posted on June 1, 2024 (Last modified on June 8, 2024) • 2 min read • 274 words

Get started with data analysis in Python using the Pandas library, including data manipulation, aggregation, and visualization techniques.
Pandas is a powerful library for data analysis in Python. This guide covers data manipulation, aggregation, and visualization techniques using Pandas.
First, install the Pandas library.
pip install pandas

Next, import Pandas and load a CSV file into a DataFrame.

import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())

# Select a single column
print(df["column_name"])
# Select multiple columns
print(df[["column1", "column2"]])

# Filter rows based on a condition
filtered_df = df[df["column_name"] > 50]
print(filtered_df)

Multiple conditions can be combined with the element-wise operators & (and) and | (or), with each condition wrapped in parentheses.
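As a concrete illustration, the sketch below applies a single condition and then combined ones to a small made-up DataFrame. The values are assumptions chosen for the example, not data from this guide; the column names mirror the placeholders used here.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the guide's placeholders.
df = pd.DataFrame({
    "column1": [10, 60, 75, 90],
    "column2": [20, 150, 80, 95],
})

# Single condition: keep rows where column1 is greater than 50.
high = df[df["column1"] > 50]

# Combined condition: each comparison needs its own parentheses,
# because & binds more tightly than > and <.
both = df[(df["column1"] > 50) & (df["column2"] < 100)]

# | keeps rows that satisfy at least one of the conditions.
either = df[(df["column1"] > 80) | (df["column2"] < 50)]

print(both)
print(either)
```

The filtered frames keep their original row labels; reset_index(drop=True) gives a fresh 0-based index if one is wanted.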
filtered_df = df[(df["column1"] > 50) & (df["column2"] < 100)]
print(filtered_df)

# Add a new column based on existing columns
df["new_column"] = df["column1"] + df["column2"]
print(df.head())

A new column can also be derived from a single existing column, for example by scaling it.
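A minimal sketch of both kinds of derived column on a tiny made-up DataFrame (the values are assumptions for illustration). It also shows assign(), which is one possible alternative when the original frame should stay unchanged.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"column1": [1, 2, 3], "column2": [10, 20, 30]})

# Element-wise sum of two columns, aligned on the row index.
df["new_column"] = df["column1"] + df["column2"]

# Scalar arithmetic is applied to every row.
df["double_column"] = df["column1"] * 2

# assign() returns a new DataFrame and leaves df itself unchanged,
# which can be handy in method chains.
with_ratio = df.assign(ratio=df["column2"] / df["column1"])

print(with_ratio)
```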
df["double_column"] = df["column1"] * 2
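print(df.head())

# Group by a column and calculate the mean of each numeric column
# (numeric_only=True avoids a TypeError on non-numeric columns in pandas 2.x)
grouped_df = df.groupby("column_name").mean(numeric_only=True)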
print(grouped_df)

Grouping can also apply a different aggregation function to each column.
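To make this concrete, here is a minimal sketch on a made-up DataFrame (the values are assumptions for illustration). It shows the dictionary form alongside named aggregation, which lets the output column names be chosen explicitly.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the guide's placeholders.
df = pd.DataFrame({
    "column_name": ["a", "a", "b", "b"],
    "column1": [1.0, 3.0, 5.0, 7.0],
    "column2": [10, 20, 30, 40],
})

# Dictionary form: one aggregation per column.
per_column = df.groupby("column_name").agg({"column1": "mean", "column2": "sum"})

# Named aggregation: output_name=(input_column, function).
named = df.groupby("column_name").agg(
    column1_mean=("column1", "mean"),
    column2_sum=("column2", "sum"),
    column2_max=("column2", "max"),
)

print(per_column)
print(named)
```

Named aggregation is convenient when the same column needs several statistics, since the dictionary form with a single function per column cannot express that without producing nested column labels.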
grouped_df = df.groupby("column_name").agg({"column1": "mean", "column2": "sum"})
print(grouped_df)

# Create a pivot table
pivot_table = df.pivot_table(values="value_column", index="index_column", columns="columns_column", aggfunc="mean")
print(pivot_table)

Passing a list to index produces a pivot table with a multi-level row index.
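The sketch below runs this on a small made-up DataFrame so the resulting shape is visible. The values are assumptions for illustration; the column names mirror the placeholders used in this guide.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "index1": ["x", "x", "x", "y"],
    "index2": ["p", "p", "q", "p"],
    "columns_column": ["c1", "c2", "c1", "c1"],
    "value_column": [1.0, 2.0, 3.0, 4.0],
})

pivot_table = df.pivot_table(
    values="value_column",
    index=["index1", "index2"],  # two row-index levels
    columns="columns_column",
    aggfunc="mean",
)
print(pivot_table)

# Rows are addressed with a tuple, one entry per index level.
print(pivot_table.loc[("x", "p"), "c1"])
```

Combinations with no data appear as NaN; the fill_value parameter of pivot_table can replace them with a default.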
pivot_table = df.pivot_table(values="value_column", index=["index1", "index2"], columns="columns_column", aggfunc="mean")
print(pivot_table)

Pandas plotting relies on Matplotlib, which is installed separately (pip install matplotlib).

import matplotlib.pyplot as plt
# Plot a line chart
df.plot(x="column1", y="column2")
plt.show()
# Plot a bar chart
df["column_name"].value_counts().plot(kind="bar")
plt.show()

Plots can be customized through the Matplotlib Axes object that df.plot returns.
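As one possible variation, the sketch below creates the figure first, passes its Axes to df.plot, and saves the result to a file instead of opening a window. The sample values, the Agg backend choice, and the output file name scatter.png are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [2, 4, 5, 8]})

# Create the figure and Axes up front to control the size.
fig, ax = plt.subplots(figsize=(6, 4))

# Draw onto the existing Axes instead of letting Pandas create one.
df.plot(x="column1", y="column2", kind="scatter", ax=ax)

ax.set_title("Scatter Plot")
ax.set_xlabel("Column 1")
ax.set_ylabel("Column 2")
ax.grid(True)

fig.tight_layout()
fig.savefig("scatter.png")  # hypothetical output file name
```

Creating the Axes explicitly makes it straightforward to place several Pandas plots side by side with plt.subplots(1, 2) and pass a different ax to each call.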
ax = df.plot(x="column1", y="column2", kind="scatter")
ax.set_title("Scatter Plot")
ax.set_xlabel("Column 1")
ax.set_ylabel("Column 2")
plt.show()

Pandas is an essential tool for data analysis in Python. Practice loading, manipulating, aggregating, and visualizing data using Pandas to enhance your data analysis skills.
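For practice without a data file at hand, a short end-to-end sketch might look like the following. The CSV content and the column names category, price, and quantity are made up for illustration; io.StringIO stands in for data.csv, since read_csv accepts file-like objects as well as paths.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file.
csv_text = """category,price,quantity
fruit,2.0,10
fruit,3.0,5
dairy,4.0,2
dairy,1.5,8
"""

# Load: read_csv parses the header row and infers column types.
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: derive a column, then keep rows above a threshold.
df["revenue"] = df["price"] * df["quantity"]
big = df[df["revenue"] > 10]

# Aggregate: total revenue per category.
summary = big.groupby("category")["revenue"].sum()
print(summary)
```

Calling summary.plot(kind="bar") followed by plt.show() would chart the result in the same way as the bar chart example earlier in this guide.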