11.2 Pandas: Data Analysis
Introduction to Pandas, the essential tool for working with structured data (e.g., tables, CSV) in Python.
Pandas is an open-source library that provides fast, flexible data structures and data analysis tools for structured data, such as tables or worksheets. It simplifies reading, cleaning, transforming, and analyzing data from sources like CSV files or databases.
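As a rough illustration of that read, clean, transform, analyze workflow, here is a minimal sketch. The CSV text, the column names, and the variable names (csv_text, people, cleaned) are made up for this example; in practice you would usually pass a file path to pd.read_csv instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content (assumption for this sketch); Maria's age is missing.
csv_text = """name,age,city
Antonis,25,Athens
Maria,,Thessaloniki
George,22,Patras
"""

# Reading: parse the CSV text into a DataFrame.
people = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop rows where 'age' is missing.
cleaned = people.dropna(subset=["age"])

# Transforming: add a derived column.
cleaned = cleaned.assign(age_next_year=cleaned["age"] + 1)

# Analyzing: compute a summary statistic.
print(cleaned["age"].mean())  # 23.5
```

Each step returns a new DataFrame, so the original people table stays unchanged.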
Core Data Structures in Pandas:
- Series: A one-dimensional, labeled array, similar to a single column in a spreadsheet. Like a Python dictionary, it maps each index label to a value.
- DataFrame: The most important structure in Pandas. It is a two-dimensional, labeled data structure, similar to a table in a SQL database or an Excel spreadsheet. It has both row and column indices. You can think of it as a collection of Series objects, where each Series is a column of the DataFrame.
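The relationship between the two structures can be sketched as follows. The names and values are illustrative only, and the code needs Pandas installed locally.

```python
import pandas as pd

# A Series: one-dimensional values, each paired with an index label.
ages = pd.Series([25, 30, 22], index=["Antonis", "Maria", "George"], name="Age")
print(ages["Maria"])  # label-based lookup, much like a dictionary key

# A DataFrame: a table whose columns all share one row index.
df = pd.DataFrame(
    {"Age": [25, 30, 22], "City": ["Athens", "Thessaloniki", "Patras"]},
    index=["Antonis", "Maria", "George"],
)

# Selecting a single column returns a Series with the same row index.
city_column = df["City"]
print(type(city_column).__name__)
print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels
```

Selecting df["City"] gives back a Series, which is why a DataFrame can be thought of as a collection of Series sharing one index.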
Tip!
The following code requires the Pandas package ('pip install pandas') and will not run fully in the live editor, so the Pandas lines are commented out.
# import pandas as pd # Common import convention
# # Creating a DataFrame
# data = {
#     'Name': ['Antonis', 'Maria', 'George', 'Eleni'],
#     'Age': [25, 30, 22, 35],
#     'City': ['Athens', 'Thessaloniki', 'Patras', 'Athens']
# }
# df = pd.DataFrame(data)
# print(f"My DataFrame:\n{df}")
# # Accessing columns
# print(f"\n'Name' column:\n{df['Name']}")
# # Filtering data
# athens_residents = df[df['City'] == 'Athens']
# print(f"\nAthens residents:\n{athens_residents}")
# # Calculating the mean
# mean_age = df['Age'].mean()
# print(f"\nMean age: {mean_age}")
print("The Pandas code is commented out. To run it, install Pandas locally ('pip install pandas').")
print("Pandas is the essential tool for analyzing and manipulating structured data.")
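The filtering step in the example relies on a boolean mask. A small sketch of how that works, reusing the same sample data, is shown below. The variable names is_athens and older_athenians are introduced only for this illustration, and the code needs Pandas installed locally.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Antonis", "Maria", "George", "Eleni"],
    "Age": [25, 30, 22, 35],
    "City": ["Athens", "Thessaloniki", "Patras", "Athens"],
})

# The comparison produces a boolean Series: one True/False per row.
is_athens = df["City"] == "Athens"
print(is_athens.tolist())  # [True, False, False, True]

# Indexing with the mask keeps only the rows where it is True.
athens_residents = df[is_athens]
print(athens_residents["Name"].tolist())  # ['Antonis', 'Eleni']

# Masks combine with & (and) and | (or); each condition needs parentheses.
older_athenians = df[is_athens & (df["Age"] > 30)]
print(older_athenians["Name"].tolist())  # ['Eleni']

# Column-wise statistics work on a whole column at once.
print(df["Age"].mean())  # 28.0
```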