```
!pip install polars
```

What is Polars?
Polars is a DataFrame library similar to Pandas, but often much faster. It is written in Rust and parallelizes many operations across all available CPU cores by default. It is quickly gaining adoption for tabular data work in Python, especially on larger datasets.
Key advantages over Pandas:

- Often 5–10x faster on large datasets, depending on the workload
- Uses all CPU cores automatically
- Lazy evaluation: optimizes operations before running them
- Cleaner, more consistent expression-based syntax
Creating and Exploring DataFrames
Polars DataFrames look similar to Pandas DataFrames, but most operations are written as expressions that refer to columns with `pl.col("column")` rather than by indexing with `df["column"]`.
```python
import polars as pl

# Build a DataFrame from a dict mapping column names to values
df = pl.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
    "age": [25, 30, 35, 28, 22],
    "salary": [70000, 85000, 92000, 78000, 65000],
    "department": ["Engineering", "Marketing", "Engineering", "HR", "Engineering"],
})

print(df)
print("\nShape:", df.shape)    # (rows, columns)
print("\nSchema:", df.schema)  # column names and their data types
```

Filtering, Adding Columns, and Grouping
```python
# Filter rows
print("--- Engineers earning over $75,000 ---")
print(df.filter((pl.col("department") == "Engineering") & (pl.col("salary") > 75000)))

# Add a new column (with_columns returns a new DataFrame)
df = df.with_columns((pl.col("salary") * 1.10).alias("salary_with_raise"))
print("\n--- With 10% raise ---")
print(df)

# Group and aggregate
# pl.len() counts rows per group (older Polars releases used pl.count())
print("\n--- Average salary by department ---")
print(df.group_by("department").agg([
    pl.col("salary").mean().alias("avg_salary"),
    pl.len().alias("headcount"),
]).sort("avg_salary", descending=True))
```

Lazy Evaluation: Polars’ Best Feature
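Lazy mode records your operations as a query plan instead of running them immediately. When you call `.collect()`, Polars optimizes the whole plan first, for example by applying filters as early as possible and reading only the columns it needs, and then executes it. This can reduce memory use and significantly speed up complex pipelines.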
```python
result = (
    df.lazy()  # switch to lazy mode; nothing runs yet
    .filter(pl.col("salary") > 70000)
    .group_by("department")
    .agg(pl.col("salary").mean().alias("avg_salary"))
    .sort("avg_salary", descending=True)
    .collect()  # execute the plan
)
print(result)
```

Polars vs. Pandas
| | Pandas | Polars |
|---|---|---|
| Speed | Moderate | Much faster |
| Multi-core | No | Yes |
| Lazy execution | No | Yes |
| Syntax | `df["col"]` | `pl.col("col")` |
Summary
In this post we covered creating DataFrames, filtering rows, adding columns, grouping and aggregating, and using lazy mode. If you work with tabular data in Python, Polars is well worth adding to your toolkit: on larger datasets the performance gains are often immediately noticeable.