1. Multi-Index (Hierarchical Indexing)

Definition: A MultiIndex lets a DataFrame or Series use more than one index level, which makes it possible to represent higher-dimensional data in a two-dimensional table.

Example:

    import pandas as pd

    df = pd.DataFrame({
        "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
        "Year": [2022, 2023, 2022, 2023],
        "Sales": [200, 250, 300, 350],
    }).set_index(["City", "Year"])
    print(df)

Output:

                 Sales
    City   Year
    Delhi  2022    200
           2023    250
    Mumbai 2022    300
           2023    350

2. Advanced GroupBy Operations

Definition: GroupBy splits the data into groups, applies an operation to each group (sum, mean, or a custom function), and combines the results.

Example:

    # "City" is an index level here; groupby accepts index level names too
    df.groupby("City").agg({"Sales": ["sum", "mean"]})

Output:

           Sales
             sum   mean
    City
    Delhi    450  225.0
    Mumbai   650  325.0

3. Window Functions

Definition: Window functions perform calculations over a sliding window of rows, such as moving averages.

Example:

    # 2-row moving average over the whole column, in row order
    df["MA_2"] = df["Sales"].rolling(window=2).mean()
    print(df)

Output:

                 Sales   MA_2
    City   Year
    Delhi  2022    200    NaN
           2023    250  225.0
    Mumbai 2022    300  275.0
           2023    350  325.0

Note that the window runs across the city boundary: the value 275.0 averages Delhi 2023 and Mumbai 2022. To restart the window for each city, group by the City level before calling rolling().

4. Advanced Merging and Joining

Definition: Pandas supports SQL-style joins: inner, left, right, outer, and cross.

Example:

    df1 = pd.DataFrame({"ID": [1, 2], "Name": ["A", "B"]})
    df2 = pd.DataFrame({"ID": [1, 3], "City": ["Delhi", "Mumbai"]})
    pd.merge(df1, df2, on="ID", how="outer")

Output:

       ID Name    City
    0   1    A   Delhi
    1   2    B     NaN
    2   3  NaN  Mumbai

5. Pivot Tables and Crosstab

Definition: pivot_table() aggregates values (using the mean by default) across row and column keys; crosstab() counts how often combinations of two columns occur.

Example (Pivot Table):

    pd.pivot_table(df.reset_index(), values="Sales", index="City", columns="Year")

Output:

    Year     2022   2023
    City
    Delhi   200.0  250.0
    Mumbai  300.0  350.0

The values are floats because the default aggregation is the mean.

6. String Operations & Regex

Definition: Pandas offers vectorized string functions with regex support through the .str accessor.

Example:

    s = pd.Series(["User_01", "Admin_22"])
    s.str.extract(r"(\d+)")

Output:

        0
    0  01
    1  22

7. Categorical Data & Memory Optimization

Definition: The category dtype stores each distinct value once and refers to it by integer code, which reduces memory and can speed up operations when a text column has few distinct values relative to its length.

Example:

    # City is an index level in df, so move it back to a column first
    flat = df.reset_index()
    flat["City"] = flat["City"].astype("category")
    print(flat["City"].memory_usage(deep=True))

8. Apply, Map, and ApplyMap

Definition: apply() runs a function along a Series or along the rows or columns of a DataFrame, Series.map() transforms values element by element, and DataFrame.applymap() does the same for every cell (deprecated since pandas 2.1 in favor of DataFrame.map()).

Example:

    df["Sales_Tax"] = df["Sales"].apply(lambda x: x * 0.18)
    print(df)

For simple arithmetic like this, the vectorized form df["Sales"] * 0.18 gives the same result and is faster.

9. Advanced Missing Data Handling

Definition: Fill, interpolate, or replace NaN values using multiple strategies.

Example:

    # Returns a new Series; df itself is not modified
    df["Sales"].interpolate(method="linear")

10. Query and Eval for Performance

Definition: query() and eval() evaluate string expressions. On large DataFrames with numexpr installed they can be faster and use less memory than standard boolean indexing; on small DataFrames the parsing overhead usually outweighs the gain.

Example:

    # Keep only Sales so the helper columns added above are not shown
    df.query("Sales > 250")[["Sales"]]

Output:

                 Sales
    City   Year
    Mumbai 2022    300
           2023    350

11. Advanced Datetime Operations

Definition: Pandas handles timestamps, time differences, resampling, and related operations.

Example:

    date_df = pd.DataFrame({
        "date": pd.date_range("2023-01-01", periods=4, freq="D"),
        "value": [10, 20, 30, 40],
    })
    date_df["month"] = date_df["date"].dt.month

12. Memory-Efficient Chunking

Definition: Load large files in chunks to avoid running out of memory.

Example:

    # With chunksize set, read_csv returns an iterator of DataFrames
    chunks = pd.read_csv("big.csv", chunksize=5000)
    for chunk in chunks:
        print(chunk.shape)

13. Custom Accessors

Definition: Create your own accessor that works like .str or .dt.

Example:

    @pd.api.extensions.register_series_accessor("trim")
    class TrimAccessor:
        def __init__(self, s):
            self.s = s

        def spaces(self):
            # Strip leading and trailing whitespace from each string
            return self.s.str.strip()

    s = pd.Series([" hello ", " world "])
    s.trim.spaces()

14. Pipe for Method Chaining

Definition: pipe() passes a DataFrame through your own functions inside a method chain, which keeps chained operations readable and modular.

Example:

    def add_tax(df):
        # assign() returns a new DataFrame instead of mutating the input
        return df.assign(Tax=df["Sales"] * 0.1)

    result = (df.reset_index()
                .pipe(add_tax)
                .query("Sales > 250"))
    print(result)

Conclusion

These 14 advanced Pandas techniques cover much of what professional data analysis requires. Multi-indexing, window functions, advanced merging, and memory optimization in particular can make data processing noticeably more efficient. Try the examples on real projects to build fluency with them.
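
The chunking example in section 12 only prints each chunk's shape. As a minimal sketch of how chunked reading can feed an aggregation, the snippet below sums Sales per City one chunk at a time and then combines the partial results. The in-memory CSV text, the helper name total_sales_by_city, and the chunk size of 3 are illustrative assumptions standing in for a real large file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file such as "big.csv"
csv_text = (
    "City,Year,Sales\n"
    "Delhi,2022,200\n"
    "Delhi,2023,250\n"
    "Mumbai,2022,300\n"
    "Mumbai,2023,350\n"
)


def total_sales_by_city(source, chunksize):
    """Sum Sales per City without holding the whole file in memory."""
    partials = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Each chunk is an ordinary DataFrame with at most `chunksize` rows
        partials.append(chunk.groupby("City")["Sales"].sum())
    # A city can appear in several chunks, so sum the partial sums again
    return pd.concat(partials).groupby(level=0).sum()


totals = total_sales_by_city(io.StringIO(csv_text), chunksize=3)
print(totals)
```

This pattern works for aggregations that can be combined from partial results, such as sums and counts. A mean would need the per-chunk sum and count carried separately and divided at the end.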
How to Handle Large Datasets and Perform Complex Data Operations Efficiently
