Topic: https://intoli.com/blog/pandas-aggregation/

unverified 7y, 7d ago

This is a very complicated use of Pandas! Aggregation and grouping in pandas are much easier than this article makes them look.

andre 7y, 6d ago

Some of the later use-cases are more complex, but it can't get much simpler than df.groupby('species').agg('mean')! :)

KPUsiTIU 7y, 7d ago [edited]

I'm pretty sure there is an as_index=False parameter that you can add to a groupby operation to keep everything as columns.

andre 7y, 6d ago [edited]

There is indeed! However, when using multiple aggregation functions it still leads to indexed groups.
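
To illustrate the point, here is a minimal sketch using a tiny made-up stand-in for the iris data (the values are assumptions, not the article's dataset). One reading of the reply above is that with several aggregation functions the result still carries hierarchical labels, which is what recent pandas versions produce on the column axis; older versions may behave differently.

```python
import pandas as pd

# Tiny made-up stand-in for the iris data used in the article.
df = pd.DataFrame({
    'species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'sepal width': [3.5, 3.0, 3.3, 2.7],
    'sepal length': [5.1, 4.9, 6.3, 5.8],
})

# One aggregation function: the group key stays an ordinary column.
single = df.groupby('species', as_index=False).agg('mean')
print(single.columns.tolist())

# Several aggregation functions per column: the column labels
# become a MultiIndex even though as_index=False was passed.
multi = df.groupby('species', as_index=False).agg({'sepal width': ['min', 'max']})
print(type(multi.columns).__name__)
```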

2yScKKE1 6y, 200d ago

Hi Andre, great article!

I get the following warning with the code in the Single Grouping Column, Custom Aggregation section.

gdf = df.groupby('species').agg({
    'sepal width' : {
        'width min': 'min',
        'width max': 'max'
    },
    'sepal length' : ['max', 'mean', percentile(20)]
})

This throws:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

I do have a workaround, though it is a bit clumsy.

# Assumes `import pandas as pd` and the percentile(n) helper from the article.
def my_agg(df):
    x = df.groupby('species')
    # Build each output column as a separate per-group Series.
    names = {
        'width max': x['sepal width'].max(),
        'width min': x['sepal width'].min(),
        'max length': x['sepal length'].max(),
        'mean length': x['sepal length'].mean(),
        'length percentile': x['sepal length'].apply(percentile(20)),
    }
    return pd.DataFrame(names)

Is it all right to do it that way?

andre 6y, 200d ago [edited]

That's unfortunate, isn't it?

As far as your code goes, I would wager that doing aggregations manually will give you a performance penalty (which may not matter to you unless your dataset is huge) since Pandas tends to perform these operations using carefully optimized methods.

Another thing you can do is give your grouped DataFrame a dictionary of lists, e.g.,

df.groupby('species').agg({ 
  'sepal width': ['min', 'max'],
  'sepal length': ['min', 'max'],
})

and then rename the resulting DataFrame manually...
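
As a rough sketch of that manual rename, one option is to join each (column, function) pair into a single flat label. The sample data and the space-joined naming scheme are assumptions chosen for illustration, not something the article prescribes.

```python
import pandas as pd

# Tiny made-up stand-in for the iris data used in the article.
df = pd.DataFrame({
    'species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'sepal width': [3.5, 3.0, 3.3, 2.7],
    'sepal length': [5.1, 4.9, 6.3, 5.8],
})

gdf = df.groupby('species').agg({
    'sepal width': ['min', 'max'],
    'sepal length': ['min', 'max'],
})

# Join each (column, function) pair into one flat label,
# e.g. ('sepal width', 'min') -> 'sepal width min'.
gdf.columns = [' '.join(pair) for pair in gdf.columns]

# Move the group key out of the index and back into a column.
gdf = gdf.reset_index()
print(gdf.columns.tolist())
```

pandas 0.25 and later also accept named aggregation, such as agg(width_min=('sepal width', 'min')), which sets the output names directly and avoids the rename step.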
