Hi Andre, great article!
I get the following warning with the code in the "Single Grouping Column, Custom Aggregation"
section.
gdf = df.groupby('species').agg({
    'sepal width': {'width min': 'min', 'width max': 'max'},
    'sepal length': ['max', 'mean', percentile(20)],
})
This produces: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
I do have a workaround, though it is a bit clumsy:
def my_agg(df):
    x = df.groupby('species')
    names = {
        'width max': x['sepal width'].max(),
        'width min': x['sepal width'].min(),
        'max length': x['sepal length'].max(),
        'mean length': x['sepal length'].mean(),
        'length percentile': x['sepal length'].apply(percentile(20)),
    }
    return pd.DataFrame(names)
Is it all right to do it that way?
That's unfortunate, isn't it?
As far as your code goes, I would wager that doing aggregations manually will give you a performance penalty (which may not matter to you unless your dataset is huge) since Pandas tends to perform these operations using carefully optimized methods.
Another thing you can do is give your grouped DataFrame a dictionary of lists, e.g.,
df.groupby('species').agg({
    'sepal width': ['min', 'max'],
    'sepal length': ['min', 'max'],
})
and then rename the columns of the resulting DataFrame manually...
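To make the "rename manually" step concrete, here is a minimal sketch. It assumes a tiny made-up DataFrame standing in for the iris data (the values are illustrative only), and the flat naming scheme "column function" is just one possible choice, not something the article prescribes.

```python
import pandas as pd

# Tiny made-up stand-in for the iris data (values are illustrative only).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "sepal width": [3.5, 3.0, 3.3, 2.7],
    "sepal length": [5.1, 4.9, 6.3, 5.8],
})

# Aggregate with a dictionary of lists; the result has two-level column
# labels such as ("sepal width", "min").
gdf = df.groupby("species").agg({
    "sepal width": ["min", "max"],
    "sepal length": ["min", "max"],
})

# Rename manually: join each (column, function) pair into one flat label.
gdf.columns = [f"{col} {func}" for col, func in gdf.columns]

print(gdf.columns.tolist())
```

After this, a column can be selected with a plain string, e.g. gdf["sepal width min"]. If your pandas version is 0.25 or later, named aggregation may be another way to get custom column names without the deprecated nested dict.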