Aggregation and Extension of Data
Content Summary
This chapter covers more drastic data manipulation and transformation techniques that make a dataset more useful. These techniques include:
Grouping data and applying transformations across those groups.
Changing the granularity to a coarser view of the data, while understanding the information lost by such a transformation.
Adding new observations to an existing dataset, paying special attention to potential differences in the processes that generated the datasets.
Adding new attributes to existing observations, paying special attention to how an imperfect correspondence may bias the original dataset.
Assessing the differences between populations of a dataset using statistical inference (permutation tests).
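As a rough illustration of the last item, a permutation test can be sketched as follows. The two-group table, its column names, and the choice of test statistic (absolute difference in group means) are assumptions made for this example only; they are not taken from the chapter's datasets.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: one numeric measurement observed in two groups.
df = pd.DataFrame({
    "group": ["a"] * 50 + ["b"] * 50,
    "value": np.concatenate([
        rng.normal(0.0, 1.0, 50),
        rng.normal(0.5, 1.0, 50),
    ]),
})


def diff_of_means(table):
    """Absolute difference between the mean of group 'a' and group 'b'."""
    means = table.groupby("group")["value"].mean()
    return abs(means["a"] - means["b"])


observed = diff_of_means(df)

# Under the null hypothesis the group labels are exchangeable, so we
# shuffle them repeatedly and recompute the statistic each time.
n_reps = 500
simulated = np.empty(n_reps)
for i in range(n_reps):
    shuffled = df.assign(group=rng.permutation(df["group"].to_numpy()))
    simulated[i] = diff_of_means(shuffled)

# Proportion of shuffled statistics at least as extreme as the observed one.
p_value = np.mean(simulated >= observed)
```

A small `p_value` suggests that the observed difference would be unusual if both groups were drawn from the same population.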
Datasets
This chapter uses two datasets:
All attempts to climb Mt. Rainier, in Washington State.
The population and average income of California counties and cities.
Summary of Library References
In the lists below, assume that the usual imports have been executed:
import pandas as pd
import numpy as np
import seaborn as sns
Aggregation methods:
| Function or Method Name | Description |
|---|---|
| | Split-Apply-Combine processing on tables |
| | Apply collections of functions to groups |
| | Apply transformations to groups |
| | Apply general functions to groups |
| | Filter out groups based on conditions |
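The operations described in this table can be sketched as below, assuming they correspond to pandas' `groupby` and the `agg`, `transform`, `apply`, and `filter` methods of the resulting group object. The toy table of climbing attempts and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical climbing-attempt records; column names are illustrative only.
climbs = pd.DataFrame({
    "route": ["A", "A", "B", "B", "B", "C"],
    "attempted": [10, 4, 6, 8, 2, 3],
    "succeeded": [5, 2, 6, 4, 0, 1],
})

# Split the table into one group per route.
grouped = climbs.groupby("route")

# Split-apply-combine with a single aggregation: one row per route.
totals = grouped.sum()

# Apply a collection of functions to each group.
summary = grouped["attempted"].agg(["sum", "mean", "max"])

# Transform: the result has one value per original row
# (here, each row's share of its route's total attempts).
share = grouped["attempted"].transform(lambda s: s / s.sum())


# Apply a general function that receives each group as a DataFrame.
def success_rate(g):
    return g["succeeded"].sum() / g["attempted"].sum()


rates = grouped[["attempted", "succeeded"]].apply(success_rate)

# Filter: keep only the rows of routes with at least 10 attempts in total.
busy = grouped.filter(lambda g: g["attempted"].sum() >= 10)
```

Note the difference in output shape: `agg` and `apply` return one entry per group, `transform` returns one entry per original row, and `filter` returns a subset of the original rows.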
Reshaping methods:
| Function or Method Name | Description |
|---|---|
| | Reshape (pivot) the entries of a DataFrame |
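A minimal sketch of reshaping, assuming the table refers to pandas' `DataFrame.pivot` or `DataFrame.pivot_table` (both exist; which one the chapter uses is an assumption here). The long-format data is hypothetical.

```python
import pandas as pd

# Hypothetical long-format table: one row per (route, year) pair.
long_form = pd.DataFrame({
    "route": ["A", "A", "B", "B"],
    "year": [2014, 2015, 2014, 2015],
    "attempted": [10, 12, 6, 8],
})

# pivot: routes become the index, years become the columns.
# It requires each (route, year) pair to appear at most once.
wide = long_form.pivot(index="route", columns="year", values="attempted")

# pivot_table: same reshaping, but duplicate pairs are aggregated
# with the given function instead of raising an error.
wide_mean = long_form.pivot_table(
    index="route", columns="year", values="attempted", aggfunc="mean"
)
```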
Appending and joining methods:
| Function or Method Name | Description |
|---|---|
| | Concatenate a list of DataFrames by rows/columns |
| | Join two DataFrames by common columns |
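The two operations can be sketched as follows, assuming they refer to `pd.concat` and `DataFrame.merge`. The county names and figures are made up; the `indicator=True` option is one way to inspect an imperfect correspondence between the two tables.

```python
import pandas as pd

# Hypothetical tables with the same columns, e.g. from two sources.
counties_a = pd.DataFrame({"county": ["Alpha", "Beta"], "population": [100, 200]})
counties_b = pd.DataFrame({"county": ["Gamma"], "population": [300]})

# Adding new observations: stack the rows of both tables.
combined = pd.concat([counties_a, counties_b], ignore_index=True)

# Hypothetical table of new attributes; it does not cover every county.
income = pd.DataFrame({
    "county": ["Alpha", "Gamma", "Delta"],
    "income": [50, 70, 60],
})

# Adding new attributes: a left join keeps every row of `combined`.
# The `_merge` column records whether each row found a match.
merged = combined.merge(income, on="county", how="left", indicator=True)

# Rows without a match have a missing income value.
unmatched = merged[merged["_merge"] == "left_only"]
```

Inspecting `unmatched` before continuing helps reveal whether the rows that failed to match differ systematically from the rest.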
Datetime:
| Function or Method Name | Description |
|---|---|
| | Convert strings to datetime objects |
| | Datetime-related properties and methods |
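A short sketch of both entries, assuming they refer to `pd.to_datetime` and the `Series.dt` accessor. The date strings are arbitrary examples.

```python
import pandas as pd

# Arbitrary date strings in ISO format.
dates = pd.Series(["2015-11-27", "2015-12-05", "2016-01-03"])

# Convert the strings to datetime values.
parsed = pd.to_datetime(dates)

# The .dt accessor exposes datetime properties and methods element-wise.
years = parsed.dt.year
months = parsed.dt.month
weekdays = parsed.dt.day_name()
```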
Plotting:
| Function or Method Name | Description |
|---|---|
| | Plot a scatter-matrix |
| | Scatter-plot with easy customization |
| | (Strip/box)-plotting by categories |