Querying and Describing Data
Content Summary
This chapter covers techniques for exploring, understanding, and describing the data contained in tables:
- selecting subsets of rows and columns of a table using conditions,
- classifying different kinds of measurements contained in the columns of a table,
- describing/summarizing the measurements of a population using techniques appropriate for the kind of data being described.
Datasets
The primary dataset in this chapter consists of restaurant health inspection data from the San Francisco Health Department.
Summary of Library References
In the lists below, assume that the usual imports have been executed:
import pandas as pd
import numpy as np
import seaborn as sns
Selecting data:
| Function or Method Name | Description |
|---|---|
| | Column Selection |
| | Selects sub-tables by index/boolean array |
| | Selects sub-tables by positional index |
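The three kinds of selection described above can be sketched as follows. This is a minimal illustration, assuming the descriptions correspond to pandas bracket indexing, `.loc`, and `.iloc`. The `inspections` table, its column names, and its values are invented for the example and are not the San Francisco dataset.

```python
import pandas as pd

# Hypothetical toy table: names, scores, and risk labels are invented.
inspections = pd.DataFrame(
    {
        "name": ["Cafe A", "Diner B", "Grill C", "Bistro D"],
        "score": [96, 72, 88, 100],
        "risk": ["low", "high", "moderate", "low"],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Column selection with brackets:
# a single label gives a Series, a list of labels gives a DataFrame.
scores = inspections["score"]
name_and_score = inspections[["name", "score"]]

# Label-based selection with .loc:
# a boolean array picks the rows, a list of labels picks the columns.
high_scoring = inspections.loc[inspections["score"] >= 90, ["name", "score"]]

# Position-based selection with .iloc: first two rows, first two columns.
top_left = inspections.iloc[:2, :2]
```

Note that `.loc` works with index labels (here `"r1"` through `"r4"`) and boolean arrays, while `.iloc` ignores labels entirely and counts positions from zero.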
Computing distributions:
| Function or Method Name | Description |
|---|---|
| | Returns the counts of values of a column |
| | Returns a table with rows sorted by index |
| | Returns bins and counts in each bin |
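A minimal sketch of computing a distribution, assuming the descriptions above refer to `Series.value_counts`, `sort_index`, and `np.histogram`. The scores are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical inspection scores, invented for this example.
scores = pd.Series([96, 72, 88, 100, 88, 96, 96], name="score")

# Count how often each distinct value occurs (most frequent first).
counts = scores.value_counts()

# Re-order the counts by the index, i.e. by score value rather than frequency.
by_value = counts.sort_index()

# Bin the scores; np.histogram returns the counts first, then the bin edges.
# Every bin is half-open [left, right) except the last, which is closed.
bin_counts, bin_edges = np.histogram(scores, bins=[70, 80, 90, 101])
```

Here `by_value` lists the scores 72, 88, 96, 100 with counts 1, 2, 3, 1, and `bin_counts` holds 1, 2, 4 for the bins starting at 70, 80, and 90.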
Plotting:
| Function or Method Name | Description |
|---|---|
| | Plots column(s) in a DataFrame/Series |
| | Plots a rug-plot/kde/histogram of data |
| | Plots box-plot of data |
| | Plots a categorical histogram of data |