The Basics of Tabular Data


Content Summary

The basics of tabular data consist of understanding:

the structure of a table and how it represents a real-world phenomenon,
the basic operations that can be performed on a table and how they reflect the real-world phenomenon it represents,
the computational foundations for tabular data structures in Pandas.

Datasets

The primary dataset in this chapter consists of player statistics for the US Women’s National Soccer Team between 1991 and 2019. The data is taken from Football Reference.

Summary of Library References

In the lists below, assume that the usual imports have been executed:

import pandas as pd
import numpy as np

Creating Tabular Structures:

Function or Method Name	Description
`pd.Series`	Series constructor
`pd.DataFrame`	DataFrame constructor
`pd.read_csv`	Reading CSV from file
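
As a minimal sketch of the three constructors above, the following builds a Series, a DataFrame, and a DataFrame parsed from CSV text. The player names and numbers are made-up placeholders rather than values from the chapter's dataset, and an in-memory `io.StringIO` buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical values for illustration only (not from the chapter's dataset).
goals = pd.Series([3, 0, 5], index=["Player A", "Player B", "Player C"], name="goals")

# A DataFrame built from a dictionary mapping column names to column values.
players = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "goals": [3, 0, 5],
    "minutes": [270.0, 90.0, 180.0],
})

# pd.read_csv accepts a path or any file-like object; a text buffer is used
# here so the example runs without an external file.
csv_text = "player,goals,minutes\nPlayer A,3,270.0\nPlayer B,0,90.0\nPlayer C,5,180.0\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(goals.shape)      # (3,)
print(players.shape)    # (3, 3)
print(from_csv.shape)   # (3, 3)
```

In practice `pd.read_csv` would usually be given a file path or URL in place of the buffer.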

Series/DataFrame attributes and methods:

Function or Method Name	Description
`shape`	Number of rows/columns
`head`	Returns the first few rows
`tail`	Returns the last few rows
`nunique`	Returns the number of unique values
`dtypes`	Returns the type of the column(s)
`astype`	Returns column(s) coerced to a given type
`sort_values`	Sorts a Series/DataFrame according to its values
`drop_duplicates`	Drops duplicate rows of a DataFrame / duplicate values of a Series
`Series.apply` and `DataFrame.apply`	Applies a function to the entries of a Series / the rows or columns of a DataFrame
`Series.agg` and `DataFrame.agg`	Applies one or more aggregation functions to a Series/DataFrame
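
The attributes and methods in this table can be sketched on a small table as follows. The data is a made-up placeholder, not the chapter's dataset, and the variable names are chosen only for this illustration; one row is deliberately repeated so that `drop_duplicates` has something to remove.

```python
import pandas as pd

# Hypothetical table for illustration only; the last row repeats the second.
df = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C", "Player B"],
    "goals": [3, 0, 5, 0],
    "minutes": [270.0, 90.0, 180.0, 90.0],
})

n_rows, n_cols = df.shape                    # (4, 3)
first_two = df.head(2)                       # first two rows
last_row = df.tail(1)                        # last row
n_players = df["player"].nunique()           # 3 distinct names
column_types = df.dtypes                     # one dtype per column
goals_as_float = df["goals"].astype(float)   # integer column coerced to float

# drop_duplicates removes the repeated row; sort_values then orders by goals.
deduped = df.drop_duplicates()
ranked = deduped.sort_values("goals", ascending=False)

# Series.apply calls the function on each entry of the Series.
matches_played = deduped["minutes"].apply(lambda m: m / 90)

# DataFrame.apply with axis=1 calls the function on each row.
goals_per_90 = deduped.apply(lambda row: 90 * row["goals"] / row["minutes"], axis=1)

# agg with a list of function names returns one result per function.
summary = deduped[["goals", "minutes"]].agg(["min", "max"])

print(ranked["player"].tolist())   # ['Player C', 'Player A', 'Player B']
print(goals_per_90.tolist())       # [1.0, 0.0, 2.5]
print(summary)
```

For simple arithmetic like this, operating on whole columns (for example `90 * deduped["goals"] / deduped["minutes"]`) gives the same result as the row-wise `apply` and is typically faster.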

Methods for computing descriptive statistics on Series/DataFrames:

Function or Method Name	Description
`describe`	Returns descriptive statistics of column(s)
`count`	Returns the number of non-null entries
`sum`	Returns the sum
`median`	Returns the median
`mean`	Returns the mean
`std`	Returns the sample standard deviation
`var`	Returns the sample variance
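
A short sketch of these methods on a single Series follows. The numbers are made-up placeholders chosen so the results are easy to check by hand, and one entry is missing to show that these methods skip null values by default.

```python
import numpy as np
import pandas as pd

# Hypothetical values for illustration only; one entry is missing.
goals = pd.Series([2.0, 4.0, 4.0, np.nan, 10.0], name="goals")

print(goals.count())    # 4, the NaN is not counted
print(goals.sum())      # 20.0
print(goals.mean())     # 5.0
print(goals.median())   # 4.0

# var and std use the sample formulas by default (ddof=1, divide by n - 1).
print(goals.var())      # 12.0
print(goals.std())      # square root of the sample variance

# Passing ddof=0 gives the population versions instead (divide by n).
print(goals.var(ddof=0))   # 9.0
print(goals.std(ddof=0))   # 3.0

# describe bundles count, mean, std, min, quartiles, and max into one Series.
print(goals.describe())
```

The same methods called on a DataFrame return one value per column.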
