The Basics of Tabular Data


Content Summary

The basics of tabular data consist of understanding:

  • the structure of a table and how it represents a real-world phenomenon,

  • the basic operations that can be performed on a table and how they reflect the real-world phenomenon it represents,

  • the computational foundations for tabular data structures in Pandas.

Datasets

The primary dataset in this chapter consists of player statistics for the US Women’s National Soccer Team from 1991 to 2019. The data is taken from Football Reference.

Summary of Library References

The tables below assume that the usual imports have been executed:

import pandas as pd
import numpy as np

Creating Tabular Structures:

Function or Method Name    Description
pd.Series                  Series constructor
pd.DataFrame               DataFrame constructor
pd.read_csv                Reading CSV from file
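
As a minimal sketch of the three constructors listed above, the snippet below builds a Series, a DataFrame, and a DataFrame read from CSV text. The player names and goal counts are hypothetical values invented for illustration; they are not taken from the chapter's dataset. An in-memory buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Hypothetical values, invented for illustration only.
goals = pd.Series([3, 0, 5], index=["player_a", "player_b", "player_c"], name="goals")

# A DataFrame built from a dict maps column names to column values.
players = pd.DataFrame({
    "player": ["player_a", "player_b", "player_c"],
    "goals": [3, 0, 5],
})

# pd.read_csv accepts a file path or a file-like object; io.StringIO
# plays the role of a file here so the example is self-contained.
csv_text = "player,goals\nplayer_a,3\nplayer_b,0\nplayer_c,5\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(goals.shape)      # (3,)
print(players.shape)    # (3, 2)
print(from_csv.head())  # same three rows as `players`
```

With a real file, the call would typically take a path instead of the buffer, for example pd.read_csv("players.csv"), where the filename is an assumption.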

Series/DataFrame attributes and methods:

Function or Method Name             Description
shape                               Number of rows and columns
head                                Returns the first few rows
tail                                Returns the last few rows
nunique                             Returns the number of unique values
dtypes                              Returns the type of the column(s)
astype                              Returns column(s) coerced to a given type
sort_values                         Sorts a Series/DataFrame according to its values
drop_duplicates                     Drops duplicate values (Series) or duplicate rows (DataFrame)
Series.apply and DataFrame.apply    Applies a function to the entries of a Series / the rows or columns of a DataFrame
Series.agg and DataFrame.agg        Applies a collection of functions to a Series/DataFrame
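
The attributes and methods above can be sketched on a small table. The records below are hypothetical and were invented for illustration: one row is repeated and the goals column is stored as strings on purpose, so that drop_duplicates and astype have something to do.

```python
import pandas as pd

# Hypothetical records, invented for illustration only.
df = pd.DataFrame({
    "player": ["a", "b", "b", "c"],
    "position": ["FW", "MF", "MF", "FW"],
    "goals": ["3", "0", "0", "5"],  # strings, to be coerced below
})

print(df.shape)                  # (4, 3)
print(df.head(2))                # first two rows
print(df.tail(1))                # last row
print(df["position"].nunique())  # 2
print(df.dtypes)                 # one type per column

# Remove the repeated row, then coerce goals from strings to integers.
deduped = df.drop_duplicates().astype({"goals": int})

# Sort rows by the goals column, largest first.
ranked = deduped.sort_values("goals", ascending=False)

# Series.apply calls the function once per entry.
labels = ranked["goals"].apply(lambda g: "scorer" if g > 0 else "none")

# DataFrame.apply calls the function once per column (the default, axis=0).
spread = ranked[["goals"]].apply(lambda col: col.max() - col.min())

# agg with a list of function names returns one result per function.
summary = ranked["goals"].agg(["min", "max"])
print(summary)
```

Each of these methods returns a new object rather than modifying df, which is why the intermediate results are assigned to new names.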

Methods for computing descriptive statistics on Series/DataFrames:

Function or Method Name    Description
describe                   Returns descriptive statistics of column(s)
count                      Returns the number of non-null entries
sum                        Returns the sum
median                     Returns the median
mean                       Returns the mean
std                        Returns the sample standard deviation
var                        Returns the sample variance
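
A short sketch of the descriptive statistics above, using hypothetical values that include one missing entry. It illustrates two points from the table: count ignores null entries, and std and var compute the sample versions (dividing by n - 1) by default.

```python
import pandas as pd

# Hypothetical values, invented for illustration; None becomes NaN.
goals = pd.Series([2, 4, 4, 6, None], name="goals")

print(goals.count())   # 4, the missing entry is not counted
print(goals.sum())     # 16.0, missing entries are skipped
print(goals.mean())    # 4.0
print(goals.median())  # 4.0

# Sample variance: squared deviations (4, 0, 0, 4) sum to 8, divided by n - 1 = 3.
sample_var = goals.var()
sample_std = goals.std()

# Passing ddof=0 gives the population variance instead: 8 / 4.
population_var = goals.var(ddof=0)
print(population_var)  # 2.0

# describe collects count, mean, std, min, quartiles, and max in one Series.
print(goals.describe())
```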