{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_cell" ] }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "pd.set_option('display.max_rows', 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Describing Different Kinds of Data\n", "---\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to understand and *describe* a dataset statistically, the observations need to be measured in a quantifiable way. However, the attributes of a dataset vary drastically based on the nature of what is being measured; datasets are often a mixture of numbers, labels, and language-based descriptions. \n", "\n", "Specifying the *kind of data* contained in an attribute helps define strategies to quantify and describe the population in terms of the attribute." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Example:** The dataset below contains information on Health Department inspections for restaurants in San Francisco. Each row describes a different inspection of a restaurant in the city." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | business_name | business_postal_code | inspection_date | month | day | inspection_score | risk_category |
|---|---|---|---|---|---|---|---|
| 0 | Sushirrito | 94111 | 2019-03-01 | 3 | 1 | 86.0 | Low Risk |
| 1 | Swensen's of SF Inc | 94109 | 2018-02-13 | 2 | 13 | 96.0 | Low Risk |
| 2 | Vinyl Cafe and Wine Bar | 94117 | 2017-01-10 | 1 | 10 | 77.0 | High Risk |
| 3 | Andrea's Bakery | 94112 | 2017-10-25 | 10 | 25 | 65.0 | Moderate Risk |
| 4 | MORNING DUE | 94104 | 2018-08-09 | 8 | 9 | 86.0 | Low Risk |