{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 5 (Oct 19)\n", "\n", "Today we will learn to import data into Python, manipulate 2D tabular data (think spreadsheets) with `pandas`, and save our data to text files like CSV (comma-separated values).\n", "\n", "Outline:\n", "\n", "1. Intro to `pandas`\n", "2. Data I/O (reading and writing)\n", "3. Selecting data\n", "4. Manipulating data\n", "5. Plotting\n", "\n", "## Readings (optional)\n", "\n", "If you find this week's material new or challenging, you may want to read through some or all the following resources while working on your assignment:\n", "\n", "- [SPIRL Ch. 4.5 Pandas](https://cjtu.github.io/spirl/sp_pandas.html)\n", "- [Python for Data Science pandas tutorials](https://wavedatalab.github.io/datawithpython/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below we have our regular imports, but this time with `pandas` which most people alias to `pd`. The Pandas package is included with Anaconda, so you shouldn't need to install anything." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Intro to Pandas\n", "\n", "[Pandas](https://pandas.pydata.org/) is a package at the heart of most data analysis tasks with Python. It is quick at opening, selecting, manipulating and plotting 2D data with columns and rows (like tables and spreadsheets) which pandas calls **DataFrames**.\n", "\n", "Below we make a simple `pandas.DataFrame` to see what Pandas can do. One way to make a DataFrame is with two zipped lists, which we saw when making dictionaries in assignment 2." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('apples', 142), ('oranges', 119), ('pears', 248), ('bananas', 84), ('kiwis', 41), ('cherries', 187)]\n" ] } ], "source": [ "fruit = ['apples', 'oranges', 'pears', 'bananas', 'kiwis', 'cherries']\n", "count = [142, 119, 248, 84, 41, 187]\n", "inventory = list(zip(fruit, count))\n", "print(inventory)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To turn this into a Pandas **DataFrame**, we call `pd.DataFrame` and specify the `data` as our zipped lists and also the `columns` names in order." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "df is type: \n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142
1oranges119
2pears248
3bananas84
4kiwis41
5cherries187
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142\n", "1 oranges 119\n", "2 pears 248\n", "3 bananas 84\n", "4 kiwis 41\n", "5 cherries 187" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(data=inventory, columns=['Fruit', 'Count'])\n", "print('df is type: ', type(df))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we see we have a table version of our inventory data with named columns, \"Fruit\", and \"Count\". Let's see what we can learn about these data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142
1oranges119
2pears248
3bananas84
4kiwis41
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142\n", "1 oranges 119\n", "2 pears 248\n", "3 bananas 84\n", "4 kiwis 41" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head() # show first 5 rows" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142
1oranges119
2pears248
3bananas84
4kiwis41
5cherries187
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142\n", "1 oranges 119\n", "2 pears 248\n", "3 bananas 84\n", "4 kiwis 41\n", "5 cherries 187" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(6) # show first n rows" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(df) # number of rows" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['Fruit', 'Count'], dtype='object')\n", "['Fruit', 'Count']\n" ] } ], "source": [ "print(df.columns) # column names\n", "print(list(df.columns)) # as a normal list" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Fruit object\n", "Count int64\n", "dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes # see data types for each column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each column in a DataFrame is assigned a type. We can change the type of a column with `.astype`. This is sometimes needed to ensure that you are working with numerical data, dates, etc." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Fruit object\n", "Count float64\n", "dtype: object" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.astype({\"Fruit\": str, \"Count\": float})\n", "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A quirk of pandas is that it doesn't recognize the `str` type. All we need to do is spell out `\"string\"` in this case." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Fruit string\n", "Count float64\n", "dtype: object" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.astype({\"Fruit\": \"string\", \"Count\": float})\n", "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data I/O: Importing a CSV\n", "\n", "Data I/O (in/out) operations are an essential and often tedious part of analyzing scientific data. Pandas offers some help and can read simple structured data files with little trouble. These are mainly through the `pd.read_` functions listed [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html), some of which include:\n", "\n", "- CSV: `pd.read_csv()`\n", "- Fixed-width: `pd.read_fwf()`\n", "- Excel: `pd.read_excel()`\n", "- and more...\n", "\n", "Here we read in a **CSV** from a public URL. You can also read in CSVs on your local computer by passing in the path." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarUnnamed: 9Unnamed: 10Unnamed: 11Unnamed: 12Unnamed: 13
018.08307130350412.0701chevrolet chevelle malibuNaNNaNNaNNaNNaN
115.08350165369311.5701buick skylark 320NaNNaNNaNNaNNaN
218.08318150343611.0701plymouth satelliteNaNNaNNaNNaNNaN
316.08304150343312.0701amc rebel sstNaNNaNNaNNaNNaN
417.08302140344910.5701ford torinoNaNNaNNaNNaNNaN
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "0 18.0 8 307 130 3504 12.0 70 \n", "1 15.0 8 350 165 3693 11.5 70 \n", "2 18.0 8 318 150 3436 11.0 70 \n", "3 16.0 8 304 150 3433 12.0 70 \n", "4 17.0 8 302 140 3449 10.5 70 \n", "\n", " origin car Unnamed: 9 Unnamed: 10 Unnamed: 11 \\\n", "0 1 chevrolet chevelle malibu NaN NaN NaN \n", "1 1 buick skylark 320 NaN NaN NaN \n", "2 1 plymouth satellite NaN NaN NaN \n", "3 1 amc rebel sst NaN NaN NaN \n", "4 1 ford torino NaN NaN NaN \n", "\n", " Unnamed: 12 Unnamed: 13 \n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "4 NaN NaN " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "csv_url = \"https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv\"\n", "df = pd.read_csv(csv_url)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems this CSV was imported with a bunch of empty columns. We can cut those off after importing, or adjust our `read_csv` to only include certain columns. This can be done by giving a list of indices to the `usecols` argument, but a range works just as well. Here we want the first 9 columns, meaning `range(0, 9)` in Python." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincar
018.08307130350412.0701chevrolet chevelle malibu
115.08350165369311.5701buick skylark 320
218.08318150343611.0701plymouth satellite
316.08304150343312.0701amc rebel sst
417.08302140344910.5701ford torino
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "0 18.0 8 307 130 3504 12.0 70 \n", "1 15.0 8 350 165 3693 11.5 70 \n", "2 18.0 8 318 150 3436 11.0 70 \n", "3 16.0 8 304 150 3433 12.0 70 \n", "4 17.0 8 302 140 3449 10.5 70 \n", "\n", " origin car \n", "0 1 chevrolet chevelle malibu \n", "1 1 buick skylark 320 \n", "2 1 plymouth satellite \n", "3 1 amc rebel sst \n", "4 1 ford torino " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "csv_url = \"https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv\"\n", "df = pd.read_csv(csv_url, usecols=range(0, 9))\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pandas `read_csv` can also read in other character delimited file formats, e.g. space or tab-separated (TSV) files. All we need to do is change the delimiter. Ex.\n", "\n", "`df = pd.read_csv(f, delimiter='\\t')` # Tab-separated values\n", "\n", "`df = pd.read_csv(f, delimiter='\\s+')` # Whitespace separated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Assignment 1] \n", "\n", "In this assignment, you will practice reading in several datasets of different formats.\n", "\n", "### 1a) CSV (comma-separated)\n", "\n", "We can read in CSV files like above with `pd.read_csv`. Try it on [a5data.csv](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data.csv) in the cell provided below (either go to the link and copy the url or right-click then copy link url to use in `pd.read_csv`).\n", "\n", "\n", "### 1b) TSV (tab-separated)\n", "\n", "To read in TSV files, we can also use `pd.read_csv` with the `delimiter='\\t'` argument. Try it on [a5data.tsv](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data.tsv) in the cell provided below.\n", "\n", "\n", "### 1c) Excel (.xls, .xlsx)\n", "\n", "To read in Excel files, we use `pd.read_excel`. Try it on [a5data.xlsx](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data.xlsx) in the cell provided below.\n", "\n", "\n", "### 1d) ASCII text file\n", "\n", "To read in text files with entries separated by some amount of whitespace (tabs, spaces, etc), we can use `pd.read_csv` with the `delimiter='\\s+'` argument. Try it on [a5data.txt](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data.txt) in the cell provided below.\n", "\n", "\n", "### 1e) ASCII text file with header rows\n", "\n", "To have pandas skip rows while importing, we can use the `skiprows` argument of `pd.read_csv`. Try it on [a5data_headers.txt](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data_headers.txt) in the space provided below.\n", "\n", "\n", "### 1f) BONUS: FITS table file\n", "\n", "(Optional): For this question, you will need to read in tabular data in a `.fits` file with `astropy.table.Table()` (note: this only works for FITS tables, not images). Once you've read in the table with astropy, convert it to a `pd.DataFrame` with the `.to_pandas()` method. You will need the following line to import astropy Table: \n", "\n", "`from astropy.table import Table`.\n", "\n", "Try it on [a5data.fits](https://raw.githubusercontent.com/cjtu/spirl/master/spirl/data/a5data.fits) in the cell provided below. See other file formats supported by astropy [here](https://docs.astropy.org/en/stable/io/unified.html#built-in-table-readers-writers).\n", "\n", "This data is from a paper by [Shih-Yun Tang et al. (2019)](https://ui.adsabs.harvard.edu/abs/2019ApJ...877...12T/abstract)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1a code below]\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1b code below]\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1c code below]\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1d code below]\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1e code below]\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [1f code below]\n", "from astropy.table import Table\n", "\n", "print(df['Gmag'].max())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Selecting data with Pandas\n", "\n", "Pandas offers several ways to access and filter data in DataFrames. The most basic are `[]`, `.loc[]` and `.iloc[]`.\n", "\n", "### Selecting with square brackets `[]` and dot `.`\n", "\n", "Selecting with square brackets `[]` gives us the column that we specify." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 chevrolet chevelle malibu\n", "1 buick skylark 320\n", "2 plymouth satellite\n", "3 amc rebel sst\n", "4 ford torino\n", " ... \n", "193 ford maverick\n", "194 amc hornet\n", "195 chevrolet chevette\n", "196 chevrolet woody\n", "197 vw rabbit\n", "Name: car, Length: 198, dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['car'] # Select a column by its name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can access DataFrame columns with `.columnname`." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 chevrolet chevelle malibu\n", "1 buick skylark 320\n", "2 plymouth satellite\n", "3 amc rebel sst\n", "4 ford torino\n", " ... \n", "193 ford maverick\n", "194 amc hornet\n", "195 chevrolet chevette\n", "196 chevrolet woody\n", "197 vw rabbit\n", "Name: car, Length: 198, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.car" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no major difference between the two methods, it's a matter of personal preference. The dot (`.`) version is less typing, but is harder to use when you have your column name stored as a variable, for example:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 18.0\n", "1 15.0\n", "2 18.0\n", "3 16.0\n", "4 17.0\n", " ... \n", "193 24.0\n", "194 22.5\n", "195 29.0\n", "196 24.5\n", "197 29.0\n", "Name: mpg, Length: 198, dtype: float64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col = 'mpg'\n", "df[col] # This works!\n", "# df.col # This doesn't! AttributeError since \"col\" is not a column name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting with `.loc[]`\n", "\n", "We can also select labelled columns or rows in our DataFrames with `.loc[]`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.loc['car']\n", "# KeyError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While DataFrame `.loc[]` allows us to select by the row name and column name, we have to do it in the proper order `[rows, columns]`. To select the `car` columns, we can use the `:` as a placeholder meaning \"all rows\"." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 chevrolet chevelle malibu\n", "1 buick skylark 320\n", "2 plymouth satellite\n", "3 amc rebel sst\n", "4 ford torino\n", " ... \n", "193 ford maverick\n", "194 amc hornet\n", "195 chevrolet chevette\n", "196 chevrolet woody\n", "197 vw rabbit\n", "Name: car, Length: 198, dtype: object" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[:, 'car']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now we know `df['car']` is equivalent to `df.loc[:, 'car']`. We can also select named **rows** of data by their `Index`. Right now our rows are just numbered, but a DataFrame Index could have any unique label (usually `int` or `str`)." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mpg 18.0\n", "cylinders 8\n", "displacement 307\n", "horsepower 130\n", "weight 3504\n", "acceleration 12.0\n", "model 70\n", "origin 1\n", "car chevrolet chevelle malibu\n", "Name: 0, dtype: object" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[0] # First row of data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also select a single cell of data by specifying the row and column." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"plymouth 'cuda 340\"" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[11, 'car'] # The car in row 11" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2130" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[29, 'weight'] # The weight in row 29" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also select multiple `Index` or `columns` by passing a list to `.loc[]`." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "car buick skylark 320\n", "horsepower 165\n", "cylinders 8\n", "Name: 1, dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[1, ['car', 'horsepower', 'cylinders']]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12 chevrolet monte carlo\n", "13 buick estate wagon (sw)\n", "14 toyota corona mark ii\n", "15 plymouth duster\n", "16 amc hornet\n", "Name: car, dtype: object" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[range(12, 17), 'car']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we select along one row or column, we get a 1D slice of our DataFrame. We can also select multiple indices and columns to get a smaller DataFrame. The result keeps the same order as the rows and columns we specify." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepower
3amc rebel sst8150
2plymouth satellite8150
7plymouth fury iii8215
1buick skylark 3208165
\n", "
" ], "text/plain": [ " car cylinders horsepower\n", "3 amc rebel sst 8 150\n", "2 plymouth satellite 8 150\n", "7 plymouth fury iii 8 215\n", "1 buick skylark 320 8 165" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.loc[[3, 2, 7, 1], ['car', 'cylinders', 'horsepower']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A common task is to make a subset of our DataFrame with only the columns we need. We can do this easily with `.loc[]`." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepowerweightmpgacceleration
0chevrolet chevelle malibu8130350418.012.0
1buick skylark 3208165369315.011.5
2plymouth satellite8150343618.011.0
3amc rebel sst8150343316.012.0
4ford torino8140344917.010.5
\n", "
" ], "text/plain": [ " car cylinders horsepower weight mpg acceleration\n", "0 chevrolet chevelle malibu 8 130 3504 18.0 12.0\n", "1 buick skylark 320 8 165 3693 15.0 11.5\n", "2 plymouth satellite 8 150 3436 18.0 11.0\n", "3 amc rebel sst 8 150 3433 16.0 12.0\n", "4 ford torino 8 140 3449 17.0 10.5" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub_df = df.loc[:, ['car', 'cylinders', 'horsepower', 'weight', 'mpg', 'acceleration']]\n", "sub_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One limitation of `.loc[]` is that we can't select a range of `str`, in this case our columns. We can choose a range of columns with `.iloc[]` instead. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sub_df = df.loc[:, range(4)]\n", "# KeyError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting with `.iloc[]`\n", "\n", "With `.iloc[]`, we can select data by the numbered `Index` or `columns`." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepower
018.08307130
115.08350165
218.08318150
316.08304150
417.08302140
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower\n", "0 18.0 8 307 130\n", "1 15.0 8 350 165\n", "2 18.0 8 318 150\n", "3 16.0 8 304 150\n", "4 17.0 8 302 140" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub_df = df.iloc[:, range(4)]\n", "sub_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So in pandas, both the order of the rows and columns and their labels are important! We can select by labels using `.loc` and by integer index order using `.iloc`." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepower
3amc rebel sst8150
2plymouth satellite8150
7plymouth fury iii8215
1buick skylark 3208165
\n", "
" ], "text/plain": [ " car cylinders horsepower\n", "3 amc rebel sst 8 150\n", "2 plymouth satellite 8 150\n", "7 plymouth fury iii 8 215\n", "1 buick skylark 320 8 165" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df.loc[[3, 2, 7, 1], ['car', 'cylinders', 'horsepower']]\n", "df2.head()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 18.0\n", "1 15.0\n", "2 18.0\n", "3 16.0\n", "4 17.0\n", " ... \n", "193 24.0\n", "194 22.5\n", "195 29.0\n", "196 24.5\n", "197 29.0\n", "Name: mpg, Length: 198, dtype: float64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sub_df.iloc[:, 0] # mpg was 1st column" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 amc rebel sst\n", "2 plymouth satellite\n", "7 plymouth fury iii\n", "1 buick skylark 320\n", "Name: car, dtype: object" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.iloc[:, 0] # car is 1st column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting a new index in pandas\n", "\n", "We can choose a column of unique values to be our new `pandas` index. Here, let's use the car names." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigin
car
chevrolet chevelle malibu18.08307130350412.0701
buick skylark 32015.08350165369311.5701
plymouth satellite18.08318150343611.0701
amc rebel sst16.08304150343312.0701
ford torino17.08302140344910.5701
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight \\\n", "car \n", "chevrolet chevelle malibu 18.0 8 307 130 3504 \n", "buick skylark 320 15.0 8 350 165 3693 \n", "plymouth satellite 18.0 8 318 150 3436 \n", "amc rebel sst 16.0 8 304 150 3433 \n", "ford torino 17.0 8 302 140 3449 \n", "\n", " acceleration model origin \n", "car \n", "chevrolet chevelle malibu 12.0 70 1 \n", "buick skylark 320 11.5 70 1 \n", "plymouth satellite 11.0 70 1 \n", "amc rebel sst 12.0 70 1 \n", "ford torino 10.5 70 1 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df.set_index('car')\n", "df2.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can select certain rows by the car name using `.loc`." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "mpg 16.0\n", "cylinders 8\n", "displacement 304\n", "horsepower 150\n", "weight 3433\n", "acceleration 12.0\n", "model 70\n", "origin 1\n", "Name: amc rebel sst, dtype: object" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.loc['amc rebel sst']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Be careful when working with a numerical `Index()`. You can run into confusing situations where the index associated with your rows is not in the same order as your rows. Recall the earlier example" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepower
3amc rebel sst8150
2plymouth satellite8150
7plymouth fury iii8215
1buick skylark 3208165
\n", "
" ], "text/plain": [ " car cylinders horsepower\n", "3 amc rebel sst 8 150\n", "2 plymouth satellite 8 150\n", "7 plymouth fury iii 8 215\n", "1 buick skylark 320 8 165" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = df.loc[[3, 2, 7, 1], ['car', 'cylinders', 'horsepower']]\n", "df3.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, our `Index` has the numbers `[3, 2, 7, 1]`, in that order as we can see with the `.index`." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64Index([3, 2, 7, 1], dtype='int64')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means we can select any values in our index with `.loc`, or any *row* number from 0 to 3 with `.iloc`." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "car plymouth fury iii\n", "cylinders 8\n", "horsepower 215\n", "Name: 7, dtype: object\n", "car plymouth fury iii\n", "cylinders 8\n", "horsepower 215\n", "Name: 7, dtype: object\n" ] } ], "source": [ "print(df3.loc[7]) # Index 7\n", "print(df3.iloc[2]) # Row 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But here we need to be careful because we can pick the 1st row with `.iloc[0]`, but there is no row with index `0` (remember our only indices are `[3, 2, 7, 1]`)." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "car amc rebel sst\n", "cylinders 8\n", "horsepower 150\n", "Name: 3, dtype: object\n" ] } ], "source": [ "print(df3.iloc[0])\n", "# print(df3.loc[0]) # KeyError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the numerical `Index` assigned by pandas doesn't matter and we don't want this situation where there are gaps or different orders of our rows, we can call `.reset_index(drop=True)`. This will renumber our DataFrame rows starting from 0." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepower
3amc rebel sst8150
2plymouth satellite8150
7plymouth fury iii8215
1buick skylark 3208165
\n", "
" ], "text/plain": [ " car cylinders horsepower\n", "3 amc rebel sst 8 150\n", "2 plymouth satellite 8 150\n", "7 plymouth fury iii 8 215\n", "1 buick skylark 320 8 165" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
carcylindershorsepower
0amc rebel sst8150
1plymouth satellite8150
2plymouth fury iii8215
3buick skylark 3208165
\n", "
" ], "text/plain": [ " car cylinders horsepower\n", "0 amc rebel sst 8 150\n", "1 plymouth satellite 8 150\n", "2 plymouth fury iii 8 215\n", "3 buick skylark 320 8 165" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df4 = df3.reset_index(drop=True)\n", "df4.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary: Selecting from DataFrames\n", "\n", "Keep in mind:\n", "\n", "- `['colname']` or `.colname` selects columns\n", "- `.loc[]` selects by a **label** (i.e., a value in `df.index` or `df.columns`)\n", "- `.iloc[]` selects by a numbered index in row / column order, starting at 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding a new column\n", "\n", "We can add columns to a DataFrame with `DataFrame['col'] = values`. We need to make sure the new values are the same number of rows as the DataFrame. An easy way to do this is to derive columns from existing DataFrame columns." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
018.08307130350412.0701chevrolet chevelle malibu13.07
115.08350165369311.5701buick skylark 32015.68
218.08318150343611.0701plymouth satellite13.07
316.08304150343312.0701amc rebel sst14.70
417.08302140344910.5701ford torino13.84
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "0 18.0 8 307 130 3504 12.0 70 \n", "1 15.0 8 350 165 3693 11.5 70 \n", "2 18.0 8 318 150 3436 11.0 70 \n", "3 16.0 8 304 150 3433 12.0 70 \n", "4 17.0 8 302 140 3449 10.5 70 \n", "\n", " origin car lp100km \n", "0 1 chevrolet chevelle malibu 13.07 \n", "1 1 buick skylark 320 15.68 \n", "2 1 plymouth satellite 13.07 \n", "3 1 amc rebel sst 14.70 \n", "4 1 ford torino 13.84 " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert miles per gallons to liters per 100 km\n", "df['lp100km'] = round(235.214583 / df['mpg'], 2)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving to a file\n", "\n", "It's nice to manipulate and plot our data in Python, but we often want to save the results of our efforts for later.\n", "\n", "The simplest way to do so is with pandas `.to_csv`. Pandas saves the `Index` by default, but since we just have default indices, you can just omit the index from the csv with `index=False`." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [], "source": [ "fname = '/full/path/to/cars.csv' # <- change to real path\n", "# df.to_csv(fname, index=False) # <- uncomment to save to file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other useful DataFrame methods" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
19324.0620081301217.6761ford maverick9.80
19422.5623290308517.6761amc hornet10.45
19529.048552203522.2761chevrolet chevette8.11
19624.549860216422.1761chevrolet woody9.60
19729.049070193714.2762vw rabbit8.11
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "193 24.0 6 200 81 3012 17.6 76 \n", "194 22.5 6 232 90 3085 17.6 76 \n", "195 29.0 4 85 52 2035 22.2 76 \n", "196 24.5 4 98 60 2164 22.1 76 \n", "197 29.0 4 90 70 1937 14.2 76 \n", "\n", " origin car lp100km \n", "193 1 ford maverick 9.80 \n", "194 1 amc hornet 10.45 \n", "195 1 chevrolet chevette 8.11 \n", "196 1 chevrolet woody 9.60 \n", "197 2 vw rabbit 8.11 " ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail() # last 5 rows" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementweightaccelerationmodeloriginlp100km
count198.000000198.000000198.000000198.000000198.000000198.000000198.000000198.000000
mean19.7196975.898990223.4696973177.88888915.00555672.8181821.43939413.029293
std5.8142541.785417115.181017934.7837332.8723821.8653320.7080853.924258
min9.0000003.00000068.0000001613.0000008.00000070.0000001.0000006.720000
25%15.0000004.000000113.2500002302.50000013.00000071.0000001.0000009.650000
50%19.0000006.000000228.0000003030.00000015.00000073.0000001.00000012.380000
75%24.3750008.000000318.0000004080.75000016.80000074.0000002.00000015.680000
max35.0000008.000000455.0000005140.00000023.50000076.0000003.00000026.130000
\n", "
" ], "text/plain": [ " mpg cylinders displacement weight acceleration \\\n", "count 198.000000 198.000000 198.000000 198.000000 198.000000 \n", "mean 19.719697 5.898990 223.469697 3177.888889 15.005556 \n", "std 5.814254 1.785417 115.181017 934.783733 2.872382 \n", "min 9.000000 3.000000 68.000000 1613.000000 8.000000 \n", "25% 15.000000 4.000000 113.250000 2302.500000 13.000000 \n", "50% 19.000000 6.000000 228.000000 3030.000000 15.000000 \n", "75% 24.375000 8.000000 318.000000 4080.750000 16.800000 \n", "max 35.000000 8.000000 455.000000 5140.000000 23.500000 \n", "\n", " model origin lp100km \n", "count 198.000000 198.000000 198.000000 \n", "mean 72.818182 1.439394 13.029293 \n", "std 1.865332 0.708085 3.924258 \n", "min 70.000000 1.000000 6.720000 \n", "25% 71.000000 1.000000 9.650000 \n", "50% 73.000000 1.000000 12.380000 \n", "75% 74.000000 2.000000 15.680000 \n", "max 76.000000 3.000000 26.130000 " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe() # summary statistics on each numerical column" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142.0
1apples142.0
2pears248.0
3bananas84.0
4kiwisNaN
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142.0\n", "1 apples 142.0\n", "2 pears 248.0\n", "3 bananas 84.0\n", "4 kiwis NaN" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fruit = ['apples', 'apples', 'pears', 'bananas', 'kiwis', 'cherries']\n", "count = [142, 142, 248, 84, None, 187]\n", "fruit_df = pd.DataFrame(zip(fruit, count), columns=['Fruit', 'Count'])\n", "fruit_df.head()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142.0
1apples142.0
2pears248.0
3bananas84.0
4kiwis0.0
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142.0\n", "1 apples 142.0\n", "2 pears 248.0\n", "3 bananas 84.0\n", "4 kiwis 0.0" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fruit_df = fruit_df.fillna(0) # fill missing values with 0\n", "fruit_df.head()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FruitCount
0apples142.0
2pears248.0
3bananas84.0
4kiwis0.0
5cherries187.0
\n", "
" ], "text/plain": [ " Fruit Count\n", "0 apples 142.0\n", "2 pears 248.0\n", "3 bananas 84.0\n", "4 kiwis 0.0\n", "5 cherries 187.0" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fruit_df = fruit_df.drop_duplicates()\n", "fruit_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filtering DataFrames\n", "\n", "We often want to select data within a certain range or on a specific condition. There are a couple ways to do this. The simplest (but slowest and most error-prone) way is to use a loop.\n", "\n", "Say we want to select only cars with 4 cylinders. Here we use `df.iterrows()` which will return the `Index` of the row and the row itself on each iteration in the loop." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['toyota corona mark ii', 'datsun pl510', 'volkswagen 1131 deluxe sedan', 'peugeot 504', 'audi 100 ls']\n" ] } ], "source": [ "cars_4cyl = []\n", "for i, row in df.iterrows():\n", " if row['cylinders'] == 4:\n", " cars_4cyl.append(row['car'])\n", "print(cars_4cyl[:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a list of 4 cylinder cars, but we've lost all other information stored about them in the DataFrame. We would need to make a separate list for each column we want to store. \n", "\n", "Instead of using a loop, we can do the **vectorized** solution which keeps our data in a DataFrame and selects info based on our condition (`row['cylinders'] == 4`)." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Max cylinders in DataFrame 4\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
1424.0411395237215.0703toyota corona mark ii9.80
1827.049788213014.5703datsun pl5108.71
1926.049746183520.5702volkswagen 1131 deluxe sedan9.05
2025.0411087267217.5702peugeot 5049.41
2124.0410790243014.5702audi 100 ls9.80
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "14 24.0 4 113 95 2372 15.0 70 \n", "18 27.0 4 97 88 2130 14.5 70 \n", "19 26.0 4 97 46 1835 20.5 70 \n", "20 25.0 4 110 87 2672 17.5 70 \n", "21 24.0 4 107 90 2430 14.5 70 \n", "\n", " origin car lp100km \n", "14 3 toyota corona mark ii 9.80 \n", "18 3 datsun pl510 8.71 \n", "19 2 volkswagen 1131 deluxe sedan 9.05 \n", "20 2 peugeot 504 9.41 \n", "21 2 audi 100 ls 9.80 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_4cyl = df[df.cylinders == 4]\n", "# df_4cyl = df[df['cylinders'] == 4] # Equivalent to the above\n", "print('Max cylinders in DataFrame', df_4cyl.cylinders.max())\n", "df_4cyl.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To break down what we are doing here, we can look at what `df.cylinders == 4` gives us." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 False\n", "3 False\n", "4 False\n", " ... \n", "193 False\n", "194 False\n", "195 True\n", "196 True\n", "197 True\n", "Name: cylinders, Length: 198, dtype: bool" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.cylinders == 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a **boolean index**, meaning it will be `True` or `False` based on the condition we supplied (whether there are 4 cylinders or not). \n", "\n", "When we pass a boolean index to a DataFrame in square brackets (`[]`), it's similar to supplying specific labels or indices as we learned above. The only difference is it will select only the rows in the DataFrame that have `True`, e.g. rows where the car has 4 cylinders.\n", "\n", "We can supply conditions with any of the comparison operators (`==`, `<`, `<=`, `>`, `>=`). Say we want to find cars that get the fewest miles per gallon (mpg < 12):" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
2510.08360215461514.0701ford f25023.52
2610.08307200437615.0701chevy c2023.52
2711.08318210438213.5701dodge d20021.38
289.08304193473218.5701hi 1200d26.13
6711.08429208463311.0721mercury marquis21.38
10311.08400150499714.0731chevrolet impala21.38
12411.08350180366411.0731oldsmobile omega21.38
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "25 10.0 8 360 215 4615 14.0 70 \n", "26 10.0 8 307 200 4376 15.0 70 \n", "27 11.0 8 318 210 4382 13.5 70 \n", "28 9.0 8 304 193 4732 18.5 70 \n", "67 11.0 8 429 208 4633 11.0 72 \n", "103 11.0 8 400 150 4997 14.0 73 \n", "124 11.0 8 350 180 3664 11.0 73 \n", "\n", " origin car lp100km \n", "25 1 ford f250 23.52 \n", "26 1 chevy c20 23.52 \n", "27 1 dodge d200 21.38 \n", "28 1 hi 1200d 26.13 \n", "67 1 mercury marquis 21.38 \n", "103 1 chevrolet impala 21.38 \n", "124 1 oldsmobile omega 21.38 " ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.mpg < 12]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also string conditions together by separating them with parentheses (`()`) and the **union** `&` or **intersection** `|` operators.\n", "\n", "- Union (`&`): select where condition1 AND condition2 are `True`\n", "- Intersection (`|`): select where condition1 OR condition2 are `True`" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
1926.049746183520.5702volkswagen 1131 deluxe sedan9.05
5626.049170195520.5711plymouth cricket9.05
10226.049746195021.0732volkswagen super beetle9.05
13132.047165183621.0743toyota corolla 12007.35
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "19 26.0 4 97 46 1835 20.5 70 \n", "56 26.0 4 91 70 1955 20.5 71 \n", "102 26.0 4 97 46 1950 21.0 73 \n", "131 32.0 4 71 65 1836 21.0 74 \n", "\n", " origin car lp100km \n", "19 2 volkswagen 1131 deluxe sedan 9.05 \n", "56 1 plymouth cricket 9.05 \n", "102 2 volkswagen super beetle 9.05 \n", "131 3 toyota corolla 1200 7.35 " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "light_high_accel = df[(df.weight < 2000) & (df.acceleration > 20)]\n", "light_high_accel.head()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
19014.58351152421512.8761ford gran torino16.22
19122.06225100323315.4761plymouth valiant10.69
19222.06250105335314.5761chevrolet nova10.69
19324.0620081301217.6761ford maverick9.80
19422.5623290308517.6761amc hornet10.45
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "190 14.5 8 351 152 4215 12.8 76 \n", "191 22.0 6 225 100 3233 15.4 76 \n", "192 22.0 6 250 105 3353 14.5 76 \n", "193 24.0 6 200 81 3012 17.6 76 \n", "194 22.5 6 232 90 3085 17.6 76 \n", "\n", " origin car lp100km \n", "190 1 ford gran torino 16.22 \n", "191 1 plymouth valiant 10.69 \n", "192 1 chevrolet nova 10.69 \n", "193 1 ford maverick 9.80 \n", "194 1 amc hornet 10.45 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "heavy_or_8cyl = df[(df.weight > 3000) | (df.cylinders == 8)]\n", "heavy_or_8cyl.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can take the opposite of a comparison with a preceding **negation** operator (`~`)." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
417.08302140344910.5701ford torino13.84
515.08429198434110.0701ford galaxie 50015.68
1721.0620085258716.0701ford maverick11.20
2510.08360215461514.0701ford f25023.52
3225.0498?204619.0711ford pinto9.41
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "4 17.0 8 302 140 3449 10.5 70 \n", "5 15.0 8 429 198 4341 10.0 70 \n", "17 21.0 6 200 85 2587 16.0 70 \n", "25 10.0 8 360 215 4615 14.0 70 \n", "32 25.0 4 98 ? 2046 19.0 71 \n", "\n", " origin car lp100km \n", "4 1 ford torino 13.84 \n", "5 1 ford galaxie 500 15.68 \n", "17 1 ford maverick 11.20 \n", "25 1 ford f250 23.52 \n", "32 1 ford pinto 9.41 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use str.contains to match a sub-string\n", "ford_cars = df[df.car.str.contains('ford')]\n", "ford_cars.head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mpgcylindersdisplacementhorsepowerweightaccelerationmodelorigincarlp100km
1522.0619895283315.5701plymouth duster10.69
1618.0619997277415.5701amc hornet13.07
1926.049746183520.5702volkswagen 1131 deluxe sedan9.05
2025.0411087267217.5702peugeot 5049.41
2225.0410495237517.5702saab 99e9.41
\n", "
" ], "text/plain": [ " mpg cylinders displacement horsepower weight acceleration model \\\n", "15 22.0 6 198 95 2833 15.5 70 \n", "16 18.0 6 199 97 2774 15.5 70 \n", "19 26.0 4 97 46 1835 20.5 70 \n", "20 25.0 4 110 87 2672 17.5 70 \n", "22 25.0 4 104 95 2375 17.5 70 \n", "\n", " origin car lp100km \n", "15 1 plymouth duster 10.69 \n", "16 1 amc hornet 13.07 \n", "19 2 volkswagen 1131 deluxe sedan 9.05 \n", "20 2 peugeot 504 9.41 \n", "22 2 saab 99e 9.41 " ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "high_accel_not_ford = df[(df.acceleration > 15) & ~df.car.str.contains('ford')]\n", "high_accel_not_ford.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting with DataFrame.plot\n", "\n", "Pandas gives you a convenient interface into plotting with the `.plot` method. It even labels your axes for you!" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAtcUlEQVR4nO3df5TcVZnn8ffTP+i0+QEhQCZJJ8Q1sHMIP9qlzQSjrOLsDOs4BBWyjIvBHWbicvAYBxTEP9TBZQdQ4ozODjsRHYiK2ICarAM6QMZV3JDYcTrBwLhmBJJOMiH0JiGNSae7+tk/6tukqlPfqu+trm/Xj/68zgldfftb1be6OPXUvc+9zzV3R0REpJimandARERqn4KFiIiUpGAhIiIlKViIiEhJChYiIlJSS7U7kJYzzjjDFy5cWO1uiIjUla1bt77i7meObW/YYLFw4UJ6enqq3Q0RkbpiZi8Vatc0lIiIlKRgISIiJSlYiIhISQoWIiJSkoKFiIiUpGDRQPoHBtm2+xD9A4PV7oqINJiGXTo72azv3cOtj26ntamJoZER7n7/hVzROa/a3RKRBqGRRQPoHxjk1ke3c2xohCODwxwbGuGWR7drhCEiFaNg0QD6Dh6ltSn/pWxtaqLv4NEq9UhEGo2CRQPomNnO0MhIXtvQyAgdM9ur1CMRaTQKFg1g1rQ27n7/hUxpbWJ6WwtTWpu4+/0XMmtaW7W7JiINQgnuBnFF5zyWLTqDvoNH6ZjZrkAhIhWlYNFAZk1rU5AQkVSkOg1lZi+a2bNm1mtmPVHb6Wb2hJn9Kvo6M+f628xsp5n90sx+P6f94uhxdprZl8zM0ux3WrQPQkTq1UTkLN7p7p3u3hV9/0ngKXc/B3gq+h4zOw+4BlgMXA78jZk1R/e5F1gFnBP9u3wC+l1R63v3sOyujVx732aW3bWRDb17qt0lEZHEqpHgXg48EN1+ALgyp/0hdx909xeAncASM5sDzHD3Te7uwLqc+9QF7YMQkXqXdrBw4B/MbKuZrYraZrv7PoDo61lR+zxgd859+6K2edHtse0nMbNVZtZjZj0HDhyo4NMYn3L3QfS80M+af/glPS/0p9k9EZGS0k5wL3P3vWZ2FvCEmf1zkWsL5SG8SPvJje5rgbUAXV1dBa+phnL2QVx73zM8vTMbJL60cSdvXzSLr//J0lT7KSISJ9WRhbvvjb6+DHwXWALsj6aWiL6+HF3eB8zPuXsHsDdq7yjQXjdC90H0vND/eqAY9ZOd/RphiEjVpDayMLOpQJO7H4lu/x5wO7ABuA64M/q6PrrLBuBBM1sDzCWbyN7i7hkzO2JmS4HNwErgy2n1Oy0h+yB+/KtXYtu73jgrrS6KiMRKcxpqNvDdaJVrC/Cgu//AzH4GdJvZ9cAu4GoAd99hZt3Ac8AwcKO7Z6LHugG4H2gHHo/+1Z2k+yAu6jg1qF1EJG2pBQt3/zVwUYH2fuBdMfe5A7ijQHsPcH6l+1irzpg+hWaDTE7Wpdmy7SIi1aDaUDWoY2Y7rS1jVk+1NKkwoIhUjYJFDVJhQBGpNaoNVaNUGFBEaomCRQ1TYUARqRWahhIRkZIULBqIqtqKSFo0DTVG/8BgXeYJ1vfu4ZZHttFsTWR8hM9fdRFXdBYsoSUiEkzBIsf63j3c+uh2WpuaGBoZ4e73X1gXb7j9A4Pc3N3L8AhAdh/jTd29LFt0Rl0FPBGpXZqGitRzGfEde1+NAsUJwyPZdhGRSlCwiJRbRrw2xBXYrZnCuyJS5xQsIuWUEa8Vc08t3Me4dkg/Ga5ku0hjUc4iMrpr+pYxOYt6mPN/7XiG1mZjKKeYVGuz8drxTMHrs8nw7TQ3GZkR5/NXVTY3U6+5HxGJp2CRI+1d02mttJp6SnNeoAAYyjhTT2k+6dr+gUE+/vC2vOtvfnhbxZLhubmfY2RHarc8ul3JdpE6p2AxRuiu6aQBIM1P268dz9DWbAzmBIC2mJHFjr2HCwaWHXsPc+m5Z510fajR3M9ooIATuR8FC5H6pWAxDkkDQNqftjtmtmNN+TXNrcli8i2FTqkt1h7el3rN/YhIPCW4yxSy1DbtlVYhVWoXz53BmOrntDRl2ye6LyJSPzSyKFPIdMtEfNpOmm+ZNa2NNSs6+cSY3d6VfDNXxVyRxqNgUaaQADBRK62S5lsm4s1cFXNFGouCRZlCA0CtfdrWm7mIhEg9WJhZM9AD7HH395jZZ4E/BQ5El3zK3R+Lrr0NuJ5sgaOPuvsPo/aLgfuBduAxYLW7V317cmgASGulVbnXp6mW+iIi4zcRI4vVwPNAbgb1i+7+hdyLzOw84BpgMTAXeNLMznX3DHAvsAp4hmywuBx4fAL6XlJan9BDl9rW0ka4WuqLiFRGqquhzKwD+APgvgSXLwcecvdBd38B2AksMbM5wAx33xSNJtYBV6bV51oQWtSwnCKIaZXjqOeCjCISL+2ls38J3AKMqYnKR8xsu5l9zcxmRm3zgN051/RFbfOi22PbT2Jmq8ysx8x6Dhw4UOiSuhC61Lbv4FF8JH9Wzkc89vr1vXt4651P8Udrn+Gtdz7Fht49JfuUNLjUd0FGEYmTWrAws/cAL7v71jE/uhd4E9AJ7APuGb1LgYfxIu0nN7qvdfcud+8688wzy+p3LeiY2c6x4fzd18eGM7FLbaee0py3extgsEi5j5u7exkcdn4zlGFw2Lmpu7doEFjfu4dld23k2vs2s+yujUWDizbliTSmNEcWy4ArzOxF4CHgMjP7hrvvd/eMu48AXwGWRNf3AfNz7t8B7I3aOwq0N7Sx+fti+fzRQoK54goJhp59ETqtpE15Io0ptQS3u98G3AZgZu8APu7u15rZHHffF132XuAX0e0NwINmtoZsgvscYIu7Z8zsiJktBTYDK4Evp9XvWtB38CjtrS0cGRx+va29tSW2vlJIIcFXjw4V/J1x7eXUeqq1ZcIiMn7V2Gdxt5l1kp1KehH4MIC77zCzbuA5YBi4MVoJBXADJ5bOPk6NrISCsCWiSa8NncoJKSQ4o7214GPEtZc7raR9HCKNZUKChbv/CPhRdPuDRa67A7ijQHsPcH5K3StbyBLRkGtDN/yFFBJcPHcGRn7Sx4ivDVXP53yISOVoB3eZQirJllN1NmQqZ9a0NlZc3MG6Z3a93raiqyP2Pk3RoUe53xdzRec8zpszg97dh+icfxqLZk8ver2INB5VnS1TyBLRcpeTzprWxkXzT0s0vdW9tS+vrbunr2ASesfew3mBAiAzkj3PIs763j2856+f5s//13O856+fTrTUVkQai4JFmULm8tNeThoWjMLOs9AmOxEBBYuyhSwRTXs5aUgwGs1Z5CqWs9AmOxEB5SzGJSSvkOZy0lnT2ljR1cG6TclyFi3NlrfUtqU5PmdR7qhIhQRFGotGFuOUNK8Qem2I/oFBvrVld17bt7bsjj21b0pL/v6LKS3NsSOFckZFITu+RaQ+aGTRAHbsPVxwU96OvYe59Nyz8to7ZrZzdGg4r+3o0HDRkULIqCjt88ZFpDoULBpCWNLax+y08Jzr4qaPkm6yK2fHdxxNZYnUDgWLBrB47gxamsir+dTSVDhpXWzp7MHfDI37HIpKrfzSmRgitUU5iwYwa1oba1Z00tZivKG1mbYWY82KzoKfxuNqQO05eLQiS2RHNwjmKpZsL0TLdUVqj0YWDSJpXmFG+ykF2weHRyoyfRS3QXD1u85N/DiVnMoSkcrQyKKBJFltNTpllaulCd626IyKTB9VYl+GzsQQqT0KFpPMrGltfGDJgry2D/zOAhbNnl50iWzSk/JCD26K66POxBCpLZqGmmT6BwZ5cMuuvLYHN+9i9bvOjZ3KCk02hxzcFEdnYojUFo0sGsjO/Ud4pGc3O/cfib2m1El5Y6eyQpPNowc35Ro9uClUWpsYRSScRhYN4tPfezavRPnKSxZw+/ILClwZ9ym/cHtosln5BpHGpJFFA9i5/0heoABYt2lXwRHG3FMLv2nHtYfu+Fa+QaQxaWTRAHp3H4ptH3tQ0d7Dxwpeu/fwsdhDjczyd3xnv4+nfINI41GwaACd808LaA+fhprS0sxQ5sToYrTwYKnT+xQkRBpH6tNQZtZsZv9kZt+Pvj/dzJ4ws19FX2fmXHubme00s1+a2e/ntF9sZs9GP/uSlfpoO8ksmj2dlZfkL4ddecmCgiOFcqahlIMQkYnIWawGns/5/pPAU+5+DvBU9D1mdh5wDbAYuBz4GzMbraV9L7AKOCf6d/kE9Luu3L78Ah758FI+etkiHvnw0pjkNrx2PMOU1vyXfUprE68dzxS8XjkIEYGUp6HMrAP4A+AO4KaoeTnwjuj2A8CPgFuj9ofcfRB4wcx2AkvM7EVghrtvih5zHXAl8Hiafa83uXsh1v7k17F7IeJGBOWWKFdlWJHJIe2RxV8CtwC58xiz3X0fQPR19MCFeUDuCT59Udu86PbY9pOY2Soz6zGzngMHDlTkCdSDkL0Q5Y4UCu150CFHIpNHaiMLM3sP8LK7bzWzdyS5S4E2L9J+cqP7WmAtQFdXV/i24ToVuheiEquVdMiRyOSS5jTUMuAKM3s3MAWYYWbfAPab2Rx332dmc4CXo+v7gPk59+8A9kbtHQXaJVJOEnq8q5UqXRlW01kitS21aSh3v83dO9x9IdnE9UZ3vxbYAFwXXXYdsD66vQG4xszazOyNZBPZW6KpqiNmtjRaBbUy5z7CxCShxxYSLBWgkhYeBE1nidSDauyzuBPoNrPrgV3A1QDuvsPMuoHngGHgRncfXaJzA3A/0E42sa3k9hhpboSLKyR49/sv5JYx7bOmtQUVHtR0lkh9sHIqgtaDrq4u7+npqXY36l7/wCDL7trIsaETo4gprU389NbLmDWt7aTpo1LXj7Vt9yGuvW8zRwZPbPqb3tbCN/7kd7goZrOhiKTHzLa6e9fYdtWGkqJKHWY0dpVU6OFH2vQnUh8ULBpISJ4gqdDDjELf/LXpT6Q+qDZUgwg9oChEyGFGo2/+hXIZcVR4UKT2KVg0gDSTxKOHGeXmFEYPM4p77Cs653HenBn07j5E5/zTYqvZ5gpZyqtltiITT8GiAVR6z0OucnIKaY5y0nxsEYmnnEUDSDNJPGtaGysu7shrW9HVERuEQo9hDZHmY4tIcQoWDSDNJHH/wCDf+tnuvLZvbdld9AzukNVQIdJ8bBEpTtNQDSKtJPGOvYcZyuQntIcyzo69h7n03LNOuj7NUY6W2YpUj0YWDaRQZdjxevXocFB7mqMcLbMVqR6NLKSoGe2F/xeJa4d0l8Jqma1IdShYTKBaWvKZtC+hx7COCq1qG/K30fneIhNPwWKC1NKSz5C+vHY8Q1uzMZiTt2hrtthjWNPuj4hUh3IWE6CWlnyG9qVjZjvWlH/+lDVZyaRy0tIjtfS3EZF4ChYToJaWfIb2pZykcsj5FOX8bdKogSUixWkaagLU0pLP0MKAEJZUDi09Evq30ZSVSHUkGlmY2eokbVJYrS35DCkMOCrpstw0Ry6ashKpnqQji+uAvxrT9qECbRKjVpZ8llMYEJKvVipnFJX0b5NmDSwRKa5osDCzPwI+ALzRzDbk/Gg60J9mxxpRLSz5TLswYDklykfvV+qaWprOE5lsSo0s/g+wDzgDuCen/QiwPa1OSXpC38zLKX+e1iiq3EAkIuNXNFi4+0vAS8AloQ9sZlOAHwNt0e95xN0/Y2afBf4UOBBd+il3fyy6z23A9UAG+Ki7/zBqvxi4H2gHHgNWe6MeHj4BQt7My536SWsUVc5ZGSIyfolyFmb2PuAu4CzAon/u7jOK3G0QuMzdB8ysFXjazB6PfvZFd//CmN9xHnANsBiYCzxpZue6ewa4F1gFPEM2WFwOPI6ULembea1N/Wg1lEh1JN1ncTdwhbuf6u4z3H16iUCBZw1E37ZG/4qNBpYDD7n7oLu/AOwElpjZHGCGu2+KRhPrgCsT9lvGqZZWcmk1lEj1JF0Ntd/dnw99cDNrBrYCi4D/4e6bzew/Ah8xs5VAD3Czux8E5pEdOYzqi9qGottj2wv9vlVkRyAsWLAgtLsSYyJWciVZbdV38Cg+MmbZ74hrNZTIBCi1Gup90c0eM/s28D2y00sAuPt3it0/mkLqNLPTgO+a2flkp5Q+R3aU8TmyifM/Jju1ddJDFGkv9PvWAmsBurq6lNOoop37jyTOKySdWpp6SnNejSqAwYwz9ZTmivZdRE5WamTxhzm3fwP8Xs73DhQNFq9f6H7IzH4EXJ6bqzCzrwDfj77tA+bn3K0D2Bu1dxRolwkSmif49PeeZd0zu17/fuUlC7h9+QUFrw1ZbfXa8QytzZZ3GFNrhYsaikhhpVZD/ZdyH9jMzgSGokDRDvwucJeZzXH3fdFl7wV+Ed3eADxoZmvIJrjPAba4e8bMjpjZUmAzsBL4crn9kjChS2d37j+SFygA1m3axcqlCwuOMEJWW009pbngqX0aWYikL+lqqC8VaD4M9Lj7+pi7zQEeiPIWTUC3u3/fzL5uZp1kRyYvAh8GcPcdZtYNPAcMAzdG01gAN3Bi6ezjaCXUhIkr0xGXJ+jdfajg9b27DxUMFiGrrSaiXLqIFJY0wT0F+G3g4ej79wM7gOvN7J3u/rGxd3D37cCbC7R/MO6XuPsdwB0F2nuA8xP2VSpo6inNHBvKfzM/NjQS+2m+c/5pQe0hG+1eL5eeEyySlEuvJbV0AJZIiKTBYhHZPRPDAGZ2L/APwH8Ank2pb1IDQj/NL5o9nZWXLGDdpvycRbEkd9LVVvW+g1t7RKSeJQ0W84CpZKeeiG7PjfIJWuTewDpmtpMZs1k+41700/ztyy9g5dKFqeyyrpWCjKHKKZsiUkuSBou7gd5oRZMBlwL/3cymAk+m1DepEWZG7mrl7PfFLZo9PXGQCP3Eneb53mlRxVypd4mChbt/1cweA5aQDRafcvfR5aufSKtzUn19B48ypaWZocyJkuZTWpor9iZXzifukDf/9b17uOWRbTRbExkf4fNXXVSVqZ9aK5siEqpouQ8z++3o678ju7ppN7AL+K2oTRpcuW9ySY8+DT0sKeTI1v6BQW7u7mVw2PnNUIbBYeem7t6qlAeppbIpIuUoNbK4iWz5jHsK/MyByyreI6kp5SSVQ6aVQoJR6Chkx95XGc5/aIZHsu2XnntmqadecfWabxGB0pvyVkVf3zkx3ZFalOYZ3CHBKHTe/9Wjxwv2Ma5dROIl3ZT3BrKjjAXuvsrMzgH+rbt/v8RdpUEkTSqXk8hNGoxCp8RmtLcGtY9KKyGupbNSz5KWKP874Djw1uj7PuC/pdIjqWvl5jhmTWvjovmnlTxQKWTef/HcU2kas3CrybLtcUJyIiFUXl3qXdKls29y9/8UncmNux+1JOsnZdJJe+Nc6Lx/c5MxkrOhsHls9MiR5l4ILZ2Vepc0WByPigE6gJm9iZxS5VIb0txPEPLYaSdyQ6bEQpb9pvmGrqWzUu+SBovPAD8A5pvZN4FlwIfS6pSES3M+vJzHTusMbkgeuELfoNN8Q6/3UiUi5l76jCAz+zrZGlBHgV8Dm939lZT7Ni5dXV3e09NT7W5MiP6BQZbdtTGv4N+U1iZ+eutl434zSvOxyxG6yW5D756T3qAreX2oWthNLlKMmW11966x7UlHFn8HvI1s4cB/Q7b0x4/d/a8q2EcpU5rTJ7U01z66yS67dyJbyPCm7t6iOYXQKbErOudx3pwZqdS1gnRHXCJpSlruY6OZ/W/gLcA7gf8KLAYULGpAOdMnaU3lpKncTXYhb9DVWt6qEYfUuqT7LJ4iW2l2E/AT4C3u/nKaHZPkQufDQ94Qa2uuPW7KtDLHrVerMqz2X0g9SDoNtR24mOwBRIeBQ2a2yd0LF/CRCZd0uqWcN8RaKVOxeO6pBc/gLrZvIkQ1ptxUulzqRaJNee7+Z+5+Kdkzs/vJ5jAOpdgvKUOSjW2hhftCHjtts6a1cc/VF9HW0sQbTmmmraWJe66+qGJ9Knc6L0nBxDjlvh4iEy3pNNRHgLeTHV28BHyN7HSU1JlaykGUI80EdJrTeXHq/fWQySPpNFQ7sAbYOnq0ailmNgX4MdAW/Z5H3P0zZnY68G1gIfAisMLdD0b3uQ24nuxSl4+6+w+j9ouB+6N+PAas9iRrfuUktZWDCJf2/H6a03mFVPr1UKJc0pJ0NdTny3jsQbLndg+YWSvwtJk9DrwPeMrd7zSzTwKfBG41s/OAa8iuspoLPGlm57p7BriXbKn0Z8gGi8uBx8vok1A7OYhQEzW/n2T1VCXzG5V6PZQolzQlLSQYzLMGom9bo38OLAceiNofAK6Mbi8HHnL3QXd/AdgJLDGzOcAMd98UjSbW5dxHylQLOYhQfQeP4iP5A0of8arM73fMbOfYcCav7dhwpuzpo/G+HipUKGlLLVgAmFmzmfUCLwNPuPtmYLa77wOIvp4VXT6P7El8o/qitnnR7bHthX7fKjPrMbOeAwcOVPS5SPVNPaWZwUx+sBjMOFNPaa5Kf8bOhCaZGQ1NiKd14qBIqKQ5i7JEU0idZnYa8F0zO7/I5YXKgXqR9kK/by2wFrLlPsJ6K7Vu7+HCb3x7Dx+t+E7rUvoOHqW9tYUjgydSeO2tLUWnoUKnidI6cVCkHKmOLEa5+yHgR2RzDfujqSWir6Ob+/qA+Tl36wD2Ru0dBdplknn1aOG1FXHt5UryaT70zTl0mij0ep3xLWlLbWRhZmcCQ+5+KCpv/rvAXcAG4Drgzujr+uguG4AHzWwN2QT3OcAWd8+Y2REzWwpsBlYCX06r39J4QlYIJf00H7qKKTQhnuaJgyLlSHMaag7wgJk1kx3BdLv7981sE9BtZtcDu4CrAdx9h5l1A88Bw8CN0TQWwA2cWDr7OFoJNSmVc0xqyFRO6GqrkD0f5ZRLLyeBrkKFkpbUgoW7bwfeXKC9H3hXzH3uAO4o0N5DttSITGKL586gpYm8YoItTdn2QkLf/EM/zaddY6ucBLpIWlJNcItU0qxpbaxZ0cknxpxnUWzqJ6690H1CPv2nXWOrnAS6SJoULKSuhLzhTj2lOe/QJoBjQyOxS21DPv2nXXRQq5uk1ihYSN1JOi//2vEMbc2Wtzejrdl47Xgm9j5J8xDlvJnXb2l4EQULqUMhBzdZk0FOsLAmq8gbeuibeT2XhhcBBQupM9kzuLfT3GRkRpzPX1W5T+flrIYKyUGUM22l1U1SKxQspG70Dwzy8Ye35R1+dPPD2yqaVA59Q0/6Zq4chNS7CdnBLVIJO/YezgsUAEMZZ8few0Xvl7RIX5pv6NphLfVOIwupI4XKhBVrD5N2Ulk5CKlnChZSN0I35ZUj7Tf00BxE6GFGOvxI0qJgIXUjdFPeeH5PLbzRplmlViSUNWoJga6uLu/p6al2NyQFk+HTc//AIMvu2pi3qXBKaxM/vfWygs859HqAnfuPpHKWudQ3M9vq7l1j2zWykLpTK5/8Ib3AVU6V2rjHKXT9p7/3LOue2fX69ysvWcDtyy+oQM+lUSlYSMNL6w09zWmf0KqzIaVNdu4/khcoANZt2sXKpQs1wpBYChbS0NJ6Qy9nR3aokKqzIaVNencfKvgYvbsPKVhILO2zkIYVetpciLTPvB6tOptrtOpsIa+XNskRV9qkc/5pBR8jrl0EFCykgZX7hp70WNVyDidKqmNmO0eH8o+LPTo0HPv4IZv+Fs2ezspLFuS1rbxkgUYVUpSmoaRhpV0ZNu3DicwM8DHfxwvZI3Lx2afz7Z/1Mfobus4+vTKdloalkYU0rNASGyHTVqHTRKH6Dh5lSkt+cnpKS3NFHn/0eQ4Oj3BseITB4cpNz0nj0shCGlpahQTTLgyY5qgodJmtCKQ4sjCz+Wb2j2b2vJntMLPVUftnzWyPmfVG/96dc5/bzGynmf3SzH4/p/1iM3s2+tmXrNR4XCRHGoUEyy0MmCQfkvv4bS1NvOGUZtpaKjcqCj1BUATSHVkMAze7+8/NbDqw1cyeiH72RXf/Qu7FZnYecA2wGJgLPGlm57p7BrgXWAU8AzwGXA48nmLfZRIKLSQYWkcqdBmvj/7X83MXhYSMiso5QVAktWDh7vuAfdHtI2b2PFBsgfty4CF3HwReMLOdwBIzexGY4e6bAMxsHXAlChaSgrQKCYbuyziRV3AgU/L6kNVZ5ZwgmLbJUMKl3k1IzsLMFgJvBjYDy4CPmNlKoIfs6OMg2UDyTM7d+qK2oej22PZCv2cV2REICxYsKHSJSElJy4mEjBTKKd8RehBT0tVZ5ZZir8ed8FI5qQcLM5sGPAp8zN1fNbN7gc+RHVd/DrgH+GMKH0rgRdpPbnRfC6yFbCHB8fdepLDQkUJowjr0+tHVWUcGT+zNGF2dNd4jYaG+d8JLZaS6dNbMWskGim+6+3cA3H2/u2fcfQT4CrAkurwPmJ9z9w5gb9TeUaBdJBVJktChG/5CE9ahCfRyNgkmTfzX8054qZzURhbRiqWvAs+7+5qc9jlRPgPgvcAvotsbgAfNbA3ZBPc5wBZ3z5jZETNbSnYaayXw5bT6LZNb0k/Q5SxtdcB9hEymCWck9rpRoZ/+09okWM6UWFI6m7x+pDmyWAZ8ELhszDLZu6NlsNuBdwJ/BuDuO4Bu4DngB8CN0UoogBuA+4CdwL+g5LakIOQT9Kxpbazo6shrW9HVUXRp683dvRzPwGBmhOMZuKm7N9ES2iSf/svZJJh0GW9o6ZEQOpu8fqS5GuppCucbHitynzuAOwq09wDnV653IicL+QTdPzBId09fXlt3Tx+r33VuwTe6HXtfzTsOFrLHw+7Y+yqXnnvmuPseOg0VmoMILT0SQmeT1weV+xCJhEyJhM61v3r0eFB7OZJOQ4XmINIsPTIq6QhKqkfBQhpe6K7pJFMioXPtM9pbg9pDhUxDhQY65RUEVBtKGtz63j3c8sg2mq2JjI/w+asuKjrdknRKJHSvwuK5p9LabAzlbIRrbTYWzz11fE8wEvKGHvrmP/pcP/HIdpqbjMyIK68wCWlkIQ1rNKk8OOz8ZijD4LBXNKl8Rec8fnrrZXzjT36Hn956WdEgNGtaG/dcfVHe0tl7rr6oYm+4IaOicpLKJ0qPnPiukpKO/qR6NLKQhlVuUjlkp3LS3d6QfiL3is55nDdnBr27D9E5/7SihxmF9CW09Eio7OjvxKjl81dpB3ctUrCQBhb3CTj+k3HapSdCgguEBa7QviftS5olzfsHBvn4w9vypudufnibdnDXIE1DScMazRPkKpYnSHOncjnW9+5h2V0bufa+zSy7ayMbevfEXltO35NO/aRZ0nzH3sN5gQJgKOPs2Ht43I8tlaVgIQ0rNE9QTumJtObay1neGtL39b17uOQvnmTF/9zEJX/xZNFA9M//eiSoPUzcfo3i+ziU45h4moaShhYyNx+6SijNKavQqZ+QvvcPDPKxh3qjybjsfVY/1Bs79fNS/2sF+xLXnvt7Sv3dF8+dQUsTebmllqZsexxVqa0OjSyk4SVd3RSySijtKavQqZ+Qvm/6l/6TsjYetRdy9qypQe2QfApt1rQ21qzopK3FeENrM20txpoVnRU5EVAqSyMLkRxJRyJpFteD8k6zi1sNNfYT/isDxwreP679kjfNCmoPLTsesoor7b+7xFOwEBkjySqhtHc1l3OaXaHpGYeT2t626Ezg+ZPun20vrNCGwjihb+gh00ppFjWU4jQNJXWnFpKbaVdLDX38QtMzn3hkG7c8cvKUzcypp7DykvyTJFdesiD2E31obajQ/EnotNLYIoaVLGoo8TSykLpSS8nNidhkl/TxC32ab7amkxYVjX7Cv/js03loyy6M7NkaXWefHvvY5ZQHecvZM/nJzhM5kLecPTP2+NjhTP5jD2dGih43O6WlmaHMidHFaODSNFS6NLKQulGLyc20q6UmffxC0zODwxkyI2P2MIxkk+S3Pro972yNYn/H0FHOzv1H8gIFwE929rNz/8lLbYeGMwV32Q8NF87NqKhh9ShYSN3QEZzFjZ2OaWoyPv2H5530Jv/a8Uzw3zGkDlbv7kOJ21/s/03Ba+Pay53+q4Wpy3qnaSipG/pUGS9ueub8uafy01svy5vK6h8YLOvvmLQ8yMJZb0jc3jn/tILXxrVD+PRfaOVhKUwjC6kbOoIzXrFAOnYqa/TvmLuzvZJ/x9aW5oJlVlpbTt4jsmj29KBke6hyKw/LyTSykLqiIzgLCz1f40TJ8fzjUiuhY2Y7zU35S22biyz7vX35BaxcujDRPgsIW+SQ9nG2k0lqwcLM5gPrgN8iW1Ngrbv/lZmdDnwbWAi8CKxw94PRfW4DridbB/mj7v7DqP1i4H6gnewZ3qs97sxIaXihlVsni6SBNO2S46GBC7IjjCSjidANf+VUHg6p9FvO9fUqzZHFMHCzu//czKYDW83sCeBDwFPufqeZfRL4JHCrmZ0HXAMsBuYCT5rZue6eAe4FVgHPkA0WlwOPp9h3kbqUJJBOxC7otEaAoX0PPaEwdGl2LS3lTltqOQt33+fuP49uHyG7ZXQesBx4ILrsAeDK6PZy4CF3H3T3F4CdwBIzmwPMcPdN0WhiXc59RCRQuQsFQlcUpbGsuJw9H0krD4cuza7FpdxpmpCchZktBN4MbAZmu/s+yAYUMzsrumwe2ZHDqL6obSi6Pba90O9ZRXYEwoIFCwpdIjLplTNNNHqa3WiGo1qn2ZXT97TqfU22OlWpBwszmwY8CnzM3V8tsjW/0A+8SPvJje5rgbUAXV1dymmIxAg9VvWm7m15G/z+rLt6p9ld0TmPuadO4ce/eoVLzzmDrjcWLmiYK2m9r2NjNgMeG87Ejlom21LuVJfOmlkr2UDxTXf/TtS8P5paIvr6ctTeB8zPuXsHsDdq7yjQLiLjkHSaaNO/9J+0Ezwz4rElzdP26e89y1V/+wxf2riTq/72GT69/tmKPfbYdTPF1tFMtqXcaa6GMuCrwPPuvibnRxuA64A7o6/rc9ofNLM1ZBPc5wBb3D1jZkfMbCnZaayVwJfT6reI5HslZg4+rj1NO/cfYd0zu/La1m3axcqlC8e9N6Pv4FHaW1s4MnhiY2N7a0vRaaXJtJQ7zWmoZcAHgWfNrDdq+xTZINFtZtcDu4CrAdx9h5l1A8+RXUl1Y7QSCuAGTiydfRythBKZMG9bdEZQe5qKlRIZb7Aod1ppsizlTi1YuPvTxB+k+66Y+9wB3FGgvQc4v3K9E5GkRndZr9t04hN9JXdZhyinPEhS5STPJxPt4BaRkm5ffgFXXDg3KKmchkWzp/P2RbPyqtq+fdGsigWucqaVtClPRCSSu/ls7U9+nWjzWRpvov0Dg/zspYN5bT976SD9A4MVPXgq6WNNpk15ChYiUlR4iY303kRraW9DOX+XeqaqsyJSVOg5ImnubK6lvQ3lnq9Sr2drKFiISFGhb9BpHlI1EXsbkr6ZlxO41vfuYdldG7n2vs0su2sjG3r3VKTPE0HTUCJSVOgqobQ//ae5tyFk+iz071Lv01YKFiJSUsgb9EQsQU1jb0M5b+Yhf5dayreUQ8FCRBIJeYOux53N5b6ZJ/271FK+pRzKWYhIKtIoUZ6mtN/M672WlEYWIiJMzPRZPY64RilYiIhEJuLNvF5rSWkaSkREStLIQkQkMpnKd4TSyEJEhMY4UzvN3eEaWYiIUP/7INIeFWlkISJCfe+DmIhRkYKFiAj1vQ8izXpcozQNJSISqdd9EBMxKtLIQkQkR73tPIeJGRWlNrIws68B7wFedvfzo7bPAn8KHIgu+5S7Pxb97DbgeiADfNTdfxi1XwzcD7QDjwGr3d3T6reISD1Ke1SU5sjifuDyAu1fdPfO6N9ooDgPuAZYHN3nb8ysObr+XmAVcE70r9BjiohMemmOilILFu7+Y+D/Jbx8OfCQuw+6+wvATmCJmc0BZrj7pmg0sQ64MpUOi4hIrGrkLD5iZtvN7GtmNjNqmwfszrmmL2qbF90e216Qma0ysx4z6zlw4EDcZSIiEmiig8W9wJuATmAfcE/UbgWu9SLtBbn7WnfvcveuM888c5xdFRGRURMaLNx9v7tn3H0E+AqwJPpRHzA/59IOYG/U3lGgXUREJtCEBosoBzHqvcAvotsbgGvMrM3M3kg2kb3F3fcBR8xsqZkZsBJYP5F9FhGRdJfOfgt4B3CGmfUBnwHeYWadZKeSXgQ+DODuO8ysG3gOGAZudPdM9FA3cGLp7OPRv5K2bt36ipm9VKGnU8oZwCsT9LuqabI8T5g8z3WyPE+YPM91vM/z7EKNpi0L42dmPe7eVe1+pG2yPE+YPM91sjxPmDzPNa3nqR3cIiJSkoKFiIiUpGBRGWur3YEJMlmeJ0ye5zpZnidMnueayvNUzkJERErSyEJEREpSsBARkZIULAJE9axeNrNf5LSdbmZPmNmvoq8ziz1GvYh5rp81sz1m1hv9e3c1+1gJZjbfzP7RzJ43sx1mtjpqb7jXtchzbajX1cymmNkWM9sWPc8/j9ob8TWNe64Vf02VswhgZpcCA8C6nDM67gb+n7vfaWafBGa6+63V7GclxDzXzwID7v6FavatkqKqAnPc/edmNh3YSray8YdosNe1yHNdQQO9rlG1h6nuPmBmrcDTwGrgfTTeaxr3XC+nwq+pRhYBYsquLwceiG4/QIOUUA8sMV+33H2fu/88un0EeJ5sZeOGe12LPNeG4lkD0bet0T+nMV/TuOdacQoW4zc7qmFF9PWsKvcnbYVKzDcEM1sIvBnYTIO/rmOeKzTY62pmzWbWC7wMPOHuDfuaxjxXqPBrqmAhIeJKzNc9M5sGPAp8zN1frXZ/0lTguTbc6xpVt+4kW6l6iZmdX+UupSbmuVb8NVWwGL/9o9V0o68vV7k/qSlSYr6uRXO9jwLfdPfvRM0N+boWeq6N+roCuPsh4Edk5/Ab8jUdlftc03hNFSzGbwNwXXT7Ohq4hHqREvN1K0oQfhV43t3X5Pyo4V7XuOfaaK+rmZ1pZqdFt9uB3wX+mcZ8TQs+1zReU62GCpBbdh3YT7bs+veAbmABsAu42t3rPjEc81zfQXZY+3qJ+dE54HplZm8DfgI8C4xEzZ8iO5ffUK9rkef6RzTQ62pmF5JNYDeT/UDc7e63m9ksGu81jXuuX6fCr6mChYiIlKRpKBERKUnBQkRESlKwEBGRkhQsRESkJAULEREpScFCRERKUrAQEZGSWqrdAZFGERXn+wHZMtFLgW3A3wF/TrZo3X8G3k22Zs88YD5wt7t/xcyagL8G/j3wAtkPcl9z90cm+GmIFKRgIVJZi4CrgVXAz4APAG8DriC7W7oXuJBsMJkK/JOZ/T3wVmAhcAHZwPI88LWJ7bpIPE1DiVTWC+7+bFTAbQfwlGfLJDxLNhgArHf3o+7+CvCPZIu8vQ142N1H3P1fo3aRmqFgIVJZgzm3R3K+H+HESH5sjR0HLOV+iYyLgoXIxFsenZ08i2xxxp+RzXO838yazGx21C5SM5SzEJl4W4C/J1v99HPuvtfMHgXeRbaU9P8lW/X2cPW6KJJPVWdFJpCZfRYYcPcvFPjZNHcfiEYcW4BlUf5CpOo0shCpHd+PDrI5heyIQ4FCaoZGFiIiUpIS3CIiUpKChYiIlKRgISIiJSlYiIhISQoWIiJS0v8HXD+c/sWPZXUAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = df.plot(x='mpg', y='weight', kind='scatter')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, we can customize our subplot with the `ax` object it returned, as usual." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEICAYAAABYoZ8gAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8rg+JYAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWwElEQVR4nO3dfbRddX3n8feHiAWEFpALRiBGLdVaR4O9Yru0UxS0CKPATFFpq8GlBp06lVVnxsg4Q+yq00zHx1GXGivLCOqIgyD1YWxkRIdWwYARcYILq1EgmSQ8DcQ6UOA7f+x9p6fJvbknyd3n5N79fq111t2PZ39/bPK5+/72U6oKSVJ/HDDuAiRJo2XwS1LPGPyS1DMGvyT1jMEvST1j8EtSzxj82u8luSbJa8ddx/4gyflJ3juG7R6TZGOSXxj1tjX3DH7NqSSbkvw8yY6Bz+PHXddCkOTRwNuA/zzqbVfVVuBrwIpRb1tzz+BXF15SVYcOfDaPu6AF4kzglqq6Y0zb/yRw/pi2rTlk8Gsk2r8ETh0YX5Xk0oHx30jyN0nuTfLdJCcP+b0HJFmZ5G+T3JXksiRHDsx/3sD33pbkvHb6LyX5RJLtSX6S5G1JDmjnnZfk2iTvTHJPkh8nefHAdz4+yVVJ7k7ywySv26ldn01yaZL7k3wvya8keWuSbW0NL2qXPSfJDTu1581JrpyhuS8Gvj6w7NIkleTV7ffek+T1SZ6d5Ka2zR8YWP68JH+d5P1J/k+SW5KcMjD/iUm+0db91SQfHNxHwHXAk5I8YZh9o/2Xwa+xS3Is8EXgT4EjgX8NXJ5kYojV/wg4C/ht4PHAPcAH2+9dAnwZeD8wASwDNrTrvR/4JeBJ7bqvAl498L3PAX4AHAX8OfCxJGnnfRq4vd3e7wL/cTBAgZcAlwBHAN8BvkLzb+1Y4E+Aj7TLXQU8McmvDqz7B+260/knbU07ew5wAvBy4L3AvwNOBX4NeFmS395p2R+17boI+NzAL8pPAdcDjwVWAa8c3EhVPQT8EHjmDPVpvqgqP37m7ANsAnYA97afKwemnzqw3Crg0nb4LcAlO33PV4Dl7fA1wGtn2N5G4JSB8cXA3wOPAt4KXDHNOouAB4CnDUw7H7imHT4P+OHAvEOAAh4HHA88DBw2MP/PgI8PtGvdwLyXtP89FrXjh7XfdXg7/iHgHe3wr9H84vqFGdp6K3DawPjS9ruOHZh2F/DygfHLgQsG2rUZyMD862kCfgnwEHDIwLxLp/bRwLS/Bl417v/P/OzbxyN+deGsqjq8/Zw1xPJPAM5puybuTXIv8DyaEB9m3SsG1ttIE8zH0IT0306zzlHAo4GfDEz7Cc0R+ZT/PTVQVX/XDh5Kc5R/d1Xdv5t1tw4M/xy4s6oeHhif+i6AtcDvtX9NvBK4rKoemKGt99D84tjZztvbefzQgfE7qk3wgdofP9CuvxuYd9s02zqM5he65jGDX6PyM5oj5ymPGxi+jeaI//CBz2OqavUQ33sb8OKd1j2omhOgtwFPnmadO2n+Khjsq14CDHPSdDNwZJLBAB523V1U1beAB4HfAn6Pmbt5AG4CfmVvtjPg2IEuK2hq3wxsoWnX4D46fnDFJI8Cfhn47j7WoDEz+DUqG4BXJDkwySRN3/iUS4GXJPmdJIuSHJTk5CTHDfG9HwbeMXXCMclEkjPbeZ8ETk3ysiSPSvLYJMvao+/L2vUOa9f947aO3aqq24C/Af6srfMZwGvabe2tTwAfAB6qqmt3s9yXaM5H7IujgT9q98M5wK8CX6qqnwDrgVVJHp3kN2m6qQadBGxql9U8ZvBrVP49zdH3PcDbaU4kAv8/TM8ELgS20xyp/xuG+//zfTQnSf8qyf3At2hOYFJVPwVOB94M3E3zy2fqxOS/ovkr5EfAtW09Fw/ZlnNp+tc3A1cAF1XVuiHXnc4lwNPZ/dE+wF8CT93H+yKuozkRfCfwDuB3q+qudt7vA79Jc57gT4HP0JwLYWD+h/dh29pP5B9390katSQHA9uAZ1XVrbMsu4LmpPQFe7Gd82hOkj9vyOU/Q3PfwEVJjqa5lPTEqvq/e7pt7V8eNe4CJPEG4NuzhT5AVa3pqogkz6b5y+jHwIto/gpb3W53G023kBYAg18aoySbgNDcizBujwM+R3Md/+3AG6rqO+MtSV2wq0eSesaTu5LUM/Oiq+eoo46qpUuXjrsMSZpXbrjhhjurapdHn8yL4F+6dCnr168fdxmSNK8kmfaeC7t6JKlnDH5J6hmDX5J6xuCXpJ4x+CWpZwx+SeoZg1+Sesbgl6SeMfglqWfmxZ2789XSlV8cy3Y3rT5jLNuVND94xC9JPdN58LfvUP1Oki+040cmWZfk1vbnEV3XIEn6B6M44n8TsHFgfCVwdVWdAFzdjkuSRqTT4E9yHHAG8BcDk88E1rbDa9k/3jwkSb3R9RH/e4F/CzwyMO2YqtoC0P48eroVk6xIsj7J+u3bt3dcpiT1R2fBn+SfAduq6oa9Wb+q1lTVZFVNTkzs8h4BSdJe6vJyzucCL01yOnAQ8ItJLgW2JllcVVuSLAa2dViDJGknnR3xV9Vbq+q4qloKvAL4H1X1B8BVwPJ2seXA57uqQZK0q3Fcx78aeGGSW4EXtuOSpBEZyZ27VXUNcE07fBdwyii2K0nalXfuSlLPGPyS1DMGvyT1jMEvST1j8EtSzxj8ktQzBr8k9YzBL0k9Y/BLUs8Y/JLUMwa/JPWMwS9JPWPwS1LPGPyS1DMGvyT1jMEvST3T5cvWD0pyfZLvJvl+kre301cluSPJhvZzelc1SJJ21eUbuB4AXlBVO5IcCFyb5MvtvPdU1Ts73LYkaQadBX9VFbCjHT2w/VRX25MkDafTPv4ki5JsALYB66rqunbWG5PclOTiJEfMsO6KJOuTrN++fXuXZUpSr3Qa/FX1cFUtA44DTkrydOBDwJOBZcAW4F0zrLumqiaranJiYqLLMiWpV0ZyVU9V3QtcA5xWVVvbXwiPAB8FThpFDZKkRpdX9UwkObwdPhg4FbglyeKBxc4Gbu6qBknSrrq8qmcxsDbJIppfMJdV1ReSXJJkGc2J3k3A+R3WIEnaSZdX9dwEnDjN9Fd2tU1J0uy8c1eSesbgl6SeMfglqWcMfknqGYNfknrG4JeknjH4JalnDH5J6hmDX5J6xuCXpJ4x+CWpZwx+SeoZg1+Sesbgl6SeMfglqWcMfknqmS5fvXhQkuuTfDfJ95O8vZ1+ZJJ1SW5tfx7RVQ2SpF11ecT/APCCqnomsAw4LclvACuBq6vqBODqdlySNCKdBX81drSjB7afAs4E1rbT1wJndVWDJGlXXb5snfZF6zcAvwx8sKquS3JMVW0BqKotSY6eYd0VwAqAJUuWdFmm5sjSlV8c27Y3rT5jbNuW5ptOT+5W1cNVtQw4DjgpydP3YN01VTVZVZMTExOd1ShJfTOSq3qq6l7gGuA0YGuSxQDtz22jqEGS1Ojyqp6JJIe3wwcDpwK3AFcBy9vFlgOf76oGSdKuuuzjXwysbfv5DwAuq6ovJPkmcFmS1wA/Bc7psAZJ0k46C/6qugk4cZrpdwGndLVdSdLueeeuJPWMwS9JPWPwS1LPGPyS1DOd3rmr8RjnHbR9493Kmo884peknjH4JalnDH5J6hmDX5J6xpO70jw1rhPLnlSe/zzil6SeMfglqWcMfknqGYNfknpmqODfk1cmSpL2b8Me8X84yfVJ/uXUW7UkSfPTUMFfVc8Dfh84Hlif5FNJXri7dZIcn+RrSTYm+X6SN7XTVyW5I8mG9nP6PrdCkjS0oa/jr6pbk7wNWA/8F+DEJAEurKrPTbPKQ8Cbq+rGJIcBNyRZ1857T1W9c1+LlyTtuaGCP8kzgFcDZwDrgJe0gf544JvALsFfVVuALe3w/Uk2AsfOVeGSpL0zbB//B4AbgWdW1R9W1Y0AVbUZeNtsKydZSvP+3evaSW9MclOSi5McsedlS5L21rDBfzrwqar6OUCSA5IcAlBVl+xuxSSHApcDF1TVfcCHgCcDy2j+InjXDOutSLI+yfrt27cPWaYkaTbDBv9XgYMHxg9pp+1WkgNpQv+TU+cBqmprVT1cVY8AHwVOmm7dqlpTVZNVNTkxMTFkmZKk2Qwb/AdV1Y6pkXb4kN2t0J74/RiwsarePTB98cBiZwM3D1+uJGlfDXtVz8+SPGuqbz/JrwM/n2Wd5wKvBL6XZEM77ULg3CTLgAI2AefvYc2SpH0wbPBfAHw2yeZ2fDHw8t2tUFXXAplm1peGrk6SNOeGCv6q+naSpwJPoQnzW6rq7zutTJLUiT15EcuzgaXtOicmoao+0UlVkqTODHsD1yU0l2BuAB5uJxdg8EvSPDPsEf8k8LSqqi6LkSR1b9jLOW8GHtdlIZKk0Rj2iP8o4H8luR54YGpiVb20k6okSZ0ZNvhXdVmEJGl0hr2c8+tJngCcUFVfbZ/Ts6jb0iRJXRj21YuvA/4b8JF20rHAlR3VJEnq0LAnd/+Q5hEM90HzUhbg6K6KkiR1Z9jgf6CqHpwaSfIomuv4JUnzzLDB//UkFwIHt+/a/Szwl92VJUnqyrDBvxLYDnyP5mmaX2KIN29JkvY/w17VM/XSlI92W44kqWvDPqvnx0zTp19VT5rziiRJndqTZ/VMOQg4Bzhy7suRJHVtqD7+qrpr4HNHVb0XeEG3pUmSujBsV8+zBkYPoPkL4LBZ1jme5rHNjwMeAdZU1fuSHAl8hubZ/puAl1XVPXtcuSRprwzb1fOugeGHaAN7lnUeAt5cVTcmOQy4Ick64Dzg6qpanWQlzRVDb9mjqiVJe23Yq3qev6dfXFVbgC3t8P1JNtI86uFM4OR2sbXANRj8kjQyw3b1/PHu5lfVu2dZfylwInAdcEz7S4Gq2pJk2kc/JFkBrABYsmTJMGVKkoYw7A1ck8AbaI7YjwVeDzyNpp9/tr7+Q4HLgQuq6r5hC6uqNVU1WVWTExMTw64mSZrFnryI5VlVdT9AklXAZ6vqtbtbKcmBNKH/yar6XDt5a5LF7dH+YmDb3pUuSdobwx7xLwEeHBh/kOaqnBklCfAxYONOXUFXAcvb4eXA54esQZI0B4Y94r8EuD7JFTR38J5Nc6nm7jwXeCXwvSQb2mkXAquBy5K8Bvgpzc1gkqQRGfaqnnck+TLwW+2kV1fVd2ZZ51ogM8w+ZfgSJUlzadiuHoBDgPuq6n3A7Ume2FFNkqQODfvqxYtorrV/azvpQODSroqSJHVn2CP+s4GXAj8DqKrNzHIZpyRp/zRs8D9YVUX7aOYkj+muJElSl4YN/suSfAQ4PMnrgK/iS1kkaV6a9aqe9nr8zwBPBe4DngL8h6pa13FtkqQOzBr8VVVJrqyqXwcMe0ma54bt6vlWkmd3WokkaSSGvXP3+cDrk2yiubInNH8MPKOrwiRJ3dht8CdZUlU/BV48onokSR2b7Yj/Spqncv4kyeVV9S9GUJMkqUOz9fEPPmvnSV0WIkkajdmCv2YYliTNU7N19TwzyX00R/4Ht8PwDyd3f7HT6iRJc263wV9Vi0ZViCRpNPbkscySpAWgs+BPcnGSbUluHpi2KskdSTa0n9O72r4kaXpdHvF/HDhtmunvqapl7edLHW5fkjSNzoK/qr4B3N3V90uS9s6wj2yYS29M8ipgPfDmqrpnuoWSrABWACxZsmSE5Wk+Wrryi+MuQZo3Rn1y90PAk4FlwBbgXTMtWFVrqmqyqiYnJiZGVJ4kLXwjDf6q2lpVD1fVIzQvcjlplNuXJI04+JMsHhg9G7h5pmUlSd3orI8/yaeBk4GjktwOXAScnGQZzeMfNgHnd7V9SdL0Ogv+qjp3mskf62p7kqTheOeuJPWMwS9JPWPwS1LPGPyS1DMGvyT1jMEvST1j8EtSzxj8ktQzBr8k9cw4HsssaR4b5yOwN60+Y2zbXkg84peknjH4JalnDH5J6hmDX5J6xuCXpJ4x+CWpZzoL/iQXJ9mW5OaBaUcmWZfk1vbnEV1tX5I0vS6P+D8OnLbTtJXA1VV1AnB1Oy5JGqHOgr+qvgHcvdPkM4G17fBa4Kyuti9Jmt6o+/iPqaotAO3Po2daMMmKJOuTrN++ffvICpSkhW6/PblbVWuqarKqJicmJsZdjiQtGKMO/q1JFgO0P7eNePuS1HujDv6rgOXt8HLg8yPeviT1XpeXc34a+CbwlCS3J3kNsBp4YZJbgRe245KkEersscxVde4Ms07papuSpNnttyd3JUndMPglqWcMfknqGYNfknrGd+5KmjfG9b7fhfauX4/4JalnDH5J6hmDX5J6xuCXpJ5Z8Cd3x3UySJL2Vx7xS1LPGPyS1DMGvyT1jMEvST1j8EtSzxj8ktQzY7mcM8km4H7gYeChqpocRx2S1EfjvI7/+VV15xi3L0m9ZFePJPXMuIK/gL9KckOSFWOqQZJ6aVxdPc+tqs1JjgbWJbmlqr4xuED7C2EFwJIlS8ZRoyQtSGM54q+qze3PbcAVwEnTLLOmqiaranJiYmLUJUrSgjXy4E/ymCSHTQ0DLwJuHnUdktRX4+jqOQa4IsnU9j9VVf99DHVIUi+NPPir6kfAM0e9XUlSw8s5JalnDH5J6hmDX5J6xuCXpJ4x+CWpZwx+SeoZg1+Sesbgl6SeMfglqWcMfknqmXG+gUuS5oWlK784tm1vWn3GnH+nR/yS1DMGvyT1jMEvST1j8EtSzxj8ktQzBr8k9cxYgj/JaUl+kOSHSVaOowZJ6qtxvGx9EfBB4MXA04Bzkzxt1HVIUl+N44j/JOCHVfWjqnoQ+K/AmWOoQ5J6aRx37h4L3DYwfjvwnJ0XSrICWNGO7kjygxHUBnAUcOeItjVOfWkn9KetfWkn9Kit+U/71NYnTDdxHMGfaabVLhOq1gBrui/nH0uyvqomR73dUetLO6E/be1LO8G27qtxdPXcDhw/MH4csHkMdUhSL40j+L8NnJDkiUkeDbwCuGoMdUhSL428q6eqHkryRuArwCLg4qr6/qjr2I2Rdy+NSV/aCf1pa1/aCbZ1n6Rql+51SdIC5p27ktQzBr8k9Uyvgz/JxUm2Jbl5YNqRSdYlubX9ecQ4a5wLM7RzVZI7kmxoP6ePs8a5kOT4JF9LsjHJ95O8qZ2+EPfpTG1dUPs1yUFJrk/y3badb2+nL8R9OlNb53yf9rqPP8k/BXYAn6iqp7fT/hy4u6pWt88ROqKq3jLOOvfVDO1cBeyoqneOs7a5lGQxsLiqbkxyGHADcBZwHgtvn87U1pexgPZrkgCPqaodSQ4ErgXeBPxzFt4+namtpzHH+7TXR/xV9Q3g7p0mnwmsbYfX0vxjmtdmaOeCU1VbqurGdvh+YCPNneILcZ/O1NYFpRo72tED20+xMPfpTG2dc70O/hkcU1VboPnHBRw95nq69MYkN7VdQfP+T+VBSZYCJwLXscD36U5thQW2X5MsSrIB2Aasq6oFu09naCvM8T41+PvrQ8CTgWXAFuBdY61mDiU5FLgcuKCq7ht3PV2apq0Lbr9W1cNVtYzmLv+Tkjx9zCV1Zoa2zvk+Nfh3tbXtP53qR9025no6UVVb2//JHgE+SvPU1Hmv7Ru9HPhkVX2unbwg9+l0bV2o+xWgqu4FrqHp816Q+3TKYFu72KcG/66uApa3w8uBz4+xls5M/aNpnQ3cPNOy80V7cuxjwMaqevfArAW3T2dq60Lbr0kmkhzeDh8MnArcwsLcp9O2tYt92verej4NnEzziNetwEXAlcBlwBLgp8A5VTWvT4zO0M6Taf50LGATcP5Un+l8leR5wP8Evgc80k6+kKbve6Ht05naei4LaL8meQbNydtFNAeql1XVnyR5LAtvn87U1kuY433a6+CXpD6yq0eSesbgl6SeMfglqWcMfknqGYNfknrG4JeknjH4Jaln/h93rpX/KuFTqwAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = df.mpg.plot(kind='hist')\n", "ax.set_title('Fuel economy (mpg)')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## [Assignment 2]\n", "\n", "In this assignment you will plot a Hertzprung-Russell (H-R) diagram which shows stellar magnitude over color and can tell us about the populations of stars we are observing (see image below).\n", "\n", "In the data we imported in assignment 1 (choose any import link / method you'd like), you will find DataFrame columns for G-band magnitude (`Gmag`), and color as magnitude of the blue part (`BPmag`), and magnitude of the red part (`RPmag`).\n", "\n", "On H-R diagrams (see image), we want absolute magnitude on the y-axis with the brightest stars at the top (**note**: low stellar magnitudes are brighter). We also want to plot color as `BPmag` - `RPmag` on the x-axis.\n", "\n", "In our DataFrame, we only have an apparent magnitude, so we will need to derive the absolute magnitude before we can make our plot.\n", "\n", "We can compute *Absolute magnitude* ($M$) if we know the *apparent magnitude* ($m$) and the *distance modulus* ($\\mu$) by:\n", "\n", "$$\\mu = m - M$$\n", "\n", "The *distance modulus* can be calculated from the distance to a star in *parsecs* (pc):\n", "\n", "$$\\mu = 5log_{10}\\left(\\frac{d_{pc}}{10 \\ pc}\\right)$$\n", "\n", "Finally, we can determine a star's distance in parsecs from the measured parallax in milliarcseconds (mas) by:\n", "\n", "$$d_{pc} = \\frac{1000}{parallax}$$\n", "\n", "Note parallax is `Plx` in our DataFrame. Now we know all we need to make an H-R diagram and identify some stars!\n", "\n", "![HR diagram](https://upload.wikimedia.org/wikipedia/commons/6/6b/HRDiagram.png)\n", "\n", "\n", "### 2a) Convert apparent magnitude to absolute magnitude\n", "\n", "Using the equations in the question, derive the absolute G magnitude ($M$) from the apparent G magnitude for each star in the DataFrame. Hint: you will need columns `Gmag`, `BPmag`, `RPmag`, and `Plx`.\n", "\n", "In each step, save the result as a new column in your DataFrame, for example:\n", "\n", "`df['distance_modulus'] = # calculation for distance modulus here`\n", "\n", "Your final DataFrame should have columns for `distance_modulus`, `distance_parsecs`, and `absolute_magnitude`.\n", "\n", "\n", "### 2b) Calculate the color\n", "\n", "With the data provided, we can find color using blue part (`BPmag`) - red part (`RPmag`). Save this `color` difference as a column of your DataFrame.\n", "\n", "\n", "### 2c) Plot the HR diagram\n", "\n", "Using pandas `DataFrame.plot(x='color', y='absolute_magnitude')`, plot the H-R diagram. Make sure to label your axes!\n", "\n", "### 2d) Identify and select the white dwarfs\n", "\n", "White dwarfs on the H-R diagram are outliers in the lower left. \n", "\n", "Using a **conditional** statement (or two) on absolute magnitude and color, select all white dwarfs and save them to their own DataFrame. Call `.describe()` on this new DataFrame to get summary statistics of the white dwarfs.\n", "\n", "\n", "### 2e) Identify and select the giants\n", "\n", "Giants on the H-R diagram are outliers in the top middle to right section of the diagram.\n", "\n", "Using a **conditional** statement (or two) on absolute magnitude and color, select all giants and save them to their own DataFrame. Call `.describe()` on this new DataFrame to get summary statistics of the giants.\n", "\n", "### 2f) Plot the white dwarfs and giants separately\n", "\n", "Using your white dwarf and supergiant DataFrames, plot each on the same axes as your H-R diagram with distinct markers and colors (hint: try changing the `markerstyle`, `color`, `markersize` options of `.plot()`).\n", "\n", "To plot all on the same axes, make a subplot and pass in ax to each `.plot` call, e.g.\n", "\n", "```Python\n", "fig, ax = plt.subplots()\n", "df.plot(ax=ax)\n", "df2.plot(ax=ax)\n", "...\n", "```\n", "\n", "### 2g) Write the white dwarfs and giants to CSVs\n", "\n", "Write your white dwarf and giants DataFrames to CSV files with `.to_csv()`, using the `index=False` option. Include these CSV files in the assignment 5 directory and in your commit / PR.\n", "\n", "\n", "### 2h) BONUS: Label the white dwarfs and giants\n", "\n", "Research the `ax.annotate()` method of matplotlib axes and label the white dwarf and supergiant points on your plot as \"white dwarf\" and \"supergiant\" with arrows pointing to each. As always, bonus points for any other plot aesthetics you add!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# [your code here]\n" ] } ], "metadata": { "interpreter": { "hash": "33800beb71e4bcd6bcba11240441ea568cc35964f729435babc1b5d8f2854e0f" }, "kernelspec": { "display_name": "Python 3.9.7 64-bit ('astropy': conda)", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }