mirror of
https://github.com/rasbt/python_reference.git
synced 2024-11-27 14:01:15 +00:00
3202 lines
92 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Back to the GitHub repository](https://github.com/rasbt/python_reference)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sebastian Raschka 28/01/2015 \n",
"\n",
"CPython 3.4.2\n",
"IPython 2.3.1\n",
"\n",
"pandas 0.15.2\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -a 'Sebastian Raschka' -v -d -p pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<font size=\"1.5em\">[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension.</font>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Things in Pandas I Wish I'd Known Earlier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a small but growing collection of pandas snippets that I have found particularly useful; consider it my personal notebook. Suggestions, tips, and contributions are very, very welcome!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Loading Some Example Data](#Loading-Some-Example-Data)\n",
"- [Renaming Columns](#Renaming-Columns)\n",
"    - [Converting Column Names to Lowercase](#Converting-Column-Names-to-Lowercase)\n",
"    - [Renaming Particular Columns](#Renaming-Particular-Columns)\n",
"- [Applying Computations Rows-wise](#Applying-Computations-Rows-wise)\n",
"    - [Changing Values in a Column](#Changing-Values-in-a-Column)\n",
"    - [Adding a New Column](#Adding-a-New-Column)\n",
"    - [Applying Functions to Multiple Columns](#Applying-Functions-to-Multiple-Columns)\n",
"- [Missing Values aka NaNs](#Missing-Values-aka-NaNs)\n",
"    - [Counting Rows with NaNs](#Counting-Rows-with-NaNs)\n",
"    - [Selecting NaN Rows](#Selecting-NaN-Rows)\n",
"    - [Selecting non-NaN Rows](#Selecting-non-NaN-Rows)\n",
"    - [Filling NaN Rows](#Filling-NaN-Rows)\n",
"- [Appending Rows to a DataFrame](#Appending-Rows-to-a-DataFrame)\n",
"- [Sorting and Reindexing DataFrames](#Sorting-and-Reindexing-DataFrames)\n",
"- [Updating Columns](#Updating-Columns)\n",
"- [Chaining Conditions - Using Bitwise Operators](#Chaining-Conditions---Using-Bitwise-Operators)\n",
"- [Column Types](#Column-Types)\n",
"    - [Printing Column Types](#Printing-Column-Types)\n",
"    - [Selecting by Column Type](#Selecting-by-Column-Type)\n",
"    - [Converting Column Types](#Converting-Column-Types)\n",
"- [If-tests](#If-tests)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading Some Example Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I am heavily into sports prediction (via a machine learning approach) these days, so let us use a (very) small subset of the soccer data I am currently working with."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PLAYER</th>\n",
" <th>SALARY</th>\n",
" <th>GP</th>\n",
" <th>G</th>\n",
" <th>A</th>\n",
" <th>SOT</th>\n",
" <th>PPG</th>\n",
" <th>P</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> Sergio Agüero\\n Forward — Manchester City</td>\n",
" <td> $19.2m</td>\n",
" <td> 16</td>\n",
" <td> 14</td>\n",
" <td> 3</td>\n",
" <td> 34</td>\n",
" <td> 13.12</td>\n",
" <td> 209.98</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> Eden Hazard\\n Midfield — Chelsea</td>\n",
" <td> $18.9m</td>\n",
" <td> 21</td>\n",
" <td> 8</td>\n",
" <td> 4</td>\n",
" <td> 17</td>\n",
" <td> 13.05</td>\n",
" <td> 274.04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> Alexis Sánchez\\n Forward — Arsenal</td>\n",
" <td> $17.6m</td>\n",
" <td>NaN</td>\n",
" <td> 12</td>\n",
" <td> 7</td>\n",
" <td> 29</td>\n",
" <td> 11.19</td>\n",
" <td> 223.86</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> Yaya Touré\\n Midfield — Manchester City</td>\n",
" <td> $16.6m</td>\n",
" <td> 18</td>\n",
" <td> 7</td>\n",
" <td> 1</td>\n",
" <td> 19</td>\n",
" <td> 10.99</td>\n",
" <td> 197.91</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> Ángel Di María\\n Midfield — Manchester United</td>\n",
" <td> $15.0m</td>\n",
" <td> 13</td>\n",
" <td> 3</td>\n",
" <td>NaN</td>\n",
" <td> 13</td>\n",
" <td> 10.17</td>\n",
" <td> 132.23</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> Santiago Cazorla\\n Midfield — Arsenal</td>\n",
" <td> $14.8m</td>\n",
" <td> 20</td>\n",
" <td> 4</td>\n",
" <td>NaN</td>\n",
" <td> 20</td>\n",
" <td> 9.97</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> David Silva\\n Midfield — Manchester City</td>\n",
" <td> $14.3m</td>\n",
" <td> 15</td>\n",
" <td> 6</td>\n",
" <td> 2</td>\n",
" <td> 11</td>\n",
" <td> 10.35</td>\n",
" <td> 155.26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas\\n Midfield — Chelsea</td>\n",
" <td> $14.0m</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino\\n Forward — West Brom</td>\n",
" <td> $13.8m</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard\\n Midfield — Liverpool</td>\n",
" <td> $13.8m</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PLAYER SALARY GP G A SOT \\\n",
"0 Sergio Agüero\\n Forward — Manchester City $19.2m 16 14 3 34 \n",
"1 Eden Hazard\\n Midfield — Chelsea $18.9m 21 8 4 17 \n",
"2 Alexis Sánchez\\n Forward — Arsenal $17.6m NaN 12 7 29 \n",
"3 Yaya Touré\\n Midfield — Manchester City $16.6m 18 7 1 19 \n",
"4 Ángel Di María\\n Midfield — Manchester United $15.0m 13 3 NaN 13 \n",
"5 Santiago Cazorla\\n Midfield — Arsenal $14.8m 20 4 NaN 20 \n",
"6 David Silva\\n Midfield — Manchester City $14.3m 15 6 2 11 \n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 10 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 20 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 11 \n",
"\n",
" PPG P \n",
"0 13.12 209.98 \n",
"1 13.05 274.04 \n",
"2 11.19 223.86 \n",
"3 10.99 197.91 \n",
"4 10.17 132.23 \n",
"5 9.97 NaN \n",
"6 10.35 155.26 \n",
"7 10.47 209.49 \n",
"8 7.02 147.43 \n",
"9 7.50 150.01 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/rasbt/python_reference/master/Data/some_soccer_data.csv')\n",
"df"
]
},
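If the URL above is unreachable, or you just want a quick offline experiment, `read_csv` works the same way on any file-like object. The sketch below assumes only the standard library's `io.StringIO`. The three rows are copied from the table above, and `csv_text`/`df_small` are illustrative names. The `PLAYER` field contains an embedded newline, so it has to be quoted in the CSV text.

```python
import io

import pandas as pd

# Three rows copied from the table above, written as CSV text.
# The PLAYER field contains a real newline, so it is wrapped in double quotes.
csv_text = (
    'PLAYER,SALARY,GP,G,A,SOT,PPG,P\n'
    '"Sergio Agüero\n Forward — Manchester City",$19.2m,16,14,3,34,13.12,209.98\n'
    '"Alexis Sánchez\n Forward — Arsenal",$17.6m,,12,7,29,11.19,223.86\n'
    '"Santiago Cazorla\n Midfield — Arsenal",$14.8m,20,4,,20,9.97,\n'
)

# read_csv accepts any file-like object, so io.StringIO stands in for the URL
df_small = pd.read_csv(io.StringIO(csv_text))
print(df_small)
print(df_small.dtypes)
```

Empty fields come back as `NaN`. That is why integer-looking columns such as `GP` and `A` are parsed as floats here.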
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Renaming Columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converting Column Names to Lowercase"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>gp</th>\n",
" <th>g</th>\n",
" <th>a</th>\n",
" <th>sot</th>\n",
" <th>ppg</th>\n",
" <th>p</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas\\n Midfield — Chelsea</td>\n",
" <td> $14.0m</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino\\n Forward — West Brom</td>\n",
" <td> $13.8m</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard\\n Midfield — Liverpool</td>\n",
" <td> $13.8m</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary gp g a sot ppg \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 10 10.47 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 20 7.02 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 11 7.50 \n",
"\n",
" p \n",
"7 209.49 \n",
"8 147.43 \n",
"9 150.01 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Converting column names to lowercase\n",
"\n",
"df.columns = [c.lower() for c in df.columns]\n",
"\n",
"# or, equivalently (rename returns a new DataFrame, so reassign the result):\n",
"# df = df.rename(columns=lambda x: x.lower())\n",
"\n",
"df.tail(3)"
]
},
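For comparison, here is a small sketch of three interchangeable ways to lowercase the column labels. It runs on a hypothetical empty frame `df_demo` that only carries the original upper-case headers. The `.str` accessor on the column `Index` is the vectorized option, and `rename` returns a new DataFrame rather than modifying the original in place.

```python
import pandas as pd

# Hypothetical empty frame that only carries the raw upper-case headers
df_demo = pd.DataFrame(columns=['PLAYER', 'SALARY', 'GP'])

# 1) list comprehension, as in the cell above
via_listcomp = [c.lower() for c in df_demo.columns]

# 2) rename() with a callable; it returns a new DataFrame
via_rename = list(df_demo.rename(columns=str.lower).columns)

# 3) the vectorized .str accessor on the column Index
via_str = list(df_demo.columns.str.lower())

print(via_listcomp, via_rename, via_str)
```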
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Particular Columns"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas\\n Midfield — Chelsea</td>\n",
" <td> $14.0m</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino\\n Forward — West Brom</td>\n",
" <td> $13.8m</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard\\n Midfield — Liverpool</td>\n",
" <td> $13.8m</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 \n",
"\n",
" shots_on_target points_per_game points \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.rename(columns={'p': 'points', \n",
" 'gp': 'games',\n",
" 'sot': 'shots_on_target',\n",
" 'g': 'goals',\n",
" 'ppg': 'points_per_game',\n",
" 'a': 'assists',})\n",
"\n",
"df.tail(3)"
]
},
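One thing worth knowing about `rename`: keys in the mapping that do not match an existing column are silently ignored, so a typo can go unnoticed. Here is a minimal sketch on a hypothetical `df_demo`. The `errors='raise'` option assumes pandas 0.25 or newer.

```python
import pandas as pd

# Hypothetical frame with a few of the short column names used above
df_demo = pd.DataFrame(columns=['p', 'gp', 'g'])

# The key 'x' matches no column and is silently ignored by default
renamed = df_demo.rename(columns={'p': 'points', 'gp': 'games', 'x': 'unused'})
print(list(renamed.columns))  # ['points', 'games', 'g']

# With errors='raise', unmatched keys raise a KeyError instead
raised_key_error = False
try:
    df_demo.rename(columns={'x': 'unused'}, errors='raise')
except KeyError as exc:
    raised_key_error = True
    print('KeyError:', exc)
```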
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Applying Computations Rows-wise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Changing Values in a Column"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> Santiago Cazorla\\n Midfield — Arsenal</td>\n",
" <td> 14.8</td>\n",
" <td> 20</td>\n",
" <td> 4</td>\n",
" <td>NaN</td>\n",
" <td> 20</td>\n",
" <td> 9.97</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> David Silva\\n Midfield — Manchester City</td>\n",
" <td> 14.3</td>\n",
" <td> 15</td>\n",
" <td> 6</td>\n",
" <td> 2</td>\n",
" <td> 11</td>\n",
" <td> 10.35</td>\n",
" <td> 155.26</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas\\n Midfield — Chelsea</td>\n",
" <td> 14.0</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino\\n Forward — West Brom</td>\n",
" <td> 13.8</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard\\n Midfield — Liverpool</td>\n",
" <td> 13.8</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists \\\n",
"5 Santiago Cazorla\\n Midfield — Arsenal 14.8 20 4 NaN \n",
"6 David Silva\\n Midfield — Manchester City 14.3 15 6 2 \n",
"7 Cesc Fàbregas\\n Midfield — Chelsea 14.0 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom 13.8 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool 13.8 20 5 1 \n",
"\n",
" shots_on_target points_per_game points \n",
"5 20 9.97 NaN \n",
"6 11 10.35 155.26 \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
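"source": [
"# Processing `salary` column\n",
"# Note: str.strip('$m') removes any leading/trailing '$' and 'm' characters;\n",
"# the values are still strings afterwards (e.g., '14.8'), not floats.\n",
"\n",
"df['salary'] = df['salary'].apply(lambda x: x.strip('$m'))\n",
"df.tail()"
]
},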
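The `.str` accessor offers a vectorized counterpart to the `apply` call above. The sketch below uses a hypothetical `salary` Series in the same `$19.2m` format, with a trailing `NaN` added for illustration. It also casts the result to `float`, because after stripping the values are still strings.

```python
import numpy as np
import pandas as pd

# Values in the raw format of the salary column; the NaN is hypothetical
salary = pd.Series(['$19.2m', '$18.9m', '$14.8m', np.nan])

# Vectorized counterpart of .apply(lambda x: x.strip('$m'))
stripped = salary.str.strip('$m')

# Cast to float so the column can be used in arithmetic
salary_num = stripped.astype(float)
print(salary_num)
```

Inside a lambda, `x.strip('$m')` would raise an `AttributeError` on a float `NaN`. The `.str` methods avoid this because they simply propagate missing values.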
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding a New Column"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" <th>position</th>\n",
" <th>team</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas\\n Midfield — Chelsea</td>\n",
" <td> 14.0</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" <td> </td>\n",
" <td> </td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino\\n Forward — West Brom</td>\n",
" <td> 13.8</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" <td> </td>\n",
" <td> </td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard\\n Midfield — Liverpool</td>\n",
" <td> 13.8</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" <td> </td>\n",
" <td> </td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea 14.0 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom 13.8 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool 13.8 20 5 1 \n",
"\n",
" shots_on_target points_per_game points position team \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
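"source": [
"# Option 1: assign a Series (a plain scalar such as '' also works); the new column is appended at the end\n",
"df['team'] = pd.Series('', index=df.index)\n",
"\n",
"# Option 2: insert a column at a given position (loc=8 places it before `team`)\n",
"df.insert(loc=8, column='position', value='')\n",
"\n",
"df.tail(3)"
]
},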
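The two options in the cell above differ mainly in where the column ends up. Plain assignment appends it, while `insert` takes an explicit integer position and works in place. Here is a small sketch on a hypothetical two-row frame. It assumes that assigning the scalar `''` directly is enough, since pandas broadcasts scalars to every row.

```python
import pandas as pd

# Hypothetical two-row frame
df_demo = pd.DataFrame({'player': ['p1', 'p2'], 'points': [1.0, 2.0]})

# A scalar on the right-hand side is broadcast to every row;
# a column created this way is appended at the end
df_demo['team'] = ''

# insert() modifies the frame in place and puts the column at an explicit position
df_demo.insert(loc=1, column='position', value='')

print(list(df_demo.columns))  # ['player', 'position', 'points', 'team']

# insert() refuses to create a duplicate column name by default
try:
    df_demo.insert(loc=0, column='team', value='')
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

By default, `insert` raises a `ValueError` if the column name already exists (pass `allow_duplicates=True` to override). Plain assignment, by contrast, silently overwrites an existing column.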
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" <th>position</th>\n",
" <th>team</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> Cesc Fàbregas</td>\n",
" <td> 14.0</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" <td> Midfield</td>\n",
" <td> Chelsea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> Saido Berahino</td>\n",
" <td> 13.8</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" <td> Forward</td>\n",
" <td> West Brom</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> Steven Gerrard</td>\n",
" <td> 13.8</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" <td> Midfield</td>\n",
" <td> Liverpool</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"7 Cesc Fàbregas 14.0 20 2 14 10 \n",
"8 Saido Berahino 13.8 21 9 0 20 \n",
"9 Steven Gerrard 13.8 20 5 1 11 \n",
"\n",
" points_per_game points position team \n",
"7 10.47 209.49 Midfield Chelsea \n",
"8 7.02 147.43 Forward West Brom \n",
"9 7.50 150.01 Midfield Liverpool "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Processing `player` column\n",
"\n",
"def process_player_col(text):\n",
"    # Split the raw string into name, position, and team\n",
"    name, rest = text.split('\\n')\n",
"    position, team = [x.strip() for x in rest.split(' — ')]\n",
"    return pd.Series([name, team, position])\n",
"\n",
"df[['player', 'team', 'position']] = df.player.apply(process_player_col)\n",
"\n",
"# modified after tip from reddit.com/user/hharison\n",
"#\n",
"# Alternative (inferior, row-by-row) approach. Note that process_player_col\n",
"# returns [name, team, position], and that .ix was removed in pandas 1.0 (use .loc):\n",
"#\n",
"#for idx, row in df.iterrows():\n",
"#    name, team, position = process_player_col(row['player'])\n",
"#    df.loc[idx, 'player'], df.loc[idx, 'position'], df.loc[idx, 'team'] = name, position, team\n",
"\n",
"df.tail(3)"
]
},
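A vectorized alternative to the row-wise `apply` is to split the raw strings with the `.str` accessor. This sketch assumes that every entry has the name, a newline, and then position and team separated by the same spaced dash used in the data above. The two-row `df_demo` is hypothetical.

```python
import pandas as pd

# Hypothetical mini-frame in the raw `player` format shown above
df_demo = pd.DataFrame({'player': [
    'Cesc Fàbregas\n Midfield — Chelsea',
    'Saido Berahino\n Forward — West Brom',
]})

# First split on the newline: column 0 is the name, column 1 the rest
name_rest = df_demo['player'].str.split('\n', n=1, expand=True)

# Then split the rest on the spaced dash: column 0 is position, column 1 team
pos_team = name_rest[1].str.split(' — ', n=1, expand=True)

df_demo['position'] = pos_team[0].str.strip()
df_demo['team'] = pos_team[1].str.strip()
df_demo['player'] = name_rest[0].str.strip()
print(df_demo)
```

This avoids calling a Python function once per row. The `apply` version in the cell above is arguably easier to read, though, and the data here is tiny.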
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Applying Functions to Multiple Columns"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" <th>position</th>\n",
" <th>team</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> sergio agüero</td>\n",
" <td> 19.2</td>\n",
" <td> 16</td>\n",
" <td> 14</td>\n",
" <td> 3</td>\n",
" <td> 34</td>\n",
" <td> 13.12</td>\n",
" <td> 209.98</td>\n",
" <td> forward</td>\n",
" <td> manchester city</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> eden hazard</td>\n",
" <td> 18.9</td>\n",
" <td> 21</td>\n",
" <td> 8</td>\n",
" <td> 4</td>\n",
" <td> 17</td>\n",
" <td> 13.05</td>\n",
" <td> 274.04</td>\n",
" <td> midfield</td>\n",
" <td> chelsea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> alexis sánchez</td>\n",
" <td> 17.6</td>\n",
" <td>NaN</td>\n",
" <td> 12</td>\n",
" <td> 7</td>\n",
" <td> 29</td>\n",
" <td> 11.19</td>\n",
" <td> 223.86</td>\n",
" <td> forward</td>\n",
" <td> arsenal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> yaya touré</td>\n",
" <td> 16.6</td>\n",
" <td> 18</td>\n",
" <td> 7</td>\n",
" <td> 1</td>\n",
" <td> 19</td>\n",
" <td> 10.99</td>\n",
" <td> 197.91</td>\n",
" <td> midfield</td>\n",
" <td> manchester city</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> ángel di maría</td>\n",
" <td> 15.0</td>\n",
" <td> 13</td>\n",
" <td> 3</td>\n",
" <td>NaN</td>\n",
" <td> 13</td>\n",
" <td> 10.17</td>\n",
" <td> 132.23</td>\n",
" <td> midfield</td>\n",
" <td> manchester united</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"2 alexis sánchez 17.6 NaN 12 7 29 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"4 ángel di maría 15.0 13 3 NaN 13 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 13.05 274.04 midfield chelsea \n",
"2 11.19 223.86 forward arsenal \n",
"3 10.99 197.91 midfield manchester city \n",
"4 10.17 132.23 midfield manchester united "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
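"source": [
"cols = ['player', 'position', 'team']\n",
"# applymap applies the function element-wise; in pandas >= 2.1 it is\n",
"# deprecated in favor of the equivalent DataFrame.map\n",
"df[cols] = df[cols].applymap(lambda x: x.lower())\n",
"df.head()"
]
},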
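Because `applymap` is deprecated in pandas 2.1+, a version-independent way to lowercase several string columns is to apply `.str.lower()` column by column. Here is a minimal sketch with a hypothetical one-row `df_demo`:

```python
import pandas as pd

# Hypothetical one-row frame with the three text columns used above
df_demo = pd.DataFrame({'player': ['Eden Hazard'],
                        'position': ['Midfield'],
                        'team': ['Chelsea']})
cols = ['player', 'position', 'team']

# DataFrame.apply passes each column as a Series, so the .str accessor is available
df_demo[cols] = df_demo[cols].apply(lambda s: s.str.lower())
print(df_demo)
```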
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Missing Values aka NaNs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Counting Rows with NaNs"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3 rows have missing values\n"
]
}
],
"source": [
"nans = df.shape[0] - df.dropna().shape[0]\n",
"\n",
"print('%d rows have missing values' % nans)"
]
},
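The shape difference above counts rows that contain at least one `NaN`. The same number can be computed directly from a boolean mask, which also makes per-column counts easy. Here is a sketch on a hypothetical three-row `df_demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: one NaN in row 0, none in row 1, two in row 2
df_demo = pd.DataFrame({'games': [np.nan, 21, 20],
                        'assists': [7, 4, np.nan],
                        'points': [223.86, 274.04, np.nan]})

# Per-cell NaN mask -> "any NaN in this row?" -> count the True values
n_rows_with_nans = int(df_demo.isnull().any(axis=1).sum())

# Same result as the shape difference used in the cell above
assert n_rows_with_nans == df_demo.shape[0] - df_demo.dropna().shape[0]
print('%d rows have missing values' % n_rows_with_nans)

# Number of NaNs per column
nans_per_column = df_demo.isnull().sum()
print(nans_per_column)
```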
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting NaN Rows"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" <th>position</th>\n",
" <th>team</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> ángel di maría</td>\n",
" <td> 15.0</td>\n",
" <td> 13</td>\n",
" <td> 3</td>\n",
" <td>NaN</td>\n",
" <td> 13</td>\n",
" <td> 10.17</td>\n",
" <td> 132.23</td>\n",
" <td> midfield</td>\n",
" <td> manchester united</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td> santiago cazorla</td>\n",
" <td> 14.8</td>\n",
" <td> 20</td>\n",
" <td> 4</td>\n",
" <td>NaN</td>\n",
" <td> 20</td>\n",
" <td> 9.97</td>\n",
" <td> NaN</td>\n",
" <td> midfield</td>\n",
" <td> arsenal</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"4 ángel di maría 15.0 13 3 NaN 13 \n",
"5 santiago cazorla 14.8 20 4 NaN 20 \n",
"\n",
" points_per_game points position team \n",
"4 10.17 132.23 midfield manchester united \n",
"5 9.97 NaN midfield arsenal "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Selecting all rows that have NaNs in the `assists` column\n",
"\n",
"df[df['assists'].isnull()]"
]
},
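`df['assists'].isnull()` looks at a single column. To select rows with a `NaN` in any column, or in a chosen subset of columns, reduce the per-cell mask row by row with `any(axis=1)`. Here is a sketch on a hypothetical `df_demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 lacks `games`, row 2 lacks `assists`
df_demo = pd.DataFrame({'games': [16, np.nan, 13],
                        'assists': [3, 7, np.nan]})

# Rows with a NaN in *any* column
any_nan = df_demo[df_demo.isnull().any(axis=1)]

# Rows with a NaN in a chosen subset of columns (here only `games`)
games_nan = df_demo[df_demo[['games']].isnull().any(axis=1)]

print(any_nan)
print(games_nan)
```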
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting non-NaN Rows"
]
},
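You can select the complement of the previous section's rows with `notnull()`, or with `dropna` restricted to specific columns via `subset`. Here is a minimal sketch on a hypothetical `df_demo`. Both variants keep the original index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 2 lacks `assists`
df_demo = pd.DataFrame({'games': [16, np.nan, 13],
                        'assists': [3, 7, np.nan]})

# Keep rows where `assists` is present (NaNs in other columns are allowed)
via_mask = df_demo[df_demo['assists'].notnull()]

# Equivalent: drop rows with NaN, but only look at the listed columns
via_dropna = df_demo.dropna(subset=['assists'])

print(via_mask)
```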
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>player</th>\n",
" <th>salary</th>\n",
" <th>games</th>\n",
" <th>goals</th>\n",
" <th>assists</th>\n",
" <th>shots_on_target</th>\n",
" <th>points_per_game</th>\n",
" <th>points</th>\n",
" <th>position</th>\n",
" <th>team</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> sergio agüero</td>\n",
" <td> 19.2</td>\n",
" <td> 16</td>\n",
" <td> 14</td>\n",
" <td> 3</td>\n",
" <td> 34</td>\n",
" <td> 13.12</td>\n",
" <td> 209.98</td>\n",
" <td> forward</td>\n",
" <td> manchester city</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> eden hazard</td>\n",
" <td> 18.9</td>\n",
" <td> 21</td>\n",
" <td> 8</td>\n",
" <td> 4</td>\n",
" <td> 17</td>\n",
" <td> 13.05</td>\n",
" <td> 274.04</td>\n",
" <td> midfield</td>\n",
" <td> chelsea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> alexis sánchez</td>\n",
" <td> 17.6</td>\n",
" <td>NaN</td>\n",
" <td> 12</td>\n",
" <td> 7</td>\n",
" <td> 29</td>\n",
" <td> 11.19</td>\n",
" <td> 223.86</td>\n",
" <td> forward</td>\n",
" <td> arsenal</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> yaya touré</td>\n",
" <td> 16.6</td>\n",
" <td> 18</td>\n",
" <td> 7</td>\n",
" <td> 1</td>\n",
" <td> 19</td>\n",
" <td> 10.99</td>\n",
" <td> 197.91</td>\n",
" <td> midfield</td>\n",
" <td> manchester city</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td> david silva</td>\n",
" <td> 14.3</td>\n",
" <td> 15</td>\n",
" <td> 6</td>\n",
" <td> 2</td>\n",
" <td> 11</td>\n",
" <td> 10.35</td>\n",
" <td> 155.26</td>\n",
" <td> midfield</td>\n",
" <td> manchester city</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td> cesc fàbregas</td>\n",
" <td> 14.0</td>\n",
" <td> 20</td>\n",
" <td> 2</td>\n",
" <td> 14</td>\n",
" <td> 10</td>\n",
" <td> 10.47</td>\n",
" <td> 209.49</td>\n",
" <td> midfield</td>\n",
" <td> chelsea</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td> saido berahino</td>\n",
" <td> 13.8</td>\n",
" <td> 21</td>\n",
" <td> 9</td>\n",
" <td> 0</td>\n",
" <td> 20</td>\n",
" <td> 7.02</td>\n",
" <td> 147.43</td>\n",
" <td> forward</td>\n",
" <td> west brom</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td> steven gerrard</td>\n",
" <td> 13.8</td>\n",
" <td> 20</td>\n",
" <td> 5</td>\n",
" <td> 1</td>\n",
" <td> 11</td>\n",
" <td> 7.50</td>\n",
" <td> 150.01</td>\n",
" <td> midfield</td>\n",
" <td> liverpool</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"2 alexis sánchez 17.6 NaN 12 7 29 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"6 david silva 14.3 15 6 2 11 \n",
"7 cesc fàbregas 14.0 20 2 14 10 \n",
"8 saido berahino 13.8 21 9 0 20 \n",
"9 steven gerrard 13.8 20 5 1 11 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 13.05 274.04 midfield chelsea \n",
"2 11.19 223.86 forward arsenal \n",
"3 10.99 197.91 midfield manchester city \n",
"6 10.35 155.26 midfield manchester city \n",
"7 10.47 209.49 midfield chelsea \n",
"8 7.02 147.43 forward west brom \n",
"9 7.50 150.01 midfield liverpool "
]
},
"execution_count": 11,
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df[df['assists'].notnull()]"
|
|
]
|
|
},
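{
"cell_type": "markdown",
"metadata": {},
"source": [
"The boolean mask `df['assists'].notnull()` keeps only the rows in which `assists` has a value. The following minimal sketch runs on a small, made-up toy DataFrame (not the dataset above). It shows that `dropna(subset=[...])` selects the same rows, while `isnull()` selects the complement:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with one missing `assists` value\n",
"toy = pd.DataFrame({'player': ['a', 'b', 'c'],\n",
"                    'assists': [3, np.nan, 7]})\n",
"\n",
"# Keep the rows where `assists` is not NaN\n",
"kept = toy[toy['assists'].notnull()]\n",
"\n",
"# Equivalent: drop the rows that have NaN in the `assists` column\n",
"dropped = toy.dropna(subset=['assists'])\n",
"\n",
"# Complement: only the rows where `assists` is missing\n",
"missing = toy[toy['assists'].isnull()]\n",
"\n",
"print(kept.equals(dropped))        # True\n",
"print(missing['player'].tolist())  # ['b']\n",
"```"
]
},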
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Filling NaN Rows"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 19.2</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> eden hazard</td>\n",
|
|
" <td> 18.9</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 8</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 17</td>\n",
|
|
" <td> 13.05</td>\n",
|
|
" <td> 274.04</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 17.6</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> yaya touré</td>\n",
|
|
" <td> 16.6</td>\n",
|
|
" <td> 18</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 19</td>\n",
|
|
" <td> 10.99</td>\n",
|
|
" <td> 197.91</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td> ángel di maría</td>\n",
|
|
" <td> 15.0</td>\n",
|
|
" <td> 13</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 13</td>\n",
|
|
" <td> 10.17</td>\n",
|
|
" <td> 132.23</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester united</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td> santiago cazorla</td>\n",
|
|
" <td> 14.8</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 9.97</td>\n",
|
|
" <td> 0.00</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>6</th>\n",
|
|
" <td> david silva</td>\n",
|
|
" <td> 14.3</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 6</td>\n",
|
|
" <td> 2</td>\n",
|
|
" <td> 11</td>\n",
|
|
" <td> 10.35</td>\n",
|
|
" <td> 155.26</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td> cesc fàbregas</td>\n",
|
|
" <td> 14.0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 2</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 10</td>\n",
|
|
" <td> 10.47</td>\n",
|
|
" <td> 209.49</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td> steven gerrard</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 5</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 11</td>\n",
|
|
" <td> 7.50</td>\n",
|
|
" <td> 150.01</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> liverpool</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"0 sergio agüero 19.2 16 14 3 34 \n",
|
|
"1 eden hazard 18.9 21 8 4 17 \n",
|
|
"2 alexis sánchez 17.6 0 12 7 29 \n",
|
|
"3 yaya touré 16.6 18 7 1 19 \n",
|
|
"4 ángel di maría 15.0 13 3 0 13 \n",
|
|
"5 santiago cazorla 14.8 20 4 0 20 \n",
|
|
"6 david silva 14.3 15 6 2 11 \n",
|
|
"7 cesc fàbregas 14.0 20 2 14 10 \n",
|
|
"8 saido berahino 13.8 21 9 0 20 \n",
|
|
"9 steven gerrard 13.8 20 5 1 11 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"0 13.12 209.98 forward manchester city \n",
|
|
"1 13.05 274.04 midfield chelsea \n",
|
|
"2 11.19 223.86 forward arsenal \n",
|
|
"3 10.99 197.91 midfield manchester city \n",
|
|
"4 10.17 132.23 midfield manchester united \n",
|
|
"5 9.97 0.00 midfield arsenal \n",
|
|
"6 10.35 155.26 midfield manchester city \n",
|
|
"7 10.47 209.49 midfield chelsea \n",
|
|
"8 7.02 147.43 forward west brom \n",
|
|
"9 7.50 150.01 midfield liverpool "
|
|
]
|
|
},
|
|
"execution_count": 12,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Filling all NaN cells with the default value 0 (modifies `df` in place)\n",
|
|
"\n",
|
|
"df.fillna(value=0, inplace=True)\n",
|
|
"df"
|
|
]
|
|
},
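{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides a single scalar, `fillna` also accepts a dictionary that maps column names to fill values. This helps when one default such as 0 does not suit every column. The following is a minimal sketch on made-up toy data (the column names are borrowed from the table above, and the values are illustrative only). Without `inplace=True`, `fillna` returns a new DataFrame and leaves the original unchanged:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data with missing values in two columns\n",
"toy = pd.DataFrame({'games': [16, np.nan, 18],\n",
"                    'position': ['forward', None, 'midfield']})\n",
"\n",
"# Per-column defaults: 0 for the numeric column, a label for the text column\n",
"filled = toy.fillna({'games': 0, 'position': 'unknown'})\n",
"\n",
"print(filled['games'].tolist())     # [16.0, 0.0, 18.0]\n",
"print(filled['position'].tolist())  # ['forward', 'unknown', 'midfield']\n",
"print(toy['games'].isnull().sum())  # 1 (the original is unchanged)\n",
"```"
]
},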
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Appending Rows to a DataFrame"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>8 </th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9 </th>\n",
|
|
" <td> steven gerrard</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 5</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 11</td>\n",
|
|
" <td> 7.50</td>\n",
|
|
" <td> 150.01</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> liverpool</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"8 saido berahino 13.8 21 9 0 20 \n",
|
|
"9 steven gerrard 13.8 20 5 1 11 \n",
|
|
"10 NaN NaN NaN NaN NaN NaN \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"8 7.02 147.43 forward west brom \n",
|
|
"9 7.50 150.01 midfield liverpool \n",
|
|
"10 NaN NaN NaN NaN "
|
|
]
|
|
},
|
|
"execution_count": 13,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Adding an \"empty\" row (all cells NaN) to the DataFrame;\n",
"# `pd.concat` is used since `DataFrame.append` was removed in pandas 2.0\n",
"\n",
"import numpy as np\n",
"\n",
"empty_row = pd.DataFrame([[np.nan] * len(df.columns)],  # one row of NaNs\n",
"                         columns=df.columns)\n",
"df = pd.concat([df, empty_row], ignore_index=True)\n",
|
|
"\n",
|
|
"df.tail(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>8 </th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9 </th>\n",
|
|
" <td> steven gerrard</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 5</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 11</td>\n",
|
|
" <td> 7.50</td>\n",
|
|
" <td> 150.01</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> liverpool</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td> new player</td>\n",
|
|
" <td> 12.3</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" <td> NaN</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"8 saido berahino 13.8 21 9 0 20 \n",
|
|
"9 steven gerrard 13.8 20 5 1 11 \n",
|
|
"10 new player 12.3 NaN NaN NaN NaN \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"8 7.02 147.43 forward west brom \n",
|
|
"9 7.50 150.01 midfield liverpool \n",
|
|
"10 NaN NaN NaN NaN "
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Filling cells with data\n",
|
|
"\n",
|
|
"df.loc[df.index[-1], 'player'] = 'new player'\n",
|
|
"df.loc[df.index[-1], 'salary'] = 12.3\n",
|
|
"df.tail(3)"
|
|
]
|
|
},
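{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a row's values are already known, you can add the row in one step instead of appending an empty row and filling it cell by cell. The following minimal sketch uses `pd.concat` on a hypothetical toy DataFrame. Columns that are not supplied for the new row become NaN automatically:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data\n",
"toy = pd.DataFrame({'player': ['a', 'b'],\n",
"                    'salary': [13.8, 13.8],\n",
"                    'team': ['west brom', 'liverpool']})\n",
"\n",
"# New row given as a one-row DataFrame; `team` is left out on purpose\n",
"new_row = pd.DataFrame([{'player': 'new player', 'salary': 12.3}])\n",
"\n",
"toy = pd.concat([toy, new_row], ignore_index=True)\n",
"\n",
"print(toy.index.tolist())             # [0, 1, 2]\n",
"print(toy.loc[2, 'salary'])           # 12.3\n",
"print(pd.isnull(toy.loc[2, 'team']))  # True\n",
"```"
]
},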
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Sorting and Reindexing DataFrames"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 15,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 19.2</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 17.6</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>8</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> eden hazard</td>\n",
|
|
" <td> 18.9</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 8</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 17</td>\n",
|
|
" <td> 13.05</td>\n",
|
|
" <td> 274.04</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> yaya touré</td>\n",
|
|
" <td> 16.6</td>\n",
|
|
" <td> 18</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 19</td>\n",
|
|
" <td> 10.99</td>\n",
|
|
" <td> 197.91</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"0 sergio agüero 19.2 16 14 3 34 \n",
|
|
"2 alexis sánchez 17.6 0 12 7 29 \n",
|
|
"8 saido berahino 13.8 21 9 0 20 \n",
|
|
"1 eden hazard 18.9 21 8 4 17 \n",
|
|
"3 yaya touré 16.6 18 7 1 19 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"0 13.12 209.98 forward manchester city \n",
|
|
"2 11.19 223.86 forward arsenal \n",
|
|
"8 7.02 147.43 forward west brom \n",
|
|
"1 13.05 274.04 midfield chelsea \n",
|
|
"3 10.99 197.91 midfield manchester city "
|
|
]
|
|
},
|
|
"execution_count": 15,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
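"# Sorting the DataFrame by a certain column (from highest to lowest).\n",
"# Note: `DataFrame.sort` (available in pandas 0.15) was removed in later\n",
"# versions; `sort_values` (pandas >= 0.17) is its replacement.\n",
"\n",
"df.sort_values('goals', ascending=False, inplace=True)\n",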
|
|
"df.head()"
|
|
]
|
|
},
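{
"cell_type": "markdown",
"metadata": {},
"source": [
"`sort_values` can also sort by several columns at once, with a separate direction for each column. The `na_position` parameter controls where NaNs end up. This assumes pandas 0.17 or newer. A minimal sketch on hypothetical toy data:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data; `goals` has a tie and a missing value\n",
"toy = pd.DataFrame({'player': ['a', 'b', 'c', 'd'],\n",
"                    'goals': [9, 12, 9, np.nan],\n",
"                    'assists': [0, 7, 1, 2]})\n",
"\n",
"# Most goals first; ties broken by most assists; NaNs placed last\n",
"ranked = toy.sort_values(['goals', 'assists'],\n",
"                         ascending=[False, False],\n",
"                         na_position='last')\n",
"\n",
"print(ranked['player'].tolist())  # ['b', 'c', 'a', 'd']\n",
"```"
]
},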
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 19.2</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 17.6</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td> eden hazard</td>\n",
|
|
" <td> 18.9</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 8</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 17</td>\n",
|
|
" <td> 13.05</td>\n",
|
|
" <td> 274.04</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>5</th>\n",
|
|
" <td> yaya touré</td>\n",
|
|
" <td> 16.6</td>\n",
|
|
" <td> 18</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 1</td>\n",
|
|
" <td> 19</td>\n",
|
|
" <td> 10.99</td>\n",
|
|
" <td> 197.91</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"1 sergio agüero 19.2 16 14 3 34 \n",
|
|
"2 alexis sánchez 17.6 0 12 7 29 \n",
|
|
"3 saido berahino 13.8 21 9 0 20 \n",
|
|
"4 eden hazard 18.9 21 8 4 17 \n",
|
|
"5 yaya touré 16.6 18 7 1 19 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"1 13.12 209.98 forward manchester city \n",
|
|
"2 11.19 223.86 forward arsenal \n",
|
|
"3 7.02 147.43 forward west brom \n",
|
|
"4 13.05 274.04 midfield chelsea \n",
|
|
"5 10.99 197.91 midfield manchester city "
|
|
]
|
|
},
|
|
"execution_count": 16,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Optional reindexing of the DataFrame after sorting\n",
|
|
"\n",
|
|
"df.index = range(1,len(df.index)+1)\n",
|
|
"df.head()"
|
|
]
|
|
},
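{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assigning a `range` to `df.index` is one way to renumber rows after sorting. Another is `reset_index(drop=True)`, which renumbers from 0 and discards the old index instead of keeping it as a new column. The following minimal sketch uses hypothetical toy data and shifts the index by 1 afterwards to get 1-based numbering:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data whose index is out of order after sorting\n",
"toy = pd.DataFrame({'goals': [8, 14, 12]}).sort_values('goals', ascending=False)\n",
"print(toy.index.tolist())  # [1, 2, 0]\n",
"\n",
"# Renumber from 0 and drop the old index (instead of keeping it as a column)\n",
"toy = toy.reset_index(drop=True)\n",
"\n",
"# Shift to a 1-based index\n",
"toy.index = toy.index + 1\n",
"print(toy.index.tolist())  # [1, 2, 3]\n",
"```"
]
},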
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Updating Columns"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 17,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"1 sergio agüero 20 16 14 3 34 \n",
|
|
"2 alexis sánchez 15 0 12 7 29 \n",
|
|
"3 saido berahino 13.8 21 9 0 20 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"1 13.12 209.98 forward manchester city \n",
|
|
"2 11.19 223.86 forward arsenal \n",
|
|
"3 7.02 147.43 forward west brom "
|
|
]
|
|
},
|
|
"execution_count": 17,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
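"# Creating a dummy DataFrame with changes in the `salary` column.\n",
"# `.loc` slices by index label and includes both endpoints; since the\n",
"# index now starts at 1, `1:2` selects the first two rows\n",
"\n",
"df_2 = df.copy()\n",
"df_2.loc[1:2, 'salary'] = [20.0, 15.0]\n",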
|
|
"df_2.head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 18,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>player</th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>sergio agüero</th>\n",
|
|
" <td> 19.2</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>alexis sánchez</th>\n",
|
|
" <td> 17.6</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>saido berahino</th>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" salary games goals assists shots_on_target \\\n",
|
|
"player \n",
|
|
"sergio agüero 19.2 16 14 3 34 \n",
|
|
"alexis sánchez 17.6 0 12 7 29 \n",
|
|
"saido berahino 13.8 21 9 0 20 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"player \n",
|
|
"sergio agüero 13.12 209.98 forward manchester city \n",
|
|
"alexis sánchez 11.19 223.86 forward arsenal \n",
|
|
"saido berahino 7.02 147.43 forward west brom "
|
|
]
|
|
},
|
|
"execution_count": 18,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Temporarily use the `player` column as the index of both DataFrames\n",
"# so that `update` can match rows by player name\n",
|
|
"\n",
|
|
"df.set_index('player', inplace=True)\n",
|
|
"df_2.set_index('player', inplace=True)\n",
|
|
"df.head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 19,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>player</th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" <th></th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>sergio agüero</th>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>alexis sánchez</th>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>saido berahino</th>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" salary games goals assists shots_on_target \\\n",
|
|
"player \n",
|
|
"sergio agüero 20 16 14 3 34 \n",
|
|
"alexis sánchez 15 0 12 7 29 \n",
|
|
"saido berahino 13.8 21 9 0 20 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"player \n",
|
|
"sergio agüero 13.12 209.98 forward manchester city \n",
|
|
"alexis sánchez 11.19 223.86 forward arsenal \n",
|
|
"saido berahino 7.02 147.43 forward west brom "
|
|
]
|
|
},
|
|
"execution_count": 19,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Update the `salary` column of `df` with the values from `df_2`;\n",
"# rows are matched by index (the player names)\n",
|
|
"df.update(other=df_2['salary'], overwrite=True)\n",
|
|
"df.head(3)"
|
|
]
|
|
},
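{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataFrame.update` modifies the calling DataFrame in place. It aligns rows by index label and columns by name. With the default `overwrite=True`, it copies over every non-NaN value from `other`, and NaN values in `other` never replace existing values. A Series passed as `other` needs a `name` that matches the target column. The following minimal sketch uses hypothetical toy data indexed by player name:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data indexed by player name\n",
"base = pd.DataFrame({'salary': [19.2, 17.6, 13.8]},\n",
"                    index=['aguero', 'sanchez', 'berahino'])\n",
"\n",
"# New salaries: only two players are listed, and one value is NaN\n",
"changes = pd.Series([20.0, np.nan], index=['aguero', 'berahino'],\n",
"                    name='salary')\n",
"\n",
"base.update(changes)  # modifies `base` in place and returns None\n",
"\n",
"print(base['salary'].tolist())  # [20.0, 17.6, 13.8]\n",
"```"
]
},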
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 20,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 16</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 3</td>\n",
|
|
" <td> 34</td>\n",
|
|
" <td> 13.12</td>\n",
|
|
" <td> 209.98</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 9</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 7.02</td>\n",
|
|
" <td> 147.43</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"0 sergio agüero 20 16 14 3 34 \n",
|
|
"1 alexis sánchez 15 0 12 7 29 \n",
|
|
"2 saido berahino 13.8 21 9 0 20 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"0 13.12 209.98 forward manchester city \n",
|
|
"1 11.19 223.86 forward arsenal \n",
|
|
"2 7.02 147.43 forward west brom "
|
|
]
|
|
},
|
|
"execution_count": 20,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Reset the index: `player` becomes a regular column again\n",
|
|
"df.reset_index(inplace=True)\n",
|
|
"df.head(3)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Chaining Conditions - Using Bitwise Operators"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 21,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> eden hazard</td>\n",
|
|
" <td> 18.9</td>\n",
|
|
" <td> 21</td>\n",
|
|
" <td> 8</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 17</td>\n",
|
|
" <td> 13.05</td>\n",
|
|
" <td> 274.04</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>7</th>\n",
|
|
" <td> santiago cazorla</td>\n",
|
|
" <td> 14.8</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 4</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 9.97</td>\n",
|
|
" <td> 0.00</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>9</th>\n",
|
|
" <td> cesc fàbregas</td>\n",
|
|
" <td> 14.0</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> 2</td>\n",
|
|
" <td> 14</td>\n",
|
|
" <td> 10</td>\n",
|
|
" <td> 10.47</td>\n",
|
|
" <td> 209.49</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"1 alexis sánchez 15 0 12 7 29 \n",
|
|
"3 eden hazard 18.9 21 8 4 17 \n",
|
|
"7 santiago cazorla 14.8 20 4 0 20 \n",
|
|
"9 cesc fàbregas 14.0 20 2 14 10 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"1 11.19 223.86 forward arsenal \n",
|
|
"3 13.05 274.04 midfield chelsea \n",
|
|
"7 9.97 0.00 midfield arsenal \n",
|
|
"9 10.47 209.49 midfield chelsea "
|
|
]
|
|
},
|
|
"execution_count": 21,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Selecting only those players who play for either Arsenal or Chelsea\n",
|
|
"\n",
|
|
"df[ (df['team'] == 'arsenal') | (df['team'] == 'chelsea') ]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>games</th>\n",
|
|
" <th>goals</th>\n",
|
|
" <th>assists</th>\n",
|
|
" <th>shots_on_target</th>\n",
|
|
" <th>points_per_game</th>\n",
|
|
" <th>points</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> 0</td>\n",
|
|
" <td> 12</td>\n",
|
|
" <td> 7</td>\n",
|
|
" <td> 29</td>\n",
|
|
" <td> 11.19</td>\n",
|
|
" <td> 223.86</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary games goals assists shots_on_target \\\n",
|
|
"1 alexis sánchez 15 0 12 7 29 \n",
|
|
"\n",
|
|
" points_per_game points position team \n",
|
|
"1 11.19 223.86 forward arsenal "
|
|
]
|
|
},
|
|
"execution_count": 22,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Selecting forwards from Arsenal only\n",
|
|
"\n",
|
|
"df[ (df['team'] == 'arsenal') & (df['position'] == 'forward') ]"
|
|
]
|
|
},
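{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch of two related idioms. It uses a made-up frame called `demo_df`, with hypothetical rows rather than the dataset above, and is named so that `df` is not overwritten. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `==` in Python. For several alternatives on a single column, `Series.isin` is a shorter equivalent of chaining `|`, and `~` inverts a boolean mask."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical example rows, not the notebook's dataset\n",
"demo_df = pd.DataFrame({'player': ['a', 'b', 'c', 'd'],\n",
"                        'team': ['arsenal', 'chelsea', 'arsenal', 'everton'],\n",
"                        'position': ['forward', 'midfield', 'midfield', 'forward']})\n",
"\n",
"# Same result as (demo_df['team'] == 'arsenal') | (demo_df['team'] == 'chelsea')\n",
"london = demo_df[demo_df['team'].isin(['arsenal', 'chelsea'])]\n",
"\n",
"# ~ negates the mask: every player who is NOT a forward\n",
"not_forwards = demo_df[~(demo_df['position'] == 'forward')]"
]
},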
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Column Types"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Printing Column Types"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 23,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{dtype('float64'): ['games',\n",
|
|
" 'goals',\n",
|
|
" 'assists',\n",
|
|
" 'shots_on_target',\n",
|
|
" 'points_per_game',\n",
|
|
" 'points'],\n",
|
|
" dtype('O'): ['player', 'salary', 'position', 'team']}"
|
|
]
|
|
},
|
|
"execution_count": 23,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"types = df.columns.to_series().groupby(df.dtypes).groups\n",
|
|
"types"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Selecting by Column Type"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>player</th>\n",
|
|
" <th>salary</th>\n",
|
|
" <th>position</th>\n",
|
|
" <th>team</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td> sergio agüero</td>\n",
|
|
" <td> 20</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td> alexis sánchez</td>\n",
|
|
" <td> 15</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> arsenal</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>2</th>\n",
|
|
" <td> saido berahino</td>\n",
|
|
" <td> 13.8</td>\n",
|
|
" <td> forward</td>\n",
|
|
" <td> west brom</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>3</th>\n",
|
|
" <td> eden hazard</td>\n",
|
|
" <td> 18.9</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> chelsea</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>4</th>\n",
|
|
" <td> yaya touré</td>\n",
|
|
" <td> 16.6</td>\n",
|
|
" <td> midfield</td>\n",
|
|
" <td> manchester city</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" player salary position team\n",
|
|
"0 sergio agüero 20 forward manchester city\n",
|
|
"1 alexis sánchez 15 forward arsenal\n",
|
|
"2 saido berahino 13.8 forward west brom\n",
|
|
"3 eden hazard 18.9 midfield chelsea\n",
|
|
"4 yaya touré 16.6 midfield manchester city"
|
|
]
|
|
},
|
|
"execution_count": 24,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"# select the columns with object dtype (here: the string columns)\n",
|
|
"df.loc[:, (df.dtypes == np.dtype('O')).values].head()"
|
|
]
|
|
},
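{
"cell_type": "markdown",
"metadata": {},
"source": [
"`DataFrame.select_dtypes` is a more readable alternative to building the column mask by hand. A minimal sketch on a made-up frame `demo_df`: the text columns are created with an explicit `object` dtype, since newer pandas versions may infer a dedicated string dtype instead. Keep in mind that `object` simply means arbitrary Python objects, so an object column is not guaranteed to contain only strings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical frame; explicit object dtype for the text columns\n",
"demo_df = pd.DataFrame({\n",
"    'player': pd.Series(['x', 'y'], dtype=object),\n",
"    'salary': pd.Series(['15', '18.9'], dtype=object),  # numbers stored as text\n",
"    'goals': [12.0, 8.0],\n",
"})\n",
"\n",
"# Columns with object dtype (here: player and salary)\n",
"obj_cols = demo_df.select_dtypes(include=['object'])\n",
"\n",
"# Numeric columns (int and float dtypes)\n",
"num_cols = demo_df.select_dtypes(include=['number'])"
]
},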
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Converting Column Types"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"df['salary'] = df['salary'].astype(float)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"{dtype('float64'): ['salary',\n",
|
|
" 'games',\n",
|
|
" 'goals',\n",
|
|
" 'assists',\n",
|
|
" 'shots_on_target',\n",
|
|
" 'points_per_game',\n",
|
|
" 'points'],\n",
|
|
" dtype('O'): ['player', 'position', 'team']}"
|
|
]
|
|
},
|
|
"execution_count": 26,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"types = df.columns.to_series().groupby(df.dtypes).groups\n",
|
|
"types"
|
|
]
|
|
},
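{
"cell_type": "markdown",
"metadata": {},
"source": [
"`astype(float)` raises a `ValueError` as soon as a single entry cannot be parsed as a number. If a column may contain placeholders such as `'n/a'`, `pd.to_numeric` with `errors='coerce'` converts those entries to `NaN` instead. A minimal sketch with a made-up `salary_raw` series:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical messy column: one entry is not a number\n",
"salary_raw = pd.Series(['15', '18.9', 'n/a'], dtype=object)\n",
"\n",
"# salary_raw.astype(float) would raise ValueError because of 'n/a';\n",
"# errors='coerce' turns unparsable entries into NaN instead\n",
"salary = pd.to_numeric(salary_raw, errors='coerce')"
]
},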
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# If-tests"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to section overview](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"I was recently asked how to do an if-test in pandas, that is, how to create an array of 1s and 0s depending on a condition, e.g., 1 if `val` is less than or equal to 0.05, else 0. With a boolean mask that's pretty simple, since `True` and `False` behave as the integers 1 and 0 (`bool` is a subclass of `int` in Python)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"1"
|
|
]
|
|
},
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"int(True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>1</th>\n",
|
|
" <th>2</th>\n",
|
|
" <th>3</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>2.0</td>\n",
|
|
" <td>0.30</td>\n",
|
|
" <td>4.00</td>\n",
|
|
" <td>5</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>0.8</td>\n",
|
|
" <td>0.03</td>\n",
|
|
" <td>0.02</td>\n",
|
|
" <td>5</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" 0 1 2 3\n",
|
|
"0 2.0 0.30 4.00 5\n",
|
|
"1 0.8 0.03 0.02 5"
|
|
]
|
|
},
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"import pandas as pd\n",
|
|
"\n",
|
|
"a = [[2., .3, 4., 5.], [.8, .03, 0.02, 5.]]\n",
|
|
"df = pd.DataFrame(a)\n",
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>1</th>\n",
|
|
" <th>2</th>\n",
|
|
" <th>3</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>False</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>False</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>False</td>\n",
|
|
" <td>True</td>\n",
|
|
" <td>True</td>\n",
|
|
" <td>False</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" 0 1 2 3\n",
|
|
"0 False False False False\n",
|
|
"1 False True True False"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df = df <= 0.05\n",
|
|
"df"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"metadata": {
|
|
"collapsed": false
|
|
},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>0</th>\n",
|
|
" <th>1</th>\n",
|
|
" <th>2</th>\n",
|
|
" <th>3</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>0</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>0</td>\n",
|
|
" <td>0</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>1</th>\n",
|
|
" <td>0</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>1</td>\n",
|
|
" <td>0</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" 0 1 2 3\n",
|
|
"0 0 0 0 0\n",
|
|
"1 0 1 1 0"
|
|
]
|
|
},
|
|
"execution_count": 4,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"df.astype(int)"
|
|
]
|
|
},
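{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the mask holds booleans, summing it counts the matches, and `numpy.where` can map the two outcomes to labels other than 1 and 0. A minimal sketch that rebuilds the values above under the hypothetical name `vals` (so `df` stays untouched); the labels `'low'` and `'high'` are arbitrary choices for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"vals = pd.DataFrame([[2., .3, 4., 5.], [.8, .03, 0.02, 5.]])\n",
"mask = vals <= 0.05\n",
"\n",
"# True counts as 1, so column sums give the number of matches per column\n",
"counts = mask.sum()\n",
"\n",
"# 1/0 indicator, as in the cell above\n",
"flags = mask.astype(int)\n",
"\n",
"# np.where picks 'low' where the mask is True and 'high' elsewhere\n",
"labels = pd.DataFrame(np.where(mask, 'low', 'high'),\n",
"                      index=vals.index, columns=vals.columns)"
]
},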
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"collapsed": true
|
|
},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.4.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 0
|
|
}
|