{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Back to the GitHub repository](https://github.com/rasbt/python_reference)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sebastian Raschka 28/01/2015 \n",
"\n",
"CPython 3.4.2\n",
"IPython 2.3.1\n",
"\n",
"pandas 0.15.2\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -a 'Sebastian Raschka' -v -d -p pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Things in Pandas I Wish I'd Known Earlier"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a small but growing collection of pandas snippets that I find particularly useful from time to time; consider it my personal notebook. Suggestions, tips, and contributions are very welcome!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- [Loading Some Example Data](#Loading-Some-Example-Data)\n",
"- [Renaming Columns](#Renaming-Columns)\n",
" - [Converting Column Names to Lowercase](#Converting-Column-Names-to-Lowercase)\n",
" - [Renaming Particular Columns](#Renaming-Particular-Columns)\n",
"- [Applying Computations Rows-wise](#Applying-Computations-Rows-wise)\n",
" - [Changing Values in a Column](#Changing-Values-in-a-Column)\n",
" - [Adding a New Column](#Adding-a-New-Column)\n",
" - [Applying Functions to Multiple Columns](#Applying-Functions-to-Multiple-Columns)\n",
"- [Missing Values aka NaNs](#Missing-Values-aka-NaNs)\n",
" - [Counting Rows with NaNs](#Counting-Rows-with-NaNs)\n",
" - [Selecting NaN Rows](#Selecting-NaN-Rows)\n",
" - [Selecting non-NaN Rows](#Selecting-non-NaN-Rows)\n",
" - [Filling NaN Rows](#Filling-NaN-Rows)\n",
"- [Appending Rows to a DataFrame](#Appending-Rows-to-a-DataFrame)\n",
"- [Sorting and Reindexing DataFrames](#Sorting-and-Reindexing-DataFrames)\n",
"- [Updating Columns](#Updating-Columns)\n",
"- [Chaining Conditions - Using Bitwise Operators](#Chaining-Conditions---Using-Bitwise-Operators)\n",
"- [Column Types](#Column-Types)\n",
" - [Printing Column Types](#Printing-Column-Types)\n",
" - [Selecting by Column Type](#Selecting-by-Column-Type)\n",
" - [Converting Column Types](#Converting-Column-Types)\n",
"- [If-tests](#If-tests)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading Some Example Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I am heavily into sports prediction (via a machine learning approach) these days, so let us use a (very) small subset of the soccer data that I am currently working with."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" PLAYER SALARY GP G A SOT \\\n",
"0 Sergio Agüero\\n Forward — Manchester City $19.2m 16 14 3 34 \n",
"1 Eden Hazard\\n Midfield — Chelsea $18.9m 21 8 4 17 \n",
"2 Alexis Sánchez\\n Forward — Arsenal $17.6m NaN 12 7 29 \n",
"3 Yaya Touré\\n Midfield — Manchester City $16.6m 18 7 1 19 \n",
"4 Ángel Di María\\n Midfield — Manchester United $15.0m 13 3 NaN 13 \n",
"5 Santiago Cazorla\\n Midfield — Arsenal $14.8m 20 4 NaN 20 \n",
"6 David Silva\\n Midfield — Manchester City $14.3m 15 6 2 11 \n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 10 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 20 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 11 \n",
"\n",
" PPG P \n",
"0 13.12 209.98 \n",
"1 13.05 274.04 \n",
"2 11.19 223.86 \n",
"3 10.99 197.91 \n",
"4 10.17 132.23 \n",
"5 9.97 NaN \n",
"6 10.35 155.26 \n",
"7 10.47 209.49 \n",
"8 7.02 147.43 \n",
"9 7.50 150.01 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_csv('https://raw.githubusercontent.com/rasbt/python_reference/master/Data/some_soccer_data.csv')\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Renaming Columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converting Column Names to Lowercase"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary gp g a sot ppg \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 10 10.47 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 20 7.02 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 11 7.50 \n",
"\n",
" p \n",
"7 209.49 \n",
"8 147.43 \n",
"9 150.01 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Converting column names to lowercase\n",
"\n",
"df.columns = [c.lower() for c in df.columns]\n",
"\n",
"# or (rename returns a new DataFrame, so assign the result)\n",
"# df = df.rename(columns=lambda x: x.lower())\n",
"\n",
"df.tail(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Renaming Particular Columns"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea $14.0m 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom $13.8m 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool $13.8m 20 5 1 \n",
"\n",
" shots_on_target points_per_game points \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df.rename(columns={'p': 'points', \n",
" 'gp': 'games',\n",
" 'sot': 'shots_on_target',\n",
" 'g': 'goals',\n",
" 'ppg': 'points_per_game',\n",
" 'a': 'assists',})\n",
"\n",
"df.tail(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Applying Computations Rows-wise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Changing Values in a Column"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists \\\n",
"5 Santiago Cazorla\\n Midfield — Arsenal 14.8 20 4 NaN \n",
"6 David Silva\\n Midfield — Manchester City 14.3 15 6 2 \n",
"7 Cesc Fàbregas\\n Midfield — Chelsea 14.0 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom 13.8 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool 13.8 20 5 1 \n",
"\n",
" shots_on_target points_per_game points \n",
"5 20 9.97 NaN \n",
"6 11 10.35 155.26 \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Processing `salary` column: strip the '$' prefix and the 'm' suffix\n",
"# (the values remain strings, e.g. '14.8')\n",
"\n",
"df['salary'] = df['salary'].apply(lambda x: x.strip('$m'))\n",
"df.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding a New Column"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists \\\n",
"7 Cesc Fàbregas\\n Midfield — Chelsea 14.0 20 2 14 \n",
"8 Saido Berahino\\n Forward — West Brom 13.8 21 9 0 \n",
"9 Steven Gerrard\\n Midfield — Liverpool 13.8 20 5 1 \n",
"\n",
" shots_on_target points_per_game points position team \n",
"7 10 10.47 209.49 \n",
"8 20 7.02 147.43 \n",
"9 11 7.50 150.01 "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
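],
"source": [
"# Adding an empty column at the end of the DataFrame\n",
"df['team'] = pd.Series('', index=df.index)\n",
"\n",
"# or, to insert an empty column at a specific position\n",
"df.insert(loc=8, column='position', value='')\n",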
"\n",
"df.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"7 Cesc Fàbregas 14.0 20 2 14 10 \n",
"8 Saido Berahino 13.8 21 9 0 20 \n",
"9 Steven Gerrard 13.8 20 5 1 11 \n",
"\n",
" points_per_game points position team \n",
"7 10.47 209.49 Midfield Chelsea \n",
"8 7.02 147.43 Forward West Brom \n",
"9 7.50 150.01 Midfield Liverpool "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Processing `player` column\n",
"\n",
"def process_player_col(text):\n",
"    name, rest = text.split('\\n')\n",
"    position, team = [x.strip() for x in rest.split(' — ')]\n",
"    return pd.Series([name, team, position])\n",
"\n",
"df[['player', 'team', 'position']] = df.player.apply(process_player_col)\n",
"\n",
"# modified after tip from reddit.com/user/hharison\n",
"#\n",
"# Alternative (inferior) approach:\n",
"#\n",
"#for idx, row in df.iterrows():\n",
"#    name, team, position = process_player_col(row['player'])\n",
"#    df.loc[idx, 'player'], df.loc[idx, 'team'], df.loc[idx, 'position'] = name, team, position\n",
" \n",
"df.tail(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Applying Functions to Multiple Columns"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"2 alexis sánchez 17.6 NaN 12 7 29 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"4 ángel di maría 15.0 13 3 NaN 13 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 13.05 274.04 midfield chelsea \n",
"2 11.19 223.86 forward arsenal \n",
"3 10.99 197.91 midfield manchester city \n",
"4 10.17 132.23 midfield manchester united "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
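],
"source": [
"cols = ['player', 'position', 'team']\n",
"# applymap applies the function element-wise to every cell\n",
"# (pandas >= 2.1 deprecates applymap in favor of DataFrame.map)\n",
"df[cols] = df[cols].applymap(lambda x: x.lower())\n",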
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Missing Values aka NaNs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Counting Rows with NaNs"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3 rows have missing values\n"
]
}
],
"source": [
"nans = df.shape[0] - df.dropna().shape[0]\n",
"\n",
"print('%d rows have missing values' % nans)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting NaN Rows"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"4 ángel di maría 15.0 13 3 NaN 13 \n",
"5 santiago cazorla 14.8 20 4 NaN 20 \n",
"\n",
" points_per_game points position team \n",
"4 10.17 132.23 midfield manchester united \n",
"5 9.97 NaN midfield arsenal "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Selecting all rows that have NaNs in the `assists` column\n",
"\n",
"df[df['assists'].isnull()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting non-NaN Rows"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"2 alexis sánchez 17.6 NaN 12 7 29 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"6 david silva 14.3 15 6 2 11 \n",
"7 cesc fàbregas 14.0 20 2 14 10 \n",
"8 saido berahino 13.8 21 9 0 20 \n",
"9 steven gerrard 13.8 20 5 1 11 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 13.05 274.04 midfield chelsea \n",
"2 11.19 223.86 forward arsenal \n",
"3 10.99 197.91 midfield manchester city \n",
"6 10.35 155.26 midfield manchester city \n",
"7 10.47 209.49 midfield chelsea \n",
"8 7.02 147.43 forward west brom \n",
"9 7.50 150.01 midfield liverpool "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[df['assists'].notnull()]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filling NaN Rows"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"2 alexis sánchez 17.6 0 12 7 29 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"4 ángel di maría 15.0 13 3 0 13 \n",
"5 santiago cazorla 14.8 20 4 0 20 \n",
"6 david silva 14.3 15 6 2 11 \n",
"7 cesc fàbregas 14.0 20 2 14 10 \n",
"8 saido berahino 13.8 21 9 0 20 \n",
"9 steven gerrard 13.8 20 5 1 11 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 13.05 274.04 midfield chelsea \n",
"2 11.19 223.86 forward arsenal \n",
"3 10.99 197.91 midfield manchester city \n",
"4 10.17 132.23 midfield manchester united \n",
"5 9.97 0.00 midfield arsenal \n",
"6 10.35 155.26 midfield manchester city \n",
"7 10.47 209.49 midfield chelsea \n",
"8 7.02 147.43 forward west brom \n",
"9 7.50 150.01 midfield liverpool "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filling NaN cells with default value 0\n",
"\n",
"df.fillna(value=0, inplace=True)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Appending Rows to a DataFrame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"8 saido berahino 13.8 21 9 0 20 \n",
"9 steven gerrard 13.8 20 5 1 11 \n",
"10 NaN NaN NaN NaN NaN NaN \n",
"\n",
" points_per_game points position team \n",
"8 7.02 147.43 forward west brom \n",
"9 7.50 150.01 midfield liverpool \n",
"10 NaN NaN NaN NaN "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Adding an \"empty\" row to the DataFrame\n",
"# (DataFrame.append was removed in pandas 2.0; use pd.concat there)\n",
"\n",
"import numpy as np\n",
"\n",
"df = df.append(pd.Series(\n",
"                [np.nan] * len(df.columns),  # Fill cells with NaNs\n",
"                index=df.columns),\n",
"               ignore_index=True)\n",
"\n",
"df.tail(3)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"8 saido berahino 13.8 21 9 0 20 \n",
"9 steven gerrard 13.8 20 5 1 11 \n",
"10 new player 12.3 NaN NaN NaN NaN \n",
"\n",
" points_per_game points position team \n",
"8 7.02 147.43 forward west brom \n",
"9 7.50 150.01 midfield liverpool \n",
"10 NaN NaN NaN NaN "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filling cells with data\n",
"\n",
"df.loc[df.index[-1], 'player'] = 'new player'\n",
"df.loc[df.index[-1], 'salary'] = 12.3\n",
"df.tail(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sorting and Reindexing DataFrames"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 19.2 16 14 3 34 \n",
"2 alexis sánchez 17.6 0 12 7 29 \n",
"8 saido berahino 13.8 21 9 0 20 \n",
"1 eden hazard 18.9 21 8 4 17 \n",
"3 yaya touré 16.6 18 7 1 19 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"2 11.19 223.86 forward arsenal \n",
"8 7.02 147.43 forward west brom \n",
"1 13.05 274.04 midfield chelsea \n",
"3 10.99 197.91 midfield manchester city "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sorting the DataFrame by a certain column (from highest to lowest)\n",
"# (DataFrame.sort was replaced by sort_values in pandas 0.17 and later removed)\n",
"\n",
"df.sort('goals', ascending=False, inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"1 sergio agüero 19.2 16 14 3 34 \n",
"2 alexis sánchez 17.6 0 12 7 29 \n",
"3 saido berahino 13.8 21 9 0 20 \n",
"4 eden hazard 18.9 21 8 4 17 \n",
"5 yaya touré 16.6 18 7 1 19 \n",
"\n",
" points_per_game points position team \n",
"1 13.12 209.98 forward manchester city \n",
"2 11.19 223.86 forward arsenal \n",
"3 7.02 147.43 forward west brom \n",
"4 13.05 274.04 midfield chelsea \n",
"5 10.99 197.91 midfield manchester city "
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Optional reindexing of the DataFrame after sorting\n",
"\n",
"df.index = range(1,len(df.index)+1)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Updating Columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" player | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" sergio agüero | \n",
" 20 | \n",
" 16 | \n",
" 14 | \n",
" 3 | \n",
" 34 | \n",
" 13.12 | \n",
" 209.98 | \n",
" forward | \n",
" manchester city | \n",
"
\n",
" \n",
" 2 | \n",
" alexis sánchez | \n",
" 15 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" 3 | \n",
" saido berahino | \n",
" 13.8 | \n",
" 21 | \n",
" 9 | \n",
" 0 | \n",
" 20 | \n",
" 7.02 | \n",
" 147.43 | \n",
" forward | \n",
" west brom | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"1 sergio agüero 20 16 14 3 34 \n",
"2 alexis sánchez 15 0 12 7 29 \n",
"3 saido berahino 13.8 21 9 0 20 \n",
"\n",
" points_per_game points position team \n",
"1 13.12 209.98 forward manchester city \n",
"2 11.19 223.86 forward arsenal \n",
"3 7.02 147.43 forward west brom "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Creating a dummy DataFrame with changes in the `salary` column\n",
"\n",
"df_2 = df.copy()\n",
"df_2.loc[0:2, 'salary'] = [20.0, 15.0]\n",
"df_2.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" player | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" sergio agüero | \n",
" 19.2 | \n",
" 16 | \n",
" 14 | \n",
" 3 | \n",
" 34 | \n",
" 13.12 | \n",
" 209.98 | \n",
" forward | \n",
" manchester city | \n",
"
\n",
" \n",
" alexis sánchez | \n",
" 17.6 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" saido berahino | \n",
" 13.8 | \n",
" 21 | \n",
" 9 | \n",
" 0 | \n",
" 20 | \n",
" 7.02 | \n",
" 147.43 | \n",
" forward | \n",
" west brom | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" salary games goals assists shots_on_target \\\n",
"player \n",
"sergio agüero 19.2 16 14 3 34 \n",
"alexis sánchez 17.6 0 12 7 29 \n",
"saido berahino 13.8 21 9 0 20 \n",
"\n",
" points_per_game points position team \n",
"player \n",
"sergio agüero 13.12 209.98 forward manchester city \n",
"alexis sánchez 11.19 223.86 forward arsenal \n",
"saido berahino 7.02 147.43 forward west brom "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Temporarily use the `player` columns as indices to \n",
"# apply the update functions\n",
"\n",
"df.set_index('player', inplace=True)\n",
"df_2.set_index('player', inplace=True)\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" player | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" sergio agüero | \n",
" 20 | \n",
" 16 | \n",
" 14 | \n",
" 3 | \n",
" 34 | \n",
" 13.12 | \n",
" 209.98 | \n",
" forward | \n",
" manchester city | \n",
"
\n",
" \n",
" alexis sánchez | \n",
" 15 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" saido berahino | \n",
" 13.8 | \n",
" 21 | \n",
" 9 | \n",
" 0 | \n",
" 20 | \n",
" 7.02 | \n",
" 147.43 | \n",
" forward | \n",
" west brom | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" salary games goals assists shots_on_target \\\n",
"player \n",
"sergio agüero 20 16 14 3 34 \n",
"alexis sánchez 15 0 12 7 29 \n",
"saido berahino 13.8 21 9 0 20 \n",
"\n",
" points_per_game points position team \n",
"player \n",
"sergio agüero 13.12 209.98 forward manchester city \n",
"alexis sánchez 11.19 223.86 forward arsenal \n",
"saido berahino 7.02 147.43 forward west brom "
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Update the `salary` column\n",
"df.update(other=df_2['salary'], overwrite=True)\n",
"df.head(3)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" player | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" sergio agüero | \n",
" 20 | \n",
" 16 | \n",
" 14 | \n",
" 3 | \n",
" 34 | \n",
" 13.12 | \n",
" 209.98 | \n",
" forward | \n",
" manchester city | \n",
"
\n",
" \n",
" 1 | \n",
" alexis sánchez | \n",
" 15 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" 2 | \n",
" saido berahino | \n",
" 13.8 | \n",
" 21 | \n",
" 9 | \n",
" 0 | \n",
" 20 | \n",
" 7.02 | \n",
" 147.43 | \n",
" forward | \n",
" west brom | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"0 sergio agüero 20 16 14 3 34 \n",
"1 alexis sánchez 15 0 12 7 29 \n",
"2 saido berahino 13.8 21 9 0 20 \n",
"\n",
" points_per_game points position team \n",
"0 13.12 209.98 forward manchester city \n",
"1 11.19 223.86 forward arsenal \n",
"2 7.02 147.43 forward west brom "
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Reset the indices\n",
"df.reset_index(inplace=True)\n",
"df.head(3)"
]
},
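{
"cell_type": "markdown",
"metadata": {},
"source": [
"A compact sketch of what `update` does, as far as its documented behavior goes: it aligns both objects on their index, modifies the caller in place, and skips `NaN` values in `other`. The names `base` and `changes` and their values are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: three rows labeled by a made-up player id\n",
"base = pd.DataFrame({'salary': [19.2, 17.6, 13.8]}, index=['p1', 'p2', 'p3'])\n",
"\n",
"# Only 'p1' carries a real new value; the NaN for 'p3' is ignored by update\n",
"changes = pd.DataFrame({'salary': [20.0, np.nan]}, index=['p1', 'p3'])\n",
"\n",
"# In-place update, aligned on the index labels\n",
"base.update(changes)\n",
"base"
]
},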
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chaining Conditions - Using Bitwise Operators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" player | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" alexis sánchez | \n",
" 15 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" 3 | \n",
" eden hazard | \n",
" 18.9 | \n",
" 21 | \n",
" 8 | \n",
" 4 | \n",
" 17 | \n",
" 13.05 | \n",
" 274.04 | \n",
" midfield | \n",
" chelsea | \n",
"
\n",
" \n",
" 7 | \n",
" santiago cazorla | \n",
" 14.8 | \n",
" 20 | \n",
" 4 | \n",
" 0 | \n",
" 20 | \n",
" 9.97 | \n",
" 0.00 | \n",
" midfield | \n",
" arsenal | \n",
"
\n",
" \n",
" 9 | \n",
" cesc fàbregas | \n",
" 14.0 | \n",
" 20 | \n",
" 2 | \n",
" 14 | \n",
" 10 | \n",
" 10.47 | \n",
" 209.49 | \n",
" midfield | \n",
" chelsea | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"1 alexis sánchez 15 0 12 7 29 \n",
"3 eden hazard 18.9 21 8 4 17 \n",
"7 santiago cazorla 14.8 20 4 0 20 \n",
"9 cesc fàbregas 14.0 20 2 14 10 \n",
"\n",
" points_per_game points position team \n",
"1 11.19 223.86 forward arsenal \n",
"3 13.05 274.04 midfield chelsea \n",
"7 9.97 0.00 midfield arsenal \n",
"9 10.47 209.49 midfield chelsea "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Selecting only those players that either playing for Arsenal or Chelsea\n",
"\n",
"df[ (df['team'] == 'arsenal') | (df['team'] == 'chelsea') ]"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" player | \n",
" salary | \n",
" games | \n",
" goals | \n",
" assists | \n",
" shots_on_target | \n",
" points_per_game | \n",
" points | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" alexis sánchez | \n",
" 15 | \n",
" 0 | \n",
" 12 | \n",
" 7 | \n",
" 29 | \n",
" 11.19 | \n",
" 223.86 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" player salary games goals assists shots_on_target \\\n",
"1 alexis sánchez 15 0 12 7 29 \n",
"\n",
" points_per_game points position team \n",
"1 11.19 223.86 forward arsenal "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Selecting forwards from Arsenal only\n",
"\n",
"df[ (df['team'] == 'arsenal') & (df['position'] == 'forward') ]"
]
},
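{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parentheses around each comparison are required because `|` and `&` bind more tightly than `==`. As a sketch on made-up data (`teams_demo` is hypothetical), `isin` can stand in for a chain of `|` conditions, and masks stored in variables can be combined with `&`, `|`, and `~` (negation)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-frame; the rows are made up for illustration\n",
"teams_demo = pd.DataFrame({\n",
"    'player': ['a', 'b', 'c', 'd'],\n",
"    'team': ['arsenal', 'chelsea', 'west brom', 'arsenal'],\n",
"    'position': ['forward', 'midfield', 'forward', 'midfield'],\n",
"})\n",
"\n",
"# Same result as (team == 'arsenal') | (team == 'chelsea')\n",
"either = teams_demo[teams_demo['team'].isin(['arsenal', 'chelsea'])]\n",
"\n",
"# Named masks keep longer conditions readable\n",
"is_arsenal = teams_demo['team'] == 'arsenal'\n",
"is_forward = teams_demo['position'] == 'forward'\n",
"arsenal_non_forwards = teams_demo[is_arsenal & ~is_forward]\n",
"arsenal_non_forwards"
]
},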
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Column Types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Printing Column Types"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{dtype('float64'): ['games',\n",
" 'goals',\n",
" 'assists',\n",
" 'shots_on_target',\n",
" 'points_per_game',\n",
" 'points'],\n",
" dtype('O'): ['player', 'salary', 'position', 'team']}"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"types = df.columns.to_series().groupby(df.dtypes).groups\n",
"types"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting by Column Type"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" player | \n",
" salary | \n",
" position | \n",
" team | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" sergio agüero | \n",
" 20 | \n",
" forward | \n",
" manchester city | \n",
"
\n",
" \n",
" 1 | \n",
" alexis sánchez | \n",
" 15 | \n",
" forward | \n",
" arsenal | \n",
"
\n",
" \n",
" 2 | \n",
" saido berahino | \n",
" 13.8 | \n",
" forward | \n",
" west brom | \n",
"
\n",
" \n",
" 3 | \n",
" eden hazard | \n",
" 18.9 | \n",
" midfield | \n",
" chelsea | \n",
"
\n",
" \n",
" 4 | \n",
" yaya touré | \n",
" 16.6 | \n",
" midfield | \n",
" manchester city | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" player salary position team\n",
"0 sergio agüero 20 forward manchester city\n",
"1 alexis sánchez 15 forward arsenal\n",
"2 saido berahino 13.8 forward west brom\n",
"3 eden hazard 18.9 midfield chelsea\n",
"4 yaya touré 16.6 midfield manchester city"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# select string columns\n",
"df.loc[:, (df.dtypes == np.dtype('O')).values].head()"
]
},
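{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an alternative to building the dtype mask by hand, `select_dtypes` picks columns by type. A minimal sketch on a made-up frame (`types_demo` is hypothetical)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical mini-frame with two text columns and one numeric column\n",
"types_demo = pd.DataFrame(\n",
"    {'player': ['a', 'b'], 'goals': [14.0, 12.0], 'team': ['x', 'y']},\n",
"    columns=['player', 'goals', 'team'],\n",
")\n",
"\n",
"# Keep only the numeric columns\n",
"numeric_cols = types_demo.select_dtypes(include=['number'])\n",
"\n",
"# Keep everything that is not numeric (the text columns here)\n",
"other_cols = types_demo.select_dtypes(exclude=['number'])\n",
"other_cols"
]
},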
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Converting Column Types"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"df['salary'] = df['salary'].astype(float)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"{dtype('float64'): ['salary',\n",
" 'games',\n",
" 'goals',\n",
" 'assists',\n",
" 'shots_on_target',\n",
" 'points_per_game',\n",
" 'points'],\n",
" dtype('O'): ['player', 'position', 'team']}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"types = df.columns.to_series().groupby(df.dtypes).groups\n",
"types"
]
},
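{
"cell_type": "markdown",
"metadata": {},
"source": [
"`astype(float)` raises an error if a value cannot be parsed. On pandas 0.17 and later (newer than the version used in this notebook), `pd.to_numeric` with `errors='coerce'` turns such values into `NaN` instead. A sketch with a made-up `raw` Series:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical column read as text, with one unparseable entry\n",
"raw = pd.Series(['20', '15', '13.8', 'n/a'])\n",
"\n",
"# Unparseable entries become NaN instead of raising an error\n",
"converted = pd.to_numeric(raw, errors='coerce')\n",
"converted"
]
},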
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# If-tests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to section overview](#Sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I was recently asked how to do an if-test in pandas, that is, how to create an array of 1s and 0s depending on a condition, e.g., if `val` less than 0.5 -> 0, else -> 1. Using the boolean mask, that's pretty simple since `True` and `False` are integers after all."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int(True)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2.0 | \n",
" 0.30 | \n",
" 4.00 | \n",
" 5 | \n",
"
\n",
" \n",
" 1 | \n",
" 0.8 | \n",
" 0.03 | \n",
" 0.02 | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 2.0 0.30 4.00 5\n",
"1 0.8 0.03 0.02 5"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"a = [[2., .3, 4., 5.], [.8, .03, 0.02, 5.]]\n",
"df = pd.DataFrame(a)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" False | \n",
" False | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 1 | \n",
" False | \n",
" True | \n",
" True | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 False False False False\n",
"1 False True True False"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df <= 0.05\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" 0 | \n",
" 1 | \n",
" 2 | \n",
" 3 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2 3\n",
"0 0 0 0 0\n",
"1 0 1 1 0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.astype(int)"
]
},
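{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the two outcomes should be something other than 1 and 0, `numpy.where` is a common alternative to casting the mask. A sketch that reuses the example values from above; `vals`, `labels`, and the label strings are illustrative names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"vals = pd.DataFrame([[2., .3, 4., 5.], [.8, .03, 0.02, 5.]])\n",
"\n",
"# np.where(condition, value_if_true, value_if_false) works element-wise\n",
"labels = pd.DataFrame(\n",
"    np.where(vals <= 0.05, 'low', 'high'),\n",
"    index=vals.index,\n",
"    columns=vals.columns,\n",
")\n",
"labels"
]
},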
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}