{ "metadata": { "name": "", "signature": "sha256:01adffebfb99d8e7a86af443b9d14ca7695efc917465ea85868cc42681d6e96b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Back to the GitHub repository](https://github.com/rasbt/python_reference)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%load_ext watermark\n", "%watermark -a 'Sebastian Raschka' -v -d -p pandas" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Sebastian Raschka 24/01/2015 \n", "\n", "CPython 3.4.2\n", "IPython 2.3.1\n", "\n", "pandas 0.15.2\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Things in Pandas I Wish I'd Known Earlier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a small but growing collection of pandas snippets that I occasionally find particularly useful -- consider it my personal notebook. Suggestions, tips, and contributions are very, very welcome!" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Sections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Loading Some Example Data](#Loading-Some-Example-Data)\n", "- [Renaming Columns](#Renaming-Columns)\n", "- [Applying Computations Rows-wise](#Applying-Computations-Rows-wise)\n", "- [Missing Values aka NaNs](#Missing-Values-aka-NaNs)\n", " - [Selecting NaN Rows](#Selecting-NaN-Rows)\n", " - [Dropping NaN Rows](#Dropping-NaN-Rows)\n", "- [Appending Rows to a DataFrame](#Appending-Rows-to-a-DataFrame)\n", "- [Sorting and Reindexing DataFrames](#Sorting-and-Reindexing-DataFrames)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Loading Some Example Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I am heavily into sports prediction (via a machine learning approach) these days. So, let us use a (very) small subset of the soccer data that I am currently working with." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "\n", "df = pd.read_csv('https://raw.githubusercontent.com/rasbt/python_reference/master/Data/some_soccer_data.csv')\n", "df" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYER SALARY GP G A SOT PPG P
0 Sergio Ag\u00fcero\\n Forward \u2014 Manchester City $19.2m 16 14 3 34 13.12 209.98
1 Eden Hazard\\n Midfield \u2014 Chelsea $18.9m 21 8 4 17 13.05 274.04
2 Alexis S\u00e1nchez\\n Forward \u2014 Arsenal $17.6m NaN 12 7 29 11.19 223.86
3 Yaya Tour\u00e9\\n Midfield \u2014 Manchester City $16.6m 18 7 1 19 10.99 197.91
4 \u00c1ngel Di Mar\u00eda\\n Midfield \u2014 Manchester United $15.0m 13 3 NaN 13 10.17 132.23
5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN 20 9.97 NaN
6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 11 10.35 155.26
7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 10.47 209.49
8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 7.02 147.43
9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 7.50 150.01
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ " PLAYER SALARY GP G A SOT \\\n", "0 Sergio Ag\u00fcero\\n Forward \u2014 Manchester City $19.2m 16 14 3 34 \n", "1 Eden Hazard\\n Midfield \u2014 Chelsea $18.9m 21 8 4 17 \n", "2 Alexis S\u00e1nchez\\n Forward \u2014 Arsenal $17.6m NaN 12 7 29 \n", "3 Yaya Tour\u00e9\\n Midfield \u2014 Manchester City $16.6m 18 7 1 19 \n", "4 \u00c1ngel Di Mar\u00eda\\n Midfield \u2014 Manchester United $15.0m 13 3 NaN 13 \n", "5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN 20 \n", "6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 11 \n", "7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 \n", "8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 \n", "9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 \n", "\n", " PPG P \n", "0 13.12 209.98 \n", "1 13.05 274.04 \n", "2 11.19 223.86 \n", "3 10.99 197.91 \n", "4 10.17 132.23 \n", "5 9.97 NaN \n", "6 10.35 155.26 \n", "7 10.47 209.49 \n", "8 7.02 147.43 \n", "9 7.50 150.01 " ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Renaming Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Converting column names to lowercase\n", "\n", "df.columns = [c.lower() for c in df.columns]\n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary gp g a sot ppg p
5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN 20 9.97 NaN
6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 11 10.35 155.26
7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 10.47 209.49
8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 7.02 147.43
9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 7.50 150.01
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ " player salary gp g a sot ppg \\\n", "5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN 20 9.97 \n", "6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 11 10.35 \n", "7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 10.47 \n", "8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 7.02 \n", "9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 7.50 \n", "\n", " p \n", "5 NaN \n", "6 155.26 \n", "7 209.49 \n", "8 147.43 \n", "9 150.01 " ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# Renaming particular columns\n", "\n", "df = df.rename(columns={'p': 'points', \n", " 'gp': 'games',\n", " 'sot': 'shots_on_target',\n", " 'g': 'goals',\n", " 'ppg': 'points_per_game',\n", " 'a': 'assists',})\n", "\n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points
5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN 20 9.97 NaN
6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 11 10.35 155.26
7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 10.47 209.49
8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 7.02 147.43
9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 7.50 150.01
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ " player salary games goals assists \\\n", "5 Santiago Cazorla\\n Midfield \u2014 Arsenal $14.8m 20 4 NaN \n", "6 David Silva\\n Midfield \u2014 Manchester City $14.3m 15 6 2 \n", "7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 \n", "8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 \n", "9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 \n", "\n", " shots_on_target points_per_game points \n", "5 20 9.97 NaN \n", "6 11 10.35 155.26 \n", "7 10 10.47 209.49 \n", "8 20 7.02 147.43 \n", "9 11 7.50 150.01 " ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
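As an aside, here is a possible shortcut for the lowercase conversion: `DataFrame.rename` also accepts a function for `columns`, so the list comprehension is optional. A minimal sketch on a made-up two-column frame (`toy` and its values are placeholders for this example, not the soccer data):

```python
import pandas as pd

# Toy frame for illustration only; the values are placeholders.
toy = pd.DataFrame({'PLAYER': ['A', 'B'], 'GP': [16, 21]})

# rename applies the callable to every column label.
toy = toy.rename(columns=str.lower)

# A dict then renames selected columns, as in the cell above.
toy = toy.rename(columns={'gp': 'games'})

print(list(toy.columns))
```

Both calls return a new DataFrame, so the result has to be assigned back as shown.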
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Applying Computations Rows-wise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Creating a new column\n", "\n", "df['team'] = pd.Series('', index=df.index)\n", "df.tail(3)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team
7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 10 10.47 209.49
8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 20 7.02 147.43
9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 11 7.50 150.01
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ " player salary games goals assists \\\n", "7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea $14.0m 20 2 14 \n", "8 Saido Berahino\\n Forward \u2014 West Brom $13.8m 21 9 0 \n", "9 Steven Gerrard\\n Midfield \u2014 Liverpool $13.8m 20 5 1 \n", "\n", " shots_on_target points_per_game points team \n", "7 10 10.47 209.49 \n", "8 20 7.02 147.43 \n", "9 11 7.50 150.01 " ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "# Processing `salary` column\n", "\n", "df['salary'] = df['salary'].apply(lambda x: x.strip('$m'))\n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team
5 Santiago Cazorla\\n Midfield \u2014 Arsenal 14.8 20 4 NaN 20 9.97 NaN
6 David Silva\\n Midfield \u2014 Manchester City 14.3 15 6 2 11 10.35 155.26
7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea 14.0 20 2 14 10 10.47 209.49
8 Saido Berahino\\n Forward \u2014 West Brom 13.8 21 9 0 20 7.02 147.43
9 Steven Gerrard\\n Midfield \u2014 Liverpool 13.8 20 5 1 11 7.50 150.01
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ " player salary games goals assists \\\n", "5 Santiago Cazorla\\n Midfield \u2014 Arsenal 14.8 20 4 NaN \n", "6 David Silva\\n Midfield \u2014 Manchester City 14.3 15 6 2 \n", "7 Cesc F\u00e0bregas\\n Midfield \u2014 Chelsea 14.0 20 2 14 \n", "8 Saido Berahino\\n Forward \u2014 West Brom 13.8 21 9 0 \n", "9 Steven Gerrard\\n Midfield \u2014 Liverpool 13.8 20 5 1 \n", "\n", " shots_on_target points_per_game points team \n", "5 20 9.97 NaN \n", "6 11 10.35 155.26 \n", "7 10 10.47 209.49 \n", "8 20 7.02 147.43 \n", "9 11 7.50 150.01 " ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "# Processing `player` column\n", "\n", "def process_player_col(text):\n", " name, rest = text.split('\\n')\n", " position, team = rest.split(' \u2014 ')\n", " return name, position, team\n", "\n", "# `.loc` is used here because the `.ix` indexer was removed in pandas 1.0\n", "for idx, row in df.iterrows():\n", " name, position, team = process_player_col(row['player'])\n", " df.loc[idx, 'player'], df.loc[idx, 'position'], df.loc[idx, 'team'] = name, position, team\n", " \n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
5 Santiago Cazorla 14.8 20 4 NaN 20 9.97 NaN Arsenal Midfield
6 David Silva 14.3 15 6 2 11 10.35 155.26 Manchester City Midfield
7 Cesc F\u00e0bregas 14.0 20 2 14 10 10.47 209.49 Chelsea Midfield
8 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
9 Steven Gerrard 13.8 20 5 1 11 7.50 150.01 Liverpool Midfield
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " player salary games goals assists shots_on_target \\\n", "5 Santiago Cazorla 14.8 20 4 NaN 20 \n", "6 David Silva 14.3 15 6 2 11 \n", "7 Cesc F\u00e0bregas 14.0 20 2 14 10 \n", "8 Saido Berahino 13.8 21 9 0 20 \n", "9 Steven Gerrard 13.8 20 5 1 11 \n", "\n", " points_per_game points team position \n", "5 9.97 NaN Arsenal Midfield \n", "6 10.35 155.26 Manchester City Midfield \n", "7 10.47 209.49 Chelsea Midfield \n", "8 7.02 147.43 West Brom Forward \n", "9 7.50 150.01 Liverpool Midfield " ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
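A side note on the row-by-row loop above: it is fine for ten rows but gets slow on larger frames. One possible alternative is the vectorized `.str` accessor. This is a minimal sketch on a two-row toy frame, assuming the raw `player` strings look like the ones printed earlier (name, line break, position, spaced dash, team). The names `toy`, `NEWLINE`, and `SEP` are made up for this example.

```python
import pandas as pd

NEWLINE = chr(10)            # the line-feed character between name and the rest
SEP = ' ' + chr(8212) + ' '  # the spaced dash (U+2014) between position and team

# Two rows in the assumed layout of the raw `player` column.
toy = pd.DataFrame({'player': [
    'Eden Hazard' + NEWLINE + ' Midfield' + SEP + 'Chelsea',
    'Saido Berahino' + NEWLINE + ' Forward' + SEP + 'West Brom',
]})

# Split once on the line break, then once on the separator.
parts = toy['player'].str.split(NEWLINE, n=1, expand=True)
rest = parts[1].str.split(SEP, n=1, expand=True)

# Strip stray whitespace and assign whole columns at once.
toy['player'] = parts[0].str.strip()
toy['position'] = rest[0].str.strip()
toy['team'] = rest[1].str.strip()

print(toy)
```

With `expand=True` each split returns a DataFrame with integer column labels, which is why the pieces are picked out as `parts[0]`, `rest[1]`, and so on.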
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Missing Values aka NaNs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Selecting NaN Rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Selecting all rows that have NaNs in the `assists` column\n", "df[df['assists'].isnull()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
4 \u00c1ngel Di Mar\u00eda 15.0 13 3 NaN 13 10.17 132.23 Manchester United Midfield
5 Santiago Cazorla 14.8 20 4 NaN 20 9.97 NaN Arsenal Midfield
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ " player salary games goals assists shots_on_target \\\n", "4 \u00c1ngel Di Mar\u00eda 15.0 13 3 NaN 13 \n", "5 Santiago Cazorla 14.8 20 4 NaN 20 \n", "\n", " points_per_game points team position \n", "4 10.17 132.23 Manchester United Midfield \n", "5 9.97 NaN Arsenal Midfield " ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Dropping NaN Rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Dropping all rows that have NaNs in the `assists` column\n", "\n", "df[df['assists'].notnull()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
0 Sergio Ag\u00fcero 19.2 16 14 3 34 13.12 209.98 Manchester City Forward
1 Eden Hazard 18.9 21 8 4 17 13.05 274.04 Chelsea Midfield
2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 11.19 223.86 Arsenal Forward
3 Yaya Tour\u00e9 16.6 18 7 1 19 10.99 197.91 Manchester City Midfield
6 David Silva 14.3 15 6 2 11 10.35 155.26 Manchester City Midfield
7 Cesc F\u00e0bregas 14.0 20 2 14 10 10.47 209.49 Chelsea Midfield
8 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
9 Steven Gerrard 13.8 20 5 1 11 7.50 150.01 Liverpool Midfield
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ " player salary games goals assists shots_on_target \\\n", "0 Sergio Ag\u00fcero 19.2 16 14 3 34 \n", "1 Eden Hazard 18.9 21 8 4 17 \n", "2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 \n", "3 Yaya Tour\u00e9 16.6 18 7 1 19 \n", "6 David Silva 14.3 15 6 2 11 \n", "7 Cesc F\u00e0bregas 14.0 20 2 14 10 \n", "8 Saido Berahino 13.8 21 9 0 20 \n", "9 Steven Gerrard 13.8 20 5 1 11 \n", "\n", " points_per_game points team position \n", "0 13.12 209.98 Manchester City Forward \n", "1 13.05 274.04 Chelsea Midfield \n", "2 11.19 223.86 Arsenal Forward \n", "3 10.99 197.91 Manchester City Midfield \n", "6 10.35 155.26 Manchester City Midfield \n", "7 10.47 209.49 Chelsea Midfield \n", "8 7.02 147.43 West Brom Forward \n", "9 7.50 150.01 Liverpool Midfield " ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
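The boolean mask above can also be written with `DataFrame.dropna` and its `subset` parameter, which limits the NaN check to the listed columns. A minimal sketch on a made-up frame (`toy` and its values are placeholders for this example):

```python
import numpy as np
import pandas as pd

# Toy frame with NaNs in two different columns.
toy = pd.DataFrame({'player': ['A', 'B', 'C'],
                    'assists': [3.0, np.nan, 1.0],
                    'points': [np.nan, 132.23, 150.01]})

# Drop rows where `assists` is missing; NaNs in other columns are ignored.
kept = toy.dropna(subset=['assists'])

print(kept)
```

Like the mask, `dropna` returns a new DataFrame and leaves the original untouched unless the result is assigned back.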
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Appending Rows to a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Adding an \"empty\" row to the DataFrame\n", "\n", "df = df.append(pd.Series(\n", " [None]*len(df.columns), # Fill cells with NaNs\n", " index=df.columns), \n", " ignore_index=True)\n", "\n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
6 David Silva 14.3 15 6 2 11 10.35 155.26 Manchester City Midfield
7 Cesc F\u00e0bregas 14.0 20 2 14 10 10.47 209.49 Chelsea Midfield
8 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
9 Steven Gerrard 13.8 20 5 1 11 7.50 150.01 Liverpool Midfield
10 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ " player salary games goals assists shots_on_target \\\n", "6 David Silva 14.3 15 6 2 11 \n", "7 Cesc F\u00e0bregas 14.0 20 2 14 10 \n", "8 Saido Berahino 13.8 21 9 0 20 \n", "9 Steven Gerrard 13.8 20 5 1 11 \n", "10 NaN NaN NaN NaN NaN NaN \n", "\n", " points_per_game points team position \n", "6 10.35 155.26 Manchester City Midfield \n", "7 10.47 209.49 Chelsea Midfield \n", "8 7.02 147.43 West Brom Forward \n", "9 7.50 150.01 Liverpool Midfield \n", "10 NaN NaN NaN NaN " ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "# Filling cells with data\n", "\n", "df.loc[df.index[-1], 'player'] = 'New Player'\n", "df.loc[df.index[-1], 'salary'] = 12.3\n", "df.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
6 David Silva 14.3 15 6 2 11 10.35 155.26 Manchester City Midfield
7 Cesc F\u00e0bregas 14.0 20 2 14 10 10.47 209.49 Chelsea Midfield
8 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
9 Steven Gerrard 13.8 20 5 1 11 7.50 150.01 Liverpool Midfield
10 New Player 12.3 NaN NaN NaN NaN NaN NaN NaN NaN
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ " player salary games goals assists shots_on_target \\\n", "6 David Silva 14.3 15 6 2 11 \n", "7 Cesc F\u00e0bregas 14.0 20 2 14 10 \n", "8 Saido Berahino 13.8 21 9 0 20 \n", "9 Steven Gerrard 13.8 20 5 1 11 \n", "10 New Player 12.3 NaN NaN NaN NaN \n", "\n", " points_per_game points team position \n", "6 10.35 155.26 Manchester City Midfield \n", "7 10.47 209.49 Chelsea Midfield \n", "8 7.02 147.43 West Brom Forward \n", "9 7.50 150.01 Liverpool Midfield \n", "10 NaN NaN NaN NaN " ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
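A note for newer pandas versions: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, where `pd.concat` takes its place. This is a minimal sketch of the same idea on a made-up frame (`toy`, `new_row`, and the values are placeholders for this example). Columns missing from the new row are filled with NaN.

```python
import pandas as pd

# Toy frame standing in for the soccer data.
toy = pd.DataFrame({'player': ['A', 'B'],
                    'salary': [14.3, 13.8],
                    'goals': [6, 9]})

# A one-row frame holding only the known cells.
new_row = pd.DataFrame([{'player': 'New Player', 'salary': 12.3}])

# ignore_index=True renumbers the rows 0..n-1, as in the append call above.
toy = pd.concat([toy, new_row], ignore_index=True)

print(toy.tail())
```

Because the new row has no `goals` value, that column is converted to floats so it can hold the NaN.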
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Sorting and Reindexing DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to section overview](#Sections)]" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Sorting the DataFrame by a certain column (from highest to lowest)\n", "# Note: `DataFrame.sort` was removed in pandas 0.20; on newer versions use\n", "# df.sort_values('goals', ascending=False) instead\n", "\n", "df = df.sort('goals', ascending=False)\n", "df.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
0 Sergio Ag\u00fcero 19.2 16 14 3 34 13.12 209.98 Manchester City Forward
2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 11.19 223.86 Arsenal Forward
8 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
1 Eden Hazard 18.9 21 8 4 17 13.05 274.04 Chelsea Midfield
3 Yaya Tour\u00e9 16.6 18 7 1 19 10.99 197.91 Manchester City Midfield
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ " player salary games goals assists shots_on_target \\\n", "0 Sergio Ag\u00fcero 19.2 16 14 3 34 \n", "2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 \n", "8 Saido Berahino 13.8 21 9 0 20 \n", "1 Eden Hazard 18.9 21 8 4 17 \n", "3 Yaya Tour\u00e9 16.6 18 7 1 19 \n", "\n", " points_per_game points team position \n", "0 13.12 209.98 Manchester City Forward \n", "2 11.19 223.86 Arsenal Forward \n", "8 7.02 147.43 West Brom Forward \n", "1 13.05 274.04 Chelsea Midfield \n", "3 10.99 197.91 Manchester City Midfield " ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "# Reindexing the DataFrame after sorting\n", "\n", "df.index = range(1,len(df.index)+1)\n", "df.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
player salary games goals assists shots_on_target points_per_game points team position
1 Sergio Ag\u00fcero 19.2 16 14 3 34 13.12 209.98 Manchester City Forward
2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 11.19 223.86 Arsenal Forward
3 Saido Berahino 13.8 21 9 0 20 7.02 147.43 West Brom Forward
4 Eden Hazard 18.9 21 8 4 17 13.05 274.04 Chelsea Midfield
5 Yaya Tour\u00e9 16.6 18 7 1 19 10.99 197.91 Manchester City Midfield
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ " player salary games goals assists shots_on_target \\\n", "1 Sergio Ag\u00fcero 19.2 16 14 3 34 \n", "2 Alexis S\u00e1nchez 17.6 NaN 12 7 29 \n", "3 Saido Berahino 13.8 21 9 0 20 \n", "4 Eden Hazard 18.9 21 8 4 17 \n", "5 Yaya Tour\u00e9 16.6 18 7 1 19 \n", "\n", " points_per_game points team position \n", "1 13.12 209.98 Manchester City Forward \n", "2 11.19 223.86 Arsenal Forward \n", "3 7.02 147.43 West Brom Forward \n", "4 13.05 274.04 Chelsea Midfield \n", "5 10.99 197.91 Manchester City Midfield " ] } ], "prompt_number": 13 } ], "metadata": {} } ] }