python_reference/tutorials/numpy_nan_quickguide.ipynb

{
 "metadata": {
  "name": "",
  "signature": "sha256:b2597ea4263c11dd6774b227e7a3a5626197c4863e6895002657fd55d02b55d9"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to python_reference](https://github.com/rasbt/python_reference)]"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%load_ext watermark"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%watermark -v -p numpy -d -u"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Last updated: 31/07/2014 \n",
        "\n",
        "CPython 3.4.1\n",
        "IPython 2.1.0\n",
        "\n",
        "numpy 1.8.1\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<font size=\"1.5em\">[More information](https://github.com/rasbt/watermark) about the `watermark` magic command extension.</font>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Quick guide for dealing with missing numbers in NumPy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This is just a quick overview of how to deal with missing values (i.e., \"NaN\"s for \"Not-a-Number\") in NumPy and I am happy to expand it over time. Yes, and there will also be a separate one for pandas some time!\n",
      "\n",
      "I would be happy to hear your comments and suggestions. \n",
      "Please feel free to drop me a note via\n",
      "[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/+SebastianRaschka).\n",
      "<hr>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Sections"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- [Sample data from a CSV file](#Sample-data-from-a-CSV-file)\n",
      "- [Determining if a value is missing](#Determining-if-a-value-is-missing)\n",
      "- [Counting the number of missing values](#Counting-the-number-of-missing-values)\n",
      "- [Calculating the sum of an array that contains NaNs](#Calculating-the-sum-of-an-array-that-contains-NaNs)\n",
      "- [Removing all rows that contain missing values](#Removing-all-rows-that-contain-missing-values)\n",
      "- [Convert missing values to 0](#Convert-missing-values-to-0)\n",
      "- [Converting certain numbers to NaN](#Converting-certain-numbers-to-NaN)\n",
      "- [Remove all missing elements from an array](#Remove-all-missing-elements-from-an-array)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Sample data from a CSV file"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's assume that we have a CSV file with missing elements like the one shown below."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%file example.csv\n",
      "1,2,3,4\n",
      "5,6,,8\n",
      "10,11,12,"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Writing example.csv\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "By default, the `np.genfromtxt` function translates empty fields into `np.nan` when it reads floating-point data; its `missing_values` and `filling_values` parameters let us customize which strings count as missing and what they are replaced with. This allows us to construct a new NumPy `ndarray` object, even if elements are missing."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "ary = np.genfromtxt('./example.csv', delimiter=',')\n",
      "\n",
      "print('%s x %s array:\\n' %(ary.shape[0], ary.shape[1]))\n",
      "print(ary)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3 x 4 array:\n",
        "\n",
        "[[  1.   2.   3.   4.]\n",
        " [  5.   6.  nan   8.]\n",
        " [ 10.  11.  12.  nan]]\n"
       ]
      }
     ],
     "prompt_number": 4
    },
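    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a minimal sketch of how this default can be changed, the following cell reads the same three rows from an in-memory string (so it does not depend on `example.csv`) and uses the `filling_values` parameter of `np.genfromtxt` to substitute a placeholder for empty fields. The names `csv_text` and `ary_filled` and the placeholder `-1.0` are illustrative assumptions, and the cell assumes a recent NumPy version that accepts text streams."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import io\n",
      "import numpy as np\n",
      "\n",
      "# Hypothetical in-memory copy of the CSV content shown above\n",
      "csv_text = '1,2,3,4\\n5,6,,8\\n10,11,12,'\n",
      "\n",
      "# filling_values substitutes the given value for empty fields (instead of nan)\n",
      "ary_filled = np.genfromtxt(io.StringIO(csv_text), delimiter=',', filling_values=-1.0)\n",
      "print(ary_filled)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },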
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Determining if a value is missing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A handy way to test whether a value is `NaN` is the `np.isnan` function."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.isnan(np.nan)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`np.isnan` also works element-wise on whole arrays, which makes it especially useful for creating Boolean masks for the so-called \"fancy indexing\" of NumPy arrays; we will come back to this later."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.isnan(ary)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "array([[False, False, False, False],\n",
        "       [False, False,  True, False],\n",
        "       [False, False, False,  True]], dtype=bool)"
       ]
      }
     ],
     "prompt_number": 6
    },
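    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Why a dedicated function is needed can be sketched as follows: under floating-point rules, `NaN` compares unequal to every value, including itself, so an equality test never finds it. The array `values` below is a made-up example."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "values = np.array([1.0, np.nan, 3.0])  # hypothetical example data\n",
      "\n",
      "# Equality comparisons never match NaN, not even NaN itself\n",
      "print(np.nan == np.nan)\n",
      "print(values == np.nan)\n",
      "\n",
      "# np.isnan flags the missing entry element-wise\n",
      "print(np.isnan(values))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },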
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Counting the number of missing values"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To find out how many elements are missing in our array, we can combine the `np.isnan` function from the previous section with `np.count_nonzero`, which counts the `True` entries of the Boolean mask."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.count_nonzero(np.isnan(ary))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "2"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If we want to determine the number of non-missing elements, we can simply invert the returned Boolean mask via the `~` (\"tilde\") operator."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.count_nonzero(~np.isnan(ary))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "10"
       ]
      }
     ],
     "prompt_number": 8
    },
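    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Because the mask is just an array of `True`/`False` values, it can also be summed along an axis to count the missing values per column or per row. This is a small sketch on a made-up array `demo`, so that the `ary` object from above stays untouched."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Hypothetical example data with three missing values\n",
      "demo = np.array([[1.0, np.nan, 3.0],\n",
      "                 [4.0, np.nan, np.nan]])\n",
      "\n",
      "nan_mask = np.isnan(demo)\n",
      "\n",
      "# True counts as 1 and False as 0 when summing a Boolean array\n",
      "print('missing per column:', nan_mask.sum(axis=0))\n",
      "print('missing per row:', nan_mask.sum(axis=1))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },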
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Calculating the sum of an array that contains `NaN`s"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we will find out via the following code snippet, we can't use NumPy's regular `sum` function to calculate the sum of an array that contains `NaN`s: a single `NaN` propagates through the addition, so the result is `nan`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.sum(ary)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "nan"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Since `np.sum` returns `nan` here, we can use `np.nansum` instead, which treats `NaN`s as zero:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print('total sum:', np.nansum(ary))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "total sum: 62.0\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print('column sums:', np.nansum(ary, axis=0))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "column sums: [ 16.  19.  15.  12.]\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print('row sums:', np.nansum(ary, axis=1))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "row sums: [ 10.  19.  33.]\n"
       ]
      }
     ],
     "prompt_number": 12
    },
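    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "NumPy offers further `NaN`-aware counterparts of common reductions, for example `np.nanmean`, `np.nanmin`, and `np.nanmax`. A brief sketch on a made-up array `demo`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Hypothetical example data with two missing values\n",
      "demo = np.array([[1.0, 2.0, np.nan],\n",
      "                 [4.0, np.nan, 6.0]])\n",
      "\n",
      "# Each function ignores the NaN entries along the chosen axis\n",
      "print('column means:', np.nanmean(demo, axis=0))\n",
      "print('row minima:', np.nanmin(demo, axis=1))\n",
      "print('row maxima:', np.nanmax(demo, axis=1))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },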
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Removing all rows that contain missing values"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here, we use a Boolean mask again to return only those rows that don't contain missing values: `np.isnan(ary).any(1)` flags every row (axis 1) that has at least one `NaN`, and `~` inverts that flag. If we want to get only the rows that contain `NaN`s, we can simply drop the `~`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ary[~np.isnan(ary).any(1)]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 14,
       "text": [
        "array([[ 1.,  2.,  3.,  4.]])"
       ]
      }
     ],
     "prompt_number": 14
    },
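    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The same idea can be applied to columns. In this sketch (with a made-up array `demo`), `any(axis=0)` flags every column that contains at least one `NaN`, and the inverted mask is applied to the second index to keep only the complete columns."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Hypothetical example data: only the first column is complete\n",
      "demo = np.array([[1.0, 2.0, np.nan],\n",
      "                 [4.0, np.nan, 6.0]])\n",
      "\n",
      "# One Boolean per column: True if the column has no NaN\n",
      "complete_cols = ~np.isnan(demo).any(axis=0)\n",
      "\n",
      "# Keep all rows, but only the complete columns\n",
      "print(demo[:, complete_cols])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },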
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Convert missing values to 0"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Certain operations, algorithms, and other analyses might not work with `NaN` objects in our data array. But that's not a problem: the convenient `np.nan_to_num` function converts every `NaN` to the value 0 (and infinities to very large finite numbers)."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ary0 = np.nan_to_num(ary)\n",
      "ary0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "array([[  1.,   2.,   3.,   4.],\n",
        "       [  5.,   6.,   0.,   8.],\n",
        "       [ 10.,  11.,  12.,   0.]])"
       ]
      }
     ],
     "prompt_number": 15
    },
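    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Replacing missing values with 0 is not always appropriate. As one possible alternative, this sketch fills each `NaN` with the mean of its column, computed via `np.nanmean`. The array `demo` and this choice of replacement value are illustrative assumptions, not a general recommendation."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Hypothetical example data with one missing value per column\n",
      "demo = np.array([[1.0, 2.0],\n",
      "                 [np.nan, 4.0],\n",
      "                 [5.0, np.nan]])\n",
      "\n",
      "# Per-column means that ignore the NaN entries\n",
      "col_means = np.nanmean(demo, axis=0)\n",
      "\n",
      "# Where the mask is True take the column mean, otherwise keep the value\n",
      "demo_filled = np.where(np.isnan(demo), col_means, demo)\n",
      "print(demo_filled)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },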
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Converting certain numbers to NaN"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Vice versa, we can also convert any number to `np.nan`. Here, we use the array that we created in the previous section and convert the `0`s back to `np.nan`. Note that this replaces every 0 in the array, including zeros that were genuine data."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ary0[ary0==0] = np.nan\n",
      "ary0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "array([[  1.,   2.,   3.,   4.],\n",
        "       [  5.,   6.,  nan,   8.],\n",
        "       [ 10.,  11.,  12.,  nan]])"
       ]
      }
     ],
     "prompt_number": 16
    },
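    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A typical use case is a data set that encodes missing values with a sentinel number. The sketch below assumes a made-up sentinel of `-999` and a made-up array `raw`; unlike the in-place assignment above, `np.where` returns a new array and leaves its input unchanged."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "SENTINEL = -999.0  # hypothetical marker for a missing value\n",
      "raw = np.array([[1.0, SENTINEL],\n",
      "                [3.0, 4.0]])\n",
      "\n",
      "# Build a new array in which every sentinel is replaced by NaN\n",
      "cleaned = np.where(raw == SENTINEL, np.nan, raw)\n",
      "print(cleaned)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },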
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Remove all missing elements from an array"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "[[back to top](#Sections)]"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This one is a little bit trickier. We can remove missing values via a combination of the Boolean mask and fancy indexing; however, this has the disadvantage that it flattens our array (we can't just punch holes into a NumPy array)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ary[~np.isnan(ary)]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "array([  1.,   2.,   3.,   4.,   5.,   6.,   8.,  10.,  11.,  12.])"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Thus, this method works better on individual rows:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = np.array([1, 2, np.nan])\n",
      "\n",
      "x[~np.isnan(x)]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "array([ 1.,  2.])"
       ]
      }
     ],
     "prompt_number": 21
    }
   ],
   "metadata": {}
  }
 ]
}