python_reference/.ipynb_checkpoints/timeit_tests-checkpoint.ipynb
2014-05-01 16:07:40 -04:00

1817 lines
43 KiB
Plaintext

{
"metadata": {
"name": "",
"signature": "sha256:d5895f75b2ac58db150d7b521682366a447ffb2fb0b7db7e551edd40e6d1ab10"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sebastian Raschka \n",
"last updated: 04/14/2014 \n",
"\n",
"[Link to this IPython Notebook on GitHub](https://github.com/rasbt/python_reference/blob/master/benchmarks/timeit_tests.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<hr>\n",
"I am really looking forward to your comments and suggestions to improve and extend this collection! Just send me a quick note \n",
"via Twitter: [@rasbt](https://twitter.com/rasbt) \n",
"or Email: [bluewoodtree@gmail.com](mailto:bluewoodtree@gmail.com)\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"# Python benchmarks via `timeit`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"sections\"></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sections\n",
"- [String operations](#string_operations)\n",
" - [String formatting: .format() vs. binary operator %s](#str_format_bin)\n",
" - [String reversing: [::-1] vs. `''.join(reversed())`](#str_reverse)\n",
" - [String concatenation: `+=` vs. `''.join()`](#string_concat)\n",
" - [Assembling strings](#string_assembly) \n",
" - [Testing if a string is an integer](#is_integer)\n",
" - [Testing if a string is a number](#is_number)\n",
"- [List operations](#list_operations)\n",
" - [List reversing: [::-1] vs. reverse() vs. reversed()](#list_reverse)\n",
" - [Creating lists using conditional statements](#create_cond_list)\n",
"- [Dictionary operations](#dict_ops) \n",
" - [Adding elements to a dictionary](#adding_dict_elements)\n",
"- [Comprehensions vs. for-loops](#comprehensions)\n",
"- [Copying files by searching directory trees](#find_copy)\n",
"- [Returning column vectors slicing through a numpy array](#row_vectors)\n",
"- [Speed of numpy functions vs Python built-ins and std. lib.](#numpy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='string_operations'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# String operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='str_format_bin'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## String formatting: `.format()` vs. binary operator `%s`\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We expect the string .format() method to perform slower than %, because it is doing the formatting for each object itself, where formatting via the binary % is hard-coded for known types. But let's see how big the difference really is..."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def test_format():\n",
" return ['{}'.format(i) for i in range(1000000)]\n",
"\n",
"def test_binaryop():\n",
" return ['%s' %i for i in range(1000000)]\n",
"\n",
"%timeit test_format()\n",
"%timeit test_binaryop()\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 400 ms per loop\n",
"1 loops, best of 3: 241 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='str_reverse'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## String reversing: `[::-1]` vs. `''.join(reversed())`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def reverse_join(my_str):\n",
" return ''.join(reversed(my_str))\n",
" \n",
"def reverse_slizing(my_str):\n",
" return my_str[::-1]\n",
"\n",
"\n",
"# Test to show that both work\n",
"a = reverse_join('abcd')\n",
"b = reverse_slizing('abcd')\n",
"assert(a == b and a == 'dcba')\n",
"\n",
"%timeit reverse_join('abcd')\n",
"%timeit reverse_slizing('abcd')\n",
"\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.4 GHz Intel Core Duo\n",
"# 8 GB 1067 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 1.28 \u00b5s per loop\n",
"1000000 loops, best of 3: 337 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='string_concat'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## String concatenation: `+=` vs. `''.join()`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Strings in Python are immutable objects. So, each time we append a character to a string, it has to be created \u201cfrom scratch\u201d in memory. Thus, the answer to the question \u201cWhat is the most efficient way to concatenate strings?\u201d is a quite obvious, but the relative numbers of performance gains are nonetheless interesting."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def string_add(in_chars):\n",
" new_str = ''\n",
" for char in in_chars:\n",
" new_str += char\n",
" return new_str\n",
"\n",
"def string_join(in_chars):\n",
" return ''.join(in_chars)\n",
"\n",
"test_chars = ['a', 'b', 'c', 'd', 'e', 'f']\n",
"\n",
"%timeit string_add(test_chars)\n",
"%timeit string_join(test_chars)\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 595 ns per loop\n",
"1000000 loops, best of 3: 269 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<a name='string_assembly'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Assembling strings\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, I wanted to compare different methods string \u201cassembly.\u201d This is different from simple string concatenation, which we have seen in the previous section, since we insert values into a string, e.g., from a variable."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def plus_operator():\n",
" return 'a' + str(1) + str(2) \n",
" \n",
"def format_method():\n",
" return 'a{}{}'.format(1,2)\n",
" \n",
"def binary_operator():\n",
" return 'a%s%s' %(1,2)\n",
"\n",
"%timeit plus_operator()\n",
"%timeit format_method()\n",
"%timeit binary_operator()\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 764 ns per loop\n",
"1000000 loops, best of 3: 494 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000000 loops, best of 3: 79.3 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<a name='is_integer'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Testing if a string is an integer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def string_is_int(a_str):\n",
" try:\n",
" int(a_str)\n",
" return True\n",
" except ValueError:\n",
" return False\n",
"\n",
"an_int = '123'\n",
"no_int = '123abc'\n",
"\n",
"%timeit string_is_int(an_int)\n",
"%timeit string_is_int(no_int)\n",
"%timeit an_int.isdigit()\n",
"%timeit no_int.isdigit()\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 401 ns per loop\n",
"100000 loops, best of 3: 3.04 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000000 loops, best of 3: 92.1 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000000 loops, best of 3: 96.3 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='is_number'></a>\n",
"<br>\n",
"<br>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Testing if a string is a number"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def string_is_number(a_str):\n",
" try:\n",
" float(a_str)\n",
" return True\n",
" except ValueError:\n",
" return False\n",
" \n",
"a_float = '1.234'\n",
"no_float = '123abc'\n",
"\n",
"a_float.replace('.','',1).isdigit()\n",
"no_float.replace('.','',1).isdigit()\n",
"\n",
"%timeit string_is_number(an_int)\n",
"%timeit string_is_number(no_int)\n",
"%timeit a_float.replace('.','',1).isdigit()\n",
"%timeit no_float.replace('.','',1).isdigit()\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 400 ns per loop\n",
"1000000 loops, best of 3: 1.15 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"1000000 loops, best of 3: 452 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"1000000 loops, best of 3: 394 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"<a name='list_operations'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# List operations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='list_reverse'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## List reversing - `[::-1]` vs. `reverse()` vs. `reversed()`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def reverse_func(my_list):\n",
" new_list = my_list[:]\n",
" new_list.reverse()\n",
" return new_list\n",
" \n",
"def reversed_func(my_list):\n",
" return list(reversed(my_list))\n",
"\n",
"def reverse_slizing(my_list):\n",
" return my_list[::-1]\n",
"\n",
"%timeit reverse_func([1,2,3,4,5])\n",
"%timeit reversed_func([1,2,3,4,5])\n",
"%timeit reverse_slizing([1,2,3,4,5])\n",
"\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.4 GHz Intel Core Duo\n",
"# 8 GB 1067 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1000000 loops, best of 3: 930 ns per loop\n",
"1000000 loops, best of 3: 1.89 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"1000000 loops, best of 3: 775 ns per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='create_cond_list'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Creating lists using conditional statements\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"In this test, I attempted to figure out the fastest way to create a new list of elements that meet a certain criterion. For the sake of simplicity, the criterion was to check if an element is even or odd, and only if the element was even, it should be included in the list. For example, the resulting list for numbers in the range from 1 to 10 would be \n",
"[2, 4, 6, 8, 10].\n",
"\n",
"Here, I tested three different approaches: \n",
"1) a simple for loop with an if-statement check (`cond_loop()`) \n",
"2) a list comprehension (`list_compr()`) \n",
"3) the built-in filter() function (`filter_func()`) \n",
"\n",
"Note that the filter() function now returns a generator in Python 3, so I had to wrap it in an additional list() function call."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"def cond_loop():\n",
" even_nums = []\n",
" for i in range(100):\n",
" if i % 2 == 0:\n",
" even_nums.append(i)\n",
" return even_nums\n",
"\n",
"def list_compr():\n",
" even_nums = [i for i in range(100) if i % 2 == 0]\n",
" return even_nums\n",
" \n",
"def filter_func():\n",
" even_nums = list(filter((lambda x: x % 2 != 0), range(100)))\n",
" return even_nums\n",
"\n",
"%timeit cond_loop()\n",
"%timeit list_compr()\n",
"%timeit filter_func()\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100000 loops, best of 3: 14.4 \u00b5s per loop\n",
"100000 loops, best of 3: 12 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000 loops, best of 3: 23.9 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='dict_ops'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Dictionary operations "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='adding_dict_elements'></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Adding elements to a Dictionary\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[[back to top](#sections)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"All three functions below count how often different elements (values) occur in a list. \n",
"E.g., for the list ['a', 'b', 'a', 'c'], the dictionary would look like this: \n",
"`my_dict = {'a': 2, 'b': 1, 'c': 1}`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import random\n",
"import copy\n",
"import timeit\n",
"\n",
"\n",
"\n",
"def add_element_check1(my_dict, elements):\n",
" for e in elements:\n",
" if e not in my_dict:\n",
" my_dict[e] = 1\n",
" else:\n",
" my_dict[e] += 1\n",
" \n",
"def add_element_check2(my_dict, elements):\n",
" for e in elements:\n",
" if e not in my_dict:\n",
" my_dict[e] = 0\n",
" my_dict[e] += 1 \n",
"\n",
"def add_element_except(my_dict, elements):\n",
" for e in elements:\n",
" try:\n",
" my_dict[e] += 1\n",
" except KeyError:\n",
" my_dict[e] = 1\n",
" \n",
"\n",
"random.seed(123)\n",
"rand_ints = [random.randrange(1, 10) for i in range(100)]\n",
"empty_dict = {}\n",
"\n",
"print('Results for 100 integers in range 1-10') \n",
"%timeit add_element_check1(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_check2(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_except(copy.deepcopy(empty_dict), rand_ints)\n",
" \n",
"print('\\nResults for 1000 integers in range 1-10') \n",
"rand_ints = [random.randrange(1, 10) for i in range(1000)]\n",
"empty_dict = {}\n",
"\n",
"%timeit add_element_check1(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_check2(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_except(copy.deepcopy(empty_dict), rand_ints)\n",
"\n",
"print('\\nResults for 1000 integers in range 1-1000') \n",
"rand_ints = [random.randrange(1, 10) for i in range(1000)]\n",
"empty_dict = {}\n",
"\n",
"%timeit add_element_check1(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_check2(copy.deepcopy(empty_dict), rand_ints)\n",
"%timeit add_element_except(copy.deepcopy(empty_dict), rand_ints)\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Results for 100 integers in range 1-10\n",
"100000 loops, best of 3: 16.6 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"100000 loops, best of 3: 17.6 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"100000 loops, best of 3: 17.9 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Results for 1000 integers in range 1-10\n",
"10000 loops, best of 3: 135 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000 loops, best of 3: 125 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000 loops, best of 3: 105 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Results for 1000 integers in range 1-1000\n",
"10000 loops, best of 3: 122 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000 loops, best of 3: 123 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"10000 loops, best of 3: 104 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusion\n",
"Interestingly, the `try-except` loop pays off if we have more elements (here: 1000 integers instead of 100) as dictionary keys to check. Also, it doesn't matter much whether the elements exist or do not exist in the dictionary, yet."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"comprehensions\"></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Comprehesions vs. for-loops"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comprehensions are not only shorter and prettier than ye goode olde for-loop, \n",
"but they are also up to ~1.2x faster."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"n = 1000\n",
"\n",
"#\n",
"# Python 3.4.0\n",
"# MacOS X 10.9.2\n",
"# 2.5 GHz Intel Core i5\n",
"# 4 GB 1600 Mhz DDR3\n",
"#"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set comprehensions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def set_loop(n):\n",
" a_set = set()\n",
" for i in range(n):\n",
" if i % 3 == 0:\n",
" a_set.add(i)\n",
" return a_set"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def set_compr(n):\n",
" return {i for i in range(n) if i % 3 == 0}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit set_loop(n)\n",
"%timeit set_compr(n)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 136 \u00b5s per loop\n",
"10000 loops, best of 3: 113 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## List comprehensions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def list_loop(n):\n",
" a_list = list()\n",
" for i in range(n):\n",
" if i % 3 == 0:\n",
" a_list.append(i)\n",
" return a_list"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def list_compr(n):\n",
" return [i for i in range(n) if i % 3 == 0]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit list_loop(n)\n",
"%timeit list_compr(n)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 129 \u00b5s per loop\n",
"10000 loops, best of 3: 111 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 25
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary comprehensions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def dict_loop(n):\n",
" a_dict = dict()\n",
" for i in range(n):\n",
" if i % 3 == 0:\n",
" a_dict[i] = i\n",
" return a_dict"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def dict_compr(n):\n",
" return {i:i for i in range(n) if i % 3 == 0}"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit dict_loop(n)\n",
"%timeit dict_compr(n)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10000 loops, best of 3: 121 \u00b5s per loop\n",
"10000 loops, best of 3: 127 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 28
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"find_copy\"></a>\n",
"<br>\n",
"<br>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Copying files by searching directory trees"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Executing `Unix`/`Linux` shell commands:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import subprocess\n",
"\n",
"def subprocess_findcopy(path, search_str, dest): \n",
" query = 'find %s -name \"%s\" -exec cp {} %s \\;' %(path, search_str, dest)\n",
" subprocess.call(query, shell=True)\n",
" return "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 30
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Python's `os.walk()` to search the directory tree recursively and matching patterns via `fnmatch.filter()`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import shutil\n",
"import os\n",
"import fnmatch\n",
"\n",
"def walk_findcopy(path, search_str, dest):\n",
" for path, subdirs, files in os.walk(path):\n",
" for name in fnmatch.filter(files, search_str):\n",
" try:\n",
" shutil.copy(os.path.join(path,name), dest)\n",
" except NameError:\n",
" pass\n",
" return"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"\n",
"def findcopy_timeit(inpath, outpath, search_str):\n",
" \n",
" shutil.rmtree(outpath)\n",
" os.mkdir(outpath)\n",
" print(50*'#')\n",
" print('subprocsess call')\n",
" %timeit subprocess_findcopy(inpath, search_str, outpath)\n",
" print(\"copied %s files\" %len(os.listdir(outpath)))\n",
" shutil.rmtree(outpath)\n",
" os.mkdir(outpath)\n",
" print('\\nos.walk approach')\n",
" %timeit walk_findcopy(inpath, search_str, outpath)\n",
" print(\"copied %s files\" %len(os.listdir(outpath)))\n",
" print(50*'#')\n",
"\n",
"print('small tree')\n",
"inpath = '/Users/sebastian/Desktop/testdir_in'\n",
"outpath = '/Users/sebastian/Desktop/testdir_out'\n",
"search_str = '*.png'\n",
"findcopy_timeit(inpath, outpath, search_str)\n",
"\n",
"print('larger tree')\n",
"inpath = '/Users/sebastian/Dropbox'\n",
"outpath = '/Users/sebastian/Desktop/testdir_out'\n",
"search_str = '*.csv'\n",
"findcopy_timeit(inpath, outpath, search_str)\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"small tree\n",
"##################################################"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"subprocsess call\n",
"1 loops, best of 3: 268 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"copied 13 files\n",
"\n",
"os.walk approach\n",
"100 loops, best of 3: 12.2 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"copied 13 files\n",
"##################################################\n",
"larger tree\n",
"##################################################\n",
"subprocsess call\n",
"1 loops, best of 3: 623 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"copied 105 files\n",
"\n",
"os.walk approach\n",
"1 loops, best of 3: 417 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"copied 105 files\n",
"##################################################\n"
]
}
],
"prompt_number": 35
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I have to say that I am really positively surprised. The shell's `find` scales even better than expected!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<a name='row_vectors'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Returning column vectors slicing through a numpy array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Given a numpy matrix, I want to iterate through it and return each column as a 1-column vector. \n",
"E.g., if I want to return the 1st column from matrix A below\n",
"\n",
"<pre>\n",
"A = np.array([ [1,2,3], [4,5,6], [7,8,9] ])\n",
">>> A\n",
"array([[1, 2, 3],\n",
" [4, 5, 6],\n",
" [7, 8, 9]])</pre>\n",
"\n",
"I want my result to be:\n",
"<pre>\n",
"array([[1],\n",
" [4],\n",
" [7]])</pre>\n",
"\n",
"with `.shape` = `(3,1)`\n",
"\n",
"\n",
"However, the default behavior of numpy is to return the column as a row vector:\n",
"\n",
"<pre>\n",
">>> A[:,0]\n",
"array([1, 4, 7])\n",
">>> A[:,0].shape\n",
"(3,)\n",
"</pre>"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"\n",
"# 1st column, e.g., A[:,0,np.newaxis]\n",
"\n",
"def colvec_method1(A):\n",
" for col in A.T:\n",
" colvec = row[:,np.newaxis]\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 83
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., A[:,0:1]\n",
"\n",
"def colvec_method2(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = A[:,idx:idx+1]\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 82
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., A[:,0].reshape(-1,1)\n",
"\n",
"def colvec_method3(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = A[:,idx].reshape(-1,1)\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 81
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., np.vstack(A[:,0]\n",
"\n",
"def colvec_method4(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = np.vstack(A[:,idx])\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 79
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., np.row_stack(A[:,0])\n",
"\n",
"def colvec_method5(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = np.row_stack(A[:,idx])\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 77
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., np.column_stack((A[:,0],))\n",
"\n",
"def colvec_method6(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = np.column_stack((A[:,idx],))\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 74
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# 1st column, e.g., A[:,[0]]\n",
"\n",
"def colvec_method7(A):\n",
" for idx in range(A.shape[1]):\n",
" colvec = A[:,[idx]]\n",
" yield colvec"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 89
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def test_method(method, A):\n",
" for i in method(A): \n",
" assert i.shape == (A.shape[0],1), \"{}, {}\".format(i.shape, A.shape[0],1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 69
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import timeit\n",
"\n",
"A = np.random.random((300, 3))\n",
"\n",
"for method in [\n",
" colvec_method1, colvec_method2, \n",
" colvec_method3, colvec_method4, \n",
" colvec_method5, colvec_method6,\n",
" colvec_method7]:\n",
" print('\\nTest:', method.__name__)\n",
" %timeit test_method(colvec_method2, A)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"Test: colvec_method1\n",
"100000 loops, best of 3: 16.6 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method2\n",
"10000 loops, best of 3: 16.1 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method3\n",
"100000 loops, best of 3: 16.2 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method4\n",
"100000 loops, best of 3: 16.4 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method5\n",
"100000 loops, best of 3: 16.2 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method6\n",
"100000 loops, best of 3: 16.8 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"\n",
"Test: colvec_method7\n",
"100000 loops, best of 3: 16.3 \u00b5s per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 91
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='numpy'></a>\n",
"<br>\n",
"<br>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Speed of numpy functions vs Python built-ins and std. lib."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import timeit\n",
"\n",
"samples = list(range(1000000))\n",
"\n",
"%timeit(sum(samples))\n",
"%timeit(np.sum(samples))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"100 loops, best of 3: 18.3 ms per loop\n",
"10 loops, best of 3: 136 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit(list(range(1000000)))\n",
"%timeit(np.arange(1000000))\n",
"\n",
"# note that in Python range() is implemented as xrange()\n",
"# with lazy evaluation (generator)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"10 loops, best of 3: 82.6 ms per loop\n",
"100 loops, best of 3: 5.35 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import statistics\n",
"\n",
"%timeit(statistics.mean(samples))\n",
"%timeit(np.mean(samples))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 1.14 s per loop\n",
"10 loops, best of 3: 141 ms per loop"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}