mirror of
https://github.com/rasbt/python_reference.git
synced 2024-11-30 15:31:12 +00:00
1070 lines
28 KiB
Plaintext
{
|
|
"metadata": {
|
|
"name": "",
|
|
"signature": "sha256:237609a5ef934bf65a93a410c9e5107b808049dd04b0faf2b30f9b423699ba6c"
|
|
},
|
|
"nbformat": 3,
|
|
"nbformat_minor": 0,
|
|
"worksheets": [
|
|
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[Sebastian Raschka](http://sebastianraschka.com) \n",
|
|
"\n",
|
|
"- [Link to this IPython notebook on Github](https://github.com/rasbt/python_reference/blob/master/tutorials/useful_regex.ipynb) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"%load_ext watermark"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 1
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"%watermark -d -v -u -t -z"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"output_type": "stream",
|
|
"stream": "stdout",
|
|
"text": [
|
|
"Last updated: 06/07/2014 22:50:23 EDT\n",
|
|
"\n",
|
|
"CPython 3.4.1\n",
|
|
"IPython 2.1.0\n"
|
|
]
|
|
}
|
|
],
|
|
"prompt_number": 2
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1.5em\">[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension.</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<hr>\n",
|
|
"I would be happy to hear your comments and suggestions. \n",
|
|
"Please feel free to drop me a note via\n",
|
|
"[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/+SebastianRaschka).\n",
|
|
"<hr>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 1,
|
|
"metadata": {},
|
|
"source": [
|
|
"A collection of useful regular expressions"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Sections"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"- [About the `re` module](#About-the-re-module)\n",
|
|
"- [Identify files via file extensions](#Identify-files-via-file-extensions)\n",
|
|
"- [Username validation](#Username-validation)\n",
|
|
"- [Checking for valid email addresses](#Checking-for-valid-email-addresses)\n",
|
|
"- [Check for a valid URL](#Check-for-a-valid-URL)\n",
|
|
"- [Checking for numbers](#Checking-for-numbers)\n",
|
|
"- [Validating dates](#Validating-dates)\n",
|
|
"- [Time](#Time)\n",
|
|
"- [Checking for HTML tags](#Checking-for-HTML-tags)\n",
|
|
"- [Checking for IP addresses](#Checking-for-IP-addresses)\n",
|
|
"- [Checking for MAC addresses](#Checking-for-MAC-addresses)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"About the `re` module"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"The purpose of this IPython notebook is not to provide a detailed tutorial on regular expressions or Python's built-in `re` module, but to collect some useful regular expressions for copy-and-paste purposes."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"A detailed introduction to regular expressions and the Python `re` module can be found in the Regular Expression HOWTO: [https://docs.python.org/3.4/howto/regex.html](https://docs.python.org/3.4/howto/regex.html). Below, I list the most important functions for convenience:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"- `re.match()` : Determine if the RE matches at the beginning of the string.\n",
|
|
"- `re.search()` : Scan through a string, looking for any location where this RE matches.\n",
"- `re.findall()` : Find all substrings where the RE matches, and return them as a list.\n",
"- `re.finditer()` : Find all substrings where the RE matches, and return them as an iterator of match objects."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"If you are using the same regular expression multiple times, it is recommended to compile it for improved performance.\n",
|
|
"\n",
"    compiled_re = re.compile(r'some_regexpr')\n",
"    for word in text:\n",
"        match = compiled_re.search(word)\n",
"        # do something with the match\n",
|
|
" \n",
|
|
"**E.g., if we want to check if a string ends with a substring:**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"import re\n",
|
|
"\n",
|
|
"needle = 'needlers'\n",
|
|
"\n",
|
|
"# Python approach\n",
|
|
"print(bool(any([needle.endswith(e) for e in ('ly', 'ed', 'ing', 'ers')])))\n",
|
|
"\n",
|
|
"# On-the-fly Regular expression in Python\n",
|
|
"print(bool(re.search(r'(?:ly|ed|ing|ers)$', needle)))\n",
|
|
"\n",
|
|
"# Compiled Regular expression in Python\n",
|
|
"comp = re.compile(r'(?:ly|ed|ing|ers)$') \n",
|
|
"print(bool(comp.search(needle)))"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"output_type": "stream",
|
|
"stream": "stdout",
|
|
"text": [
|
|
"True\n",
|
|
"True\n",
|
|
"True\n"
|
|
]
|
|
}
|
|
],
|
|
"prompt_number": 3
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"%timeit -n 10000 -r 50 bool(any([needle.endswith(e) for e in ('ly', 'ed', 'ing', 'ers')]))\n",
|
|
"%timeit -n 10000 -r 50 bool(re.search(r'(?:ly|ed|ing|ers)$', needle))\n",
|
|
"%timeit -n 10000 -r 50 bool(comp.search(needle))"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"output_type": "stream",
|
|
"stream": "stdout",
|
|
"text": [
|
|
"10000 loops, best of 50: 2.74 \u00b5s per loop\n",
|
|
"10000 loops, best of 50: 2.93 \u00b5s per loop"
|
|
]
|
|
},
|
|
{
|
|
"output_type": "stream",
|
|
"stream": "stdout",
|
|
"text": [
|
|
"\n",
|
|
"10000 loops, best of 50: 1.28 \u00b5s per loop"
|
|
]
|
|
},
|
|
{
|
|
"output_type": "stream",
|
|
"stream": "stdout",
|
|
"text": [
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"prompt_number": 4
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Identify files via file extensions"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"A regular expression to check for file extensions. \n",
|
|
"\n",
"Note: This approach is not recommended for strictly restricting file types (parse the file header instead). However, this regex is still a useful alternative to, e.g., Python's `endswith` approach for quickly pre-filtering files of interest."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r'(?i)(\\w+)\\.(jpeg|jpg|png|gif|tif|svg)$'\n",
|
|
"\n",
|
|
"# remove `(?i)` to make regexpr case-sensitive\n",
|
|
"\n",
|
|
"str_true = ('test.gif', \n",
|
|
" 'image.jpeg', \n",
|
|
" 'image.jpg',\n",
|
|
" 'image.TIF'\n",
|
|
" )\n",
|
|
"\n",
|
|
"str_false = ('test.pdf',\n",
|
|
" 'test.gif.pdf',\n",
|
|
" )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 5
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Username validation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for a valid user name that has a certain minimum and maximum length.\n",
|
|
"\n",
|
|
"Allowed characters:\n",
|
|
"- letters (upper- and lower-case)\n",
|
|
"- numbers\n",
|
|
"- dashes\n",
|
|
"- underscores"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"min_len = 5 # minimum length for a valid username\n",
|
|
"max_len = 15 # maximum length for a valid username\n",
|
|
"\n",
"pattern = r\"(?i)^[a-z0-9_-]{%s,%s}$\" %(min_len, max_len)\n",
|
|
"\n",
|
|
"# remove `(?i)` to only allow lower-case letters\n",
|
|
"\n",
|
|
"\n",
|
|
"\n",
|
|
"str_true = ('user123', '123_user', 'Username')\n",
|
|
" \n",
|
|
"str_false = ('user', 'username1234_is-way-too-long', 'user$34354')\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 6
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for valid email addresses"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"A regular expression that captures most email addresses."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r\"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$)\"\n",
|
|
"\n",
|
|
"str_true = ('test@mail.com',)\n",
|
|
" \n",
|
|
"str_false = ('testmail.com', '@testmail.com', 'test@mailcom')\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 7
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Check for a valid URL"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"Checks whether a string looks like a URL, i.e., whether it ...\n",
|
|
"\n",
|
|
"- starts with `https://`, or `http://`, or `www.`\n",
|
|
"- or ends with a dot extension"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?$'\n",
|
|
"\n",
|
|
"str_true = ('https://github.com', \n",
|
|
" 'http://github.com',\n",
|
|
" 'www.github.com',\n",
|
|
" 'github.com',\n",
|
|
" 'test.de',\n",
|
|
" 'https://github.com/rasbt',\n",
|
|
" 'test.jpeg' # !!! \n",
|
|
" )\n",
|
|
" \n",
|
|
"str_false = ('testmailcom', 'http:testmailcom', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 8
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149](http://code.tutsplus.com/tutorials/8-regular-expressions-you-should-know--net-6149)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for numbers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"Positive integers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('123', '1', )\n",
|
|
" \n",
|
|
"str_false = ('abc', '1.1', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 9
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"Negative integers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^-\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('-123', '-1', )\n",
|
|
" \n",
|
|
"str_false = ('123', '-abc', '-1.1', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 10
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"All integers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^-{0,1}\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('-123', '-1', '1', '123',)\n",
|
|
" \n",
|
|
"str_false = ('123.0', '-abc', '-1.1', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 11
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"Positive numbers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^\\d*\\.{0,1}\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('1', '123', '1.234', )\n",
|
|
" \n",
|
|
"str_false = ('-abc', '-123', '-123.0')\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 12
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"Negative numbers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^-\\d*\\.{0,1}\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('-1', '-123', '-123.0', )\n",
|
|
" \n",
|
|
"str_false = ('-abc', '1', '123', '1.234', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 13
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"All numbers"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^-{0,1}\\d*\\.{0,1}\\d+$'\n",
|
|
"\n",
|
|
"str_true = ('1', '123', '1.234', '-123', '-123.0')\n",
|
|
" \n",
"str_false = ('-abc', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 14
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://stackoverflow.com/questions/1449817/what-are-some-of-the-most-useful-regular-expressions-for-programmers](http://stackoverflow.com/questions/1449817/what-are-some-of-the-most-useful-regular-expressions-for-programmers)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Validating dates"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Validates dates in `mm/dd/yyyy` format."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'^(0[1-9]|1[0-2])\\/(0[1-9]|1\\d|2\\d|3[01])\\/(19|20)\\d{2}$'\n",
|
|
"\n",
|
|
"str_true = ('01/08/2014', '12/30/2014', )\n",
|
|
" \n",
|
|
"str_false = ('22/08/2014', '-123', '1/8/2014', '1/08/2014', '01/8/2014')\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 15
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Time"
]
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"12-Hour format"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'(?i)^(1[012]|[1-9]):[0-5][0-9](\\s)?(am|pm)$'\n",
|
|
"\n",
|
|
"str_true = ('2:00pm', '7:30 AM', '12:05 am', )\n",
|
|
" \n",
|
|
"str_false = ('22:00pm', '14:00', '3:12', '03:12pm', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 29
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"24-Hour format"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r'^([0-1]{1}[0-9]{1}|20|21|22|23):[0-5]{1}[0-9]{1}$'\n",
|
|
"\n",
|
|
"str_true = ('14:00', '00:30', )\n",
|
|
" \n",
|
|
"str_false = ('22:00pm', '4:00', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 18
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for HTML tags"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
"This regex, too, is only recommended for \"filtering\" purposes; it is not a reliable way to parse HTML. For more information, see this excellent discussion on StackOverflow: \n",
|
|
"[http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/) "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r\"\"\"</?\\w+((\\s+\\w+(\\s*=\\s*(?:\".*?\"|'.*?'|[^'\">\\s]+))?)+\\s*|\\s*)/?>\"\"\"\n",
|
|
"\n",
|
|
"str_true = ('<a>', '<a href=\"something\">', '</a>', '<img src>')\n",
|
|
" \n",
|
|
"str_false = ('a>', '<a ', '< a >')\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 16
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/](http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx/)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for IP addresses"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
|
|
"IPv4"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"![](../Images/Ipv4_address.png)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">Image source: http://en.wikipedia.org/wiki/File:Ipv4_address.svg</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r'^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$'\n",
|
|
"\n",
|
|
"str_true = ('172.16.254.1', '1.2.3.4', '01.102.103.104', )\n",
|
|
" \n",
|
|
"str_false = ('17216.254.1', '1.2.3.4.5', '01 .102.103.104', )\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 8
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions/](http://answers.oreilly.com/topic/318-how-to-match-ipv4-addresses-with-regular-expressions/)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 3,
|
|
"metadata": {},
|
|
"source": [
"IPv6"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"![](../Images/Ipv6_address.png)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">Image source: http://upload.wikimedia.org/wikipedia/commons/1/15/Ipv6_address.svg</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
|
|
"pattern = r'^\\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}))|:)))(%.+)?\\s*$'\n",
|
|
"\n",
|
|
"str_true = ('2001:470:9b36:1::2',\n",
|
|
" '2001:cdba:0000:0000:0000:0000:3257:9652', \n",
|
|
" '2001:cdba:0:0:0:0:3257:9652', \n",
|
|
" '2001:cdba::3257:9652', )\n",
|
|
" \n",
|
|
"str_false = ('1200::AB00:1234::2552:7777:1313', # uses `::` twice\n",
|
|
" '1200:0000:AB00:1234:O000:2552:7777:1313', ) # contains an O instead of 0\n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 21
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">source: [http://snipplr.com/view/43003/regex--match-ipv6-address/](http://snipplr.com/view/43003/regex--match-ipv6-address/)</font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<br>\n",
|
|
"<br>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "heading",
|
|
"level": 2,
|
|
"metadata": {},
|
|
"source": [
|
|
"Checking for MAC addresses"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"[[back to top](#Sections)]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"![](../Images/MACaddressV3.png)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"<font size=\"1px\">Image source: http://upload.wikimedia.org/wikipedia/en/3/37/MACaddressV3.png </font>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"collapsed": false,
|
|
"input": [
"pattern = r'(?i)^([0-9A-F]{2}[:-]){5}([0-9A-F]{2})$'\n",
|
|
"\n",
|
|
"str_true = ('94-AE-70-A0-66-83', \n",
|
|
" '58-f8-1a-00-44-c8',\n",
|
|
" '00:A0:C9:14:C8:29'\n",
|
|
" , )\n",
|
|
" \n",
|
|
"str_false = ('0:00:00:00:00:00', \n",
|
|
" '94-AE-70-A0 -66-83', ) \n",
|
|
"\n",
|
|
"for t in str_true:\n",
|
|
" assert(bool(re.match(pattern, t)) == True), '%s is not True' %t\n",
|
|
"\n",
|
|
"for f in str_false:\n",
|
|
" assert(bool(re.match(pattern, f)) == False), '%s is not False' %f"
|
|
],
|
|
"language": "python",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"prompt_number": 29
|
|
}
|
|
],
|
|
"metadata": {}
|
|
}
|
|
]
|
|
}