From 10e09daa89cef58b68d47991e69ad0ac215e997a Mon Sep 17 00:00:00 2001
From: Vinta
Date: Wed, 13 Aug 2014 22:55:29 +0800
Subject: [PATCH] add Specific Formats Processing section

---
 README.md | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index 5d08f953..27c3ea81 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,7 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
     - [Files](#files)
     - [Date and Time](#date-and-time)
     - [Text Processing](#text-processing)
+    - [Specific Formats Processing](#specific-formats-processing)
     - [Natural Language Processing](#natural-language-processing)
     - [Documentation](#documentation)
     - [Configuration](#configuration)
@@ -158,7 +159,7 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
 
 ## Text Processing
 
-*Libraries for parsing and manipulating texts.*
+*Libraries for parsing and manipulating plain texts.*
 
 * General
     * [difflib](https://docs.python.org/2/library/difflib.html) - (Python standard library) Helpers for computing deltas.
@@ -166,25 +167,15 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
     * [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy String Matching.
     * [esmre](https://code.google.com/p/esmre/) - Regular expression accelerator.
     * [shortuuid](https://github.com/stochastic-technologies/shortuuid) - A generator library for concise, unambiguous and URL-safe UUIDs.
-    * [awesome-slugify](https://github.com/dimka665/awesome-slugify) - A Python slugify library that can preserve unicode.
-    * [python-slugify](https://github.com/un33k/python-slugify) - A Python slugify library that translates unicode to ASCII.
-    * [unicode-slugify](https://github.com/mozilla/unicode-slugify) - A slugifier that generates unicode slugs with Django as a dependency.
     * [unidecode](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text.
     * [chardet](https://github.com/chardet/chardet) - Python 2/3 compatible character encoding detector.
     * [xpinyin](https://github.com/lxneng/xpinyin) - A library to translate Chinese hanzi (漢字) to pinyin (拼音).
     * [pangu.py](https://github.com/vinta/pangu.py) - Spacing texts for CJK and alphanumerics.
     * [pyfiglet](https://github.com/pwaller/pyfiglet) - An implementation of figlet written in Python.
-* Specific Formats
-    * [tablib](https://github.com/kennethreitz/tablib) - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
-    * [python-docx](https://github.com/mikemaccana/python-docx) - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
-    * [xlwt](https://github.com/python-excel/xlwt) / [xlrd](https://github.com/python-excel/xlrd) - Packages is for writing and reading data and formatting information from Excel files.
-    * [XlsxWriter](https://xlsxwriter.readthedocs.org/) - A Python module for creating Excel .xlsx files.
-    * [mm](https://github.com/brianray/mm) - Python powered spreadsheets.
-    * [PDFMiner](https://github.com/euske/pdfminer) - A tool for extracting information from PDF documents.
-    * [PyPDF2](https://github.com/mstamy2/PyPDF2) - A pure Python PDF library capable of splitting, merging and transforming PDF pages.
-    * [Python-Markdown](https://github.com/waylan/Python-Markdown) - A Python implementation of John Gruber’s Markdown.
-    * [Mistune](https://github.com/lepture/mistune) - Fastest and full featured pure Python parsers of Markdown.
-    * [PyYAML](http://pyyaml.org/) - YAML implementations for Python.
+* Slugify
+    * [awesome-slugify](https://github.com/dimka665/awesome-slugify) - A Python slugify library that can preserve unicode.
+    * [python-slugify](https://github.com/un33k/python-slugify) - A Python slugify library that translates unicode to ASCII.
+    * [unicode-slugify](https://github.com/mozilla/unicode-slugify) - A slugifier that generates unicode slugs with Django as a dependency.
 * Parser
     * [phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Library for parsing, formatting, storing and validating international phone numbers.
     * [python-user-agents](https://github.com/selwin/python-user-agents) - Browser user agent parser.
@@ -193,6 +184,26 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
     * [python-nameparser](https://github.com/derek73/python-nameparser) - A simple Python module for parsing human names into their individual components.
     * [pyparsing](http://pyparsing.wikispaces.com/) - A general purpose framework for generating parsers.
 
+## Specific Formats Processing
+
+*Libraries for parsing and manipulating specific text formats.*
+
+* General
+    * [tablib](https://github.com/kennethreitz/tablib) - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
+* Office
+    * [python-docx](https://github.com/mikemaccana/python-docx) - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
+    * [xlwt](https://github.com/python-excel/xlwt) / [xlrd](https://github.com/python-excel/xlrd) - Packages is for writing and reading data and formatting information from Excel files.
+    * [XlsxWriter](https://xlsxwriter.readthedocs.org/) - A Python module for creating Excel .xlsx files.
+    * [Marmir](https://github.com/brianray/mm) - Takes Python data structures and turns them into spreadsheets.
+* PDF
+    * [PDFMiner](https://github.com/euske/pdfminer) - A tool for extracting information from PDF documents.
+    * [PyPDF2](https://github.com/mstamy2/PyPDF2) - A pure Python PDF library capable of splitting, merging and transforming PDF pages.
+* Markdown
+    * [Python-Markdown](https://github.com/waylan/Python-Markdown) - A Python implementation of John Gruber’s Markdown.
+    * [Mistune](https://github.com/lepture/mistune) - Fastest and full featured pure Python parsers of Markdown.
+* YAML
+    * [PyYAML](http://pyyaml.org/) - YAML implementations for Python.
+
 ## Natural Language Processing
 
 *Libraries for working with human languages.*