Mirror of https://github.com/rasbt/python_reference.git (synced 2024-11-23 20:11:13 +00:00)
batch processing csvs
parent 8f0944c7cf, commit efa50ac899
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:7ce6d9e0e1dc3da5c31fc5f3a5ab7687870a76cd4adaedd6da95bc6451755b12"
"signature": "sha256:f56b7081a6e5b63610100fcfa0a226c7a0184dfe0d63128614a7a68555653428"
},
"nbformat": 3,
"nbformat_minor": 0,
@ -60,6 +60,7 @@
|
|||
"- [Sorting the CSV file](#sorting)\n",
|
||||
"- [Marking min/max values in particular columns](#marking)\n",
|
"- [Writing out the modified table to a new CSV file](#writing)\n",
"- [Batch processing CSV files](#batch)\n",
"<hr>\n",
"<br>\n",
"<br>"
@ -646,10 +647,105 @@
|
|||
],
|
||||
"prompt_number": 14
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='batch'></a>\n",
|
||||
"<br>\n",
|
||||
"<br>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Batch processing CSV files"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"[[back to top](#sections)]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"CSV files rarely come alone; typically, we have to process a whole batch of similarly formatted CSV files from some output device. \n",
"For example, to process all CSV files in a particular input directory and save the processed files to a separate output directory, we can use a simple list comprehension that collects tuples of input and output file names."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"input": [
"import os\n",
"\n",
"in_dir = '../Data'\n",
"out_dir = '../Data/processed'\n",
"csvs = [\n",
" (os.path.join(in_dir, csv), \n",
" os.path.join(out_dir, csv))\n",
" for csv in os.listdir(in_dir) \n",
" if csv.endswith('.csv')\n",
" ]\n",
"\n",
"for i in csvs:\n",
" print(i)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"('../Data/test.csv', '../Data/processed/test.csv')\n",
"('../Data/test_marked.csv', '../Data/processed/test_marked.csv')\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"Next, we can wrap the processing steps we want to apply to each CSV file in a simple function and then loop over our file names:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def process_csv(csv_in, csv_out):\n",
" \"\"\" \n",
" Takes the input and output file names of a CSV file\n",
" and marks minimum values for every column.\n",
" \n",
" \"\"\"\n",
" csv_cont = csv_to_list(csv_in)\n",
" csv_marked = copy.deepcopy(csv_cont)\n",
" convert_cells_to_floats(csv_marked)\n",
" mark_all_col(csv_marked, mark_max=False, marker='*')\n",
" write_csv(csv_out, csv_marked)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for csv_in, csv_out in csvs:\n",
" process_csv(csv_in, csv_out)"
],
"language": "python",
"metadata": {},
"outputs": []
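The file-pair cell in this commit assumes that `../Data/processed` already exists; if it does not, writing the processed files there would presumably fail. Below is a minimal standalone sketch of the same step (not part of the commit), wrapped in a hypothetical `collect_csv_pairs` function that creates the output directory first via `os.makedirs(..., exist_ok=True)` (Python 3.2+) and sorts the file names, because `os.listdir` returns them in arbitrary order.

```python
import os
import tempfile


def collect_csv_pairs(in_dir, out_dir):
    """Hypothetical helper: return (input_path, output_path) tuples
    for every .csv file in in_dir, creating out_dir if needed."""
    # exist_ok=True (Python 3.2+) makes this a no-op if out_dir exists.
    os.makedirs(out_dir, exist_ok=True)
    return [
        (os.path.join(in_dir, name), os.path.join(out_dir, name))
        # os.listdir() returns names in arbitrary order; sort them so
        # that files are always processed in the same order.
        for name in sorted(os.listdir(in_dir))
        if name.endswith('.csv')
    ]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        in_dir = os.path.join(tmp, 'Data')
        out_dir = os.path.join(in_dir, 'processed')
        os.makedirs(in_dir)
        # Two CSV files plus one file that the filter should skip.
        for name in ('test.csv', 'test_marked.csv', 'notes.txt'):
            with open(os.path.join(in_dir, name), 'w') as f:
                f.write('a,b\n1,2\n')
        for pair in collect_csv_pairs(in_dir, out_dir):
            print(pair)
```

As in the notebook, the output directory lives inside the input directory, so `os.listdir` also returns `processed`; the `.endswith('.csv')` filter keeps it out of the list.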
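The `process_csv` cell relies on helpers (`csv_to_list`, `convert_cells_to_floats`, `mark_all_col`, `write_csv`) and on the `copy` module, all of which come from earlier sections of the notebook and are not shown in this diff. To try the batch step in isolation, here is a minimal self-contained sketch in which those four helpers are replaced by simplified, hypothetical stand-ins that assume one header row followed by purely numeric rows; the notebook's own versions may behave differently. The loop over file pairs is wrapped in a hypothetical `process_all` function and uses tuple unpacking instead of indexing.

```python
import copy
import csv
import os
import tempfile


# --- Hypothetical stand-ins for the notebook's helper functions. ---
# Assumption: each file has one header row followed by purely numeric rows.

def csv_to_list(csv_in):
    """Read a CSV file into a list of rows (each a list of strings)."""
    with open(csv_in, newline='') as f:
        return list(csv.reader(f))


def convert_cells_to_floats(rows):
    """Convert every cell below the header row to float, in place."""
    for row in rows[1:]:
        for i, cell in enumerate(row):
            row[i] = float(cell)


def mark_all_col(rows, mark_max=True, marker='*'):
    """Append `marker` to the max (or min) value of every column, in place."""
    pick = max if mark_max else min
    body = rows[1:]
    for col in range(len(rows[0])):
        target = pick(row[col] for row in body)
        for row in body:
            # Every cell that ties for the extreme value gets marked.
            if row[col] == target:
                row[col] = f'{row[col]}{marker}'


def write_csv(csv_out, rows):
    """Write a list of rows to a CSV file."""
    with open(csv_out, 'w', newline='') as f:
        csv.writer(f).writerows(rows)


# --- The batch step itself, following the notebook's process_csv. ---

def process_csv(csv_in, csv_out):
    """Mark the minimum value of every column and write the result."""
    csv_cont = csv_to_list(csv_in)
    # Deep copy, as in the notebook, so csv_cont keeps the raw strings.
    csv_marked = copy.deepcopy(csv_cont)
    convert_cells_to_floats(csv_marked)
    mark_all_col(csv_marked, mark_max=False, marker='*')
    write_csv(csv_out, csv_marked)


def process_all(csv_pairs):
    """Hypothetical driver: process every (input, output) file pair."""
    for csv_in, csv_out in csv_pairs:
        process_csv(csv_in, csv_out)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'test.csv')
        dst = os.path.join(tmp, 'test_processed.csv')
        with open(src, 'w', newline='') as f:
            f.write('a,b\n3,1\n2,5\n')
        process_all([(src, dst)])
        # prints [['a', 'b'], ['3.0', '1.0*'], ['2.0*', '5.0']]
        print(csv_to_list(dst))
```

In this stand-in `mark_all_col`, every cell that ties for the column minimum (or maximum) receives the marker; whether the notebook's own version treats ties the same way is not visible in this commit.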