python-scripts/scripts/Loan Prediction Model/Loan Prediction Model .ipynb
2022-10-13 21:29:57 +05:30

{
"cells": [
{
"cell_type": "markdown",
"id": "8c97baae",
"metadata": {},
"source": [
"## Loan Prediction Model\n",
"\n",
"\n",
"The goal of this project is to preprocess the data collected on loan applicants and to predict, from that information, whether each applicant will receive the loan.\n",
"\n",
"\n",
"The dataset contains the following features:\n",
"\n",
"1. Loan ID, the identifier code of each applicant.\n",
"2. Gender, Male or Female.\n",
"3. Married, the applicant's marital status (Yes or No).\n",
"4. Dependents, the number of dependents the applicant has (0, 1, 2 or 3+).\n",
"5. Education, Graduate or Not Graduate.\n",
"6. Self Employed, Yes or No.\n",
"7. Applicant Income\n",
"8. Coapplicant Income\n",
"9. Loan Amount\n",
"10. Loan Amount Term\n",
"11. Credit History, a binary flag (stored as 1.0 or 0.0 in the data).\n",
"12. Property Area, whether the applicant's property is in an Urban, Semiurban or Rural area.\n",
"\n",
"Loan Status, Y or N (the dependent variable, i.e. the class to predict)."
]
},
{
"cell_type": "markdown",
"id": "3f28aeb7",
"metadata": {},
"source": [
"## Import Packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "4cde977c",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"id": "ec208c3e",
"metadata": {},
"source": [
"## Read & visualize the data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8895329b",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Loan_ID</th>\n",
" <th>Gender</th>\n",
" <th>Married</th>\n",
" <th>Dependents</th>\n",
" <th>Education</th>\n",
" <th>Self_Employed</th>\n",
" <th>ApplicantIncome</th>\n",
" <th>CoapplicantIncome</th>\n",
" <th>LoanAmount</th>\n",
" <th>Loan_Amount_Term</th>\n",
" <th>Credit_History</th>\n",
" <th>Property_Area</th>\n",
" <th>Loan_Status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>LP001002</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>Graduate</td>\n",
" <td>No</td>\n",
" <td>5849</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>Urban</td>\n",
" <td>Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>LP001003</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>1</td>\n",
" <td>Graduate</td>\n",
" <td>No</td>\n",
" <td>4583</td>\n",
" <td>1508.0</td>\n",
" <td>128.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>Rural</td>\n",
" <td>N</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>LP001005</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>0</td>\n",
" <td>Graduate</td>\n",
" <td>Yes</td>\n",
" <td>3000</td>\n",
" <td>0.0</td>\n",
" <td>66.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>Urban</td>\n",
" <td>Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>LP001006</td>\n",
" <td>Male</td>\n",
" <td>Yes</td>\n",
" <td>0</td>\n",
" <td>Not Graduate</td>\n",
" <td>No</td>\n",
" <td>2583</td>\n",
" <td>2358.0</td>\n",
" <td>120.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>Urban</td>\n",
" <td>Y</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>LP001008</td>\n",
" <td>Male</td>\n",
" <td>No</td>\n",
" <td>0</td>\n",
" <td>Graduate</td>\n",
" <td>No</td>\n",
" <td>6000</td>\n",
" <td>0.0</td>\n",
" <td>141.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>Urban</td>\n",
" <td>Y</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Loan_ID Gender Married Dependents Education Self_Employed \\\n",
"0 LP001002 Male No 0 Graduate No \n",
"1 LP001003 Male Yes 1 Graduate No \n",
"2 LP001005 Male Yes 0 Graduate Yes \n",
"3 LP001006 Male Yes 0 Not Graduate No \n",
"4 LP001008 Male No 0 Graduate No \n",
"\n",
" ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \\\n",
"0 5849 0.0 NaN 360.0 \n",
"1 4583 1508.0 128.0 360.0 \n",
"2 3000 0.0 66.0 360.0 \n",
"3 2583 2358.0 120.0 360.0 \n",
"4 6000 0.0 141.0 360.0 \n",
"\n",
" Credit_History Property_Area Loan_Status \n",
"0 1.0 Urban Y \n",
"1 1.0 Rural N \n",
"2 1.0 Urban Y \n",
"3 1.0 Urban Y \n",
"4 1.0 Urban Y "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_csv('Loan_train.csv')\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "8c16b1bc",
"metadata": {},
"source": [
"## Data Analysis"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8bbe6c13",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(614, 13)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.shape"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ae1c8c0f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ApplicantIncome</th>\n",
" <th>CoapplicantIncome</th>\n",
" <th>LoanAmount</th>\n",
" <th>Loan_Amount_Term</th>\n",
" <th>Credit_History</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>614.000000</td>\n",
" <td>614.000000</td>\n",
" <td>592.000000</td>\n",
" <td>600.00000</td>\n",
" <td>564.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>5403.459283</td>\n",
" <td>1621.245798</td>\n",
" <td>146.412162</td>\n",
" <td>342.00000</td>\n",
" <td>0.842199</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>6109.041673</td>\n",
" <td>2926.248369</td>\n",
" <td>85.587325</td>\n",
" <td>65.12041</td>\n",
" <td>0.364878</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>150.000000</td>\n",
" <td>0.000000</td>\n",
" <td>9.000000</td>\n",
" <td>12.00000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>2877.500000</td>\n",
" <td>0.000000</td>\n",
" <td>100.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>3812.500000</td>\n",
" <td>1188.500000</td>\n",
" <td>128.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>5795.000000</td>\n",
" <td>2297.250000</td>\n",
" <td>168.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>81000.000000</td>\n",
" <td>41667.000000</td>\n",
" <td>700.000000</td>\n",
" <td>480.00000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \\\n",
"count 614.000000 614.000000 592.000000 600.00000 \n",
"mean 5403.459283 1621.245798 146.412162 342.00000 \n",
"std 6109.041673 2926.248369 85.587325 65.12041 \n",
"min 150.000000 0.000000 9.000000 12.00000 \n",
"25% 2877.500000 0.000000 100.000000 360.00000 \n",
"50% 3812.500000 1188.500000 128.000000 360.00000 \n",
"75% 5795.000000 2297.250000 168.000000 360.00000 \n",
"max 81000.000000 41667.000000 700.000000 480.00000 \n",
"\n",
" Credit_History \n",
"count 564.000000 \n",
"mean 0.842199 \n",
"std 0.364878 \n",
"min 0.000000 \n",
"25% 1.000000 \n",
"50% 1.000000 \n",
"75% 1.000000 \n",
"max 1.000000 "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "8b553da7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"RangeIndex: 614 entries, 0 to 613\n",
"Data columns (total 13 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Loan_ID 614 non-null object \n",
" 1 Gender 601 non-null object \n",
" 2 Married 611 non-null object \n",
" 3 Dependents 599 non-null object \n",
" 4 Education 614 non-null object \n",
" 5 Self_Employed 582 non-null object \n",
" 6 ApplicantIncome 614 non-null int64 \n",
" 7 CoapplicantIncome 614 non-null float64\n",
" 8 LoanAmount 592 non-null float64\n",
" 9 Loan_Amount_Term 600 non-null float64\n",
" 10 Credit_History 564 non-null float64\n",
" 11 Property_Area 614 non-null object \n",
" 12 Loan_Status 614 non-null object \n",
"dtypes: float64(4), int64(1), object(8)\n",
"memory usage: 62.5+ KB\n"
]
}
],
"source": [
"df.info()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "20168c69",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Loan_ID 0\n",
"Gender 13\n",
"Married 3\n",
"Dependents 15\n",
"Education 0\n",
"Self_Employed 32\n",
"ApplicantIncome 0\n",
"CoapplicantIncome 0\n",
"LoanAmount 22\n",
"Loan_Amount_Term 14\n",
"Credit_History 50\n",
"Property_Area 0\n",
"Loan_Status 0\n",
"dtype: int64"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.isnull().sum()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a52091bf",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Gender Married Dependents Education Self_Employed ApplicantIncome \\\n",
"0 1.0 0.0 0.0 1 0.0 5849 \n",
"1 1.0 1.0 1.0 1 0.0 4583 \n",
"2 1.0 1.0 0.0 1 1.0 3000 \n",
"3 1.0 1.0 0.0 0 0.0 2583 \n",
"4 1.0 0.0 0.0 1 0.0 6000 \n",
".. ... ... ... ... ... ... \n",
"609 0.0 0.0 0.0 1 0.0 2900 \n",
"610 1.0 1.0 3.0 1 0.0 4106 \n",
"611 1.0 1.0 1.0 1 0.0 8072 \n",
"612 1.0 1.0 2.0 1 0.0 7583 \n",
"613 0.0 0.0 0.0 1 1.0 4583 \n",
"\n",
" CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History \\\n",
"0 0.0 NaN 360.0 1.0 \n",
"1 1508.0 128.0 360.0 1.0 \n",
"2 0.0 66.0 360.0 1.0 \n",
"3 2358.0 120.0 360.0 1.0 \n",
"4 0.0 141.0 360.0 1.0 \n",
".. ... ... ... ... \n",
"609 0.0 71.0 360.0 1.0 \n",
"610 0.0 40.0 180.0 1.0 \n",
"611 240.0 253.0 360.0 1.0 \n",
"612 0.0 187.0 360.0 1.0 \n",
"613 0.0 133.0 360.0 0.0 \n",
"\n",
" Property_Area Loan_Status \n",
"0 1 1 \n",
"1 0 0 \n",
"2 1 1 \n",
"3 1 1 \n",
"4 1 1 \n",
".. ... ... \n",
"609 0 1 \n",
"610 0 1 \n",
"611 1 1 \n",
"612 1 1 \n",
"613 2 0 \n",
"\n",
"[614 rows x 12 columns]\n"
]
}
],
"source": [
"# Loan Status encoding\n",
"df = df.replace({\"Loan_Status\": {'Y': 1, 'N': 0}})\n",
"\n",
"# Gender encoding\n",
"df = df.replace({\"Gender\": {\"Male\": 1, \"Female\": 0}})\n",
"\n",
"# Married encoding\n",
"df = df.replace({\"Married\": {\"Yes\": 1, \"No\": 0}})\n",
"\n",
"# Replace the '3+' in Dependents and make the column numeric\n",
"df['Dependents'] = df['Dependents'].replace('3+', '3')\n",
"df['Dependents'] = pd.to_numeric(df['Dependents'], errors='coerce')\n",
"\n",
"# Self Employed encoding (value_counts() is only an inspection aid; its result is not displayed mid-cell)\n",
"df['Self_Employed'].value_counts()\n",
"df = df.replace({\"Self_Employed\": {\"Yes\": 1, \"No\": 0}})\n",
"\n",
"# Education encoding\n",
"df['Education'].value_counts()\n",
"df = df.replace({\"Education\": {\"Graduate\": 1, \"Not Graduate\": 0}})\n",
"\n",
"# Drop the Loan ID column, which is only an identifier\n",
"df = df.drop('Loan_ID', axis=1)\n",
"\n",
"# Property Area encoding (Rural=0, Urban=1, Semiurban=2)\n",
"df['Property_Area'].value_counts()\n",
"df['Property_Area'] = df['Property_Area'].map({'Rural': 0, 'Urban': 1, 'Semiurban': 2})\n",
"\n",
"print(df)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "861ac719",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Gender</th>\n",
" <th>Married</th>\n",
" <th>Dependents</th>\n",
" <th>Education</th>\n",
" <th>Self_Employed</th>\n",
" <th>ApplicantIncome</th>\n",
" <th>CoapplicantIncome</th>\n",
" <th>LoanAmount</th>\n",
" <th>Loan_Amount_Term</th>\n",
" <th>Credit_History</th>\n",
" <th>Property_Area</th>\n",
" <th>Loan_Status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>601.000000</td>\n",
" <td>611.000000</td>\n",
" <td>599.000000</td>\n",
" <td>614.000000</td>\n",
" <td>582.000000</td>\n",
" <td>614.000000</td>\n",
" <td>614.000000</td>\n",
" <td>592.000000</td>\n",
" <td>600.00000</td>\n",
" <td>564.000000</td>\n",
" <td>614.000000</td>\n",
" <td>614.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.813644</td>\n",
" <td>0.651391</td>\n",
" <td>0.762938</td>\n",
" <td>0.781759</td>\n",
" <td>0.140893</td>\n",
" <td>5403.459283</td>\n",
" <td>1621.245798</td>\n",
" <td>146.412162</td>\n",
" <td>342.00000</td>\n",
" <td>0.842199</td>\n",
" <td>1.087948</td>\n",
" <td>0.687296</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.389718</td>\n",
" <td>0.476920</td>\n",
" <td>1.015216</td>\n",
" <td>0.413389</td>\n",
" <td>0.348211</td>\n",
" <td>6109.041673</td>\n",
" <td>2926.248369</td>\n",
" <td>85.587325</td>\n",
" <td>65.12041</td>\n",
" <td>0.364878</td>\n",
" <td>0.815081</td>\n",
" <td>0.463973</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>150.000000</td>\n",
" <td>0.000000</td>\n",
" <td>9.000000</td>\n",
" <td>12.00000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>2877.500000</td>\n",
" <td>0.000000</td>\n",
" <td>100.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>3812.500000</td>\n",
" <td>1188.500000</td>\n",
" <td>128.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>5795.000000</td>\n",
" <td>2297.250000</td>\n",
" <td>168.000000</td>\n",
" <td>360.00000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>3.000000</td>\n",
" <td>1.000000</td>\n",
" <td>1.000000</td>\n",
" <td>81000.000000</td>\n",
" <td>41667.000000</td>\n",
" <td>700.000000</td>\n",
" <td>480.00000</td>\n",
" <td>1.000000</td>\n",
" <td>2.000000</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Gender Married Dependents Education Self_Employed \\\n",
"count 601.000000 611.000000 599.000000 614.000000 582.000000 \n",
"mean 0.813644 0.651391 0.762938 0.781759 0.140893 \n",
"std 0.389718 0.476920 1.015216 0.413389 0.348211 \n",
"min 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"25% 1.000000 0.000000 0.000000 1.000000 0.000000 \n",
"50% 1.000000 1.000000 0.000000 1.000000 0.000000 \n",
"75% 1.000000 1.000000 2.000000 1.000000 0.000000 \n",
"max 1.000000 1.000000 3.000000 1.000000 1.000000 \n",
"\n",
" ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \\\n",
"count 614.000000 614.000000 592.000000 600.00000 \n",
"mean 5403.459283 1621.245798 146.412162 342.00000 \n",
"std 6109.041673 2926.248369 85.587325 65.12041 \n",
"min 150.000000 0.000000 9.000000 12.00000 \n",
"25% 2877.500000 0.000000 100.000000 360.00000 \n",
"50% 3812.500000 1188.500000 128.000000 360.00000 \n",
"75% 5795.000000 2297.250000 168.000000 360.00000 \n",
"max 81000.000000 41667.000000 700.000000 480.00000 \n",
"\n",
" Credit_History Property_Area Loan_Status \n",
"count 564.000000 614.000000 614.000000 \n",
"mean 0.842199 1.087948 0.687296 \n",
"std 0.364878 0.815081 0.463973 \n",
"min 0.000000 0.000000 0.000000 \n",
"25% 1.000000 0.000000 0.000000 \n",
"50% 1.000000 1.000000 1.000000 \n",
"75% 1.000000 2.000000 1.000000 \n",
"max 1.000000 2.000000 1.000000 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.describe()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "64ab82d9",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Gender</th>\n",
" <th>Married</th>\n",
" <th>Dependents</th>\n",
" <th>Education</th>\n",
" <th>Self_Employed</th>\n",
" <th>ApplicantIncome</th>\n",
" <th>CoapplicantIncome</th>\n",
" <th>LoanAmount</th>\n",
" <th>Loan_Amount_Term</th>\n",
" <th>Credit_History</th>\n",
" <th>Property_Area</th>\n",
" <th>Loan_Status</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>5849</td>\n",
" <td>0.0</td>\n",
" <td>NaN</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>4583</td>\n",
" <td>1508.0</td>\n",
" <td>128.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>3000</td>\n",
" <td>0.0</td>\n",
" <td>66.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>0.0</td>\n",
" <td>2583</td>\n",
" <td>2358.0</td>\n",
" <td>120.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>6000</td>\n",
" <td>0.0</td>\n",
" <td>141.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>609</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>2900</td>\n",
" <td>0.0</td>\n",
" <td>71.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>610</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>3.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>4106</td>\n",
" <td>0.0</td>\n",
" <td>40.0</td>\n",
" <td>180.0</td>\n",
" <td>1.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>611</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>8072</td>\n",
" <td>240.0</td>\n",
" <td>253.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>612</th>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>2.0</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>7583</td>\n",
" <td>0.0</td>\n",
" <td>187.0</td>\n",
" <td>360.0</td>\n",
" <td>1.0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>613</th>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1</td>\n",
" <td>1.0</td>\n",
" <td>4583</td>\n",
" <td>0.0</td>\n",
" <td>133.0</td>\n",
" <td>360.0</td>\n",
" <td>0.0</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>614 rows × 12 columns</p>\n",
"</div>"
],
"text/plain": [
" Gender Married Dependents Education Self_Employed ApplicantIncome \\\n",
"0 1.0 0.0 0.0 1 0.0 5849 \n",
"1 1.0 1.0 1.0 1 0.0 4583 \n",
"2 1.0 1.0 0.0 1 1.0 3000 \n",
"3 1.0 1.0 0.0 0 0.0 2583 \n",
"4 1.0 0.0 0.0 1 0.0 6000 \n",
".. ... ... ... ... ... ... \n",
"609 0.0 0.0 0.0 1 0.0 2900 \n",
"610 1.0 1.0 3.0 1 0.0 4106 \n",
"611 1.0 1.0 1.0 1 0.0 8072 \n",
"612 1.0 1.0 2.0 1 0.0 7583 \n",
"613 0.0 0.0 0.0 1 1.0 4583 \n",
"\n",
" CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History \\\n",
"0 0.0 NaN 360.0 1.0 \n",
"1 1508.0 128.0 360.0 1.0 \n",
"2 0.0 66.0 360.0 1.0 \n",
"3 2358.0 120.0 360.0 1.0 \n",
"4 0.0 141.0 360.0 1.0 \n",
".. ... ... ... ... \n",
"609 0.0 71.0 360.0 1.0 \n",
"610 0.0 40.0 180.0 1.0 \n",
"611 240.0 253.0 360.0 1.0 \n",
"612 0.0 187.0 360.0 1.0 \n",
"613 0.0 133.0 360.0 0.0 \n",
"\n",
" Property_Area Loan_Status \n",
"0 1 1 \n",
"1 0 0 \n",
"2 1 1 \n",
"3 1 1 \n",
"4 1 1 \n",
".. ... ... \n",
"609 0 1 \n",
"610 0 1 \n",
"611 1 1 \n",
"612 1 1 \n",
"613 2 0 \n",
"\n",
"[614 rows x 12 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "7a2c0144",
"metadata": {},
"outputs": [],
"source": [
"# Fill the remaining missing values with each column's median\n",
"df.fillna(df.median(), inplace=True)\n",
"# Make sure every column has a numeric dtype\n",
"columns = df.columns\n",
"for column in columns:\n",
"    df[column] = pd.to_numeric(df[column], errors='coerce')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "574b6b70",
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 1080x576 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.set(rc={'figure.figsize':(15,8)})\n",
"sns.heatmap(df.corr(),annot=True,cmap=\"rocket\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "40bee983",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Married Education CoapplicantIncome Credit_History Property_Area \\\n",
"0 0.0 1 0.0 1.0 1 \n",
"1 1.0 1 1508.0 1.0 0 \n",
"2 1.0 1 0.0 1.0 1 \n",
"3 1.0 0 2358.0 1.0 1 \n",
"4 0.0 1 0.0 1.0 1 \n",
".. ... ... ... ... ... \n",
"609 0.0 1 0.0 1.0 0 \n",
"610 1.0 1 0.0 1.0 0 \n",
"611 1.0 1 240.0 1.0 1 \n",
"612 1.0 1 0.0 1.0 1 \n",
"613 0.0 1 0.0 0.0 2 \n",
"\n",
" Loan_Status \n",
"0 1 \n",
"1 0 \n",
"2 1 \n",
"3 1 \n",
"4 1 \n",
".. ... \n",
"609 1 \n",
"610 1 \n",
"611 1 \n",
"612 1 \n",
"613 0 \n",
"\n",
"[614 rows x 6 columns]\n"
]
}
],
"source": [
"def correlationdrop(df, sl):\n",
"    # Drop every column whose absolute correlation with Loan_Status is below the threshold sl\n",
"    columns = df.columns\n",
"    for column in columns:\n",
"        C = abs(df[column].corr(df['Loan_Status']))\n",
"        if C < sl:\n",
"            df = df.drop(columns=[column])\n",
"    return df\n",
"\n",
"df = correlationdrop(df, 0.05)\n",
"\n",
"print(df)"
]
},
{
"cell_type": "markdown",
"id": "7cd3d4a8",
"metadata": {},
"source": [
"## Separate the variables"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c6e6d8cb",
"metadata": {},
"outputs": [],
"source": [
"# Features: every column except the last; target: the last column (Loan_Status)\n",
"x = df.iloc[:, :-1].values\n",
"y = df.iloc[:, -1].values"
]
},
{
"cell_type": "markdown",
"id": "0e743143",
"metadata": {},
"source": [
"## Scale the data"
]
},
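{
"cell_type": "code",
"execution_count": 14,
"id": "b8992600",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import MinMaxScaler\n",
"# Scale every feature to the [0, 1] range (note: the scaler is fitted on the full dataset, before the train/test split)\n",
"sc = MinMaxScaler()\n",
"X = sc.fit_transform(x)"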
]
},
{
"cell_type": "markdown",
"id": "3615ec24",
"metadata": {},
"source": [
"## Split the data"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2a37ac15",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train,X_test,y_train,y_test = train_test_split(X,y, test_size= 0.2, random_state= 0)"
]
},
{
"cell_type": "markdown",
"id": "c98b35e0",
"metadata": {},
"source": [
"## Logistic Regression"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "daba8de4",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"model=LogisticRegression()\n",
"model.fit(X_train,y_train)\n",
"z=model.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "16b8534b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8292682926829268"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import accuracy_score\n",
"accuracy_score(y_test,z)"
]
},
{
"cell_type": "markdown",
"id": "2a31a652",
"metadata": {},
"source": [
"## SVM Classifier"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "e6c9e365",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC\n",
"classifier = SVC(kernel = 'rbf', gamma= 0.2)\n",
"classifier.fit(X_train, y_train)\n",
"\n",
"# Predicting the Test set results\n",
"y_pred = classifier.predict(X_test)"
]
},
{
"cell_type": "markdown",
"id": "e5b880d7",
"metadata": {},
"source": [
"## Confusion Matrix and k-Fold Cross Validation"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a1503813",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[14 19]\n",
" [ 2 88]]\n",
"Accuracy: 80.44 %\n",
"Standard Deviation: 4.59 %\n"
]
}
],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"cm = confusion_matrix(y_test, y_pred)\n",
"print(cm)\n",
"\n",
"# Applying k-Fold Cross Validation\n",
"from sklearn.model_selection import cross_val_score\n",
"accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)\n",
"print(\"Accuracy: {:.2f} %\".format(accuracies.mean()*100))\n",
"print(\"Standard Deviation: {:.2f} %\".format(accuracies.std()*100))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}