
A few months ago I decided to create an Image Classification model using Keras to detect threats such as firearms. I have now decided to extend this to object detection.

The purpose of a tool like this is to be able to detect objects in real time using a camera system.

Object Detection vs Image Classification

Before we begin, I will assume that you already know what the difference between object detection and image classification is, but this will serve as a quick recap.

Image Classification is the process of passing an image as input through your model; the model detects patterns in the given image and outputs your desired class. …
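Although the preview cuts off here, a minimal Keras image-classification sketch gives a feel for what such a model looks like. This is illustrative only; the 224x224 input size and the two classes (threat / no threat) are assumptions, not the exact model from this post.

# Minimal Keras image-classification sketch (illustrative only; the input
# size and class count are assumptions, not the post's exact architecture).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),       # RGB image as input
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),    # one probability per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])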



I’m sure we’ve all seen this very annoying error before when using neural networks. I have personally spent weeks trying to debug my code in order to fix this error early in my career as a Data Science Consultant.

I want to address this error, explain how to understand it, and help you be prepared for when you see it again.

What does the error mean?

When you see the ValueError, it usually means that your model (convolutional, LSTM, etc.) ran into an input error. …
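For example, here is a hypothetical setup (not the author's code) where the data fed to the model does not match the input shape the model was built with, which raises exactly this kind of ValueError in Keras:

# Hypothetical example of the shape mismatch that raises this ValueError.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(10,)),             # the model expects 10 features per sample
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

X = np.random.rand(32, 8)                  # but the data only has 8 features
y = np.random.randint(0, 2, size=32)
model.fit(X, y)                            # ValueError: incompatible input shape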



This will be a very short article about the papers published on this topic.

Summary

A research paper by the authors Wei-Han Lee and Ruby B. Lee, from Princeton University, introduces the idea of always keeping your smartphone safe by using a multi-sensor system that continuously authenticates the user using your smartphone’s sensors.

The system is able to perform this task by learning the owner’s behavior patterns and environment characteristics, and then authenticating the current user without interrupting the user’s interaction. This helps combat impersonation attacks that follow smartphone theft.

The system would use the sensors that best reflect the user’s…
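Although the preview cuts off here, the toy sketch below illustrates the general idea of continuous, sensor-based authentication. This is not the paper's actual system; the features, labels, and threshold are all made up for illustration.

# Toy sketch of sensor-based continuous authentication (not the paper's method).
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: each row is a feature vector extracted from a
# window of accelerometer/gyroscope readings; label 1 = owner, 0 = someone else.
X_sensors = np.random.rand(200, 6)
y_labels = np.random.randint(0, 2, size=200)

clf = SVC(probability=True).fit(X_sensors, y_labels)

def looks_like_owner(window_features, threshold=0.8):
    """Return True if the current sensor window is probably the owner."""
    p_owner = clf.predict_proba([window_features])[0, 1]
    return p_owner >= threshold

print(looks_like_owner(np.random.rand(6)))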



This is part 4 out of 4; here, the results of all of the machine learning algorithms that were used will be reviewed.

In other words, this will be a very short blog…

Conclusion

  • The majority of the models had problems classifying.

    - The metric used most to make decisions with the data was the F1 score.

Performances

Random Forest (Original)

Acc: 0.70

AUC ROC: 0.5473384593553532

TNR: 0.18208028387669106

FPR: 0.8179197161233089

FNR: 0.08740336516598454

TPR: 0.9125966348340154

Precision: 0.7312345139192538

Recall: 0.9125966348340154

F1 Score: 0.8119108306024194

Random Forest (Feature Selection/Importance > 0.05)

['Current Loan Amount',
 'Credit Score',
 'Annual Income',
 'Monthly Debt',
 'Number of Open Accounts',
 'Current Credit Balance',
 'Maximum Open Credit',
 'Term_Short Term',
 'Home Ownership_Home Mortgage',
 'Home Ownership_Rent']

Test Score: 0.6727296181630547

OOB Score: 0.7639080876521108

TNR: 0.3016189842537148

FPR: 0.6983810157462852

FNR: 0.17507958162801274

TPR: 0.8249204183719873

Precision: 0.7422866028316556

Recall: 0.8249204183719873 …
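The feature list above suggests a selection step based on random-forest feature importances with a 0.05 cutoff. Below is a minimal sketch of what that might look like, assuming the bank-data training split X_train / y_train used throughout this series and that X_train is a pandas DataFrame; it is illustrative, not the exact code that produced these results.

# Sketch: refit a random forest using only features with importance > 0.05.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(oob_score=True, random_state=42)
rf.fit(X_train, y_train)

importances = pd.Series(rf.feature_importances_, index=X_train.columns)
selected = importances[importances > 0.05].index.tolist()   # e.g. the 10 columns listed above

rf_selected = RandomForestClassifier(oob_score=True, random_state=42)
rf_selected.fit(X_train[selected], y_train)
print(rf_selected.oob_score_)   # compare against the OOB score reported above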



This blog is part 3 out of 4, and we will be discussing Boosting.

Gradient Boosting
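The preview ends here, so the snippet below is only a generic scikit-learn sketch of fitting a gradient boosting model on the bank-data split assumed throughout this series (X_train, y_train, X_test, y_test), not the post's exact code.

# Generic gradient boosting sketch on the assumed bank-data split.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gb.fit(X_train, y_train)

# F1 is the metric this series uses for the imbalanced target
print(f1_score(y_test, gb.predict(X_test)))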



On the bank data we saw that our dependent variable is imbalanced, and in the previous blog we mentioned that the metric we will be basing our results on is the F1 score, computed from the confusion matrix. This blog will discuss, in depth, why.

Accuracy

Accuracy score is the most commonly used metric when it comes to performance measurement, but sometimes this can be a bad metric to base your results on.

Accuracy measures how many observations, both positive and negative, were correctly classified.

This can be misleading when you have an imbalanced dataset. If the dependent variable is binary with 80% 1’s and 20% 0’s, a model that predicts mostly 1’s can still report a high accuracy score (80% or more) even while misclassifying nearly every 0. …
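To make this concrete, here is a small hypothetical illustration (the numbers are made up): a model that simply predicts the majority class scores well on accuracy, while the confusion matrix and the minority-class F1 score expose the problem.

# Hypothetical illustration: accuracy vs. F1 on an imbalanced binary target.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = np.array([1] * 80 + [0] * 20)   # 80% 1's, 20% 0's
y_pred = np.ones(100, dtype=int)          # a lazy model that predicts 1 every time

# Accuracy = (TP + TN) / total, so the lazy model still scores 0.80
print(accuracy_score(y_true, y_pred))

# The confusion matrix shows every 0 was misclassified
print(confusion_matrix(y_true, y_pred))

# F1 for the minority class collapses to 0
print(f1_score(y_true, y_pred, pos_label=0, zero_division=0))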



Part 1 out of 4 will be a short post about the 4 different machine learning algorithms that were used on the bank data.

Random Forest



This will be a short post before we dive deep into classification in the next few blog posts.

If we look back on the banking data we will see that the dependent variable is heavily imbalanced. We can check the value counts by using the code below, and we can also get a visual representation using Seaborn’s count plot.

import seaborn as sns

# Dependent variable is imbalanced; check the class proportions
y_train.value_counts(normalize=True)
sns.countplot(x=y_train)  # visual representation of the class counts



The dataset that will be used for this example is on Kaggle. This discussion will be about the process of using PCA on the Bank data.

What is PCA?

PCA, Principal Component Analysis, is a statistical procedure that uses an orthogonal transformation to convert a set of possibly correlated variables into a set of uncorrelated variables called principal components.

PCA is a tool that is mostly used for Exploratory Data Analysis (EDA) and in machine learning predictive modeling. You can also use PCA for dimensionality reduction, which is also known as feature extraction. This becomes useful when you want to simplify a dataset by reducing the number of features it contains. …
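A rough sketch of how this could be applied to the bank data is shown below. The X_train feature matrix and the 95% variance threshold are assumptions for illustration, not the exact steps from the post.

# PCA sketch: standardize the features, then keep components covering 95% of the variance.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(X_train)   # PCA is sensitive to feature scale

pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X_scaled)

print(pca.n_components_)              # how many uncorrelated components were kept
print(pca.explained_variance_ratio_)  # variance explained by each component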

About

Zaki Jefferson

Data Scientist | Data Science Consultant. I work with companies and individuals to leverage the abundance of data and help grow their ideas and businesses!
