
An overview of Machine Learning models.

  • Writer: Yazmin T. Montana
  • Oct 14, 2021
  • 3 min read

Updated: Jun 8, 2022

In order to understand machine learning, we first need to understand the basics of software development.


Traditionally, software programs have taken input data and run it through hand-written logic to produce output data:

Input Data + Program = Output Data


Software developers would carefully craft the program by manually writing thousands or millions of lines of code. Once developed, the program was fixed and generally couldn't adapt its behavior unless developers rewrote it and then deployed a new version.


With machine learning, this well-established concept is turned upside down: the software's behavior is generated from historical data, so now the process looks like this:

Input Data + Output Data = Program

As new data becomes available, the algorithm can simply be regenerated or retrained from the updated dataset, thus adapting its behavior to changes in the environment. When this process happens repeatedly, the software effectively becomes self-learning and self-adapting.

This is the foundational principle of machine learning.


The ability to generate and regenerate software logic automatically from data is what powers AI.
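To make that retrain-on-new-data loop concrete, here is a minimal sketch assuming Python with scikit-learn and a made-up numeric dataset; the model, numbers, and variable names are illustrative assumptions, not a prescribed workflow.

```python
# Minimal sketch of the retrain-on-new-data loop described above,
# assuming scikit-learn and a simple made-up tabular dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical input data (X) and output data (y) "generate" the program.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

model = LinearRegression().fit(X, y)          # the learned "program"
print(model.predict([[5.0]]))                 # use it on new inputs

# Later, new data becomes available: simply retrain on the updated dataset.
X_new = np.vstack([X, [[5.0], [6.0]]])
y_new = np.append(y, [9.8, 12.3])
model = LinearRegression().fit(X_new, y_new)  # behavior adapts to the new data
```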


Most common types of AI models (or prediction algorithms)

  • Linear regression

  • Deep neural networks

  • Logistic regression

  • Decision trees

  • Linear discriminant analysis

  • Naive Bayes

  • Support vector machines

  • Learning vector quantization

  • K-nearest neighbors

  • Random forests

Linear regression

This is a model based on supervised learning. Its main task is to find the relationship between inputs and outputs, and it is used to predict the value of a dependent variable from a given independent variable. There are multiple ways to develop these models, including software such as Excel and Minitab, and if there are no time constraints on your project, one can even be worked out with pen and paper!
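As an illustration, here is a minimal sketch of simple linear regression using the closed-form least-squares formulas, the same arithmetic you could carry out in Excel or on paper; the numbers are made up for illustration.

```python
# Minimal sketch of simple linear regression via the closed-form
# least-squares formulas (the same arithmetic you could do by hand or in Excel).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.0, 4.1, 6.0, 8.2, 9.9])   # dependent variable

# slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"y = {slope:.2f} * x + {intercept:.2f}")
print("prediction at x=6:", slope * 6 + intercept)
```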

Deep neural networks

These models are inspired by the neural networks of the human brain, since they are similarly built from interconnected units known as artificial neurons.

Just like linear regressions, neural network models can be developed in Excel, producing results that can later be used to obtain specific values for a problem with defined constraints.
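As a rough illustration, here is a minimal sketch of a small neural network regressor; it uses scikit-learn's MLPRegressor rather than the Excel workflow mentioned above, and the tiny dataset and layer sizes are just assumptions.

```python
# Minimal sketch of a small neural network regressor using scikit-learn.
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(200, 3)                    # 200 samples, 3 input features
y = X[:, 0] * 2 + X[:, 1] - X[:, 2]           # a simple target to learn

# Two hidden layers of interconnected "artificial neurons"
model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))                   # predictions for the first samples
```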

Logistic Regression

This is a classic statistical model that can predict the class of a dependent variable from a set of given independent variables. It is similar to the linear regression model, but it is used only for classification problems. Just like linear regression, it can also be developed in Excel.
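A minimal sketch of logistic regression on a toy binary classification problem, assuming scikit-learn; the dataset is invented for illustration.

```python
# Minimal sketch of logistic regression for a binary classification problem.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])              # two classes

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5], [4.5]]))            # predicted class labels
print(clf.predict_proba([[4.5]]))             # class probabilities
```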

Decision Trees

This model is used to draw conclusions from data about past decisions. It works by repeatedly dividing the data into smaller portions and gets its name because the result resembles the structure of a tree. It can be used for both regression and classification problems; in other words, for both forecasting and obtaining insights from your current data.
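As an illustration, here is a minimal sketch of a decision tree classifier on an invented dataset; the same scikit-learn API also offers DecisionTreeRegressor for regression problems.

```python
# Minimal sketch of a decision tree classifier on a toy dataset.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 40000], [35, 60000], [45, 80000], [20, 20000], [52, 110000], [23, 30000]]
y = [0, 1, 1, 0, 1, 0]                        # e.g. "did not buy" / "bought"

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))   # inspect the splits
print(tree.predict([[30, 50000]]))
```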

Linear Discriminant Analysis (LDA)

LDA is closely related to the logistic regression model. It is used when two or more classes need to be separated in the output. It is commonly used in medical research and computer vision, since many factors come into play in order to obtain a result.
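A minimal sketch of LDA separating two classes, assuming scikit-learn and synthetic data invented for illustration.

```python
# Minimal sketch of Linear Discriminant Analysis on two synthetic classes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[0.5, 0.5], [3.2, 2.8]]))  # classify two unseen points
```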


Naive Bayes

This model is based on Bayes' theorem and is applied to text classification. It works on the assumption that the occurrence of any feature does not depend on the occurrence of any other feature (i.e., the features are uncorrelated). In other words, every variable in the model is treated as completely independent, with no effect on any other part of the model. It is called "naive" because this assumption is almost never true.
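A minimal sketch of Naive Bayes applied to a toy text classification problem, assuming scikit-learn; the example messages and labels are invented.

```python
# Minimal sketch of Naive Bayes for text classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free offer click now", "project update attached"]
labels = ["spam", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)                  # word counts, treated as independent features

clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free prize tomorrow"])))
```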

Support Vector Machines

This is a supervised ML algorithm that can be used for classification, outlier detection, and regression problems. SVM usually works faster and performs better on datasets with a limited number of samples, for example text classification problems such as chatbots. It is applicable to binary classification problems.
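A minimal sketch of a support vector machine on a toy binary classification problem, assuming scikit-learn and invented data.

```python
# Minimal sketch of a support vector machine for binary classification.
from sklearn.svm import SVC

X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [2.0, 1.8], [1.9, 2.2], [2.1, 2.0]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="rbf").fit(X, y)             # RBF kernel, the common default
print(clf.predict([[0.2, 0.3], [2.0, 2.0]]))
```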

Learning Vector Quantization

LVQ processes information by preparing codebook vectors that are then used to classify unseen vectors. It is used for solving multi-class classification problems and often appears in robotics applications such as self-driving vehicles.
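Since scikit-learn has no built-in LVQ, here is a minimal sketch of the LVQ1 learning rule in plain NumPy; the two-class dataset, single codebook vector per class, and learning rate are assumptions for illustration.

```python
# Minimal sketch of the LVQ1 learning rule in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(3, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)

# One codebook vector per class, started at a sample of that class
codebooks = np.array([X[y == c][0].copy() for c in (0, 1)])
codebook_labels = np.array([0, 1])
lr = 0.1

for epoch in range(20):
    for xi, yi in zip(X, y):
        nearest = np.argmin(np.linalg.norm(codebooks - xi, axis=1))
        if codebook_labels[nearest] == yi:
            codebooks[nearest] += lr * (xi - codebooks[nearest])   # pull closer
        else:
            codebooks[nearest] -= lr * (xi - codebooks[nearest])   # push away

# Classify an unseen vector by the label of the nearest codebook vector
new_point = np.array([2.8, 3.1])
print(codebook_labels[np.argmin(np.linalg.norm(codebooks - new_point, axis=1))])
```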

K-nearest neighbors

kNN is a simple supervised ML model that is used for solving both regression and classification problems. It works on the assumption that similar data points exist near each other.

It is a powerful model, but it is typically slow, since the volume of stored data grows over time. This model can be used to classify problems and defects in manufacturing, since it groups defects that appear similar but can still separate them by key differentiators.
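A minimal sketch of kNN classification, assuming scikit-learn and invented "defect measurement" data.

```python
# Minimal sketch of k-nearest neighbors classification on toy defect data.
from sklearn.neighbors import KNeighborsClassifier

X = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3],      # defect type A measurements
     [3.0, 2.1], [3.2, 1.9], [2.9, 2.2]]      # defect type B measurements
y = ["A", "A", "A", "B", "B", "B"]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1.05, 0.25], [3.1, 2.0]]))   # classify new defects by nearby points
```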

Random forest

This algorithm builds a "forest" of multiple decision trees, each trained on a different random sample of the data.

It is a robust technique that can help obtain more accurate forecasts than a linear regression, since it considers multiple outputs and systems of interactions.

It is commonly applied in banking and finance since there are many variables that interact in intricate systems with defined rules.
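A minimal sketch of a random forest on an invented loan-default-style dataset, assuming scikit-learn; the features, labels, and settings are illustrative assumptions.

```python
# Minimal sketch of a random forest classifier on a toy "loan default" dataset;
# each tree is trained on a different bootstrap sample of the data.
from sklearn.ensemble import RandomForestClassifier

X = [[35, 50000, 2], [22, 18000, 5], [48, 90000, 1],
     [30, 40000, 4], [55, 120000, 0], [26, 25000, 6]]   # age, income, missed payments
y = [0, 1, 0, 1, 0, 1]                                   # 1 = defaulted

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict([[40, 60000, 3]]))
print(forest.predict_proba([[40, 60000, 3]]))            # averaged over all trees
```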





“Some people call this artificial intelligence, but the reality is this technology will enhance us. So instead of artificial intelligence, I think we’ll augment our intelligence.”

—Ginni Rometty









