Choosing the Right Estimator

Picture yourself back in school, bracing for exams in a couple of weeks. How will you prepare? Some strategies you might use include reading textbooks or class notes, scouring Google for resources, or watching videos. Ultimately, each person prefers their own method of learning, and if one method does not work, you try another until you find the one that suits you best.

Most of the time, results are how we gauge the efficacy of each method: we keep the study techniques that yield higher exam scores and abandon those that do not translate well to exams. This also means that different exams may call for different study methods to achieve the best results.

If this is how humans learn, do machines also have different learning preferences for achieving the goals they are programmed for? The answer is yes: machines use different methods of learning depending on the problem they are given. Before getting into those methods, it is important to understand the different types of problems machines can be given. For the most part, these problems fall into two categories: unsupervised and supervised.

Unsupervised machine learning models work with unlabeled data and look for structure in it, for example by grouping inputs into clusters. An unsupervised model might place people with high credit scores and high salaries in one group and people with low credit scores and high salaries in another, without being told in advance what the groups should be. Common techniques for unsupervised problems include K-means clustering and hierarchical clustering.
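K-means is simple enough to sketch in a few lines. The version below is a minimal, plain-Python illustration of the standard assign-then-update loop (Lloyd's algorithm), not scikit-learn's implementation; the customer data, the hypothetical `kmeans` helper, and the explicitly supplied starting centroids are assumptions made for illustration.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Minimal K-means (Lloyd's algorithm) on tuples of numbers.

    Hypothetical helper for illustration. Libraries usually choose the starting
    centroids for you; passing them in keeps this sketch deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved, so assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (credit score, annual salary in $1000s).
customers = [(780, 120), (800, 135), (760, 110), (520, 125), (500, 115), (540, 130)]
labels, centroids = kmeans(customers, initial_centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The model is never told which customers belong together; it discovers the high-score and low-score groups on its own. Note that Euclidean distance is dominated by whichever feature has the larger range (credit score here), which is why features are usually rescaled before clustering in practice.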

In a supervised problem, on the other hand, the machine learns from labeled examples to predict an output from its inputs: for example, whether a person qualifies for a loan based on their credit score, down payment, salary, and other factors. Supervised problems can be further divided into classification and regression. Classification problems ask for a category, often a simple yes or no (whether a person can be given a loan or not), while regression problems ask the machine to predict a numeric value, such as an individual's future credit score based on the trajectory of their current spending habits.
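The regression case can be made concrete with the simplest regression model: a straight line fitted by ordinary least squares. The sketch below is a hypothetical plain-Python illustration that treats the "trajectory" as a credit score recorded each month (invented data) and extrapolates the trend; a real model would use many more inputs, including the spending habits themselves, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel, so plain sums suffice.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept


# Hypothetical trajectory: a person's credit score over the past six months.
months = [1, 2, 3, 4, 5, 6]
scores = [640, 645, 650, 655, 660, 665]
slope, intercept = fit_line(months, scores)
print(slope * 12 + intercept)  # 695.0, a number rather than a yes/no label
```

The key contrast with classification is the type of output: the model returns a continuous value (a predicted score for month 12), not a category.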

For classification problems, common methods include Logistic Regression (a classification method despite its name), the KNN Classifier, and the Decision Tree Classifier. Regression problems, in contrast, use methods such as Linear Regression, KNN Regression, and Decision Tree Regression. However, the right solution depends on more than whether a problem is classification or regression, because the type of data fed to the machine also has a substantial effect on which methods will prove most effective. Structured data, neatly arranged in rows and columns, can usually be handled with standard machine learning methods. Unstructured data, such as images, video, and audio, is much harder for these methods to handle, so deep learning methods like Artificial Neural Networks or Convolutional Neural Networks are typically used instead.
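KNN shows nicely how one method can come in both a classification and a regression flavor: both find the k nearest training examples, and they differ only in how those neighbors' answers are combined (a majority vote versus an average). The plain-Python sketch below is illustrative only; the applicant data and helper names are hypothetical, and in scikit-learn the corresponding estimators are `KNeighborsClassifier` and `KNeighborsRegressor` in `sklearn.neighbors`.

```python
import math
from collections import Counter


def nearest(train_x, query, k):
    """Indices of the k training rows closest to `query` by Euclidean distance."""
    return sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]


def knn_classify(train_x, train_labels, query, k=3):
    # Classification: majority vote among the k nearest neighbours' labels.
    votes = Counter(train_labels[i] for i in nearest(train_x, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_values, query, k=3):
    # Regression: average of the k nearest neighbours' numeric targets.
    return sum(train_values[i] for i in nearest(train_x, query, k)) / k


# Hypothetical applicants: (credit score, salary), both already rescaled to 0-1.
applicants = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.2, 0.3), (0.3, 0.2), (0.25, 0.4)]
approved = ["yes", "yes", "yes", "no", "no", "no"]   # classification target
future_score = [790, 810, 770, 510, 520, 530]        # regression target

new_applicant = (0.75, 0.8)
print(knn_classify(applicants, approved, new_applicant))     # yes
print(knn_regress(applicants, future_score, new_applicant))  # 790.0
```

The same three neighbors drive both predictions; only the final combining step changes with the type of problem.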

With such a vast array of problems and techniques, choosing the right estimator, that is, the model or algorithm that learns from the data to solve a problem, is tricky. Luckily, tools such as the scikit-learn flowchart above make it easier to find a good starting point. By answering a series of questions about the task and the data, we can systematically narrow down which estimator to try first. Maybe one day machines will become sophisticated enough to choose estimators on their own, but until then, humans must hone their own problem-solving skills before delegating the rest of the work to a machine.
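The questions such a flowchart asks can themselves be written as a small decision function. The sketch below is a hypothetical `suggest_estimators` helper that encodes only the distinctions drawn in this article (structured or unstructured data, labeled or unlabeled, category or number). It is not scikit-learn's flowchart or any library API, and it returns candidates worth trying rather than a guaranteed best choice.

```python
def suggest_estimators(has_labels, predicts_category=False, unstructured=False):
    """Hypothetical, heavily simplified estimator-selection flowchart.

    Encodes only the distinctions made in this article; it is not the
    scikit-learn flowchart, which asks further questions (such as how many
    samples you have) and covers other estimators.
    """
    if unstructured:
        # Images, video, audio: reach for deep learning instead.
        return ["Artificial Neural Network", "Convolutional Neural Network"]
    if not has_labels:
        # No target to predict (unsupervised): group the data instead.
        return ["K-means clustering", "Hierarchical clustering"]
    if predicts_category:
        # Supervised classification: the output is a label such as yes/no.
        return ["Logistic Regression", "KNN Classifier", "Decision Tree Classifier"]
    # Supervised regression: the output is a number.
    return ["Linear Regression", "KNN Regression", "Decision Tree Regression"]


# Loan approval: labeled, tabular data with a yes/no target.
print(suggest_estimators(has_labels=True, predicts_category=True))
```

Each `if` corresponds to one question in the flowchart, so adding a new question (for example, about dataset size) means adding one more branch.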