In supervised learning, we try to infer a function from labeled training data. There are three steps to building a supervised model: build the model, train it, and test it.
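To make "infer a function from training data" concrete, here is a minimal sketch. The numbers are made up for illustration, and a least-squares straight line is just one simple choice of function, not the only thing supervised learning can fit.

```python
# Toy training data (made up): each input x comes with a known answer y.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys):
    """Infer slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(train_x, train_y)

def predict(x):
    """Apply the inferred function to an input that was not in the training data."""
    return slope * x + intercept

print(slope, intercept, predict(10.0))
```

The "model" here is nothing more than the two numbers learned from the examples, which we can then apply to inputs we have never seen.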
Here is my analogy for supervised learning. Suppose you are a student who wants to learn a subject called machine learning, and you have joined a tuition class for it.
1 - In this case, you are the model.
2 - Now your teacher teaches you machine learning, possibly using some resources along the way. In machine learning terminology, this is the training process, where we train our model on past/current data. The training process varies with the learning algorithm, but in short, we are trying to find patterns in the data.
3 - At the end of the course, your teacher may test your knowledge to check how well you have done, and based on the result take some action to improve it. One thing to note here is that your teacher will have used some examples while teaching, so obviously those exact examples will not be asked in the test. If they were, you could solve them easily, since you have already learned them. The teacher sets the test to make sure that you have learned the concept and can perform well on new examples too. In machine learning terminology, since we have trained our model on the training data, it will perform well (most of the time) on that same training data. To check its actual predictive power or accuracy, we have to test it on unseen data (test data). A common choice is to split the whole data set 70:30 into training and test data respectively.
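The 70:30 split from step 3 can be sketched in a few lines of plain Python. This is only an illustration: the helper name `train_test_split`, the fixed seed, and the toy list of ten examples are my own assumptions, and in practice you would likely use a library routine instead.

```python
import random

def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records, then cut it into (train, test)."""
    shuffled = list(records)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten stand-in examples; real data would be (features, label) pairs.
examples = list(range(10))
train, test = train_test_split(examples)
print(len(train), len(test))
```

Shuffling before cutting matters: if the data is sorted (say, all the fraud cases first), a plain cut would give you a test set that looks nothing like the training set.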
Practical example: say you are trying to build a model that can detect fraudulent credit card transactions. Start by gathering as much data as you can from past incidents. Say you have 500 records of fraudulent transactions and 500 records of normal transactions. Now you can use any supervised learning algorithm, which will try to find the patterns that appear in the data when a transaction is fraudulent as well as when it is normal. Your trained model then knows that when pattern xyz appears in the data, there is some probability that the transaction is fraudulent.
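Here is a rough sketch of that fraud example from start to finish. Everything specific in it is an assumption made for illustration: the data is synthetic, the two features (transaction amount and hour of day) and their distributions are invented, and a nearest-centroid classifier stands in for "any supervised learning algorithm" simply because it is short to write.

```python
import random

rng = random.Random(42)

def make_transactions(n, amount_mean, amount_sd, hour_mean, hour_sd, label):
    """Generate n synthetic ((amount, hour), label) records. Purely made-up data."""
    return [((rng.gauss(amount_mean, amount_sd), rng.gauss(hour_mean, hour_sd)), label)
            for _ in range(n)]

# 500 fraud and 500 normal records, as in the example above.
data = (make_transactions(500, 300.0, 80.0, 3.0, 2.0, "fraud")
        + make_transactions(500, 50.0, 20.0, 14.0, 4.0, "normal"))

# 70:30 split into training and test data.
rng.shuffle(data)
cut = round(len(data) * 0.7)
train, test = data[:cut], data[cut:]

def fit_centroids(rows):
    """Training step: compute the average feature vector for each label."""
    sums, counts = {}, {}
    for features, label in rows:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, features):
    """Prediction step: pick the label whose centroid is closest to the features."""
    def squared_distance(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit_centroids(train)  # train on the 70%

# Test on the 30% the model has never seen.
correct = sum(predict(centroids, features) == label for features, label in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

The learned "pattern" in this sketch is just the typical amount and hour for each class. Real fraud data is far messier and usually heavily imbalanced, so treat this as a picture of the build, train, test workflow rather than a usable detector.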