Everyone should have some idea of what machine learning is, if only because it is on the list of top priorities of the companies that define the modern digital industry: Google, Facebook and Amazon. The nature of this technology all but guarantees that it will change other technologies while it is still at an early stage of its own development. So as not to be left out in the cold, you should know how it works.
The Aim of Machine Learning Is to Program Everything
In essence, machine learning means teaching a computer to recognize objects the way a human does. For this purpose a great deal of data is gathered, organized and annotated by people; think of photos with tags. The program then analyzes the data, comparing its elements with the human annotations, and afterwards looks for the same objects in other material on the net. For example, Facebook suggests tags for photos, and Google Photos looks for people in pictures.
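The idea of learning from tagged examples can be sketched in a few lines. This is only a toy illustration, not how Facebook or Google actually do it: the two-number "features", the tags and the nearest-centroid method are all assumptions chosen to keep the example small.

```python
from collections import defaultdict
from math import dist

# Hypothetical human-tagged examples: (features, tag).
# The two numbers stand in for whatever a real system would measure in a photo.
TAGGED_EXAMPLES = [
    ((0.9, 0.1), "dog"),
    ((0.8, 0.2), "dog"),
    ((0.1, 0.9), "pizza"),
    ((0.2, 0.8), "pizza"),
]


def train(examples):
    """Average the feature vectors of each tag into one centroid per tag."""
    grouped = defaultdict(list)
    for features, tag in examples:
        grouped[tag].append(features)
    return {
        tag: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for tag, rows in grouped.items()
    }


def predict(model, features):
    """Return the tag whose centroid is closest to a new, untagged item."""
    return min(model, key=lambda tag: dist(model[tag], features))


model = train(TAGGED_EXAMPLES)
print(predict(model, (0.85, 0.15)))  # an item that resembles the "dog" examples
print(predict(model, (0.15, 0.95)))  # an item that resembles the "pizza" examples
```

The program never receives a rule for what a dog or a pizza is; it only compares new items with what people have already tagged.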
At present, most such algorithms are used for nothing very serious, primarily for entertainment: smart photo albums or speech recognition. Such applications are not very demanding. If the algorithm confuses your dog with your friend Petro or mishears a multi-syllable word such as “self-improving”, the user will only laugh and correct the mistake manually. But the algorithms are improving and becoming more reliable, and soon they will be used for much more difficult tasks.
Machine learning allows companies to create apps that work with pictures, texts, speech and other things made by people that did not use to be digital. Generally speaking, it helps to create apps that understand people, breaking down the wall that has divided humans and computers since the days of Babbage's Difference Engine.
The graphical interface and the mouse opened the door for the computer into every home. The touch interface turned it into a household appliance. Interfaces based on machine learning will make computers pervasive.
There is only one problem. Someone will have to systematize all that data.
Machine learning can only be as successful and reliable as its source data.
To create a self-learning machine you need three things:
- Training data: files that have been ordered and systematized by people.
- The program itself: software that builds models from the training data.
- Hardware: the computing capacity on which the program runs and stores the data mentioned above.
Hardware is easy to get. Take an old computer or rent some capacity in the cloud.
The program is, in fact, even easier to get. Much of the software is available under free licenses.
All you need is training data. And you need a great deal of it. That is the main problem.
The point is that the quality of the resulting algorithms depends directly on the size of the source data set. Today we still do not have software that can build a good model from, say, one thousand examples.
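A toy experiment shows why quantity matters. Assume, purely for illustration, that the "right answer" is a hidden boundary on a line from 0 to 1, and that the program sees only evenly spaced points tagged as lying below or above it. The boundary value, the function names and the midpoint rule are all made up for this sketch.

```python
TRUE_BOUNDARY = 0.373  # hidden rule the program has to discover (made-up value)


def make_tagged_examples(n):
    """Return n evenly spaced points in [0, 1], each tagged by the hidden rule."""
    points = [i / (n - 1) for i in range(n)]
    return [(x, x >= TRUE_BOUNDARY) for x in points]


def learn_boundary(examples):
    """Guess the boundary as the midpoint between the two nearest opposite tags."""
    highest_negative = max(x for x, tag in examples if not tag)
    lowest_positive = min(x for x, tag in examples if tag)
    return (highest_negative + lowest_positive) / 2


def boundary_error(n):
    """Distance between the learned boundary and the true one for n examples."""
    return abs(learn_boundary(make_tagged_examples(n)) - TRUE_BOUNDARY)


for n in (3, 11, 101, 1001):
    print(f"{n:5d} examples -> error {boundary_error(n):.4f}")
```

With these particular sizes the printed error shrinks at every step. The program cannot place the boundary more precisely than the gap between neighbouring examples allows, so the only way to improve it is to feed it more tagged points.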
Besides, you cannot directly remove errors from the models, not least because we still do not fully understand how they work. This is the strangest feature of machine learning: it is practically impossible to make head or tail of the models directly or logically; you can only do it by trial and error.
And as long as this is the case and we do not know how machine learning works inside, we have to compensate for local errors in the computer's logic with a sheer volume of data.
Where Should We Get This Data?
Use every part of the buffalo that is the user
If we want computers to understand us non-digital people, people will have to teach them. But where do we find thousands of people who will agree to spend their free time creating information that is easy for a computer to process? Hiring them would make the project's budget simply unreasonable.
It is simple. If a person does not pay money for a product, he pays with something else, doesn’t he? For example, a user has to watch an ad on Google or tag something on Instagram. In effect, the users of free services create the data store needed for machine learning themselves.
Online services have learned to use every part of their users, just as Native Americans found a practical use for every part of the buffalo. Our attention earns them advertising money. Our knowledge is fuel for their machine learning, even if that knowledge is only that the round object on the dinner table is a pizza.
It is the same story as Tom Sawyer and the whitewashing of the fence, only scaled up millions of times.
Take Facebook Photos, for instance. It invites you to tag friends in the photos you upload. This is useful for you: the photos will be easy to find. It is convenient for your friends: they can find photos of themselves and recall what happened at the time. And it benefits Facebook too, because it builds up a huge data store that is used to train machine learning, which in turn helps Facebook improve Photos. To cut a long story short, everyone wins.
The better the program, the more people use it; the more people use it, the better the program becomes. It is a self-reinforcing cycle.
Under what conditions can such apps exist?
- The app should be online. Otherwise, it will be impossible to collect the necessary data.
- The computation should take place on servers, not on the user’s device. Otherwise, no one will want an app that takes up so much space on their smartphone.
- The first condition for creating a good program is a big audience. The bigger the audience, the more data it creates.
- The second condition is regular use. The more often people use the app, the more data they create.
- Good apps encourage the creation of accurate data, because the developers do not want you to tag a pizza as “Pikachu”.
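The last condition, accurate data, is often helped along by cross-checking what several users say about the same item. The sketch below uses a simple majority vote; this technique and the `consensus_tag` helper are assumptions made for illustration and are not described by any of the services mentioned here.

```python
from collections import Counter


def consensus_tag(tags, min_share=0.5):
    """Return the most common tag if more than min_share of users agree, else None."""
    if not tags:
        return None
    tag, votes = Counter(tags).most_common(1)[0]
    return tag if votes / len(tags) > min_share else None


# Three users tagged the photo honestly, one was joking.
print(consensus_tag(["pizza", "pizza", "Pikachu", "pizza"]))
# With no clear majority the item is left untagged rather than guessed.
print(consensus_tag(["pizza", "Pikachu"]))
```

The more people tag the same item, the less a single joke tag matters, which is one more reason a big and active audience helps.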
Of course, start-ups can run into a problem here. You will not have as many users as Facebook has, which means you will have less data. And if you try to outplay a corporation on its own playing field, you are bound to fail. That is why the only way out is to collect unique data and build your own algorithms in a field where there is still no winner.