January 16, 2006
Search Tech Opportunities - Improvement #7: Machine Learning as a Web Service
Following up on my post on the opening of the search tech chain, the seventh opportunity that I see is with machine learning of various kinds. If you are not familiar with machine learning, take a look at Tom Mitchell’s book on the topic. (Funny: when I punched “machine learning” into Google, the first entry was an advertisement from Google asking “Want to work at Google?”)
This topic gets deep and broad very fast. I have put a lot of time into it over the years (starting with some credit card behavioral modeling in the early ’90s), but even with all the research I have done, I know only enough to be dangerous. Machine learning is relatively technical and relatively difficult to get exactly right (lots of math, CS, and art here).
I do, however, have a few non-technical thoughts on the topic:
What I would use it for…
Machine learning can be used for an amazing number of things (too many to describe here and many, many that I have not even considered). With respect to search (and assuming that all of my improvement ideas so far have had some level of development), innovators could create the following (and much more):
- Propose tags for me when I am doing social tagging (as I requested in an earlier post on tagging).
- Given a set of resources (a.k.a. webpages, URIs), find resources that are similar. The find similar buttons on the search engines are really quite bad at this point. The approach that I would test would be asking the user what “dimensions” of similarity the user is looking for and then finding everything similar along those dimensions. Note that the approach would use the feature vectors and models from the machine learning algorithms.
- Automatically update my Ajax desktop pico-domain. There is refresh work that would need to be done in addition to the machine learning, but I should be able to develop algorithms that help me to quickly find and update resources in my domain-specific site.
These are just a few examples of what solid machine learning can do. Clearly, it is not perfect, so I will still need to do some manual work to sort through bad results, particularly at the beginning. But the machine learning will save me a lot of time.
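To make the "find similar" idea from the list above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the example URIs, the feature names with their `topic:` and `style:` prefixes, the `find_similar` function, and the choice of cosine similarity are all hypothetical, not something taken from an existing search engine. The point is only that the user picks the dimensions and the comparison is restricted to those parts of the feature vectors.

```python
import math

# Hypothetical feature vectors: each resource (URI) maps a named feature to a weight.
# The prefix before the colon is the "dimension" of similarity the feature belongs to.
RESOURCES = {
    "http://example.com/a": {"topic:search": 1.0, "topic:ml": 0.0,
                             "style:tutorial": 1.0, "style:news": 0.0},
    "http://example.com/b": {"topic:search": 0.9, "topic:ml": 0.1,
                             "style:tutorial": 0.0, "style:news": 1.0},
    "http://example.com/c": {"topic:search": 0.0, "topic:ml": 1.0,
                             "style:tutorial": 1.0, "style:news": 0.1},
}


def cosine(u, v, dims):
    """Cosine similarity of two sparse vectors, restricted to the features in dims."""
    dot = sum(u.get(d, 0.0) * v.get(d, 0.0) for d in dims)
    norm_u = math.sqrt(sum(u.get(d, 0.0) ** 2 for d in dims))
    norm_v = math.sqrt(sum(v.get(d, 0.0) ** 2 for d in dims))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar(seed_uris, resources, dimension_prefixes, top_n=3):
    """Rank non-seed resources by similarity to the seeds on the chosen dimensions."""
    all_features = {f for vec in resources.values() for f in vec}
    dims = sorted(f for f in all_features if f.startswith(tuple(dimension_prefixes)))
    # Average the seed vectors so that several seeds act as one query.
    centroid = {
        d: sum(resources[u].get(d, 0.0) for u in seed_uris) / len(seed_uris)
        for d in dims
    }
    scored = [
        (cosine(centroid, vec, dims), uri)
        for uri, vec in resources.items()
        if uri not in seed_uris
    ]
    scored.sort(reverse=True)
    return [(uri, score) for score, uri in scored[:top_n]]


by_topic = find_similar(["http://example.com/a"], RESOURCES, ["topic:"])
by_style = find_similar(["http://example.com/a"], RESOURCES, ["style:"])
```

With this toy data, asking for similarity by topic ranks resource `b` first, while asking for similarity by style ranks resource `c` first, even though the seed is the same in both cases.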
An innovative service…
The problem is that building good machine learning models is a time-intensive task that starts with creating a model-building environment. While many of the large companies appear to be doing just this, most smaller companies are standing flat-footed in this area, due to a lack of understanding, resources, and skills.
So, how about an innovative on-demand (SaaS) service in this area from a company that has the skills (such companies generally operate as professional services groups today)? Most Internet services could use machine learning of one kind or another, but they do not have the resources or skills to set up the model development environment. The service could provide code that helps create feature vectors (and other machine learning inputs), recommend modeling approaches for the particular class of problems, walk the user through the model building process, and backtest the models. Finally, it could deliver the code for the models or, possibly, even run the models on its own systems as an ongoing service.
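As a rough illustration of the steps such a service would wrap (creating feature vectors, building a model, and backtesting it on held-out examples), here is a toy sketch in Python. It is an assumption-laden example, not a description of any real service: the function names, the training data, and the very simple scoring rule (sum word counts per label, then pick the label with the largest dot product) are all hypothetical stand-ins for whatever modeling approach the service would actually recommend.

```python
from collections import Counter, defaultdict


def featurize(text):
    """Turn raw text into a bag-of-words feature vector (word -> count)."""
    return Counter(text.lower().split())


def train(examples):
    """Build a toy model: one summed word-count vector per label."""
    model = defaultdict(Counter)
    for text, label in examples:
        model[label].update(featurize(text))
    return dict(model)


def predict(model, text):
    """Pick the label whose summed vector has the largest dot product with the text."""
    feats = featurize(text)

    def score(label):
        return sum(count * model[label][word] for word, count in feats.items())

    # sorted() makes ties resolve to the alphabetically first label.
    return max(sorted(model), key=score)


def backtest(examples, holdout):
    """Train on examples, then report accuracy on held-out labeled examples."""
    model = train(examples)
    hits = sum(predict(model, text) == label for text, label in holdout)
    return hits / len(holdout)


# Hypothetical labeled data, invented for this sketch.
TRAIN = [
    ("search engine ranking query", "search"),
    ("crawler index search results", "search"),
    ("credit card fraud model", "finance"),
    ("card payment risk scoring", "finance"),
]
HOLDOUT = [
    ("search query results", "search"),
    ("credit risk model", "finance"),
]

accuracy = backtest(TRAIN, HOLDOUT)
```

A hosted version would replace each of these functions with something far more capable, but the shape of the workflow (inputs, model, backtest, then delivery of the model) would stay roughly the same.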
Since the inputs are all available via the Internet and there is a lot of work in the set-up of the model-building environment, this innovation lends itself pretty nicely to being set up as an Internet-based service.
Given that the process still has a fair amount of art in it, the service could also offer expert model-building advisors to its customers.
If some innovators do not move in this direction for the community at large to share, this will become a major area of strategic advantage for the larger companies over the next few years. Perhaps Alexa (or another innovator) will move in this direction for the benefit of many?
Huge amount of innovation potential here!