January 16, 2006
Search Tech Opportunities - Improvement #6: Feature Vectors
Following up my post on the opening of the search tech chain, the next few opportunities start getting slightly more technical. The sixth opportunity that I see is in exposing more useful features in the search feature vectors and in allowing mathematical combinations of the entries in those vectors.
Briefly, the features that I can currently use to extract resources from the major search engines are word-based: each resource is retrieved based on a series of words. (Note that this is not completely accurate, as there are a few other features that can be searched on, such as language, file format, date of update, and domain suffix, but the vast majority of the entries in the vector currently represent words.)
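To make the idea of a mostly word-based feature vector concrete, here is a minimal sketch in Python. Everything in it (the `Resource` class, the `search` function, the sample index, and the field names) is hypothetical and only illustrates the description above; it is not how any particular search engine stores its data.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    """One indexed page: mostly word features plus a few metadata features."""
    url: str
    words: set[str]              # the bulk of the vector: one entry per word
    language: str = "en"         # the handful of non-word features
    file_format: str = "html"
    domain_suffix: str = "com"

def search(index, terms, **filters):
    """Return URLs of resources that contain every term and match every filter."""
    wanted = {t.lower() for t in terms}
    hits = []
    for resource in index:
        if not wanted <= resource.words:
            continue  # at least one search word is missing from this page
        if all(getattr(resource, name) == value for name, value in filters.items()):
            hits.append(resource.url)
    return hits

# Tiny made-up index, purely for illustration.
INDEX = [
    Resource("http://example.com/a", {"search", "feature", "vector"}),
    Resource("http://example.org/b", {"search", "camera"},
             file_format="pdf", domain_suffix="org"),
]
```

A call such as `search(INDEX, ["search"], file_format="pdf")` combines the word entries with one of the few non-word features, which is roughly all that the current search interfaces let me do.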
Some Ideas (aimed at the search engines):
- How about giving me a few more features to search on? For example, search engines seem to be storing away bolded and highlighted words to use in their prioritization schemes. How about exposing some of them so I can use them in a basic (or more advanced) search?
- More advanced: give me access to a large number of features that you are not already calculating but should be (maybe you already are?). For example, I am always searching for interesting new technology product companies. Most of them have a tab-based link on their home page that says “product” (sometimes plural, but the stem is fine). How about making that a calculated feature that I can search on? This is one of a huge number of possible features.
- Even better, let me define my own feature calculations with tools that you provide, which you then execute and store on your systems!
- Just as important as the exposure of features, I would like the ability to make calculations based on the features and store the results in your system. This will allow me to create some machine learning models (at a higher, concept level) and make the calculations in advance of my searches. (I will reduce the load on the systems by only using specific features through this method, as I will use my composite variable for the resource extraction.) This will also give me the ability to store some very advanced searches AND some of my schemes for prioritizing results. If you are really generous, you would allow me to make the calculations through the API as well.
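As a rough illustration of the calculated-feature and composite-variable ideas in the list above, here is a small Python sketch that uses only the standard library. The feature names, the weights, and the crude "strip the trailing s" stemming are all my own assumptions for the example, not anything a search engine is known to offer.

```python
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Collect the visible text of every <a> element on a page."""
    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buffer = []
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.link_texts.append("".join(self._buffer).strip().lower())
            self._in_link = False

def has_product_link(html):
    """Calculated feature: 1.0 if any link text stems to 'product', else 0.0."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    # Crude stemming (assumption): strip trailing 's' so "Products" matches.
    return 1.0 if any(t.rstrip("s") == "product" for t in parser.link_texts) else 0.0

# Hypothetical weights for a "technology product company" composite variable.
WEIGHTS = {"has_product_link": 2.0, "mentions_technology": 1.0}

def composite_score(features, weights=WEIGHTS):
    """Weighted sum of feature values; missing features count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())

def precompute(pages):
    """Map url -> composite score, calculated once, ahead of any search."""
    store = {}
    for url, html in pages.items():
        features = {
            "has_product_link": has_product_link(html),
            "mentions_technology": 1.0 if "technology" in html.lower() else 0.0,
        }
        store[url] = composite_score(features)
    return store
```

The point of `precompute` is that the composite score is calculated and stored ahead of time, so a later search only needs to read one stored number per resource instead of touching every underlying feature.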
I think Alexa already allows me to do this with its open platform, but I have not studied it enough at this point.
The net result is that I should be able to do some really interesting things with the information, especially if the other components of the open platform are in place (prior posts). The Amazon Camera Image search is one good example of what is possible (even if this does not interest you from a user standpoint, the search ability is amazingly specific).