No sooner had I finished my instructions than...
I just finished a comprehensive research paper... and, well, made a prediction that came true almost the next day. Not quite what I had in mind, but I like myself right now.
Paper here (note that I wouldn't call it fit to publish just yet; I am making an exception, since this was turned in on 12/12 against a deadline):
Magnus Gordon
Com 546 – Final Paper
Due 12/12/06
My goal is to determine the direction that search system technology should take in the future. Search systems are widely used on the internet and in other places, such as libraries. Because this technology is becoming so vital, it makes sense to research it and improve upon it. That is not to say that search systems are insufficient, but that they can be improved greatly. Currently, for a webpage to show up in a search, it must be optimized properly, and this optimization is a very laborious process. Search systems require it in order to minimize incorrect results, spam web pages (web pages that mean nothing and just try to pull in visitors), and abuse of the search engine system. One of the goals of creating a better search system will be simplifying the submission process. While it may be appropriate for web development businesses to have to optimize a web page correctly, the same reasons do not apply to personal web pages or their creators. Typically, a web development business is going to try to sell its services to other businesses, and if a company is creating many web pages per day, this may not be the type of content users would like easily categorized. If someone is making a webpage for a restaurant or an informative blog, on the other hand, then users may prefer to be able to access this information easily.
One of the goals of a search engine is to be as relevant as possible. Unfortunately, this goal is secondary to the consumer's need for semi-relevant information delivered immediately; relevance seems to have taken a back seat to speed and simplicity. Though many search engines offer query refinement on the search results page, my personal experience and the results of a small, unscientific survey suggest that people are very unlikely to use these refinements. The previous paper alludes to this as well. Academics, on the other hand, want higher-quality results with less search time. While learning how to use Boolean operators properly can serve an academic well, the problem remains that not every system supports them. The systems that do can be navigated with greater accuracy, but the problem of time still exists. All these quandaries with the current search system setup lead me to ask: what criteria are needed to develop a more efficient and effective search engine?
The intriguing aspect of this area of search is how much psychology and language analysis is involved in building a good search engine. Imagine searching for the highest score a football team posted in a game it lost. The query is going to be entered as a question, which many engines are striving to support at the moment, and the answer is going to be a score: What’s the Seahawks’ highest losing score? The language analysis required to interpret this question is intense and complicated. As of 12/12/06, this question returns the Seahawks’ most recent win as a top answer on Google (one of the web’s largest search engines). Now imagine you were on a web page where you could pick a category, for example ‘Losses’, and then type the keywords Seahawks high scores. Would this prove more effective, or would it return the same result? This essay will discuss theories behind the way today’s search systems function and process requests. Then it will discuss the direction search systems are heading in the near future.
In the search engine business, users are money. The more users you have, the more money you receive from advertisers. The industry operates on a revenue model in which the beneficiaries of the services pay little to no money at all. Because of this, it is in a search system’s best interest to continuously improve and refine itself. Google is known for keeping people guessing about which component it will incorporate into search engine optimization next. While many details of how a system like Google refines its technology are unknown, I will examine several functions that are not yet wholly incorporated into any major search engine and determine whether they are worthy of further study.
The feature study begins with query reformulation. To define: a ‘query’ is the object, word, or phrase entered into a search submission field for the engine to find and match. Query reformulation is the process whereby the original query is altered so that results match it more accurately. Query reformulation initially (as discussed in the first paper) was a type of user reformulation that, because of the time investment it required, never won the popularity contest, even though this type of user reformulation was the most accurate form of search. Currently, query reformulation refers to a process that search systems carry out without the user being aware of the alterations to the initial search. Some of this reformulation literally rewrites a sentence, such as our previous Seahawks question, into something more search-engine friendly. Ex: it turns the question into a ‘phrase including’ the following: Seahawks high lose score (Cafarella). You can see now why the results for this search might have almost nothing to do with the question I actually asked. Another form of query reformulation is ‘query context’ (Jansen). Query context is reformulation in which the engine adds words to contextualize the results and improve the accuracy of the search. Ex: if I enter the word ‘bark’, the search system may insert ‘tree’ or ‘dog’ for added context and relevance. This contextual search is often based on a user profile, web search history, and popularity (Jansen). Because the search system performs contextual reformulation so quickly, it doesn’t degrade performance enough to turn away users who value speed. The issue is that reformulations can sometimes be very misleading or send a less informed user in the wrong direction.
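Both kinds of hidden reformulation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Google or any engine cited here actually does it: the word lists, the toy normalization table (standing in for a real stemmer), the `SENSES` table, and both function names are hypothetical.

```python
import re

# Hypothetical word lists for illustration only; real engines use much larger ones.
QUESTION_WORDS = {"what", "whats", "who", "when", "where", "why", "how"}
STOP_WORDS = {"the", "a", "an", "is", "of", "in", "to"}
# Toy normalization table standing in for a real stemmer or lemmatizer.
NORMALIZE = {"highest": "high", "losing": "lose", "lost": "lose"}
# Hypothetical sense table: ambiguous term -> possible context words.
SENSES = {"bark": ["tree", "dog"]}


def reformulate_question(question: str) -> list[str]:
    """Rewrite a natural-language question as bare keywords."""
    # Drop apostrophes (straight or curly) so "What's" becomes "whats".
    cleaned = question.lower().replace("'", "").replace("\u2019", "")
    tokens = re.findall(r"[a-z]+", cleaned)
    keywords = [t for t in tokens if t not in QUESTION_WORDS and t not in STOP_WORDS]
    return [NORMALIZE.get(t, t) for t in keywords]


def add_query_context(query: list[str], history: list[str]) -> list[str]:
    """Append a context word for each ambiguous term, chosen from past searches."""
    expanded = list(query)
    for term in query:
        candidates = SENSES.get(term)
        if candidates:
            # Pick the sense mentioned most often in the user's history;
            # on a tie (e.g. no history) max() keeps the first candidate.
            best = max(candidates, key=lambda c: sum(c in past.split() for past in history))
            expanded.append(best)
    return expanded


print(reformulate_question("What’s the Seahawks highest losing score?"))
# → ['seahawks', 'high', 'lose', 'score']
print(add_query_context(["bark"], ["dog training tips", "dog park seattle"]))
# → ['bark', 'dog']
```

The first call shows how a full question collapses into the keyword string the paragraph quotes; nothing in the result says "lowest winning" versus "highest losing", which is exactly why the results can drift from the question. The tie-breaking rule in the second function also shows how a user with no history silently gets whichever sense happens to be listed first.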
The best improvement to reformulation may not be more automatic reformulation at all. Currently, several branches of search are developing across many different search engines. It is strange that each engine makes a user go to a different website to use each form of search. Clicking buttons and wasting time switching platforms every time you want a specific type of search may turn many users away. There also appears to be a strong relationship between how clear a query is and how ambiguous its results are (Cronen-Townsend). This suggests that there should be a way to rapidly categorize and clarify search results without sacrificing speed or usability. Creating a solution to this problem is not hard. If there were a drop-down list containing the ten easiest ways to sort results, it isn’t hard to imagine queries becoming a great deal more accurate. Thinking about what the categories might look like, it is easy to envision ‘Business’, ‘Animal’, ‘Academic’, and ‘Government’ categories. For example, imagine how a search for ‘Medical Care’ would differ under each of those four categories. If a user relies on the computer to supply the context of ‘Medical Care’ while searching for the best way to take care of the family pet, chances are it will take a very long time to get the computer to insert that context. A simple method of classification at the start of a search is a definite candidate for consideration.
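To make the drop-down idea concrete, here is a minimal Python sketch, assuming every indexed page already carries one category label. The four category names come from the paragraph above; the `Page` type, the tiny index, and `search()` are hypothetical. With the drop-down left at its default ("All"), the search behaves exactly as it would without the feature.

```python
from dataclasses import dataclass

# Category names from the paper, plus the default setting; the index is made up.
CATEGORIES = ("All", "Business", "Animal", "Academic", "Government")


@dataclass
class Page:
    title: str
    text: str
    category: str  # assumed to be assigned when the page is indexed


INDEX = [
    Page("Pet clinic guide", "medical care for dogs and cats", "Animal"),
    Page("Employer health plans", "medical care benefits for employees", "Business"),
    Page("Medicare overview", "government medical care programs", "Government"),
    Page("Clinical outcomes study", "research on medical care quality", "Academic"),
]


def search(query: str, category: str = "All") -> list[str]:
    """Return titles of pages containing every query word, filtered by category."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    terms = query.lower().split()
    hits = []
    for page in INDEX:
        # "All" is the untouched default: no category filtering happens.
        if category != "All" and page.category != category:
            continue
        if all(term in page.text.split() for term in terms):
            hits.append(page.title)
    return hits


print(search("Medical Care"))            # every page matches
print(search("Medical Care", "Animal"))  # → ['Pet clinic guide']
```

One click replaces what would otherwise be several rounds of query rewording; the open question the paper raises is who assigns the category labels in the first place.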
There are evaluative search systems currently available, some even capable of searching 3D models (Min). Why, then, do search engines ignore the basics in a question about sports scores? The answer is that there are many different types of search systems, and some are better designed for comparing different functions than others. Not all of them support comparative searches like the graphical models mentioned, and some search engines, like these graphical ones, are extremely user-intensive. Some engines, however, take phrases from the query and convert them into standard Boolean operators for the user. Knowing that a search engine can perform this type of analysis but typically does not search on the multiple endings of a word for the user is frustrating. The truncated query law* should return law, laws, lawyers, lawless, lawful, and more (Albano). Search engines should automatically return these variations without the user entering the operator. Some search engines do, but many do not. If a search engine requires training, it may not be as attractive to use as something as simple as Google or Yahoo! Similar to this is the ‘Binding Engine’ search that is currently in development. The binding engine allows variables, not just Boolean operators, to be entered into the query field. Combining the binding engine with query refinement opens up many possibilities, and the binding engine on its own could be a powerful tool. For example, it allows a search for the words that can fill a variable in the query, not just for sentences that contain the query words. Ex: a search for ‘slow noun’ returns sloth, slugs, molasses, and time elapsed photography (Cafarella). If you could contextualize this search with the click of a button, you would be getting close to a powerful tool.
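The automatic expansion argued for above can be sketched as prefix matching against the engine's vocabulary. This is a minimal Python illustration, assuming a tiny hypothetical vocabulary and hypothetical function names; it is not the behavior of any particular search engine or library catalog.

```python
# Hypothetical vocabulary; a real engine would expand against every term in its index.
VOCABULARY = ["law", "laws", "lawyer", "lawyers", "lawless", "lawful", "lawn", "flaw", "legal"]


def expand_query(query: str, vocabulary: list[str], automatic: bool = False) -> list[list[str]]:
    """Expand each query term into the group of vocabulary words it should match.

    A term ending in '*' is always expanded by prefix. With automatic=True,
    plain terms are expanded the same way, so the user never types the operator.
    """
    groups = []
    for term in query.lower().split():
        if term.endswith("*"):
            prefix = term[:-1]
        elif automatic:
            prefix = term
        else:
            groups.append([term])
            continue
        matches = [word for word in vocabulary if word.startswith(prefix)]
        groups.append(matches or [prefix])
    return groups


def to_boolean(groups: list[list[str]]) -> str:
    """Render the groups as a Boolean query: OR within a group, AND between groups."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


print(to_boolean(expand_query("law", VOCABULARY, automatic=True)))
# → (law OR laws OR lawyer OR lawyers OR lawless OR lawful OR lawn)
```

Note that "lawn" slips into the result: raw prefix matching overreaches, which suggests an engine doing this automatically would need real stemming rules rather than bare prefixes.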
Even without categorization, the binding engine would still be powerful, because it would let a user search for Seahawks score loss and get back all the scores from Seahawks losses as results.
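The Binding Engine described by Cafarella and Etzioni runs over a large web index; the sketch below only illustrates the idea of a variable in the query, assuming a small hand-tagged corpus. The sentences, the part-of-speech tags, and the `bind()` helper are all hypothetical. A query token written as `<noun>` matches any word carrying that tag, and the search returns the words that fill the slot rather than the pages that contain them.

```python
# Toy corpus, tagged by hand (hypothetical). A real system would run a
# part-of-speech tagger over its whole index instead.
CORPUS = [
    [("the", "DET"), ("slow", "ADJ"), ("sloth", "NOUN"), ("climbed", "VERB")],
    [("slow", "ADJ"), ("slugs", "NOUN"), ("crossed", "VERB"), ("the", "DET"), ("path", "NOUN")],
    [("slow", "ADJ"), ("molasses", "NOUN"), ("poured", "VERB")],
    [("he", "PRON"), ("walked", "VERB"), ("slow", "ADV")],
]


def bind(pattern: str, corpus: list[list[tuple[str, str]]]) -> list[str]:
    """Return the distinct words that fill the single <tag> variable in the pattern."""
    tokens = pattern.lower().split()
    bindings: list[str] = []
    for sentence in corpus:
        # Slide a window the length of the pattern across each sentence.
        for start in range(len(sentence) - len(tokens) + 1):
            window = sentence[start:start + len(tokens)]
            bound = None
            for pat, (word, tag) in zip(tokens, window):
                if pat.startswith("<") and pat.endswith(">"):
                    if tag != pat[1:-1].upper():
                        break  # variable slot, but wrong part of speech
                    bound = word
                elif pat != word:
                    break  # literal word does not match
            else:
                # The loop finished without a break: the whole window matched.
                if bound is not None and bound not in bindings:
                    bindings.append(bound)
    return bindings


print(bind("slow <noun>", CORPUS))  # → ['sloth', 'slugs', 'molasses']
```

The same mechanism, given a number slot and a corpus of game reports, is what would let a query about Seahawks losses return the scores themselves; that extension is not shown here.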
Some search engines are better able to adopt complex yet instantaneous search improvements. Yahoo!, for example, supports operators like * (Muramatsu). Other search engines remove what are called stop words, which would cause searches like ‘to be or not to be’ to return absolutely no results (Muramatsu). The problem is determining when stop-word removal should apply and who makes that decision. One idea would be to set a threshold on the number of stop words, so that no removal happens if a query contains more than that number. Consider the searches ‘to be or not to be’ and ‘Seahawks win in overtime’. There is only one stop word in the Seahawks query, so it can be removed for the search; however, ‘to be or not to be’ exceeds the limit of three stop words, so none would be removed for fear of damaging the query. This system may conflict with some Boolean queries, such as ‘creek and not mill’, yet as I explained previously, most users do not want to think about complex query entries and are unlikely to use the operators that might be affected, so this is considered a precaution rather than an issue. The last factor in query entry that needs consideration is term order weight (Muramatsu). Because of the substantial difference between ‘Seahawks beat’ and ‘beat Seahawks’, this is something no search engine should do without; yet, surprisingly, at the time of Muramatsu’s article it appeared prominently only in Lycos and Google. An educated user can currently navigate various search engines quite effectively, depending on what they are looking for. My argument is that because search engines profit from their users, they should keep users happy by building an educated search engine that requires little to no knowledge on the user’s end.
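The stop-word threshold and term order weighting proposed above can be sketched as follows. This is a minimal Python illustration under assumptions: the stop-word list, the scoring rule (one point per matched term plus a bonus when the terms appear in query order), and both function names are hypothetical; only the limit of three comes from the paragraph.

```python
# Hypothetical stop-word list; the limit of three matches the example above.
STOP_WORDS = {"a", "an", "the", "and", "or", "not", "to", "be", "in", "of"}
STOP_WORD_LIMIT = 3


def remove_stop_words(query: str, limit: int = STOP_WORD_LIMIT) -> list[str]:
    """Drop stop words unless the query contains more than `limit` of them."""
    terms = query.lower().split()
    stop_count = sum(term in STOP_WORDS for term in terms)
    if stop_count > limit:
        return terms  # probably a phrase such as a quotation: keep every word
    return [term for term in terms if term not in STOP_WORDS]


def order_score(query_terms: list[str], document: str) -> int:
    """Score one point per matched term, plus a bonus if the terms appear in query order."""
    words = document.lower().split()
    n = len(query_terms)
    matched = sum(term in words for term in query_terms)
    in_order = any(words[i:i + n] == query_terms for i in range(len(words) - n + 1))
    return matched + (n if in_order else 0)


print(remove_stop_words("Seahawks win in overtime"))  # → ['seahawks', 'win', 'overtime']
print(remove_stop_words("to be or not to be"))        # → ['to', 'be', 'or', 'not', 'to', 'be']
print(remove_stop_words("creek and not mill"))        # → ['creek', 'mill']
print(order_score(["seahawks", "beat"], "Seahawks beat the Rams"))  # → 4
print(order_score(["seahawks", "beat"], "The Rams beat Seahawks"))  # → 2
```

The third call shows the conflict the paragraph anticipates: ‘and not’ are stripped, so an exclusion silently becomes a plain two-word search. The last two calls show term order weighting ranking ‘Seahawks beat’ above ‘beat Seahawks’ even though both documents contain the same words.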
Because this paper is all about the uneducated user, I had to get opinions from some close friends who are not web savvy. These are regular people whose highest level of education is currently a high school diploma. There are ten friends in total: four female and six male. One male and one female are students at the University of Washington. Two females are students at Seattle Central Community College, and the remaining female is a student at North Seattle Community College, which four of the remaining males attend. The final male is my roommate and is not attending college.
The survey had a few very deliberate questions to determine whether a search format could be perceived as user friendly; the rest of it was fairly open ended.
Below are the questions in numbered form:
1. What do you think of current search engines? Is there room for improvement?
2. In what way do you think search engines could be improved?
3. How would you respond to a drop-down category classification box that is normally left at its default setting? Is this a good idea?
4. If there were a circle checked by default labeled ‘Perform Smart Search Operations’ and you had no idea what it did, would this worry you?
5. If there were an option for ‘Advanced Variable Replacement’ with a ‘what is this?’ link beneath it, would you be inspired to learn more or just leave it unchecked?
6. If there were a community-monitored search engine submission method that made it easier for good information to get in and harder for spam pages to turn up, would you participate?
That was generally the extent of the survey, which served simply as a forum for ideas with people who may not be familiar with ‘Binding Engines’, stop-word searches, or Boolean operators. The results of the survey were surprisingly favorable, particularly on the community moderation question: 7 out of 10 respondents reported that they would support this method of submission. Even after hearing that a company could simply submit a page and check a few boxes to have it listed in the proper area, everyone who initially supported the idea continued to support it. The idea behind question six is that a company should not have to spend hours optimizing its page for search engines. When a page is created, its content and quality should guide it to its rightful place in any search system. The user-moderated forum is a takeaway from Wikipedia. Thomas Friedman mentions in “The World Is Flat” that user-moderated communities and user-generated content are among the great equalizers. Yes, there can be abuses, but in a community such as a search engine moderation and submission community, you could require real information from people and legitimize every submission by requiring a login from everyone who wants to submit or alter any web page classification entry. This, combined with more advanced search functions such as a drop-down classification box left at its default and two other buttons that let those not interested in a higher level of customization ignore everything and search as usual, could be a revolutionary way to get more relevant results with less effort. Currently, many systems could make the switch very easily. Yahoo! and Google have the infrastructure, and Microsoft Live is creeping up right behind them. In the future I wouldn’t be surprised if many companies move toward setting up this infrastructure. Google, with Gmail and its loyal subscribers, and Yahoo!, with Yahoo! Groups and My Yahoo!, have a great advantage in beginning to work on this now.
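The login-gated, community-moderated submission idea can be sketched as a small data model. This is a minimal Python illustration, assuming approval by net vote count; the threshold of three, the rule that submitters cannot vote on their own entry, and the `Submission`/`submit` names are hypothetical design choices, not part of the survey or of any existing service.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical rule: a classification is listed once its net vote total reaches this value.
APPROVAL_THRESHOLD = 3


@dataclass
class Submission:
    url: str
    category: str
    submitted_by: str
    votes: dict[str, int] = field(default_factory=dict)  # username -> +1 or -1

    def vote(self, user: Optional[str], value: int) -> None:
        """Record one logged-in user's vote; a later vote replaces an earlier one."""
        if user is None:
            raise PermissionError("login required to vote")
        if user == self.submitted_by:
            # Hypothetical anti-abuse rule: no voting on your own entry.
            raise PermissionError("submitters cannot vote on their own entry")
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        self.votes[user] = value

    @property
    def listed(self) -> bool:
        """True once the community has approved the classification."""
        return sum(self.votes.values()) >= APPROVAL_THRESHOLD


def submit(user: Optional[str], url: str, category: str) -> Submission:
    """Create a classification entry; anonymous submissions are refused."""
    if user is None:
        raise PermissionError("login required to submit")
    return Submission(url=url, category=category, submitted_by=user)


entry = submit("alice", "http://example.com/menu", "Business")
for voter in ("bob", "carol", "dave"):
    entry.vote(voter, 1)
print(entry.listed)  # → True
```

Keying votes by username means each account counts once, so a spammer would need many real logins to push an entry through, which is the legitimizing effect the paragraph attributes to required logins.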
I don’t see these results as being far off. This is no grand ten-year prediction but more of a three-year one. You can count on the big players in the industry right now to make advancements toward this. I would recommend a full-scale usability survey of the system I have described. It is worth determining whether the optional drop-down category system, the optional ‘Smart Search’ button, and the ‘Advanced Variable Search’ or ‘Binding Engine’ could be implemented in a clean, user-friendly fashion.
Bibliography
Albano, Jessica. 2006. Com 546 – Guest Lecture, Week 2. Communications Librarian, Autumn 2006. University of Washington.
Cafarella, M. J. and Etzioni, O. 2005. A search engine for natural language applications. In Proceedings of the 14th International Conference on World Wide Web (Chiba, Japan, May 10-14, 2005). WWW '05. ACM Press, New York, NY, 442-452. DOI= http://doi.acm.org/10.1145/1060745.1060811
Eastman, C. M. and Jansen, B. J. 2003. Coverage, relevance, and ranking: The impact of query operators on Web search engine results. ACM Trans. Inf. Syst. 21, 4 (Oct. 2003), 383-411. DOI= http://doi.acm.org/10.1145/944012.944015
Fagni, T., Perego, R., Silvestri, F., and Orlando, S. 2006. Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst. 24, 1 (Jan. 2006), 51-78. DOI= http://doi.acm.org/10.1145/1125857.1125859
Friedman, Thomas L. 2005. The World Is Flat: A Brief History of the Twenty-First Century. Updated and expanded edition, 2005-2006. Farrar, Straus and Giroux, New York, NY.
Jansen, B. J., Mullen, T., Spink, A., and Pedersen, J. 2006. Automated gathering of Web information: An in-depth examination of agents interacting with search engines. ACM Trans. Inter. Tech. 6, 4 (Nov. 2006), 442-464. DOI= http://doi.acm.org/10.1145/1183463.1183468
Mandl, T. 2006. Implementation and evaluation of a quality-based search engine. In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia (Odense, Denmark, August 22-25, 2006). HYPERTEXT '06. ACM Press, New York, NY, 73-84. DOI= http://doi.acm.org/10.1145/1149941.1149957
Min, P., Halderman, J. A., Kazhdan, M., and Funkhouser, T. A. 2003. Early experiences with a 3D model search engine. In Proceedings of the Eighth International Conference on 3D Web Technology (Saint Malo, France, March 9-12, 2003). Web3D '03. ACM Press, New York, NY, 7-ff. DOI= http://doi.acm.org/10.1145/636593.636595
Muramatsu, J. and Pratt, W. 2001. Transparent Queries: investigation users' mental models of search engines. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM Press, New York, NY, 217-224. DOI= http://doi.acm.org/10.1145/383952.383991
Ntoulas, A., Cho, J., and Olston, C. 2004. What's new on the web?: the evolution of the web from a search engine perspective. In Proceedings of the 13th International Conference on World Wide Web (New York, NY, USA, May 17-20, 2004). WWW '04. ACM Press, New York, NY, 1-12. DOI= http://doi.acm.org/10.1145/988672.988674
Now note the new engine Decipho:
http://www.searchenginejournal.com/?p=4122
Unfortunately, because of Decipho's name, I think this launch will not be as successful as it could be (I originally wrote "be a total failure"), but you will never hear me say it was a bad idea; after all, it was mine!
To be competitive in the search market, a name has to do what Google's name does: people currently say "Google it" when they mean search. You need a name that can compete. My suggestion would be for Yahoo! to grab the domain Kwizz.com from the person who is redirecting it to a video game parking site and implement my system to get people saying "Kwizz it".
