Wednesday, December 13, 2006

No sooner do I finish my instructions than...

I just finished a comprehensive research paper... and, well, made a prediction that came true almost the next day. Not quite what I had in mind, but I like myself right now.

Paper here (note that I wouldn't call it fit to publish just yet; I am making an exception, since this was turned in 12/12 against a deadline):


Magnus Gordon
Com 546 – Final Paper
Due 12/12/06

My goal is to determine the direction that search system technology should take in the future. Search systems are widely used on the internet and in other places such as libraries. Because this technology is becoming so vital, it makes sense to research and improve it. That is not to say that search systems are insufficient, but that they can be improved greatly. Currently, for a webpage to show up in a search engine's results, it must be properly optimized, and this optimization is a very laborious process. Search systems require it in order to minimize incorrect results, spam web pages (web pages that mean nothing and just try to pull in visitors), and abuse of the search engine system. One of the goals of creating a better search system will be simplifying the process of submission. While it may be appropriate for web development businesses to have to optimize a web page correctly, the same reasoning does not apply to personal web pages or their creators. Typically a web development business is trying to sell its services to another business, and if a company is creating many web pages per day, this may not be the type of content users would like easily categorized. If someone is making a webpage for a restaurant or an informative blog, on the other hand, then users may prefer that they can easily access this information.
One of the goals of a search engine is to be as relevant as possible. Unfortunately, that goal is secondary to the consumer's need for semi-relevant information delivered immediately. Relevance seems to have taken a back seat to speed and simplicity. Though many search engines offer query refinement on the results page, personal experience and the results of a small, unscientific survey suggest that people are very unlikely to use these refinements. This is also the suggestion that the previous paper alludes to. Academics, on the other hand, want higher quality results with less search time. While learning how to properly manipulate Boolean operators can serve an academic well, the problem remains that not every system supports these functions. The systems that do can be navigated with greater accuracy, but the problem of time still exists. All these quandaries with the current search system setup lead me to ask: what are the criteria needed to develop a more efficient and effective search engine?
The intriguing aspect of this area of search is how much psychology and language analysis is involved in formulating a good search engine. Imagine a search for high football scores in a lost game. The query entered is going to be in the form of a question, which many engines are striving to support at the moment. The answer is going to be in the form of a score. What’s the Seahawks’ highest losing score? The language analysis required to interpret this question is intense and complicated. As of 12/12/06, this question returns the most recent win by the Seahawks as the top answer on Google (one of the web’s largest search engines). Now imagine if you were on a web page where you could pick a category, say ‘Losses’, and then type the keywords Seahawks high scores. Would this prove more effective? Or would it return the same result? This essay will discuss the theories behind the way today’s search systems function and process requests. Then it will discuss what direction search systems are heading in the near future.
In the search engine business, users are money. The more users you have, the more money you receive from advertisers. The industry operates on a revenue model in which the beneficiaries of the service pay little or no money at all. Because of this, it is in a search system’s best interest to continuously improve and refine itself. Google is known for its ability to keep people guessing about which component of search engine optimization it will incorporate next. While many details of how a system like Google refines its technology are unknown, I will examine several functions that are not yet wholly incorporated into any major search engine and determine whether they are worthy of further study.
The feature study begins with query reformulation. To define: a ‘query’ is the object, word, or phrase that is entered into a search submission field for the engine to find and match similarities to. Query reformulation is the process whereby the original query is altered to match search results with more accuracy. Initially (as discussed in the first paper), query reformulation was something the user did by hand, and because of the time investment it required, it never won the popularity contest. This type of user reformulation was the most accurate form of search. Currently, query reformulation refers to the process that search systems go through without the user being aware of the alterations to their initial search. Some of this reformulation literally rewrites a sentence, such as our previous Seahawks question, into something more search engine friendly. Ex: the engine turns it into a ‘phrase including’ the following: Seahawks high lose score (Cafarella). You can see now why the results for this search might have almost nothing to do with the question I asked in the first place. Another form of query reformulation is ‘query context’ (Jansen). Query context is reformulation where the engine adds words to contextualize the query and improve the accuracy of the search. Ex: if I enter the word ‘bark’, the search system may insert ‘tree’ or ‘dog’ for added context and search relevance. This contextual search can often be based on a user profile or on web search histories and popularity (Jansen). Contextual reformulation is fast enough that it doesn’t degrade performance to the point where those who value speed turn away from the system. The issue here is that sometimes the reformulations can be very misleading or send a less informed user in the wrong direction.
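As a rough illustration of the ‘query context’ idea, here is a minimal Python sketch. The lookup table, the interest labels (‘pets’, ‘gardening’), and the function name are all assumptions invented for this example; a real engine would presumably draw its context from profiles and search histories rather than a hand-written table.

```python
# Hypothetical lookup table: ambiguous term -> context word per user interest.
# The interests ("pets", "gardening") are invented for this sketch.
CONTEXT_TERMS = {
    "bark": {"pets": "dog", "gardening": "tree"},
}


def add_query_context(query, user_interest):
    """Append a context word for each ambiguous term the profile can resolve."""
    terms = query.lower().split()
    added = []
    for term in terms:
        context = CONTEXT_TERMS.get(term, {}).get(user_interest)
        # Only add a context word once, and only if the user did not type it.
        if context is not None and context not in terms and context not in added:
            added.append(context)
    return " ".join(terms + added)


print(add_query_context("bark", "pets"))
print(add_query_context("bark", "gardening"))
```

A user whose profile gives no hint (or a query with no ambiguous term) comes back unchanged, which is also where the risk described above shows up: a wrong profile guess silently changes what gets searched.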
The answer to reformulation’s problems shouldn’t be more reformulation. Currently there are several branches of search developing across many different search engines. It is strange that each engine compels a user to go to a different website to use a different form of search. Clicking buttons and wasting time switching platforms every time you want a specific type of search may turn many users away. Unfortunately, there may be a strong correlation between how clear a query is and how ambiguous the results are (Cronen-Townsend). What this hints at is that there should be a way to rapidly categorize and clarify search results without sacrificing speed or usability. Creating a solution to this problem is not hard. If there were a drop down category box holding the ten most useful ways to sort results, then it isn’t hard to imagine queries getting a great deal more accurate. Thinking about what the categories might look like, it is easy to envision ‘Business’, ‘Animal’, ‘Academic’, and ‘Government’. For example, imagine the difference in a search for ‘Medical Care’ in the context of each of those four categories. If a user leaves it to the computer to supply the context of ‘Medical Care’ while searching for the best way to take care of their family pet, then chances are that they will spend a very long time trying to get the computer to insert that context. A simple method of classification at the start of a search is a definite candidate for consideration.
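To make the drop down idea concrete, here is a small Python sketch of a category filter applied to a ‘Medical Care’ search. Everything in it is hypothetical: the sample pages, the `Page` type, and the `search` function are made up for illustration, and matching whole words in a title stands in for whatever a real engine would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    category: str


# Invented sample pages, one per category mentioned in the paper.
PAGES = [
    Page("Medical care plans for small employers", "Business"),
    Page("Medical care for cats and dogs", "Animal"),
    Page("Research on rural medical care outcomes", "Academic"),
    Page("Public medical care programs", "Government"),
]


def search(query, category=None, pages=PAGES):
    """Return titles containing every query word, optionally within one category."""
    terms = query.lower().split()
    hits = [p for p in pages if all(t in p.title.lower().split() for t in terms)]
    if category is not None:
        # The drop down choice narrows the results; None means "search as usual".
        hits = [p for p in hits if p.category == category]
    return [p.title for p in hits]


print(search("Medical Care"))            # all four sample pages match
print(search("Medical Care", "Animal"))  # only the pet care page
```

Leaving the category at its default (`None` here) behaves like an ordinary search, so users who ignore the box lose nothing.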
There are evaluative search systems currently available, some even capable of mapping 3D images (Min). Why, then, do search engines ignore the basics in a question about sports scores? The answer is that there are many different types of search systems, and some are better designed for certain kinds of comparison than others. Not all of them support comparative searches the way the graphical models mentioned do. Some search engines, like these graphical ones, are extremely user intensive. There are some engines, however, that take phrases from the query and convert them into standard Boolean operators for the user. Knowing that a search engine can perform this type of analysis but does not typically search on the multiple endings of a word for the user is frustrating. The truncated query entry law* should return law, laws, lawyers, lawless, lawful, and more (Albano). Search engines should automatically return these similar forms without the user entering the * operator. Some search engines do, but many do not. If a search engine requires training, it may not be as attractive to use as something as simple as Google or Yahoo! Similar to this is the ‘Binding Engine’ search that is currently in development. The binding engine allows variables, not just Boolean operators, to be entered into the query field. With binding engine search and query refinement together there are many possibilities. The binding engine on its own could be a powerful tool. A binding engine search looks for whatever fills in the variable itself, not just for sentences with the query words in them. Ex: a search for ‘Slow noun’ returns sloth, slugs, molasses, and time-lapse photography (Cafarella). If you could contextualize this search with the click of a button, you would be getting toward a powerful tool.
Even without categorizing, the binding engine search would still be powerful because it allows a user to search for Seahawks score loss, which would return all the scores from Seahawks losses as results for the query.
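The law* example from earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary list is invented, the function name is hypothetical, and a plain prefix match stands in for whatever word-ending analysis an engine might actually perform.

```python
# Invented vocabulary standing in for the words an engine has indexed.
VOCABULARY = ["law", "laws", "lawyers", "lawless", "lawful", "legal", "score"]


def expand_truncation(term, vocabulary=VOCABULARY):
    """Expand a trailing * into every indexed word sharing the prefix."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return sorted(w for w in vocabulary if w.startswith(prefix))
    # Without the operator, only the exact word is matched.
    return [term] if term in vocabulary else []


print(expand_truncation("law*"))
print(expand_truncation("law"))
```

An engine that applied this expansion automatically would simply treat every query word as if the * had been typed, which is the behavior argued for above.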
Some search engines are more apt to adapt to complex yet instantaneous search improvement. Yahoo!, for example, uses operators like * (Muramatsu). Other search engines remove what are called stop words. Removing these stop words would cause searches like ‘to be or not to be’ to return absolutely no results (Muramatsu). The problem is determining when it is appropriate to use this setting and who makes that decision. One idea would be to set a limit on stop words: if a query contained more stop words than the limit, none would be removed. Take the searches ‘to be or not to be’ and ‘Seahawks win in overtime’. There is only one stop word in the Seahawks query, so it can be removed for the search; ‘to be or not to be’, however, breaks the limit of 3 stop words, so none would be removed for fear of damaging the query. This system may conflict with some Boolean operators, as in ‘creek and not mill’, yet as I explained previously, most users do not want to think about complex query entries, so they are unlikely to be using the terms that would be affected; it is therefore not an issue, just a precaution. The last factor in query entry that needs considering is term order weight (Muramatsu). Because of the substantial difference between ‘Seahawks beat’ and ‘beat Seahawks’, this is something that no search engine should do without, yet surprisingly, at the time of Muramatsu’s article it was prevalent only in Lycos and Google. An educated user can currently navigate the various search engines quite effectively depending on what they are looking for. My argument is that because it is the search engines that profit from the users, they should keep users happy by creating an educated search engine that requires little to no knowledge on the user’s end.
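The stop word limit described above is simple enough to sketch in Python. The stop word list here is a small invented sample and the function name is hypothetical; only the limit of 3 and the two example queries come from the paper.

```python
# Small invented sample of stop words; a real engine's list would be longer.
STOP_WORDS = frozenset({"a", "an", "and", "be", "in", "not", "of", "or", "the", "to"})
MAX_STOP_WORDS = 3  # the limit used in the paper's example


def filter_query(query, limit=MAX_STOP_WORDS):
    """Drop stop words unless the query has more of them than the limit."""
    terms = query.lower().split()
    stop_count = sum(1 for t in terms if t in STOP_WORDS)
    if stop_count > limit:
        # Too many stop words: they probably carry the meaning, so keep all.
        return terms
    return [t for t in terms if t not in STOP_WORDS]


print(filter_query("Seahawks win in overtime"))
print(filter_query("to be or not to be"))
print(filter_query("creek and not mill"))
```

The third query shows the precaution mentioned above: with only two stop words it falls under the limit, so the ‘and’ and ‘not’ are stripped and the Boolean meaning is lost.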

Because this paper was all about the uneducated user, I had to get opinions from some close friends who are not web savvy. These are regular people who currently have no higher education than a high school diploma. There are ten friends in total: four female, six male. One male and one female are students at the University of Washington. Two females are students at Seattle Central Community College; the other female is a student at North Seattle CC, where four of the remaining males attend. The final male is my roommate and is not attending college.

The survey had a few very deliberate questions to determine whether or not a format of search could be perceived as user friendly, and the rest of it was fairly open ended.

Below are the questions listed in numbered form:
1. What do you think of current search engines? Is there room for improvement?
2. In what way do you think search engines could be improved?
3. How would you respond to a drop down category classification box that is normally set at default? Is this a good idea?
4. If there was a circle checked by default labeled ‘Perform Smart Search Operations’ and you had no idea what it did, would this worry you?
5. If there was an option for ‘Advanced Variable Replacement’ that had a ‘what is this?’ link beneath it, would you be inspired to learn more or just leave it unchecked?
6. If there was a community monitored search engine submission method that made it easier for good information to get in and harder for spam pages to turn up, would you participate?

That was generally the extent of the survey; it was simply a way to sound out ideas with those who may not be familiar with ‘Binding Engines’, stop word searches, or Boolean operators. The results of the survey were surprisingly favorable. More specifically, I am referring to the community moderation question. 7 out of 10 surveyed reported that they would support this method of submission. Even after hearing that a company could simply submit a page and check a few boxes to have it listed in the proper area, there was still a large amount of support from those who supported the idea initially. The idea behind question six is that a company should not have to spend hours optimizing its page for search engines. When a page is created, its content and quality should guide it to its rightful place in whatever search system. The user moderated forum is borrowed from Wikipedia. Thomas Friedman mentions in “The World Is Flat” that user moderated communities and user generated content are among the great equalizers. Yes, there can be abuses, but in a search engine moderation and submission community you could require real information from people and legitimize every submission by requiring a login from everyone who wants to submit or alter a web page classification entry. This, combined with more advanced search functions, such as a drop down classification box set at default and two other buttons that those not interested in a higher level of customization can ignore while searching as usual, could be a revolutionary way to get more relevant results with less effort. Currently many systems could make the switch very easily. Yahoo! and Google have the infrastructure, and Microsoft Live is creeping up right behind them. In the future I wouldn’t be surprised if many companies move toward setting up this infrastructure. With Gmail’s and Google’s loyal subscribers, and with Yahoo! Groups and My Yahoo!, those two have a great advantage if they begin working on this now.
I don’t see these results as too far off. This is no great ten year prediction but more of a three year one. You can count on the big players in the industry to make advancements toward this. I would recommend a full-scale usability survey of the system that I described. It is worth determining whether the optional drop down category system, the optional ‘Smart Search’ button, and the ‘Advanced Variable Search’ or ‘Binding Engine’ could be implemented in a clean, user friendly fashion.

Bibliography

Albano, Jessica ‘Com 546 - Guest Lecture Week 2’ Communications Librarian, AUT 2006. University of Washington.

Cafarella, M. J. and Etzioni, O. 2005. A search engine for natural language applications. In Proceedings of the 14th international Conference on World Wide Web (Chiba, Japan, May 10 - 14, 2005). WWW '05. ACM Press, New York, NY, 442-452. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/1060745.1060811

Eastman, C. M. and Jansen, B. J. 2003. Coverage, relevance, and ranking: The impact of query operators on Web search engine results. ACM Trans. Inf. Syst. 21, 4 (Oct. 2003), 383-411. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/944012.944015

Fagni, T., Perego, R., Silvestri, F., and Orlando, S. 2006. Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data. ACM Trans. Inf. Syst. 24, 1 (Jan. 2006), 51-78. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/1125857.1125859

Friedman, Thomas L. “The World Is Flat: A Brief History of the Twenty-First Century” Updated and Expanded 2005-2006 New York, NY. Farrar Straus and Giroux.

Jansen, B. J., Mullen, T., Spink, A., and Pedersen, J. 2006. Automated gathering of Web information: An in-depth examination of agents interacting with search engines. ACM Trans. Inter. Tech. 6, 4 (Nov. 2006), 442-464. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/1183463.1183468

Mandl, T. 2006. Implementation and evaluation of a quality-based search engine. In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia (Odense, Denmark, August 22 - 25, 2006). HYPERTEXT '06. ACM Press, New York, NY, 73-84. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/1149941.1149957

Min, P., Halderman, J. A., Kazhdan, M., and Funkhouser, T. A. 2003. Early experiences with a 3D model search engine. In Proceeding of the Eighth international Conference on 3D Web Technology (Saint Malo, France, March 09 - 12, 2003). Web3D '03. ACM Press, New York, NY, 7-ff. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/636593.636595

Muramatsu, J. and Pratt, W. 2001. Transparent Queries: investigation users' mental models of search engines. In Proceedings of the 24th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM Press, New York, NY, 217-224. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/383952.383991

Ntoulas, A., Cho, J., and Olston, C. 2004. What's new on the web?: the evolution of the web from a search engine perspective. In Proceedings of the 13th international Conference on World Wide Web (New York, NY, USA, May 17 - 20, 2004). WWW '04. ACM Press, New York, NY, 1-12. DOI= http://doi.acm.org.offcampus.lib.washington.edu/10.1145/988672.988674



Now note the New Engine Decipho:

http://www.searchenginejournal.com/?p=4122


Unfortunately, because of Decipho's name, I think this launch will ("be a total failure"; edited to: "not be as successful as it could be"), but you will never hear me say it was a bad idea. After all, it was mine!

For a name to be competitive in the search market, it has to work the way Google's does: right now, when people search, they say "Google it". Well, you need a name that can compete. My suggestion would be for Yahoo! to grab the domain Kwizz.com from the person who is redirecting it to a video game parking site and implement my system to get people saying "Kwizz it".

Friday, December 08, 2006

Live push

Recently more and more things have been pushing me towards what I will refer to as 'Microsoft Live'. I have been getting increasingly depressed by the transition from IE 6 to IE 7. It seems like there has been a drop in usability with that switchover. Because of this and the negative Windows Vista reviews, I have been trying to stay away from new Microsoft computer products. I don't prefer Macs, but right now it seems as if Microsoft is trying to drive many users towards them.

Man, do I love the 360 though. The Xbox 360 has so far been the single most perfect gaming tool somebody could invent. With its spectacular graphics, I couldn't wait to see how this baby performed on the Xbox Live network. Let me just say that Gears of War + Xbox Live = no life ever again. I think my grades slipped a whole .5 when I hooked my Xbox up (sarcasm). But because of the success of the Xbox 360 Live system, I decided to try my hand at Windows Live Search and Windows Live Mail. I had to create a 'Windows Live ID' anyway for Xbox and decided at that point it would be fairly harmless to do so online. I was mostly right.

I already had a Hotmail account and I liked the way it functioned. On Hotmail I felt much like I could organize folders and direct files (emails) to save in certain places, just like on my actual computer. But due to my enjoyment of Hotmail and the Xbox 360, I decided, with the guarantee that I could switch back at any time during the beta, to try Windows Live out.

I like it. You can organize your email with the same functionality as before; the difference is that now it appears more like a 'feed reader' than an email inbox. There are many options and many looks. If you compare FeedDemon and Windows Live Mail, they may wind up seeming very similar. This push from Microsoft, I believe, is simply how they frequently operate. Sometimes it can be disappointing, sometimes it can be something I truly appreciate; it's hit and miss. I'm glad, however, that I figured out I would enjoy the new system before I was forced to use it. I moved during the gentle nudge instead of being dumped in with the "push". Another big incentive was the ability to have more email storage. Hotmail voluntarily upgraded to 1 GB; Windows Live offered 2 GB of free storage. I kind of felt like MSN was being so nice that I could at least return the favor a little bit and help them try their new product.

At this point I am not sure which I like more, Windows Live or Hotmail. They are both very functional, but I have a feeling that I will wind up appreciating Windows Live more due to the availability of many extra features, such as the formatting found in Microsoft Word and even emoticons.

Windows Live Search, on the other hand, is a giant mess of gradients and fails at the one thing that allowed Google to succeed over more relevant search methods. IT'S SLOW AND LAGGY. I actually almost hate it.

I seem like a terrible blogger.

Recently Seattle faced a little downtime due to snow and ice. If you live in Seattle you understand that a little bit of ice can shut the city down for hours. It's the type of thing that makes you want to curl up by a fireplace and watch reruns of Ally McBeal. Don't ask.

So during this time, I attempted to function normally. I had a lot of things to accomplish and I certainly wasn't going to let a little thing like a blanket of ice an inch thick all over town stop me from doing them.
This snowy situation, however, happened to one of the most wired towns I can think of. Most people I know in Seattle have cable internet at this point, and often cable everything else as well. One of the ways I had to use technology because of this heavy snowstorm was that I was unable to commute to work within a reasonable amount of time, so I worked from home. This was only the second time I had ever worked from home, but this time there was something fundamentally different.

The first time I was ill; this time I was not. I was trying to be as productive as possible on my PC at home. The first thing I found myself doing was organizing the system to give me as little trouble as possible adapting to a communication network I usually participate in from a Mac.

The key difference in the way I used technology is that during this period the IT department where I work set up an internet based group chat/IM web application with Campfire. This allowed our group of coworkers, who are already spread about the country, to function more seamlessly with each other and accomplish emergency goals more quickly. So anyway, because of all this innovation there was no difference between how much I was able to accomplish at home and what I was able to do at work. Actually, I was able to do more, because I did not have many hours of commuting and was able to start working earlier and stay working later.

I suppose in pre-internet eras I would have played console video games or watched reruns of holiday season television shows to make me feel better about not being able to work, and of course there would have to be a fireplace instead of a campfire.

Friday, December 01, 2006

Gilmore Guy.

When I read, in chapter 3 of Dan Gillmor's 'We the Media', about RFID chips and how they were being described, I realized that this technology could actually be a boon to consumers. Previously I had heard of radio frequency identification (RFID) chips, and I knew that Walmart was using them to track the products going in and out of its stores so no one ever had to do any inventory ordering. I also knew that they could be used to track repeat customers, since the chips never expire. But what I hadn't considered is how much of a boon this could actually be to consumers. We are able to use the same reading devices in much the same way as the company that provides the chipped items. Now imagine if, as Gillmor mentioned, you could find out where your tennis shoes were made, or whether there was important allergy information not included on a product. Imagine how useful it would be in identifying E. coli spinach growers if the people who had eaten the spinach had a chip somewhere in the food they had left, and imagine if this chip had information on exactly where that food came from.

I wonder if it is better to be protected from this type of invasion of privacy or if it is better to exploit it to the consumer's advantage?

Flattener 4

The reading that spoke to me the most was world flattener 4. Community generated content is a powerful tool that we can use for the betterment of society. Sharing information freely is key to the beginning of a better educated public. Groups and websites like Wikipedia allow people to moderate discussion on an academic level. No longer is the privilege of information restricted to a few people who have PhDs and can be published in academic journals. Wikipedia provides a forum where content can be group moderated and edited. Community content is a wave that I don't think is going to stop anytime soon. Yes, it has its flaws, such as anonymity, but these can be taken care of; in fact, as we discussed in class, many might not consider these as large a flaw now as they were in previous days.

Friedman mentions open source development of the Linux operating system as part of this miraculous development. The more people share, the greater the progress a group can make in an area, especially on group endeavors on the internet. This is a flattener that I cannot see a downside to. Information for all.