Web search query
A 2005 study of Yahoo's query logs revealed that 33% of the queries from the same user were repeat queries and that
87% of the time the user would click on the same result.[5] This suggests that many users use repeat queries to revisit
or re-find information. This analysis is supported by a Bing search engine blog post stating that about 30% of queries
are navigational queries.[6]
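To make the two percentages concrete, the following minimal sketch shows one way such figures could be computed from a click log. It is not the method used in the cited study: the log layout (user, query, clicked URL), the normalization of queries by case and whitespace, and the name `repeat_query_stats` are all assumptions made for illustration.

```python
from collections import defaultdict


def repeat_query_stats(log):
    """Return (share of repeat queries, share of repeats with a repeated click).

    `log` is an iterable of (user_id, query, clicked_url) tuples in time order.
    A query counts as a repeat if the same user issued it before; this
    definition is an assumption, not taken from the cited study.
    """
    seen = defaultdict(dict)  # user_id -> {normalized query: set of clicked URLs}
    total = repeats = repeat_same_click = 0
    for user_id, query, clicked_url in log:
        key = " ".join(query.lower().split())  # ignore case and extra spaces
        total += 1
        history = seen[user_id]
        if key in history:
            repeats += 1
            if clicked_url in history[key]:
                repeat_same_click += 1
        history.setdefault(key, set()).add(clicked_url)
    repeat_share = repeats / total if total else 0.0
    same_click_share = repeat_same_click / repeats if repeats else 0.0
    return repeat_share, same_click_share


# Hypothetical log entries, invented for this example.
sample_log = [
    ("u1", "weather boston", "weather.example/boston"),
    ("u1", "Weather  Boston", "weather.example/boston"),
    ("u2", "weather boston", "news.example/boston"),
    ("u1", "train times", "rail.example/times"),
    ("u1", "weather boston", "news.example/boston"),
    ("u2", "train times", "rail.example/times"),
]
repeat_share, same_click_share = repeat_query_stats(sample_log)
print(repeat_share, same_click_share)
```

In this toy log, two of the six queries are repeats by the same user, and one of those two repeats leads to a click on a previously clicked result.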
In addition, much research has shown that query term frequency distributions conform to a power law, or long-tail
distribution. That is, a small portion of the terms observed in a large query log (e.g. > 100 million queries) are
used most often, while the remaining terms are used less often individually.[7] This example of the Pareto principle
(or 80–20 rule) allows search engines to employ optimization techniques such as index or database partitioning,
caching and pre-fetching.
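The skew described above can be checked on a query log by measuring how much of the total term usage is covered by the most frequent terms. The sketch below is only an illustration under stated assumptions: terms are obtained by lowercasing and splitting on whitespace, the function name `head_coverage` and the sample queries are hypothetical, and a real log would be far larger.

```python
from collections import Counter


def head_coverage(queries, top_fraction=0.2):
    """Share of all term occurrences accounted for by the most frequent terms.

    `top_fraction` is the portion of distinct terms treated as the "head";
    0.2 mirrors the 80-20 rule of thumb and is an assumption.
    """
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    k = max(1, round(len(counts) * top_fraction))  # number of head terms
    head = sum(count for _, count in counts.most_common(k))
    return head / total


# Hypothetical queries, invented for this example.
sample_queries = [
    "facebook login", "facebook", "weather", "facebook login",
    "weather today", "facebook", "cheap flights", "facebook login",
    "weather", "facebook",
]
print(head_coverage(sample_queries))        # head = top 20% of distinct terms
print(head_coverage(sample_queries, 0.5))   # head = top 50% of distinct terms
```

A high coverage value for a small head is what makes caching attractive: keeping results for only the most frequent terms or queries in fast storage can serve a large share of the traffic.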
A 2011 study found that the average length of queries has grown steadily over time and that the average length of
non-English queries has increased more than that of English queries.[8]
Structured queries
With search engines that support Boolean operators and parentheses, a technique traditionally used by librarians can
be applied. A user who is looking for documents that cover several topics or facets may want to describe each of
them by a disjunction of characteristic words, such as vehicles OR cars OR automobiles. A faceted
query is a conjunction of such facets; e.g. a query such as (electronic OR computerized OR DRE) AND
(voting OR elections OR election OR balloting OR electoral) is likely to find documents
about electronic voting even if they omit one of the words "electronic" and "voting", or even both.[9]
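As a rough illustration of the faceted query above, the sketch below builds the query string from lists of synonyms and evaluates it against a few documents. It assumes a deliberately simple matching model (case-insensitive, whole-word, no stemming or ranking); the helper names `build_faceted_query` and `matches` and the sample documents are hypothetical and do not reflect any particular search engine's syntax or behaviour.

```python
import re


def build_faceted_query(facets):
    """Join each facet's terms with OR, then join the facets with AND."""
    return " AND ".join("(" + " OR ".join(facet) + ")" for facet in facets)


def matches(document, facets):
    """True if the document contains at least one term from every facet."""
    words = set(re.findall(r"\w+", document.lower()))
    return all(any(term.lower() in words for term in facet) for facet in facets)


facets = [
    ["electronic", "computerized", "DRE"],
    ["voting", "elections", "election", "balloting", "electoral"],
]
query = build_faceted_query(facets)
print(query)

# Hypothetical documents, invented for this example.
docs = [
    "Computerized balloting was trialled in several counties.",
    "DRE machines were used in the last election.",
    "Electronic music festivals attract large crowds.",
    "Postal voting remains popular.",
]
for doc in docs:
    print(matches(doc, facets), "-", doc)
```

The first two documents match although neither contains both "electronic" and "voting", while the last two each satisfy only one facet and are rejected.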
References
[1] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze (2007), Introduction to Information Retrieval
(http://nlp.stanford.edu/IR-book/pdf/19web.pdf), Ch. 19
[2] Dawn Kawamoto and Elinor Mills (2006), AOL apologizes for release of user search data
(http://news.com.com/2100-1030_3-6102793.html)
[3] Amanda Spink, Dietmar Wolfram, Major B. J. Jansen, Tefko Saracevic (2001). "Searching the web: The public and their queries".
Journal of the American Society for Information Science and Technology 52 (3): 226–234.
doi:10.1002/1097-4571(2000)9999:9999<::AID-ASI1591>3.3.CO;2-I.
[4] Mark Sanderson and Janet Kohler (2004). "Analyzing geographic queries"
(http://www.geo.unizh.ch/~rsp/gir/abstracts/sanderson.pdf).
Proceedings of the Workshop on Geographic Information (SIGIR '04).
[5] Jaime Teevan, Eytan Adar, Rosie Jones, Michael Potts (2005). "History repeats itself: Repeat Queries in Yahoo's query logs"
(http://www.csail.mit.edu/~teevan/work/publications/posters/sigir06.pdf). Proceedings of the 29th Annual ACM Conference on
Research and Development in Information Retrieval (SIGIR '06). pp. 703–704. doi:10.1145/1148170.1148326.
[6] http://www.bing.com/community/site_blogs/b/search/archive/2011/02/10/making-search-yours.aspx
[7] Ricardo Baeza-Yates (2005). Applications of Web Query Mining (http://www.springerlink.com/content/kpphaktugag5mbv0/). 3408.
Springer Berlin / Heidelberg. pp. 7–22. ISBN 978-3-540-25295-5.
[8] Mona Taghavi, Ahmed Patel, Nikita Schmidt, Christopher Wills, Yiqi Tew (2011). An analysis of web proxy logs with query
distribution pattern approach for search engines (http://www.sciencedirect.com/science/article/pii/S0920548911000808). 34.
Elsevier. pp. 162–170.
[9] Vojkan Mihajlović, Djoerd Hiemstra, Henk Ernst Blok, Peter M.G. Apers. "Exploiting Query Structure and Document Structure to
Improve Document Retrieval Effectiveness" (http://eprints.eemcs.utwente.nl/6918/01/TR-CTIT-06-57.pdf).