A Machine Learning Approach for Result Caching in Search Engines

Date: 6 May 2015, 13:00

Location: B141

Tayfun Küçükyılmaz
University of Turkish Aeronautical Association

Result caching is a commonly used technique for improving search engine performance. In result caching, the pre-computed results of certain queries (e.g., the URLs and snippets of the best-matching pages) are stored in fast-access storage. Future occurrences of a query whose results are already cached can be served directly from the result cache, eliminating the need to process the query with costly computing resources. Although other performance metrics are possible, the main metric for evaluating a result cache is the hit rate, i.e., the fraction of queries that are answered by the cache.

This talk presents a machine learning approach that combines a large variety of features extracted from search engine query logs to improve the hit rate of the result cache. Compared to the state-of-the-art baseline framework, the proposed approach improves the hit rate by 0.66%, i.e., 7.8% of the possible improvement.

The talk consists of four parts. The first part motivates the caching problem for web search engines and reviews the state-of-the-art techniques. The second part discusses the key features used in our machine learning model. The third part covers several baseline techniques and the newly proposed techniques. The fourth part presents our experimental setup and results.

This talk is based on joint work with B. Barla Cambazoglu (Yahoo! Research Barcelona), Ricardo Baeza-Yates (Yahoo! Research Barcelona), and Cevdet Aykanat (Bilkent University).
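To make the hit-rate metric from the abstract concrete, the following is a minimal sketch of a result cache that counts hits over a query stream. It is not the approach presented in the talk: the least-recently-used (LRU) eviction policy, the class name `LRUResultCache`, the `fake_backend` function, and the toy query log are all assumptions chosen only for illustration.

```python
from collections import OrderedDict


class LRUResultCache:
    """Toy result cache keyed by query string.

    LRU eviction is an assumption made for this sketch; it is not the
    machine learning policy discussed in the talk.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> cached results
        self.hits = 0
        self.lookups = 0

    def lookup(self, query, compute_results):
        """Return results for `query`, using the cache when possible."""
        self.lookups += 1
        if query in self.entries:
            # Cache hit: serve stored results and mark the entry as recent.
            self.hits += 1
            self.entries.move_to_end(query)
            return self.entries[query]
        # Cache miss: fall back to the (costly) backend, then store the results.
        results = compute_results(query)
        self.entries[query] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return results

    def hit_rate(self):
        """Fraction of lookups that were answered by the cache."""
        return self.hits / self.lookups if self.lookups else 0.0


def fake_backend(query):
    # Hypothetical stand-in for full query processing: returns placeholder URLs.
    return [f"https://example.com/{query}/{i}" for i in range(3)]


# Hypothetical query log; each string stands for one submitted query.
query_log = ["a", "b", "a", "c", "b", "a"]
cache = LRUResultCache(capacity=2)
for q in query_log:
    cache.lookup(q, fake_backend)
print(f"hit rate: {cache.hit_rate():.2f}")
```

A learned caching policy would replace the fixed LRU rule with admission or eviction decisions driven by features from the query log, while the hit rate would still be measured in the same way.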