CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking, using a method known as autonomous citation indexing.
CiteSeer was developed in 1997 at the NEC Research Institute in Princeton, New Jersey, by Steve Lawrence, Lee Giles, and Kurt Bollacker. The service moved to the Pennsylvania State University’s College of Information Sciences and Technology in 2003. Since then, the project has been led by Professor Lee Giles.
After serving as a public search engine for nearly ten years, CiteSeer, originally intended only as a prototype, began to outgrow its original architecture. Over that time, the original CiteSeer grew to index over 750,000 documents and to serve over 1.5 million requests daily, pushing the limits of the system’s capabilities. Based on an analysis of the problems encountered by the original system and the needs of the research community, a new architecture and data model were developed for the “Next Generation CiteSeer,” or CiteSeerx, to carry the CiteSeer legacy into the foreseeable future.
CiteSeerx is an evolving digital library and search engine for scientific literature, focused primarily on computer and information science. CiteSeerx aims to improve the dissemination of scientific literature and to improve the functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness of access to scientific and scholarly knowledge. Rather than creating just another digital library, CiteSeerx attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerx has developed new methods and algorithms to index PostScript and PDF research articles on the Web.