
Project #03

3.pdf

Title: Efficient web-scraping

Leader's Name: Michael Håkansson
Member2 Name: Henrik Hygerth

Related paper: "Synchronizing a database to improve freshness", SIGMOD Record, June 2000, Vol. 29(2), pp. 117-128
Presentation Day: May 25

Model: LE

Abstract: Scraping the web for data and presenting that data to multiple clients can be inefficient. Having every client call third-party services individually implies a loss of control over the data, and far more data than needed might be sent to each client. In addition, some APIs have a usage limit, which might be reached quickly if all clients make individual calls to the API. By storing all scraped data in a database on a server and pushing only the relevant data to the clients, efficiency can be achieved both on the client side and towards the third-party service.

The aim of this project is to scrape one or more data sources (yet to be decided), store the relevant data in a database, and present the data in a client prototype. This will be done using a top-down, compositional, and incremental approach as presented by DeRose et al.
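The server-side idea described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual design: the class name `ScrapeCache`, the `fetch` callable, the `max_age_s` parameter, and the table layout are all hypothetical, and an in-memory SQLite database stands in for whatever database the project ends up using. The sketch shows one upstream call serving many client requests, with each client receiving only the fields it asked for.

```python
import sqlite3
import time
from typing import Callable, Dict, Iterable, Optional


class ScrapeCache:
    """Hypothetical server-side store in front of a rate-limited source."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]],
                 max_age_s: float = 60.0) -> None:
        self._fetch = fetch            # assumed scraper / third-party API call
        self._max_age_s = max_age_s    # assumed staleness bound before refetching
        self.upstream_calls = 0        # counts calls made to the third party
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE items (key TEXT, field TEXT, value TEXT, "
            "fetched_at REAL, PRIMARY KEY (key, field))"
        )

    def _refresh(self, key: str, now: float) -> None:
        """Fetch the full record once and replace the stored copy."""
        record = self._fetch(key)
        self.upstream_calls += 1
        with self._db:  # commit both statements as one transaction
            self._db.execute("DELETE FROM items WHERE key = ?", (key,))
            self._db.executemany(
                "INSERT INTO items VALUES (?, ?, ?, ?)",
                [(key, field, value, now) for field, value in record.items()],
            )

    def get(self, key: str, fields: Iterable[str],
            now: Optional[float] = None) -> Dict[str, str]:
        """Return only the requested fields, refetching if the copy is stale."""
        now = time.time() if now is None else now
        wanted = list(fields)
        if not wanted:
            return {}
        (fetched_at,) = self._db.execute(
            "SELECT MIN(fetched_at) FROM items WHERE key = ?", (key,)
        ).fetchone()
        if fetched_at is None or now - fetched_at > self._max_age_s:
            self._refresh(key, now)
        placeholders = ",".join("?" for _ in wanted)
        rows = self._db.execute(
            f"SELECT field, value FROM items "
            f"WHERE key = ? AND field IN ({placeholders})",
            (key, *wanted),
        ).fetchall()
        return dict(rows)


if __name__ == "__main__":
    def fake_fetch(key: str) -> Dict[str, str]:
        # Stand-in for a real scraper; returns more data than any client needs.
        return {"title": "Example", "body": "x" * 1000, "price": "9"}

    cache = ScrapeCache(fake_fetch, max_age_s=60.0)
    print(cache.get("item-1", ["title"]))           # first client triggers a fetch
    print(cache.get("item-1", ["price", "title"]))  # second client is served from the database
    print("upstream calls:", cache.upstream_calls)
```

In this sketch the staleness bound `max_age_s` is a fixed value chosen for simplicity; how often each source should actually be re-scraped is the kind of question the related paper on database freshness addresses.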

synchronizing-a-database-to-improve-freshness.pdf