Project #03
Title: Efficient web-scraping
Leader's Name: Michael Håkansson
Member2 Name: Henrik Hygerth
Related paper: "Synchronizing a database to improve freshness", SIGMOD Record, Vol. 29(2), June 2000, pp. 117-128
Presentation Day: May 25
Model: LE
Abstract: Scraping the web for data and presenting that data to multiple clients can be inefficient. Letting every client call third-party services individually means losing control over the data, so far more data than needed might be sent to the client. Also, some APIs have a usage limit, which might be reached quickly if all clients make individual calls to the API. By storing all scraped data in a database on a server and pushing only the relevant data to the clients, efficiency can be achieved both in the client and in the use of third-party services.
The aim of this project is to scrape one or more data sources (yet to be decided), store the relevant data in a database, and present the data in a client prototype. This will be done using a top-down, compositional, and incremental approach as presented by DeRose et al.
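
The server-side idea in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's design: the class name ScrapeCache, the fetch callback, the max_age refresh policy, and the table layout are all hypothetical, and an in-memory SQLite database stands in for the real server database. The sketch shows one upstream call serving several clients, with each client receiving only the fields it asked for.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional


class ScrapeCache:
    """Hypothetical server-side store: one upstream call serves many clients."""

    def __init__(self, fetch: Callable[[str], Dict[str, str]], max_age: float = 60.0):
        self.fetch = fetch            # assumed callback that scrapes/calls one source
        self.max_age = max_age        # assumed freshness threshold in seconds
        self.upstream_calls = 0       # counts calls that count against an API limit
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE items ("
            "source TEXT, field TEXT, value TEXT, fetched_at REAL, "
            "PRIMARY KEY (source, field))"
        )

    def _refresh_if_stale(self, source: str, now: float) -> None:
        row = self.db.execute(
            "SELECT MIN(fetched_at) FROM items WHERE source = ?", (source,)
        ).fetchone()
        if row[0] is not None and now - row[0] < self.max_age:
            return  # stored copy is fresh enough, no upstream call needed
        self.upstream_calls += 1
        data = self.fetch(source)
        self.db.execute("DELETE FROM items WHERE source = ?", (source,))
        self.db.executemany(
            "INSERT INTO items VALUES (?, ?, ?, ?)",
            [(source, key, value, now) for key, value in data.items()],
        )
        self.db.commit()

    def get(self, source: str, fields: List[str], now: Optional[float] = None) -> Dict[str, str]:
        """Return only the requested fields, refreshing the store if it is stale."""
        if not fields:
            return {}
        now = time.time() if now is None else now
        self._refresh_if_stale(source, now)
        placeholders = ",".join("?" for _ in fields)
        rows = self.db.execute(
            f"SELECT field, value FROM items WHERE source = ? AND field IN ({placeholders})",
            (source, *fields),
        ).fetchall()
        return dict(rows)


def fake_fetch(source: str) -> Dict[str, str]:
    # Stand-in for a real scraper or third-party API call.
    return {"title": "Example", "price": "42", "body": "long text"}


cache = ScrapeCache(fake_fetch, max_age=60.0)
client_a = cache.get("shop", ["price"], now=0.0)   # triggers the single upstream call
client_b = cache.get("shop", ["title"], now=10.0)  # served from the database
```

In this sketch the two client requests share one upstream call, and neither client receives the unused "body" field. How often to refresh is a policy choice; the fixed max_age here is just a placeholder for whatever synchronization strategy the project adopts.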