Scraping, aggregation, and cleaning of web data on publicly listed companies for a hedge fund.

Web Scraping

A prominent financial services firm (which we cannot reveal) asked us to scrape the web for large amounts of data on a public company and process that data into a format they could use to drive investment decisions.

Given our experience scraping thousands of websites at Yipit, we were well-prepared for the task at hand.

  • We built a scalable system to parse and standardize data from many different sources of markup.
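As an illustration only (not the system described above), standardizing data from differently marked-up sources can be sketched as one small mapping per source that feeds a shared schema. The `Record` fields, the source names, and the class names in `SOURCE_FIELD_MAP` are hypothetical, and the extractor is deliberately naive: it assumes each field sits in a flat element with a `class` attribute.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Record:
    """Common schema every source is normalized into (hypothetical fields)."""
    name: str
    price: float


class ClassTextExtractor(HTMLParser):
    """Collects the text of elements, keyed by their class attribute."""

    def __init__(self):
        super().__init__()
        self._current = None  # class of the element currently being read
        self.fields = {}      # class name -> text content

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls:
            self._current = cls

    def handle_endtag(self, tag):
        # Naive: any closing tag ends the current field (no nesting support).
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()


# Hypothetical per-source mapping: schema field -> class used by that source.
SOURCE_FIELD_MAP = {
    "source_a": {"name": "product-title", "price": "product-price"},
    "source_b": {"name": "item", "price": "cost"},
}


def parse_price(text):
    """Turn strings such as '$1,299.00' into a float."""
    return float(text.replace("$", "").replace(",", ""))


def standardize(source, html):
    """Parse one page from a known source into the common Record schema."""
    extractor = ClassTextExtractor()
    extractor.feed(html)
    mapping = SOURCE_FIELD_MAP[source]
    return Record(
        name=extractor.fields[mapping["name"]],
        price=parse_price(extractor.fields[mapping["price"]]),
    )
```

Adding a new source then means adding one entry to the mapping rather than writing a new parser.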

Here are the problems we solved during this project, none of which were new to us:

  • Rate-limiting
  • Old web pages that are no longer linked
  • Inconsistent markup across pages with similar content
  • Non-standard pagination
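Two of these problems, rate-limiting and non-standard pagination, lend themselves to a short sketch. The code below is an assumption about one reasonable approach, not a description of what was built: `RateLimiter` enforces a minimum delay between requests, and `crawl_pages` follows "next page" links using caller-supplied `fetch` and `find_next` functions (both hypothetical), since every site paginates differently.

```python
import time


class RateLimiter:
    """Enforces a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def crawl_pages(start_url, fetch, find_next, limiter, max_pages=1000):
    """Follow 'next page' links until none remain.

    fetch(url) returns a page body; find_next(body) returns the next URL
    or None. Already-visited URLs stop the crawl, which guards against
    pagination loops.
    """
    seen = set()
    pages = []
    url = start_url
    while url and url not in seen and len(pages) < max_pages:
        limiter.wait()
        seen.add(url)
        body = fetch(url)
        pages.append(body)
        url = find_next(body)
    return pages
```

Keeping the pagination logic in `find_next` means a site with numbered pages, "load more" tokens, or date-based archives only needs its own small function.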

We also employed the following techniques:

  • Set up a farm of remote machines to scrape the data in parallel
  • Ensured data quality using automated tests and manual QA
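A minimal sketch of both techniques follows, under stated assumptions: a local thread pool stands in for the farm of remote machines, `scrape_one` is a hypothetical per-URL scraping function, and the `name` and `price` fields are illustrative. The checks shown are examples of what automated tests might cover, not the actual test suite.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, scrape_one, workers=8):
    """Run scrape_one(url) concurrently; results keep the order of urls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scrape_one, urls))


def quality_problems(rows, required=("name", "price")):
    """Return a list of human-readable problems found in scraped rows."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        # Every required field must be present and non-empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: missing {field}")
        # Numeric sanity check on the (hypothetical) price field.
        price = row.get("price")
        if isinstance(price, (int, float)) and price < 0:
            problems.append(f"row {i}: negative price")
        # Flag exact repeats of an earlier row.
        key = (row.get("name"), row.get("price"))
        if key in seen:
            problems.append(f"row {i}: duplicate of an earlier row")
        seen.add(key)
    return problems
```

Rows that fail these checks can be routed to manual QA instead of being delivered.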

When working with large companies in the finance industry, data quality and integrity are of paramount importance. We understand that, and we deliver on it.

Whether you are a financial, consulting, or research firm, we can fulfill your data needs with speed and accuracy.

