Data extraction methods: the details

This section goes into more depth on data extraction methods: which method to use for which provider systems, the pros and cons of each, and examples.

Web Scraping

Scraping means extracting data from the pages of existing web sites that publish it. Web scraping is often the easiest way to start, because you might not have to deal with legal issues such as getting data use agreements (consult relevant government counsel if there are any questions). However, scraping has a variety of downsides, and using API calls is often a better final solution. Many states start by scraping and, once they have worked out legal agreements, move to API calls and turn off the scraper.
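As a rough illustration of what scraping involves, here is a minimal sketch using only Python's standard library. The page markup, the attribute names (data-name, data-available), and the function name extract_locations are all invented for this example; a real provider page will look different and needs its own parsing rules.

```python
from html.parser import HTMLParser

# Hypothetical markup: an invented page layout, not any real provider's site.
SAMPLE_PAGE = """
<ul>
  <li class="location" data-name="Main St Pharmacy" data-available="true"></li>
  <li class="location" data-name="Elm St Clinic" data-available="false"></li>
</ul>
"""


class LocationParser(HTMLParser):
    """Collects (name, available) pairs from <li class="location"> elements."""

    def __init__(self):
        super().__init__()
        self.locations = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "li" and attr.get("class") == "location":
            # Assumption: availability is published as the string "true"/"false".
            self.locations.append(
                (attr.get("data-name"), attr.get("data-available") == "true")
            )


def extract_locations(html):
    """Returns a list of (location name, has appointments) tuples."""
    parser = LocationParser()
    parser.feed(html)
    return parser.locations


if __name__ == "__main__":
    for name, available in extract_locations(SAMPLE_PAGE):
        print(name, "has appointments" if available else "has no appointments")
```

Because the parsing rules are tied to one page layout, any redesign of the page means revisiting code like this.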

If your state has a community-based group doing similar work (e.g. VaccinateCA in California, VaccineTexas in Texas, FindYourVaccine.org and VaccineSpotter.com in many states), consider getting in contact with them and combining efforts. The State of Washington was able to launch their VAF more quickly by collaborating with a community group at CovidWA.com.

Constraints to be aware of when scraping:

  • Variable data display: Data is displayed differently on different sites, and your code needs to accommodate these differences.

  • Changeable web sites: Web sites change fairly frequently. Keeping up with this can be onerous, as you may need to rewrite code with each change.

  • Many sites have CAPTCHA: There are various tools to work around CAPTCHA, but their use may raise ethical concerns. Sites with CAPTCHA have implemented the technology specifically so that humans, not computers, access their data.

  • Systems can overload from too much scraping traffic: If too many people scrape sites that are not set up for the traffic, the systems could become overloaded, and then no one can get an appointment.

Because of the overload risk, it's best to have a central group, like USDR, do the scraping and then publish the data for states and other organizations to use.

To further mitigate this risk, USDR is exploring shared data between states. For example, the States of Washington and New Jersey are working on ways to share data. This way, a state can get data for all 50 states from a shared source instead of scraping providers itself, reducing the load on provider data stores.
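If you do run your own scraper, one way to limit the load it places on provider sites is to enforce a minimum delay between requests to the same host. The sketch below is one possible approach, not a recommended standard; the class name HostThrottle and the 30-second interval are assumptions chosen for illustration.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        self._clock = clock  # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Blocks until it is polite to request `url`; returns seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request is actually allowed to go out.
        self._last_request[host] = now + delay
        return delay


if __name__ == "__main__":
    # Hypothetical interval; pick a value the provider's site can tolerate.
    throttle = HostThrottle(min_interval_seconds=30)
    throttle.wait("https://provider.example/locations")  # first call returns at once
```

Call wait() immediately before each request. Requests to different hosts are tracked separately, so a slow schedule for one provider does not hold up the others.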

Integrations (especially APIs)

Many pharmacy chains have APIs that allow you to obtain vaccine provider location data. For example, CVS has documentation for this (you’ll need to create an account to log in and see it, but that doesn’t require any agreements or fees). However, some nationwide pharmacies, such as Rite Aid, have no API yet.

There is also an industry standard for healthcare data, Fast Healthcare Interoperability Resources (FHIR), which includes ways to describe appointment scheduling information. Many clinics using commercial software, like Epic, may support a standard API for appointment scheduling that you can use. SMART (Substitutable Medical Applications, Reusable Technologies) is another standards group; it is currently developing a standard for vaccination appointment availability APIs, called “SMART Scheduling Links”, that some clinics and pharmacies are supporting.
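To give a sense of what FHIR-style scheduling data looks like, here is a minimal sketch that counts open appointment slots per schedule. It assumes slots arrive as newline-delimited JSON, one FHIR Slot resource per line, which is how the SMART Scheduling Links draft describes its slot files; check the current specification before relying on this. The sample data and the function name count_free_slots are invented for illustration.

```python
import json
from collections import Counter

# Invented sample data shaped like FHIR Slot resources, one JSON object per line.
SAMPLE_SLOTS_NDJSON = "\n".join([
    '{"resourceType": "Slot", "id": "s1", "schedule": {"reference": "Schedule/a"},'
    ' "status": "free", "start": "2021-04-06T08:00:00-07:00", "end": "2021-04-06T08:15:00-07:00"}',
    '{"resourceType": "Slot", "id": "s2", "schedule": {"reference": "Schedule/a"},'
    ' "status": "busy", "start": "2021-04-06T08:15:00-07:00", "end": "2021-04-06T08:30:00-07:00"}',
    '{"resourceType": "Slot", "id": "s3", "schedule": {"reference": "Schedule/a"},'
    ' "status": "free", "start": "2021-04-06T08:30:00-07:00", "end": "2021-04-06T08:45:00-07:00"}',
    '{"resourceType": "Slot", "id": "s4", "schedule": {"reference": "Schedule/b"},'
    ' "status": "free", "start": "2021-04-06T09:00:00-07:00", "end": "2021-04-06T09:15:00-07:00"}',
])


def count_free_slots(ndjson_text):
    """Returns {schedule reference: number of Slot resources with status 'free'}."""
    counts = Counter()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        resource = json.loads(line)
        if resource.get("resourceType") == "Slot" and resource.get("status") == "free":
            counts[resource["schedule"]["reference"]] += 1
    return dict(counts)


if __name__ == "__main__":
    for schedule, free in count_free_slots(SAMPLE_SLOTS_NDJSON).items():
        print(schedule, "has", free, "free slot(s)")
```

Because every provider that follows the standard publishes the same fields, one parser like this can serve all of them.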

Standardized APIs

  • Standardized APIs that work nationally, across all states, are the ideal reliable and sustainable solution. CVS offers a standardized API, and more pharmacies are planning to do so too. Integrating with these systems takes minimal work, and only a single legal agreement is needed for each provider.

Similar APIs

  • APIs that are similar across many providers (e.g. hospitals/clinics running Epic software): Because the same code can be used to get data from many providers, there is minimal technical work. However, relationships must be built with each hospital group. For example, direct access to data in Epic requires an agreement with Epic itself and with each hospital and clinic individually.

Note: Epic is waiving license fees for state governments during the Covid-19 pandemic, but you will still need an agreement with them.

Non-standard integrations

This type of integration, with local pharmacies and small clinics running custom software, requires significant technical and relationship-building work, because you typically need unique software code and an agreement for each provider.
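One common way to keep per-provider code manageable is to write a small adapter for each provider that converts its data into a single shared record. The sketch below assumes two made-up providers with made-up payload formats; the names Availability, from_pharmacy_a, from_clinic_b, and normalize are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Availability:
    """Common record that every provider-specific adapter produces."""
    provider: str
    location: str
    has_appointments: bool


def from_pharmacy_a(payload):
    # Hypothetical format: {"store": "...", "open_slots": 3}
    return Availability("pharmacy_a", payload["store"], payload["open_slots"] > 0)


def from_clinic_b(payload):
    # Hypothetical format: {"site_name": "...", "status": "OPEN" or "FULL"}
    return Availability("clinic_b", payload["site_name"], payload["status"] == "OPEN")


# One entry per provider; each new provider adds one adapter function here.
ADAPTERS = {
    "pharmacy_a": from_pharmacy_a,
    "clinic_b": from_clinic_b,
}


def normalize(provider, payload):
    """Converts a provider-specific payload into the common Availability record."""
    return ADAPTERS[provider](payload)


if __name__ == "__main__":
    print(normalize("pharmacy_a", {"store": "Main St", "open_slots": 3}))
    print(normalize("clinic_b", {"site_name": "Elm St", "status": "FULL"}))
```

The rest of the system only ever sees Availability records, so the unique code for each provider stays confined to its adapter.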

Data extraction from state-managed systems

Most states have a system for managing appointments at some locations, such as Microsoft VRAS, PrepMod, or VAMS. If the system was purchased from a vendor, you may need to amend the vendor contract, or you may simply need to direct the vendor to work with the VAF technical team to surface appointment data.

These systems usually do not include retail pharmacies, because the pharmacies have existing, robust scheduling systems they prefer to use. States using the same vendor can often share tools for accessing appointment data.

Provider-driven updates

Some states have put the onus on clinics to update their availability information. In these states, clinics repeatedly update no-code forms indicating whether they have appointments or not. This option does not require as much software development, but clinics may not update their availability frequently.
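Because form-based reports can go out of date, it may help to check the age of each report before displaying it. The following is a minimal sketch of that idea; the 24-hour cutoff, the report field names, and the function name display_status are assumptions for illustration, not part of any state's system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff; choose one that matches how often clinics actually report.
MAX_REPORT_AGE = timedelta(hours=24)


def display_status(report, now, max_age=MAX_REPORT_AGE):
    """Returns the text to show for a clinic's most recent form submission.

    `report` is a dict with invented keys:
      "has_appointments": bool, "submitted_at": timezone-aware datetime
    """
    if now - report["submitted_at"] > max_age:
        return "unknown (report is out of date)"
    return "appointments available" if report["has_appointments"] else "no appointments"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    fresh = {"has_appointments": True, "submitted_at": now - timedelta(hours=2)}
    old = {"has_appointments": True, "submitted_at": now - timedelta(days=3)}
    print(display_status(fresh, now))
    print(display_status(old, now))
```

Showing "unknown" for old reports avoids sending people to a clinic on the strength of information that may no longer be true.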

Robocalling

Robocalling is an option if you have the personnel to implement it. With this option, a computer calls each provider and/or provider location. State personnel then have to follow up on each positive robocall. Like provider-updated data, this data is unlikely to get updated very frequently, but talking with staff at provider locations can give you much richer qualitative information than an appointment API or scraper can. For example, pharmacy staff might tell you they open their vaccination appointments every Tuesday at 8:00am.
