Crawling Competitors
To support our sales team, I built an automated system that crawls competitor websites to gather provider information and supplies Salesforce with everything required to create a lead.
The Challenge
Our competitors maintained searchable provider directories on their websites, but manually searching them was inefficient and often left gaps in coverage. The sales team needed a way to consistently identify new providers across the country without dedicating hours to repetitive searches.
The Solution
Database Architecture and Workflow Control
I designed a robust system workflow anchored by a MySQL database to manage the scraping process efficiently. This involved creating a master table containing all U.S. zip codes, each assigned a specific processing state: null for unstarted tasks, running for active processes, done for completed scrapes, and failed for errors. Additionally, I built a secondary table to store the extracted provider details—such as names, addresses, and phone numbers—which were directly linked back to their originating zip codes to ensure data integrity.
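In rough outline, the schema looked something like the sketch below. The table and column names here are illustrative rather than the exact production schema, and mysql2 is assumed as the Node.js driver:

```javascript
// Illustrative schema sketch -- table and column names are placeholders,
// not the exact production definitions.
const mysql = require('mysql2/promise');

async function createTables(db) {
  // Master table: every U.S. zip code plus its processing state.
  // A NULL status means the zip code has not been scraped yet.
  await db.query(`
    CREATE TABLE IF NOT EXISTS zip_codes (
      zip        CHAR(5) PRIMARY KEY,
      status     ENUM('running', 'done', 'failed') DEFAULT NULL,
      updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
    )
  `);

  // Secondary table: provider details linked back to the originating zip code.
  await db.query(`
    CREATE TABLE IF NOT EXISTS providers (
      id      INT AUTO_INCREMENT PRIMARY KEY,
      zip     CHAR(5) NOT NULL,
      name    VARCHAR(255),
      address VARCHAR(255),
      phone   VARCHAR(32),
      FOREIGN KEY (zip) REFERENCES zip_codes(zip)
    )
  `);
}
```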
Automated Data Extraction
For the automation layer, I used Puppeteer to control the browser and interact with the competitor's provider search page. The script iterated through the zip codes marked as null, entering each one into the search field and executing the query. Once the results populated, the system extracted the relevant provider data and committed it directly to the providers table.
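The per-zip-code scrape looked roughly like the sketch below. The URL and CSS selectors are placeholders, since the real ones depended on the competitor's search page, and mysql2 is again assumed for the insert:

```javascript
// Hypothetical extraction step -- the URL and selectors are placeholders.
async function scrapeZip(page, zip) {
  await page.goto('https://example-competitor.com/find-a-provider', {
    waitUntil: 'networkidle2',
  });

  // Enter the zip code and run the search.
  await page.type('#zip-input', zip);
  await page.click('#search-button');
  await page.waitForSelector('.provider-card', { timeout: 30000 });

  // Pull name, address, and phone out of each result card.
  return page.$$eval('.provider-card', (cards) =>
    cards.map((card) => ({
      name: card.querySelector('.provider-name')?.textContent.trim() ?? '',
      address: card.querySelector('.provider-address')?.textContent.trim() ?? '',
      phone: card.querySelector('.provider-phone')?.textContent.trim() ?? '',
    }))
  );
}

// Commit the extracted results to the providers table, linked to their zip code.
async function saveProviders(db, zip, providers) {
  for (const p of providers) {
    await db.query(
      'INSERT INTO providers (zip, name, address, phone) VALUES (?, ?, ?, ?)',
      [zip, p.name, p.address, p.phone]
    );
  }
}
```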
Error Handling and System Reliability
To ensure reliability, I implemented comprehensive error handling and recovery mechanisms. If a timeout, parsing failure, or site issue occurred, the system automatically flagged that specific zip code as failed for later review, preventing the entire process from crashing. Conversely, successfully processed zip codes were updated to done, allowing the script to safely pause and resume operation exactly where it left off without duplicating work or missing data.
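A rough sketch of the control loop that ties those status transitions together is below, reusing the hypothetical scrapeZip and saveProviders helpers from the previous sketch:

```javascript
// Illustrative control loop -- ties the status column to the scrape so the
// job can stop, restart, and pick up exactly where it left off.
async function run(db, page) {
  for (;;) {
    // Claim the next unstarted zip code and mark it as running.
    const [rows] = await db.query(
      'SELECT zip FROM zip_codes WHERE status IS NULL LIMIT 1'
    );
    if (rows.length === 0) break; // nothing left to process

    const zip = rows[0].zip;
    await db.query('UPDATE zip_codes SET status = ? WHERE zip = ?', ['running', zip]);

    try {
      const providers = await scrapeZip(page, zip);
      await saveProviders(db, zip, providers);
      await db.query('UPDATE zip_codes SET status = ? WHERE zip = ?', ['done', zip]);
    } catch (err) {
      // Timeouts, parsing failures, or site issues flag this zip code for
      // later review instead of killing the whole run.
      console.error(`Failed on ${zip}:`, err.message);
      await db.query('UPDATE zip_codes SET status = ? WHERE zip = ?', ['failed', zip]);
    }
  }
}
```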
Results
This project transformed sales operations by delivering a scalable, automated lead source that processed thousands of zip codes into a centralized, queryable database. By replacing manual data hunting with a structured, repeatable workflow, the system ensured comprehensive geographic coverage and allowed the team to shift its focus entirely to outreach and conversions. The robust architecture meant no area was missed due to errors, while the centralized data provided real strategic agility. Notably, when FDA issues affected a competitor's device, I was able to use the database to generate a targeted list of providers using that product within 24 hours, enabling the sales team to pivot immediately.