Jelmer Rotteveel at Soar Music Group needed structured lead lists: artist contacts, curator profiles and music industry decision-makers. Eprecisio managed development of the scraping scripts that generated those lists and ran them repeatedly whenever the business needed fresh data. Over years of running and improving those scripts, a pattern became clear: the underlying need was not a script but a platform, one that others in the music industry could use too.
That became FindSocial.
From scripts to platform: how the pilot evolved
The first phase was purely internal. Ehtisham's team built and iterated scraping scripts for Soar Music Group's outreach campaigns. Each iteration improved accuracy, coverage, and the quality of the data returned. By the time the idea of building a proper platform emerged, the scraping logic had already been refined through years of real-world use against real music industry data.
That history mattered. Many lead platforms, in any vertical, are built by people who understand software but not the data they are collecting. FindSocial was built by a team that had spent years learning what makes music industry lead data useful versus useless: which signals indicate an active curator versus a dormant one, which contact data is actually reachable, how to distinguish an independent artist from a major label imprint, and how the data decays over time.
The pilot-to-platform transition happened in stages:
| Phase | What was built | Output |
|---|---|---|
| Phase 1: Internal scripts | Scraping and lead generation for Soar Music Group | Structured CSV exports for outreach campaigns |
| Phase 2: Data infrastructure | Persistent storage, deduplication, enrichment pipeline | Reusable, queryable lead database |
| Phase 3: Platform build | Next.js frontend, search, filters, export | Product usable by non-technical users |
| Phase 4: AI-native scraping | Autonomous agents replacing manual script runs | Continuous 24/7 data collection and refresh |
| Phase 5: Scale | AWS horizontal scaling, 1M+ profiles in production | Production platform with live lead generation |
What we built
Agentic scraping system. The core data collection runs on autonomous agents built to operate continuously without human intervention. The agents navigate multiple data sources across the music industry, handle rate limiting and blocking gracefully, retry failed operations with backoff logic, and flag data quality issues rather than silently passing bad data downstream. Each agent is scoped to a specific data source and profile type, so failures in one do not cascade to others.
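The retry and isolation behaviour described above can be sketched in Python, the language the component table lists for the scrapers. This is a minimal illustration under stated assumptions, not FindSocial's agent framework: `SourceAgent`, `TransientSourceError` and the required-field check are hypothetical names, and full-jitter exponential backoff is one common retry policy rather than a confirmed detail of the platform.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


class TransientSourceError(Exception):
    """Hypothetical error for rate limits, timeouts and temporary blocks."""


@dataclass
class SourceAgent:
    """Hypothetical agent scoped to one data source and one profile type."""

    source: str
    profile_type: str
    fetch: Callable[[str], dict]  # assumed: returns a raw record for a URL
    max_retries: int = 4
    base_delay: float = 1.0  # seconds
    sleep: Callable[[float], None] = time.sleep  # injectable for testing
    flagged: List[str] = field(default_factory=list)

    def fetch_with_backoff(self, url: str) -> Optional[dict]:
        for attempt in range(self.max_retries + 1):
            try:
                return self.fetch(url)
            except TransientSourceError:
                if attempt == self.max_retries:
                    break
                # Full-jitter exponential backoff: wait 0..base_delay * 2^attempt s.
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
        # Give up on this URL only; this agent and all other agents keep running.
        self.flagged.append(url)
        return None

    def run(self, urls: Iterable[str],
            required: tuple = ("name", "platform_url")) -> List[dict]:
        records = []
        for url in urls:
            record = self.fetch_with_backoff(url)
            if record is None:
                continue
            if any(not record.get(key) for key in required):
                # Flag incomplete records rather than passing them downstream.
                self.flagged.append(url)
                continue
            records.append(record)
        return records
```

Because each agent holds its own retry state and flag list, a source that keeps failing only fills its own `flagged` list; running agents side by side as separate workers is what keeps one platform's failures from cascading to the others.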
Data enrichment pipeline. Raw scraped data is not useful as-is. The enrichment pipeline cross-references profiles across multiple platforms, validates contact information, resolves duplicates, scores lead quality based on engagement signals, and categorises profiles by type (artist, curator, label, agency, influencer). A profile in FindSocial is not a raw scraped record. It is a verified, enriched, scored contact.
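As an illustration of the validation and scoring steps, the sketch below combines contact validity, engagement, reach and cross-platform presence into a single 0-100 score, and rejects profiles outside the five categories named above. The weights, saturation points, field names and `quality_score` function are assumptions for the example; the source does not specify how FindSocial weights its engagement signals.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Deliberately loose syntax check; real validation would also verify deliverability.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")
PROFILE_TYPES = {"artist", "curator", "label", "agency", "influencer"}


@dataclass
class Profile:
    name: str
    profile_type: str       # one of PROFILE_TYPES
    followers: int
    engagement_rate: float  # e.g. 0.035 for 3.5%
    email: Optional[str] = None
    platforms: int = 1      # number of platforms this profile was matched on

    def __post_init__(self) -> None:
        if self.profile_type not in PROFILE_TYPES:
            raise ValueError(f"unknown profile type: {self.profile_type}")


def quality_score(p: Profile) -> float:
    """Hypothetical 0-100 lead score; the weights are illustrative only."""
    has_contact = 1.0 if p.email and EMAIL_RE.match(p.email) else 0.0
    engagement = min(p.engagement_rate / 0.10, 1.0)    # saturates at 10%
    reach = min(math.log10(p.followers + 1) / 6, 1.0)  # ~1M followers -> 1.0
    presence = min(p.platforms / 3, 1.0)               # 3+ platforms -> 1.0
    score = 0.4 * has_contact + 0.3 * engagement + 0.2 * reach + 0.1 * presence
    return round(100 * score, 1)
```

Weighting a reachable contact most heavily reflects the point made above that a profile is only useful if it can actually be contacted; a production scorer would tune these weights against real outreach outcomes.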
Real-time search and filtering. The PostgreSQL-backed search layer returns results in milliseconds across 1M+ profiles. Filters cover genre, location, follower count, engagement rate, platform presence, and contact availability. A music promotion team can find exactly the curators or artists they need for a specific campaign without exporting and filtering in a spreadsheet.
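A minimal sketch of how such filters might be compiled into a single parameterised PostgreSQL query. The table, the column names and the `SearchFilters` type are hypothetical, and the `'simple'` text-search configuration is an assumption; the source only states that search uses PostgreSQL full-text search with custom indexing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed schema (hypothetical): profiles(id, name, profile_type, genre, location,
# followers, engagement_rate, platforms text[], has_contact bool, quality_score,
# search_vector tsvector). A GIN index keeps full-text matching fast:
#   CREATE INDEX profiles_search_idx ON profiles USING GIN (search_vector);


@dataclass
class SearchFilters:
    text: Optional[str] = None
    genre: Optional[str] = None
    location: Optional[str] = None
    min_followers: Optional[int] = None
    min_engagement: Optional[float] = None
    platform: Optional[str] = None
    has_contact: Optional[bool] = None
    limit: int = 50


def build_search_query(f: SearchFilters) -> Tuple[str, List[object]]:
    """Build a parameterised query; user values are never interpolated into SQL."""
    clauses: List[str] = []
    params: List[object] = []
    if f.text:
        clauses.append("search_vector @@ plainto_tsquery('simple', %s)")
        params.append(f.text)
    if f.genre:
        clauses.append("genre = %s")
        params.append(f.genre)
    if f.location:
        clauses.append("location = %s")
        params.append(f.location)
    if f.min_followers is not None:
        clauses.append("followers >= %s")
        params.append(f.min_followers)
    if f.min_engagement is not None:
        clauses.append("engagement_rate >= %s")
        params.append(f.min_engagement)
    if f.platform:
        clauses.append("%s = ANY(platforms)")
        params.append(f.platform)
    if f.has_contact is not None:
        clauses.append("has_contact = %s")
        params.append(f.has_contact)
    where = " AND ".join(clauses) or "TRUE"
    sql = (
        "SELECT id, name, profile_type FROM profiles "
        f"WHERE {where} ORDER BY quality_score DESC LIMIT %s"
    )
    params.append(f.limit)
    return sql, params
```

The resulting `sql` and `params` pair can be passed to a DB-API cursor, for example `cursor.execute(sql, params)` with psycopg, which uses the same `%s` placeholder style.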
Production platform. Next.js frontend designed for non-technical users. Browse, filter, build lists, and export. The interface reflects how music industry professionals actually think about outreach, not how a developer would structure a database query.
| Component | What it does | Technology |
|---|---|---|
| Agentic scrapers | Continuous multi-source data collection, failure-resilient | Python, custom agent framework |
| Enrichment pipeline | Cross-referencing, validation, deduplication, quality scoring | Python, PostgreSQL |
| Lead database | 1M+ profiles, structured and queryable | PostgreSQL, AWS RDS |
| Search layer | Real-time full-text search with multi-dimensional filters | PostgreSQL full-text, custom indexing |
| Platform frontend | Browse, filter, export, manage lists | Next.js (frontend), FastAPI (API layer) |
| Infrastructure | Scales horizontally with scraping and search load | AWS, horizontal autoscaling |
What made this technically hard
Data quality at scale. Getting to 1M+ profiles is straightforward if you are willing to accept bad data. Getting to 1M+ profiles that are actually useful requires continuous quality enforcement. Contact information goes stale. Artists change labels. Curators go inactive. The enrichment pipeline has to catch degradation and either re-validate or flag the profile, not silently serve outdated data to users.
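The re-validate-or-flag rule can be expressed as a small decision function. This is a sketch under assumed thresholds (30, 90 and 180 days are placeholders, not FindSocial's settings), with hypothetical names `freshness_action` and `Action`, and timestamps assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# Hypothetical thresholds; real values would differ per field and per source.
REVALIDATE_AFTER = timedelta(days=30)  # contact data older than this is rechecked
FLAG_AFTER = timedelta(days=90)        # unverified this long: stop serving as fresh
DORMANT_AFTER = timedelta(days=180)    # no visible activity: likely inactive


class Action(Enum):
    SERVE = "serve"
    REVALIDATE = "revalidate"
    FLAG = "flag"


def freshness_action(last_verified: datetime, last_activity: datetime,
                     now: Optional[datetime] = None) -> Action:
    """Decide what to do with a profile instead of silently serving stale data."""
    now = now or datetime.now(timezone.utc)
    if now - last_activity > DORMANT_AFTER:
        return Action.FLAG        # e.g. a curator who has stopped posting
    age = now - last_verified
    if age > FLAG_AFTER:
        return Action.FLAG        # re-validation has not succeeded in time
    if age > REVALIDATE_AFTER:
        return Action.REVALIDATE  # queue for the enrichment pipeline
    return Action.SERVE
```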
Rate limits and blocking across multiple sources. The music industry's data lives across Spotify, SoundCloud, Instagram, YouTube, music blogs, and dozens of niche platforms. Each has different rate limits, different anti-scraping approaches, and different data structures. The agent architecture was specifically designed to handle this heterogeneity: each source has its own agent with its own rate management, so one platform blocking a scraper does not stop data collection from the others.
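Per-source rate management is commonly implemented with one token bucket per source; the sketch below assumes that approach, which the source does not confirm. `TokenBucket`, the per-source rates and the `block_for` hook are hypothetical, and the clock is injectable so the behaviour can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    """One bucket per source: refills at `rate` tokens/s, up to `capacity`."""

    rate: float
    capacity: float
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False)
    last: float = field(init=False)
    blocked_until: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.capacity
        self.last = self.clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        if now < self.blocked_until:
            return False  # this source told us to back off (e.g. HTTP 429)
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def block_for(self, seconds: float) -> None:
        """Pause only this source, for example after a block or ban signal."""
        self.blocked_until = self.clock() + seconds


# Hypothetical per-source limits: (requests per second, burst size).
LIMITS = {"spotify": (2.0, 5), "soundcloud": (1.0, 3), "instagram": (0.2, 1)}
buckets = {name: TokenBucket(rate, cap) for name, (rate, cap) in LIMITS.items()}
```

An agent calls `try_acquire()` on its own source's bucket before each request. When one platform starts blocking, calling `block_for(...)` on that platform's bucket pauses only that source while the others keep collecting.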
Deduplication across sources. The same artist might appear under different names, with different spellings, on different platforms. Building a deduplication layer that correctly merges profiles without falsely collapsing distinct people into one record, and without missing real duplicates, was one of the more technically intricate parts of the data pipeline.
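The sketch below illustrates one conservative merge rule under stated assumptions: names are normalised (accents, case, punctuation and a leading "The" removed), and two records merge only when their names are near-identical and they also share at least one corroborating link. `normalize_name`, `is_same_entity`, the 0.9 threshold and the `links` field are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure a production system would use.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List


def normalize_name(name: str) -> str:
    """Strip accents, casefold, drop punctuation and a leading 'the'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.casefold())
    cleaned = re.sub(r"^the\s+", "", cleaned.strip())
    return " ".join(cleaned.split())


def is_same_entity(a: dict, b: dict, threshold: float = 0.9) -> bool:
    name_sim = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    shared_links = set(a.get("links", ())) & set(b.get("links", ()))
    # Name similarity alone is not enough: requiring a shared link keeps two
    # different artists with the same name from collapsing into one record.
    return name_sim >= threshold and bool(shared_links)


def dedupe(profiles: List[dict]) -> List[List[dict]]:
    """Cluster matching profiles with union-find; each cluster is one entity."""
    parent = list(range(len(profiles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if is_same_entity(profiles[i], profiles[j]):
                parent[find(i)] = find(j)

    clusters: Dict[int, List[dict]] = {}
    for i, profile in enumerate(profiles):
        clusters.setdefault(find(i), []).append(profile)
    return list(clusters.values())
```

Comparing every pair is quadratic, so at 1M+ profiles a real pipeline would first group candidates (for example by normalised name or a shared identifier) and compare only within each group.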
Results
| Metric | Before FindSocial | After FindSocial |
|---|---|---|
| Lead data availability | CSV exports from manual script runs | 1M+ profiles in a live, searchable platform |
| Data freshness | Stale between script runs | Continuously updated by autonomous agents |
| Time to build a targeted list | Hours of manual export and filtering | Minutes via platform search and filters |
| Data quality | Variable, unchecked | Enriched, validated, quality-scored |
| Access | One internal team | Multi-user platform for the music industry |
| Coverage | Artists and curators in one vertical | Artists, curators, labels, agencies, influencers |
What the platform represents
FindSocial shows how pilot work compounds into a real product when the underlying domain knowledge is there. The scraping scripts that started as a service for one client became a platform because years of running and refining them produced a genuine understanding of the data.
The music industry lead problem is not unique. Almost every vertical has the same challenge: contacts are scattered across dozens of platforms, quality is inconsistent, and the people who need the data are not technical. What makes FindSocial work is the combination of AI-native data collection and the domain knowledge built through years of working in this specific space.
For how we build AI-native data platforms from pilot to production, see our Development service.
If you are building a data product or lead platform and need a team that can take it from scraping scripts to production AI architecture, book a free 30-minute call.
