Data Scientist & AI Systems Builder

Victor Vassallo

Turning complex information into decisions.

I design systems that transform complex data, documents, and policy contexts into actionable decisions — bridging technical rigor with real-world impact.

Pipelines in Production
NLP + Econometrics
Policy-Grade Decision Support
BOOM — Opportunity Intelligence
Agent Active
Opportunity Queue
IT Modernization (DOC) · Federal · 92
Data Analytics Platform (HHS) · Federal · 88
Broadband Planning (NTIA) · Grant · 61
GIS Mapping Services (DOT) · Federal · 58
Research Support (NSF) · Grant · 34
IT Modernization Support & Advisory Services — Dept. of Commerce
Route →Summarize
Sector Fit
0.87
Capability Match
0.64
Strategic Align
0.91
Agent Reasoning
Strong alignment with federal IT modernization portfolio. Prior DOC engagement and data platform work increases win probability. Recommending BD Team A routing with 48hr response window.
Agentic Workflow · Proprietary Scoring · Auto-Routed
Decision Intelligence RAG
Corpus · Hybrid · Doc-Only
VV
What has our team previously written about broadband equity measurement frameworks?
AI
Based on 4 sources retrieved from your corpus, your team has developed three distinct measurement frameworks for broadband equity. First, a digital equity index combining FCC availability data with census tract-level adoption indicators and affordability benchmarks [src_02]. Second, a provider-level scoring methodology applied during the NTIA grant analysis, weighting deployment commitments against historical underservice [src_07]. Third, a mixed-methods rubric that integrates qualitative community survey inputs with quantitative coverage data to produce a composite access score [src_11]. The equity index is your most-cited internal artifact, referenced in 6 subsequent deliverables.
Retrieved Sources
broadband_equity_index_v3.pdf (sim 0.94)
ntia_grant_analysis_final.docx (sim 0.89)
mixed_methods_framework_2024.pdf (sim 0.82)
Qualitative Research Database
3 Agencies · Semantic Index · 847 Records
All · Interviews · Reports · Survey
Interview · USDA Rural Dev (sim 0.96)
"Residents cited cost as the primary barrier, followed by a lack of awareness that affordable programs existed. Even where infrastructure was present, digital literacy gaps prevented meaningful adoption."
Report · FCC Broadband Study (sim 0.91)
Analysis of 12 rural counties found device availability and perceived relevance as consistent secondary barriers alongside infrastructure gaps.
Survey · NTIA Program Eval (sim 0.87)
68% of non-adopters indicated affordability as their primary reason for not subscribing to available broadband service.
Theme Distribution
Cost / Affordability
Digital Literacy
Infrastructure Gap
Device Access
Perceived Relevance
Source Breakdown
USDA Rural Dev: 312
FCC Studies: 284
NTIA Program: 251
Operational Analytics — Resource Allocation
3 on track · 1 at risk · 1 critical
Active Workstreams
12
↑ 2 this week
Team Utilization
84%
⚠ Near capacity
On-Time Delivery
91%
↑ 6pts vs last qtr
Scope Overages
2
↓ flagged for review
Workstream Progress
Federal Broadband Analysis: 18d remaining
RAG Pipeline Deployment: 5d remaining
Policy Document NLP: 12d remaining
⚠ Staffing gap detected
Client Dashboard Q3: 2d remaining
✗ Behind schedule
Econometric Modeling: 24d remaining
Capacity by Role
Analytics
92%
ML / NLP
78%
Research
65%
Delivery
88%
ML Forecast Alert
Analytics role projected to exceed capacity in 6 days. Recommend rebalancing 2 tasks to Research track.
BOOM System
Agentic Opportunity Intelligence

Systems-oriented.
Analytically precise.

"My work focuses on turning fragmented data, documents, and qualitative inputs into structured, usable intelligence."

I'm a data scientist and AI systems builder who operates at the intersection of technical architecture and applied research. My work spans decision support systems, knowledge pipelines, and economic analysis — with a consistent focus on making complex information actionable.

I bring together machine learning, NLP, and systems design with deep domain knowledge in economic and policy research. I build things that are meant to be used — not just analyzed.

Data Systems
Modeling + NLP
Decision Impact

What I Build

Decision Support Systems

AI-powered pipelines that synthesize multi-source intelligence into clear, structured outputs for high-stakes decisions.

Document & Knowledge Systems

OCR, LLM extraction, and retrieval-augmented architectures that unlock value locked in unstructured documents.

Analytical Models

Predictive models, NLP classifiers, and statistical frameworks tuned for operational and policy contexts.

Economic & Policy Analysis

Rigorous quantitative analysis connecting data-driven methods with economic theory and policy implications.

Past Experience

Kaptivate LLC
May 2024 — Present
Data Scientist

Develop and maintain production-grade analytics pipelines and agentic workflows that unify structured records with narrative text sources. Apply ML/NLP to transform qualitative inputs into decision-ready metrics and support econometric broadband impact modeling used in policy research.

Alexandria, VA
  • Design and maintain production data pipelines integrating structured records, documents, and narrative inputs across multiple federal and operational data sources to enable decision-ready analytics.
  • Develop and deploy NLP and semantic modeling systems that classify qualitative research inputs, extract meaning at scale, and transform unstructured text into structured, queryable knowledge.
  • Build data validation, anomaly detection, and monitoring frameworks with schema-constrained extraction pipelines and threshold alerting to ensure integrity across multi-source environments.
  • Create dashboards and analytical deliverables used by 30+ program staff, translating complex operational and policy datasets into clear, actionable insights for technical and non-technical stakeholders.
  • Architect and implement several retrieval-augmented generation systems, including a contract opportunity intelligence platform that cut review time by roughly 70% and supported $4M+ in successful federal funding applications.
Data Society LLC
May 2023 — August 2023
Data Science / Machine Learning Intern

Developed supervised and unsupervised machine learning models for client research engagements, engineered ETL pipelines to prepare large datasets for modeling, and implemented evaluation practices to strengthen reproducibility.

Washington, DC
  • Developed supervised and unsupervised machine learning workflows for client-facing research projects, improving predictive performance and supporting reproducible, well-documented analytical outputs.
  • Engineered Python- and SQL-based ETL processes to clean, transform, and organize large datasets for modeling, reporting, experimentation, and downstream analytics applications.
  • Applied natural language processing and automated text analysis techniques to extract insights from unstructured data, improving research efficiency and analytical consistency.
  • Evaluated model behavior, validated datasets, and produced technical documentation that improved transparency, usability, and communication of analytical results across stakeholders.
AARP
May 2022 — August 2022
Membership Lifecycle Management Intern

Built predictive models to analyze membership behavior and improve retention-oriented campaign strategy. Automated recurring data pipelines and integrated structured and unstructured CRM data for fuller lifecycle visibility.

Washington, DC
  • Built predictive and descriptive models analyzing member behavior, supporting retention strategy, campaign optimization, and more targeted decision-making across business units.
  • Automated recurring data preparation and reporting workflows, reducing manual effort while improving consistency, timeliness, and accessibility of operational insights for stakeholders.
  • Integrated CRM, behavioral, and reporting data into unified analytical views that improved visibility into lifecycle trends, segmentation opportunities, and performance drivers.
  • Developed dashboards and presentation-ready analyses that translated complex membership data into clear recommendations for planning, strategy, and cross-functional collaboration.

Key Projects

Decision Intelligence RAG

A production-grade, multi-source RAG system with corpus, document-only, and hybrid modes that delivers citation-ready, context-grounded intelligence for policy and program decisions.

RAG Multi-Source Managed LLM
Predictive Forecasting & Strategic Design System

A machine learning decision system that forecasts demand and outcome quality, then operationalizes targeted intervention strategies to improve participation and resource alignment.

ML Forecasting Decision Systems
Published Research on Economic Impact of Federal Broadband Funding

Published mixed-methods research combining regional econometric analysis and qualitative evidence synthesis to evaluate federal broadband funding impacts and policy implications.

Econometrics Mixed Methods Policy Research

Areas of Active Focus

AI-driven decision systems for civic and government contexts

Designing agentic pipelines and structured reasoning workflows that support human-governed decisions in policy and program environments.

Agentic AI Civic Tech Decision Design
Scalable semantic modeling across multi-source policy and opportunity data

Building NLP and embedding-based systems that extract, align, and surface meaning from heterogeneous document corpora and operational data.

NLP Semantic Search Embeddings
Applied economic analysis of federal broadband & infrastructure policy

Using econometric modeling and mixed-methods research to evaluate federal investment outcomes and inform infrastructure policy recommendations.

Econometrics Policy Research Mixed Methods
Advanced data visualization for document-heavy research and decision workflows

Translating complex, multi-layered datasets and document-driven findings into clear, interactive deliverables for analytical and executive audiences.

Data Viz Dashboards Stakeholder Comms

Systems & Projects

A deep look at the systems, pipelines, and analyses I've built — organized by domain and designed for exploration.

01 Decision Systems
BOOM (Business Optimization & Opportunity Management) System
An agentic opportunity intelligence system that monitors, evaluates, and routes contract and grant opportunities through a human-governed workflow.
BOOM LLM GCP AI
Overview

An end-to-end agentic platform that ingests contract and grant opportunities, evaluates them against organizational priorities, and executes downstream actions including prioritization, routing, summarization, and analyst escalation.

Combines automated decision support with human oversight, enabling faster and more consistent pursuit workflows.

Problem

Business development teams face fragmented, high-volume opportunity streams with inconsistent structure and changing requirements.

  • Traditional tools surface opportunities but don't actively move them through the pursuit process
  • Missed high-fit opportunities due to manual triage bottlenecks
  • Inefficient use of analyst time on low-value screening tasks
Architecture

BOOM operates as a stateful orchestration layer that:
(1) continuously ingests and monitors opportunity sources,
(2) normalizes and semantically structures incoming data,
(3) retrieves relevant internal context from capabilities, offerings, and prior work,
(4) evaluates opportunity fit using proprietary scoring logic, and
(5) conditionally triggers downstream actions such as routing to teams, generating summaries, initiating document analysis, and preparing decision support outputs.
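
Steps (4) and (5) can be illustrated with a minimal sketch. BOOM's scoring logic is proprietary, so everything here is an assumption made for illustration: the weights, the thresholds, the action names, and the helper functions `fit_score` and `next_action` are hypothetical. The three component scores mirror the Sector Fit, Capability Match, and Strategic Align fields shown in the demo above.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; the real scoring logic is proprietary.
WEIGHTS = {"sector_fit": 0.4, "capability_match": 0.3, "strategic_align": 0.3}
ROUTE_THRESHOLD = 0.75
SUMMARIZE_THRESHOLD = 0.50


@dataclass
class Opportunity:
    title: str
    sector_fit: float
    capability_match: float
    strategic_align: float


def fit_score(opp: Opportunity) -> float:
    """Weighted average of component scores, each assumed to lie in [0, 1]."""
    return sum(weight * getattr(opp, name) for name, weight in WEIGHTS.items())


def next_action(opp: Opportunity) -> str:
    """Map the overall score to a downstream action (action names are illustrative)."""
    score = fit_score(opp)
    if score >= ROUTE_THRESHOLD:
        return "route_to_team"
    if score >= SUMMARIZE_THRESHOLD:
        return "summarize_for_analyst"
    return "archive"


opp = Opportunity("IT Modernization Support & Advisory Services", 0.87, 0.64, 0.91)
print(round(fit_score(opp), 3), next_action(opp))
```

Keeping the thresholds as named constants makes the routing policy easy for a human reviewer to audit and adjust, which fits the human-governed workflow described above.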

Decision Impact
~70%
reduction in review time
500+
opportunities identified / month

By automating triage, prioritization, and initial analysis, BOOM allows teams to focus entirely on high-value pursuit decisions rather than screening.

Architecture Diagram
BOOM Architecture Diagram
02 Document & Knowledge Systems
Decision Intelligence RAG
A production-grade, multi-source RAG pipeline with three operating modes: RAG corpus, document-only, and hybrid retrieval for decision-ready answers.
RAG Multi-Source Managed LLM GCP
Overview

A production-grade decision intelligence RAG system engineered for high-reliability retrieval, grounded generation, and operational scalability. It supports corpus retrieval, document-scoped retrieval, and hybrid retrieval across persistent and session-level sources.

Problem

Critical documents, prior work artifacts, and institutional company IP were fragmented across disconnected databases and repositories, making knowledge hard to locate at decision time. That fragmentation slowed deliverable production, weakened proposal development velocity, and increased the risk of repeating work or missing high-value evidence during response windows.

Architecture

The system unifies ingestion, retrieval, and generation in a single operational pipeline:
(1) query and document inputs are routed across corpus, document-only, or hybrid modes,
(2) multi-index retrieval materializes chunk/parent/neighbor context with metadata-aware reranking and diversification, and
(3) a managed LLM layer generates citation-grounded responses with streaming, retry/backoff resiliency, concurrency controls, token-budgeted context assembly, and telemetry for auditability.
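
Two of the ideas above, mode routing and token-budgeted context assembly, can be sketched in a few lines. This is not the production pipeline: the `Chunk` type, the `"corpus"` and `"session"` source labels, the whitespace token estimate, and the greedy packing strategy are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str   # "corpus" (persistent) or "session" (uploaded document); labels are hypothetical
    text: str
    score: float  # retrieval or rerank score, higher is better


def select_by_mode(chunks, mode):
    """Filter candidates by operating mode: 'corpus', 'doc_only', or 'hybrid'."""
    if mode == "corpus":
        return [c for c in chunks if c.source == "corpus"]
    if mode == "doc_only":
        return [c for c in chunks if c.source == "session"]
    if mode == "hybrid":
        return list(chunks)
    raise ValueError(f"unknown mode: {mode}")


def estimate_tokens(text):
    """Crude whitespace estimate; a real system would use the model's tokenizer."""
    return len(text.split())


def assemble_context(chunks, mode, token_budget):
    """Greedily pack the highest-scoring chunks that still fit within the budget."""
    picked, used = [], 0
    candidates = sorted(select_by_mode(chunks, mode), key=lambda c: c.score, reverse=True)
    for chunk in candidates:
        cost = estimate_tokens(chunk.text)
        if used + cost <= token_budget:
            picked.append(chunk)
            used += cost
    return picked
```

Greedy packing is a simple baseline. The reranking and diversification described in step (2) would run before this stage and determine the `score` values it consumes.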

Decision Impact
> 200%
increase in proposal production / month
5
data silos converted into a single knowledge base

By centralizing fragmented documents and institutional IP into a retrieval-first workflow, the system reduced knowledge loss during active pursuits, improved delivery speed, and fueled more consistent decision support across recurring and ad hoc requests.

RAG Pipeline
Decision Intelligence RAG Flow Diagram
Qualitative Research Database
A context-aware qualitative intelligence system that structures diverse evidence into a semantic layer for policy and program research.
Embeddings Semantic Index Metadata Knowledge Base
Overview

A production qualitative research database that transforms unstructured evidence into a context-aware semantic knowledge layer for analysis, synthesis, and decision support.

Problem

Critical research evidence arrived in fragmented formats and channels, including federally mandated reports, interviews, survey instruments, and scraped media references. Manual cross-source synthesis was slow, inconsistent, and difficult to operationalize at scale.

Architecture & Methods

The system was designed as a single qualitative intelligence workflow:
(1) ingest and normalize heterogeneous evidence sources into a shared, schema-flexible structure,
(2) preserve provenance and metadata at the source and record level for traceability,
(3) generate embeddings and vector indexes to enable semantic retrieval alongside structured filtering, and
(4) assemble context windows with entity tags and citation-linked references for reliable synthesis and reporting.
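
Step (3), semantic retrieval alongside structured filtering, can be illustrated with a small sketch. The record layout, the two-dimensional toy vectors, and the `search` helper are hypothetical. A real deployment would use model-generated embeddings and a vector index rather than a linear scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, source_type=None, top_k=3):
    """Apply the metadata filter first, then rank the survivors by similarity."""
    candidates = [r for r in records if source_type is None or r["type"] == source_type]
    ranked = sorted(candidates, key=lambda r: cosine(r["vec"], query_vec), reverse=True)
    return [(r["id"], round(cosine(r["vec"], query_vec), 2)) for r in ranked[:top_k]]


# Toy records: ids, agencies, and vectors are made up for illustration.
records = [
    {"id": "interview_01", "type": "interview", "agency": "USDA", "vec": [1.0, 0.0]},
    {"id": "report_01", "type": "report", "agency": "FCC", "vec": [0.6, 0.8]},
    {"id": "survey_01", "type": "survey", "agency": "NTIA", "vec": [0.0, 1.0]},
]

print(search(records, [1.0, 0.0]))
print(search(records, [1.0, 0.0], source_type="report"))
```

Each result carries the record id, so provenance stays attached to every retrieved passage, in line with step (2).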

Decision Impact
3
federal agency datasets consolidated
1
published research paper enabled

Consolidating cross-agency qualitative data improved research continuity, strengthened evidence reuse for reporting and proposals, and supported publication-ready analysis.

Qualitative Intelligence System
Qualitative Research Database Flow Diagram
03 Business Analytics & Operations
Operational Analytics & Resource Allocation Dashboard
An operational intelligence system using machine learning and predictive analytics to optimize workflows, project resources, and coordination across teams.
ML Analytics Predictive Ops Dashboard Risk Monitoring
Overview

An operational intelligence platform combining machine learning, predictive workflow analytics, and resource-allocation visibility to improve delivery performance in multi-project environments.

Problem

Program leadership lacked timely, integrated signals for workflow health and resource pressure. This increased exposure to burnout risk, workstream fragmentation, and poor coordination across interdependent project teams.

Architecture & Methods

The system was built as a unified operational decision workflow:
(1) integrate workflow, staffing, and delivery signals into a shared operations model,
(2) map dependencies between workstream stages and resource capacity to expose coordination pressure points,
(3) apply forecasting, anomaly detection, and risk scoring for burnout, fragmentation, and handoff bottlenecks, and
(4) surface role-specific dashboards and alerting layers to support earlier intervention and allocation decisions.
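
The forecasting part of step (3) can be sketched as a simple trend projection: fit a line to recent daily utilization for a role and estimate when it reaches capacity. The linear model, the sample series, and the helper names `fit_line` and `days_until_capacity` are assumptions for illustration. The deployed system's forecasting method is not specified here and may differ.

```python
def fit_line(ys):
    """Ordinary least-squares slope and intercept for y over x = 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def days_until_capacity(utilization, limit=1.0):
    """Days after the last observation until the trend line reaches `limit`.

    Returns None when utilization is flat or falling.
    """
    slope, intercept = fit_line(utilization)
    if slope <= 0:
        return None
    last_day = len(utilization) - 1
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - last_day)


# Hypothetical daily utilization for one role, as fractions of capacity.
analytics = [0.80, 0.82, 0.84, 0.86, 0.88]
print(days_until_capacity(analytics))
```

An alert could fire whenever the projected value drops below a chosen lead time, giving managers room to rebalance work before the limit is reached.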

Decision Impact

Teams completed workstreams in fewer hours than previously projected, while an improved division of labor reduced duplicate effort and clarified ownership. Operational response cycles accelerated as risks were surfaced sooner and addressed earlier.

Operational Intelligence System
Operational Analytics Dashboard Pipeline Diagram
Predictive Forecasting & Strategic Design System
A machine learning and strategy-design system for forecasting demand and outcome quality, then proactively shaping federal program results through targeted interventions.
ML Forecasting Decision Systems Strategy Design
Overview

Designed and implemented a predictive system that forecasts participation dynamics and outcome quality, then translates those forecasts into targeted intervention strategies. In federal participation-driven programs, this enables teams to anticipate variability, optimize resource allocation, and influence outcomes instead of reacting after trends emerge.

Problem

Organizations operating in uncertain, participation-driven federal environments lacked visibility into future demand and outcome distribution. This produced resource mismatches (overload vs. underutilization), ineffective or poorly timed outreach, and limited ability to correct course once trends appeared. Even where forecasts existed, there was no structured system for operational action.

Architecture & Methods

The system was implemented as a closed-loop predictive decision workflow:
(1) model expected demand and outcome quality from historical program data, behavioral signals, and structural characteristics,
(2) calibrate probabilities and uncertainty ranges to support reliable deployment thresholds,
(3) diagnose key drivers with feature attribution and segment scenarios (low participation risk, low quality risk, imbalance), and
(4) activate targeted intervention strategies through decision rules that link forecasts directly to operational actions.
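
Step (4), decision rules that link forecasts to actions, can be sketched as follows. The thresholds, the action names, and the reading of "imbalance" as forecast demand exceeding capacity are all assumptions made for this example. The three scenario labels come from step (3).

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    segment: str
    expected_demand: float  # forecast number of participants
    capacity: float         # participants the program can serve
    p_low_quality: float    # calibrated probability of a low-quality outcome


# Hypothetical thresholds.
LOW_DEMAND_RATIO = 0.6
HIGH_DEMAND_RATIO = 1.2
LOW_QUALITY_PROB = 0.4

# Hypothetical mapping from scenario to operational action.
ACTIONS = {
    "low_participation_risk": "expand_outreach",
    "low_quality_risk": "add_applicant_support",
    "imbalance": "reallocate_staff",
}


def scenarios(f: Forecast) -> list[str]:
    """Flag every risk scenario that the forecast triggers."""
    ratio = f.expected_demand / f.capacity
    flags = []
    if ratio < LOW_DEMAND_RATIO:
        flags.append("low_participation_risk")
    if f.p_low_quality >= LOW_QUALITY_PROB:
        flags.append("low_quality_risk")
    if ratio > HIGH_DEMAND_RATIO:
        flags.append("imbalance")
    return flags


def plan(f: Forecast) -> list[str]:
    """Translate flagged scenarios into actions; default to monitoring."""
    return [ACTIONS[s] for s in scenarios(f)] or ["monitor"]
```

Because the rules consume calibrated probabilities from step (2), a threshold such as `LOW_QUALITY_PROB` can be read directly as a risk tolerance.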

Decision Impact
~15%
forecast variance from actual outcomes
3
risk scenarios operationalized

Enabled proactive resource allocation aligned to predicted demand, improved participation and outcome quality through targeted interventions, and reduced intuition-only decision making by embedding data-driven rules into federal program operations.

Predictive Forecasting / Strategic Design Flow
Predictive Forecasting Strategic Design Loop Diagram
04 Economic & Applied Analysis
Published Research on Economic Impact of Federal Broadband Funding
Published mixed-methods research quantifying the economic effects of federal broadband investment and integrating qualitative evidence for policy interpretation and implementation guidance.
Econometrics Mixed Methods Policy Research Published
Overview

A published research project analyzing the economic impact of federal broadband funding through a custom mixed-methods framework that pairs regional econometric analysis with large-scale qualitative evidence synthesis.

Problem

Policy stakeholders needed causal, region-specific estimates of economic effects, but quantitative findings alone were insufficient for implementation interpretation. Existing analyses lacked a unified structure linking economic outcomes to on-the-ground qualitative evidence from communities and program actors.

Approach & Methods

The research used an integrated mixed-methods design:
(1) regional econometric modeling estimated effects across employment, income, and local business indicators,
(2) qualitative evidence was systematically indexed and analyzed to interpret mechanisms and implementation constraints, and
(3) both evidence streams were synthesized into a single policy analysis layer for stronger confidence, sensitivity testing, and applied guidance.

Decision Impact
3
prominent research conferences
10+
national media outlets citing findings

Findings informed federal broadband policy conversations and implementation strategy discussions, with integrated quantitative and qualitative evidence improving confidence in where and why impacts were strongest.

Analytical Framework
Broadband Funding Economic Impact Framework Figure
NLP Policy Intelligence & Geospatial Analysis
A unified NLP + policy research + geospatial approach to analysis, combining public input intelligence with implementation-plan analysis to support broadband strategy and evidence-based policy design.
NLP Geospatial Advanced Viz Policy Analysis
Overview

Combined listening-session NLP analysis with implementation scenario research into a single service layer for policy intelligence. The system synthesizes unstructured inputs, policy documents, and geography-linked infrastructure data to surface actionable implementation insights.

Problem

Public inputs and state implementation choices were being analyzed in silos, obscuring cross-signal patterns. Stakeholders lacked an integrated framework to connect thematic concerns, sentiment trends, geographic disparities, and likely policy outcome tradeoffs.

Approach & Advanced NLP Methods

The system used a unified policy intelligence workflow:
(1) ingest comments, transcripts, and planning artifacts into a shared analysis layer,
(2) apply supervised and weakly supervised classification, semantic clustering, sentiment analysis, and topic modeling to surface dominant concerns and emerging issue clusters,
(3) link NLP outputs to geospatial coverage and demographic context for comparative scenario assessment, and
(4) communicate results through advanced visuals including choropleth and bivariate maps, Sankey flows, and stakeholder-topic network graphs.
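
Steps (2) and (3) can be illustrated with a deliberately simple form of weak supervision: keyword rules assign theme labels to comments, and the labels are then counted by region. The keyword lists, theme names, and region names below are hypothetical. The actual workflow combined several classification and clustering methods that this sketch does not reproduce.

```python
from collections import Counter, defaultdict

# Hypothetical keyword lists acting as weak labeling rules.
THEME_KEYWORDS = {
    "cost": ["afford", "expensive", "price", "cost"],
    "digital_literacy": ["training", "skills", "literacy"],
    "infrastructure": ["coverage", "fiber", "tower", "no service"],
}


def label_themes(comment):
    """Return every theme whose keywords appear in the comment (substring match)."""
    text = comment.lower()
    return sorted(
        theme for theme, words in THEME_KEYWORDS.items() if any(w in text for w in words)
    )


def themes_by_region(comments):
    """Count theme mentions per region from (region, comment) pairs."""
    counts = defaultdict(Counter)
    for region, text in comments:
        counts[region].update(label_themes(text))
    return {region: dict(counter) for region, counter in counts.items()}


# Made-up comments for illustration.
comments = [
    ("County A", "Service is too expensive and there is no fiber."),
    ("County A", "We can't afford it."),
    ("County B", "We need skills training."),
]
print(themes_by_region(comments))
```

Per-region counts like these are the kind of output that can be joined to geographic boundaries for the choropleth maps mentioned in step (4).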

Decision Impact
10
policy analyses produced
7
analyses cited and published by national media

Delivered integrated NLP and geospatial evidence products that improved policy scenario evaluation and accelerated synthesis during active federal and state broadband decision windows.

NLP / Geospatial Visuals
Policy Intelligence Rotation Visualization · Partnerships Map Visualization

Contact & Speaking

Open to collaborations in AI systems, data science, and applied research.

Email
victorevassallo@gmail.com

Best for project inquiries, collaborations, and speaking requests.

LinkedIn
View LinkedIn Profile →

Professional background, publications, and updates.

Resume
Download PDF →

Full work history, technical skills, and education.

Current Availability
Selectively Available

Open to contract, advisory, and collaborative research engagements in AI systems and policy analysis.

AI Systems Design

RAG architectures, decision pipelines, document intelligence

Applied Data Science

NLP, predictive modeling, analytical systems for civic/gov contexts

Policy & Economic Analysis

Broadband, digital equity, federal program analysis and evaluation

Speaking Engagements

Conference Presentation
Western Regional Science Association (WRSA) Annual Meeting
Mixed Methods Approaches to Broadband Research and Policy Analysis

Presented research at an international, multidisciplinary conference of economists, policymakers, and regional scientists focused on spatial and economic analysis. Shared a mixed-methods framework integrating quantitative modeling with qualitative insights to inform broadband policy and regional development strategies.

Panel
Federal Reserve Digital Access Research Forum
Connecting Minority Communities

Panelist at a national convening hosted by multiple Federal Reserve Banks, bringing together researchers, policymakers, and practitioners to advance work on digital access and economic inclusion. Contributed perspectives on equity-focused broadband deployment and the role of data in identifying and addressing disparities in underserved communities.

Presentation
Broadband Breakfast (National Broadband Media Platform)
BEAD Policy Restructuring & NLP Applications in Policy Analysis

Delivered a policy-focused presentation for a national broadband policy audience, examining evolving BEAD implementation strategies and demonstrating how NLP methods can be used to systematically analyze policy documents and public input at scale.

Conference Talk
ForwardDMV (Regional Workforce & Innovation Conference)
Identifying and Implementing AI-Supported Workflows

Panel discussion with regional leaders and practitioners on practical approaches to identifying high-value AI use cases within organizations, with emphasis on implementation strategy, change management, and aligning technical capabilities with operational needs.

Available for speaking on AI systems, data science, and broadband & digital equity policy.
Send Speaking Inquiry →