    Data Update, March 6, 2026

    Carrier Vector Embeddings — 800K+ Fleet Profiles

    Traditional carrier search is broken — not because the data is bad, but because it treats every data point as if it exists in isolation. Fleet size tells you nothing without knowing where a carrier runs, what they haul, and how they're equipped. Two carriers with identical truck counts can be completely different businesses serving completely different needs.

    Today, we're releasing Carrier Embeddings, a foundational AI capability that changes how AlphaLoop understands, compares, and surfaces carriers. It's the engine powering our new CarrierMatch feature, and it's built on the same class of transformer architecture behind ChatGPT.

    The Problem: Numbers Don't Tell the Whole Story

    Consider two carriers, both with 3 trucks. Are they similar? To a traditional database query, yes. To anyone who works in transportation, obviously not.

    Carrier A: Local Delivery Co.

    3 trucks, 2 states, no sleepers, day routes

    Carrier B: Cross-Country Hauler

    3 trucks, 15 states, all sleepers, long-haul

    Traditional search would call these identical. Any experienced broker knows they're completely different businesses. Our AI now knows that too.
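To make the contrast concrete, here is a minimal sketch of the kind of single-column lookup described above. The records and field names (`trucks`, `states`, `sleepers`) are hypothetical, chosen only to mirror Carrier A and Carrier B.

```python
# Hypothetical records mirroring Carrier A and Carrier B above.
carriers = [
    {"name": "Local Delivery Co.", "trucks": 3, "states": 2, "sleepers": 0},
    {"name": "Cross-Country Hauler", "trucks": 3, "states": 15, "sleepers": 3},
]


def same_fleet_size(records, trucks):
    """Single-column filter: matches on truck count and nothing else."""
    return [r["name"] for r in records if r["trucks"] == trucks]


matches = same_fleet_size(carriers, trucks=3)
print(matches)  # both carriers match, despite very different operations
```

The filter has no way to notice that one fleet stays in 2 states and the other crosses 15, which is the gap the rest of this post addresses.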

    Our Solution: Teaching AI to See the Whole Picture

    We built a transformer model trained on every registered carrier in the United States — 2.3 million carriers, 60+ data points each. The model doesn't look at data points one at a time. It understands how all the pieces fit together, the same way an experienced industry professional would.

    Step 1: We Collect the Full Picture

    For each carrier, we ingest over 60 distinct data points:

    • Fleet size and equipment types

    • States and regions served

    • Safety scores and inspection history

    • Cargo types and specializations
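As a rough illustration of what "ingesting" a carrier might look like, the sketch below turns one record into a list of (column, value) tokens, one per data point. The `CarrierRecord` class, its five fields, and the sample values are assumptions made for the example; the actual schema covers 60+ data points and is not described here.

```python
from dataclasses import dataclass, asdict


@dataclass
class CarrierRecord:
    # Hypothetical subset of the 60+ data points; names are illustrative.
    trucks: int
    states_served: int
    sleeper_share: float  # fraction of trucks with sleeper cabs
    safety_score: float
    cargo_type: str


def to_column_tokens(record):
    """Turn one record into (column, value) pairs, one token per data point."""
    return list(asdict(record).items())


carrier_a = CarrierRecord(trucks=3, states_served=2, sleeper_share=0.0,
                          safety_score=0.9, cargo_type="general freight")
tokens = to_column_tokens(carrier_a)
print(tokens[:2])  # [('trucks', 3), ('states_served', 2)]
```

Keeping the column name attached to each value is what lets a later stage treat every data point as its own token rather than as an anonymous position in a row.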

    Step 2: Data Points Talk to Each Other

    This is where the real intelligence happens. Each data point doesn't just sit in a column — it interacts with every other data point to understand what it actually means in context.

    The model learns that "3 trucks + 2 states + no sleepers" describes a fundamentally different business than "3 trucks + 15 states + all sleepers" — even though the first number is identical. Context is everything.
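The idea of data points "talking to each other" can be shown with a toy version of self-attention. This is a simplified sketch, not the production model: it uses a single head, skips the learned query/key/value projections, and the 2-dimensional token vectors are made up for the example.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections.

    A trained model would apply learned query/key/value matrices; they are
    omitted here so the mixing step itself stays visible.
    """
    scale = math.sqrt(len(tokens[0]))
    mixed = []
    for query in tokens:
        weights = softmax([dot(query, key) / scale for key in tokens])
        mixed.append([
            sum(w * value[i] for w, value in zip(weights, tokens))
            for i in range(len(query))
        ])
    return mixed


# Hypothetical 2-d vectors for the tokens (trucks, states, sleepers).
trucks = [1.0, 0.0]
local = self_attention([trucks, [0.0, 1.0], [0.0, 1.0]])        # 2 states, no sleepers
long_haul = self_attention([trucks, [0.0, -1.0], [0.0, -1.0]])  # 15 states, all sleepers

# The "trucks" token started out identical but now differs by context.
print(local[0] != long_haul[0])  # True
```

After one attention step, the representation of "3 trucks" already carries information about the states and sleeper tokens that surround it.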

    Step 3: Create a Carrier Fingerprint

    All of that contextual understanding gets compressed into a 128-dimensional embedding — a unique mathematical fingerprint for every carrier. Think of it as a carrier's operational DNA.

    Two carriers with similar fingerprints operate similarly — even if their raw numbers look different on paper. This is what enables true similarity matching.
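Comparing two fingerprints comes down to the angle between their vectors. The sketch below computes cosine similarity on small, made-up 4-dimensional vectors (the real embeddings have 128 dimensions); the variable names and values are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 4-d fingerprints (the real ones have 128 dimensions).
regional_a = [0.9, 0.1, 0.0, 0.2]
regional_b = [1.8, 0.2, 0.0, 0.4]  # same direction, twice the magnitude
long_haul = [0.1, 0.9, 0.7, 0.0]

print(cosine_similarity(regional_a, regional_b))       # 1.0 up to rounding
print(cosine_similarity(regional_a, long_haul) < 0.5)  # True
```

Because cosine similarity ignores vector length, two fingerprints that point the same way score as near-identical even when their magnitudes differ.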

    Step 4: Find Your Match at Scale

    When you search for a carrier or run a lookalike query, we compare its fingerprint against all 2.3 million carriers in our database using cosine similarity and vector indexing. Results are:

    • Ranked by how similarly carriers truly operate, not just how similar their stats look

    • Able to surface hidden similarities that even seasoned brokers might miss

    • Returned in under 100 milliseconds
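The ranking step can be sketched as a brute-force scan: score every stored fingerprint against the query and keep the best few. This is an illustration under stated assumptions, not the production search path. The carrier ids and 3-dimensional vectors are hypothetical, and a linear scan like this would be replaced by a vector index at real scale.

```python
import heapq
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k_similar(query, fingerprints, k=2):
    """Brute-force lookalike search: score every carrier, keep the k best.

    For unit vectors the dot product equals cosine similarity.
    """
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), carrier_id)
        for carrier_id, vec in fingerprints.items()
    )
    return [(cid, round(score, 3)) for score, cid in heapq.nlargest(k, scored)]


# Hypothetical carrier ids and 3-d fingerprints.
fingerprints = {
    "carrier_1": [0.9, 0.1, 0.0],
    "carrier_2": [0.8, 0.2, 0.1],
    "carrier_3": [0.0, 0.2, 0.9],
}
results = top_k_similar([1.0, 0.1, 0.0], fingerprints, k=2)
print([cid for cid, _ in results])  # ['carrier_1', 'carrier_2']
```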

    Trained on the Entire Industry

    Our model didn't learn from a sample or a subset. It analyzed every registered carrier in the country — learning the full spectrum of how fleets are structured, from owner-operators to large regional carriers.

    • 2.3M+ carriers analyzed

    • 60+ data points per carrier

    • <100ms search time

    For the Technically Curious

    Under the hood, we use a transformer architecture with self-attention — the same class of model behind ChatGPT, but purpose-built for structured tabular carrier data rather than language.

    The model is trained using masked column reconstruction: we randomly hide data points and ask the model to predict them from context. This forces it to learn deep interdependencies between all carrier attributes — not just surface correlations.
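The data-preparation half of masked column reconstruction can be sketched in a few lines. The `MASK` placeholder, the 30% mask rate, and the sample record are assumptions for illustration; the source does not state the actual masking scheme.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token


def mask_columns(record, mask_rate=0.3, rng=random):
    """Hide a random subset of columns; return (masked input, hidden targets)."""
    columns = list(record)
    n_hidden = max(1, round(mask_rate * len(columns)))
    hidden = set(rng.sample(columns, n_hidden))
    masked = {c: (MASK if c in hidden else v) for c, v in record.items()}
    targets = {c: record[c] for c in hidden}
    return masked, targets


record = {"trucks": 3, "states_served": 15, "sleepers": 3, "cargo": "dry van"}
masked, targets = mask_columns(record, rng=random.Random(0))

# Training would score the model on how well it predicts `targets`
# from the columns still visible in `masked`.
print(masked)
print(targets)
```

Because a different subset is hidden on each pass, the model cannot rely on any single column and has to learn how the attributes constrain one another.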

    The result is a 128-dimensional embedding per carrier. Similarity is measured as cosine distance between embeddings. GPU-accelerated HDBSCAN clustering groups carriers with similar embeddings, and vector indexing across 500K+ carriers keeps search under 100ms at scale.
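The source does not say which kind of vector index is used. Purely to illustrate why an index avoids scoring every carrier, the sketch below assumes a simple inverted-file layout: fingerprints are bucketed by their nearest centroid, and a query only scans its own bucket. The centroids are hand-picked here and all ids and vectors are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nearest_centroid(vec, centroids):
    return max(range(len(centroids)), key=lambda i: dot(vec, centroids[i]))


def build_index(fingerprints, centroids):
    """Group carrier ids by their closest centroid (an inverted-file layout)."""
    buckets = {i: [] for i in range(len(centroids))}
    for cid, vec in fingerprints.items():
        buckets[nearest_centroid(vec, centroids)].append(cid)
    return buckets


def search(query, fingerprints, centroids, buckets, k=1):
    """Score only the carriers in the query's bucket, not the whole table."""
    candidates = buckets[nearest_centroid(query, centroids)]
    ranked = sorted(candidates,
                    key=lambda cid: dot(query, fingerprints[cid]),
                    reverse=True)
    return ranked[:k]


# Hypothetical unit-length fingerprints and two hand-picked centroids.
fingerprints = {
    "local_1": [1.0, 0.0], "local_2": [0.8, 0.6],
    "long_haul_1": [0.0, 1.0], "long_haul_2": [0.6, 0.8],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]
buckets = build_index(fingerprints, centroids)
print(search([0.96, 0.28], fingerprints, centroids, buckets))  # ['local_1']
```

The trade-off in this kind of layout is that a true neighbor sitting just across a bucket boundary can be missed, which is why approximate indexes typically probe more than one bucket.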

    What This Means for Your Team

    Carrier Embeddings is the foundation for CarrierMatch — our new lookalike search tool that lets you drop in a carrier you already work with and instantly surface others that operate the same way. No more manual filtering across dozens of fields.

    For GTM teams, this means:

    • Prospect lists built from operational similarity, not just demographic filters

    • Discovery of carriers that look different on paper but behave like your best accounts

    • A smarter signal for segmentation, prioritization, and outreach sequencing
