KeyKit
Get started
EVALUATION PLATFORM FOR DATA APIs AND AI MODELS

Verify before you buy.
Prove before you pitch.

KeyKit runs your data or model API through a structured assessment framework. Buyers know what they're getting. Providers can prove it.

Start an assessment
Get certified

30 Automated Tests: AUTOMATE HEALTH CHECKS
< 5 min: TO FIRST FINDING
A vs. B: COMPARE PROVIDERS
Evaluation #042 · Vendor · 7 tests · COMPLETE

Test                    Measured      Requirement   Score   Verdict
Historical Depth        38 mo         ≥ 24 mo       92      FIT
Freshness Lag           1.8 hr avg    ≤ 4 hr        88      FIT
Field Completeness      82% full      ≥ 90%         61      PARTIAL FIT
Deduplication Rate      1.1% dupe     ≤ 5%          94      FIT
Rate Limit Discovery    800 req/hr    ≥ 5,000       18      NO FIT
Response Latency        340 ms p95    ≤ 800 ms      85      FIT
Availability Check      99.4% up      ≥ 99%         99      FIT

5 FIT · 1 PARTIAL FIT · 1 NO FIT · avg score 76 · 1 threshold missed

Example evaluation. Results scored against your thresholds, not defaults.
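
To illustrate what scoring against your own thresholds can look like, here is a minimal sketch in Python. It is not KeyKit's scoring logic: the `Requirement` class, the `verdict` function, and especially the 10% near-miss tolerance that separates PARTIAL FIT from NO FIT are assumptions made for the example. Only the measurements and thresholds come from the example evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    threshold: float        # buyer-defined limit, assumed to be > 0
    higher_is_better: bool  # True for "≥" requirements, False for "≤"


def verdict(measured: float, req: Requirement, tolerance: float = 0.10) -> str:
    """Return FIT, PARTIAL FIT, or NO FIT for one measurement.

    Assumption: a miss within `tolerance` (relative to the threshold)
    counts as PARTIAL FIT. KeyKit's real cutoff is not documented here.
    """
    if req.higher_is_better:
        shortfall = (req.threshold - measured) / req.threshold
    else:
        shortfall = (measured - req.threshold) / req.threshold
    if shortfall <= 0:
        return "FIT"
    return "PARTIAL FIT" if shortfall <= tolerance else "NO FIT"


# Measurements and thresholds taken from the example evaluation.
example = {
    "Freshness Lag (hr)": verdict(1.8, Requirement(4, higher_is_better=False)),
    "Field Completeness (%)": verdict(82, Requirement(90, higher_is_better=True)),
    "Rate Limit Discovery (req/hr)": verdict(800, Requirement(5000, higher_is_better=True)),
}
```

Under the assumed tolerance, the three sample rows come out the same way as in the example card. Change a threshold and the same measurement can land on a different verdict, which is the idea behind scoring against requirements rather than defaults.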

For buyers

Test on your terms, before you commit.

  • You define the requirements. Freshness tolerance, latency ceiling, field fill rate, reliability.
  • KeyKit tests against what you actually need, not generic benchmarks.
  • Results come back as FIT, PARTIAL FIT, or NO FIT verdicts tied to your stated requirements.
  • Use your own trial API key. No vendor involvement required.
Start free assessment
Example results · COMPLETE
FIT · Historical Depth
FIT · Freshness Lag
PARTIAL · Field Completeness
FIT · Deduplication Rate
NO FIT · Rate Limit Discovery
FIT · Response Latency
FIT · Availability Check
For providers

Certification that works in sales decks, not just demos.

  • A Mulberry KeyKit certification is third-party proof you can use in sales decks, investor materials, and on your website.
  • Buyers who test on KeyKit arrive requirements-ready. No educating from scratch.
  • Verified providers get listed on Sourced.cc, where qualified buyers are already looking.
Talk to us about certification →

KeyKit assessments use the same framework Mulberry applies to Independent Field Assessments. Learn about Mulberry →

The problem

Data procurement is still largely a leap of faith.

Vendor sales cycles are polished. Demo environments are cherry-picked. By the time you're live in production, you've already signed a contract.

KeyKit closes that gap. Run structured evaluation frameworks against a live trial key, scored against your actual requirements. Before the ink dries.

Without KeyKit
With KeyKit
Vendor demo
Live evaluation against your trial key
Sales-provided benchmarks
Your requirements, your fit score
Gut-feel data quality check
30 ready-to-go evaluation frameworks
Find problems post-contract
Findings in under 5 minutes
How it works

From trial key to fit-scored findings in five steps.

01 Select provider
02 Paste API key
03 Set requirements
04 Choose evaluations
05 Review findings
01

Select your provider

Choose the API product you want to evaluate: a data source or an AI model. KeyKit loads the frameworks that apply to that provider type.

02

Paste your trial API key

KeyKit validates the key before anything runs. No wasted time on bad credentials.

03

Set your requirements

Define your actual thresholds: freshness tolerance, latency budget, field coverage %, reliability SLA, historical depth. Your requirements, not industry defaults.

04

Choose which frameworks to run

Pick from 30 ready-to-go frameworks across 10 groups. Dependencies are enforced automatically, so you cannot run Sort Order Stability before Result Set Stability.
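
One way to picture automatic dependency enforcement is as a prerequisite check followed by a topological sort. The sketch below uses Python's standard `graphlib` module. The `PREREQUISITES` map and the `run_order` function are hypothetical; only the Sort Order Stability / Result Set Stability pairing comes from the text, and KeyKit's internal mechanism may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map. Only this one pairing is stated on the page.
PREREQUISITES = {
    "Sort Order Stability": {"Result Set Stability"},
}


def run_order(selected: set[str]) -> list[str]:
    """Return the selected frameworks in an order that respects prerequisites.

    Raises ValueError if a selected framework depends on one that
    was not selected.
    """
    missing = sorted(
        (framework, dep)
        for framework in selected
        for dep in PREREQUISITES.get(framework, set())
        if dep not in selected
    )
    if missing:
        details = ", ".join(f"{fw} requires {dep}" for fw, dep in missing)
        raise ValueError(f"Unmet dependencies: {details}")

    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    graph = {fw: PREREQUISITES.get(fw, set()) for fw in selected}
    return list(TopologicalSorter(graph).static_order())


order = run_order({"Sort Order Stability", "Result Set Stability", "Response Latency"})
```

In this sketch, selecting Sort Order Stability on its own raises an error, while selecting it together with Result Set Stability always schedules Result Set Stability first.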

Platform features

Built for how buyers actually work.

Simple + Advanced mode

Works for technical and non-technical users

Simple mode surfaces what was tested, what it means for your requirements, and whether the API passed. Advanced mode gives full framework-level control for technical users. Switch between them at any time.

Health checks

Monitor vendor performance over time

Set any evaluation to run automatically on a daily, weekly, or monthly schedule. KeyKit re-runs the same tests against the same requirements and alerts you if anything changes. Track freshness lag trends, catch rate limit degradation, and know before your vendor does when performance slips.

Evaluation comparison

Compare vendors side by side

Select any two completed evaluations and see a direct comparison, framework by framework, with raw value deltas and score deltas. Rows where verdicts differ are highlighted. Use it to choose between vendors or to benchmark a new provider against your current one.
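
As a rough illustration of a framework-by-framework comparison, the sketch below computes raw value deltas, score deltas, and a flag for rows where verdicts differ. The `Finding` type and `compare` function are hypothetical names, and the Vendor B numbers are made up for the example; the Vendor A numbers come from the example evaluation on this page.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    raw: float    # measured value, in the framework's own unit
    score: int    # fit score
    verdict: str  # FIT, PARTIAL FIT, or NO FIT


def compare(a: dict[str, Finding], b: dict[str, Finding]) -> list[dict]:
    """Compare two evaluations on the frameworks they have in common."""
    rows = []
    for name in sorted(a.keys() & b.keys()):
        first, second = a[name], b[name]
        rows.append({
            "framework": name,
            "raw_delta": second.raw - first.raw,
            "score_delta": second.score - first.score,
            "verdict_differs": first.verdict != second.verdict,
        })
    return rows


vendor_a = {
    "Response Latency": Finding(340, 85, "FIT"),
    "Rate Limit Discovery": Finding(800, 18, "NO FIT"),
}
# Vendor B values are invented purely for illustration.
vendor_b = {
    "Response Latency": Finding(290, 90, "FIT"),
    "Rate Limit Discovery": Finding(6000, 95, "FIT"),
}

rows = compare(vendor_a, vendor_b)
highlighted = [row["framework"] for row in rows if row["verdict_differs"]]
```

Here `highlighted` contains only Rate Limit Discovery, the one row where the two vendors receive different verdicts.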

Evaluation frameworks

30 test types across coverage, data quality, freshness, reliability, compliance, and more.

Every framework scores API performance against your stated requirements, not industry averages. Supported providers include Think-Pol and Tisane. More providers are added regularly; request yours.

Coverage

2 runs

Does the dataset cover the time range and regions your use case requires?

Historical Depth · Geographic Coverage

Data Quality

4 runs

Are records complete, canonical, and free of duplicates before they hit your pipeline?

Field Completeness · Deduplication Rate · Cross-Query Consistency · Provenance Metadata

Determinism

4 runs

Does re-querying the same parameters return the same results? Critical for incremental pipelines.

Result Set Stability · Sort Order Stability · Count Stability · Field Value Stability
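
A determinism check can be pictured as running the same query twice and comparing what comes back. The sketch below assumes each result carries a stable record ID and uses Jaccard overlap for set stability; both the metric and the function names are assumptions for illustration, not a description of how KeyKit measures these frameworks.

```python
def result_set_overlap(first: list[str], second: list[str]) -> float:
    """Jaccard overlap of record IDs from two runs of the same query.

    1.0 means both runs returned exactly the same set of records.
    """
    a, b = set(first), set(second)
    if not a and not b:
        return 1.0  # two empty result sets are trivially identical
    return len(a & b) / len(a | b)


def sort_order_stable(first: list[str], second: list[str]) -> bool:
    """True if the records present in both runs appear in the same order."""
    common = set(first) & set(second)
    return [r for r in first if r in common] == [r for r in second if r in common]


# Hypothetical record IDs from two runs with identical parameters.
run_1 = ["a", "b", "c"]
run_2 = ["b", "c", "d"]

overlap = result_set_overlap(run_1, run_2)
ordered = sort_order_stable(run_1, run_2)
```

Checking order only on the shared records keeps the two questions separate, which is one reason a sort-order check would naturally run after a result-set check.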

Freshness

2 runs

How stale is "live" data? We measure actual ingestion lag against your stated tolerance.

Freshness Lag · Lag Distribution
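
Ingestion lag can be thought of as the gap between when a record was published and when it became available through the API. The sketch below is a minimal illustration under that assumption; the `(published, ingested)` timestamp pairs, the function names, and the summary fields are hypothetical rather than KeyKit's actual method.

```python
from datetime import datetime, timedelta
from statistics import mean


def lag_hours(records: list[tuple[datetime, datetime]]) -> list[float]:
    """Lag in hours for each (published, ingested) timestamp pair."""
    return [
        (ingested - published).total_seconds() / 3600
        for published, ingested in records
    ]


def summarize(lags: list[float], tolerance_hours: float) -> dict:
    """Compare observed lag against a buyer-defined tolerance."""
    return {
        "avg_hours": mean(lags),
        "max_hours": max(lags),
        "share_within_tolerance": sum(lag <= tolerance_hours for lag in lags) / len(lags),
        "avg_within_tolerance": mean(lags) <= tolerance_hours,
    }


# Hypothetical sample: three records ingested 1, 2, and 6 hours after publication.
published = datetime(2024, 1, 1, 12, 0)
sample = [(published, published + timedelta(hours=h)) for h in (1, 2, 6)]

summary = summarize(lag_hours(sample), tolerance_hours=4)
```

In this sample the average lag sits inside a 4-hour tolerance even though one record arrives 6 hours late, which is why a lag distribution is worth reading alongside the average.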

Query Complexity

6 runs

Can the API handle the queries your use case actually needs, or only the simple ones in the demo?

Basic Keyword Query · Boolean Logic · Nested Boolean · Wildcard & Fuzzy · Field-Scoped Query · Complex Multi-Clause

Scale & Reliability

3 runs

Performance and stability under realistic load, not cherry-picked conditions.

Response Latency · Rate Limit Discovery · Availability Check

Language & Scripts

1 run

Does multilingual content arrive correctly encoded and attributed?

Language Coverage

Stress & Edge Cases

4 runs

What breaks at the edges? Edge-case testing surfaces failures before production does.

Malformed Query Handling · Empty Result Handling · Rate Limit Breach · Deep Pagination

Compliance & Cost

3 runs

Is sensitive data scoped correctly? Does cost hold at volume?

PII / Sensitive Data Scan · Quota Accounting · Auth & Scope Boundaries

Benchmarking

1 run

Side-by-side scoring against your current vendor or an alternative. Apples to apples.

Category Benchmark
Pricing

Start free. Go Pro when you need more.

Try KeyKit with one evaluation, no credit card required. Upgrade to Pro for unlimited evaluations, health checks, and comparison.

Free Trial
$0

One evaluation. Expires after 7 days.

  • 1 evaluation
    Expires after 7 days.
  • Simple mode only
    Plain-English results: what was tested, what it means, whether it passed.
  • No health checks
  • No comparison
Start free trial
MOST POPULAR
Pro
$349/month

For data buyers evaluating APIs before purchase.

  • Unlimited evaluations
  • Simple and Advanced mode
    Switch between plain-English summaries and full framework-level control.
  • Health checks
    Run any evaluation on a recurring schedule. Daily, weekly, or monthly.
  • Evaluation comparison
    Compare any two completed evaluations side by side, framework by framework.
  • Full framework library
    30+ evaluation frameworks across 10 groups.
  • Request logs and raw output
Start free trial
Vendor
$2,749/month

For data API companies that want to offer evaluation access to their prospects.

  • All Pro features
  • Unlimited prospect trial issuances
    Each trial is scoped to your API and expires after 14 days.
  • Prospect gets Simple mode evaluation
    Against your API. They run it. You don't influence the outcome.
  • Vendor dashboard
    Track trial status: issued, started, completed.
Talk to us →
For data vendors

Offer your prospects proof, not promises.

KeyKit lets you give prospects a real evaluation of your API against their own requirements, before they sign. They run the test, they see the results. You don't influence the outcome. That's the point.

Data buyers are under pressure to justify procurement decisions. A KeyKit evaluation gives them something to bring to their team. It shortens sales cycles, builds trust, and differentiates you from every vendor still relying on demo environments and benchmark PDFs.

Talk to us about Vendor access →
Buyer-owned findings
Prospects run frameworks against their own requirements. You don't control the fit scores. That's the point.
Scoped to your API
Each prospect trial is locked to your endpoint. They evaluate you against your actual API, not a generic sandbox.
Prospect dashboard
Track which prospects have run evaluations and where they are: issued, started, or completed.
Differentiates you immediately
Most vendors rely on demo environments and benchmark PDFs. An independent KeyKit evaluation is something different.

Verified providers are listed on Sourced.cc, where qualified buyers are already looking.

Start your free trial today.

One evaluation, no credit card required. Paste a trial key, define your scope, and have fit-scored findings in under five minutes.

Start free trial →
Sign in