KeyKit

Evaluate, compare, monitor, and benchmark any data or model API.

KeyKit runs structured assessments against your specific requirements, not generic benchmarks. Know which provider to choose, and know whether it stays the right choice.

Start an assessment
Talk to us

Used by buyers to evaluate and monitor providers. Used by providers to prove and maintain quality.

What KeyKit does

Four capabilities. All live today.

Evaluations

Know how providers perform on your rules.

Define your tolerance for freshness, latency, field fill rate, and reliability. KeyKit runs your data or model API through up to 30 test types and returns Fit, Partial Fit, or No Fit verdicts tied to what you actually need. Use your own trial API key. No vendor involvement required.

Evaluation #007 · COMPLETE

Test                 Measured      Requirement   Score   Verdict
Historical Depth     38 mo         ≥ 24 mo       92      FIT
Freshness Lag        1.8 hr avg    ≤ 4 hr        88      FIT
Field Completeness   82% full      ≥ 90%         61      PARTIAL
Deduplication Rate   1.1% dupe     ≤ 5%          94      FIT
Rate Limit           800 req/hr    ≥ 5,000       18      NO FIT
Response Latency     340 ms p95    ≤ 800 ms      85      FIT

Summary: 4 FIT · 1 PARTIAL FIT · 1 NO FIT
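As a rough illustration of how a measurement might be mapped to a verdict, here is a minimal Python sketch. It is not KeyKit's scoring logic: the `Requirement` type, the `verdict` function, and the 10% `PARTIAL_MARGIN` are assumptions made for this example, chosen only because they reproduce the verdicts in the sample evaluation above. The sketch does not attempt to reproduce the numeric scores.

```python
from dataclasses import dataclass

# Hypothetical: how far a measurement may miss its requirement (as a fraction
# of the requirement) and still count as PARTIAL FIT. Not a KeyKit value.
PARTIAL_MARGIN = 0.10


@dataclass
class Requirement:
    name: str
    threshold: float
    higher_is_better: bool  # True for "req. >= x", False for "req. <= x"


def verdict(req: Requirement, measured: float) -> str:
    """Return FIT, PARTIAL FIT, or NO FIT for one measurement."""
    if req.higher_is_better:
        shortfall = (req.threshold - measured) / req.threshold
    else:
        shortfall = (measured - req.threshold) / req.threshold
    if shortfall <= 0:
        return "FIT"
    if shortfall <= PARTIAL_MARGIN:
        return "PARTIAL FIT"
    return "NO FIT"


# Measurements and requirements copied from the sample evaluation above.
checks = [
    (Requirement("Historical Depth (mo)", 24, True), 38),
    (Requirement("Freshness Lag (hr)", 4, False), 1.8),
    (Requirement("Field Completeness (%)", 90, True), 82),
    (Requirement("Deduplication Rate (% dupe)", 5, False), 1.1),
    (Requirement("Rate Limit (req/hr)", 5000, True), 800),
    (Requirement("Response Latency (ms p95)", 800, False), 340),
]

results = {req.name: verdict(req, measured) for req, measured in checks}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Under these assumptions, Field Completeness misses its requirement by about 9% and lands in PARTIAL FIT, while Rate Limit misses by 84% and is NO FIT.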
Comparison

Compare providers side by side.

Run the same evaluation suite across multiple providers and see the results head to head. Built for buyers shortlisting two or three options before committing. Built for providers who want to know where they stand.

TEST                 PROVIDER A   PROVIDER B
Freshness Lag        FIT 88       PARTIAL 61   diff
Field Completeness   FIT 94       FIT 90
Response Latency     FIT 85       NO FIT 22    diff
Availability         FIT 99       FIT 97
Deduplication        FIT 91       FIT 88
Health Checks

Ensure it works as well as the day you bought it.

Schedule evaluations to run daily, weekly, or monthly. KeyKit alerts you when coverage drops, latency spikes, or field fill rates change. Post-purchase monitoring so you are never caught off guard.

Health Check · Weekly · Mondays
Jun 02   84
Jun 09   81
Jun 16   71
Jun 23   68
Jun 30   77
Score dropped 13 pts Jun 16 · Freshness lag exceeded threshold
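The alert in the sample above can be approximated with a simple drop-from-baseline check. The Python sketch below is an assumption about how such an alert might work, not KeyKit's actual rule: the `first_alert` function, the choice of the first run as the baseline, and the 10-point `DROP_THRESHOLD` are all hypothetical.

```python
from __future__ import annotations

# Hypothetical alert threshold in score points; not a documented KeyKit default.
DROP_THRESHOLD = 10


def first_alert(
    runs: list[tuple[str, int]], threshold: int = DROP_THRESHOLD
) -> tuple[str, int] | None:
    """Return (date, drop) for the first run whose score is at least
    `threshold` points below the first run's score, or None if no run is."""
    if not runs:
        return None
    baseline = runs[0][1]  # assumption: the first run is the baseline
    for date, score in runs[1:]:
        drop = baseline - score
        if drop >= threshold:
            return date, drop
    return None


# Weekly scores copied from the sample health check above.
runs = [("Jun 02", 84), ("Jun 09", 81), ("Jun 16", 71), ("Jun 23", 68), ("Jun 30", 77)]
alert = first_alert(runs)
print(alert)
```

With the sample scores, the first run to fall 10 or more points below the Jun 02 baseline is Jun 16, with a drop of 13 points.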
Benchmarks

Benchmark your results against everyone else.

KeyKit aggregates anonymized results across all users testing the same provider. Know immediately whether your results are normal or a red flag, without asking anyone.

Benchmark · Provider A · 142 results
Your result       76
Top quartile      84
Median            68
Bottom quartile   49
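One way to read a benchmark like the sample above is to place your score in a band relative to the quartile cutoffs. The Python sketch below assumes that "Top quartile" is the 75th-percentile cutoff and "Bottom quartile" is the 25th-percentile cutoff. The `benchmark_band` function and its band labels are hypothetical, not part of KeyKit.

```python
def benchmark_band(
    result: float, bottom_quartile: float, median: float, top_quartile: float
) -> str:
    """Place a score relative to quartile cutoffs (assumed to be the 25th,
    50th, and 75th percentiles of all results for the provider)."""
    if result >= top_quartile:
        return "top quartile"
    if result >= median:
        return "above median"
    if result >= bottom_quartile:
        return "below median"
    return "bottom quartile"


# Values copied from the sample benchmark above.
band = benchmark_band(76, bottom_quartile=49, median=68, top_quartile=84)
print(band)
```

Under this reading, a score of 76 sits above the median but below the top-quartile cutoff.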
Simple & Advanced mode

Works for technical and non-technical users.

Simple mode surfaces the results that matter in plain English — what was tested, what it means for your requirements, and whether the API passed. Advanced mode gives full framework-level control for technical users. Switch between them at any time.

SIMPLE
Plain-English summaries. FIT / PARTIAL FIT / NO FIT per category.
ADVANCED
Full framework control. Raw scores, logs, and per-test configuration.
For buyers

Choose the right provider before you sign. Comparison shows you the difference.
Protect yourself after you sign. Health checks catch degradation before it becomes a problem.
Know if your results are normal. Benchmarks tell you where you stand in the market.
Start an assessment
For providers

Offer your prospects proof, not promises.

Give prospects a real evaluation of your API against their own requirements — before they sign.
They run the test, they see the results. You don't influence the outcome. That's the point.
Shortens sales cycles, builds trust, and differentiates you from providers still relying on demo environments and benchmark PDFs.
Book a meeting

Part of the Mulberry ecosystem. Verified providers are listed on Sourced.cc.

Learn about Mulberry
Browse Sourced
Evaluation frameworks

30 test types across coverage, data quality, freshness, reliability, compliance, and more.

Every framework scores API performance against your stated requirements, not industry averages. Supported providers include Think-Pol, Tisane, Datashake, Opoint, AllEars, and SocialVoice. More added regularly.

Coverage

2 runs

Does the dataset cover the time range and regions your use case requires?

Historical Depth · Geographic Coverage

Data Quality

4 runs

Are records complete, canonical, and free of duplicates before they hit your pipeline?

Field Completeness · Deduplication Rate · Cross-Query Consistency · Provenance Metadata

Determinism

4 runs

Does re-querying the same parameters return the same results? Critical for incremental pipelines.

Result Set Stability · Sort Order Stability · Count Stability · Field Value Stability
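To make the four determinism checks concrete, here is a minimal Python sketch that compares two responses to the same query. It is an illustration only: the `stability_report` function, the `id` key, and the sample records are hypothetical, and KeyKit's own tests may measure these properties differently.

```python
from __future__ import annotations


def stability_report(first: list[dict], second: list[dict], key: str = "id") -> dict:
    """Compare two responses to the same query on four stability dimensions."""
    ids_a = [r[key] for r in first]
    ids_b = [r[key] for r in second]
    set_a, set_b = set(ids_a), set(ids_b)
    union = set_a | set_b
    by_id_b = {r[key]: r for r in second}
    shared = [r for r in first if r[key] in by_id_b]
    return {
        # Count stability: same number of records returned.
        "count_stable": len(first) == len(second),
        # Result set stability: Jaccard overlap of record ids (1.0 = identical).
        "result_set_overlap": len(set_a & set_b) / len(union) if union else 1.0,
        # Sort order stability: shared records appear in the same relative order.
        "sort_order_stable": [i for i in ids_a if i in set_b]
        == [i for i in ids_b if i in set_a],
        # Field value stability: shared records carry identical field values.
        "field_values_stable": all(r == by_id_b[r[key]] for r in shared),
    }


# Made-up records for illustration; not output from any real provider.
first = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}, {"id": 3, "title": "c"}]
second = [{"id": 2, "title": "b"}, {"id": 1, "title": "a"}, {"id": 4, "title": "d"}]
report = stability_report(first, second)
print(report)
```

In this made-up example the counts match, but only half of the records overlap and the shared records come back in a different order. Drift of that kind can cause an incremental pipeline to miss or re-ingest records.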

Freshness

2 runs

How stale is "live" data? We measure actual ingestion lag against your stated tolerance.

Freshness Lag · Lag Distribution

Query Complexity

6 runs

Can the API handle the queries your use case actually needs, or only the simple ones in the demo?

Basic Keyword Query · Boolean Logic · Nested Boolean · Wildcard & Fuzzy · Field-Scoped Query · Complex Multi-Clause

Scale & Reliability

3 runs

Performance and stability under realistic load, not cherry-picked conditions.

Response Latency · Rate Limit Discovery · Availability Check

Language & Scripts

1 run

Does multilingual content arrive correctly encoded and attributed?

Language Coverage

Stress & Edge Cases

4 runs

What breaks at the edges? Edge-case testing surfaces failures before production does.

Malformed Query Handling · Empty Result Handling · Rate Limit Breach · Deep Pagination

Compliance & Cost

3 runs

Is sensitive data scoped correctly? Does cost hold at volume?

PII / Sensitive Data Scan · Quota Accounting · Auth & Scope Boundaries

Benchmarking

1 run

Side-by-side scoring against your current vendor or an alternative. Apples to apples.

Category Benchmark
Pricing

Choose the plan that fits your workflow.

Pro for buyers running evaluations. Provider for data API companies offering evaluation access to prospects.

Most teams arrive through a Mulberry assessment or a provider-issued invite. Evaluating on your own? Pro gives you everything.

Pro
$199/month

$1,990/year

For data buyers evaluating APIs before purchase.

  • Unlimited evaluations
  • Simple and Advanced mode
    Switch between plain-English summaries and full framework-level control.
  • Health checks
    Run any evaluation on a recurring schedule. Daily, weekly, or monthly.
  • Evaluation comparison
    Compare any two completed evaluations side by side, framework by framework.
  • Full framework library
    30 evaluation frameworks across 10 groups.
  • Request logs and raw output
Start an assessment
Provider
$749/month

$7,490/year

For data API companies that want to offer evaluation access to their prospects.

  • All Pro features
  • Issue invites for prospects
    Full Pro access, with an expiry you control.
  • Prospect runs their own evaluation
    Against your API. They see the results. You don't influence the outcome.
  • Provider dashboard
    Track invite status: issued, started, completed.
Talk to us