KeyKit runs any data or model API through up to 30 structured tests against your own requirements and returns Fit, Partial Fit, or No Fit verdicts. It does the hard part of evaluating a data or model API, so you can see if it works for you without the scripts, the engineer, and the week an evaluation usually takes.

How does KeyKit work?

You bring a trial API key for the provider you want to test. KeyKit runs structured evaluations against your stated requirements (freshness, latency, field fill rate, reliability, and more) with no provider involvement required, and returns a Fit, Partial Fit, or No Fit verdict for each test.

What can I evaluate with KeyKit?

Any data or model API. KeyKit covers one-off evaluations, side-by-side provider comparison, ongoing health checks, and category benchmarks across coverage, data quality, freshness, reliability, and compliance.

How much does KeyKit cost?

KeyKit is $250 per month or $2,500 per year. Buyers and providers pay the same; a provider account adds private self-testing of your own endpoints and the ability to sponsor evaluations for your customers. There is no free tier.

Buyers use KeyKit to evaluate and monitor providers. Providers use it to prove quality to prospects with an evaluation the prospect runs themselves, so the results are not something the provider can influence.

Is KeyKit part of Mulberry?

Yes. KeyKit is a tool under the Mulberry master brand. Mulberry also operates Sourced, the market map of data providers. KeyKit is the proof tool; Sourced helps you see what is available.

KeyKit

Start an assessment

You got the key. You don't have the week.

30 structured tests against your own requirements. Fit, Partial Fit, or No Fit. You bring the trial key, KeyKit brings the test harness.

Built by Mulberry, from a decade on the buyer side of these contracts.


Don't have the week either? Have Mulberry run the evaluation with you.

What KeyKit does

How KeyKit works.

Evaluations

Know how providers perform against your requirements.

Define your tolerance for freshness, latency, field fill rate, and reliability. KeyKit runs your data or model API through up to 30 test types and returns Fit, Partial Fit, or No Fit verdicts tied to what you actually need. Use your own trial API key. No provider involvement required.

Evaluation #007 · Complete

Historical Depth · 38 mo · req. ≥ 24 mo · Fit
Freshness Lag · 1.8 hr avg · req. ≤ 4 hr · Fit
Field Completeness · 82% full · req. ≥ 90% · Partial Fit
Deduplication Rate · 1.1% dupe · req. ≤ 5% · Fit
Rate Limit · 800 req/hr · req. ≥ 5,000 · No Fit
Response Latency · 340 ms p95 · req. ≤ 800 ms · Fit

4 Fit · 1 Partial Fit · 1 No Fit
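To make the verdict logic concrete, here is a minimal Python sketch that scores the sample evaluation above against its requirements. The `Requirement` class, the `verdict` function, and especially the 10% tolerance band that separates Partial Fit from No Fit are assumptions made for illustration. This page does not say how KeyKit decides a Partial Fit.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    name: str
    threshold: float
    higher_is_better: bool  # True for "≥" requirements, False for "≤"


def verdict(measured: float, req: Requirement, tolerance: float = 0.10) -> str:
    """Return 'Fit', 'Partial Fit', or 'No Fit'.

    Assumption: a miss within `tolerance` (a fraction of the threshold)
    counts as Partial Fit. KeyKit's real rule is not documented here.
    """
    if req.higher_is_better:
        if measured >= req.threshold:
            return "Fit"
        near_miss = measured >= req.threshold * (1 - tolerance)
    else:
        if measured <= req.threshold:
            return "Fit"
        near_miss = measured <= req.threshold * (1 + tolerance)
    return "Partial Fit" if near_miss else "No Fit"


# Measured values and requirements taken from the sample evaluation above.
results = [
    (38, Requirement("Historical Depth (mo)", 24, True)),
    (1.8, Requirement("Freshness Lag (hr)", 4, False)),
    (82, Requirement("Field Completeness (%)", 90, True)),
    (1.1, Requirement("Deduplication Rate (%)", 5, False)),
    (800, Requirement("Rate Limit (req/hr)", 5000, True)),
    (340, Requirement("Response Latency p95 (ms)", 800, False)),
]

verdicts = {req.name: verdict(measured, req) for measured, req in results}
for name, outcome in verdicts.items():
    print(f"{name}: {outcome}")
```

With the assumed 10% band, this reproduces the sample's split of 4 Fit, 1 Partial Fit, and 1 No Fit. A different band would move the Field Completeness result.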

Comparison

Compare providers side by side.

Run the same evaluation suite across multiple providers and see the results head to head. Built for buyers shortlisting two or three options. Built for providers who want to know where they stand.

Test · Provider A · Provider B

Freshness Lag (diff) · Fit 88 · Partial Fit 61
Field Completeness · Fit 94 · Fit 90
Response Latency (diff) · Fit 85 · No Fit 22
Availability · Fit 99 · Fit 97
Deduplication · Fit 91 · Fit 88

Health Checks

Ensure it works as well as the day you bought it.

Schedule evaluations to run daily, weekly, or monthly. KeyKit alerts you when coverage drops, latency spikes, or field fill rates change. Post-purchase monitoring so you are never caught off guard.

Health Check · Weekly · Mondays

Jun 02 · Jun 09 · Jun 16 · Jun 23 · Jun 30

Score dropped 13 pts Jun 16 · Freshness lag exceeded threshold
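A drop alert like the one above can be sketched as a comparison between consecutive scheduled runs. The function name `find_drops`, the 10-point alert threshold, and the weekly scores below are all hypothetical. Only the 13-point drop on Jun 16 comes from the example.

```python
def find_drops(history, max_drop=10):
    """Return (run_label, drop) for each run whose score fell by more than
    `max_drop` points relative to the run before it.

    `history` is a list of (run_label, score) pairs in chronological order.
    """
    alerts = []
    for (_, previous), (label, score) in zip(history, history[1:]):
        drop = previous - score
        if drop > max_drop:
            alerts.append((label, drop))
    return alerts


# Hypothetical weekly scores, chosen so that Jun 16 falls 13 points.
history = [
    ("Jun 02", 91),
    ("Jun 09", 90),
    ("Jun 16", 77),
    ("Jun 23", 79),
    ("Jun 30", 80),
]

for label, drop in find_drops(history):
    print(f"Score dropped {drop} pts {label}")
```

Comparing each run to the one before it flags sudden changes. Comparing to a fixed baseline instead would also catch slow degradation, at the cost of repeating the same alert every week.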

Simple & Advanced mode

Works for technical and non-technical users.

Simple mode surfaces the results that matter in plain English: what was tested, what it means for your requirements, and whether the API passed. Advanced mode gives full framework-level control for technical users. Switch between them at any time.

SIMPLE

Plain-English summaries. Fit / Partial Fit / No Fit per category.

ADVANCED

Full framework control. Raw scores, logs, and per-test configuration.

For buyers

✓ See which provider fits your requirements. Comparison shows you the difference.

✓ Protect yourself after you sign. Health checks catch degradation before it becomes a problem.

✓ Know if your results are normal. Benchmarks tell you where you stand in the market.


Not sure which providers to test yet? See the market on Sourced.

Evaluation frameworks

The questions a trial key can't answer on its own

Will it hold up in production?

Latency, rate limits, and availability under real load, not demo conditions.

Is it as fresh as they claim?

We measure actual ingestion lag against your tolerance.

Will it handle your real queries?

The boolean, nested, and messy queries your use case needs, not just the simple ones in the demo.

Is what you're seeing normal?

How your results compare across everyone testing the same provider.

30 structured tests sit behind these questions, across coverage, data quality, freshness, reliability, compliance, and more.


Coverage

2 runs

Does the dataset cover the time range and regions your use case requires?

Historical Depth · Geographic Coverage

Data Quality

4 runs

Are records complete, canonical, and free of duplicates before they hit your pipeline?

Field Completeness · Deduplication Rate · Cross-Query Consistency · Provenance Metadata
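Field Completeness and Deduplication Rate can both be read as simple ratios over a sample of records. The sketch below shows one plausible way to compute them. The function names, the record shape, and the choice of `id` as the duplicate key are assumptions, not KeyKit's documented method.

```python
def fill_rate(records, field):
    """Share of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_rate(records, key_fields):
    """Share of records whose key repeats the key of an earlier record."""
    if not records:
        return 0.0
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)


# Hypothetical sample of four records returned by a provider.
records = [
    {"id": "a1", "title": "First", "author": "A. Writer"},
    {"id": "a2", "title": "Second", "author": ""},
    {"id": "a3", "title": "Third", "author": None},
    {"id": "a1", "title": "First", "author": "A. Writer"},
]

print(f"author fill rate: {fill_rate(records, 'author'):.0%}")
print(f"duplicate rate:   {duplicate_rate(records, ['id']):.0%}")
```

Which fields define a duplicate is a requirement in itself. Two records with different IDs but identical text may or may not count, depending on the use case.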

Determinism

4 runs

Does re-querying the same parameters return the same results? Critical for incremental pipelines.

Result Set Stability · Sort Order Stability · Count Stability · Field Value Stability
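A determinism check can be sketched as re-running one query and comparing snapshots. In the Python below, `fetch` stands in for whatever client call returns a list of records for a given set of parameters. It, `stability_report`, and the fake provider are hypothetical, and the sketch covers count, result set, and sort order only.

```python
import json


def canonical(record):
    """Serialize a record so that key order does not affect comparison."""
    return json.dumps(record, sort_keys=True)


def stability_report(fetch, params, runs=3):
    """Call `fetch(params)` `runs` times and compare every result to the first."""
    snapshots = [[canonical(r) for r in fetch(params)] for _ in range(runs)]
    first = snapshots[0]
    return {
        "count_stable": all(len(s) == len(first) for s in snapshots),
        "result_set_stable": all(sorted(s) == sorted(first) for s in snapshots),
        "sort_order_stable": all(s == first for s in snapshots),
    }


def make_fake_fetch():
    """Hypothetical provider that returns the same records in unstable order."""
    calls = {"n": 0}
    data = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}, {"id": 3, "title": "c"}]

    def fetch(params):
        calls["n"] += 1
        # Reverse the order on even-numbered calls to simulate unstable sorting.
        return data[::-1] if calls["n"] % 2 == 0 else list(data)

    return fetch


report = stability_report(make_fake_fetch(), {"q": "example"})
print(report)
```

Here the same records come back every time, so count and result set are stable, but the order changes between calls. That is the kind of behavior that breaks page-based incremental pipelines.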

Freshness

2 runs

How stale is "live" data? We measure actual ingestion lag against your stated tolerance.

Freshness Lag · Lag Distribution
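Ingestion lag can be thought of as the gap between when an item was published and when the provider made it available. The sketch below assumes each record carries both timestamps under the hypothetical field names `published_at` and `available_at`. Real providers may expose neither, in which case the lag has to be observed by polling.

```python
from datetime import datetime, timedelta
from statistics import mean


def lag_hours(records):
    """Lag per record in hours: availability time minus publication time."""
    return [
        (r["available_at"] - r["published_at"]).total_seconds() / 3600
        for r in records
    ]


def lag_summary(records, tolerance_hours):
    """Average lag, worst lag, and the share of records within tolerance."""
    lags = lag_hours(records)
    if not lags:
        raise ValueError("need at least one record")
    within = sum(1 for lag in lags if lag <= tolerance_hours)
    return {
        "avg_hours": mean(lags),
        "max_hours": max(lags),
        "share_within_tolerance": within / len(lags),
    }


# Hypothetical records with lags of 1, 2, and 6 hours. The date is arbitrary.
published = datetime(2025, 1, 1, 12, 0)
records = [
    {"published_at": published, "available_at": published + timedelta(hours=1)},
    {"published_at": published, "available_at": published + timedelta(hours=2)},
    {"published_at": published, "available_at": published + timedelta(hours=6)},
]

summary = lag_summary(records, tolerance_hours=4)
print(summary)
```

The average here is 3 hours, inside a 4-hour tolerance, yet one record in three arrives late. That gap is why a distribution is worth reporting alongside the average.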

Query Complexity

6 runs

Can the API handle the queries your use case actually needs, or only the simple ones in the demo?

Basic Keyword Query · Boolean Logic · Nested Boolean · Wildcard & Fuzzy · Field-Scoped Query · Complex Multi-Clause

Scale & Reliability

3 runs

Performance and stability under realistic load, not cherry-picked conditions.

Response Latency · Rate Limit Discovery · Availability Check
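Latency requirements are usually stated as a percentile, such as the p95 figure in the sample evaluation. A minimal sketch of a nearest-rank percentile over a set of timed requests follows. The function name and the 20 latency samples are hypothetical, and other percentile definitions (for example, interpolated ones) give slightly different numbers.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    `pct` percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for 20 requests.
latencies_ms = [
    120, 135, 150, 160, 170, 180, 190, 200, 210, 220,
    230, 240, 250, 260, 270, 280, 290, 300, 340, 900,
]

p95 = percentile(latencies_ms, 95)
print(f"p95 = {p95} ms, requirement met: {p95 <= 800}")
```

The single 900 ms outlier does not move the p95, which is the point of using a percentile rather than the maximum. It would still show up in a p99 or a max-latency requirement.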

Language & Scripts

1 run

Does multilingual content arrive correctly encoded and attributed?

Language Coverage

Stress & Edge Cases

4 runs

What breaks at the edges? Edge-case testing surfaces failures before production does.

Malformed Query Handling · Empty Result Handling · Rate Limit Breach · Deep Pagination

Compliance & Cost

3 runs

Is sensitive data scoped correctly? Does cost hold at volume?

PII / Sensitive Data Scan · Quota Accounting · Auth & Scope Boundaries

Benchmarking

1 run

Side-by-side scoring against your current provider or an alternative. Apples to apples.

Category Benchmark

Pricing

One plan, one price.

KeyKit is $250 a month, the same account for buyers and providers. Buyers evaluate any data or model API against their requirements. Providers get that same account, plus private self-testing and the ability to sponsor customers (see below).

KeyKit

$250/month

$2,500/year

Evaluate any data or model API against your requirements.

✓ Unlimited evaluations
✓ Simple and Advanced mode. Switch between plain-English summaries and full framework-level control.
✓ Health checks. Run any evaluation on a recurring schedule: daily, weekly, or monthly.
✓ Evaluation comparison. Compare any two completed evaluations side by side, framework by framework.
✓ Full framework library. 30 evaluation frameworks across 10 groups.
✓ Request logs and raw output


For providers

Prospects don't say no. They go quiet.

A trial key hands the work to your prospect: find the time, wire it up, decide what the numbers mean. Most never get to it. Sponsor a KeyKit account instead and they get a neutral evaluation of your API against their own requirements, without the setup. If you're confident in your data, an independent verdict is the strongest thing you can put in front of them.

✓ Sponsor a customer for 30, 60, or 90 days, at $125, $250, or $375 (half the monthly rate). They run their own evaluations and keep the account at $250 a month if they want it. No automatic charge.

✓ It's KeyKit's neutral verdict, not a demo you control. They see the results, you don't. That is why they trust it.

✓ Your own account also lets you test your endpoints privately, before a buyer ever does.

✓ Shortens the sales cycle and sets you apart from providers still sending benchmark PDFs.

Book a meeting