Research

How we think about decision-making, knowledge gaps, and AI agents that reason about what they don't know.

Scoring the Gaps in What an Organization Knows

Organizations collect data across dozens of systems — documents, emails, CRMs, project tools. The usual assumption is that more data means better decisions. In practice, the data that would actually change a decision is often the data that's missing. A supplier lead time that nobody recorded. A competitor price change buried in someone's inbox. A contract clause that expired three months ago.

We built a system that maps an organization's knowledge into a structured graph and then scans it for gaps. Each gap gets a score: how likely is it that resolving this gap would change the current recommendation? A missing production cost estimate that could flip a sourcing decision scores high. A missing footnote in a report that wouldn't affect any outcome scores near zero. The system searches for the high-scoring gaps first.

The scoring is based on simulation. For each unknown, the system tests a bounded set of plausible values and measures how many of them would change which option ranks first. The output is a number — the Value of Information — that tells you exactly how much a given piece of missing knowledge could matter. This makes it possible to prioritize information gathering by decision impact.
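The mechanics can be sketched in a few lines. This is an illustrative toy, not the production scorer: the utility table and the set of plausible values are invented, and a real system would derive both from the knowledge graph.

```python
# Simulation-based VOI scoring, minimal sketch (all names hypothetical).

def voi_score(utilities, current_best, plausible_values):
    """Fraction of plausible values for an unknown that would change
    which option ranks first. `utilities` maps a candidate value to a
    dict of option -> utility."""
    flips = 0
    for value in plausible_values:
        scores = utilities(value)
        best = max(scores, key=scores.get)
        if best != current_best:
            flips += 1
    return flips / len(plausible_values)

# Toy sourcing decision: supplier B wins only if the missing production
# cost estimate comes in below 90.
def utilities(cost_estimate):
    return {"supplier_a": 100.0, "supplier_b": 190.0 - cost_estimate}

score = voi_score(utilities, current_best="supplier_a",
                  plausible_values=[70, 80, 85, 95, 110, 130])
# Half of the plausible cost values would flip the recommendation,
# so this gap scores high and gets retrieved first.
```

A gap whose score is near zero, by the same logic, can be safely ignored regardless of how large the underlying document is.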

The practical result: the system spends its retrieval budget on information that could actually shift the answer. The search terminates when no remaining gap could plausibly change the recommendation — a mathematically defined stopping point.

Missing information carries measurable decision value; the ranking prioritizes gaps with material leverage and deprioritizes the rest.

When an AI Agent Has Enough Information to Decide

Most retrieval systems fill a context window until they hit a token limit. The stopping point is mechanical. It has nothing to do with whether the system has gathered enough information to answer the question well. This means the system can stop too early on a hard question and waste capacity on an easy one.

The VOI framework uses a different stopping rule. After each retrieval step, the system recalculates the scores of all remaining unknowns. If no unknown has a score above a set threshold, the search stops — the recommendation is stable. If the leading option's margin exceeds the combined score of every remaining unknown, the search also stops — no possible combination of new facts could overtake it.
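The two conditions compose into a simple predicate. This is a schematic rendering under assumed score structure, with an invented threshold, not the exact production rule:

```python
# Sketch of the two stopping conditions described above.

def should_stop(gap_scores, leading_margin, threshold=0.05):
    """gap_scores: VOI score per remaining unknown.
    leading_margin: utility gap between the top two options."""
    if all(s < threshold for s in gap_scores):
        return True   # condition 1: no single gap can move the recommendation
    if leading_margin > sum(gap_scores):
        return True   # condition 2: even all gaps combined cannot overtake
    return False
```

Because both checks are deterministic functions of the current scores, two runs over the same graph and query stop at the same point.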

Both conditions are mathematical. The system cannot stop because it "feels done." Given the same knowledge graph and the same query, it will always take the same search path and reach the same stopping point. That supports auditability: each retrieval step can be tied to a stated justification, and the stop rule is explicit rather than implicit.

The convergence guarantee carries a practical benefit: the system can allocate more effort to decisions where the evidence is genuinely uncertain and less effort where the answer is clear early. Effort scales with the difficulty of the question.

The system stops searching when it can prove that more information wouldn't change the answer.

A Knowledge Base That Reorganizes Around What Matters

Conventional knowledge bases are static. Someone decides what to store, builds a taxonomy, and hopes it holds up. Over time, the structure drifts from what the organization actually needs. Documents go stale without anyone noticing. Whole categories fill up with content nobody uses while the information people need most stays uncollected.

In this framework, each decision the system processes generates gap signals — records of missing information that could not be found in the existing knowledge base. These signals accumulate across decisions. A fact that would have changed 30 decisions has a higher cumulative score than a fact that would have changed one. This cumulative score is a direct measure of organizational demand for a specific piece of knowledge.
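Aggregating those signals is straightforward bookkeeping. A minimal sketch, with illustrative fact identifiers and scores:

```python
from collections import defaultdict

# Accumulate gap signals across decisions and rank missing facts by
# cumulative decision impact (field names are hypothetical).

def rank_gaps(gap_signals):
    """gap_signals: iterable of (fact_id, voi_score) pairs, one per
    decision in which the fact was missing."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fact_id, voi in gap_signals:
        totals[fact_id] += voi
        counts[fact_id] += 1
    return sorted(((f, totals[f], counts[f]) for f in totals),
                  key=lambda t: t[1], reverse=True)

signals = [("supplier_lead_time", 0.4), ("footnote_x", 0.01),
           ("supplier_lead_time", 0.6), ("competitor_price", 0.3)]
ranking = rank_gaps(signals)
# supplier_lead_time tops the list: missing twice, high impact each time.
```

The ranking is what drives acquisition: the top entries become the targeted requests described below.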

The system uses these signals to generate targeted acquisition requests. It identifies what's missing, who in the organization is most likely to have it (based on their upload history), and in what format it would be most useful. It also adjusts freshness thresholds per domain — tightening them where stale information frequently causes high-scoring gaps, relaxing them where it doesn't.

Over time, the knowledge base converges toward a shape that matches the organization's decision patterns: acquisition effort shifts toward high-impact gaps, and low-impact holdings receive less incremental attention. The structure emerges from use.

The knowledge base learns what matters by watching which missing facts affect real decisions.

Using Bayesian Methods to Quantify What We Don't Know About Markets

Standard regression gives you a point estimate. "Small I&C firms have a 34% likelihood to adopt." That number is useful, but it hides something: how confident should you be in it? If the training data contains 12 firms in that segment, the answer is "not very." The Bayesian specification we used surfaced interval width as a first-class output alongside the mean, which matters when tail risk drives the decision.

In our research on AI adoption in European SMEs, we used Bayesian neural networks to predict two quantities per market segment: likelihood to adopt (LTA) and willingness to pay (WTP). The Bayesian approach produces a posterior distribution for each prediction. The mean of that distribution is the best estimate. The width is the uncertainty. A tight distribution means the data supports the prediction well. A wide one means the model is unsure - and that uncertainty is directly useful for making decisions about where to invest.

The practical difference shows up when you compare segments. Two segments might have the same mean LTA, but if one has a credible interval of ±4% and the other ±18%, they carry very different risk profiles. A product team deciding where to launch first needs both numbers. The point estimate tells you the expected outcome. The interval tells you how much you're betting.
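The comparison can be made concrete with synthetic posterior draws (these are simulated distributions, not the paper's data; the ±4% and ±18% spreads are approximated with normal draws):

```python
import random
import statistics

# Two segments with the same mean LTA but different posterior spread.

def summarize(posterior_draws, level=0.95):
    """Posterior mean and central credible interval from samples."""
    draws = sorted(posterior_draws)
    lo = draws[int((1 - level) / 2 * len(draws))]
    hi = draws[int((1 + level) / 2 * len(draws)) - 1]
    return statistics.mean(draws), (lo, hi)

random.seed(0)
tight = [random.gauss(0.34, 0.02) for _ in range(10_000)]  # ~±4% interval
wide  = [random.gauss(0.34, 0.09) for _ in range(10_000)]  # ~±18% interval

mean_t, (lo_t, hi_t) = summarize(tight)
mean_w, (lo_w, hi_w) = summarize(wide)
# Same expected LTA, very different bet: the wide interval may even
# reach zero, so the data does not rule out non-adoption.
```

Reporting both numbers per segment is what turns the model output into a risk statement rather than a single forecast.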

Bayesian methods also handle sparse data more gracefully. When a segment has few observations, the prior keeps the estimate from collapsing to extreme values. As more data arrives, the posterior tightens and the prior's influence fades. This is especially useful for early-stage market research where sample sizes are small by necessity - you can't wait for a thousand responses before making a launch decision.

We applied this to a survey of 113 European companies across multiple industries and size classes. The Bayesian model reduced uncertainty on segment-level LTA estimates by 15–30% compared to the frequentist baseline, depending on segment size. More importantly, it flagged two segments where the frequentist model showed high LTA but the Bayesian credible interval was wide enough to include zero - meaning the data did not actually support the conclusion that those segments would adopt.

The Bayesian framework also lets you incorporate informative priors from external sources. In our case, Eurostat data on AI adoption rates by industry and firm size provided a prior distribution that grounded the model in population-level statistics before seeing any survey data. The posterior then updated based on our sample. This is a principled way to combine small-sample survey data with large-scale secondary data - something that standard regression cannot do.
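The update itself is mechanically simple. The sketch below uses a conjugate Beta-Binomial model as a stand-in for the neural network, and the prior parameters standing in for the Eurostat-derived prior are invented for illustration:

```python
# Prior-to-posterior update for a segment adoption rate.
# Beta(a, b) prior encodes a population-level rate with the weight of
# (a + b) pseudo-observations; survey counts update it directly.

def posterior_adoption(prior_a, prior_b, adopters, n):
    """Returns posterior mean and pseudo-sample size after observing
    `adopters` out of `n` surveyed firms."""
    a = prior_a + adopters
    b = prior_b + (n - adopters)
    return a / (a + b), a + b

# Hypothetical prior: ~20% adoption, weight of 50 pseudo-firms.
prior_mean, _ = posterior_adoption(10, 40, 0, 0)     # 0.20 before any data
seg_mean, _ = posterior_adoption(10, 40, 8, 12)      # sparse 12-firm segment
# The raw rate 8/12 ≈ 0.67 is shrunk toward the prior; more survey data
# would progressively pull the posterior back toward the observed rate.
```

This is the same qualitative behavior described above: with 12 observations the prior dominates, and its influence fades as the sample grows.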

For product teams, the output is a table of market segments ranked by expected value with explicit risk bounds. Segment A might have the highest expected LTA but also the widest interval. Segment B might have a lower point estimate but much higher certainty. Choosing between them is a decision under uncertainty - and the Bayesian output gives you the information to make that decision deliberately.

The broader point is methodological: when tail risk matters, reporting credible intervals next to means keeps the precision-versus-uncertainty trade-off explicit. In our workflow, the Bayesian layer was the natural place to attach those intervals.

Bayesian models produce predictions with uncertainty bounds. The width of the interval is as important as the estimate itself.

Decision Quality Depends on Two Things - and Most Organizations Underinvest in Both

Firm productivity depends on how fast and how well managers decide. Eisenhardt (1989) already linked rich information search and rapid action to outperformance; what is newer is the ability to decompose decision quality into measurable components and estimate returns to improving each one.

The model we use treats decision quality as a function of two variables: information quality and decision-maker capability. Information quality has five dimensions: precision, truth, completeness, timeliness, and relevance. Decision-maker capability covers whether the right person receives the right information at the right time, with awareness of what is unknown. Both variables matter. Good information in the wrong hands is wasted. A capable decision-maker with bad information will make bad calls.

The relationship between these variables is super-additive. Investing in both simultaneously yields more than the sum of investing in each alone. This comes from expected utility theory: information reduces the set of states the decision-maker must consider, and capability determines how well they navigate the remaining states. Improving one amplifies the value of the other.

Organizations typically use 30–50% of available information when making strategic decisions (BARC, 2024). In most cases, fragmentation and findability are the binding constraints, not aggregate storage capacity: knowledge sits across email, documents, chat, CRM, and ERP systems, and it is rarely assembled in one place for a given decision. The result is that decisions are made on partial pictures, even when the missing facts are available internally.

The formal concept that makes this quantifiable is the Expected Value of Perfect Information (EVPI) - a measure from decision theory that calculates how much a decision-maker would pay, in expected utility, to know the true state of the world before deciding. EVPI sets an upper bound on the value of any information system. The closer your actual information utilization gets to perfect information, the more value you capture.
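EVPI has a direct computational form: the expected value of choosing the best action per state, minus the value of the single best action chosen up front. A toy calculation, with invented states, actions, and utilities:

```python
# EVPI = E[value with perfect information] - max expected value without it.

def evpi(utilities, probs):
    """utilities[action][state]: payoff table; probs[state]: beliefs."""
    states = range(len(probs))
    # With perfect information: pick the best action in each state.
    with_info = sum(p * max(u[s] for u in utilities)
                    for s, p in zip(states, probs))
    # Without information: commit to one action before the state is known.
    without_info = max(sum(p * u[s] for s, p in zip(states, probs))
                       for u in utilities)
    return with_info - without_info

u = [[100, 20],   # action A: strong in state 0, weak in state 1
     [40, 80]]    # action B: the reverse
value = evpi(u, probs=[0.5, 0.5])
# value = 30.0: the most a rational decision-maker would pay to learn
# the true state before committing.
```

Any real information system delivers only a fraction of this bound, which is where the utilization ratio below comes in.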

A typical organization operates with an information utilization ratio (ρ) around 0.30 and a capability multiplier (κ) around 0.35. Moving to ρ = 0.75 and κ = 0.65 - achievable with the right system - represents an order-of-magnitude improvement in decision-driven value. The exact multiplier depends on the EVPI of the decision class, which varies by industry and function. High-stakes procurement decisions in manufacturing have different EVPI than routine hiring decisions in services.

One less-discussed finding: targeted information is disproportionately valuable per unit cost. A study from the applied decision theory literature found that a domain-specific analysis delivered 9.4x the net value of a generic report, despite costing only 2.4x more. The reason: its specificity ratio - the fraction of content that was actually relevant to the decision - was 14x higher. Systems that retrieve information by decision relevance exploit this specificity premium automatically.

The practical consequence for AI systems is to weight retrieval by expected decision impact rather than surface similarity alone, prioritizing facts that would materially move the recommendation. Formalizing that objective yields different context construction than optimizing for lexical or embedding proximity.

Decision quality = f(information quality, decision-maker capability). Improving both simultaneously yields super-additive returns.

80% of Companies Rate Trust as Important as Performance

In our survey, 80% of respondents rated security and transparency features as equally important as performance improvements when evaluating AI tools, with little variation by company size or industry. Respondents described a faster system with weak provenance as less attractive than a slower one that showed sources and reasoning steps.

Attributes such as source tracing, audit trails, explainable outputs, and residency controls showed up alongside latency and accuracy in purchase criteria. Teams that deferred those items to a late compliance pass reported more rework when procurement questionnaires surfaced them earlier than expected.

European participants emphasized processing location, model provenance, and training-use restrictions. A hosting region alone rarely satisfied follow-up questions; comments clustered on combinations of jurisdiction, handling practice, and traceability.

Meeting those expectations has measurable engineering cost: richer provenance metadata and retained traces increase storage and latency. The survey pattern suggests treating that cost as part of the core product budget rather than as an optional overlay.

In this sample, trust-related attributes carried weight comparable to core performance claims; treating them as afterthoughts misaligned with how buyers said they evaluated vendors.

Companies Undervalue AI by a Wide Margin

We asked 113 European companies to estimate how much time they spend on information search, coordination, and duplicate work. Then we measured it. The gap was large: 77.2% of respondents showed a significant value-perception gap, underestimating the time lost to these activities by 30–50%. The time they thought was spent productively was, in many cases, spent searching for information that already existed somewhere else in the organization.

This undervaluation is the primary adoption barrier. Companies don't adopt AI tools for a simple reason: they don't think the problem is big enough to justify the effort. The irony is that the companies with the worst information management problems are often the least aware of them - the inefficiency is invisible because it's distributed across many small tasks throughout the day.

Knowledge workers spend an average of 2.5 hours per day searching for internal information. Another 7% of weekly time goes to administrative tasks that create no direct value. At scale, this adds up to over €10 billion per week in productivity losses across the EU. Published time-use and labor-market studies report magnitudes in this range; what differed in our interviews was how rarely those aggregates showed up in a firm's own internal picture of where time goes.

For vendors, the friction is often epistemic before it is technical: buyers who understate baseline search and coordination load also understate the upside of tooling that targets it.

Non-adoption in the sample tracked muted internal estimates of search and coordination overhead relative to external time-use statistics.

Research based on Achieving Product-Market-Fit for an Adaptive AI Assistant System in European SMEs (Rothe, 2025) and VOI-Driven Context Engineering for Decision-Making AI Agents (2026). Get in touch if you'd like to discuss the research.