May 21, 2026

Hyperliquid

Two Weeks of HIP-4: A Forensic Look at Hyperliquid's Outcome Markets

@web3_pastel

Matthew Hammond

This article is published for informational and educational purposes only and does not constitute investment advice. Arrakis has made reasonable efforts to verify the accuracy of the data presented but does not warrant that all information is accurate, complete, or current.

Thank you to Castle Labs, 0xArchive, and HyperTracker for their contributions to the research.

TL;DR

We tracked HIP-4 through its first two weeks live: 18 daily BTC binary outcomes, 6,101 wallets, 28.1 million orders, 908 thousand fills, and 4.64 million orderbook snapshots. Three things stood out.

Algorithmic wallets accounted for 6% of the participant set, but moved 49% of the dollar volume. None of them resolved to a known market-making firm or trading desk.
One frontend, Outcome.xyz, dominated builder routing. It carried $1.54M across 499 wallets at zero builder fee, more than 10x the volume of the next-largest builder.
HIP-4 already matches Polymarket on BTC binary volume, but its pricing sits 4-5x further from Deribit's implied probability than Polymarket or Kalshi. The widest spreads concentrate in the same thin-liquidity windows the orderbook itself flags.

HIP-4 arrives on Hyperliquid

Hyperliquid recently launched HIP-4: a new market type for binary outcomes that resolve to Yes or No against a real-world condition. The first set runs daily BTC markets asking whether Bitcoin closes above a specific strike at 06:00 UTC.

The contract is new, but the rails are not:

Order book: the same engine powering Hyperliquid's perps
Margin and account: outcomes share inventory and collateral with the rest of the venue
Routing: the same builder code system HIP-3 already uses

This is what sets HIP-4 apart from Polymarket and Kalshi. Both built their markets on standalone rails. HIP-4 puts binary outcomes inside the same trading account that already holds a user's spot and perps positions, against the same collateral pool. That changes what's possible without leaving the venue: options-style payoffs, hedges, and structured exposure can sit alongside the underlying perp and outcome legs in one place.

The rest of this piece covers who traded the first two weeks, how the book formed, and how the pricing held against the incumbents.

The first two weeks in numbers

HIP-4 launched on Hyperliquid mainnet on May 2nd, 2026. By May 18th, the venue had produced 18 daily BTC binary outcomes on whether Bitcoin would close above a specific strike at 06:00 UTC the next day. Our dataset covered 6,101 wallets, 28.1m orders, 908k fills, and 4.64m Level-2 orderbook snapshots.

We split traders into two behavioural buckets, Retail and Algorithmic. A third group, Unclassified, contains the wallets that held no on-chain identity that our attribution service could identify.

A wallet was tagged Retail if at least one of its orders was routed through the official Hyperliquid frontend (the FrontendMarket flag) or through a non-algorithmic builder address. Both indicate a human placing the order through a UI.
Algorithmic wallets either matched a machine order pattern directly (heavy post-only quoting with low fill rates, or short-horizon immediate-or-cancel flow), or routed most of their orders through a builder whose own flow looked algorithmic.
Unclassified held the rest.
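
The three-way split above can be sketched in code. This is a minimal illustration rather than the classifier behind the headline numbers: the field names, the cut-off values, and the choice to test for algorithmic behaviour before retail routing are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class WalletStats:
    orders: int                # total orders placed
    fills: int                 # orders that were filled
    post_only_orders: int      # orders flagged post-only
    ioc_orders: int            # immediate-or-cancel orders
    frontend_orders: int       # orders carrying the FrontendMarket flag
    human_builder_orders: int  # orders routed via a non-algorithmic builder
    algo_builder_orders: int   # orders routed via an algorithmic builder


# Hypothetical cut-offs, chosen only for illustration.
POST_ONLY_SHARE = 0.8
MAX_FILL_RATE = 0.1
IOC_SHARE = 0.8


def classify(w: WalletStats) -> str:
    """Return 'Algorithmic', 'Retail' or 'Unclassified' for one wallet."""
    if w.orders == 0:
        return "Unclassified"
    post_only_share = w.post_only_orders / w.orders
    fill_rate = w.fills / w.orders
    ioc_share = w.ioc_orders / w.orders
    # Machine pattern: heavy post-only quoting with low fill rates, or IOC-dominated flow.
    machine_pattern = (
        post_only_share >= POST_ONLY_SHARE and fill_rate <= MAX_FILL_RATE
    ) or ioc_share >= IOC_SHARE
    mostly_algo_builder = w.algo_builder_orders / w.orders > 0.5
    # Assumption: the algorithmic test takes precedence over retail routing.
    if machine_pattern or mostly_algo_builder:
        return "Algorithmic"
    if w.frontend_orders > 0 or w.human_builder_orders > 0:
        return "Retail"
    return "Unclassified"
```

Under these assumed thresholds, a wallet that posts 1,000 mostly post-only orders and fills 20 of them lands in Algorithmic, while a wallet with a handful of frontend orders lands in Retail.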

While algorithmic wallets accounted for a mere 6% of total wallets, they did nearly half of HIP-4's dollar volume. Retail wallets, which represented the majority of the total wallets and held the highest peak open interest, produced less than a third of the volume.

Daily realised volume averaged a little over $3M in each of the first two weeks, but fill counts grew by 65%, from 273K to 451K across the same span. While the number of trades grew, the average trade size shrank dramatically.

Figure 2a. Daily end-of-day notional open interest by category, 2 May to 18 May. Stacked, with the saw-tooth shape reflecting daily settlement of binary outcomes at 06:00 UTC. Retail (green) carried the largest day-to-day OI; Algorithmic and Unclassified built incrementally.

Figure 2b. Cumulative HIP-4 realised volume by category, May 2nd to May 18th.

State of Liquidity

Through the analysis period, the orderbook showed a mean bid-ask spread of 360 bps and a median of 53 bps. A handful of thinly-quoted phases skewed the mean; the typical book sat near the median. Top-of-book liquidity was dense, but depth thinned out quickly deeper into the orderbook.

Depth changed shape as each outcome approached settlement. When the result was uncertain, the Yes and No sides were quoted relatively symmetrically. When the market neared settlement and probability resolved toward one side, depth piled on the winning side, and the losing side thinned out.

On May 5th, 35 minutes before settlement of an outcome priced at 99.2% Yes, a single maker stacked $50.3 million of bid side depth between 0.5% and 1.0% off mid. This repeated over multiple outcomes as algorithmic participants rushed to take last-leg exposure of likely wins.

The squeeze worsened as traders holding the losing side rushed to exit, exhausting any remaining liquidity, while market makers stopped adding depth on that side to avoid absorbing toxic positions into settlement.

Figure 3a. Yes-side depth evolution across all outcomes, 2 May to 18 May. Threshold bands at 0.1, 0.25, 0.5, 1, and 1.5% off mid. Bid depth below the zero line, ask depth above. Y-axis capped at $50K; settlement spikes exceeded the cap and were dropped from view.

Figure 3b. No-side depth evolution across all outcomes, 2 May to 18 May. Same threshold convention as Figure 3a. Settlement spikes appeared on the No side when outcomes resolved "No". Otherwise the No book thinned as the Yes side accumulated winning-side depth.

Looking past the temporary near-settlement skew in the orderbook, the largest bottleneck the market faced in its first two weeks was consistent depth. With no professional market-making firm operating on the venue, this impacted traders’ execution efficiency through slippage.

We measured the largest trade size the market could absorb within ±2% slippage and found that, after the first day, when launch hype drove an influx of liquidity into the book, execution steadily worsened because depth was limited at most times. At its worst, on May 13th, a $1K trade would have incurred 2% slippage.

Figure 3c. Daily median and interquartile range (P25-P75) of max trade size within ±2% slippage on the buy and sell side, 2 May to 18 May. Solid lines show the median minute of each day; shaded bands show the middle 50% of minutes.

This is both a bottleneck for users and an opportunity for market makers. In a venue this thin, the first market-making desk willing to commit real size will find little competition and can capitalise on the inefficiencies the current order book leaves on the table.

Who's trading

To deepen our understanding of who was trading this outcome market, we pulled the top 250 wallets by dollar volume and ran each through Arkham and HyperTracker. The cohort accounted for $33.67M in volume, or ~70% of total HIP-4 volume, which makes it a reasonable proxy for the participant mix.

Four things stood out.

Figure 4a. Wallet count, realised volume, and peak open interest by category, 2 May to 18 May.

1. 14% of the total wallets in the cohort traded on Polymarket BTC markets before HIP-4.

Most of these 36 wallets entered Polymarket BTC in late 2024, when daily-binary BTC markets ("Bitcoin above $X on date Y") went mainstream on Polymarket, indicating that these wallets were already established participants in the same product class well before HIP-4 launched.

When HIP-4 launched, most of these wallets effectively stopped trading Polymarket BTC and shifted attention to HIP-4, treating it as a higher-quality replacement for the PM BTC daily binaries they had been trading, with better UX and shared margin against existing Hyperliquid positions.

Figure 4b. Where the 36 HIP-4 cohort wallets traded BTC binaries, weekly. Blue: their Polymarket BTC volume. Orange: their HIP-4 volume. The dashed line marks the HIP-4 launch on 2 May 2026.

The numbers show the magnitude of the shift: in 17 days, these 36 wallets did $1.05M of HIP-4 volume, almost half of what they had accumulated on Polymarket BTC over the prior 18 months.

A smaller subset of 7 wallets stayed active on PM BTC post-launch and increased their trade frequency by roughly 5×. Most of them were classified as cross-venue arb wallets, running both books in parallel to capture the basis between the two venues.

2. Stat-arb (taker) bots had a clean execution edge, with a median win-rate of 60%.

Median five-minute taker markouts came in close to zero when looking broadly at Retail, Algorithmic and Unclassified categories (i.e., most wallets neither systematically won nor lost on short horizons).

The granular breakdown showed something interesting.

Every one of the 24 stat-arb taker wallets cleared 50%, with a median wallet positive-rate of 60%. Among market makers and retail, by contrast, only a third of wallets cleared 50%.

Figure 4c. Five-minute taker markout edge by wallet label. Bars show the percentage of wallets with a positive markout rate above 50%. Dotted line marks the 50% reference.

3. No activity via institutional desks.

We searched Arkham’s attribution for Jump, Wintermute, Selini, Cumberland, GSR, Flow Traders, Amber, Auros, and adjacent names and found 0 matches. The liquidity in the HIP-4 market came from individuals and small teams, not yet from institutional market-making firms.

This is unsurprising for a nascent market with volumes still too small to attract institutional desks. Expect these names to enter as Hyperliquid introduces additional markets and volume scales.

4. The algorithmic bucket was not a homogeneous category.

Of the 395 algorithmic wallets, 170 carried an explicit machine pattern in their own orders. They split into four sub-types:

The other 225 algorithmic wallets routed most of their flow through builders like TreadFi and Planemo, who provide algorithmic trading strategies via their products.

Figure 4d. Algorithmic bucket, granular breakdown. Wallet count and realised volume across the four sub-labels.

Early Builders

HIP-4 inherits Hyperliquid's builder code system. Every order can route through a registered builder address. Builders can charge a fee by routing order flow through a custom UI, or claim a share of revenue. In two weeks, 66 unique builder addresses appeared in the data. We identified 26 of them.

Across all builders, Outcome.xyz cleared 10x more volume than any other builder code identified.

Figure 5. Top 15 HIP-4 builders by routed volume, 2nd May to 18th May. Colour denotes label source (manual identification, HyperTracker registry, HIP-3 inherited, or unlabeled).

It routed $1.54M across 499 wallets and every live outcome, all at zero builder fees. Zero-fee routing meant that Outcome.xyz offered a more competitive venue for its users than other frontends. The strategy for now seems to be focused on user-acquisition: build the audience ahead of HIP-4 opening up permissionless market deployment, which Outcome.xyz has already staked $HYPE to participate in.

We spoke to @OutcomeAK, @OutcomeMiG, and @IamGyomei, the founders of Outcome.xyz, on HIP-4 and what they're building toward.

Outcome.xyz was followed by algorithmic builders like TreadFi, Moonbot and Minara AI, which also dominate volumes in HIP-3 markets, and by retail frontends like Hyperbeat ($79K routed, fee 25 bps), Liquidiction ($21K, fee 150 bps), Hyperview, Based One X, Hypersurface, Supurr.app, Dexly, Cipher and GTR.Trade.

Notably, HIP-4 has yet to be picked up by some of the large wallets like MetaMask, Rabby and Phantom that route significant orderflow to HIP-3 and Hypercore markets. Whether major wallets integrate HIP-4 for binary options as more markets ship is a meaningful next test against Polymarket and Kalshi.

Cross-Venue Inefficiencies

Polymarket and Kalshi were strongly competing in BTC 24h prediction markets prior to HIP-4's launch. We measured how Hyperliquid stacked up against them in terms of volume and pricing accuracy.

Within 19 days of launch, Hyperliquid had drawn $31.7M of BTC prediction-market volume against Polymarket's $33.87M, effectively matching an incumbent with three years of head start, while pulling ahead of Kalshi’s $0.5M by more than an order of magnitude.

Figure 6a. Daily BTC binary turnover by venue, 2 May to 20 May. HIP-4 fills feed ends 18 May; the live ingester carries L2 only past that date. Kalshi backfill is in progress and figures will be refreshed in a final pass.

Cross-venue price agreement. Globally, Deribit is seen as the uncontested price discovery venue for BTC options, processing more than $1B per day. While Deribit options are not structured as binaries, the option pricing itself carries a market implied probability distribution over BTC's price at any future timestamp.

Treating Deribit as the price-discovery anchor (the venue where new information first enters the BTC options market), we reconstructed its implied probability density at each hour, projected it to the native settlement of each prediction-market venue, and computed the spread between each venue's quoted probability and Deribit's.

Figure 6b. Hourly absolute basis versus Deribit, volume-weighted across active strikes. Higher values mean the venue's quoted probability sat further from Deribit's implied probability in either direction. The typical HIP-4 hour ran roughly 4 times further from Deribit than the typical Polymarket or Kalshi hour. About 140 of 532 HIP-4 hourly rows lacked the strike or expiry mapping needed for basis computation and are excluded. Coverage skews to the second half of the window.

All three venues tracked Deribit's trajectory but sat slightly above it at the median. This is consistent with optimistic retail bias, a documented pattern across prediction markets and perps.

Polymarket and Kalshi were tightly anchored to Deribit, while Hyperliquid's spread sat 4-5x wider than either. The widest spreads for Hyperliquid lined up with the thinner-liquidity windows surfaced in the State of Liquidity section above. These inefficiencies are a function of the market being nascent, and they provide opportunities for informed actors to capitalise on.

We continue tracking these dynamics live at static-psi-two.vercel.app.

Conclusion

In two weeks, HIP-4 matched Polymarket on BTC binary volume despite Polymarket's three-year head start. The product class works on Hyperliquid's rails: orders flow through the same matching engine, settle in the same account, and route through the same builder economy as HIP-3.

The microstructure is where the venue is still in its early stages. The book is dense at the top and thin past 100 bps. Pricing tracks Deribit but sits 4-5x further off than Polymarket or Kalshi. The cohort that moved 70% of the volume is sophisticated retail and algorithmic flow, with no institutional market makers in the mix.

That gap, between volume parity and a mature book, is the opportunity. A market-making desk willing to commit real size into HIP-4 today would face little competition, a participant base that already trades the product class, and a basis curve wide enough to support honest two-sided quotes. We expect that to change in the coming weeks. HIP-3 markets started in a similar shape, and the same builders driving HIP-3 algorithmic flow are already active on HIP-4.

Methodology

Most of the mechanical groundwork below (ingest, orderbook handling, wallet classification, depth and slippage) follows the same approach we used in our earlier writeup on Who's actually trading on trade.xyz. Those sections are kept short on purpose; the long-form treatment lives in the prior article. The cross-venue basis methodology is new to this piece and gets the full treatment at the end.

Data pipeline

HIP-4 fills, orders, and L2 orderbook snapshots came from 0xArchive's REST API. The fills feed covers 2nd May to 18th May 2026; L2 snapshots extend to 20th May. L2 cadence is roughly 538 ms per snapshot, with a 20-level book depth cap.

Wallet classification

Classification follows the same waterfall as the trade.xyz piece, with two HIP-4-specific extensions. A soft-compute promotion step moves human-looking Unclassified wallets into Retail. A stat-arb reclassification flags MM-shaped wallets carrying a directional positional bias, separating them from the pure two-sided market makers. The result is eight granular labels, Pure Market Maker, Stat-Arb Maker, MM Mixed Bias, Stat-Arb Taker, Retail, Retail Unattributed, Unclassified, and Unclassified Likely Human, which roll up to the three-category split (Algorithmic / Retail / Unclassified) the article's headline numbers use.

Cohort design

The 250-wallet cohort used in the "Who's trading" section is the top 180 Retail, top 50 Unclassified, top 10 MM, and top 10 SAT / HFT, ranked by realised volume. Together they accounted for ~70% of HIP-4 dollar volume.

L2 depth and slippage

Threshold-coverage at X% off mid measures how often the book is fillable on both sides at that distance, defined as the share of L2 snapshots where bid and ask both carry visible depth at least X% from mid. For the slippage chart we walked the relevant side of the book at each snapshot and computed the largest USD notional whose volume-weighted execution price stayed within ±2% of mid. The daily series reports the median and interquartile range across minutes within each day.
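
The book walk behind the slippage chart can be sketched as follows. This is a minimal illustration under stated assumptions: levels arrive as `(price, size)` pairs sorted best-first, notional is price times size, and the function name and signature are hypothetical. The snapshot's visible depth (capped at 20 levels in our feed) bounds the answer from above.

```python
def max_notional_within_slippage(levels, mid, side, max_slip=0.02):
    """Largest notional whose volume-weighted price stays within max_slip of mid.

    levels: list of (price, size), best price first.
    side:   'buy' walks the asks, 'sell' walks the bids.
    """
    limit = mid * (1 + max_slip) if side == "buy" else mid * (1 - max_slip)
    notional = 0.0
    qty = 0.0
    for price, size in levels:
        beyond_limit = price > limit if side == "buy" else price < limit
        if not beyond_limit:
            # Any level priced inside the limit can be taken in full.
            notional += price * size
            qty += size
            continue
        # First level beyond the limit: take only the quantity that brings the
        # running VWAP exactly to the limit, then stop.
        extra = abs(limit * qty - notional) / abs(price - limit)
        notional += price * min(extra, size)
        break
    return notional
```

For example, with a mid of 0.50 and asks of 1,000 contracts each at 0.505, 0.51 and 0.52, the first two levels are taken in full and half of the third, for a notional of about $1,275 at a VWAP of exactly 0.51.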

Markout edge

Taker markout measures whether a fill priced well relative to where the market moved over the following five minutes. We computed it for every fill, in basis points:

Where $s = +1$ for buy fills and $-1$ for sells, $P_{1m}(t + 5m)$ is the 1-minute VWAP evaluated 5 minutes after the fill, and $P_{\text{fill}}$ is the execution price. A positive value means the taker traded with the next five minutes of price action rather than against it. We rolled fill-level markouts up to per-wallet positive-rates, defined as the fraction of a wallet's fills with positive markout, and report the share of wallets in each label whose positive-rate clears 50%.
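
One way to write the fill-level markout consistent with these definitions is $10^4 \cdot s \cdot (P_{1m}(t + 5m) - P_{\text{fill}}) / P_{\text{fill}}$. Normalising by the fill price is an assumption on our part here; the sign convention and the roll-up follow the text. The sketch below, with hypothetical function names and input layout, computes the markout and the per-wallet positive-rates.

```python
from collections import defaultdict


def markout_bps(side, fill_price, vwap_5m_later):
    """Signed five-minute markout in basis points (assumed normalised by fill price)."""
    s = 1 if side == "buy" else -1
    return 1e4 * s * (vwap_5m_later - fill_price) / fill_price


def wallet_positive_rates(fills):
    """fills: iterable of (wallet, side, fill_price, vwap_5m_later)."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for wallet, side, fill_price, vwap_later in fills:
        total[wallet] += 1
        if markout_bps(side, fill_price, vwap_later) > 0:
            wins[wallet] += 1
    return {w: wins[w] / total[w] for w in total}


def share_clearing(rates, threshold=0.5):
    """Share of wallets whose positive-rate is strictly above the threshold."""
    if not rates:
        return 0.0
    return sum(r > threshold for r in rates.values()) / len(rates)
```

A buy filled at 0.50 that sees a 1-minute VWAP of 0.51 five minutes later scores roughly +200 bps under this convention; the same price path scores about -200 bps for a sell.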

Attribution and labelling

Wallet attribution leaned on Arkham's address-enrichment API, which returns entity matches, deposit-service identifiers, funder traces, ENS, Polymarket handles, Hyperliquid referral codes, and a confidence score for the match. On top of that we layered our own four-level confidence rubric (HIGH / MEDIUM / LOW / NONE), with one specific override: any wallet whose funder resolved to a bridge or aggregator stayed at LOW regardless of what else came back. Bridges carry too much downstream noise to trust as an attribution signal on their own.

HyperTracker's perp-PnL leaderboard gave us a second cross-reference for Hyperliquid perpetual activity, with one caveat worth flagging: that feed covers HIP-1 and HIP-3 perpetuals, not HIP-4 binaries. Polymarket trade history for matched wallets came from Polymarket's data API, queried per wallet via the Polygon proxy address Arkham resolves.

Cross-venue basis vs Deribit

Each HIP-4 coin, and each Polymarket strike, quotes a probability that BTC closes above a specific price at a specific time. Deribit doesn't quote those probabilities directly, but its option chain encodes the same information implicitly. At each hour the market-consensus probability distribution over BTC's future price can be recovered from the prices of calls and puts across every traded strike. Once both sides sit in the same units, the spread between a venue's quoted probability and Deribit's implied probability is the basis we want to measure. The pipeline runs in four stages.

Stage 1, implied volatility surface. For each 1-hour window we pulled Deribit's BTC option chain and recorded one implied-volatility point per traded strike per maturity. Where per-trade IV stamps were available we used them directly; otherwise we Black-Scholes-inverted the hourly close. We restricted to OTM contracts (calls at $K \geq F$, puts at $K \leq F$) to avoid the wide-quote problem on deep ITMs.

For each maturity we fit a local-quadratic in log-moneyness:

The fit uses the six nearest observations to each evaluation point $k$, with a hard floor on the variance at $10^{-8}$ so extrapolation stays well-defined. We chose a local-quadratic over a global parameterisation like SSVI on purpose. At 1-hour resolution across a 19-day window, we don't need an arbitrage-free recalibration of the entire surface. What we need are accurate digitals across the strike band the prediction markets actually quote, and a local fit gets us there with less machinery.
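
A minimal sketch of such a fit, assuming the fitted quantity is implied variance $\sigma^2$ as a quadratic in log-moneyness, $\sigma^2(k') \approx a + b\,(k' - k) + c\,(k' - k)^2$, estimated by ordinary least squares over the six observations nearest to $k$. The choice of variance as the fitted quantity, the unweighted fit, and all function names are assumptions for illustration.

```python
def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def local_quadratic_variance(k_eval, obs, n_nearest=6, floor=1e-8):
    """Fitted implied variance at log-moneyness k_eval.

    obs: list of (log_moneyness, implied_vol); needs at least three distinct strikes.
    """
    nearest = sorted(obs, key=lambda p: abs(p[0] - k_eval))[:n_nearest]
    # Centre on k_eval so the intercept is the fitted value at the evaluation point.
    xs = [k - k_eval for k, _ in nearest]
    ys = [iv * iv for _, iv in nearest]
    # Normal equations for y ~ a + b*x + c*x^2.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    a, _, _ = _solve3(normal, t)
    # Hard floor keeps extrapolated variance strictly positive.
    return max(a, floor)
```

The floor only binds when the quadratic is extrapolated past the last traded strike and dips below zero; inside the quoted band the fitted value passes through unchanged.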

Stage 2, settlement projection. Each prediction-market venue has its own settlement clock. HIP-4 settles at 06:00 UTC, Polymarket at 12:00 UTC, and Deribit's daily expiries at 08:00 UTC. To compare like for like we projected Deribit's surface from its native expiry to the venue's settlement time $T$, interpolating linearly in total variance $w = \sigma^2 T$ between the two bracketing Deribit expiries $T_1 \leq T \leq T_2$:

$$w(T) = w(T_1) + \frac{T - T_1}{T_2 - T_1}\,\bigl[w(T_2) - w(T_1)\bigr]$$

Interpolating in total variance preserves no-arbitrage under maturity shifts. It's the same construction Dupire and Carr-Madan use.
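
As a sketch of the projection step, the function below interpolates total variance $\sigma^2 T$ linearly between two bracketing expiries and converts back to an implied vol at the target time. The function name is hypothetical, maturities are assumed to be in years, and we assume the two input vols are read at the same log-moneyness.

```python
import math


def interp_total_variance(t, t1, sigma1, t2, sigma2):
    """Implied vol at maturity t, given vols at bracketing maturities t1 <= t <= t2."""
    if t <= 0 or not (t1 <= t <= t2):
        raise ValueError("t must be positive and bracketed by t1 and t2")
    w1 = sigma1 ** 2 * t1  # total variance at the near expiry
    w2 = sigma2 ** 2 * t2  # total variance at the far expiry
    if t2 == t1:
        w = w1
    else:
        w = w1 + (t - t1) / (t2 - t1) * (w2 - w1)
    return math.sqrt(w / t)
```

With a 50% vol at the one-day expiry and 60% at the two-day expiry, a target 1.5 days out gets a total variance halfway between the two, which works out to a vol of about 56.9%.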

Stage 3, Breeden-Litzenberger digital extraction. The probability that BTC closes above strike $K$ at expiry $T$ equals the negative derivative of the call price with respect to strike:

$$\mathbb{P}(S_T > K) = -\frac{\partial C(K, T)}{\partial K}$$

Numerically, we computed this using a central-difference call spread:

$$\mathbb{P}(S_T > K) \approx \frac{C(K - h, T) - C(K + h, T)}{2h}$$

The step size is adaptive, $h = \max(0.003 \cdot K, \$250)$. The dollar floor keeps the difference well-conditioned at low strikes; the relative term keeps it stable as BTC moves around. Call prices at the bracket strikes come from Black-Scholes evaluated against the projected vol surface from Stage 2, with the risk-free rate $r$ set to zero over a one-day horizon. A second finite difference of the same call surface recovers the implied PDF $f(K, T) = \partial^2 C / \partial K^2$, which we use in the strike-translation step below.
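
The call-spread digital can be sketched in a few lines. This is an illustration under stated assumptions: `vol_at` is a hypothetical callable that returns the projected implied vol at a given strike, the forward stands in for spot because $r = 0$, and the function names are ours for the example.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(forward, strike, vol, t):
    """Undiscounted Black-Scholes call with r = 0."""
    if vol <= 0 or t <= 0:
        return max(forward - strike, 0.0)
    sd = vol * math.sqrt(t)
    d1 = (math.log(forward / strike) + 0.5 * sd * sd) / sd
    d2 = d1 - sd
    return forward * norm_cdf(d1) - strike * norm_cdf(d2)


def digital_prob(forward, strike, t, vol_at):
    """P(S_T > K) from a central-difference call spread around the strike."""
    h = max(0.003 * strike, 250.0)  # adaptive step with a dollar floor
    c_lo = bs_call(forward, strike - h, vol_at(strike - h), t)
    c_hi = bs_call(forward, strike + h, vol_at(strike + h), t)
    return (c_lo - c_hi) / (2.0 * h)
```

On a flat vol surface the result should sit close to the closed-form $N(d_2)$, which makes a convenient sanity check before running the extraction against a fitted surface with skew.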

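Stage 4, per-hour basis and volume-weighting. For each venue at each hour $t$, the per-strike basis $b_{t, K}$ is the gap between the venue's quoted probability $p^{\text{venue}}_{t, K}$ and Deribit's BL probability $p^{\text{Deribit}}_{t, K}$ at the same strike:

$$b_{t, K} = p^{\text{venue}}_{t, K} - p^{\text{Deribit}}_{t, K}$$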

Venue strikes don't always sit on the Deribit grid. When the closest Deribit strike fell within 2% of the venue strike $K$, we applied a first-order linear shift using the BL-implied PDF as the local slope:

$$\mathbb{P}(S_T > K) \approx \mathbb{P}(S_T > K^*) - f(K^*, T)\,(K - K^*)$$

Where $K^*$ is the nearest Deribit strike. Shifts beyond 2% were dropped, since the linearisation breaks down across larger gaps.
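
A sketch of the strike translation and the resulting per-strike basis follows. The slope of the digital with respect to strike is minus the implied PDF, so moving to a higher strike lowers the probability. The function names, the `None` return for dropped strikes, and measuring the 2% gap relative to the venue strike are assumptions for the example.

```python
def translate_prob(p_star, pdf_star, k_star, k_venue, max_gap=0.02):
    """Shift Deribit's digital from grid strike k_star to the venue strike.

    p_star:   BL probability P(S_T > k_star)
    pdf_star: BL-implied density f(k_star, T)
    Returns None when the strikes are further apart than max_gap.
    """
    if abs(k_venue - k_star) / k_venue > max_gap:
        return None
    return p_star - pdf_star * (k_venue - k_star)


def per_strike_basis(venue_prob, deribit_prob):
    """Venue quoted probability minus Deribit's implied probability."""
    return venue_prob - deribit_prob
```

For instance, a Deribit digital of 0.50 at a $100,000 strike with a local density of 1.5e-4 per dollar translates to about 0.425 at a $100,500 venue strike; a venue quote of 0.45 there would carry a basis of roughly +2.5 percentage points.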

Polymarket lists roughly 11 strikes per daily binary event, so each hour of Polymarket data produces a vector of per-strike basis values. We collapsed that to a single per-hour number by volume-weighting across strikes inside the informative band (yes-side mid $\in [0.01, 0.99]$):

With $v_{t, K}$ the 24-hour rolling volume on strike $K$. When no strike carried volume in the window, we fell back to equal weighting. HIP-4 only lists one strike per outcome, so each hour produces a single basis value and no aggregation is needed.
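
Writing the per-strike basis as $b_{t, K}$, the aggregation described here amounts to $\sum_K v_{t, K}\, b_{t, K} / \sum_K v_{t, K}$ over strikes in the informative band. The sketch below implements that with the equal-weight fallback. The function name and row layout are hypothetical, and whether the caller passes signed or absolute per-strike values is left open, since both views appear in the article.

```python
def hourly_basis(rows, lo=0.01, hi=0.99):
    """Collapse per-strike basis values into one number for the hour.

    rows: list of (yes_mid, basis, volume_24h), one entry per strike.
    Returns None when no strike sits inside the informative band.
    """
    band = [(b, v) for mid, b, v in rows if lo <= mid <= hi]
    if not band:
        return None
    total_volume = sum(v for _, v in band)
    if total_volume > 0:
        return sum(b * v for b, v in band) / total_volume
    # No strike traded in the window: fall back to equal weighting.
    return sum(b for b, _ in band) / len(band)
```

A strike quoted at 0.995 is ignored no matter how much it traded, so a near-certain leg cannot dominate the hourly figure.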

Pin-zone exclusion. The BL digital becomes numerically unstable in the final hours before settle. The implied distribution narrows sharply, the call surface flattens into a step function around spot, and the central-difference call spread starts measuring quote noise rather than density. We dropped any hour within five hours of the venue's native settle from the basis series.