CVE-2026-42440

Description

OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader Versions Affected: before 2.5.9 before 3.0.0-M3 Description: The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and getPredicates() each read a 32-bit signed integer count field from a binary model stream and pass that value directly to an array allocation (new String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without validating that the value is non-negative or within a reasonable bound. The count is therefore fully attacker-controlled when the model file originates from an untrusted source. A crafted .bin model file in which any of these count fields is set to Integer.MAX_VALUE (or any value large enough to exhaust the available heap) triggers an OutOfMemoryError at the array allocation itself, before the corresponding label or pattern data is consumed from the stream. The error occurs very early in deserialization: for a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read; so the attacker pays no meaningful size cost to weaponize a payload, and a single small file can crash a JVM that loads it. Any code path that deserializes a .bin model is affected, including direct use of GenericModelReader and any higher-level component that delegates to it during model load. The practical impact is denial of service against processes that load model files from untrusted or semi-trusted origins. Mitigation: * 2.x users should upgrade to 2.5.9. * 3.x users should upgrade to 3.0.0-M3. Note: The fix introduces an upper bound on each of the three count fields, checked before array allocation; counts that are negative or exceed the bound cause an IllegalArgumentException to be thrown and the read to fail fast with no large allocation. The default bound is 10,000,000, which is well above the entry counts of legitimate OpenNLP models but far below any value that would threaten heap exhaustion. Deployments that legitimately need to load models with more entries than the default can raise the limit at JVM startup by setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default. Users who cannot upgrade immediately should treat all .bin model files as untrusted input unless their provenance is verified, and should avoid loading models supplied by end users or fetched from third-party repositories without integrity checks.

Is your site exposed to CVE-2026-42440?

Run a free security scan — no signup, results in seconds.

CVSS v3.1 Score

7.5

HIGH

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

EPSS — Exploit Prediction

0.0019

Probability of exploitation

0.41%

Percentile rank

EPSS estimates the probability that this vulnerability will be exploited in the wild within the next 30 days. A higher score means more likely to be exploited.

Weakness Type (CWE)

CWE-789 CWE-789

Affected Products

Vendor	Product	Versions
apache	opennlp	< 2.5.9
apache	opennlp	All
apache	opennlp	All

References

Advisories & Patches

https://lists.apache.org/thread/s8xlkx1gqbxfsq48py5h6jphjvgqp1jo

Other References

http://www.openwall.com/lists/oss-security/2026/05/01/21

Frequently Asked Questions

What is CVE-2026-42440? +

OOM Denial of Service via Unbounded Array Allocation in Apache OpenNLP AbstractModelReader Versions Affected: before 2.5.9 before 3.0.0-M3 Description: The AbstractModelReader methods getOutcomes(), getOutcomePatterns(), and getPredicates() each read a 32-bit signed integer count field from a binary model stream and pass that value directly to an array allocation (new String[numOutcomes], new int[numOCTypes][], new String[NUM_PREDS]) without validating that the value is non-negative or within a reasonable bound. The count is therefore fully attacker-controlled when the model file originates from an untrusted source. A crafted .bin model file in which any of these count fields is set to Integer.MAX_VALUE (or any value large enough to exhaust the available heap) triggers an OutOfMemoryError at the array allocation itself, before the corresponding label or pattern data is consumed from the stream. The error occurs very early in deserialization: for a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read; so the attacker pays no meaningful size cost to weaponize a payload, and a single small file can crash a JVM that loads it. Any code path that deserializes a .bin model is affected, including direct use of GenericModelReader and any higher-level component that delegates to it during model load. The practical impact is denial of service against processes that load model files from untrusted or semi-trusted origins. Mitigation: * 2.x users should upgrade to 2.5.9. * 3.x users should upgrade to 3.0.0-M3. Note: The fix introduces an upper bound on each of the three count fields, checked before array allocation; counts that are negative or exceed the bound cause an IllegalArgumentException to be thrown and the read to fail fast with no large allocation. The default bound is 10,000,000, which is well above the entry counts of legitimate OpenNLP models but far below any value that would threaten heap exhaustion. Deployments that legitimately need to load models with more entries than the default can raise the limit at JVM startup by setting the OPENNLP_MAX_ENTRIES system property to the desired positive integer (e.g. -DOPENNLP_MAX_ENTRIES=50000000); invalid or non-positive values fall back to the default. Users who cannot upgrade immediately should treat all .bin model files as untrusted input unless their provenance is verified, and should avoid loading models supplied by end users or fetched from third-party repositories without integrity checks. It has a CVSS v3.1 base score of 7.5 (HIGH).

How severe is CVE-2026-42440? +

CVE-2026-42440 has a CVSS v3.1 score of 7.5 out of 10, rated HIGH. This is a high-severity vulnerability that should be prioritized for patching. The EPSS score is 0.0019, placing it in the 0th percentile for exploitation probability.

What products are affected by CVE-2026-42440? +

CVE-2026-42440 affects products from apache, specifically: opennlp. Check the affected products table above for specific version ranges.

How do I check if I'm vulnerable to CVE-2026-42440? +

You can use Secably's free Website Scanner to check your website for known vulnerabilities. For infrastructure scanning, use the Port Scanner to identify exposed services that may be affected. Check the vendor advisories linked above for specific patch and version information.

Related Vulnerabilities

CVE-2025-26618

Erlang is a programming language and runtime system for building massively scalable soft real-time systems with requirements on high availability. …

CVE-2026-53428

Memory Allocation with Excessive Size Value vulnerability in leandrocp mdex allows an unauthenticated attacker to cause a denial of service …

CVE-2026-14454 9.8

Imager versions before 1.033 for Perl treat unsigned EXIF IFD entry counts as signed. Imager mishandled large EXIF IFD entry …

CVE-2024-20260 8.6

A vulnerability in the VPN and management web servers of the Cisco Adaptive Security Virtual Appliance (ASAv) and Cisco Secure …

CVE-2026-40303 7.5

zrok is software for sharing web services, files, and network resources. Prior to version 2.0.1, endpoints.GetSessionCookie parses an attacker-supplied cookie …

CVE-2026-33524 7.5

Zserio is a framework for serializing structured data with a compact and efficient way with low overhead. Prior to 2.18.1, …