Unstructured-IO/unstructured

Releases: 239
Frequency: 5 days 14 hours
Last Release
Stars: 14.8K
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

CVE History

CVE    Published    CVSS v3    CVSS v2
9.8 CRITICAL

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. Prior to version 0.18.18, a path traversal vulnerability in the partition_msg function allows an attacker to write or overwrite arbitrary files on the filesystem when processing malicious MSG files with attachments. This issue has been patched in version 0.18.18.
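
The class of bug described above (an attachment filename steering a write outside the intended directory) is usually prevented by reducing the filename to its final component and confirming that the resolved path stays inside the output directory. The following is a minimal sketch of that general technique using only the Python standard library. It is not the patch applied in 0.18.18, and the function name `safe_attachment_path` and its parameters are hypothetical.

```python
from pathlib import Path


def safe_attachment_path(output_dir: str, attachment_name: str) -> Path:
    """Return a write path for an attachment that cannot escape output_dir.

    Hypothetical helper for illustration; not part of the unstructured API.
    """
    base = Path(output_dir).resolve()

    # Treat both "/" and "\" as separators, then keep only the last component,
    # so "../../etc/passwd" and "..\\..\\evil.txt" lose their directory parts.
    name = attachment_name.replace("\\", "/").split("/")[-1]
    if name in ("", ".", ".."):
        raise ValueError(f"unusable attachment name: {attachment_name!r}")

    # Resolve the candidate (following any symlink) and require that it sits
    # directly inside the output directory.
    candidate = (base / name).resolve()
    if candidate.parent != base:
        raise ValueError(f"attachment path escapes output dir: {attachment_name!r}")
    return candidate
```

A caller would pass every attacker-controlled attachment name through a check like this before opening the file for writing, and treat a `ValueError` as a reason to skip or rename the attachment.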

9.8 CRITICAL

unstructured versions 0.14.2 and earlier are vulnerable to XML External Entity (XXE) injection via the XMLParser.
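
An XXE attack depends on the parser accepting a document type declaration that defines an external entity. One conservative defense is to reject any untrusted document that declares a DTD or entities before parsing it. The sketch below shows that approach with the Python standard library only. It is an illustration of the general technique, not the fix shipped by the project, and the function name `parse_untrusted_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_untrusted_xml(data: bytes) -> ET.Element:
    """Parse XML, refusing documents that declare a DTD or entities.

    Hypothetical helper for illustration; not part of the unstructured API.
    """

    def _reject(*_args):
        # Raised from inside an expat handler, so it propagates out of Parse().
        raise ValueError("DTD and entity declarations are not allowed")

    # First pass: a bare expat parser whose only job is to spot declarations.
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject
    checker.EntityDeclHandler = _reject
    checker.Parse(data, True)

    # Second pass: the document has no DOCTYPE, so build the tree as usual.
    return ET.fromstring(data)
```

When the parser in use is lxml's `etree.XMLParser`, constructing it with `resolve_entities=False` and `no_network=True` is a commonly used hardening option; whether that matches the project's own remediation is not stated in the advisory text above.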