Question Details

No question body available.

Tags

python zstd zstandard

Answers (2)

January 12, 2026 Score: 0 Rep: 52

The reason you cannot see the filenames is that Zstandard is a compression format, not an archive format. Unlike .zip or .7z, Zstd (and Gzip) compressed streams do not natively store file metadata like filenames or directory structures. When you concatenate multiple frames, Zstd simply treats them as independent blocks of data.

To achieve your goal of storing multiple files with their metadata while keeping them independently seekable, you have two main options:

The industry standard for this is to use a TAR archive to hold the metadata and then compress it with Zstandard. Python's tarfile module can work directly with the zstandard stream.

import zstandard as zstd
import tarfile
from pathlib import Path

files_to_compress = [Path("chunk0.ndjson"), Path("chunk1.ndjson")]
output_file = Path("dataset.tar.zst")

cctx = zstd.ZstdCompressor(threads=5)

with open(output_file, "wb") as fout:
    with cctx.stream_writer(fout) as zst_writer:
        # Use mode "w|" for streaming
        with tarfile.open(fileobj=zst_writer, mode="w|") as tar:
            for src in files_to_compress:
                tar.add(src, arcname=src.name)

To read the metadata (Names and Sizes):

dctx = zstd.ZstdDecompressor()
with open(output_file, "rb") as fin:
    with dctx.stream_reader(fin) as zst_reader:
        with tarfile.open(fileobj=zst_reader, mode="r|") as tar:
            for member in tar:
                print(f"File: {member.name}, Size: {member.size} bytes")

Update ...

As discussed in the comments, while Zstandard frames are independent, you need a way to know where each one starts to achieve true random access (seeking) without reading the entire stream. We can achieve this by appending a Skippable Frame at the end of the file to serve as an index.

1. Creating the Multi-frame .zst with an Index

This approach compresses each file into its own frame and records the byte offsets.

import zstandard as zstd
import struct
import json
from pathlib import Path

files_to_compress = [Path("chunk0.ndjson"), Path("chunk1.ndjson")]
output_file = Path("dataset.zst")
cctx = zstd.ZstdCompressor(threads=5)

index = {}
current_offset = 0

with open(output_file, "wb") as fout:
    for src in files_to_compress:
        with open(src, "rb") as fin:
            data = fin.read()
        compressed = cctx.compress(data)

        # Store metadata: start offset plus compressed/uncompressed sizes
        index[src.name] = {
            "offset": current_offset,
            "csize": len(compressed),
            "usize": len(data),
        }

        fout.write(compressed)
        current_offset += len(compressed)

    # Wrap the index in a Zstd Skippable Frame: a 4-byte little-endian
    # magic number (the range 0x184D2A50 to 0x184D2A5F is reserved for
    # skippable frames) followed by the 4-byte little-endian payload
    # size. Standard decompressors silently skip these frames.
    index_data = json.dumps(index).encode("utf-8")
    fout.write(struct.pack("<II", 0x184D2A50, len(index_data)))
    fout.write(index_data)
January 12, 2026 Score: 0 Rep: 21,497

How can I get the metadata?

Well, you can't, because it isn't in there. I refer you to the fine docs on the write_content_size flag:

This data will only be written if the compressor knows the size of the input data.

You did your own open() calls, which is nice enough. Then you stream in the data, which prevents zstd from learning stat() details like st_size.

In this use case I think they want you to use zstd.open(), to expose such details.


BTW the docs seem slightly confused about the default setting of that flag, None or True. You might possibly want to file a Documentation Bug with the upstream development team.