Question Details

No question body available.

Tags

python zstd zstandard

Answers (2)

January 12, 2026 Score: 0 Rep: 52

The reason you cannot see the filenames is that Zstandard is a compression format, not an archive format. Unlike .zip or .7z, Zstd (and Gzip) compressed streams do not natively store file metadata like filenames or directory structures. When you concatenate multiple frames, Zstd simply treats them as independent blocks of data.

To achieve your goal of storing multiple files with their metadata while keeping them independently seekable, you have two main options:

The industry standard for this is to use a TAR archive to hold the metadata and then compress it with Zstandard. Python's tarfile module can work directly with the zstandard stream.

import zstandard as zstd
import tarfile
from pathlib import Path

files_to_compress = [Path("chunk0.ndjson"), Path("chunk1.ndjson")]
output_file = Path("dataset.tar.zst")

cctx = zstd.ZstdCompressor(threads=5)

with open(output_file, "wb") as fout:
    with cctx.stream_writer(fout) as zst_writer:
        # Use mode "w|" for streaming
        with tarfile.open(fileobj=zst_writer, mode="w|") as tar:
            for src in files_to_compress:
                tar.add(src, arcname=src.name)

To read the metadata (Names and Sizes):

dctx = zstd.ZstdDecompressor()
with open(output_file, "rb") as fin:
    with dctx.stream_reader(fin) as zst_reader:
        with tarfile.open(fileobj=zst_reader, mode="r|") as tar:
            for member in tar:
                print(f"File: {member.name}, Size: {member.size} bytes")

Update ...

As discussed in the comments, while Zstandard frames are independent, you need a way to know where each one starts to achieve true random access (seeking) without reading the entire stream. We can achieve this by appending a Skippable Frame at the end of the file to serve as an index.

1. Creating the Multi-frame .zst with an Index

This approach compresses each file into its own frame and records the byte offsets.

import zstandard as zstd
import struct
import json
from pathlib import Path

files_to_compress = [Path("chunk0.ndjson"), Path("chunk1.ndjson")]
output_file = Path("dataset.zst")
cctx = zstd.ZstdCompressor(threads=5)

index = {}
current_offset = 0

with open(output_file, "wb") as fout:
    for src in files_to_compress:
        with open(src, "rb") as fin:
            data = fin.read()
        compressed = cctx.compress(data)

        # Store metadata: start offset plus compressed/uncompressed sizes
        index[src.name] = {
            "offset": current_offset,
            "csize": len(compressed),
            "usize": len(data),
        }

        fout.write(compressed)
        current_offset += len(compressed)

    # Wrap the index in a Zstd Skippable Frame: a 4-byte little-endian
    # magic number (the range 0x184D2A50 to 0x184D2A5F is reserved for
    # skippable frames) followed by the 4-byte little-endian payload
    # size. Standard decompressors silently skip these frames.
    index_data = json.dumps(index).encode("utf-8")
    fout.write(struct.pack("<II", 0x184D2A50, len(index_data)))
    fout.write(index_data)
January 12, 2026 Score: 0 Rep: 21,497

How can I get the metadata?

Well, you can't, because it isn't in there. I refer you to the fine docs on the write_content_size flag:

This data will only be written if the compressor knows the size of the input data.

You did your own open() calls, which is nice enough. Then you stream in the data, which prevents zstd from learning stat() details like st_size.

In this use case I think they want you to use zstd.open(), to expose such details.


BTW the docs seem slightly confused about the default setting of that flag, None or True. You might possibly want to file a Documentation Bug with the upstream development team.