Question Details

No question body available.

Tags

python pyspark pytest databricks databricks-connect

Answers (2)

March 17, 2026 Score: 2 Rep: 1,567

3–5 minutes per test is not normal for tiny local DataFrames. The Spark docs show a simple local test run finishing in about 1.7s, and scope="session" is supposed to create the fixture once and reuse it for the whole pytest session.

The most likely issue is that your Spark session is not actually being reused. Common reasons:

  • You are using pytest-xdist / parallel workers. With xdist, a session-scoped fixture runs once per worker process, not once globally.

  • The fixture is being recreated in multiple pytest sessions/jobs in CI.

  • Something else in your setup is stopping Spark and forcing a restart.
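To check the xdist theory specifically, one trick is to tag the Spark app name with the worker id. This is a sketch, not part of the fixture above; `PYTEST_XDIST_WORKER` is the environment variable pytest-xdist sets in each worker process (e.g. `gw0`, `gw1`), and `spark_app_name` is a hypothetical helper:

```python
import os

def spark_app_name(base="pytest-pyspark"):
    # pytest-xdist sets PYTEST_XDIST_WORKER ("gw0", "gw1", ...) in each
    # worker process; without xdist the variable is absent, so fall back
    # to a "main" suffix for the single-process case.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    return f"{base}-{worker}"
```

Passing `spark_app_name()` to `.appName(...)` makes the Spark logs show one distinct app per worker, i.e. one session startup per worker process rather than one globally.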

What I would do first:

# conftest.py
import pytest
import time
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    t0 = time.time()
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("pytest-pyspark")
        .config("spark.ui.enabled", "false")
        .config("spark.sql.shuffle.partitions", "1")
        .config("spark.default.parallelism", "1")
        .config("spark.sql.adaptive.enabled", "false")
        .config("spark.sql.execution.arrow.pyspark.enabled", "false")
        .getOrCreate()
    )
    print(f"\nSPARK STARTED once: {id(spark)} in {time.time() - t0:.2f}s")
    yield spark
    print("\nSPARK STOP")
    spark.stop()

Then run with pytest -s and check whether the SPARK STARTED once message appears a single time for the whole run or once per test.

March 17, 2026 Score: 1 Rep: 5,089

The accepted answer is right to suspect that the session isn't being reused, but in my case the Spark config was not the issue: the session wasn't actually local at all.

The real problem was that we had databricks-connect installed. That library overrides PySpark so that SparkSession.builder.getOrCreate() connects to a remote Databricks cluster instead of starting a local session. You can see the issue reported here and here.

So although the code looked like a local session, each test was actually connecting to the cluster!

Fix: run tests in a separate virtual environment without databricks-connect (I used pyenv for this). After removing it, the same tests dropped from minutes to seconds.
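A quick way to check for this from inside the test environment is to probe for the databricks-connect package. This is a heuristic sketch (`looks_like_databricks_connect` is a hypothetical helper): newer databricks-connect releases (v13+) ship a `databricks.connect` module, while older releases replaced the `pyspark` package outright, so a negative result does not fully rule them out.

```python
import importlib.util

def looks_like_databricks_connect():
    # True if the databricks-connect (v13+) client package is importable.
    # Older databricks-connect versions monkey-patched pyspark instead,
    # so False here does not completely rule them out.
    try:
        return importlib.util.find_spec("databricks.connect") is not None
    except ModuleNotFoundError:
        # The parent "databricks" package isn't installed at all.
        return False
```

In a clean local venv this should return False; if it returns True, your "local" SparkSession may really be a remote Databricks Connect session, which would explain minutes-long test times.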