Question Details

No question body available.

Tags

python pandas etl hana

Answers (2)

February 7, 2026 Score: 1 Rep: 349 Quality: Low Completeness: 80%

These two lines have no logical link in the code:

    cursor = connection.cursor()
    cursor.arraysize = 50000
    pd.read_sql_query(query, connection, chunksize=10000)

hence, read_sql_query doesn't use your cursor.

You should distinguish between chunksize and arraysize:

chunksize is a pandas parameter (how many rows each yielded DataFrame chunk contains), while arraysize belongs to the DB-API cursor (how many rows fetchmany() pulls per round trip).
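The difference is easy to see with any DB-API 2.0 driver; a minimal sketch, using sqlite3 as a stand-in for the HANA driver (table name and sizes are illustrative):

```python
import sqlite3
import pandas as pd

# Any DB-API 2.0 connection behaves the same way here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
conn.commit()

# pandas chunksize: rows per yielded DataFrame chunk
n_chunks = sum(1 for _ in pd.read_sql_query("SELECT * FROM t", conn, chunksize=10))
print(n_chunks)  # 25 rows at 10 per chunk -> 3 chunks

# DB-API arraysize: rows returned per fetchmany() call
cur = conn.cursor()
cur.arraysize = 10
cur.execute("SELECT * FROM t")
batch = cur.fetchmany()
print(len(batch))  # 10
cur.close()
conn.close()
```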

Also:

df_chunk.to_dict("records")

here every row of the data frame is converted to a dict of the form

{ "col1": python_obj, ... }

which boxes each value into a Python object and consumes a significant amount of processor resources.
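For illustration, here is what the "records" orientation produces on a tiny hypothetical frame:

```python
import pandas as pd

# One Python dict per row; every value is boxed as a scalar object.
df_chunk = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})
records = df_chunk.to_dict("records")
# records is a list like [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]
print(records)
```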

My suggestions are:

  1. Run the query through the cursor: cursor.execute("SELECT * FROM tablename")

  2. Loop on get_rows = cursor.fetchmany() (it honors cursor.arraysize)

  3. Loop over every row in get_rows and process the row as you want.

  4. Close the cursor: cursor.close()
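The four steps above can be sketched with sqlite3 standing in for hdbcli (both are DB-API 2.0 drivers, so the cursor protocol is the same; table and sizes are illustrative):

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tablename (id INTEGER, val TEXT)")
connection.executemany("INSERT INTO tablename VALUES (?, ?)",
                       [(i, f"row{i}") for i in range(10)])

cursor = connection.cursor()
cursor.arraysize = 4          # batch size for fetchmany (50000 in the answer)

cursor.execute("SELECT * FROM tablename")   # 1. run the query on the cursor
processed = 0
while True:
    get_rows = cursor.fetchmany()           # 2. fetch arraysize rows at a time
    if not get_rows:
        break
    for row in get_rows:                    # 3. process each row as you want
        processed += 1

cursor.close()                              # 4. release the cursor
connection.close()
print(processed)  # 10 rows processed
```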

February 7, 2026 Score: 1 Rep: 1 Quality: Low Completeness: 80%

You have conection instead of connection in pd.read_sql_query().

You are fetching chunks, converting each chunk to dictionaries with .to_dict("records"), and then extending a list. This creates a lot of overhead and intermediate objects, which defeats the purpose of chunking.

Also, while you set cursor.arraysize, pandas doesn't automatically use your cursor when you pass a connection object; it creates its own cursor internally.

Here's how I'd do it:

    from hdbcli import dbapi
    import pandas as pd

    connection = dbapi.connect(
        address="-----------",
        port="-----------",
        user="-----------",
        password="-----------",
    )

    cursor = connection.cursor()
    cursor.arraysize = 50000

    query = "SELECT * FROM tablename LIMIT 10000"

    # Option 1: simple read (pandas manages its own cursor)
    df = pd.read_sql_query(query, connection)

    # Option 2: chunked read, keeping DataFrames and concatenating once
    chunks = []
    for chunk in pd.read_sql_query(query, connection, chunksize=10000):
        chunks.append(chunk)
    df = pd.concat(chunks, ignore_index=True)

    # Option 3: manual cursor, so arraysize is actually used
    cursor.execute(query)
    columns = [desc[0] for desc in cursor.description]
    df = pd.DataFrame(cursor.fetchall(), columns=columns)

    cursor.close()
    connection.close()

  1. no unnecessary dict conversion

  2. correct arraysize usage

  3. less overhead
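The chunked approach can be exercised end to end without a HANA instance; a minimal sketch, again with sqlite3 standing in for hdbcli (table name and sizes are illustrative):

```python
import sqlite3
import pandas as pd

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tablename (id INTEGER, val TEXT)")
connection.executemany("INSERT INTO tablename VALUES (?, ?)",
                       [(i, f"v{i}") for i in range(30)])

# Each chunk stays a DataFrame -- no per-row dict conversion --
# and a single concat builds the final frame.
chunks = []
for chunk in pd.read_sql_query("SELECT * FROM tablename", connection, chunksize=10):
    chunks.append(chunk)
df = pd.concat(chunks, ignore_index=True)

print(len(df))  # 30
connection.close()
```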