Question Details

No question body available.

Tags

python pandas dataframe case-insensitive

Answers (4)

February 26, 2026 Score: 1 Rep: 14,583 Quality: Medium Completeness: 60%

If the column positions don't change, just read the data without column names and set them afterwards in whichever casing you prefer. This needs only minimal code changes.

import pandas as pd
df = pd.DataFrame([['2024-06-21 06:22:38', 22958 ,605.968389, 0.994548],
                   ['2024-06-21 06:22:39', 22959 ,616.009398, 0.983443],
                   ['2024-06-21 06:22:40', 22960 ,624.630573, 0.973647],
                   ['2024-06-21 06:22:41', 22961 ,633.476367, 1.017651],
                   ['2024-06-21 06:22:42', 22962 ,642.322161, 5.017651]])
print(df)

columns = ['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']
df.columns = columns
print(df)

columns = ['SAMPLETIME', 'UTCS', 'SENSORDATA1', 'SENSORDATA2']
df.columns = columns
print(df)

Result

                     0      1           2         3
0  2024-06-21 06:22:38  22958  605.968389  0.994548
1  2024-06-21 06:22:39  22959  616.009398  0.983443
2  2024-06-21 06:22:40  22960  624.630573  0.973647
3  2024-06-21 06:22:41  22961  633.476367  1.017651
4  2024-06-21 06:22:42  22962  642.322161  5.017651

            SampleTime   UTCs  SensorData1  SensorData2
0  2024-06-21 06:22:38  22958   605.968389     0.994548
1  2024-06-21 06:22:39  22959   616.009398     0.983443
2  2024-06-21 06:22:40  22960   624.630573     0.973647
3  2024-06-21 06:22:41  22961   633.476367     1.017651
4  2024-06-21 06:22:42  22962   642.322161     5.017651

            SAMPLETIME   UTCS  SENSORDATA1  SENSORDATA2
0  2024-06-21 06:22:38  22958   605.968389     0.994548
1  2024-06-21 06:22:39  22959   616.009398     0.983443
2  2024-06-21 06:22:40  22960   624.630573     0.973647
3  2024-06-21 06:22:41  22961   633.476367     1.017651
4  2024-06-21 06:22:42  22962   642.322161     5.017651
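
When the data comes from a file rather than an in-memory list, the same idea can be sketched with `pd.read_csv`. This is only a minimal sketch, assuming a CSV file whose first line is a header row; the inline `csv_text` sample and the `CANONICAL` list are illustrative stand-ins, not the real data source. Passing `header=0` together with `names=` tells pandas to discard the header found in the file and use the supplied names instead.

```python
import io
import pandas as pd

# Hypothetical CSV content with upper-case headers (new firmware).
csv_text = """SAMPLETIME,UTCS,SENSORDATA1,SENSORDATA2
2024-06-21 06:22:38,22958,605.968389,0.994548
2024-06-21 06:22:39,22959,616.009398,0.983443
"""

# The names the rest of the code expects, in file column order.
CANONICAL = ['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']

# header=0 marks the first line as a header that `names` replaces.
df_csv = pd.read_csv(io.StringIO(csv_text), header=0, names=CANONICAL)
print(df_csv.columns.tolist())
```

This only works while the column order is stable, which is the same assumption the answer above makes.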
February 26, 2026 Score: 0 Rep: 13,604 Quality: Medium Completeness: 60%

Motivation:

You have a bunch of code that looks up columns by names with a specific casing. Your old data uses those names, while your new data uses the same names with a different casing.

Question:

How can I ensure name compatibility without having to update my code?

Answer:

You will of course need to update the column names. To do that, we use an old dataset whose columns represent your canonical names, plus any dataset that you want to bring in line with those names. A small lookup function handles the translation. Note that it is safe to apply to both old and new datasets to ensure you get the correct names. Note as well that if a new dataset has a column that is not in the canon, its name passes through unchanged.

def getcanonicalnames(noncannon, canon):
    canon = {name.lower(): name for name in canon}
    return [canon.get(name.lower(), name) for name in noncannon]

With a method like this we can now do:

import pandas as pd

def getcanonicalnames(noncannon, canon):
    canon = {name.lower(): name for name in canon}
    return [canon.get(name.lower(), name) for name in noncannon]

dfold = pd.DataFrame(
    [
        ['2024-06-21 06:22:38', 22958, 605.968389, 0.994548],
        ['2024-06-21 06:22:39', 22959, 616.009398, 0.983443],
        ['2024-06-21 06:22:40', 22960, 624.630573, 0.973647],
        ['2024-06-21 06:22:41', 22961, 633.476367, 1.017651],
        ['2024-06-21 06:22:42', 22962, 642.322161, 5.017651]
    ],
    columns=['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']
)

dfnew = pd.DataFrame(
    [
        ['2024-06-21 06:22:38', 22958, 605.968389, 0.994548],
        ['2024-06-21 06:22:39', 22959, 616.009398, 0.983443],
        ['2024-06-21 06:22:40', 22960, 624.630573, 0.973647],
        ['2024-06-21 06:22:41', 22961, 633.476367, 1.017651],
        ['2024-06-21 06:22:42', 22962, 642.322161, 5.017651]
    ],
    columns=['SAMPLETIME', 'UTCS', 'SENSORDATA1', 'SENSORDATA2']
)

# Translate both frames; the old one is already canonical, so it is unchanged
dfnew.columns = getcanonicalnames(dfnew.columns, dfold.columns)
dfold.columns = getcanonicalnames(dfold.columns, dfold.columns)

print(dfold)
print(dfnew)

and get:

            SampleTime   UTCs  SensorData1  SensorData2
0  2024-06-21 06:22:38  22958   605.968389     0.994548
1  2024-06-21 06:22:39  22959   616.009398     0.983443
2  2024-06-21 06:22:40  22960   624.630573     0.973647
3  2024-06-21 06:22:41  22961   633.476367     1.017651
4  2024-06-21 06:22:42  22962   642.322161     5.017651
            SampleTime   UTCs  SensorData1  SensorData2
0  2024-06-21 06:22:38  22958   605.968389     0.994548
1  2024-06-21 06:22:39  22959   616.009398     0.983443
2  2024-06-21 06:22:40  22960   624.630573     0.973647
3  2024-06-21 06:22:41  22961   633.476367     1.017651
4  2024-06-21 06:22:42  22962   642.322161     5.017651
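
To illustrate the pass-through behavior mentioned in the answer, here is a small sketch that applies the same lookup to plain lists. The `ExtraColumn` name is a made-up example of a column that is missing from the canonical list, and the function is repeated so the snippet runs on its own.

```python
def getcanonicalnames(noncannon, canon):
    # Map each lower-cased canonical name back to its canonical spelling.
    canon = {name.lower(): name for name in canon}
    # Names without a case-insensitive match are returned unchanged.
    return [canon.get(name.lower(), name) for name in noncannon]

canonical = ['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']
incoming = ['SAMPLETIME', 'utcs', 'SensorData1', 'ExtraColumn']  # hypothetical input

print(getcanonicalnames(incoming, canonical))
# ['SampleTime', 'UTCs', 'SensorData1', 'ExtraColumn']
```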

Addendum:

If this was something I was going to do a lot, I would probably look to use a closure to define my method so I did not have to keep referencing the list of canonical names. Maybe something like:

def getcanonicalnamesbuilder(canon):
    canon = {name.lower(): name for name in canon}
    return lambda noncannon: [canon.get(name.lower(), name) for name in noncannon]

This takes our list of canonical names and returns a method that will accept a list of names and convert them. This returned method works just like the old one but now we need only one parameter.

Use like:

import pandas as pd

def getcanonicalnamesbuilder(canon):
    canon = {name.lower(): name for name in canon}
    return lambda noncannon: [canon.get(name.lower(), name) for name in noncannon]

dfold = pd.DataFrame(
    [
        ['2024-06-21 06:22:38', 22958, 605.968389, 0.994548],
        ['2024-06-21 06:22:39', 22959, 616.009398, 0.983443],
        ['2024-06-21 06:22:40', 22960, 624.630573, 0.973647],
        ['2024-06-21 06:22:41', 22961, 633.476367, 1.017651],
        ['2024-06-21 06:22:42', 22962, 642.322161, 5.017651]
    ],
    columns=['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']
)

dfnew = pd.DataFrame(
    [
        ['2024-06-21 06:22:38', 22958, 605.968389, 0.994548],
        ['2024-06-21 06:22:39', 22959, 616.009398, 0.983443],
        ['2024-06-21 06:22:40', 22960, 624.630573, 0.973647],
        ['2024-06-21 06:22:41', 22961, 633.476367, 1.017651],
        ['2024-06-21 06:22:42', 22962, 642.322161, 5.017651]
    ],
    columns=['SAMPLETIME', 'UTCS', 'SENSORDATA1', 'SENSORDATA2']
)

# Bind the canonical names once, then reuse the one-argument function
getcanonicalnames = getcanonicalnamesbuilder(dfold.columns)

dfnew.columns = getcanonicalnames(dfnew.columns)
dfold.columns = getcanonicalnames(dfold.columns)

print(dfold)
print(dfnew)
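
Since `DataFrame.rename` accepts a callable for `columns`, the closure idea could also be expressed per name rather than per list. The following is a sketch of that variant, not the approach used above; `makecanonicalizer`, `tocanonical`, and `dfupper` are hypothetical names. Unlike assigning to `.columns`, `rename` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

def makecanonicalizer(canon):
    # Lower-cased canonical name -> canonical spelling.
    canon = {name.lower(): name for name in canon}
    # The returned function converts a single column name.
    return lambda name: canon.get(name.lower(), name)

tocanonical = makecanonicalizer(['SampleTime', 'UTCs', 'SensorData1', 'SensorData2'])

dfupper = pd.DataFrame(
    [['2024-06-21 06:22:38', 22958, 605.968389, 0.994548]],
    columns=['SAMPLETIME', 'UTCS', 'SENSORDATA1', 'SENSORDATA2'],
)

# rename() calls the function once per column label and returns a copy.
dfrenamed = dfupper.rename(columns=tocanonical)
print(dfrenamed.columns.tolist())
# ['SampleTime', 'UTCs', 'SensorData1', 'SensorData2']
```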
February 26, 2026 Score: 0 Rep: 21 Quality: Low Completeness: 50%

If you have a predefined list of all the new column names of the dataset, you can lowercase both lists to compare them, then replace each old name with the matching one from the new dataset.

oldcolumns = ["APPLEDATA","ORANGEDATA","BANNANADATA"]
newcolumns = ["AppleData","BannanaData","OrangeData"]

oldcolumnsreplacement = []

for oldcolumnsitem in oldcolumns:
    for newcolumnsitem in newcolumns:
        if oldcolumnsitem.lower() == newcolumnsitem.lower():
            oldcolumnsreplacement.append(newcolumnsitem)
            break

print(oldcolumnsreplacement)

turns ["APPLEDATA","ORANGEDATA","BANNANADATA"] into ['AppleData', 'OrangeData', 'BannanaData']
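
One thing to watch for: if an old column has no counterpart in the new list, the loop appends nothing for it, so the result ends up shorter than the input. A possible variant, sketched here with a dictionary lookup and a hypothetical `GRAPEDATA` column that has no match, keeps the original name in that case so the list stays aligned with the DataFrame's columns.

```python
oldcolumns = ["APPLEDATA", "ORANGEDATA", "BANNANADATA", "GRAPEDATA"]  # GRAPEDATA is hypothetical
newcolumns = ["AppleData", "BannanaData", "OrangeData"]

# Lower-cased name -> preferred spelling, built once instead of rescanning.
lookup = {name.lower(): name for name in newcolumns}

# Fall back to the original name when there is no case-insensitive match.
oldcolumnsreplacement = [lookup.get(name.lower(), name) for name in oldcolumns]

print(oldcolumnsreplacement)
# ['AppleData', 'OrangeData', 'BannanaData', 'GRAPEDATA']
```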

February 26, 2026 Score: 0 Rep: 167 Quality: Low Completeness: 60%

Create a dictionary that maps lowercase column names to the actual column names present in the DataFrame.

import pandas as pd

def runfirmwarecalculation(df):
    # Step 1: Build the map { 'sensordata1': 'SENSORDATA1' (or 'SensorData1') }
    colmap = {c.lower(): c for c in df.columns}

    # Step 2: Extract columns using the map
    # We use lowercase keys to find the actual mixed/upper case names
    s1 = df[colmap['sensordata1']]
    s2 = df[colmap['sensordata2']]

    # Step 3: Perform calculation
    df['Result'] = (s1 - s2) / (s1 + s2)
    return df

# --- TEST CASE: NEW FIRMWARE (All Uppercase) ---

dfnew = pd.DataFrame([['2024-06-21 06:22:38', 22958, 605.968389, 0.994548],
                      ['2024-06-21 06:22:39', 22959, 616.009398, 0.983443],
                      ['2024-06-21 06:22:40', 22960, 624.630573, 0.973647],
                      ['2024-06-21 06:22:41', 22961, 633.476367, 1.017651],
                      ['2024-06-21 06:22:42', 22962, 642.322161, 5.017651]],
                     columns=['SAMPLETIME', 'UTCS', 'SENSORDATA1', 'SENSORDATA2'])

print(runfirmwarecalculation(dfnew))

Output is:

            SAMPLETIME   UTCS  SENSORDATA1  SENSORDATA2    Result
0  2024-06-21 06:22:38  22958   605.968389     0.994548  0.996723
1  2024-06-21 06:22:39  22959   616.009398     0.983443  0.996812
2  2024-06-21 06:22:40  22960   624.630573     0.973647  0.996887
3  2024-06-21 06:22:41  22961   633.476367     1.017651  0.996792
4  2024-06-21 06:22:42  22962   642.322161     5.017651  0.984498
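
The lowercase map assumes that no two columns differ only by case and that every expected column is present. A defensive variant might look like the following sketch; the helper name `buildcolmap`, the `required` parameter, and the error messages are illustrative choices, not part of the answer above.

```python
import pandas as pd

def buildcolmap(df, required):
    """Return {lowercase name: actual name}, checking both assumptions."""
    colmap = {}
    for col in df.columns:
        key = col.lower()
        if key in colmap:
            # Two columns that differ only by case would be ambiguous.
            raise ValueError(f"ambiguous columns: {colmap[key]!r} and {col!r}")
        colmap[key] = col
    # Report every expected column that has no case-insensitive match.
    missing = [name for name in required if name.lower() not in colmap]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return colmap

dfcheck = pd.DataFrame([[605.968389, 0.994548]],
                       columns=['SENSORDATA1', 'SENSORDATA2'])
colmap = buildcolmap(dfcheck, ['SensorData1', 'SensorData2'])
print(colmap)
# {'sensordata1': 'SENSORDATA1', 'sensordata2': 'SENSORDATA2'}
```

Failing early with a named column tends to be easier to debug than a bare `KeyError` raised from inside the calculation.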