This error can happen when you try to compare a pandas column to a list but they have different lengths. For instance, given a dataframe as follows:

    import pandas as pd

    df = pd.DataFrame({'s': [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]})
    df

    #            s
    # 0  [1, 2, 3]
    # 1  [2, 3, 4]
    # 2  [3, 4, 5]
    # 3  [4, 5, 6]

Suppose you want to find which rows contain the list [1, 2, 3]. If you run:

 df.s == [1,2,3] 

You will get an error:

ValueError: Lengths must match to compare

The error happens because the column df.s (a pandas Series) has a length of 4, while the list it is compared to has a length of 3. pandas vectorizes the comparison, i.e. it compares element by element, so it requires both sides to have the same length. It has no way of knowing that you want to compare the right-hand list as a whole against every element of the left-hand column.
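
To make the element-by-element behavior concrete, here is a minimal sketch using a made-up numeric Series (the names `nums`, `same_len` and `message` are only for illustration). A list of the same length is compared position by position, while a list of a different length raises the error. The exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical numeric Series, used only to illustrate the behavior.
nums = pd.Series([1, 2, 3])

# Same length: each position is compared with the matching list item.
same_len = nums == [1, 2, 4]
print(same_len.tolist())  # [True, True, False]

# Different length: pandas cannot pair the items up, so it raises.
try:
    nums == [1, 2]
    message = None
except ValueError as err:
    message = str(err)
print(message)
```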

Solution:

The solution is to make everything explicit so that pandas doesn't have to guess what you want to do. The obvious approach is to use apply to loop over each element in the column and compare them individually:

    df.s.apply(lambda x: x == [1, 2, 3])

    # 0     True
    # 1    False
    # 2    False
    # 3    False
    # Name: s, dtype: bool
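
Because the result is a boolean Series, it can be used directly as a mask to pull out the matching rows. The sketch below rebuilds the example dataframe; the names `target`, `mask` and `matches` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'s': [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]})

# The list we are looking for, compared as a whole against each cell.
target = [1, 2, 3]
mask = df.s.apply(lambda x: x == target)

# Keep only the rows where the whole list matched.
matches = df[mask]
print(matches.index.tolist())  # [0]
```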

This is the recommended approach because it is unambiguous and also pretty fast. There is also a trickier approach: convert each list element to a tuple. This works because pandas is smart enough to treat a tuple as a single element rather than broadcasting over the items in the tuple.

    df.s.map(tuple) == tuple([1, 2, 3])

    # 0     True
    # 1    False
    # 2    False
    # 3    False
    # Name: s, dtype: bool

I made a simple benchmark here to compare the performance if you are interested: https://akuiper.com/console/EqA6RIgm8KB5
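
A rough local comparison of the two approaches can also be sketched with the standard timeit module. This is not the benchmark from the link: the repeated data, the repeat count, and the helper names `with_apply` and `with_tuple` are assumptions, and timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

# Assumed test data: the example rows repeated to get a larger column.
rows = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]] * 2500
big = pd.DataFrame({'s': rows})

def with_apply():
    # Explicit per-element comparison against the whole list.
    return big.s.apply(lambda x: x == [1, 2, 3])

def with_tuple():
    # Convert each list to a tuple, then compare against a tuple.
    return big.s.map(tuple) == (1, 2, 3)

# Both approaches should agree before timing them.
assert with_apply().equals(with_tuple())

print('apply:', timeit.timeit(with_apply, number=20))
print('tuple:', timeit.timeit(with_tuple, number=20))
```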

