No, no, n, N, nO --> 0 and Y, y, Yes, yEs, yeS --> 1 Before LabelEncoder

You'd think they'd have more user-friendly stuff, but nope

scikit-learn's LabelEncoder can comfortably do Y,N,N,N,Y,Y --> 1,0,0,0,1,1, and the same for Yes, No. But it treats every distinct string as its own class, so a column mixing No, no, n and nO ends up with a separate label for each spelling, which a human would take in her stride.

What *can* one do?

Transform the input data column, that's what, to make it edible for LabelEncoder:

If y is still a pd.Series, just do:

# Collapse anything starting with n/N or y/Y to a single upper-case N or Y
y = y.str.replace(
    r'^[nNyY].*',
    lambda m: m.group(0)[0].upper(),
    regex=True
)
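Not a pd.Series at all, just a plain Python list? The same trick can be sketched with the standard `re` module. This is only an illustration: `normalize_yes_no` is a made-up helper name, and the `.strip()` call is an added assumption that stray whitespace might be lurking in the data, which the Series snippet does not handle.

```python
import re

# Same pattern as the pandas version: a leading n/N/y/Y plus whatever follows.
_YES_NO = re.compile(r'^[nNyY].*')

def normalize_yes_no(values):
    """Hypothetical helper: 'no', 'nO', 'n' -> 'N' and 'yes', 'yEs', 'y' -> 'Y'."""
    return [
        # Keep only the first character of the match, upper-cased.
        _YES_NO.sub(lambda m: m.group(0)[0].upper(), v.strip())
        for v in values
    ]

raw = ['No', 'no', 'n', 'N', 'nO', 'Y', 'y', 'Yes', 'yEs', 'yeS', ' yes ']
clean = normalize_yes_no(raw)
print(clean)
```

Bear in mind that the pattern keys on the first letter only, so 'nope' and 'yesterday' would collapse to N and Y as well. Anything starting with another letter passes through untouched.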

And that's it. You're then cleared to do:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(y)  # classes are sorted, so N --> 0 and Y --> 1
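To see both steps working together, here is a minimal end-to-end sketch. The sample values are made up for illustration; the only thing it relies on beyond the snippets above is that LabelEncoder sorts its classes, which is why N lands on 0 and Y on 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up sample column with messy yes/no spellings.
y = pd.Series(['No', 'yes', 'n', 'Y', 'nO', 'yEs'])

# Step 1: collapse every spelling to a single upper-case N or Y.
y = y.str.replace(
    r'^[nNyY].*',
    lambda m: m.group(0)[0].upper(),
    regex=True
)

# Step 2: encode. Classes are sorted, so N --> 0 and Y --> 1.
le = LabelEncoder()
encoded = le.fit_transform(y)

print(list(le.classes_))   # ['N', 'Y']
print(encoded.tolist())    # [0, 1, 0, 1, 0, 1]
```

If you need the letters back later, `le.inverse_transform(encoded)` undoes the encoding, though it returns the cleaned N and Y, not the original spellings.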

You, gentle reader, will easily be able to extend this to the case of Male, Female 😊
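For the impatient, one possible way to do that extension is sketched below. It assumes the column only ever holds spellings that start with m/M or f/F, and the sample values are invented.

```python
import pandas as pd

# Made-up sample column with messy Male/Female spellings.
sex = pd.Series(['Male', 'female', 'M', 'f', 'MALE', 'fEmale'])

# Same idea as before, with the character class swapped to m/M/f/F.
sex = sex.str.replace(
    r'^[mMfF].*',
    lambda m: m.group(0)[0].upper(),
    regex=True
)

print(sex.tolist())  # ['M', 'F', 'M', 'F', 'M', 'F']
```

Since LabelEncoder sorts its classes, feeding this column to it would give F --> 0 and M --> 1.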
