No, no, n, N, nO --> 0 and Y, y, Yes, yEs, yeS --> 1 Before LabelEncoder
You'd think they'd have more user-friendly stuff, but nope
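scikit-learn's LabelEncoder can comfortably do Y,N,N,N,Y,Y --> 1,0,0,0,1,1, and the same for Yes, No. But it treats every distinct string as its own class, so a column mixing No, no, n and N comes out as four separate labels rather than one. A human would take that in her stride; LabelEncoder will not.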
What *can* one do?
Transform the input data column, that's what - to make it edible for LabelEncoder:
If your column is still a pd.Series, just do:
y = y.str.replace(
    r'^[nNyY].*',
    lambda m: m.group(0)[0].upper(),
    regex=True
)
And that's it - you're then cleared to do
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y = le.fit_transform(y)
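Here is the whole thing end to end, as a minimal sketch. The sample values and the names `y_clean` and `encoded` are made up for illustration; everything else is the recipe above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample column with mixed spellings of yes/no
y = pd.Series(["No", "no", "n", "N", "nO", "Y", "y", "Yes", "yEs", "yeS"])

# Collapse any value starting with n/N or y/Y to its upper-cased first letter
y_clean = y.str.replace(
    r"^[nNyY].*",
    lambda m: m.group(0)[0].upper(),
    regex=True,
)

le = LabelEncoder()
encoded = le.fit_transform(y_clean)

print(le.classes_.tolist())  # ['N', 'Y']
print(encoded.tolist())      # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

LabelEncoder sorts its classes, so N lands on 0 and Y on 1. Note that the pattern is anchored at the start of the string: a value with leading whitespace, or one that doesn't start with n or y at all, passes through unchanged and ends up as its own class.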
You, gentle reader, will easily be able to extend this to the case of Male, Female 😊
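In case you'd rather not do the homework, one possible way to extend it is sketched below. The sample values and the names `sex`, `sex_clean` and `codes` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample column with mixed spellings of male/female
sex = pd.Series(["Male", "female", "M", "f", "mALE", "FEMALE"])

# Same trick as before: keep only the upper-cased first letter
sex_clean = sex.str.replace(
    r"^[mMfF].*",
    lambda m: m.group(0)[0].upper(),
    regex=True,
)

le = LabelEncoder()
codes = le.fit_transform(sex_clean)

print(le.classes_.tolist())  # ['F', 'M']
print(codes.tolist())        # [1, 0, 1, 0, 1, 0]
```

Because the classes are sorted alphabetically, F comes out as 0 and M as 1 here.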