Posts

Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers

Wow - giving it away for free in the hope that it helps build a better world: https://core.ac.uk/reader/334586725

Chapters:
Machine Learning
Machine Learning and Knowledge Discovery
Support Vector Machines for Classification
Support Vector Regression
Hidden Markov Model
Bioinspired Computing: Swarm Intelligence
Deep Neural Networks
Cortical Algorithms
Deep Learning
Multiobjective Optimization
Machine Learning in Action: Examples

Your Handy ML Reference

Support Vector Machines for Classification - Classifying data points. Applications: Image recognition
Support Vector Regression - Predicting continuous values. Applications: Stock price forecasting
Hidden Markov Model - Modeling sequences. Applications: Speech recognition
Bioinspired Computing: Swarm Intelligence - Distributed optimization. Applications: Robotics
Deep Neural Networks - Multi-layer perceptron. Applications: Language translation
Cortical Algorithms ...

AttributeError: 'Series' object has no attribute 'reshape' - When You Want to Concatenate Two Columns for Looking at LR Model Performance (Error)

y_test is not an ndarray. It's a pandas.core.series.Series object. Who did that to me? It comes out of train_test_split (sklearn.model_selection), which was given a y that is a Series and politely returned the same type. What happened? When you created your y from the input data, did you leave out the .values? I did. I had left it out intentionally when creating X, so that I could use a cute snippet to find the non-numeric columns to subject to one-hot encoding automatically (without visual inspection - you know me, I'm Mr. Automation). And, typing stuff manually (a good reason to use a template and make edits), I did the same when creating y. If you have y = dataset.iloc[:, -1].values, you get an ndarray and all is well. Be warned :) Why do we care? Because, to concatenate two vectors as two columns side by side, you need to use reshape: cmp_matrix = np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), axis=1)
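Here is a minimal sketch of the situation and one way out of it. The toy DataFrame and the fake y_pred are made up for illustration (they stand in for the real dataset and for model.predict); the point is only that a Series goes in, a Series comes out, and converting it to an ndarray makes reshape available again.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real dataset
dataset = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [1.1, 1.9, 3.2, 3.9, 5.1, 6.2],
})
X = dataset.iloc[:, :-1]   # kept as a DataFrame on purpose
y = dataset.iloc[:, -1]    # a Series, because .values was left out

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)
print(type(y_test))        # a pandas Series, not an ndarray

# Stand-in for model.predict(X_test), which returns an ndarray
y_pred = y_test.to_numpy() + 0.1

# Convert the Series to an ndarray before reshaping
y_test_arr = y_test.to_numpy()
cmp_matrix = np.concatenate(
    (y_pred.reshape(-1, 1), y_test_arr.reshape(-1, 1)), axis=1
)
print(cmp_matrix)
```

Using reshape(-1, 1) lets NumPy work out the row count, so there is no need to pass len(...) by hand.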

Backward Elimination in Linear Regression Model Building

Backward Elimination
1. Select a significance level to stay in the model (e.g. SL = 0.05).
2. Fit the full model with all possible predictors.
3. Consider the predictor with the highest P-value. If P > SL, go to step 4; otherwise go to FIN.
4. Remove the predictor.
5. Fit the model without this variable, then go back to step 3.
FIN - you're done. Congratulations - you've applied LR to build an ML model!

If you're using Scikit-Learn, be aware that LinearRegression simply fits every feature you hand it and does not report P-values, so it will not pick out the statistically significant features for you. If you want to see how BE is done, check out HdP's videos on DropBox.
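The steps above can be sketched in a few lines. This is only an illustration under my own assumptions: the helper names (ols_p_values, backward_elimination) and the synthetic data are made up, the P-values come from the usual two-sided t-test on ordinary least squares coefficients, and the intercept is always kept. statsmodels reports the same P-values via OLS(...).fit().pvalues if you would rather not compute them by hand.

```python
import numpy as np
from scipy import stats

def ols_p_values(design, y):
    """Two-sided t-test P-values for each OLS coefficient of `design`."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - k
    sigma2 = residuals @ residuals / dof           # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof)

def backward_elimination(X, y, names, sl=0.05):
    """Drop the predictor with the highest P-value until all are <= sl."""
    design = np.column_stack([np.ones(len(X)), X])  # column 0 is the intercept
    kept = list(range(X.shape[1]))                  # predictor indices still in the model
    while kept:
        cols = [0] + [i + 1 for i in kept]
        p = ols_p_values(design[:, cols], y)[1:]    # skip the intercept's P-value
        worst = int(np.argmax(p))
        if p[worst] <= sl:
            break                                   # FIN: everything left is significant
        kept.pop(worst)                             # remove the predictor and refit
    return [names[i] for i in kept]

# Synthetic example: y depends on x0 and x1 only; x2 is pure noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

selected = backward_elimination(X, y, ["x0", "x1", "x2"])
print(selected)
```

Since x2 has no real effect, it is usually the one removed, but at SL = 0.05 a pure-noise predictor still survives about one time in twenty.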

The Dummy Variable Trap

Watching this lecture, I felt like I was seeing a case of the right hand not knowing what the left hand was doing: Hadelin de Ponteves had already pointed out the need to transform a "name" or "state" feature using one-hot encoding with ColumnTransformer, and KE is doing the same thing and calling it the creation of dummy variables. Cool stuff - always drop one of the "dummy variables" you generate using one-hot encoding, because the dropped one is implied by the others, and keeping it makes the dummy columns perfectly collinear with the intercept. chatG: Tools like pandas.get_dummies (with drop_first=True) and OneHotEncoder (with drop='first') in sklearn can automatically exclude one dummy variable.
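A quick sketch of both options, using a made-up DataFrame with a "state" column and a "spend" column (the names and values are only for illustration). With three distinct states, each approach leaves two dummy columns instead of three.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data
df = pd.DataFrame({
    "state": ["New York", "California", "Florida", "California"],
    "spend": [10.0, 20.0, 30.0, 40.0],
})

# pandas: drop_first=True removes the first category's dummy column
dummies = pd.get_dummies(df, columns=["state"], drop_first=True)
print(dummies.columns.tolist())

# scikit-learn: drop='first' does the same inside the encoder
encoder = OneHotEncoder(drop="first")
encoded = encoder.fit_transform(df[["state"]]).toarray()
print(encoder.categories_)
print(encoded)
```

The dropped category becomes the baseline: a row with zeros in all remaining dummy columns belongs to it.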

When Can You Safely Use Linear Regression?

According to Kirill, only when you have:
1. Linear relationship
2. Homoscedasticity (equal variance)
3. Multivariate normality (a bimodal distribution would be a disqualifier)
4. Independence - lack of autocorrelation (a stock price depends on its past values, so it fails this one)
5. Lack of multicollinearity - the independent variables should not be strongly correlated with each other
6. Lack of outliers
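Some of these conditions can be checked numerically. As one example, here is a rough sketch of a multicollinearity check using the variance inflation factor (VIF), which equals the diagonal of the inverse of the predictors' correlation matrix. The vif helper and the synthetic columns are my own assumptions, not something from the course; a common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    corr = np.corrcoef(X, rowvar=False)   # correlation matrix of the predictors
    return np.diag(np.linalg.inv(corr))   # VIF_j = 1 / (1 - R_j^2)

# Synthetic predictors: x2 is almost a copy of x0, x1 is independent
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + 0.1 * rng.normal(size=500)

factors = vif(np.column_stack([x0, x1, x2]))
for name, value in zip(["x0", "x1", "x2"], factors):
    print(f"{name}: VIF = {value:.1f}")
```

Here x0 and x2 should show large values because each is nearly determined by the other, while x1 should stay close to 1.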

What these Courses Don't Teach You

Given some data, sure, you can follow what they tell you about cleaning it and using it to predict values for new inputs. But how are you supposed to generate the data in the first place? A friend who used to work at Amazon said you need to invest in generating the data. Maybe chatG can suggest ways to generate data based on the problem you're trying to solve. Which course can teach you to do something like what Google DeepMind did - train an AI to play a game by playing against a copy of itself? Tough?