Thursday, June 30, 2016

Advice for Matlab Users on Python

1. (5000L, 1L) vs. (5000L,)

When an operation yields only a single column or row (for example, indexing one column or summing along an axis), NumPy returns a directionless 1D array of shape (n,) instead of a 1xn or nx1 matrix. If you expect an nx1 matrix, you must reshape the result yourself.

A.reshape(n,1)
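
As a minimal sketch of the point above (the array X and its 4x3 size are made up for illustration), the following shows where the axis is dropped and a few ways to keep or restore it:

```python
import numpy as np

X = np.arange(12.0).reshape(4, 3)        # a 4x3 matrix of illustrative values

col = X[:, 0]                            # an integer index drops the axis
print(col.shape)                         # (4,)

col_2d = col.reshape(-1, 1)              # -1 lets NumPy infer n
print(col_2d.shape)                      # (4, 1)

col_slice = X[:, 0:1]                    # a slice keeps the axis, so no reshape is needed
row_sums = X.sum(axis=1, keepdims=True)  # keepdims=True keeps the reduced axis
print(col_slice.shape, row_sums.shape)   # (4, 1) (4, 1)
```

Using -1 in reshape avoids having to carry n around; slicing and keepdims avoid losing the axis in the first place.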

2. 1/m vs 1./m

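Be careful: in Python 2, typing 1/m returns an integer when m is an integer, because / between two integers is floor division. Play safe and always add a dot to the dividend. (In Python 3, / always returns a float.)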

1./m
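
A small sketch of the difference, assuming m is an integer such as a sample count (the value 4 is arbitrary). The // operator is used here to reproduce, on any Python version, what Python 2 does for 1/m:

```python
m = 4  # an integer, e.g. a sample count (illustrative value)

true_quotient = 1. / m    # float dividend: true division on Python 2 and 3
floor_quotient = 1 // m   # floor division: the result Python 2 gives for 1/m
as_float = 1 / float(m)   # converting the divisor to float also works

print(true_quotient)      # 0.25
print(floor_quotient)     # 0
print(as_float)           # 0.25
```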

3. (y==k)*1

Boolean operations in MATLAB return 0 or 1, which is convenient for further calculation. In NumPy, a comparison returns an array of True/False values; you can cast them to 0 or 1 by multiplying by one.

(y==k)*1
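
A short sketch of the cast, with made-up labels y and class k. The astype(int) call is an alternative I am adding here, not something from the original tip; it does the same thing more explicitly:

```python
import numpy as np

y = np.array([1, 2, 3, 2, 1])  # illustrative labels
k = 2                          # illustrative class to match

mask = (y == k)                # boolean array
as_int = mask * 1              # the trick above: True/False become 1/0
as_int_explicit = mask.astype(int)  # explicit equivalent

print(mask.dtype)              # bool
print(as_int.tolist())         # [0, 1, 0, 1, 0]
```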

4. lambda = 3 vs lambda = 3.

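Again, always assign the real number 3. rather than the integer 3 to a variable if it will be plugged into a formula whose result should be real. Otherwise, Python 2 truncates the result of an integer division to an integer. Note also that lambda is a reserved word in Python, so the variable needs another name, such as lam.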

5. Fancy indexing in Pandas DataFrame returns a copy

If you extract rows from a DataFrame by a condition using chained indexing, pandas may return a copy of the data rather than a view. Do not assign anything into it, because the assignment may never reach the original DataFrame (pandas typically warns about this). If you need to assign, use the .loc syntax.

df['a'][df['b'] > 0.5] = 1      # chained indexing: may assign to a copy
df.loc[df['b'] > 0.5, 'a'] = 1  # correct: a single .loc assignment
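
To make the .loc form concrete, here is a minimal, self-contained sketch. The DataFrame contents are invented for illustration; the explicit .copy() at the end is one way, under these assumptions, to work on a filtered subset without touching the original:

```python
import pandas as pd

# Illustrative data: column names follow the snippet above
df = pd.DataFrame({"a": [0, 0, 0], "b": [0.2, 0.7, 0.9]})

# One .loc call selects the rows and the column, then assigns in place
df.loc[df["b"] > 0.5, "a"] = 1
print(df["a"].tolist())        # [0, 1, 1]

# If a separate, modifiable subset is what you want, copy it explicitly
subset = df[df["b"] > 0.5].copy()
subset["a"] = 5                # changes subset only, df is untouched
print(df["a"].tolist())        # [0, 1, 1]
```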

6. Strange behavior of Pandas mode function

Pandas' mode function returns every value tied for most frequent (as a Series or DataFrame) rather than a single value; see the API for details. If you simply want a single majority vote from a sample, consider the mode function in scipy.stats instead.

from scipy.stats import mode
mode(y)[0][0]  # on recent SciPy releases, where mode returns a scalar for 1D input, use mode(y)[0]
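
The difference between the two functions can be sketched as follows, using a made-up sample with a tie. The np.atleast_1d wrapper is my own addition, meant to cope with SciPy versions that return either an array or a scalar for 1D input:

```python
import numpy as np
import pandas as pd
from scipy import stats

y = np.array([1, 1, 2, 2, 3])        # illustrative sample: 1 and 2 are tied

pandas_modes = pd.Series(y).mode()   # a Series holding every tied mode
print(pandas_modes.tolist())         # [1, 2]

# SciPy reports one value only; atleast_1d handles both array and scalar results
single_mode = np.atleast_1d(stats.mode(y).mode)[0]
print(single_mode)                   # 1

# Staying within pandas, taking the first entry also gives a single value
first_pandas_mode = pandas_modes.iloc[0]
```

Either approach breaks ties by taking the smallest value, so pick whichever keeps your dependencies simpler.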
