
Python Coding

Some lessons from my Python experience

General tips

Batch-run tips

Passing arguments from the command line (building a command-line interface into the script)

Ref: {Link}
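The original snippet is not preserved here. A minimal sketch using the standard-library argparse module, assuming a hypothetical script with an input folder, a batch size and a verbose flag (all option names are illustrative), so the same script can be batch-run with different settings:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical options for illustration; rename them to fit your own script.
    parser = argparse.ArgumentParser(description="Batch-run example")
    parser.add_argument("--input-dir", default="./data", help="folder to process")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--verbose", action="store_true")
    # argv=None makes argparse read sys.argv[1:]; passing a list is handy for testing.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # "--input-dir" becomes the attribute args.input_dir
    print(args.input_dir, args.batch_size, args.verbose)
```

Usage would look like `python run.py --input-dir ./exp1 --batch-size 64`, and a shell loop over different `--batch-size` values then batch-runs the same script without editing it.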

*args, **kwargs

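{Ref: Jianshu}
*args collects the extra positional arguments into a tuple.

**kwargs collects the extra keyword arguments into a dict.
The names args and kwargs are only conventions, like self in a class definition; the * and ** are what matter.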
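A small sketch of both sides of the syntax: collecting arguments in a definition, and unpacking a list/dict at the call site. The function names are hypothetical:

```python
def show_args(*args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dict of extra keyword arguments
    return args, kwargs


def add_all(*numbers, scale=1):
    # scale is keyword-only because it comes after *numbers
    return sum(numbers) * scale


print(show_args(1, 2, a=3))           # → ((1, 2), {'a': 3})

params = {"scale": 10}
# * and ** at the call site unpack a list and a dict into arguments
print(add_all(*[1, 2, 3], **params))  # → 60
```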

@ syntactic sugar (decorators): a one-line power tool

【Python syntactic sugar and Aspect-oriented Programming - Zhihu】

My understanding now:

This is equivalent to:
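The original code for the two forms is missing. A minimal sketch of the equivalence, assuming a hypothetical decorator `log_calls` that records calls: writing `@log_calls` above a function does the same thing as reassigning `f = log_calls(f)` after a plain definition.

```python
import functools


def log_calls(func):
    # Wrap func so every call is recorded before running it
    @functools.wraps(func)  # keep the original __name__ and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append(func.__name__)
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls
def square(x):
    return x * x


# Same effect as the @ line above, written out by hand:
def cube(x):
    return x ** 3


cube = log_calls(cube)

print(square(3), square.calls)  # → 9 ['square']
print(cube(2), cube.calls)      # → 8 ['cube']
```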

glob: batch file processing

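glob is Python's built-in module for finding files by pattern; load it with: import glob
glob.glob quickly finds matching files and returns all of them at once as a list
glob.iglob returns an iterator over the matching files, which saves memory when processing many files in batches (recommended)
【Uses of glob in Python - CSDN】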
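A self-contained sketch of the difference; the temporary folder and file names are made up so the example can run anywhere:

```python
import glob
import os
import tempfile

# Build a throwaway folder with a few files so the example is self-contained
tmp = tempfile.mkdtemp()
for name in ["a.csv", "b.csv", "notes.txt"]:
    with open(os.path.join(tmp, name), "w"):
        pass

# glob.glob: every match at once, as a list (order is not guaranteed, so sort)
csv_list = sorted(glob.glob(os.path.join(tmp, "*.csv")))
print([os.path.basename(p) for p in csv_list])  # → ['a.csv', 'b.csv']

# glob.iglob: a lazy iterator that yields one path at a time
for path in glob.iglob(os.path.join(tmp, "*.csv")):
    print("processing", os.path.basename(path))
```

For nested folders, a pattern containing `**` together with `recursive=True` also matches files in subdirectories.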

Plotting with matplotlib

Legend settings when there is more than one subplot

Use a figure legend

Place the legend on the figure, then transform its position into the coordinates of one Axes
Reference: https://www.cnblogs.com/Atanisi/p/8530693.html
Sample code is given in the reference above.
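The original sample code is not preserved here. One way to do it (not necessarily the referenced post's exact code): collect handles from every subplot into a single `fig.legend`, and pass `bbox_transform=ax2.transAxes` so that `bbox_to_anchor` is read in the right-hand subplot's Axes coordinates. The data and the anchor point are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the example runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(x, np.sin(x), label="sin")
ax2.plot(x, np.cos(x), "r--", label="cos")

# Gather handles/labels from every subplot into one list
handles, labels = [], []
for ax in (ax1, ax2):
    h, l = ax.get_legend_handles_labels()
    handles += h
    labels += l

# One legend for the whole figure. Thanks to bbox_transform, (1, 1) means
# the top-right corner of ax2 rather than of the figure.
leg = fig.legend(handles, labels, loc="upper right",
                 bbox_to_anchor=(1, 1), bbox_transform=ax2.transAxes)
fig.canvas.draw()
```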

Introduction to bbox:


Also very useful: a tutorial on inset subplots (zoomed-in views inside a plot)
https://blog.csdn.net/weixin_30898555/article/details/112699536

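Illustration of inset-subplot results
  1. bbox_to_anchor: the bounding box, a 4-tuple (x0, y0, width, height)
  2. bbox_transform: the geometric mapping from the parent coordinate system to the child coordinate system, i.e. the coordinates in which bbox_to_anchor is interpreted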
Illustration of bbox parameters
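A minimal zoom-in sketch using `inset_axes` and `mark_inset` from `mpl_toolkits.axes_grid1.inset_locator` (the linked tutorial may use a different route). The curve, the box (0.55, 0.6, 0.4, 0.35) and the zoom limits are illustrative assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1.inset_locator import inset_axes, mark_inset

x = np.linspace(0, 10, 500)
y = np.sin(x) + 0.1 * np.sin(20 * x)
fig, ax = plt.subplots()
ax.plot(x, y)

# bbox_to_anchor = (x0, y0, width, height), read in the parent's Axes
# coordinates because bbox_transform=ax.transAxes; the "100%" sizes are
# relative to that box, so the inset fills it (minus a small border pad).
axins = inset_axes(ax, width="100%", height="100%", loc="lower left",
                   bbox_to_anchor=(0.55, 0.6, 0.4, 0.35),
                   bbox_transform=ax.transAxes)
axins.plot(x, y)
axins.set_xlim(2, 3)      # region to magnify
axins.set_ylim(0.5, 1.2)

# Outline the zoomed region on the parent and connect it to the inset
mark_inset(ax, axins, loc1=2, loc2=4, fc="none", ec="0.5")
fig.canvas.draw()
```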

My work

My work using a figure legend and bbox transformation

Customized colorbar

Matplotlib provides an official colorbar-scaling guide {Link}, but it does not cover pandas usage such as df.plot(..., cmap=new_cmap). This Stack Overflow post {link} provides a reference for that, which I summarized into my own code.

Note that set_under and set_over need vmin and vmax (or an explicit norm) when plotting: values below vmin are drawn in the "under" color and values above vmax in the "over" color. Without them the color range simply fits the data, so nothing ever falls outside it. In my example, the first step is cyan and the last step is violet, with a stepped gradient in between.
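A minimal sketch of a stepped colormap with under/over colors passed to pandas' `df.plot.scatter`. The base colormap (viridis), the 5 steps, the cyan/violet extremes and the data are all assumptions for illustration, not the author's original code:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import ListedColormap


def stepped_cmap(base="viridis", n_steps=5, under="cyan", over="violet"):
    # Sample n_steps evenly spaced colors from a base colormap -> discrete steps
    colors = plt.get_cmap(base)(np.linspace(0, 1, n_steps))
    cmap = ListedColormap(colors)
    cmap.set_under(under)  # used for values below vmin
    cmap.set_over(over)    # used for values above vmax
    return cmap


df = pd.DataFrame({"x": np.arange(12),
                   "y": np.arange(12) ** 0.5,
                   "z": np.linspace(-2, 12, 12)})  # made-up data
new_cmap = stepped_cmap()

# vmin/vmax fix the color range, so z < 0 gets the under color
# and z > 10 gets the over color
ax = df.plot.scatter(x="x", y="y", c="z", cmap=new_cmap, vmin=0, vmax=10)
```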

pandas & numpy data processing

A powerful pandas combo

Manipulating data with mapping functions: apply, lambda, if

A previous post explained the differences between map, apply, and applymap.

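df.mode() may return more than one row, because a column can have several modes (ties); it returns a Series (for a Series) or a DataFrame (for a DataFrame) rather than a single value. A typical solution is s.fillna(s.mode()[0]) for one column, or df.fillna(df.mode().iloc[0]) for a whole DataFrame (df.mode()[0] would select a column named 0).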
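A short sketch of both fills on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "B", np.nan],
                   "grade": [1, 1, 2, np.nan]})  # hypothetical data

# Series.mode() returns a Series (ties are possible), so take the first value
s_filled = df["city"].fillna(df["city"].mode()[0])

# DataFrame.mode() returns a DataFrame (one row per mode, one column per column).
# .iloc[0] takes the first mode of every column as a Series, and fillna then
# fills each column with its own value.
df_filled = df.fillna(df.mode().iloc[0])
print(df_filled)
```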

df.apply() applies a function along each column (axis=0) or each row (axis=1)
df.applymap() applies a function element-wise to the whole DataFrame (renamed DataFrame.map() in pandas 2.1)
Series.map() works for dictionary replacement on a single column
df.groupby() is also commonly combined with these functions for quick feature selection.
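A compact sketch of the three mapping functions on made-up data. Because `applymap` was renamed `DataFrame.map` in pandas 2.1, the element-wise line picks whichever one exists:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})  # hypothetical data

# apply: the function receives a whole column (default axis=0) or row (axis=1)
col_range = df.apply(lambda col: col.max() - col.min())
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# element-wise on every cell: DataFrame.map (pandas >= 2.1) or applymap (older)
double = (lambda v: v * 2)
elementwise = df.map(double) if hasattr(df, "map") else df.applymap(double)

# Series.map with a dict: replace values by lookup
sex = pd.Series(["male", "female", "male"])
sex_code = sex.map({"male": 0, "female": 1})
```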

DataSci&MachLearn Workflow 1/4

In brief: {Ref-Zhihu}

  1. For a Series we have map().
    map() (and applymap() for a whole DataFrame) suits simple single-input functions; Series.map() also accepts a mapping dictionary. Usually combined with lambda.
  2. apply() handles more complex functions and can pass extra inputs via apply(func, args=(x,)).
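A minimal sketch of passing extra inputs through `apply`; the helper `clip_and_scale` is hypothetical. `args=` supplies extra positional arguments, and any other keyword argument is forwarded to the function as well:

```python
import pandas as pd


def clip_and_scale(x, upper, factor=1.0):
    # hypothetical helper: cap x at `upper`, then multiply by `factor`
    return min(x, upper) * factor


s = pd.Series([1, 5, 12])
out = s.apply(clip_and_scale, args=(10,), factor=0.5)
print(out.tolist())  # → [0.5, 2.5, 5.0]
```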

Combining apply with lambda and if takes pandas productivity to a whole new level. Check out these examples:

Eg.1. Data selection

Output:
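The original Eg.1 code and its output are not preserved. A minimal sketch of the same idea on made-up data: a conditional expression inside a lambda labels each value, and a row-wise lambda builds a boolean mask for selection. The column names and thresholds are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob", "Cid", "Dee"],
                   "age": [22, 35, 58, 17]})  # hypothetical data

# lambda + if/else inside apply: label every value
df["group"] = df["age"].apply(
    lambda a: "minor" if a < 18 else ("senior" if a >= 55 else "adult"))

# Row-wise condition (axis=1) -> boolean mask -> selected rows
mask = df.apply(lambda r: r["age"] > 20 and r["name"].startswith(("A", "C")), axis=1)
selected = df[mask]
print(selected)
```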

Eg.2. Information extraction
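The original Eg.2 code is not preserved. A minimal sketch, assuming Titanic-style "Surname, Title. Given name" strings (made up here): one lambda extracts the title, and a second lambda with if/else buckets the uncommon ones.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Braund, Mr. Owen",
                            "Cumings, Mrs. John",
                            "Heikkinen, Miss. Laina",
                            "Allen, Dr. William"]})  # hypothetical data

# Take the text between ", " and the following "."
df["Title"] = df["Name"].apply(lambda n: n.split(", ")[1].split(".")[0])

# Keep common titles, collapse everything else into "Rare"
df["Title"] = df["Title"].apply(lambda t: t if t in ("Mr", "Mrs", "Miss") else "Rare")
print(df)
```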

Slicing

Selecting with pd.IndexSlice is more systematic than writing raw .loc[] tuples.
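A small sketch on a hypothetical (store, year) index. Label slices on a MultiIndex need a sorted index, and both ends of a label slice are inclusive:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["A", "B"], [2021, 2022, 2023]],
                                 names=["store", "year"])
df = pd.DataFrame({"sales": range(6)}, index=idx).sort_index()  # hypothetical data

ix = pd.IndexSlice
# All stores, years 2022 through 2023
sub = df.loc[ix[:, 2022:2023], :]
print(sub)
```

Without IndexSlice the same selection is `df.loc[(slice(None), slice(2022, 2023)), :]`, which is harder to read.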

Ref

Jianshu: advanced MultiIndex usage
https://www.jianshu.com/p/760cd4f46c8d

Inspecting a MultiIndex

First convert the index to a DataFrame, then use DataFrame features to query it further.
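A minimal sketch using `MultiIndex.to_frame` on hypothetical data:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("A", 2021), ("A", 2022), ("B", 2021)],
                                names=["store", "year"])
df = pd.DataFrame({"sales": [5, 7, 3]}, index=idx)  # hypothetical data

# One column per index level; index=False gives a plain RangeIndex
levels = df.index.to_frame(index=False)
print(levels[levels["year"] == 2021])
print(df.index.nlevels, list(df.index.names))
```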

MultiIndex dataframe

Read with .loc[]: .loc locates by label first, so, like coordinates, it can look up values level by level in a MultiIndex.
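A sketch of label "coordinates" on a hypothetical two-level index, plus `xs` for fixing the inner level instead of the outer one:

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["A", "B"], ["x", "y"]], names=["store", "item"])
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=idx)  # hypothetical data

outer = df.loc["A"]               # outer label only -> rows of store A, indexed by item
row = df.loc[("A", "y")]          # full coordinate -> one row as a Series
value = df.loc[("A", "y"), "v"]   # row coordinate + column label -> a scalar
inner = df.xs("y", level="item")  # cross-section on the inner level
print(outer, row, value, inner, sep="\n\n")
```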

Reading row names
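A minimal sketch of reading row labels from a hypothetical MultiIndex, either as full tuples or one level at a time:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("A", 2021), ("A", 2022), ("B", 2021)],
                                names=["store", "year"])
df = pd.DataFrame({"sales": [5, 7, 3]}, index=idx)  # hypothetical data

all_labels = df.index.tolist()                     # list of (store, year) tuples
stores = df.index.get_level_values("store")        # one entry per row, by level name
years = df.index.get_level_values(1)               # ... or by level position
print(all_labels, list(stores), list(years))
```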

Reading unique elements
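A short sketch of the common ways to get unique values, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"city": ["A", "B", "A", "C"], "v": [1, 2, 1, 3]})  # hypothetical

uniq = df["city"].unique()     # array, in order of first appearance
n_uniq = df["city"].nunique()  # count of distinct values
dedup = df.drop_duplicates()   # drops fully duplicated rows
# unique labels of one level of a MultiIndex
level_uniq = df.set_index(["city", "v"]).index.unique(level="city")
print(uniq, n_uniq, dedup, level_uniq, sep="\n")
```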

Random sampling
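A minimal sketch of `DataFrame.sample`; fixing `random_state` makes the draw reproducible:

```python
import pandas as pd

df = pd.DataFrame({"v": range(10)})  # hypothetical data

three = df.sample(n=3, random_state=42)                # 3 distinct rows
half = df.sample(frac=0.5, random_state=42)            # 50% of the rows
boot = df.sample(n=10, replace=True, random_state=42)  # bootstrap-style, with replacement
print(three)
```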

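Regenerating the index (reindex)

Mainly used to conform a df's rows/columns to a new set of labels: reindex reorders the rows/columns to match the labels you pass, drops the ones you leave out, and inserts labels that did not exist before (filled with NaN unless fill_value is given). This is handy for quickly reshaping a df or generating rows automatically. If you only want to change the names of a few index labels, use df.rename() instead.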

https://vimsky.com/examples/usage/python-pandas-dataframe-reindex.html
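A small sketch on made-up data showing reordering, dropping, and inserting labels:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])  # hypothetical data

r = df.reindex(["c", "a", "d"])                   # "b" dropped, new "d" -> NaN
r0 = df.reindex(["c", "a", "d"], fill_value=0)    # new labels filled with 0
cols = df.reindex(columns=["v", "w"])             # new column "w" -> NaN
print(r, r0, cols, sep="\n\n")
```

Note that the NaN in `r` turns the integer column into floats, while `fill_value=0` keeps it as integers.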

df.rename(): renaming

A mapper function or a dictionary can be passed to df.rename(). Only the labels change; the data are unaffected.
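A minimal sketch with both a dict and a function as mappers, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2]}, index=["r1", "r2"])  # hypothetical data

# dict mapper: only the listed labels change, the others are kept
renamed = df.rename(index={"r1": "first"}, columns={"A": "a"})
# function mapper: applied to every label
lower = df.rename(columns=str.lower)
print(renamed, lower, sep="\n\n")
```

By default rename returns a new DataFrame, so `df` itself keeps its old labels.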

Using object IDs to compare copy() and = for pandas DataFrames

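df2 = df1 does not create a new object: df2 is only a second name for the same object (same id), so any change made through df2 also shows up in df1.
df2 = df1.copy() creates a new object with a new id. It is a deep copy by default (deep=True), so operations on df2 do not affect df1. df1.copy(deep=False) creates a new object that shares its data with df1, so value changes could propagate between them; with pandas Copy-on-Write enabled, they no longer do.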
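A short sketch comparing a plain assignment with the default deep copy, on made-up data (shallow copies are left out because their behavior depends on whether Copy-on-Write is enabled):

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})  # hypothetical data

alias = df1        # no new object: just a second name
deep = df1.copy()  # deep=True by default: new object with copied data
print(id(alias) == id(df1), id(deep) == id(df1))  # → True False

alias.loc[0, "x"] = 100  # also visible through df1
deep.loc[1, "x"] = -1    # df1 is unaffected
print(df1["x"].tolist())   # → [100, 2, 3]
print(deep["x"].tolist())  # → [1, -1, 3]
```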


‘is’ vs ‘==’

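is returns True only if both names refer to the same object (identical id).
== returns True when the values compare equal. For pandas objects, == compares element-wise and returns a boolean Series/DataFrame; use df1.equals(df2) to get a single True/False.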
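A quick sketch with plain lists and with pandas Series (made-up values):

```python
import pandas as pd

a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a == b, a is b, a is c)  # → True False True

s1 = pd.Series([1, 2])
s2 = pd.Series([1, 2])
# == is element-wise for pandas; .equals gives one boolean
print((s1 == s2).tolist(), s1.equals(s2), s1 is s2)  # → [True, True] True False
```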

Miscellaneous

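DataFrames seem to have something like b = a.groupby(...), which looks interesting (note that the pandas method is spelled groupby, not group_by).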
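A minimal groupby sketch on made-up data: aggregation per group, a custom lambda per group, and `transform` to broadcast a group statistic back onto every row:

```python
import pandas as pd

df = pd.DataFrame({"team": ["A", "B", "A", "B", "A"],
                   "score": [3, 5, 4, 1, 8]})  # hypothetical data

b = df.groupby("team")["score"].mean()                         # one value per team
stats = df.groupby("team")["score"].agg(["mean", "max"])       # several stats at once
spread = df.groupby("team")["score"].apply(lambda s: s.max() - s.min())
df["team_mean"] = df.groupby("team")["score"].transform("mean")  # same length as df
print(b, stats, spread, df, sep="\n\n")
```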
