Python – pandas
pandas
pandas is a Python data analysis library.
It is powerful and covers a wide range of tasks:
- Data input and output: CSV, Excel, relational databases (RDB), HDF5
- Data structures for storing and manipulating data
- Missing value (NaN) handling
- Data selection and extraction
- Pivoting
- Statistical analysis and regression
- Group by
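A few of the features listed above can be sketched in one short script. This is only an illustration: the sample data and the column names (`city`, `year`, `sales`) are made up, and the CSV is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are invented for illustration.
csv_text = """city,year,sales
Tokyo,2020,10
Tokyo,2021,
Osaka,2020,7
Osaka,2021,9
"""

# Data input: parse CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# NaN handling: the empty sales cell is read as NaN; replace it with 0.
filled = df.fillna({"sales": 0})

# Group by: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Pivot: one row per city, one column per year.
wide = filled.pivot(index="city", columns="year", values="sales")

# Data output: serialize back to CSV text (no index column).
csv_out = filled.to_csv(index=False)

print(totals)
print(wide)
```

Replacing NaN with 0 is just one possible policy; `dropna()` or interpolation may suit other data better.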
Basic Data Structures
Dimension | Name |
---|---|
1 | Series |
2 | DataFrame |
3 | Panel (removed in pandas 0.25) |
Series
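A Series holds one-dimensional data, with an index label for each value.

    import pandas as pd
    import numpy as np

    # Series from a list; the default index is 0, 1, 2, ...
    dat = pd.Series([1, 3, 6, 12])
    print(dat)
    # 0     1
    # 1     3
    # 2     6
    # 3    12
    # dtype: int64

    # Series from a NumPy array containing NaN
    dat2 = pd.Series(np.array([1, 3, np.nan, 12]))
    print(dat2)

    # Series of strings, and of mixed types
    dat3 = pd.Series(['aa', 'bb', 'cc', 'd'])
    dat4 = pd.Series([1, 'aa', 2.34, 'd'])

    # Series with explicit index labels
    dat5 = pd.Series([1, 3, 6, 12], index=[1, 10, 20, 30])
    print(dat5)
    # 1      1
    # 10     3
    # 20     6
    # 30    12
    # dtype: int64
    print(dat5[10])  # 3 (lookup by label, not by position)

    # Series from a dict; the keys become the index
    dat7 = pd.Series({'a': 1, 'b': 3, 'c': 6, 'd': 12})
    print(dat7)
    print(dat7.iloc[2])  # 6 (lookup by position)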
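With an integer index, plain `[]` access can be ambiguous to read, so it may help to spell out whether a lookup is by label or by position. The following is a minimal sketch using `.loc` and `.iloc`, plus a quick look at how NaN and mixed types affect a Series; the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 6, 12], index=[1, 10, 20, 30])

by_label = s.loc[10]     # value stored under index label 10
by_position = s.iloc[2]  # third value, regardless of its label

# NaN forces a floating-point dtype, and most reductions skip it by default.
with_nan = pd.Series([1, 3, np.nan, 12])
n_missing = with_nan.isna().sum()
mean_skipping_nan = with_nan.mean()  # mean of 1, 3 and 12

# Mixed Python types fall back to the generic object dtype.
mixed = pd.Series([1, "aa", 2.34, "d"])

print(by_label, by_position, n_missing, mean_skipping_nan, mixed.dtype)
```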
DataFrame
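A DataFrame holds two-dimensional data: rows and named columns, where each column is a Series.

    import pandas as pd
    import numpy as np

    # dict of lists to DataFrame
    dat1 = {'country': ['Japan', 'China', 'Korea', 'Vietnam'],
            'money': ['Yen', 'RMB', 'Won', 'Dong'],
            'economic': [2, 1, 3, 4]}
    d = pd.DataFrame(dat1)
    print(d.columns)
    # The columns follow the dict's insertion order: country, money, economic
    # (pandas older than 0.23 sorted the keys alphabetically instead)

    # columns= only selects and orders existing keys; it does not rename them,
    # so use rename() to assign new column names
    d2 = d.rename(columns={'country': '1_country',
                           'money': '2_money',
                           'economic': '3_economy'})
    print(d2)

    # extract one column of the DataFrame as a Series
    s1 = d['money']
    print(s1)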
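Beyond pulling out a single column, a DataFrame supports several other ways of extracting data. The sketch below rebuilds a similar small table and shows a few common selections; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Japan", "China", "Korea", "Vietnam"],
    "money": ["Yen", "RMB", "Won", "Dong"],
    "economic": [2, 1, 3, 4],
})

# A list of column names returns a DataFrame with just those columns.
two_cols = df[["country", "economic"]]

# A boolean mask keeps only the rows where the condition holds.
top2 = df[df["economic"] <= 2]

# iloc selects by position; a single row comes back as a Series.
first_row = df.iloc[0]

# With a meaningful index, loc selects by row label and column name.
by_country = df.set_index("country")
japan_money = by_country.loc["Japan", "money"]

# Sort the rows by the values of one column.
ranked = df.sort_values("economic")

print(top2)
print(japan_money)
```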
Panel
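A Panel held three-dimensional data. It was deprecated in pandas 0.20 and removed in pandas 0.25; in current versions, use a DataFrame with a MultiIndex or the xarray package instead.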
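In pandas versions without Panel, one way to hold three-dimensional data is a DataFrame with a two-level MultiIndex. The following is a minimal sketch under that assumption; the level names (`item`, `row`) and the column names (`x`, `y`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D array: 2 items x 3 rows x 2 columns.
values = np.arange(12).reshape(2, 3, 2)

# The first two axes become the two levels of the row index.
index = pd.MultiIndex.from_product(
    [["item_a", "item_b"], [0, 1, 2]], names=["item", "row"]
)
df3d = pd.DataFrame(values.reshape(6, 2), index=index, columns=["x", "y"])

# Selecting on the outer level returns an ordinary 2-D DataFrame.
slice_b = df3d.loc["item_b"]

# A full (item, row) key plus a column name addresses a single cell.
value = df3d.loc[("item_b", 1), "y"]

print(slice_b)
print(value)
```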