Python – pandas
pandas
pandas is the Python Data Analysis Library.
It is powerful and covers a wide range of tasks:
- Data input and output: CSV, Excel, RDB, HDF5
- Data structures for storing and handling data
- NaN (missing value) handling
- Data extraction
- Pivoting
- Statistical analysis and regression
- Group by
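A few of the features listed above can be sketched in a handful of lines. The table below is invented for illustration only (the column names `team`, `year`, and `score` are assumptions, not taken from any real data set), and the snippet is a minimal sketch rather than a complete tour.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "year": [2020, 2021, 2020, 2021],
    "score": [1.0, np.nan, 3.0, 5.0],
})

# NaN handling: replace the missing score with 0
filled = df.fillna({"score": 0.0})

# Extraction: keep only the rows whose score is greater than 2
high = filled[filled["score"] > 2]

# Group by: total score per team
totals = filled.groupby("team")["score"].sum()

# Pivot: one row per team, one column per year
wide = filled.pivot(index="team", columns="year", values="score")

print(totals)
print(wide)
```

Whether a missing value should be filled with 0, dropped, or interpolated depends on the data; `fillna` is used here only because it is the shortest way to show the idea.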
Basic Data Structure
| Dimension | Name |
|---|---|
| 1 | Series |
| 2 | DataFrame |
| 3 | Panel (deprecated in pandas 0.20; not available in pandas 1.0 and later) |
Series
A Series holds one-dimensional data.
import pandas as pd
import numpy as np
# Series from a list; the default index is 0, 1, 2, ...
dat = pd.Series([1,3,6,12])
print(dat)
#0 1
#1 3
#2 6
#3 12
#dtype: int64
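# a NaN in the data makes the dtype float64
dat2 = pd.Series(np.array([1, 3, np.nan, 12]))
print(dat2)
# a Series of strings
dat3 = pd.Series(['aa', 'bb', 'cc', 'd'])
# mixed types are stored with dtype object
dat4 = pd.Series([1, 'aa', 2.34, 'd'])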
# Series with explicit index labels
dat5 = pd.Series([1, 3, 6, 12], index=[1, 10, 20, 30])
print(dat5)
#1 1
#10 3
#20 6
#30 12
#dtype: int64
print(dat5[10])  # 3 (lookup by the label 10, not by position)
# Series from a dict: the keys become the index
dat7 = pd.Series({'a': 1, 'b': 3, 'c': 6, 'd': 12})
print(dat7)
print(dat7.iloc[2]) # 6
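The difference between label-based and positional access is easy to mix up when the index itself is made of integers. The following is a minimal sketch using the explicit accessors `.loc` and `.iloc`; the variable names (`s`, `with_nan`, `mixed`) are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 6, 12], index=[1, 10, 20, 30])

# .loc selects by index label, .iloc selects by position
by_label = s.loc[10]      # the element whose label is 10
by_position = s.iloc[1]   # the second element
print(by_label, by_position)  # 3 3

# A NaN turns an otherwise integer Series into float64
with_nan = pd.Series([1, 3, np.nan, 12])
print(with_nan.dtype)         # float64
print(with_nan.isna().sum())  # 1

# Mixed Python types fall back to the generic object dtype
mixed = pd.Series([1, "aa", 2.34, "d"])
print(mixed.dtype)            # object
```

Using `.loc` and `.iloc` instead of plain `[]` makes it explicit which kind of lookup is intended.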
DataFrame
A DataFrame holds two-dimensional data.
import pandas as pd
import numpy as np
# dict of lists to DataFrame: each key becomes a column
dat1 = {'country': ['Japan', 'China', 'Korea', 'Vietnam'],
        'money': ['Yen', 'RMB', 'Won', 'Dong'],
        'economic': [2, 1, 3, 4]}
d = pd.DataFrame(dat1)
print(d.columns)  # columns follow the dict's key order: country, money, economic
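# columns= selects and orders keys of dat1; it does not rename them, so rename afterwards
d2 = pd.DataFrame(dat1, columns=['country', 'money', 'economic'])
d2.columns = ['1_country', '2_money', '3_economy']
print(d2)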
# extract one column of the DataFrame as a Series
s1 = d['money']
print(s1)
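Beyond pulling out a single column, a few other selections come up constantly. The sketch below rebuilds a small table of the same shape so that it runs on its own; it is an illustration of common access patterns, not an exhaustive list.

```python
import pandas as pd

d = pd.DataFrame({
    "country": ["Japan", "China", "Korea", "Vietnam"],
    "money": ["Yen", "RMB", "Won", "Dong"],
    "economic": [2, 1, 3, 4],
})

# Several columns: pass a list of names, the result is a DataFrame
sub = d[["country", "economic"]]

# One row by position: the result is a Series indexed by column name
first_row = d.iloc[0]
print(first_row["country"])  # Japan

# Rows matching a condition: index with a boolean mask
top2 = d[d["economic"] <= 2]
print(top2["country"].tolist())  # ['Japan', 'China']

# Sort the rows by one column
ranked = d.sort_values("economic")
print(ranked["country"].tolist())  # ['China', 'Japan', 'Korea', 'Vietnam']
```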
Panel
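A Panel holds three-dimensional data. It was deprecated in pandas 0.20 and is not available in pandas 1.0 and later.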
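One common way to hold three-dimensional data in current pandas is a DataFrame whose rows carry a two-level `MultiIndex`. The sketch below uses invented values and hypothetical labels (`item1`, `item2`, `x`, `y`); it shows one possible layout, assuming the first two axes are folded into the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 x 2 block of values: (item, row, column)
values = np.arange(12).reshape(2, 3, 2)

# Fold the first two axes into a two-level row index
index = pd.MultiIndex.from_product(
    [["item1", "item2"], [0, 1, 2]], names=["item", "row"]
)
df3 = pd.DataFrame(values.reshape(6, 2), index=index, columns=["x", "y"])

# Selecting one item gives an ordinary two-dimensional DataFrame
item2 = df3.loc["item2"]
print(item2)

# A single cell is addressed by (item, row) and a column name
print(df3.loc[("item2", 1), "y"])  # 9
```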
