Python – pandas

pandas

pandas is a data analysis library for Python.
It is powerful and covers a broad set of features:

  • Data input and output: CSV, Excel, RDB, HDF5
  • Data structures for storing and manipulating data
  • Handling of missing values (NaN)
  • Data extraction
  • Pivoting
  • Statistical analysis and regression
  • Group by
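
A minimal sketch of several of the features above (CSV input, NaN handling, extraction, group by, pivot). The sample data and the names csv_text, filled, tokyo, totals and table are invented for this illustration only.

```python
import io

import pandas as pd

# Hypothetical sample data, invented for this illustration only.
csv_text = """city,year,sales
Tokyo,2020,10
Tokyo,2021,
Osaka,2020,7
Osaka,2021,9
"""

# Data input: read CSV text into a DataFrame (a file path works the same way).
df = pd.read_csv(io.StringIO(csv_text))

# NaN handling: the empty cell is read as NaN; replace it with 0 here.
filled = df.fillna({"sales": 0})

# Extraction: keep only the rows for one city.
tokyo = filled[filled["city"] == "Tokyo"]

# Group by: total sales per city.
totals = filled.groupby("city")["sales"].sum()

# Pivot: one row per city, one column per year.
table = filled.pivot(index="city", columns="year", values="sales")

print(totals)
print(table)
```

Filling NaN with 0 is only one possible choice; dropping the row with dropna() may fit better, depending on the data.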

Basic Data Structure

Dimension   Name
1           Series
2           DataFrame
3           Panel

Series

A Series holds 1-dimensional data, with an index label for each value.

import pandas as pd
import numpy as np

# series
dat = pd.Series([1,3,6,12])

print(dat)

#0     1
#1     3
#2     6
#3    12
#dtype: int64

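# the NaN makes this a float64 Series
dat2 = pd.Series(np.array([1,3,np.nan, 12]))

print(dat2)

# a Series of strings
dat3 = pd.Series(['aa','bb','cc', 'd'])

# mixed types are stored with dtype object
dat4 = pd.Series([1,'aa', 2.34, 'd'])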

# custom index labels instead of the default 0, 1, 2, ...
dat5 = pd.Series([1,3,6,12], index=[1,10,20, 30])

print(dat5)

#1      1
#10     3
#20     6
#30    12
#dtype: int64

print(dat5.loc[10])  # 3 (lookup by index label, not by position)

# dict keys become the index labels
dat7 = pd.Series({'a':1, 'b':3, 'c':6, 'd':12})
print(dat7)

print(dat7.iloc[2])  # 6 (lookup by position)
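
A short sketch of label access versus position access on a Series, plus basic handling of a missing value. The Series s and its labels are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with string labels and one missing value.
s = pd.Series([1, 3, np.nan, 12], index=["a", "b", "c", "d"])

print(s.dtype)           # float64, because NaN is a float
print(s.isna().sum())    # number of missing values
print(s.loc["b"])        # access by index label
print(s.iloc[1])         # access by position (same element here)

# Drop the missing value before computing the mean.
clean = s.dropna()
print(clean.mean())

# Or replace the missing value with a fixed number.
filled = s.fillna(0)
print(filled)
```

.loc always means "by label" and .iloc always means "by position", which avoids ambiguity when the index itself is made of integers.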

DataFrame

A DataFrame holds 2-dimensional data: rows and named columns.

import pandas as pd
import numpy as np

# dict of lists to DataFrame: each key becomes a column
dat1 = {'country': ['Japan', 'China', 'Korea', 'Vietnam'],
        'money': ['Yen', 'RMB', 'Won', 'Dong'],
        'economic': [2, 1, 3, 4]}

d = pd.DataFrame(dat1)

print(d.columns) # columns keep the dict's key order: country, money, economic

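# rename the columns; columns= in the constructor only selects existing dict keys, it does not rename
d2 = d.rename(columns={'country': '1_country', 'money': '2_money', 'economic': '3_economy'})

print(d2)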

# DataFrame to Series: extract one column
s1 = d['money']
print(s1)
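
Besides extracting a column, rows can be selected by position, by condition, or by label. The sketch below builds a small country table of its own; the names row, low and by_country are introduced only for this example.

```python
import pandas as pd

d = pd.DataFrame({
    "country": ["Japan", "China", "Korea", "Vietnam"],
    "money": ["Yen", "RMB", "Won", "Dong"],
    "economic": [2, 1, 3, 4],
})

# One column -> Series
col = d["money"]

# One row by position -> Series whose index is the column names
row = d.iloc[0]
print(row["country"])

# Boolean filter -> DataFrame with only the matching rows
low = d[d["economic"] <= 2]
print(low)

# Use a column as the index, then look up a single cell by label
by_country = d.set_index("country")
print(by_country.loc["Korea", "money"])
```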

Panel

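A Panel held 3-dimensional data. Panel was deprecated in pandas 0.20 and removed in 0.25; in current versions such data is usually represented with a MultiIndex DataFrame or with the xarray package.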
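
A minimal sketch of holding 3-dimensional data without Panel, assuming a two-level MultiIndex on the rows is acceptable for the use case. The values, the labels and the name df3 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-dimensional data: 2 years x 3 countries x 2 measures.
values = np.arange(12).reshape(2, 3, 2)
years = [2020, 2021]
countries = ["Japan", "China", "Korea"]
measures = ["x", "y"]

# The first two dimensions become a two-level row index,
# the third dimension becomes the columns.
index = pd.MultiIndex.from_product([years, countries], names=["year", "country"])
df3 = pd.DataFrame(values.reshape(6, 2), index=index, columns=measures)

# 2-dimensional slice for one year
print(df3.loc[2021])

# Single value for one (year, country, measure) combination
print(df3.loc[(2021, "China"), "y"])
```

For larger or truly N-dimensional arrays, xarray is the alternative that the pandas project itself pointed to when Panel was deprecated.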