What can Pandas be used for?

Pandas is used for analyzing two-dimensional (tabular) data, typically for data preprocessing and group-by aggregation.

Structurally, it offers the following three data types:

  • Series

  • DataFrame

  • Panel (rarely used; deprecated and removed in later pandas releases)
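As a minimal sketch of how the first two structures relate, the snippet below builds a Series and a small DataFrame by hand. The values are made up for illustration and are not taken from the YouBike file.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (values here are made up).
total = pd.Series([28, 21, 16], name="tot")

# A DataFrame is a two-dimensional table; each column is a Series.
stations = pd.DataFrame({
    "sno": [1, 2, 3],
    "tot": [28, 21, 16],
})

print(type(stations["tot"]))  # selecting one column gives back a Series
print(stations.shape)         # (number of rows, number of columns)
```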

import pandas as pd 

# setup options for maximum rows displayed
pd.set_option("display.max_rows", 10)

## read youbike data from json as a DataFrame
youBike = pd.read_json('./data/youbike_immediate.json')
youBike
sno sna tot sbi sarea mday lat lng ar sareaen snaen aren bemp act srcUpdateTime updateTime infoTime infoDate
0 500101001 YouBike2.0_捷運科技大樓站 28 2 大安區 2024-02-22 09:19:19 25.02605 121.54360 復興南路二段235號前 Daan Dist. YouBike2.0_MRT Technology Bldg. Sta. No.235, Sec. 2, Fuxing S. Rd. 26 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1 500101002 YouBike2.0_復興南路二段273號前 21 1 大安區 2024-02-22 09:14:18 25.02565 121.54357 復興南路二段273號西側 Daan Dist. YouBike2.0_No.273, Sec. 2, Fuxing S. Rd. No.273, Sec. 2, Fuxing S. Rd. (West) 20 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:14:18 2024-02-22
2 500101003 YouBike2.0_國北教大實小東側門 16 3 大安區 2024-02-22 09:16:19 25.02429 121.54124 和平東路二段96巷7號 Daan Dist. YouBike2.0_NTUE Experiment Elementary School (... No. 7, Ln. 96, Sec. 2, Heping E. Rd 13 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:19 2024-02-22
3 500101004 YouBike2.0_和平公園東側 11 1 大安區 2024-02-22 09:17:22 25.02351 121.54282 和平東路二段118巷33號 Daan Dist. YouBike2.0_Heping Park (East) No. 33, Ln. 118, Sec. 2, Heping E. Rd 10 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:17:22 2024-02-22
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 大安區 2024-02-22 09:16:14 25.02153 121.54299 復興南路二段368號 Daan Dist. YouBike2.0_Xinhai Fuxing Rd. Intersection (Nor... No. 368, Sec. 2, Fuxing S. Rd. 14 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:14 2024-02-22
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1329 500119087 YouBike2.0_臺大總圖書館西南側 30 2 臺大公館校區 2024-02-22 09:19:18 25.01690 121.54031 臺大圖書館西南側 NTU Dist YouBike2.0_NTU Main Library(Southwest) NTU Main Library(Southwest) 28 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:18 2024-02-22
1330 500119088 YouBike2.0_臺大黑森林西側 20 18 臺大公館校區 2024-02-22 09:19:19 25.01995 121.54347 臺大霖澤館南側 NTU Dist YouBike2.0_NTU Black Forest(West) NTU Tsai Lecture Hall(South) 2 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1331 500119089 YouBike2.0_臺大獸醫館南側 24 24 臺大公館校區 2024-02-22 09:19:14 25.01791 121.54242 臺大獸醫系館南側 NTU Dist YouBike2.0_NTU Dept. of Veterinary Medicine(So... NTU Dept. of Veterinary Medicine(South) 0 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:14 2024-02-22
1332 500119090 YouBike2.0_臺大新體育館東南側 40 16 臺大公館校區 2024-02-22 09:15:18 25.02112 121.53591 臺大體育館東側 NTU Dist YouBike2.0_NTU Sports Center(Southeast) NTU Sports Center(East) 24 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:15:18 2024-02-22
1333 500119091 YouBike2.0_臺大明達館北側(員工宿舍) 18 0 臺大公館校區 2024-02-22 08:32:34 25.01816 121.54469 明達館北側前空地 NTU Dist YouBike2.0_NTU Ming - Da Hall (Staff Dormitory) NTU Ming - Da Hall (North) 18 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 08:32:34 2024-02-22

1334 rows × 18 columns

Show data with conditions

From the youBike DataFrame, show only the stations whose act (activation status) is true, and display the number of matching rows.

## select on act equal to one 
activeStation = youBike[youBike.act == 1]
# use len to display total length of DataFrame
print('active stations counts: ',len(activeStation))
activeStation
active stations counts:  1325
sno sna tot sbi sarea mday lat lng ar sareaen snaen aren bemp act srcUpdateTime updateTime infoTime infoDate
0 500101001 YouBike2.0_捷運科技大樓站 28 2 大安區 2024-02-22 09:19:19 25.02605 121.54360 復興南路二段235號前 Daan Dist. YouBike2.0_MRT Technology Bldg. Sta. No.235, Sec. 2, Fuxing S. Rd. 26 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1 500101002 YouBike2.0_復興南路二段273號前 21 1 大安區 2024-02-22 09:14:18 25.02565 121.54357 復興南路二段273號西側 Daan Dist. YouBike2.0_No.273, Sec. 2, Fuxing S. Rd. No.273, Sec. 2, Fuxing S. Rd. (West) 20 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:14:18 2024-02-22
2 500101003 YouBike2.0_國北教大實小東側門 16 3 大安區 2024-02-22 09:16:19 25.02429 121.54124 和平東路二段96巷7號 Daan Dist. YouBike2.0_NTUE Experiment Elementary School (... No. 7, Ln. 96, Sec. 2, Heping E. Rd 13 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:19 2024-02-22
3 500101004 YouBike2.0_和平公園東側 11 1 大安區 2024-02-22 09:17:22 25.02351 121.54282 和平東路二段118巷33號 Daan Dist. YouBike2.0_Heping Park (East) No. 33, Ln. 118, Sec. 2, Heping E. Rd 10 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:17:22 2024-02-22
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 大安區 2024-02-22 09:16:14 25.02153 121.54299 復興南路二段368號 Daan Dist. YouBike2.0_Xinhai Fuxing Rd. Intersection (Nor... No. 368, Sec. 2, Fuxing S. Rd. 14 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:14 2024-02-22
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1329 500119087 YouBike2.0_臺大總圖書館西南側 30 2 臺大公館校區 2024-02-22 09:19:18 25.01690 121.54031 臺大圖書館西南側 NTU Dist YouBike2.0_NTU Main Library(Southwest) NTU Main Library(Southwest) 28 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:18 2024-02-22
1330 500119088 YouBike2.0_臺大黑森林西側 20 18 臺大公館校區 2024-02-22 09:19:19 25.01995 121.54347 臺大霖澤館南側 NTU Dist YouBike2.0_NTU Black Forest(West) NTU Tsai Lecture Hall(South) 2 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1331 500119089 YouBike2.0_臺大獸醫館南側 24 24 臺大公館校區 2024-02-22 09:19:14 25.01791 121.54242 臺大獸醫系館南側 NTU Dist YouBike2.0_NTU Dept. of Veterinary Medicine(So... NTU Dept. of Veterinary Medicine(South) 0 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:14 2024-02-22
1332 500119090 YouBike2.0_臺大新體育館東南側 40 16 臺大公館校區 2024-02-22 09:15:18 25.02112 121.53591 臺大體育館東側 NTU Dist YouBike2.0_NTU Sports Center(Southeast) NTU Sports Center(East) 24 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:15:18 2024-02-22
1333 500119091 YouBike2.0_臺大明達館北側(員工宿舍) 18 0 臺大公館校區 2024-02-22 08:32:34 25.01816 121.54469 明達館北側前空地 NTU Dist YouBike2.0_NTU Ming - Da Hall (Staff Dormitory) NTU Ming - Da Hall (North) 18 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 08:32:34 2024-02-22

1325 rows × 18 columns
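The filter above relies on boolean indexing: the comparison produces a boolean Series, and indexing the DataFrame with it keeps only the rows marked True. Here is a small sketch on a hypothetical table (the `demo` frame and its values are assumptions, not the real data):

```python
import pandas as pd

# Hypothetical miniature of the youBike table; values are made up.
demo = pd.DataFrame({
    "sna": ["A", "B", "C", "D"],
    "act": [1, 0, 1, 1],
})

mask = demo["act"] == 1   # boolean Series, one entry per row
active = demo[mask]       # keeps only the rows where mask is True

print(mask.tolist())
print(len(active))
```

Bracket access (`demo["act"]`) and attribute access (`demo.act`) are interchangeable here because `act` does not clash with any DataFrame attribute.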

Rename column in DataFrame

The column names in the youBike DataFrame are somewhat cryptic, so we rename them according to the following mapping:

  • sno: station_no

  • sna: station_name

  • tot: total

  • sbi: available

  • bemp: empty

  • act: active

  • sarea: district

## mapping from the old column names to the new ones
renameDict = {
    'sno':'station_no',
    'sna':'station_name',
    'tot':'total',
    'sbi': 'available',
    'bemp': 'empty',
    'act': 'active',
    'sarea': 'district'
}
## rename the columns (axis - 0: index, 1: columns)
#renameBike = youBike.rename(mapper=renameDict,axis=1)
renameBike = youBike.rename(columns=renameDict)
renameBike
station_no station_name total available district mday lat lng ar sareaen snaen aren empty active srcUpdateTime updateTime infoTime infoDate
0 500101001 YouBike2.0_捷運科技大樓站 28 2 大安區 2024-02-22 09:19:19 25.02605 121.54360 復興南路二段235號前 Daan Dist. YouBike2.0_MRT Technology Bldg. Sta. No.235, Sec. 2, Fuxing S. Rd. 26 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1 500101002 YouBike2.0_復興南路二段273號前 21 1 大安區 2024-02-22 09:14:18 25.02565 121.54357 復興南路二段273號西側 Daan Dist. YouBike2.0_No.273, Sec. 2, Fuxing S. Rd. No.273, Sec. 2, Fuxing S. Rd. (West) 20 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:14:18 2024-02-22
2 500101003 YouBike2.0_國北教大實小東側門 16 3 大安區 2024-02-22 09:16:19 25.02429 121.54124 和平東路二段96巷7號 Daan Dist. YouBike2.0_NTUE Experiment Elementary School (... No. 7, Ln. 96, Sec. 2, Heping E. Rd 13 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:19 2024-02-22
3 500101004 YouBike2.0_和平公園東側 11 1 大安區 2024-02-22 09:17:22 25.02351 121.54282 和平東路二段118巷33號 Daan Dist. YouBike2.0_Heping Park (East) No. 33, Ln. 118, Sec. 2, Heping E. Rd 10 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:17:22 2024-02-22
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 大安區 2024-02-22 09:16:14 25.02153 121.54299 復興南路二段368號 Daan Dist. YouBike2.0_Xinhai Fuxing Rd. Intersection (Nor... No. 368, Sec. 2, Fuxing S. Rd. 14 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:16:14 2024-02-22
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1329 500119087 YouBike2.0_臺大總圖書館西南側 30 2 臺大公館校區 2024-02-22 09:19:18 25.01690 121.54031 臺大圖書館西南側 NTU Dist YouBike2.0_NTU Main Library(Southwest) NTU Main Library(Southwest) 28 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:18 2024-02-22
1330 500119088 YouBike2.0_臺大黑森林西側 20 18 臺大公館校區 2024-02-22 09:19:19 25.01995 121.54347 臺大霖澤館南側 NTU Dist YouBike2.0_NTU Black Forest(West) NTU Tsai Lecture Hall(South) 2 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:19 2024-02-22
1331 500119089 YouBike2.0_臺大獸醫館南側 24 24 臺大公館校區 2024-02-22 09:19:14 25.01791 121.54242 臺大獸醫系館南側 NTU Dist YouBike2.0_NTU Dept. of Veterinary Medicine(So... NTU Dept. of Veterinary Medicine(South) 0 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:19:14 2024-02-22
1332 500119090 YouBike2.0_臺大新體育館東南側 40 16 臺大公館校區 2024-02-22 09:15:18 25.02112 121.53591 臺大體育館東側 NTU Dist YouBike2.0_NTU Sports Center(Southeast) NTU Sports Center(East) 24 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 09:15:18 2024-02-22
1333 500119091 YouBike2.0_臺大明達館北側(員工宿舍) 18 0 臺大公館校區 2024-02-22 08:32:34 25.01816 121.54469 明達館北側前空地 NTU Dist YouBike2.0_NTU Ming - Da Hall (Staff Dormitory) NTU Ming - Da Hall (North) 18 1 2024-02-22 09:20:32 2024-02-22 09:20:52 2024-02-22 08:32:34 2024-02-22

1334 rows × 18 columns
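A minimal sketch of the two equivalent spellings of `rename` that appear in the cell above, run on a hypothetical three-column frame (the `demo` data is made up). It also shows that `rename` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical frame with two cryptic column names; values are made up.
demo = pd.DataFrame({"sno": [1, 2], "sna": ["A", "B"], "lat": [25.0, 25.1]})
mapping = {"sno": "station_no", "sna": "station_name"}

renamed = demo.rename(columns=mapping)       # returns a new DataFrame
same = demo.rename(mapper=mapping, axis=1)   # equivalent spelling

print(list(renamed.columns))
print(list(demo.columns))   # the original is unchanged
```

Columns that are missing from the mapping, such as `lat`, keep their names.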

Show specific columns only

Continuing with the renamed DataFrame from the previous step, only the following columns need to be displayed:

  • station_no

  • station_name

  • total

  • available

  • empty

  • district

  • mday

  • active

feature = ['station_no', 'station_name','total', 'available','empty', 'district', 'mday','active']

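# renameBike.active = renameBike.active == 1 
# astype returns a new Series rather than modifying the column in place;
# assign the result back (renameBike['active'] = ...) to actually store booleans
renameBike.active.astype('bool')
# extract data through loc (label-based; the end label 100 is included)
bikeInfo = renameBike.loc[0:100,feature]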

bikeInfo
station_no station_name total available empty district mday active
0 500101001 YouBike2.0_捷運科技大樓站 28 2 26 大安區 2024-02-22 09:19:19 1
1 500101002 YouBike2.0_復興南路二段273號前 21 1 20 大安區 2024-02-22 09:14:18 1
2 500101003 YouBike2.0_國北教大實小東側門 16 3 13 大安區 2024-02-22 09:16:19 1
3 500101004 YouBike2.0_和平公園東側 11 1 10 大安區 2024-02-22 09:17:22 1
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 14 大安區 2024-02-22 09:16:14 1
... ... ... ... ... ... ... ... ...
96 500101151 YouBike2.0_大安高工 23 0 23 大安區 2024-02-22 09:16:14 1
97 500101152 YouBike2.0_捷運忠孝新生站(3號出口) 23 1 22 大安區 2024-02-22 09:17:22 1
98 500101153 YouBike2.0_瑠公公園 20 4 16 大安區 2024-02-22 09:16:19 1
99 500101154 YouBike2.0_敦化市民路口 23 10 12 大安區 2024-02-22 09:13:18 1
100 500101155 YouBike2.0_捷運大安森林公園站(5號出口) 15 1 14 大安區 2024-02-22 09:19:19 1

101 rows × 8 columns

## extract data through index-based locations
## with axis 0 as Index, axis 1 as Column
bikeInfo_iloc = renameBike.iloc[0:101,[0,1,2,3,4,5,12,13]]
bikeInfo_iloc
station_no station_name total available district mday empty active
0 500101001 YouBike2.0_捷運科技大樓站 28 2 大安區 2024-02-22 09:19:19 26 1
1 500101002 YouBike2.0_復興南路二段273號前 21 1 大安區 2024-02-22 09:14:18 20 1
2 500101003 YouBike2.0_國北教大實小東側門 16 3 大安區 2024-02-22 09:16:19 13 1
3 500101004 YouBike2.0_和平公園東側 11 1 大安區 2024-02-22 09:17:22 10 1
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 大安區 2024-02-22 09:16:14 14 1
... ... ... ... ... ... ... ... ...
96 500101151 YouBike2.0_大安高工 23 0 大安區 2024-02-22 09:16:14 23 1
97 500101152 YouBike2.0_捷運忠孝新生站(3號出口) 23 1 大安區 2024-02-22 09:17:22 22 1
98 500101153 YouBike2.0_瑠公公園 20 4 大安區 2024-02-22 09:16:19 16 1
99 500101154 YouBike2.0_敦化市民路口 23 10 大安區 2024-02-22 09:13:18 12 1
100 500101155 YouBike2.0_捷運大安森林公園站(5號出口) 15 1 大安區 2024-02-22 09:19:19 14 1

101 rows × 8 columns
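Both selections above return 101 rows even though one slice is written `0:100` and the other `0:101`. The sketch below, on a hypothetical frame, illustrates why: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
demo = pd.DataFrame({"a": range(5), "b": range(5, 10), "c": range(10, 15)})

by_label = demo.loc[0:2, ["a", "c"]]    # labels 0, 1, 2: end label included
by_position = demo.iloc[0:2, [0, 2]]    # positions 0, 1: end position excluded

print(len(by_label), len(by_position))
```

With the default integer index, labels and positions happen to coincide, which is why the two calls are easy to confuse.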

Create a new column from a computation

Continuing from the previous step, add a new column named renting_rate that holds each station's usage rate, computed as empty divided by total.

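## `empty` is already a built-in attribute of DataFrame, so renameBike.empty does not return the column; because of this name clash, use bracket indexing instead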
renameBike['renting_rate'] = renameBike['empty'] / renameBike['total']
if 'renting_rate' not in feature: 
    feature.append('renting_rate')
bikes = renameBike.loc[:,feature]

bikes
station_no station_name total available empty district mday active renting_rate
0 500101001 YouBike2.0_捷運科技大樓站 28 2 26 大安區 2024-02-22 09:19:19 1 0.928571
1 500101002 YouBike2.0_復興南路二段273號前 21 1 20 大安區 2024-02-22 09:14:18 1 0.952381
2 500101003 YouBike2.0_國北教大實小東側門 16 3 13 大安區 2024-02-22 09:16:19 1 0.812500
3 500101004 YouBike2.0_和平公園東側 11 1 10 大安區 2024-02-22 09:17:22 1 0.909091
4 500101005 YouBike2.0_辛亥復興路口西北側 16 2 14 大安區 2024-02-22 09:16:14 1 0.875000
... ... ... ... ... ... ... ... ... ...
1329 500119087 YouBike2.0_臺大總圖書館西南側 30 2 28 臺大公館校區 2024-02-22 09:19:18 1 0.933333
1330 500119088 YouBike2.0_臺大黑森林西側 20 18 2 臺大公館校區 2024-02-22 09:19:19 1 0.100000
1331 500119089 YouBike2.0_臺大獸醫館南側 24 24 0 臺大公館校區 2024-02-22 09:19:14 1 0.000000
1332 500119090 YouBike2.0_臺大新體育館東南側 40 16 24 臺大公館校區 2024-02-22 09:15:18 1 0.600000
1333 500119091 YouBike2.0_臺大明達館北側(員工宿舍) 18 0 18 臺大公館校區 2024-02-22 08:32:34 1 1.000000

1334 rows × 9 columns
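To illustrate the name clash mentioned in the code comment, here is a small sketch on a hypothetical frame (values are made up). Attribute access resolves to the built-in `DataFrame.empty` property, which reports whether the frame has no elements, whereas bracket indexing always refers to the column.

```python
import pandas as pd

# Hypothetical frame that has a column literally named "empty".
demo = pd.DataFrame({"total": [28, 20, 10], "empty": [26, 2, 5]})

# Attribute access hits the built-in DataFrame.empty property (a bool),
# not the column called "empty".
print(demo.empty)

# Bracket indexing always refers to the column.
demo["renting_rate"] = demo["empty"] / demo["total"]
print(demo["renting_rate"].round(3).tolist())
```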

Query with multiple conditions

  • Query a specific district and find the stations whose renting_rate is less than or equal to 0.35

  • Display the results sorted by renting_rate in ascending order

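## select the stations in 信義區 and sort them by renting_rate in ascending order (the renting_rate <= 0.35 condition is not applied in this query)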
bikes[bikes.district.isin(['信義區'])].sort_values(by='renting_rate', ascending=True)
station_no station_name total available empty district mday active renting_rate
1183 500112080 YouBike2.0_福德國小公車站 10 7 0 信義區 2024-02-22 09:18:18 1 0.000000
1108 500112002 YouBike2.0_基隆路二段159巷口 14 14 0 信義區 2024-02-22 09:04:14 1 0.000000
1137 500112031 YouBike2.0_市府松壽路口 28 28 0 信義區 2024-02-22 09:01:14 1 0.000000
1146 500112042 YouBike2.0_松高路(信義新天地A4館) 17 17 0 信義區 2024-02-22 08:54:14 1 0.000000
1139 500112033 YouBike2.0_市民廣場 44 39 5 信義區 2024-02-22 09:19:19 1 0.113636
... ... ... ... ... ... ... ... ... ...
1189 500112086 YouBike2.0_五分埔公園 18 0 18 信義區 2024-02-22 09:08:15 1 1.000000
1159 500112055 YouBike2.0_基隆光復路口 21 0 21 信義區 2024-02-22 09:12:14 1 1.000000
1175 500112071 YouBike2.0_吳興街260巷 32 0 32 信義區 2024-02-22 09:12:14 1 1.000000
1197 500112094 YouBike2.0_吳興街284巷(廣安宮) 15 0 15 信義區 2024-02-22 09:13:14 1 1.000000
1107 500112001 YouBike2.0_黎忠區民活動中心 16 0 16 信義區 2024-02-22 09:17:22 1 1.000000

103 rows × 9 columns
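The query above filters by district only, which is why rows with a renting_rate of 1.0 still appear. One way to apply both conditions at once is sketched below on hypothetical data; the district labels "X" and "Y" and all values are placeholders.

```python
import pandas as pd

# Hypothetical data; "X" and "Y" stand in for real district names.
demo = pd.DataFrame({
    "district": ["X", "X", "Y", "X"],
    "renting_rate": [0.30, 0.90, 0.10, 0.05],
})

# Wrap each condition in parentheses and combine them with & (element-wise AND).
selected = demo[(demo["district"] == "X") & (demo["renting_rate"] <= 0.35)]
result = selected.sort_values(by="renting_rate", ascending=True)

print(result.index.tolist())
```

The parentheses are required because `&` binds more tightly than `==` and `<=` in Python.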

Create a Series and compute with it

## subtract the mean of renting_rate from every value
centeredRate = bikes.renting_rate - bikes.renting_rate.mean()

centeredRate
0       0.266979
1       0.290789
2       0.150908
3       0.247499
4       0.213408
          ...   
1329    0.271741
1330   -0.561592
1331   -0.661592
1332   -0.061592
1333    0.338408
Name: renting_rate, Length: 1334, dtype: float64
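The subtraction above works because a scalar is broadcast across every element of the Series. A small sketch with made-up values:

```python
import pandas as pd

# Hypothetical usage rates.
rate = pd.Series([0.2, 0.4, 0.9], name="renting_rate")

# mean() returns a single number; the subtraction is applied to every element.
centered = rate - rate.mean()

print(centered.round(2).tolist())
print(centered.name)   # the Series name is carried over
```

After centering, the values sum to approximately zero, so a positive entry indicates a station that is used more than average.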

Find the maximum value in the Series and return its index

bikes[bikes.district.isin(['大安區'])].sort_values(by='renting_rate', ascending=True)

## idxmax returns the index label of the maximum value in a Series
idmx = bikes.renting_rate.idxmax()


bikes.loc[idmx,['station_name', 'district', 'renting_rate']]
station_name    YouBike2.0_復興南路二段340巷口
district                           大安區
renting_rate                       1.0
Name: 6, dtype: object
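Many stations share the maximum renting_rate of 1.0, yet only one row is returned. That is because `idxmax` reports the label of the first occurrence of the maximum. A short sketch with a hypothetical Series and a custom index:

```python
import pandas as pd

# Hypothetical rates with a tie for the maximum; the index labels are arbitrary.
rate = pd.Series([0.5, 1.0, 0.3, 1.0], index=[10, 11, 12, 13])

label = rate.idxmax()   # index label (not position) of the first maximum
print(label, rate.loc[label])
```

Because the result is a label, it should be passed to `loc` rather than `iloc`.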
## use group by to look for patterns in the data (mean renting_rate per district)
bikes[['district','renting_rate']].groupby('district').mean().sort_values('renting_rate',ascending=False)
renting_rate
district
大安區 0.730549
信義區 0.716201
文山區 0.681383
南港區 0.670463
內湖區 0.670445
... ...
中正區 0.646898
中山區 0.627093
大同區 0.609082
臺大公館校區 0.590644
萬華區 0.564889

13 rows × 1 columns
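The group-by step above follows the split-apply-combine pattern: rows are split by district, the mean is computed per group, and the results are combined into a new DataFrame indexed by district. A minimal sketch on hypothetical data (district labels and values are placeholders):

```python
import pandas as pd

# Hypothetical data; "X" and "Y" stand in for real district names.
demo = pd.DataFrame({
    "district": ["X", "X", "Y", "Y"],
    "renting_rate": [0.2, 0.4, 0.9, 0.7],
})

by_district = (
    demo[["district", "renting_rate"]]
    .groupby("district")                            # split rows by district
    .mean()                                         # apply the mean to each group
    .sort_values("renting_rate", ascending=False)   # highest average first
)
print(by_district.index.tolist())
```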