Finding Pokémon
1. Getting the Pokémon data
- https://www.kaggle.com/abcsds/pokemon
- mkdir -p ~/aiffel/pokemon_eda/data
- wget https://aiffelstaticprd.blob.core.windows.net/media/documents/Pokemon.csv
- mv Pokemon.csv ~/aiffel/pokemon_eda/data
2. Loading the data
In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
print('슝=3')
슝=3
In [2]:
import os
csv_path = os.getenv("HOME") +"/aiffel/pokemon_eda/data/Pokemon.csv"
original_data = pd.read_csv(csv_path)
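`os.getenv("HOME")` returns `None` when the `HOME` environment variable is unset (this can happen on Windows), and concatenating `None` with a string raises a `TypeError`. A minimal sketch of a more portable way to build the same path with `pathlib`, assuming the directory layout from step 1; the helper name `pokemon_csv_path` is hypothetical.

```python
from pathlib import Path


def pokemon_csv_path(base=None):
    """Return <base>/aiffel/pokemon_eda/data/Pokemon.csv.

    `base` defaults to the current user's home directory (Path.home()).
    """
    root = Path.home() if base is None else Path(base)
    return root / "aiffel" / "pokemon_eda" / "data" / "Pokemon.csv"


# Usage, assuming the file was downloaded as in step 1:
# original_data = pd.read_csv(pokemon_csv_path())
```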
In [3]:
# Work on a copy so the original data stays untouched
pokemon = original_data.copy()
print(pokemon.shape)
pokemon.head()
(800, 13)
Out[3]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
In [4]:
# Legendary Pokémon subset
legendary = pokemon[pokemon["Legendary"] == True].reset_index(drop=True)
print(legendary.shape)
legendary.head()
(65, 13)
Out[4]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 144 | Articuno | Ice | Flying | 580 | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True |
1 | 145 | Zapdos | Electric | Flying | 580 | 90 | 90 | 85 | 125 | 90 | 100 | 1 | True |
2 | 146 | Moltres | Fire | Flying | 580 | 90 | 100 | 90 | 125 | 85 | 90 | 1 | True |
3 | 150 | Mewtwo | Psychic | NaN | 680 | 106 | 110 | 90 | 154 | 90 | 130 | 1 | True |
4 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 780 | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
In [5]:
# Ordinary (non-legendary) Pokémon subset
ordinary = pokemon[pokemon["Legendary"] == False].reset_index(drop=True)
print(ordinary.shape)
ordinary.head()
(735, 13)
Out[5]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
In [6]:
# Count missing values per column
pokemon.isnull().sum()
Out[6]:
#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
In [7]:
# Number of columns and their names
print(len(pokemon.columns))
pokemon.columns
13
Out[7]:
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'], dtype='object')
In [9]:
# Number of distinct IDs (#)
len(set(pokemon["#"]))
Out[9]:
721
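721 distinct IDs for 800 rows means some IDs are shared by several rows, such as Mega evolutions and alternate formes. A small sketch of how to list the shared IDs with `value_counts` and `duplicated(keep=False)`, run on a few toy rows rather than the real CSV:

```python
import pandas as pd

# Toy rows mimicking the dataset's "#" and "Name" columns
df = pd.DataFrame({
    "#": [5, 6, 6, 6, 7],
    "Name": ["Charmeleon", "Charizard", "CharizardMega Charizard X",
             "CharizardMega Charizard Y", "Squirtle"],
})

# Number of rows per ID, keeping only IDs that occur more than once
id_counts = df["#"].value_counts()
repeated_ids = id_counts[id_counts > 1]

# Every row whose ID is shared (keep=False marks all occurrences, not just repeats)
dupe_rows = df[df["#"].duplicated(keep=False)]
print(repeated_ids)
print(dupe_rows)
```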
In [10]:
pokemon[pokemon["#"] == 6]
Out[10]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
In [11]:
# Number of distinct names
len(set(pokemon["Name"]))
Out[11]:
800
In [13]:
# Types: Charizard has two types, Wartortle only a primary type (Type 2 is NaN)
pokemon.loc[[6, 10]]
Out[13]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
10 | 8 | Wartortle | Water | NaN | 405 | 59 | 63 | 80 | 65 | 80 | 58 | 1 | False |
In [14]:
len(list(set(pokemon["Type 1"]))), len(list(set(pokemon["Type 2"])))
Out[14]:
(18, 19)
In [17]:
# Values that appear in Type 2 but not in Type 1
set(pokemon["Type 2"]) - set(pokemon["Type 1"])
# set(pokemon["Type 2"]).difference(set(pokemon["Type 1"]))
Out[17]:
{nan}
In [18]:
types = list(set(pokemon["Type 1"]))
print(len(types))
print(types)
18
['Steel', 'Ghost', 'Normal', 'Bug', 'Water', 'Fire', 'Grass', 'Dragon', 'Poison', 'Fairy', 'Fighting', 'Electric', 'Psychic', 'Rock', 'Ground', 'Dark', 'Ice', 'Flying']
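Python sets have no guaranteed iteration order, and string hashing is randomized per interpreter run, so the order of `types` above (and with it the x-axis order of the plots that use `order=types`) can change between runs. A sketch of a deterministic alternative that also drops NaN explicitly, on toy columns:

```python
import numpy as np
import pandas as pd

type1 = pd.Series(["Grass", "Fire", "Water", "Grass", "Bug"])
type2 = pd.Series(["Poison", np.nan, np.nan, "Flying", "Flying"])

# Sorted, NaN-free list of primary types: same order on every run
types = sorted(type1.dropna().unique())

# Secondary types that never occur as a primary type, ignoring missing values
extra = set(type2.dropna()) - set(type1)
print(types)
print(extra)
```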
In [21]:
# Number of Pokémon without a secondary type (Type 2 is NaN)
pokemon["Type 2"].isna().sum()
Out[21]:
386
In [24]:
plt.figure(figsize=(12, 10))  # Adjust the figure size to suit your screen resolution.
plt.subplot(211)
sns.countplot(data=ordinary, x="Type 1", order=types).set_xlabel('')
plt.title("[Ordinary Pokemon]")
plt.subplot(212)
sns.countplot(data=legendary, x="Type 1", order=types).set_xlabel('')
plt.title("[Legendary Pokemon]")
plt.show()
In [25]:
# Pivot table: share of legendary Pokémon for each Type 1 value
# Half (0.5) of the Pokémon whose primary type is Flying are legendary.
pd.pivot_table(pokemon, index="Type 1", values="Legendary").sort_values(by=["Legendary"], ascending=False)
Out[25]:
| Type 1 | Legendary |
---|---|
Flying | 0.500000 |
Dragon | 0.375000 |
Psychic | 0.245614 |
Steel | 0.148148 |
Ground | 0.125000 |
Fire | 0.096154 |
Electric | 0.090909 |
Rock | 0.090909 |
Ice | 0.083333 |
Dark | 0.064516 |
Ghost | 0.062500 |
Fairy | 0.058824 |
Grass | 0.042857 |
Water | 0.035714 |
Normal | 0.020408 |
Poison | 0.000000 |
Fighting | 0.000000 |
Bug | 0.000000 |
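`pd.pivot_table` aggregates with `aggfunc="mean"` by default, and the mean of a boolean column is the share of `True` values, which is why the table shows ratios. Printing each group's size next to its ratio makes small groups visible; very few Pokémon have Flying as their primary type, so its 0.5 rests on a small sample. A sketch with `groupby(...).agg(["mean", "count"])` on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "Type 1": ["Flying", "Flying", "Water", "Water", "Water", "Water"],
    "Legendary": [True, False, True, False, False, False],
})

# mean of a bool column = share of legendary Pokémon; count = group size
summary = (
    df.groupby("Type 1")["Legendary"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```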
In [26]:
plt.figure(figsize=(12, 10))  # Adjust the figure size to suit your screen resolution.
plt.subplot(211)
sns.countplot(data=ordinary, x="Type 2", order=types).set_xlabel('')
plt.title("[Ordinary Pokemon]")
plt.subplot(212)
sns.countplot(data=legendary, x="Type 2", order=types).set_xlabel('')
plt.title("[Legendary Pokemon]")
plt.show()
In [27]:
# Pivot table: share of legendary Pokémon for each Type 2 value
pd.pivot_table(pokemon, index="Type 2", values="Legendary").sort_values(by=["Legendary"], ascending=False)
Out[27]:
| Type 2 | Legendary |
---|---|
Fire | 0.250000 |
Dragon | 0.222222 |
Ice | 0.214286 |
Electric | 0.166667 |
Fighting | 0.153846 |
Psychic | 0.151515 |
Flying | 0.134021 |
Fairy | 0.086957 |
Water | 0.071429 |
Ghost | 0.071429 |
Dark | 0.050000 |
Steel | 0.045455 |
Ground | 0.028571 |
Rock | 0.000000 |
Bug | 0.000000 |
Poison | 0.000000 |
Normal | 0.000000 |
Grass | 0.000000 |
In [28]:
stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
stats
Out[28]:
['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
In [29]:
print("#0 pokemon: {}\n".format(pokemon.loc[0, "Name"]))
print("total: ", int(pokemon.loc[0, "Total"]))
print("stats: ", list(pokemon.loc[0, stats]))
print("sum of all stats: ", sum(list(pokemon.loc[0, stats])))
#0 pokemon: Bulbasaur

total:  318
stats:  [45, 49, 49, 65, 65, 45]
sum of all stats:  318
In [31]:
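# Count the rows whose Total equals the sum of the six stats (axis=1 sums across columns, i.e. per row).
# The count equals the number of rows (800), so Total is exactly the sum of the stats.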
sum(pokemon['Total'].values == pokemon[stats].values.sum(axis=1))
Out[31]:
800
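The same check written with pandas methods: sum the six stats row by row (`axis=1`) and test whether every row matches with `.all()`. The two toy rows are Bulbasaur and Ivysaur, copied from the table above:

```python
import pandas as pd

stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
df = pd.DataFrame(
    [[318, 45, 49, 49, 65, 65, 45],   # Bulbasaur
     [405, 60, 62, 63, 80, 80, 60]],  # Ivysaur
    columns=["Total"] + stats,
)

# axis=1 sums across the stat columns within each row
row_sums = df[stats].sum(axis=1)
matches = df["Total"] == row_sums
all_match = bool(matches.all())
n_match = int(matches.sum())
print(all_match, n_match)
```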
In [32]:
fig, ax = plt.subplots()
fig.set_size_inches(12, 6)  # Adjust the figure size to suit your screen resolution.
sns.scatterplot(data=pokemon, x="Type 1", y="Total", hue="Legendary")
plt.show()
# Judging by Total, legendary Pokémon sit near the top across all types.
In [34]:
# Scatter plot of Total against each individual stat
figure, ((ax1, ax2), (ax3, ax4), (ax5, ax6)) = plt.subplots(nrows=3, ncols=2)
figure.set_size_inches(12, 18)  # Adjust the figure size to suit your screen resolution.
sns.scatterplot(data=pokemon, y="Total", x="HP", hue="Legendary", ax=ax1)
sns.scatterplot(data=pokemon, y="Total", x="Attack", hue="Legendary", ax=ax2)
sns.scatterplot(data=pokemon, y="Total", x="Defense", hue="Legendary", ax=ax3)
sns.scatterplot(data=pokemon, y="Total", x="Sp. Atk", hue="Legendary", ax=ax4)
sns.scatterplot(data=pokemon, y="Total", x="Sp. Def", hue="Legendary", ax=ax5)
sns.scatterplot(data=pokemon, y="Total", x="Speed", hue="Legendary", ax=ax6)
plt.show()
In [35]:
# Distribution by generation
plt.figure(figsize=(12, 10))  # Adjust the figure size to suit your screen resolution.
plt.subplot(211)
sns.countplot(data=ordinary, x="Generation").set_xlabel('')
plt.title("[Ordinary Pokemon]")
plt.subplot(212)
sns.countplot(data=legendary, x="Generation").set_xlabel('')
plt.title("[Legendary Pokemon]")
plt.show()
In [36]:
# Total values of legendary Pokémon, by primary type
fig, ax = plt.subplots()
fig.set_size_inches(8, 4)
sns.scatterplot(data=legendary, y="Type 1", x="Total")
plt.show()
In [37]:
sorted(list(set(legendary["Total"])))
Out[37]:
[580, 600, 660, 670, 680, 700, 720, 770, 780]
In [38]:
fig, ax = plt.subplots()
fig.set_size_inches(8, 4)
sns.countplot(data=legendary, x="Total")
plt.show()
In [39]:
round(65 / 9, 2)  # 65 legendary Pokémon / 9 distinct Total values
Out[39]:
7.22
In [40]:
# Distinct Total values of ordinary Pokémon
print(sorted(list(set(ordinary["Total"]))))
[180, 190, 194, 195, 198, 200, 205, 210, 213, 215, 218, 220, 224, 236, 237, 240, 244, 245, 250, 251, 253, 255, 260, 262, 263, 264, 265, 266, 269, 270, 273, 275, 278, 280, 281, 285, 288, 289, 290, 292, 294, 295, 299, 300, 302, 303, 304, 305, 306, 307, 308, 309, 310, 313, 314, 315, 316, 318, 319, 320, 323, 325, 328, 329, 330, 334, 335, 336, 340, 341, 345, 348, 349, 350, 351, 352, 355, 358, 360, 362, 363, 365, 369, 370, 371, 375, 380, 382, 384, 385, 390, 395, 400, 401, 405, 409, 410, 411, 413, 414, 415, 418, 420, 423, 424, 425, 428, 430, 431, 435, 438, 440, 442, 445, 446, 448, 450, 452, 454, 455, 456, 458, 460, 461, 462, 464, 465, 466, 467, 468, 470, 471, 472, 473, 474, 475, 479, 480, 481, 482, 483, 484, 485, 487, 488, 489, 490, 494, 495, 497, 498, 499, 500, 505, 507, 508, 509, 510, 514, 515, 518, 519, 520, 521, 523, 525, 528, 530, 531, 534, 535, 540, 545, 550, 552, 555, 560, 565, 567, 575, 579, 580, 590, 594, 600, 610, 615, 618, 625, 630, 634, 635, 640, 670, 700]
In [41]:
len(sorted(list(set(ordinary["Total"]))))
Out[41]:
195
In [42]:
round(735 / 195, 2)  # 735 ordinary Pokémon / 195 distinct Total values
Out[42]:
3.77
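Total column summary
- Relative to group size, ordinary Pokémon have almost twice as many distinct Total values as legendary Pokémon: each legendary Total value is shared by 7.22 Pokémon on average, versus 3.77 for ordinary Pokémon. In other words, legendary Total values are not very diverse.
- Therefore, whether a Pokémon's Total falls within the set of legendary Total values is informative about whether it is legendary.
- Some legendary Total values never occur among ordinary Pokémon: 660, 680, 720, 770 and 780.
- This is a second way in which Total can signal legendary status.
- Total is therefore likely to be an important feature for predicting Legendary.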
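The legendary-only Total values follow from a set difference. To keep the sketch short it uses only the ordinary Total values of 560 and above, copied from the sorted list above; since every legendary Total is at least 580, the lower ordinary values cannot change the result. In the notebook the equivalent would be `set(legendary["Total"]) - set(ordinary["Total"])`.

```python
legendary_totals = {580, 600, 660, 670, 680, 700, 720, 770, 780}
# Ordinary Total values >= 560, copied from the sorted list above
ordinary_high_totals = {560, 565, 567, 575, 579, 580, 590, 594, 600,
                        610, 615, 618, 625, 630, 634, 635, 640, 670, 700}

legendary_only = sorted(legendary_totals - ordinary_high_totals)  # only legendary
shared = sorted(legendary_totals & ordinary_high_totals)          # both groups
print(legendary_only)  # [660, 680, 720, 770, 780]
print(shared)          # [580, 600, 670, 700]
```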
In [43]:
n1, n2, n3, n4, n5 = legendary[3:6], legendary[14:24], legendary[25:29], legendary[46:50], legendary[52:57]
names = pd.concat([n1, n2, n3, n4, n5]).reset_index(drop=True)
names
Out[43]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 150 | Mewtwo | Psychic | NaN | 680 | 106 | 110 | 90 | 154 | 90 | 130 | 1 | True |
1 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 780 | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
2 | 150 | MewtwoMega Mewtwo Y | Psychic | NaN | 780 | 106 | 150 | 70 | 194 | 120 | 140 | 1 | True |
3 | 380 | Latias | Dragon | Psychic | 600 | 80 | 80 | 90 | 110 | 130 | 110 | 3 | True |
4 | 380 | LatiasMega Latias | Dragon | Psychic | 700 | 80 | 100 | 120 | 140 | 150 | 110 | 3 | True |
5 | 381 | Latios | Dragon | Psychic | 600 | 80 | 90 | 80 | 130 | 110 | 110 | 3 | True |
6 | 381 | LatiosMega Latios | Dragon | Psychic | 700 | 80 | 130 | 100 | 160 | 120 | 110 | 3 | True |
7 | 382 | Kyogre | Water | NaN | 670 | 100 | 100 | 90 | 150 | 140 | 90 | 3 | True |
8 | 382 | KyogrePrimal Kyogre | Water | NaN | 770 | 100 | 150 | 90 | 180 | 160 | 90 | 3 | True |
9 | 383 | Groudon | Ground | NaN | 670 | 100 | 150 | 140 | 100 | 90 | 90 | 3 | True |
10 | 383 | GroudonPrimal Groudon | Ground | Fire | 770 | 100 | 180 | 160 | 150 | 90 | 90 | 3 | True |
11 | 384 | Rayquaza | Dragon | Flying | 680 | 105 | 150 | 90 | 150 | 90 | 95 | 3 | True |
12 | 384 | RayquazaMega Rayquaza | Dragon | Flying | 780 | 105 | 180 | 100 | 180 | 100 | 115 | 3 | True |
13 | 386 | DeoxysNormal Forme | Psychic | NaN | 600 | 50 | 150 | 50 | 150 | 50 | 150 | 3 | True |
14 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True |
15 | 386 | DeoxysDefense Forme | Psychic | NaN | 600 | 50 | 70 | 160 | 70 | 160 | 90 | 3 | True |
16 | 386 | DeoxysSpeed Forme | Psychic | NaN | 600 | 50 | 95 | 90 | 95 | 90 | 180 | 3 | True |
17 | 641 | TornadusIncarnate Forme | Flying | NaN | 580 | 79 | 115 | 70 | 125 | 80 | 111 | 5 | True |
18 | 641 | TornadusTherian Forme | Flying | NaN | 580 | 79 | 100 | 80 | 110 | 90 | 121 | 5 | True |
19 | 642 | ThundurusIncarnate Forme | Electric | Flying | 580 | 79 | 115 | 70 | 125 | 80 | 111 | 5 | True |
20 | 642 | ThundurusTherian Forme | Electric | Flying | 580 | 79 | 105 | 70 | 145 | 80 | 101 | 5 | True |
21 | 645 | LandorusIncarnate Forme | Ground | Flying | 600 | 89 | 125 | 90 | 115 | 80 | 101 | 5 | True |
22 | 645 | LandorusTherian Forme | Ground | Flying | 600 | 89 | 145 | 90 | 105 | 80 | 91 | 5 | True |
23 | 646 | Kyurem | Dragon | Ice | 660 | 125 | 130 | 90 | 130 | 90 | 95 | 5 | True |
24 | 646 | KyuremBlack Kyurem | Dragon | Ice | 700 | 125 | 170 | 100 | 120 | 90 | 95 | 5 | True |
25 | 646 | KyuremWhite Kyurem | Dragon | Ice | 700 | 125 | 120 | 90 | 170 | 100 | 95 | 5 | True |
In [44]:
formes = names[13:23]
formes
Out[44]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | 386 | DeoxysNormal Forme | Psychic | NaN | 600 | 50 | 150 | 50 | 150 | 50 | 150 | 3 | True |
14 | 386 | DeoxysAttack Forme | Psychic | NaN | 600 | 50 | 180 | 20 | 180 | 20 | 150 | 3 | True |
15 | 386 | DeoxysDefense Forme | Psychic | NaN | 600 | 50 | 70 | 160 | 70 | 160 | 90 | 3 | True |
16 | 386 | DeoxysSpeed Forme | Psychic | NaN | 600 | 50 | 95 | 90 | 95 | 90 | 180 | 3 | True |
17 | 641 | TornadusIncarnate Forme | Flying | NaN | 580 | 79 | 115 | 70 | 125 | 80 | 111 | 5 | True |
18 | 641 | TornadusTherian Forme | Flying | NaN | 580 | 79 | 100 | 80 | 110 | 90 | 121 | 5 | True |
19 | 642 | ThundurusIncarnate Forme | Electric | Flying | 580 | 79 | 115 | 70 | 125 | 80 | 111 | 5 | True |
20 | 642 | ThundurusTherian Forme | Electric | Flying | 580 | 79 | 105 | 70 | 145 | 80 | 101 | 5 | True |
21 | 645 | LandorusIncarnate Forme | Ground | Flying | 600 | 89 | 125 | 90 | 115 | 80 | 101 | 5 | True |
22 | 645 | LandorusTherian Forme | Ground | Flying | 600 | 89 | 145 | 90 | 105 | 80 | 91 | 5 | True |
In [45]:
legendary["name_count"] = legendary["Name"].apply(lambda i: len(i))
legendary.head()
Out[45]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 144 | Articuno | Ice | Flying | 580 | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True | 8 |
1 | 145 | Zapdos | Electric | Flying | 580 | 90 | 90 | 85 | 125 | 90 | 100 | 1 | True | 6 |
2 | 146 | Moltres | Fire | Flying | 580 | 90 | 100 | 90 | 125 | 85 | 90 | 1 | True | 7 |
3 | 150 | Mewtwo | Psychic | NaN | 680 | 106 | 110 | 90 | 154 | 90 | 130 | 1 | True | 6 |
4 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 780 | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True | 19 |
In [46]:
ordinary["name_count"] = ordinary["Name"].apply(lambda i: len(i))
ordinary.head()
Out[46]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 9 |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 7 |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 8 |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 21 |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 10 |
In [47]:
plt.figure(figsize=(12, 10))  # Adjust the figure size to suit your screen resolution.
plt.subplot(211)
sns.countplot(data=legendary, x="name_count").set_xlabel('')
plt.title("Legendary")
plt.subplot(212)
sns.countplot(data=ordinary, x="name_count").set_xlabel('')
plt.title("Ordinary")
plt.show()
In [48]:
print(round(len(legendary[legendary["name_count"] > 9]) / len(legendary) * 100, 2), "%")
41.54 %
In [49]:
print(round(len(ordinary[ordinary["name_count"] > 9]) / len(ordinary) * 100, 2), "%")
15.65 %
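Name summary
- If "Latios" is legendary, variants whose names contain "Latios" (e.g. "LatiosMega Latios") are legendary too.
- Some name tokens occur with high frequency among legendary Pokémon, such as the "Forme" variants above.
- Legendary Pokémon are far more likely to have long names of 10 or more characters: 41.54% of legendary Pokémon versus 15.65% of ordinary Pokémon.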
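The two percentages above can be computed with one helper; `Series.str.len()` gives name lengths without a lambda, and `>= 10` is the same condition as `> 9` for integer lengths. The function name `long_name_share` is illustrative; applied to `legendary["Name"]` and `ordinary["Name"]` it computes the same quantities as the two cells above.

```python
import pandas as pd


def long_name_share(names, min_len=10):
    """Percentage of names with at least `min_len` characters, rounded to 2 places."""
    lengths = names.str.len()
    # mean of a boolean Series = fraction of True values
    return round(float((lengths >= min_len).mean()) * 100, 2)


names = pd.Series(["Latios", "LatiosMega Latios", "Kyogre", "KyogrePrimal Kyogre"])
print(long_name_share(names), "%")
```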
3. Data preprocessing
In [50]:
# Name length in characters (spaces included)
pokemon["name_count"] = pokemon["Name"].apply(lambda i: len(i))
pokemon.head()
Out[50]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 9 |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 7 |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 8 |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 21 |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 10 |
In [51]:
# Flag names with 10 or more characters
pokemon["long_name"] = pokemon["name_count"] >= 10
pokemon.head()
Out[51]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 9 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 7 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 8 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 21 | True |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 10 | True |
In [52]:
# str.isalpha() returns False for strings that contain spaces, so remove the spaces first
pokemon["Name_nospace"] = pokemon["Name"].apply(lambda i: i.replace(" ", ""))
pokemon.tail()
Out[52]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name | Name_nospace |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
795 | 719 | Diancie | Rock | Fairy | 600 | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | 7 | False | Diancie |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | 19 | True | DiancieMegaDiancie |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 600 | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | 19 | True | HoopaHoopaConfined |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 680 | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | 18 | True | HoopaHoopaUnbound |
799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True | 9 | False | Volcanion |
In [53]:
pokemon["name_isalpha"] = pokemon["Name_nospace"].apply(lambda i: i.isalpha())
pokemon.head()
Out[53]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name | Name_nospace | name_isalpha |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 9 | False | Bulbasaur | True |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 7 | False | Ivysaur | True |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 8 | False | Venusaur | True |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 21 | True | VenusaurMegaVenusaur | True |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 10 | True | Charmander | True |
In [54]:
# Pokémon whose names contain non-alphabetic characters
print(pokemon[pokemon["name_isalpha"] == False].shape)
pokemon[pokemon["name_isalpha"] == False]
(9, 17)
Out[54]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name | Name_nospace | name_isalpha |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34 | 29 | Nidoran♀ | Poison | NaN | 275 | 55 | 47 | 52 | 40 | 40 | 41 | 1 | False | 8 | False | Nidoran♀ | False |
37 | 32 | Nidoran♂ | Poison | NaN | 273 | 46 | 57 | 40 | 40 | 40 | 50 | 1 | False | 8 | False | Nidoran♂ | False |
90 | 83 | Farfetch'd | Normal | Flying | 352 | 52 | 65 | 55 | 58 | 62 | 60 | 1 | False | 10 | True | Farfetch'd | False |
131 | 122 | Mr. Mime | Psychic | Fairy | 460 | 40 | 45 | 65 | 100 | 120 | 90 | 1 | False | 8 | False | Mr.Mime | False |
252 | 233 | Porygon2 | Normal | NaN | 515 | 85 | 80 | 90 | 105 | 95 | 60 | 2 | False | 8 | False | Porygon2 | False |
270 | 250 | Ho-oh | Fire | Flying | 680 | 106 | 130 | 90 | 110 | 154 | 90 | 2 | True | 5 | False | Ho-oh | False |
487 | 439 | Mime Jr. | Psychic | Fairy | 310 | 20 | 25 | 45 | 70 | 90 | 60 | 4 | False | 8 | False | MimeJr. | False |
525 | 474 | Porygon-Z | Normal | NaN | 535 | 85 | 80 | 70 | 135 | 75 | 90 | 4 | False | 9 | False | Porygon-Z | False |
794 | 718 | Zygarde50% Forme | Dragon | Ground | 600 | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True | 16 | True | Zygarde50%Forme | False |
In [55]:
pokemon = pokemon.replace(to_replace="Nidoran♀", value="Nidoran X")
pokemon = pokemon.replace(to_replace="Nidoran♂", value="Nidoran Y")
pokemon = pokemon.replace(to_replace="Farfetch'd", value="Farfetchd")
pokemon = pokemon.replace(to_replace="Mr. Mime", value="Mr Mime")
pokemon = pokemon.replace(to_replace="Porygon2", value="Porygon")
pokemon = pokemon.replace(to_replace="Ho-oh", value="Ho Oh")
pokemon = pokemon.replace(to_replace="Mime Jr.", value="Mime Jr")
pokemon = pokemon.replace(to_replace="Porygon-Z", value="Porygon Z")
pokemon = pokemon.replace(to_replace="Zygarde50% Forme", value="Zygarde Forme")
pokemon.loc[[34, 37, 90, 131, 252, 270, 487, 525, 794]]
Out[55]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name | Name_nospace | name_isalpha |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
34 | 29 | Nidoran X | Poison | NaN | 275 | 55 | 47 | 52 | 40 | 40 | 41 | 1 | False | 8 | False | Nidoran X | False |
37 | 32 | Nidoran Y | Poison | NaN | 273 | 46 | 57 | 40 | 40 | 40 | 50 | 1 | False | 8 | False | Nidoran Y | False |
90 | 83 | Farfetchd | Normal | Flying | 352 | 52 | 65 | 55 | 58 | 62 | 60 | 1 | False | 10 | True | Farfetchd | False |
131 | 122 | Mr Mime | Psychic | Fairy | 460 | 40 | 45 | 65 | 100 | 120 | 90 | 1 | False | 8 | False | Mr.Mime | False |
252 | 233 | Porygon | Normal | NaN | 515 | 85 | 80 | 90 | 105 | 95 | 60 | 2 | False | 8 | False | Porygon | False |
270 | 250 | Ho Oh | Fire | Flying | 680 | 106 | 130 | 90 | 110 | 154 | 90 | 2 | True | 5 | False | Ho Oh | False |
487 | 439 | Mime Jr | Psychic | Fairy | 310 | 20 | 25 | 45 | 70 | 90 | 60 | 4 | False | 8 | False | MimeJr. | False |
525 | 474 | Porygon Z | Normal | NaN | 535 | 85 | 80 | 70 | 135 | 75 | 90 | 4 | False | 9 | False | Porygon Z | False |
794 | 718 | Zygarde Forme | Dragon | Ground | 600 | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True | 16 | True | Zygarde50%Forme | False |
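`DataFrame.replace` as used above searches every column of the frame. A sketch that renames only the `Name` column through one mapping dict and then recomputes every column derived from it; the output above shows `Name_nospace` still holding the old spellings, and `name_count` and `long_name` likewise still describe the original names. Only part of the mapping is shown, on toy rows:

```python
import pandas as pd

# Part of the renaming map used above
rename_map = {"Nidoran♀": "Nidoran X", "Mr. Mime": "Mr Mime", "Ho-oh": "Ho Oh"}

df = pd.DataFrame({"Name": ["Nidoran♀", "Mr. Mime", "Ho-oh", "Bulbasaur"]})
df["Name"] = df["Name"].replace(rename_map)  # exact-value replacement in Name only

# Recompute every column derived from Name after renaming
df["name_count"] = df["Name"].str.len()
df["long_name"] = df["name_count"] >= 10
df["Name_nospace"] = df["Name"].str.replace(" ", "", regex=False)
df["name_isalpha"] = df["Name_nospace"].str.isalpha()
print(df)
```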
In [57]:
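# Recompute the helper columns from the renamed names: no non-alphabetic names remain.
# (name_count and long_name are not recomputed here and still describe the original names.)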
pokemon["Name_nospace"] = pokemon["Name"].apply(lambda i: i.replace(" ", ""))
pokemon["name_isalpha"] = pokemon["Name_nospace"].apply(lambda i: i.isalpha())
pokemon[pokemon["name_isalpha"] == False]
Out[57]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | name_count | long_name | Name_nospace | name_isalpha |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Tokenizing names on spaces and capital letters
In [59]:
import re
name = "CharizardMega Charizard X"
name_split = name.split(" ")
name_split
temp = name_split[0]
temp
Out[59]:
'CharizardMega'
In [60]:
# Repeated pattern: one uppercase letter followed by zero or more lowercase letters
tokens = re.findall('[A-Z][a-z]*', temp)
tokens
Out[60]:
['Charizard', 'Mega']
In [61]:
tokens = []
for part_name in name_split:
    a = re.findall('[A-Z][a-z]*', part_name)
    tokens.extend(a)
tokens
Out[61]:
['Charizard', 'Mega', 'Charizard', 'X']
In [62]:
# Tokenizer: split on spaces, then split each part at capital letters
def tokenize(name):
    name_split = name.split(" ")
    tokens = []
    for part_name in name_split:
        a = re.findall('[A-Z][a-z]*', part_name)
        tokens.extend(a)
    return np.array(tokens)
In [63]:
name = "CharizardMega Charizard X"
tokenize(name)
Out[63]:
array(['Charizard', 'Mega', 'Charizard', 'X'], dtype='<U9')
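Because `[A-Z][a-z]*` never matches a space, splitting on spaces first does not change the result: a single `re.findall` over the whole name returns the same tokens. A simplified sketch that returns a plain list instead of a NumPy array (`tokenize_simple` is a hypothetical name); for a whole column, `legendary["Name"].str.findall(r"[A-Z][a-z]*")` applies the same pattern.

```python
import re

TOKEN_RE = re.compile(r"[A-Z][a-z]*")


def tokenize_simple(name):
    """Split a name into capitalised tokens; spaces and other characters are skipped."""
    return TOKEN_RE.findall(name)


print(tokenize_simple("CharizardMega Charizard X"))  # ['Charizard', 'Mega', 'Charizard', 'X']
print(tokenize_simple("Ho-oh"))                      # ['Ho']
```

The second call explains the lone `'Ho'` in the legendary token list: `legendary` still holds the original spelling "Ho-oh", and the lowercase `oh` after the hyphen matches nothing.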
In [64]:
all_tokens = list(legendary["Name"].apply(tokenize).values)
token_set = []
for token in all_tokens:
    token_set.extend(token)
print(len(set(token_set)))
print(token_set)
65
['Articuno', 'Zapdos', 'Moltres', 'Mewtwo', 'Mewtwo', 'Mega', 'Mewtwo', 'X', 'Mewtwo', 'Mega', 'Mewtwo', 'Y', 'Raikou', 'Entei', 'Suicune', 'Lugia', 'Ho', 'Regirock', 'Regice', 'Registeel', 'Latias', 'Latias', 'Mega', 'Latias', 'Latios', 'Latios', 'Mega', 'Latios', 'Kyogre', 'Kyogre', 'Primal', 'Kyogre', 'Groudon', 'Groudon', 'Primal', 'Groudon', 'Rayquaza', 'Rayquaza', 'Mega', 'Rayquaza', 'Jirachi', 'Deoxys', 'Normal', 'Forme', 'Deoxys', 'Attack', 'Forme', 'Deoxys', 'Defense', 'Forme', 'Deoxys', 'Speed', 'Forme', 'Uxie', 'Mesprit', 'Azelf', 'Dialga', 'Palkia', 'Heatran', 'Regigigas', 'Giratina', 'Altered', 'Forme', 'Giratina', 'Origin', 'Forme', 'Darkrai', 'Shaymin', 'Land', 'Forme', 'Shaymin', 'Sky', 'Forme', 'Arceus', 'Victini', 'Cobalion', 'Terrakion', 'Virizion', 'Tornadus', 'Incarnate', 'Forme', 'Tornadus', 'Therian', 'Forme', 'Thundurus', 'Incarnate', 'Forme', 'Thundurus', 'Therian', 'Forme', 'Reshiram', 'Zekrom', 'Landorus', 'Incarnate', 'Forme', 'Landorus', 'Therian', 'Forme', 'Kyurem', 'Kyurem', 'Black', 'Kyurem', 'Kyurem', 'White', 'Kyurem', 'Xerneas', 'Yveltal', 'Zygarde', 'Forme', 'Diancie', 'Diancie', 'Mega', 'Diancie', 'Hoopa', 'Hoopa', 'Confined', 'Hoopa', 'Hoopa', 'Unbound', 'Volcanion']
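Counting elements of a list or set
- When you need the number of occurrences of each element, the standard-library `collections` module helps:
- `OrderedDict`, a dictionary that remembers insertion order
- `Counter`, which counts how many times each element occurs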
In [66]:
from collections import Counter
a = [1, 1, 0, 0, 0, 1, 1, 2, 3]
Counter(a)
Out[66]:
Counter({1: 4, 0: 3, 2: 1, 3: 1})
In [67]:
Counter(a).most_common()
Out[67]:
[(1, 4), (0, 3), (2, 1), (3, 1)]
In [68]:
# Names containing one of these frequent tokens are more likely to be legendary
most_common = Counter(token_set).most_common(10)
most_common
Out[68]:
[('Forme', 15), ('Mega', 6), ('Mewtwo', 5), ('Kyurem', 5), ('Deoxys', 4), ('Hoopa', 4), ('Latias', 3), ('Latios', 3), ('Kyogre', 3), ('Groudon', 3)]
In [70]:
for token, _ in most_common:
    pokemon[token] = pokemon["Name"].str.contains(token)
pokemon.head(10)
Out[70]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | ... | Forme | Mega | Mewtwo | Kyurem | Deoxys | Hoopa | Latias | Latios | Kyogre | Groudon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | ... | False | False | False | False | False | False | False | False | False | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | ... | False | False | False | False | False | False | False | False | False | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | ... | False | False | False | False | False | False | False | False | False | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | ... | False | True | False | False | False | False | False | False | False | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | ... | False | False | False | False | False | False | False | False | False | False |
5 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | ... | False | False | False | False | False | False | False | False | False | False |
6 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | ... | False | False | False | False | False | False | False | False | False | False |
7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | ... | False | True | False | False | False | False | False | False | False | False |
8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | ... | False | True | False | False | False | False | False | False | False | False |
9 | 7 | Squirtle | Water | NaN | 314 | 44 | 48 | 65 | 50 | 64 | ... | False | False | False | False | False | False | False | False | False | False |
10 rows × 27 columns
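`str.contains` matches substrings, so a token column such as `Mega` is also `True` for a name that merely contains those letters, such as "Meganium". A sketch that sets a flag only when the token appears as a whole token, reusing the same capital-letter pattern; the helper name `token_flags` is hypothetical and this is an alternative to, not a description of, the cell above.

```python
import pandas as pd

TOKEN_PATTERN = r"[A-Z][a-z]*"


def token_flags(names, tokens):
    """One boolean column per token: True only if the token is a whole name token."""
    token_lists = names.str.findall(TOKEN_PATTERN)  # e.g. ['Venusaur', 'Mega', 'Venusaur']
    # t=t binds the current token inside each lambda
    return pd.DataFrame({t: token_lists.apply(lambda ts, t=t: t in ts) for t in tokens})


names = pd.Series(["Meganium", "VenusaurMega Venusaur", "Latios"])
print(names.str.contains("Mega").tolist())           # [True, True, False]  substring match
print(token_flags(names, ["Mega"])["Mega"].tolist())  # [False, True, False] whole-token match
```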
Preprocessing the categorical Type 1 and Type 2 columns
In [71]:
print(types)
['Steel', 'Ghost', 'Normal', 'Bug', 'Water', 'Fire', 'Grass', 'Dragon', 'Poison', 'Fairy', 'Fighting', 'Electric', 'Psychic', 'Rock', 'Ground', 'Dark', 'Ice', 'Flying']
In [72]:
# One-hot (multi-hot) encoding: True (1) if the type appears in Type 1 or Type 2, else False (0)
for t in types:
    pokemon[t] = (pokemon["Type 1"] == t) | (pokemon["Type 2"] == t)
pokemon[["Type 1", "Type 2"] + types].head()
Out[72]:
|  | Type 1 | Type 2 | Steel | Ghost | Normal | Bug | Water | Fire | Grass | Dragon | Poison | Fairy | Fighting | Electric | Psychic | Rock | Ground | Dark | Ice | Flying |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Grass | Poison | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | False |
1 | Grass | Poison | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | False |
2 | Grass | Poison | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | False |
3 | Grass | Poison | False | False | False | False | False | False | True | False | True | False | False | False | False | False | False | False | False | False |
4 | Fire | NaN | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False |
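The loop above produces a multi-hot encoding: a Pokémon with two types gets `True` in two columns, and a missing `Type 2` (NaN) never equals a type name, so it contributes nothing. An equivalent sketch with `pd.get_dummies`, assuming the list of type names is passed in; `reindex` guarantees a column for every type even if it is absent from one of the two columns. The helper name `type_multi_hot` is hypothetical.

```python
import numpy as np
import pandas as pd


def type_multi_hot(df, types):
    """One boolean column per type: True if the type appears in Type 1 or Type 2."""
    # get_dummies leaves NaN rows all-zero (dummy_na=False by default)
    t1 = pd.get_dummies(df["Type 1"]).reindex(columns=types, fill_value=0).astype(bool)
    t2 = pd.get_dummies(df["Type 2"]).reindex(columns=types, fill_value=0).astype(bool)
    return t1 | t2


df = pd.DataFrame({"Type 1": ["Grass", "Fire"], "Type 2": ["Poison", np.nan]})
flags = type_multi_hot(df, ["Grass", "Poison", "Fire", "Water"])
print(flags)
```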
4. Building a baseline model
In [73]:
print(original_data.shape)
original_data.head()
(800, 13)
Out[73]:
|  | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
In [74]:
original_data.columns
Out[74]:
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'], dtype='object')
Excluding the meaningless # column, the string columns Name, Type 1 and Type 2, and the target Legendary
In [75]:
features = ['Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation']
In [76]:
target = 'Legendary'
In [77]:
# Feature matrix X used for training
X = original_data[features]
print(X.shape)
X.head()
(800, 8)
Out[77]:
|  | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation |
---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 |
In [79]:
# Target vector y
y = original_data[target]
print(y.shape)
y.head()
(800,)
Out[79]:
0    False
1    False
2    False
3    False
4    False
Name: Legendary, dtype: bool
In [82]:
# Split into training (80%) and test (20%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
(640, 8) (640,)
(160, 8) (160,)
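The split above shuffles the 800 rows and holds out 20% (160 rows) for testing. Only 65 of the 800 Pokémon are legendary, so a plain random split does not guarantee that the test set keeps the same share of legendary Pokémon. This post does not use it, but `train_test_split` also accepts `stratify=y`, which preserves the class ratio in both parts. The sketch below uses synthetic labels with the same class counts (735 False, 65 True) instead of the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 800 rows, 65 of them labeled True (legendary)
y = np.array([True] * 65 + [False] * 735)
X = np.arange(800).reshape(-1, 1)  # dummy single feature

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=15, stratify=y
)

print(X_train.shape, X_test.shape)  # (640, 1) (160, 1)
# With stratify=y the test set keeps the 65/800 ratio: 160 * 65 / 800 = 13
print(int(y_test.sum()))  # 13
```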
5. Training a Decision Tree Model
In [83]:
from sklearn.tree import DecisionTreeClassifier
print('슝=3')
슝=3
In [84]:
model = DecisionTreeClassifier(random_state=25)
model
Out[84]:
DecisionTreeClassifier(random_state=25)
In [85]:
# Train the model, then predict on the test set
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
In [86]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)
Out[86]:
array([[144,   3],
       [  5,   8]])
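- TN (True Negative): a correctly judged Negative, i.e. an ordinary Pokémon correctly classified as ordinary.
- FP (False Positive): an incorrectly judged Positive, i.e. an ordinary Pokémon wrongly classified as legendary.
- FN (False Negative): an incorrectly judged Negative, i.e. a legendary Pokémon wrongly classified as ordinary.
- TP (True Positive): a correctly judged Positive, i.e. a legendary Pokémon correctly classified as legendary.

scikit-learn's confusion_matrix puts the true labels in rows and the predicted labels in columns, ordered False then True. The output above therefore reads [[TN, FP], [FN, TP]], which gives TN=144, FP=3, FN=5 and TP=8.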
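To get the four counts as separate variables, you can flatten the 2x2 matrix row by row with `.ravel()`. This yields TN, FP, FN and TP in that order. The sketch below is minimal and runs on synthetic labels built to reproduce the counts 144, 3, 5 and 8. They are not the real predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Synthetic labels that reproduce the baseline counts: 147 ordinary, 13 legendary
y_test = np.array([False] * 147 + [True] * 13)
y_pred = np.array(
    [False] * 144 + [True] * 3    # ordinary: 144 correct, 3 called legendary
    + [False] * 5 + [True] * 8    # legendary: 5 missed, 8 caught
)

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()  # row by row: [[TN, FP], [FN, TP]]
print(cm)
print(tn, fp, fn, tp)  # 144 3 5 8
```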
In [87]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       False       0.97      0.98      0.97       147
        True       0.73      0.62      0.67        13

    accuracy                           0.95       160
   macro avg       0.85      0.80      0.82       160
weighted avg       0.95      0.95      0.95       160
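- Recall is TP / (FN + TP). A recall of only 0.62 for the True class means FN is large relative to TP: 5 of the 13 legendary Pokémon in the test set were classified as ordinary.
- In imbalanced data like this, positives (legendary Pokémon) are rare. Overall accuracy can therefore look high even when many positives are missed, so what matters is catching the positives.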
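scikit-learn's `DummyClassifier(strategy="most_frequent")` shows why accuracy is misleading here. It ignores the features and always predicts the most common class. The sketch below uses synthetic labels with the same class counts as this split: the test set has 147 ordinary and 13 legendary Pokémon, which leaves 588 and 52 in the training set. The features are dummy zeros because this baseline never looks at them:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with the same class counts as the real split:
# train 588 ordinary / 52 legendary, test 147 ordinary / 13 legendary
y_train = np.array([False] * 588 + [True] * 52)
y_test = np.array([False] * 147 + [True] * 13)

# The baseline never looks at the features, so zeros are enough
X_train = np.zeros((640, 1))
X_test = np.zeros((160, 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
y_pred = baseline.predict(X_test)  # every prediction is False

print(accuracy_score(y_test, y_pred))  # 0.91875
print(recall_score(y_test, y_pred))    # 0.0
```

This baseline reaches about 0.92 accuracy without finding a single legendary Pokémon. The decision tree's 0.95 accuracy is only a modest step above that, which is why the report's recall for True deserves the attention.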
6. Training on the Preprocessed Data
In [88]:
print(len(pokemon.columns))
print(pokemon.columns)
45
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary', 'name_count', 'long_name', 'Name_nospace', 'name_isalpha', 'Forme', 'Mega', 'Mewtwo', 'Kyurem', 'Deoxys', 'Hoopa', 'Latias', 'Latios', 'Kyogre', 'Groudon', 'Steel', 'Ghost', 'Normal', 'Bug', 'Water', 'Fire', 'Grass', 'Dragon', 'Poison', 'Fairy', 'Fighting', 'Electric', 'Psychic', 'Rock', 'Ground', 'Dark', 'Ice', 'Flying'], dtype='object')
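Columns to exclude
"#": an ID column. Apart from serving as an index it carries no meaningful information, so it is excluded.
"Name": string data. Preprocessing replaced it with "name_count", "long_name" and 10 token columns (Forme, Mega, Mewtwo, Kyurem, Deoxys, Hoopa, Latias, Latios, Kyogre, Groudon).
"Name_nospace", "name_isalpha": helper columns needed only during preprocessing. The classifier does not need them.
"Type 1" & "Type 2": the types have already been one-hot encoded into 18 separate type columns.
"Legendary": this is the target, so it is not put into the training data "X". It is used as the "y" data instead.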
In [89]:
features = ['Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation',
'name_count', 'long_name', 'Forme', 'Mega', 'Mewtwo', 'Kyurem', 'Deoxys', 'Hoopa',
'Latias', 'Latios', 'Kyogre', 'Groudon', 'Poison', 'Water', 'Steel', 'Grass',
'Bug', 'Normal', 'Fire', 'Fighting', 'Electric', 'Psychic', 'Ghost', 'Ice',
'Rock', 'Dark', 'Flying', 'Ground', 'Dragon', 'Fairy']
len(features)
Out[89]:
38
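You can cross-check the count of 38 against the 45 columns printed earlier. Removing the 7 excluded columns (#, Name, Name_nospace, name_isalpha, Type 1, Type 2, Legendary) should leave exactly the hand-written feature set. The sketch below derives the list by filtering instead of typing it. The column names are copied from the `pokemon.columns` output. The resulting order follows that output, so the type columns come out in a different order from the hand-written list:

```python
# Column names copied from the pokemon.columns output above
columns = ['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
           'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary', 'name_count',
           'long_name', 'Name_nospace', 'name_isalpha', 'Forme', 'Mega', 'Mewtwo',
           'Kyurem', 'Deoxys', 'Hoopa', 'Latias', 'Latios', 'Kyogre', 'Groudon',
           'Steel', 'Ghost', 'Normal', 'Bug', 'Water', 'Fire', 'Grass', 'Dragon',
           'Poison', 'Fairy', 'Fighting', 'Electric', 'Psychic', 'Rock', 'Ground',
           'Dark', 'Ice', 'Flying']

# ID, raw string columns, preprocessing helpers, and the target
excluded = {'#', 'Name', 'Name_nospace', 'name_isalpha', 'Type 1', 'Type 2', 'Legendary'}

# Keep every other column, preserving the DataFrame's column order
features = [c for c in columns if c not in excluded]
print(len(columns), len(features))  # 45 38
```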
In [90]:
target = "Legendary"
target
Out[90]:
'Legendary'
In [91]:
X = pokemon[features]
print(X.shape)
X.head()
(800, 38)
Out[91]:
| | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | name_count | long_name | ... | Electric | Psychic | Ghost | Ice | Rock | Dark | Flying | Ground | Dragon | Fairy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | 9 | False | ... | False | False | False | False | False | False | False | False | False | False |
1 | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | 7 | False | ... | False | False | False | False | False | False | False | False | False | False |
2 | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | 8 | False | ... | False | False | False | False | False | False | False | False | False | False |
3 | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | 21 | True | ... | False | False | False | False | False | False | False | False | False | False |
4 | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | 10 | True | ... | False | False | False | False | False | False | False | False | False | False |
5 rows × 38 columns
In [92]:
y = pokemon[target]
print(y.shape)
y.head()
(800,)
Out[92]:
0    False
1    False
2    False
3    False
4    False
Name: Legendary, dtype: bool
In [93]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
(640, 38) (640,)
(160, 38) (160,)
In [94]:
model = DecisionTreeClassifier(random_state=25)
model
Out[94]:
DecisionTreeClassifier(random_state=25)
In [95]:
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print('슝=3')
슝=3
In [96]:
confusion_matrix(y_test, y_pred)
Out[96]:
array([[141,   6],
       [  1,  12]])
In [97]:
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       False       0.99      0.96      0.98       147
        True       0.67      0.92      0.77        13

    accuracy                           0.96       160
   macro avg       0.83      0.94      0.87       160
weighted avg       0.97      0.96      0.96       160
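Putting the baseline and preprocessed confusion matrices side by side shows where the improvement comes from. The short sketch below recomputes precision and recall for the legendary class directly from the two reported matrices, [[144, 3], [5, 8]] and [[141, 6], [1, 12]]. The helper name `legendary_metrics` is illustrative and is not part of any library:

```python
def legendary_metrics(cm):
    """Precision and recall for the positive (legendary) class of a 2x2
    confusion matrix laid out as [[TN, FP], [FN, TP]], rounded to 2 places."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return round(precision, 2), round(recall, 2)

baseline = [[144, 3], [5, 8]]        # 8 stat features
preprocessed = [[141, 6], [1, 12]]   # 38 features after preprocessing

print(legendary_metrics(baseline))      # (0.73, 0.62)
print(legendary_metrics(preprocessed))  # (0.67, 0.92)
```

Recall for legendary Pokémon rises from 0.62 to 0.92 because FN drops from 5 to 1. Precision falls slightly, from 0.73 to 0.67, because FP rises from 3 to 6.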