ABOUT ME

-

Today
-
Yesterday
-
Total
-
  • [pandas] group으로 묶기
    Python 2020. 9. 24. 12:53

    Score

    Group으로 묶어서 특정 과목 점수가 높은 학생 출력하기

    import pandas as pd
    
    # DataFrame 생성
    score = pd.DataFrame({
        "Name": ["Jueun", "Jieun", "Terry", "Mits"],
        "Gender": ["girl", "girl", "boy", "boy"],
        "Class": [1, 2, 2, 1],
        "Total": [0, 0, 0, 0],
        "Eng": [50, 90, 40, 70],
        "kor": [30, 70, 90, 100]
    })
    
    # Name을 score의 인덱스로 설정해줌
    score = score.set_index(["Name"])
    
    # Total column을 채워줌(kor + Eng = Total)
    for index in range(4):
      print("====={}=====".format(index))
      score.iloc[index, 2] = int(score.iloc[index, 3:].sum())
      print(score.iloc[index])
    
    # Group 짓기(Class를 기준으로 묶기)
    grouped = score.groupby("Class")
    
    # 각 Class에서 Eng 과목 점수가 제일 높은 사람 출력하기
    for group_name, group_data in grouped:
      eng_king = score[(score["Class"]==group_name) & (score["Eng"]==group_data["Eng"].max())]
      print(eng_king)
      print("=" * 100)
     

     

    #출력 결과:
         Gender  Class  Total  Eng  Kor
    Name                               
    Mits    boy      1  170.0   70  100
    ==================================================================================
          Gender  Class  Total  Eng  Kor
    Name                                
    Jieun   girl      2  160.0   90   70
    ==================================================================================

     

    Titanic

    생존자의 비율을 Class별로 알아보기

    import pandas as pd
    
    # 시각화 패키지
    import seaborn as sns 
    
    titanic = sns.load_dataset('titanic')
    
    # 모든 row에 대한 column 선택적으로 보기
    df = titanic.loc[:, ['age', 'sex', 'class', 'fare', 'survived']]
    
    print('승객 수:', len(df)) 
    # 승객 수: 891
    
    # 그룹짓기
    grouped = df.groupby(['class'])
    
    # 출력
    for key, group in grouped:
      print('* key :', key)
      print('*number :', len(group))
      print(group.head())
      print('\n')

    3개의 클래스로 구분되기 때문에 3개의 그룹이 나온다.

    * key : First
    *number : 216
         age     sex  class     fare  survived
    1   38.0  female  First  71.2833         1
    3   35.0  female  First  53.1000         1
    6   54.0    male  First  51.8625         0
    11  58.0  female  First  26.5500         1
    23  28.0    male  First  35.5000         1
    
    
    * key : Second
    *number : 184
         age     sex   class     fare  survived
    9   14.0  female  Second  30.0708         1
    15  55.0  female  Second  16.0000         1
    17   NaN    male  Second  13.0000         1
    20  35.0    male  Second  26.0000         0
    21  34.0    male  Second  13.0000         1
    
    
    * key : Third
    *number : 491
        age     sex  class     fare  survived
    0  22.0    male  Third   7.2500         0
    2  26.0  female  Third   7.9250         1
    4  35.0    male  Third   8.0500         0
    5   NaN    male  Third   8.4583         0
    7   2.0    male  Third  21.0750         0

    각 그룹별 생존 비율

     

    # 평균 출력
    
    average = grouped.mean()
    print(average)

    출력결과

                  age       fare  survived
    class                                 
    First   38.233441  84.154687  0.629630
    Second  29.877630  20.662183  0.472826
    Third   25.140620  13.675550  0.242363

     


    생존자의 비율을 Class, Gender로 알아보기

    # 탑승 클래스와 성별을 기준으로 그룹 나누기 
    grouped_two = df.groupby(['class', 'sex'])
    
    # 출력
    for key, group in grouped_two:
      print('* key :', key)
      print('* number :', len(group))
      print(group.head())
      print('\n')

    3개의 클래스 * 2개의 성별로 구분되기 때문에 6개의 그룹이 나온다.

    * key : ('First', 'female')
    * number : 94
         age     sex  class      fare  survived
    1   38.0  female  First   71.2833         1
    3   35.0  female  First   53.1000         1
    11  58.0  female  First   26.5500         1
    31   NaN  female  First  146.5208         1
    52  49.0  female  First   76.7292         1
    
    
    * key : ('First', 'male')
    * number : 122
         age   sex  class      fare  survived
    6   54.0  male  First   51.8625         0
    23  28.0  male  First   35.5000         1
    27  19.0  male  First  263.0000         0
    30  40.0  male  First   27.7208         0
    34  28.0  male  First   82.1708         0
    
    
    * key : ('Second', 'female')
    * number : 76
         age     sex   class     fare  survived
    9   14.0  female  Second  30.0708         1
    15  55.0  female  Second  16.0000         1
    41  27.0  female  Second  21.0000         0
    43   3.0  female  Second  41.5792         1
    53  29.0  female  Second  26.0000         1
    
    
    * key : ('Second', 'male')
    * number : 108
         age   sex   class  fare  survived
    17   NaN  male  Second  13.0         1
    20  35.0  male  Second  26.0         0
    21  34.0  male  Second  13.0         1
    33  66.0  male  Second  10.5         0
    70  32.0  male  Second  10.5         0
    
    
    * key : ('Third', 'female')
    * number : 144
         age     sex  class     fare  survived
    2   26.0  female  Third   7.9250         1
    8   27.0  female  Third  11.1333         1
    10   4.0  female  Third  16.7000         1
    14  14.0  female  Third   7.8542         0
    18  31.0  female  Third  18.0000         0
    
    
    * key : ('Third', 'male')
    * number : 347
         age   sex  class     fare  survived
    0   22.0  male  Third   7.2500         0
    4   35.0  male  Third   8.0500         0
    5    NaN  male  Third   8.4583         0
    7    2.0  male  Third  21.0750         0
    12  20.0  male  Third   8.0500         0

    각 그룹별 생존 비율

    # 각 그룹별로 생존 비율 알아보기
    average_two = grouped_two.mean()
    print(average_two)
    print('\n')
    print(type(average_two))

    출력결과

                         age        fare  survived
    class  sex                                    
    First  female  34.611765  106.125798  0.968085
           male    41.281386   67.226127  0.368852
    Second female  28.722973   21.970121  0.921053
           male    30.740707   19.741782  0.157407
    Third  female  21.750000   16.118810  0.500000
           male    26.507589   12.661633  0.135447
    
    
    <class 'pandas.core.frame.DataFrame'>

    삼등칸 여성승객의 생존 비율

    # 6개의 그룹으로 나뉘어진 grouped_two에서 삼등칸 여성승객의 정보만 골라서 가져온다.
    # group3f 변수에 저장
    group3f = grouped_two.get_group(('Third', 'female'))
    print(group3.head())
    
    # 출력
       age     sex  class     fare  survived
    0  22.0    male  Third   7.2500         0
    2  26.0  female  Third   7.9250         1
    4  35.0    male  Third   8.0500         0
    5   NaN    male  Third   8.4583         0
    7   2.0    male  Third  21.0750         0

    탑승 Class 가격 비교

    # class의 정보중에 fare만 조회
    df.groupby(["class"]).std()
    class
    First     78.380373
    Second    13.417399
    Third     11.778142
    Name: fare, dtype: float64

     

    댓글

Designed by Tistory + Edited by Juepark