Kimuksung
Hello. I'm a junior Data Engineer who likes to think things through and has a strong interest in distributed processing.

Installing AWS Airflow (MWAA)

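Hello.

Today I'll walk through installing MWAA (Amazon Managed Workflows for Apache Airflow), the managed Airflow service on AWS, as part of building out AWS infrastructure.

To keep things simple, I set it up with the AWS CLI.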

Install MWAA

1. AWS CLI

1-1) Install the AWS CLI
# Download the macOS installer package
$ curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"
# Rosetta 2 is only needed on Apple silicon Macs
$ sudo softwareupdate --install-rosetta
$ sudo installer -pkg AWSCLIV2.pkg -target /

# Check that it works
$ which aws
$ aws --version
> aws-cli/2.8.5 Python/3.9.11 Darwin/21.6.0 exe/x86_64 prompt/off
1-2) Configure credentials
  • From the security credentials of your AWS user account, get the access key pair ( access_key_id , access_key ) and enter it as shown below.
  • Not familiar with AWS credentials?

  • Reference document
  • Configure the CLI as follows
      $ aws configure
      AWS Access Key ID [None]: id
      AWS Secret Access Key [None]: key
      Default region name [None]: ap-northeast-2
      Default output format [None]: json
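
Once `aws configure` has been filled in, it can help to confirm that the key pair is actually accepted before creating any resources. The following is a minimal sketch, not part of the original setup; the function name `show_cli_identity` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper (not from the original post): prints the IAM identity
# and the default region that the AWS CLI will use for later commands.
show_cli_identity() {
  # Fails if the access key pair is missing or rejected
  aws sts get-caller-identity --query 'Arn' --output text || return 1
  # Prints the region saved by `aws configure` (ap-northeast-2 in this post)
  aws configure get region
}

# Usage:
# show_cli_identity
```

If the first command returns an error, the access key ID or secret access key was most likely entered incorrectly.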
    
2. MWAA

2-1) Download or save mwaa-environment-public-network.yml - link
  • Environment settings
    • stack-name = the value passed on the CLI
    • EnvironmentName = the name used as a prefix for the created resources
    • Network settings
      • VPC => VpcCIDR
      • By default, two public subnets and two private subnets are created.
    • Worker node settings
      • MaxWorkerNodes
      • Default = 2

  • Resource
    • VPC, InternetGateway
      • Creates the VPC and InternetGateway according to the parameter values.
    • NatGateway
      • Added to the public subnets
      • Gives the private subnets outbound access to the internet
    • RouteTable
      • 1 public
      • 2 private
    • SecurityGroup
      • Creates the VPC security group
    • EnvironmentBucket
      • Creates the S3 bucket

  • MWAA settings
    • SecurityGroup
      • Attached to the VPC resource
    • IAM Role
    • MwaaExecutionPolicy
  • code

      AWSTemplateFormatVersion: "2010-09-09"
        
      Parameters:
        
        EnvironmentName:
          Description: An environment name that is prefixed to resource names
          Type: String
          Default: MWAAEnvironment
      # Network Setting
        VpcCIDR:
          Description: The IP range (CIDR notation) for this VPC
          Type: String
          Default: 10.192.0.0/16
        
        PublicSubnet1CIDR:
          Description: The IP range (CIDR notation) for the public subnet in the first Availability Zone
          Type: String
          Default: 10.192.10.0/24
        
        PublicSubnet2CIDR:
          Description: The IP range (CIDR notation) for the public subnet in the second Availability Zone
          Type: String
          Default: 10.192.11.0/24
        
        PrivateSubnet1CIDR:
          Description: The IP range (CIDR notation) for the private subnet in the first Availability Zone
          Type: String
          Default: 10.192.20.0/24
        PrivateSubnet2CIDR:
          Description: The IP range (CIDR notation) for the private subnet in the second Availability Zone
          Type: String
          Default: 10.192.21.0/24
        # Maximum number of workers
        MaxWorkerNodes:
          Description: The maximum number of workers that can run in the environment
          Type: Number
          Default: 2
        # Logging configuration
        DagProcessingLogs:
          Description: Log level for DagProcessing
          Type: String
          Default: INFO
        SchedulerLogsLevel:
          Description: Log level for SchedulerLogs
          Type: String
          Default: INFO
        TaskLogsLevel:
          Description: Log level for TaskLogs
          Type: String
          Default: INFO
        WorkerLogsLevel:
          Description: Log level for WorkerLogs
          Type: String
          Default: INFO
        WebserverLogsLevel:
          Description: Log level for WebserverLogs
          Type: String
          Default: INFO
        
      Resources:
        #####################################################################################################################
        # CREATE VPC
        #####################################################################################################################
        
        VPC:
          Type: AWS::EC2::VPC
          Properties:
            CidrBlock: !Ref VpcCIDR
            EnableDnsSupport: true
            EnableDnsHostnames: true
            Tags:
              - Key: Name
                # Name displayed for the VPC
                Value: MWAAEnvironment
        
        InternetGateway:
          Type: AWS::EC2::InternetGateway
          Properties:
            Tags:
              - Key: Name
                # Name displayed for the internet gateway
                Value: MWAAEnvironment
        
        InternetGatewayAttachment:
          Type: AWS::EC2::VPCGatewayAttachment
          Properties:
            InternetGatewayId: !Ref InternetGateway
            VpcId: !Ref VPC
        
        PublicSubnet1:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref VPC
            AvailabilityZone: !Select [ 0, !GetAZs '' ]
            CidrBlock: !Ref PublicSubnet1CIDR
            MapPublicIpOnLaunch: true
            Tags:
              - Key: Name
                # Subnet name
                Value: !Sub ${EnvironmentName} Public Subnet (AZ1)
        
        PublicSubnet2:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref VPC
            AvailabilityZone: !Select [ 1, !GetAZs  '' ]
            CidrBlock: !Ref PublicSubnet2CIDR
            MapPublicIpOnLaunch: true
            Tags:
              - Key: Name
                # Subnet name
                Value: !Sub ${EnvironmentName} Public Subnet (AZ2)
        
        PrivateSubnet1:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref VPC
            AvailabilityZone: !Select [ 0, !GetAZs  '' ]
            CidrBlock: !Ref PrivateSubnet1CIDR
            MapPublicIpOnLaunch: false
            Tags:
              - Key: Name
                # Subnet name
                Value: !Sub ${EnvironmentName} Private Subnet (AZ1)
        
        PrivateSubnet2:
          Type: AWS::EC2::Subnet
          Properties:
            VpcId: !Ref VPC
            AvailabilityZone: !Select [ 1, !GetAZs  '' ]
            CidrBlock: !Ref PrivateSubnet2CIDR
            MapPublicIpOnLaunch: false
            Tags:
              - Key: Name
                # Subnet name
                Value: !Sub ${EnvironmentName} Private Subnet (AZ2)
        
        NatGateway1EIP:
          Type: AWS::EC2::EIP
          DependsOn: InternetGatewayAttachment
          Properties:
            Domain: vpc
        
        NatGateway2EIP:
          Type: AWS::EC2::EIP
          DependsOn: InternetGatewayAttachment
          Properties:
            Domain: vpc
        
        NatGateway1:
          Type: AWS::EC2::NatGateway
          Properties:
            AllocationId: !GetAtt NatGateway1EIP.AllocationId
            SubnetId: !Ref PublicSubnet1
        
        NatGateway2:
          Type: AWS::EC2::NatGateway
          Properties:
            AllocationId: !GetAtt NatGateway2EIP.AllocationId
            SubnetId: !Ref PublicSubnet2
        
        PublicRouteTable:
          Type: AWS::EC2::RouteTable
          Properties:
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                # Route Table Name
                Value: !Sub ${EnvironmentName} Public Routes
        
        DefaultPublicRoute:
          Type: AWS::EC2::Route
          DependsOn: InternetGatewayAttachment
          Properties:
            RouteTableId: !Ref PublicRouteTable
            DestinationCidrBlock: 0.0.0.0/0
            GatewayId: !Ref InternetGateway
        
        PublicSubnet1RouteTableAssociation:
          Type: AWS::EC2::SubnetRouteTableAssociation
          Properties:
            RouteTableId: !Ref PublicRouteTable
            SubnetId: !Ref PublicSubnet1
        
        PublicSubnet2RouteTableAssociation:
          Type: AWS::EC2::SubnetRouteTableAssociation
          Properties:
            RouteTableId: !Ref PublicRouteTable
            SubnetId: !Ref PublicSubnet2
        
        PrivateRouteTable1:
          Type: AWS::EC2::RouteTable
          Properties:
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                # Route Table Name
                Value: !Sub ${EnvironmentName} Private Routes (AZ1)
        
        DefaultPrivateRoute1:
          Type: AWS::EC2::Route
          Properties:
            RouteTableId: !Ref PrivateRouteTable1
            DestinationCidrBlock: 0.0.0.0/0
            NatGatewayId: !Ref NatGateway1
        
        PrivateSubnet1RouteTableAssociation:
          Type: AWS::EC2::SubnetRouteTableAssociation
          Properties:
            RouteTableId: !Ref PrivateRouteTable1
            SubnetId: !Ref PrivateSubnet1
        
        PrivateRouteTable2:
          Type: AWS::EC2::RouteTable
          Properties:
            VpcId: !Ref VPC
            Tags:
              - Key: Name
                # Route Table Name
                Value: !Sub ${EnvironmentName} Private Routes (AZ2)
        
        DefaultPrivateRoute2:
          Type: AWS::EC2::Route
          Properties:
            RouteTableId: !Ref PrivateRouteTable2
            DestinationCidrBlock: 0.0.0.0/0
            NatGatewayId: !Ref NatGateway2
        
        PrivateSubnet2RouteTableAssociation:
          Type: AWS::EC2::SubnetRouteTableAssociation
          Properties:
            RouteTableId: !Ref PrivateRouteTable2
            SubnetId: !Ref PrivateSubnet2
        
        # SecurityGroup and SecurityGroupIngress are defined in the CREATE MWAA section below;
        # resource keys must be unique within a template.
        
        EnvironmentBucket:
          Type: AWS::S3::Bucket
          Properties:
            VersioningConfiguration:
              Status: Enabled
            PublicAccessBlockConfiguration: 
              BlockPublicAcls: true
              BlockPublicPolicy: true
              IgnorePublicAcls: true
              RestrictPublicBuckets: true
        
        #####################################################################################################################
        # CREATE MWAA
        #####################################################################################################################
        
        MwaaEnvironment:
          Type: AWS::MWAA::Environment
          DependsOn: MwaaExecutionPolicy
          Properties:
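            # ${AWS::Region} = the region the stack is created in (here, the region set with aws configure)
            # ${AWS::AccountId} = the ID of the AWS account that creates the stack
            # ${AWS::StackName} = the --stack-name value passed on the command line
            # The name must start with a letter.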
            Name: !Sub "${AWS::StackName}-MwaaEnvironment"
            SourceBucketArn: !GetAtt EnvironmentBucket.Arn
            ExecutionRoleArn: !GetAtt MwaaExecutionRole.Arn
            DagS3Path: dags
            NetworkConfiguration:
              SecurityGroupIds:
                - !GetAtt SecurityGroup.GroupId
              SubnetIds:
                - !Ref PrivateSubnet1
                - !Ref PrivateSubnet2
            WebserverAccessMode: PUBLIC_ONLY
            MaxWorkers: !Ref MaxWorkerNodes
            LoggingConfiguration:
              DagProcessingLogs:
                LogLevel: !Ref DagProcessingLogs
                Enabled: true
              SchedulerLogs:
                LogLevel: !Ref SchedulerLogsLevel
                Enabled: true
              TaskLogs:
                LogLevel: !Ref TaskLogsLevel
                Enabled: true
              WorkerLogs:
                LogLevel: !Ref WorkerLogsLevel
                Enabled: true
              WebserverLogs:
                LogLevel: !Ref WebserverLogsLevel
                Enabled: true
        SecurityGroup:
          Type: AWS::EC2::SecurityGroup
          Properties:
            VpcId: !Ref VPC
            GroupDescription: !Sub "Security Group for Amazon MWAA Environment ${AWS::StackName}-MwaaEnvironment"
            GroupName: !Sub "airflow-security-group-${AWS::StackName}-MwaaEnvironment"
          
        SecurityGroupIngress:
          Type: AWS::EC2::SecurityGroupIngress
          Properties:
            GroupId: !Ref SecurityGroup
            IpProtocol: "-1"
            SourceSecurityGroupId: !Ref SecurityGroup
        
        SecurityGroupEgress:
          Type: AWS::EC2::SecurityGroupEgress
          Properties:
            GroupId: !Ref SecurityGroup
            IpProtocol: "-1"
            CidrIp: "0.0.0.0/0"
        
        MwaaExecutionRole:
          Type: AWS::IAM::Role
          Properties:
            AssumeRolePolicyDocument:
              Version: 2012-10-17
              Statement:
                - Effect: Allow
                  Principal:
                    Service:
                      - airflow-env.amazonaws.com
                      - airflow.amazonaws.com
                  Action:
                   - "sts:AssumeRole"
            Path: "/service-role/"
        
        MwaaExecutionPolicy:
          DependsOn: EnvironmentBucket
          Type: AWS::IAM::ManagedPolicy
          Properties:
            Roles:
              - !Ref MwaaExecutionRole
            PolicyDocument:
              Version: 2012-10-17
              Statement:
                - Effect: Allow
                  Action: airflow:PublishMetrics
                  Resource:
                    # Must match the Name set on MwaaEnvironment above
                    - !Sub "arn:aws:airflow:${AWS::Region}:${AWS::AccountId}:environment/${AWS::StackName}-MwaaEnvironment"
                - Effect: Deny
                  Action: s3:ListAllMyBuckets
                  Resource:
                    - !Sub "${EnvironmentBucket.Arn}"
                    - !Sub "${EnvironmentBucket.Arn}/*"
        
                - Effect: Allow
                  Action:
                    - "s3:GetObject*"
                    - "s3:GetBucket*"
                    - "s3:List*"
                  Resource:
                    - !Sub "${EnvironmentBucket.Arn}"
                    - !Sub "${EnvironmentBucket.Arn}/*"
                - Effect: Allow
                  Action:
                    - logs:DescribeLogGroups
                  Resource: "*"
        
                - Effect: Allow
                  Action:
                    - logs:CreateLogStream
                    - logs:CreateLogGroup
                    - logs:PutLogEvents
                    - logs:GetLogEvents
                    - logs:GetLogRecord
                    - logs:GetLogGroupFields
                    - logs:GetQueryResults
                    - logs:DescribeLogGroups
                  Resource:
                    - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:airflow-${AWS::StackName}*"
                - Effect: Allow
                  Action: cloudwatch:PutMetricData
                  Resource: "*"
                - Effect: Allow
                  Action:
                    - sqs:ChangeMessageVisibility
                    - sqs:DeleteMessage
                    - sqs:GetQueueAttributes
                    - sqs:GetQueueUrl
                    - sqs:ReceiveMessage
                    - sqs:SendMessage
                  Resource:
                    - !Sub "arn:aws:sqs:${AWS::Region}:*:airflow-celery-*"
                - Effect: Allow
                  Action:
                    - kms:Decrypt
                    - kms:DescribeKey
                    - "kms:GenerateDataKey*"
                    - kms:Encrypt
                  NotResource: !Sub "arn:aws:kms:*:${AWS::AccountId}:key/*"
                  Condition:
                    StringLike:
                      "kms:ViaService":
                        - !Sub "sqs.${AWS::Region}.amazonaws.com"
      Outputs:
        VPC:
          Description: A reference to the created VPC
          Value: !Ref VPC
        
        PublicSubnets:
          Description: A list of the public subnets
          Value: !Join [ ",", [ !Ref PublicSubnet1, !Ref PublicSubnet2 ]]
        
        PrivateSubnets:
          Description: A list of the private subnets
          Value: !Join [ ",", [ !Ref PrivateSubnet1, !Ref PrivateSubnet2 ]]
        
        PublicSubnet1:
          Description: A reference to the public subnet in the 1st Availability Zone
          Value: !Ref PublicSubnet1
        
        PublicSubnet2:
          Description: A reference to the public subnet in the 2nd Availability Zone
          Value: !Ref PublicSubnet2
        
        PrivateSubnet1:
          Description: A reference to the private subnet in the 1st Availability Zone
          Value: !Ref PrivateSubnet1
        
        PrivateSubnet2:
          Description: A reference to the private subnet in the 2nd Availability Zone
          Value: !Ref PrivateSubnet2
        
        SecurityGroupIngress:
          Description: Security group with self-referencing inbound rule
          Value: !Ref SecurityGroupIngress
        
        MwaaApacheAirflowUI:
          Description: MWAA Environment
          Value: !Sub  "https://${MwaaEnvironment.WebserverUrl}"
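
Before creating the stack, the template can be checked with `aws cloudformation validate-template`, which verifies the template syntax and reports the capabilities the template requires. The following is a minimal sketch; the helper name `template_capabilities` is a hypothetical name introduced here. Because this template creates an IAM role and a managed policy, the reported capabilities are the reason `--capabilities` is passed in step 2-2.

```shell
# Hypothetical helper (not from the original post): validates the template
# syntax and prints the capabilities CloudFormation reports for it.
# $1 = path to the template file
template_capabilities() {
  aws cloudformation validate-template \
    --template-body "file://$1" \
    --query 'Capabilities' --output text
}

# Usage:
# template_capabilities mwaa_public_network.yml
```

Note that this only checks the template syntax; invalid property values for individual resources still surface only when the stack is created.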
    
2-2) Create Stack with AWS CLI
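  • The file:// prefix is required.
  • Note that the downloaded file name is separated by hyphens (-), while the file name in the command below uses underscores (_); the path after file:// must match the actual file name.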
$ cd file # move to the directory that contains the yml file
$ aws cloudformation create-stack --stack-name mwaa-environment-public-network --template-body file://mwaa_public_network.yml --capabilities CAPABILITY_IAM
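
The template parameters described in 2-1 (for example MaxWorkerNodes) keep their defaults unless they are overridden with `--parameters` when the stack is created. The following is a minimal sketch under stated assumptions: the helper name `create_mwaa_stack` and the worker count 5 in the usage line are illustrative, not values from the original setup.

```shell
# Hypothetical helper (not from the original post): creates the stack while
# overriding the MaxWorkerNodes parameter of the template.
# $1 = stack name, $2 = template file path, $3 = maximum number of workers
create_mwaa_stack() {
  aws cloudformation create-stack \
    --stack-name "$1" \
    --template-body "file://$2" \
    --parameters "ParameterKey=MaxWorkerNodes,ParameterValue=$3" \
    --capabilities CAPABILITY_IAM
}

# Usage (commented out so that sourcing this file creates nothing):
# create_mwaa_stack mwaa-environment-public-network mwaa_public_network.yml 5
```

Any other parameter in the Parameters section, such as VpcCIDR, can be overridden the same way by adding another ParameterKey/ParameterValue pair.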
2-3) Result
  • As shown below, you can see that the environment has been installed.

https://ifh.cc/g/RF070R.png
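
The same result can also be checked from the CLI instead of the console. The following is a minimal sketch; the helper name `airflow_ui_url` is a hypothetical name introduced here. It waits for the stack to finish and then reads the MwaaApacheAirflowUI output defined at the end of the template. Creating an MWAA environment can take tens of minutes, so the wait may block for a while.

```shell
# Hypothetical helper (not from the original post): blocks until the stack has
# been created, then prints the Airflow UI URL from the stack outputs.
# $1 = stack name
airflow_ui_url() {
  # Returns non-zero if stack creation fails or the waiter times out
  aws cloudformation wait stack-create-complete --stack-name "$1" || return 1
  # Select the OutputValue whose OutputKey is MwaaApacheAirflowUI
  aws cloudformation describe-stacks --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='MwaaApacheAirflowUI'].OutputValue" \
    --output text
}

# Usage:
# airflow_ui_url mwaa-environment-public-network
```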