Lab 9: EMR auto scaling

This hands-on lab uses the us-east-2 Region as an example; you can choose any other Region you need.

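Create an EC2 key pair

Create an EC2 key pair. Later steps use this key pair to log in to EC2. The steps are as follows:

Click Key Pairs to open the key pair page, then click the Create key pair button at the top right to open the Create key pair page:

On the Create key pair page, enter bigdata-workshop-key as the name, then click the Create key pair button at the bottom right:

Then click to confirm, and download the newly created key pair: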

Change the permissions of the downloaded key pair file to 400. On a Mac:

chmod 400 bigdata-workshop-key.pem

On Windows, see:

https://superuser.com/questions/1296024/windows-ssh-permissions-for-private-key-are-too-open
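The permission change can also be scripted. The following is a minimal sketch, assuming a POSIX system (Mac or Linux); the helper name restrict_key_permissions is hypothetical, and the demo runs on a throwaway temporary file rather than a real key:

```python
import os
import stat
import tempfile


def restrict_key_permissions(pem_path):
    """Set the file to mode 400 (owner read-only) and return the resulting mode bits."""
    os.chmod(pem_path, stat.S_IRUSR)  # stat.S_IRUSR == 0o400, same effect as `chmod 400`
    return stat.S_IMODE(os.stat(pem_path).st_mode)


# Demonstrate on a temporary file that stands in for the downloaded key.
fd, demo_path = tempfile.mkstemp(suffix=".pem")
os.close(fd)
mode = restrict_key_permissions(demo_path)
print(oct(mode))
os.remove(demo_path)
```

For the real key, the call would be restrict_key_permissions("bigdata-workshop-key.pem"), assuming the file is in the current directory.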

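Create the EMR cluster

Open the console and go to the Amazon EMR home page: https://console.aws.amazon.com/elasticmapreduce/home?region=us-east-2

Click Create cluster. On the next page, click Go to advanced options at the top left and use the advanced options to create the EMR demo cluster.

On the Step 1: Software and Steps page, keep the defaults (note that the EMR release used here is 5.30.1; the default release changes as new EMR versions ship). Click Next to go to Step 2.

In Step 2, adjust the following parameters: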

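  • Under Cluster Nodes and Instances, make these changes:

    • Change the instance type to c5.xlarge
    • Change the instance counts to 1/1/0 (master/core/task)
  • Under Cluster Scaling, make these changes:

    • Check Enable Cluster Scaling
    • Select Use EMR-managed scaling
    • Under Core and task units, set the values to 1/20/4/4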

Think about it: what scaling policy does 1/20/4/4 describe?
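One way to reason about the question is to write the four numbers out as a managed scaling policy. The sketch below assumes the values follow the console's field order (minimum, maximum, On-Demand limit, maximum core nodes); that ordering is an assumption, not something this lab states. The dictionary keys follow the ManagedScalingPolicy structure accepted by the EMR API, and the function name is hypothetical:

```python
def managed_scaling_policy(minimum, maximum, on_demand_limit, max_core):
    """Build a dict shaped like the EMR ManagedScalingPolicy structure."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    if on_demand_limit > maximum or max_core > maximum:
        raise ValueError("limits must not exceed maximum")
    return {
        "ComputeLimits": {
            "UnitType": "Instances",  # capacity counted in instances
            "MinimumCapacityUnits": minimum,
            "MaximumCapacityUnits": maximum,
            "MaximumOnDemandCapacityUnits": on_demand_limit,
            "MaximumCoreCapacityUnits": max_core,
        }
    }


# Assumed reading of "1/20/4/4": min 1, max 20, On-Demand limit 4, max core nodes 4.
policy = managed_scaling_policy(*(int(part) for part in "1/20/4/4".split("/")))
print(policy["ComputeLimits"])
```

Under this reading, the cluster may grow from 1 to 20 instances, at most 4 of them On-Demand and at most 4 of them core nodes. A dictionary of this shape could be passed as the ManagedScalingPolicy argument of boto3's put_managed_scaling_policy, which is not called here.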

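When the configuration is done, click Next to go to Step 3. Change the cluster name to emr-scaling.

Click Next to go to Step 4.

Set the EC2 key pair to bigdata-workshop-key, then click the Create cluster button.

Wait for the cluster to be created; this takes about 10 to 15 minutes.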

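TPC-DS stress test

Once the cluster is created, its status changes to Waiting.

To log in remotely to the master node's EC2 instance, you need to modify the master's security group rules to open port 22, as follows:

Go to the cluster's Summary page and, near the bottom, click the master node's security group. Edit the inbound rules of the master security group:

Add SSH/22 to the inbound rules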

Note: setting the source to 0.0.0.0/0 here is for workshop demonstration only. It is not a best practice; do not use it in production!
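Outside a workshop, a narrower source is usually preferable. The sketch below builds an inbound rule that allows SSH from a single IPv4 address only. The function name and the example address (taken from the 203.0.113.0/24 documentation range) are hypothetical; the dictionary follows the IpPermissions shape used by the EC2 API:

```python
import ipaddress


def ssh_ingress_rule(source_ip):
    """Return one IpPermissions entry allowing TCP/22 from a single IPv4 address."""
    address = ipaddress.IPv4Address(source_ip)  # raises ValueError on malformed input
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [
            {"CidrIp": f"{address}/32", "Description": "SSH from one workstation"}
        ],
    }


# Placeholder address; replace it with your own public IP.
rule = ssh_ingress_rule("203.0.113.10")
print(rule["IpRanges"][0]["CidrIp"])
```

A rule of this shape could be passed in the IpPermissions list of boto3's authorize_security_group_ingress, together with the master security group's ID; that call is not made here.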

Next, go back to the cluster's Summary page and click Connect to the Master Node Using SSH

Log in to the EC2 instance from the command line with the ssh command
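The console shows the exact command for your cluster. As a rough sketch of its shape, the snippet below assembles an ssh command for the hadoop user, which is the login user on EMR nodes. The host name is a made-up placeholder and the function name is hypothetical:

```python
import shlex


def ssh_command(key_path, master_dns, user="hadoop"):
    """Return the argument list for an SSH login to the EMR master node."""
    return ["ssh", "-i", key_path, f"{user}@{master_dns}"]


# Placeholder DNS name; copy the real master public DNS from the Summary page.
command = ssh_command(
    "bigdata-workshop-key.pem",
    "ec2-198-51-100-1.us-east-2.compute.amazonaws.com",
)
print(shlex.join(command))
```

Keeping the command as a list makes it safe to hand to subprocess.run without shell quoting problems.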

Run the hive command to enter the Hive interactive shell:

hive

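Run the following commands to create the database and tables

Note: before doing this, contact the instructor and provide your Canonical ID to obtain the required S3 permissions.

The instructor will use the Canonical ID to grant your account access to the S3 bucket, which holds the TPC-DS 100 GB data set used for the stress test. In the console, click your user name at the top right, click My Security Credentials, copy the account's canonical user ID, and give it to the instructor.

After you have the permissions, run the table creation statements in Hive. The statements below succeed only once the permissions have been granted.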

create database tpcds_b;

use tpcds_b;



create external table dbgen_version
(
    dv_version                varchar(16)                   ,
    dv_create_date            date                          ,
    dv_create_time           TIMESTAMP                         ,
    dv_cmdline_args           varchar(200)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/dbgen_version/'
;

create external table customer_address
(
    ca_address_sk             int               ,
    ca_address_id             char(16)              ,
    ca_street_number          char(10)                      ,
    ca_street_name            varchar(60)                   ,
    ca_street_type            char(15)                      ,
    ca_suite_number           char(10)                      ,
    ca_city                   varchar(60)                   ,
    ca_county                 varchar(30)                   ,
    ca_state                  char(2)                       ,
    ca_zip                    char(10)                      ,
    ca_country                varchar(20)                   ,
    ca_gmt_offset             decimal(5,2)                  ,
    ca_location_type          char(20)                      
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer_address/'
;

create external table customer_demographics
(
    cd_demo_sk                int               ,
    cd_gender                 char(1)                       ,
    cd_marital_status         char(1)                       ,
    cd_education_status       char(20)                      ,
    cd_purchase_estimate      int                       ,
    cd_credit_rating          char(10)                      ,
    cd_dep_count              int                       ,
    cd_dep_employed_count     int                       ,
    cd_dep_college_count      int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer_demographics/'
;

create external table date_dim
(
    d_date_sk                 int               ,
    d_date_id                 char(16)              ,
    d_date                    date                          ,
    d_month_seq               int                       ,
    d_week_seq                int                       ,
    d_quarter_seq             int                       ,
    d_year                    int                       ,
    d_dow                     int                       ,
    d_moy                     int                       ,
    d_dom                     int                       ,
    d_qoy                     int                       ,
    d_fy_year                 int                       ,
    d_fy_quarter_seq          int                       ,
    d_fy_week_seq             int                       ,
    d_day_name                char(9)                       ,
    d_quarter_name            char(6)                       ,
    d_holiday                 char(1)                       ,
    d_weekend                 char(1)                       ,
    d_following_holiday       char(1)                       ,
    d_first_dom               int                       ,
    d_last_dom                int                       ,
    d_same_day_ly             int                       ,
    d_same_day_lq             int                       ,
    d_current_day             char(1)                       ,
    d_current_week            char(1)                       ,
    d_current_month           char(1)                       ,
    d_current_quarter         char(1)                       ,
    d_current_year            char(1)                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/date_dim/'
;

create external table warehouse
(
    w_warehouse_sk            int               ,
    w_warehouse_id            char(16)              ,
    w_warehouse_name          varchar(20)                   ,
    w_warehouse_sq_ft         int                       ,
    w_street_number           char(10)                      ,
    w_street_name             varchar(60)                   ,
    w_street_type             char(15)                      ,
    w_suite_number            char(10)                      ,
    w_city                    varchar(60)                   ,
    w_county                  varchar(30)                   ,
    w_state                   char(2)                       ,
    w_zip                     char(10)                      ,
    w_country                 varchar(20)                   ,
    w_gmt_offset              decimal(5,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/warehouse/'
;

create external table ship_mode
(
    sm_ship_mode_sk           int               ,
    sm_ship_mode_id           char(16)              ,
    sm_type                   char(30)                      ,
    sm_code                   char(10)                      ,
    sm_carrier                char(20)                      ,
    sm_contract               char(20)                      
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/ship_mode/'
;

create external table time_dim
(
    t_time_sk                 int               ,
    t_time_id                 char(16)              ,
    t_time                    int                       ,
    t_hour                    int                       ,
    t_minute                  int                       ,
    t_second                  int                       ,
    t_am_pm                   char(2)                       ,
    t_shift                   char(20)                      ,
    t_sub_shift               char(20)                      ,
    t_meal_time               char(20)                      
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/time_dim/'
;

create external table reason
(
    r_reason_sk               int               ,
    r_reason_id               char(16)              ,
    r_reason_desc             char(100)                     
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/reason/'
;

create external table income_band
(
    ib_income_band_sk         int               ,
    ib_lower_bound            int                       ,
    ib_upper_bound            int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/income_band/'
;

create external table item
(
    i_item_sk                 int               ,
    i_item_id                 char(16)              ,
    i_rec_start_date          date                          ,
    i_rec_end_date            date                          ,
    i_item_desc               varchar(200)                  ,
    i_current_price           decimal(7,2)                  ,
    i_wholesale_cost          decimal(7,2)                  ,
    i_brand_id                int                       ,
    i_brand                   char(50)                      ,
    i_class_id                int                       ,
    i_class                   char(50)                      ,
    i_category_id             int                       ,
    i_category                char(50)                      ,
    i_manufact_id             int                       ,
    i_manufact                char(50)                      ,
    i_size                    char(20)                      ,
    i_formulation             char(20)                      ,
    i_color                   char(20)                      ,
    i_units                   char(10)                      ,
    i_container               char(10)                      ,
    i_manager_id              int                       ,
    i_product_name            char(50)                      
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/item/'
;

create external table store
(
    s_store_sk                int               ,
    s_store_id                char(16)              ,
    s_rec_start_date          date                          ,
    s_rec_end_date            date                          ,
    s_closed_date_sk          int                       ,
    s_store_name              varchar(50)                   ,
    s_number_employees        int                       ,
    s_floor_space             int                       ,
    s_hours                   char(20)                      ,
    s_manager                 varchar(40)                   ,
    s_market_id               int                       ,
    s_geography_class         varchar(100)                  ,
    s_market_desc             varchar(100)                  ,
    s_market_manager          varchar(40)                   ,
    s_division_id             int                       ,
    s_division_name           varchar(50)                   ,
    s_company_id              int                       ,
    s_company_name            varchar(50)                   ,
    s_street_number           varchar(10)                   ,
    s_street_name             varchar(60)                   ,
    s_street_type             char(15)                      ,
    s_suite_number            char(10)                      ,
    s_city                    varchar(60)                   ,
    s_county                  varchar(30)                   ,
    s_state                   char(2)                       ,
    s_zip                     char(10)                      ,
    s_country                 varchar(20)                   ,
    s_gmt_offset              decimal(5,2)                  ,
    s_tax_precentage          decimal(5,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store/'
;

create external table call_center
(
    cc_call_center_sk         int               ,
    cc_call_center_id         char(16)              ,
    cc_rec_start_date         date                          ,
    cc_rec_end_date           date                          ,
    cc_closed_date_sk         int                       ,
    cc_open_date_sk           int                       ,
    cc_name                   varchar(50)                   ,
    cc_class                  varchar(50)                   ,
    cc_employees              int                       ,
    cc_sq_ft                  int                       ,
    cc_hours                  char(20)                      ,
    cc_manager                varchar(40)                   ,
    cc_mkt_id                 int                       ,
    cc_mkt_class              char(50)                      ,
    cc_mkt_desc               varchar(100)                  ,
    cc_market_manager         varchar(40)                   ,
    cc_division               int                       ,
    cc_division_name          varchar(50)                   ,
    cc_company                int                       ,
    cc_company_name           char(50)                      ,
    cc_street_number          char(10)                      ,
    cc_street_name            varchar(60)                   ,
    cc_street_type            char(15)                      ,
    cc_suite_number           char(10)                      ,
    cc_city                   varchar(60)                   ,
    cc_county                 varchar(30)                   ,
    cc_state                  char(2)                       ,
    cc_zip                    char(10)                      ,
    cc_country                varchar(20)                   ,
    cc_gmt_offset             decimal(5,2)                  ,
    cc_tax_percentage         decimal(5,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/call_center/'
;

create external table customer
(
    c_customer_sk             int               ,
    c_customer_id             char(16)              ,
    c_current_cdemo_sk        int                       ,
    c_current_hdemo_sk        int                       ,
    c_current_addr_sk         int                       ,
    c_first_shipto_date_sk    int                       ,
    c_first_sales_date_sk     int                       ,
    c_salutation              char(10)                      ,
    c_first_name              char(20)                      ,
    c_last_name               char(30)                      ,
    c_preferred_cust_flag     char(1)                       ,
    c_birth_day               int                       ,
    c_birth_month             int                       ,
    c_birth_year              int                       ,
    c_birth_country           varchar(20)                   ,
    c_login                   char(13)                      ,
    c_email_address           char(50)                      ,
    c_last_review_date_sk     int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer/'
;

create external table web_site
(
    web_site_sk               int               ,
    web_site_id               char(16)              ,
    web_rec_start_date        date                          ,
    web_rec_end_date          date                          ,
    web_name                  varchar(50)                   ,
    web_open_date_sk          int                       ,
    web_close_date_sk         int                       ,
    web_class                 varchar(50)                   ,
    web_manager               varchar(40)                   ,
    web_mkt_id                int                       ,
    web_mkt_class             varchar(50)                   ,
    web_mkt_desc              varchar(100)                  ,
    web_market_manager        varchar(40)                   ,
    web_company_id            int                       ,
    web_company_name          char(50)                      ,
    web_street_number         char(10)                      ,
    web_street_name           varchar(60)                   ,
    web_street_type           char(15)                      ,
    web_suite_number          char(10)                      ,
    web_city                  varchar(60)                   ,
    web_county                varchar(30)                   ,
    web_state                 char(2)                       ,
    web_zip                   char(10)                      ,
    web_country               varchar(20)                   ,
    web_gmt_offset            decimal(5,2)                  ,
    web_tax_percentage        decimal(5,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_site/'
;

create external table store_returns
(
    sr_returned_date_sk       int                       ,
    sr_return_time_sk         int                       ,
    sr_item_sk                int               ,
    sr_customer_sk            int                       ,
    sr_cdemo_sk               int                       ,
    sr_hdemo_sk               int                       ,
    sr_addr_sk                int                       ,
    sr_store_sk               int                       ,
    sr_reason_sk              int                       ,
    sr_ticket_number          int               ,
    sr_return_quantity        int                       ,
    sr_return_amt             decimal(7,2)                  ,
    sr_return_tax             decimal(7,2)                  ,
    sr_return_amt_inc_tax     decimal(7,2)                  ,
    sr_fee                    decimal(7,2)                  ,
    sr_return_ship_cost       decimal(7,2)                  ,
    sr_refunded_cash          decimal(7,2)                  ,
    sr_reversed_charge        decimal(7,2)                  ,
    sr_store_credit           decimal(7,2)                  ,
    sr_net_loss               decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store_returns/'
;

create external table household_demographics
(
    hd_demo_sk                int               ,
    hd_income_band_sk         int                       ,
    hd_buy_potential          char(15)                      ,
    hd_dep_count              int                       ,
    hd_vehicle_count          int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/household_demographics/'
;

create external table web_page
(
    wp_web_page_sk            int               ,
    wp_web_page_id            char(16)              ,
    wp_rec_start_date         date                          ,
    wp_rec_end_date           date                          ,
    wp_creation_date_sk       int                       ,
    wp_access_date_sk         int                       ,
    wp_autogen_flag           char(1)                       ,
    wp_customer_sk            int                       ,
    wp_url                    varchar(100)                  ,
    wp_type                   char(50)                      ,
    wp_char_count             int                       ,
    wp_link_count             int                       ,
    wp_image_count            int                       ,
    wp_max_ad_count           int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_page/'
;

create external table promotion
(
    p_promo_sk                int               ,
    p_promo_id                char(16)              ,
    p_start_date_sk           int                       ,
    p_end_date_sk             int                       ,
    p_item_sk                 int                       ,
    p_cost                    decimal(15,2)                 ,
    p_response_target         int                       ,
    p_promo_name              char(50)                      ,
    p_channel_dmail           char(1)                       ,
    p_channel_email           char(1)                       ,
    p_channel_catalog         char(1)                       ,
    p_channel_tv              char(1)                       ,
    p_channel_radio           char(1)                       ,
    p_channel_press           char(1)                       ,
    p_channel_event           char(1)                       ,
    p_channel_demo            char(1)                       ,
    p_channel_details         varchar(100)                  ,
    p_purpose                 char(15)                      ,
    p_discount_active         char(1)                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/promotion/'
;

create external table catalog_page
(
    cp_catalog_page_sk        int               ,
    cp_catalog_page_id        char(16)              ,
    cp_start_date_sk          int                       ,
    cp_end_date_sk            int                       ,
    cp_department             varchar(50)                   ,
    cp_catalog_number         int                       ,
    cp_catalog_page_number    int                       ,
    cp_description            varchar(100)                  ,
    cp_type                   varchar(100)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_page/'
;

create external table inventory
(
    inv_date_sk               int               ,
    inv_item_sk               int               ,
    inv_warehouse_sk          int               ,
    inv_quantity_on_hand      int                       
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/inventory/'
;

create external table catalog_returns
(
    cr_returned_date_sk       int                       ,
    cr_returned_time_sk       int                       ,
    cr_item_sk                int               ,
    cr_refunded_customer_sk   int                       ,
    cr_refunded_cdemo_sk      int                       ,
    cr_refunded_hdemo_sk      int                       ,
    cr_refunded_addr_sk       int                       ,
    cr_returning_customer_sk  int                       ,
    cr_returning_cdemo_sk     int                       ,
    cr_returning_hdemo_sk     int                       ,
    cr_returning_addr_sk      int                       ,
    cr_call_center_sk         int                       ,
    cr_catalog_page_sk        int                       ,
    cr_ship_mode_sk           int                       ,
    cr_warehouse_sk           int                       ,
    cr_reason_sk              int                       ,
    cr_order_number           int               ,
    cr_return_quantity        int                       ,
    cr_return_amount          decimal(7,2)                  ,
    cr_return_tax             decimal(7,2)                  ,
    cr_return_amt_inc_tax     decimal(7,2)                  ,
    cr_fee                    decimal(7,2)                  ,
    cr_return_ship_cost       decimal(7,2)                  ,
    cr_refunded_cash          decimal(7,2)                  ,
    cr_reversed_charge        decimal(7,2)                  ,
    cr_store_credit           decimal(7,2)                  ,
    cr_net_loss               decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_returns/'
;

create external table web_returns
(
    wr_returned_date_sk       int                       ,
    wr_returned_time_sk       int                       ,
    wr_item_sk                int               ,
    wr_refunded_customer_sk   int                       ,
    wr_refunded_cdemo_sk      int                       ,
    wr_refunded_hdemo_sk      int                       ,
    wr_refunded_addr_sk       int                       ,
    wr_returning_customer_sk  int                       ,
    wr_returning_cdemo_sk     int                       ,
    wr_returning_hdemo_sk     int                       ,
    wr_returning_addr_sk      int                       ,
    wr_web_page_sk            int                       ,
    wr_reason_sk              int                       ,
    wr_order_number           int               ,
    wr_return_quantity        int                       ,
    wr_return_amt             decimal(7,2)                  ,
    wr_return_tax             decimal(7,2)                  ,
    wr_return_amt_inc_tax     decimal(7,2)                  ,
    wr_fee                    decimal(7,2)                  ,
    wr_return_ship_cost       decimal(7,2)                  ,
    wr_refunded_cash          decimal(7,2)                  ,
    wr_reversed_charge        decimal(7,2)                  ,
    wr_account_credit         decimal(7,2)                  ,
    wr_net_loss               decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_returns/'
;

create external table web_sales
(
    ws_sold_date_sk           int                       ,
    ws_sold_time_sk           int                       ,
    ws_ship_date_sk           int                       ,
    ws_item_sk                int               ,
    ws_bill_customer_sk       int                       ,
    ws_bill_cdemo_sk          int                       ,
    ws_bill_hdemo_sk          int                       ,
    ws_bill_addr_sk           int                       ,
    ws_ship_customer_sk       int                       ,
    ws_ship_cdemo_sk          int                       ,
    ws_ship_hdemo_sk          int                       ,
    ws_ship_addr_sk           int                       ,
    ws_web_page_sk            int                       ,
    ws_web_site_sk            int                       ,
    ws_ship_mode_sk           int                       ,
    ws_warehouse_sk           int                       ,
    ws_promo_sk               int                       ,
    ws_order_number           int               ,
    ws_quantity               int                       ,
    ws_wholesale_cost         decimal(7,2)                  ,
    ws_list_price             decimal(7,2)                  ,
    ws_sales_price            decimal(7,2)                  ,
    ws_ext_discount_amt       decimal(7,2)                  ,
    ws_ext_sales_price        decimal(7,2)                  ,
    ws_ext_wholesale_cost     decimal(7,2)                  ,
    ws_ext_list_price         decimal(7,2)                  ,
    ws_ext_tax                decimal(7,2)                  ,
    ws_coupon_amt             decimal(7,2)                  ,
    ws_ext_ship_cost          decimal(7,2)                  ,
    ws_net_paid               decimal(7,2)                  ,
    ws_net_paid_inc_tax       decimal(7,2)                  ,
    ws_net_paid_inc_ship      decimal(7,2)                  ,
    ws_net_paid_inc_ship_tax  decimal(7,2)                  ,
    ws_net_profit             decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_sales/'
;

create external table catalog_sales
(
    cs_sold_date_sk           int                       ,
    cs_sold_time_sk           int                       ,
    cs_ship_date_sk           int                       ,
    cs_bill_customer_sk       int                       ,
    cs_bill_cdemo_sk          int                       ,
    cs_bill_hdemo_sk          int                       ,
    cs_bill_addr_sk           int                       ,
    cs_ship_customer_sk       int                       ,
    cs_ship_cdemo_sk          int                       ,
    cs_ship_hdemo_sk          int                       ,
    cs_ship_addr_sk           int                       ,
    cs_call_center_sk         int                       ,
    cs_catalog_page_sk        int                       ,
    cs_ship_mode_sk           int                       ,
    cs_warehouse_sk           int                       ,
    cs_item_sk                int               ,
    cs_promo_sk               int                       ,
    cs_order_number           int               ,
    cs_quantity               int                       ,
    cs_wholesale_cost         decimal(7,2)                  ,
    cs_list_price             decimal(7,2)                  ,
    cs_sales_price            decimal(7,2)                  ,
    cs_ext_discount_amt       decimal(7,2)                  ,
    cs_ext_sales_price        decimal(7,2)                  ,
    cs_ext_wholesale_cost     decimal(7,2)                  ,
    cs_ext_list_price         decimal(7,2)                  ,
    cs_ext_tax                decimal(7,2)                  ,
    cs_coupon_amt             decimal(7,2)                  ,
    cs_ext_ship_cost          decimal(7,2)                  ,
    cs_net_paid               decimal(7,2)                  ,
    cs_net_paid_inc_tax       decimal(7,2)                  ,
    cs_net_paid_inc_ship      decimal(7,2)                  ,
    cs_net_paid_inc_ship_tax  decimal(7,2)                  ,
    cs_net_profit             decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_sales/'
;

create external table store_sales
(
    ss_sold_date_sk           int                       ,
    ss_sold_time_sk           int                       ,
    ss_item_sk                int               ,
    ss_customer_sk            int                       ,
    ss_cdemo_sk               int                       ,
    ss_hdemo_sk               int                       ,
    ss_addr_sk                int                       ,
    ss_store_sk               int                       ,
    ss_promo_sk               int                       ,
    ss_ticket_number          int               ,
    ss_quantity               int                       ,
    ss_wholesale_cost         decimal(7,2)                  ,
    ss_list_price             decimal(7,2)                  ,
    ss_sales_price            decimal(7,2)                  ,
    ss_ext_discount_amt       decimal(7,2)                  ,
    ss_ext_sales_price        decimal(7,2)                  ,
    ss_ext_wholesale_cost     decimal(7,2)                  ,
    ss_ext_list_price         decimal(7,2)                  ,
    ss_ext_tax                decimal(7,2)                  ,
    ss_coupon_amt             decimal(7,2)                  ,
    ss_net_paid               decimal(7,2)                  ,
    ss_net_paid_inc_tax       decimal(7,2)                  ,
    ss_net_profit             decimal(7,2)                  
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store_sales/'
;


The tables are now created. You can check them with show tables;
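With 25 tables, it is easy to miss one that failed to create. The following sketch compares pasted show tables; output against the table names defined above; the function name and the sample output are hypothetical:

```python
EXPECTED_TABLES = {
    "dbgen_version", "customer_address", "customer_demographics", "date_dim",
    "warehouse", "ship_mode", "time_dim", "reason", "income_band", "item",
    "store", "call_center", "customer", "web_site", "store_returns",
    "household_demographics", "web_page", "promotion", "catalog_page",
    "inventory", "catalog_returns", "web_returns", "web_sales",
    "catalog_sales", "store_sales",
}


def missing_tables(show_tables_output):
    """Return the expected tables that do not appear in the pasted output."""
    found = {
        line.strip().lower()
        for line in show_tables_output.splitlines()
        if line.strip()
    }
    return sorted(EXPECTED_TABLES - found)


# Sample output with one table deliberately left out.
sample_output = "\n".join(sorted(EXPECTED_TABLES - {"store_sales"}))
print(missing_tables(sample_output))  # ['store_sales']
```

Extra lines in the pasted text, such as status messages, are harmless because only the expected names are looked up.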

After the tables are created, run the following statement to run the stress test:

-- start query 1 in stream 0 using template query29.tpl
select 
     i_item_id
    ,i_item_desc
    ,s_store_id
    ,s_store_name
    ,sum(ss_quantity)        as store_sales_quantity
    ,sum(sr_return_quantity) as store_returns_quantity
    ,sum(cs_quantity)        as catalog_sales_quantity
 from
    store_sales
   ,store_returns
   ,catalog_sales
   ,date_dim             d1
   ,date_dim             d2
   ,date_dim             d3
   ,store
   ,item
 where
     d1.d_moy               = 4
 and d1.d_year              = 1999
 and d1.d_date_sk           = ss_sold_date_sk
 and i_item_sk              = ss_item_sk
 and s_store_sk             = ss_store_sk
 and ss_customer_sk         = sr_customer_sk
 and ss_item_sk             = sr_item_sk
 and ss_ticket_number       = sr_ticket_number
 and sr_returned_date_sk    = d2.d_date_sk
 and d2.d_moy               between 4 and  4 + 3
 and d2.d_year              = 1999
 and sr_customer_sk         = cs_bill_customer_sk
 and sr_item_sk             = cs_item_sk
 and cs_sold_date_sk        = d3.d_date_sk
 and d3.d_year              in (1999,1999+1,1999+2)
 group by
    i_item_id
   ,i_item_desc
   ,s_store_id
   ,s_store_name
 order by
    i_item_id
   ,i_item_desc
   ,s_store_id
   ,s_store_name
 limit 100;

-- end query 1 in stream 0 using template query29.tpl

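Observe cluster auto scaling

On the EMR Hardware page, you can see that the initial state has only 1 core node and 1 master node:

After the stress test has run for a while, you can see the cluster status change to resizing. Note and consider: why are the task nodes Spot instances?

After a while, you can see that the cluster has finished scaling out, and progress in the hive window speeds up noticeably:

After the cluster sits idle for a while, you can see it scale in automatically:

Your Hive query should also have finished by now. Note that with elastic scaling, this complex SQL query scanned about 80 GB of data, with a run time of ELAPSED TIME: 290.46 s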
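To put those two numbers in perspective, the average scan rate can be worked out directly. This is simple arithmetic on the figures quoted above (about 80 GB in 290.46 s); the function name is hypothetical and your own run will give different values:

```python
def scan_throughput_gb_per_s(scanned_gb, elapsed_s):
    """Average amount of data scanned per second over the whole query."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return scanned_gb / elapsed_s


throughput = scan_throughput_gb_per_s(80, 290.46)
print(f"{throughput:.3f} GB/s")  # roughly 0.275 GB/s
```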

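Observe the cost savings from Spot Instances

In the console, go to the EC2 service, then choose Spot Requests in the left navigation pane to open the Spot page

Click Savings summary at the top right to see the cost saved by Spot:

In this run, Spot Instances saved 74% of the cost compared with On-Demand!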
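The percentage is the share of the On-Demand cost that was not paid. A minimal sketch of that calculation follows; the two cost figures are hypothetical and chosen only so the result matches the 74% reported above:

```python
def savings_percent(on_demand_cost, spot_cost):
    """Percentage saved by paying spot_cost instead of on_demand_cost."""
    if on_demand_cost <= 0:
        raise ValueError("on_demand_cost must be positive")
    return (on_demand_cost - spot_cost) / on_demand_cost * 100


# Hypothetical costs for the same instance hours, in the same currency.
print(round(savings_percent(1.00, 0.26)))  # 74
```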

References and review

See the following article:

https://aws.amazon.com/cn/blogs/big-data/introducing-amazon-emr-managed-scaling-automatically-resize-clusters-to-lower-cost/