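This hands-on lab uses the us-east-2 Region as an example; you can choose any other Region you need.
Create an EC2 key pair. You will need it in later steps to log in to the EC2 instance. Proceed as follows:

Click Key Pairs to open the key pair page, then click the Create key pair button in the upper right to open the Create key pair page:

On the Create key pair page, enter bigdata-workshop-key as the name, then click the Create key pair button in the lower right:

Then click Confirm and download the newly created key pair:

Set the permissions of the downloaded key file to 400. On a Mac, do this as follows: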
chmod 400 bigdata-workshop-key.pem
On Windows, see:
https://superuser.com/questions/1296024/windows-ssh-permissions-for-private-key-are-too-open
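As a quick sanity check after the chmod step, the sketch below reports whether a key file really has mode 400. The helper name check_key_mode is hypothetical, and the two stat invocations are the GNU and BSD/macOS forms; other platforms may need a different call.

```shell
# Hypothetical helper: print "ok" if the file mode is exactly 400,
# otherwise print the mode that was found.
check_key_mode() {
  key_file="$1"
  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'.
  mode=$(stat -c '%a' "$key_file" 2>/dev/null || stat -f '%Lp' "$key_file")
  if [ "$mode" = "400" ]; then
    echo "ok"
  else
    echo "too open: $mode"
  fi
}

# Usage (assumes the key was saved under this name):
#   check_key_mode bigdata-workshop-key.pem
```

ssh refuses a private key that other users can read, so it is worth confirming the mode before trying to connect.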
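Open the console and go to the Amazon EMR service home page: https://console.aws.amazon.com/elasticmapreduce/home?region=us-east-2
Click Create cluster. On the page that opens, click Go to advanced options in the upper left, and use the advanced options to create the EMR demo cluster.
On the Step 1: Software and Steps page, keep the defaults (note that the EMR release here is 5.30.1; the default release changes as new EMR releases come out).
Click Next to go to Step 2.
In Step 2, adjust the following parameters:
Under Cluster Nodes and Instances, make the following adjustments:

Under Cluster Scaling, make the following adjustments:

Question to consider: what scaling policy does 1/20/4/4 represent?
When the configuration is complete, click Next to go to Step 3.
Change the cluster name to emr-scaling.

Click Next to go to Step 4.
Set the EC2 key pair to bigdata-workshop-key, then click the Create cluster button.

Wait for the cluster to be created; this takes about 10 to 15 minutes.
Once the cluster is created, its status changes to Waiting.
To log in remotely to the master node's EC2 instance, you need to modify the master node's security group to open port 22, as follows:
Go to the cluster's Summary page and, near the bottom of the page, click the master node's security group:
Edit the inbound rules of the master node's security group: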

Add SSH (port 22) to the inbound rules.

Note: setting the source to 0.0.0.0/0 here is for the workshop demo only. It is not a best practice; do not use it in production!
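Outside the workshop, a tighter rule would allow SSH from a single address only. The sketch below wraps the AWS CLI command aws ec2 authorize-security-group-ingress for that purpose. The function name allow_ssh_from, the DRY_RUN switch, and the example security group ID and IP address are all illustrative assumptions, and the call only works where the AWS CLI is installed and configured.

```shell
# Hypothetical helper: allow inbound SSH (TCP 22) from one IPv4 address.
# With DRY_RUN=1 the command is printed instead of executed.
allow_ssh_from() {
  sg_id="$1"               # master node security group ID (placeholder)
  my_ip="$2"               # your public IPv4 address (placeholder)
  region="${3:-us-east-2}" # Region used in this lab
  set -- aws ec2 authorize-security-group-ingress \
    --region "$region" --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$my_ip/32"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: review the command first, then run it for real.
#   DRY_RUN=1 allow_ssh_from sg-0123456789abcdef0 203.0.113.10
```

The /32 suffix limits the rule to exactly one address, in contrast to 0.0.0.0/0, which admits every address.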
Next, go back to the cluster's Summary page and click Connect to the Master Node Using SSH.


Log in to the EC2 instance from the command line with the ssh command.
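The console dialog shows the exact command for your cluster. As a rough sketch of its usual shape: the login user on EMR nodes is hadoop, and the key file is the one downloaded earlier. The function name emr_ssh, the DRY_RUN switch, and the host name in the usage line are placeholders, not values from this lab.

```shell
# Hypothetical helper: open an SSH session to the EMR master node.
# With DRY_RUN=1 the command is printed instead of executed.
emr_ssh() {
  key_file="$1"    # private key file, e.g. bigdata-workshop-key.pem
  master_dns="$2"  # master public DNS name from the cluster Summary page
  set -- ssh -i "$key_file" "hadoop@$master_dns"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (replace the host name with your master node's public DNS):
#   emr_ssh bigdata-workshop-key.pem ec2-xx-xx-xx-xx.us-east-2.compute.amazonaws.com
```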

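Run the hive command to enter the Hive interactive shell:
hive

Run the following commands to create the database and tables.
Note: before this step, contact the instructor and provide your Canonical ID to obtain the required S3 permissions.
The instructor will use the Canonical ID to grant your account access to the S3 bucket that holds the TPC-DS 100 GB dataset used for the stress test.
Go to the console, click your user name in the upper right, click My Security Credentials, copy the account's canonical user ID, and give it to the instructor.

After you have been granted access, run the CREATE TABLE statements in Hive. The statements below succeed only after the permissions are in place.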
create database tpcds_b;
use tpcds_b;
create external table dbgen_version
(
dv_version varchar(16) ,
dv_create_date date ,
dv_create_time TIMESTAMP ,
dv_cmdline_args varchar(200)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/dbgen_version/'
;
create external table customer_address
(
ca_address_sk int ,
ca_address_id char(16) ,
ca_street_number char(10) ,
ca_street_name varchar(60) ,
ca_street_type char(15) ,
ca_suite_number char(10) ,
ca_city varchar(60) ,
ca_county varchar(30) ,
ca_state char(2) ,
ca_zip char(10) ,
ca_country varchar(20) ,
ca_gmt_offset decimal(5,2) ,
ca_location_type char(20)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer_address/'
;
create external table customer_demographics
(
cd_demo_sk int ,
cd_gender char(1) ,
cd_marital_status char(1) ,
cd_education_status char(20) ,
cd_purchase_estimate int ,
cd_credit_rating char(10) ,
cd_dep_count int ,
cd_dep_employed_count int ,
cd_dep_college_count int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer_demographics/'
;
create external table date_dim
(
d_date_sk int ,
d_date_id char(16) ,
d_date date ,
d_month_seq int ,
d_week_seq int ,
d_quarter_seq int ,
d_year int ,
d_dow int ,
d_moy int ,
d_dom int ,
d_qoy int ,
d_fy_year int ,
d_fy_quarter_seq int ,
d_fy_week_seq int ,
d_day_name char(9) ,
d_quarter_name char(6) ,
d_holiday char(1) ,
d_weekend char(1) ,
d_following_holiday char(1) ,
d_first_dom int ,
d_last_dom int ,
d_same_day_ly int ,
d_same_day_lq int ,
d_current_day char(1) ,
d_current_week char(1) ,
d_current_month char(1) ,
d_current_quarter char(1) ,
d_current_year char(1)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/date_dim/'
;
create external table warehouse
(
w_warehouse_sk int ,
w_warehouse_id char(16) ,
w_warehouse_name varchar(20) ,
w_warehouse_sq_ft int ,
w_street_number char(10) ,
w_street_name varchar(60) ,
w_street_type char(15) ,
w_suite_number char(10) ,
w_city varchar(60) ,
w_county varchar(30) ,
w_state char(2) ,
w_zip char(10) ,
w_country varchar(20) ,
w_gmt_offset decimal(5,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/warehouse/'
;
create external table ship_mode
(
sm_ship_mode_sk int ,
sm_ship_mode_id char(16) ,
sm_type char(30) ,
sm_code char(10) ,
sm_carrier char(20) ,
sm_contract char(20)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/ship_mode/'
;
create external table time_dim
(
t_time_sk int ,
t_time_id char(16) ,
t_time int ,
t_hour int ,
t_minute int ,
t_second int ,
t_am_pm char(2) ,
t_shift char(20) ,
t_sub_shift char(20) ,
t_meal_time char(20)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/time_dim/'
;
create external table reason
(
r_reason_sk int ,
r_reason_id char(16) ,
r_reason_desc char(100)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/reason/'
;
create external table income_band
(
ib_income_band_sk int ,
ib_lower_bound int ,
ib_upper_bound int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/income_band/'
;
create external table item
(
i_item_sk int ,
i_item_id char(16) ,
i_rec_start_date date ,
i_rec_end_date date ,
i_item_desc varchar(200) ,
i_current_price decimal(7,2) ,
i_wholesale_cost decimal(7,2) ,
i_brand_id int ,
i_brand char(50) ,
i_class_id int ,
i_class char(50) ,
i_category_id int ,
i_category char(50) ,
i_manufact_id int ,
i_manufact char(50) ,
i_size char(20) ,
i_formulation char(20) ,
i_color char(20) ,
i_units char(10) ,
i_container char(10) ,
i_manager_id int ,
i_product_name char(50)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/item/'
;
create external table store
(
s_store_sk int ,
s_store_id char(16) ,
s_rec_start_date date ,
s_rec_end_date date ,
s_closed_date_sk int ,
s_store_name varchar(50) ,
s_number_employees int ,
s_floor_space int ,
s_hours char(20) ,
s_manager varchar(40) ,
s_market_id int ,
s_geography_class varchar(100) ,
s_market_desc varchar(100) ,
s_market_manager varchar(40) ,
s_division_id int ,
s_division_name varchar(50) ,
s_company_id int ,
s_company_name varchar(50) ,
s_street_number varchar(10) ,
s_street_name varchar(60) ,
s_street_type char(15) ,
s_suite_number char(10) ,
s_city varchar(60) ,
s_county varchar(30) ,
s_state char(2) ,
s_zip char(10) ,
s_country varchar(20) ,
s_gmt_offset decimal(5,2) ,
s_tax_precentage decimal(5,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store/'
;
create external table call_center
(
cc_call_center_sk int ,
cc_call_center_id char(16) ,
cc_rec_start_date date ,
cc_rec_end_date date ,
cc_closed_date_sk int ,
cc_open_date_sk int ,
cc_name varchar(50) ,
cc_class varchar(50) ,
cc_employees int ,
cc_sq_ft int ,
cc_hours char(20) ,
cc_manager varchar(40) ,
cc_mkt_id int ,
cc_mkt_class char(50) ,
cc_mkt_desc varchar(100) ,
cc_market_manager varchar(40) ,
cc_division int ,
cc_division_name varchar(50) ,
cc_company int ,
cc_company_name char(50) ,
cc_street_number char(10) ,
cc_street_name varchar(60) ,
cc_street_type char(15) ,
cc_suite_number char(10) ,
cc_city varchar(60) ,
cc_county varchar(30) ,
cc_state char(2) ,
cc_zip char(10) ,
cc_country varchar(20) ,
cc_gmt_offset decimal(5,2) ,
cc_tax_percentage decimal(5,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/call_center/'
;
create external table customer
(
c_customer_sk int ,
c_customer_id char(16) ,
c_current_cdemo_sk int ,
c_current_hdemo_sk int ,
c_current_addr_sk int ,
c_first_shipto_date_sk int ,
c_first_sales_date_sk int ,
c_salutation char(10) ,
c_first_name char(20) ,
c_last_name char(30) ,
c_preferred_cust_flag char(1) ,
c_birth_day int ,
c_birth_month int ,
c_birth_year int ,
c_birth_country varchar(20) ,
c_login char(13) ,
c_email_address char(50) ,
c_last_review_date_sk int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/customer/'
;
create external table web_site
(
web_site_sk int ,
web_site_id char(16) ,
web_rec_start_date date ,
web_rec_end_date date ,
web_name varchar(50) ,
web_open_date_sk int ,
web_close_date_sk int ,
web_class varchar(50) ,
web_manager varchar(40) ,
web_mkt_id int ,
web_mkt_class varchar(50) ,
web_mkt_desc varchar(100) ,
web_market_manager varchar(40) ,
web_company_id int ,
web_company_name char(50) ,
web_street_number char(10) ,
web_street_name varchar(60) ,
web_street_type char(15) ,
web_suite_number char(10) ,
web_city varchar(60) ,
web_county varchar(30) ,
web_state char(2) ,
web_zip char(10) ,
web_country varchar(20) ,
web_gmt_offset decimal(5,2) ,
web_tax_percentage decimal(5,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_site/'
;
create external table store_returns
(
sr_returned_date_sk int ,
sr_return_time_sk int ,
sr_item_sk int ,
sr_customer_sk int ,
sr_cdemo_sk int ,
sr_hdemo_sk int ,
sr_addr_sk int ,
sr_store_sk int ,
sr_reason_sk int ,
sr_ticket_number int ,
sr_return_quantity int ,
sr_return_amt decimal(7,2) ,
sr_return_tax decimal(7,2) ,
sr_return_amt_inc_tax decimal(7,2) ,
sr_fee decimal(7,2) ,
sr_return_ship_cost decimal(7,2) ,
sr_refunded_cash decimal(7,2) ,
sr_reversed_charge decimal(7,2) ,
sr_store_credit decimal(7,2) ,
sr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store_returns/'
;
create external table household_demographics
(
hd_demo_sk int ,
hd_income_band_sk int ,
hd_buy_potential char(15) ,
hd_dep_count int ,
hd_vehicle_count int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/household_demographics/'
;
create external table web_page
(
wp_web_page_sk int ,
wp_web_page_id char(16) ,
wp_rec_start_date date ,
wp_rec_end_date date ,
wp_creation_date_sk int ,
wp_access_date_sk int ,
wp_autogen_flag char(1) ,
wp_customer_sk int ,
wp_url varchar(100) ,
wp_type char(50) ,
wp_char_count int ,
wp_link_count int ,
wp_image_count int ,
wp_max_ad_count int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_page/'
;
create external table promotion
(
p_promo_sk int ,
p_promo_id char(16) ,
p_start_date_sk int ,
p_end_date_sk int ,
p_item_sk int ,
p_cost decimal(15,2) ,
p_response_target int ,
p_promo_name char(50) ,
p_channel_dmail char(1) ,
p_channel_email char(1) ,
p_channel_catalog char(1) ,
p_channel_tv char(1) ,
p_channel_radio char(1) ,
p_channel_press char(1) ,
p_channel_event char(1) ,
p_channel_demo char(1) ,
p_channel_details varchar(100) ,
p_purpose char(15) ,
p_discount_active char(1)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/promotion/'
;
create external table catalog_page
(
cp_catalog_page_sk int ,
cp_catalog_page_id char(16) ,
cp_start_date_sk int ,
cp_end_date_sk int ,
cp_department varchar(50) ,
cp_catalog_number int ,
cp_catalog_page_number int ,
cp_description varchar(100) ,
cp_type varchar(100)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_page/'
;
create external table inventory
(
inv_date_sk int ,
inv_item_sk int ,
inv_warehouse_sk int ,
inv_quantity_on_hand int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/inventory/'
;
create external table catalog_returns
(
cr_returned_date_sk int ,
cr_returned_time_sk int ,
cr_item_sk int ,
cr_refunded_customer_sk int ,
cr_refunded_cdemo_sk int ,
cr_refunded_hdemo_sk int ,
cr_refunded_addr_sk int ,
cr_returning_customer_sk int ,
cr_returning_cdemo_sk int ,
cr_returning_hdemo_sk int ,
cr_returning_addr_sk int ,
cr_call_center_sk int ,
cr_catalog_page_sk int ,
cr_ship_mode_sk int ,
cr_warehouse_sk int ,
cr_reason_sk int ,
cr_order_number int ,
cr_return_quantity int ,
cr_return_amount decimal(7,2) ,
cr_return_tax decimal(7,2) ,
cr_return_amt_inc_tax decimal(7,2) ,
cr_fee decimal(7,2) ,
cr_return_ship_cost decimal(7,2) ,
cr_refunded_cash decimal(7,2) ,
cr_reversed_charge decimal(7,2) ,
cr_store_credit decimal(7,2) ,
cr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_returns/'
;
create external table web_returns
(
wr_returned_date_sk int ,
wr_returned_time_sk int ,
wr_item_sk int ,
wr_refunded_customer_sk int ,
wr_refunded_cdemo_sk int ,
wr_refunded_hdemo_sk int ,
wr_refunded_addr_sk int ,
wr_returning_customer_sk int ,
wr_returning_cdemo_sk int ,
wr_returning_hdemo_sk int ,
wr_returning_addr_sk int ,
wr_web_page_sk int ,
wr_reason_sk int ,
wr_order_number int ,
wr_return_quantity int ,
wr_return_amt decimal(7,2) ,
wr_return_tax decimal(7,2) ,
wr_return_amt_inc_tax decimal(7,2) ,
wr_fee decimal(7,2) ,
wr_return_ship_cost decimal(7,2) ,
wr_refunded_cash decimal(7,2) ,
wr_reversed_charge decimal(7,2) ,
wr_account_credit decimal(7,2) ,
wr_net_loss decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_returns/'
;
create external table web_sales
(
ws_sold_date_sk int ,
ws_sold_time_sk int ,
ws_ship_date_sk int ,
ws_item_sk int ,
ws_bill_customer_sk int ,
ws_bill_cdemo_sk int ,
ws_bill_hdemo_sk int ,
ws_bill_addr_sk int ,
ws_ship_customer_sk int ,
ws_ship_cdemo_sk int ,
ws_ship_hdemo_sk int ,
ws_ship_addr_sk int ,
ws_web_page_sk int ,
ws_web_site_sk int ,
ws_ship_mode_sk int ,
ws_warehouse_sk int ,
ws_promo_sk int ,
ws_order_number int ,
ws_quantity int ,
ws_wholesale_cost decimal(7,2) ,
ws_list_price decimal(7,2) ,
ws_sales_price decimal(7,2) ,
ws_ext_discount_amt decimal(7,2) ,
ws_ext_sales_price decimal(7,2) ,
ws_ext_wholesale_cost decimal(7,2) ,
ws_ext_list_price decimal(7,2) ,
ws_ext_tax decimal(7,2) ,
ws_coupon_amt decimal(7,2) ,
ws_ext_ship_cost decimal(7,2) ,
ws_net_paid decimal(7,2) ,
ws_net_paid_inc_tax decimal(7,2) ,
ws_net_paid_inc_ship decimal(7,2) ,
ws_net_paid_inc_ship_tax decimal(7,2) ,
ws_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/web_sales/'
;
create external table catalog_sales
(
cs_sold_date_sk int ,
cs_sold_time_sk int ,
cs_ship_date_sk int ,
cs_bill_customer_sk int ,
cs_bill_cdemo_sk int ,
cs_bill_hdemo_sk int ,
cs_bill_addr_sk int ,
cs_ship_customer_sk int ,
cs_ship_cdemo_sk int ,
cs_ship_hdemo_sk int ,
cs_ship_addr_sk int ,
cs_call_center_sk int ,
cs_catalog_page_sk int ,
cs_ship_mode_sk int ,
cs_warehouse_sk int ,
cs_item_sk int ,
cs_promo_sk int ,
cs_order_number int ,
cs_quantity int ,
cs_wholesale_cost decimal(7,2) ,
cs_list_price decimal(7,2) ,
cs_sales_price decimal(7,2) ,
cs_ext_discount_amt decimal(7,2) ,
cs_ext_sales_price decimal(7,2) ,
cs_ext_wholesale_cost decimal(7,2) ,
cs_ext_list_price decimal(7,2) ,
cs_ext_tax decimal(7,2) ,
cs_coupon_amt decimal(7,2) ,
cs_ext_ship_cost decimal(7,2) ,
cs_net_paid decimal(7,2) ,
cs_net_paid_inc_tax decimal(7,2) ,
cs_net_paid_inc_ship decimal(7,2) ,
cs_net_paid_inc_ship_tax decimal(7,2) ,
cs_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/catalog_sales/'
;
create external table store_sales
(
ss_sold_date_sk int ,
ss_sold_time_sk int ,
ss_item_sk int ,
ss_customer_sk int ,
ss_cdemo_sk int ,
ss_hdemo_sk int ,
ss_addr_sk int ,
ss_store_sk int ,
ss_promo_sk int ,
ss_ticket_number int ,
ss_quantity int ,
ss_wholesale_cost decimal(7,2) ,
ss_list_price decimal(7,2) ,
ss_sales_price decimal(7,2) ,
ss_ext_discount_amt decimal(7,2) ,
ss_ext_sales_price decimal(7,2) ,
ss_ext_wholesale_cost decimal(7,2) ,
ss_ext_list_price decimal(7,2) ,
ss_ext_tax decimal(7,2) ,
ss_coupon_amt decimal(7,2) ,
ss_net_paid decimal(7,2) ,
ss_net_paid_inc_tax decimal(7,2) ,
ss_net_profit decimal(7,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile
location 's3://partner-workshop-tpc-ds/data-b/store_sales/'
;
The tables are now created. You can check them with show tables;
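With 25 tables defined above (the 24 TPC-DS tables plus dbgen_version), a missing one is easy to overlook in the show tables output. The sketch below compares a table listing against the expected names. The helper name missing_tables is hypothetical, and the usage line assumes it runs on the master node where the hive CLI is available.

```shell
# Names of the 25 tables created by the statements above.
EXPECTED_TABLES="dbgen_version customer_address customer_demographics date_dim
warehouse ship_mode time_dim reason income_band item store call_center customer
web_site store_returns household_demographics web_page promotion catalog_page
inventory catalog_returns web_returns web_sales catalog_sales store_sales"

# Hypothetical helper: read table names (one per line) on stdin and
# print every expected table that does not appear in the input.
missing_tables() {
  actual=$(cat)
  for t in $EXPECTED_TABLES; do
    # -x matches whole lines only, so "store" does not match "store_sales".
    printf '%s\n' "$actual" | grep -qx "$t" || echo "$t"
  done
}

# Usage from the shell on the master node:
#   hive -S -e 'use tpcds_b; show tables;' | missing_tables
```

Empty output means every expected table exists in tpcds_b.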

Once the tables are created, run the following statement as the stress test:
-- start query 1 in stream 0 using template query29.tpl
select
i_item_id
,i_item_desc
,s_store_id
,s_store_name
,sum(ss_quantity) as store_sales_quantity
,sum(sr_return_quantity) as store_returns_quantity
,sum(cs_quantity) as catalog_sales_quantity
from
store_sales
,store_returns
,catalog_sales
,date_dim d1
,date_dim d2
,date_dim d3
,store
,item
where
d1.d_moy = 4
and d1.d_year = 1999
and d1.d_date_sk = ss_sold_date_sk
and i_item_sk = ss_item_sk
and s_store_sk = ss_store_sk
and ss_customer_sk = sr_customer_sk
and ss_item_sk = sr_item_sk
and ss_ticket_number = sr_ticket_number
and sr_returned_date_sk = d2.d_date_sk
and d2.d_moy between 4 and 4 + 3
and d2.d_year = 1999
and sr_customer_sk = cs_bill_customer_sk
and sr_item_sk = cs_item_sk
and cs_sold_date_sk = d3.d_date_sk
and d3.d_year in (1999,1999+1,1999+2)
group by
i_item_id
,i_item_desc
,s_store_id
,s_store_name
order by
i_item_id
,i_item_desc
,s_store_id
,s_store_name
limit 100;
-- end query 1 in stream 0 using template query29.tpl

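On the EMR Hardware page, you can see that the cluster initially has only 1 core node and 1 master node:

After the stress test has run for a while, you can see the cluster status change to resizing. Take note and consider: why are the task nodes of the Spot type?

After a while, you can see that the cluster has finished scaling out, and the progress in the Hive window speeds up noticeably:

After the cluster has been idle for a while, you can see that it scales in automatically:

Your Hive query should also have finished by now. Note that with elastic scaling, this complex SQL statement scanned about 80 GB of data, with ELAPSED TIME: 290.46 s

In the console, go to the EC2 service, then choose Spot Requests in the left navigation pane to open the Spot page.

Click Savings Summary in the upper right to see how much Spot saved: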

In this run, Spot Instances saved 74% of the cost compared with On-Demand!
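The savings figure is the share of the On-Demand cost that was not paid: (1 - spot cost / On-Demand cost) x 100. The sketch below works that out with awk. The helper name savings_pct and the two costs in the usage line are made-up illustrations; the console's Savings Summary remains the authoritative number for your own run.

```shell
# Hypothetical helper: percentage saved by Spot relative to On-Demand.
# $1 = On-Demand cost, $2 = Spot cost, for the same usage.
savings_pct() {
  awk -v od="$1" -v sp="$2" 'BEGIN { printf "%.1f\n", (1 - sp / od) * 100 }'
}

# Usage with illustrative costs (not measured values):
#   savings_pct 100 26    # prints 74.0
```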
See the following articles: