1. Using sqoop, import the orders table into HDFS under the folder /user/cloudera/problem1/orders. The file should be loaded as an Avro file and use snappy compression.
2. Using sqoop, import the order_items table into HDFS under the folder /user/cloudera/problem1/order-items. The file should be loaded as an Avro file and use snappy compression.
3. Using Spark Scala, load the data at /user/cloudera/problem1/orders and /user/cloudera/problem1/order-items as dataframes.
4. Expected intermediate result: order_date, order_status, total_orders, total_amount. In plain English: find the total orders and total amount per status per day. The result should be sorted by order date descending, then order status ascending, then total amount descending, then total orders ascending. Aggregation should be done using the methods below; sorting, however, can be done using a dataframe or RDD. Perform the aggregation in each of the following ways:
   a. Just by using the Data Frames API (here order_date should be in YYYY-MM-DD format).
   b. Using Spark SQL (here order_date should be in YYYY-MM-DD format).
   c. By using the combineByKey function on RDDs (no need to format order_date or total_amount).
5. Store the result as a parquet file into HDFS using gzip compression under a folder.
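Steps 3 through 5 can be sketched in Spark Scala roughly as below. This is a minimal sketch, not a definitive solution: it assumes Spark 2.4+ with built-in Avro support (older setups would use the `com.databricks.spark.avro` package instead), the standard retail_db column names (`order_id`, `order_date`, `order_status`, `order_item_order_id`, `order_item_subtotal`), and that sqoop exported `order_date` as epoch milliseconds; the output folder name is not given in the problem, so the one used here is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("problem1").getOrCreate()
import spark.implicits._

// Step 3: load the Avro files written by sqoop as dataframes
val orders     = spark.read.format("avro").load("/user/cloudera/problem1/orders")
val orderItems = spark.read.format("avro").load("/user/cloudera/problem1/order-items")

val joined = orders.join(orderItems,
  orders("order_id") === orderItems("order_item_order_id"))

// 4a. DataFrame API: total orders and total amount per status per day,
// with order_date rendered as YYYY-MM-DD (assumes epoch-millis order_date)
val byDf = joined
  .groupBy(to_date(from_unixtime($"order_date" / 1000)).alias("order_date"),
           $"order_status")
  .agg(countDistinct($"order_id").alias("total_orders"),
       round(sum($"order_item_subtotal"), 2).alias("total_amount"))
  .orderBy($"order_date".desc, $"order_status".asc,
           $"total_amount".desc, $"total_orders".asc)

// 4b. Spark SQL: the same aggregation expressed as a query
joined.createOrReplaceTempView("joined")
val bySql = spark.sql("""
  SELECT to_date(from_unixtime(order_date / 1000)) AS order_date,
         order_status,
         COUNT(DISTINCT order_id)           AS total_orders,
         ROUND(SUM(order_item_subtotal), 2) AS total_amount
  FROM joined
  GROUP BY to_date(from_unixtime(order_date / 1000)), order_status
  ORDER BY order_date DESC, order_status ASC, total_amount DESC, total_orders ASC
""")

// 4c. combineByKey on the underlying RDD: the combiner carries the set of
// distinct order ids and the running subtotal per (order_date, order_status)
val byCbk = joined
  .select($"order_date", $"order_status", $"order_id", $"order_item_subtotal")
  .rdd
  .map(r => ((r.getAs[Long]("order_date"), r.getAs[String]("order_status")),
             (r.getAs[Int]("order_id"), r.getAs[Float]("order_item_subtotal"))))
  .combineByKey(
    (v: (Int, Float)) => (Set(v._1), v._2.toDouble),                               // createCombiner
    (acc: (Set[Int], Double), v: (Int, Float)) => (acc._1 + v._1, acc._2 + v._2),  // mergeValue
    (a: (Set[Int], Double), b: (Set[Int], Double)) => (a._1 ++ b._1, a._2 + b._2)) // mergeCombiners
  .map { case ((date, status), (ids, amount)) => (date, status, ids.size, amount) }

// Step 5: persist a result as gzip-compressed parquet
// (output folder name is hypothetical; substitute the one required)
byDf.write
  .option("compression", "gzip")
  .parquet("/user/cloudera/problem1/result-gzip")
```

The `countDistinct` / distinct-id-set approach guards against counting an order once per line item after the join; if `total_orders` is instead meant to count order lines, a plain `count` (or a running `Int` in the combiner) would do.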