Name: Cloudera CCA175 Exam
Brand: Pass4Success
SKU: CCA175
Price: 69.00 USD
Availability: InStock
Rating: 5.0 (22 reviews)

Free Cloudera CCA175 Exam Actual Questions

Note: Premium questions for CCA175 were last updated on May 01, 2024.

Question #1

Problem Scenario 91 : You have been given data in json format as below.

{"first_name":"Ankit", "last_name":"Jain"}

{"first_name":"Amir", "last_name":"Khan"}

{"first_name":"Rajesh", "last_name":"Khanna"}

{"first_name":"Priynka", "last_name":"Chopra"}

{"first_name":"Kareena", "last_name":"Kapoor"}

{"first_name":"Lokesh", "last_name":"Yadav"}

Do the following activities.

1. Create the employee.json file locally.

2. Load this file into HDFS.

3. Register this data as a temp table in Spark using Python.

4. Write a select query and print this data.

5. Now save this selected data back in JSON format.

A. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert, then paste the content)
Step 2 : Upload this file to HDFS, default location.
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
import org.apache.spark.sql.SaveMode
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

B. Solution :
Step 1 : Create the employee.json file locally.
vi employee.json (press Insert, then paste the content)
Step 2 : Upload this file to HDFS, default location.
hadoop fs -put employee.json
val employee = sqlContext.read.json("/user/cloudera/employee.json")
employee.write.parquet("employee.parquet")
val parq_data = sqlContext.read.parquet("employee.parquet")
parq_data.registerTempTable("employee")
val all_employee = sqlContext.sql("SELECT * FROM employee")
all_employee.show()
import org.apache.spark.sql.SaveMode
prdDF.write.format("orc").saveAsTable("product_orc_table")
// Change the codec.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
employee.write.mode(SaveMode.Overwrite).parquet("employee.parquet")

Correct Answer: B
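
The scenario asks for Python, while both solution options use Scala syntax. The following is a minimal PySpark sketch of the same steps, assuming a Spark 1.4+ pyspark shell in which `sqlContext` already exists. The helper names `write_employee_file` and `run_employee_job` and the output path `employee_out` are hypothetical and are not part of the original solution.

```python
import json

# Records from the problem statement, one JSON object per line in the file.
EMPLOYEES = [
    {"first_name": "Ankit", "last_name": "Jain"},
    {"first_name": "Amir", "last_name": "Khan"},
    {"first_name": "Rajesh", "last_name": "Khanna"},
    {"first_name": "Priynka", "last_name": "Chopra"},
    {"first_name": "Kareena", "last_name": "Kapoor"},
    {"first_name": "Lokesh", "last_name": "Yadav"},
]


def write_employee_file(path="employee.json"):
    """Step 1: create employee.json locally (JSON Lines layout)."""
    with open(path, "w") as handle:
        for record in EMPLOYEES:
            handle.write(json.dumps(record) + "\n")
    return path


def run_employee_job(sqlContext, src="employee.json", dest="employee_out"):
    """Steps 3 to 5, meant to be called from a pyspark shell.

    Assumes the file was already uploaded with `hadoop fs -put employee.json`
    so that `src` resolves inside the user's HDFS home directory.
    """
    employee = sqlContext.read.json(src)        # load the JSON file as a DataFrame
    employee.registerTempTable("employee")      # step 3: register a temp table
    selected = sqlContext.sql("SELECT first_name, last_name FROM employee")
    selected.show()                             # step 4: print the selected rows
    selected.write.json(dest)                   # step 5: save back in JSON format
    return selected
```

Only `write_employee_file` runs outside Spark; `run_employee_job(sqlContext)` needs a live pyspark session.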

Question #2

Problem Scenario 94 : You have to run your Spark application on YARN with 20 GB of memory per executor and 50 executors. Please replace XXX, YYY and ZZZ.

export HADOOP_CONF_DIR=XXX

./bin/spark-submit \
--class com.hadoopexam.MyTask \
XXX \
--deploy-mode cluster \ # can be client for client mode
YYY \
ZZZ \
/path/to/hadoopexam.jar \
1000

A. Solution
XXX : --master yarn
YYY : --executor-memory 20G
ZZZ : --num-executors 50

B. Solution
XXX : --master yarn
YYY : --executor-memory 40G
ZZZ : --num-executors 80

Correct Answer: A
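
To see how the three placeholders fit into the full command, here is a small Python sketch that assembles the `spark-submit` argument list for the values the scenario asks for (20 GB per executor, 50 executors). The function name `build_spark_submit` is hypothetical; the flags themselves are standard `spark-submit` options.

```python
import shlex


def build_spark_submit(app_class, jar, app_args, executor_memory="20G",
                       num_executors=50, deploy_mode="cluster"):
    """Return the spark-submit argument list for a submission to YARN."""
    return [
        "./bin/spark-submit",
        "--class", app_class,
        "--master", "yarn",                      # XXX in the scenario
        "--deploy-mode", deploy_mode,            # "client" for client mode
        "--executor-memory", executor_memory,    # YYY in the scenario
        "--num-executors", str(num_executors),   # ZZZ in the scenario
        jar,
    ] + [str(arg) for arg in app_args]


command = build_spark_submit("com.hadoopexam.MyTask",
                             "/path/to/hadoopexam.jar", [1000])
# Print a copy-and-paste friendly, shell-quoted version of the command.
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list keeps each flag next to its value, which makes a wrong pairing such as 40G with 80 executors easy to spot.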

Question #3

Problem Scenario 90 : You have been given below two files

course.txt

id,course

1,Hadoop

2,Spark

3,HBase

fee.txt

id,fee

2,3900

3,4200

4,2900

Accomplish the following activities.

1. Select all the courses and their fees, whether a fee is listed or not.

2. Select all the available fees and their respective courses. If the course does not exist, still list the fee.

3. Select all the courses and their fees, whether a fee is listed or not. However, ignore records that have a null fee.

A. Solution :
Step 1 :
hdfs dfs -mkdir sparksql4
hdfs dfs -put course.txt sparksql4/
hdfs dfs -put fee.txt sparksql4/
Step 2 : Now in spark shell
// load the data into a new RDD
// (assumes the uploaded files hold data rows only; a header line such as "id,course" would fail on toInt)
val course = sc.textFile("sparksql4/course.txt")
val fee = sc.textFile("sparksql4/fee.txt")
// Return the first element in this RDD
course.first()
fee.first()
// define the schema using a case class
case class Course(id: Integer, name: String)
case class Fee(id: Integer, fee: Integer)
// create an RDD of Product objects
val courseRDD = course.map(_.split(",")).map(c => Course(c(0).toInt, c(1)))
val feeRDD = fee.map(_.split(",")).map(c => Fee(c(0).toInt, c(1).toInt))
courseRDD.first()
courseRDD.count()
feeRDD.first()
feeRDD.count()
// change RDD of Product objects to a DataFrame
val courseDF = courseRDD.toDF()
val feeDF = feeRDD.toDF()
// register the DataFrame as a temp table
courseDF.registerTempTable("course")
feeDF.registerTempTable("fee")
// Select data from table
val results = sqlContext.sql("""SELECT * FROM course""")
results.show()
val results = sqlContext.sql("""SELECT * FROM fee""")
results.show()
val results = sqlContext.sql("""SELECT * FROM course LEFT JOIN fee ON course.id = fee.id""")
results.show()
val results = sqlContext.sql("""SELECT * FROM course RIGHT JOIN fee ON course.id = fee.id""")
results.show()
// activity 3 ignores records with a null fee, so keep only matched rows
val results = sqlContext.sql("""SELECT * FROM course LEFT JOIN fee ON course.id = fee.id WHERE fee.id IS NOT NULL""")
results.show()

B. Solution :
Step 1 :
hdfs dfs -mkdir sparksql4
hdfs dfs -put course.txt sparksql4/
hdfs dfs -put fee.txt sparksql4/
Step 2 : Now in spark shell
// load the data into a new RDD
val course = sc.textFile("sparksql4/course.txt")
val fee = sc.textFile("sparksql4/fee.txt")
// Return the first element in this RDD
course.first()
fee.first()
// define the schema using a case class
case class Course(id: Integer, name: String)
case class Fee(id: Integer, fee: Integer)
// create an RDD of Product objects
val courseRDD = course.map(_.split(",")).map(c => Course(c(0).toInt, c(1)))
val feeRDD = fee.map(_.split(",")).map(c => Fee(c(0).toInt, c(1).toInt))
courseRDD.first()
courseRDD.count()
feeRDD.first()
results.show()
val results = sqlContext.sql("""SELECT * FROM course RIGHT JOIN fee ON course.id = fee.id""")
results.show()
val results = sqlContext.sql("""SELECT * FROM course LEFT JOIN fee ON course.id = fee.id WHERE fee.id IS NOT NULL""")
results.show()

Correct Answer: A
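
The three activities differ only in which side of the join is preserved. As a rough illustration, the plain-Python sketch below reproduces the join semantics on the course and fee rows from the scenario, without Spark. It assumes ids are unique within each file; the helper names `left_join` and `right_join` are hypothetical.

```python
# Data rows from course.txt and fee.txt, keyed by id.
courses = {1: "Hadoop", 2: "Spark", 3: "HBase"}
fees = {2: 3900, 3: 4200, 4: 2900}


def left_join(left, right):
    """Keep every key of `left`; a missing right-side value becomes None."""
    return [(key, left[key], right.get(key)) for key in sorted(left)]


def right_join(left, right):
    """Keep every key of `right`; a missing left-side value becomes None."""
    return [(key, left.get(key), right[key]) for key in sorted(right)]


# Activity 1: all courses, whether a fee is listed or not.
all_courses = left_join(courses, fees)
# Activity 2: all fees, whether the course exists or not.
all_fees = right_join(courses, fees)
# Activity 3: all courses, ignoring rows whose fee is null.
courses_with_fee = [row for row in all_courses if row[2] is not None]

for name, rows in [("left", all_courses), ("right", all_fees),
                   ("left, fee not null", courses_with_fee)]:
    print(name, rows)
```

Course 1 (Hadoop) has no fee and fee id 4 has no course, so these two rows are the ones that show which side each join preserves.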

Question #4

Problem Scenario 92 : You have been given a Spark Scala application, which is bundled in a jar named hadoopexam.jar.

Your application class name is com.hadoopexam.MyTask

When the application is submitted, the driver should be launched on one of the cluster nodes.

Please complete the following command to submit the application.

spark-submit XXX --master yarn \
YYY $SPARK_HOME/lib/hadoopexam.jar 10

A. Solution
XXX : --class com.hadoopexam.MyTask

B. Solution
XXX : --class com.hadoopexam.MyTask
YYY : --deploy-mode cluster

Correct Answer: B

Question #5

Problem Scenario 75 : You have been given MySQL DB with following details.

user=retail_dba

password=cloudera

database=retail_db

table=retail_db.orders

table=retail_db.order_items

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following activities.

1. Copy the "retail_db.order_items" table to HDFS in the directory p90_order_items.

2. Sum the entire revenue in this table using pyspark.

3. Find the maximum and minimum revenue as well.

4. Calculate the average revenue.

Columns of the order_items table : (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

A. Solution :
Step 1 : Import a single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items -m 1
Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2 : Read the data from one of the partitions created by the above command.
hadoop fs -cat p90_order_items/part-m-00000
Step 3 : In pyspark, get the total revenue across all days and orders.
entireTableRDD = sc.textFile('p90_order_items')
# Cast string to float
extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(',')[4]))
Step 4 : Verify the extracted data
for revenue in extractedRevenueColumn.collect():
    print(revenue)
# Use the reduce function to sum a single column value
totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)
Step 5 : Calculate the average revenue
count = extractedRevenueColumn.count()
averageRev = totalRevenue / count

B. Solution :
Step 1 : Import a single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items -m 1
Note : Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2 : Read the data from one of the partitions created by the above command.
hadoop fs -cat p90_order_items/part-m-00000
Step 3 : In pyspark, get the total revenue across all days and orders.
entireTableRDD = sc.textFile('p90_order_items')
# Cast string to float
extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(',')[4]))
Step 4 : Verify the extracted data
for revenue in extractedRevenueColumn.collect():
    print(revenue)
# Use the reduce function to sum a single column value
totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)
Step 5 : Calculate the maximum revenue
maximumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a >= b else b))
Step 6 : Calculate the minimum revenue
minimumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a <= b else b))
Step 7 : Calculate the average revenue
count = extractedRevenueColumn.count()
averageRev = totalRevenue / count

Correct Answer: B
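
The reduce logic can be checked locally before running it on the cluster. The sketch below applies the same lambdas with `functools.reduce` to a handful of made-up rows; `sample_lines` is hypothetical data in the order_items column layout, not rows taken from retail_db.

```python
from functools import reduce

# Hypothetical rows: order_item_id, order_item_order_id, order_item_product_id,
# order_item_quantity, order_item_subtotal, order_item_product_price
sample_lines = [
    "1,1,957,1,100.0,100.0",
    "2,2,1073,1,250.5,250.5",
    "3,2,502,5,50.25,10.05",
    "4,2,403,1,199.25,199.25",
]

# Column index 4 is order_item_subtotal, the revenue of one order item.
revenues = [float(line.split(",")[4]) for line in sample_lines]

# Same pairwise functions as in the pyspark solution.
total = reduce(lambda a, b: a + b, revenues)
maximum = reduce(lambda a, b: a if a >= b else b, revenues)
minimum = reduce(lambda a, b: a if a <= b else b, revenues)
average = total / len(revenues)

print(total, maximum, minimum, average)
```

On a real RDD, the built-in actions `sum()`, `max()`, `min()` and `mean()` give the same results with less code.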
