Databricks Exam Databricks-Machine-Learning-Associate Topic 1 Question 8 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 8
Topic #: 1

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in the discount column is less than or equal to 0.

Which of the following code blocks will accomplish this task?

Suggested Answer: C

To filter rows in a Spark DataFrame based on a condition, use the filter method. Here the condition is that the value in the 'discount' column is less than or equal to 0, which is expressed with the col function from pyspark.sql.functions.

Correct code:

from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)

Options A and D use Pandas syntax, which is not applicable to a PySpark DataFrame. Option B is closer but misses the use of the col function.
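
For anyone who wants to run this end to end, here is a minimal self-contained sketch; the sample rows and the id column are illustrative assumptions, not part of the exam question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative data; the values below are assumed for demonstration only
spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(
    [(1, -0.10), (2, 0.00), (3, 0.25)],
    ['id', 'discount']
)

# Keep only rows where discount is less than or equal to 0
filtered_df = spark_df.filter(col('discount') <= 0)
filtered_df.show()  # only the rows with id 1 and 2 remain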


Reference: PySpark SQL Documentation

Contribute your Thoughts:

Miesha
10 months ago
I agree with Sharen, option B seems like the most straightforward solution.
upvoted 0 times
...
Annalee
10 months ago
Hmm, I'm torn between B and C. Maybe I'll flip a coin to decide. Or, you know, use a random number generator. Data scientists love those, right?
upvoted 0 times
Mitsue
9 months ago
Tegan: Good idea. Let's test them out and compare the results.
upvoted 0 times
...
Tegan
9 months ago
Alisha: Maybe we should both try out our options and see which one works better.
upvoted 0 times
...
Alisha
9 months ago
I'm leaning towards C) spark_df.filter(col('discount') <= 0) actually.
upvoted 0 times
...
Theodora
9 months ago
I think B) spark_df[spark_df['discount'] <= 0] is the correct option.
upvoted 0 times
...
...
Sharen
10 months ago
But with option B, we can directly filter the DataFrame based on the condition.
upvoted 0 times
...
Lashon
10 months ago
I disagree, I believe the correct answer is C.
upvoted 0 times
...
Karl
10 months ago
As a data scientist, I'd choose C. It's more readable and maintainable than the other options.
upvoted 0 times
Mayra
9 months ago
I agree with User1, B looks like the right choice.
upvoted 0 times
...
Lakeesha
9 months ago
I think B is the correct option.
upvoted 0 times
...
...
Whitley
10 months ago
I think B is the way to go. It's a simple and straightforward indexing operation on the DataFrame.
upvoted 0 times
Eleonora
9 months ago
Agreed, it's a simple and straightforward indexing operation on the DataFrame.
upvoted 0 times
...
Alida
10 months ago
I think B is the way to go.
upvoted 0 times
...
...
Dalene
10 months ago
Option C looks good to me. It's a direct Spark DataFrame filter operation on the 'discount' column.
upvoted 0 times
Cherry
10 months ago
I would go with Option B as well. It seems like a straightforward way to filter the DataFrame.
upvoted 0 times
...
Lorrie
10 months ago
I think Option B would work too. It filters the DataFrame based on the condition in the 'discount' column.
upvoted 0 times
...
Ammie
10 months ago
Option C looks good to me. It's a direct Spark DataFrame filter operation on the 'discount' column.
upvoted 0 times
...
...
Sharen
10 months ago
I think the correct answer is B.
upvoted 0 times
...
