Databricks Exam Databricks-Machine-Learning-Associate Topic 1 Question 8 Discussion

Actual exam question for Databricks's Databricks-Machine-Learning-Associate exam
Question #: 8
Topic #: 1

A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in the discount column is less than or equal to 0.

Which of the following code blocks will accomplish this task?

Suggested Answer: C

To filter rows in a Spark DataFrame based on a condition, use the filter method. Here the condition is that the value in the 'discount' column is less than or equal to 0, which is expressed with the col function from pyspark.sql.functions.

Correct code:

from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)

Options A and D use Pandas syntax, which is not applicable to a PySpark DataFrame. Option B is closer but misses the use of the col function.
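
For anyone who wants to run this end to end, here is a minimal self-contained sketch; the sample rows and the id column are illustrative assumptions, not part of the exam question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative data; the values below are assumed for demonstration only
spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(
    [(1, -0.10), (2, 0.00), (3, 0.25)],
    ['id', 'discount']
)

# Keep only rows where discount is less than or equal to 0
filtered_df = spark_df.filter(col('discount') <= 0)
filtered_df.show()  # only the rows with id 1 and 2 remain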


Reference: PySpark SQL Documentation

Contribute your Thoughts:

Miesha
10 months ago
I agree with Sharen, option B seems like the most straightforward solution.
upvoted 0 times
...
Annalee
10 months ago
Hmm, I'm torn between B and C. Maybe I'll flip a coin to decide. Or, you know, use a random number generator. Data scientists love those, right?
upvoted 0 times
Mitsue
9 months ago
Tegan: Good idea. Let's test them out and compare the results.
upvoted 0 times
...
Tegan
9 months ago
Alisha: Maybe we should both try out our options and see which one works better.
upvoted 0 times
...
Alisha
9 months ago
I'm leaning towards C) spark_df.filter(col('discount') <= 0) actually.
upvoted 0 times
...
Theodora
9 months ago
I think B) spark_df[spark_df['discount'] <= 0] is the correct option.
upvoted 0 times
...
...
Sharen
10 months ago
But with option B, we can directly filter the DataFrame based on the condition.
upvoted 0 times
...
Lashon
10 months ago
I disagree, I believe the correct answer is C.
upvoted 0 times
...
Karl
10 months ago
As a data scientist, I'd choose C. It's more readable and maintainable than the other options.
upvoted 0 times
Mayra
9 months ago
I agree with User1, B looks like the right choice.
upvoted 0 times
...
Lakeesha
9 months ago
I think B is the correct option.
upvoted 0 times
...
...
Whitley
10 months ago
I think B is the way to go. It's a simple and straightforward indexing operation on the DataFrame.
upvoted 0 times
Eleonora
9 months ago
Agreed, it's a simple and straightforward indexing operation on the DataFrame.
upvoted 0 times
...
Alida
10 months ago
I think B is the way to go.
upvoted 0 times
...
...
Dalene
10 months ago
Option C looks good to me. It's a direct Spark DataFrame filter operation on the 'discount' column.
upvoted 0 times
Cherry
10 months ago
I would go with Option B as well. It seems like a straightforward way to filter the DataFrame.
upvoted 0 times
...
Lorrie
10 months ago
I think Option B would work too. It filters the DataFrame based on the condition in the 'discount' column.
upvoted 0 times
...
Ammie
10 months ago
Option C looks good to me. It's a direct Spark DataFrame filter operation on the 'discount' column.
upvoted 0 times
...
...
Sharen
10 months ago
I think the correct answer is B.
upvoted 0 times
...
