A data scientist has a Spark DataFrame spark_df. They want to create a new Spark DataFrame that contains only the rows from spark_df where the value in column discount is less than or equal to 0.
Which of the following code blocks will accomplish this task?
To filter rows in a Spark DataFrame based on a condition, use the filter method. Here, the condition is that the value in the 'discount' column is less than or equal to 0. The correct syntax calls filter with a column expression built from the col function in pyspark.sql.functions.
Correct code:
from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)
Options A and D use Pandas syntax, which does not apply to PySpark DataFrames. Option B is closer but omits the col function.
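For illustration, here is a minimal, self-contained sketch (assuming an active SparkSession and a small made-up DataFrame, since the question's spark_df is not shown) demonstrating that filter, its alias where, and a SQL-style string expression all yield the same result:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Assumed setup for illustration only; in the question, spark_df already exists.
spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(
    [("apple", -0.10), ("banana", 0.00), ("cherry", 0.25)],
    ["item", "discount"],
)

# Keep only rows where discount <= 0 (matches the correct answer).
filtered_df = spark_df.filter(col("discount") <= 0)

# Equivalent alternatives: where() is an alias for filter(),
# and filter() also accepts a SQL-style string expression.
filtered_where = spark_df.where(col("discount") <= 0)
filtered_expr = spark_df.filter("discount <= 0")

filtered_df.show()

All three return a new DataFrame containing only the apple and banana rows; the original spark_df is left unchanged, since Spark transformations never modify their input.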