Which of the following describes the relationship between native Spark DataFrames and pandas API on Spark DataFrames?
To filter rows in a Spark DataFrame based on a condition, the filter method is used. In this case, the condition is that the value in the 'discount' column should be less than or equal to 0. The correct syntax uses the filter method along with the col function from pyspark.sql.functions.
Correct code:
from pyspark.sql.functions import col
filtered_df = spark_df.filter(col('discount') <= 0)
Options A and D use pandas syntax, which is not applicable in PySpark. Option B is closer but omits the col function.