Admin, Student's Library
🔥 Apache Spark Configurations Explained

A complete guide to understanding key Spark configuration parameters

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.app.name | Name of your Spark application. | spark.app.name=MySparkJob |
| spark.master | Defines the cluster manager (local, yarn, etc.). | spark.master=local[*] |
| spark.submit.deployMode | Deploy mode: client or cluster. | spark.submit.deployMode=cluster |
| spark.home | Location of the Spark installation. | spark.home=/usr/local/spark |
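These application basics can also be set once in spark-defaults.conf instead of on every job. A minimal sketch (the values are illustrative, not recommendations):

```properties
# conf/spark-defaults.conf — values are illustrative
spark.app.name          MySparkJob
spark.master            yarn
spark.submit.deployMode cluster
```

Settings passed via code or spark-submit take precedence over spark-defaults.conf, so this file is best for cluster-wide defaults.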

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.executor.memory | Memory per executor process. | 4g |
| spark.driver.memory | Memory for the driver process. | 2g |
| spark.executor.cores | Number of CPU cores per executor. | 4 |
| spark.memory.fraction | Fraction of JVM heap used for execution/storage. | 0.6 |
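To see what spark.memory.fraction actually controls, here is a rough sketch of Spark's unified memory model: about 300 MB of the heap is reserved for Spark internals, and the fraction applies to the remainder. Treat the numbers as an approximation, not an exact accounting:

```python
# Sketch of Spark's unified memory model: with spark.executor.memory=4g and
# spark.memory.fraction=0.6, roughly (heap - 300 MB reserved) * 0.6 is shared
# by execution and storage; the rest is left for user data structures.
RESERVED_MB = 300          # fixed reservation Spark keeps for itself
heap_mb = 4 * 1024         # spark.executor.memory = 4g
memory_fraction = 0.6      # spark.memory.fraction

unified_mb = (heap_mb - RESERVED_MB) * memory_fraction
user_mb = (heap_mb - RESERVED_MB) * (1 - memory_fraction)

print(f"unified (execution + storage): {unified_mb:.0f} MB")  # ~2278 MB
print(f"user memory:                   {user_mb:.0f} MB")     # ~1518 MB
```

Raising spark.memory.fraction gives more room to shuffles and caching at the cost of space for user objects.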

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.default.parallelism | Default number of partitions. | 8 |
| spark.sql.shuffle.partitions | Partitions used during shuffles. | 200 |
| spark.shuffle.compress | Compress shuffle output files. | true |
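A common rule of thumb for tuning spark.sql.shuffle.partitions is to aim for shuffle partitions in the low hundreds of megabytes each; the helper below sketches that heuristic (the function name and the 128 MB target are illustrative assumptions, not Spark APIs):

```python
# Rule-of-thumb sizing for spark.sql.shuffle.partitions (illustrative only):
# aim for shuffle partitions of roughly 128 MB each.
def suggested_shuffle_partitions(shuffle_bytes: int,
                                 target_partition_bytes: int = 128 * 1024 * 1024) -> int:
    """Return a partition count that keeps partitions near the target size."""
    return max(1, -(-shuffle_bytes // target_partition_bytes))  # ceiling division

# A job shuffling ~50 GB would want ~400 partitions, double the 200 default.
print(suggested_shuffle_partitions(50 * 1024**3))  # 400
```

The default of 200 is often too high for small jobs (many tiny tasks) and too low for large ones (spilling to disk), which is why this is one of the first knobs to tune.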

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.serializer | Serializer implementation (Kryo or Java). | org.apache.spark.serializer.KryoSerializer |
| spark.io.compression.codec | Compression codec used for internal data. | snappy |
| spark.rdd.compress | Compress serialized RDD partitions. | true |
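Put together, a serialization section of spark-defaults.conf might look like the sketch below (values illustrative; Kryo is generally faster and more compact than Java serialization but may need classes registered for best results):

```properties
# conf/spark-defaults.conf — serialization settings (illustrative)
spark.serializer            org.apache.spark.serializer.KryoSerializer
spark.io.compression.codec  snappy
spark.rdd.compress          true
```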

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.sql.shuffle.partitions | Number of shuffle partitions for joins/aggregations. | 200 |
| spark.sql.autoBroadcastJoinThreshold | Broadcast join threshold in bytes. | 10485760 |
| spark.sql.warehouse.dir | Location of the Spark SQL warehouse. | /user/hive/warehouse |
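The threshold value 10485760 looks opaque, but it is simply 10 MB in bytes; tables smaller than this are shipped to every executor instead of shuffled, and a value of -1 turns automatic broadcasting off. A quick sketch of the rule (the helper function and table sizes are hypothetical illustrations):

```python
# The example value 10485760 is just 10 MB expressed in bytes.
threshold = 10 * 1024 * 1024
print(threshold)  # 10485760

def would_broadcast(table_bytes: int, threshold_bytes: int = threshold) -> bool:
    """True if a table under the threshold qualifies for a broadcast join;
    a threshold of -1 disables automatic broadcasting entirely."""
    return threshold_bytes >= 0 and table_bytes <= threshold_bytes

print(would_broadcast(5 * 1024 * 1024))                      # True: 5 MB fits
print(would_broadcast(5 * 1024 * 1024, threshold_bytes=-1))  # False: disabled
```

Raising the threshold can turn expensive shuffle joins into cheap broadcast joins, at the cost of driver and executor memory for the broadcast table.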

| Configuration | Meaning | Example |
| --- | --- | --- |
| spark.dynamicAllocation.enabled | Enable dynamic allocation. | true |
| spark.dynamicAllocation.minExecutors | Minimum executors allowed. | 2 |
| spark.dynamicAllocation.maxExecutors | Maximum executors allowed. | 10 |
| spark.dynamicAllocation.initialExecutors | Executors at application start. | 4 |
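These settings work together, so a spark-defaults.conf sketch helps show the shape (values illustrative). Note that dynamic allocation also needs shuffle data to survive executor removal, typically via the external shuffle service or, on Spark 3.0+, shuffle tracking:

```properties
# conf/spark-defaults.conf — dynamic allocation (illustrative)
spark.dynamicAllocation.enabled          true
spark.dynamicAllocation.minExecutors     2
spark.dynamicAllocation.maxExecutors     10
spark.dynamicAllocation.initialExecutors 4
# One of the following is required so shuffle files outlive executors:
spark.shuffle.service.enabled            true
# spark.dynamicAllocation.shuffleTracking.enabled  true
```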

In PySpark:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MySparkApp") \
    .master("yarn") \
    .config("spark.executor.memory", "4g") \
    .config("spark.sql.shuffle.partitions", "100") \
    .getOrCreate()
```

Using spark-submit:

```shell
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=4 \
  myjob.jar
```
