🔥 Apache Spark Configurations Explained
A complete guide to understanding key Spark configuration parameters
**Application basics**

| Configuration | Meaning | Example |
|---|---|---|
| spark.app.name | Name of your Spark application. | MySparkJob |
| spark.master | Cluster manager to connect to (local, yarn, etc.). | local[*] |
| spark.submit.deployMode | Where the driver runs: client or cluster. | cluster |
| spark.home | Location of the Spark installation. | /usr/local/spark |
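As a quick sketch, the first two properties can be set when the session is built and read back from the SparkContext; deploy mode and the Spark home are normally supplied at launch time (via spark-submit or the environment) rather than from inside the application:

```python
from pyspark.sql import SparkSession

# Minimal sketch: set the basic application properties at build time
# and read them back. Assumes a local Spark installation.
spark = (
    SparkSession.builder
    .appName("MySparkJob")        # spark.app.name
    .master("local[*]")           # spark.master: use all local cores
    .getOrCreate()
)

print(spark.sparkContext.appName)  # MySparkJob
print(spark.sparkContext.master)   # local[*]
```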
**Memory and CPU**

| Configuration | Meaning | Example |
|---|---|---|
| spark.executor.memory | Heap memory per executor process. | 4g |
| spark.driver.memory | Heap memory for the driver process. | 2g |
| spark.executor.cores | Number of CPU cores per executor. | 4 |
| spark.memory.fraction | Fraction of the heap (minus 300 MB of reserved memory) shared by execution and storage. | 0.6 |
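A minimal sketch of these sizing knobs, with illustrative values rather than recommendations; note that spark.driver.memory generally has to be set before the driver JVM starts (e.g. via spark-submit), so setting it in code only works when the builder launches a fresh JVM:

```python
from pyspark.sql import SparkSession

# Illustrative sizing; these take effect only when the session is first created.
spark = (
    SparkSession.builder
    .appName("SizedApp")                       # hypothetical app name
    .config("spark.executor.memory", "4g")     # heap per executor
    .config("spark.driver.memory", "2g")       # heap for the driver
    .config("spark.executor.cores", "4")       # concurrent task slots per executor
    .config("spark.memory.fraction", "0.6")    # execution + storage share of heap
    .getOrCreate()
)
```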
**Parallelism and shuffle**

| Configuration | Meaning | Example |
|---|---|---|
| spark.default.parallelism | Default number of partitions for RDD operations. | 8 |
| spark.sql.shuffle.partitions | Number of partitions DataFrames use when shuffling. | 200 |
| spark.shuffle.compress | Compress shuffle output files. | true |
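spark.sql.shuffle.partitions is runtime-mutable, so a sketch like the following can tune it per job; on Spark 3.x, adaptive query execution may coalesce the resulting partitions to fewer than the configured value:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# Runtime-mutable: applies to the next shuffle, no session restart needed.
spark.conf.set("spark.sql.shuffle.partitions", "8")

df = spark.range(1_000_000)
agg = df.groupBy((df.id % 10).alias("bucket")).count()

# Typically 8 after the shuffle (AQE may coalesce this in Spark 3.x).
print(agg.rdd.getNumPartitions())
```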
**Serialization and compression**

| Configuration | Meaning | Example |
|---|---|---|
| spark.serializer | Serializer for shuffled and cached data (Kryo or Java). | org.apache.spark.serializer.KryoSerializer |
| spark.io.compression.codec | Codec for compressing internal data such as shuffle and broadcast blocks. | snappy |
| spark.rdd.compress | Compress serialized RDD partitions. | true |
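A minimal sketch enabling Kryo and compression. These settings are fixed at session creation and mainly affect RDD code paths; DataFrames use Spark's internal binary format for most data movement:

```python
from pyspark.sql import SparkSession

# "KryoApp" is an illustrative name; the Kryo class path is the real one.
spark = (
    SparkSession.builder
    .appName("KryoApp")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.io.compression.codec", "snappy")  # lz4 is the default codec
    .config("spark.rdd.compress", "true")
    .getOrCreate()
)
```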
**Spark SQL**

| Configuration | Meaning | Example |
|---|---|---|
| spark.sql.shuffle.partitions | Number of shuffle partitions for joins and aggregations. | 200 |
| spark.sql.autoBroadcastJoinThreshold | Maximum table size, in bytes, for automatic broadcast joins. | 10485760 (10 MB) |
| spark.sql.warehouse.dir | Location of the Spark SQL warehouse. | /user/hive/warehouse |
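A sketch of the broadcast threshold in action: tables smaller than the threshold are broadcast automatically, broadcast() forces the hint regardless of size, and setting the threshold to -1 disables automatic broadcasting entirely:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.master("local[*]").getOrCreate()

# 10 MB threshold (the default); runtime-mutable like other spark.sql.* options.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(10 * 1024 * 1024))

big = spark.range(1_000_000).withColumnRenamed("id", "k")
small = spark.range(100).withColumnRenamed("id", "k")

joined = big.join(broadcast(small), "k")  # explicit broadcast hint
joined.explain()                          # plan should show BroadcastHashJoin
```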
**Dynamic allocation**

| Configuration | Meaning | Example |
|---|---|---|
| spark.dynamicAllocation.enabled | Enable dynamic allocation of executors. | true |
| spark.dynamicAllocation.minExecutors | Minimum executors allowed. | 2 |
| spark.dynamicAllocation.maxExecutors | Maximum executors allowed. | 10 |
| spark.dynamicAllocation.initialExecutors | Executors at application start. | 4 |
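A minimal sketch, assuming a YARN cluster: dynamic allocation also needs a way to preserve shuffle data when executors are removed, typically the external shuffle service (spark.shuffle.service.enabled) or, on Spark 3.x, shuffle tracking as shown here:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .config("spark.dynamicAllocation.initialExecutors", "4")
    # Spark 3.x alternative to the external shuffle service:
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .getOrCreate()
)
```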
In PySpark:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MySparkApp") \
    .master("yarn") \
    .config("spark.executor.memory", "4g") \
    .config("spark.sql.shuffle.partitions", "100") \
    .getOrCreate()
```
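Note that getOrCreate() returns any already-running session unchanged, so resource settings such as spark.executor.memory only apply when the session is first created; runtime-mutable options (most spark.sql.* settings) can still be changed later with spark.conf.set.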
Using spark-submit:

```bash
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.memory=4g \
  --conf spark.executor.cores=4 \
  myjob.jar
```
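Properties set directly in code take the highest precedence, followed by flags and --conf entries passed to spark-submit, then values from spark-defaults.conf.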

