In order to learn the important lessons in life, one must, each day, surmount a fear.
I laid out the basic usage of Scala in my GitHub. In this article, I will mainly talk about how to use Scala to write Spark jobs, with some examples. There are four parts in this blog. The first part introduces how to build a Scala project using IntelliJ IDEA. The second part explains the main concepts of Spark. In the third part, I will show an example of using Spark with Scala and how to deploy the code to a production environment. Finally, I will list some references that could help optimize the performance of Spark code.
Create executable JAR file with dependencies
Pre-installation
Make sure the versions of sbt and Scala are compatible. In this blog, I’m using sbt version 1.3.8 and Scala version 2.12.3.
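As a rough illustration of where these versions end up in an sbt project, here is a minimal build.sbt sketch. The project name, the project version, and the Spark dependency (including its version number) are assumptions made for the example, not part of the setup described here.

```scala
// build.sbt (sketch). The sbt version is pinned separately, in
// project/build.properties, with the single line:
//   sbt.version=1.3.8

name := "spark-scala-example"   // hypothetical project name
version := "0.1.0"              // hypothetical project version

// Scala version used in this blog; libraries must be published for 2.12.
scalaVersion := "2.12.3"

// Assumed dependency: `%%` appends the Scala binary version (_2.12) to the
// artifact name, so sbt resolves spark-sql_2.12. The Spark version is an
// assumption; "provided" keeps Spark out of the assembled JAR because the
// cluster supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
```

If a library is not published for the Scala binary version set in scalaVersion, dependency resolution fails, which is usually the first visible sign of a version mismatch.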
Install Java
sudo apt-get update
sudo apt-get install default-jdk
Install sbt
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo apt-key add
sudo apt-get update
sudo apt-get install sbt
Install Scala
sudo apt-get remove scala-library scala
sudo wget http://scala-lang.org/files/archive/scala-2.12.1.deb
sudo dpkg -i scala-2.12.1.deb
sudo apt-get update
sudo apt-get install scala
Installation by using sbt assembly
Create a file named plugins.sbt (by sbt convention it lives in the project/ directory) and add the line below to it:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "x.y.z")
Add the block below to your build.sbt file:
val defaultMergeStrategy: String => MergeStrategy = {
  case x if Assembly.isConfigFile(x) =>
    MergeStrategy.concat
  case PathList(ps @ _*) if Assembly.isReadme(ps.last) || Assembly.isLicenseFile(ps.last) =>
    MergeStrategy.rename
  case PathList("META-INF", xs @ _*) =>
    (xs map {_.toLowerCase}) match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case ps @ (x :: xs) if ps.last.endsWith(".sf") || ps.last.endsWith(".dsa") =>
        MergeStrategy.discard
      case "plexus" :: xs =>
        MergeStrategy.discard
      case "services" :: xs =>
        MergeStrategy.filterDistinctLines
      case ("spring.schemas" :: Nil) | ("spring.handlers" :: Nil) =>
        MergeStrategy.filterDistinctLines
      case _ => MergeStrategy.deduplicate
    }
  case _ => MergeStrategy.deduplicate
}
Run
sbt assembly
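For sbt-assembly to apply custom merge rules, they generally have to be assigned to the assemblyMergeStrategy setting; a standalone val in build.sbt is not picked up by the plugin on its own. The following is a minimal sketch of that wiring, assuming the `key in assembly` scoping used by builds from the sbt 1.3.x era (newer plugin releases prefer the slash syntax `assembly / assemblyMergeStrategy`). The file names in the patterns, the JAR name, and the main class are hypothetical.

```scala
// build.sbt (sketch): override a few merge rules, keep the defaults otherwise.
assemblyMergeStrategy in assembly := {
  // Concatenate all copies of application.conf found on the classpath.
  case "application.conf" => MergeStrategy.concat
  // Keep only the first copy of any HTML file that appears in several JARs.
  case PathList(ps @ _*) if ps.last.endsWith(".html") => MergeStrategy.first
  case x =>
    // Delegate every other path to the strategy that was already configured
    // (the plugin's default unless it was overridden elsewhere).
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

// Optional settings; both values are placeholders for illustration.
assemblyJarName in assembly := "spark-scala-example.jar"
mainClass in assembly := Some("example.Main")
```

With these settings, running sbt assembly should write the JAR under target/scala-2.12/. If assemblyJarName is not set, the plugin derives the file name from the project name and version.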