Wednesday, November 30, 2016

Using the Kotlin Language with Apache Spark


About a month ago I posted an article proposing Kotlin as another programming language for data science. It is a pragmatic, readable language created by JetBrains, the company behind IntelliJ IDEA and PyCharm. It has grown in popularity on Android and focuses on industrial use rather than experimental functionality. Just like Java and Scala, Kotlin compiles to bytecode and runs on the Java Virtual Machine. It also works with Java libraries out-of-the-box with no hiccups, and in this article I’m going to show how to use it with Apache Spark.

Officially, you can use Apache Spark with Scala, Java, Python, and R. If you are happy using any of these languages with Spark, you likely will not need Kotlin. But if you tried to learn Scala or Java and found it was not for you, you might want to give Kotlin a look. It is a legitimate fifth option that works out-of-the-box with Spark.

I recommend using IntelliJ IDEA, as it natively includes Kotlin support. It is an excellent IDE that you can also use with Java and Scala. I also recommend using Gradle for your build automation.
Gradle is also adopting Kotlin as a scripting language for its builds, alongside Groovy. You can read more about it in the article Kotlin Meets Gradle.

Setting Up

To get started, make sure to install the following:
  • Java JDK - Java Development Kit
  • IntelliJ IDEA - IDE for Java, Kotlin, Scala, and other JVM projects
  • Gradle - Build automation system; download the Binary Only distribution and unzip it to a location of your choice
You will need to configure IntelliJ IDEA to use your Gradle location. Launch IntelliJ IDEA and set this up in Settings -> Build, Execution, Deployment -> Build Tools -> Gradle. If you have trouble, there are plenty of walkthroughs online.

Let’s create our Kotlin project. Using your operating system, create a folder with the following structure:
kotlin_spark_project
      |
      └────src
            |
            └────main
                  |
                  └────kotlin

Your project folder must contain the nested folders src/main/kotlin/. This is the default location where the Kotlin Gradle plugin looks for Kotlin sources, so Gradle will recognize the project without extra configuration.
Next, in the root of kotlin_spark_project, create a text file named build.gradle and use a text editor to put in the following contents. This script configures the folder as a Kotlin project and declares Spark as a dependency. You can read more about Kotlin Gradle configurations in the Kotlin documentation.
buildscript {
    ext.kotlin_version = '1.0.5'
    repositories {
        mavenCentral()
    }
    dependencies {
        classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
    }
}

apply plugin: "kotlin"

repositories {
    mavenCentral()
}

dependencies {
    // Kotlin standard library
    compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version"

    // Apache Spark core (the _2.10 suffix is the Scala version this Spark build targets)
    compile 'org.apache.spark:spark-core_2.10:1.6.1'
}

Finally, launch IntelliJ IDEA, click Import Project, and navigate to the Kotlin project folder you just created. In the wizard, check Import project from external model and choose the Gradle option. Click Next, then select Use local Gradle distribution and point it at the Gradle copy you downloaded. Then click Finish.

Your workspace should now be set up with a Kotlin project. If you do not see the project explorer on the left, press ALT + 1. Then double-click the project folder and navigate down to the kotlin folder.




Right-click the kotlin folder and select New -> Kotlin File/Class.



Name the file “SparkApp” and press OK. You will now see a SparkApp.kt file added to your kotlin folder. An editor will open on the right.

Using Spark with Kotlin

Let’s put our Spark code in the SparkApp.kt file. Spark is written in Scala. While Kotlin does not interoperate cleanly with Scala APIs, it does have 100% interoperability with Java. Thankfully, Spark provides a Java API through JavaSparkContext, which we can leverage to use Spark out-of-the-box with Kotlin.
Create the main() function below, which will be the entry point for our Kotlin application. Be sure to import the needed Spark dependencies as well. In your main() function, configure a SparkConf and create a new JavaSparkContext from it.
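import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

fun main(args: Array<String>) {

    // "local" runs Spark inside this JVM with a single worker thread
    val conf = SparkConf()
            .setMaster("local")
            .setAppName("Kotlin Spark Test")

    // JavaSparkContext wraps Spark's Scala SparkContext in a Java-friendly API
    val sc = JavaSparkContext(conf)
}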

The JavaSparkContext provides a Java API for creating and transforming RDDs (Resilient Distributed Datasets). Thankfully, we can use Kotlin's concise lambda syntax, and the Kotlin compiler will convert each lambda into the Java functional interface that the Spark method expects.
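To see that conversion in isolation, here is a minimal sketch that runs without Spark. It uses the JDK's own java.util.function and java.util.stream types, which, like Spark's Function, Function2, and FlatMapFunction, are Java interfaces with a single abstract method (Spark's versions also extend java.io.Serializable so they can be shipped to executors). The names parseExplicit, parseLambda, and sumNumericTokensWithStreams are hypothetical, chosen only for illustration, and the sketch assumes a JVM 8 or newer target.

```kotlin
import java.util.function.ToIntFunction

// Explicit form: an object expression implementing the Java interface's single abstract method.
val parseExplicit: ToIntFunction<String> = object : ToIntFunction<String> {
    override fun applyAsInt(value: String): Int = value.toInt()
}

// SAM-constructor form: the compiler generates an equivalent implementation from a lambda.
val parseLambda: ToIntFunction<String> = ToIntFunction<String> { it.toInt() }

// Hypothetical helper: the same token-summing logic written against java.util.stream,
// passing Kotlin lambdas straight into Java methods that expect SAM interfaces.
fun sumNumericTokensWithStreams(items: List<String>): Int =
        items.flatMap { it.split("/") }                     // Kotlin's own List.flatMap
                .stream()                                   // switch to the Java 8 Stream API
                .filter { it.matches(Regex("[0-9]+")) }     // lambda becomes a java.util.function.Predicate
                .mapToInt(parseLambda)                      // ToIntFunction built above
                .reduce(0) { total, next -> total + next }  // lambda becomes an IntBinaryOperator
```

Both parse values behave the same way; the lambda form is simply what the compiler builds for you when you write { it.toInt() } where a Java functional interface is expected, which is what happens with each lambda in the Spark pipeline.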
Let’s turn a List of Strings, each containing alphanumeric values separated by / characters, into an RDD. Then let’s break these values apart, keep only the numbers, and find their sum.
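import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext

fun main(args: Array<String>) {

    val conf = SparkConf()
            .setMaster("local")
            .setAppName("Kotlin Spark Test")

    val sc = JavaSparkContext(conf)

    val items = listOf("123/643/7563/2134/ALPHA", "2343/6356/BETA/2342/12", "23423/656/343")

    // Distribute the local List as an RDD of Strings
    val input = sc.parallelize(items)

    // Spark 1.x's FlatMapFunction returns an Iterable; on Spark 2.x use it.split("/").iterator()
    val sumOfNumbers = input.flatMap { it.split("/") }     // split each String into tokens
            .filter { it.matches(Regex("[0-9]+")) }         // keep only purely numeric tokens
            .map { it.toInt() }                             // convert tokens to Int
            .reduce { total, next -> total + next }         // add them all together

    println(sumOfNumbers)

    // Release Spark's resources once the job is done
    sc.stop()
}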
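Because the RDD operations used above share their names with Kotlin's own collection functions, one way to check the logic before starting a SparkContext is to run the same chain on an ordinary Kotlin List. This is a sketch for local experimentation; sumOfNumericTokens is a hypothetical helper, not part of Spark.

```kotlin
// Hypothetical helper: the same flatMap/filter/map/reduce chain as the Spark job,
// run on a plain Kotlin List instead of an RDD.
fun sumOfNumericTokens(items: List<String>): Int =
        items.flatMap { it.split("/") }                  // "123/643/ALPHA" -> ["123", "643", "ALPHA"]
                .filter { it.matches(Regex("[0-9]+")) }  // drop tokens such as "ALPHA" and "BETA"
                .map { it.toInt() }                      // convert the remaining tokens to Int
                .reduce { total, next -> total + next }  // throws on an empty list; RDD.reduce likewise fails on an empty RDD
```

Called with the same three strings as items above, sumOfNumericTokens returns 45938, the same value the Spark job prints. Once the logic behaves as expected, the chain can be moved onto an RDD with little change.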

Click the Kotlin icon in the gutter next to your main() function to run this Spark application.



A console should pop up below and start logging Spark’s events. I did not turn off logging, so it will be a bit noisy, but you should ultimately see the value of sumOfNumbers printed: 45938.


Conclusion

I will show a few more examples in the coming weeks on how to use Kotlin with Spark (you can also check out my GitHub project). Kotlin is a pragmatic, readable language that I believe has potential for adoption in Spark. It just needs more documentation for this purpose. But if you want to learn more about Kotlin, you can read the Kotlin Reference and check out a few of the books that are out there. I have heard great things about the O’Reilly video series on Kotlin, which I understand is helpful for folks who do not have knowledge of Java, Scala, or other JVM languages.

If you learn Kotlin, you can likely translate existing books and documentation on Spark into Kotlin usage. I’ll do my best to share my discoveries and any nuances I may encounter. For now, I do recommend giving it a look if you are not satisfied with your current languages.

38 comments:

  1. Great! Kotlin is everywhere..

  2. I couldn't find the word "replace" in the "Kotlin Meets Gradle". Adding support for Kotlin is different from replacing Groovy.

    Replies
    1. True, I should say "effectively replace". They are not simply going to drop Groovy support. But it's likely innovations will happen on the Kotlin side at some point while Groovy gets maintained.

  3. Thanks for the easy to follow post. Do you have recommendations for a Kotlin/Spark workflow that uses a REPL?

  5. Hello,
    I was able to build your solution using command line gradle build but I am not able to use spark-submit to run it.
    Can you (based on your git files) provide a command to run this please?

    Here are the errors: (using Kotlin 1.3 and Gradle 5.4 and Spark 2.4)

    C:\Yuri\kotlin-spark-test-master>spark-submit --class Yuri .\build\libs\kotlin-spark-test-master.jar

    Warning: Failed to load Yuri: kotlin/TypeCastException

    log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

    Replies
    1. OK, I sorted it out. I had to build a JAR that includes kotlin-runtime (stdlib) and submit it with the path to all the other JARs (Spark, etc.).
      Your example does work from IDEA, but the crucial part of building a runnable/executable JAR is missing. Without it, this post is only half-baked. A good start, though.

      Here is example:
      spark-submit --class SparkAppMain --jars .\build\libs\kotlin_spark_project-2.0-SNAPSHOT.jar .\build\libs

      Gradle build file is the key here:

      (NOTE: I am totally new to Gradle so I am sure this can be improved)

      build.gradle file:

      buildscript {
          ext.kotlin_version = '1.3.31'
          repositories {
              mavenCentral()
              maven {
                  url "https://plugins.gradle.org/m2/"
              }
          }
          dependencies {
              classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
              classpath "com.github.jengelman.gradle.plugins:shadow:5.0.0"
          }
      }

      apply plugin: "kotlin"
      apply plugin: "java-library"
      apply plugin: "java"
      version '2.0-SNAPSHOT'

      // can use this one gradle shadowJar or a regular gradle jar
      apply plugin: 'com.github.johnrengelman.shadow'

      sourceCompatibility = 1.8

      repositories {
          mavenCentral()
      }

      // one option - a fat jar (140 MB fat!) gradle shadowJar
      shadowJar {
          zip64 true
          manifest {
              attributes 'Main-Class': 'SparkAppMain'
          }
      }

      // run it like so:
      // spark-submit --class SparkAppMain --jars .\build\libs\kotlin_spark_project-2.0-SNAPSHOT.jar .\build\libs

      // thinner jar with just Kotlin runtime here: gradle build or gradle jar
      // note that [ zip64 true ] below is required to build it with Spark jar dependants

      jar {
          zip64 true
          manifest {
              attributes 'Main-Class': 'SparkAppMain'
          }

          from {
              String[] include = [
                  "kotlin-runtime-${kotlin_version}.jar",
                  "kotlin-stdlib-${kotlin_version}.jar"
              ]

              configurations.compile
                  .findAll { include.contains(it.name) }
                  .collect { it.isDirectory() ? it : zipTree(it) }
          }
      }

      dependencies {
          compile "org.jetbrains.kotlin:kotlin-stdlib:$kotlin_version"
          compile group: 'org.jetbrains.kotlin', name: 'kotlin-stdlib', version: '1.3.31'
          compile group: 'org.apache.spark', name: 'spark-core_2.12', version: '2.4.2'
          compile group: 'org.apache.spark', name: 'spark-sql_2.12', version: '2.4.2'
          compile group: 'org.apache.spark', name: 'spark-hive_2.12', version: '2.4.2'
      }
