Data science is a booming field. While many data scientists work in interpreted languages such as Python and R, they often encounter Java or the JVM when they connect to large-scale databases or real-time data streaming engines. Frameworks like Spark, Kafka, Hadoop, Hive, Cassandra, Elasticsearch and Flink all run on the JVM and make up much of the big data stack.
Java and other JVM languages are well suited to scaling ETL, distributed training and model deployment. Indeed, Java can handle all of these tasks itself or, at the very least, make them easier for developers working in other languages.