Part II: Local Development

6 Implementing Algorithms in Scala
7 Files and Subprocesses
8 JSON and Binary Data Serialization
9 Self-Contained Scala Scripts
10 Static Build Pipelines

The second part of this book explores the core tools and techniques necessary for writing Scala applications that run on a single computer. We will cover algorithms, file and subprocess management, data serialization, scripts, and build pipelines. This part builds towards a capstone project where we write an efficient incremental static site generator using the Scala language.

6 Implementing Algorithms in Scala

6.1 Merge Sort
6.2 Prefix Tries
6.3 Breadth First Search
6.4 Shortest Paths
6.5 Conclusion

def breadthFirstSearch[T](start: T, graph: Map[T, Seq[T]]): Set[T] = {
  val seen = collection.mutable.Set(start)
  val queue = collection.mutable.ArrayDeque(start)
  while (queue.nonEmpty) {
    val current = queue.removeHead()
    for (next <- graph(current) if !seen.contains(next)) {
      seen.add(next)
      queue.append(next)
    }
  }
  seen.toSet
}
</> 6.1.scala

Snippet 6.1: a simple breadth-first-search algorithm we will implement using Scala in this chapter

In this chapter, we will walk you through the implementation of a number of common algorithms using the Scala programming language. These algorithms are commonly taught in schools and tested at professional job interviews, so you have likely seen them before.

By implementing them in Scala, we aim to get you more familiar with using the Scala programming language to solve small problems in isolation. We will also see how some of the unique language features we saw in Chapter 5: Notable Scala Features can be applied to simplify the implementation of these well-known algorithms. This will prepare us for subsequent chapters which will expand in scope to include many different kinds of systems, APIs, tools and techniques.
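To give a flavor of what implementing such an algorithm in Scala can look like, here is a minimal merge sort sketch. It is an illustration only and not necessarily the version developed in the chapter; the name `mergeSort` and the choice of `IndexedSeq` with an `Ordering` context bound are assumptions made for this example.

```scala
// Illustrative sketch: sort by recursively splitting the input in half,
// sorting each half, and merging the two sorted halves back together.
def mergeSort[T: Ordering](items: IndexedSeq[T]): IndexedSeq[T] = {
  if (items.length <= 1) items // zero or one element is already sorted
  else {
    val (left, right) = items.splitAt(items.length / 2)
    val sortedLeft = mergeSort(left)
    val sortedRight = mergeSort(right)
    val output = IndexedSeq.newBuilder[T]
    var i = 0 // next unread index in sortedLeft
    var j = 0 // next unread index in sortedRight
    while (i < sortedLeft.length || j < sortedRight.length) {
      // Take from the left when the right is exhausted, or when the left
      // element is not greater than the right one (this keeps the sort stable).
      val takeLeft =
        j >= sortedRight.length ||
        (i < sortedLeft.length && Ordering[T].lteq(sortedLeft(i), sortedRight(j)))
      if (takeLeft) { output += sortedLeft(i); i += 1 }
      else { output += sortedRight(j); j += 1 }
    }
    output.result()
  }
}
```

The `T: Ordering` context bound lets the same function sort any element type that has an implicit `Ordering`, such as `Int` or `String`.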

7 Files and Subprocesses

7.1 Paths
7.2 Filesystem Operations
7.3 Folder Syncing
7.4 Simple Subprocess Invocations
7.5 Interactive and Streaming Subprocesses
7.6 Conclusion

@ os.walk(os.pwd).filter(os.isFile).map(p => (os.size(p), p)).sortBy(-_._1).take(5)
res60: IndexedSeq[(Long, os.Path)] = ArrayBuffer(
  (6340270L, /Users/lihaoyi/test/post/Reimagining/GithubHistory.gif),
  (6008395L, /Users/lihaoyi/test/post/SmartNation/routes.json),
  (5499949L, /Users/lihaoyi/test/post/slides/Why-You-Might-Like-Scala.js.pdf),
  (5461595L, /Users/lihaoyi/test/post/slides/Cross-Platform-Development-in-Scala.js.pdf),
  (4576936L, /Users/lihaoyi/test/post/Reimagining/FluentSearch.gif)
)
</> 7.1.scala

Snippet 7.1: a short Scala code snippet to find the five largest files in a directory tree

Working with files and subprocesses is one of the most common things you do in programming: from the Bash shell, to Python or Ruby scripts, to large applications written in a compiled language. At some point everyone will have to write to a file or talk to a subprocess. This chapter will walk you through how to perform basic file and subprocess operations in Scala.

This chapter finishes with two small projects: building a simple file synchronizer, and building a streaming subprocess pipeline. These projects will form the basis for Chapter 17: Multi-Process Applications and Chapter 18: Building a Real-time File Synchronizer.
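For comparison with Snippet 7.1, the following is a rough sketch of the same "largest files" query written against `java.nio.file` from the Java standard library instead of the `os` package used in the snippet, so that it runs without any third-party dependency. The helper name `largestFiles` is hypothetical and chosen only for this example.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper: return the `n` largest regular files under `root`,
// biggest first, as (size in bytes, path) pairs.
def largestFiles(root: Path, n: Int): Seq[(Long, Path)] = {
  val stream = Files.walk(root) // lazily walks the directory tree
  try {
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p)) // skip directories
      .map(p => (Files.size(p), p))
      .toSeq                               // materialize before closing the stream
      .sortBy(-_._1)                       // descending by size
      .take(n)
  } finally stream.close()
}
```

Calling `largestFiles(java.nio.file.Paths.get("."), 5)` would mirror the query in Snippet 7.1 for the current working directory. The explicit `try`/`finally` is needed here because `Files.walk` holds open directory handles until the stream is closed.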

8 JSON and Binary Data Serialization

8.1 Manipulating JSON
8.2 JSON Serialization of Scala Data Types
8.3 Writing your own Generic Serialization Methods
8.4 Binary Serialization
8.5 Conclusion

@ val output = ujson.Arr(
    ujson.Obj("hello" -> "world", "answer" -> 42),
    true
  )

@ output(0)("hello") = "goodbye"

@ output(0)("tags") = ujson.Arr("awesome", "yay", "wonderful")

@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]
</> 8.1.scala

Snippet 8.1: manipulating a JSON tree structure in the Scala REPL

Data serialization is an important tool in any programmer's toolbox. While variables and classes are enough to store data within a process, most data tends to outlive a single program process: whether saved to disk, exchanged between processes, or sent over the network. This chapter will cover how to serialize your Scala data structures to two common data formats - textual JSON and binary MessagePack - and how you can interact with the structured data in a variety of useful ways.

The JSON workflows we learn in this chapter will be used later in Chapter 12: Working with HTTP APIs and Chapter 14: Simple Web and API Servers, while the binary serialization techniques we learn here will be used later in Chapter 17: Multi-Process Applications.

9 Self-Contained Scala Scripts

9.1 Reading Files Off Disk
9.2 Rendering HTML with Scalatags
9.3 Rendering Markdown with Commonmark-Java
9.4 Links and Bootstrap
9.5 Optionally Deploying the Static Site
9.6 Conclusion

os.write(
  os.pwd / "out" / "index.html",
  doctype("html")(
    html(
      body(
        h1("Blog"),
        for ((_, suffix, _) <- postInfo)
        yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
      )
    )
  )
)
</> 9.1.scala

Snippet 9.1: rendering an HTML page using the third-party Scalatags HTML library

Scala Scripts are a great way to write small programs. Each script is self-contained and can download its own dependencies when necessary, and make use of both Java and Scala libraries. This lets you write and distribute scripts without spending time fiddling with build configuration or library installation.

In this chapter, we will write a static site generator script that uses third-party libraries to process Markdown input files and generate a set of HTML output files, ready for deployment on any static file hosting service. This will form the foundation for Chapter 10: Static Build Pipelines, where we will turn the static site generator into an efficient incremental build pipeline by using the Mill build tool.

10 Static Build Pipelines

10.1 Mill Build Pipelines
10.2 Mill Modules
10.3 Revisiting our Static Site Script
10.4 Conversion to a Mill Build Pipeline
10.5 Extending our Static Site Pipeline
10.6 Conclusion

import mill._

def srcs = T.source(millSourcePath / "src")

def concat = T{
  os.write(T.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
  PathRef(T.dest / "concat.txt")
}
</> 10.1.scala

Snippet 10.1: the definition of a simple Mill build pipeline

Build pipelines are a common pattern, where you have files and assets you want to process but want to do so efficiently, incrementally, and in parallel. This usually means only re-processing files when they change, and re-using the already processed assets as much as possible. Whether you are compiling Scala, minifying JavaScript, or compressing tarballs, many of these file-processing workflows can be slow. Parallelizing these workflows and avoiding unnecessary work can greatly speed up your development cycle.
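The idea of only re-processing inputs that have changed can be sketched in a few lines of plain Scala. The class below is a toy illustration and makes no claim about how Mill works internally; the name `IncrementalCache`, its `build` method, and the `runs` counter are all hypothetical and exist only for this example.

```scala
import java.security.MessageDigest
import scala.collection.mutable

// Toy illustration: remember each input's content hash together with its
// processed output, and re-run `process` only when the hash changes.
class IncrementalCache[V](process: String => V) {
  // name -> (hash of the last input seen, output computed from that input)
  private val cache = mutable.Map.empty[String, (String, V)]
  var runs = 0 // counts how many times `process` actually executed

  private def hash(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  def build(name: String, input: String): V = {
    val h = hash(input)
    cache.get(name) match {
      case Some((oldHash, out)) if oldHash == h =>
        out // input unchanged: reuse the previously processed output
      case _ =>
        runs += 1
        val out = process(input)
        cache(name) = (h, out)
        out
    }
  }
}
```

For example, with `val c = new IncrementalCache[String](_.toUpperCase)`, calling `c.build("a.txt", "hello")` twice runs the processing function once, while a later call with different content for `"a.txt"` triggers a second run.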

This chapter will walk through how to use the Mill build tool to set up these build pipelines, and demonstrate the advantages of a build pipeline over a naive build script. We will take the simple static site generator we wrote in Chapter 9: Self-Contained Scala Scripts and convert it into an efficient build pipeline that can incrementally update the static site as you make changes to the sources. We will be using the Mill build tool in several of the projects later in the book, starting with Chapter 14: Simple Web and API Servers.