DL4S alternatives and similar libraries
Based on the "Machine Learning" category.

Bender
Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood. 
AwesomeMobileMachineLearning
A curated list of awesome mobile machine learning resources for iOS, Android, and edge devices. 
AIToolbox
A toolbox of AI modules written in Swift: Graphs/Trees, Support Vector Machines, Neural Networks, PCA, KMeans, Genetic Algorithms 
SwiftBrain
Artificial intelligence/machine learning data structures and Swift algorithms for future iOS development. Bayes' theorem, neural networks, and more AI. 
TensorSwift
A lightweight library to calculate tensors in Swift, which has similar APIs to TensorFlow's 
SwiftCoreMLTools
A Swift library for creating and exporting CoreML Models in Swift 
CoreMLsamples
Sample code for Core ML using ResNet50 provided by Apple and a custom model generated by coremltools. 
TensorflowiOS
The official Google-built powerful neural network library port for iOS.
README
DL4S provides a high-level API for many accelerated operations common in neural networks and deep learning. It also has automatic differentiation built in, which allows you to create and train neural networks without manually implementing backpropagation.
Features include implementations for many basic binary and unary operators, broadcasting, matrix operations, convolutional and recurrent neural networks, commonly used optimizers, second derivatives and much more. DL4S provides implementations for common network architectures, such as VGG, AlexNet, ResNet and Transformers.
While its primary purpose is deep learning and optimization, DL4S can also be used, like NumPy, as a library for vectorized mathematical operations.
Overview
 Installation
 Features
 Layers
 Optimizers
 Losses
 Tensor Operations
 Engines
 Architectures
 Examples
Installation
iOS / tvOS / macOS
 In Xcode, select "File" > "Swift Packages" > "Add Package Dependency"
 Enter https://github.com/palle-k/DL4S.git into the Package URL field and click "Next".
 Select "Branch", "master" and click "Next".
 Enable the Package Product DL4S, select your app in the "Add to Target" column and click "Next".
Note: Installation via CocoaPods is no longer supported for newer versions.
Swift Package
Add the dependency to your Package.swift file:
.package(url: "https://github.com/palle-k/DL4S.git", .branch("master"))
Then add DL4S as a dependency to your target:
.target(name: "MyPackage", dependencies: ["DL4S"])
MKL / IPP / OpenMP Support
DL4S can be accelerated with Intel's Math Kernel Library, Integrated Performance Primitives and OpenMP (Installation Instructions).
On Apple devices, DL4S uses vectorized functions provided by the built-in Accelerate framework by default. If no acceleration library is available, a fallback implementation is used.
Compiling with MKL/IPP:
# After adding the APT repository as described in the installation instructions
sudo apt-get install intel-mkl-64bit-2019.5-075 intel-ipp-64bit-2019.5-075 libiomp-dev
export MKLROOT=/opt/intel/mkl
export IPPROOT=/opt/intel/ipp
export LD_LIBRARY_PATH=${MKLROOT}/lib/intel64:${IPPROOT}/lib/intel64:${LD_LIBRARY_PATH}
swift build -c release \
    -Xswiftc -DMKL_ENABLE \
    -Xlinker -L${MKLROOT}/lib/intel64 \
    -Xlinker -L${IPPROOT}/lib/intel64
TensorBoard Support
DL4S-Tensorboard provides a summary writer that can write TensorBoard-compatible logs.
LLDB Extension
DL4S includes an LLDB Python script that provides custom descriptions for Tensors (util/debugger_support/tensor.py).
To use enhanced summaries, run the following command, either directly in LLDB or by adding it to your ~/.lldbinit file:
command script import /path/to/DL4S/util/debugger_support/tensor.py
Then you can use the print or frame variable commands to print human-readable descriptions of tensors.
Features
Layers
Core:
 [x] Convolution
 [x] Transposed Convolution
 [x] Dense/Linear/Fully Connected
 [x] LSTM
 [x] Gated Recurrent Unit (GRU)
 [x] Vanilla RNN
 [x] Embedding
 [x] Multi-Head Attention
 [x] Transformer Block
Pooling:
 [x] Max Pooling
 [x] Average Pooling
 [x] Adaptive Max Pooling
 [x] Adaptive Average Pooling
Norm:
 [x] Batch Norm
 [x] Layer Norm
Utility:
 [x] Bidirectional RNNs
 [x] Sequential
 [x] Lambda
 [x] Dropout
Activation:
 [x] Relu
 [x] LeakyRelu
 [x] Gelu
 [x] Tanh
 [x] Sigmoid
 [x] Softmax
 [x] Log Softmax
 [x] Swish
 [x] Mish
 [x] LiSHT
Transformer:
 [x] Positional Encoding
 [x] Scaled Dot Product Attention
 [x] Multi-Head Attention
 [x] Pointwise Feed Forward
 [x] Transformer Encoder Block
 [x] Transformer Decoder Block
Optimizers
 [x] SGD
 [x] Momentum
 [x] Adam
 [x] AMSGrad
 [x] AdaGrad
 [x] AdaDelta
 [x] RMSProp
Losses
 [x] Binary Cross-Entropy
 [x] Categorical Cross-Entropy
 [x] Negative Log Likelihood (NLL Loss)
 [x] MSE
 [x] L1 & L2 regularization
Tensor Operations
Behavior of broadcast operations is consistent with NumPy rules.
 [x] broadcast-add
 [x] broadcast-sub
 [x] broadcast-mul
 [x] broadcast-div
 [x] matmul
 [x] neg
 [x] exp
 [x] pow
 [x] log
 [x] sqrt
 [x] sin
 [x] cos
 [x] tan
 [x] tanh
 [x] sum
 [x] max
 [x] relu
 [x] leaky relu
 [x] gelu
 [x] elu
 [x] element-wise min
 [x] element-wise max
 [x] reduce sum
 [x] reduce max
 [x] scatter
 [x] gather
 [x] conv2d
 [x] transposed conv2d
 [x] max pool
 [x] avg pool
 [x] subscript
 [x] subscript range
 [x] transpose
 [x] axis permute
 [x] reverse
 [x] im2col
 [x] col2im
 [x] stack / concat
 [x] swish activation
 [x] mish activation
 [x] lisht activation
 [x] diagonal matrix generation
 [x] diagonal extraction
 [x] band matrix generation
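The NumPy broadcasting rule referenced at the top of this list can be illustrated with a short, library-independent sketch. The helper broadcastShape below is hypothetical and is not part of the DL4S API; it only shows how the result shape of a broadcast operation would be derived under NumPy rules: shapes are right-aligned, and each pair of axis sizes must either match or contain a 1.

```swift
// Hypothetical helper (not part of DL4S): computes the result shape of a
// broadcast operation under NumPy rules, or nil if the shapes are incompatible.
func broadcastShape(_ lhs: [Int], _ rhs: [Int]) -> [Int]? {
    let rank = max(lhs.count, rhs.count)
    // Left-pad the shorter shape with ones so both shapes have the same rank.
    let a = Array(repeating: 1, count: rank - lhs.count) + lhs
    let b = Array(repeating: 1, count: rank - rhs.count) + rhs
    var result: [Int] = []
    for (x, y) in zip(a, b) {
        if x == y || y == 1 {
            result.append(x)
        } else if x == 1 {
            result.append(y)
        } else {
            return nil // sizes differ and neither of them is 1
        }
    }
    return result
}

let shapePairs: [([Int], [Int])] = [([3, 2], [2]), ([4, 1, 5], [3, 1]), ([3, 2], [3])]
for (lhs, rhs) in shapePairs {
    if let shape = broadcastShape(lhs, rhs) {
        print("\(lhs) and \(rhs) broadcast to \(shape)")
    } else {
        print("\(lhs) and \(rhs) cannot be broadcast together")
    }
}
```

In this sketch, a [3, 2] tensor combined with a [2] vector keeps the shape [3, 2], while [3, 2] and [3] are rejected because the trailing axes (2 and 3) neither match nor equal 1.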
Engines
 [x] CPU (Accelerate framework for Apple Devices)
 [x] CPU (Intel Math Kernel Library and Integrated Performance Primitives)
 [x] CPU (Generic)
 [ ] GPU (ArrayFire: OpenCL, CUDA)
For an experimental, early-stage GPU-accelerated version, check out the feature/arrayfire branch.
Architectures
Default implementations are provided for the following architectures:
 [x] ResNet18
 [x] VGG (11, 13, 16, 19)
 [x] AlexNet
 [x] Transformer
Examples
Some high-level examples have been implemented in other repositories:
 Neural Machine Translation based on seq2seq with Attention
 Generative Adversarial Networks: Wasserstein GAN with Gradient Penalty (WGAN-GP)
 Reinforcement Learning: trains an agent to find the exit in a 2D grid world.
Arithmetic & Differentiation
DL4S provides a high-level interface to many vectorized operations on tensors.
let a = Tensor<Float, CPU>([[1,2],[3,4],[5,6]], requiresGradient: true)
let prod = a.transposed().matrixMultiplied(with: a)
let s = prod.reduceSum()
let l = log(s)
print(l) // 5.1873856
When a tensor is marked as requiring a gradient, a compute graph is captured. The graph stores all operations that use that tensor directly or indirectly as an operand.
It is then possible to backpropagate through that graph using the gradients(of:) function:
// Backpropagate
let dl_da = l.gradients(of: [a])[0]
print(dl_da)
/*
[[0.034, 0.034]
[0.078, 0.078]
[0.123, 0.123]]
*/
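The printed values can be checked by hand without DL4S. The sketch below uses plain Swift arrays only and relies on one observation: the sum of all entries of A^T·A equals the sum of the squared row sums of A, so the derivative of l with respect to A[k][j] is 2 · rowSum(k) / s. The variable names are illustrative and do not correspond to any DL4S API.

```swift
import Foundation

// Plain-Swift re-derivation of the example above; no DL4S types are used.
let a: [[Float]] = [[1, 2], [3, 4], [5, 6]]

// reduceSum(A^T * A) equals the sum of the squared row sums of A.
let rowSums: [Float] = a.map { $0.reduce(0, +) }        // [3, 7, 11]
let s: Float = rowSums.map { $0 * $0 }.reduce(0, +)     // 9 + 49 + 121 = 179
let l: Float = log(s)

// d l / d A[k][j] = 2 * rowSums[k] / s, independent of the column index j.
let gradient: [[Float]] = a.indices.map { k in
    a[k].map { _ in 2 * rowSums[k] / s }
}

print(l)
print(gradient)
```

Every entry in row k of the gradient is identical because each element of a row contributes to s only through that row's sum, which matches the repeated columns in the output shown above.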
Second derivatives
The operations used during backpropagation are themselves differentiable. Therefore, second derivatives can be computed by computing the gradient of the gradient.
When higher order derivatives are required, the compute graph of the backwards pass has to be explicitly retained.
let t = Tensor<Float, CPU>([1,2,3,4], requiresGradient: true)
let result = t * t * t
print(result) // [1, 8, 27, 64]
let grad = result.gradients(of: [t], retainBackwardsGraph: true)[0]
print(grad) // [3, 12, 27, 48]
let secondGrad = grad.gradients(of: [t], retainBackwardsGraph: true)[0]
print(secondGrad) // [6, 12, 18, 24]
let thirdGrad = secondGrad.gradients(of: [t])[0]
print(thirdGrad) // [6, 6, 6, 6]
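For this particular example the expected values are easy to confirm analytically, since the derivatives of x³ are 3x², 6x and 6. The following sketch is independent of DL4S: the helpers derivative(of:) and evaluate(_:at:) are hypothetical and operate on polynomial coefficients rather than on a compute graph.

```swift
// Hypothetical helper: differentiates a polynomial given by its coefficients,
// where coefficients[i] is the factor of x^i.
func derivative(of coefficients: [Double]) -> [Double] {
    return coefficients.enumerated().dropFirst().map { Double($0.offset) * $0.element }
}

// Evaluates the polynomial at x using Horner's scheme.
func evaluate(_ coefficients: [Double], at x: Double) -> Double {
    return coefficients.reversed().reduce(0) { $0 * x + $1 }
}

let cube: [Double] = [0, 0, 0, 1]        // x^3
let first = derivative(of: cube)          // 3x^2
let second = derivative(of: first)        // 6x
let third = derivative(of: second)        // 6

let inputs: [Double] = [1, 2, 3, 4]
for polynomial in [cube, first, second, third] {
    print(inputs.map { evaluate(polynomial, at: $0) })
}
```

The four printed rows correspond to result, grad, secondGrad and thirdGrad in the DL4S example.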
Convolutional Networks
Example for MNIST classification
// Input must have shape batchSize x 1 x 28 x 28
var model = Sequential {
    Convolution2D<Float, CPU>(inputChannels: 1, outputChannels: 6, kernelSize: (5, 5))
    Relu<Float, CPU>()
    MaxPool2D<Float, CPU>(windowSize: 2, stride: 2)
    Convolution2D<Float, CPU>(inputChannels: 6, outputChannels: 16, kernelSize: (5, 5))
    Relu<Float, CPU>()
    MaxPool2D<Float, CPU>(windowSize: 2, stride: 2)
    Flatten<Float, CPU>()
    Dense<Float, CPU>(inputSize: 256, outputSize: 120)
    Relu<Float, CPU>()
    Dense<Float, CPU>(inputSize: 120, outputSize: 10)
    LogSoftmax<Float, CPU>()
}
var optimizer = Adam(model: model, learningRate: 0.001)
// Single iteration of minibatch gradient descent
let batch: Tensor<Float, CPU> = ... // shape: [batchSize, 1, 28, 28]
let y_true: Tensor<Int32, CPU> = ... // shape: [batchSize]
// use optimizer.model, not model
let pred = optimizer.model(batch)
let loss = categoricalNegativeLogLikelihood(expected: y_true, actual: pred)
let gradients = loss.gradients(of: optimizer.model.parameters)
optimizer.update(along: gradients)
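The inputSize of 256 for the first Dense layer follows from the spatial size of the feature maps after the two convolution and pooling stages. The sketch below reproduces that arithmetic in plain Swift. It assumes that the convolutions use stride 1 without padding, which is what the value 256 suggests but is not stated explicitly above; outputSize(input:window:stride:) is a hypothetical helper, not a DL4S function.

```swift
// Hypothetical helper: output size along one spatial axis for a convolution
// or pooling window, assuming no padding and no dilation.
func outputSize(input: Int, window: Int, stride: Int = 1) -> Int {
    return (input - window) / stride + 1
}

var side = 28                                          // MNIST images are 28 x 28
side = outputSize(input: side, window: 5)              // 5x5 convolution -> 24
side = outputSize(input: side, window: 2, stride: 2)   // 2x2 max pooling -> 12
side = outputSize(input: side, window: 5)              // 5x5 convolution -> 8
side = outputSize(input: side, window: 2, stride: 2)   // 2x2 max pooling -> 4

let channels = 16                                      // outputChannels of the second convolution
let flattened = channels * side * side
print(flattened) // 256
```

If the kernel size, the pooling window or the input resolution changes, the same calculation gives the new inputSize for the first Dense layer.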
Recurrent Networks
Example for MNIST classification
The Gated Recurrent Unit scans the image from top to bottom and uses the final hidden state for classification.
let model = Sequential {
    GRU<Float, CPU>(inputSize: 28, hiddenSize: 128, direction: .forward)
    Lambda<GRU<Float, CPU>.Outputs, Tensor<Float, CPU>, Float, CPU> { inputs in
        inputs.0
    }
    Dense<Float, CPU>(inputSize: 128, outputSize: 10)
    LogSoftmax<Float, CPU>()
}
var optimizer = Adam(model: model, learningRate: 0.001)
let batch: Tensor<Float, CPU> = ... // shape: [batchSize, 28, 28]
let y_true: Tensor<Int32, CPU> = ... // shape: [batchSize]
let x = batch.permuted(to: 1, 0, 2) // Swap first and second axis, shape: [28, batchSize, 28]
let pred = optimizer.model(x)
let loss = categoricalNegativeLogLikelihood(expected: y_true, actual: pred)
let gradients = loss.gradients(of: optimizer.model.parameters)
optimizer.update(along: gradients)
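Both MNIST examples end in a LogSoftmax layer followed by a negative log likelihood loss. As a rough, library-independent illustration of what this combination computes, the sketch below implements both steps on plain Swift arrays. The function names are hypothetical, and averaging the loss over the batch is an assumption here, not a documented property of categoricalNegativeLogLikelihood.

```swift
import Foundation

// Numerically stable log-softmax over one vector of logits.
func logSoftmax(_ logits: [Double]) -> [Double] {
    let maxLogit = logits.max() ?? 0
    let logSumExp = maxLogit + log(logits.map { exp($0 - maxLogit) }.reduce(0, +))
    return logits.map { $0 - logSumExp }
}

// Negative log likelihood of the expected classes.
// Assumption: the loss is averaged over the batch.
func negativeLogLikelihood(expected: [Int], logProbabilities: [[Double]]) -> Double {
    let total = zip(expected, logProbabilities).reduce(0.0) { sum, pair in
        sum - pair.1[pair.0]
    }
    return total / Double(expected.count)
}

// Two samples with three classes each; labels are class indices.
let logits: [[Double]] = [[2, 0, 0], [0, 0, 0]]
let labels = [0, 2]
let loss = negativeLogLikelihood(expected: labels, logProbabilities: logits.map(logSoftmax))
print(loss)
```

A confident, correct prediction (first sample) contributes a small term, while a uniform prediction over three classes (second sample) contributes log(3).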
*Note that all licence references and agreements mentioned in the DL4S README section above
are relevant to that project's source code only.