Activation functions add non-linearity to the network, enabling it to learn complex patterns.
Without them, the entire network would just be a linear function!
## What Do They Do?
They transform the output of a neuron:
output = activation(w₁x₁ + w₂x₂ + ... + b)
The type of activation function affects how information flows.
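Here is a minimal sketch of a single neuron in NumPy. The input values, weights, and bias below are made-up examples, and `sigmoid` is just one possible choice of activation:

```python
import numpy as np

def sigmoid(z):
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical example values: two inputs, two weights, and a bias.
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
b = 0.1

z = np.dot(w, x) + b     # weighted sum: w1*x1 + w2*x2 + b
output = sigmoid(z)      # the activation makes the output non-linear in x
print(output)            # ≈ 0.53
```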
## Common Functions
1. Sigmoid
f(x) = 1 / (1 + e^{-x})
- Range: (0, 1)
- Smooth output
- Used in binary classification
2. Tanh
f(x) = (e^x - e^{-x}) / (e^x + e^{-x})
- Range: (-1, 1)
- Centered at 0
- Good for hidden layers
3. ReLU (Rectified Linear Unit)
f(x) = max(0, x)
- Simple and efficient
- Speeds up training
- Most common in deep learning
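To make the three definitions concrete, here is a small NumPy sketch of all of them side by side (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    # Range (0, 1); smooth, saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Range (-1, 1); zero-centered
    return np.tanh(x)

def relu(x):
    # Zeroes out negatives, passes positives through unchanged
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:", sigmoid(x))
print("tanh:   ", tanh(x))
print("relu:   ", relu(x))
```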
## Visual Comparison
See how each function transforms the input.
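If you want to plot the comparison yourself, a short matplotlib sketch like the following works (the input range of −5 to 5 is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)

plt.plot(x, 1 / (1 + np.exp(-x)), label="Sigmoid")
plt.plot(x, np.tanh(x), label="Tanh")
plt.plot(x, np.maximum(0, x), label="ReLU")
plt.axhline(0, color="gray", linewidth=0.5)
plt.legend()
plt.title("Activation function comparison")
plt.show()
```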
## Summary
| Function | Use Case                   |
|----------|----------------------------|
| Sigmoid  | Binary output              |
| Tanh     | Centered activation        |
| ReLU     | Default for hidden layers  |
## Self-Check
- Why do we need activation functions?
- Which function is most commonly used?
- What's the difference between Sigmoid and Tanh?