This project introduces an efficient Vision Transformer (ViT) model built from the ground up to classify plant diseases with high accuracy. It was created as a submission for the FIT Data Science Competition 2025.
The primary challenge addressed is the accurate and timely identification of plant diseases in agriculture, which is crucial for preventing significant crop losses. Traditional Convolutional Neural Networks (CNNs) often fall short in this area because they focus on local features and may miss the global patterns of many plant diseases. Furthermore, agricultural datasets are often imbalanced, leading to models that are biased toward more common diseases. This project aims to create a diagnostic tool that is not only highly accurate and efficient but also robust to data imbalance.
A Vision Transformer (ViT) architecture was built from scratch to address the problem of plant disease classification. By treating images as a sequence of patches, the ViT model can effectively learn the long-range dependencies and global context of plant diseases, which is a key advantage over traditional CNNs. To tackle the issue of class imbalance, a suite of advanced training techniques was integrated into the methodology.
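To make the patch-based view concrete, the sketch below shows one common way to implement patch embedding with a strided convolution in PyTorch. It is an illustrative minimal version rather than the notebook's exact `PatchEmbedding` class; the 16x16 patch size and 768-dimensional embedding are assumptions.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into fixed-size patches and project each patch to an embedding vector."""
    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided convolution extracts and linearly projects non-overlapping patches in one step.
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        # x: (batch, 3, 224, 224) -> (batch, embed_dim, 14, 14)
        x = self.proj(x)
        # Flatten the spatial grid into a sequence: (batch, embed_dim, 196) -> (batch, 196, embed_dim)
        return x.flatten(2).transpose(1, 2)

# Example: a batch of two RGB images becomes a sequence of 196 patch tokens.
tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 196, 768])
```

In a full ViT, learnable class and position embeddings are typically added to this token sequence before it passes through the transformer blocks.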
The implementation details are provided in the Jupyter Notebook and include:
- Custom ViT Components: The core components of the ViT model, including `PatchEmbedding`, `MultiHeadAttention`, `MLP`, and `Block`, were implemented from scratch.
- Custom Training Components: To optimize the model's training, custom classes were created for the loss function (`CustomCrossEntropyLoss`), the optimizer (`CustomAdam`), and the learning rate scheduler (`CustomCosineAnnealingLR`); an illustrative sketch of such components follows this list.
- Data Preprocessing: The "PlantVillage" dataset was used for training and validation. The images were resized to 224x224 pixels and normalized before being fed into the model, as shown in the second sketch after this list.
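As a rough illustration of what the custom training components do, the sketch below pairs a cosine-annealed learning-rate schedule with a class-weighted cross-entropy loss, one common way to counter class imbalance. It is a minimal sketch under assumed hyperparameters, not the notebook's `CustomCrossEntropyLoss`, `CustomAdam`, or `CustomCosineAnnealingLR` implementations.

```python
import math
import torch

def cosine_annealing_lr(step, total_steps, base_lr=3e-4, min_lr=1e-6):
    """Cosine-annealed learning rate: decays smoothly from base_lr to min_lr over total_steps."""
    progress = step / max(1, total_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

def weighted_cross_entropy(logits, targets, class_weights):
    """Cross-entropy with per-class weights, so rare disease classes contribute more to the loss."""
    log_probs = torch.log_softmax(logits, dim=-1)
    picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    weights = class_weights[targets]
    return -(weights * picked).sum() / weights.sum()

# Illustrative training step (model, loader, optimizer, and class_weights are assumed to exist):
# for step, (images, labels) in enumerate(loader):
#     for group in optimizer.param_groups:
#         group["lr"] = cosine_annealing_lr(step, total_steps)
#     loss = weighted_cross_entropy(model(images), labels, class_weights)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```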
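The preprocessing step can be expressed compactly with torchvision transforms. The normalization statistics below are the common ImageNet values and are an assumption; the notebook may use dataset-specific means and standard deviations.

```python
from torchvision import transforms

# Resize to 224x224, convert to a tensor, and normalize each channel.
# Mean/std are the standard ImageNet statistics (assumed; the notebook may differ).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```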
The trained ViT model delivers strong results in both predictive accuracy and computational efficiency. The key results are as follows:
- Accuracy: The model achieved an accuracy of 97% on the curated dataset of plant images.
- Inference Time: The model is highly efficient, with an average inference time of only 2.46 milliseconds per image.
- Performance Metrics: A comprehensive analysis of the model's performance was conducted, evaluating it on accuracy, precision, F1-score, and recall.
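As an illustration of how such metrics can be computed, the snippet below uses scikit-learn with macro averaging, which weights every disease class equally and is a reasonable choice under class imbalance. It is a hedged sketch with toy labels, not the notebook's exact evaluation code.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels for illustration; in practice these would be the validation set's
# ground-truth classes and the model's argmax predictions.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging computes each metric per class and then takes the unweighted mean.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```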
The results demonstrate that a from-scratch Vision Transformer can outperform traditional CNNs in automated agricultural diagnostics, providing a more accurate and efficient solution for plant disease classification. The complete code for reproducing these results, including model training, validation, and evaluation, is available in the provided Jupyter Notebook.