GATE Data Science and AI Machine Learning Syllabus
(i) Supervised Learning: regression and classification problems, simple linear regression, multiple linear regression, ridge regression, logistic regression, k-nearest neighbour, naive Bayes classifier, linear discriminant analysis, support vector machine, decision trees, bias-variance trade-off, cross-validation methods such as leave-one-out (LOO) cross-validation, k-folds cross validation, multi-layer perceptron, feed-forward neural network
(ii) Unsupervised Learning: clustering algorithms, k-means/k-medoid, hierarchical clustering, top-down, bottom-up: single-linkage, multiple-linkage, dimensionality reduction, principal component analysis.
Here’s an overview of the GATE Data Science and AI Machine Learning syllabus:
Supervised Learning:
- Regression and Classification Problems: Supervised learning tasks involve learning a mapping from input variables to output variables based on labeled training data. Regression tasks involve predicting a continuous-valued output, while classification tasks involve predicting a categorical output.
- Simple Linear Regression: Simple linear regression models the relationship between a single input variable and a continuous output variable using a linear equation.
- Multiple Linear Regression: Multiple linear regression extends simple linear regression to model the relationship between multiple input variables and a continuous output variable.
- Ridge Regression: Ridge regression is a regularized version of linear regression that penalizes large coefficients to prevent overfitting.
- Logistic Regression: Logistic regression is used for binary classification tasks. It models the probability of an event occurring by applying the logistic (sigmoid) function to a linear combination of the input features.
- k-Nearest Neighbor (k-NN): k-NN is a non-parametric classification algorithm that classifies new data points based on the majority class of their k nearest neighbors in the feature space.
- Naive Bayes Classifier: Naive Bayes is a probabilistic classification algorithm based on Bayes’ theorem and assumes that features are conditionally independent given the class label.
- Linear Discriminant Analysis (LDA): LDA is a classification algorithm that models each class with a multivariate Gaussian distribution and assumes that all classes share the same covariance matrix, which yields linear decision boundaries.
- Support Vector Machine (SVM): SVM is a classification algorithm that finds the separating hyperplane with the maximum margin, that is, the largest distance to the nearest training points of each class.
- Decision Trees: Decision trees recursively partition the feature space into subsets based on feature values and make predictions by following the path from the root to a leaf node.
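- Bias-Variance Trade-off: The bias-variance trade-off describes the tension between two sources of prediction error. Bias is the error of models that are too simple and underfit the data; variance is the error of models that are too complex and are overly sensitive to the particular training sample. Increasing model complexity typically lowers bias and raises variance, so choosing a model that generalizes well to unseen data means balancing the two.
- Cross-validation Methods: Cross-validation evaluates model performance by repeatedly splitting the data into training and validation sets. k-fold cross-validation divides the data into k folds and holds out each fold once for validation while training on the rest. Leave-one-out (LOO) cross-validation is the special case in which each fold contains a single observation.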
- Multi-layer Perceptron (MLP): MLP is a type of feed-forward neural network with multiple layers of nodes (neurons) that can learn complex non-linear relationships between input and output variables.
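Two of the ideas above, simple linear regression and cross-validation, can be combined in a short sketch. This is an illustrative example and not part of the syllabus: the function names `fit_simple_linear` and `k_fold_mse`, the toy data, and the choice of assigning every k-th point to a fold are all assumptions made here for brevity.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def k_fold_mse(xs, ys, k):
    """Mean squared validation error averaged over k folds.

    Setting k == len(xs) gives leave-one-out (LOO) cross-validation.
    Assumption: fold `f` holds out indices f, f + k, f + 2k, ...
    """
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        a, b = fit_simple_linear(train_x, train_y)
        errs = [(ys[i] - (a + b * xs[i])) ** 2 for i in val_idx]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k


# Toy data (made up for illustration), roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

a, b = fit_simple_linear(xs, ys)
cv3 = k_fold_mse(xs, ys, k=3)        # 3-fold cross-validation
loo = k_fold_mse(xs, ys, k=len(xs))  # leave-one-out
print(f"intercept={a:.3f} slope={b:.3f} cv3={cv3:.4f} loo={loo:.4f}")
```

In practice the data would usually be shuffled before being split into folds; the fixed assignment here only keeps the example deterministic.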
Unsupervised Learning:
- Clustering Algorithms: Unsupervised learning tasks involve finding hidden patterns or structure in unlabeled data. Clustering algorithms group similar data points together.
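- K-means/K-medoid: K-means is a partitioning clustering algorithm that divides data into k clusters by iteratively assigning data points to the nearest cluster centroid and then recomputing each centroid as the mean of its assigned points. K-medoid follows the same scheme but represents each cluster by an actual data point (the medoid), which makes it less sensitive to outliers.
- Hierarchical Clustering: Hierarchical clustering builds a hierarchy of clusters either bottom-up, by repeatedly merging the two most similar clusters, or top-down, by recursively splitting clusters. A linkage criterion such as single-linkage, which uses the distance between the two closest points of two clusters, defines how similarity between clusters is measured.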
- Dimensionality Reduction: Dimensionality reduction techniques reduce the number of input features while preserving important information. Principal Component Analysis (PCA) is a common method for linear dimensionality reduction.
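The assign-and-update loop of k-means described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the toy points, and the initialization (the first k points serve as starting centroids so that the run is deterministic) are choices made here and are not prescribed by the syllabus.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100):
    """Basic k-means; returns (centroids, labels)."""
    # Assumption: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer change
        centroids = new_centroids
    return centroids, labels


# Toy data (made up for illustration): two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

Results of k-means depend on the starting centroids, so practical implementations typically use random or smarter initialization and several restarts.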
Understanding these concepts and algorithms is essential for building and evaluating machine learning models, both for supervised and unsupervised learning tasks. They form the foundation of modern data analysis and predictive modeling.