Start Time

7:00 am

Wednesday, April 24, 2019

End Time

8:00 am

Thursday, June 20, 2019

Address

Gurgaon, India



Event Description

This document contains the course content for the “Python with Data Science” training. By the end of the training, you will be able to write Python code and will have a sound knowledge of Machine Learning and Text Analytics. You will also learn to use Pandas and Matplotlib for data analysis and visualization.


Learning Objectives

  • Hands-on coding with Machine Learning and Text Analytics packages in Python such as NumPy, Scikit-Learn, NLTK, spaCy, Gensim and many others.
  • Training Machine Learning models (Linear/Logistic Regression, Support Vector Machines, Clustering methods, Random Forests and Decision Trees, Boosting models).
  • Training multilayer Neural Networks using Keras with a TensorFlow backend.
  • Textual data harmonisation, cleaning and preprocessing operations such as Stemming, Lemmatization and Morphological Analysis.

  • Core Natural Language Processing (NLP) operations such as Part-of-Speech (POS) Tagging, Named Entity Recognition (NER) and Dependency Parsing.
  • Topic Modelling based on Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI).
  • Semantic query expansion using WordNet, and Transfer Learning using word embeddings such as GloVe, Google and FastText.
  • Discussion of 5-6 Kaggle problems and their solutions using the techniques above.

Detailed Course Content

Module 1: Getting started with Python

  • Installing Python and Python Editors
  • Python Basics

○ Basic Syntax and Data types

○ Data Structures (Lists, Sets, Tuples, Dictionaries)

○ High Performance Container Data Types – Collections

○ Datetime, Calendar, heapq

○ Iterators (itertools) and generators

○ pickle – Python object serialization, cPickle

○ Operators, Control Statements, User-defined functions and classes
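
As a rough illustration of the Module 1 topics (collections, heapq, itertools, generators and pickle), the sketch below uses only the Python standard library. The word list and the names `squares`, `by_length` and `restored` are made up for this example and are not part of the course material.

```python
from collections import Counter, defaultdict
import heapq
import itertools
import pickle

# Illustrative sample data (an assumption for this sketch)
words = ["spam", "egg", "spam", "ham", "egg", "spam"]

# Counter: a dict subclass that tallies hashable items
counts = Counter(words)
top_two = counts.most_common(2)

# defaultdict: group words by length without checking for missing keys
by_length = defaultdict(list)
for w in words:
    by_length[len(w)].append(w)

# heapq: retrieve the three smallest values without sorting the whole list
smallest = heapq.nsmallest(3, [7, 2, 9, 4, 1])


def squares(n):
    """Generator that lazily yields the squares of 0..n-1."""
    for i in range(n):
        yield i * i


# itertools.islice takes the first four items from the generator
first_squares = list(itertools.islice(squares(10), 4))

# pickle: serialize an object to bytes and restore it
restored = pickle.loads(pickle.dumps(counts))

print(top_two, smallest, first_squares, restored == counts)
```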

Module 2: Data Import and Manipulation in Python

  • NumPy with Python

○ Basic Array Operations, Comparison Operations and Value Testing

○ Vector and Matrix Mathematics

○ Generating Statistics, NumPy random numbers

○ Polynomial Mathematics

○ NumPy Array Broadcasting

  • Pandas

○ Importing the Dataset, handling Excel/CSV Files

○ Using pandas Data Frames to solve complex tasks

○ Summarizing, Aggregating and Grouping Data using Apply operations

○ Descriptive Statistics and Pivot Table Summaries

Module 3: Data Visualization in Python

  • Use Matplotlib and Seaborn for data visualizations
  • Creating Line plot, Bar Chart, Pie Chart, Histogram, Scatter Plots and Contour Plots
  • Use plotly for Interactive visualizations

Module 4: Basics of Machine Learning Models

  • Supervised vs Unsupervised Learning, Discriminative vs Generative Algorithms
  • Linear/Logistic Regression, K-Nearest Neighbors
  • Support Vector Machines and Kernel Functions, Naive Bayes Classifier
  • Clustering Techniques (K-Means Clustering)
  • Decision Trees, Bagging Techniques and Random Forests
  • Boosting Techniques (XGBoost and AdaBoost)

Module 5: Loss Functions, Optimization Techniques and Evaluation Metrics

  • Bias vs Variance Tradeoff
  • Objective/Loss Functions (MSE, Sigmoid, Softmax), Optimization Techniques (Gradient Descent and Stochastic Gradient Descent)
  • L1 and L2 Regularisation
  • Evaluation Metrics (accuracy, precision, recall, MSE, MAE)
  • Model hyperparameter tuning using Cross-Validation and Leave-One-Out validation
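
To make the Module 5 ideas of a loss function and gradient descent concrete, here is a minimal sketch in plain Python that fits a one-variable linear model by minimizing the mean squared error (MSE). The data, learning rate and step count are assumptions chosen for illustration; in practice a library such as Scikit-Learn would normally be used.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Batch gradient descent on the MSE loss for y = w*x + b.

    lr and steps are illustrative defaults, not tuned values.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Move against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
final_loss = mse(w, b, xs, ys)
print(round(w, 3), round(b, 3))
```

Stochastic gradient descent follows the same update rule but estimates the gradient from one sample (or a small batch) per step instead of the full dataset.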

Module 6: Putting everything together in Scikit-Learn

  • Data Preprocessing

○ Missing Data Imputation, Handling Categorical Data

○ Splitting the Dataset into the Training set and Test set

○ Feature Extraction

○ Feature Scaling Techniques (Min-Max scaling, PCA whitening)

○ Dimensionality Reduction using Singular Value Decomposition (SVD)

  • Model Fitting and Tuning

○ Model fit and predict functions

○ Model Selection, Cross-validation and Hyperparameter tuning

  • Model Evaluation

○ Estimator score method

○ Scoring parameter

○ Metric functions

Module 7: Neural Networks

  • Basics of Neural Networks
  • Single Hidden Layer Neural Networks and Backpropagation
  • Multilayer Neural Networks and Multi-output Neural Networks
  • Training Multilayer Neural Networks in Keras

Module 8: Preprocessing Unstructured Data

  • Textual Data Harmonization and Cleaning
  • Stopword Removal
  • Regular Expressions
  • Morphological Analysis
  • Stemming and Lemmatization
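
The Module 8 steps (cleaning, stopword removal, regular expressions and stemming) can be sketched with the standard library alone. The stopword set and the `toy_stem` suffix stripper below are deliberately simplistic stand-ins invented for this example; a real pipeline would more likely use the stopword lists, stemmers and lemmatizers shipped with NLTK or spaCy.

```python
import re

# Hypothetical, very small stopword list for illustration only
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "are"}


def clean(text):
    """Lowercase the text, strip HTML-like tags and return alphabetic tokens."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # drop anything that looks like a tag
    return re.findall(r"[a-z]+", text)    # keep runs of letters as tokens


def remove_stopwords(tokens):
    """Filter out tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Naive suffix stripping; NOT the Porter algorithm, just a sketch."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


text = "<p>The cats are chasing the mice in the garden.</p>"
tokens = remove_stopwords(clean(text))
stems = [toy_stem(t) for t in tokens]
print(tokens, stems)
```

Note that the toy stemmer leaves "mice" unchanged; mapping it to "mouse" requires lemmatization, which relies on vocabulary and morphological analysis rather than suffix rules.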

Module 9: Hands-on with Core Natural Language Processing (NLP) operations

  • Part-of-Speech (POS) Tagging
  • Named Entity Recognition (NER)
  • Dependency Parsing

Module 10: WordNet and Word2Vec Embeddings

  • Semantic Query Expansion using WordNet
  • Transfer Learning using Word2Vec Embeddings

○ Basics of Word2Vec Embeddings – CBOW and Skip-gram model

○ Phrase detection before training Word2Vec embeddings

○ Training your own Word2Vec Embeddings

○ Pre-trained Word2Vec Embeddings (Google, GloVe, FastText)

Module 11: Topic Modelling

  • Latent Dirichlet Allocation based Topic Modelling

○ Interpreting the output of Topic Modelling

○ Visualizing the Topics

  • Latent Semantic Indexing (LSI) based Topic Modelling

Module 12: Putting it all together on 5-6 Practical Kaggle Problems

  • Defining the Problem
  • Importing the Dataset
  • Fitting and Evaluating different Models on the dataset

  • Discussing the challenges involved in the problem