Hi there, I'm Sai!

Data professional with experience building data processing pipelines that integrate machine learning.

Big Data Systems | Data Platform | Data Infrastructure

I enjoy combining mathematics, coding and research.

Experienced in cloud computing, distributed computing and machine learning.

I worked at Praxair as a Data Scientist Intern, where I delivered a machine learning tool that lets the Knowledge Management team automate the auditing of patent and scientific documents. I developed machine learning algorithms for patent group classification, entity extraction and document classification.

I completed my Master's in Data Science at the University at Buffalo, where I studied machine learning, statistical data mining, programming in Python, big data and databases. During my master's I worked on a wide range of problems, including a geospatial analysis of New York City collisions, predictive models for anomaly detection, and machine learning algorithms built from scratch using the underlying math and optimization methods.

Current Stack:

AWS: Lambda, SageMaker, Redshift, RDS, EMR, EC2

Other: NLP, Deep Learning, Docker

Programming Languages: Python, R, SQL, Shell, Java, JavaScript, MATLAB, HTML/CSS

Packages: scikit-learn, TensorFlow, Keras, Matplotlib, requests, Beautiful Soup (bs4), NumPy, pandas

Databases & Big Data: Oracle, MySQL, NoSQL, Hadoop, MapReduce, Hive, Spark

ML: Regression, Neural Networks, Random Forest, SVM, PCA, Clustering

All of my projects, source code, detailed reports, results and visualizations are available in my GitHub repositories: github.com/saikrishna-kanneti