American IT Resource Group, Inc.

Job Description:

Define end-to-end machine learning pipelines for large-scale technology products and deeply technical products in distributed, real-time, and scalable processing systems.
· Develop solutions to business problems using the data science life cycle.
· Develop and maintain data analytics solutions and machine learning algorithms.
· Build and leverage new and existing tools for Natural Language Processing (NLP) and intelligent document processing tasks.
· Design and develop Spark applications in Python for streaming multi-modal data such as text, images, and videos for distributed machine learning training.
· Design and develop AWS cloud deployment scripts using AWS CloudFormation templates and Terraform to deploy data and ML pipelines.
· Develop a proof of concept for multiple intents to demonstrate conversational flow, responses from an embedded document, and generative AI (ChatGPT).
· Fine-tune applications and systems for high performance and higher-volume throughput, and perform data pre-processing using the AWS stack.
· Translate, load, and exhibit disparate data sets in various formats and from sources such as Avro, Parquet, JSON, text files, Kafka queues, and log data.
· Develop and implement generative AI models with a strong understanding of techniques such as GPT, T5, Stable Diffusion, and BERT.
· Apply strong project management skills to deliver complex projects, including effort/time estimation, building a detailed work breakdown structure (WBS), managing the critical path, and using PM tools and platforms.
· Build scalable, client-engagement-level processes for faster turnaround and higher accuracy.
· Run regular project reviews and audits to ensure that projects are executed within the guardrails agreed upon by all stakeholders.
· Manage client stakeholders and their expectations with a regular cadence of weekly meetings and status updates.

Skills:

Distributed storage: Amazon S3, Azure HDInsight, Google Cloud Platform (GCP)
Database management: MongoDB, Cassandra, PostgreSQL, Oracle, MS SQL Server, Redshift
Graph processing: Neo4j
Machine learning: Spark Machine Learning Library (MLlib), TensorFlow, Keras, PyTorch
Data processing: Spark, Hadoop MapReduce, Kafka, Storm, Airflow, Spark Streaming
Programming languages: Java, Scala, Python (REST framework), PySpark
DevOps tools: Bitbucket, Git, Apache Maven, Selenium, Jenkins, Docker

Experience:

Data Scientist with a Bachelor's degree in Computer Science, Computer Information Systems, or Information Technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor's degree in one of the aforementioned subjects.