Senior Software Developer with Bachelor’s degree in Computer Science, Computer Information Systems, Information Technology, or a combination of education and experience equating to the U.S. equivalent of a Bachelor’s degree in one of the aforementioned subjects.
Develop cloud-based big data analytics solutions. Contribute to the development of a big data platform in Azure Data Lake and AWS using pipeline technologies such as Spark, Oozie, and YARN to support requirements and applications. Implement and support streaming technologies using Apache Kafka and the Spark Streaming APIs. Implement AWS components, including Glue, Glacier, EMR, EC2, S3, and Redshift. Build scalable data pipelines on top of Hive using different file formats and data serialization formats such as Protocol Buffers, Avro, and JSON. Implement machine learning solutions using software frameworks such as Caffe, Torch 7, Keras, and TensorFlow. Develop components in multiple languages, analyze business and product requirements, and contribute to the overall use case. Participate in requirement, design, and analysis reviews, and provide input to architecture recommendations. Implement big data security applications and best practices for the technologies used. Design and develop scalable applications and highly available systems, with a focus on speed and fault tolerance. Tune the performance of complex distributed Spark systems. Translate complex business requirements into detailed design documents. Cleanse and run computations on large raw data sets using the Hadoop ecosystem and RDBMS technologies. Improve data pipeline performance by tuning user queries, analyzing complex query plans, and optimizing Spark configuration settings. Implement test scenarios, and perform unit testing and integration testing on data. Write code in Java, Scala, and Python. Develop batch and real-time applications with various data sources on Spark clusters.
8+ years of software programming experience. Hands-on experience with “big data” technologies such as Hadoop, Hive, Kafka, and Spark, as well as a general understanding of the fundamentals of distributed data processing (data lakes, ETL/data warehousing, database design). 5+ years of experience with detailed knowledge of data warehouse technical architectures, infrastructure components, ETL/ELT, and reporting/analytic tools. Database knowledge and proficiency in SQL-style queries; Cosmos Scope experience and big data platform experience are preferred. Hands-on experience and strong proficiency with Scala, Python, or Java. Experience building and optimizing data collection architectures and pipelines. Experience with or knowledge of cloud technologies (GCP, AWS, Azure) is a plus. Proficient knowledge of the Apache Hadoop ecosystem. Expertise with Linux. Good understanding of machine learning or probability/statistics, along with AI tools. Self-motivated, with a strong sense of ownership; a good teammate.