InData Labs is a data science firm and AI-powered solutions provider. Our main focus lies in machine learning and deep learning solutions, as well as building high-load data processing systems.
In this position, you will work with multiple data sources (usually textual, numerical and time-related data) and with both large and small datasets to develop, validate and deploy machine learning models, tune their performance and integrate them into data processing pipelines.
Responsibilities:
- Work with both structured and unstructured data, collaborate with data engineers on defining data storage formats, and state data collection requirements;
- Not only solve technical tasks but also understand business needs, offer appropriate solutions and explain the chosen approach to non-technical people;
- Set up reproducible experiments: selection, training, validation and optimization of machine learning models, evaluation of their quality in business-related terms;
- Integrate data preprocessing and model inference into general data processing pipelines;
- Research new tools, papers, etc. in the machine learning area.
Requirements:
- Strong knowledge and deep understanding of:
  - Classical machine learning (linear models, decision trees, ensembles for classification and regression tasks, clustering and dimensionality reduction)
  - Main concepts and stages of the modelling process (validation scheme, regularization, overfitting and generalization, data leaks, feature selection, etc.)
- Experience with Python scientific, visualization and ML-related libraries (numpy, scipy, scikit-learn, etc.)
- Experience with different clustering techniques
- Experience with classic NLP tools and techniques (NLTK, spaCy, n-grams, skip-grams, TF-IDF, tokenizers, lemmatization, dependency parsing, etc.)
- Experience with NN frameworks, NLP-related architectures and libraries (PyTorch / TensorFlow, HuggingFace, fastText, flair, sentence transformers, Word2Vec, ELMo, RNN, CNN, Transformer, BERT, etc.)
- Experience in tuning pre-trained models for different NLP tasks
- Good Python programming skills
- Good spoken and written English (at least B1)
- Ability and desire to convert raw business requests into strictly formulated machine learning tasks
- Ability to formulate data gathering (or data labelling) requirements
- At least 2 years of experience in machine learning
Would be a plus:
- Experience with relational databases and SQL, familiarity with non-relational databases (Cassandra, Elasticsearch, MongoDB, etc.)
- Experience with distributed data processing (PySpark)
- Experience with Cloud ML services (Amazon ML & SageMaker, Microsoft Azure ML & AI Platform, Google Cloud AutoML & AI Platform)
- Experience in software engineering, deployment and integration with data delivery systems and other components, building microservices, and providing APIs for access to models
- Experience in developing recommender systems, time series analysis
- Experience with gradient boosting libraries (xgboost, LightGBM, CatBoost)
- Experience with similarity search optimization (FAISS, NMSLIB)
- Experience in classic and deep learning computer vision
- Experience in data collection (labelling) process setup using third-party or self-made tools
- Participation in ML competitions (Kaggle, etc.)
- Master's degree, PhD, or equivalent experience in Mathematics or Computer Science.
What we offer:
- Competitive compensation;
- Flexible schedules available;
- Generous benefits package from day one of employment: medical coverage, sports reimbursement, English classes, bonuses for special occasions (birthday, wedding, etc.), paid vacation and sick leave;
- Immense training and growth opportunities.
You will work with smart people who love to solve hard problems, and who not only expect but also foster high performance.