Data Scientist (NLP) vacancy

5 vacancies
Specialization: Data Science
Level: Middle
Experience: 3 years
English level: Upper-Intermediate
City: Minsk
Schedule: Full-time
Team size: 13+
Company size: 50
Remote work possible: Yes

InData Labs is a data science firm and AI-powered solutions provider. Our main focus lies in machine learning and deep learning solutions, as well as building high-load data processing systems.

In this position, you will work with multiple data sources (usually textual, numerical and time-related data) and with datasets both large and small to develop, validate and deploy machine learning models, tune their performance and integrate them into data processing pipelines.

Responsibilities:

  • Work with both structured and unstructured data, collaborate with data engineers on defining data storage formats, and state data collection requirements;
  • Go beyond solving technical tasks: understand business needs, offer appropriate solutions, and explain the chosen approach to non-technical people;
  • Set up reproducible experiments: selection, training, validation and optimization of machine learning models, evaluation of their quality in business-related terms;
  • Integrate data preprocessing and model inference into general data processing pipelines;
  • Research new tools, papers, etc. in the machine learning area.

Requirements:

  • Strong knowledge and deep understanding of
    • Classical machine learning (linear models, decision trees, ensembles for classification and regression tasks, clustering and dimensionality reduction)
    • Main concepts and stages of the modelling process (validation scheme, regularization, overfitting and generalization, data leaks, feature selection, etc.)
  • Experience with Python scientific, visualization and ML-related libraries (NumPy, SciPy, scikit-learn, etc.)
  • Experience with different clustering techniques
  • Experience with classic NLP tools and techniques (NLTK, spaCy, n-grams, skip-grams, TF-IDF, tokenizers, lemmatization, dependency parsing, etc.)
  • Experience with NN frameworks, NLP-related architectures and libraries (PyTorch / TensorFlow, HuggingFace, fastText, flair, sentence-transformers, Word2Vec, ELMo, RNN, CNN, Transformer, BERT, etc.)
  • Experience in tuning pre-trained models for different NLP tasks
  • Good Python programming skills
  • Good spoken and written English (at least B1)
  • Ability and desire to convert raw business requests into strictly formulated machine learning tasks
  • Ability to formulate data gathering (or data labelling) requirements
  • At least 2 years of experience in machine learning

Would be a plus:

  • Experience with relational databases and SQL, familiarity with non-relational databases (Cassandra, Elasticsearch, MongoDB, etc.)
  • Experience with distributed data processing (PySpark)
  • Experience with Cloud ML services (Amazon ML & SageMaker, Microsoft Azure ML & AI Platform, Google Cloud AutoML & AI Platform)
  • Experience in software engineering, deployment and integration with data delivery systems and other components, building microservices, providing APIs for access to models
  • Experience in developing recommender systems, time series analysis
  • Experience with gradient boosting libraries (XGBoost, LightGBM, CatBoost)
  • Experience with similarity search optimization (FAISS, NMSLIB)
  • Experience in classic and deep learning computer vision
  • Experience in data collection (labelling) process setup using third-party or self-made tools
  • Participation in ML competitions (Kaggle, etc.)
  • Master's, PhD, or equivalent experience in Mathematics or Computer Science

What we offer:

  • Competitive compensation;
  • Flexible schedules available;
  • Generous benefits package from day one of employment: medical coverage, sport reimbursement, English classes, bonuses for special occasions (birthday, wedding, etc.), paid vacation and sick leave;
  • Immense training and growth opportunities.

You will work with smart people who love to solve hard problems, and who not only expect but also foster high performance.
