ObjectStyle is a provider of open source solutions and commercial software development services with offices in the US, Belarus, and Poland. We are a major driving force behind such projects as Apache Cayenne, a powerful ORM framework; Bootique.io, a container-less Java app launcher; Agrest, a model-driven REST engine; and a number of others. Our clients are located in the US, Australia, and the EU. We work with the National Hockey League and other great companies.
We are currently looking for a talented Data Engineer to join the Research team of one of our clients, the Wikimedia Foundation. The Wikimedia Foundation is the nonprofit organization that hosts Wikipedia and other free knowledge projects. If you’re ready to take on unique challenges and contribute to the world’s largest source of free knowledge, this position might be for you.
As a software engineer on the Research team, you will:
- Collaborate with researchers and engineers to design and expose models, algorithms and machine learning systems through APIs, data-sets, and web applications.
- Design and implement data collection and annotation efforts in collaboration with researchers and volunteer community members.
- Design and optimize computationally intensive data processing jobs.
- Design, develop, test, and deploy new features, improvements and upgrades to the software that supports research.
- Develop prototypes of new applications that incorporate research findings and ideas.
- Act as the Research team’s advocate in the Wikimedia engineering ecosystem and collaborate with teams such as Services, Analytics, Site Reliability Engineering, Security, Machine Learning Infrastructure, as well as Product to productionize research outputs.
- Discuss, document and share the process and results of your work publicly; engage with our communities at technical events, conferences and hackathons.
- Find creative solutions and write code that meets Wikimedia’s high privacy standards.
- Actively engage in a collaborative, consensus-oriented environment as part of a globally distributed organization.
Requirements:
- BS, MS, or PhD in Computer Science, Mathematics, Statistics, or a closely related engineering field, or equivalent work experience
- Experience with database technologies: MySQL/Postgres or similar
- Experience developing RESTful APIs for data retrieval
- Strong understanding of Computer Science fundamentals, such as algorithms, data structures and complexity
- Knowledge of data analysis and the basics of statistics
- Experience with Hadoop and related technologies: HDFS, YARN, MapReduce, Hive, Spark, etc.
- Experience with distributed computing on modern platforms such as Apache Spark
- Familiarity with NoSQL databases such as Cassandra or MongoDB
- Strong communication skills, including the ability to communicate complex technical issues to a cross-team and cross-functional audience
This will set you apart:
- A portfolio of open source programming projects
- Relevant work experience with/in applied research teams
- Experience with open source machine learning libraries such as scikit-learn and deep learning frameworks such as Keras, TensorFlow, or PyTorch
- Experience as a “data wrangler”, cleaning up and formatting semi-structured or unstructured data
- Experience collecting labels using crowdsourcing platforms or large-scale systems
- Production-level experience with Hadoop, Spark, Flink, Hive, Kafka, etc.
- Familiarity with scientific computing libraries in Python
- Experience working with volunteers
- Experience editing Wikipedia or other Wikimedia or open data/knowledge projects
- Experience with the MediaWiki codebase