Web Data Engineer

110 - 130 PLN / month · B2B (net)
Mid · Full-time · B2B
#304991 · Added two months ago · 40
Source: Devire

Tech Stack / Keywords

ETL, CSS, Playwright, Selenium, Pandas, JSON, SQL, Git

Company and position

Devire Outsourcing IT is a collaboration model for IT specialists working on B2B contracts, who deliver innovative, modern projects for Devire's clients. The client for this role is a company in the logistics industry.


Requirements

  • Minimum 2 years of experience in scraping, ETL, and working with data in Python.
  • Very good knowledge of HTTP/HTTPS (sessions, headers, cookies, status codes), robots.txt, and sitemaps.
  • Experience with: requests / httpx, BeautifulSoup4 or lxml, CSS / XPath selectors, regex, pagination, infinite scroll.
  • Hands-on experience with at least one tool for dynamic pages: Playwright or Selenium.
  • Experience with pandas (cleaning, transformations, joins), CSV / JSON / Parquet formats, basic SQL (SELECT, UPSERT, indexes).
  • Experience with retry/backoff handling, timeouts, and concurrency control, as well as logging and monitoring.
  • Git, basic Docker, and CI/CD (tests, lint, secret scanning).
  • Awareness of RODO/GDPR (PII, anonymization, data minimization, retention).
  • Ability to create clear documentation and communicate effectively.

Responsibilities

  • Designing, implementing, and maintaining crawlers and data extractors (HTTP/HTTPS, pagination, infinite scroll, SPA).
  • Selecting and using appropriate tools: requests / httpx, BeautifulSoup4 / lxml, Scrapy, Playwright / Selenium (JavaScript-rendered pages).
  • Building ETL/ELT pipelines: cleaning, normalization, deduplication, and data validation (e.g., pandas, Great Expectations / pandera).
  • Saving data to CSV / Parquet and/or loading it into databases and data warehouses (e.g., PostgreSQL, BigQuery).
  • Orchestrating and automating tasks (cron, Airflow / Prefect); monitoring, alerting, logging, retry/backoff.
  • Ensuring legal and ethical compliance (robots.txt, website terms of service, RODO/GDPR).
  • Documenting data schemas, data lineage, and architectural decisions.
  • Collaborating with analysts, product teams, and, when necessary, legal and security teams.
  • Proactively maintaining solutions (quick adaptation of scrapers after source changes, low MTTR).

Offer

  • Compensation based on B2B contract (via Devire).
  • Flexible working hours, approximately 10 hours per week.
  • Benefits package (medical care, multisport card, etc.).
  • Long-term cooperation.
  • Remote work.