Web Data Engineer
110 - 130 PLN/month · B2B (net)
Mid · Full-time · B2B
#304991 · Added two months ago
Source: Devire
Tech Stack / Keywords
ETL, CSS, Playwright, Selenium, Pandas, JSON, SQL, Git
Company and position
Devire Outsourcing IT is a collaboration model for IT specialists working on B2B contracts, who deliver innovative, modern projects for clients. The client is a company in the logistics industry.
Requirements
- At least 2 years of experience in web scraping, ETL, and working with data in Python.
- Very good knowledge of HTTP/HTTPS (sessions, headers, cookies, status codes), robots.txt, and sitemaps.
- Experience with requests / httpx, BeautifulSoup4 or lxml, CSS / XPath selectors, and regex, as well as handling pagination and infinite scroll.
- Hands-on experience with at least one tool for dynamic pages: Playwright or Selenium.
- Experience with pandas (cleaning, transformations, joins), CSV / JSON / Parquet formats, basic SQL (SELECT, UPSERT, indexes).
- Experience handling retries with backoff, timeouts, and concurrency control, as well as logging and monitoring.
- Git, basic Docker, and CI/CD (tests, linting, secret scanning).
- Awareness of RODO/GDPR (PII, anonymization, data minimization, retention).
- Ability to create clear documentation and communicate effectively.
Responsibilities
- Designing, implementing, and maintaining crawlers and data extractors (HTTP/HTTPS, pagination, infinite scroll, SPA).
- Selecting and using appropriate tools: requests / httpx, BeautifulSoup4 / lxml, Scrapy, Playwright / Selenium (JavaScript-rendered pages).
- Building ETL/ELT pipelines: cleaning, normalization, deduplication, and data validation (e.g., pandas, Great Expectations / pandera).
- Saving data to CSV / Parquet and/or loading into relational databases (e.g., PostgreSQL, BigQuery).
- Orchestrating and automating tasks (cron, Airflow / Prefect); monitoring, alerting, logging, retry/backoff.
- Ensuring legal and ethical compliance (robots.txt, website terms of service, RODO/GDPR).
- Documenting data schemas, data lineage, and architectural decisions.
- Collaborating with analysts and product teams and, when necessary, with legal and security teams.
- Proactively maintaining solutions (quickly adapting scrapers when sources change, keeping MTTR low).
Offer
- Compensation based on B2B contract (via Devire).
- Flexible working hours, approximately 10 hours per week.
- Benefits package (medical care, multisport card, etc.).
- Long-term cooperation.
- Remote work.