Save storage by deduplicating Parquet data at intra and inter-files levels : dataengineering
this post was submitted on 29 Jun 2026
1 point (100% upvoted)
Save storage by deduplicating Parquet data at intra and inter-files levels
Open Source (self.dataengineering)
submitted 2 minutes ago by qlhoest
[removed]