Vojtech Tuma
1 min read · Oct 9, 2020


Did you know that hdfs-shell has a built-in command for merging Parquet files, abbreviated `mp`? It does not launch a Spark job; instead it invokes the merge command from parquet-tools. The merge is not distributed, so it won't scale to huge files, but it is very handy overall, and a parameter lets you cap the maximum size of the resulting files. (Note: I did not implement the feature myself :), but I was around when it was written.)
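To make the size cap a bit more concrete, here is a minimal Python sketch of one way input files could be grouped so that each merged output stays under a limit. This is only an illustration of the idea, not how hdfs-shell or parquet-tools actually do it; the function name `plan_merge_groups`, its parameters, and the greedy strategy are all my own assumptions.

```python
from typing import Iterable, List, Tuple


def plan_merge_groups(
    files: Iterable[Tuple[str, int]], max_bytes: int
) -> List[List[str]]:
    """Greedily group (path, size_in_bytes) pairs into merge batches.

    Hypothetical helper: each batch's total size stays at or below
    max_bytes, except that a single file larger than max_bytes ends up
    alone in its own batch.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if adding this file would exceed the cap.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Made-up file names and sizes, purely for illustration.
    inputs = [("part-0.parquet", 40), ("part-1.parquet", 50), ("part-2.parquet", 30)]
    print(plan_merge_groups(inputs, max_bytes=100))
```

Each resulting group would then be merged into one output file, which is where a single-machine tool does its work sequentially rather than in a distributed job.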
