1 min read · Oct 9, 2020
Did you know that hdfs-shell actually has a built-in merge-parquet command, shorthanded `mp`? It does not launch a Spark job; instead it calls parquet-tools' merge command under the hood. The resulting job is not distributed, so it won't scale to huge files, but it is very handy overall, and you can cap the maximum size of the resulting file with a parameter. (Note -- I did not implement the feature myself :), but I was close by when it was written.)
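To give a feel for how this is used, here is a minimal sketch of an hdfs-shell session. The paths and the exact shape of the `mp` arguments are assumptions for illustration only, not the documented syntax; check the command's built-in help inside hdfs-shell for the real signature and for the name of the size-limit parameter mentioned above.

```
# Illustrative sketch only — argument order and paths are assumptions.
# Inside an hdfs-shell session, merge a directory of small Parquet files
# into a single output file (runs parquet-tools locally, not a Spark job):
hdfs-shell > mp /warehouse/events/day=2020-10-09 /warehouse/events/day=2020-10-09/merged.parquet
```

Because the merge runs in a single local process, it is best suited to compacting many small files of modest total size; for very large datasets a distributed compaction job is still the better tool.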