Apache Hive analytical functions, available since Hive 0.11.0, are a special group of functions that scan multiple input rows (a window) to compute each output value. They are usually used with OVER, PARTITION BY, ORDER BY, and a windowing specification. Regular aggregate functions used with a GROUP BY clause are limited to one result value per group. Analytic functions, in contrast, operate on windows in which the input rows are ordered and grouped using flexible conditions expressed through an OVER clause with PARTITION BY. Although analytic functions produce aggregate results, they do not group the result set. …
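To make the contrast concrete, here is a minimal HiveQL sketch. It assumes a hypothetical table `sales(region STRING, sale_date DATE, amount DOUBLE)`; the table and column names are illustrative only. The GROUP BY query collapses each region to a single row, while the analytic query keeps every input row and adds values computed over that row's window.

```sql
-- Hypothetical table used for illustration only:
--   sales(region STRING, sale_date DATE, amount DOUBLE)

-- Regular aggregate: GROUP BY returns one row per region.
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Analytic functions: every input row is kept, and each output
-- value is computed over that row's window.
-- running_total: cumulative amount within the region, in date order.
-- amount_rank: rank of each sale within its region, largest first.
SELECT
  region,
  sale_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY region
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total,
  RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales;
```

The PARTITION BY clause plays the role GROUP BY plays in the first query, but the rows are not collapsed, which is why the second query returns as many rows as `sales` contains.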
Merge functionality is important and useful in many operations. In Hive, however, frequent update/merge operations create a large number of small delta directories and files. Over time these degrade performance, so they must be compacted at regular intervals. If compaction isn't performed regularly, operations can even fail with vertex errors.
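On Hive transactional (ACID) tables, compaction can also be requested manually instead of waiting for the automatic compactor. A minimal sketch, assuming a hypothetical transactional table named `target_table`:

```sql
-- target_table is a hypothetical transactional (ACID) table.
-- A minor compaction merges the delta files into fewer, larger deltas;
-- a major compaction rewrites base and deltas into a new base.
ALTER TABLE target_table COMPACT 'minor';
ALTER TABLE target_table COMPACT 'major';

-- Inspect queued, running and completed compaction requests.
SHOW COMPACTIONS;
```

For a partitioned table, the request is made per partition, for example `ALTER TABLE target_table PARTITION (dt='2024-01-01') COMPACT 'major';`, where the partition column and value are again hypothetical.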
In this article, we will explore alternate ways of accomplishing update/merge in Hive.
Let's begin by understanding what the merge operation does.
The merge operation is used to perform incremental…
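For reference, the standard Hive `MERGE` statement (Hive 2.2 and later) looks roughly like the sketch below. It illustrates the built-in syntax, not the alternate approach the article goes on to describe. The tables `target` and `staging` and their columns are hypothetical, and `MERGE` requires the target to be a transactional (ACID) table.

```sql
-- Hypothetical tables:
--   target(id INT, val STRING)   -- transactional (ACID) ORC table
--   staging(id INT, val STRING, is_deleted BOOLEAN)
-- Matched rows flagged as deleted are removed, other matched rows are
-- updated (target columns in SET are not qualified), and new keys that
-- are not flagged as deleted are inserted.
MERGE INTO target AS t
USING staging AS s
ON t.id = s.id
WHEN MATCHED AND s.is_deleted = true THEN DELETE
WHEN MATCHED THEN UPDATE SET val = s.val
WHEN NOT MATCHED AND s.is_deleted = false THEN INSERT VALUES (s.id, s.val);
```

Hive allows at most one UPDATE and one DELETE branch, and when both are present the first must carry an extra `AND` condition, which is why the DELETE branch comes first here. Each such statement writes new delta directories, which is how repeated merges produce the small-file growth described above.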
Apache Hive is a data warehouse software project built on top of Apache Hadoop for data query and analysis. Hive provides an SQL-like interface to query data stored in HDFS. It offers an abstraction layer that lets you query big data with SQL syntax instead of implementing queries in the low-level Java API. Various optimisation techniques are available to make Hive queries run faster. Here, we will focus on how temporary tables can help achieve better query run times.
Let's take the following scenario as an example:
select sum(value) from
  (select …
  ,(t1.val + t2.val) as value
  from table_1 t1
  …
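The query above is only partially shown, so the following is a minimal sketch of the temporary-table idea rather than the article's exact scenario. It assumes hypothetical tables `table_1(id INT, val INT)` and `table_2(id INT, val INT)` joined on a hypothetical `id` column. The intermediate result is materialised once in a session-scoped temporary table (supported since Hive 0.14.0) and then reused without recomputing the join.

```sql
-- Hypothetical schemas: table_1(id INT, val INT), table_2(id INT, val INT)

-- Materialise the join once. A TEMPORARY table is visible only to the
-- current session and is dropped automatically when the session ends.
CREATE TEMPORARY TABLE tmp_joined AS
SELECT t1.id,
       (t1.val + t2.val) AS value
FROM table_1 t1
JOIN table_2 t2
  ON t1.id = t2.id;

-- Later queries in the same session read the stored intermediate
-- result instead of re-running the join.
SELECT SUM(value) FROM tmp_joined;
SELECT COUNT(*) FROM tmp_joined WHERE value > 100;
```

The benefit comes when the same intermediate result feeds several queries; for a single query, a subquery or CTE may perform just as well.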