Apache Hive is a data warehouse software project built on top of Apache Hadoop, an open-source framework for storing and processing large datasets, that provides data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Without Hive, SQL-style queries over distributed data must be implemented against the MapReduce Java API; Hive was created so that non-programmers familiar with SQL can work with petabytes of data through familiar SQL concepts, using a SQL-like language called HiveQL. Hive compiles HiveQL queries into MapReduce, Tez, or Spark jobs that run on Hadoop's distributed resource manager, Yet Another Resource Negotiator (YARN).

Hive is used to read, write, and manage large datasets residing in distributed storage, and supports several file formats, including TEXTFILE, SEQUENCEFILE, RCFILE (Record Columnar File), and ORC (Optimized Row Columnar). The Hive Metastore (HMS) provides a central repository of metadata (schemas, partitions, and storage locations) that other tools can query, which makes it a critical component of many data lake architectures. Hive also includes HCatalog, a table and storage management layer that exposes Hive Metastore tables to other Hadoop tools such as Pig and MapReduce. For security and governance, Hive supports Kerberos authentication and integrates with Apache Ranger for authorization and Apache Atlas for metadata governance and lineage.

HiveServer2 accepts incoming requests from users and applications, compiles each SQL query into an execution plan, and submits the resulting job to YARN for processing.

Apache Hive is not related to the productivity platform for fast-moving teams that is also called Hive.
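As a sketch of how this looks in practice, the following HiveQL declares a table in one of the supported file formats and then queries it with ordinary SQL; HiveServer2 compiles the SELECT into a distributed job. The table and column names here are hypothetical, chosen only for illustration:

```sql
-- Hypothetical example: store page-view events as ORC files.
CREATE TABLE page_views (
  user_id   BIGINT,
  url       STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date DATE)
STORED AS ORC;

-- An ordinary SQL query; Hive compiles it into a MapReduce,
-- Tez, or Spark job that runs under YARN.
SELECT url, COUNT(*) AS views
FROM page_views
WHERE view_date = '2024-01-01'
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

The partition column (`view_date`) lets Hive prune irrelevant files before launching the job, which is a common pattern for keeping queries over large datasets fast.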