Browsing the HDFS datalake
==========================
Description
-----------
There are two distinct ways to browse the HDFS datalake:

A. Through the WebHDFS API

B. Through the native Hadoop CLI

WebHDFS
-------
WebHDFS provides a REST API that lets users access data on the HDFS filesystem over HTTP. The feature is enabled on the cluster side through the following property in the hdfs-site.xml file:

dfs.webhdfs.enabled (true|false): enables WebHDFS (REST API) on NameNodes and DataNodes.
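
As an illustration of the property above, the following minimal sketch checks whether dfs.webhdfs.enabled is set to true in an hdfs-site.xml document. It relies on the standard Hadoop configuration layout (property elements with name and value children). The sample content and the helper name webhdfs_enabled are assumptions made for this example, not part of any Hadoop tooling.

```python
import xml.etree.ElementTree as ET

# Illustrative hdfs-site.xml content (assumption); a real file lives in the
# Hadoop configuration directory of the cluster.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
"""


def webhdfs_enabled(hdfs_site_xml: str) -> bool:
    """Return True if dfs.webhdfs.enabled is explicitly set to 'true'."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "dfs.webhdfs.enabled":
            return prop.findtext("value", "").strip().lower() == "true"
    # Property absent: this sketch reports False instead of guessing
    # the default value of the installed Hadoop version.
    return False


print(webhdfs_enabled(SAMPLE_HDFS_SITE))
```

When the property is missing, the effective value depends on the Hadoop version's default, which this sketch deliberately does not try to infer.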
The API supports the full range of operations on the HDFS filesystem (view, create, modify, etc.).
By default, when Kerberos authentication is not enabled, no credentials are required to call these services: the caller only identifies itself through the user.name parameter.
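
The request format can be sketched as follows, assuming an unsecured cluster. The /webhdfs/v1 path prefix and the LISTSTATUS operation come from the WebHDFS REST API; the host name httpfs.example.com, the user alice, the path /data/raw and the helper names are hypothetical and only serve the example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, user):
    """Build a WebHDFS URL that identifies the caller with user.name only."""
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_directory(host, port, path, user):
    """Return the entry names of an HDFS directory via LISTSTATUS.

    Requires network access to a cluster where Kerberos is not enabled.
    """
    with urlopen(webhdfs_url(host, port, path, "LISTSTATUS", user)) as resp:
        payload = json.load(resp)
    return [entry["pathSuffix"] for entry in payload["FileStatuses"]["FileStatus"]]


# Hypothetical host, path, and user; only the URL is built here, no request is sent.
print(webhdfs_url("httpfs.example.com", 14000, "/data/raw", "LISTSTATUS", "alice"))
# → http://httpfs.example.com:14000/webhdfs/v1/data/raw?op=LISTSTATUS&user.name=alice
```

Because no credential is checked in this mode, any value can be passed as user.name, so the parameter identifies the caller without authenticating it.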
The WebHDFS API is exposed by the following services:

- DataNode: HDFS DataNode WebUI on port 50075
- Third-party: HttpFS module on port 14000

Another way to list directory contents is to call the /listPaths/ URI on the NameNode WebUI (port 50070), which returns an XML document.
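
A minimal sketch of this second method is shown below. It assumes that the returned XML has a listing root element containing directory and file elements, each carrying a path attribute; the exact schema may differ between Hadoop versions and should be checked against a real response. The sample XML, the host name and the helper names are illustrative only.

```python
import xml.etree.ElementTree as ET


def listpaths_url(host, path, port=50070):
    """Build the /listPaths/ URL for an absolute HDFS path on the NameNode WebUI."""
    return f"http://{host}:{port}/listPaths{path}"


# Illustrative response (assumed shape); a real document carries more attributes.
SAMPLE_LISTING = """<listing path="/data">
  <directory path="/data/raw"/>
  <file path="/data/readme.txt" size="120"/>
</listing>"""


def parse_listing(xml_text):
    """Return (kind, path) pairs for each directory or file entry in the listing."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.get("path")) for el in root if el.tag in ("directory", "file")]


print(listpaths_url("namenode.example.com", "/data"))
print(parse_listing(SAMPLE_LISTING))
```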