If you have just upgraded from MySQL 5.0 to MySQL 5.5, you might not know the location of the my.cnf configuration file. On most Linux installations the server reads it from /etc/my.cnf, /etc/mysql/my.cnf, and ~/.my.cnf, in that order.
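If in doubt, you can ask MySQL itself which files it reads; for example (output varies by platform and distribution):
$ mysql --help | grep -A 1 "Default options"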
CopyTable is a utility that can copy part of or all of a table, either to the same cluster or to another cluster. The usage is as follows:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
starttime: Beginning of the time range. Without endtime, this means from starttime to forever.
endtime: End of the time range.
versions: Number of cell versions to copy.
new.name: New table’s name.
peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
families: Comma-separated list of column families to copy.
all.cells: Also copy delete markers and uncollected deleted cells (advanced option).
Example of copying ‘TestTable’ to a peer cluster (which uses replication), restricted to a one-hour time window:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase TestTable
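CopyTable can also duplicate a table within the same cluster by giving the copy a new name. A sketch (the table names here are illustrative, and the destination table must already exist with the same column families):
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=TestTableCopy TestTable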
Export is a utility that will dump the contents of a table to HDFS as a sequence file. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
Note: caching for the input Scan is configured via hbase.client.scanner.caching in the job configuration.
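For example, to export one version of each cell of a table named TestTable to the HDFS directory /export/TestTable while raising the scanner caching for the job (names are illustrative; passing properties with -D assumes your HBase version routes them through Hadoop’s GenericOptionsParser):
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export -D hbase.client.scanner.caching=100 TestTable /export/TestTable 1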
Import is a utility that loads data previously written by Export back into HBase. Invoke via:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
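Continuing the illustrative example above, the exported data could be loaded back into a table (which must already exist) like so:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import TestTable /export/TestTable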
Please refer to http://hbase.apache.org/book/ops.backup.html for more details.
I got the following lecture notes from http://computersciencecafe.blogspot.co.uk/2010/10/blocking-factor-in-multilevel-index.html
The blocking factor (or fan-out, for a multilevel index) is the number of records that can be stored in a single block, i.e. records per block:
bfr = B/R = (block size / record size)
Consider an ordered data file with the following parameters:
r (number of records) = 16348
R (record size) = 32 bytes
B (block size) = 1024 bytes
The index is stored as key + pointer pairs:
key value = 10 bytes
block pointer = 6 bytes
Find the number of first-level and second-level blocks required for a multilevel index on this file.
First, let’s find the number of blocks in the data file.
The number of records that can be accumulated in one block is
blocking factor bfr = 1024/32 = 2^5 = 32
so we can have 32 records in a block.
Now, how many such blocks are required for 16348 records?
number of blocks required for the data file = ⌈r/bfr⌉
= ⌈16348/32⌉ = 511
Now we know we need 511 entries in the first-level index (one per data block).
Find how many blocks are needed to store 511 entries,
i.e. how many blocks in the first level of the multilevel index are required, where each entry is 16 bytes (key size + pointer size):
R’ = 16 bytes
B = 1024 bytes
bfr’ = 1024/16 = 2^6 = 64
so the number of blocks required for 511 entries = ⌈r’/bfr’⌉
= ⌈511/64⌉ = 8
It is clear that a single second-level block is enough to store 8 entries,
but let’s calculate:
number of entries in the second level = number of blocks in the first level = 8
number of blocks in the second level = ⌈(number of first-level blocks)/bfr’⌉
The blocking factor bfr’ is the same at the second level, because here too we store key + pointer pairs.
The number of entries is now 8,
so the number of blocks for the second level = ⌈8/64⌉ = 1
For a secondary index on a key field of an unordered data file,
with the same parameters:
Number of first-level blocks:
the first-level index stores an index entry for every record (16348) in the data file, so
number of blocks needed for the first-level index = ⌈r/bfr’⌉ = ⌈16348/64⌉ = 256
(bfr’ = 1024/(10+6) = 64)
Number of second-level blocks:
number of entries in the second level = number of blocks in the first level = 256
bfr’ = 64 is the same, and the number of entries is 256,
so number of second-level blocks = 256/64 = 4
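As a sanity check, here is a short Python sketch (not part of the original notes) that reproduces the arithmetic above; the variable names mirror the symbols used in the notes:

import math

def index_blocks(num_entries, block_size, entry_size):
    # blocks needed to store num_entries entries of entry_size bytes each
    bfr = block_size // entry_size        # fan-out: entries per block
    return math.ceil(num_entries / bfr)

r, R, B = 16348, 32, 1024                 # records, record size, block size
entry_size = 10 + 6                       # key (10 bytes) + block pointer (6 bytes)

# Multilevel index on the ordered file: one first-level entry per data block.
data_blocks = math.ceil(r / (B // R))                     # 511
first_level = index_blocks(data_blocks, B, entry_size)    # 8
second_level = index_blocks(first_level, B, entry_size)   # 1

# Secondary index on the unordered file: one first-level entry per record.
first_level_sec = index_blocks(r, B, entry_size)                  # 256
second_level_sec = index_blocks(first_level_sec, B, entry_size)   # 4

print(data_blocks, first_level, second_level)   # 511 8 1
print(first_level_sec, second_level_sec)        # 256 4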
Email attachments: the most common way. It is difficult to complete complex workflow tasks, and there is no data synchronization.
Microsoft SharePoint: static workflows on documents. It does not fit well in an open cloud environment, where it is difficult to create temporary groups and share data.
Google Docs: online editing of documents at the same time. It provides basic permission controls over who can read and write a document.
Huddle: assigns single tasks to users on a specific document, tracked by start and end dates.
Document collaboration means several authors work on a document or collection of documents together. They could be simultaneously co-authoring a document or reviewing a specification as part of a structured workflow.
Semiformal co-authoring: Multiple authors edit simultaneously anywhere in the document.
Formal co-authoring: Multiple authors edit simultaneously in a controlled way by saving content when ready to be revealed. Examples include: business plans, newsletters, and legal briefs for Word; and marketing and conference presentations for PowerPoint.
Comment and review: A primary author solicits edits and comments (which can be threaded discussions) by routing the document in a workflow, but controls final document publishing. Examples include online help, white papers, and specifications.
Document sets: Multiple authors are assigned separate documents as part of a workflow, and then one master document set is published. Examples include: new product literature and a sales pitch book.
For example, in an open research environment, researchers need to circulate documents to the appropriate people when needed. Often there are legal constraints, such as where the data can be accessed and how these documents may be circulated; in such a project, for instance, data must not leak beyond the designated region.