More convenient syntax for DataFrame constructor to declare order of columns? #444

Closed
fperez opened this issue Dec 5, 2011 · 10 comments
@fperez
Contributor

fperez commented Dec 5, 2011

I've searched the list, docs and issue list and didn't find this addressed; sorry if it was. It seems to me that right now, if I want to specify the order of my columns in a new DataFrame built from existing Series, the only way to do it is to build a dict and pass the columns argument separately, with a redundant spelling of the column names:

df = pd.DataFrame(dict(m30_t10=splits_30, m15_t10=splits_15,
                       m30_m15=splits_m),
                  columns=['m30_t10', 'm15_t10','m30_m15'])

I'm wondering if a more convenient syntax could be devised for this operation. It seems that list-of-tuples already has different semantics, so the minimal change of the above doesn't work:

# This gives an error:
df3 = pd.DataFrame([('m30_t10', splits_30), ('m15_t10', splits_15),
                    ('m30_m15', splits_m)])

It's unfortunate, b/c the above is incidentally a valid dict constructor:

# This is OK:
df3 = pd.DataFrame(dict([('m30_t10', splits_30), ('m15_t10', splits_15),
                         ('m30_m15', splits_m)]))

But since a list is an inherently ordered data structure, it would be nice if it were also accepted by the DataFrame constructor, carrying with it the implicit column order.
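The request above can be approximated today with a small wrapper around the dict-plus-columns workaround. This is only a sketch: `frame_from_pairs` is a hypothetical helper name, not a pandas function, and the three Series are made-up stand-ins for `splits_30`, `splits_15` and `splits_m`.

```python
import pandas as pd


def frame_from_pairs(pairs):
    """Hypothetical helper: build a DataFrame from (name, Series) pairs,
    keeping the column order in which the pairs were given."""
    pairs = list(pairs)
    # Collect the names once, so the caller does not have to spell them twice.
    names = [name for name, _ in pairs]
    return pd.DataFrame(dict(pairs), columns=names)


# Made-up stand-ins for the Series in the original post.
splits_30 = pd.Series([1.0, 2.0, 3.0])
splits_15 = pd.Series([4.0, 5.0, 6.0])
splits_m = pd.Series([7.0, 8.0, 9.0])

df = frame_from_pairs([("m30_t10", splits_30), ("m15_t10", splits_15),
                       ("m30_m15", splits_m)])
print(list(df.columns))
```

The explicit `columns=` argument is what pins the order here, so the sketch does not rely on how a given pandas version orders dict keys.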

@lodagro
Contributor

lodagro commented Dec 5, 2011

Maybe pandas.DataFrame.from_series(cls, data, orient='columns', dtype=None)?

data = list of pandas.Series.

Order is set inherently, and column names or index values (depending on orient) can be extracted from the series (or set to default values when not available).

This lines up with:

  • pandas.DataFrame.from_csv
  • pandas.DataFrame.from_dict
  • pandas.DataFrame.from_records

I am not a big fan of a one-does-it-all __init__.
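To make the proposal concrete, here is a minimal sketch of what such a factory might do. `frame_from_series` is a hypothetical free function standing in for the suggested `DataFrame.from_series`, which is not an existing pandas method; the sketch leans on `pd.concat`, omits the `dtype` argument, and uses made-up Series.

```python
import pandas as pd


def frame_from_series(series_list, orient="columns"):
    """Hypothetical stand-in for the proposed DataFrame.from_series."""
    if orient not in ("columns", "index"):
        raise ValueError("orient must be 'columns' or 'index'")
    # Concatenating along axis=1 keeps the list order and uses each
    # Series name as the column label.
    frame = pd.concat(series_list, axis=1)
    # For orient='index', each Series becomes a row instead. A plain
    # transpose is the simplest way to sketch it, but it can upcast
    # columns when the Series have different dtypes.
    return frame.T if orient == "index" else frame


s1 = pd.Series([1.0, 2.0], name="m30_t10")
s2 = pd.Series([3.0, 4.0], name="m15_t10")

by_columns = frame_from_series([s1, s2])
by_index = frame_from_series([s1, s2], orient="index")
print(list(by_columns.columns), list(by_index.index))
```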

@fperez
Contributor Author

fperez commented Dec 5, 2011

Sure, @lodagro; alternate factory functions/constructors would be just fine, and are indeed a cleaner alternative to making over-complicated __init__ methods that try to read your mind.

@wesm
Member

wesm commented Dec 6, 2011

I think it makes sense to have a constructor that takes the form that Fernando describes, since it's sort of a "dict-like" interface to begin with. At the moment the constructor will accept

  • 2D ndarray, with optional index/columns
  • structured ndarray, with index or columns
  • dict, with index or columns
  • lists, which are treated as ndarrays

maybe the method should be called like

DataFrame.from_items

to be analogous with dict.(iter)items and friends. Sounds good? Adding an orient parameter will be pretty simple.
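For reference, the input forms listed in this comment can be sketched as follows. This is written against a current pandas release with made-up data; the 2011 constructor may have behaved differently in the details.

```python
import numpy as np
import pandas as pd

# 2D ndarray with explicit index and columns.
arr2d = np.arange(6).reshape(3, 2)
from_2d = pd.DataFrame(arr2d, index=["x", "y", "z"], columns=["a", "b"])

# Structured ndarray: the field names become the column labels.
rec = np.array([(1, 2.0), (3, 4.0)], dtype=[("a", "i8"), ("b", "f8")])
from_structured = pd.DataFrame(rec)

# Dict of column name -> values.
from_dict = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# Nested list, treated like a 2D array, so labels default to integers.
from_list = pd.DataFrame([[1, 2], [3, 4]])

print(from_2d.shape, list(from_structured.columns), list(from_list.columns))
```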

@wesm
Member

wesm commented Dec 6, 2011

Incidentally, I half wish that I hadn't called the Panel field names items.

@fperez
Contributor Author

fperez commented Dec 6, 2011

Sorry, what would orient be used for?

@wesm
Member

wesm commented Dec 6, 2011

Suppose that you intend for the keys to be the row labels instead of the column labels-- but you also want to infer the right dtype for the columns. If you did

df = DataFrame.from_items(items).T

to achieve that, but the values are, say, lists with mixed types, the resulting DataFrame would currently come out as all dtype=object. Using orient='index' would save you from that.
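The dtype difference described here can be seen with `DataFrame.from_dict`, which already takes an `orient` argument. The sketch below uses made-up data and a current pandas release; it stands in for the `from_items` call under discussion and does not show its actual implementation.

```python
import pandas as pd

# Made-up data: keys are meant to be row labels, and each value is a
# list mixing an int, a string and a float.
items = [("a", [1, "x", 2.5]), ("b", [2, "y", 3.5])]

# Keys as columns, then transpose: each original column holds mixed
# types, so it is object, and the transposed frame stays all object.
transposed = pd.DataFrame(dict(items)).T

# Keys as rows directly: dtypes are inferred separately per column.
oriented = pd.DataFrame.from_dict(dict(items), orient="index")

print(transposed.dtypes)
print(oriented.dtypes)
```

Both frames have the same labels and values; only the inferred column dtypes differ.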

@wesm
Member

wesm commented Dec 6, 2011

Analogous functionality is available in DataFrame.from_dict, because it's quite common to convert a nested dict to a DataFrame; before the orient parameter existed, it was common to lose the dtypes.

@fperez
Contributor Author

fperez commented Dec 6, 2011

Ah, I get it now. Yes, then it sounds like such an alternate constructor, with the orient keyword, would be very useful and would simplify some calls. If you do implement it, you might want to make note of its existence in the docstring for __init__, so people look for it instead of writing their own workarounds against the main __init__. Thanks!!

wesm added a commit that referenced this issue Dec 12, 2011
@wesm
Member

wesm commented Dec 12, 2011

OK, I wrote a DataFrame.from_items that I think does the right thing-- added some notes to the DataFrame docstring about other constructors, too

wesm closed this as completed Dec 12, 2011
@fperez
Contributor Author

fperez commented Dec 12, 2011

Thanks @wesm!
